Use the 'FFdownload' package to download Fama-French datasets in R
Get FFdownload from CRAN or https://github.com/sstoeckl/ffdownload
Literally tens of thousands of papers use and cite data from Kenneth French’s famous data library, which provides academia with US and international asset pricing factors and portfolios. However, because each CSV file on the website stacks several tables (for example monthly and annual returns) with descriptive text in between, the files are tedious to import and usually require a lot of manual work. This prevents researchers all over the world from updating and using these files automatically.
For this purpose, many years ago I commissioned the initial version
of a package that has since (with much additional work on my side)
become FFdownload, which is available on
CRAN as well as in my
GitHub repository.
We install either the official release from CRAN or the development version from GitHub:
install.packages("FFdownload")
# or the development version
devtools::install_github("sstoeckl/ffdownload")
Since the website offers many different files, such as monthly files that additionally contain annual data, as well as daily and sometimes even weekly files, the algorithm needs very clear specifications, which I detail in the next subsection.
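To illustrate why such files need careful parsing, here is a minimal base-R sketch (not FFdownload's actual implementation) that splits one file containing a monthly and an annual table into separate data frames. It assumes a simplified layout in which each table starts with a header line beginning with a comma and ends at the next blank line; the excerpt and all its numbers are made-up placeholders, and `split_tables()` is a hypothetical helper.

```r
# Hypothetical, simplified excerpt in the style of a Kenneth French CSV file:
# a monthly table followed by an annual table in the same file.
# All numbers are made-up placeholders, not actual factor returns.
raw <- c(
  "Made-up description line at the top of the file",
  "",
  ",Mkt-RF,SMB,HML,RF",
  "196001,   1.00,   0.50,  -0.25,   0.10",
  "196002,  -2.00,   0.25,   0.75,   0.10",
  "",
  " Annual Factors: January-December ",
  ",Mkt-RF,SMB,HML,RF",
  "1960,   3.00,   1.00,   2.00,   1.20",
  ""
)

# Hypothetical helper: a table starts at a header line beginning with ","
# and ends right before the next blank line (or at the end of the file).
split_tables <- function(lines) {
  starts <- grep("^,", lines)
  blanks <- which(trimws(lines) == "")
  lapply(starts, function(s) {
    end <- min(c(blanks[blanks > s], length(lines) + 1)) - 1
    block <- lines[s:end]
    block[1] <- paste0("date", block[1])  # name the unnamed date column
    # check.names turns "Mkt-RF" into the syntactic name "Mkt.RF"
    read.csv(text = block, strip.white = TRUE, check.names = TRUE)
  })
}

tables <- split_tables(raw)
monthly <- tables[[1]]  # 2 rows, dates as YYYYMM
annual <- tables[[2]]   # 1 row, dates as YYYY
```

The real files contain more tables and footnotes, so a robust parser has to be more defensive than this; the sketch only shows the basic idea of cutting one file into several tables.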
Downloading one or more specific datasets
In this case, we download the Fama and French
(1992), Fama and French
(1993) 3-factor dataset, process it
(automatically) and plot the resulting factors. To do this, we use the
optional argument inputlist specifying ‘F-F_Research_Data_Factors’,
so that only this specific dataset is downloaded and processed.
The FFdownload() function takes the following arguments:
- output_file: name of the .RData file to be saved (include the path if necessary).
- tempdir: specify this if you want to save the downloaded files at a specific location. This is necessary for reproducible research, as the files on the website change from time to time.
- exclude_daily: excludes the daily datasets (they are not downloaded), which speeds up the process considerably.
- download: set to TRUE if you actually want to download the data again (e.g., to update it). Set to FALSE and specify tempdir to keep processing the already downloaded files.
- download_only: set to FALSE if you want to process all your downloaded files at once.
- listsave: if not NULL, the list of unzipped files is saved here (useful for processing only a limited number of files through inputlist). It is written before inputlist is processed.
- inputlist: if not NULL, FFdownload tries to match the names in this list with the list of downloadable files (zipped CSV) on the website.
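The idea behind inputlist can be pictured with a small base-R sketch. This is an assumption about the general approach, not FFdownload's code: the file names below are hypothetical and only follow the data library's naming style (dataset name plus "_CSV.zip"), and exact matching on the dataset name is one simple strategy that keeps weekly and daily variants out of the selection.

```r
# Hypothetical file names, written in the naming style of the data library
# (dataset name followed by "_CSV.zip"); the real list comes from the website.
available <- c("F-F_Research_Data_Factors_CSV.zip",
               "F-F_Research_Data_Factors_weekly_CSV.zip",
               "F-F_Research_Data_Factors_daily_CSV.zip",
               "F-F_Momentum_Factor_CSV.zip",
               "F-F_Momentum_Factor_daily_CSV.zip")
inputlist <- c("F-F_Research_Data_Factors", "F-F_Momentum_Factor")

# Exact matching on the dataset name (file name without "_CSV.zip")
dataset_names <- sub("_CSV\\.zip$", "", available)
selected <- available[dataset_names %in% inputlist]

# A plain substring search would also pick up the weekly and daily variants
loose <- available[grepl(inputlist[1], available, fixed = TRUE)]

# Dropping daily files by name, similar in spirit to exclude_daily = TRUE
no_daily <- available[!grepl("_daily", available, fixed = TRUE)]
```

Whatever the package does internally, the practical takeaway is to pass the dataset names exactly as they appear on the website.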
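library(FFdownload)
# download and process only the monthly 3-factor dataset into a temporary .RData file
tempf <- tempfile(fileext = ".RData")
inputlist <- c("F-F_Research_Data_Factors")
FFdownload(output_file = tempf, inputlist = inputlist, exclude_daily = TRUE, download = TRUE, download_only = FALSE)
load(tempf)  # restores the list FFdata
# cumulative wealth since 1960; exp(cumsum()) treats the monthly percentage returns as log returns
fig <- exp(cumsum(FFdata$`x_F-F_Research_Data_Factors`$monthly$Temp2["1960-01-01/", c("Mkt.RF", "SMB", "HML")] / 100))
# market in the first panel; on = NA adds size and value as separate panels
plotFF <- plot(fig[, "Mkt.RF"], main = "FF 3 Factors", major.ticks = "years", format.labels = "%Y", col = "black", lwd = 2, lty = 1, cex = 0.8)
plotFF <- lines(fig[, "SMB"], on = NA, main = "Size", col = "darkgreen", lwd = 2, lty = 1, ylim = c(0, 5), cex = 0.8)
plotFF <- lines(fig[, "HML"], on = NA, main = "Value", col = "darkred", lwd = 2, lty = 1, ylim = c(0, 15), cex = 0.8)
plotFF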
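The first plot compounds returns with exp(cumsum(r / 100)), which treats the monthly percentage returns as log returns. The short base-R comparison below, using made-up returns, shows that this is close to, but slightly above, the exact wealth index cumprod(1 + r / 100); the two coincide once the simple returns are converted to log returns with log1p().

```r
# Made-up monthly returns in percent, as stored in the Fama-French files
r <- c(1, -2, 3)

exact  <- cumprod(1 + r / 100)         # wealth index from simple returns
approx <- exp(cumsum(r / 100))         # treats r/100 as log returns
logged <- exp(cumsum(log1p(r / 100)))  # convert to log returns first

round(exact, 6)   # 1.010000 0.989800 1.019494
round(approx, 6)  # 1.010050 0.990050 1.020201
```

Because exp(x) >= 1 + x, the approximation always lies above the exact index, and the gap grows with the length of the sample.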

We could also add momentum (Carhart
1997) as well as the short- and long-term reversal factors
by additionally specifying ‘F-F_Momentum_Factor’,
‘F-F_ST_Reversal_Factor’ and ‘F-F_LT_Reversal_Factor’. (The two additional
factors of the Fama and French (2014) five-factor model, profitability and investment,
come in the separate dataset ‘F-F_Research_Data_5_Factors_2x3’.) We do this
and use the ggplot2 package to create another plot.
library(tidyverse); library(timetk)
tempf <- tempfile(fileext = ".RData")
inputlist <- c("F-F_Research_Data_Factors", "F-F_Momentum_Factor", "F-F_ST_Reversal_Factor", "F-F_LT_Reversal_Factor")
FFdownload(output_file = tempf, inputlist = inputlist, exclude_daily = TRUE, download = TRUE, download_only = FALSE)
load(tempf)
# join the monthly tables by date and reshape to long format (one row per date and factor)
FFfive <- FFdata$`x_F-F_Research_Data_Factors`$monthly$Temp2 %>% timetk::tk_tbl(rename_index = "date") %>%
  left_join(FFdata$`x_F-F_Momentum_Factor`$monthly$Temp2 %>% timetk::tk_tbl(rename_index = "date"), by = "date") %>%
  left_join(FFdata$`x_F-F_ST_Reversal_Factor`$monthly$Temp2 %>% timetk::tk_tbl(rename_index = "date"), by = "date") %>%
  left_join(FFdata$`x_F-F_LT_Reversal_Factor`$monthly$Temp2 %>% timetk::tk_tbl(rename_index = "date"), by = "date") %>%
  pivot_longer(Mkt.RF:LT_Rev, names_to = "FFVar", values_to = "FFret") %>%
  mutate(FFret = FFret / 100, date = as.Date(date))
# wealth index per factor, starting at 1 in the first month from 1960 on (its return set to 0)
FFfive %>% filter(date >= as.Date("1960-01-01"), FFVar != "RF") %>%
  group_by(FFVar) %>% arrange(FFVar, date) %>%
  mutate(FFret = ifelse(date == min(date), 0, FFret), FFretv = cumprod(1 + FFret)) %>%
  ggplot(aes(x = date, y = FFretv, colour = FFVar)) + geom_line(linewidth = 1.2) + scale_y_log10() +
  labs(title = "FF3 Factors plus Momentum and Reversal", subtitle = "Cumulative wealth plots", y = "cum. wealth (log scale)") +
  scale_colour_viridis_d("FFVar") +
  theme_bw() + theme(legend.position = "bottom")
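The left_join() chain above keeps every month of the first table and fills months that are missing from the later tables with NA (momentum, for instance, needs a year of past returns before its first observation). The base-R sketch below reproduces the join-and-reshape logic on two tiny made-up tables; merge(all.x = TRUE) stands in for left_join(), and all values are placeholders.

```r
# Two hypothetical monthly factor tables (returns in percent, made-up values);
# the second series starts one month later than the first.
ff3 <- data.frame(date = as.Date(c("1960-01-01", "1960-02-01", "1960-03-01")),
                  Mkt.RF = c(1.0, -2.0, 3.0))
mom <- data.frame(date = as.Date(c("1960-02-01", "1960-03-01")),
                  Mom = c(0.5, -1.5))

# merge(all.x = TRUE) keeps every date of ff3, like dplyr::left_join()
wide <- merge(ff3, mom, by = "date", all.x = TRUE)

# Long format: one row per date and factor, returns in decimals
long <- data.frame(
  date  = rep(wide$date, times = 2),
  FFVar = rep(c("Mkt.RF", "Mom"), each = nrow(wide)),
  FFret = c(wide$Mkt.RF, wide$Mom) / 100
)

# A wealth index over a series that starts with NA is NA throughout
mom_wealth <- cumprod(1 + long$FFret[long$FFVar == "Mom"])
```

This is one reason to start a cumulative plot at a date from which all series have data.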

If you want a snapshot of all the files saved on your hard drive (before
they change again), I recommend specifying a permanent tempdir where
the downloaded files will not be deleted on restart. Also, if you have
already downloaded a snapshot of the data without processing it
(download=TRUE and download_only=TRUE), you can post-process it without
re-downloading by setting download=FALSE and download_only=FALSE.
listsave to a specific location and keep download=FALSE as
well as download_only=TRUE.
References
Carhart, Mark M. 1997. “On Persistence in Mutual Fund Performance.” The Journal of Finance 52 (1): 57–82. https://doi.org/10.2307/2329556.
Fama, Eugene F., and Kenneth R. French. 1992. “The Cross-Section of Expected Stock Returns.” The Journal of Finance 47 (2): 427–65. https://doi.org/10.1111/j.1540-6261.1992.tb04398.x.
———. 1993. “Common Risk Factors in the Returns on Stocks and Bonds.” Journal of Financial Economics 33 (1): 3–56. https://doi.org/10.1016/0304-405X(93)90023-5.
———. 2014. “A Five-Factor Asset Pricing Model.” Journal of Financial Economics. https://doi.org/10.1016/j.jfineco.2014.10.010.