Use the FFdownload package to download Fama-French datasets in R
Literally tens of thousands of papers use and cite data from Kenneth French’s famous data library, which provides academia with US and international asset pricing factors and portfolios. However, because of how they are laid out, the CSV files on the website are tedious to import and usually require a lot of manual work. This prevents researchers all over the world from automatically updating and using these files.
For this purpose, many years ago I commissioned the initial files of a package that has now (with much additional work on my side) become FFdownload, which is available on CRAN as well as in my GitHub repository.
We install either the official release or the development version using
install.packages("FFdownload")
# or the development version
devtools::install_github("sstoeckl/ffdownload")
Because the library contains many different kinds of files (monthly files that additionally contain annual data, as well as daily and sometimes even weekly files), the algorithm needs very clear specifications, which I detail in the next subsection:
Downloading one or more specific datasets
In this case, we download the 3-factor dataset of Fama and French (1992) and Fama and French (1993), process it (automatically) and plot the resulting factors. To do this, we use the optional argument inputlist, specifying ‘F-F_Research_Data_Factors’, so that only this specific dataset is downloaded and processed. The FFdownload() function takes the following arguments:
output_file: name of the .RData file to be saved (include the path if necessary)
tempdir: specify this if you want to save the downloaded files at a specific location. Necessary for reproducible research, as the files on the website do change from time to time
exclude_daily: excludes the daily datasets (they are not downloaded), which speeds the process up considerably
download: set to TRUE if you actually want to do the download again (e.g. to update the data). Set to FALSE and specify tempdir to keep processing the already downloaded files
download_only: set to FALSE if you want to process all your downloaded files at once
listsave: if not NULL, the list of unzipped files is saved here (good for processing only a limited number of files through inputlist). It is written before inputlist is applied
inputlist: if not NULL, FFdownload tries to match the names from the list with the list of downloadable files (zipped CSV) on the website
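As a rough illustration of how listsave can help to find the exact names that inputlist expects, the following sketch asks FFdownload() only to write the file list, without downloading or processing any data files. The helper name list_ff_files() is hypothetical, and reading the saved file with read.csv() assumes that the list is written as CSV text; inspect the file directly if that does not hold for your package version.

```r
# Hypothetical helper (not part of FFdownload): save the list of available
# files and read it back, so that names for `inputlist` can be looked up.
list_ff_files <- function(listfile = tempfile(fileext = ".txt")) {
  # download = FALSE and download_only = TRUE: no data files are
  # downloaded or processed, only the list of files is written.
  FFdownload::FFdownload(
    exclude_daily = TRUE,
    download      = FALSE,
    download_only = TRUE,
    listsave      = listfile
  )
  # Assumption: the saved list can be parsed as CSV.
  utils::read.csv(listfile, stringsAsFactors = FALSE)
}

# Usage (needs an internet connection):
# ff_files <- list_ff_files()
# head(ff_files)
```

Wrapping the call in a function keeps the sketch free of side effects until it is actually invoked.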
library(FFdownload)
# Write the processed data to a temporary .RData file
tempf <- tempfile(fileext = ".RData")
inputlist <- c("F-F_Research_Data_Factors")
FFdownload(output_file = tempf, inputlist = inputlist, exclude_daily = TRUE, download = TRUE, download_only = FALSE)
load(tempf)  # loads the list named FFdownload
# Cumulate the monthly percentage returns from January 1960 onwards
fig <- exp(cumsum(FFdownload$`x_F-F_Research_Data_Factors`$monthly$Temp2["1960-01-01/", c("Mkt.RF", "SMB", "HML")] / 100))
# Market factor first, then size and value in separate panels (on = NA)
plotFF <- plot(fig[, "Mkt.RF"], main = "FF 3 Factors", major.ticks = "years", format.labels = "%Y", col = "black", lwd = 2, lty = 1, cex = 0.8)
plotFF <- lines(fig[, "SMB"], on = NA, main = "Size", col = "darkgreen", lwd = 2, lty = 1, ylim = c(0, 5), cex = 0.8)
plotFF <- lines(fig[, "HML"], on = NA, main = "Value", col = "darkred", lwd = 2, lty = 1, ylim = c(0, 15), cex = 0.8)
plotFF
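The plot above cumulates monthly percentage returns with exp(cumsum(r/100)), which treats each return as a log return. A minimal sketch with three made-up monthly returns (not taken from the data library) shows how this compares with compounding simple returns via cumprod(1 + r/100); for small monthly returns the two paths stay close.

```r
# Made-up monthly returns in percent, for illustration only
r_pct <- c(1.0, -2.0, 3.0)

# Cumulation used in the plot: returns treated as log returns
wealth_log <- exp(cumsum(r_pct / 100))

# Alternative: compounding of simple returns
wealth_simple <- cumprod(1 + r_pct / 100)

# Side-by-side comparison of the two wealth paths
comparison <- data.frame(month = seq_along(r_pct), wealth_log, wealth_simple)
print(comparison)
```

Which of the two is appropriate depends on whether the published returns are read as simple or as log returns; the sketch only makes the difference visible.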
We could also add momentum (Carhart 1997) and the additional two factors of the Fama and French (2014) 5-factor model by additionally specifying ‘F-F_Momentum_Factor’, ‘F-F_ST_Reversal_Factor’ and ‘F-F_LT_Reversal_Factor’. We do this and make use of the ggplot2 package to create another plot.
library(tidyverse)
library(timetk)
tempf <- tempfile(fileext = ".RData")
inputlist <- c("F-F_Research_Data_Factors", "F-F_Momentum_Factor", "F-F_ST_Reversal_Factor", "F-F_LT_Reversal_Factor")
FFdownload(output_file = tempf, inputlist = inputlist, exclude_daily = TRUE, download = TRUE, download_only = FALSE)
load(tempf)
# Join the monthly tables of all four datasets by date and reshape to long format
FFfive <- FFdownload$`x_F-F_Research_Data_Factors`$monthly$Temp2 %>%
  timetk::tk_tbl(rename_index = "date") %>%
  left_join(FFdownload$`x_F-F_Momentum_Factor`$monthly$Temp2 %>% timetk::tk_tbl(rename_index = "date"), by = "date") %>%
  left_join(FFdownload$`x_F-F_ST_Reversal_Factor`$monthly$Temp2 %>% timetk::tk_tbl(rename_index = "date"), by = "date") %>%
  left_join(FFdownload$`x_F-F_LT_Reversal_Factor`$monthly$Temp2 %>% timetk::tk_tbl(rename_index = "date"), by = "date") %>%
  pivot_longer(Mkt.RF:LT_Rev, names_to = "FFVar", values_to = "FFret") %>%
  mutate(FFret = FFret / 100, date = as.Date(date))
# Wealth of one unit invested at the start of 1960: the first return is set to 0
# so that every series starts at 1 (required for the log scale)
FFfive %>%
  filter(date >= "1960-01-01", FFVar != "RF") %>%
  group_by(FFVar) %>%
  arrange(FFVar, date) %>%
  mutate(FFret = ifelse(date == "1960-01-01", 0, FFret), FFretv = cumprod(1 + FFret)) %>%
  ggplot(aes(x = date, y = FFretv, col = FFVar)) +
  geom_line(lwd = 1.2) +
  scale_y_log10() +
  labs(title = "FF5 Factors plus Momentum", subtitle = "Cumulative wealth plots", y = "cum. returns") +
  scale_colour_viridis_d("FFvar") +
  theme_bw() +
  theme(legend.position = "bottom")
If you want a snapshot of all the files saved on your hard drive (before they change again), I recommend specifying a permanent tempdir where the downloaded files will not be deleted on restart. Also, if you have already downloaded a snapshot of the data without processing it (download_only=TRUE), you can post-process without re-downloading by setting listsave to a specific location and keeping download=FALSE as well as
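A sketch of such a two-step snapshot workflow, under stated assumptions: the folder name FF_snapshot and the two helper functions are hypothetical, and the calls use only the arguments described earlier (output_file, tempdir, exclude_daily, download, download_only). Whether the second step needs further settings in your installed version is something to verify against the package documentation.

```r
# Hypothetical permanent folder for the raw snapshot
snap_dir <- file.path(path.expand("~"), "FF_snapshot")

# Step 1: download the zipped CSV files into `dir` without processing them
snapshot_ff <- function(dir = snap_dir) {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  FFdownload::FFdownload(tempdir = dir, exclude_daily = TRUE,
                         download = TRUE, download_only = TRUE)
}

# Step 2: process the files already stored in `dir`, without re-downloading
process_ff <- function(dir = snap_dir,
                       output_file = file.path(dir, "FFdata.RData")) {
  FFdownload::FFdownload(output_file = output_file, tempdir = dir,
                         exclude_daily = TRUE,
                         download = FALSE, download_only = FALSE)
  output_file  # return the path so it can be passed to load()
}

# Usage (step 1 needs an internet connection):
# snapshot_ff()
# load(process_ff())
```

Keeping the raw files and the processed .RData file in the same folder makes it easier to document exactly which vintage of the data a result is based on.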
Carhart, Mark M. 1997. “On Persistence in Mutual Fund Performance.” The Journal of Finance 52 (1): 57–82. https://doi.org/10.2307/2329556.
Fama, Eugene F., and Kenneth R. French. 1992. “The Cross-Section of Expected Stock Returns.” The Journal of Finance 47 (2): 427–65. https://doi.org/10.1111/j.1540-6261.1992.tb04398.x.
———. 1993. “Common Risk Factors in the Returns on Stocks and Bonds.” Journal of Financial Economics 33 (1): 3–56. https://doi.org/10.1016/0304-405X(93)90023-5.
———. 2014. “A Five-Factor Asset Pricing Model.” Journal of Financial Economics. https://doi.org/10.1016/j.jfineco.2014.10.010.