How to download Fama/French 3-factor model data in R
In this post we will show you how to use R to download the Fama/French 3-factor model data from Kenneth French’s data library. You can find the data at “https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html”. We will select and download the Fama/French 3 factors monthly data.
Let’s begin!
Since we are only learning how to download the data and not performing any analysis, the only package we will need is tidyverse, which we use to read the .csv file.
First we need to copy the link to the data we want to download. On the site’s home page, right-click the “CSV” link and choose ‘copy link location’. We have already done that. Below we save that link to an object called ff_url.
ff_url <- "https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_Factors_CSV.zip"
We can see that this is a zip file. R has the tools to open a zip file, but first we need a temporary location to store it. In R, tempfile() returns the path of a temporary file where we can store files downloaded from the internet. So let’s create one; we will simply call it temp_file.
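To make the behaviour of tempfile() concrete, here is a minimal sketch. The fileext argument and the final unlink() call are our additions for illustration and are not required by the steps in this post.

```r
# Create a unique path inside R's per-session temporary directory.
# fileext is optional (an addition for illustration); it only affects the name.
temp_file <- tempfile(fileext = ".zip")

# tempfile() only returns a path; no file exists until something is written to it.
file.exists(temp_file)
# FALSE

# The directory that holds the path is removed when the R session ends.
tempdir()

# Optional: once the download has been read, the file can be removed early.
# unlink() is harmless if the file does not exist.
unlink(temp_file)
```

Because the path is unique to the session, repeated runs of the script do not overwrite each other's downloads.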
Now that we have both the url and the temporary location to store it, we are ready to download the file. We will perform all the steps below.
Once we have downloaded the zip file we need to unzip it to extract the contents. To do that we will use base R’s unzip() function, and then the tidyverse package to read the .csv file.
library(tidyverse)
# Create temp_file to store the file
temp_file <- tempfile()
# Download the file
download.file(ff_url, temp_file)
# Unzip the file, to extract the data
ff_factors_raw_data <- unzip(temp_file)
# Read the contents using tidyverse package
ff_factors_raw_data <- read_csv(ff_factors_raw_data)
# Check the data
head(ff_factors_raw_data)
## # A tibble: 6 x 1
## `This file was created by CMPT_ME_BEME_RETS using the 201904 CRSP databa~
## <chr>
## 1 The 1-month TBill return is from Ibbotson and Associates
## 2 <NA>
## 3 192607
## 4 192608
## 5 192609
## 6 192610
Well, that did not turn out nicely. We got the data, but it doesn’t make any sense. This happens because the first few rows of the file contain descriptive text that we don’t need for our purpose. With a bit of trial and error you will find that we need to skip the first 3 rows to get to the data. So let’s do that next. We present the entire code again.
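As an alternative to trial and error, the number of rows to skip can be found by searching the first few raw lines for the header. The sketch below is only an illustration: find_header_skip() is a hypothetical helper of our own, and example_lines is a stand-in modelled on the output above (the first line is abbreviated), not the actual file contents.

```r
# Hypothetical helper (not part of the original post): return how many lines
# to skip so that the first line read is the column header.
find_header_skip <- function(lines, header_pattern = "Mkt-RF") {
  header_line <- grep(header_pattern, lines, fixed = TRUE)[1]
  if (is.na(header_line)) stop("No line matching the header pattern was found")
  header_line - 1
}

# Stand-in for the top of the extracted file (assumed layout).
# With the real file this would be something like: readLines(csv_path, n = 10)
example_lines <- c(
  "This file was created by CMPT_ME_BEME_RETS ...",
  "The 1-month TBill return is from Ibbotson and Associates",
  "",
  ",Mkt-RF,SMB,HML,RF",
  "192607,    2.96,   -2.30,   -2.87,    0.22"
)

find_header_skip(example_lines)
# 3
```

The result agrees with the value found by hand, which is the one used in the full code that follows.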
# We need to load the file again
ff_url <- "https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_Factors_CSV.zip"
temp_file <- tempfile()
download.file(ff_url, temp_file)
ff_factors_raw_data <- unzip(temp_file)
# Skipping the first 3 rows
ff_factors_raw_data <- read_csv(ff_factors_raw_data, skip = 3)
## Warning: Missing column names filled in: 'X1' [1]
## Parsed with column specification:
## cols(
## X1 = col_integer(),
## `Mkt-RF` = col_double(),
## SMB = col_double(),
## HML = col_double(),
## RF = col_double()
## )
## Warning in rbind(names(probs), probs_f): number of columns of result is not
## a multiple of vector length (arg 1)
## Warning: 8 parsing failures.
##    row col    expected   actual                  file
##   1115 X1     an integer Annual Factors: Januar~ './F-F_Research_Data_Fa~
##   1115 <NA>   5 columns  1 columns               './F-F_Research_Data_Fa~
##   1116 Mkt-RF a double   Mkt-RF                  './F-F_Research_Data_Fa~
##   1116 SMB    a double   SMB                     './F-F_Research_Data_Fa~
##   1116 HML    a double   HML                     './F-F_Research_Data_Fa~
## ...
## See problems(...) for more details.
head(ff_factors_raw_data)
## # A tibble: 6 x 5
## X1 `Mkt-RF` SMB HML RF
## <int> <dbl> <dbl> <dbl> <dbl>
## 1 192607 2.96 -2.3 -2.87 0.22
## 2 192608 2.64 -1.4 4.19 0.25
## 3 192609 0.36 -1.32 0.01 0.23
## 4 192610 -3.24 0.04 0.51 0.32
## 5 192611 2.53 -0.2 -0.35 0.31
## 6 192612 2.62 -0.04 -0.02 0.28
In the warnings we can see that read_csv() failed to parse rows 1115 and 1116. This warning is important because it tells us where the layout of the file changes. Let’s see the dimensions.
dim(ff_factors_raw_data)
## [1] 1209 5
Let’s look at the tail of the data.
tail(ff_factors_raw_data)
## # A tibble: 6 x 5
## X1 `Mkt-RF` SMB HML RF
## <int> <dbl> <dbl> <dbl> <dbl>
## 1 2014 11.7 -8.08 -1.64 0.02
## 2 2015 0.07 -4.05 -9.61 0.02
## 3 2016 13.3 6.6 22.9 0.2
## 4 2017 21.5 -4.77 -13.9 0.8
## 5 2018 -6.93 -3.32 -9.4 1.81
## 6 NA NA NA NA NA
Ah! We can see that the date format has changed from YYYYMM to YYYY. The reason is that the file also includes the annual data for the factors, placed after the monthly data. The parsing failures pointed at rows 1115 and 1116, so let’s look at the rows around them.
ff_factors_raw_data[c(1110:1123),]
## # A tibble: 14 x 5
## X1 `Mkt-RF` SMB HML RF
## <int> <dbl> <dbl> <dbl> <dbl>
## 1 201812 -9.55 -2.58 -1.51 0.19
## 2 201901 8.41 3.02 -0.6 0.21
## 3 201902 3.4 2.02 -2.84 0.18
## 4 201903 1.1 -3.15 -4.07 0.19
## 5 201904 3.96 -1.69 1.99 0.21
## 6 NA NA NA NA NA
## 7 NA NA NA NA NA
## 8 1927 29.5 -2.46 -3.75 3.12
## 9 1928 35.4 4.2 -6.15 3.56
## 10 1929 -19.5 -30.8 11.8 4.75
## 11 1930 -31.2 -5.13 -12.3 2.41
## 12 1931 -45.1 3.53 -14.3 1.07
## 13 1932 -9.39 4.67 10.2 0.96
## 14 1933 57.0 49.1 28.5 0.3
We can see that the date format changes after row 1114, and that rows of NA values separate the monthly data from the annual data.
We will need to clean up this data before we can use it for any analysis. This is typical of data analysis: most of the time is spent getting the data and wrangling it into the correct form, and the analysis itself takes comparatively little time.
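As a rough illustration of what that cleanup might look like, the sketch below keeps only the monthly rows and converts the YYYYMM key to a date. It uses base R on a small stand-in data frame (raw) whose values are copied from the output above; the names raw, is_monthly and monthly are our own, and this is one possible approach rather than the definitive one.

```r
# Stand-in for a few rows of ff_factors_raw_data (values copied from the
# output above). check.names = FALSE keeps the column name `Mkt-RF` as is.
raw <- data.frame(
  X1       = c(201903L, 201904L, NA, NA, 1927L, 1928L),
  `Mkt-RF` = c(1.10, 3.96, NA, NA, 29.5, 35.4),
  SMB      = c(-3.15, -1.69, NA, NA, -2.46, 4.20),
  HML      = c(-4.07, 1.99, NA, NA, -3.75, -6.15),
  RF       = c(0.19, 0.21, NA, NA, 3.12, 3.56),
  check.names = FALSE
)

# Monthly rows carry a six-digit YYYYMM key; annual rows carry four digits.
# NA keys are excluded explicitly.
is_monthly <- !is.na(raw$X1) & nchar(as.character(raw$X1)) == 6

monthly <- raw[is_monthly, ]

# Turn YYYYMM into a Date on the first day of the month, then drop the key.
monthly$date <- as.Date(paste0(monthly$X1, "01"), format = "%Y%m%d")
monthly$X1 <- NULL

monthly
```

Filtering on the length of the key avoids hard-coding row numbers such as 1114, which change whenever the file is updated with new months.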
We will do it with Python next.