Crypto Research Data from CoinMarketCap without Survivorship Bias


In the past two years, an ever-growing number of academic researchers have been studying the market for crypto currencies (CC), many of them concentrating on the few largest ones (Brauneis and Mestel 2018a; Bouri, Gupta, and Roubaud 2018; Corbet et al. 2018). A notable exception is Brauneis and Mestel (2018b), who derive mean-variance portfolios from the 20 most liquid crypto currencies among the 500 largest listed on coinmarketcap.com.

However, I think that selecting the (ex-post) largest or most liquid crypto currencies often introduces survivorship bias into the data, because coins that failed and were delisted never enter the sample. That might explain the often stunning outperformance of crypto currencies over traditional assets. Before we can think about how to remedy this and how to introduce correct delisting returns (Shumway 1997), we have to download a dataset that also includes crypto currencies that were listed historically but are no longer traded.
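To make the direction of the bias concrete, here is a minimal sketch in base R. The five coins and their returns are entirely made up for illustration; the point is only that averaging over the survivors ignores the coins that lost (almost) everything.

```r
# Hypothetical one-period returns for five coins (all numbers are invented).
# Two of them were delisted after losing (almost) all of their value.
coins <- data.frame(
  symbol = c("AAA", "BBB", "CCC", "DDD", "EEE"),
  ret    = c(0.80, 0.30, 0.10, -0.95, -1.00),
  active = c(TRUE, TRUE, TRUE, FALSE, FALSE)
)

# Equal-weighted mean return using only coins that still trade today
mean_survivors <- mean(coins$ret[coins$active])

# Equal-weighted mean return over every coin that was investable at the start
mean_all <- mean(coins$ret)

print(c(survivors_only = mean_survivors, full_sample = mean_all))
```

In this toy sample the survivors-only mean is 0.40, while the full-sample mean is -0.15, so the sign of the average return flips once the dead coins are included.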

Two of the more popular R packages, crypto by Jesse Vent and cryptor by James Blair, are very good sources of (historical) data in OHLC format for currencies that are currently listed somewhere. However, neither package makes it easy to retrieve currencies that were listed only historically. All this data (and much more) is available from CoinMarketCap (CMC) via its RESTful API.

I have therefore extended Jesse Vent's crypto package into crypto2, which downloads information on all coins that were listed on CMC at any time (crypto_list()). For any desired subset of those CCs we can retrieve historical time series (crypto_history()) as well as additional information (crypto_info()). The package can be installed from GitHub:

devtools::install_github("sstoeckl/crypto2")

Now, let us retrieve a list of all (active and inactive) crypto currencies ever listed by CMC:

pacman::p_load(tidyverse,crypto2)
coin_list <- crypto_list(only_active = FALSE)
coin_list %>% head()
## # A tibble: 6 x 8
##      id name   symbol slug    rank is_active first_historical_~ last_historical~
##   <int> <chr>  <chr>  <chr>  <int>     <int> <date>             <date>          
## 1     1 Bitco~ BTC    bitco~     1         1 2013-04-28         2020-12-10      
## 2     2 Litec~ LTC    litec~     5         1 2013-04-28         2020-12-10      
## 3     3 Namec~ NMC    namec~   589         1 2013-04-28         2020-12-10      
## 4     4 Terra~ TRC    terra~  1327         1 2013-04-28         2020-12-10      
## 5     5 Peerc~ PPC    peerc~   665         1 2013-04-28         2020-12-10      
## 6     6 Novac~ NVC    novac~  1343         1 2013-04-28         2020-12-10
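The is_active column shown above is what separates surviving coins from dead ones. The following sketch assumes that 0 marks a coin that is no longer listed. To stay runnable without an internet connection it works on a small made-up stand-in (coin_list_demo, a hypothetical name) that mimics only the id, slug and is_active columns; with real data, the same filter would be applied to coin_list.

```r
library(dplyr)

# Made-up stand-in for the result of crypto_list(only_active = FALSE);
# only the columns needed here are mimicked, and the rows are invented.
coin_list_demo <- data.frame(
  id        = 1:4,
  slug      = c("coin-a", "coin-b", "coin-c", "coin-d"),
  is_active = c(1L, 1L, 0L, 0L)
)

# Coins that are no longer listed (assumption: is_active == 0 means inactive)
dead_coins <- coin_list_demo %>% filter(is_active == 0)

# Number of active versus inactive coins in the list
status_counts <- coin_list_demo %>% count(is_active)
print(status_counts)
```

Passing such a subset on to crypto_history() would then retrieve price histories for exactly the coins that a sample of "currently largest" currencies leaves out.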

Then we download additional information on the first three CCs from CMC:

coin_info <- crypto_info(slugs = coin_list$slug[1:3])
## > Scraping crypto info
## 
## > Processing historical crypto data
## 
coin_info %>% head()
## # A tibble: 3 x 19
##      id name  symbol category description slug  logo  subreddit notice
##   <int> <chr> <chr>  <chr>    <chr>       <chr> <chr> <chr>     <chr> 
## 1     1 Bitc~ BTC    coin     "## **What~ bitc~ http~ bitcoin   ""    
## 2     2 Lite~ LTC    coin     "### What ~ lite~ http~ litecoin  ""    
## 3     3 Name~ NMC    coin     "Namecoin ~ name~ http~ namecoin  ""    
## # ... with 10 more variables: date_added <chr>, twitter_username <chr>,
## #   is_hidden <int>, date_launched <lgl>,
## #   self_reported_circulating_supply <lgl>, self_reported_tags <lgl>,
## #   status <dttm>, tags <list>, urls <list>, platform <list>

Finally, we download historical OHLC data from CMC. In this case, we restrict the download to 2015.

coins_2015 <- crypto_history(coin_list = coin_list[1:3,], start_date = "20150101", end_date = "20151231")
## > Scraping historical crypto data
## 
## > Processing historical crypto data
## 
coins_2015 %>% filter(as.Date(timestamp)<="2015-01-02")
## # A tibble: 6 x 15
##   timestamp           slug     id name  symbol    open    high     low   close
##   <dttm>              <chr> <int> <chr> <chr>    <dbl>   <dbl>   <dbl>   <dbl>
## 1 2015-01-01 23:59:59 bitc~     1 Bitc~ BTC    320.    320.    314.    314.   
## 2 2015-01-02 23:59:59 bitc~     1 Bitc~ BTC    314.    316.    314.    315.   
## 3 2015-01-01 23:59:59 lite~     2 Lite~ LTC      2.72    2.72    2.69    2.70 
## 4 2015-01-02 23:59:59 lite~     2 Lite~ LTC      2.70    2.70    2.66    2.67 
## 5 2015-01-01 23:59:59 name~     3 Name~ NMC      0.716   0.716   0.703   0.704
## 6 2015-01-02 23:59:59 name~     3 Name~ NMC      0.705   0.743   0.701   0.726
## # ... with 6 more variables: volume <dbl>, market_cap <dbl>, time_open <dttm>,
## #   time_close <dttm>, time_high <dttm>, time_low <dttm>
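A natural next step is to turn the close prices into returns for each coin. The sketch below is one possible way to do this with dplyr, not part of crypto2 itself. It uses a small invented data frame (prices_demo, a hypothetical name) with the slug, timestamp and close columns seen in the output above, so that it runs without downloading anything.

```r
library(dplyr)

# Invented prices that mimic the slug, timestamp and close columns
# of the crypto_history() output; the values are for illustration only.
prices_demo <- data.frame(
  slug      = rep(c("coin-a", "coin-b"), each = 3),
  timestamp = rep(as.Date("2015-01-01") + 0:2, times = 2),
  close     = c(100, 110, 99, 2.0, 1.0, 1.5)
)

# Simple daily returns per coin: close_t / close_{t-1} - 1.
# The first observation of each coin has no predecessor and stays NA.
returns_demo <- prices_demo %>%
  group_by(slug) %>%
  arrange(timestamp, .by_group = TRUE) %>%
  mutate(ret = close / lag(close) - 1) %>%
  ungroup()

print(returns_demo)
```

For a coin that stops trading, the series simply ends with its last quoted price; the return an investor earns after that date is what a delisting return in the sense of Shumway (1997) would have to capture, and it is not computed here.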

I hope this package is useful to everyone who looks for a survivorship-bias-free (historical) dataset of crypto currencies!

References

Bouri, Elie, Rangan Gupta, and David Roubaud. 2018. “Herding Behaviour in Cryptocurrencies.” Finance Research Letters, July. https://doi.org/10.1016/j.frl.2018.07.008.

Brauneis, Alexander, and Roland Mestel. 2018a. “Price Discovery of Cryptocurrencies: Bitcoin and Beyond.” Economics Letters 165: 58–61. https://doi.org/10.1016/j.econlet.2018.02.001.

———. 2018b. “Cryptocurrency-Portfolios in a Mean-Variance Framework.” Finance Research Letters, June. https://doi.org/10.1016/j.frl.2018.05.008.

Corbet, Shaen, Andrew Meegan, Charles Larkin, Brian Lucey, and Larisa Yarovaya. 2018. “Exploring the Dynamic Relationships Between Cryptocurrencies and Other Financial Assets.” Economics Letters 165 (April): 28–34. https://doi.org/10.1016/j.econlet.2018.01.004.

Shumway, Tyler. 1997. “The Delisting Bias in CRSP Data.” The Journal of Finance 52 (1): 327–40. https://doi.org/10.2307/2329566.
