Crypto Research Data from CoinMarketCap without Survivorship Bias

Over the past two years, an ever-growing number of academic researchers have studied the market for cryptocurrencies (CC), often concentrating on the few largest ones (Brauneis and Mestel 2018a; Bouri, Gupta, and Roubaud 2018; Corbet et al. 2018). A notable exception is Brauneis and Mestel (2018b), who derive mean-variance portfolios from the 20 most liquid of the 500 largest cryptocurrencies on coinmarketcap.com.
However, I believe that selecting the (ex-post) largest or most liquid cryptocurrencies often introduces survivorship bias into the data, which might explain the often stunning outperformance of cryptocurrencies over traditional assets. Before we can think about how to remedy this and how to introduce correct delisting returns (Shumway 1997), we need a dataset that also includes cryptocurrencies that were traded in the past but are no longer traded today.
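To make the bias concrete, here is a small toy illustration in base R. The five coins and their returns are entirely hypothetical, and the assumption that a delisted coin ends at a return of -100% is only a crude stand-in for a proper delisting return, not the correction discussed by Shumway (1997).

```r
# Hypothetical toy universe: total return of five coins over one period.
# ASSUMPTION: delisted coins (is_active == FALSE) are assigned a -100% return.
toy <- data.frame(
  coin      = c("A", "B", "C", "D", "E"),
  ret       = c(0.50, 0.20, -0.10, -1.00, -1.00),
  is_active = c(TRUE, TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Average return if we only look at coins that are still listed today (ex post)
mean_survivors <- mean(toy$ret[toy$is_active])

# Average return over every coin that was listed at the start of the period
mean_all <- mean(toy$ret)

# Difference between the two averages = survivorship bias in this toy example
bias <- mean_survivors - mean_all

cat(sprintf("survivors only: %.2f | full universe: %.2f | bias: %.2f\n",
            mean_survivors, mean_all, bias))
```

Dropping the two dead coins turns an average loss into an average gain, which is exactly the effect a survivor-only sample can have on reported performance.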
Two of the more popular R packages, crypto by Jesse Vent (JesseVent) and cryptor by James Blair, provide very good sources of (historical) OHLC data for currencies that are currently listed somewhere. However, neither package makes it easy to retrieve currencies that were only listed in the past. All of this data (and much more) is available from CoinMarketCap (CMC) via its RESTful API.
I have therefore extended Jesse Vent's crypto package into crypto2, which downloads information on all coins that were ever listed on CMC (crypto_list()). For any desired subset of those CCs, we can then retrieve historical time series (crypto_history()) as well as additional information (crypto_info()).
devtools::install_github("sstoeckl/crypto2")
Now, let us retrieve a list of all (active and inactive) cryptocurrencies ever listed on CMC:
pacman::p_load(tidyverse,crypto2)
coin_list <- crypto_list(only_active = FALSE)
coin_list %>% head()
## # A tibble: 6 x 8
## id name symbol slug rank is_active first_historical_~ last_historical~
## <int> <chr> <chr> <chr> <int> <int> <date> <date>
## 1 1 Bitco~ BTC bitco~ 1 1 2013-04-28 2020-12-10
## 2 2 Litec~ LTC litec~ 5 1 2013-04-28 2020-12-10
## 3 3 Namec~ NMC namec~ 589 1 2013-04-28 2020-12-10
## 4 4 Terra~ TRC terra~ 1327 1 2013-04-28 2020-12-10
## 5 5 Peerc~ PPC peerc~ 665 1 2013-04-28 2020-12-10
## 6 6 Novac~ NVC novac~ 1343 1 2013-04-28 2020-12-10
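Having inactive coins in the list is what allows a point-in-time universe: on any date, include every coin that had data on that date, whether or not it is still active today. The sketch below uses a small hypothetical data frame instead of a live download. The column names first_historical_data and last_historical_data are an assumption reconstructed from the truncated header above, and the coins "deadcoin" and "newcoin" are made up for illustration.

```r
# Hypothetical stand-in for crypto_list(only_active = FALSE) output.
# ASSUMPTION: column names first_historical_data / last_historical_data.
coins <- data.frame(
  slug      = c("bitcoin", "litecoin", "deadcoin", "newcoin"),
  is_active = c(1L, 1L, 0L, 1L),
  first_historical_data = as.Date(c("2013-04-28", "2013-04-28", "2014-01-01", "2018-06-01")),
  last_historical_data  = as.Date(c("2020-12-10", "2020-12-10", "2016-03-31", "2020-12-10")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: slugs of all coins with data on date d,
# regardless of whether they are still active today.
universe_at <- function(coins, d) {
  d <- as.Date(d)
  in_window <- coins$first_historical_data <= d & coins$last_historical_data >= d
  coins$slug[in_window]
}

# Ex-post selection that keeps only coins active today (survivorship-biased)
survivors <- coins$slug[coins$is_active == 1L]

print(universe_at(coins, "2015-06-30"))
print(survivors)
```

For mid-2015 the point-in-time universe contains the later-delisted coin and excludes the one that did not exist yet, while the survivor filter gets both wrong. With the real tibble, the same idea is a dplyr filter on the two date columns.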
Then we download additional information on the first three CCs from CMC:
coin_info <- crypto_info(slugs = coin_list$slug[1:3])
## > Scraping crypto info
##
## > Processing historical crypto data
##
coin_info %>% head()
## # A tibble: 3 x 19
## id name symbol category description slug logo subreddit notice
## <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 1 Bitc~ BTC coin "## **What~ bitc~ http~ bitcoin ""
## 2 2 Lite~ LTC coin "### What ~ lite~ http~ litecoin ""
## 3 3 Name~ NMC coin "Namecoin ~ name~ http~ namecoin ""
## # ... with 10 more variables: date_added <chr>, twitter_username <chr>,
## # is_hidden <int>, date_launched <lgl>,
## # self_reported_circulating_supply <lgl>, self_reported_tags <lgl>,
## # status <dttm>, tags <list>, urls <list>, platform <list>
Finally, we download historical OHLC data from CMC for the same three coins. In this case, we only select data from 2015, passing the start and end dates as "YYYYMMDD" strings.
coins_2015 <- crypto_history(coin_list = coin_list[1:3,], start_date = "20150101", end_date = "20151231")
## > Scraping historical crypto data
##
## > Processing historical crypto data
##
coins_2015 %>% filter(as.Date(timestamp)<="2015-01-02")
## # A tibble: 6 x 15
## timestamp slug id name symbol open high low close
## <dttm> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 2015-01-01 23:59:59 bitc~ 1 Bitc~ BTC 320. 320. 314. 314.
## 2 2015-01-02 23:59:59 bitc~ 1 Bitc~ BTC 314. 316. 314. 315.
## 3 2015-01-01 23:59:59 lite~ 2 Lite~ LTC 2.72 2.72 2.69 2.70
## 4 2015-01-02 23:59:59 lite~ 2 Lite~ LTC 2.70 2.70 2.66 2.67
## 5 2015-01-01 23:59:59 name~ 3 Name~ NMC 0.716 0.716 0.703 0.704
## 6 2015-01-02 23:59:59 name~ 3 Name~ NMC 0.705 0.743 0.701 0.726
## # ... with 6 more variables: volume <dbl>, market_cap <dbl>, time_open <dttm>,
## # time_close <dttm>, time_high <dttm>, time_low <dttm>
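A typical next step is to turn these prices into daily returns per coin. The sketch below uses base R on a tiny hand-built data frame that copies the rounded close prices printed above for illustration only; it assumes the rows of each coin can be ordered by timestamp and computes simple returns close_t / close_(t-1) - 1.

```r
# Illustrative data: rounded close prices as printed in the output above.
hist <- data.frame(
  slug      = rep(c("bitcoin", "litecoin", "namecoin"), each = 2),
  timestamp = as.POSIXct(rep(c("2015-01-01 23:59:59", "2015-01-02 23:59:59"), 3),
                         tz = "UTC"),
  close     = c(314, 315, 2.70, 2.67, 0.704, 0.726),
  stringsAsFactors = FALSE
)

# Sort by coin, then by time, so that lagged prices line up correctly
hist <- hist[order(hist$slug, hist$timestamp), ]

# Simple return within each coin; the first observation of a coin has no
# previous price and therefore gets NA.
hist$ret <- ave(hist$close, hist$slug,
                FUN = function(p) c(NA, p[-1] / p[-length(p)] - 1))

print(hist[, c("slug", "timestamp", "close", "ret")])
```

Grouping by slug keeps one coin's last price from being used as another coin's previous price. In tidyverse style the same step would be a group_by(slug) followed by mutate(ret = close / lag(close) - 1).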
I hope this package is useful to everyone looking for a survivorship-bias-free (historical) dataset of cryptocurrencies!
References
Bouri, Elie, Rangan Gupta, and David Roubaud. 2018. “Herding Behaviour in Cryptocurrencies.” Finance Research Letters, July. https://doi.org/10.1016/j.frl.2018.07.008.
Brauneis, Alexander, and Roland Mestel. 2018a. “Price Discovery of Cryptocurrencies: Bitcoin and Beyond.” Economics Letters 165: 58–61. https://doi.org/10.1016/j.econlet.2018.02.001.
———. 2018b. “Cryptocurrency-Portfolios in a Mean-Variance Framework.” Finance Research Letters, June. https://doi.org/10.1016/j.frl.2018.05.008.
Corbet, Shaen, Andrew Meegan, Charles Larkin, Brian Lucey, and Larisa Yarovaya. 2018. “Exploring the Dynamic Relationships Between Cryptocurrencies and Other Financial Assets.” Economics Letters 165 (April): 28–34. https://doi.org/10.1016/j.econlet.2018.01.004.
Shumway, Tyler. 1997. “The Delisting Bias in CRSP Data.” The Journal of Finance 52 (1): 327–40. https://doi.org/10.2307/2329566.