
Why a second source?

crypto2 was built around CoinMarketCap (CMC). The cg_* functions are a second, independent source that returns tibbles with the same column conventions as the CMC functions, so research code that already consumes a crypto_* tibble works on a cg_* tibble too.

Three concrete reasons to bother with a second source:

  • Triangulation. If a factor signal disagrees between CMC and CG, treat the disagreement itself as informative: most schema and data-quality regressions show up first as a cross-source delta. vignette("cg-vs-cmc") walks through the dedicated reconciliation workflow.
  • Independence. CMC and CG are owned and operated separately, so a policy change at one provider does not affect the other.
  • Universe completeness. CG tracks a different, only partially overlapping set of delisted coins from CMC, so combining the two universes captures more of the historic cross-section than either one alone.

This vignette focuses on how to actually pull a complete history out of CG for asset-pricing research.

Build a survivorship-bias-free price history (free, no key)

The end-to-end recipe takes three steps. It produces a daily panel of (slug, date, close, volume, market_cap) for every coin CoinGecko has ever tracked, active and delisted, back to each coin’s listing date.

library(crypto2)
library(arrow)

# 1. Full historic universe: active + delisted, via cg_id_mapping()
universe <- cg_list(only_active = FALSE)

# 2. Daily close / volume / market cap, full lifetime per coin.
#    Skip OHLC here -- it adds a 3rd HTTP call per coin and is the only
#    free-tier-capped stream (see "What is NOT in the free tier" below).
options(crypto2.cg_what = c("price", "market_cap"))
hist <- cg_history(universe)

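# 3. Persist (create the output directory first; write_parquet() does not)
dir.create("data", showWarnings = FALSE)
arrow::write_parquet(hist, "data/cg_history.parquet")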

Output shape (hist): columns match crypto_history() exactly – id, slug, name, symbol, timestamp, ref_cur_id, ref_cur_name, open, high, low, close, volume, market_cap, time_open, .... Under the default date_convention = "end_of_day", dates are labelled with CMC’s convention so close[X] / close[X-1] - 1 is the return earned during date X (see vignette("cg-vs-cmc") for the date-convention story).
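The following is a minimal sketch of that return convention. It computes per-coin simple returns from a hist-shaped tibble, using a tiny made-up panel and dplyr (an assumption; any grouping tool works). It treats timestamp as a Date, so on real cg_history() output you may need as.Date(timestamp) first. A return that would span a calendar gap is set to NA rather than silently chained over the missing days.

```r
library(dplyr)

# Made-up toy panel with only the columns the return calculation needs
hist_toy <- tibble(
  slug      = c("bitcoin", "bitcoin", "bitcoin", "ethereum", "ethereum"),
  timestamp = as.Date(c("2024-01-01", "2024-01-02", "2024-01-03",
                        "2024-01-01", "2024-01-02")),
  close     = c(100, 110, 99, 50, 55)
)

returns <- hist_toy |>
  group_by(slug) |>
  arrange(timestamp, .by_group = TRUE) |>
  # close[X] / close[X-1] - 1, kept only when X-1 is the previous calendar day
  mutate(ret = if_else(timestamp - lag(timestamp) == 1,
                       close / lag(close) - 1,
                       NA_real_)) |>
  ungroup()

returns
```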

Preconditions

  • Run from a workstation / local machine. CoinGecko serves the full historic backfill freely, but its bot filtering refuses requests from some cloud / VPS environments. If cg_history() prints the one-time message “CoinGecko refused the request from this environment”, the recipe above will not complete on your host. Workarounds:
    1. run the bootstrap on a laptop, ship the parquet to the server;
    2. use the one-shot Pro recipes in vignette("coingecko-pro-backfill").
  • The historic mapping must be reachable. cg_list(only_active = FALSE) calls cg_id_mapping() to download the slug / numeric-id / symbol / name archive of delisted coins. The mapping is cached after the first call; if the download itself is blocked, only the bundled fallback (~20 reference coins) is used and you will see a yellow “using bundled sample” message.

What you get back

Column            Coverage on free tier
close             full lifetime of each coin (daily)
volume            full lifetime of each coin (daily)
market_cap        full lifetime of each coin (daily)
open, high, low   most recent 365 days only; older rows are NA

For complete OHLC over the full history (microstructure work, candlestick-based signals, intraday volatility models), see the Pro recipes in vignette("coingecko-pro-backfill").
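Before building anything that needs open, high or low, you can check the NA pattern described in the coverage table. The sketch below runs on a made-up tibble and uses dplyr together with the 365-day cutoff from the table. The as_of date is fixed only so the example is reproducible. On real data you would use Sys.Date() or the harvest date.

```r
library(dplyr)

# Made-up rows mimicking the free-tier pattern: OHLC only inside the last year
as_of <- as.Date("2024-06-30")   # on real data: Sys.Date()
hist_toy <- tibble(
  slug      = "bitcoin",
  timestamp = as_of - c(10, 200, 400, 800),
  open      = c(1.0, 1.0, NA, NA),
  high      = c(1.1, 1.1, NA, NA),
  low       = c(0.9, 0.9, NA, NA),
  close     = c(1.0, 1.0, 1.0, 1.0)
)

# Share of rows with complete OHLC vs. non-missing close, inside vs. outside 365 days
coverage <- hist_toy |>
  mutate(
    within_365d = as.numeric(as_of - timestamp) <= 365,
    full_ohlc   = !is.na(open) & !is.na(high) & !is.na(low)
  ) |>
  group_by(within_365d) |>
  summarise(share_full_ohlc = mean(full_ohlc),
            share_close     = mean(!is.na(close)),
            .groups = "drop")

coverage
```

On real output, one option is to restrict candlestick-based signals to rows where full_ohlc is TRUE and to keep the full history for close-based returns.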

Function reference

The four cg_* functions in the table below accept the same arguments as their crypto_* counterparts. Arguments without a CG equivalent (e.g. add_untracked, requestLimit, single_id) are kept for parity and silently ignored. Arguments where CG is more restrictive (e.g. which = "historical" in cg_listings()) emit a one-line warning and are coerced to the supported mode.

Purpose             CMC                 CoinGecko
Coin universe       crypto_list()       cg_list()
Current snapshot    crypto_listings()   cg_listings()
Daily history       crypto_history()    cg_history()
Per-coin metadata   crypto_info()       cg_info()

cg_list() – the universe

universe       <- cg_list()                       # active coins only
universe_full  <- cg_list(only_active = FALSE)    # + historic mapping

only_active = FALSE returns the survivorship-bias-corrected universe: cg_list()’s active rows plus the historic-only rows from cg_id_mapping(). A one-line message reports the mapping’s harvest date: “Historic data retrieval is current until YYYY-MM-DD”.
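The sketch below illustrates the "active rows plus historic-only rows" union. It is not the package's internal code. Both input tibbles are made up, and the choice of slug as the join key and the is_active flag are assumptions made for illustration.

```r
library(dplyr)

# Hypothetical stand-ins: `active` plays cg_list(), `mapping` plays cg_id_mapping()
active  <- tibble(slug = c("bitcoin", "ethereum"), symbol = c("BTC", "ETH"))
mapping <- tibble(slug = c("ethereum", "some-delisted-coin"),
                  symbol = c("ETH", "XYZ"))

# Active rows first, then only those mapping rows that are not already active
universe_full <- bind_rows(
  mutate(active, is_active = TRUE),
  mapping |> anti_join(active, by = "slug") |> mutate(is_active = FALSE)
)

universe_full
```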

cg_listings() – current cross-section

snap <- cg_listings(which = "latest", quote = TRUE, limit = 1000)

which = "historical" and which = "new" warn and coerce to "latest" – CG’s free tier does not expose the historical cross-section in a single call. To build your own cross-section history on the free tier, snapshot cg_listings() periodically (cron) and accumulate the parquet output:

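# Tag each snapshot with its harvest date so write_dataset() can partition on it
arrow::write_dataset(
  dplyr::mutate(
    cg_listings(which = "latest", quote = TRUE),
    harvested_at = as.character(Sys.Date())
  ),
  path         = "data/cg_listings",
  partitioning = "harvested_at"
)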
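To turn accumulated snapshots back into a panel, arrow::open_dataset() can read the partitioned directory and restore harvested_at from the directory names. The sketch below is self-contained. Instead of calling cg_listings(), it writes two made-up snapshots to a temporary directory, so the column names and values are illustrative only.

```r
library(arrow)
library(dplyr)

# Made-up snapshots standing in for two daily runs of the cron job
demo_dir <- file.path(tempdir(), "cg_listings_demo")
for (day in c("2024-06-01", "2024-06-02")) {
  snap <- tibble(slug = c("bitcoin", "ethereum"), price = c(1, 2),
                 harvested_at = day)
  # Each run lands in its own harvested_at=<day> directory
  write_dataset(snap, path = demo_dir, partitioning = "harvested_at")
}

# Re-assemble the accumulated cross-sections into one in-memory panel
panel <- open_dataset(demo_dir) |>
  collect()

panel
```

On a large archive, adding a dplyr::filter() on harvested_at before collect() lets arrow skip partitions that are not needed.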

cg_history() – the workhorse

Covered in the recipe at the top of this vignette. Key knobs:

  • start_date / end_date – date window (client-side filter).
  • options(crypto2.cg_what = ...) – restrict to c("price", "market_cap") to skip OHLC and save one HTTP call per coin.
  • date_convention = c("end_of_day", "raw") – default "end_of_day" aligns dates with CMC; see vignette("cg-vs-cmc").

cg_info() – per-coin metadata

info <- cg_info(cg_list()[1:10, ])

Returns the description, categories, contract addresses across chains, and link fields for each coin, with the same column conventions as crypto_info().

What is NOT in the free tier

The free tier covers every cell needed for daily asset-pricing work except OHLC older than about a year:

  • OHLC (open / high / low) older than ~365 days. Close (from the price stream), volume, and market cap are unaffected, but open, high, and low come back NA for any date more than a year old. For a complete backfill, run the Pro recipes in vignette("coingecko-pro-backfill") once. The recipes are kept inline in that vignette rather than exported from the package, so the package itself stays key-less.

Cross-checking against CMC

Once you have the parquet from the recipe above, triangulating against CMC takes little extra work. The dedicated vignette("cg-vs-cmc") walks through:

  • the date-convention difference between CMC and CG and how crypto2 harmonizes it;
  • a worked BTC reconciliation showing typical day-by-day differences below 0.05%;
  • which fields are expected to agree exactly vs. which ones genuinely differ between providers (volume disagrees more than price – they aggregate over different exchange sets).
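A reconciliation like this rests on a join on coin and date, followed by per-day relative differences. The minimal dplyr sketch below uses invented numbers, not real CMC or CG quotes. It assumes both tibbles share slug and a Date-valued timestamp. In practice, slugs may differ between providers, so the join key may need a mapping step first.

```r
library(dplyr)

# Invented values standing in for crypto_history() (CMC) and cg_history() (CG)
cmc <- tibble(slug = "bitcoin", timestamp = as.Date("2024-01-01") + 0:2,
              close = c(100, 101, 102), volume = c(10, 11, 12))
cg  <- tibble(slug = "bitcoin", timestamp = as.Date("2024-01-01") + 0:2,
              close = c(100.02, 101, 101.97), volume = c(12, 11, 9))

# Align on coin and date, then measure per-day relative differences in percent
recon <- inner_join(cmc, cg, by = c("slug", "timestamp"),
                    suffix = c("_cmc", "_cg")) |>
  mutate(close_diff_pct  = 100 * abs(close_cg / close_cmc - 1),
         volume_diff_pct = 100 * abs(volume_cg / volume_cmc - 1))

summarise(recon,
          max_close_diff_pct  = max(close_diff_pct),
          max_volume_diff_pct = max(volume_diff_pct))
```

Keeping close and volume differences in separate columns makes the expected pattern easy to see. Price differences should stay small, while volume can legitimately diverge because the providers aggregate over different exchange sets.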

A live cross-source test (tests/testthat/test-cg-vs-cmc.R) runs in CI and will fail loudly if either provider drifts.