Methodology

How Bitcoin Weigh-In sources, validates, versions, and corrects its commodity price dataset. Companion to the dataset.

What this is

The Bitcoin Weigh-In dataset records daily closing prices in US dollars for a curated set of fungible commodities from 2013-01-02 to the most recent completed UTC day. From those closes it derives per-BTC equivalents (how many troy ounces of gold, pounds of copper, or barrels of crude one bitcoin could have purchased on each day) and pairs them with a deterministically computed BTC circulating supply. The artifact is a single small file — around 800 KB as CSV, 700 KB as Parquet — that any analyst, journalist, or hobbyist can download once and analyse offline without an API key.
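
The per-BTC derivation reduces to one division per commodity per day. A minimal sketch, assuming both closes are quoted in US dollars; the function name and the sample closes are hypothetical and not taken from the repository:

```typescript
// Hypothetical helper: units of a commodity that one bitcoin buys at the daily closes.
function unitsPerBtc(btcUsd: number, commodityUsdPerUnit: number): number {
  if (!(btcUsd > 0) || !(commodityUsdPerUnit > 0)) {
    throw new RangeError("closes must be positive numbers");
  }
  return btcUsd / commodityUsdPerUnit;
}

// Made-up closes, for illustration only: BTC at $60,000, gold at $2,400 per troy ounce.
const ouncesOfGoldPerBtc = unitsPerBtc(60_000, 2_400); // 25
```

Rejecting non-positive inputs keeps a missing or zero close from turning into Infinity or NaN in the derived column.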

This document describes how the data is collected, what the published flags mean, how cross-validation works, how versions are cut, and how to report corrections. The companion dataset page ships the artifacts; this page describes the rules behind them.

Data sources

Two providers between them cover every series in the live dataset. Each commodity is pinned to a single primary endpoint so the dataset has one parser, one rate-limit regime, and one place to look when something disagrees with the rest of the financial press.

Stooq

The primary source for BTC and for spot and continuous-front-month futures: BTC-USD (btcusd), gold (xauusd), silver (xagusd), platinum (xptusd), copper (hg.c), CBOT wheat (zw.c), and ICE coffee (kc.c). Symbols use the .c suffix for continuous contracts rather than .f per Stooq's published conventions. Stooq added an API-key requirement after the initial bootstrap; the daily job sends the key with each request, and a redacted form of every fetched URL is recorded in /health.json so an authentication failure surfaces clearly rather than presenting as silent forward-fill.

FRED (St. Louis Fed)

The primary source for Brent crude (DCOILBRENTEU). FRED redistributes the EIA spot price daily, typically with a lag of one business day. The daily job retries transient HTTP errors with backoff and forward-fills if the value never arrives.

Derived (no API)

BTC circulating supply is computed in scripts/sources.ts as a pure function of days-since-genesis. Genesis is 2009-01-03; the protocol targets 144 blocks per day, the initial block reward is 50 BTC, and the reward halves every 210,000 blocks. The implementation walks halving eras and accumulates supply era-by-era. Because every input is a constant of the protocol, the column has no API dependency and is unit-tested against known halving block dates.

Forward-fill logic

Markets close on weekends and public holidays. Source endpoints occasionally drop a single day's row even on a normal trading session. In both cases the daily job carries the previous known value forward so that every calendar date from coverage start to last update has a row in the dataset. Forward-filling rather than emitting nulls is a deliberate choice: analyses that join across commodities need a value for every date, and the alternative (per-commodity NaN) silently propagates into derived calculations.
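
The carry-forward rule can be sketched as a single pass over a dense list of calendar dates. This is an illustration under stated assumptions rather than the daily job's actual code: the FilledRow shape, the forwardFill name, and the sample closes are all hypothetical.

```typescript
interface FilledRow {
  date: string;
  close: number | null; // null only before the first known close
  filled: boolean;      // true when the value was carried forward
}

// Hypothetical helper: carry the last known close across every calendar date.
function forwardFill(
  dates: string[],
  closes: { [date: string]: number | undefined },
): FilledRow[] {
  const rows: FilledRow[] = [];
  let last: number | null = null;
  for (const date of dates) {
    const fresh = closes[date];
    if (fresh !== undefined) last = fresh;
    rows.push({ date, close: last, filled: fresh === undefined && last !== null });
  }
  return rows;
}

// Illustrative values only: a Friday close carried across a weekend.
const weekendRows = forwardFill(
  ["2024-12-27", "2024-12-28", "2024-12-29", "2024-12-30"],
  { "2024-12-27": 100, "2024-12-30": 102 },
);
```

In this sketch the Saturday and Sunday rows carry the Friday value with filled set to true, and the Monday row takes its own fresh close.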

v1.0 of the dataset ships a forward_filled column populated as an empty string for every row, because per-row fill provenance cannot be reconstructed from the historical NDJSON that was bootstrapped before this column existed. Prospective per-row tracking begins in v1.1, at which point the column will hold a pipe-delimited list of the column names that were forward-filled on that date: for example, xpt_usd|brent_usd on a typical US-market holiday where the other Stooq feeds returned values but the FRED Brent series and Stooq XPT had not yet published.
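
One way to write and read the pipe-delimited value is sketched below. The helper names are hypothetical, and btc_usd is an assumed column name; only xpt_usd and brent_usd appear in the example above.

```typescript
// Hypothetical encoder: join the names of the columns that were filled on a date.
function encodeForwardFilled(filledByColumn: { [column: string]: boolean }): string {
  return Object.keys(filledByColumn)
    .filter((column) => filledByColumn[column])
    .join("|");
}

// Hypothetical decoder: an empty string (every v1.0 row) means "nothing recorded".
function decodeForwardFilled(value: string): string[] {
  return value === "" ? [] : value.split("|");
}

const holidayValue = encodeForwardFilled({ btc_usd: false, xpt_usd: true, brent_usd: true });
// holidayValue is "xpt_usd|brent_usd"
```

The decoder special-cases the empty string because splitting it would otherwise yield a one-element list containing an empty name.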

The daily cron writes a separate fill record per source into /health.json on every run, so the present day's fill state is always visible even before per-row tracking lands. The cron also exits non-zero if every source returns zero rows on a UTC weekday — a signal that authentication, rate limits, or upstream infrastructure has changed, rather than silent fill propagating an undetected outage.
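
The outage guard can be sketched as a small predicate. This assumes the weekday test applies to the date being fetched (the text does not say whether it is the run date or the target date), and the function names and source keys are hypothetical.

```typescript
// True for Monday through Friday in UTC.
function isUtcWeekday(isoDate: string): boolean {
  const day = new Date(`${isoDate}T00:00:00Z`).getUTCDay(); // 0 = Sunday, 6 = Saturday
  return day >= 1 && day <= 5;
}

// Hypothetical guard: fail the run only when every source came back empty on a weekday.
function shouldFailRun(isoDate: string, rowsBySource: { [source: string]: number }): boolean {
  const sources = Object.keys(rowsBySource);
  const allEmpty = sources.length > 0 && sources.every((s) => rowsBySource[s] === 0);
  return isUtcWeekday(isoDate) && allEmpty;
}

// 2024-12-31 was a Tuesday; 2025-01-04 was a Saturday.
const failsOnWeekdayOutage = shouldFailRun("2024-12-31", { stooq: 0, fred: 0 }); // true
const toleratesWeekend = shouldFailRun("2025-01-04", { stooq: 0, fred: 0 });     // false
```

A caller would translate a true result into a non-zero exit code, for example via process.exit(1) in Node.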

BTC supply derivation

The btc_supply column is deterministic. For a date D:

  1. Compute days since the genesis block at 2009-01-03.
  2. Multiply by 144 (the protocol's target blocks per day) to get an approximate cumulative block count.
  3. Walk halving eras of 210,000 blocks: era 1 pays 50 BTC per block, era 2 pays 25, era 3 pays 12.5, era 4 pays 6.25, era 5 pays 3.125, and so on. For each era, add (min(era_end, total_blocks) − blocks_so_far) times the era's reward, where blocks_so_far is the block count at the start of that era, and stop once total_blocks is reached.
  4. Round to an integer count of BTC.
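
The four steps above can be sketched as follows. This is an illustration, not a copy of scripts/sources.ts: the function and constant names are assumptions, as is the choice to count whole UTC days elapsed since genesis.

```typescript
const GENESIS_UTC_MS = Date.UTC(2009, 0, 3); // 2009-01-03
const MS_PER_DAY = 86_400_000;
const BLOCKS_PER_DAY = 144;
const BLOCKS_PER_ERA = 210_000;
const INITIAL_REWARD_BTC = 50;

// Hypothetical name: scheduled (not block-exact) supply at the start of a UTC date.
function approxBtcSupply(isoDate: string): number {
  const dayMs = Date.parse(`${isoDate}T00:00:00Z`);
  if (isNaN(dayMs)) throw new RangeError(`invalid date: ${isoDate}`);

  // Steps 1 and 2: days since genesis, then an approximate cumulative block count.
  const days = Math.floor((dayMs - GENESIS_UTC_MS) / MS_PER_DAY);
  const totalBlocks = Math.max(0, days * BLOCKS_PER_DAY);

  // Step 3: walk the halving eras, adding each era's blocks times its reward.
  let supply = 0;
  let reward = INITIAL_REWARD_BTC;
  for (let eraStart = 0; eraStart < totalBlocks; eraStart += BLOCKS_PER_ERA) {
    const eraEnd = eraStart + BLOCKS_PER_ERA;
    supply += (Math.min(eraEnd, totalBlocks) - eraStart) * reward;
    reward /= 2;
  }

  // Step 4: round to whole BTC.
  return Math.round(supply);
}

// 1,460 days after genesis is 210,240 scheduled blocks:
// 210,000 at 50 BTC plus 240 at 25 BTC.
const supplyAtCoverageStart = approxBtcSupply("2013-01-02"); // 10506000
```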

The approximation drifts a few thousand BTC from reality (real interblock times vary around the 10-minute target, and mining hashrate growth nudges blocks slightly faster than schedule), but the error is small enough — under 0.1% across the full coverage range — that the column is fit for the visualisation's purpose: showing where on the supply curve any given date sits. Analyses that need block-exact supply should pull from a node or a block explorer; this dataset's column is a clean closed-form schedule.

Illustrative pricing

Two of the four commodities rendered in the visualisation, Plutonium-238 and cocaine, do not have public spot markets. Their prices on the site are illustrative composites constructed from named sources, with the as-of date carried alongside. They appear on the main visualisation but they are not in the live dataset published under /data, which holds only live market closes. A third commodity, the LEU uranium fuel pellet, follows the same pattern but is currently deferred from the visualisation; its illustrative price record persists in the repository so that it can be re-enabled later.

Plutonium-238

Composite material-cost estimate of ~$5,000/g (central estimate within a $4,000–$8,000 range) derived from the DOE Office of Nuclear Energy, NASA Planetary Science Division publications on the Pu-238 production program (~$150M/year for ~1.5 kg/year), the Cassini OIG report from 1997 ($1,968/g escalated to 2024 dollars), and Atomic Insights' analysis of RTG heat sources. A separately cited fully-loaded program cost (~$100,000/g) reflects the facility maintenance and regulatory infrastructure required for production but is less directly comparable to other commodities' market prices, so the material-cost figure drives the BTC equivalence on the visualisation. Uncertainty bounds: roughly ±60% around the central estimate at the material-cost layer. As-of date: 2024-12-31.

LEU uranium fuel pellet

Composite cost of ~$20 per 7 g pellet from the World Nuclear Association "Economics of Nuclear Power" methodology, cross-checked against the IAEA/OECD-NEA Red Book 2024. Decomposes as: U₃O₈ feed at ~$100/lb, conversion to UF₆ at ~$20/kgU, enrichment at ~$150/SWU, fabrication at ~$300/kgU, yielding ~$3,000/kgU of finished fuel; at 7 g per pellet that is $3,000 × 0.007 kg ≈ $21, rounded to ~$20/pellet. Uncertainty bounds: ±30%, depending on contract terms, enrichment level, and market conditions. As-of date: 2025-01-01.
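
The final unit conversion can be checked with a short sketch. It treats the 7 g pellet mass as kilograms of uranium for costing purposes, which is a simplifying assumption, and the function name is hypothetical.

```typescript
// Hypothetical helper: scale a per-kgU finished-fuel cost down to one pellet.
function pelletCostUsd(fuelUsdPerKgU: number, pelletMassGrams: number): number {
  return (fuelUsdPerKgU * pelletMassGrams) / 1000;
}

const leuPelletUsd = pelletCostUsd(3_000, 7); // 21, quoted as ~$20 in the composite
```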

Cocaine (three-tier)

There is no spot market for cocaine. The composite presents three tiers reflecting the market's actual structure: producer (~$2,500/kg, range $1,500–$3,500, raw refined base, UNODC World Drug Report 2024); wholesale (~$30,000/kg, range $25,000–$35,000, ≥80% pure US wholesale standard, UNODC 2024 / DEA NDTA 2024); and retail purity-adjusted (~$120,000/kg, range $80,000–$250,000, normalised to 100% for cross-tier comparison, DEA / EMCDDA). Wholesale is the primary tier for BTC equivalence because it is the most directly comparable to how other commodities are priced (standardised purity, kilogram-scale transactions). As-of date: 2024-12-31.

Cross-validation

After the primary Stooq and FRED fetches complete, the daily job queries a secondary source, Massive, for the same day's close on BTC-USD, XAU-USD, XAG-USD, and (where available) XPT-USD. For each ticker where both providers return a value, the job computes the absolute percent difference. When the difference exceeds 0.5%, an entry is appended to a cross_validation_flags array in /health.json recording the date, ticker, both values, and the percent diff.
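
The comparison can be sketched as below. The field names in the flag object and the choice of the primary close as the denominator are assumptions; the text specifies only the 0.5% threshold and what a flag records.

```typescript
interface CrossValidationFlag {
  date: string;
  ticker: string;
  primary: number;
  secondary: number;
  percentDiff: number;
}

const FLAG_THRESHOLD_PERCENT = 0.5;

// Hypothetical check: returns a flag when the providers disagree by more than
// the threshold, and null when they agree or the comparison is skipped.
function crossValidate(
  date: string,
  ticker: string,
  primary: number,
  secondary: number | null,
): CrossValidationFlag | null {
  if (secondary === null || !(primary > 0)) return null; // skipped, no flag
  const percentDiff = (Math.abs(primary - secondary) / primary) * 100;
  if (percentDiff <= FLAG_THRESHOLD_PERCENT) return null;
  return { date, ticker, primary, secondary, percentDiff };
}

// Made-up closes: a 1% gap is flagged, a 0.25% gap is not.
const flagged = crossValidate("2024-12-31", "XAU-USD", 2_000, 2_020);
const agreed = crossValidate("2024-12-31", "XAU-USD", 2_000, 2_005);
```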

The cross-validation step is a quality signal, not a build gate. It does not fail the daily cron — a missing API key, an HTTP error, a parse failure, or a Massive ticker that doesn't exist all produce a "skipped" status without emitting a flag. This is deliberate: a secondary-source disagreement is information for an analyst, not an infrastructure outage that should block publication of the primary feed. Tickers Massive doesn't cover (continuous futures, FRED-only series like Brent) are skipped silently.

Versioning and updates

The dataset uses semantic versioning for schema changes: a major bump for removed or renamed columns, a minor bump for added columns or sources, and a patch for fixes that preserve the schema. The current version is pinned in dataset-config.json at the repository root; the artifact builder uses that value to decide which static/data/v{X.Y}/ directory to write to. Bumping the version is a manual one-line edit committed by the maintainer.
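
The bump rules and the directory mapping can be sketched as follows. The change labels and function names are hypothetical, and the assumption that a patch component (if present) is dropped from the directory name is inferred from the v{X.Y} pattern rather than stated by the repository.

```typescript
type SchemaChange =
  | "column-removed"
  | "column-renamed"
  | "column-added"
  | "source-added"
  | "schema-preserving-fix";

// Map a kind of change to the bump described above.
function bumpFor(change: SchemaChange): "major" | "minor" | "patch" {
  switch (change) {
    case "column-removed":
    case "column-renamed":
      return "major";
    case "column-added":
    case "source-added":
      return "minor";
    case "schema-preserving-fix":
      return "patch";
  }
}

// Derive the versioned output directory from a version string such as "1.0" or "1.1.2".
function artifactDir(version: string): string {
  const match = /^(\d+)\.(\d+)(?:\.\d+)?$/.exec(version);
  if (match === null) throw new RangeError(`unrecognised version: ${version}`);
  return `static/data/v${match[1]}.${match[2]}/`;
}

const v10Dir = artifactDir("1.0"); // "static/data/v1.0/"
```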

Daily updates happen at 02:00 UTC. A GitHub Actions cron fetches the previous UTC day's close from every source, appends a row to data/prices.ndjson, rebuilds static/prices.json, regenerates every artifact under static/data/v{X.Y}/, and commits the result to main. Cloudflare Pages redeploys automatically from the commit. The latest aliases at /data/prices.csv, /data/prices.json, and similar always point to the current version's artifacts; the versioned directory at /data/v{X.Y}/ persists indefinitely so prior versions remain downloadable.
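
Selecting "the previous UTC day" at run time can be sketched with the built-in Date API. The function name is hypothetical; the only assumption is that dates are keyed as YYYY-MM-DD strings.

```typescript
// Hypothetical helper: the calendar date, in UTC, of the day before `now`.
function previousUtcDay(now: Date): string {
  const startOfToday = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate());
  return new Date(startOfToday - 86_400_000).toISOString().slice(0, 10);
}

// A run at 02:00 UTC on New Year's Day targets the last day of the old year.
const targetDay = previousUtcDay(new Date("2025-01-01T02:00:00Z")); // "2024-12-31"
```

Working from UTC midnight rather than subtracting 24 hours from the run time keeps the result stable if the job starts late or is re-run later in the day.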

Archival to Zenodo is triggered manually by cutting a GitHub release tag, at which point Zenodo's GitHub integration mints a DOI and archives the source tarball. The DOI is copied back into dataset-config.json and the next build surfaces it on the dataset page. Release cadence is keyed to schema-meaningful changes rather than the daily content updates, which keeps DOIs sparse and citable.

Corrections

To report a suspected error, email info@sortathing.com with the affected date(s) and column(s), the value the dataset shows, and a link to the source the corrected value should come from. Corrections that affect a single row land in the next daily commit; corrections that affect the schema or a historical methodology trigger a minor version bump and a CHANGELOG entry. Either way, the original row stays in git history: the dataset is the current best truth, but the prior shape remains inspectable in the commit log.