METHODOLOGY

How We Track Predictions

Our process for sourcing, scoring, resolving, and auditing AI predictions. Updated as the system evolves.

What We Track

The Prediction Tracker covers three layers:

Scoreboard (Layer 1)

Public predictions from named individuals — researchers, lab leaders, independent voices. These are real claims made in interviews, blog posts, talks, or social media. We track the specific wording, source, and date.

Market Tracker (Layer 2)

Live probability data from prediction markets — Metaculus, Manifold, and Polymarket. We poll daily and store snapshots to track how consensus shifts over time.

Original Predictions (Layer 3)

Our own predictions, grounded in the Precondition Density Model. Every original prediction includes a reasoning chain showing which preconditions are met and which are bottlenecks. Coming soon.

How We Source Predictions

We look for specific, falsifiable claims with a clear timeframe. Vague statements like "AI will change the world" don't qualify. A prediction needs:

  • A specific claim — what exactly will happen
  • A timeframe — when it will happen (or a range)
  • A verifiable source — where the prediction was made
  • Attribution — who made it

Sources include published interviews, conference talks, blog posts, research papers, tweets, and podcast appearances. We link to the primary source wherever possible.

Confidence Values

Confidence represents how certain the predictor appears to be, on a 0-100% scale.

  • Predictor gives an explicit probability → used directly
  • "I'm confident" / "I believe" / "likely" → 70-85%
  • "Could happen" / "possible" / "plausible" → 40-60%
  • Strong hedging / "unlikely, but..." → 10-30%
  • No probability language available → left blank

These assignments are editorial judgments, not precise measurements. We err toward leaving confidence blank rather than guessing. When we assign a value, the reasoning is documented internally.
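The mapping above can be sketched as a simple keyword lookup. This is an illustration only: the keyword lists and band boundaries are editorial judgment calls, not exact rules, and the function name is hypothetical.

```python
from typing import Optional, Tuple

# Editorial bands from the table above; boundaries are judgment calls.
# Strong hedging is checked first so "unlikely" never matches "likely".
CONFIDENCE_BANDS = [
    (("unlikely but", "unlikely, but"), (10, 30)),
    (("confident", "i believe", "likely"), (70, 85)),
    (("could happen", "possible", "plausible"), (40, 60)),
]

def suggest_confidence(statement: str) -> Optional[Tuple[int, int]]:
    """Return a (low, high) confidence band, or None to leave blank."""
    text = statement.lower()
    for keywords, band in CONFIDENCE_BANDS:
        if any(k in text for k in keywords):
            return band
    return None  # err toward leaving confidence blank

print(suggest_confidence("I'm confident this ships by 2026"))  # (70, 85)
print(suggest_confidence("No hedging language here"))          # None
```

Returning a band rather than a point value mirrors the policy of preferring a blank over a guessed number.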

How We Resolve Predictions

Predictions are reviewed weekly by an editorial council. A prediction is eligible for resolution when its target date has passed or when clear evidence emerges.

CORRECT

The specific claim came true, supported by multiple credible sources.

WRONG

The deadline passed and the claim did not come true, or strong evidence contradicts it.

PARTIAL

Some aspects came true but the core claim was overstated or only partially fulfilled.

EXPIRED

Deadline passed with insufficient evidence either way.

WITHDRAWN

The predictor publicly walked back the claim.
Resolution requires council consensus. When in doubt, we leave predictions as pending. We are strict about the specific wording of the claim — a loose interpretation that "kind of" came true does not count as correct.
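The resolution states and the eligibility rule can be modeled compactly. The data model below is a sketch under assumptions: the field names and the `has_clear_evidence` flag are hypothetical stand-ins for the council's internal process.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum

class Resolution(Enum):
    PENDING = "pending"
    CORRECT = "correct"      # claim came true, multiple credible sources
    WRONG = "wrong"          # deadline passed, claim false or contradicted
    PARTIAL = "partial"      # core claim overstated or partially fulfilled
    EXPIRED = "expired"      # deadline passed, insufficient evidence
    WITHDRAWN = "withdrawn"  # predictor publicly walked back the claim

@dataclass
class Prediction:
    claim: str
    target_date: date
    resolution: Resolution = Resolution.PENDING
    has_clear_evidence: bool = False  # set by the editorial council

def eligible_for_resolution(p: Prediction, today: date) -> bool:
    """Eligible once the target date has passed or clear evidence
    has emerged; otherwise the prediction stays pending."""
    return p.resolution is Resolution.PENDING and (
        today > p.target_date or p.has_clear_evidence
    )
```

Defaulting to PENDING encodes the "when in doubt, leave it pending" policy directly in the data model.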

Market Data

We poll three prediction market platforms daily:

  • Metaculus — community forecasting platform, strong on scientific and technical questions
  • Manifold — play-money prediction market with broad coverage of AI topics
  • Polymarket — real-money prediction market, strongest signal for high-profile questions

Market probabilities represent crowd consensus, not our editorial view. We store daily snapshots to track how forecasts shift over time. Platform disagreements on the same question are noted when relevant.
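Daily snapshot storage can be sketched as follows. The schema and the one-snapshot-per-day rule are illustrative assumptions; the platform APIs and actual storage backend are not shown.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class MarketSnapshot:
    platform: str       # "metaculus", "manifold", or "polymarket"
    question_id: str
    probability: float  # crowd consensus, 0.0-1.0
    taken_on: date

def record_snapshot(store: dict, snap: MarketSnapshot) -> None:
    """Keep one snapshot per (platform, question, day) so daily polling
    is idempotent and the time series stays clean on re-runs."""
    key = (snap.platform, snap.question_id, snap.taken_on)
    store.setdefault(key, snap)  # first poll of the day wins
```

Keying on (platform, question, day) also makes cross-platform disagreement on the same question easy to surface: group by (question, day) and compare probabilities.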

Accuracy Scoring

The scoreboard ranks predictors by two metrics:

Accuracy

Percentage of resolved predictions marked correct. Simple and transparent. A predictor needs at least 2 resolved predictions to appear on the leaderboard.


Boldness

Average confidence across all predictions. Rewards people making specific, near-term, falsifiable claims over vague hedging. Someone predicting "AGI by 2028 at 80% confidence" is bolder than someone saying "AGI eventually, maybe."

As more predictions resolve, we plan to add Brier scores (measuring calibration quality) and calibration curves (are people who say 80% right 80% of the time?). These require a larger sample of resolved predictions to be meaningful.
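The two current metrics, and the planned Brier score, are simple enough to sketch directly. These function names are illustrative; only the definitions (share correct, average confidence, mean squared error between stated probability and outcome) come from the text above.

```python
from typing import List, Optional, Tuple

def accuracy(resolved: List[bool]) -> Optional[float]:
    """Share of resolved predictions marked correct. Returns None
    below the 2-resolution threshold for appearing on the board."""
    if len(resolved) < 2:
        return None
    return sum(resolved) / len(resolved)

def boldness(confidences: List[float]) -> Optional[float]:
    """Average confidence across predictions with an assigned value."""
    return sum(confidences) / len(confidences) if confidences else None

def brier_score(forecasts: List[Tuple[float, bool]]) -> float:
    """Mean squared error between stated probability and outcome.
    0.0 is perfect; 0.25 is what always guessing 50% earns."""
    return sum((p - float(o)) ** 2 for p, o in forecasts) / len(forecasts)

print(accuracy([True, False, True, True]))  # 0.75
print(boldness([80.0, 60.0]))               # 70.0
```

Note why Brier needs the larger sample: with only a handful of resolutions, a single lucky or unlucky outcome dominates the mean, so the score says little about calibration.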

Limitations

  • Confidence assignments for qualitative predictions involve editorial judgment. Reasonable people could assign different values.
  • We track a sample of public predictions, not an exhaustive record. Selection bias toward well-known figures is inherent.
  • "AGI" has no universally agreed definition. We use the predictor's own framing when evaluating their claim.
  • Early-stage tracker. Most predictions are still pending. The scoreboard will become more meaningful as predictions resolve over the coming months and years.
  • We do not track financial predictions, crypto markets, or stock prices. That is outside our scope.

Corrections and Additions

If we have misattributed a prediction, used an incorrect source, or missed an important public prediction worth tracking, contact us at nic@future-shock.ai. We take accuracy seriously and will correct errors promptly.
