Calibration Simulator

Uncertainty Has Shape

This simulator teaches executives how confidence, calibration, uncertainty, probability curves, and decision quality interact.

Core Lessons

  • Point estimates are fragile.
  • Confidence rises as specificity falls.
  • Probability has shape.
  • Tail risk matters.
  • Calibration can be improved.
  • Forecasts support decisions, not certainty.
Point Estimate {{ state.pointIndex + 1 }} of {{ state.questions.length }}

Give your best exact answer.

{{ currentQuestion.text }}
Point Estimate Results

Exact answers are fragile.

{{ pointScore }}/{{ state.questions.length }}

Exact answers are brutally unforgiving.

Question Your Estimate Result
{{ q.text }} {{ q.pointEstimate }} {{ q.correct ? 'RIGHT' : 'WRONG' }}
Probability Curves

Uncertainty has structure.

A probability curve lets you express where outcomes feel more or less likely across the range.

Key Insight

The edges are not impossible; they just carry near-zero probability.

Interactive Curve Builder

Shape uncertainty visually.

Important

The most likely value does not need to sit halfway between the lowest and highest values.
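
A triangular distribution is the simplest curve with this property: its peak can sit anywhere between the bounds. Below is a minimal TypeScript sketch; the function and the numbers are illustrative, not the simulator's internals.

```ts
// Triangular PDF: a simple curve whose peak (mode) need not be
// halfway between the bounds. All names and numbers are illustrative.
function triangularPdf(x: number, low: number, mode: number, high: number): number {
  if (x < low || x > high) return 0; // outside the range: zero density
  if (x < mode) return (2 * (x - low)) / ((high - low) * (mode - low));
  if (x > mode) return (2 * (high - x)) / ((high - low) * (high - mode));
  return 2 / (high - low); // density at the peak
}

// Mode at 150 days, even though the midpoint of [100, 400] is 250:
for (const x of [100, 150, 250, 400]) {
  console.log(x, triangularPdf(x, 100, 150, 400).toFixed(4));
}
```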

Probability Archetypes

How uncertainty behaves in the real world.

By manipulating these variables, we can model classic statistical archetypes that define how uncertainty behaves. Select an archetype below.
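
The simulator's own archetype definitions are not shown in this text; the sketch below illustrates four classic shapes (symmetric, right-skewed, uniform, bimodal) using simple assumed samplers.

```ts
// Classic uncertainty "shapes", sketched with simple samplers.
// These four archetypes are assumptions: textbook shapes, not
// necessarily the ones the simulator implements.
function normalSample(mean: number, sd: number): number {
  // Box-Muller transform
  const u = Math.random(), v = Math.random();
  return mean + sd * Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

const archetypes: Record<string, () => number> = {
  symmetric: () => normalSample(200, 30),                        // outcomes cluster around a center
  rightSkewed: () => Math.exp(normalSample(Math.log(200), 0.4)), // long tail of overruns
  uniform: () => 100 + Math.random() * 200,                      // any value equally likely
  bimodal: () => Math.random() < 0.5
    ? normalSample(150, 15)
    : normalSample(300, 15),                                     // two distinct camps of outcomes
};

for (const [name, draw] of Object.entries(archetypes)) {
  const samples = Array.from({ length: 10_000 }, draw);
  const mean = samples.reduce((a, b) => a + b, 0) / samples.length;
  console.log(name, "mean ≈", mean.toFixed(0));
}
```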

{{ archetypeDetails.title }}

Profile: {{ archetypeDetails.setup }}

Business Impact: {{ archetypeDetails.lesson }}

Confidence vs Specificity

Higher confidence requires wider ranges.

Scenario: The Legacy Rewrite

The Board asks for an estimate to rewrite the core billing system. Historical data suggests the most likely duration is 200 days. The Board demands a precise delivery window.

Adjust the slider to see how giving them a "precise" (narrow) estimate mathematically destroys your probability of actually hitting the target.
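
Assuming the scenario's curve is a triangular distribution over 0 to 400 days peaking at 200 (a simplification, not necessarily the simulator's exact model), the tradeoff can be computed directly:

```ts
// Probability that the true duration lands inside [lo, hi], assuming a
// triangular distribution over 0..400 days peaking at 200.
function triangularCdf(x: number, a: number, mode: number, b: number): number {
  if (x <= a) return 0;
  if (x >= b) return 1;
  if (x <= mode) return ((x - a) ** 2) / ((b - a) * (mode - a));
  return 1 - ((b - x) ** 2) / ((b - a) * (b - mode));
}

function confidence(lo: number, hi: number): number {
  return triangularCdf(hi, 0, 200, 400) - triangularCdf(lo, 0, 200, 400);
}

console.log(confidence(195, 205)); // "precise" 10-day window: ~5% chance
console.log(confidence(150, 250)); // ±50 days: ~44% chance
console.log(confidence(50, 350));  // wide range: ~94% chance
```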

0 days 200 days (Most Likely) 400 days {{ tradeoffLow }} {{ tradeoffHigh }}
Your Estimate
{{ tradeoffLow }} → {{ tradeoffHigh }} days
Statistical Confidence
{{ tradeoffConfidence.toFixed(1) }}%
The Executive Lesson

A narrow estimate is mathematically a low-confidence estimate. When leaders demand an exact number, they are implicitly forcing the team to commit to an outcome with a near-zero probability of success.

The Equivalent Bet

Are you really that confident?

Douglas Hubbard popularized "The Equivalent Bet" test to counter overconfidence. Imagine you can choose between two ways to win $10,000:

Option A: Your Estimate

You win $10,000 ONLY IF the true answer falls inside the range you just built:

{{ tradeoffLow }} → {{ tradeoffHigh }}

Option B: The Wheel

You spin a wheel that has a mathematically guaranteed 90% chance of landing on a win.

90% Win Rate
0 200 (True Answer) 400 {{ tradeoffLow }} {{ tradeoffHigh }}
Your Confidence: {{ tradeoffConfidence.toFixed(1) }}%
{{ state.estimateResult === 'WIN' ? '🎉 Finished in ' + state.actualDuration + ' days! You win $10,000!' : '💀 Finished in ' + state.actualDuration + ' days. You lose.' }}
90% Win Zone
{{ state.spinResult === 'WIN' ? '🎉 Landed on green! You win $10,000!' : '💀 Landed on red. You lose.' }}
The Ego Check

If you prefer to spin the wheel, your interval is too narrow. You aren't actually 90% confident. You must widen your bounds until you are completely indifferent between the two options.
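
One way to see why indifference is the test is to simulate the bet. This sketch (with an assumed triangular distribution, not the simulator's code) compares an interval's win rate against the wheel's guaranteed 90%:

```ts
// The Equivalent Bet as a Monte Carlo check. If your interval wins the
// $10,000 less often than the 90% wheel, the interval is too narrow.
function triangularDraw(a: number, mode: number, b: number): number {
  // inverse-CDF sampling from a triangular distribution
  const u = Math.random();
  const fMode = (mode - a) / (b - a);
  return u < fMode
    ? a + Math.sqrt(u * (b - a) * (mode - a))
    : b - Math.sqrt((1 - u) * (b - a) * (b - mode));
}

function intervalWinRate(lo: number, hi: number, trials = 100_000): number {
  let wins = 0;
  for (let i = 0; i < trials; i++) {
    const actual = triangularDraw(0, 200, 400);
    if (actual >= lo && actual <= hi) wins++;
  }
  return wins / trials;
}

console.log("narrow interval:", intervalWinRate(180, 220)); // ~0.19, far below the wheel's 0.90
console.log("wide interval:  ", intervalWinRate(60, 340));  // ~0.91, roughly matches the wheel
```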

The Redemption Round

Prove your calibration.

Now that you know the Equivalent Bet, provide a calibrated range for these 5 new questions. You choose the boundaries, the most likely outcome, and your confidence level. Imagine spinning the wheel before every submission.
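
Scoring is simple: an answer is a HIT when the true value falls inside your bounds, and your calibration gap is the distance between your stated confidence and your observed hit rate. A sketch with assumed field names and made-up demo data:

```ts
// Scoring a redemption round. Field names are assumptions for illustration.
interface RangeAnswer { low: number; high: number; conf: number; answer: number; }

function scoreRound(answers: RangeAnswer[]) {
  const hits = answers.filter(a => a.answer >= a.low && a.answer <= a.high).length;
  const hitRate = (100 * hits) / answers.length;
  const avgConf = answers.reduce((s, a) => s + a.conf, 0) / answers.length;
  return { hits, hitRate, gap: Math.abs(avgConf - hitRate) };
}

// Five 90%-confidence ranges where four contain the true answer:
const demo: RangeAnswer[] = [
  { low: 10, high: 50, conf: 90, answer: 30 },
  { low: 100, high: 900, conf: 90, answer: 450 },
  { low: 5, high: 25, conf: 90, answer: 40 }, // a miss
  { low: 1000, high: 9000, conf: 90, answer: 2000 },
  { low: 0, high: 100, conf: 90, answer: 99 },
];
console.log(scoreRound(demo)); // { hits: 4, hitRate: 80, gap: 10 }
```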

{{ currentRangeQuestion?.text }}
{{ currentRangeQuestion.low?.toLocaleString() }} Most Likely: {{ currentRangeQuestion.mid?.toLocaleString() }} {{ currentRangeQuestion.high?.toLocaleString() }} Confidence: {{ currentRangeQuestion.conf || 0 }}%
Redemption Results

Did you overcome your bias?

{{ state.rangeScore }}/{{ state.rangeQuestions.length }} ({{ calibrationScore.hitRate.toFixed(0) }}% Hit Rate)

{{ calibrationScore.gap <= 10 ? "Excellent. Your ego is checked and you are properly accounting for uncertainty." : "Still uncalibrated. Your stated confidence does not match reality." }}

Question Your Range Conf True Answer Result
{{ q.text }} {{ q.low }} → {{ q.high }} {{ q.conf }}% {{ q.answer.toLocaleString() }} {{ q.correct ? 'HIT' : 'MISS' }}
Noise vs. Bias

The Wisdom of Crowds

Individuals are notoriously noisy and biased, but aggregated estimates often zero in on the truth. Provide a calibrated range for the following scenario, then ask 5 simulated "Experts" for theirs.

{{ state.wisdomQuestion.text }}
{{ state.wisdomQuestion.low?.toLocaleString() }} Most Likely: {{ state.wisdomQuestion.mid?.toLocaleString() }} {{ state.wisdomQuestion.high?.toLocaleString() }} Confidence: {{ state.wisdomQuestion.conf || 0 }}%
Expert Their 90% CI Result
Expert {{ i + 1 }} {{ e.low.toLocaleString() }} → {{ e.high.toLocaleString() }} {{ (14500 >= e.low && 14500 <= e.high) ? 'HIT' : 'MISS' }}
THE AGGREGATE AVERAGE {{ wisdomAvgLow.toLocaleString() }} → {{ wisdomAvgHigh.toLocaleString() }} {{ (14500 >= wisdomAvgLow && 14500 <= wisdomAvgHigh) ? 'HIT' : 'MISS' }}
The Lesson

Notice how individual experts can miss the target entirely (high noise) while the mathematical average of their bounds still captures the true answer. This is why collaborative estimation (like Planning Poker) consistently outperforms relying on a single Lead Architect.
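
The aggregation itself is nothing more than averaging the bounds. A sketch with invented expert intervals (the true answer of 14,500 comes from the scenario above):

```ts
// Aggregating expert 90% confidence intervals by averaging their bounds.
interface ExpertCI { low: number; high: number; }

function aggregate(experts: ExpertCI[]): ExpertCI {
  const n = experts.length;
  return {
    low: experts.reduce((s, e) => s + e.low, 0) / n,
    high: experts.reduce((s, e) => s + e.high, 0) / n,
  };
}

// Noisy experts: every individual interval misses 14,500, in both directions.
const experts: ExpertCI[] = [
  { low: 6_000, high: 12_000 },  // too low
  { low: 16_000, high: 24_000 }, // too high
  { low: 8_000, high: 14_000 },  // too low
  { low: 15_000, high: 22_000 }, // too high
  { low: 5_000, high: 13_000 },  // too low
];

const avg = aggregate(experts); // { low: 10000, high: 17000 }
console.log(14_500 >= avg.low && 14_500 <= avg.high ? "HIT" : "MISS"); // HIT
```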

Escaping Trivia

Real World Business Scenario

Trivia questions break down ego, but estimation matters most in the real world. You are asked to estimate the duration of an upcoming Cloud Migration. Provide a calibrated range for the project's lead time in days.

{{ state.businessScenario.low?.toLocaleString() }} Most Likely: {{ state.businessScenario.mid?.toLocaleString() }} {{ state.businessScenario.high?.toLocaleString() }} Confidence: {{ state.businessScenario.conf || 0 }}%
Actual Duration: 114 Days

Your Estimate: {{ state.businessScenario.low }} to {{ state.businessScenario.high }} days. You scored a {{ state.businessScenario.result }}.

The Danger of Overconfidence

The same overconfidence bias that makes us fail trivia questions can sink a multi-million-dollar IT project if you don't widen your bounds to respect uncertainty.

Fermi Decomposition

Measuring the "Immeasurable"

Estimate the Year 1 Revenue of a new AI product. Instead of guessing one giant range, break it down into 3 variables. Provide a 90% Confidence Interval for each.

10th Percentile (Worst Case)
${{ state.fermi.p10.toLocaleString() }}
50th Percentile (Most Likely)
${{ state.fermi.p50.toLocaleString() }}
90th Percentile (Best Case)
${{ state.fermi.p90.toLocaleString() }}
The Lesson

You don't need exact numbers to forecast accurately; you need calibrated bounds on the sub-components. Decomposing the problem keeps the uncertainty of each piece small enough to reason about, and the pieces combine into a defensible overall range.
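
A common way to combine the sub-components is Monte Carlo simulation: draw each variable from its calibrated range, multiply, and read the percentiles off the results. The three variables and their ranges below are assumptions for illustration, not the simulator's actual inputs.

```ts
// Fermi decomposition via Monte Carlo.
// Revenue = signups × conversion rate × average contract value.
function lognormalFrom90CI(p5: number, p95: number): () => number {
  // treat the 90% CI as spanning ±1.645 standard deviations in log space
  const mu = (Math.log(p5) + Math.log(p95)) / 2;
  const sigma = (Math.log(p95) - Math.log(p5)) / (2 * 1.645);
  return () => {
    const u = Math.random(), v = Math.random();
    const z = Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
    return Math.exp(mu + sigma * z);
  };
}

const signups = lognormalFrom90CI(2_000, 20_000);  // trial signups, year 1
const conversion = lognormalFrom90CI(0.02, 0.10);  // paid conversion rate
const contract = lognormalFrom90CI(5_000, 15_000); // average contract value ($)

const revenues = Array.from({ length: 50_000 }, () => signups() * conversion() * contract());
revenues.sort((a, b) => a - b);
const pct = (p: number) => revenues[Math.floor(p * revenues.length)];
console.log("P10:", pct(0.10).toFixed(0), "P50:", pct(0.50).toFixed(0), "P90:", pct(0.90).toFixed(0));
```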

Expected Value of Perfect Information (EVPI)

The "Buy Information" Game

Measurement is an investment. You have a $100,000 budget and must decide whether to launch a massive, highly uncertain project.

Budget: ${{ state.buyGame.budget.toLocaleString() }}
Risk of Ruin: {{ buyGameRisk }}%
Project Worst Case
${{ state.buyGame.low.toLocaleString() }}
Project Best Case
${{ state.buyGame.high.toLocaleString() }}
{{ state.buyGame.resultText }}
The Lesson

Information is only valuable if it reduces uncertainty enough to change a decision and justify its cost. By buying information, you transformed a gamble into a calculated, safe bet.
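
In decision-analysis terms this is EVPI: the gap between the expected value of deciding with perfect information and the expected value of your best decision today. A sketch with assumed payoffs and probabilities:

```ts
// Expected Value of Perfect Information, with assumed numbers.
const pSuccess = 0.6;
const winPayoff = 400_000;   // project best case
const losePayoff = -500_000; // project worst case (ruin)

// Best decision without information: launch only if the EV is positive.
const evLaunch = pSuccess * winPayoff + (1 - pSuccess) * losePayoff; // 40,000
const evBare = Math.max(evLaunch, 0); // launching barely beats doing nothing

// With perfect information you launch only in the success worlds:
const evPerfect = pSuccess * winPayoff + (1 - pSuccess) * 0; // 240,000

const evpi = evPerfect - evBare; // 200,000
console.log({ evLaunch, evPerfect, evpi });
// Any study costing less than $200,000 that fully resolves the
// uncertainty is worth buying; anything dearer is not.
```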

Confidence as a Decision Variable

Acceptable for WHAT purpose?

Pedagogically, 90% is the training standard to shock people out of false precision. Operationally, insisting on 90% confidence everywhere is dysfunctional. Confidence is a decision variable, not a virtue signal.

{{ confidenceTradeoffScenario.name }}

{{ confidenceTradeoffScenario.description }}

Risk of Failure Cost
${{ Math.round(confidenceTradeoffChart.current.riskCost).toLocaleString() }}
Cost of Delay & Opp.
${{ Math.round(confidenceTradeoffChart.current.delayCost).toLocaleString() }}
Total Expected Cost
${{ Math.round(confidenceTradeoffChart.current.totalCost).toLocaleString() }}
{{ confidenceTradeoffChart.current.status }} (Optimal: ~{{ confidenceTradeoffChart.optimalConf }}%)
Value Destruction Warning

Waiting for 90% confidence in this scenario causes the delay and opportunity costs to vastly exceed the cost of failure. An honestly calibrated 80% forecast is vastly superior to a politically inflated 95% commitment.
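
The tradeoff can be made explicit: model the falling cost of failure and the rising cost of delay as functions of the confidence bar, and minimize their sum. The cost curves below are assumptions, not the simulator's model; with these numbers the optimum lands near 80%, not 90%.

```ts
// Confidence as a decision variable: find the bar that minimizes total cost.
function riskCost(conf: number): number {
  return (1 - conf / 100) * 1_000_000; // expected cost of shipping a failure
}
function delayCost(conf: number): number {
  // waiting for more certainty gets steeply expensive near 100%
  return 40_000 * Math.pow(conf / 100, 2) / (1 - conf / 100 + 0.01);
}

let best = { conf: 50, total: Infinity };
for (let conf = 50; conf <= 99; conf++) {
  const total = riskCost(conf) + delayCost(conf);
  if (total < best.total) best = { conf, total };
}
// With these assumed curves, the optimum is ~81% confidence.
console.log("optimal confidence:", best.conf + "%", "total cost:", Math.round(best.total));
```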

From Certainty Performance to Uncertainty Literacy

The real question becomes: "When we say 80%, do outcomes actually land inside the range ~80% of the time?" That is calibration.

50%: Exploratory / speculative
70%: Directional confidence
80%: Actionable operational confidence
90%: Strong planning confidence (Training Standard)
95%+: Extremely conservative / high-cost-of-failure
Forecast Quality Measurement

The Brier Score

A Brier score measures the quality of probabilistic predictions. It rewards accurate confidence and honest uncertainty, and it penalizes unjustified certainty in either direction: overconfidence and underconfidence alike.

Interactive Brier Score Calculator

Formula: (Predicted Probability - Actual Outcome)², where the actual outcome is 1 if the event happened and 0 if it did not (0 is perfect, 1 is worst).
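
The formula translates directly to code, shown in this minimal sketch:

```ts
// Brier score: (predicted probability - outcome)², where the outcome
// is 1 if the event happened and 0 if it did not.
function brier(predictedProbability: number, happened: boolean): number {
  const outcome = happened ? 1 : 0;
  return (predictedProbability - outcome) ** 2;
}

console.log(brier(0.99, false)); // 0.9801 - catastrophic overconfidence
console.log(brier(0.80, true));  // 0.0400 - confident and correct
console.log(brier(0.60, false)); // 0.3600 - honest uncertainty, mildly penalized
console.log(brier(0.50, true));  // 0.2500 - the "random guessing" baseline
```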

{{ state.brierDemo.prediction }}%
Resulting Brier Score
{{ brierDemoScore }}
Catastrophic Overconfidence

You predicted {{ state.brierDemo.prediction }}% but it failed. This results in a terrible score ({{ brierDemoScore }}). This is why overconfidence is punished brutally.

Good Forecast

You predicted {{ state.brierDemo.prediction }}% and it happened. Your score is {{ brierDemoScore }}. Very good. You were confident and correct.

Honest Uncertainty

You predicted {{ state.brierDemo.prediction }}% and it missed. Score: {{ brierDemoScore }}. Not amazing, but much better than fake certainty.

Why This Is Powerful

Brier scores force a shift from asking "Was I right?" to asking "Was my confidence appropriate?" That is a profound shift in executive thinking.

Typical Ranges

0.00: Perfect
0.05 - 0.10: Exceptional
0.10 - 0.20: Strong
0.20 - 0.30: Moderate
0.50+: Terrible calibration
Typical Forecasting Patterns
The Overconfident Executive

Predicts 99%, but is wrong. Brier: 0.9801.
Looks decisive, but forecasts terribly.

The Underconfident Analyst

Predicts 55%, and is correct. Brier: 0.2025.
Often calibrated, but timid.

The Strong Forecaster

Predicts 80% and is correct. Brier: 0.0400.
They are not always right, but they are probabilistically honest.

Final Debrief

Calibration is organizational intelligence.

Your Personal Calibration Score
{{ calibrationScore.hitRate.toFixed(0) }}% Hit Rate
Brier: {{ calibrationScore.brierScore }}
({{ calibrationScore.status }})

You achieved a {{ calibrationScore.hitRate.toFixed(0) }}% hit rate with a Brier Score of {{ calibrationScore.brierScore }} (0 is perfect, 0.25 is random guessing). This metric has been saved to your learner profile. Calibration is a trainable skill—you can watch this score improve over time!

Douglas Hubbard's Rule

"Strong forecasters are not people who know the future. They are people whose confidence behaves more like reality. Measurement is about uncertainty reduction, not absolute certainty."

Tom DeMarco's Rule

"An estimate is a probability distribution. A target is a business desire. The moment a leader conflates a target with an estimate, they destroy the estimate and guarantee failure."