The Farness thesis
Forecasting as a harness
Why reframing decisions as predictions leads to better outcomes—and how to do it.
The problem with advice
When we ask someone—a friend, a mentor, an AI—"Should I do X?", we're asking the wrong question. The answer we get depends entirely on unstated assumptions: What do we value? What counts as success? How certain is the advisor? None of this is made explicit.
Worse, we can never learn from these answers. A year later, we can't evaluate whether the advice was good because we never defined what "good" meant. The feedback loop is broken.
This isn't just a problem with AI (though AI's tendency toward sycophancy makes it worse[1]). It's a problem with how we structure decision-making conversations. Annie Duke calls this "resulting"—judging decisions by outcomes rather than process[16]. When we ask for advice and get a good outcome, we credit the advice. Bad outcome, we blame it. But a single outcome tells us almost nothing about whether the decision was good.
The reframe
Instead of asking for advice, ask for forecasts conditional on actions.
The shift is subtle but transformative:
Before: "Should I take this job?"
After: "If I value income, growth, and work-life balance, what's the probability that each of these exceeds my threshold under Option A vs Option B? What assumptions drive those estimates?"
This forces several things to happen:
- Values become explicit. You must state what you're optimizing for before anyone can help you.
- Uncertainty becomes visible. A forecast requires a confidence interval. "Probably fine" becomes "70% chance, with a range of 50-85%."
- Assumptions surface. To make a forecast, you must reason about mechanisms. What needs to be true for this outcome to occur?
- Accountability emerges. Predictions can be scored. Opinions cannot.
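The reframed question can be captured as a small, scorable record. This is a hypothetical sketch, not the Farness API: `ConditionalForecast` and its fields are invented here for illustration, and the numbers are made up.

```python
from dataclasses import dataclass, field

@dataclass
class ConditionalForecast:
    """P(KPI exceeds its threshold | option), with explicit uncertainty.

    Hypothetical structure for illustration, not a real library API."""
    option: str      # action under consideration
    kpi: str         # what success means, stated up front
    p: float         # point estimate in [0, 1]
    low: float       # lower bound of the credible interval
    high: float      # upper bound of the credible interval
    assumptions: list = field(default_factory=list)  # what must hold for p

# "Should I take this job?" becomes a set of scorable claims:
forecasts = [
    ConditionalForecast("Option A", "income exceeds threshold", 0.70, 0.50, 0.85,
                        ["offer terms are final"]),
    ConditionalForecast("Option B", "income exceeds threshold", 0.55, 0.35, 0.75,
                        ["current employer counteroffers"]),
]

for f in forecasts:
    print(f"{f.option}: {f.p:.0%} ({f.low:.0%}-{f.high:.0%}), assuming {f.assumptions}")
```

Each record states the value being optimized, the probability, the uncertainty range, and the assumptions, so it can be scored later.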
The superforecasting connection
This isn't a new idea. Philip Tetlock's research on superforecasting[2] identified a set of techniques that reliably improve predictive accuracy. In the Good Judgment Project, a small group of forecasters consistently beat professional intelligence analysts with access to classified information[3].
Their techniques include:
- Fermi decomposition: Break complex estimates into simpler, estimable components[4].
- Outside view first: Start with base rates before adjusting for specifics—what Kahneman calls "reference class forecasting"[5].
- Calibrated confidence: Your 80% predictions should come true 80% of the time.
- Continuous updating: Revise estimates as new information arrives, following Bayesian principles.
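Fermi decomposition is just multiplication of simpler conditional estimates. A minimal sketch with invented numbers, not a prescribed formula:

```python
# Fermi decomposition: estimate P(side project reaches 10k users in year 1)
# as a chain of simpler conditional probabilities. Numbers are illustrative.
p_ship_on_time  = 0.80  # outside view: base rate for similar teams shipping
p_channel_works = 0.50  # P(an acquisition channel converts | shipped)
p_retention_ok  = 0.60  # P(retention sustains growth | channel works)

p_success = p_ship_on_time * p_channel_works * p_retention_ok
print(f"P(10k users) ~= {p_success:.2f}")  # 0.24: each factor is easier to estimate
```

The point of the decomposition is that each component can be anchored to a base rate on its own, which is far easier than intuiting the joint probability directly.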
Superforecasters don't have access to secret information. They're just more disciplined about structuring their thinking. Across nearly 100 comparative studies, Dawes, Faust, and Meehl found that structured "mechanical" prediction equaled or outperformed unstructured expert judgment in every domain tested[17]. Farness applies this discipline to personal and professional decisions.
Why AI makes this better
Large language models are surprisingly good at forecasting. LLM ensembles can match human crowd accuracy on prediction tasks[6]. Halawi et al. built a retrieval-augmented system that approaches competitive forecaster accuracy[18], and AI forecasting systems like AIA Forecaster have achieved superforecaster-level performance through structured pipelines of search, independent reasoning, and calibration[7]. The CAIS forecasting bot has demonstrated superhuman accuracy on competitive forecasting platforms[8]. On ForecastBench, LLMs now surpass the median public forecaster, with projected LLM-superforecaster parity by late 2026[28].
But LLMs are also prone to sycophancy: telling you what you want to hear rather than what's true. Research has shown this tendency is robust across models and contexts[1].
The forecasting frame is a harness that constrains this tendency. When you ask an AI for a probability with a confidence interval, it's harder for it to simply validate your existing beliefs. Numbers create accountability. Xiong et al. found that structured elicitation strategies—multi-step prompting, top-k sampling—can help mitigate LLM overconfidence, though no single technique consistently outperforms others[19]. How you ask matters as much as what you ask.
More importantly, the structure itself improves thinking. Research on LLM-augmented forecasting found that AI assistance significantly boosts human forecasting accuracy, with the largest gains for less experienced forecasters[9]:
- KPI definition forces you to articulate what you actually care about.
- Option expansion surfaces alternatives you hadn't considered.
- Assumption surfacing reveals where your model might be wrong.
- Sensitivity analysis shows which uncertainties matter most.
The AI becomes a structured thinking partner, not an oracle.
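The sensitivity-analysis step can be made concrete with a one-at-a-time sweep over the components of a decomposed forecast. The model and numbers here are illustrative assumptions:

```python
# Which input uncertainty moves the final forecast most? Perturb each
# component by the same amount and compare the swing. Toy numbers.
base = {"ship": 0.8, "channel": 0.5, "retention": 0.6}

def p_success(components):
    return components["ship"] * components["channel"] * components["retention"]

swings = {}
for name, value in base.items():
    lo = dict(base, **{name: max(value - 0.2, 0.0)})
    hi = dict(base, **{name: min(value + 0.2, 1.0)})
    swings[name] = p_success(hi) - p_success(lo)

# The component with the largest swing deserves the most research effort.
for name, swing in sorted(swings.items(), key=lambda kv: -kv[1]):
    print(f"{name:9s} +/-0.2 moves the forecast by {swing:+.3f}")
```

In this toy model the same ±0.2 uncertainty matters most on the "channel" component, because its swing is scaled by the largest product of the other factors. That is the uncertainty worth resolving first.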
See the research: I've developed a methodology called "stability-under-probing" to empirically test whether frameworks reduce sycophancy. Read the paper →
The calibration loop
The most powerful part of this approach is what happens over time. By logging your forecasts and scoring them against reality, you build a calibration curve.
Research on expert prediction shows that without feedback, even domain experts are poorly calibrated[10]. Lichtenstein, Fischhoff, and Phillips found that when people said they were 98% confident, they were correct only 68% of the time[20]. But with structured feedback, calibration improves dramatically. Weather forecasters and professional oddsmakers—who receive regular, structured feedback on their probabilistic predictions—exhibited little or no overconfidence. The Good Judgment Project confirmed this: regular accuracy feedback was one of the key interventions that improved performance[3].
You learn that you're overconfident on career decisions. Or underconfident on technical estimates. Or systematically biased toward optimism about timelines.
This meta-knowledge is invaluable. It's not just about making better individual decisions—it's about understanding your own decision-making patterns and compensating for systematic biases.
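A calibration curve needs nothing more than logged (probability, outcome) pairs bucketed by stated confidence. The log below is invented for illustration:

```python
from collections import defaultdict

# Logged forecasts: (stated probability, did it happen?). Invented data.
log = [(0.9, True), (0.9, True), (0.9, False),
       (0.8, True), (0.8, False),
       (0.6, True), (0.6, False), (0.6, False),
       (0.3, True), (0.3, False)]

buckets = defaultdict(list)
for p, happened in log:
    buckets[p].append(happened)

# Perfect calibration: the "happened" rate in each bucket matches the stated p.
for p in sorted(buckets):
    outcomes = buckets[p]
    rate = sum(outcomes) / len(outcomes)
    print(f"said {p:.0%}: happened {rate:.0%} (n={len(outcomes)})")
```

With this invented log, the 90% bucket resolves true only two times in three: exactly the kind of systematic overconfidence the feedback loop is meant to expose.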
The decision quality chain
Ron Howard and the Strategic Decisions Group developed a framework for measuring decision quality at the time of decision, independent of outcome[21]. A decision is only as good as its weakest link across six elements: appropriate frame, creative alternatives, reliable information, clear values, sound reasoning, and commitment to action[22].
Farness maps directly onto this chain. Defining KPIs addresses frame and values. Option expansion addresses creative alternatives. Forecasting with base rates addresses reliable information and sound reasoning. The calibration loop addresses the feedback mechanism that strengthens every link over time.
The key insight from decision analysis is that you can assess decision quality without waiting for outcomes. Howard's information value theory shows that when decisions are framed as forecasts, you can calculate exactly how much to invest in resolving each uncertainty[23]. If the expected value of learning your probability of success is only $50, don't spend $5,000 on a feasibility study.
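Howard's value-of-information calculation fits in a few lines. The dollar figures below are invented to mirror the feasibility-study example:

```python
# Expected value of perfect information (EVPI): the most a feasibility
# study could ever be worth. All dollar figures are illustrative.
p_success = 0.4      # current belief that the project succeeds
payoff    = 100_000  # value if it succeeds
cost      = 50_000   # cost of attempting it

# Best you can do acting now: take the project only if its EV is positive.
ev_now = max(p_success * payoff - cost, 0)   # 40k - 50k < 0, so skip: EV = 0

# With perfect information you attempt only the cases that succeed.
ev_informed = p_success * (payoff - cost)    # 0.4 * 50k = 20k

evpi = ev_informed - ev_now
print(f"Never pay more than ${evpi:,.0f} for the study.")
```

Here perfect foreknowledge is worth $20,000, so a $5,000 study might be justified; if the same arithmetic had produced $50, it would not be.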
This connects to what Kahneman and Lovallo call the "inside view" versus "outside view"[24]. Decision makers naturally treat each problem as unique, anchoring on plans and scenarios rather than base rates from comparable situations. Reframing decisions as forecasts naturally invokes the outside view by forcing explicit probability assessment against a reference class.
Boosting, not nudging
Hertwig and Grüne-Yanoff distinguish "nudges" (environmental changes that steer behavior) from "boosts" (interventions that build decision-making competence)[25]. A nudge might default your retirement savings rate to 10%. A boost teaches you to think about compound interest so you choose the right rate yourself.
Farness is a boost, not a nudge. It doesn't tell you what to decide. It teaches a way of thinking—probabilistic, structured, accountable—that transfers across domains. Julia Galef calls this the "scout mindset": treating beliefs as provisional hypotheses to be stress-tested, not positions to defend[26]. The forecasting frame cultivates this mindset by making accuracy the explicit goal.
And critically, Koriat, Lichtenstein, and Fischhoff showed that simply asking people to generate reasons against their preferred option eliminates overconfidence almost entirely[27]. Structured consideration of alternatives—a core forecasting discipline—is one of the most robust debiasing techniques known.
The framework
Farness implements a five-step process, drawing on structured analytic techniques from intelligence analysis[11] and the superforecasting literature:
- Define KPIs. What outcomes matter? Pick 1-3 metrics you'd actually use to judge success in hindsight. This mirrors the "AIMS" technique (Audience, Issue, Message, Storyline) from intelligence analysis[11].
- Expand options. Don't just compare A vs B. What about C? Waiting? A hybrid? The best option is often one you didn't initially consider. This combats "premature closure"—a well-documented cognitive bias[12].
- Decompose and forecast. For each option × KPI pair, apply the outside view, the inside view, and Fermi decomposition. Produce a point estimate with a confidence interval. Decomposition is one of Heuer's core structured analytic techniques[11].
- Surface assumptions. What must be true for this forecast to hold? What would change it? This is the "key assumptions check" from intelligence tradecraft[13].
- Log and score. Record the decision. Return in 3-6 months. Compare predictions to reality. Update your calibration. Brier scores provide a proper scoring rule that rewards both accuracy and calibration[14].
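The scoring in step 5 can use the Brier score directly. A minimal sketch; the logged forecasts are invented:

```python
def brier_score(forecasts):
    """Mean squared error between stated probabilities and 0/1 outcomes.

    0 is perfect; always saying 50% scores 0.25; lower is better."""
    return sum((p - float(outcome)) ** 2 for p, outcome in forecasts) / len(forecasts)

# (stated probability, what actually happened): an illustrative log.
logged = [(0.8, True), (0.7, False), (0.9, True), (0.4, False)]
print(round(brier_score(logged), 3))  # 0.175
```

Because the Brier score is a proper scoring rule, the strategy that minimizes your expected score is reporting your true belief, which is what makes it suitable for the log-and-score loop.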
When to use it
Farness is valuable across a range of decisions:
- High-stakes decisions where the cost of being wrong is significant.
- Recurring decision types where you can build calibration over time.
- Decisions with delayed feedback where you won't know if you were right for months or years.
- Decisions where you suspect motivated reasoning—where you might be fooling yourself[15].
- Smaller decisions as practice—building the habit and calibration data that pays off when stakes are high.
The vision
Imagine a world where every significant decision comes with:
- Explicit success criteria
- A range of options, not just the obvious ones
- Quantified predictions with uncertainty ranges
- Surfaced assumptions that can be tested
- A record that can be scored and learned from
This is possible today. The tools exist. The research supports it. What's missing is the habit—the muscle memory of reaching for forecasts instead of opinions.
Farness is an attempt to build that habit. Use it as a Python library, a CLI tool, or a Claude Code plugin. Log your decisions. Score your predictions. Get better over time.
References
1. Sharma, M., et al. (2024). "Towards Understanding Sycophancy in Language Models." ICLR 2024. openreview.net
2. Tetlock, P. E., & Gardner, D. (2015). Superforecasting: The Art and Science of Prediction. Crown.
3. Mellers, B., et al. (2014). "Psychological Strategies for Winning a Geopolitical Forecasting Tournament." Psychological Science, 25(5), 1106-1115.
4. Good Judgment. "Superforecasters' Toolbox: Fermi-ization in Forecasting." goodjudgment.com
5. Kahneman, D., & Tversky, A. (1979). "Intuitive Prediction: Biases and Corrective Procedures." TIMS Studies in Management Science, 12, 313-327.
6. Schoenegger, P., et al. (2024). "Wisdom of the Silicon Crowd: LLM Ensemble Prediction Capabilities Rival Human Crowd Accuracy." arXiv:2402.19379. arxiv.org/abs/2402.19379
7. Alur, R., et al. (2025). "AIA Forecaster: Technical Report." arXiv:2511.07678. arxiv.org/abs/2511.07678
8. Center for AI Safety. "Superhuman Automated Forecasting." safe.ai/blog/forecasting
9. Schoenegger, P., et al. (2024). "AI-Augmented Predictions: LLM Assistants Improve Human Forecasting Accuracy." arXiv:2402.07862. arxiv.org/abs/2402.07862
10. Tetlock, P. E. (2005). Expert Political Judgment: How Good Is It? How Can We Know? Princeton University Press.
11. Heuer, R. J., & Pherson, R. H. (2015). Structured Analytic Techniques for Intelligence Analysis (2nd ed.). CQ Press.
12. Kruglanski, A. W., & Webster, D. M. (1996). "Motivated Closing of the Mind: 'Seizing' and 'Freezing'." Psychological Review, 103(2), 263-283.
13. CIA. (2009). "A Tradecraft Primer: Structured Analytic Techniques for Improving Intelligence Analysis." cia.gov
14. Brier, G. W. (1950). "Verification of Forecasts Expressed in Terms of Probability." Monthly Weather Review, 78(1), 1-3.
15. Kunda, Z. (1990). "The Case for Motivated Reasoning." Psychological Bulletin, 108(3), 480-498.
16. Duke, A. (2018). Thinking in Bets: Making Smarter Decisions When You Don't Have All the Facts. Portfolio/Penguin.
17. Dawes, R. M., Faust, D., & Meehl, P. E. (1989). "Clinical Versus Actuarial Judgment." Science, 243(4899), 1668-1674.
18. Halawi, D., Zhang, F., Chen, Y.-H., & Steinhardt, J. (2024). "Approaching Human-Level Forecasting with Language Models." NeurIPS 2024. arxiv.org/abs/2402.18563
19. Xiong, M., Hu, Z., Lu, X., et al. (2024). "Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs." ICLR 2024. arxiv.org/abs/2306.13063
20. Lichtenstein, S., Fischhoff, B., & Phillips, L. D. (1982). "Calibration of Probabilities: The State of the Art to 1980." In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under Uncertainty: Heuristics and Biases (pp. 306-334). Cambridge University Press.
21. Howard, R. A. (1988). "Decision Analysis: Practice and Promise." Management Science, 34(6), 679-695.
22. Spetzler, C., Winter, H., & Meyer, J. (2016). Decision Quality: Value Creation from Better Business Decisions. Wiley.
23. Howard, R. A. (1966). "Information Value Theory." IEEE Transactions on Systems Science and Cybernetics, 2(1), 22-26.
24. Kahneman, D., & Lovallo, D. (1993). "Timid Choices and Bold Forecasts: A Cognitive Perspective on Risk Taking." Management Science, 39(1), 17-31.
25. Hertwig, R., & Grüne-Yanoff, T. (2017). "Nudging and Boosting: Steering or Empowering Good Decisions." Perspectives on Psychological Science, 12(6), 973-986.
26. Galef, J. (2021). The Scout Mindset: Why Some People See Things Clearly and Others Don't. Portfolio/Penguin.
27. Koriat, A., Lichtenstein, S., & Fischhoff, B. (1980). "Reasons for Confidence." Journal of Experimental Psychology: Human Learning and Memory, 6(2), 107-118.
28. Karger, E., et al. (2025). "ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities." ICLR 2025. openreview.net