Why Most Backtests Lie

Every systematic trader has lived the same plot: a backtest with a beautiful equity curve goes live and produces something between mediocrity and slow bleeding. The instinct is to blame the market — "conditions changed." The evidence points somewhere less comfortable: most backtests are structurally biased toward optimism, and the bias is installed by the researcher, usually without noticing. Here are the five mechanisms, ranked roughly by how much damage they do.

1. Overfitting: the mathematics of fooling yourself

Run enough strategy variants over the same historical data and excellent results are all but guaranteed, even if no variant has any real edge. This is selection under multiple testing, and its scale in trading research is hard to overstate. Bailey, Borwein, López de Prado, and Zhu (2014) showed that even with a small number of independent trials, the expected maximum Sharpe ratio among them is substantial when the true Sharpe of every trial is zero, and it keeps growing as trials are added. The best backtest out of 200 parameter combinations is not evidence; it is an order statistic.
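The effect is easy to reproduce. The sketch below generates strategies from pure noise and reports the best Sharpe ratio found among them. The function names, the 1% daily volatility, the five-year sample, and the seed are all illustrative assumptions, not values from the cited paper.

```python
import math
import random
import statistics


def annualized_sharpe(daily_returns):
    """Annualized Sharpe ratio of a daily return series (risk-free rate taken as 0)."""
    mean = statistics.fmean(daily_returns)
    stdev = statistics.stdev(daily_returns)
    return mean / stdev * math.sqrt(252)


def best_of_n_sharpe(n_trials, n_days=252 * 5, seed=0):
    """Best Sharpe among n_trials simulated strategies whose true Sharpe is zero."""
    rng = random.Random(seed)
    sharpes = []
    for _ in range(n_trials):
        # Pure noise: zero mean, 1% daily volatility, no edge by construction.
        returns = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        sharpes.append(annualized_sharpe(returns))
    return max(sharpes)


if __name__ == "__main__":
    for n in (1, 10, 50, 200):
        print(n, round(best_of_n_sharpe(n), 2))
```

Because the seed is fixed, each larger run contains the smaller runs' trials, so the reported maximum can only rise as trials are added. Nothing about the strategies improved; only the number of attempts did.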

The tell is fragility: move a parameter by 10% and performance collapses; shift the sample window by a year and the strategy inverts. Genuine edges are usually boring: modest, stable across reasonable parameter neighborhoods, and explicable by an economic mechanism (a risk premium, a structural flow, a behavioral regularity). If the only explanation for why a strategy works is "the optimizer found it," the honest prior is that it doesn't.

The partial remedies are procedural, not clever: fix hypotheses before testing; count every trial, including the discarded ones; hold out data that is touched exactly once; and discount reported Sharpe ratios for the number of experiments behind them (the "deflated Sharpe ratio" of Bailey and López de Prado formalizes exactly this).
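As a rough illustration of what "discounting for the number of experiments" means, the sketch below computes the Sharpe ratio that the luckiest of N zero-edge trials would be expected to show. It uses the expected-maximum approximation associated with the deflated Sharpe ratio, as best recalled here; the full statistic also adjusts for sample length, skewness, and kurtosis, which this sketch omits. The function name and arguments are hypothetical.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def expected_max_sharpe(n_trials, sharpe_variance):
    """Approximate E[max Sharpe] across n_trials independent trials with true Sharpe 0.

    sharpe_variance is the variance of the Sharpe estimates across the trials,
    expressed in the same units (e.g. annualized) as the Sharpe being judged.
    """
    if n_trials < 2:
        raise ValueError("need at least two trials")
    z = NormalDist().inv_cdf  # standard normal quantile function
    return math.sqrt(sharpe_variance) * (
        (1 - EULER_GAMMA) * z(1 - 1 / n_trials)
        + EULER_GAMMA * z(1 - 1 / (n_trials * math.e))
    )


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, round(expected_max_sharpe(n, sharpe_variance=1.0), 2))
```

A reported Sharpe that does not clear this benchmark is hard to distinguish from the best of N noise trials. That is why the trial count, discarded variants included, has to be recorded honestly.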

2. Look-ahead bias: trading on tomorrow's newspaper

Look-ahead bias is any leak of future information into a simulated decision, and it is more insidious than the textbook cases suggest. The obvious version — computing a signal on the day's close and "executing" at that same close — still appears constantly. The subtle versions hide in data plumbing: economic figures used as of their reference date rather than their release date; restated financials replacing the numbers actually known at the time; indicators computed over a window that silently includes the current bar; databases whose symbols, contract mappings, or corporate-action adjustments were built with hindsight.
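The same-close version of the leak fits in a few lines. In this minimal sketch, a naive momentum signal is scored twice: once against the return that produced it (the leak) and once against the following bar's return (what a live trader could actually earn). The function names and the toy price series are made up for illustration.

```python
def momentum_signal(closes, t):
    """+1 if bar t closed above bar t-1, else -1. Uses data up to bar t only."""
    return 1 if closes[t] > closes[t - 1] else -1


def pnl_with_lookahead(closes):
    """WRONG: the signal from bar t's close is credited with the return ending at bar t."""
    return sum(
        momentum_signal(closes, t) * (closes[t] / closes[t - 1] - 1)
        for t in range(1, len(closes))
    )


def pnl_lagged(closes):
    """The signal from bar t's close earns the return from bar t to bar t+1."""
    return sum(
        momentum_signal(closes, t) * (closes[t + 1] / closes[t] - 1)
        for t in range(1, len(closes) - 1)
    )


if __name__ == "__main__":
    closes = [100.0, 101.0, 100.0, 101.0, 100.0, 101.0]  # toy mean-reverting series
    print("with look-ahead:", round(pnl_with_lookahead(closes), 4))
    print("lagged one bar: ", round(pnl_lagged(closes), 4))
```

The leaking version sums the absolute value of every return, so it can never lose. On this series the correctly lagged version loses on every bar.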

Futures add their own trap: continuous contracts. Splicing expiring contracts into one series requires a roll rule and an adjustment method, and careless choices embed phantom returns at every roll — profits no live trader could have captured. A backtest on futures that cannot state precisely which contract it held on which date, and what it paid to roll, is not a simulation; it is an illustration.
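A toy example makes the phantom return concrete. The sketch below splices two hypothetical contracts two ways: a naive join, and a difference (back) adjustment that shifts the older history by the price gap observed on the roll day. The prices, function names, and roll rule are illustrative assumptions. Roll costs are deliberately left out, and difference adjustment preserves point P&L rather than percentage returns.

```python
def naive_splice(front, back, roll_index):
    """Front contract before roll_index, back contract from roll_index on, unadjusted."""
    return front[:roll_index] + back[roll_index:]


def back_adjusted_splice(front, back, roll_index):
    """Difference-adjusted series: shift the front-contract history by the roll gap.

    The gap is measured on the last day the front contract is held (roll_index - 1),
    when both contracts' prices are observable. Requires roll_index >= 1.
    """
    gap = back[roll_index - 1] - front[roll_index - 1]
    return [price + gap for price in front[:roll_index]] + back[roll_index:]


def total_point_change(series):
    """Points gained by holding one contract over the whole series."""
    return series[-1] - series[0]


if __name__ == "__main__":
    front = [100.0, 101.0, 102.0, 103.0]  # expiring contract, hypothetical closes
    back = [105.0, 106.0, 107.0, 108.0]   # next contract on the same dates
    print(total_point_change(naive_splice(front, back, 2)))
    print(total_point_change(back_adjusted_splice(front, back, 2)))
```

A trader who held the front contract for one day (+1 point), rolled at that day's close, and then held the back contract (+2 points) made 3 points. The adjusted series reproduces that figure. The naive splice adds the 5-point gap between contracts as if it were profit.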

3. Survivorship bias: testing on the winners' history

Datasets curated today quietly exclude the delisted, the defaulted, and the discontinued. A strategy tested on current index constituents was, by construction, tested on companies that survived, which is foreknowledge no one had at the time. Futures traders inherit a cousin of this bias: strategies validated only on today's most liquid contracts ignore the products that lost their liquidity, and the regimes that killed them.
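The size of the distortion depends entirely on what was dropped, which is the problem: a survivors-only dataset cannot tell you. A minimal sketch with an invented four-name universe (symbols and returns are made up) shows the direction of the error.

```python
def average_return(universe):
    """Equal-weighted mean total return over (symbol, total_return, still_listed) rows."""
    return sum(total_return for _, total_return, _ in universe) / len(universe)


# Hypothetical universe as it stood at the START of the test period.
universe_at_start = [
    ("AAA", 0.40, True),
    ("BBB", 0.10, True),
    ("CCC", -1.00, False),  # went to zero and was delisted
    ("DDD", -0.60, False),  # delisted after a collapse
]

# What a dataset curated today would contain: only the names still listed.
survivors_only = [row for row in universe_at_start if row[2]]

if __name__ == "__main__":
    print("point-in-time universe:", average_return(universe_at_start))
    print("survivors only:        ", average_return(survivors_only))
```

The point-in-time universe is the one a trader could actually have bought. Filtering on `still_listed` uses information that only became available at the end of the period.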

4. Cost mismodeling: the friction discount

Backtests routinely assume fills at prices no counterparty offered: mid-price executions, limit orders filled the instant price touches them (ignoring queue position entirely), zero impact at any size, and constant spreads through calm and chaos alike. As we detail in our microstructure series, every one of these assumptions is systematically wrong in the optimistic direction — and for high-turnover strategies, realistic frictions alone are frequently the entire distance between a backtested edge and a live loss. A useful discipline: model costs pessimistically, then require the strategy to survive. Edges that only exist under generous fill assumptions are execution fantasies, not strategies.
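"Model costs pessimistically, then require the strategy to survive" can be run as a simple stress test. The sketch below charges a per-unit-of-turnover cost against a gross return. The function, its parameters, and every number in the example are hypothetical. A realistic model would also make spread and impact depend on order size and volatility.

```python
def net_return(gross_return, turnover, half_spread_bps, slippage_bps, commission_bps=0.0):
    """Gross return minus trading costs.

    turnover is traded notional as a multiple of capital over the same period as
    gross_return; all costs are in basis points of traded notional.
    """
    cost_per_unit = (half_spread_bps + slippage_bps + commission_bps) / 10_000
    return gross_return - turnover * cost_per_unit


if __name__ == "__main__":
    gross, turnover = 0.12, 200  # 12% gross, 200x capital traded per year (made up)
    optimistic = net_return(gross, turnover, half_spread_bps=0.5, slippage_bps=0.0)
    pessimistic = net_return(gross, turnover, half_spread_bps=2.0, slippage_bps=4.0,
                             commission_bps=1.0)
    print("optimistic: ", round(optimistic, 4))
    print("pessimistic:", round(pessimistic, 4))
```

At this turnover, each additional basis point of cost removes two percentage points of annual return, so the choice of fill assumptions decides whether the strategy shows a profit or a loss.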

5. Regime dependence: the honest limitation

Even a methodologically clean backtest establishes one thing only: the strategy would have worked over that sample. Markets are non-stationary — volatility regimes rotate, correlations break, market structure itself changes (decimalization, electronification, the growth of passive flow). Some edges also carry the seeds of their own decay: once discovered and crowded, they are arbitraged thin. A backtest is a necessary filter, never a sufficient one.

The discipline that survives contact with reality

The common thread across all five failure modes is the absence of an evidentiary standard. The corrective is to treat live execution data as the ground truth that audits the simulation: log every signal, every order, every fill with synchronized timestamps; reconcile live slippage against backtested assumptions monthly; and treat a widening gap between simulated and realized performance as an incident, not a mystery. This is a place where compliance-grade infrastructure quietly pays for itself — the same immutable execution records regulators expect are exactly the dataset that keeps a research process honest.
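The monthly reconciliation can start as something this small. The sketch below compares realized slippage from a fill log against the figure the backtest assumed. The record layout (`side`, `decision_price`, `fill_price`) and the function names are assumptions for illustration, not a reference to any particular logging system.

```python
from statistics import fmean


def realized_slippage_bps(side, decision_price, fill_price):
    """Signed slippage in bps; positive means the fill was worse than the decision price."""
    direction = 1 if side == "buy" else -1
    return direction * (fill_price - decision_price) / decision_price * 10_000


def slippage_gap_bps(fills, assumed_bps):
    """Mean realized slippage minus the backtest's assumed slippage, in bps."""
    realized = [
        realized_slippage_bps(f["side"], f["decision_price"], f["fill_price"])
        for f in fills
    ]
    return fmean(realized) - assumed_bps


if __name__ == "__main__":
    # Hypothetical fill log for one period.
    fills = [
        {"side": "buy", "decision_price": 100.00, "fill_price": 100.02},
        {"side": "sell", "decision_price": 50.00, "fill_price": 49.98},
    ]
    print(round(slippage_gap_bps(fills, assumed_bps=1.0), 2))
```

A persistently positive gap means the simulation is more generous than the market. Tracked over time, it is the number whose widening should be escalated as an incident.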

A backtest is a hypothesis about the past. Live, timestamped execution is the only experiment. Firms that confuse the two pay tuition; firms that instrument the difference compound.

References

Bailey, D., Borwein, J., López de Prado, M. & Zhu, Q. (2014). "Pseudo-Mathematics and Financial Charlatanism: The Effects of Backtest Overfitting on Out-of-Sample Performance." Notices of the AMS, 61(5).
Bailey, D. & López de Prado, M. (2014). "The Deflated Sharpe Ratio." Journal of Portfolio Management, 40(5).
López de Prado, M. (2018). Advances in Financial Machine Learning. Wiley.
Harvey, C., Liu, Y. & Zhu, H. (2016). "…and the Cross-Section of Expected Returns." Review of Financial Studies, 29(1).

This article is educational material and does not constitute investment advice. Trading derivatives involves substantial risk of loss.