Concept · The traps that fake an edge
Combining results from multiple independent backtests (or studies) into one pooled estimate, by weighting each result by its precision (inverse-variance weighting). The proper way to summarize per-venue or per-period results — but ONLY when those results are genuinely independent.
Meta-analysis means: combining several separate backtest results into a single, tighter estimate of a strategy's true edge — giving more weight to the more precise (larger-sample, lower-noise) results.
Say you ran one strategy separately on three independent slices of data. Three Sharpe ratios (Sharpe = mean per-trade return divided by its volatility) came out. If they're roughly consistent (their confidence intervals overlap around a similar value) AND the slices are truly independent, you can combine them into one tighter estimate.
Doing this naively (averaging the three Sharpes) is wrong — it treats them as if they had equal precision, which they don't. The dataset with more trades has a tighter SE (Standard Error) and deserves more weight. The dataset with fewer trades is noisier and deserves less.
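To make "more trades means a tighter SE" concrete, here is a minimal sketch. It assumes per-trade returns are roughly independent and identically distributed, and it uses the common large-sample approximation SE(S) ≈ sqrt((1 + S²/2) / n). The function name `sharpe_with_se` and the sample returns are illustrative only, not taken from any dossier.

```python
import math
import statistics

def sharpe_with_se(returns):
    """Per-trade Sharpe and an approximate standard error.

    Assumption: returns are i.i.d.; SE uses the large-sample
    approximation sqrt((1 + S^2 / 2) / n).
    """
    n = len(returns)
    mean = statistics.fmean(returns)
    vol = statistics.stdev(returns)  # sample standard deviation
    sharpe = mean / vol
    se = math.sqrt((1.0 + 0.5 * sharpe ** 2) / n)
    return sharpe, se

# Hypothetical per-trade returns, for illustration only.
small_sample = [0.010, -0.005, 0.020, 0.000, 0.015, -0.010]
large_sample = small_sample * 4  # same pattern, four times as many trades

s_small, se_small = sharpe_with_se(small_sample)
s_large, se_large = sharpe_with_se(large_sample)
print(f"n={len(small_sample)}: Sharpe={s_small:.3f}, SE={se_small:.3f}")
print(f"n={len(large_sample)}: Sharpe={s_large:.3f}, SE={se_large:.3f}")
```

With four times as many trades the standard error is roughly halved, which is why the larger dataset earns more weight.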
The standard method is inverse-variance weighting: each estimate is weighted by 1 / SE². Datasets with smaller standard error contribute more.
Given k independent estimates S_i with standard errors SE(S_i):
w_i = 1 / SE(S_i)²
S_pooled = Σ (S_i × w_i) / Σ w_i
SE(S_pooled) = sqrt(1 / Σ w_i)
The pooled estimate has a tighter standard error than any individual estimate, because Σ w_i is larger than every single w_i. That is the whole point of pooling.
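The three formulas above can be sketched in a few lines of Python. The Sharpe values and standard errors below are made-up numbers for illustration, and `pool_inverse_variance` is a hypothetical helper name, not part of any engine.

```python
import math

def pool_inverse_variance(estimates, std_errors):
    """Inverse-variance pooled estimate and its standard error.

    Only valid when the estimates are independent.
    """
    weights = [1.0 / se ** 2 for se in std_errors]      # w_i = 1 / SE_i^2
    total = sum(weights)
    pooled = sum(s * w for s, w in zip(estimates, weights)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se

# Hypothetical Sharpe estimates from three independent slices.
sharpes = [0.10, 0.12, 0.08]
ses = [0.05, 0.04, 0.10]

pooled, pooled_se = pool_inverse_variance(sharpes, ses)
naive = sum(sharpes) / len(sharpes)
print(f"naive average: {naive:.4f}")
print(f"pooled:        {pooled:.4f} (SE {pooled_se:.4f})")
```

With these inputs the pooled value lands near 0.109 with an SE near 0.030, while the naive average is 0.100. The noisy third slice (SE 0.10) pulls the naive average down far more than its precision justifies.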
Before pooling, formally check whether per-dataset estimates agree. The standard test is Cochran's Q:
Q = Σ w_i × (S_i − S_pooled)²
Compare Q against a χ² distribution with k − 1 degrees of freedom. If Q exceeds the critical value at your chosen significance level (for example, a p-value below 0.05), the estimates are heterogeneous: do not pool. If Q is in the expected range, pooling is justified.
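A minimal, standard-library-only sketch of the Q test follows. The helper names `chi2_sf` and `cochran_q` are hypothetical, the input numbers are invented for illustration, and the χ² tail probability is computed with the closed-form recursion for integer degrees of freedom rather than a statistics library.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for chi-square, integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)               # df = 2 base case
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # df = 1 base case
    while a < df / 2.0:                        # step df up by 2 each pass
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

def cochran_q(estimates, std_errors):
    """Return (Q, degrees of freedom, p-value) for the homogeneity test."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(s * w for s, w in zip(estimates, weights)) / total
    q = sum(w * (s - pooled) ** 2 for s, w in zip(estimates, weights))
    df = len(estimates) - 1
    return q, df, chi2_sf(q, df)

# Hypothetical inputs: a consistent set and a clearly inconsistent pair.
q_ok, df_ok, p_ok = cochran_q([0.10, 0.12, 0.08], [0.05, 0.04, 0.10])
q_bad, df_bad, p_bad = cochran_q([0.10, 0.40], [0.05, 0.05])
print(f"consistent:   Q={q_ok:.3f}, df={df_ok}, p={p_ok:.3f}")
print(f"inconsistent: Q={q_bad:.3f}, df={df_bad}, p={p_bad:.2g}")
```

One caveat worth keeping in mind: with only a handful of estimates the Q test has little power, so a non-significant Q is weak evidence of agreement rather than proof of it.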
Less formal: if all per-dataset CIs overlap a common point, you can pool. If any pair fails to overlap, you can't.
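The informal check can be sketched as below, assuming symmetric 95% intervals built as estimate ± 1.96 × SE (the 1.96 multiplier and the helper name `cis_share_common_point` are assumptions for illustration). For intervals on a line, "every pair overlaps" and "all share a common point" are the same condition, so one comparison covers both phrasings.

```python
def cis_share_common_point(estimates, std_errors, z=1.96):
    """True if all confidence intervals contain at least one common value."""
    lowers = [s - z * se for s, se in zip(estimates, std_errors)]
    uppers = [s + z * se for s, se in zip(estimates, std_errors)]
    # A common point exists exactly when the highest lower bound
    # does not exceed the lowest upper bound.
    return max(lowers) <= min(uppers)

# Hypothetical inputs, for illustration only.
consistent = cis_share_common_point([0.10, 0.12, 0.08], [0.05, 0.04, 0.10])
inconsistent = cis_share_common_point([0.10, 0.40], [0.05, 0.05])
print(consistent, inconsistent)
```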
Dossier #1 is the textbook case where pooling is tempting but wrong.
The same "21/50 cross, 1h, going long" rule was run on three symbols, and all three came out positive:
You might be tempted to pool these three — inverse-variance-weight them — to gain confidence in "the 21/50 long signal." Don't. Pooling is only valid for independent results, and these three break independence on every axis at once:
Inverse-variance pooling assumes the three errors are independent. Here they are heavily shared. Pooling them would shrink the standard error as if you had three independent looks at the edge, when you really have something much closer to one — it would overstate your confidence. The honest move is to report the three numbers and their confidence intervals separately, exactly as above.
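To see how much shared error matters, the sketch below compares the standard error that inverse-variance pooling reports with the standard error of the same weighted average when the errors are correlated. Everything numeric here is hypothetical: the equal SEs, the single common correlation `rho`, and the value 0.9 are assumptions chosen for illustration, not measurements from Dossier #1.

```python
import math

def pooled_se_with_correlation(std_errors, rho):
    """Return (naive_se, actual_se) for an inverse-variance weighted mean.

    naive_se assumes independence; actual_se assumes every pair of
    estimation errors has the same correlation rho (an assumption).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    naive_se = math.sqrt(1.0 / total)
    # Var(sum w_i S_i) = sum_i sum_j w_i w_j Cov(S_i, S_j)
    var_sum = 0.0
    for i, (wi, si) in enumerate(zip(weights, std_errors)):
        for j, (wj, sj) in enumerate(zip(weights, std_errors)):
            corr = 1.0 if i == j else rho
            var_sum += wi * wj * corr * si * sj
    actual_se = math.sqrt(var_sum) / total
    return naive_se, actual_se

ses = [0.05, 0.05, 0.05]   # hypothetical, equal for simplicity
naive_se, actual_se = pooled_se_with_correlation(ses, rho=0.9)
effective_looks = (ses[0] / actual_se) ** 2
print(f"SE reported by pooling: {naive_se:.4f}")
print(f"SE with rho=0.9:        {actual_se:.4f}")
print(f"effective independent looks: {effective_looks:.2f}")
```

Under these assumed numbers, three results behave like roughly 1.07 independent looks, and the pooled SE is barely tighter than a single estimate's. Pooling would report an SE around 40% smaller than that.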
For most analyses in this fleet, the simpler workflow is:
wiki/qa-sessions/2026-05-17-session.md#q7 (first asked here)