PerpForge

Concept · Is it a real edge, or luck?

Statistical Significance

A measurement is statistically significant when the observed pattern is unlikely to have occurred by random chance. In strategy evaluation, it means we have enough trades to trust that the observed metrics reflect a true underlying edge.

In plain English

If you flip a coin 5 times and get 4 heads, you cannot conclude the coin is biased: a fair coin produces 4 or more heads in 5 flips about 19% of the time. If you flip it 500 times and get 400 heads, you can conclude the coin is biased: that outcome is essentially impossible by chance.
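The coin-flip figures can be checked with an exact binomial sum. The sketch below uses only the standard library; the helper name `binom_tail` is illustrative and not part of any PerpForge API.

```python
from math import comb

def binom_tail(k: int, n: int, p: float = 0.5) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p).

    Fine for the sizes used here; very large n could overflow the
    int-to-float conversion and would need a log-space version.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

exactly_4_of_5 = comb(5, 4) * 0.5**5        # 5/32
at_least_4_of_5 = binom_tail(4, 5)          # 6/32, the "about 19%" figure
at_least_400_of_500 = binom_tail(400, 500)  # vanishingly small

print(f"exactly 4 of 5:      {exactly_4_of_5:.4f}")
print(f"at least 4 of 5:     {at_least_4_of_5:.4f}")
print(f"at least 400 of 500: {at_least_400_of_500:.3e}")
```

The "about 19%" figure is the probability of 4 or more heads; exactly 4 heads is a little rarer, at 5/32.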

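A trading strategy is the same. A strategy with 35 trades and a 60% win rate could plausibly be a strategy with a true win rate near 45% that got lucky: the 95% interval at N=35 is roughly ±16 percentage points. A strategy with 500 trades and a 60% win rate cannot plausibly be one; the interval shrinks to about ±4 percentage points, so luck of that size is effectively ruled out.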

Significance is about the measurement, not the strategy. The strategy might have real edge or not — significance just tells you whether the observed performance is good evidence either way.

Why it matters for this fleet

The 210-strategy fleet ranges from N=3 trades (a 50/200 daily variant) to N=10,574 (a 9/21 scalp on 1-minute candles). The metrics for the low-N strategies cannot be trusted as much as the metrics for high-N strategies, even when the headline numbers look better.

This is the single most important corrective to "PnL ranking" — high-PnL low-trade strategies are often statistical mirages.

The math (binomial proportion confidence interval)

For a measured win rate p̂ over N trades, the standard error is:

SE = sqrt( p̂ × (1 − p̂) / N )

The 95% confidence interval is roughly p̂ ± 1.96 × SE.

At N=30, the half-width of that CI is about ±17.5 percentage points (assuming p̂ ≈ 40%). That means a measured 40% win rate is consistent with a true WR anywhere from ~22% to ~58%.
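The standard-error arithmetic above can be reproduced in a few lines. This is a minimal sketch of the normal-approximation interval built from the SE formula; the function name `normal_approx_interval` is an assumption made for illustration.

```python
from math import sqrt

def normal_approx_interval(p_hat: float, n: int, z: float = 1.96):
    """Return (low, high, half_width) for p_hat ± z × SE, clipped to [0, 1]."""
    se = sqrt(p_hat * (1 - p_hat) / n)   # SE = sqrt(p̂ × (1 − p̂) / N)
    half_width = z * se
    low = max(0.0, p_hat - half_width)
    high = min(1.0, p_hat + half_width)
    return low, high, half_width

low, high, half_width = normal_approx_interval(0.40, 30)
print(f"half-width: ±{half_width * 100:.1f}pp")
print(f"interval:   [{low * 100:.1f}%, {high * 100:.1f}%]")
```

The approximation gets unreliable for very small N or win rates near 0% or 100%, which is one reason to prefer the Wilson interval in those cases.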

Examples from the live fleet

  • id478 (EMA 50/200 · BTC · 1d · 2× · long): just N=3 trades, observed win rate (the share of trades that close in profit) 66.7%. The Wilson confidence interval (the range the true win rate plausibly lives in) is ±36.5pp (percentage points) — interval [20.8%, 93.9%]. The true win rate could be anywhere from 20% to 94%. That spans "losing strategy" to "almost-certain win." Useless for any decision. The profit factor of 20.8 is just as meaningless.
  • id511 (EMA 21/50 · BTC · 1h · 2× · long): N=469 trades, observed win rate 24.9%. Wilson CI ±3.9pp — interval [21.2%, 29.1%]. The win rate is tightly pinned: a trustworthy estimate.

Same dataset, two completely different epistemic statuses. Note: a tightly-pinned win rate (id511 passes that) is not the same as a proven edge — id511's edge is still NOT-significant. Pinning the measurement and proving the advantage are two separate questions.
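The two intervals quoted above can be reproduced with the standard Wilson score formula. In this sketch the function name `wilson_interval` is illustrative. The win counts are assumptions: 2 of 3 for id478 follows from 66.7%, and 117 of 469 for id511 is inferred from the rounded 24.9%.

```python
from math import sqrt

def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = wins / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Win counts are inferred from the rounded win rates, not read from the API.
for label, wins, n in [("id478", 2, 3), ("id511", 117, 469)]:
    low, high = wilson_interval(wins, n)
    print(f"{label}: [{low * 100:.1f}%, {high * 100:.1f}%] "
          f"(half-width ±{(high - low) / 2 * 100:.1f}pp)")
```

The Wilson interval is centered on a value pulled toward 50%, not on the observed win rate, so the ± half-width is measured around that adjusted center. The shift is large at N=3 and negligible at N=469.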

Practical thresholds

Trade count    | What you can do with the metrics
N < 30         | Discard. Numbers are noise.
30 ≤ N < 100   | Hypothesize edge; validate elsewhere before trusting.
100 ≤ N < 300  | Meaningful, but wide intervals; watch for regime sensitivity.
N ≥ 300        | Reliable estimates; PF and Sharpe close to true values.
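The thresholds above translate directly into a small filter that can run before any PnL ranking. This is a sketch: the function name `sample_tier` and the tier labels are illustrative choices, not fields returned by the analytics API.

```python
def sample_tier(n_trades: int) -> str:
    """Map a trade count to the practical-threshold tier it falls into."""
    if n_trades < 0:
        raise ValueError("trade count cannot be negative")
    if n_trades < 30:
        return "discard"        # numbers are noise
    if n_trades < 100:
        return "hypothesize"    # validate elsewhere before trusting
    if n_trades < 300:
        return "meaningful"     # wide intervals, watch regime sensitivity
    return "reliable"           # estimates close to true values

# Trade counts quoted elsewhere on this page.
for n in (3, 469, 10_574):
    print(n, sample_tier(n))
```

Applying the tier first keeps a 3-trade strategy with a spectacular profit factor from ever reaching the top of a ranking.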

How to defend against low-N traps

  • Combine related strategies (e.g. all variants in a family) to pool the sample. Same-signal variants at different leverages share the exact same trades (id523 at 2× ≡ id659 at 1× — the identical 436 trades), so pooling across leverage adds no information; pooling across genuinely different conditions does.
  • Walk forward to gather more out-of-sample observations.
  • Run on more symbols if the signal is symbol-independent (the rule's edge does not depend on which symbol it's evaluated on). Spawning siblings on SOL/ETH/BTC (Phase 125: explicit multi-select at spawn time) roughly triples your trade count, though trades on correlated symbols add less independent evidence than the raw count suggests.
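To see how much each of these defenses buys, it helps to relate sample size to interval width. The sketch below uses the normal approximation and assumes independent trades, which correlated symbols or shared signals only partly satisfy. The function names are illustrative.

```python
from math import ceil, sqrt

def half_width(p_hat: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation CI at win rate p_hat and N trades."""
    return z * sqrt(p_hat * (1 - p_hat) / n)

def required_trades(p_hat: float, target_half_width: float, z: float = 1.96) -> int:
    """Smallest N whose CI half-width is at most target_half_width."""
    return ceil(z**2 * p_hat * (1 - p_hat) / target_half_width**2)

# Tripling the sample shrinks the interval by a factor of 1/sqrt(3), not 1/3.
shrink = half_width(0.40, 300) / half_width(0.40, 100)
print(f"half-width ratio after tripling N: {shrink:.3f}")

# Trades needed to pin a ~40% win rate to within ±5 percentage points.
print(required_trades(0.40, 0.05))
```

Because width scales with 1/sqrt(N), halving the interval takes four times the trades. That is why pooling across genuinely different conditions and walking forward usually need to be combined.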

Related

Sources

  • wiki/qa-sessions/2026-05-17-session.md#q2 (first asked here)
  • /api/analytics perSymbol and tradeCount fields
