Quantum error correction below the surface code threshold

by Rajeev Acharya, Laleh Aghababaie-Beni, Igor Aleiner, Trond I. Andersen, Markus Ansmann, et al.

Published: 5/8/2026

AI Rating: 4/5

DOI: 10.1038/s41586-024-08449-y

Quantum error correction provides a path to reach practical quantum computing by combining multiple physical qubits into a logical qubit, where the logical error rate is suppressed exponentially as more qubits are added. However, this exponential suppression only occurs if the physical error rate is below a critical threshold. In this work, we present two surface code memories operating below this threshold: a distance-7 code and a distance-5 code integrated with a real-time decoder. The logical error rate of our larger quantum memory is suppressed by a factor of Λ = 2.14 ± 0.02 when increasing the code distance by two, culminating in a 101-qubit distance-7 code with 0.143% ± 0.003% error per cycle of error correction. This logical memory is also beyond break-even, exceeding its best physical qubit's lifetime by a factor of 2.4 ± 0.3. We maintain below-threshold performance when decoding in real time, achieving an average decoder latency of 63 μs at distance-5 up to a million cycles, with a cycle time of 1.1 μs. To probe the limits of our error-correction performance, we run repetition codes up to distance-29 and find that logical performance is limited by rare correlated error events occurring approximately once every hour, or 3 × 10⁹ cycles. Our results present device performance that, if scaled, could realize the operational requirements of large scale fault-tolerant quantum algorithms.

Community Review

This is a community review of an externally published paper. The original authors retain all rights to their work. TOE-Share provides independent AI analysis — full content is available at the original source linked below.

Approved for Publication

Internal Consistency: 4/5

high confidence, spread 1, panel

The manuscript is internally coherent overall. Definitions remain stable across the main text and Supplement: p_L is the cumulative logical error probability after t cycles, ε_d is the inferred logical error per cycle, Λ is the suppression factor, and p_det is used as a proxy for physical error burden. The distinction between measured quantities and modeled/simulated quantities is usually maintained. Claims about below-threshold operation are supported by the directly reported monotonic decrease of ε_d from d=3 to d=7 and by fitted Λ > 2.

The main consistency issue is not contradiction but scope management. Eq. (1) is introduced as approximate, and later sections sometimes use its exponent or Λ interpretation in a stronger, near-universal way—for example, discussing expected O(p^{(d+1)/2}) suppression and projecting large-distance performance. Still, the paper generally acknowledges deviations from ideality (e.g. 'approximately 80% of the ideal value', excess correlations, finite-size effects, apparent error floor), which preserves internal coherence. I did not find a central definition changing meaning in a way that later derivations depend on improperly.

Mathematical Validity: 3/5

moderate confidence, spread 2, panel

The mathematics is generally plausible and dimensionally sensible, but many key quantitative steps are compressed and rely on modeling assumptions rather than fully worked derivations. Eq. (1) is a standard approximate scaling ansatz, but here it is not derived within the manuscript and is used as the basis for interpreting Λ and threshold behavior. The conversion between cumulative logical error p_L and per-cycle ε_d via ε_d = (1/2)[1 - (1 - 2p_L)^{1/t}] is mathematically correct for a parity-flip model with independent, identical per-cycle logical flips, yet the text does not justify that this model remains valid under the correlated rare-event regime that the paper itself emphasizes in later sections.

The Supplement contains useful derivations, especially for the lifetime metric: Eq. (7) from average fidelity, Eqs. (8)-(13) for effective decay rates, and the bound Γ_logical ≤ 8ε_d/(3 t_c). These steps are mostly sound, and dimensions check. However, the 'beyond break-even' number inherits unquantified model dependence because logical noise is approximated as a continuous Pauli channel and p_Y is not directly measured. Similarly, the error-budget formula Eq. (5) and larger-distance forecasts are reasonable linearization/extrapolation tools, but their derivations and uncertainty propagation are incomplete. Because several load-bearing quantitative interpretations depend on such compressed assumptions, the mathematical validity is adequate but not fully rigorous.

Falsifiability: 5/5

high confidence, spread 0, panel

This work is highly falsifiable because its central claims are operational and quantitative. The main claim—below-threshold surface-code behavior—is tested by measuring logical error per cycle across multiple code distances and extracting a suppression factor Λ > 1, specifically Λ = 2.14 ± 0.02 with one decoder and Λ = 2.04 ± 0.02 with another. A straightforward falsification criterion exists: if increasing code distance failed to lower the logical error rate, or if Λ were not robustly above 1 once uncertainties and decoder dependence are accounted for, the central claim would fail. Likewise, the beyond-break-even claim is falsifiable by direct comparison of logical lifetime to constituent physical-qubit lifetimes. The real-time decoding claim is also testable: constant latency scaling with experiment length and retained below-threshold behavior under the real-time decoder could have failed but did not.

A further strength is that the paper does not merely claim qualitative improvement; it gives concrete numbers for logical error rates, lifetimes, decoder latencies, and rare-event frequencies, and it identifies where current performance deviates from idealized scaling. Even its extrapolations are clearly separated from measured results. Because the principal claims are tied to present-day experimental observables and accompanied by explicit measurement procedures, this deserves the highest falsifiability score.

Clarity: 4/5

high confidence, spread 1, panel

The paper is generally clear, well organized, and readable for a graduate-level quantum information audience. The structure is strong: introduction, main experimental result, sensitivity analysis, repetition-code probing, real-time decoding, and outlook. New concepts are usually introduced before use, notation is mostly consistent, and the figures appear tightly linked to the text. The authors also do a good job of distinguishing different performance metrics such as p_L, ε_d, Λ, detection probability, lifetime, and decoder latency.

The main clarity limitations come from density and dependence on supplementary material for important methodological details. Several central inferences—uncertainty treatment, decoder training/fine-tuning, simulation assumptions, and error-budget construction—are only understandable in full after consulting the supplement. In addition, the paper uses multiple decoders and multiple notions of 'performance' across sections, which is scientifically appropriate but cognitively demanding. A reader can follow the high-level argument without difficulty, but some quantitative details require careful cross-referencing. That keeps the score at 4 rather than 5.

Novelty: 4/5

high confidence, spread 1, panel

The paper is novel primarily as a significant experimental and systems-level advance rather than because it introduces a fundamentally new theoretical mechanism. Surface codes, threshold behavior, MWPM-style decoding, leakage removal, and repetition-code diagnostics are all established ingredients. However, the combination achieved here is nontrivial and appears to advance the state of the art in a meaningful way: a distance-7 surface-code memory with measured below-threshold scaling, explicit beyond-break-even lifetime in a multi-qubit code, and maintenance of below-threshold operation under real-time decoding. That synthesis generates new empirical knowledge not previously available from prior demonstrations.

The repetition-code study of rare correlated events is also a valuable original contribution, because it probes a previously inaccessible logical-noise floor regime and identifies concrete failure modes at approximately hourly timescales. The work is well situated in prior literature and makes clear what is inherited versus what is newly demonstrated. I would not give a 5 because the core conceptual framework is not new; the novelty lies in execution, scale, integration, and quantitative regime reached, not in a new principle of quantum error correction.

Completeness: 4/5

high confidence, spread 1, panel

The paper is substantially complete for its stated aims. It defines the principal observables and performance metrics, explains the experimental setups on the 72-qubit and 105-qubit processors, reports logical error scaling across code distances, compares against physical-qubit lifetime, investigates sensitivity to leakage and drift, probes rare-event floors using repetition codes, and includes a real-time decoding demonstration. The supplement materially strengthens completeness by supplying decoder details, uncertainty analysis, fitting procedures, simulation assumptions, circuit descriptions, and additional datasets/figures.

The main limitations are not fatal but are real. Several crucial methodological details are pushed into the Supplementary Information rather than being self-contained in the main text: exact fitting protocols for ε_d and Λ, the lifetime metric used for the physical-vs-logical comparison, decoder-prior optimization details, and the simulation assumptions behind the error budget and large-distance extrapolations. The treatment of rare correlated bursts is observationally clear but causally incomplete—the authors explicitly acknowledge that the origin is not understood. Some claims in the outlook section are necessarily extrapolative rather than directly demonstrated. Still, the core experimental argument is well developed, variables are defined, uncertainty handling is described, and the paper does address its own goals. These are secondary rather than structural gaps, so a 4/5 is appropriate.

Publication criteria: All dimensions must score at least 2/5 with an overall average of 3/5 or higher. The AI recommendation badge above is advisory - publication is determined by the numerical scores.

This submission presents a landmark experimental achievement in quantum error correction: the first definitive demonstration of below-threshold surface code operation on superconducting hardware. The central claims (Λ = 2.14 ± 0.02, ε_7 = 0.143% ± 0.003%, 2.4× lifetime gain) are supported by statistical analysis with reported uncertainties. The work demonstrates internal consistency across multiple decoders and code distances, with all key variables (ε_d, Λ, p_L) defined clearly and used consistently. The falsifiability is exemplary: claims are quantitative and directly testable through independent replication. The novelty lies primarily in the integrated experimental achievement rather than new theory: the first distance-7 surface code with confirmed exponential suppression, a beyond-break-even multi-qubit memory, and maintenance of below-threshold performance under real-time decoding constraints.

However, the mathematical specialists flagged several compressed derivations where key steps are stated without full justification, particularly the scaling law of Eq. (1), the per-cycle error conversion formula, and the lifetime comparison methodology. While these do not undermine the core empirical results, they affect quantitative interpretations and extrapolations. The paper also identifies important limitations: rare correlated error bursts occurring roughly once per hour that set an apparent logical error floor near 10⁻¹⁰ per cycle, a ~20% deviation from ideal scaling attributed to 'excess correlations,' and a 20% gap between simulation and experiment in the error budget.

This review was generated by AI for research and educational purposes. It is not a substitute for formal peer review. All analyses are advisory; publication decisions are based on numerical score thresholds.

Key Equations (3)

\varepsilon_d \propto \left(\frac{p}{p_{\mathrm{thr}}}\right)^{(d+1)/2}

Approximate surface-code scaling: logical error per cycle ε_d scales as a power of the physical error rate p relative to threshold p_thr and code distance d; underlies exponential suppression below threshold.

\Lambda = \frac{\varepsilon_d}{\varepsilon_{d+2}} \approx \frac{p_{\mathrm{thr}}}{p}

Definition of the error-suppression factor Λ: reduction in logical error rate when increasing surface-code distance by two; used to quantify below-threshold operation.
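
The two relations above can be combined into a quick extrapolation check. The following is a minimal sketch, assuming Λ stays constant at every distance (the review notes that the paper's large-distance projections are extrapolative, and an error floor would break this assumption). The function names are illustrative and do not come from the paper; the inputs are the reported ε_7 and Λ.

```python
def project_error_per_cycle(eps_ref, d_ref, lam, d):
    """Extrapolate eps_d from a reference distance, assuming a constant
    suppression factor lam per distance step of two (an idealization)."""
    if (d - d_ref) % 2 != 0:
        raise ValueError("distances must differ by an even number")
    return eps_ref / lam ** ((d - d_ref) / 2)


def implied_lambda(eps_d, eps_d_plus_2):
    """Suppression factor from two measured per-cycle errors, Lambda = eps_d / eps_{d+2}."""
    return eps_d / eps_d_plus_2


EPS_7 = 1.43e-3   # reported logical error per cycle at distance 7
LAMBDA = 2.14     # reported suppression factor (neural network decoder)

if __name__ == "__main__":
    for d in (7, 9, 11, 15, 27):
        eps = project_error_per_cycle(EPS_7, 7, LAMBDA, d)
        print(f"d={d:2d}  projected eps_d ~ {eps:.3e}")
```

The printed values are projections under the constant-Λ assumption only, not measured results.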

\varepsilon_d = \frac{1}{2}\left[1 - (1 - 2 p_L)^{1/t}\right]

Relation between cumulative logical error probability p_L observed after t cycles and the inferred logical error per cycle ε_d (used to estimate ε_d from single-point or finite-length experiments).
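
To make the conversion concrete, here is a minimal sketch of the formula and its inverse. It assumes the parity-flip model with independent, identical logical flips per cycle, the same assumption the review flags as unverified in the correlated rare-event regime. The function names and the 250-cycle example are illustrative, not taken from the paper.

```python
def cumulative_from_per_cycle(eps, t):
    """Forward model: probability of an odd number of logical flips after
    t cycles, each flipping independently with probability eps."""
    return 0.5 * (1.0 - (1.0 - 2.0 * eps) ** t)


def per_cycle_from_cumulative(p_l, t):
    """Inverse of the forward model: eps_d = (1/2) * [1 - (1 - 2 p_L)^(1/t)]."""
    if not 0.0 <= p_l < 0.5:
        raise ValueError("p_L must lie in [0, 0.5) for the inversion to be defined")
    if t <= 0:
        raise ValueError("t must be a positive number of cycles")
    return 0.5 * (1.0 - (1.0 - 2.0 * p_l) ** (1.0 / t))


if __name__ == "__main__":
    eps_7 = 1.43e-3      # reported distance-7 error per cycle
    cycles = 250         # illustrative experiment length (an assumption)
    p_l = cumulative_from_per_cycle(eps_7, cycles)
    print(f"p_L after {cycles} cycles: {p_l:.4f}")
    print(f"recovered eps_d: {per_cycle_from_cumulative(p_l, cycles):.6e}")
```

Because p_L saturates at 1/2 for long experiments, the inversion becomes ill-conditioned as p_L approaches 0.5, which is one reason fits over several experiment lengths are typically preferred to a single point.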

Other Equations (3)

T_{\mathrm{react}} = T_{\mathrm{decode}} + T_{\mathrm{control}}, \qquad T_{\mathrm{decode}} = T_{\mathrm{input}} + T_{\mathrm{software}} + T_{\mathrm{output}}

Decomposition of reaction time relevant for feed-forward logical operations: reaction time comprises decoding latency and control-system latency; decoding latency further decomposes into I/O and software latencies.

K_{ij} = \sqrt{P_{j\to i}}\,|i\rangle\langle j|, \qquad K_0 = \sqrt{I - \sum_{i,j} K_{ij}^\dagger K_{ij}}

Phenomenological Kraus operators used to model imperfect data-qubit leakage removal (DQLR) reset channels; P_{j\to i} are transition probabilities between basis states.
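
As a sanity check on this construction, the sketch below builds the Kraus operators for a three-level system and verifies the completeness relation Σ K†K = I. The transition probabilities are hypothetical placeholders chosen for illustration, not values from the paper, and the sum is assumed to run over i ≠ j.

```python
import math

N_LEVELS = 3
# Hypothetical transition probabilities P[(j, i)] for j -> i (not from the paper).
P = {(2, 1): 0.90, (2, 0): 0.05, (1, 0): 0.001}


def zeros(n):
    return [[0.0] * n for _ in range(n)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)] for r in range(n)]


def transpose(a):
    # All operators here are real, so the adjoint is the transpose.
    return [list(row) for row in zip(*a)]


def kraus_operators(p, n):
    """K_ij = sqrt(P_{j->i}) |i><j| plus the diagonal no-jump operator K_0."""
    ops = []
    stay = [1.0] * n  # probability that level j undergoes no transition
    for (j, i), prob in p.items():
        k = zeros(n)
        k[i][j] = math.sqrt(prob)
        ops.append(k)
        stay[j] -= prob
    k0 = zeros(n)
    for j in range(n):
        k0[j][j] = math.sqrt(max(stay[j], 0.0))
    ops.append(k0)
    return ops


def completeness(ops, n):
    """Return sum_k K^dagger K, which should equal the identity."""
    total = zeros(n)
    for k in ops:
        prod = matmul(transpose(k), k)
        for r in range(n):
            for c in range(n):
                total[r][c] += prod[r][c]
    return total


def apply_channel(ops, rho):
    """rho -> sum_k K rho K^dagger."""
    n = len(rho)
    out = zeros(n)
    for k in ops:
        term = matmul(matmul(k, rho), transpose(k))
        for r in range(n):
            for c in range(n):
                out[r][c] += term[r][c]
    return out


if __name__ == "__main__":
    ops = kraus_operators(P, N_LEVELS)
    print(completeness(ops, N_LEVELS))
    leaked = zeros(N_LEVELS)
    leaked[2][2] = 1.0  # start fully in the leaked state |2>
    print(apply_channel(ops, leaked))
```

With these placeholder numbers, a qubit starting in |2⟩ keeps a residual leaked population equal to one minus the total probability of leaving |2⟩.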

\Gamma_{\mathrm{logical}} \le \frac{8\,\varepsilon_d}{3\,t_c}

Upper bound on an effective logical depolarization rate Γ_logical in terms of logical error per cycle ε_d and the cycle duration t_c; used to compare logical lifetime to physical qubit lifetime.
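
Plugging the reported numbers into this bound gives a rough consistency check against the quoted lifetimes. The sketch below assumes the 1.1 μs cycle time applies to the distance-7 memory and takes the lifetime to be 1/Γ_logical; both are assumptions about conventions that the review does not spell out.

```python
def logical_decay_rate_bound(eps_d, t_cycle_s):
    """Upper bound Gamma_logical <= 8 * eps_d / (3 * t_c), in 1/s."""
    return 8.0 * eps_d / (3.0 * t_cycle_s)


EPS_7 = 1.43e-3            # reported logical error per cycle at distance 7
T_CYCLE_S = 1.1e-6         # reported cycle time (assumed to apply at distance 7)
BEST_PHYSICAL_US = 119.0   # reported best physical-qubit lifetime, in microseconds

gamma_max = logical_decay_rate_bound(EPS_7, T_CYCLE_S)
lifetime_min_us = 1e6 / gamma_max            # assumes lifetime = 1 / Gamma
gain_min = lifetime_min_us / BEST_PHYSICAL_US

if __name__ == "__main__":
    print(f"Gamma_logical <= {gamma_max:.1f} 1/s")
    print(f"logical lifetime >= {lifetime_min_us:.1f} us")
    print(f"gain over best physical qubit >= {gain_min:.2f}")
```

Under these assumptions the bound gives a logical lifetime of about 288 μs and a gain of about 2.42, in line with the quoted 291 ± 6 μs and 2.4 ± 0.3.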

Testable Predictions (5)

Surface-code logical error rates are exponentially suppressed with code distance below threshold with an observed suppression factor Λ = 2.14 ± 0.02 (neural network decoder).

quantum, pending

Falsifiable if: Repeated, controlled measurements across code distances (3,5,7 and larger) using comparable decoders and calibrated priors consistently yield Λ values ≤ 1.1 or no statistically significant exponential decrease of ε_d with increasing d within experimental uncertainties.

A 101-qubit distance-7 surface-code logical memory attains logical error per cycle ε_7 = (1.43 ± 0.03)×10⁻³ (0.143% ± 0.003%).

quantum, pending

Falsifiable if: Independent repetition of the same distance-7 memory experiment with equivalent hardware and decoding yields ε_7 outside the stated uncertainty bounds (e.g., ε_7 > 0.2%) after accounting for statistical and calibration uncertainties.

The distance-7 logical qubit surpasses its best constituent physical qubit lifetime by a factor G = 2.4 ± 0.3 (logical lifetime ≈ 291 ± 6 μs vs best physical ≈ 119 ± 13 μs).

quantum, pending

Falsifiable if: Direct lifetime comparison using the same fidelity metric and averaging protocol shows the distance-7 logical lifetime is ≤ the best physical qubit lifetime (G ≤ 1) within combined uncertainties.

Real-time decoding maintains below-threshold surface-code performance at distance-5 with an average decoder latency of 63 ± 17 μs over up to 10^6 cycles and ε_5 ≈ 0.35% ± 0.01% for the real-time decoder.

quantum, pending

Falsifiable if: Running the distance-5 memory with the same real-time hardware/software stack over long experiments yields a sustained average latency substantially larger than reported (e.g., > 150 μs) or logical error per cycle significantly larger than reported after accounting for statistical uncertainty and decoder priors.

Ultra-low logical error floors observed in repetition-code runs (up to d=29) are limited by rare correlated spatially localized error bursts occurring about once per hour (~3×10^6 shots or ~3×10^9 cycles), producing an apparent logical-error-per-cycle floor near 10^-10.

quantum, pending

Falsifiable if: Extensive repetition-code datasets of comparable or larger size fail to observe such rare correlated bursts at the stated frequency and timescales, or observe a different dominant error mechanism (e.g., much more frequent high-energy impacts) that fully accounts for the logical-error floor.
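
The rare-event figures in the last prediction can be cross-checked with simple arithmetic. This sketch assumes the 1.1 μs cycle time also applies to the repetition-code runs and ignores any per-shot overhead such as reset and readout, so it only confirms that the quoted numbers agree to order of magnitude.

```python
CYCLE_TIME_S = 1.1e-6          # reported cycle time (assumed for repetition codes too)
CYCLES_BETWEEN_EVENTS = 3e9    # reported cycles between correlated bursts
SHOTS_BETWEEN_EVENTS = 3e6     # reported shots between correlated bursts

# Time spent in error-correction cycles between bursts.
minutes_between_events = CYCLES_BETWEEN_EVENTS * CYCLE_TIME_S / 60.0

# Cycles per shot implied by the two reported counts.
cycles_per_shot = CYCLES_BETWEEN_EVENTS / SHOTS_BETWEEN_EVENTS

# Per-cycle floor IF every burst caused exactly one logical error
# (a simplifying assumption made here, not a claim from the paper).
floor_if_every_burst_fails = 1.0 / CYCLES_BETWEEN_EVENTS

if __name__ == "__main__":
    print(f"time between bursts: {minutes_between_events:.0f} min")
    print(f"implied cycles per shot: {cycles_per_shot:.0f}")
    print(f"per-cycle floor under the one-error-per-burst assumption: "
          f"{floor_if_every_burst_fails:.1e}")
```

The result is about 55 minutes of cycle time between bursts and 1,000 cycles per shot, with a per-cycle floor of a few times 10⁻¹⁰ under the stated simplification, matching "about once per hour" and a floor "near 10^-10".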

Tags & Keywords

data-qubit leakage removal (DQLR) (methodology); fault-tolerant quantum computing (domain); minimum-weight perfect matching (methodology); neural network decoder (methodology); real-time decoder (methodology); surface code (physics); transmon qubits (domain)

Keywords: surface code, quantum error correction, logical error suppression factor Λ, real-time decoding, transmon superconducting qubits, data-qubit leakage removal (DQLR), minimum-weight perfect matching, repetition codes

Full content is available at the original source:

arxiv.org/abs/2408.13687
