We investigate computational methods for identifying elliptic curves with anomalously large Tate-Shafarevich groups ($|Ш| ≫ 1$) among rank-0 curves over $ℚ$. After documenting and correcting circular reasoning in AI-assisted analysis, we find that the BSD geometric factor $α_{BSD}(E) = Ω_E^+ · ∏_p c_p(E) / |E(ℚ)_{tors}|^2$ achieves 99.5% precision at 98.4% recall for detecting $|Ш| > 1$ curves. We additionally report a power-law tail distribution for $|Ш|$ across 1.9 million curves with exponent $α̂ = 2.02 ± 0.07$, placing the distribution at the convergence threshold for $𝔼[|Ш|]$.
The paper maintains strong internal consistency throughout most sections. The BSD formula application is coherent, the circular reasoning analysis is self-consistent, and the corrected methodology follows logically. However, there are minor inconsistencies: the paper sometimes blurs verification scope statements (claiming verification 'for all curves in our dataset' while later qualifying this to conductor ≤ 500,000), and the characterization of L(E,1) as 'moderate' in §4.4 creates tension with the extreme example (L(E,1) = 306.6) cited later in §5.4.
Mathematical Validity: 4/5
The core mathematical content is sound. The BSD formula is correctly applied, the circularity diagnosis is mathematically accurate (showing S ≈ 1/(|Ш| · log N)), and the α_BSD inversion mechanism is valid. However, several issues affect validity: normalization conventions are asserted but not fully justified beyond empirical agreement checks that depend on BSD-derived |Ш|; the power-law fitting uses OLS on log-counts with only 38 points without proper statistical model validation; and the claim about L(E,1) being 'moderate' lacks rigorous bounds given examples like L(E,1) = 306.6. The statistical methodology for tail exponent estimation needs stronger foundation.
Falsifiability: 4/5
The work makes clear, testable predictions with specific performance metrics (α_BSD < 0.2 achieves 99.5% precision at 98.4% recall) and includes confidence intervals. The power-law exponent claim (α̂ = 2.02 ± 0.07) is directly testable on other datasets. The authors demonstrate commitment to falsifiability by explicitly documenting how their original method failed when independently verified. The score is 4 rather than 5 because some observational claims like the 'ragged frontier' are less predictive than the main screening results.
Clarity: 5/5
This is exceptionally well-written scientific communication. The paper clearly distinguishes between failed approaches and valid results, provides detailed explanations of all notation and conventions, and maintains consistent terminology throughout. The forensic reconstruction of circular reasoning is particularly clear and educational. Statistical results include proper confidence intervals and effect sizes. The abstract accurately summarizes findings without overselling, and the methodological transparency sets a high standard for computational mathematics research.
Novelty: 2/5
The authors are refreshingly honest that α_BSD screening 'is not a new theoretical insight' but rather the BSD formula used computationally. The main mathematical result is BSD formula inversion, which is straightforward algebra rather than a theoretical advance. The empirical observations (power-law distribution, ragged frontier) are new data patterns but not fundamental discoveries. The most valuable contribution is methodological - documenting AI-assisted research pitfalls - but this concerns research methodology rather than mathematical content. The work provides practical value through careful empirical analysis but limited theoretical novelty.
Completeness: 3/5
The paper is largely complete in its internal logic and mathematical exposition, with variables defined and methodology explained. However, several key gaps affect completeness: the 1.9M-curve dataset construction is under-specified (exact LMFDB queries, deduplication methods, completeness criteria); the detection performance evaluation uses a highly non-representative test set (998 vs 200) with unclear sampling procedures that may not generalize; and the tail-law statistical methodology relies on OLS fitting with only 38 points without adequate robustness analysis. The power-law claim would benefit from discrete probability modeling and goodness-of-fit testing.
Evidence Strength: 3/5
The evidence has significant strengths in its transparency and internal validation, but key limitations affect strength. Strengths include the thorough documentation of methodological failure, independent L-function computation via PARI/GP, and careful statistical reporting with confidence intervals. However, the evidence is weakened by: heavy reliance on a single database source without independent verification; detection performance evaluated on a deliberately biased test set that may not reflect population performance; and tail-law claims based on limited statistical methodology (OLS on 38 points) without adequate robustness checks or alternative model comparisons. The circular reasoning documentation is exemplary, but the positive findings need stronger empirical support.
Publication criteria: All dimensions must score at least 2/5 with an overall average of 3/5 or higher. The AI recommendation badge above is advisory - publication is determined by the numerical scores.
This paper presents a valuable case study in computational mathematics research methodology, demonstrating both the pitfalls and potential of AI-assisted analysis. The work's greatest contribution may be its transparent documentation of how circular reasoning can be introduced and perpetuated in AI-assisted research, providing practical lessons for the growing community of researchers using these tools.
Mathematically, the paper is fundamentally sound but offers limited theoretical novelty. The α_BSD screening method is acknowledged by the authors as simply the BSD formula read in reverse - not a new insight, but a practical computational tool. The empirical findings (power-law distribution with exponent near 2, non-monotonic conductor dependence) are interesting observations that warrant further investigation, though the statistical methodology for the tail-law analysis could be strengthened.
The paper's exceptional clarity and methodological transparency partially compensate for its limited novelty. The forensic reconstruction of the circular reasoning is educational and valuable for the community. However, several empirical claims rest on methodological choices that are not fully justified - particularly the power-law fitting procedure and the generalizability of detection performance from a biased test set to the broader population.
Overall, this represents solid computational mathematics with exemplary research integrity, but the theoretical contributions are modest and some empirical claims need stronger statistical foundations.
Strengths
Exceptional methodological transparency and documentation of research failure modes
Clear mathematical exposition with proper handling of BSD formula conventions
Practical computational tool (α_BSD screening) with well-characterized performance metrics
Valuable case study for AI-assisted research methodology and pitfall avoidance
Honest assessment of novelty limitations and appropriate scope of claims
Areas for Improvement
Strengthen statistical methodology for power-law tail analysis with discrete probability models and goodness-of-fit testing
Provide more complete specification of the 1.9M-curve dataset construction and validation procedures
Address generalizability concerns for detection performance by evaluating on representative test sets
Justify normalization conventions more rigorously beyond empirical agreement checks
Develop stronger theoretical connections between observed power-law exponent and existing heuristics
This review was generated by AI for research and educational purposes. It is not a substitute for formal peer review. All analyses are advisory; publication decisions are based on numerical score thresholds.
Key Equations (3)
$$L(E,1) = \frac{\Omega_E^+ \cdot \prod_p c_p(E) \cdot |\text{Ш}(E/\mathbb{Q})|}{|E(\mathbb{Q})_{tors}|^2}$$
The Birch-Swinnerton-Dyer leading-term formula for rank-0 elliptic curves
$$\alpha_{BSD}(E) = \frac{\Omega_E^+ \cdot \prod_p c_p(E)}{|E(\mathbb{Q})_{tors}|^2}$$
The BSD geometric factor used for screening large Tate-Shafarevich groups
$$|\text{Ш}| = \frac{L(E,1)}{\alpha_{BSD}}$$
BSD formula inverted to express Tate-Shafarevich group order
Other Equations (2)
$$S_{real}(E) = \frac{|L'(E,1)|}{|L(E,1)| \cdot \log N_E}$$
The normalized logarithmic derivative metric that shows no dependence on |Ш|
$$\log_{10} \text{count}(|\text{Ш}| = k) = -\hat{\alpha} \cdot \log_{10} k + \hat{\beta}$$
Power law fit for the tail distribution of Tate-Shafarevich group orders
Testable Predictions (3)
The BSD geometric factor α_BSD < 0.2 identifies curves with |Ш| > 1 at 99.5% precision and 98.4% recall
math · pending
Falsifiable if: Testing on independent elliptic curve datasets with verified Tate-Shafarevich group orders shows significantly different precision/recall rates
The frequency distribution of |Ш| follows a power law with exponent α ≈ 2.02 ± 0.07 for rank-0 elliptic curves
math · pending
Falsifiable if: Analysis of larger elliptic curve databases or different conductor ranges shows a significantly different power law exponent outside the confidence interval
The normalized L-derivative |L'(E,1)|/|L(E,1)| shows tight concentration (IQR = 0.018) independent of |Ш| for rank-0 curves
math · pending
Falsifiable if: Independent L-function computations on different elliptic curve samples show significant dependence of the normalized derivative on Tate-Shafarevich group size
Tags & Keywords
AI methodology validation (methodology), algebraic geometry (math), Birch-Swinnerton-Dyer conjecture (math), computational number theory (math), elliptic curves over rationals (math), L-function computation (methodology), power law distributions (math), statistical screening (methodology)
Keywords: elliptic curves, Tate-Shafarevich groups, BSD conjecture, computational number theory, L-functions, power law distribution, geometric invariants, AI-assisted research methodology, Tamagawa numbers, statistical detection methods
Detecting Large Tate-Shafarevich Groups via BSD Geometric Invariants: Lessons from a Computational Audit of 1.9 Million Elliptic Curves
Author: Adam Murphy
Date: February 27, 2026
Version: 3.2.1
Created: 2026-02-27 PST
Last Modified: 2026-02-27 14:43 PST
Abstract
We investigate computational methods for identifying elliptic curves with anomalously large Tate-Shafarevich groups (∣Ш∣≫1) among rank-0 curves over Q. An initial stability metric based on L-function Taylor coefficients achieved apparent perfect separation but was found to embed ∣Ш∣ in its own denominator — a circularity we document as a case study in AI-assisted research pitfalls. After correcting with independently computed L-series derivatives (via PARI/GP, n=1,198 curves), we find that the normalized logarithmic derivative ∣L′(E,1)∣/(∣L(E,1)∣⋅logNE) is tightly concentrated (median 0.285, IQR 0.018) with no dependence on ∣Ш∣, eliminating the original detection mechanism.
A genuine screening signal survives in the BSD geometric factor
$$\alpha_{BSD}(E) = \frac{\Omega_E^+ \cdot \prod_p c_p(E)}{|E(\mathbb{Q})_{tors}|^2}$$
computed from the real period, Tamagawa numbers, and torsion order of the minimal Weierstrass model — without knowledge of ∣Ш∣ or L(E,1). On our test set, αBSD<0.2 achieves 99.5% precision (95% CI: 98.9%–99.8%) at 98.4% recall (95% CI: 97.5%–99.0%) for identifying ∣Ш∣>1 curves. The mechanism is the BSD formula read in reverse: since L(E,1) remains moderate for rank-0 curves (median 2.2–3.5), large ∣Ш∣ forces small αBSD.
We additionally report a power-law tail distribution for ∣Ш∣ across 1,900,161 curves (OLS exponent $\hat{α} = 2.02 ± 0.07$, $R^2 = 0.955$, on $\log_{10}$-$\log_{10}$ scale for ∣Ш∣≥4, n=38 unique values). The exponent $\hat{α} ≈ 2$ places the distribution at the threshold where E[∣Ш∣] transitions from divergent to convergent, consistent with Delaunay's heuristics [7] for rank-0 curves. We document non-monotonic growth of maximal ∣Ш∣ with conductor ("the ragged frontier") and discuss implications for AI-assisted mathematical research.
Keywords: Elliptic curves, Tate-Shafarevich group, BSD conjecture, computational number theory, AI-assisted research methodology
1. Introduction
1.1 The Problem
The Tate-Shafarevich group Ш(E/Q) is among the most important and least accessible invariants of an elliptic curve. Computing ∣Ш∣ typically requires sophisticated descent techniques or high-precision L-function evaluation, with computational cost growing rapidly with the conductor $N_E$. For a rank-0 curve E/Q, the Birch and Swinnerton-Dyer (BSD) formula, verified numerically for all curves in Cremona's tables with conductor ≤500,000 (see §2.1), gives:
$$L(E,1) = \frac{\Omega_E^+ \cdot \prod_p c_p(E) \cdot |\text{Ш}(E/\mathbb{Q})|}{|E(\mathbb{Q})_{tors}|^2}$$
A natural question arises: can we detect curves with large ∣Ш∣ without computing it directly? Such a screening tool would allow researchers to identify candidates for further study before investing in expensive ∣Ш∣ verification.
One might hope that the L-function's analytic behavior near s=1 encodes information about ∣Ш∣ beyond what the BSD formula dictates at s=1 exactly. Specifically, if L′(E,1) or higher derivatives carried a signal correlated with ∣Ш∣ independent of L(E,1) itself, this would provide a genuinely new detection mechanism. We test this hypothesis and find it does not hold (§4.2).
1.2 Summary of Contributions
This paper presents three empirical findings alongside a detailed methodological failure analysis:
αBSD screening. The BSD geometric factor $α_{BSD} = Ω_E^+ · ∏_p c_p / |E_{tors}|^2$ achieves 99.5% precision at 98.4% recall for detecting ∣Ш∣>1 curves (§4.3–4.5). The mechanism is the BSD formula itself: since $|Ш| = L(E,1)/α_{BSD}$ and L(E,1) remains moderate, large ∣Ш∣ forces small αBSD. This is not a new theoretical insight but a practical screening tool.
Tail distribution. Across 1,900,161 curves with conductors up to ∼$10^6$, the frequency of ∣Ш∣ values follows a power law with OLS exponent $\hat{α} = 2.02 ± 0.07$ ($R^2 = 0.955$). This places the distribution at the convergence threshold for E[∣Ш∣] (§5.1).
The ragged frontier. The maximum ∣Ш∣ observed does not grow monotonically with conductor, peaking in the mid-range (conductor ∼165,000) rather than at the database boundary (§5.3).
Methodological audit. We document in detail how a circular metric — embedding ∣Ш∣ in its own detector — produced artificially perfect results that survived multiple AI-assisted review sessions. The forensic reconstruction (§3) and resulting verification protocols (§7.4) are offered as a case study for the growing community of AI-assisted researchers.
1.3 A Note on AI-Assisted Research
This work was conducted with substantial AI assistance (code generation, analysis, drafting). The circular reasoning documented in §3 was both introduced and reinforced by AI tools, which confidently validated tautological results across multiple sessions. The circularity was also eventually caught through AI-assisted auditing, but only after a systematic protocol was imposed requiring line-by-line formula verification.
As AI tools become standard in computational research, the failure mode described here — where circular dependencies are hidden inside generated code and validated by confident-sounding AI analysis — represents a systemic risk. We offer practical recommendations in §7.4.
2. Background and Conventions
2.1 The BSD Formula: Precise Statement
Let E/Q be an elliptic curve given in minimal Weierstrass form. We use the following conventions throughout:
$Ω_E^+$: the real period. We use Cremona's convention [1]: $Ω_E^+$ is the integral $\int_{E(ℝ)} |ω_E|$ of the Néron differential $ω_E = dx/(2y + a_1 x + a_3)$ over all real components of E(ℝ). When E(ℝ) has two connected components, this includes both; equivalently, $Ω_E^+ = n_ℝ · Ω_{ident}^+$ where $n_ℝ ∈ \{1,2\}$ is the number of real components and $Ω_{ident}^+$ is the integral over the identity component alone. All Ω values in our dataset are taken directly from Cremona's tables and use this convention consistently.
$c_p(E)$: the Tamagawa number at prime p, defined as $c_p = [E(ℚ_p) : E_0(ℚ_p)]$ where $E_0(ℚ_p)$ is the subgroup of points with non-singular reduction. These differ from 1 only at primes of bad reduction and are computable from Tate's algorithm.
∣E(Q)tors∣: the order of the rational torsion subgroup. By Mazur's theorem, this is bounded by 16 and is computable in polynomial time.
∣Ш(E/Q)∣: the order of the Tate-Shafarevich group. The Cassels-Tate pairing shows that ∣Ш∣, if finite, is a perfect square. For rank-0 curves in our range, finiteness follows from Kolyvagin's theorem [5], and all tabulated values in Cremona's tables are indeed perfect squares.
L(E,s): the (uncompleted) Hasse-Weil L-function, defined as an Euler product over primes and continued analytically to ℂ via modularity [4]. We write L(E,1) for its value at s=1 and $L'(E,1) = \frac{d}{ds} L(E,s)\big|_{s=1}$ for its first derivative.
Manin constant: For the optimal curve in each isogeny class, the Manin constant is conjectured to be 1 and has been verified numerically for all curves in Cremona's tables (conductor ≤500,000) [1]. All curves in our PARI/GP test set have conductor ≤129,850, well within this verified range.
For E/Q of analytic rank 0 (i.e., L(E,1)≠0), Kolyvagin [5] proved that E(Q) and Ш(E/Q) are both finite. Combined with modularity (Wiles et al. [4]) and work of Gross-Zagier, these results make the BSD leading-term formula a strong theoretical expectation. Cremona's database verifies it numerically for all curves with conductor ≤500,000. The formula is:
$$L(E,1) = \frac{\Omega_E^+ \cdot \prod_p c_p(E) \cdot |\text{Ш}(E/\mathbb{Q})|}{|E(\mathbb{Q})_{tors}|^2} \tag{1}$$
On the Manin constant. The general BSD formula includes a factor $c_E^2$, where $c_E$ is the Manin constant of the optimal parametrization $X_0(N) → E$. For the optimal curve in each isogeny class, $c_E$ is conjectured to equal 1 (Manin's conjecture) and has been verified numerically for all curves in Cremona's tables with conductor ≤500,000 [1]. For non-optimal curves, the appropriate period adjustment is absorbed into Cremona's tabulated $Ω_E^+$ values. Consequently, $c_E^2 = 1$ throughout our dataset, and we write equation (1) with this substitution already made. All subsequent equations (2), (6) inherit $c_E = 1$ from this convention. There is no regulator factor since rank(E/Q)=0.
Scope of verification. Equation (1) is the BSD conjecture's leading-term formula. It remains a conjecture in full generality — what has been established is: (i) Kolyvagin's proof that rank-0 curves have finite E(Q) and finite Ш, (ii) modularity (Wiles et al.), and (iii) Cremona's high-precision numerical verification that (1) holds exactly (to the computed precision) for every curve with conductor ≤500,000 in his tables, with Manin constant cE=1 confirmed throughout. Our claim is not that BSD is proven — it is that (1) holds as a verified numerical identity for all curves in our dataset.
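We define the BSD geometric factor
$$\alpha_{BSD}(E) = \frac{\Omega_E^+ \cdot \prod_p c_p(E)}{|E(\mathbb{Q})_{tors}|^2} \tag{2}$$
so that equation (1) becomes $L(E,1) = α_{BSD}(E) · |Ш(E/ℚ)|$.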
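As a minimal sketch of how the geometric factor and the rewritten equation (1) fit together, the arithmetic can be written as follows. The function names and the numerical invariants are hypothetical, chosen only to exercise the formula; they are not taken from the dataset.

```python
from math import prod


def alpha_bsd(real_period, tamagawa_numbers, torsion_order):
    """BSD geometric factor: Omega_E^+ * prod(c_p) / |E(Q)_tors|^2."""
    return real_period * prod(tamagawa_numbers) / torsion_order ** 2


def analytic_sha(l_value, alpha):
    """Invert L(E,1) = alpha_BSD * |Sha| (rank-0 case, Manin constant 1)."""
    return l_value / alpha


# Hypothetical invariants (illustration only, not a real curve).
alpha = alpha_bsd(real_period=0.3, tamagawa_numbers=[2, 1], torsion_order=2)
sha = analytic_sha(l_value=0.6, alpha=alpha)
print(alpha, sha)
```

In practice the quotient should be rounded to the nearest perfect square only after checking that the numerical precision of L(E,1) justifies it.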
2.2 Prior Work on Large ∣Ш∣
Large Tate-Shafarevich groups have been studied computationally through Cremona's tables [1] and the LMFDB database [2]. The largest known ∣Ш∣ values for rank-0 curves with conductor below $10^6$ include $|Ш| = 5625 = 75^2$ (Cremona label 165066.v1) and $|Ш| = 2500 = 50^2$ (287175.n1). These values are verified via the BSD formula (1) using high-precision computation of L(E,1).
The Cohen-Lenstra heuristics [6] and their extensions (Delaunay [7], Bhargava-Kane-Lenstra-Poonen-Rains [8]) make predictions about the distribution of ∣Ш∣ in families of elliptic curves. Our tail-law observation (§5.1) provides empirical data for comparison with these predictions.
2.3 Data Sources and Selection Criteria
Full survey dataset (§5). We use a combined Cremona/LMFDB extract covering 1,900,161 curves (1,238,224 rank-0) with conductors up to ∼$10^6$. For each curve, the dataset provides: conductor N, rank, ∣Ш∣ (analytic, via BSD), and $|E_{tors}|$. A subset with full BSD invariants ($Ω_E^+$, $∏ c_p$) is available for 847,550 curves from Cremona's allbsd tables. The tail law (§5.1) uses the full 1.9M dataset; the ragged frontier (§5.3) uses the subset with band-by-band conductor coverage.
PARI/GP test set (§4). To test detection performance with independently computed L-derivatives, we selected 1,198 rank-0 curves: 998 curves with ∣Ш∣>1 (all available curves with ∣Ш∣>1 and complete BSD data in our Cremona extract, conductors 106–129,850) plus 200 randomly sampled control curves with ∣Ш∣=1 from the same conductor range. The controls were sampled uniformly by conductor value. For each curve, we computed L(E,1) and L′(E,1) directly from the L-series using PARI/GP's lfun() framework, independently of all BSD invariants.
Separation between datasets. The full survey dataset provides the population statistics (∣Ш∣ distribution, frontier structure). The PARI/GP test set provides the detection performance claims. These are nested: the 1,198-curve test set is a subset of the 1.9M survey. No train/test split is applicable since αBSD is a formula, not a fitted model; the threshold τ=0.2 is chosen for interpretability, not optimized on training data.
3. The Original Metric and Its Circularity
We document the failure of our initial approach in detail, as the forensic process is instructive for AI-assisted research methodology.
The original stability metric was defined as
$$S(E) = \frac{|L'(E,1)|}{|L(E,1)| \cdot \log N_E} \tag{3}$$
intended to detect curves whose L-function behavior near s=1 was anomalous relative to their conductor. The hypothesis was that curves with large ∣Ш∣ might exhibit unusual sensitivity to perturbation at s=1, detectable through the ratio of derivative to value.
However, the implementation (ghost_harvester_v1.0.py, dated November 25, 2025) did not compute L′(E,1) from the L-series. Instead, it set L′(E,1)≈1 and reconstructed L(E,1) from BSD invariants via equation (1), yielding:
$$S_{actual} = \frac{|E_{tors}|^2}{|\text{Ш}| \cdot \Omega_E^+ \cdot \prod_p c_p \cdot \log N_E}$$
Since most curves in our dataset had torsion order 1, and ΩE+ and ∏cp were set to placeholder values of 1 (the actual values were not loaded), this collapsed to:
S \approx \frac{1}{|\text{Ш}| \cdot \log N_E} \tag{4}
The "perfect separation" between curves with ∣Ш∣=1 and ∣Ш∣>1 was trivially guaranteed: ∣Ш∣ appeared directly in the denominator.
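The tautology can be made concrete with a short sketch of the collapsed form (4). The function name and the conductor value are hypothetical, and the natural logarithm is an assumption, since the text does not specify the base of log N. The point is only that the class ratio equals the ratio of ∣Ш∣ values by construction.

```python
import math


def circular_metric(sha_order, conductor):
    """Collapsed form (4): S ~ 1 / (|Sha| * log N), with |Sha| as a direct input."""
    return 1.0 / (sha_order * math.log(conductor))


# Two hypothetical curves sharing a conductor, differing only in |Sha|.
conductor = 100_000
s_normal = circular_metric(1, conductor)
s_ghost = circular_metric(4, conductor)

# The "separation" is exactly the |Sha| ratio: nothing was measured.
ratio = s_normal / s_ghost
print(ratio)
```

Any metric with the label in its denominator will separate the classes this way, whatever the curves look like.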
3.2 The Diffusion Law Circularity
A "diffusion index" was defined as D=−log10S. We claimed to discover an empirical law:
$$D = \frac{1}{\sqrt{e}} \log_{10}|\text{Ш}| + C$$
with slope matching the mathematical constant $1/\sqrt{e} ≈ 0.6065$. Forensic audit (February 27, 2026) revealed that every calibration data point was generated by this formula and then hardcoded into calibration scripts. The regression trivially recovered its own input. The reported $R^2 = 0.9999$ was an artifact of self-reference, as shown in Table 1.
Table 1: Circularity verification (hardcoded D values vs. formula prediction)
All calibration D values match $\frac{1}{\sqrt{e}} \cdot \log_{10}|Ш|$ within rounding error ($|Δ| < 0.007$), confirming that the "discovered" law was computed from ∣Ш∣, not measured independently. (A fifth data point, curve 165066.d3, showed D=2.50 versus predicted 1.87; see §6 for discussion of its ambiguous provenance.)
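The self-referential regression is easy to reproduce in miniature. The sketch below generates "calibration" points from the formula itself, rounds them as hardcoded values would be, and then fits a line. The ∣Ш∣ values are hypothetical and the helper `ols_fit` is written here for illustration; only the slope value 0.6065 is taken from the text.

```python
import math

SLOPE = 0.6065  # slope value quoted in the text


def ols_fit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept; also returns R^2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical |Sha| values; D is generated FROM the law, then rounded.
sha_values = [4, 9, 25, 49]
xs = [math.log10(k) for k in sha_values]
ys = [round(SLOPE * x, 2) for x in xs]

slope, intercept, r2 = ols_fit(xs, ys)
print(slope, intercept, r2)
```

The fit returns the input slope up to rounding noise with R² close to 1, which is the signature described above: a near-perfect fit that carries no empirical content.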
3.3 The v2 Pivot and Its Limitations
On December 9, 2025, we detected a tautology in a derived quantity RBSD=C/S, which collapsed to log(N)/N — a function of conductor alone carrying no curve-specific information. This was explicitly documented as "a tautology" in project files (R_BSD_v2_BREAKTHROUGH.md).
We pivoted to a v2 approach conceptually separating analytic and geometric inputs:
Analytic: $C(E) = |L'(E,1)| / (|L(E,1)| · N)$
Geometric: $α_{BSD}(E) = Ω_E^+ · ∏_p c_p / |E_{tors}|^2$
The conceptual design was sound. But the implementation (r_bsd_period_tamagawa_test.py) estimated L′(E,1) using a synthetic function that depended on ∣Ш∣: ghost curves received artificially low L′ estimates, normals received high ones. The reported "232× separation" was therefore still contaminated by circularity.
3.4 How the Circularity Persisted
The December 2025 tautology detection and v2 pivot were documented in the project repository. However, when the main paper was revised for submission in February 2026, the AI assistant working in a new session context did not access the v2 documentation. It reverted to the original calibration data, producing paper v2.3 with circular claims intact.
This illustrates a failure mode specific to AI-assisted workflows: corrections made in one session can be silently lost when a new session starts with different context. The fix existed in the codebase but was invisible to the drafting process because the AI did not re-read the correction files.
4. The Corrected Analysis
4.1 Computing Real L′(E,1)
To break the circularity, we computed L(E,1) and L′(E,1) directly from L-series data using PARI/GP (version 2.15 [3]) with default long precision (∼38 significant digits). For each of 1,198 rank-0 curves (§2.3), we:
Constructed the elliptic curve from its a-invariants via ellinit()
Computed L(E,1) via lfun(E, 1) — the uncompleted Hasse-Weil L-function at s=1
Computed L′(E,1) via lfun(E, 1, 1) — the first derivative at s=1
PARI/GP's lfun computes L-values via functional equation and inverse Mellin transforms, independent of BSD invariants. The computation is from the Euler product / Dirichlet series representation and does not use Ω, Tamagawa numbers, torsion, or ∣Ш∣ at any stage.
Numerical consistency check. We compared $L_{PARI}$ = lfun(E,1) against $L_{BSD} = α_{BSD} · |Ш|$ computed from Cremona's tabulated values. Agreement to $< 10^{-6}$: 1,166 out of 1,198 curves (97.3%). The 32 discrepant curves (2.7%) showed deviations of $10^{-6}$ to $10^{-3}$, attributable to precision differences between Cremona's tabulated Ω values and our PARI/GP recomputation. No curve exceeded $10^{-3}$ discrepancy.
Important caveat on this check. This comparison is not an independent test of the BSD formula, since Cremona's tabulated ∣Ш∣ values are themselves computed via equation (1). What the comparison validates is narrower: (i) our PARI/GP pipeline computes the same L(E,1) as Cremona's pipeline, and (ii) no data corruption or parsing errors occurred. The theoretical validity of (1) rests on the results cited in §2.1, not on this numerical check.
4.2 The L-Derivative Does Not Distinguish Ghost Curves
We computed the normalized logarithmic derivative for all 1,198 curves:
$$S_{real}(E) = \frac{|L'(E,1)|}{|L(E,1)| \cdot \log N_E}$$
The normalization by $\log N_E$ was inherited from the original (circular) metric definition (3). The choice of $\log N$ (rather than $\sqrt{N}$ or $N$ itself) does not affect the key finding, which concerns the relative behavior between ghost and normal curves, not the absolute value.
Result: $S_{real}$ is tightly concentrated for all rank-0 curves regardless of ∣Ш∣:
| Statistic | All curves (n=1,198) | Ghost (n=998) | Normal (n=200) |
|---|---|---|---|
| Median | 0.285 | 0.286 | 0.280 |
| IQR | 0.018 | - | - |
| Std. dev. | 0.021 | - | - |
The ghost and normal medians differ by only 2% (0.286/0.280=1.02). The interquartile range of 0.018 is about 6% of the median of 0.285, and the standard deviation of 0.021 corresponds to a coefficient of variation of approximately 7%. This tight concentration holds regardless of ∣Ш∣: curves with ∣Ш∣=4 and ∣Ш∣=676 show statistically indistinguishable $S_{real}$ distributions.
Equivalently, the instability measure $C(E) = |L'(E,1)| / (|L(E,1)| · N_E)$ shows only a small difference between classes (ghost median $C = 1.15 × 10^{-2}$, normal median $C = 1.26 × 10^{-2}$, ratio 0.91×). A Mann-Whitney U test detects this difference as statistically significant ($p = 4 × 10^{-4}$) but with a small effect size (rank-biserial r=0.16). By contrast, the αBSD separation has $p = 9.3 × 10^{-107}$ and r=0.98. The L-derivative carries a weak statistical signal, but it is far too small to be useful for detection; the entire practical separation comes from αBSD.
Why not? Heuristically, L′(E,1) measures the slope of the L-function at the central point. For a rank-0 curve, L(E,1)≠0 and the L-function is smooth near s=1; the slope is determined by the functional equation and the distribution of Fourier coefficients $a_p(E)$, which do not know about ∣Ш∣ directly. The BSD formula constrains only the value L(E,1), not its derivative.
4.3 The αBSD Screening Method
Despite the failure of the analytic approach, a genuine signal exists in the geometric side. The BSD factor αBSD(E) from equation (2) is computable from a curve's minimal Weierstrass model and local reduction data, without any knowledge of ∣Ш∣, L(E,1), or L′(E,1).
Our 1,198-curve dataset reveals:
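| Class | n | Median αBSD | Q1 – Q3 | Median C |
|---|---|---|---|---|
| Ghost (∣Ш∣ > 1) | 998 | 0.052 | 0.033–0.081 | $1.15 × 10^{-2}$ |
| Normal (∣Ш∣ = 1) | 200 | 1.691 | 0.933–2.854 | $1.26 × 10^{-2}$ |
| Separation ratio | | 32.2× | | 0.91× |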
The distributions are well-separated: the ghost IQR (0.033–0.081) does not overlap with the normal IQR (0.933–2.854). A two-sample Kolmogorov-Smirnov test confirms the separation (D=0.967, $p = 3.0 × 10^{-194}$). The Mann-Whitney U test gives U=1,930, $p = 9.3 × 10^{-107}$, with rank-biserial effect size r=0.98, i.e. near-perfect rank separation. The entire practical separation comes from αBSD; C carries only a negligible signal (§4.2).
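The reported effect size can be cross-checked from the U statistic and the class sizes alone. The sketch below uses the standard relation r = 1 − 2U/(n₁n₂); it assumes that the quoted U is the smaller of the two Mann-Whitney statistics, which the text does not state explicitly. The function name is hypothetical.

```python
def rank_biserial(u_statistic, n1, n2):
    """Rank-biserial correlation from a Mann-Whitney U: r = 1 - 2U / (n1 * n2)."""
    return 1.0 - 2.0 * u_statistic / (n1 * n2)


# Values quoted in the text: U = 1,930 with 998 ghost and 200 normal curves.
r = rank_biserial(1930, 998, 200)
print(round(r, 2))
```

Under that assumption the result rounds to the r = 0.98 quoted above, so the U statistic and the effect size are mutually consistent.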
4.4 Why αBSD Works — and Its Limitations
The mechanism is equation (1) read in reverse (recall $c_E = 1$ throughout, see §2.1):
$$|\text{Ш}(E/\mathbb{Q})| = \frac{L(E,1)}{\alpha_{BSD}(E)} \tag{6}$$
We observe that L(E,1) is moderate for rank-0 curves:
Ghost median L(E,1)=3.48; normal median L(E,1)=2.21
Ghost mean L(E,1)=4.07; normal mean L(E,1)=2.54
Ghost curves have larger L(E,1) (because ∣Ш∣≫1 more than compensates for small αBSD), but the difference between the class medians is modest, within a factor of 2. Meanwhile, αBSD varies by a factor of 32 between classes. So αBSD carries most of the information about ∣Ш∣ that the BSD formula provides.
This is not a new theoretical insight. It is equation (1) deployed as a computational screen. The practical value is cost asymmetry:
Computing αBSD requires: period integration ($Ω_E^+$, fast from the Weierstrass model), Tate's algorithm ($c_p$, fast at finitely many primes), and a torsion computation ($|E_{tors}|$, fast because Mazur's theorem bounds the possible torsion groups).
Verifying ∣Ш∣ requires: computing L(E,1) to high precision and solving (6), or performing explicit descent.
For large-scale surveys, αBSD<0.2 is a fast filter that identifies candidate curves before expensive verification.
Important caveat. Because αBSD screening is BSD inversion, it is not independent of the BSD formula. In our setting (rank-0 curves, conductor ≤500,000), the BSD leading-term formula has been verified numerically for all curves in Cremona's tables with Manin constant cE=1 confirmed throughout, so this is not a practical limitation. For curves outside this verified range, αBSD screening would be conditional on BSD.
4.5 Detection Performance
Using αBSD<τ as a threshold classifier on the 1,198-curve test set:
| Threshold τ | TP | FP | FN | TN | Precision (95% CI) | Recall (95% CI) |
|---|---|---|---|---|---|---|
| 0.5 | 998 | 16 | 0 | 184 | 98.4% (97.5–99.0%) | 100.0% (99.6–100%) |
| 0.2 | 982 | 5 | 16 | 195 | 99.5% (98.9–99.8%) | 98.4% (97.5–99.0%) |
| 0.1 | 838 | 2 | 160 | 198 | 99.8% (99.2–100%) | 83.9% (81.5–86.1%) |
| 0.05 | 471 | 2 | 527 | 198 | 99.6% (98.6–99.9%) | 47.2% (44.1–50.3%) |
Confidence intervals are Wilson score intervals at the 95% level. At the τ=0.2 operating point, the classifier flags 987 curves, of which 982 genuinely have ∣Ш∣>1. The 5 false positives (normal curves with αBSD<0.2) merit individual investigation to determine whether they have unusual geometric structure.
Note on threshold selection. The threshold τ=0.2 was not optimized on this dataset. It was chosen as a round number that falls in the gap between the ghost and normal αBSD distributions. Any threshold between 0.1 and 0.5 yields precision above 98%.
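For readers who want to recompute the operating-point metrics, the following sketch derives precision, recall, and Wilson score intervals from the τ=0.2 confusion counts. The function name is hypothetical and z = 1.959964 is assumed for the 95% level. The endpoints should land close to the tabulated values, though small last-digit differences are possible depending on rounding or the exact interval variant used.

```python
import math


def wilson_interval(successes, trials, z=1.959964):
    """Wilson score interval for a binomial proportion (default: 95% level)."""
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials + z * z / (4.0 * trials ** 2)) / denom
    return center - half, center + half


# Confusion counts at tau = 0.2, as given in the table.
tp, fp, fn = 982, 5, 16

precision = tp / (tp + fp)
recall = tp / (tp + fn)
prec_lo, prec_hi = wilson_interval(tp, tp + fp)
rec_lo, rec_hi = wilson_interval(tp, tp + fn)

print(f"precision {precision:.3%} ({prec_lo:.3%}, {prec_hi:.3%})")
print(f"recall    {recall:.3%} ({rec_lo:.3%}, {rec_hi:.3%})")
```

Because the test set is deliberately enriched in ∣Ш∣>1 curves (998 versus 200), the precision computed this way describes this sample, not the prevalence found in the full survey.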
5. Empirical Findings from 1.9 Million Curves
The following results use the full survey of 1,900,161 curves (§2.3) and are independent of the αBSD screening analysis. They are direct observations from the database.
5.1 The Tail Law
Among the 1,900,161 curves, 256,199 have ∣Ш∣>1. All observed ∣Ш∣ values are perfect squares, consistent with the Cassels-Tate alternating pairing (which requires ∣Ш∣ to be a perfect square when finite). We observe 38 distinct values of ∣Ш∣>1, ranging from 4 ($= 2^2$) to 5625 ($= 75^2$).
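For ∣Ш∣≥4, we fit a power law via ordinary least squares on the $\log_{10}$-$\log_{10}$ scale:
$$\log_{10} \text{count}(|\text{Ш}| = k) = -\hat{\alpha} \cdot \log_{10} k + \hat{\beta}$$
yielding $\hat{α} = 2.02 ± 0.07$ (standard error), $R^2 = 0.955$, $p < 10^{-15}$, on n=38 data points.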
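The fitting procedure itself is a two-parameter least-squares line on log-transformed counts. A minimal sketch on synthetic data follows; the counts are generated from an exact $k^{-2}$ law on perfect squares and are not the paper's frequency table, and `fit_power_law` is a hypothetical helper name.

```python
import math


def fit_power_law(values, counts):
    """OLS fit of log10(count) = -alpha * log10(k) + beta; returns (alpha, beta)."""
    xs = [math.log10(k) for k in values]
    ys = [math.log10(c) for c in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return -slope, my - slope * mx


# Synthetic example: count(k) = 10^6 * k^(-2) at perfect squares k = m^2.
values = [m * m for m in range(2, 12)]
counts = [1e6 * k ** -2.0 for k in values]

alpha_hat, beta_hat = fit_power_law(values, counts)
print(alpha_hat, beta_hat)
```

OLS on log-counts weights every distinct ∣Ш∣ value equally, so sparsely populated tail bins influence the slope as much as well-populated ones. A discrete maximum-likelihood fit is the usual alternative when that matters.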
Model comparison. We compared the power law (2 parameters) against a quadratic model on log-log scale (3 parameters) via the Bayesian Information Criterion. The quadratic model gives a marginally lower BIC (ΔBIC=−2.1), indicating that the two models are comparable — the data does not strongly distinguish a pure power law from a curved log-log relationship. We report the power-law exponent as a convenient summary statistic, not as a claim of exact power-law behavior.
Cutoff sensitivity. The exponent varies with the lower ∣Ш∣ cutoff:
| Cutoff | α̂ | SE | R² | n points |
|---|---|---|---|---|
| ≥4 | 2.02 | 0.07 | 0.955 | 38 |
| ≥9 | 2.03 | 0.08 | 0.945 | 37 |
| ≥25 | 2.00 | 0.10 | 0.922 | 35 |
| ≥49 | 1.94 | 0.12 | 0.895 | 33 |
| ≥100 | 1.77 | 0.14 | 0.845 | 30 |
The exponent is stable (α̂ ≈ 2.0) for cutoffs up to ∣Ш∣≥25, then decreases to about 1.77 at cutoff ≥100. The decreasing exponent at higher cutoffs indicates that the extreme tail is heavier than the body of the distribution: there are more very-large-∣Ш∣ curves than a $k^{-2}$ power law would predict. (An earlier version of this paper reported α̂ = 1.76 based on a cutoff ≥100 subset; the full-range fit gives α̂ = 2.02.)
The convergence threshold. The exponent α̂ ≈ 2 is theoretically significant. If $\mathrm{count}(|Ш| = k) \sim k^{-α}$ across $N$ curves, then the population mean $𝔼[|Ш|] = N^{-1} \sum_k k \cdot \mathrm{count}(k)$ converges if and only if $α > 2$. At α̂ = 2.02, the mean just barely converges; the distribution is at the critical threshold. This is consistent with Delaunay's extension of the Cohen-Lenstra heuristics [7] to Tate-Shafarevich groups, which predicts finite but potentially large moments for rank-0 families, and with the Bhargava-Kane-Lenstra-Poonen-Rains framework [8] for modeling ∣Ш∣ distributions. A detailed quantitative comparison of our frequency data against the specific Delaunay predictions (which involve prime-by-prime contributions to ∣Ш∣ rather than a simple power law) is left for future work.
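The convergence condition can be spelled out as a comparison with a p-series. The sketch below assumes that $k$ runs over all positive integers and that $\mathrm{count}(k) \approx C k^{-α}$ for a constant $C$; both are assumptions introduced here for the derivation. If the sum were instead restricted to the perfect-square values $k = m^2$ that actually occur, the comparison series would be $\sum_m m^{2-2α}$, so the location of the threshold depends on the index set over which the power law is taken.

```latex
\mathbb{E}[|\text{Ш}|]
  \;=\; \frac{1}{N}\sum_{k} k \cdot \mathrm{count}(k)
  \;\approx\; \frac{C}{N}\sum_{k \ge 1} k^{\,1-\alpha},
\qquad
\sum_{k \ge 1} k^{-s} < \infty \iff s > 1
\;\;\Longrightarrow\;\;
\alpha - 1 > 1 \iff \alpha > 2 .
```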
5.2 The No-Go Region
In the (N,∣Ш∣) plane, we observe 908 curves with ∣Ш∣>100 out of 1,900,161 total. The global maximum is $|Ш| = 5625 = 75^2$ at conductor N=165,066 (Cremona label 165066.v1, the "Leviathan"), well below the database's conductor frontier.
5.3 The Ragged Frontier
Partitioning the data into conductor bands reveals a non-monotonic pattern in maximal ∣Ш∣:
Band 4 has lower maximal ∣Ш∣ than Band 3, and the sparse band (where LMFDB coverage drops to 4,754 curves) shows a dramatic decline. Within the well-covered range (Bands 1–4, covering 1.69M curves), the non-monotonic pattern is robust to band boundary choices.
Caveat on the sparse band. The drop at conductors >500,000 is likely dominated by database incompleteness (4,754 curves versus 100,000+ in other bands). We do not claim a physical decline in maximal ∣Ш∣ at high conductors. The frontier pattern in Bands 1–4 uses conductor ranges with dense coverage.
Note on priority. The individual monster curves (including the Leviathan) are previously known from Cremona's tables. Our contribution is the systematic band-by-band survey revealing the non-monotonic frontier structure.
5.4 The Monster Catalog
The four principal high-∣Ш∣ curves in our dataset:
All have αBSD<0.1, consistent with the screening threshold in §4.5.
6. The d3 Anomaly — An Open Question
Among the calibration data points examined in our forensic audit (§3.2), curve 165066.d3 was the sole case where the hardcoded diffusion value (D=2.50) diverged from the ∣Ш∣-based formula ($e_1 \log_{10}(1225) = 1.87$, deviation 0.63). All other calibration values matched the formula exactly.
We present this anomaly for completeness but with strong caveats:
Provenance is unverified. We cannot reconstruct how the d3 measurement D=2.50 was originally computed. It may have been an independent measurement, a typo, or a different formula. No code or log file documents its origin.
The original D metric is meaningless. Even if D=2.50 was independently measured, it was measuring a quantity ($D = -\log_{10} S$, where $S$ is the circular metric) that does not correspond to any well-defined analytic property of the curve.
The twist observation is separable. Independently of the D anomaly, the quadratic twist of 165066.d3 yields a curve at conductor 5,282,112 with $|Ш| = 1225 = 35^2$, a "ghost breeding" phenomenon where a low-∣Ш∣ base curve produces a high-∣Ш∣ twist. This is a potentially interesting observation but requires independent verification beyond current database ranges.
We flag this as an open question rather than a finding.
7. Discussion
7.1 What αBSD Screening Offers
The αBSD screen is the BSD formula deployed as a filter: compute everything except ∣Ш∣, and flag curves where the remaining factors force ∣Ш∣ to be large (assuming BSD). This is useful in scenarios where:
One has access to period and Tamagawa data (e.g., from Cremona-style tables or fast recomputation) but not to verified ∣Ш∣ values
One is extending databases to higher conductors and wants to prioritize verification effort
One is studying a family of curves and wants to identify members with interesting arithmetic without computing L(E,1)
It does not provide information about ∣Ш∣ beyond what BSD already encodes. It is a practical tool, not a theoretical advance.
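A minimal sketch of the screen follows. It assumes equation (1) is the rank-0 BSD formula in the form $L(E,1) = α_{BSD} \cdot |Ш|$; the function names are hypothetical. The numerical inputs are the standard invariants of the conductor-11 curve 11a1 (real period, Tamagawa number 5, torsion order 5), used here purely as a sanity check and not drawn from the paper's test set.

```python
import math


def alpha_bsd(real_period, tamagawa_numbers, torsion_order):
    """alpha_BSD = Omega_E^+ * prod_p c_p / |E(Q)_tors|^2."""
    if torsion_order < 1:
        raise ValueError("torsion order must be a positive integer")
    return real_period * math.prod(tamagawa_numbers) / torsion_order ** 2


def flag_candidate(alpha, tau=0.2):
    """Flag a curve as a large-Sha candidate when alpha_BSD falls below tau."""
    return alpha < tau


def implied_sha(l_value, alpha):
    """Rank-0 BSD inversion: |Sha| = L(E,1) / alpha_BSD (conditional on BSD)."""
    return l_value / alpha


# Sanity check on 11a1 (illustrative inputs, outside the paper's conductor range).
omega = 1.2692093042795534        # real period
l_at_1 = 0.2538418608559107       # L(E,1)
alpha = alpha_bsd(omega, [5], 5)  # one bad prime with c_11 = 5, torsion order 5

print(f"alpha_BSD   = {alpha:.6f}")
print(f"flagged     = {flag_candidate(alpha)}")
print(f"implied Sha = {implied_sha(l_at_1, alpha):.6f}")
```

The flagging step itself uses only the period, Tamagawa, and torsion data; `implied_sha` is shown separately because it additionally requires $L(E,1)$.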
7.2 The Tail Law and Delaunay Heuristics
The observed exponent α̂ ≈ 2.02 places the ∣Ш∣ frequency distribution at the threshold where 𝔼[∣Ш∣] transitions from divergent (α≤2) to convergent (α>2). This is a potentially meaningful coincidence in the context of the Delaunay heuristics [7], which extend the Cohen-Lenstra philosophy from class groups of number fields to Tate-Shafarevich groups of elliptic curves.
Delaunay's framework predicts specific probabilities for ∣Ш∣ values based on prime-by-prime contributions from the alternating pairing structure of Ш. These predictions are more structured than a simple power law: they involve products over prime divisors of ∣Ш∣ analogous to the Cohen-Lenstra weights for class numbers. Whether our observed frequency distribution (which is well-approximated by $k^{-2}$ over three orders of magnitude) is consistent with, or deviates from, the Delaunay predictions at the level of individual ∣Ш∣ values is an open question.
The heavier tail at large ∣Ш∣ (exponent dropping from 2.02 at cutoff ≥4 to 1.77 at cutoff ≥100) suggests the largest ∣Ш∣ values occur more frequently than a pure $k^{-2}$ law predicts. This tail enrichment may reflect specific arithmetic mechanisms (e.g., particular conductor factorizations or isogeny structures) that favor extreme ∣Ш∣ growth.
The exponent may also depend on how curves are ordered (by conductor, height, or discriminant) and on the conductor range sampled. Our measurement covers curves ordered by conductor up to about $10^6$.
7.3 The Ragged Frontier and Conductor Geometry
The non-monotonic frontier challenges naive expectations. One might expect that as the conductor grows, more "room" exists for large ∣Ш∣. Instead, the data suggest that conductor factorization structure — not just magnitude — influences the attainable ∣Ш∣ values.
We note that LMFDB coverage is highly non-uniform above conductor 500,000, which dominates the observed drop in the sparse band. The frontier pattern in Bands 1–4 (where coverage exceeds 100,000 curves per band) is more reliable.
7.4 Verification Protocols for AI-Assisted Research
Our experience suggests three practical protocols:
Protocol 1: Numerical spot-checks. Before accepting any claimed empirical law, substitute concrete numbers into every formula in the pipeline and verify the output against independently computed values. In our case, checking $D = e_1 \log_{10}(5625) = 2.2746$ against the hardcoded $D = 2.27$ would have revealed the circularity immediately.
Protocol 2: Physical column deletion. When building a detector for property X, physically remove column X from the input data file. If the code fails to run, X was in the pipeline. Do not rely on variable renaming or abstraction layers — these are transparent to the computation.
Protocol 3: Correction persistence. When an AI session discovers a critical error, write the correction into a file that future sessions will encounter by default (e.g., a project rule file, a header warning in the main document, or a configuration that loads automatically). Corrections documented only in subsidiary files will be silently lost when new sessions begin with different context windows.
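Protocol 2 can be sketched as a small harness: delete the target column from the data, run the detector on what remains, and treat a failure as evidence that the detector was reading the target. Everything named below (the column names `sha` and `alpha_bsd`, the sample rows, and both detectors) is hypothetical and chosen only to illustrate the check.

```python
import csv
import io


def drop_column(csv_text, column):
    """Return CSV text with `column` physically removed from the header and every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = [f for f in reader.fieldnames if f != column]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)   # the dropped key is ignored on write
    return out.getvalue()


def leaks_target(detector, csv_text, column):
    """True if `detector` fails once `column` is gone, i.e. it depended on the target."""
    rows = list(csv.DictReader(io.StringIO(drop_column(csv_text, column))))
    try:
        for row in rows:
            detector(row)
    except KeyError:
        return True
    return False


# Hypothetical sample data and detectors, for illustration only.
SAMPLE = "label,alpha_bsd,sha\ncurveA,0.05,16\ncurveB,0.90,1\n"


def circular_detector(row):
    return float(row["sha"]) > 1          # reads the target directly


def honest_detector(row):
    return float(row["alpha_bsd"]) < 0.2  # uses only non-target inputs


print(leaks_target(circular_detector, SAMPLE, "sha"))
print(leaks_target(honest_detector, SAMPLE, "sha"))
```

Operating on the file contents rather than on an in-memory variable is the point of the protocol: a renamed or wrapped reference to the target still fails once the column no longer exists.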
7.5 Limitations
Conductor range. Our PARI/GP test set covers conductors 106–129,850. The αBSD screening performance at higher conductors (where L(E,1) distributions may differ) is untested.
BSD dependence. The αBSD method assumes BSD holds. In our setting (rank-0 curves, conductor ≤500,000, verified Manin constant), equation (1) has been verified numerically for every curve. For curves outside this range, the assumption is conditional on BSD.
Control group size. Our test set used 200 normal versus 998 ghost curves. The false positive rate, which drives precision, is estimated from only 200 normal curves (5 false positives at τ=0.2), so it is less tightly constrained than the recall, which rests on 998 ghost curves. A larger control sample would tighten the precision bounds.
Class balance. The 998:200 ghost-to-normal ratio in our test set does not reflect the population ratio (where ∣Ш∣=1 is dominant). In a population-representative sample, the false positive rate would remain the same, but the false positive count would be much larger relative to the true positive count, so the precision would be lower than the test-set figure.
Tail law model selection. BIC comparison between power-law and quadratic models is marginal (ΔBIC=−2.1). More sophisticated alternatives (log-normal, stretched exponential, or Delaunay-derived distributions) were not tested.
Single dataset. All observations are from Cremona's tables / LMFDB. Independent verification using differently computed databases would strengthen the findings.
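The class-balance limitation can be made concrete by re-expressing precision as a function of prevalence. The sketch below assumes that the true positive and false positive rates measured on the test set at τ=0.2 carry over unchanged to the wider population, which this paper does not establish; the survey prevalence 256,199 / 1,900,161 from §5.1 is used only as an example input, and the function name is hypothetical.

```python
def precision_at_prevalence(tpr, fpr, prevalence):
    """Precision implied by fixed TPR and FPR at a given prevalence of |Sha| > 1."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    tp_mass = tpr * prevalence
    fp_mass = fpr * (1.0 - prevalence)
    return tp_mass / (tp_mass + fp_mass)


# Rates at the tau = 0.2 operating point of the 1,198-curve test set.
tpr = 982 / 998   # recall among ghost curves
fpr = 5 / 200     # false positive rate among normal curves

# Test-set class balance (998 ghosts out of 1,198 curves).
test_set = precision_at_prevalence(tpr, fpr, 998 / 1198)

# Example: prevalence of |Sha| > 1 in the full survey (ASSUMES the rates transfer).
survey = precision_at_prevalence(tpr, fpr, 256_199 / 1_900_161)

print(f"precision at test-set balance:   {test_set:.3f}")
print(f"precision at survey prevalence:  {survey:.3f}")
```

Because the false positive rate rests on only 5 events among 200 controls, any figure produced this way inherits that uncertainty.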
8. Conclusion
The original "Ghost Rank" stability metric and "spectral diffusion law" do not survive independent verification. With L-series derivatives computed via PARI/GP, the normalized ratio ∣L′(E,1)∣/∣L(E,1)∣ is tightly concentrated (IQR =0.018) across all rank-0 curves, showing no dependence on ∣Ш∣.
A genuine screening signal exists in $α_{BSD} = Ω_E^+ · ∏_p c_p / |E(ℚ)_{tors}|^2$. At threshold α<0.2, this achieves 99.5% precision (95% CI: 98.9–99.8%) and 98.4% recall (95% CI: 97.5–99.0%) for detecting ∣Ш∣>1 on 1,198 curves with verified L-derivatives. The mechanism is BSD inversion: a practical tool, not a theoretical discovery.
The frequency distribution of ∣Ш∣ follows an approximate power law with exponent α̂ = 2.02 ± 0.07 across 1.9 million curves ($R^2 = 0.955$, n=38 unique values). The exponent sits at the convergence threshold for 𝔼[∣Ш∣], consistent with Delaunay heuristics. The tail is enriched relative to a pure $k^{-2}$ law (exponent drops to about 1.77 for ∣Ш∣≥100). The "ragged frontier" (non-monotonic growth of maximal ∣Ш∣ with conductor) is a genuine empirical observation in the well-covered conductor range.
The forensic process of discovering and correcting circular reasoning in AI-assisted computational mathematics is itself a methodological contribution. The circularity was introduced by AI code generation, reinforced by AI analysis, and eventually caught by AI auditing under a systematic protocol. We propose three verification protocols (§7.4) for researchers working with AI tools.
Acknowledgments
This research used data from the LMFDB [2] and Cremona's elliptic curve database [1]. Computations were performed using PARI/GP [3] and Python. The author thanks the developers of these tools for making large-scale computational number theory accessible.
The honest account of methodological failure in §3 reflects the author's commitment to scientific transparency. The paper was refined through iterative AI review via TOEShare (theoryofeverything.ai), which identified gaps in mathematical precision and statistical methodology addressed in this version.
References
[1] J.E. Cremona, Algorithms for Modular Elliptic Curves, 2nd ed., Cambridge University Press, 1997. Updated tables: https://johncremona.github.io/ecdata/
[2] The LMFDB Collaboration, The L-functions and Modular Forms DataBase, https://www.lmfdb.org, 2025.
[4] A. Wiles, Modular elliptic curves and Fermat's Last Theorem, Ann. Math. 141 (1995), 443–551.
[5] V.A. Kolyvagin, Finiteness of E(Q) and Ш(E/Q) for a subclass of Weil curves, Izv. Akad. Nauk SSSR 52 (1988), 522–540.
[6] H. Cohen and H.W. Lenstra, Heuristics on class groups of number fields, Lecture Notes in Math. 1068 (1984), 33–62.
[7] C. Delaunay, Heuristics on Tate-Shafarevitch groups of elliptic curves defined over Q, Experiment. Math. 10 (2001), 191–196.
[8] M. Bhargava, D.M. Kane, H.W. Lenstra, B. Poonen, E. Rains, Modeling the distribution of ranks, Selmer groups, and Shafarevich-Tate groups of elliptic curves, Cambridge J. Math. 3 (2015), 275–321.
Tail law analysis: calibration/tail_law_analysis.py (formal power-law fitting with BIC)
Appendix A: Circularity Verification
The script calibration/verify_d_values.py computes $e_1 \cdot \log_{10}|Ш|$ for each calibration monster and compares to the hardcoded D value. All non-anomaly points match within ∣Δ∣<0.007, confirming that D was computed from ∣Ш∣ rather than measured independently.
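The spot-check can be illustrated with the two (∣Ш∣, D) pairs quoted in the main text. This is not a reproduction of verify_d_values.py. The constant $e_1$ is defined outside this section, so the sketch back-solves it from the quoted identity $e_1 \log_{10}(5625) = 2.2746$; that step is an assumption made here. As a result the first check only tests the rounding of 2.2746 to 2.27, while the second is a genuine comparison.

```python
import math

# ASSUMPTION: e_1 is back-solved from the value quoted in the text,
# e_1 * log10(5625) = 2.2746; its actual definition lies outside this section.
E1 = 2.2746 / math.log10(5625)


def predicted_d(sha_order, e1=E1):
    """D value implied by the Sha-based formula e_1 * log10(|Sha|)."""
    return e1 * math.log10(sha_order)


def check(sha_order, hardcoded_d, tol=0.007):
    """Return (delta, matches), where delta = hardcoded D minus formula D."""
    delta = hardcoded_d - predicted_d(sha_order)
    return delta, abs(delta) < tol


# (|Sha|, hardcoded D) pairs quoted in the text.
delta_lev, ok_lev = check(5625, 2.27)   # the "Leviathan"
delta_d3, ok_d3 = check(1225, 2.50)     # the d3 anomaly

print(f"|Sha| = 5625: delta = {delta_lev:+.4f}, matches formula = {ok_lev}")
print(f"|Sha| = 1225: delta = {delta_d3:+.4f}, matches formula = {ok_d3}")
```

A match within tolerance is what one expects when D was generated from ∣Ш∣; the d3 pair is the only quoted case that fails the check.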
Appendix B: PARI/GP Computation Details
L-series values were computed via the PARI/GP commands:
The lfun function computes L-values via inverse Mellin transforms using the functional equation, with precision controlled by PARI's default real precision (38 significant digits). Total batch computation time: approximately 6 minutes for 1,198 curves on a standard workstation (Intel i7, 16GB RAM, WSL2/Ubuntu).
Appendix C: Chronological Project Timeline
A complete listing of all project files and their modification dates (November 25, 2025 through February 27, 2026) is maintained in TIMELINE.md, documenting nine development phases from initial data collection through the final forensic audit.
"We built a machine to detect treasure. The machine was cheating. We caught it. We fixed it. There was still treasure."