Paper Review Profile

Detecting Large Tate-Shafarevich Groups via BSD Geometric Invariants: Lessons from a Computational Audit of 1.9 Million Elliptic Curves

Published by Adam Murphy. Created 3/3/2026. 1 review.

Composite

3.6/5

We investigate computational methods for identifying elliptic curves with anomalously large Tate-Shafarevich groups ($|Ш| ≫ 1$) among rank-0 curves over $ℚ$. After documenting and correcting circular reasoning in AI-assisted analysis, we find that the BSD geometric factor $α_{BSD}(E) = Ω_E^+ · ∏_p c_p(E) / |E(ℚ)_{tors}|^2$ achieves 99.5% precision at 98.4% recall for detecting $|Ш| > 1$ curves. We additionally report a power-law tail distribution for $|Ш|$ across 1.9 million curves with exponent $α̂ = 2.02 ± 0.07$, placing the distribution at the convergence threshold for $𝔼[|Ш|]$.


Internal Consistency

4/5

The paper maintains strong internal consistency throughout most sections. The BSD formula application is coherent, the circular reasoning analysis is self-consistent, and the corrected methodology follows logically. However, there are minor inconsistencies: the paper sometimes blurs verification scope statements (claiming verification 'for all curves in our dataset' while later qualifying this to conductor ≤ 500,000), and the characterization of L(E,1) as 'moderate' in §4.4 creates tension with the extreme example (L(E,1) = 306.6) cited later in §5.4.

Mathematical Validity

4/5

The core mathematical content is sound. The BSD formula is correctly applied, the circularity diagnosis is mathematically accurate (showing S ≈ 1/(|Ш| · log N)), and the α_BSD inversion mechanism is valid. However, several issues affect validity: normalization conventions are asserted but not fully justified beyond empirical agreement checks that depend on BSD-derived |Ш|; the power-law fitting uses OLS on log-counts with only 38 points without proper statistical model validation; and the claim about L(E,1) being 'moderate' lacks rigorous bounds given examples like L(E,1) = 306.6. The statistical methodology for tail exponent estimation needs stronger foundation.

Falsifiability

4/5

The work makes clear, testable predictions with specific performance metrics (α_BSD < 0.2 achieves 99.5% precision at 98.4% recall) and includes confidence intervals. The power-law exponent claim (α̂ = 2.02 ± 0.07) is directly testable on other datasets. The authors demonstrate commitment to falsifiability by explicitly documenting how their original method failed when independently verified. The score is 4 rather than 5 because some observational claims like the 'ragged frontier' are less predictive than the main screening results.

Clarity

5/5

This is exceptionally well-written scientific communication. The paper clearly distinguishes between failed approaches and valid results, provides detailed explanations of all notation and conventions, and maintains consistent terminology throughout. The forensic reconstruction of circular reasoning is particularly clear and educational. Statistical results include proper confidence intervals and effect sizes. The abstract accurately summarizes findings without overselling, and the methodological transparency sets a high standard for computational mathematics research.

Novelty

2/5

The authors are refreshingly honest that α_BSD screening 'is not a new theoretical insight' but rather the BSD formula used computationally. The main mathematical result is BSD formula inversion, which is straightforward algebra rather than a theoretical advance. The empirical observations (power-law distribution, ragged frontier) are new data patterns but not fundamental discoveries. The most valuable contribution is methodological - documenting AI-assisted research pitfalls - but this concerns research methodology rather than mathematical content. The work provides practical value through careful empirical analysis but limited theoretical novelty.

Completeness

3/5

The paper is largely complete in its internal logic and mathematical exposition, with variables defined and methodology explained. However, several key gaps affect completeness: the 1.9M-curve dataset construction is under-specified (exact LMFDB queries, deduplication methods, completeness criteria); the detection performance evaluation uses a highly non-representative test set (998 vs 200) with unclear sampling procedures that may not generalize; and the tail-law statistical methodology relies on OLS fitting with only 38 points without adequate robustness analysis. The power-law claim would benefit from discrete probability modeling and goodness-of-fit testing.

Evidence Strength

3/5

The evidence has significant strengths in its transparency and internal validation, but key limitations affect strength. Strengths include the thorough documentation of methodological failure, independent L-function computation via PARI/GP, and careful statistical reporting with confidence intervals. However, the evidence is weakened by: heavy reliance on a single database source without independent verification; detection performance evaluated on a deliberately biased test set that may not reflect population performance; and tail-law claims based on limited statistical methodology (OLS on 38 points) without adequate robustness checks or alternative model comparisons. The circular reasoning documentation is exemplary, but the positive findings need stronger empirical support.

This paper presents a valuable case study in computational mathematics research methodology, demonstrating both the pitfalls and potential of AI-assisted analysis. The work's greatest contribution may be its transparent documentation of how circular reasoning can be introduced and perpetuated in AI-assisted research, providing practical lessons for the growing community of researchers using these tools. Mathematically, the paper is fundamentally sound but offers limited theoretical novelty. The α_BSD screening method is acknowledged by the authors as simply the BSD formula read in reverse - not a new insight, but a practical computational tool. The empirical findings (power-law distribution with exponent near 2, non-monotonic conductor dependence) are interesting observations that warrant further investigation, though the statistical methodology for the tail-law analysis could be strengthened. The paper's exceptional clarity and methodological transparency partially compensate for its limited novelty. The forensic reconstruction of the circular reasoning is educational and valuable for the community. However, several empirical claims rest on methodological choices that are not fully justified - particularly the power-law fitting procedure and the generalizability of detection performance from a biased test set to the broader population. Overall, this represents solid computational mathematics with exemplary research integrity, but the theoretical contributions are modest and some empirical claims need stronger statistical foundations.

Strengths

+Exceptional methodological transparency and documentation of research failure modes
+Clear mathematical exposition with proper handling of BSD formula conventions
+Practical computational tool (α_BSD screening) with well-characterized performance metrics
+Valuable case study for AI-assisted research methodology and pitfall avoidance
+Honest assessment of novelty limitations and appropriate scope of claims

Areas for Improvement

-Strengthen statistical methodology for power-law tail analysis with discrete probability models and goodness-of-fit testing
-Provide more complete specification of the 1.9M-curve dataset construction and validation procedures
-Address generalizability concerns for detection performance by evaluating on representative test sets
-Justify normalization conventions more rigorously beyond empirical agreement checks
-Develop stronger theoretical connections between observed power-law exponent and existing heuristics

Detecting Large Tate-Shafarevich Groups via BSD Geometric Invariants: Lessons from a Computational Audit of 1.9 Million Elliptic Curves

Author: Adam Murphy

Date: February 27, 2026

Version: 3.2.1

Created: 2026-02-27 PST Last Modified: 2026-02-27 14:43 PST

Abstract

We investigate computational methods for identifying elliptic curves with anomalously large Tate-Shafarevich groups ( $|\text{Ш}| \gg 1$ ) among rank-0 curves over $\mathbb{Q}$ . An initial stability metric based on L-function Taylor coefficients achieved apparent perfect separation but was found to embed $|\text{Ш}|$ in its own denominator — a circularity we document as a case study in AI-assisted research pitfalls. After correcting with independently computed L-series derivatives (via PARI/GP, $n = 1{,}198$ curves), we find that the normalized logarithmic derivative $|L'(E,1)| / (|L(E,1)| \cdot \log N_E)$ is tightly concentrated (median $0.285$ , IQR $0.018$ ) with no dependence on $|\text{Ш}|$ , eliminating the original detection mechanism.

A genuine screening signal survives in the BSD geometric factor

$\alpha_{\text{BSD}}(E) = \frac{\Omega_E^+ \cdot \prod_p c_p(E)}{|E(\mathbb{Q})_{\text{tors}}|^2}$

computed from the real period, Tamagawa numbers, and torsion order of the minimal Weierstrass model — without knowledge of $|\text{Ш}|$ or $L(E,1)$ . On our test set, $\alpha_{\text{BSD}} < 0.2$ achieves 99.5% precision (95% CI: 98.9%–99.8%) at 98.4% recall (95% CI: 97.5%–99.0%) for identifying $|\text{Ш}| > 1$ curves. The mechanism is the BSD formula read in reverse: since $L(E,1)$ remains moderate for rank-0 curves (median $2.2$ – $3.5$ ), large $|\text{Ш}|$ forces small $\alpha_{\text{BSD}}$ .

We additionally report a power-law tail distribution for $|\text{Ш}|$ across 1,900,161 curves (OLS exponent $\hat{\alpha} = 2.02 \pm 0.07$ , $R^2 = 0.955$ , on $\log_{10}$ - $\log_{10}$ scale for $|\text{Ш}| \geq 4$ , $n = 38$ unique values). The exponent $\hat{\alpha} \approx 2$ places the distribution at the threshold where $\mathbb{E}[|\text{Ш}|]$ transitions from divergent to convergent, consistent with Delaunay's heuristics [7] for rank-0 curves. We document non-monotonic growth of maximal $|\text{Ш}|$ with conductor ("the ragged frontier") and discuss implications for AI-assisted mathematical research.

Keywords: Elliptic curves, Tate-Shafarevich group, BSD conjecture, computational number theory, AI-assisted research methodology

1. Introduction

1.1 The Problem

The Tate-Shafarevich group $\text{Ш}(E/\mathbb{Q})$ is among the most important and least accessible invariants of an elliptic curve. Computing $|\text{Ш}|$ typically requires sophisticated descent techniques or high-precision L-function evaluation, with computational cost growing rapidly with the conductor $N_E$ . For a rank-0 curve $E/\mathbb{Q}$ , the Birch and Swinnerton-Dyer (BSD) formula — verified numerically for all curves in current databases (see §2.1) — gives:

$L(E,1) = \frac{\Omega_E^+ \cdot \prod_p c_p(E) \cdot |\text{Ш}(E/\mathbb{Q})|}{|E(\mathbb{Q})_{\text{tors}}|^2}$

A natural question arises: can we detect curves with large $|\text{Ш}|$ without computing it directly? Such a screening tool would allow researchers to identify candidates for further study before investing in expensive $|\text{Ш}|$ verification.

One might hope that the L-function's analytic behavior near $s = 1$ encodes information about $|\text{Ш}|$ beyond what the BSD formula dictates at $s = 1$ exactly. Specifically, if $L'(E,1)$ or higher derivatives carried a signal correlated with $|\text{Ш}|$ independent of $L(E,1)$ itself, this would provide a genuinely new detection mechanism. We test this hypothesis and find it does not hold (§4.2).

1.2 Summary of Contributions

This paper presents three empirical findings alongside a detailed methodological failure analysis:

$\alpha_{\text{BSD}}$ screening. The BSD geometric factor $\alpha_{\text{BSD}} = \Omega_E^+ \cdot \prod c_p / |E_{\text{tors}}|^2$ achieves 99.5% precision at 98.4% recall for detecting $|\text{Ш}| > 1$ curves (§4.3–4.5). The mechanism is the BSD formula itself: since $|\text{Ш}| = L(E,1) / \alpha_{\text{BSD}}$ and $L(E,1)$ remains moderate, large $|\text{Ш}|$ forces small $\alpha_{\text{BSD}}$ . This is not a new theoretical insight but a practical screening tool.
Tail distribution. Across 1,900,161 curves with conductors up to $\sim 10^6$ , the frequency of $|\text{Ш}|$ values follows a power law with OLS exponent $\hat{\alpha} = 2.02 \pm 0.07$ ( $R^2 = 0.955$ ). This places the distribution at the convergence threshold for $\mathbb{E}[|\text{Ш}|]$ (§5.1).
The ragged frontier. The maximum $|\text{Ш}|$ observed does not grow monotonically with conductor, peaking in the mid-range (conductor $\sim 165{,}000$ ) rather than at the database boundary (§5.3).
Methodological audit. We document in detail how a circular metric — embedding $|\text{Ш}|$ in its own detector — produced artificially perfect results that survived multiple AI-assisted review sessions. The forensic reconstruction (§3) and resulting verification protocols (§7.4) are offered as a case study for the growing community of AI-assisted researchers.

1.3 A Note on AI-Assisted Research

This work was conducted with substantial AI assistance (code generation, analysis, drafting). The circular reasoning documented in §3 was both introduced and reinforced by AI tools, which confidently validated tautological results across multiple sessions. The circularity was also eventually caught through AI-assisted auditing, but only after a systematic protocol was imposed requiring line-by-line formula verification.

As AI tools become standard in computational research, the failure mode described here — where circular dependencies are hidden inside generated code and validated by confident-sounding AI analysis — represents a systemic risk. We offer practical recommendations in §7.4.

2. Background and Conventions

2.1 The BSD Formula: Precise Statement

Let $E/\mathbb{Q}$ be an elliptic curve given in minimal Weierstrass form. We use the following conventions throughout:

$\Omega_E^+$ : the real period. We use Cremona's convention [1]: $\Omega_E^+$ is the integral $\int_{E(\mathbb{R})} |\omega_E|$ of the Néron differential $\omega_E = dx/(2y + a_1 x + a_3)$ over all real components of $E(\mathbb{R})$ . When $E(\mathbb{R})$ has two connected components, this includes both; equivalently, $\Omega_E^+ = n_{\mathbb{R}} \cdot \Omega^+_{\text{ident}}$ where $n_{\mathbb{R}} \in \{1, 2\}$ is the number of real components and $\Omega^+_{\text{ident}}$ is the integral over the identity component alone. All $\Omega$ values in our dataset are taken directly from Cremona's tables and use this convention consistently.
$c_p(E)$ : the Tamagawa number at prime $p$ , defined as $c_p = [E(\mathbb{Q}_p) : E^0(\mathbb{Q}_p)]$ where $E^0$ is the subgroup of points with non-singular reduction. These equal 1 at every prime of good reduction, so the product $\prod_p c_p(E)$ involves only the finitely many primes of bad reduction, where $c_p$ is computable from Tate's algorithm.
$|E(\mathbb{Q})_{\text{tors}}|$ : the order of the rational torsion subgroup. By Mazur's theorem, this is bounded by 16 and is computable in polynomial time.
$|\text{Ш}(E/\mathbb{Q})|$ : the order of the Tate-Shafarevich group. The Cassels-Tate pairing shows that $|\text{Ш}|$ , if finite, is a perfect square. For rank-0 curves in our range, finiteness follows from Kolyvagin's theorem [5], and all tabulated values in Cremona's tables are indeed perfect squares.
$L(E,s)$ : the (uncompleted) Hasse-Weil L-function, defined as an Euler product over primes and continued analytically to $\mathbb{C}$ via modularity [4]. We write $L(E,1)$ for its value at $s = 1$ and $L'(E,1) = \frac{dL}{ds}(E,1)$ for its first derivative.
Manin constant: For the optimal curve in each isogeny class, the Manin constant is conjectured to be 1 and has been verified numerically for all curves in Cremona's tables (conductor $\leq 500{,}000$ ) [1]. All curves in our PARI/GP test set have conductor $\leq 129{,}850$ , well within this verified range.

For $E/\mathbb{Q}$ of analytic rank 0 (i.e., $L(E,1) \neq 0$ ), Kolyvagin [5] proved that $E(\mathbb{Q})$ and $\text{Ш}(E/\mathbb{Q})$ are both finite. Combined with modularity (Wiles et al. [4]) and work of Gross-Zagier, these results make the BSD leading-term formula a strong theoretical expectation. Cremona's database verifies it numerically for all curves with conductor $\leq 500{,}000$ . The formula is:

$L(E,1) = \frac{\Omega_E^+ \cdot \prod_p c_p(E) \cdot |\text{Ш}(E/\mathbb{Q})|}{|E(\mathbb{Q})_{\text{tors}}|^2} \tag{1}$

On the Manin constant. The general BSD formula includes a factor $c_E^2$ , where $c_E$ is the Manin constant of the optimal parametrization $X_0(N) \to E$ . For the optimal curve in each isogeny class, $c_E$ is conjectured to equal 1 (Manin's conjecture) and has been verified numerically for all curves in Cremona's tables with conductor $\leq 500{,}000$ [1]. For non-optimal curves, the appropriate period adjustment is absorbed into Cremona's tabulated $\Omega_E^+$ values. Consequently, $c_E^2 = 1$ throughout our dataset, and we write equation (1) with this substitution already made. All subsequent equations (2), (6) inherit $c_E = 1$ from this convention. There is no regulator factor since $\text{rank}(E/\mathbb{Q}) = 0$ .

Scope of verification. Equation (1) is the BSD conjecture's leading-term formula. It remains a conjecture in full generality — what has been established is: (i) Kolyvagin's proof that rank-0 curves have finite $E(\mathbb{Q})$ and finite $\text{Ш}$ , (ii) modularity (Wiles et al.), and (iii) Cremona's high-precision numerical verification that (1) holds exactly (to the computed precision) for every curve with conductor $\leq 500{,}000$ in his tables, with Manin constant $c_E = 1$ confirmed throughout. Our claim is not that BSD is proven — it is that (1) holds as a verified numerical identity for all curves in our dataset.

We define the BSD geometric factor:

$\alpha_{\text{BSD}}(E) = \frac{\Omega_E^+ \cdot \prod_p c_p(E)}{|E(\mathbb{Q})_{\text{tors}}|^2} \tag{2}$

so that equation (1) becomes $L(E,1) = \alpha_{\text{BSD}}(E) \cdot |\text{Ш}(E/\mathbb{Q})|$ .
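As a minimal sketch of equations (1) and (2), the following Python computes $\alpha_{\text{BSD}}$ from the three invariants and then solves equation (1) for $|\text{Ш}|$ . The function names and the numeric inputs are illustrative assumptions, not values taken from the dataset.

```python
from math import prod


def alpha_bsd(omega_plus, tamagawa_numbers, torsion_order):
    """BSD geometric factor, equation (2): Omega * prod(c_p) / |E_tors|^2."""
    return omega_plus * prod(tamagawa_numbers) / torsion_order ** 2


def implied_sha(l_value, alpha):
    """Equation (1) solved for |Sha|; only meaningful for rank 0, L(E,1) != 0."""
    return l_value / alpha


# Hypothetical inputs chosen for illustration only.
alpha = alpha_bsd(omega_plus=0.25, tamagawa_numbers=[2, 1], torsion_order=2)
sha = implied_sha(l_value=0.5, alpha=alpha)
print(alpha, sha)
```

Note that `alpha_bsd` takes no L-value and no $|\text{Ш}|$ as input; only `implied_sha` needs $L(E,1)$ .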

2.2 Prior Work on Large $|\text{Ш}|$

Large Tate-Shafarevich groups have been studied computationally through Cremona's tables [1] and the LMFDB database [2]. The largest known $|\text{Ш}|$ values for rank-0 curves with conductor below $10^6$ include $|\text{Ш}| = 5625 = 75^2$ (Cremona label 165066.v1) and $|\text{Ш}| = 2500 = 50^2$ (287175.n1). These values are verified via the BSD formula (1) using high-precision computation of $L(E,1)$ .

The Cohen-Lenstra heuristics [6] and their extensions (Delaunay [7], Bhargava-Kane-Lenstra-Poonen-Rains [8]) make predictions about the distribution of $|\text{Ш}|$ in families of elliptic curves. Our tail-law observation (§5.1) provides empirical data for comparison with these predictions.

2.3 Data Sources and Selection Criteria

Full survey dataset (§5). We use a combined Cremona/LMFDB extract covering 1,900,161 curves (1,238,224 rank-0) with conductors up to $\sim 10^6$ . For each curve, the dataset provides: conductor $N$ , rank, $|\text{Ш}|$ (analytic, via BSD), and $|E_{\text{tors}}|$ . A subset with full BSD invariants ( $\Omega_E^+$ , $\prod c_p$ ) is available for 847,550 curves from Cremona's allbsd tables. The tail law (§5.1) uses the full 1.9M dataset; the ragged frontier (§5.3) uses the subset with band-by-band conductor coverage.

PARI/GP test set (§4). To test detection performance with independently computed L-derivatives, we selected 1,198 rank-0 curves: 998 curves with $|\text{Ш}| > 1$ (all available curves with $|\text{Ш}| > 1$ and complete BSD data in our Cremona extract, conductors 106–129,850) plus 200 randomly sampled control curves with $|\text{Ш}| = 1$ from the same conductor range. The controls were sampled uniformly by conductor value. For each curve, we computed $L(E,1)$ and $L'(E,1)$ directly from the L-series using PARI/GP's lfun() framework, independently of all BSD invariants.

Separation between datasets. The full survey dataset provides the population statistics ( $|\text{Ш}|$ distribution, frontier structure). The PARI/GP test set provides the detection performance claims. These are nested: the 1,198-curve test set is a subset of the 1.9M survey. No train/test split is applicable since $\alpha_{\text{BSD}}$ is a formula, not a fitted model: the threshold $\tau = 0.2$ is chosen for interpretability, not optimized on training data.

3. The Original Metric and Its Circularity

We document the failure of our initial approach in detail, as the forensic process is instructive for AI-assisted research methodology.

3.1 The Stability Metric (v1, November 2025)

We originally defined a "stability metric":

$S(E) = \frac{|L'(E,1)|}{|L(E,1)| \cdot \log N_E} \tag{3}$

intended to detect curves whose L-function behavior near $s = 1$ was anomalous relative to their conductor. The hypothesis was that curves with large $|\text{Ш}|$ might exhibit unusual sensitivity to perturbation at $s = 1$ , detectable through the ratio of derivative to value.

However, the implementation (ghost_harvester_v1.0.py, dated November 25, 2025) did not compute $L'(E,1)$ from the L-series. Instead, it set $L'(E,1) \approx 1$ and reconstructed $L(E,1)$ from BSD invariants via equation (1), yielding:

$S_{\text{actual}} = \frac{|E_{\text{tors}}|^2}{|\text{Ш}| \cdot \Omega_E^+ \cdot \prod c_p \cdot \log N_E}$

Since most curves in our dataset had torsion order 1, and $\Omega_E^+$ and $\prod c_p$ were set to placeholder values of 1 (the actual values were not loaded), this collapsed to:

$S \approx \frac{1}{|\text{Ш}| \cdot \log N_E} \tag{4}$

The "perfect separation" between curves with $|\text{Ш}| = 1$ and $|\text{Ш}| > 1$ was trivially guaranteed: $|\text{Ш}|$ appeared directly in the denominator.
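To make the tautology concrete, the sketch below evaluates the collapsed form (4) at the extremes of the test-set conductor range (106 to 129,850, see §2.3). The function name is hypothetical, and the code is a reconstruction of the effective formula, not the original ghost_harvester_v1.0.py.

```python
import math


def s_collapsed(sha, conductor):
    """Equation (4): the quantity the v1 code effectively computed."""
    return 1.0 / (sha * math.log(conductor))


# Largest value any curve with |Sha| >= 4 can reach in the conductor range:
# smallest |Sha| above 1 (which is 4) at the smallest conductor.
ghost_max = s_collapsed(4, 106)

# Smallest value any curve with |Sha| = 1 can reach: the largest conductor.
normal_min = s_collapsed(1, 129_850)

print(ghost_max, normal_min, ghost_max < normal_min)
```

Under these assumptions every curve with $|\text{Ш}| \geq 4$ falls below every curve with $|\text{Ш}| = 1$ , so the separation follows from the formula alone and says nothing about the L-function.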

3.2 The Diffusion Law Circularity

A "diffusion index" was defined as $D = -\log_{10} S$ . We claimed to discover an empirical law:

$D = \frac{1}{\sqrt{e}} \log_{10} |\text{Ш}| + C$

with slope matching the mathematical constant $1/\sqrt{e} \approx 0.6065$ . Forensic audit (February 27, 2026) revealed that every calibration data point was generated by this formula and then hardcoded into calibration scripts. The regression trivially recovered its own input. The reported $R^2 = 0.9999$ was an artifact of self-reference, as shown in Table 1.

Table 1: Circularity Verification — Hardcoded D values vs. formula prediction

| Curve | $\lvert\text{Ш}\rvert$ | $D_{\text{hardcoded}}$ | $\tfrac{1}{\sqrt{e}} \log_{10}\lvert\text{Ш}\rvert$ | $\lvert\Delta\rvert$ |
|-------|-------------|----------------------|------------------------------------------|----------|
| 165066.v1 (Leviathan) | 5625 | 2.2700 | 2.2746 | 0.005 |
| 287175.n1 (Titan) | 2500 | 2.0600 | 2.0610 | 0.001 |
| 146850.cb1 (Behemoth) | 2209 | 2.0300 | 2.0284 | 0.002 |
| 95438.c2 (Original) | 676 | 1.7100 | 1.7164 | 0.006 |

All calibration D values match $\tfrac{1}{\sqrt{e}} \cdot \log_{10}|\text{Ш}|$ within rounding error ( $|\Delta| < 0.007$ ), confirming that the "discovered" law was computed from $|\text{Ш}|$ , not measured independently. (A fifth data point, curve 165066.d3, showed $D = 2.50$ versus predicted 1.87 — see §6 for discussion of its ambiguous provenance.)
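The check in Table 1 can be repeated in a few lines. This sketch recomputes $\tfrac{1}{\sqrt{e}} \log_{10}|\text{Ш}|$ for the four tabulated curves and compares it with the hardcoded $D$ values; the variable names are illustrative, and the inputs are copied from the table.

```python
import math

SLOPE = 1 / math.sqrt(math.e)  # approximately 0.6065

# (Cremona label, |Sha|, hardcoded D) copied from Table 1.
rows = [
    ("165066.v1", 5625, 2.27),
    ("287175.n1", 2500, 2.06),
    ("146850.cb1", 2209, 2.03),
    ("95438.c2", 676, 1.71),
]

# Absolute gap between each hardcoded D and the formula prediction.
deltas = {label: abs(d - SLOPE * math.log10(sha)) for label, sha, d in rows}
max_delta = max(deltas.values())

for label, delta in deltas.items():
    print(f"{label}: |delta| = {delta:.4f}")
```

A regression fitted to points generated this way can only return the slope that produced them.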

3.3 The v2 Pivot and Its Limitations

On December 9, 2025, we detected a tautology in a derived quantity $R_{\text{BSD}} = C/S$ , which collapsed to $\log(N)/\sqrt{N}$ — a function of conductor alone carrying no curve-specific information. This was explicitly documented as "a tautology" in project files (R_BSD_v2_BREAKTHROUGH.md).
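The collapse can be seen by direct cancellation, assuming $C$ has the form $|L'(E,1)| / (|L(E,1)| \cdot \sqrt{N})$ given for the analytic input in this section and $S$ is as in equation (3):

$R_{\text{BSD}} = \frac{C}{S} = \frac{|L'(E,1)| / (|L(E,1)| \cdot \sqrt{N})}{|L'(E,1)| / (|L(E,1)| \cdot \log N)} = \frac{\log N}{\sqrt{N}}$

Both L-function factors cancel, so the ratio depends on the conductor only.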

We pivoted to a v2 approach conceptually separating analytic and geometric inputs:

Analytic: $C(E) = |L'(E,1)| / (|L(E,1)| \cdot \sqrt{N})$
Geometric: $\alpha_{\text{BSD}}(E) = \Omega_E^+ \cdot \prod c_p / |E_{\text{tors}}|^2$

The conceptual design was sound. But the implementation (r_bsd_period_tamagawa_test.py) estimated $L'(E,1)$ using a synthetic function that depended on $|\text{Ш}|$ : "ghost" curves (those with $|\text{Ш}| > 1$ ) received artificially low $L'$ estimates, and "normal" curves (those with $|\text{Ш}| = 1$ ) received high ones. The reported "232× separation" was therefore still contaminated by circularity.

3.4 How the Circularity Persisted

The December 2025 tautology detection and v2 pivot were documented in the project repository. However, when the main paper was revised for submission in February 2026, the AI assistant working in a new session context did not access the v2 documentation. It reverted to the original calibration data, producing paper v2.3 with circular claims intact.

This illustrates a failure mode specific to AI-assisted workflows: corrections made in one session can be silently lost when a new session starts with different context. The fix existed in the codebase but was invisible to the drafting process because the AI did not re-read the correction files.

4. The Corrected Analysis

4.1 Computing Real $L'(E,1)$

To break the circularity, we computed $L(E,1)$ and $L'(E,1)$ directly from L-series data using PARI/GP (version 2.15 [3]) with default long precision ( $\sim 38$ significant digits). For each of 1,198 rank-0 curves (§2.3), we:

Constructed the elliptic curve from its a-invariants via ellinit()
Computed $L(E,1)$ via lfun(E, 1) — the uncompleted Hasse-Weil L-function at $s = 1$
Computed $L'(E,1)$ via lfun(E, 1, 1) — the first derivative at $s = 1$

PARI/GP's lfun computes L-values from the Dirichlet series coefficients of $L(E,s)$ , using the functional equation and inverse Mellin transforms. The computation does not use $\Omega$ , Tamagawa numbers, torsion, or $|\text{Ш}|$ at any stage, so it is independent of the BSD invariants.

Numerical consistency check. We compared $L_{\text{PARI}} = \texttt{lfun}(E, 1)$ against $L_{\text{BSD}} = \alpha_{\text{BSD}} \cdot |\text{Ш}|$ computed from Cremona's tabulated values. Agreement to $< 10^{-6}$ : 1,166 out of 1,198 curves (97.3%). The 32 discrepant curves (2.7%) showed deviations of $10^{-6}$ to $10^{-3}$ , attributable to precision differences between Cremona's tabulated $\Omega$ values and our PARI/GP recomputation. No curve exceeded $10^{-3}$ discrepancy.

Important caveat on this check. This comparison is not an independent test of the BSD formula, since Cremona's tabulated $|\text{Ш}|$ values are themselves computed via equation (1). What the comparison validates is narrower: (i) our PARI/GP pipeline computes the same $L(E,1)$ as Cremona's pipeline, and (ii) no data corruption or parsing errors occurred. The theoretical validity of (1) rests on the results cited in §2.1, not on this numerical check.

4.2 The L-Derivative Does Not Distinguish Ghost Curves

We computed the normalized logarithmic derivative for all 1,198 curves:

$S_{\text{real}}(E) = \frac{|L'(E,1)|}{|L(E,1)| \cdot \log N_E} \tag{5}$

The normalization by $\log N_E$ was inherited from the original (circular) metric definition (3). The choice of $\log N$ (rather than $\sqrt{N}$ or $N$ itself) does not affect the key finding, which concerns the relative behavior between ghost and normal curves, not the absolute value.

Result: $S_{\text{real}}$ is tightly concentrated for all rank-0 curves regardless of $|\text{Ш}|$ :

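| Statistic | All curves ( $n = 1{,}198$ ) | Ghost ( $n = 998$ ) | Normal ( $n = 200$ ) |
|-----------|------------------------------|---------------------|----------------------|
| Median | 0.285 | 0.286 | 0.280 |
| IQR | 0.018 | - | - |
| Std. dev. | 0.021 | - | - |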

The ghost and normal medians differ by only 2% ( $0.286 / 0.280 = 1.02$ ). The interquartile range of 0.018 and the standard deviation of 0.021, on a median of 0.285, indicate a relative dispersion of roughly 6–7%. This tight concentration holds regardless of $|\text{Ш}|$ : curves with $|\text{Ш}| = 4$ and $|\text{Ш}| = 676$ show statistically indistinguishable $S_{\text{real}}$ distributions.

Similarly, the instability measure $C(E) = |L'(E,1)| / (|L(E,1)| \cdot \sqrt{N_E})$ shows only a small difference between classes (ghost median $C = 1.15 \times 10^{-2}$ , normal median $C = 1.26 \times 10^{-2}$ , ratio $0.91\times$ ). A Mann-Whitney $U$ test detects this difference as statistically significant ( $p = 4 \times 10^{-4}$ ) but with a small effect size (rank-biserial $r = 0.16$ ). By contrast, the $\alpha_{\text{BSD}}$ separation has $p = 9.3 \times 10^{-107}$ and $r = 0.98$ . The L-derivative carries a weak statistical signal, but it is far too small to be useful for detection; the entire practical separation comes from $\alpha_{\text{BSD}}$ .

Why not? Heuristically, $L'(E,1)$ measures the slope of the L-function at the central point. For a rank-0 curve, $L(E,1) \neq 0$ and the L-function is smooth near $s = 1$ ; the slope is determined by the functional equation and the distribution of Fourier coefficients $a_p(E)$ , which do not know about $|\text{Ш}|$ directly. The BSD formula constrains only the value $L(E,1)$ , not its derivative.

4.3 The $\alpha_{\text{BSD}}$ Screening Method

Despite the failure of the analytic approach, a genuine signal exists in the geometric side. The BSD factor $\alpha_{\text{BSD}}(E)$ from equation (2) is computable from a curve's minimal Weierstrass model and local reduction data, without any knowledge of $|\text{Ш}|$ , $L(E,1)$ , or $L'(E,1)$ .

Our 1,198-curve dataset reveals:

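| Class | $n$ | Median $\alpha_{\text{BSD}}$ | Q1 – Q3 | Median $C$ |
|-------|-----|------------------------------|---------|------------|
| Ghost ( $\lvert\text{Ш}\rvert > 1$ ) | 998 | 0.052 | 0.033 – 0.081 | $1.15 \times 10^{-2}$ |
| Normal ( $\lvert\text{Ш}\rvert = 1$ ) | 200 | 1.691 | 0.933 – 2.854 | $1.26 \times 10^{-2}$ |
| Separation ratio | | 32.2× | | 0.91× |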

The distributions are well-separated: the ghost IQR (0.033–0.081) does not overlap with the normal IQR (0.933–2.854). A two-sample Kolmogorov-Smirnov test confirms the separation ( $D = 0.967$ , $p = 3.0 \times 10^{-194}$ ). The Mann-Whitney $U$ test gives $U = 1{,}930$ , $p = 9.3 \times 10^{-107}$ , with rank-biserial effect size $r = 0.98$ — near-perfect rank separation. The entire practical separation comes from $\alpha_{\text{BSD}}$ ; $C$ carries only a negligible signal (§4.2).
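The reported effect size can be related to the $U$ statistic with a short sketch. It uses one common convention for the rank-biserial correlation, $r = 1 - 2U/(n_1 n_2)$ , and assumes the reported $U = 1{,}930$ is the smaller of the two possible $U$ values; the function names are illustrative.

```python
def mann_whitney_u(xs, ys):
    """U statistic for xs: pairs (x, y) with x > y, ties counted as 1/2."""
    return sum((x > y) + 0.5 * (x == y) for x in xs for y in ys)


def rank_biserial(u, n1, n2):
    """Rank-biserial correlation under the convention r = 1 - 2U / (n1 * n2)."""
    return 1 - 2 * u / (n1 * n2)


# Effect size implied by the reported U with 998 ghost and 200 normal curves.
r_reported = rank_biserial(1930, 998, 200)

# Toy samples with complete separation (illustrative values only).
u_toy = mann_whitney_u([0.03, 0.05, 0.08], [0.9, 1.7, 2.9])
r_toy = rank_biserial(u_toy, 3, 3)
print(round(r_reported, 3), u_toy, r_toy)
```

With $998 \times 200 = 199{,}600$ ghost-normal pairs, $U = 1{,}930$ means that under this convention roughly 1% of pairs are ordered against the overall trend.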

4.4 Why $\alpha_{\text{BSD}}$ Works — and Its Limitations

The mechanism is equation (1) read in reverse (recall $c_E = 1$ throughout, see §2.1):

$|\text{Ш}| = \frac{L(E,1)}{\alpha_{\text{BSD}}} \tag{6}$

We observe that $L(E,1)$ is moderate for rank-0 curves:

Ghost median $L(E,1) = 3.48$ ; normal median $L(E,1) = 2.21$
Ghost mean $L(E,1) = 4.07$ ; normal mean $L(E,1) = 2.54$

Ghost curves have larger $L(E,1)$ (because $|\text{Ш}| \gg 1$ more than compensates for small $\alpha_{\text{BSD}}$ ), but the variation is modest — within a factor of 2. Meanwhile, $\alpha_{\text{BSD}}$ varies by a factor of 32 between classes. So $\alpha_{\text{BSD}}$ carries most of the information about $|\text{Ш}|$ that the BSD formula provides.

This is not a new theoretical insight. It is equation (1) deployed as a computational screen. The practical value is cost asymmetry:

Computing $\alpha_{\text{BSD}}$ requires: period integration ( $\Omega_E^+$ , fast from the Weierstrass model), Tate's algorithm ( $c_p$ , fast at finitely many primes), and a torsion computation ( $|E_{\text{tors}}|$ , fast, with output bounded by Mazur's theorem).
Verifying $|\text{Ш}|$ requires: computing $L(E,1)$ to high precision and solving (6), or performing explicit descent.

For large-scale surveys, $\alpha_{\text{BSD}} < 0.2$ is a fast filter that identifies candidate curves before expensive verification.

Important caveat. Because $\alpha_{\text{BSD}}$ screening is BSD inversion, it is not independent of the BSD formula. In our setting (rank-0 curves, conductor $\leq 500{,}000$ ), the BSD leading-term formula has been verified numerically for all curves in Cremona's tables with Manin constant $c_E = 1$ confirmed throughout, so this is not a practical limitation. For curves outside this verified range, $\alpha_{\text{BSD}}$ screening would be conditional on BSD.

4.5 Detection Performance

Using $\alpha_{\text{BSD}} < \tau$ as a threshold classifier on the 1,198-curve test set:

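| Threshold $\tau$ | TP | FP | FN | TN | Precision (95% CI) | Recall (95% CI) |
|------------------|----|----|----|----|--------------------|-----------------|
| 0.5 | 998 | 16 | 0 | 184 | 98.4% (97.5–99.0%) | 100.0% (99.6–100%) |
| 0.2 | 982 | 5 | 16 | 195 | 99.5% (98.9–99.8%) | 98.4% (97.5–99.0%) |
| 0.1 | 838 | 2 | 160 | 198 | 99.8% (99.2–100%) | 83.9% (81.5–86.1%) |
| 0.05 | 471 | 2 | 527 | 198 | 99.6% (98.6–99.9%) | 47.2% (44.1–50.3%) |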

Confidence intervals are Wilson score intervals at the 95% level. At the $\tau = 0.2$ operating point, the classifier flags 987 curves, of which 982 genuinely have $|\text{Ш}| > 1$ . The 5 false positives (normal curves with $\alpha_{\text{BSD}} < 0.2$ ) merit individual investigation to determine whether they have unusual geometric structure.

Note on threshold selection. The threshold $\tau = 0.2$ was not optimized on this dataset. It was chosen as a round number that falls in the gap between the ghost and normal $\alpha_{\text{BSD}}$ distributions. Any threshold between 0.1 and 0.5 yields precision above 98%.
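The precision, recall, and Wilson intervals in the table can be recomputed from the confusion counts. The sketch below uses the standard Wilson score formula with the $\tau = 0.2$ row as input; the helper names are hypothetical, and the resulting intervals should land close to the tabulated ones, though the last digit may differ depending on the rounding and the exact variant used.

```python
from math import sqrt

def wilson_interval(successes, trials, z=1.959964):
    """Wilson score interval for a binomial proportion (default: 95% level)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    centre = (p_hat + z2 / (2 * trials)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / trials + z2 / (4 * trials ** 2)) / denom
    return centre - half, centre + half

def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)

# Confusion counts from the tau = 0.2 row of the table above.
tp, fp, fn = 982, 5, 16
precision, recall = precision_recall(tp, fp, fn)
precision_ci = wilson_interval(tp, tp + fp)
recall_ci = wilson_interval(tp, tp + fn)
```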

5. Empirical Findings from 1.9 Million Curves

The following results use the full survey of 1,900,161 curves (§2.3) and are independent of the $\alpha_{\text{BSD}}$ screening analysis. They are direct observations from the database.

5.1 The Tail Law

Among the 1,900,161 curves, 256,199 have $|\text{Ш}| > 1$ . All observed $|\text{Ш}|$ values are perfect squares, consistent with the Cassels-Tate alternating pairing (which requires $|\text{Ш}|$ to be a perfect square when finite). We observe 38 distinct values of $|\text{Ш}| > 1$ , ranging from 4 ( $= 2^2$ ) to 5625 ( $= 75^2$ ).

For $|\text{Ш}| \geq 4$ , we fit a power law via ordinary least squares on the $\log_{10}$ - $\log_{10}$ scale:

$\log_{10} \text{count}(|\text{Ш}| = k) = -\hat{\alpha} \cdot \log_{10} k + \hat{\beta} \tag{7}$

yielding $\hat{\alpha} = 2.02 \pm 0.07$ (standard error), $R^2 = 0.955$ , $p < 10^{-15}$ , on $n = 38$ data points.
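For readers who want to reproduce a fit of the form (7), the following is a minimal pure-Python sketch of ordinary least squares on the log-log scale. It is not the project's tail_law_analysis.py; the function name `fit_power_law` is hypothetical, and the input is synthetic data drawn exactly from a $k^{-2}$ law on 38 perfect squares, used only to check that the estimator recovers a known exponent.

```python
from math import log10, sqrt

def fit_power_law(values, counts):
    """OLS fit of log10(count) = -alpha * log10(k) + beta.

    Returns (alpha_hat, beta_hat, standard error of the slope, R^2).
    """
    xs = [log10(k) for k in values]
    ys = [log10(c) for c in counts]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    slope_se = sqrt(ss_res / (n - 2) / sxx)
    r_squared = 1.0 - ss_res / ss_tot
    return -slope, intercept, slope_se, r_squared

# Synthetic check only (NOT the survey data): counts follow k^-2 exactly
# on the 38 perfect squares 4, 9, ..., 1521.
synthetic_values = [m * m for m in range(2, 40)]
synthetic_counts = [1e9 * k ** -2.0 for k in synthetic_values]
alpha_hat, beta_hat, slope_se, r_squared = fit_power_law(
    synthetic_values, synthetic_counts
)
```

OLS on log-counts treats every distinct $|\text{Ш}|$ value as one equally weighted point, regardless of how many curves it represents; a likelihood-based fit would weight them differently.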

Model comparison. We compared the power law (2 parameters) against a quadratic model on $\log$ - $\log$ scale (3 parameters) via the Bayesian Information Criterion. The quadratic model gives a marginally lower BIC ( $\Delta\text{BIC} = -2.1$ ), indicating that the two models are comparable — the data does not strongly distinguish a pure power law from a curved log-log relationship. We report the power-law exponent as a convenient summary statistic, not as a claim of exact power-law behavior.

Cutoff sensitivity. The exponent varies with the lower $|\text{Ш}|$ cutoff:

Cutoff	$\hat{\alpha}$	SE	$R^2$	$n$ points
$\geq 4$	2.02	0.07	0.955	38
$\geq 9$	2.03	0.08	0.945	37
$\geq 25$	2.00	0.10	0.922	35
$\geq 49$	1.94	0.12	0.895	33
$\geq 100$	1.77	0.14	0.845	30

The exponent is stable ( $\hat{\alpha} \approx 2.0$ ) for cutoffs up to $|\text{Ш}| \geq 25$ , then decreases to $\sim 1.77$ at cutoff $\geq 100$ . The decreasing exponent at higher cutoffs indicates that the extreme tail is heavier than the body of the distribution — there are more very-large- $|\text{Ш}|$ curves than a $k^{-2}$ power law would predict. (An earlier version of this paper reported $\hat{\alpha} = 1.76$ based on a cutoff $\geq 100$ subset; the full-range fit gives $\hat{\alpha} = 2.02$ .)

The convergence threshold. The exponent $\hat{\alpha} \approx 2$ is theoretically significant. If $\text{count}(|\text{Ш}| = k) \sim k^{-\alpha}$ across $N$ curves, with the tail extending over all integers $k$ , then the population mean $\mathbb{E}[|\text{Ш}|] = N^{-1} \sum_k k \cdot \text{count}(k)$ converges if and only if $\alpha > 2$ . The estimate $\hat{\alpha} = 2.02 \pm 0.07$ lies within one standard error of this critical value, so the data place the distribution at the threshold without deciding on which side it falls. (In any finite sample the mean exists; the threshold concerns the idealized infinite-tail model.) This is consistent with Delaunay's extension of the Cohen-Lenstra heuristics [7] to Tate-Shafarevich groups, which predicts finite but potentially large moments for rank-0 families, and with the Bhargava-Kane-Lenstra-Poonen-Rains framework [8] for modeling $|\text{Ш}|$ distributions. A detailed quantitative comparison of our frequency data against the specific Delaunay predictions (which involve prime-by-prime contributions to $|\text{Ш}|$ rather than a simple power law) is left for future work.
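The threshold can be written out explicitly. The following is a sketch under the idealized assumption that the fitted law extends to an infinite tail; the constant $C$ and the index $m$ are notation introduced here for illustration. It also records how the condition changes if the same law is supported only on perfect squares $k = m^2$ , which is the support observed in §5.1.

```latex
% Infinite tail over all integers k, with count(k) = C k^{-\alpha}:
\sum_{k \ge 1} k \cdot C k^{-\alpha}
  = C \sum_{k \ge 1} k^{1-\alpha} < \infty
  \iff \alpha > 2 .

% Same law supported only on perfect squares k = m^2:
\sum_{m \ge 1} m^{2} \cdot C m^{-2\alpha}
  = C \sum_{m \ge 1} m^{2-2\alpha} < \infty
  \iff 2\alpha - 2 > 1
  \iff \alpha > \tfrac{3}{2} .
```

Under the first reading, $\hat{\alpha} = 2.02$ sits at the critical value; under the second, it lies above it. Which reading applies depends on whether the exponent in (7) is interpreted as a density over all integers or as a frequency law on the squares alone.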

5.2 The No-Go Region

In the $(N, |\text{Ш}|)$ plane, we observe 908 curves with $|\text{Ш}| > 100$ out of 1,900,161 total. The global maximum is $|\text{Ш}| = 5625 = 75^2$ at conductor $N = 165{,}066$ (Cremona label 165066.v1, the "Leviathan"), well below the database's conductor frontier.

5.3 The Ragged Frontier

Partitioning the data into conductor bands reveals a non-monotonic pattern in maximal $|\text{Ш}|$ :

| Band | Conductor Range | Max $\sqrt{\lvert\text{Ш}\rvert}$ | Curves in Band |
|------|-----------------|----------------------|----------------|
| 1 | $10{,}000$ – $100{,}000$ | 26 | 711,857 |
| 2 | $100{,}000$ – $150{,}000$ | 47 | 316,708 |
| Peak | $150{,}000$ – $200{,}000$ | 75 | 116,515 |
| 3 | $200{,}000$ – $300{,}000$ | 50 | 228,539 |
| 4 | $300{,}000$ – $500{,}000$ | 43 | 435,991 |
| Sparse | $500{,}000$ – $1{,}000{,}000$ | 14 | 4,754 |

Band 4 has lower maximal $|\text{Ш}|$ than Band 3, and the sparse band (where LMFDB coverage drops to 4,754 curves) shows a dramatic decline. Within the well-covered range (Bands 1–4, covering 1.69M curves), the non-monotonic pattern is robust to band boundary choices.

Caveat on the sparse band. The drop at conductors $> 500{,}000$ is likely dominated by database incompleteness (4,754 curves versus 100,000+ in other bands). We do not claim a physical decline in maximal $|\text{Ш}|$ at high conductors. The frontier pattern in Bands 1–4 uses conductor ranges with dense coverage.
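The band-by-band maximum can be computed with a simple grouping pass. The sketch below is illustrative only: the helper names are hypothetical, the treatment of band edges (lower bound inclusive, upper bound exclusive) is an assumption since the text does not specify it, and the input is just the four curves named in §5.4 rather than the full survey.

```python
from math import isqrt

# Band edges from the table above; ASSUMPTION: lower inclusive, upper exclusive.
BANDS = [
    ("1", 10_000, 100_000),
    ("2", 100_000, 150_000),
    ("Peak", 150_000, 200_000),
    ("3", 200_000, 300_000),
    ("4", 300_000, 500_000),
    ("Sparse", 500_000, 1_000_000),
]

def band_of(conductor):
    """Return the band name containing this conductor, or None."""
    for name, low, high in BANDS:
        if low <= conductor < high:
            return name
    return None

def max_sqrt_sha_by_band(curves):
    """curves: iterable of (conductor, sha_order) pairs, sha_order a perfect square."""
    best = {}
    for conductor, sha_order in curves:
        name = band_of(conductor)
        if name is not None:
            best[name] = max(best.get(name, 0), isqrt(sha_order))
    return best

# Toy input: only the four curves listed in §5.4, not the 1.9M-curve survey.
sample = [(165_066, 5625), (287_175, 2500), (146_850, 2209), (95_438, 676)]
frontier = max_sqrt_sha_by_band(sample)
```

Re-running the same function with shifted band edges is one way to probe how sensitive the non-monotonic pattern is to the choice of boundaries.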

Note on priority. The individual monster curves (including the Leviathan) are previously known from Cremona's tables. Our contribution is the systematic band-by-band survey revealing the non-monotonic frontier structure.

5.4 The Monster Catalog

The four principal high- $|\text{Ш}|$ curves in our dataset:

| Curve | Conductor | $\lvert\text{Ш}\rvert$ | $\sqrt{\lvert\text{Ш}\rvert}$ | $\alpha_{\text{BSD}}$ | $L(E,1)$ |
|-------|-----------|-------------|---------------------|-------------------|-----------|
| 165066.v1 (Leviathan) | 165,066 | 5625 | 75 | 0.054 | 306.6 |
| 287175.n1 (Titan) | 287,175 | 2500 | 50 | not reported | not reported |
| 146850.cb1 (Behemoth) | 146,850 | 2209 | 47 | not reported | not reported |
| 95438.c2 (Original) | 95,438 | 676 | 26 | not reported | not reported |

All have $\alpha_{\text{BSD}} < 0.1$ , consistent with the screening threshold in §4.5.

6. The d3 Anomaly — An Open Question

Among the calibration data points examined in our forensic audit (§3.2), curve 165066.d3 was the sole case where the hardcoded diffusion value ( $D = 2.50$ ) diverged from the $|\text{Ш}|$ -based formula ( $\tfrac{1}{\sqrt{e}} \log_{10}(1225) = 1.87$ , deviation 0.63). All other calibration values matched the formula exactly.

We present this anomaly for completeness but with strong caveats:

Provenance is unverified. We cannot reconstruct how the d3 measurement $D = 2.50$ was originally computed. It may have been an independent measurement, a typo, or a different formula. No code or log file documents its origin.
The original D metric is meaningless. Even if $D = 2.50$ was independently measured, it was measuring a quantity ( $D = -\log_{10} S$ where $S$ is the circular metric) that does not correspond to any well-defined analytic property of the curve.
The twist observation is separable. Independently of the D anomaly, the quadratic twist of 165066.d3 yields a curve at conductor $5{,}282{,}112$ with $|\text{Ш}| = 1225 = 35^2$ — a "ghost breeding" phenomenon where a low- $|\text{Ш}|$ base curve produces a high- $|\text{Ш}|$ twist. This is a potentially interesting observation but requires independent verification beyond current database ranges.

We flag this as an open question rather than a finding.

7. Discussion

7.1 What $\alpha_{\text{BSD}}$ Screening Offers

The $\alpha_{\text{BSD}}$ screen is the BSD formula deployed as a filter: compute everything except $|\text{Ш}|$ , and flag curves where the remaining factors force $|\text{Ш}|$ to be large (assuming BSD). This is useful in scenarios where:

One has access to period and Tamagawa data (e.g., from Cremona-style tables or fast recomputation) but not to verified $|\text{Ш}|$ values
One is extending databases to higher conductors and wants to prioritize verification effort
One is studying a family of curves and wants to identify members with interesting arithmetic without computing $L(E,1)$

It does not provide information about $|\text{Ш}|$ beyond what BSD already encodes. It is a practical tool, not a theoretical advance.

7.2 The Tail Law and Delaunay Heuristics

The observed exponent $\hat{\alpha} \approx 2.02$ places the $|\text{Ш}|$ frequency distribution at the threshold where $\mathbb{E}[|\text{Ш}|]$ transitions from divergent ( $\alpha \leq 2$ ) to convergent ( $\alpha > 2$ ). This is a potentially meaningful coincidence in the context of the Delaunay heuristics [7], which extend the Cohen-Lenstra philosophy from class groups of number fields to Tate-Shafarevich groups of elliptic curves.

Delaunay's framework predicts specific probabilities for $|\text{Ш}|$ values based on prime-by-prime contributions from the alternating pairing structure of $\text{Ш}$ . These predictions are more structured than a simple power law — they involve products over prime divisors of $|\text{Ш}|$ analogous to the Cohen-Lenstra weights for class numbers. Whether our observed frequency distribution (which is well-approximated by $k^{-2}$ over three orders of magnitude) is consistent with, or deviates from, the Delaunay predictions at the level of individual $|\text{Ш}|$ values is an open question.

The heavier tail at large $|\text{Ш}|$ (exponent dropping from 2.02 at cutoff $\geq 4$ to 1.77 at cutoff $\geq 100$ ) suggests the largest $|\text{Ш}|$ values occur more frequently than a pure $k^{-2}$ law predicts. This tail enrichment may reflect specific arithmetic mechanisms (e.g., particular conductor factorizations or isogeny structures) that favor extreme $|\text{Ш}|$ growth.

The exponent may also depend on how curves are ordered (by conductor, height, or discriminant) and on the conductor range sampled. Our measurement covers curves ordered by conductor up to $\sim 10^6$ .

7.3 The Ragged Frontier and Conductor Geometry

The non-monotonic frontier challenges naive expectations. One might expect that as the conductor grows, more "room" exists for large $|\text{Ш}|$ . Instead, the data suggest that conductor factorization structure — not just magnitude — influences the attainable $|\text{Ш}|$ values.

We note that LMFDB coverage is highly non-uniform above conductor $500{,}000$ , which dominates the observed drop in the sparse band. The frontier pattern in Bands 1–4 (where coverage exceeds 100,000 curves per band) is more reliable.

7.4 Verification Protocols for AI-Assisted Research

Our experience suggests three practical protocols:

Protocol 1: Numerical spot-checks. Before accepting any claimed empirical law, substitute concrete numbers into every formula in the pipeline and verify the output against independently computed values. In our case, checking $D = \tfrac{1}{\sqrt{e}} \log_{10}(5625) = 2.2746$ against the hardcoded $D = 2.27$ would have revealed the circularity immediately.

Protocol 2: Physical column deletion. When building a detector for property X, physically remove column X from the input data file. If the code fails to run, X was in the pipeline. Do not rely on variable renaming or abstraction layers — these are transparent to the computation.

Protocol 3: Correction persistence. When an AI session discovers a critical error, write the correction into a file that future sessions will encounter by default (e.g., a project rule file, a header warning in the main document, or a configuration that loads automatically). Corrections documented only in subsidiary files will be silently lost when new sessions begin with different context windows.
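Protocol 2 can be illustrated with a small self-contained sketch. The column names (`label`, `alpha_bsd`, `sha`) and both detector functions are hypothetical stand-ins, not the project's actual pipeline; the point is only that a detector which secretly reads the target column fails loudly once that column is physically gone.

```python
import csv
import io

def drop_column(csv_text, column):
    """Return CSV text with `column` physically removed from every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [name for name in reader.fieldnames if name != column]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=kept, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow({name: row[name] for name in kept})
    return out.getvalue()

def honest_detector(row):
    # Uses only alpha_bsd, so it still runs after `sha` is deleted.
    return float(row["alpha_bsd"]) < 0.2

def leaky_detector(row):
    # Reads the target column; raises KeyError after deletion.
    return float(row["sha"]) > 1

raw = "label,alpha_bsd,sha\n165066.v1,0.054,5625\n"
stripped = drop_column(raw, "sha")
rows = list(csv.DictReader(io.StringIO(stripped)))
```

Running each detector on `rows` distinguishes the two cases: the honest one returns a result, while the leaky one raises `KeyError`, which is exactly the failure the protocol is designed to provoke.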

7.5 Limitations

Conductor range. Our PARI/GP test set covers conductors 106–129,850. The $\alpha_{\text{BSD}}$ screening performance at higher conductors (where $L(E,1)$ distributions may differ) is untested.
BSD dependence. The $\alpha_{\text{BSD}}$ method assumes BSD holds. In our setting (rank-0 curves, conductor $\leq 500{,}000$ , verified Manin constant), equation (1) has been verified numerically for every curve. For curves outside this range, the assumption is conditional on BSD.
Control group size. Our test set used 200 normal versus 998 ghost curves. The precision estimate (which depends on the false positive rate among normals) has wider confidence intervals than the recall estimate. A larger control sample would tighten the precision bounds.
Class balance. The 998:200 ghost-to-normal ratio in our test set does not reflect the population ratio (where $|\text{Ш}| = 1$ is dominant). In a population-representative sample, the false positive rate would remain the same, but false positives would be far more numerous relative to true positives, so the precision would fall below the 99.5% measured here.
Tail law model selection. BIC comparison between power-law and quadratic models is marginal ( $\Delta\text{BIC} = -2.1$ ). More sophisticated alternatives (log-normal, stretched exponential, or Delaunay-derived distributions) were not tested.
Single dataset. All observations are from Cremona's tables / LMFDB. Independent verification using differently computed databases would strengthen the findings.
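The class-balance limitation can be made concrete with a base-rate calculation. The sketch below applies Bayes' rule to the $\tau = 0.2$ rates from §4.5. The function name is hypothetical, and the survey prevalence (256,199 of 1,900,161 curves from §5.1) is used purely for illustration: it assumes the test-set true and false positive rates transfer unchanged to that population, which the paper has not tested.

```python
def population_precision(tpr, fpr, prevalence):
    """Precision implied by Bayes' rule for a given positive-class prevalence."""
    true_pos = tpr * prevalence
    false_pos = fpr * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Rates at tau = 0.2 from the table in §4.5.
tpr = 982 / 998   # recall among |Sha| > 1 curves
fpr = 5 / 200     # false positive rate among |Sha| = 1 controls

# Sanity check: the enriched test-set prevalence reproduces 982/987.
test_set_precision = population_precision(tpr, fpr, 998 / 1198)

# Illustrative prevalence: share of |Sha| > 1 curves in the §5.1 survey.
# ASSUMPTION: tpr and fpr carry over unchanged to this population.
survey_prevalence = 256_199 / 1_900_161
survey_precision = population_precision(tpr, fpr, survey_prevalence)
```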

8. Conclusion

The original "Ghost Rank" stability metric and "spectral diffusion law" do not survive independent verification. With L-series derivatives computed via PARI/GP, the normalized ratio $|L'(E,1)|/|L(E,1)|$ is tightly concentrated (IQR $= 0.018$ ) across all rank-0 curves, showing no dependence on $|\text{Ш}|$ .
A genuine screening signal exists in $\alpha_{\text{BSD}} = \Omega_E^+ \cdot \prod c_p / |E_{\text{tors}}|^2$ . At threshold $\alpha < 0.2$ , this achieves 99.5% precision (95% CI: 98.9–99.8%) and 98.4% recall (95% CI: 97.5–99.0%) for detecting $|\text{Ш}| > 1$ on 1,198 curves with verified L-derivatives. The mechanism is BSD inversion — a practical tool, not a theoretical discovery.
The frequency distribution of $|\text{Ш}|$ follows an approximate power law with exponent $\hat{\alpha} = 2.02 \pm 0.07$ across 1.9 million curves ( $R^2 = 0.955$ , $n = 38$ unique values). The exponent sits at the convergence threshold for $\mathbb{E}[|\text{Ш}|]$ , consistent with Delaunay heuristics. The tail is enriched relative to a pure $k^{-2}$ law (exponent drops to $\sim 1.77$ for $|\text{Ш}| \geq 100$ ). The "ragged frontier" — non-monotonic growth of maximal $|\text{Ш}|$ with conductor — is a genuine empirical observation in the well-covered conductor range.
The forensic process of discovering and correcting circular reasoning in AI-assisted computational mathematics is itself a methodological contribution. The circularity was introduced by AI code generation, reinforced by AI analysis, and eventually caught by AI auditing under a systematic protocol. We propose three verification protocols (§7.4) for researchers working with AI tools.

Acknowledgments

This research used data from the LMFDB [2] and Cremona's elliptic curve database [1]. Computations were performed using PARI/GP [3] and Python. The author thanks the developers of these tools for making large-scale computational number theory accessible.

The honest account of methodological failure in §3 reflects the author's commitment to scientific transparency. The paper was refined through iterative AI review via TOE-Share (theoryofeverything.ai), which identified gaps in mathematical precision and statistical methodology addressed in this version.

References

[1] J.E. Cremona, Algorithms for Modular Elliptic Curves, 2nd ed., Cambridge University Press, 1997. Updated tables: https://johncremona.github.io/ecdata/

[2] The LMFDB Collaboration, The L-functions and Modular Forms DataBase, https://www.lmfdb.org, 2025.

[3] The PARI Group, PARI/GP version 2.15, Univ. Bordeaux, 2023. https://pari.math.u-bordeaux.fr/

[4] A. Wiles, Modular elliptic curves and Fermat's Last Theorem, Ann. Math. 141 (1995), 443–551.

[5] V.A. Kolyvagin, Finiteness of $E(\mathbb{Q})$ and $\text{Ш}(E/\mathbb{Q})$ for a subclass of Weil curves, Izv. Akad. Nauk SSSR 52 (1988), 522–540.

[6] H. Cohen and H.W. Lenstra, Heuristics on class groups of number fields, Lecture Notes in Math. 1068 (1984), 33–62.

[7] C. Delaunay, Heuristics on Tate-Shafarevitch groups of elliptic curves defined over $\mathbb{Q}$ , Experiment. Math. 10 (2001), 191–196.

[8] M. Bhargava, D.M. Kane, H.W. Lenstra, B. Poonen, E. Rains, Modeling the distribution of ranks, Selmer groups, and Shafarevich-Tate groups of elliptic curves, Cambridge J. Math. 3 (2015), 275–321.

Data Availability

All code and data are available at:

GitHub: https://github.com/wsuduce/ghost-rank (v3 update forthcoming)
Definitive test script: calibration/definitive_v2_test.py
PARI/GP L-derivative data: calibration/dual_metric_results.json (1,198 curves)
Re-analysis script: calibration/analyze_v2_verdict.py
Tail law analysis: calibration/tail_law_analysis.py (formal power-law fitting with BIC)

Appendix A: Circularity Verification

The script calibration/verify_d_values.py computes $\tfrac{1}{\sqrt{e}} \cdot \log_{10}(|\text{Ш}|)$ for each calibration monster and compares to the hardcoded $D$ value. All non-anomaly points match within $|\Delta| < 0.007$ , confirming that $D$ was computed from $|\text{Ш}|$ rather than measured independently.
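The check is easy to reproduce by hand. The following sketch is not the contents of calibration/verify_d_values.py; it is an independent illustration with hypothetical function names, using the two calibration points quoted in the text (the Leviathan value from §7.4 and the d3 anomaly from §6).

```python
from math import e, log10, sqrt

def d_from_sha(sha_order):
    """The |Sha|-based formula: (1/sqrt(e)) * log10(|Sha|)."""
    return log10(sha_order) / sqrt(e)

def matches_formula(hardcoded_d, sha_order, tol=0.007):
    """True when the hardcoded D agrees with the formula within tol."""
    return abs(hardcoded_d - d_from_sha(sha_order)) < tol

# Two calibration points quoted in the text.
leviathan_ok = matches_formula(2.27, 5625)   # §7.4, Protocol 1
d3_ok = matches_formula(2.50, 1225)          # §6, the d3 anomaly
d3_deviation = 2.50 - d_from_sha(1225)
```

A match within tolerance is evidence that the stored value was derived from $|\text{Ш}|$ ; a mismatch, as for d3, only shows that the value came from somewhere else.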

Appendix B: PARI/GP Computation Details

L-series values were computed via the PARI/GP commands:

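\\ a1, a2, a3, a4, a6 are the Weierstrass coefficients of the curve
E = ellinit([a1, a2, a3, a4, a6]);
L_val = lfun(E, 1);        \\ L(E,1)
L_prime = lfun(E, 1, 1);   \\ first derivative L'(E,1)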

The lfun function computes L-values via inverse Mellin transforms using the functional equation, with precision controlled by PARI's default realprecision setting (38 significant digits). Total batch computation time: approximately 6 minutes for 1,198 curves on a standard workstation (Intel i7, 16GB RAM, WSL2/Ubuntu).

Appendix C: Chronological Project Timeline

A complete listing of all project files and their modification dates (November 25, 2025 through February 27, 2026) is maintained in TIMELINE.md, documenting nine development phases from initial data collection through the final forensic audit.

"We built a machine to detect treasure. The machine was cheating. We caught it. We fixed it. There was still treasure."

math: gpt-5.2-2025-12-11

Internal 4/5, Mathematical 3/5

The core algebraic framework is logically coherent: the BSD leading-term identity for rank 0 motivates α_BSD, and inversion yields a plausible screening heuristic. The paper’s internal audit of circularity is mathematically sound; the reduction of the original stability metric to a function essentially proportional to 1/|Ш| (given placeholders) correctly diagnoses why the initial detector would appear perfect. Mathematically, the main weaknesses are around normalization and asymptotic interpretation. The BSD identity (Eq. (1)) is convention-dependent, and while the manuscript states conventions and provides numerical agreement checks, it does not fully justify that the PARI-computed L(E,1) aligns with the same normalization used in the BSD factors without relying on BSD-derived |Ш|. Separately, the “tail exponent near 2 implies mean at convergence threshold” is only meaningful under an infinite-tail model, whereas the observed data are finite and discretely supported; presenting the α≈2 threshold as theoretically decisive therefore overreaches without additional modeling justification.

+ Correct algebraic reduction identifying circularity: Eq. (3) together with reconstructing L(E,1) from BSD and setting L′≈1 indeed forces S ≈ 1/(|Ш| log N) (Eq. (4)), making “perfect separation” tautological.
+ Clear separation of definitions and inversion: α_BSD defined in Eq. (2) and inversion in Eq. (6) follow directly from Eq. (1) under the stated rank-0/no-regulator convention.
+ Appropriate logical caveat that the PARI vs BSD L(E,1) agreement check is not an independent verification of BSD because tabulated |Ш| is itself obtained via BSD.

- Normalization ambiguity: Eq. (1) is sensitive to conventions (completed vs uncompleted L-function; period normalization; Manin constant; real component factors). The paper asserts these are handled by adopting Cremona’s Ω_E^+ and c_E=1, but it does not mathematically demonstrate that PARI’s lfun(E,1) uses the same normalization as Eq. (1), beyond an empirical match that is itself conditioned on BSD-derived |Ш|.
- Mechanism claim depends on an unproven bound on L(E,1): §4.4 argues large |Ш| forces small α_BSD because L(E,1) is “moderate” and varies only ~2×; but §5.4 lists L(E,1)=306.6 for a large-|Ш| curve, showing L(E,1) can be very large. Without a proven typical/quantile statement tied to the screening population, the inference “small α_BSD ⇒ large |Ш|” (or its contrapositive) is not mathematically guaranteed.
- Tail exponent/mean-threshold statement is asymptotic but presented as if directly interpretable from a finite, truncated support with only 38 discrete |Ш| values. In a finite sample with hard maximum, E[|Ш|] always exists; the α=2 threshold is only meaningful for an idealized infinite-tail model and requires justification that such an asymptotic regime is plausible here.
- Power-law fitting via OLS on log-counts (Eq. (7)) is not mathematically justified as an estimator of a discrete tail exponent under heteroskedastic noise; no likelihood-based or discrete power-law model is derived, so the reported ±0.07 standard error is not clearly tied to a valid generative model.
- Implicit domain issue: the “precision/recall” screening results are computed on a heavily class-imbalanced and non-representative test set (998 ghosts, 200 normals). This is not a logical contradiction, but it weakens the mathematical interpretation of “99.5% precision” as a population precision without a base-rate correction.

math: claude-opus-4-20250514

Internal 5/5, Mathematical 5/5

This paper demonstrates exceptional mathematical rigor and internal consistency. The forensic analysis of the circular metric is mathematically sound, correctly identifying that S ≈ 1/(|Ш| · log N_E) trivially separates ghost and normal curves by embedding |Ш| in the denominator. The corrected analysis using PARI/GP is methodologically rigorous, computing L-derivatives independently from BSD invariants. The α_BSD screening method is mathematically valid as BSD formula inversion, with the mechanism clearly explained and limitations honestly acknowledged. The statistical analyses are exemplary: power law fitting includes error estimates and model comparison, detection performance includes proper confidence intervals, and all claims are supported by the data. The paper maintains logical coherence throughout, from the initial hypothesis about L-derivatives (which fails empirically) through the practical screening tool that emerges. This is a model of careful computational mathematics with transparent methodology.

+ Exceptional mathematical rigor in identifying and correcting the circular reasoning, with explicit verification that D values match (1/√e)·log₁₀|Ш| within 0.007 for all calibration points (Table 1)
+ Proper statistical methodology including confidence intervals (Wilson score), effect sizes (rank-biserial r = 0.98 for α_BSD separation), and model comparison (BIC for power law vs quadratic)
+ Clear mathematical mechanism for α_BSD screening via BSD inversion: |Ш| = L(E,1)/α_BSD (Equation 6) with empirical validation that L(E,1) remains moderate (median 2.2-3.5) while α_BSD varies 32× between classes

math: claude-sonnet-4-20250514

Internal 4/5, Mathematical 3/5

This paper demonstrates mixed mathematical rigor. The core algebraic content involving the BSD formula is mathematically sound, with correct manipulation of the identity |Ш| = L(E,1)/α_BSD and proper handling of elliptic curve invariants. The PARI/GP computational methodology appears valid for computing L-function values independently. However, several key mathematical steps lack justification: the choice of normalization factors in the stability metrics, the theoretical basis for power-law modeling of |Ш| frequencies, and the statistical model assumptions underlying the regression analysis. The paper would benefit from stronger mathematical foundations for its empirical modeling choices and more careful distinction between computational verification and theoretical proof. The internal logical consistency is generally strong, particularly in the forensic analysis of the circular reasoning, but the mathematical validity is compromised by unjustified methodological choices.

+ Correct application of the BSD formula (equation 1) with proper handling of the Manin constant and clear definition of all invariants
+ Sound algebraic manipulation showing |Ш| = L(E,1)/α_BSD from the BSD formula, providing the theoretical basis for the screening method
+ Consistent statistical methodology throughout with proper confidence interval reporting and effect size calculations

- Equation (3) normalization by log N_E lacks mathematical justification - why logarithmic rather than other scalings?
- Power law model in equation (7) applied without theoretical foundation for why |Ш| frequencies should follow power law distributions
- Statistical model assumptions (independence, normality of residuals) not verified for the OLS fitting procedure
- The claim that 'BSD formula has been verified numerically' conflates numerical verification with theoretical proof - this is a computational observation, not mathematical proof
- Missing derivation steps for how the 'normalized logarithmic derivative' metric was constructed and why it should detect anomalous L-function behavior

sources: claude-sonnet-4-20250514

Completeness 4/5

This paper demonstrates strong completeness in its mathematical exposition and empirical analysis. The authors provide comprehensive definitions of all BSD formula components, detailed computational methodology using PARI/GP, and explicit acknowledgment of limitations and assumptions. The forensic documentation of the circular reasoning failure is particularly thorough, providing a complete audit trail with specific numerical examples and timeline reconstruction. While minor gaps exist in justifying some methodological choices (like the log N normalization), the overall work is well-structured and addresses its stated goals of detecting large Tate-Shafarevich groups and documenting AI-assisted research pitfalls.

+ Comprehensive documentation of the circular reasoning failure with concrete numerical examples and timeline reconstruction
+ All key variables and formulas are properly defined before use, including precise BSD formula conventions and computational details
+ Explicit statement of limitations, assumptions, and scope (conductor ranges, BSD dependence, control group size, dataset boundaries)

- The normalization choice of log N_E in equation (5) is inherited from the original metric but not independently justified
- Some computational details like the specific precision thresholds for L-function agreement could be more rigorously specified
- The connection between the observed power law exponent and Delaunay heuristics is mentioned but not fully developed quantitatively

sources: gpt-5.2-2025-12-11

Completeness 4/5

As a paper, this is substantially complete in its internal logic and unusually transparent about a major methodological failure and its correction. Definitions and conventions are mostly explicit, the corrected computational check (PARI/GP lfun-based derivatives) is described at a level that a reader could replicate for the 1,198-curve subset, and the narrative consistently distinguishes the invalidated L-derivative detector from the surviving α_BSD screening signal. The main completeness weaknesses are in the large-scale empirical components: the 1.9M-curve survey pipeline is not fully specified in a way that pins down database completeness and selection, and the reported classifier performance is based on a deliberately enriched test set with unclear sampling details for the |Ш|=1 controls. The tail-law result is presented with sensible caveats (cutoff sensitivity, BIC comparison), but the statistical methodology is not yet fully supported for a strong distributional claim. With tighter dataset provenance, a population-representative evaluation (or explicit recalibration), and more robust tail fitting/inference, the paper would be close to fully complete for its empirical aims.

+ Explicit, concrete forensic accounting of circularity, including how it entered code and how it was detected, which materially supports the corrected claims.
+ Clear definition of BSD-related quantities and conventions (especially Ω_E^+ convention and Manin-constant handling), reducing ambiguity in computed invariants.
+ Limitations are enumerated in a dedicated section and generally align with the main claims (conductor range, BSD dependence, class imbalance, model selection).

- Dataset construction for the 1.9M-curve survey is under-specified (exact LMFDB snapshot/queries, de-duplication and isogeny-class handling, completeness vs availability), which directly affects tail-law and frontier conclusions.
- Classifier evaluation uses a highly non-representative test set (998 vs 200) with unclear control sampling mechanics; precision/FP rate may not transfer to the population without a more explicit sampling-bias analysis or a representative evaluation.
- Ambiguity whether α_BSD used for screening was recomputed from minimal models or taken from tables; if tabulated, the claimed low-cost “screening from geometry alone” is not empirically demonstrated in this paper.
- Tail-law inference relies on OLS on log-log counts with only 38 support points and likely heteroskedastic errors; additional robustness/alternative fits (discrete MLE, goodness-of-fit tests, or bootstrapped uncertainty) are needed to make the exponent claim well-supported.
- The stated mechanism that “L(E,1) remains moderate” is asserted with medians from the 1,198 set but not characterized over the full 1.9M population; if L(E,1) varies more at higher conductors, α_BSD threshold performance could degrade, and this interaction is not analyzed.

science: claude-opus-4-20250514

Clarity 5/5, Novelty 2/5, Falsifiability 4/5

This submission presents a masterclass in scientific transparency and clear communication, documenting both a significant methodological failure and subsequent valid findings. While the theoretical novelty is limited — the author explicitly acknowledges that α_BSD screening is just 'BSD inversion' — the work provides genuine value through careful empirical analysis of 1.9 million elliptic curves and practical screening tools with well-characterized performance. The power-law distribution of Tate-Shafarevich groups with exponent near the convergence threshold (α̂ = 2.02) is an interesting empirical observation that connects to theoretical predictions. The paper's greatest contribution may be its forensic documentation of how AI-assisted research can introduce and perpetuate circular reasoning, complete with practical protocols for avoiding such errors. This methodological case study is highly relevant as AI tools become standard in computational mathematics. The writing is exceptionally clear, with careful attention to notation, explicit statement of assumptions, and proper statistical methodology throughout.

+ Exemplary scientific integrity in documenting and correcting a fundamental error in the original analysis, providing a valuable case study for AI-assisted research pitfalls
+ Clear statistical methodology with proper confidence intervals, effect sizes, and model comparison via BIC
+ Practical computational tool (α_BSD screening) with well-characterized performance metrics that other researchers can immediately apply

- The main result (α_BSD screening) is admittedly just the BSD formula used in reverse, offering no new theoretical insight
- Some empirical observations like the 'ragged frontier' lack clear predictive power or theoretical explanation
- The paper's length and detailed error documentation, while admirably transparent, may obscure the relatively modest positive findings
- Heavy reliance on a single database (Cremona/LMFDB) without independent verification of the empirical patterns
- The connection between observed power-law exponent and theoretical predictions (Delaunay heuristics) is mentioned but not quantitatively developed


This review was conducted by TOE-Share's multi-agent AI specialist pipeline. Each dimension is independently evaluated by specialist agents (Math/Logic, Sources/Evidence, Science/Novelty), then synthesized by a coordinator agent. This methodology is aligned with the multi-model AI feedback approach validated in Thakkar et al., Nature Machine Intelligence 2026.

TOE-Share — theoryofeverything.ai