Blog

Methodology notes, calibration updates, and product transparency.

2026-07-30 - Adam Murphy

GPT-5.6 Sol Proposed Solutions to Six Erdős Problems. We Reviewed One.

A Columbia PhD student used OpenAI GPT-5.6 Sol to propose solutions to six open Erdős problems in five days. We imported Problem 486 as a reference paper and ran a full multi-agent review. Score: 4.2/5. Here's what the report actually says.

2026-05-11 - Adam Murphy

TheoryOfEverything.ai Beta Is Live

TheoryOfEverything.ai is officially in public beta. We launched the founder video, shared the platform publicly, welcomed our first subscribers, and shipped the review, calibration, video, and trust layers that make the beta ready.

2026-05-07 - Adam Murphy

We Gave TOE-Share a Paper With a Known Error. It Found the Same Problem.

A physicist had already flagged a foundational error in a Proceedings of the Royal Society paper. We handed the original paper to our AI review system without telling it what to look for — and it independently landed on the same missing term.

2026-05-04 - Adam Murphy

AI Just Found a 20-Year-Old Error in a Major Physics Paper. Three of Our Reviews Flagged the Same Neighborhood.

Lean caught a 20-year-old error in a widely cited physics paper. We ran the same paper through TheoryOfEverything.ai three times, and every run flagged the exact equations where the flaw lives.

2026-03-24 - Adam Murphy

My Bet Is on the Automobile

Why the next breakthrough in physics won't come from a data center or a department — it'll come from the open road.

2026-03-20 - Adam Murphy

Five AI Models Wrote the Same Paper. Here's What Happened.

Same quantum-centric supercomputing prompt, five frontier models, one blind multi-agent review panel — a 1.9-point score spread, a dispute that held, and QCAB going from 3.3 to published in four revision cycles.

2026-03-18 - Adam Murphy

WSU Study Confirms What We Built TOE-Share to Fix

A Washington State University study found ChatGPT identifies false scientific claims correctly only 16.4% of the time. Here's why multi-agent architecture changes that equation.

2026-03-17 - Adam Murphy

Why We're Calibrating TOE-Share — And Why It Matters

Every measurement instrument needs calibration. We're applying that same discipline to AI-powered peer review — running papers of known quality through the system and publishing the results.

2026-03-16 - Adam Murphy

Introducing the TOE-Share Calibration Study

We're running a preregistered calibration study to demonstrate how our multi-agent AI review system performs across papers of known quality. Here's the methodology and what we're testing.
