About this project
The real AI story depends almost entirely on where you sit. For someone in finance or strategy who hasn't written code, the demos are genuinely astonishing: on a 0-to-100 scale, the leap feels like 0 to 100. For engineers and researchers who work with these systems daily, the practical ceiling looks very different, closer to 10 or 15 on the tasks that actually matter for their work.
Neither perception is dishonest. But the gap between them is driving a lot of decisions, on capital allocation, hiring, product bets, and policy, that are being made with incomplete signal.
This site is an attempt to provide a more complete signal. Not through advocacy in either direction, but through:
- Objective benchmarks — scores sourced from papers, reproducible runs, and community submissions, with clear sourcing and verification status.
- Subjective experiments — documented tests run against real systems, with full conversation replays so you can evaluate the methodology yourself.
- Honest analysis — essays that try to say what the data actually shows, including where it's limited or where the narrative outpaces the evidence.
The goal isn't to be bearish or bullish on AI. It's to give technically literate readers — and decision-makers who want to become more technically literate — a resource they can actually trust.
Have data to contribute or a methodology question? Use the discussion sections on any post, or submit benchmark data via the Submit page.