Independent AI Analysis

What AI can actually do.
Not what the pitch says.

Objective benchmarks, first-hand experiments, and honest analysis across all major AI systems. Built for the technically curious — and for anyone making real decisions about AI.


Latest


No posts yet

Articles and experiments will appear here once published.

Recent Scores


Model               Benchmark   Score
GPT-4 Turbo         MMLU        86.5% ✓
Claude 3.5 Sonnet   MMLU        88.7% ✓
Claude 3 Opus       MMLU        86.8% ✓
Gemini 1.5 Pro      MMLU        85.9% ✓
GPT-4o              MMLU        88.7% ✓

Models Tracked

GPT-4o, GPT-4 Turbo, Claude 3.5 Sonnet, Claude 3 Opus, Claude 3.7 Sonnet, Gemini 1.5 Pro, Gemini 2.0 Flash, Llama 3.1 405B

About this project

The gap between AI hype and AI reality depends heavily on where you start. For someone outside tech, a demo can feel like going from 0→100. For engineers, the same demo often feels more like 10→15.

This site exists to quantify that gap — with real numbers and documented experiments anyone can scrutinize.
