Offsec Evals: Growing Up In The Dark Forest

If you contribute a public benchmark, are you giving free capability to your competitors?

October 28, 2025 · 11 min · 2258 words · Shane Caldwell

GPT-5 is Good, Actually: The Agony and Ecstasy of Public Benchmarks

An attempt to explain why benchmarks are either bad or secret, and why the bar charts don’t matter so much.

August 17, 2025 · 17 min · 3488 words · Shane Caldwell