GPT-5 is Good, Actually: The Agony and Ecstasy of Public Benchmarks
An attempt to explain why benchmarks are either bad or secret, and why the bar charts don’t matter so much.
A brief(ish) anecdote and investigation into the religious devotion of Haskell programmers.
So, you mixed user input and instructions.
Exploring the unique challenges of doing real science in the world’s most paranoid industry.
A manifesto on RL in cybersecurity, from when deep RL was the thing.
Because you shouldn’t try and automate anything you can’t do yourself.