GPT-5 is Good, Actually: The Agony and Ecstasy of Public Benchmarks

An attempt to explain why benchmarks are either bad or secret, and why the bar charts don’t matter so much.

August 17, 2025 · 17 min · 3488 words · Shane Caldwell

The Religious Devotion of Haskell

A brief(ish) anecdote and investigation into the religious devotion of Haskell programmers.

July 1, 2024 · 10 min · 2024 words · Shane Caldwell

The Input Sanitization Perspective on Prompt Injection

So, you mixed user input and instructions.

July 2, 2023 · 23 min · 4772 words · Shane Caldwell

Infosec's Data Problem

Exploring the unique challenges of doing real science in the world’s most paranoid industry.

June 2, 2022 · 7 min · 1298 words · Shane Caldwell

Deep Reinforcement Learning for Security: Toward an Autonomous Pentesting Agent

A manifesto on RL in cybersecurity, from when deep RL was the thing.

April 28, 2020 · 30 min · 6178 words · Shane Caldwell

An ML Eng's Review of OSCP

Because you shouldn’t try to automate anything you can’t do yourself.

April 26, 2020 · 16 min · 3343 words · Shane Caldwell