AI security researcher · Dreadnode

Full Contact Alignment

I build scalable oversight and control systems for autonomous agents operating in adversarial, high-consequence environments.


Shane Caldwell
Researcher & engineer

Latest research

July 2026 · arXiv

ScopeJudge: Cost-Aware Pre-Execution Gating for Offensive Security Agents

A benchmark of 4,897 offensive-security agent tool calls for evaluating how well runtime monitors detect out-of-scope actions as the amount of available context varies.

Notes from the work

Recent writing


Apr 20, 2026 · llms · 9 min read

No Autonomy Without Scalable Oversight

What to expect as we enter the Year of The Judge.

Mar 29, 2026 · llms · 8 min read

ProofJudge: Can we align vibe-proving with human taste?

Towards measuring alignment with human taste in autoformalization with judge agents.

Mar 14, 2026 · llms · 12 min read

The Tests All Pass

METR’s SWE-bench analysis shows us taste isn’t verifiable.

Jan 5, 2026 · llms · 31 min read

Intro to GPUs for the Researcher

Getting comfortable with the hardware on a quest for more MFU.

Browse by thread

Research themes

Evaluation & judges · Language models · Offensive security · Training systems
