Hackbot R&D

METR’s SWE-bench analysis shows us taste isn’t verifiable.
Getting comfortable with the hardware on a quest for more MFU.
The practical realities of devestatingly high communication cost in training.
So, you mixed user input and instructions.
A manifesto on RL in cybersecurity, from when deep RL was the thing.
Because you shouldn’t try and automate anything you can’t do yourself.