Hackbot R&D

Optimizing Llama 3.2 1B training so we can pretrain it in a day without going broke.
If you contribute a public benchmark, are you giving free capability to your competitors?
Distributed training sans datacenter.
So, you mixed user input and instructions.
A manifesto on RL in cybersecurity, from when deep RL was the thing.
Because you shouldn’t try to automate anything you can’t do yourself.