Twenty Billion Tokens of What, Exactly?
Looking at the data and letting it look back at us.
Optimizing training of a Llama 3.2 1B model so we can pretrain it in a day without going broke.
Distributed training sans datacenter.