Pretraining at home: 20B tokens, from 222 hours to 12

Optimizing training of a Llama 3.2 1B model so we can pretrain in a day without going broke.