Twenty Billion Tokens of What, Exactly?

Looking at the data and letting it look back at us.

December 1, 2025 · 23 min · 4761 words · Shane Caldwell

Pretraining at home: 20B tokens from 222 hours to 12

Optimizing the training of a Llama 3.2 1B model so we can pretrain in a day without going broke.

November 23, 2025 · 21 min · 4329 words · Shane Caldwell

DiLoCo: Data Parallelism for the Datacenter Poor

Distributed training sans datacenter.

October 3, 2025 · 25 min · 5275 words · Shane Caldwell