Offsec Evals: Growing Up In The Dark Forest
If you contribute a public benchmark, are you giving free capability to your competitors?
An attempt to explain why benchmarks are either bad or secret, and why the bar charts don’t matter so much.