Nawk indie coding bench

An independent benchmark of local and hosted models on real coding tasks, run on my own machine and written up as I work things out. The posts make sense of the numbers. The numbers themselves all live on the data page.

Latest

Where it started

The data

Browse the benchmark data

The full interactive scorecard behind these writeups, with the exact prompts, the per-task detail, and every run.

13 models
8 tasks
445 bench runs
Open the scorecard →