◆ Nawk indie coding bench

An independent benchmark of local and hosted models on real coding tasks, run on my own machine and written up as I work things out. The posts make sense of the numbers. The numbers themselves all live on the data page.

Latest

29 June 2026 · local vs API

DeepSeek V4 Flash finishes coding tasks faster than Sonnet and Opus

On this workload, a local DeepSeek V4 Flash finished real coding tasks faster in wall-clock time than Sonnet or Opus over the API, at roughly Sonnet-grade quality, with the caveats left in rather than sanded off.

Read the writeup →

Where it started

24 June 2026 · updated 29 June 2026 · the first post

Which local models stay fast at long context

Most local models feel quick at 8k and turn to dial-up by 150k. It comes down to how they handle attention, and chasing the few that stay fast is where all of this began.

Read the writeup →

The data

Browse the benchmark data

The full interactive scorecard behind these writeups, with the exact prompts, the per-task detail, and every run.

13 models

8 tasks

445 bench runs

Open the scorecard →