Experiment
Comparing llama3.2, mistral, and phi-3 through Ollama for game narrative quality.
Problem
Offline narrative generation needs a local model that is both fast enough for streaming and reliable enough for structured state extraction.
Approach
Benchmarked llama3.2, mistral, and phi-3 across narrative quality, JSON success rate, and tokens per second.
Result
llama3.2:3b became the best tradeoff for RPtext because it balanced speed, narrative quality, and parse reliability.
For RPtext’s offline mode, I needed a local model that could generate decent narrative and produce reliable structured JSON — on a MacBook Air M2 with 8GB RAM.
I tested three models through Ollama: llama3.2:3b, mistral:7b (quantised to 4-bit), and phi-3:3.8b. Each model got the same 20 RPG scenarios with the XML sandwich prompt, and I measured narrative quality (subjective 1-5 rating), JSON parse success rate, and tokens per second.
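A minimal sketch of the harness, assuming Ollama's default local endpoint. The `eval_count` and `eval_duration` (nanoseconds) fields come from Ollama's `/api/generate` response; the JSON-success check here (find the outermost braces, try to parse) is my illustrative stand-in for whatever extraction RPtext actually does.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

def json_parse_ok(text: str) -> bool:
    """True if the model output contains a parseable JSON object."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return False
    try:
        json.loads(text[start : end + 1])
        return True
    except json.JSONDecodeError:
        return False

def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Ollama reports generation time in nanoseconds."""
    return eval_count / (eval_duration_ns / 1e9)

def run_scenario(model: str, prompt: str) -> dict:
    """One non-streaming generation; returns parse success and throughput."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return {
        "parsed": json_parse_ok(data["response"]),
        "tok_s": tokens_per_second(data["eval_count"], data["eval_duration"]),
    }
```

Running `run_scenario` over the 20 scenarios per model and averaging gives the parse-rate and tokens/sec columns directly; only the 1-5 narrative rating stays manual.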
Results:
llama3.2:3b won for RPtext. Speed matters more than you'd expect: the typewriter streaming effect breaks immersion if tokens arrive too slowly.
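The typewriter constraint can be made concrete: if text is revealed at a fixed characters-per-second cadence, the model must sustain at least `cadence / chars-per-token` tokens per second or the reveal visibly stalls. A sketch, assuming Ollama's streaming `/api/generate` (one JSON object per line, `response` holding the token chunk, `done` on the last line); the cadence numbers and 4-chars-per-token average are illustrative assumptions, not RPtext's actual values.

```python
import json
import sys
import time
import urllib.request

def min_tokens_per_second(chars_per_second: float, avg_chars_per_token: float = 4.0) -> float:
    """Throughput the model must sustain so the typewriter never stalls."""
    return chars_per_second / avg_chars_per_token

def typewriter_stream(model: str, prompt: str, chars_per_second: float = 30.0) -> None:
    """Stream from Ollama, printing one character at a time at a fixed cadence.

    When the model generates slower than the cadence, the loop blocks waiting
    for the next chunk and the output visibly hitches.
    """
    body = json.dumps({"model": model, "prompt": prompt, "stream": True}).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    delay = 1.0 / chars_per_second
    with urllib.request.urlopen(req) as resp:
        for line in resp:  # one JSON object per line while streaming
            chunk = json.loads(line)
            for ch in chunk.get("response", ""):
                sys.stdout.write(ch)
                sys.stdout.flush()
                time.sleep(delay)
            if chunk.get("done"):
                break
```

At a 30 chars/sec reveal and roughly 4 characters per token, anything under about 7.5 tok/s stalls the typewriter, which is where the slower 7B model fell down on this hardware.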