A real-time AI game engine that turns unreliable model output into resilient game state.
Problem
LLM-driven games break when narrative output drifts and the underlying state becomes inconsistent.
System
Streaming narrative generation, XML-wrapped state extraction, multi-fallback JSON parsing, and dual-model support.
Result
A playable macOS RPG loop that keeps world state intact even with smaller local models.
RPtext is the clearest example of the kind of engineering work I want to do: taking a messy, probabilistic system and forcing it to behave like reliable software.
The product surface is a text RPG. The actual challenge is systems design. The engine has to stream narrative in real time, extract structured state updates from the same response, validate them, and apply them without breaking the session.
RPtext is a native macOS app where the AI is the engine: every encounter, every line of dialogue, every consequence is generated in real time from your choices and the world state. There is no scripted content. Vague or lazy input gets punished; thoughtful play gets rewarded. NPCs remember you, and factions track your reputation.
The player types natural language actions. The AI responds with narrative text, but hidden inside that response is a structured JSON block containing game state updates — health changes, inventory modifications, reputation shifts, NPC memory updates.
This is what I call the XML sandwich architecture: the AI’s response is wrapped in markers that let me extract reliable structured data from an inherently unpredictable output. The model writes narrative for the player, then writes machine-readable state changes for the engine, all in one completion.
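The split can be sketched in a few lines. This is an illustrative Python sketch, not the engine's actual code: the `<state_update>` tag name and the JSON field names are assumptions standing in for whatever markers RPtext really uses.

```python
import json
import re

# Hypothetical marker; RPtext's actual tag names aren't specified here.
STATE_RE = re.compile(r"<state_update>\s*(\{.*?\})\s*</state_update>", re.DOTALL)

def split_response(raw: str) -> tuple[str, dict]:
    """Separate player-facing narrative from the machine-readable state block."""
    match = STATE_RE.search(raw)
    if not match:
        return raw.strip(), {}  # no state block: narrative only
    narrative = STATE_RE.sub("", raw).strip()
    state = json.loads(match.group(1))
    return narrative, state

raw = (
    "You push open the rusted gate.\n"
    '<state_update>{"hp_delta": -2, "reputation": {"guards": -1}}</state_update>'
)
narrative, state = split_response(raw)
# narrative → "You push open the rusted gate."
# state → {"hp_delta": -2, "reputation": {"guards": -1}}
```

The point of the sandwich is that one completion serves two consumers: the human reads the narrative, the engine reads the block, and neither has to know about the other.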
Reliable structured output from LLMs. Getting an AI to consistently produce valid JSON inside a narrative response is harder than it sounds. Models forget the schema, hallucinate fields, or break JSON syntax mid-stream. RPtext uses a multi-fallback parsing system: try strict JSON parse first, then regex extraction, then a repair pass. The game never breaks mid-session.
Streaming responses. The narrative streams token-by-token to the UI for that typewriter feel, but the JSON block can’t be parsed until it’s complete. So the engine buffers the structured section while streaming everything else, then processes state changes once the block closes.
Dual model support. RPtext works with both Claude API (for quality) and Ollama running llama3.2:3b locally (for speed and offline play). The prompt architecture is the same, but the parsing needs to be more forgiving with smaller local models.
The result is a system that behaves like software instead of a demo. Narrative can stay fluid and model-driven while the game state remains structured, inspectable, and durable enough for a real session.
That same extraction pipeline now informs other work on the site, especially the lab experiments around prompt design and local-model benchmarking.