How We Built Dynamic NPC Dialogue with LLMs — Lessons from Early Access

Murni Marcus
Posted on May 25 • Originally published at vantage-digital.online

#gamedev #ai #llm #npc

We're a small team at Vantage Digital Labs building AI tooling for game developers. Our first product is an NPC dialogue engine powered by LLMs, and we've been running it in early access for a few months now. Here's what we've learned.

The Problem

Traditional NPC dialogue is written by hand: every line, every branch, every response to every possible player input. For a small studio making an RPG with 50 NPCs, that's thousands of lines of dialogue, and all of it is static.

What if NPCs could respond dynamically? What if a merchant could actually react to what the player says, instead of cycling through 3 pre-written lines?

Our Architecture

We went with a simple but effective pipeline:

```
Player Input → Context Builder → LLM API → Response Parser → Game Engine
                    ↑                              |
                    └──── Memory / State ──────────┘
```

- Context Builder: Injects the NPC's personality, location, knowledge, and recent conversation history into a system prompt.
- LLM API: We started with GPT-4o-mini, then tested DeepSeek and Qwen. For cost-sensitive indie games, smaller models work surprisingly well if the prompt is good.
- Response Parser: Extracts the dialogue text plus metadata like emotion tags ([emotion:happy]) and action tags ([action:wave]).
- Memory: A simple relevance-scored store that lets NPCs "remember" past interactions.

What Actually Matters

After running this for a few months, here's what we found:

1. System Prompt Engineering > Model Size

A well-crafted system prompt with a 7B model beats a generic prompt with GPT-4. We spend more time on personality definitions and context injection than on model selection.

```
You are Goron, a friendly dwarven merchant who loves haggling.
Location: Marketplace
You know about: prices, rare items, local rumors
Respond in character. Keep replies under 3 sentences.
```

Short, specific, constrained. That's it.

2. Response Parsing is Underrated

LLMs are chatty. Games need structured output. We use simple tag extraction:

```javascript
// `raw` is the unprocessed model reply, e.g. "Welcome back! [emotion:happy] [action:wave]"
const emotionMatch = raw.match(/\[emotion:(\w+)\]/i);   // emotionMatch[1] holds the emotion name, or the match is null
const actionMatch = raw.match(/\[action:([^\]]+)\]/i);  // actionMatch[1] holds the action name, or the match is null
// Strip every emotion/action tag so only the spoken dialogue remains
const text = raw.replace(/\[(emotion|action):[^\]]*\]/gi, '').trim();
```

This gives us clean dialogue text plus metadata for animation triggers.

3. Latency Matters More Than Quality

Players won't wait 3 seconds for an NPC to respond. We target <500ms total latency. This means:

- Streaming responses (display text as it generates)
- Smaller models for non-critical NPCs
- Aggressive caching of common responses

4. Conversation History Windowing

Sending the full conversation history is expensive and slow. We window to the last 10 exchanges, with a separate
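
To make the context-building and windowing steps concrete, here is a minimal sketch, not the engine's actual code. The function names (`buildSystemPrompt`, `buildMessages`), the NPC fields, the `{ player, reply }` exchange shape, and the `role`/`content` message format are all assumptions made for illustration. Only the prompt wording and the 10-exchange window come from the post, and whatever separate mechanism handles older context is not covered here.

```javascript
// Hypothetical sketch: names and data shapes are assumptions, not the engine's API.

// Build a system prompt from an NPC definition (field names are assumed).
function buildSystemPrompt(npc) {
  return [
    `You are ${npc.name}, ${npc.persona}.`,
    `Location: ${npc.location}`,
    `You know about: ${npc.knowledge.join(', ')}`,
    'Respond in character. Keep replies under 3 sentences.',
  ].join('\n');
}

// Assemble the message list for one request: system prompt, the most recent
// `maxExchanges` exchanges, then the new player input.
// `history` is assumed to be an array of { player, reply } objects, oldest first.
function buildMessages(npc, history, playerInput, maxExchanges = 10) {
  // slice(-0) would return the whole array, so guard the zero case explicitly.
  const recent = maxExchanges > 0 ? history.slice(-maxExchanges) : [];
  const messages = [{ role: 'system', content: buildSystemPrompt(npc) }];
  for (const exchange of recent) {
    messages.push({ role: 'user', content: exchange.player });
    messages.push({ role: 'assistant', content: exchange.reply });
  }
  messages.push({ role: 'user', content: playerInput });
  return messages;
}

// Example usage with the merchant from the prompt shown earlier.
const goron = {
  name: 'Goron',
  persona: 'a friendly dwarven merchant who loves haggling',
  location: 'Marketplace',
  knowledge: ['prices', 'rare items', 'local rumors'],
};
const messages = buildMessages(
  goron,
  [{ player: 'Got any swords?', reply: 'Aye, the finest in town!' }],
  'How much for the big one?'
);
console.log(messages.length); // system prompt + 1 exchange (2 messages) + new input
```

Storing history as whole exchanges rather than individual messages keeps the window aligned, so a trimmed history never starts with an NPC reply whose player line was cut off. With the default window, the request never carries more than 22 messages no matter how long the conversation runs.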

