A deep dive into intent-driven agent architecture — from theory to a fully working chat agent with memory, strategies, and knowledge extraction.
Most “AI chatbots” are stateless wrappers around an LLM. They receive a message, generate a response, and forget everything. A true agent is fundamentally different: it perceives, thinks, acts, and remembers — building up knowledge over time that makes it better at its job. This guide walks you through the architecture behind intent-driven agents and builds a complete, production-ready chat agent using minns-sdk and Minns Memory Layer.
Consider a customer who contacts your bot three times:
Monday: “I like Sci-Fi movies” → Bot responds, forgets.
Wednesday: “What should I watch?” → Bot has no idea they like Sci-Fi.
Friday: “I tried to book but it failed” → Bot has no context about the failure.
With an agent backed by Minns Memory Layer, every interaction is stored as an event. The agent forms memories, extracts claims (“User likes Sci-Fi”), and learns strategies (“When booking fails, offer alternative showtimes”). By Friday, the agent knows the user, remembers the failure, and has a strategy for recovery.
The most critical design decision in an agent is: how does the LLM’s free-form text output translate into structured actions? The naive approach is to prompt the LLM to return JSON. This breaks constantly — LLMs hallucinate brackets, forget commas, and wrap JSON in markdown fences. The intent model solves this cleanly.
Instead of asking the LLM to be a JSON serializer, you separate its output into two distinct parts:
The assistant response — natural language for the user (what the LLM is good at)
The intent block — a structured action declaration in a fenced, parseable format
The LLM produces both in a single generation. A local parser (the sidecar) extracts the intent block without any additional network calls.
Hey! I found 3 available seats for Interstellar tonight.
Would you like the aisle seat in row H or the center seat in row J?

---INTENT---
action: show_options
movie: Interstellar
options: ["H12-aisle", "J15-center", "J16-center"]
showtime: 2026-02-06T19:30:00Z
---END---
The sidecar parser splits this into:
assistantResponse: "Hey! I found 3 available seats..."
intent: { action: "show_options", movie: "Interstellar", options: ["H12-aisle", "J15-center", "J16-center"], showtime: "2026-02-06T19:30:00Z" }
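A minimal sketch of such a parser is below. The real minns-sdk sidecar is more robust (and handles malformed blocks); parseTurn and the ParsedTurn shape here are illustrative, chosen to match how they are used later in this guide.

interface ParsedTurn {
  assistantResponse: string;
  intent: { action: string; [param: string]: unknown } | null;
}

// Illustrative sketch — the actual minns-sdk sidecar handles more edge cases.
function parseTurn(raw: string): ParsedTurn {
  const match = raw.match(/---INTENT---([\s\S]*?)---END---/);
  if (!match) return { assistantResponse: raw.trim(), intent: null };

  const assistantResponse = raw.split("---INTENT---")[0].trim();
  const fields: Record<string, unknown> = {};
  for (const line of match[1].split("\n")) {
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const key = line.slice(0, sep).trim();
    const rawValue = line.slice(sep + 1).trim();
    // List values like ["H12-aisle", ...] are valid JSON; plain scalars stay strings
    try {
      fields[key] = JSON.parse(rawValue);
    } catch {
      fields[key] = rawValue;
    }
  }
  // No action field means the block is unusable — fall back to respond-only
  if (typeof fields.action !== "string") return { assistantResponse, intent: null };
  return { assistantResponse, intent: fields as ParsedTurn["intent"] };
}

No network call, no retry loop, no second LLM pass: the intent arrives as plain text and is parsed locally.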
An intent spec defines what actions your agent can take. Think of it as the agent’s “tool belt.” You declare the spec once, and the SDK generates the prompt instructions automatically.
const intentSpec = {
  actions: [
    {
      name: "search_movies",
      description: "Search for movies by title, genre, or showtime",
      parameters: {
        query: { type: "string", description: "Search query", required: true },
        genre: { type: "string", description: "Genre filter" },
        date: { type: "string", description: "Date filter (ISO 8601)" },
      },
    },
    {
      name: "check_availability",
      description: "Check seat availability for a specific movie and showtime",
      parameters: {
        movie: { type: "string", required: true },
        showtime: { type: "string", required: true },
      },
    },
    {
      name: "book_ticket",
      description: "Book a ticket for the user",
      parameters: {
        movie: { type: "string", required: true },
        seat: { type: "string", required: true },
        showtime: { type: "string", required: true },
      },
    },
    {
      name: "respond_only",
      description: "No action needed — just respond to the user conversationally",
      parameters: {},
    },
  ],
};
The SDK converts this into a clear instruction block that gets appended to the LLM’s system prompt. The LLM learns to emit the ---INTENT--- block naturally.
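The exact wording is up to the SDK, but a hand-rolled equivalent gives a feel for what gets injected. This renderIntentInstructions function is a hypothetical approximation, not the SDK’s actual output:

type IntentParam = { type: string; description?: string; required?: boolean };

type IntentSpec = {
  actions: {
    name: string;
    description: string;
    parameters: Record<string, IntentParam>;
  }[];
};

// Hypothetical sketch of what the SDK generates from the intent spec.
function renderIntentInstructions(spec: IntentSpec): string {
  const actionDocs = spec.actions
    .map(a => {
      const params = Object.entries(a.parameters)
        .map(
          ([name, p]) =>
            `  - ${name} (${p.type}${p.required ? ", required" : ""}): ${p.description ?? ""}`
        )
        .join("\n");
      return `### ${a.name}\n${a.description}\n${params || "  (no parameters)"}`;
    })
    .join("\n\n");

  return [
    "After your reply, emit exactly one intent block in this format:",
    "---INTENT---",
    "action: <action name>",
    "<parameter>: <value>",
    "---END---",
    "",
    "Available actions:",
    "",
    actionDocs,
  ].join("\n");
}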
Here’s the full architecture of an intent-driven agent. The key insight: the LLM is just one component. It sits between the retrieval layer and the execution layer. Memory, strategies, and claims flow into the LLM as context, and structured intents flow out for execution.
This example uses OpenAI, but the intent model works with any LLM — Claude, Llama, Mistral, Gemini, or a local model. The sidecar parser doesn’t care which model produced the output.
Define every action the agent can perform. Be specific — vague actions lead to vague intents.
agent.ts
const intentSpec = {
  actions: [
    {
      name: "search_movies",
      description:
        "Search for movies by title, genre, or showtime. Use when the user is browsing or asking what's available.",
      parameters: {
        query: { type: "string", description: "Search terms", required: true },
        genre: { type: "string", description: "Genre filter (e.g., 'sci-fi', 'comedy')" },
        date: { type: "string", description: "Date (ISO 8601)" },
      },
    },
    {
      name: "check_availability",
      description:
        "Check seat availability for a specific movie and showtime. Use after the user has chosen a movie.",
      parameters: {
        movie: { type: "string", description: "Movie title", required: true },
        showtime: { type: "string", description: "Showtime (ISO 8601)", required: true },
      },
    },
    {
      name: "book_ticket",
      description:
        "Book a ticket. Use only when the user has confirmed their movie, seat, and showtime.",
      parameters: {
        movie: { type: "string", description: "Movie title", required: true },
        seat: { type: "string", description: "Seat ID (e.g., 'H12')", required: true },
        showtime: { type: "string", description: "Showtime (ISO 8601)", required: true },
      },
    },
    {
      name: "cancel_booking",
      description: "Cancel an existing booking by confirmation ID.",
      parameters: {
        confirmation_id: { type: "string", description: "Booking confirmation ID", required: true },
      },
    },
    {
      name: "respond_only",
      description:
        "Just respond conversationally. No tool call needed. Use for greetings, clarifications, or when the user's intent is unclear.",
      parameters: {},
    },
  ],
};
Write action descriptions as if you’re training a new employee. Include when to use each action, not just what it does. This dramatically improves intent accuracy.
The system prompt combines three layers:
Static persona — the agent’s role, tone, and rules (hand-written)
Sidecar instructions — how to format the intent block (auto-generated by the SDK)
Dynamic context — memories, claims, and suggestions injected at runtime
agent.ts
function buildSystemPrompt(
  sidecarInstruction: string,
  memories: string,
  claims: string,
  suggestions: string
): string {
  return `You are a friendly and knowledgeable movie booking assistant. You help users find movies, check availability, and book tickets.

## Your personality
- Warm and enthusiastic about movies
- Proactive — suggest options, don't just answer questions
- Concise — keep responses under 3 sentences unless the user asks for detail

## What you know about this user
${claims || "No known preferences yet."}

## Relevant past interactions
${memories || "No previous interactions found."}

## Suggested next actions
${suggestions || "No suggestions available — use your best judgment."}

## Intent format instructions
${sidecarInstruction}

IMPORTANT: Always include an intent block in your response, even if the action is "respond_only".`;
}
Before every LLM call, the agent gathers context from Minns Memory Layer. This is the Perceive phase.
agent.ts
interface AgentContext {
  memories: string;
  claims: string;
  suggestions: string;
  goalProgress: number;
  contextHash?: number;
}

async function gatherContext(
  sessionId: number,
  userId: string,
  currentGoal: string
): Promise<AgentContext> {
  // Run all retrievals in parallel for speed
  const [claimsResult, memoriesResult, strategiesResult] = await Promise.all([
    // Soft facts — what do we know about this user?
    client.searchClaims({
      query_text: `User preferences and history for ${currentGoal}`,
      top_k: 5,
      min_similarity: 0.6,
    }),
    // Past episodes — what happened last time in a similar situation?
    client.getContextMemories(
      {
        active_goals: [
          { id: 101, description: currentGoal, priority: 0.9, progress: 0.0, subgoals: [] },
        ],
        environment: {
          variables: { user_id: userId },
          temporal: { deadlines: [], patterns: [] },
        },
        resources: {
          computational: { cpu_percent: 10, memory_bytes: 1024, storage_bytes: 1024, network_bandwidth: 100 },
          external: {},
        },
      },
      { limit: 3, min_similarity: 0.5 }
    ),
    // Strategies — what worked before?
    client.getSimilarStrategies({
      goal_ids: [101],
      tool_names: ["search_movies", "book_ticket"],
      result_types: [],
      limit: 2,
      min_score: 0.6,
    }),
  ]);

  const claims = claimsResult
    .map(c => `- ${c.claim_text} (confidence: ${c.confidence.toFixed(2)})`)
    .join("\n");

  const memories = memoriesResult
    .map(m => `- [${m.memory_type}] ${m.outcome} (strength: ${m.strength.toFixed(2)})`)
    .join("\n");

  const suggestions = strategiesResult
    .map(s => `- Strategy "${s.name}": ${s.action_hint} (quality: ${s.quality_score.toFixed(2)})`)
    .join("\n");

  return { memories, claims, suggestions, goalProgress: 0.0 };
}
The three retrievals run in parallel with Promise.all. This keeps latency low — you’re adding ~1 round-trip to Minns Memory Layer, not 3 sequential ones.
The Remember phase. Every turn gets logged to Minns Memory Layer — the user message, the agent’s reasoning, the action, and the outcome. This is what powers memory formation, claim extraction, and strategy learning.
agent.ts
async function logTurn(
  sessionId: number,
  userId: string,
  userMessage: string,
  parsed: ParsedTurn,
  actionResult: ActionResult | null,
  goalProgress: number
) {
  // 1. Log the user's message as a Context event (enables claim extraction)
  await client.event(AGENT_TYPE, { agentId: AGENT_ID, sessionId })
    .context(userMessage, "conversation")
    .goal("book_movie", 5, goalProgress)
    .state({ user_id: userId, turn: "user" })
    .enqueue();

  // 2. If there was an action, log it with its outcome
  if (parsed.intent && parsed.intent.action !== "respond_only" && actionResult) {
    const { action, ...params } = parsed.intent;
    if (actionResult.success) {
      await client.event(AGENT_TYPE, { agentId: AGENT_ID, sessionId })
        .action(action, params)
        .outcome(actionResult.data)
        .goal("book_movie", 5, goalProgress)
        .state({ user_id: userId, turn: "action" })
        .enqueue();
    } else {
      // Log failed actions too — they become Negative memories
      await client.event(AGENT_TYPE, { agentId: AGENT_ID, sessionId })
        .action(action, params)
        .goal("book_movie", 5, goalProgress)
        .state({ user_id: userId, turn: "action", error: actionResult.error })
        .enqueue();
    }
  }

  // 3. Log the assistant's response for claim extraction
  await client.event(AGENT_TYPE, { agentId: AGENT_ID, sessionId })
    .context(parsed.assistantResponse, "conversation")
    .goal("book_movie", 5, goalProgress)
    .state({ user_id: userId, turn: "assistant" })
    .enqueue();
}
Use enqueue() for logging — it returns immediately and batches events in the background. This keeps the agent loop fast. Reserve send() for the final event where you need confirmation.
Because every event carries the same goal("book_movie", ...), Minns Memory Layer groups them into an episode. When goalProgress hits 1.0, the episode completes and becomes a candidate for long-term memory. The next time a similar context appears (same goal, same user), getContextMemories() returns the past episode. The agent can say: “Last time you booked Interstellar row H — would you like the same?”
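For example, when the booking succeeds you might close the episode explicitly. This completeBooking helper is hypothetical, but the builder calls mirror logTurn above, and send() is used because this is the one event worth waiting on:

// Hypothetical helper: close out the episode once the booking succeeds.
async function completeBooking(
  sessionId: number,
  userId: string,
  confirmation: { confirmation_id: string }
) {
  await client.event(AGENT_TYPE, { agentId: AGENT_ID, sessionId })
    .action("book_ticket", { confirmation_id: confirmation.confirmation_id })
    .outcome(confirmation)
    .goal("book_movie", 5, 1.0) // progress 1.0 marks the episode complete
    .state({ user_id: userId, turn: "action" })
    .send(); // send(), not enqueue() — we want delivery confirmation here
}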
After several successful booking episodes, Minns Memory Layer detects a pattern:
Strategy: “standard_booking”
Greet user, ask for movie
Search available movies
Check seat availability
Confirm and book
Quality: 0.92 | Confidence: 0.88
This strategy appears in getSimilarStrategies() and gets injected into the system prompt as a “suggested next action.” The agent follows proven recipes instead of improvising every time.
When a booking fails — say the payment gateway is down — the failed episode is stored as a Negative memory. The next time the same context appears, the agent retrieves it and can proactively say: “I notice payments were slow earlier — let me check the status before we proceed.”
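One way to make that proactive behavior concrete is to separate failures when formatting retrieved memories. A sketch, assuming failed episodes arrive with memory_type set to "Negative" (the field names mirror those used in gatherContext; the exact label is an assumption):

// Sketch: surface failed episodes as explicit warnings in the prompt.
function splitMemories(results: { memory_type: string; outcome: string }[]) {
  const warnings = results
    .filter(m => m.memory_type === "Negative")
    .map(m => `- Previously failed: ${m.outcome}`)
    .join("\n");
  const successes = results
    .filter(m => m.memory_type !== "Negative")
    .map(m => `- ${m.outcome}`)
    .join("\n");
  return { warnings, successes };
}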
LLMs have limited context windows. For long conversations, trim the history:
function trimHistory(
  history: OpenAI.ChatCompletionMessageParam[],
  maxTurns: number = 10
): OpenAI.ChatCompletionMessageParam[] {
  // +2 so the preserved opening messages never overlap the recent window
  if (history.length <= maxTurns * 2 + 2) return history;
  // Keep the first 2 messages (initial context) and the last maxTurns pairs
  const start = history.slice(0, 2);
  const recent = history.slice(-(maxTurns * 2));
  return [...start, { role: "system", content: "[Earlier conversation trimmed]" }, ...recent];
}
The beauty of Minns Memory Layer is that trimmed messages aren’t lost — they’re already stored as events. Claims extracted from early turns persist in semantic memory even after the conversation window moves forward.
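Putting the pieces together, one turn of the perceive, think, act, remember loop might look like the sketch below. The openai client, the sidecarInstruction string (generated by the SDK from the intent spec), and the executeAction dispatcher are assumptions, declared here for clarity; the other helpers are the ones defined earlier, and ActionResult's shape is inferred from how logTurn uses it.

import OpenAI from "openai";

// Shape inferred from how logTurn consumes actionResult (an assumption).
interface ActionResult {
  success: boolean;
  data?: unknown;
  error?: string;
}

// Assumptions, declared for clarity:
declare const openai: OpenAI;             // OpenAI client from your setup
declare const sidecarInstruction: string; // generated by the SDK from intentSpec
declare function executeAction(
  intent: { action: string; [param: string]: unknown }
): Promise<ActionResult>;

async function handleMessage(
  sessionId: number,
  userId: string,
  userMessage: string,
  history: OpenAI.ChatCompletionMessageParam[]
): Promise<string> {
  // Perceive: pull claims, memories, and strategies from Minns Memory Layer
  const ctx = await gatherContext(sessionId, userId, "book a movie ticket");

  // Think: one LLM call yields both the reply and the intent block
  const completion = await openai.chat.completions.create({
    model: "gpt-4o", // any model works; the sidecar parses plain text
    messages: [
      {
        role: "system",
        content: buildSystemPrompt(sidecarInstruction, ctx.memories, ctx.claims, ctx.suggestions),
      },
      ...trimHistory(history),
      { role: "user", content: userMessage },
    ],
  });
  const parsed = parseTurn(completion.choices[0].message.content ?? "");

  // Act: dispatch the intent to your tools, unless it's respond_only
  const actionResult =
    parsed.intent && parsed.intent.action !== "respond_only"
      ? await executeAction(parsed.intent)
      : null;

  // Remember: log the turn so memories, claims, and strategies can form
  await logTurn(sessionId, userId, userMessage, parsed, actionResult, ctx.goalProgress);

  return parsed.assistantResponse;
}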
The intent model gives your agent a clean separation between thinking (LLM) and doing (tools). Minns Memory Layer gives it the ability to remember and improve. Together, they turn a stateless chatbot into an agent that gets better with every conversation.