How to Build an AI Agent That Talks to Customers on WhatsApp
Skip the brittle state-machine chatbots. Use an LLM agent + MCP tools for WhatsApp. Complete architecture, code, and production tips.
Most "WhatsApp chatbots" you'll find tutorials for are state machines. They're brittle, hard to maintain, and break the moment a user phrases something unexpectedly.
In 2026, there's a much better pattern: let an AI agent be the brain, and give it WhatsApp as a tool via MCP. The agent handles intent, context, ambiguity, and tool selection — naturally. You add new capabilities by adding new tools, not by editing if/else trees.
This post is the complete architecture, with working code.
The old way (state machine)
User msg → Webhook → Intent classifier → State machine → Response generator → Reply
Every new feature requires a new branch. Edge cases break the flow. Ambiguous user inputs (the common case) trip up the classifier.
The new way (agent + MCP tools)
User msg → Webhook → LLM agent (Claude/GPT) with WhatsApp + business tools → Natural response
The agent's tools include:
- WhatsApp messaging (`send_message`, `send_template`, `send_media`, provided by `@gaviwhatsapp/mcp`)
- Your business APIs (lookup order, check inventory, create ticket, book appointment)
- Memory store (read past messages, remember user preferences)
When a user says "hey, where's my order?" the agent figures out it needs to call lookup_order(user_phone), then send_whatsapp(...) with the formatted result. It also handles "actually wait, scratch that, I want to cancel" without re-classifying.
Stack
| Layer | Choice |
|---|---|
| WhatsApp transport | Gavi WhatsApp (@gaviwhatsapp/whatsapp SDK or @gaviwhatsapp/mcp MCP server) |
| Agent runtime | Anthropic API (Claude) or OpenAI API (GPT-4 / GPT-5), or a framework like OpenAI Agents SDK / LangChain |
| App runtime | Node.js / Next.js / Python / whatever |
| Memory | Postgres / Supabase / Redis (whatever your app already uses) |
Step 1: Register a webhook
The webhook is how WhatsApp tells your app "a user sent a message". Register it once:
import { WhatsApp } from '@gaviwhatsapp/whatsapp'
const wa = new WhatsApp({ apiKey: process.env.GAVIWHATSAPP_API_KEY })
await wa.webhooks.create({
url: 'https://yourapp.com/api/whatsapp-webhook',
events: ['message.received']
})
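The handler in the next step reads three fields off the webhook payload. A sketch of the assumed shape, inferred from that code rather than an official schema (check the webhooks docs for the authoritative version):

```ts
// Inferred from the handler below, not an official schema
type MessageReceivedEvent = {
  event: 'message.received'
  from: string // sender's phone number in E.164 format
  text: string // message body
}
```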
Step 2: Handle incoming messages with an agent
This example uses Claude with native tool use. The exact same pattern works with GPT, OpenAI Agents SDK, LangChain, etc.
import { WhatsApp, verifyWebhookSignature } from '@gaviwhatsapp/whatsapp'
import Anthropic from '@anthropic-ai/sdk'
const wa = new WhatsApp({ apiKey: process.env.GAVIWHATSAPP_API_KEY })
const claude = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY })
const tools = [
{
name: 'send_whatsapp',
description: 'Send a WhatsApp text message to a user.',
input_schema: {
type: 'object',
properties: {
to: { type: 'string', description: 'E.164 phone number, e.g. +919876543210' },
text: { type: 'string', description: 'Message body' }
},
required: ['to', 'text']
}
},
{
name: 'lookup_order',
description: 'Look up the latest order for a user by phone number.',
input_schema: {
type: 'object',
properties: { phone: { type: 'string' } },
required: ['phone']
}
}
// ...add more business tools as you go
]
export async function POST(req: Request) {
const body = await req.text()
const signature = req.headers.get('X-GaviVentures-Signature')!
if (!verifyWebhookSignature(body, signature, process.env.WEBHOOK_SECRET!)) {
return new Response('Invalid signature', { status: 401 })
}
const event = JSON.parse(body)
if (event.event !== 'message.received') return Response.json({ ok: true })
const { from, text } = event
// Load conversation history (last 10 turns)
const history = await loadHistory(from)
const response = await claude.messages.create({
model: 'claude-3-5-sonnet-latest',
max_tokens: 1024,
tools,
system: 'You are a helpful customer-support agent. Always respond to the user via the send_whatsapp tool.',
messages: [
...history,
{ role: 'user', content: text }
]
})
// Execute tool calls (db, saveToolResult, and saveTurn are your app's own persistence helpers)
for (const block of response.content) {
if (block.type !== 'tool_use') continue
if (block.name === 'send_whatsapp') {
await wa.send(block.input as { to: string; text: string })
} else if (block.name === 'lookup_order') {
const order = await db.orders.findLatest({ phone: (block.input as any).phone })
// feed back to Claude in the next turn
await saveToolResult(from, block.id, order)
}
}
await saveTurn(from, text, response)
return Response.json({ ok: true })
}
In production, you'll loop until the agent stops calling tools (max ~5 iterations to prevent runaways).
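A minimal sketch of that loop using the Anthropic SDK's tool-result turns. It reuses `claude` and `tools` from the handler above; `runTool` is a hypothetical dispatcher that maps a tool name to your handler and returns its result:

```ts
import Anthropic from '@anthropic-ai/sdk'

const MAX_TOOL_ITERATIONS = 5

async function runAgent(messages: Anthropic.Messages.MessageParam[]) {
  for (let i = 0; i < MAX_TOOL_ITERATIONS; i++) {
    const response = await claude.messages.create({
      model: 'claude-3-5-sonnet-latest',
      max_tokens: 1024,
      tools,
      system: 'You are a helpful customer-support agent. Always respond to the user via the send_whatsapp tool.',
      messages
    })

    const toolUses = response.content.filter(
      (b): b is Anthropic.Messages.ToolUseBlock => b.type === 'tool_use'
    )
    if (toolUses.length === 0) break // agent finished without requesting tools

    // Record the assistant turn, then answer every tool call in one user turn
    messages.push({ role: 'assistant', content: response.content })
    const results: Anthropic.Messages.ToolResultBlockParam[] = []
    for (const call of toolUses) {
      const result = await runTool(call.name, call.input) // your dispatcher
      results.push({
        type: 'tool_result',
        tool_use_id: call.id,
        content: JSON.stringify(result)
      })
    }
    messages.push({ role: 'user', content: results })
  }
}
```

Answering all `tool_use` blocks from a turn in a single `tool_result` message keeps the transcript valid for the API, and the iteration cap bounds both cost and latency.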
Step 3: Memory
Store the last N turns per phone number. On each new user message, prepend them as context. The agent uses them to maintain continuity ("remind me what I asked you about earlier?").
async function loadHistory(phone: string) {
const turns = await db.messages.findAll({
where: { phone },
order: [['created_at', 'desc']],
limit: 10
})
return turns.reverse().map(t => ({ role: t.role, content: t.content }))
}
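Step 2 also called `saveTurn` without showing it. One possible shape, assuming the same `db.messages` table with a JSON-capable `content` column (storing the assistant's raw content blocks lets `loadHistory` replay tool-use turns verbatim):

```ts
import Anthropic from '@anthropic-ai/sdk'

// One possible saveTurn; `db` is whatever ORM your app already uses.
async function saveTurn(phone: string, userText: string, response: Anthropic.Messages.Message) {
  await db.messages.create({ phone, role: 'user', content: userText })
  // Keep the assistant's full content blocks so tool_use turns survive round-trips
  await db.messages.create({ phone, role: 'assistant', content: response.content })
}
```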
Step 4: Add business tools as you grow
The agent's intelligence scales with the tools you give it. Some that make sense for most chatbots:
- `lookup_order(phone)`: order status
- `check_inventory(sku)`: stock check
- `create_ticket(phone, summary)`: escalate to human support
- `book_appointment(phone, slot)`: calendar integration
- `process_refund(order_id)`: for support agents (with human approval)
- `send_template(name, variables)`: for transactional messages outside the 24h window
When you add a new tool, you don't change the existing logic. The agent figures out when to call it from the description alone.
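For example, wiring up inventory checks is one new entry in the `tools` array plus one new branch in the dispatch loop. A sketch, where `db.inventory.findBySku` stands in for whatever your stock system actually exposes:

```ts
tools.push({
  name: 'check_inventory',
  description: 'Check current stock level for a product SKU.',
  input_schema: {
    type: 'object',
    properties: { sku: { type: 'string' } },
    required: ['sku']
  }
})

// ...and one more branch in the tool-execution loop from Step 2:
// } else if (block.name === 'check_inventory') {
//   const stock = await db.inventory.findBySku((block.input as any).sku)
//   await saveToolResult(from, block.id, stock)
// }
```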
Why this beats traditional chatbot frameworks
- No state machines. The agent handles intent, context, and tool selection naturally.
- Handles ambiguity. Real users say weird things. LLMs are designed for this.
- Easy to evolve. New feature? New tool. The agent picks it up automatically.
- Testable. Tools are pure functions, so they're unit-testable; conversations are integration-testable (see the sketch after this list).
- Multi-language. No need to retrain a classifier per language; the LLM handles it.
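On the testability point: if you extract each tool handler into a function that takes its dependencies as arguments, unit tests need no WhatsApp connection or LLM at all. A sketch using Vitest, with a hypothetical extracted `lookupOrder` handler:

```ts
import { describe, it, expect } from 'vitest'

// Hypothetical: the lookup_order handler pulled out of the webhook,
// with the db passed in so tests can substitute a fake.
async function lookupOrder(
  db: { findLatest(q: { phone: string }): Promise<unknown | null> },
  phone: string
) {
  const order = await db.findLatest({ phone })
  return order ?? { error: 'no orders found' }
}

describe('lookup_order tool', () => {
  it('returns the latest order for a known phone', async () => {
    const fakeDb = { findLatest: async () => ({ id: 'ord_1', status: 'shipped' }) }
    expect(await lookupOrder(fakeDb, '+919876543210')).toEqual({ id: 'ord_1', status: 'shipped' })
  })

  it('degrades gracefully when there is no order', async () => {
    const fakeDb = { findLatest: async () => null }
    expect(await lookupOrder(fakeDb, '+15550001111')).toHaveProperty('error')
  })
})
```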
Production tips
- Rate limit per phone number (e.g. 20 messages/hour) to prevent abuse and runaway loops (see the sketch after this list)
- Log everything: tool calls, agent thinking, failures — you'll debug 10x faster
- Use templates outside the 24h conversation window — Meta requires this for cold/transactional messages
- Fallback to human handoff if the agent fails or repeats itself 2+ times in a row
- Cap tool iterations at ~5 per user turn to bound cost and latency
- Stream tool calls if your runtime supports it for snappier UX
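A minimal per-phone rate limiter, sketched as a fixed-window counter in Redis (using `ioredis`; the key scheme and limits are assumptions to adapt to your stack):

```ts
import Redis from 'ioredis'

const redis = new Redis(process.env.REDIS_URL!)

// Fixed-window counter: at most 20 inbound messages per phone per hour.
async function isRateLimited(phone: string): Promise<boolean> {
  const window = Math.floor(Date.now() / 3_600_000) // current hour bucket
  const key = `wa:rl:${phone}:${window}`
  const count = await redis.incr(key)
  if (count === 1) await redis.expire(key, 3600) // let old windows expire
  return count > 20
}

// In the webhook handler, before running the agent:
// if (await isRateLimited(from)) return Response.json({ ok: true }) // ack and drop
```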
Compliance: things WhatsApp will absolutely enforce
Production chatbots get suspended fast if they ignore the rules below. Bake these into the system prompt and your code, not as an afterthought.
1. Disclose that it's an AI
Meta's Business Messaging Policy and several jurisdictional laws (California SB 1001, EU AI Act Art. 50, India's DPDP guidance) require you to tell users when they're talking to a bot. The cleanest way: open every new conversation with a one-liner like "Hi! I'm Acme's WhatsApp assistant — I can help with orders, returns, and account questions. Type human anytime to reach a person." Add the same disclosure to your business profile description so it's visible before the conversation starts.
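One way to wire this in, as a sketch: treat an empty conversation history as a brand-new conversation and send the disclosure before the agent's first reply.

```ts
const DISCLOSURE =
  "Hi! I'm Acme's WhatsApp assistant. I can help with orders, returns, and " +
  'account questions. Type "human" anytime to reach a person.'

// In the webhook handler: no stored history means a new conversation,
// so lead with the disclosure before the agent responds.
if (history.length === 0) {
  await wa.send({ to: from, text: DISCLOSURE })
}
```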
2. Honor the 24-hour window
Free-form text from your bot only works inside the 24h customer-service window (i.e. within 24 hours of the user's last message). Outside that window, the bot must use a Meta-approved template — even for what feels like a follow-up. Bots that send free-form text to a 30-day-old conversation are violating policy and will fail silently or get blocked.
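A simple guard, assuming you store the timestamp of each inbound message (`db.messages.findLastInbound` is a hypothetical query against the same messages table):

```ts
const WINDOW_MS = 24 * 60 * 60 * 1000

// Free-form replies are only allowed within 24h of the user's last inbound message.
async function canSendFreeform(phone: string): Promise<boolean> {
  const lastInbound = await db.messages.findLastInbound({ phone })
  if (!lastInbound) return false
  return Date.now() - lastInbound.created_at.getTime() < WINDOW_MS
}

// Outside the window, fall back to a Meta-approved template instead of free-form text.
```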
3. Match template categories to content
Templates are approved under a category — Authentication, Utility, or Marketing. Sending marketing payloads through a Utility-approved template (or vice versa) is a violation. Submit one template per category you actually use.
4. Watch your quality rating
Meta tracks complaint rates per phone number. Even a 2% block-or-spam-report rate flips you from Green → Yellow → Red, which caps your daily messaging tier. Bots are quality-rating risk #1 because they scale fast. Mitigations: throttle outbound, only message opted-in users, give users a clear STOP/UNSUBSCRIBE path that you actually honor, and surface complaints in your admin dashboard.
5. Honor opt-out instantly
If a user says "stop," "unsubscribe," or anything similar, mark them as opted-out in your DB and never send another business-initiated message to that number. This is non-negotiable per Meta's policy and most consumer-protection laws (GDPR, TCPA, DPDP).
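A sketch of that check, run before the agent ever sees the message (`db.contacts` is a hypothetical table tracking consent per phone number):

```ts
const OPT_OUT = /\b(stop|unsubscribe|opt[ -]?out)\b/i

// Returns true if the message was an opt-out and has been handled.
async function handleOptOut(from: string, text: string): Promise<boolean> {
  if (!OPT_OUT.test(text)) return false
  await db.contacts.update({ phone: from }, { optedOut: true })
  await wa.send({ to: from, text: "You're unsubscribed. Reply START to opt back in." })
  return true
}

// In the webhook handler, before running the agent:
// if (await handleOptOut(from, text)) return Response.json({ ok: true })
```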
6. Don't auto-message contacts who never opted in
Webhook-driven replies to people who messaged you first are fine. But spinning up a chatbot that initiates conversations with a list of phone numbers you scraped or imported is a one-way ticket to a suspended WABA. Every recipient of a bot-initiated message must have explicit, documented opt-in.
7. Don't claim to be human
If a user asks "Are you a person?" the bot must answer honestly. Lying about being human is a policy violation and can be a legal violation depending on jurisdiction.
Same agent, multiple runtimes
The architecture above works whether you call Claude/GPT directly or use a framework. If you want the WhatsApp tools available as MCP (so the same setup works in OpenAI Agents SDK, LangChain, n8n, Claude Desktop, Cursor, etc.), just point your runtime to:
npx @gaviwhatsapp/mcp --api-key gv_YOUR_KEY
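For MCP clients configured via a JSON file (Claude Desktop, Cursor, and similar), the equivalent entry looks roughly like this (the server name is up to you):

```json
{
  "mcpServers": {
    "whatsapp": {
      "command": "npx",
      "args": ["@gaviwhatsapp/mcp", "--api-key", "gv_YOUR_KEY"]
    }
  }
}
```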
See our other guides for runtime-specific setup.
Pricing
$9.99/mo flat from Gavi. Meta's per-message charges go directly to your WhatsApp Business Account. No markup, no per-conversation fees, no per-tool-call fees.
Try it: gaviventures.com · Webhooks docs · GitHub
Ready to try Gavi WhatsApp?
Send WhatsApp messages from your code, AI agent, or CRM in under 5 minutes.