
How to Build an AI Agent That Talks to Customers on WhatsApp

Skip the brittle state-machine chatbots. Use an LLM agent + MCP tools for WhatsApp. Complete architecture, code, and production tips.

whatsapp · ai agents · mcp · chatbot · llm

Most "WhatsApp chatbots" you'll find tutorials for are state machines. They're brittle, hard to maintain, and break the moment a user phrases something unexpectedly.

In 2026, there's a much better pattern: let an AI agent be the brain, and give it WhatsApp as a tool via MCP. The agent handles intent, context, ambiguity, and tool selection — naturally. You add new capabilities by adding new tools, not by editing if/else trees.

This post is the complete architecture, with working code.

The old way (state machine)

User msg → Webhook → Intent classifier → State machine → Response generator → Reply

Every new feature requires a new branch. Edge cases break the flow. Ambiguous user inputs (the common case) trip up the classifier.

The new way (agent + MCP tools)

User msg → Webhook → LLM agent (Claude/GPT) with WhatsApp + business tools → Natural response

The agent's tools include:

  • WhatsApp messaging (send_message, send_template, send_media — provided by @gaviwhatsapp/mcp)
  • Your business APIs (lookup order, check inventory, create ticket, book appointment)
  • Memory store (read past messages, remember user preferences)

When a user says "hey, where's my order?" the agent figures out it needs to call lookup_order(user_phone), then send_whatsapp(...) with the formatted result. It also handles "actually wait, scratch that, I want to cancel" without re-classifying.

Stack

  • WhatsApp transport: Gavi WhatsApp (@gaviwhatsapp/whatsapp SDK or @gaviwhatsapp/mcp MCP server)
  • Agent runtime: Anthropic API (Claude) or OpenAI API (GPT-4 / GPT-5), or a framework like OpenAI Agents SDK / LangChain
  • App runtime: Node.js / Next.js / Python / whatever
  • Memory: Postgres / Supabase / Redis (whatever your app already uses)

Step 1: Register a webhook

The webhook is how WhatsApp tells your app "a user sent a message". Register it once:

import { WhatsApp } from '@gaviwhatsapp/whatsapp'

const wa = new WhatsApp({ apiKey: process.env.GAVIWHATSAPP_API_KEY })

await wa.webhooks.create({
  url: 'https://yourapp.com/api/whatsapp-webhook',
  events: ['message.received']
})

Step 2: Handle incoming messages with an agent

This example uses Claude with native tool use. The exact same pattern works with GPT, OpenAI Agents SDK, LangChain, etc.

import { WhatsApp, verifyWebhookSignature } from '@gaviwhatsapp/whatsapp'
import Anthropic from '@anthropic-ai/sdk'

const wa = new WhatsApp({ apiKey: process.env.GAVIWHATSAPP_API_KEY })
const claude = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY })

const tools = [
  {
    name: 'send_whatsapp',
    description: 'Send a WhatsApp text message to a user.',
    input_schema: {
      type: 'object',
      properties: {
        to: { type: 'string', description: 'E.164 phone number, e.g. +919876543210' },
        text: { type: 'string', description: 'Message body' }
      },
      required: ['to', 'text']
    }
  },
  {
    name: 'lookup_order',
    description: 'Look up the latest order for a user by phone number.',
    input_schema: {
      type: 'object',
      properties: { phone: { type: 'string' } },
      required: ['phone']
    }
  }
  // ...add more business tools as you go
]

export async function POST(req: Request) {
  const body = await req.text()
  const signature = req.headers.get('X-GaviVentures-Signature')!

  if (!verifyWebhookSignature(body, signature, process.env.WEBHOOK_SECRET!)) {
    return new Response('Invalid signature', { status: 401 })
  }

  const event = JSON.parse(body)
  if (event.event !== 'message.received') return Response.json({ ok: true })

  const { from, text } = event

  // Load conversation history (last 10 turns)
  const history = await loadHistory(from)

  const response = await claude.messages.create({
    model: 'claude-3-5-sonnet-latest',
    max_tokens: 1024,
    tools,
    system: 'You are a helpful customer-support agent. Always respond to the user via the send_whatsapp tool.',
    messages: [
      ...history,
      { role: 'user', content: text }
    ]
  })

  // Execute tool calls
  for (const block of response.content) {
    if (block.type !== 'tool_use') continue
    if (block.name === 'send_whatsapp') {
      await wa.send(block.input as { to: string; text: string })
    } else if (block.name === 'lookup_order') {
      // `db`, `saveToolResult`, and `saveTurn` are your app's own persistence helpers
      const order = await db.orders.findLatest({ phone: (block.input as any).phone })
      // feed the result back to Claude as a tool_result on the next iteration
      await saveToolResult(from, block.id, order)
    }
  }

  await saveTurn(from, text, response)
  return Response.json({ ok: true })
}

In production, you'll loop until the agent stops calling tools (max ~5 iterations to prevent runaways).
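That loop can be sketched independently of any particular SDK. In this sketch, `callModel` and `executeTool` are hypothetical adapters you'd write around your LLM client and your tool handlers; the message shape follows Anthropic's tool_use/tool_result convention:

```typescript
type ToolCall = { id: string; name: string; input: unknown }
type ModelTurn = { toolCalls: ToolCall[]; text?: string }

// Keep calling the model until it stops requesting tools, or we hit the cap.
async function runAgentLoop(
  callModel: (messages: unknown[]) => Promise<ModelTurn>,
  executeTool: (call: ToolCall) => Promise<unknown>,
  messages: unknown[],
  maxIterations = 5 // bound cost and latency
): Promise<string | undefined> {
  for (let i = 0; i < maxIterations; i++) {
    const turn = await callModel(messages)
    if (turn.toolCalls.length === 0) return turn.text // agent is done
    // Run each requested tool and feed the result back as the next turn
    for (const call of turn.toolCalls) {
      const result = await executeTool(call)
      messages.push({
        role: 'user',
        content: [{ type: 'tool_result', tool_use_id: call.id, content: JSON.stringify(result) }]
      })
    }
  }
  return undefined // hit the iteration cap; hand off to a human
}
```

Returning `undefined` at the cap gives you a clean signal to trigger the human-handoff path described below.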

Step 3: Memory

Store the last N turns per phone number. On each new user message, prepend them as context. The agent uses them to maintain continuity ("remind me what I asked you about earlier?").

async function loadHistory(phone: string) {
  const turns = await db.messages.findAll({
    where: { phone },
    order: [['created_at', 'desc']],
    limit: 10
  })
  return turns.reverse().map(t => ({ role: t.role, content: t.content }))
}

Step 4: Add business tools as you grow

The agent's intelligence scales with the tools you give it. Some that make sense for most chatbots:

  • lookup_order(phone) — order status
  • check_inventory(sku) — stock check
  • create_ticket(phone, summary) — escalate to human support
  • book_appointment(phone, slot) — calendar integration
  • process_refund(order_id) — for support agents (with human approval)
  • send_template(name, variables) — for transactional messages outside the 24h window

When you add a new tool, you don't change the existing logic. The agent figures out when to call it from the description alone.
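For example, escalation is just one more entry in the `tools` array plus a dispatch branch. A sketch of the definition (the description is what the agent reasons from, so spend effort there; the exact wording here is illustrative):

```typescript
const createTicketTool = {
  name: 'create_ticket',
  description:
    'Escalate the conversation to a human support agent. Use when the user is ' +
    'frustrated, asks for a person, or the issue cannot be resolved with other tools.',
  input_schema: {
    type: 'object',
    properties: {
      phone: { type: 'string', description: 'E.164 phone number of the user' },
      summary: { type: 'string', description: 'One-sentence summary of the issue so far' }
    },
    required: ['phone', 'summary']
  }
}
```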

Why this beats traditional chatbot frameworks

  1. No state machines. The agent handles intent, context, and tool selection naturally.
  2. Handles ambiguity. Real users say weird things. LLMs are designed for this.
  3. Easy to evolve. New feature? New tool. The agent picks it up automatically.
  4. Testable. Tools are pure functions — unit-testable. Conversations are integration-testable.
  5. Multi-language. No need to retrain a classifier per language; the LLM handles it.

Production tips

  • Rate limit per phone number (e.g. 20 messages/hour) to prevent abuse and runaway loops
  • Log everything: tool calls, agent thinking, failures — you'll debug 10x faster
  • Use templates outside the 24h conversation window — Meta requires this for cold/transactional messages
  • Fallback to human handoff if the agent fails or repeats itself 2+ times in a row
  • Cap tool iterations at ~5 per user turn to bound cost and latency
  • Stream tool calls if your runtime supports it for snappier UX
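The first tip above is the one most often skipped. A minimal fixed-window limiter keyed by phone number looks like this (a sketch: in-memory only, so use Redis or similar in production where limits must survive restarts and span instances):

```typescript
// phone number -> current window state
const windows = new Map<string, { count: number; resetAt: number }>()

// Allow up to `limit` messages per `windowMs` per phone number.
function allowMessage(
  phone: string,
  limit = 20,
  windowMs = 60 * 60 * 1000,
  now = Date.now()
): boolean {
  const w = windows.get(phone)
  if (!w || now >= w.resetAt) {
    // start a fresh window
    windows.set(phone, { count: 1, resetAt: now + windowMs })
    return true
  }
  if (w.count >= limit) return false
  w.count++
  return true
}
```

Call it at the top of the webhook handler and return early (or queue a polite "slow down" reply) when it returns false.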

Compliance: things WhatsApp will absolutely enforce

Production chatbots get suspended fast if they ignore the rules below. Bake these into the system prompt and your code, not as an afterthought.

1. Disclose that it's an AI

Meta's Business Messaging Policy and several jurisdictional laws (California SB 1001, EU AI Act Art. 50, India's DPDP guidance) require you to tell users when they're talking to a bot. The cleanest way: open every new conversation with a one-liner like "Hi! I'm Acme's WhatsApp assistant — I can help with orders, returns, and account questions. Type human anytime to reach a person." Add the same disclosure to your business profile description so it's visible before the conversation starts.
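One way to guarantee the disclosure always fires is to enforce it in code rather than trusting the system prompt: track first contact per phone number and prepend the disclosure to the agent's first reply. A sketch (the wording and the Set-based tracking are placeholders for your own copy and storage):

```typescript
// phone numbers that have already seen the disclosure
const greeted = new Set<string>()

const DISCLOSURE =
  "Hi! I'm Acme's WhatsApp assistant. I can help with orders, returns, and " +
  'account questions. Type "human" anytime to reach a person.'

function withDisclosure(phone: string, reply: string): string {
  if (greeted.has(phone)) return reply
  greeted.add(phone)
  return `${DISCLOSURE}\n\n${reply}`
}
```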

2. Honor the 24-hour window

Free-form text from your bot only works inside the 24h customer-service window (i.e. within 24 hours of the user's last message). Outside that window, the bot must use a Meta-approved template — even for what feels like a follow-up. Bots that send free-form text to a 30-day-old conversation are violating policy and will fail silently or get blocked.
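Guard this in code so the bot can't violate it by accident: check the timestamp of the user's last inbound message before sending free-form text, and route to a template path otherwise. A minimal check (a sketch; `lastInboundAt` would come from your message store):

```typescript
const SERVICE_WINDOW_MS = 24 * 60 * 60 * 1000

// True if free-form text is still allowed: the user's last inbound message
// was under 24 hours ago. Outside the window, use an approved template.
function canSendFreeform(lastInboundAt: number, now = Date.now()): boolean {
  return now - lastInboundAt < SERVICE_WINDOW_MS
}
```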

3. Match template categories to content

Templates are approved under a category — Authentication, Utility, or Marketing. Sending marketing payloads through a Utility-approved template (or vice versa) is a violation. Submit one template per category you actually use.

4. Watch your quality rating

Meta tracks complaint rates per phone number. Even a 2% block-or-spam-report rate flips you from Green → Yellow → Red, which caps your daily messaging tier. Bots are quality-rating risk #1 because they scale fast. Mitigations: throttle outbound, only message opted-in users, give users a clear STOP/UNSUBSCRIBE path that you actually honor, and surface complaints in your admin dashboard.

5. Honor opt-out instantly

If a user says "stop," "unsubscribe," or anything similar, mark them as opted-out in your DB and never send another business-initiated message to that number. This is non-negotiable per Meta's policy and most consumer-protection laws (GDPR, TCPA, DPDP).
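Don't leave opt-out detection to the LLM alone: run a deterministic check on every inbound message before the agent even sees it. A conservative matcher (a sketch; extend it for the languages your users actually write in, and note it deliberately excludes words like "cancel" that usually refer to orders):

```typescript
// Matches messages that are essentially just "stop" / "unsubscribe" / "opt out".
const OPT_OUT_RE = /^\s*(stop|unsubscribe|opt\s*-?\s*out)\s*[.!]*\s*$/i

function isOptOut(text: string): boolean {
  return OPT_OUT_RE.test(text)
}
```

On a match, flip the opted-out flag in your DB, send a final confirmation inside the service window, and never initiate again.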

6. Don't auto-message contacts who never opted in

Webhook-driven replies to people who messaged you first are fine. But spinning up a chatbot that initiates conversations with a list of phone numbers you scraped or imported is a one-way ticket to a suspended WABA. Every recipient of a bot-initiated message must have explicit, documented opt-in.

7. Don't claim to be human

If a user asks "Are you a person?" the bot must answer honestly. Lying about being human is a policy violation and can be a legal violation depending on jurisdiction.

Same agent, multiple runtimes

The architecture above works whether you call Claude/GPT directly or use a framework. If you want the WhatsApp tools available as MCP (so the same setup works in OpenAI Agents SDK, LangChain, n8n, Claude Desktop, Cursor, etc.), just point your runtime to:

npx @gaviwhatsapp/mcp --api-key gv_YOUR_KEY

See our other guides for runtime-specific setup.

Pricing

$9.99/mo flat from Gavi. Meta's per-message charges go directly to your WhatsApp Business Account. No markup, no per-conversation fees, no per-tool-call fees.


Try it: gaviventures.com · Webhooks docs · GitHub

Ready to try Gavi WhatsApp?

Send WhatsApp messages from your code, AI agent, or CRM in under 5 minutes.