n8n

How to Automate WhatsApp AI Support with Gemini

Turn WhatsApp into a smart support inbox that understands text, audio, images, and video. Messages are read, prepared with AI, and answered in the same chat. Teams can give fast help without switching tools or copying files.

Incoming messages are captured by a WhatsApp event trigger. The flow splits out the message parts and routes each type with a Switch node. For media, the flow first asks the WhatsApp API for a short-lived download URL and then downloads the file in n8n using the account's access token. Audio is transcribed with a Gemini model, video is described with a Gemini video node, images are analyzed with an image explainer, and long text is summarized for quick context. Every path then passes a clean, unified message to an AI agent that keeps chat history and can look up facts on Wikipedia. The reply is sent back to the user through WhatsApp.
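
The split-and-route step can be pictured with a short Python sketch. It assumes the standard WhatsApp Cloud API webhook body (entry → changes → value → messages) and mirrors what the Split Out, Switch, and Set nodes do. The function names and output field names are hypothetical and are not taken from the template.

```python
from typing import Any

# Types with their own branch; anything else falls back to the text path.
ROUTED_TYPES = {"text", "audio", "image", "video"}


def split_messages(payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten a webhook body into one item per message (like Split Out)."""
    items = []
    for entry in payload.get("entry", []):
        for change in entry.get("changes", []):
            for message in change.get("value", {}).get("messages", []):
                items.append(message)
    return items


def route(message: dict[str, Any]) -> str:
    """Pick a branch by message type, with text as the fallback output (like Switch)."""
    msg_type = message.get("type", "")
    return msg_type if msg_type in ROUTED_TYPES else "text"


def unify(message: dict[str, Any], content: str) -> dict[str, str]:
    """Build the single payload handed to the agent (like the Set node).
    Field names are illustrative, not the template's exact ones."""
    return {
        "sender": message.get("from", ""),
        "message_type": route(message),
        "content": content,
    }


if __name__ == "__main__":
    sample = {
        "object": "whatsapp_business_account",
        "entry": [{"changes": [{"field": "messages", "value": {"messages": [
            {"from": "15551234567", "type": "text", "text": {"body": "Hi"}},
            {"from": "15551234567", "type": "audio", "audio": {"id": "MEDIA_ID"}},
            {"from": "15551234567", "type": "sticker", "sticker": {"id": "S1"}},
        ]}}]}],
    }
    for msg in split_messages(sample):
        print(route(msg))  # prints text, audio, text (the sticker falls back)
```

Whatever branch a message takes, the agent always receives one predictable shape. The agent prompt therefore only has to handle a single payload format.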

You will need access to the WhatsApp Business Platform and a Google Gemini API key. A public URL is required so WhatsApp can reach your webhook. Expect faster replies, fewer handoffs, and steady quality across many chats. Use it for customer support, booking help, or quick document and media checks during a conversation.
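
The WhatsApp Trigger node handles webhook verification for you, but it helps to know what Meta sends to that public URL. The sketch below uses hypothetical helpers, not n8n code. It echoes `hub.challenge` during the GET handshake and checks the `X-Hub-Signature-256` header on incoming POSTs. The verify token and app secret stand in for your own values.

```python
import hashlib
import hmac
from typing import Optional


def verify_subscription(params: dict, verify_token: str) -> Optional[str]:
    """GET handshake: return hub.challenge if the token matches, else None."""
    if params.get("hub.mode") == "subscribe" and params.get("hub.verify_token") == verify_token:
        return params.get("hub.challenge")
    return None


def signature_is_valid(raw_body: bytes, header_value: str, app_secret: str) -> bool:
    """POST check: the header should equal 'sha256=' plus the hex HMAC-SHA256
    of the raw request body, keyed with the app secret."""
    digest = hmac.new(app_secret.encode(), raw_body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking timing information during the comparison.
    return hmac.compare_digest("sha256=" + digest, header_value)
```

A server built this way would reply with the challenge and HTTP 200 when `verify_subscription` returns a value, and with 403 otherwise. That is why the callback can only be verified once the URL is publicly reachable.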

What are the key features?

  • WhatsApp event trigger collects incoming messages and starts the flow.
  • Split Out and Switch nodes route text, audio, image, and video to the right path with a fallback to text.
  • WhatsApp media URL fetch and authenticated HTTP download pull audio, video, and image files securely.
  • Google Gemini Audio node transcribes voice notes into clean text.
  • Google Gemini Video node generates short descriptions and key details from video clips.
  • Image Explainer interprets images and returns plain language context.
  • Text Summarizer condenses long text messages for quick understanding.
  • Set node unifies outputs and adds message type and content for the AI agent.
  • Window Buffer Memory keeps conversation history per WhatsApp user for better context.
  • AI Agent can call a Wikipedia tool for factual lookups, and its final reply is sent back through WhatsApp.
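
Fetching media takes two Graph API calls, and sending the reply takes one more POST. This sketch only builds those requests with Python's standard library. The Graph API version, the function names, and the token handling are assumptions. In the workflow itself, the WhatsApp and HTTP Request nodes do this work.

```python
import json
import urllib.request

GRAPH_VERSION = "v21.0"  # assumption: use the Graph API version your app targets
GRAPH_BASE = f"https://graph.facebook.com/{GRAPH_VERSION}"


def media_url_request(media_id: str, token: str) -> urllib.request.Request:
    """Step 1: ask for the media's short-lived download URL."""
    return urllib.request.Request(
        f"{GRAPH_BASE}/{media_id}",
        headers={"Authorization": f"Bearer {token}"},
    )


def media_download_request(url: str, token: str) -> urllib.request.Request:
    """Step 2: download the file; the same bearer token is required here too."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def reply_request(phone_number_id: str, to: str, body: str, token: str) -> urllib.request.Request:
    """Send a plain text reply back into the same chat."""
    payload = {
        "messaging_product": "whatsapp",
        "to": to,
        "type": "text",
        "text": {"body": body},
    }
    return urllib.request.Request(
        f"{GRAPH_BASE}/{phone_number_id}/messages",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def fetch_media(media_id: str, token: str) -> bytes:
    """Run both media steps over the network (not called in this sketch)."""
    with urllib.request.urlopen(media_url_request(media_id, token)) as resp:
        url = json.load(resp)["url"]
    with urllib.request.urlopen(media_download_request(url, token)) as resp:
        return resp.read()
```

Both media steps send the same bearer token. A 401 or 403 on the download therefore points at the credential rather than at the media ID.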

What are the benefits?

  • Reduce manual triage from 10 minutes to 30 seconds by routing each message type automatically.
  • Automate 60 to 80 percent of common WhatsApp questions with the AI agent.
  • Handle up to 5 times more concurrent chats with the same team size.
  • Eliminate 90 percent of media download and copy-and-paste work with direct API fetches.
  • Improve reply consistency and cut repeated questions by keeping chat memory.

How do you set it up?

  1. Import the template into n8n: Create a new workflow in n8n > Click the three dots menu > Select 'Import from File' > Choose the downloaded JSON file.
  2. You'll need a WhatsApp Business Platform account and a Google Gemini API key; the Wikipedia tool needs no account. See the Tools Required section below for details on each service.
  3. Open the WhatsApp Trigger node. In the credential field, click Create new credential and follow the on-screen steps to connect your WhatsApp Business account. Make sure your n8n instance has a public URL so WhatsApp can reach it.
  4. In your Meta for Developers dashboard, add the n8n webhook URL from the WhatsApp Trigger. Subscribe to message events and verify the callback. Send a test message to confirm that n8n receives it.
  5. Double-click the Get Audio URL, Get Video URL, Get Image URL, Download Audio, Download Video, Download Image, and Respond to User nodes. In each one, select a WhatsApp API credential with a valid access token so media fetches and replies are authorized. In n8n this is typically a separate credential from the OAuth credential used by the WhatsApp Trigger.
  6. Set up Google Gemini. In n8n, open any Google Gemini node, choose Create new credential, and paste the API key you created in Google AI Studio. Reuse this credential for all Gemini nodes.
  7. No key is needed for Wikipedia. Ensure the Wikipedia tool node is connected to the AI Agent as a tool input.
  8. Check message routing. Open the Redirect Message Types node and review the rules for audio, video, image, and text. Keep the fallback output set to Text Message.
  9. Validate media processing. Send an audio note, an image, and a short video to your WhatsApp number. In n8n, watch the Download and Gemini nodes and confirm text outputs are produced.
  10. Test text handling. Send a long text message. Confirm the Text Summarizer creates a short summary and that the Set node builds a clean payload for the agent.
  11. Open the AI Agent node. Review the system prompt, tool list, and memory settings. The Window Buffer Memory node uses the sender number to keep context between messages.
  12. If the template includes Format Response and Format Response1 nodes, open them and adjust how AI outputs are formatted to suit your tone and style.
  13. Activate the workflow. Send mixed-content messages from your phone. Confirm that replies return to WhatsApp quickly and match your expected format.
  14. Troubleshoot: If media downloads fail with 401 or 403, recheck WhatsApp credentials and app permissions. If the trigger does not fire, verify the webhook URL, public reachability, and event subscriptions.
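
The per-sender memory from step 11 can be sketched as a fixed-size window of recent exchanges, keyed by the sender's phone number. This hypothetical stand-in is for illustration only and is not n8n's Window Buffer Memory implementation. The class name and the default window size of 5 are assumptions.

```python
from collections import defaultdict, deque


class WindowBufferMemory:
    """Keep only the last `window` (user, assistant) turns per session key.
    Hypothetical illustration; here the session key is the sender's number."""

    def __init__(self, window: int = 5) -> None:
        self.window = window
        # Each session gets its own bounded deque; old turns drop off the left.
        self._sessions: dict[str, deque] = defaultdict(lambda: deque(maxlen=self.window))

    def add_turn(self, session_key: str, user_text: str, ai_text: str) -> None:
        self._sessions[session_key].append((user_text, ai_text))

    def history(self, session_key: str) -> list[tuple[str, str]]:
        # .get avoids creating an empty session for unknown senders.
        return list(self._sessions.get(session_key, ()))


memory = WindowBufferMemory(window=2)
memory.add_turn("15551234567", "Do you open on Sunday?", "Yes, 10:00 to 16:00.")
memory.add_turn("15557654321", "Hi", "Hello! How can I help?")
```

Keying on the sender number keeps two customers' conversations from mixing. The fixed window bounds how much history is added to every prompt.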

Tools Required

n8n

$24 / mo or $20 / mo billed annually to use n8n in the cloud. The local or self-hosted n8n Community Edition is free.

Google Gemini

Free tier: $0 via the Gemini API; for example, Gemini 2.5 Flash-Lite allows 1,000 requests/day (15 RPM, 250k TPM) at no cost. Paid usage starts at $0.10 per 1M input tokens and $0.40 per 1M output tokens.
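
To see what the paid rates mean in practice, here is a rough monthly estimate. The per-token prices come from the line above, while the request count and token sizes are hypothetical. The free-tier check assumes each voice note triggers at least two Gemini calls: one for transcription and one for the agent, on the further assumption that the agent also runs on Gemini.

```python
# Paid-tier prices quoted above for Gemini 2.5 Flash-Lite, in USD per 1M tokens.
INPUT_USD_PER_M = 0.10
OUTPUT_USD_PER_M = 0.40


def gemini_cost(requests: int, input_tokens: int, output_tokens: int) -> float:
    """Estimate spend for `requests` calls with the given average token counts."""
    total_in = requests * input_tokens
    total_out = requests * output_tokens
    return total_in * INPUT_USD_PER_M / 1_000_000 + total_out * OUTPUT_USD_PER_M / 1_000_000


def fits_free_tier(requests_per_day: int, daily_limit: int = 1_000) -> bool:
    """Compare daily call volume with the free-tier request cap quoted above."""
    return requests_per_day <= daily_limit


# Hypothetical month: 10,000 calls, about 1,500 input and 300 output tokens each.
print(f"${gemini_cost(10_000, 1_500, 300):.2f}")  # $2.70
# 400 voice notes a day at two calls each is 800 calls.
print(fits_free_tier(2 * 400))  # True
```

The arithmetic is 15M input tokens × $0.10 = $1.50, plus 3M output tokens × $0.40 = $1.20. Request caps, not token prices, are usually the first limit a busy inbox hits.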

WhatsApp Business Platform

Service conversations: $0 (unlimited; effective Nov 1, 2024)

Wikipedia

Free: $0 (public Wikimedia APIs). Enterprise Free: $0 with 5,000 on‑demand requests / mo and twice‑monthly snapshots
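
Because the public Wikimedia APIs need no key, a lookup is a plain HTTP GET. The sketch below uses the MediaWiki Action API search endpoint. n8n's Wikipedia tool may query Wikipedia differently, and the function names and User-Agent string are placeholders. Wikimedia asks clients to send a descriptive User-Agent.

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"


def search_url(query: str, limit: int = 3) -> str:
    """Build a full-text search URL; no API key is involved."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": limit,
        "format": "json",
    }
    return f"{API}?{urllib.parse.urlencode(params)}"


def search_titles(query: str, limit: int = 3) -> list[str]:
    """Return matching article titles (makes a network call; not run here)."""
    req = urllib.request.Request(
        search_url(query, limit),
        headers={"User-Agent": "support-bot-example/0.1 (contact@example.com)"},  # placeholder
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return [hit["title"] for hit in data["query"]["search"]]
```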
