n8n

How to Automate WhatsApp Multimodal Support with Gemini and Wikipedia?

Turn WhatsApp into a smart support channel that understands text, images, audio, and video. This workflow suits teams that handle customer questions, booking requests, and simple troubleshooting over chat. The result is faster replies and less manual work for your team.

Incoming messages arrive through a WhatsApp event trigger. The message is split into parts and routed by type using a Switch. Media branches fetch secure URLs from WhatsApp, download the files, and pass them to Google Gemini to transcribe audio, describe video, and explain images. Text is summarized for clarity. All branches normalize the content into a single message object, which goes to an AI Agent with conversation memory and a Wikipedia lookup tool for factual help. The final answer is sent back to the user on WhatsApp.
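
The routing and normalization steps described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the template's actual node logic: the payload shape follows the WhatsApp Cloud API webhook format, while the function names and the fields of the normalized object (`from`, `type`, `content`) are assumptions made for this example.

```python
from typing import Any, Dict, Optional

# Message types that go through a media branch (fetch URL, download, analyze).
MEDIA_TYPES = {"audio", "video", "image"}


def extract_message(payload: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the first inbound message from a webhook payload, or None."""
    try:
        value = payload["entry"][0]["changes"][0]["value"]
        return value["messages"][0]
    except (KeyError, IndexError):
        # Delivery status updates, for example, carry no "messages" array.
        return None


def route(message: Dict[str, Any]) -> str:
    """Mimic the Switch: pick a branch name from the message type."""
    msg_type = message.get("type", "")
    if msg_type in MEDIA_TYPES or msg_type == "text":
        return msg_type
    return "unsupported"


def normalize(message: Dict[str, Any], content: str) -> Dict[str, str]:
    """Build the single message object handed to the agent.

    The field names here are hypothetical, chosen for this sketch.
    """
    return {
        "from": message["from"],
        "type": message["type"],
        "content": content,
    }
```

For a text message, `content` would be the summarized body. For media, it would be the transcript or description that the Gemini branch returns.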

Setup needs a WhatsApp Cloud API connection and a Google Gemini API key. Once active, chats are handled end to end, which can cut first response time and raise capacity without adding headcount. Use it for FAQs, appointment intent capture, simple product guidance, and document or image checks, all inside WhatsApp.

What are the key features?

  • WhatsApp event trigger receives inbound messages and metadata.
  • Switch node routes messages by type into audio, video, image, or text branches.
  • Media URL fetch and secure download from WhatsApp for audio, video, and images.
  • Google Gemini models transcribe audio, describe video content, and explain images in plain text.
  • Text summarizer condenses long messages to the key points.
  • Window Buffer Memory keeps chat context per phone number for coherent multi-turn replies.
  • AI Agent generates the final answer and can consult Wikipedia for factual checks.
  • Response formatter normalizes outputs before sending a clean reply.
  • WhatsApp send node delivers the response back to the user in the same thread.
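
To make the memory feature more concrete, here is a minimal sketch of a per-phone-number window buffer. It is a simplified stand-in, not n8n's implementation: the class name, the `window_size` parameter, and the idea of storing (user message, reply) pairs are assumptions made for illustration.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Exchange = Tuple[str, str]  # (user message, assistant reply)


class WindowMemory:
    """Keep only the most recent exchanges for each phone number."""

    def __init__(self, window_size: int = 5) -> None:
        self._window_size = window_size
        self._sessions: Dict[str, Deque[Exchange]] = {}

    def add_exchange(self, phone: str, user_text: str, reply: str) -> None:
        # A deque with maxlen silently drops the oldest entry when full.
        session = self._sessions.setdefault(
            phone, deque(maxlen=self._window_size)
        )
        session.append((user_text, reply))

    def history(self, phone: str) -> List[Exchange]:
        """Return the stored exchanges for one phone number, oldest first."""
        return list(self._sessions.get(phone, ()))
```

Keying the buffer by phone number keeps each customer's conversation separate, and the fixed window stops long chats from growing the prompt without limit.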

What are the benefits?

  • Reduce manual triage from 10 minutes to 1 minute per chat by automatically analyzing text, images, audio, and video.
  • Automate up to 70 percent of first-line questions with AI-generated replies.
  • Improve routing accuracy by using message type detection and structured parsing.
  • Handle 5 times more chat volume during peak times without extra staff.
  • Connect WhatsApp and AI knowledge in one place so agents work less and customers get faster answers.

How do you set it up?

  1. Import the template into n8n: Create a new workflow in n8n > Click the three dots menu > Select 'Import from File' > Choose the downloaded JSON file.
  2. You will need accounts with WhatsApp Cloud API and Google Gemini. Wikipedia does not require an account. See the Tools Required section below for details on each service.
  3. Open the WhatsApp Trigger node in n8n Cloud and create a new WhatsApp credential. Follow the on-screen steps to connect your WhatsApp Cloud API app.
  4. In Meta for Developers, add the n8n webhook URL from the WhatsApp Trigger node, verify the token, and subscribe to message events. Send a test message to confirm n8n receives it.
  5. Double click any WhatsApp media nodes and select the same WhatsApp credential so downloads use the correct token.
  6. Open a Google Gemini node, choose Create new credential, and paste your API key from Google AI Studio. Save and test a sample call to verify access.
  7. No credential is needed for the Wikipedia tool. Ensure the node is connected to the AI Agent as a tool.
  8. Check the Switch node rules. Confirm the values match your message types for audio, video, image, and text so each path triggers correctly.
  9. Verify the Window Buffer Memory session key uses the phone number so replies stay in context across multiple messages.
  10. Activate the workflow and send test messages: one text, one image, one audio note, and one short video. Check the Execution log to confirm each branch runs and a reply is sent.
  11. If media fails to download, confirm your WhatsApp app is in live mode and the user is approved for messages. If video analysis fails, try a shorter clip or confirm that your Gemini model supports that format.
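
When debugging media downloads (steps 5 and 11), it can help to see the two calls the media nodes make. The sketch below assumes the standard WhatsApp Cloud API flow: look up the media ID on the Graph API to get a short-lived URL, then download that URL with the same bearer token. The `api_version` default, the function names, and the injectable `fetch` parameter are assumptions for this example, not part of the template.

```python
import json
import urllib.request
from typing import Callable, Tuple

GRAPH_BASE = "https://graph.facebook.com"


def auth_request(url: str, token: str) -> urllib.request.Request:
    """Build a GET request carrying the WhatsApp access token."""
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )


def http_get(request: urllib.request.Request) -> bytes:
    """Send the request and return the raw response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


def download_media(
    media_id: str,
    token: str,
    api_version: str = "v21.0",  # assumption: use the version your app targets
    fetch: Callable[[urllib.request.Request], bytes] = http_get,
) -> Tuple[bytes, str]:
    """Return (file bytes, MIME type) for a WhatsApp media ID."""
    # Step 1: resolve the media ID to a temporary download URL.
    meta_request = auth_request(f"{GRAPH_BASE}/{api_version}/{media_id}", token)
    meta = json.loads(fetch(meta_request))

    # Step 2: download the file; the URL also requires the bearer token.
    file_request = auth_request(meta["url"], token)
    mime_type = meta.get("mime_type", "application/octet-stream")
    return fetch(file_request), mime_type
```

If the first call works but the second returns an authorization error, the download node is probably using a different credential from the one that resolved the URL.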

Tools Required

n8n

$24 / mo, or $20 / mo billed annually, to use n8n in the cloud. The self-hosted n8n Community Edition is free.

Google Gemini

Free tier: $0 via the Gemini API. For example, the Gemini 2.5 Flash-Lite free tier allows 1,000 requests/day (15 RPM, 250k TPM). Paid usage starts at $0.10 per 1M input tokens and $0.40 per 1M output tokens.
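
As a rough guide to what the paid rates mean in practice, the following sketch turns them into a monthly estimate. The per-token prices come from the figures above. The chat volume and token counts in the usage example are hypothetical, and media inputs such as audio or video may be metered differently from plain text.

```python
# Paid rates quoted above, in US dollars per one million tokens.
INPUT_USD_PER_MILLION = 0.10
OUTPUT_USD_PER_MILLION = 0.40


def estimate_cost(
    chats: int, input_tokens_per_chat: int, output_tokens_per_chat: int
) -> float:
    """Estimate total Gemini spend in USD for a number of chats."""
    input_cost = chats * input_tokens_per_chat / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = chats * output_tokens_per_chat / 1_000_000 * OUTPUT_USD_PER_MILLION
    return round(input_cost + output_cost, 4)


# Hypothetical volume: 1,000 chats, each using 2,000 input and 300 output tokens.
monthly_estimate = estimate_cost(1_000, 2_000, 300)
```

With those assumed numbers, input costs $0.20 (2M tokens) and output costs $0.12 (0.3M tokens), for about $0.32 in total.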

WhatsApp Cloud API

Free: Service messages $0 per message (unlimited). Utility templates within 24h customer service window: $0. Other templates billed per message by country.

Wikipedia

Free: $0 (public Wikimedia APIs). Enterprise Free: $0 with 5,000 on‑demand requests / mo and twice‑monthly snapshots.
