n8n

How to Automate a Multimodal WhatsApp Support Assistant

Turn your WhatsApp inbox into a smart assistant that understands text, voice notes, photos, and PDF documents. It is ideal for support and pre-sales teams that want fast answers without switching tools or copying data between apps.

Incoming messages are received by the WhatsApp trigger and routed by type. Images are fetched and analyzed with an AI vision model to produce a clear description. Voice notes are downloaded and transcribed to text. PDFs are checked for the right format and downloaded, and their text is extracted. All content is then normalized and sent to an AI agent with short-term memory so replies stay on topic across messages. The system can answer with text or return an AI-generated audio reply. If a file type is not allowed, a helpful notice is sent back.
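The routing step described above can be sketched in a few lines of Python. This is only an illustration, not the template's own node logic: the function name route_message and the next_step labels are hypothetical, and the payload fields (type, document.mime_type) are assumed to follow the WhatsApp Cloud API message shape.

```python
from typing import Any


def route_message(message: dict[str, Any]) -> dict[str, str]:
    """Pick a processing branch for one incoming WhatsApp message."""
    msg_type = message.get("type", "")

    if msg_type == "text":
        # Plain text needs no preprocessing before it reaches the AI agent.
        return {"branch": "text", "next_step": "ai_agent"}
    if msg_type == "image":
        return {"branch": "image", "next_step": "vision_description"}
    if msg_type == "audio":
        return {"branch": "audio", "next_step": "transcription"}
    if msg_type == "document":
        mime_type = message.get("document", {}).get("mime_type", "")
        if mime_type == "application/pdf":
            return {"branch": "document", "next_step": "pdf_text_extraction"}
        # Documents that are not PDFs get the helpful notice instead.
        return {"branch": "document", "next_step": "incorrect_format_notice"}
    # Assumption: any other message type is also answered with the notice.
    return {"branch": "unsupported", "next_step": "incorrect_format_notice"}
```

In the workflow itself this decision is made by routing and condition nodes, so treat the sketch as a way to reason about which branch a test message should take.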

Setup needs a WhatsApp Business Platform app, a verified phone number, and an OpenAI API key. Once connected, teams can cut reply time, handle more chats, and support users around the clock. Common uses include answering FAQs, explaining product photos, and reading attached brochures to solve issues faster.

What are the key features?

  • WhatsApp Trigger captures new messages and media instantly
  • Automatic routing by input type for text, image, audio, and document
  • Secure media download using HTTP Header Auth with bearer token
  • AI vision analysis of images with GPT-4o mini for clear descriptions
  • Audio transcription to text for voice notes using OpenAI
  • PDF text extraction to read attached documents and brochures
  • AI agent with session memory to keep context across replies
  • Smart response delivery as text or synthesized audio in WhatsApp
  • File validation and helpful error messages for unsupported formats
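To make the secure media download feature more concrete, here is a minimal Python sketch of the two requests involved: one to look up the media URL from a media ID, and one to fetch the file, both carrying the same bearer token. The helper names, the Graph API host and path layout, and the version string v20.0 are assumptions for illustration; check them against your own Meta app before relying on them.

```python
import json
import urllib.request

GRAPH_BASE = "https://graph.facebook.com"
API_VERSION = "v20.0"  # assumption: replace with the version your app uses


def auth_headers(access_token: str) -> dict[str, str]:
    """Header sent with both requests (the HTTP Header Auth credential)."""
    return {"Authorization": f"Bearer {access_token}"}


def media_lookup_url(media_id: str, api_version: str = API_VERSION) -> str:
    """URL that returns metadata, including a download URL, for a media ID."""
    return f"{GRAPH_BASE}/{api_version}/{media_id}"


def download_media(media_id: str, access_token: str) -> bytes:
    """Resolve the media ID to a URL, then download the file bytes."""
    headers = auth_headers(access_token)

    lookup = urllib.request.Request(media_lookup_url(media_id), headers=headers)
    with urllib.request.urlopen(lookup, timeout=30) as resp:
        media_url = json.load(resp)["url"]

    # The download URL is also protected, so the token is sent again.
    download = urllib.request.Request(media_url, headers=headers)
    with urllib.request.urlopen(download, timeout=30) as resp:
        return resp.read()
```

If the first request fails with an authorization error, the access token is the first thing to check.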

What are the benefits?

  • Reduce triage time from minutes per chat to seconds by automatically routing message types
  • Automate up to 80 percent of common questions with AI-driven replies
  • Handle text, images, voice, and PDFs in one place without extra tools
  • Improve answer quality by using image details and PDF content directly
  • Scale to support many more WhatsApp conversations with the same team

How do you set it up?

  1. Import the template into n8n: Create a new workflow in n8n > Click the three dots menu > Select 'Import from File' > Choose the downloaded JSON file.
  2. You'll need accounts with WhatsApp Business Platform and OpenAI. See the Tools Required section below for links to create accounts with these services.
  3. In Meta for Developers, add the WhatsApp product to your app, connect your Business Manager, add a phone number, and subscribe to messages and message_status events.
  4. In the WhatsApp app settings, set the webhook callback URL to your n8n WhatsApp Trigger URL and keep the verify token consistent in both places.
  5. In the n8n credentials manager, create or edit the WhatsApp credentials. If unsure, double-click each WhatsApp node, open the 'Credential to connect with' dropdown, click 'Create new credential', and follow the on-screen instructions.
  6. Create an HTTP Header Auth credential named WhatsApp. Set the header name to Authorization and the value to Bearer your_access_token, then select this credential in the HTTP Request nodes that download media.
  7. In the n8n credentials manager, create an OpenAI credential. Generate an API key in your OpenAI account and paste it into the credential form.
  8. Open the Get Image Url, Get Audio Url, and Get File Url nodes and confirm they reference the correct media IDs from the WhatsApp Trigger fields.
  9. Check the Only PDF File condition to ensure only PDFs pass. If testing with other document types, expect the Incorrect format message to be sent.
  10. Open the AI Agent and Simple Memory nodes. Confirm the session key uses the WhatsApp user ID so each chat keeps its own context across replies.
  11. Test the setup by sending a text, then a photo, a voice note, and a PDF to your WhatsApp number. Watch the executions in n8n and confirm the correct branch runs each time.
  12. If media download fails, verify your access token is valid and not expired. If audio is not recognized, check the Fix mimeType for Audio code node and confirm the file type is supported.
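The session key check in step 10 is easier to reason about with a small model of what per-user, short-term memory does. The Python sketch below is an illustration of the idea only, not how the n8n Simple Memory node is implemented; the class name SessionMemory and the window size are hypothetical.

```python
from collections import defaultdict, deque


class SessionMemory:
    """Keep the last few turns of each chat, keyed by WhatsApp user ID."""

    def __init__(self, window: int = 5) -> None:
        # Each session holds at most `window` turns; older turns drop off.
        self._sessions = defaultdict(lambda: deque(maxlen=window))

    def add(self, session_key: str, role: str, text: str) -> None:
        self._sessions[session_key].append((role, text))

    def history(self, session_key: str) -> list[tuple[str, str]]:
        # Use .get so that reading an unknown key does not create a session.
        return list(self._sessions.get(session_key, ()))


memory = SessionMemory(window=2)
memory.add("user-a", "user", "Do you ship to Spain?")
memory.add("user-a", "assistant", "Yes, we do.")
memory.add("user-b", "user", "What are your opening hours?")
```

Here user-a and user-b never see each other's history. If the session key were a fixed value instead of the user ID, every chat would share one history and replies would mix contexts.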

Tools Required

$24 / mo, or $20 / mo billed annually, to use n8n in the cloud. The n8n Community Edition is free when you run it locally or self-host it.

OpenAI

Sign up

Pay-as-you-go: GPT-5 at $1.25 per 1M input tokens and $10 per 1M output tokens
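As a rough guide to what these rates mean in practice, the Python sketch below turns token counts into a dollar estimate. The rates come from the line above; the traffic figures (1,000 replies at 1,500 input tokens and 300 output tokens each) are made-up assumptions, and the estimate ignores any separate pricing for image analysis, transcription, or audio replies.

```python
INPUT_RATE_PER_MILLION = 1.25    # USD per 1M input tokens (listed GPT-5 rate)
OUTPUT_RATE_PER_MILLION = 10.00  # USD per 1M output tokens (listed GPT-5 rate)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate text-model cost in USD for the given token counts."""
    total = (
        input_tokens * INPUT_RATE_PER_MILLION
        + output_tokens * OUTPUT_RATE_PER_MILLION
    )
    return total / 1_000_000


# Hypothetical month: 1,000 replies, 1,500 input and 300 output tokens each.
replies = 1_000
monthly_cost = estimate_cost_usd(replies * 1_500, replies * 300)
print(f"Estimated monthly cost: ${monthly_cost:.2f}")
```

Under these assumed volumes the input side costs $1.875 and the output side $3.00, so output tokens dominate even though there are far fewer of them.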

WhatsApp Business Platform

Sign up

Service conversations: $0 (unlimited; effective Nov 1, 2024)
