Turn WhatsApp into a smart support inbox that understands text, audio, images, and video. Each incoming message is read, converted by AI into text the agent can work with, and answered in the same chat. Teams can respond quickly without switching tools or copying files.
Incoming messages are captured by a WhatsApp event trigger. The flow splits the message parts and routes each type with a switch. Media files are fetched from WhatsApp using secure links and downloaded in n8n. Audio is transcribed with a Gemini model. Video is described with a Gemini video node. Images are analyzed with an image explainer. Text is summarized for quick context. All paths send a clean, unified message to an AI agent that keeps chat history and can look up facts on Wikipedia. The reply is sent back to the user through WhatsApp.
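The routing step described above can be sketched outside n8n as a small TypeScript example. This is only an illustration under stated assumptions: the IncomingMessage shape is a trimmed-down guess at a WhatsApp message object, and the names routeMessage, mediaIdFor, and unify are hypothetical helpers, not nodes or functions from the workflow. The AI steps (transcription, video description, image analysis, summarization) are left out; their text output is passed in as a plain string.

```typescript
// Simplified, assumed shape of one incoming WhatsApp message (illustration only).
type MediaRef = { id: string; mime_type?: string; caption?: string };

interface IncomingMessage {
  from: string; // sender's phone number
  type: string; // e.g. "text", "audio", "image", "video"
  text?: { body: string };
  audio?: MediaRef;
  image?: MediaRef;
  video?: MediaRef;
}

type Route = "text" | "audio" | "image" | "video" | "unsupported";

// Hypothetical unified format that every branch hands to the AI agent.
interface UnifiedMessage {
  sender: string;
  kind: Route;
  content: string;
}

// Mirrors the switch step: pick exactly one branch per message type.
function routeMessage(msg: IncomingMessage): Route {
  switch (msg.type) {
    case "text":
      return "text";
    case "audio":
      return "audio";
    case "image":
      return "image";
    case "video":
      return "video";
    default:
      return "unsupported";
  }
}

// Returns the media ID a media branch would need to fetch, or null otherwise.
function mediaIdFor(msg: IncomingMessage): string | null {
  const route = routeMessage(msg);
  if (route === "audio") return msg.audio?.id ?? null;
  if (route === "image") return msg.image?.id ?? null;
  if (route === "video") return msg.video?.id ?? null;
  return null;
}

// Wraps the text produced by a branch (transcript, description, summary)
// into the single format passed on to the agent.
function unify(msg: IncomingMessage, prepared: string): UnifiedMessage {
  return { sender: msg.from, kind: routeMessage(msg), content: prepared };
}

// Example: a voice note takes the audio branch. The ID is a placeholder.
const voiceNote: IncomingMessage = {
  from: "15550001111",
  type: "audio",
  audio: { id: "MEDIA_ID_PLACEHOLDER", mime_type: "audio/ogg" },
};
const unifiedVoiceNote = unify(voiceNote, "Transcript text from the audio branch");
console.log(routeMessage(voiceNote), mediaIdFor(voiceNote), unifiedVoiceNote.kind);
```

Keeping one unified format means the agent only ever sees sender, kind, and content, so a new media type can be supported by adding a branch without changing the agent step.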
You will need access to the WhatsApp Business Platform and a Google Gemini API key. Your n8n webhook must be reachable at a public URL so WhatsApp can deliver events to it. The workflow is meant to give faster replies, fewer handoffs, and consistent quality across many chats. Use it for customer support, booking help, or quick document and media checks during a conversation.