n8n

How to Automate Gemini Voice Support?

Turn any voice message into a helpful spoken answer. A caller or app sends audio to a public URL, and the system replies with clean, natural speech in seconds. Great for hotlines, help centers, and voice chat on websites.

Here is how it works. A webhook receives the audio file. OpenAI speech to text turns the voice into text. The memory nodes load past messages so answers stay on topic. An aggregate step gathers the context, and the Gemini model creates the reply through a basic chain. Both the user text and the AI reply are saved back to memory for the next turn. A limit node keeps only one item, then ElevenLabs turns the reply text into audio. The response is returned as a binary audio file to the caller.

Setup needs API keys for OpenAI, Google Gemini, and ElevenLabs, plus a voice ID from ElevenLabs. Put the voice ID into the HTTP request URL and add the xi api key header. Expect faster replies, fewer simple tickets, and more consistent answers. This is useful for support hotlines, kiosks, or a voice FAQ that speaks with your brand voice.

What are the key features?

  • Webhook intake receives audio and returns a binary audio reply.
  • OpenAI speech to text converts the caller voice into clean text.
  • Conversation memory uses window buffer, get chat, and insert chat to keep context.
  • Aggregate node collects context so the model sees the right history.
  • Google Gemini model generates quick, helpful answers via a basic chain.
  • Limit node ensures a single item flows to audio generation and reply.
  • ElevenLabs API creates natural speech from the model reply text.
  • Respond to webhook sends the audio file back to the caller instantly.

What are the benefits?

  • Reduce live agent time on common questions by up to 80 percent
  • Cut response time from minutes to under 5 seconds
  • Handle three times more voice inquiries without extra staff
  • Keep a consistent brand tone using a fixed voice profile
  • Maintain full conversation history for quality review

How do you set it up?

  1. Import the template into n8n: Create a new workflow in n8n > Click the three dots menu > Select 'Import from File' > Choose the downloaded JSON file.
  2. You'll need accounts with OpenAI, Google Gemini and ElevenLabs. See the Tools Required section above for links to create accounts with these services.
  3. Open the Webhook node and copy the Production URL. This is the endpoint where you will send audio files for testing and live use.
  4. Open the OpenAI speech to text node. In the Credential to connect with menu, click Create new credential, paste your OpenAI API key from the OpenAI API page, and save. Send a short audio sample later to confirm it returns text.
  5. Open the Google Gemini chat model node. In the credential dropdown, click Create new credential and add your Google Gemini key from the official API page. Save and test the node with sample input in the workflow execution view.
  6. Open the ElevenLabs HTTP request node. Replace {{voice id}} in the URL with your ElevenLabs voice ID. In the credential dropdown, create a new HTTP custom auth credential and add the header xi-api-key with your ElevenLabs API key. Ensure the Content-Type header is application/json.
  7. Check the window buffer memory node. Keep the default session key for testing, or change it to a value you pass from your app so each user keeps their own conversation history.
  8. Verify the insert chat node messages map the user text from the OpenAI speech to text node and the AI reply from the basic chain node. This ensures the memory stays in sync.
  9. Send a test audio file to the webhook using Postman or curl. Confirm the OpenAI node returns text, the Gemini chain outputs a reply, and the ElevenLabs node returns binary audio.
  10. Play the audio returned by the respond to webhook node. If you get an authentication error from ElevenLabs, recheck the xi-api-key header. If the audio is empty, confirm the text field in the request body comes from the basic chain output.
  11. If multiple items appear in the run, make sure the limit node is set to 1 before the audio generation step. This prevents multi item payloads from breaking the reply.

Tools Required

$24 / mo or $20 / mo billed annually to use n8n in the cloud. However, the local or self-hosted n8n Community Edition is free.

ElevenLabs

Sign up

Free: $0 / mo, 10k credits / mo, includes API access

Google Gemini

Sign up

Free tier: $0 via Gemini API; e.g., Gemini 2.5 Flash-Lite free limits 1,000 requests/day (15 RPM, 250k TPM). Paid from $0.10/1M input tokens and $0.40/1M output tokens.

OpenAI

Sign up

Pay-as-you-go: GPT-5 at $1.25 per 1M input tokens and $10 per 1M output tokens

Similar Templates

Join Futurise to access 1,200+ automation templates

Get instant access to ready-made automation workflows for n8n, Make.com, AI agents, and more. Download, customise, and deploy in minutes.