n8n

How do you build voice support with OpenAI and ElevenLabs?

Turn voice questions into fast, helpful voice answers. Great for support teams that want a simple voice assistant for FAQs, status updates, or guided help. The flow accepts audio, understands the message with AI, writes a clear reply, and returns a natural voice response.

Here is how it works end to end. A webhook receives an audio file from your app or form. OpenAI Speech to Text turns the audio into text. The system pulls past messages with Get Chat, Aggregate, and a Window Buffer Memory to keep context, so replies stay on topic. Google Gemini writes the answer through a Basic LLM Chain. The new messages are saved back with Insert Chat. A Limit node passes one clean item to ElevenLabs, which generates the reply as audio using an HTTP Request. The response returns as a binary audio file through Respond to Webhook.
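The context step can be pictured with a small sketch. This is not how the n8n nodes are implemented; it is a plain Python illustration, assuming a fixed window of recent messages, of what Insert Chat, Aggregate, and Window Buffer Memory do together. The names WindowMemory, Message, and build_prompt are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    role: str  # "user" or "assistant"
    text: str


class WindowMemory:
    """Keeps only the most recent `window` messages of one conversation."""

    def __init__(self, window: int = 6) -> None:
        # A deque with maxlen silently drops the oldest item when full.
        self._messages = deque(maxlen=window)

    def insert(self, role: str, text: str) -> None:
        # Comparable to Insert Chat: store the newest message.
        self._messages.append(Message(role, text))

    def aggregate(self) -> str:
        # Comparable to Aggregate: merge stored messages into one context block.
        return "\n".join(f"{m.role}: {m.text}" for m in self._messages)


def build_prompt(memory: WindowMemory, question: str) -> str:
    """Combine prior context with the newly transcribed question."""
    context = memory.aggregate()
    if context:
        return f"Conversation so far:\n{context}\n\nuser: {question}"
    return f"user: {question}"
```

A small window keeps prompts short and cheap, at the cost of forgetting older turns.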

To set it up, you need API keys for OpenAI, Google Gemini, and ElevenLabs, plus a Voice ID in ElevenLabs. Expect faster replies, less typing, and a consistent tone. Use it for helpdesk portals, voice widgets, or internal tools that need clear spoken answers. Plan your session key strategy for multi-user conversations and choose a voice that matches your brand.
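One possible session key strategy, shown as a minimal Python sketch: derive a stable key per user from fields in the webhook payload. The field names user_id and channel, the voice-support prefix, and the choice to hash the identifier are all assumptions for illustration, not part of the template.

```python
import hashlib


def session_key(payload: dict, fallback: str = "anonymous") -> str:
    """Derive a stable per-user memory key from a webhook payload.

    `user_id` and `channel` are hypothetical field names; use whatever
    identifier your app actually sends.
    """
    raw = payload.get("user_id")
    user_id = "" if raw is None else str(raw).strip()
    if not user_id:
        # No identifier supplied: every such caller shares one conversation.
        return f"voice-support:{fallback}"
    channel = str(payload.get("channel") or "default").strip()
    # Hashing keeps raw identifiers out of the memory store.
    digest = hashlib.sha256(f"{channel}:{user_id}".encode("utf-8")).hexdigest()
    return f"voice-support:{digest[:16]}"
```

The same user always maps to the same key, so each caller gets their own conversation history instead of sharing one static session.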

What are the key features?

  • Webhook intake captures audio from any app or form that can send HTTP requests
  • OpenAI Speech to Text converts voice into accurate text for processing
  • Memory tools Get Chat and Window Buffer Memory keep conversation context
  • Aggregate compiles prior messages into a single context block
  • Google Gemini through a Basic LLM Chain generates clear, useful replies
  • Insert Chat writes the latest user and AI messages back to memory
  • Limit ensures a single clean item goes to text to speech
  • ElevenLabs HTTP Request creates natural voice audio from the AI reply
  • Respond to Webhook returns the audio file as binary for instant playback
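For readers who want to see what the ElevenLabs HTTP Request amounts to outside n8n, here is a minimal Python sketch that builds the equivalent request with the standard library. It assumes the public text-to-speech endpoint under api.elevenlabs.io and a JSON body containing only the text field; the function name build_tts_request is hypothetical, and the template's node may send additional fields.

```python
import json
import urllib.request

# Assumed endpoint pattern; the Voice ID is part of the URL path.
ELEVENLABS_TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def build_tts_request(text: str, voice_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a text-to-speech request for one reply."""
    if not text.strip():
        # An empty text field is a common cause of empty audio.
        raise ValueError("reply text is empty")
    if not voice_id or not api_key:
        raise ValueError("voice_id and api_key are both required")
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        ELEVENLABS_TTS_URL.format(voice_id=voice_id),
        data=body,
        method="POST",
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
    )
```

Passing the returned object to urllib.request.urlopen would send it; the response body is the binary audio that Respond to Webhook returns in the workflow.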

What are the benefits?

  • Reduce response time from minutes to seconds by answering voice questions automatically
  • Automate up to 80% of common support replies with consistent tone and style
  • Handle 5 to 10 concurrent conversations through the webhook without extra staff
  • Cut manual transcription and typing work by over 90% using speech to text
  • Keep context across turns to lower repeated questions and improve clarity

How do you set it up?

  1. Import the template into n8n: Create a new workflow in n8n > Click the three dots menu > Select 'Import from File' > Choose the downloaded JSON file.
  2. You will need accounts with OpenAI, Google Gemini, and ElevenLabs. Typeform, Gravity Forms, Zapier, and Webhook.site are optional ways to send or test audio requests. See the Tools Required section below for sign-up details.
  3. Open the Webhook node and copy the unique URL. In your form tool or test tool, configure a POST request to this URL with an audio file attachment.
  4. Double click the OpenAI Speech to Text node. In the Credential to connect with dropdown, click Create new credential, then add your OpenAI API key from the OpenAI API portal. Save the credential.
  5. Open the Google Gemini Chat Model node. Create a new Google Gemini (PaLM) API credential and paste your API key from the Google AI Studio API page. Save the credential.
  6. Open the ElevenLabs HTTP Request node. In authentication, choose custom HTTP auth. Create a new credential and add header xi-api-key with your ElevenLabs API key. Set Content-Type to application/json.
  7. In the ElevenLabs node URL, replace {{voice id}} with your chosen ElevenLabs Voice ID from the Voice Library.
  8. Review the Window Buffer Memory session key. If you need multi-user conversations, replace the static key with a dynamic value from your request, for example a user ID passed in the webhook.
  9. Send a test POST to the webhook using Webhook.site, Typeform, or a REST client with a short audio clip. Confirm that the OpenAI node outputs text.
  10. Check the Basic LLM Chain output for the generated reply and verify that Insert Chat writes messages back to memory.
  11. Validate that the Respond to Webhook node returns a binary audio file. Play the response to confirm voice quality and content.
  12. If you receive empty audio, check that the ElevenLabs node body includes the text field and that your API key and Voice ID are correct.
  13. If transcription fails, verify the audio format and size. Use common formats like MP3 or WAV and keep test clips short.
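The test request from the steps above can also be sent from a short script instead of a form tool. The following Python sketch builds a multipart POST with an audio attachment and checks the clip format and size first. The form field name file, the 5 MB limit, and the function names are assumptions; match the field name to whatever your Webhook node expects.

```python
import uuid
import urllib.request
from pathlib import Path

ALLOWED_SUFFIXES = {".mp3", ".wav"}   # common formats suggested for testing
MAX_TEST_BYTES = 5 * 1024 * 1024      # assumed ceiling for short test clips


def check_clip(filename: str, size: int) -> None:
    """Reject clips that are likely to make transcription fail."""
    if Path(filename).suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported audio format: {filename}")
    if size == 0 or size > MAX_TEST_BYTES:
        raise ValueError(f"clip size {size} bytes is outside the test range")


def build_webhook_request(
    webhook_url: str, filename: str, audio: bytes, field: str = "file"
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST carrying the clip."""
    check_clip(filename, len(audio))
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=head + audio + tail,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

To run a real test, read a clip with Path("clip.mp3").read_bytes(), build the request with your Webhook node URL, send it with urllib.request.urlopen, and write the response bytes to a file for playback.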

Tools Required

n8n

Cloud: $24 / mo, or $20 / mo billed annually. The self-hosted n8n Community Edition is free.

ElevenLabs

Sign up

Free: $0 / mo, 10k credits / mo, includes API access

Google Gemini

Sign up

Free tier: $0 via Gemini API; e.g., Gemini 2.5 Flash-Lite free limits 1,000 requests/day (15 RPM, 250k TPM). Paid from $0.10/1M input tokens and $0.40/1M output tokens.

Gravity Forms

Sign up

Elite License: $259/year (includes Webhooks Add-On for sending submissions to n8n)

OpenAI

Sign up

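Pay-as-you-go: GPT-5 at $1.25 per 1M input tokens and $10 per 1M output tokens. This workflow uses OpenAI only for speech to text, so transcription rates apply rather than GPT-5 token rates.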

Typeform

Sign up

Basic: $29 / mo — includes API access and webhooks

Webhook.site

Sign up

Free tier: $0, public API available; free URLs expire after 7 days and accept up to 100 requests

Zapier

Sign up

Professional: from $19.99 / mo (billed annually) — includes Webhooks (lowest tier suitable for n8n via API)
