n8n

How to Generate Google Gemini Image Captions?

Create ready-to-publish images by automatically generating a caption from any photo and placing it on the image in a clean overlay. This works well for social posts, product shots, and quick editorial visuals where you need text on the image fast.

The flow starts on a manual run, downloads an image from a URL, and resizes it for a vision model. A Google Gemini model then looks at the image and returns a title and caption in a structured format. The workflow reads the original image size, calculates a safe font size and position, draws a semi-transparent bar, and adds the text. Merge nodes keep the data from the model and the image aligned, while a Code node handles the placement math.
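The placement math can be sketched roughly as follows. This is a minimal illustration in Python, not the template's actual Code node: the function name calculate_placement, the parameters font_scale, line_height, and padding_ratio, and the 0.55 average glyph width are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    font_size: int   # font size in pixels
    bar_top: int     # y coordinate of the overlay bar's top edge
    bar_height: int  # height of the overlay bar in pixels
    text_x: int      # left edge of the text
    text_y: int      # y coordinate where the first text line starts
    max_chars: int   # rough number of characters that fit on one line


def calculate_placement(width: int, height: int, lines: int = 2,
                        font_scale: float = 0.04, line_height: float = 1.4,
                        padding_ratio: float = 0.03) -> Placement:
    """Place a caption bar along the bottom edge of a width x height image.

    All ratios are hypothetical defaults; tune them for your own images.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # Scale the font with the shorter side, with a 12 px floor for tiny images.
    font_size = max(12, round(min(width, height) * font_scale))
    padding = round(width * padding_ratio)
    # The bar holds the requested number of lines plus padding above and below.
    bar_height = round(font_size * line_height * lines) + 2 * padding
    bar_height = min(bar_height, height)  # never taller than the image
    bar_top = height - bar_height
    # Assumption: an average glyph is about 0.55 of the font size wide.
    max_chars = max(1, int((width - 2 * padding) / (font_size * 0.55)))
    return Placement(font_size, bar_top, bar_height,
                     padding, bar_top + padding, max_chars)


if __name__ == "__main__":
    print(calculate_placement(1200, 800))
```

Anchoring the bar to the bottom edge (bar_top = height - bar_height) keeps the caption inside the image regardless of its aspect ratio, which is the property the workflow describes as a "safe" position.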

Setup is simple. You only need a Google Gemini API key and an image source URL. Expect to cut caption work from several minutes to under a minute per image, while keeping a consistent style across your posts. Use it for social banners, blog headers, and on-brand watermarks that stay readable on any photo.

What are the key features?

  • Image import from a direct URL using HTTP Request with file download
  • Automatic resize to 512 by 512 to fit vision model inputs
  • Google Gemini vision captioning returns a title and caption
  • Structured output parser enforces a clean JSON format for text
  • Image size detection to guide font size and line length
  • Code-based positioning to place text at the bottom safely
  • Overlay creation with a semi-transparent bar and readable text
  • Merge nodes keep caption data and image properties in sync
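To make the last feature concrete, combining by position means item i from one input is merged with item i from the other. The sketch below imitates that idea in plain Python; combine_by_position and include_unpaired are hypothetical names for this example, and letting the second input win on duplicate keys is a choice made here, not a statement about the Merge node's settings.

```python
from itertools import zip_longest


def combine_by_position(left, right, include_unpaired=False):
    """Merge item i of `left` with item i of `right`.

    By default, items without a partner are dropped. With
    include_unpaired=True they are kept as they are.
    """
    if include_unpaired:
        pairs = zip_longest(left, right, fillvalue={})
    else:
        pairs = zip(left, right)
    # On duplicate keys, the value from `right` overwrites the one from `left`.
    return [{**a, **b} for a, b in pairs]


if __name__ == "__main__":
    image_info = [{"width": 1200, "height": 800}]
    captions = [{"caption_title": "Harbor at dusk",
                 "caption_text": "Warm light over the quiet bay"}]
    print(combine_by_position(image_info, captions))
```

Because pairing depends only on order, both branches need to emit the same number of items in the same order, which matters once you feed the workflow more than one image.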

What are the benefits?

  • Reduce manual caption design from 10 minutes to 1 minute per image
  • Automate 90% of layout steps with consistent placement and sizing
  • Improve readability with smart sizing based on image dimensions
  • Keep a consistent brand look by using the same overlay style every time
  • Handle more images with batch runs by feeding a list of URLs

How do you set it up?

  1. Import the template into n8n: Create a new workflow in n8n > Click the three dots menu > Select 'Import from File' > Choose the downloaded JSON file.
  2. You'll need a Google Gemini account. See the Tools Required section below for a link to create one.
  3. Open the Google Gemini chat model node. In the Credential to connect with dropdown, click Create new credential. Follow the on-screen steps, paste your Google Gemini API key from the API page, give it a clear name such as gemini prod, and save. Confirm the model is set to models/gemini-1.5-flash.
  4. Open the HTTP Request node. Set the image URL you want to use. Enable file download so the image is returned as binary data. Keep the default binary property unless your setup differs.
  5. Check the Resize For AI node. Keep width and height at 512 so the image is optimized for the vision model. Make sure it receives the binary output from the HTTP Request node.
  6. Confirm the Get Info node reads the original image to capture width and height. This feeds sizing data used for caption placement.
  7. Open the Image Captioning Agent node. Ensure it uses the Google Gemini chat model and the structured output parser with the fields caption_title and caption_text. Verify the resized image is passed as the vision input.
  8. Review both Merge nodes. Keep Combine by position so the image data and caption data stay aligned item by item.
  9. Open the Calculate Positioning code node. Adjust line height or font scale if your images are much larger or smaller. This prevents overflow and improves readability.
  10. Open the Apply Caption to Image node. Tweak overlay color, text color, and font to match your brand. Keep the rectangle behind the text for contrast.
  11. Click Test workflow. Check the final image output. If you see empty captions, verify your Gemini credential and model quota. If text wraps poorly, lower font size or change the max line length in the code node.
  12. Optional: Replace the manual trigger with a Webhook or a Schedule trigger to process new images from your CMS or run daily batches.
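Two of the troubleshooting cases in step 11, empty captions and poor wrapping, can be reasoned about with a small sketch. The Python below is illustrative only and is not the template's code: parse_caption and wrap_caption are hypothetical helpers, and the two-line limit is an assumption. The field names caption_title and caption_text come from step 7.

```python
import json
import textwrap

REQUIRED_FIELDS = ("caption_title", "caption_text")


def parse_caption(raw: str) -> dict:
    """Parse the model's JSON reply and reject empty or missing fields."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [f for f in REQUIRED_FIELDS
               if not str(data.get(f) or "").strip()]
    if missing:
        raise ValueError("empty or missing fields: " + ", ".join(missing))
    return {f: str(data[f]).strip() for f in REQUIRED_FIELDS}


def wrap_caption(text: str, max_chars: int, max_lines: int = 2) -> list:
    """Break text into at most max_lines lines of at most max_chars characters.

    Text that does not fit is cut off and marked with "...".
    """
    if max_chars < 4:
        raise ValueError("max_chars must be at least 4")
    return textwrap.wrap(text, width=max_chars,
                         max_lines=max_lines, placeholder="...")


if __name__ == "__main__":
    reply = '{"caption_title": "Harbor", "caption_text": "Warm light over the quiet bay at dusk"}'
    caption = parse_caption(reply)
    for line in wrap_caption(caption["caption_text"], max_chars=24):
        print(line)
```

Failing early on an empty field makes a credential or quota problem visible before an image with a blank bar is produced, and lowering max_chars has the same effect as shortening the maximum line length in the code node.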

Tools Required

n8n Cloud costs $24 / mo, or $20 / mo when billed annually. The local or self-hosted n8n Community Edition is free.

Google Gemini

Sign up

Free tier: $0 via the Gemini API; for example, the Gemini 2.5 Flash-Lite free limits are 1,000 requests/day (15 RPM, 250k TPM). Paid usage starts at $0.10/1M input tokens and $0.40/1M output tokens.
