What tools does this template use?

This template uses n8n and integrates with Google Gemini.

How do I set up this template?

Import the template into n8n, configure the integration credentials, and activate the workflow. Detailed step-by-step instructions are available on the template page.

How to Generate Google Gemini Image Captions?

Create ready-to-publish images by auto-generating a caption from any photo and placing it on the image in a clean overlay. Great for social posts, product shots, and quick editorial visuals where you need text on the image fast.

The flow starts on a manual run, downloads an image from a URL, and resizes it for a vision model. A Google Gemini model then looks at the image and returns a title and caption in a structured format. The workflow reads the image size, calculates a safe font size and position, draws a semi-transparent bar, and adds the text. Merge nodes keep the data from the model and the image aligned, while a Code node handles the placement math.
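To make the placement math more concrete, here is a minimal Python sketch of how a font size and a bottom bar could be derived from the image dimensions. It is not the template's actual Code node: the function name `compute_layout`, the `CaptionLayout` fields, and the default values for `font_scale`, `line_height`, and `min_font` are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class CaptionLayout:
    font_size: int   # font size in pixels
    bar_top: int     # y coordinate where the overlay bar starts
    bar_height: int  # height of the semi-transparent bar
    text_x: int      # left edge of the text
    text_y: int      # top edge of the first text line


def compute_layout(width: int, height: int, line_count: int = 2,
                   font_scale: float = 0.04, line_height: float = 1.4,
                   min_font: int = 12) -> CaptionLayout:
    """Derive a bottom-anchored caption layout from the image size.

    All defaults are illustrative assumptions, not values from the template.
    """
    # Scale the font with the shorter side so text stays readable on
    # both portrait and landscape images.
    font_size = max(min_font, round(min(width, height) * font_scale))
    padding = font_size  # one font size of padding around the text
    line_px = round(font_size * line_height)
    bar_height = line_count * line_px + 2 * padding
    # Never let the bar cover more than half of the image.
    bar_height = min(bar_height, height // 2)
    bar_top = height - bar_height
    return CaptionLayout(font_size, bar_top, bar_height,
                         text_x=padding, text_y=bar_top + padding)


layout = compute_layout(1200, 800)
print(layout)
```

Tying the font size to the shorter side is one reasonable choice; a real workflow may prefer to scale with width only or to clamp the maximum size as well.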

Setup is simple. You only need a Google Gemini API key and an image source URL. Expect to cut caption work from minutes to under a minute per image, while keeping a consistent style across your posts. Use it for social banners, blog headers, and on-brand watermarks that stay readable on any photo.

What are the key features?

Image import from a direct URL using HTTP Request with file download
Automatic resize to 512 by 512 to fit vision model inputs
Google Gemini vision captioning returns a title and caption
Structured output parser enforces a clean JSON format for text
Image size detection to guide font size and line length
Code-based positioning to place text at the bottom safely
Overlay creation with a semi-transparent bar and readable text
Merge nodes keep caption data and image properties in sync
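The structured output feature above can be pictured as a strict check on the model's reply. The sketch below is plain Python, not the n8n structured output parser itself. It assumes the reply arrives as a JSON string carrying the two fields the workflow expects, `caption_title` and `caption_text`; the helper name `parse_caption` and the sample reply are hypothetical.

```python
import json


def parse_caption(raw: str) -> dict:
    """Parse the model reply and keep only the two expected text fields."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    result = {}
    for field in ("caption_title", "caption_text"):
        value = data.get(field)
        # Reject missing, non-string, or blank values early so an empty
        # caption never reaches the overlay step.
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"missing or empty field: {field}")
        result[field] = value.strip()
    return result


# Hypothetical reply, used only to exercise the function.
reply = '{"caption_title": "Golden Hour", "caption_text": " A quiet harbor at sunset. "}'
print(parse_caption(reply))
```

Failing loudly on a missing field makes empty captions easier to trace back to the model step rather than to the drawing step.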

What are the benefits?

Reduce manual caption design from 10 minutes to 1 minute per image
Automate 90% of layout steps with consistent placement and sizing
Improve readability with smart sizing based on image dimensions
Keep a consistent brand look by using the same overlay style every time
Handle more images with batch runs by feeding a list of URLs
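For the batch case, one way to feed a list of URLs is to shape it into one item per image before the download step. n8n passes data between nodes as a list of items, each with a `json` object. The field name `url`, the helper `urls_to_items`, and the example.com addresses below are assumptions for illustration only.

```python
def urls_to_items(urls: list[str]) -> list[dict]:
    """Turn a list of image URLs into one n8n-style item per URL."""
    items = []
    seen = set()
    for url in urls:
        url = url.strip()
        # Skip blanks and duplicates so each image is processed once.
        if not url or url in seen:
            continue
        seen.add(url)
        items.append({"json": {"url": url}})
    return items


batch = urls_to_items([
    "https://example.com/a.jpg",
    "  https://example.com/b.jpg ",
    "",
    "https://example.com/a.jpg",
])
print(batch)
```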

How do you set it up?

Import the template into n8n: Create a new workflow in n8n > Click the three dots menu > Select 'Import from File' > Choose the downloaded JSON file.
You'll need a Google Gemini account. See the Tools Required section below for details on this service.
Open the Google Gemini chat model node. In the Credential to connect with dropdown, click Create new credential. Follow the on-screen steps, paste your Google Gemini API key from the API page, give it a clear name such as gemini prod, and save. Confirm the model is set to models/gemini-1.5-flash.
Open the HTTP Request node. Set the image URL you want to use. Enable file download so the image is returned as binary data. Keep the default binary property unless your setup differs.
Check the Resize For AI node. Keep width and height at 512 so the image is optimized for the vision model. Make sure it receives the binary output from the HTTP Request node.
Confirm the Get Info node reads the original image to capture width and height. This feeds sizing data used for caption placement.
Open the Image Captioning Agent node. Ensure it uses the Google Gemini chat model and the structured output parser with the fields caption_title and caption_text. Verify the resized image is passed as the vision input.
Review both Merge nodes. Keep Combine by position so the image data and caption data stay aligned item by item.
Open the Calculate Positioning Code node. Adjust line height or font scale if your images are much larger or smaller than usual. This prevents overflow and improves readability.
Open the Apply Caption to Image node. Tweak overlay color, text color, and font to match your brand. Keep the rectangle behind the text for contrast.
Click Test workflow. Check the final image output. If you see empty captions, verify your Gemini credential and model quota. If text wraps poorly, lower the font size or change the max line length in the Code node.
Optional: Replace the manual trigger with a Webhook or a Schedule trigger to process new images from your CMS or run daily batches.
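The wrapping advice in the steps above (lower the font size or change the max line length) can be sketched with Python's standard `textwrap` module. This is a rough illustration, not the template's code: the helper `wrap_caption`, the `avg_char_ratio` of 0.55 (an estimate of average glyph width relative to font size), and the `padding` default are assumptions, and real fonts will vary.

```python
import textwrap


def wrap_caption(text: str, image_width: int, font_size: int,
                 avg_char_ratio: float = 0.55, padding: int = 20) -> list[str]:
    """Wrap caption text so each line roughly fits inside the image width."""
    # Rough estimate: an average glyph is about 0.55 x the font size wide.
    char_px = font_size * avg_char_ratio
    # Characters that fit between the left and right padding, with a floor
    # of 10 so very narrow images still get usable lines.
    max_chars = max(10, int((image_width - 2 * padding) / char_px))
    return textwrap.wrap(text, width=max_chars)


caption = "A quiet harbor at sunset with fishing boats resting on calm water"
for line in wrap_caption(caption, image_width=512, font_size=24):
    print(line)
```

Lowering `font_size` raises the number of characters allowed per line, which is why a smaller font often fixes awkward wrapping without touching the caption text.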

Tools Required

n8n

n8n Cloud costs $24 / mo, or $20 / mo when billed annually. The local or self-hosted n8n Community Edition is free.

Google Gemini

Free tier: $0 via the Gemini API; for example, the Gemini 2.5 Flash-Lite free tier allows 1,000 requests/day (15 RPM, 250k TPM). Paid usage starts at $0.10 per 1M input tokens and $0.40 per 1M output tokens.
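As a back-of-the-envelope check on paid usage, the per-token rates quoted above can be turned into a small estimate. The token counts per image below are hypothetical placeholders, not measurements; actual usage depends on the model, the image, and the prompt, and the quoted rates are starting prices.

```python
INPUT_RATE_PER_M = 0.10   # USD per 1M input tokens (starting rate quoted above)
OUTPUT_RATE_PER_M = 0.40  # USD per 1M output tokens (starting rate quoted above)


def estimate_cost(images: int, input_tokens_each: int,
                  output_tokens_each: int) -> float:
    """Estimate the paid-tier cost in USD for a batch of captioned images."""
    input_cost = images * input_tokens_each / 1_000_000 * INPUT_RATE_PER_M
    output_cost = images * output_tokens_each / 1_000_000 * OUTPUT_RATE_PER_M
    return input_cost + output_cost


# Assumed workload: 1,000 images, 300 input and 60 output tokens per image.
total = estimate_cost(1000, 300, 60)
print(f"${total:.3f}")  # prints "$0.054"
```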
