n8n

How to Generate Gemini Image Captions for Social Posts

Turn any image into a ready-to-post asset. The workflow writes a title and caption, then places the text neatly on the picture. It suits social posts, blogs, and quick creative needs without a designer.

Here is how it works. A manual trigger starts the run, and an HTTP Request node downloads an image. The image is resized to 512 by 512 for fast AI processing, and basic image details are read. A Gemini vision model receives the image and returns a structured title and caption using a built-in parser. A Code node calculates where the text should go based on the image size. Merge nodes combine the image, the caption, and the positions. Finally, the Edit Image node draws a subtle background box and overlays the title and caption on the image.
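As a rough illustration of the structured result, the sketch below checks that a parsed object carries a non-empty title and caption. The field names caption_title and caption_text are the ones named in the setup steps. The validateCaption helper, its rules, and the sample text are hypothetical and are not part of the template.

```javascript
// Hypothetical helper: checks an object shaped like the parser output.
// Only the field names come from the template; the rules are assumptions.
function validateCaption(output) {
  const problems = [];
  for (const field of ["caption_title", "caption_text"]) {
    const value = output ? output[field] : undefined;
    if (typeof value !== "string" || value.trim() === "") {
      problems.push(`${field} is missing or empty`);
    }
  }
  return { valid: problems.length === 0, problems };
}

// Made-up sample data for illustration only.
const example = {
  caption_title: "Sunrise Over the Bay",
  caption_text: "Golden light spills across calm water as the city wakes up.",
};

console.log(validateCaption(example));
console.log(validateCaption({ caption_title: "Only a title" }));
```

A check like this could sit in a Code node between the captioning step and the merge, so that an empty caption stops the run before anything is drawn on the image.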

Setup is straightforward. You need a Google Gemini API key, saved as a credential in n8n for the Gemini node. Expect faster production of branded graphics, less editing time, and clear text placement. Use it for auto-captioned hero images, watermarked visuals, and quick social content. With small changes, it can batch-process many images and keep your style consistent.

What are the key features?

  • Image download with HTTP Request to pull assets from a URL as binary data.
  • Image resizing to 512 by 512 for faster AI processing and stable results.
  • Vision captioning using Google Gemini 1.5 Flash through an LLM chain.
  • Structured Output Parser to return a title and caption in a clean JSON shape.
  • Code node calculates font size, line length, and bottom placement based on image size.
  • Merge nodes combine the image, AI text, and positions before editing.
  • Edit Image multi-step operation draws a semi-transparent background and overlays the text.
  • Manual trigger for safe testing, with an easy switch to other triggers later.
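The positioning step listed above can be sketched roughly as follows. This is not the template's actual Code node. The function name calculatePositioning and every ratio in it (5% font size, 0.6 character-width factor, 4% padding, three reserved text lines) are assumptions chosen for illustration.

```javascript
// Hypothetical sketch of a positioning calculation based on image size.
// All ratios below are assumptions, not values taken from the template.
function calculatePositioning(width, height) {
  // Scale the font with image height, with a floor so small images stay readable.
  const fontSize = Math.max(12, Math.round(height * 0.05));
  const lineHeight = Math.round(fontSize * 1.3);
  const padding = Math.round(width * 0.04);

  // Rough estimate: an average character is about 0.6 of the font size wide.
  const maxLineLength = Math.floor((width - 2 * padding) / (fontSize * 0.6));

  // Reserve room for one title line and two caption lines at the bottom.
  const boxHeight = lineHeight * 3 + padding * 2;
  const boxTop = height - boxHeight;

  return {
    fontSize,
    lineHeight,
    maxLineLength,
    box: { x: 0, y: boxTop, width, height: boxHeight },
    titlePosition: { x: padding, y: boxTop + padding + fontSize },
    captionPosition: { x: padding, y: boxTop + padding + fontSize + lineHeight },
  };
}

console.log(calculatePositioning(512, 512));
```

Deriving every value from the image dimensions keeps the overlay proportional when the input size changes, which is the reason to compute positions in code instead of hard-coding them.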

What are the benefits?

  • Reduce manual captioning and layout from 10 minutes to 1 minute per image
  • Automate 80% of repetitive design work for simple social graphics
  • Keep text placement consistent across all images to improve brand quality
  • Handle dozens of images in a single run, once adapted for batch input, without extra tools
  • Eliminate copy-paste steps by generating and applying text in one flow

How do you set it up?

  1. Import the template into n8n: Create a new workflow in n8n > Click the three dots menu > Select 'Import from File' > Choose the downloaded JSON file.
  2. You'll need a Google Gemini account. See the Tools Required section below for details.
  3. In n8n Cloud, open the Google Gemini Chat Model node, click the 'Credential to connect with' dropdown, select 'Create new credential', and follow the on-screen steps.
  4. Get your Google Gemini API key from the Google AI Studio site, paste it into the new credential, and save. Name the credential clearly, for example Gemini Prod.
  5. Open the HTTP Request node and confirm the image URL works. Set Response Format to File and make sure Binary Property is set to data to store the image as binary.
  6. Check the Resize For AI node. Set width to 512 and height to 512. Confirm the input binary property matches the HTTP Request output.
  7. Open the Image Captioning Agent node. Ensure the language model is set to the Google Gemini node and the Structured Output Parser is connected with caption_title and caption_text fields.
  8. Review the Code node named Calculate Positioning. Adjust the lineHeight value if the text appears too large or too small for your images.
  9. Open the Apply Caption to Image node. Confirm it draws the background rectangle and then writes the title and caption using the fields from the merged data.
  10. Click 'Test workflow' to run with the manual trigger. Verify the final image shows a readable overlay and correct text. If the run fails, check API key permissions and confirm the model is available in your region.
  11. If text is cut off, increase the image width or reduce the font size logic in the Code node. If the overlay hides content, adjust the rectangle height or placement values.
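When text is cut off, one option is to wrap it to a maximum line length before it is drawn. The sketch below shows a simple word-wrap. The names wrapText and maxLineLength are hypothetical, and the template's own Code node may handle this differently.

```javascript
// Hypothetical word-wrap helper: splits text into lines of at most
// maxLineLength characters, breaking only between words.
function wrapText(text, maxLineLength) {
  const lines = [];
  let current = "";
  for (const word of text.split(/\s+/).filter(Boolean)) {
    const candidate = current ? `${current} ${word}` : word;
    // A single word longer than the limit is kept whole on its own line.
    if (candidate.length <= maxLineLength || current === "") {
      current = candidate;
    } else {
      lines.push(current);
      current = word;
    }
  }
  if (current) lines.push(current);
  return lines;
}

console.log(wrapText("Golden light spills across calm water", 15));
```

The number of lines returned can then drive the rectangle height, so the background box grows with longer captions instead of hiding or clipping text.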

Tools Required

n8n Cloud costs $24 / mo, or $20 / mo billed annually. The local or self-hosted n8n Community Edition is free.

Google Gemini

Free tier: $0 via the Gemini API. For example, Gemini 2.5 Flash-Lite has free limits of 1,000 requests/day (15 RPM, 250k TPM). Paid usage starts at $0.10 per 1M input tokens and $0.40 per 1M output tokens.
