n8n

How Do You Automate API Schema Generation with Apify?

Need a faster way to turn API docs into structured data? This workflow finds developer pages, reads them, extracts API operations, and builds custom schemas you can share with your team. It is useful for product, integration, and IT teams that track many vendors.

The run starts from a manual or event trigger, pulls a list from Google Sheets, and uses Apify to search and scrape web pages. Results are cleaned and deduped, then split into readable chunks and stored in Qdrant as embeddings. Google Gemini checks if the content is real API documentation, identifies products, and extracts the endpoints and methods. Operations are saved back to Google Sheets. A code step groups the operations into a clear JSON schema and uploads the file to Google Drive. Status nodes and wait steps manage research, extract, and generate phases with clear success and error paths.
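The grouping step can be pictured with a short sketch. This is not the template's actual Code node (n8n Code nodes usually run JavaScript); it is a standalone Python illustration. The field names `service`, `resource`, `method`, and `endpoint`, and the sample `ExampleCRM` rows, are assumptions made for the example.

```python
import json
from collections import defaultdict


def build_schema(operations):
    """Group flat operation rows into a nested dict: service -> resource -> operations."""
    grouped = defaultdict(lambda: defaultdict(list))
    for op in operations:
        grouped[op["service"]][op["resource"]].append(
            {"method": op["method"].upper(), "endpoint": op["endpoint"]}
        )
    # Sort at every level so repeated runs produce a stable, diff-friendly file.
    return {
        service: {
            resource: sorted(ops, key=lambda o: (o["endpoint"], o["method"]))
            for resource, ops in sorted(resources.items())
        }
        for service, resources in sorted(grouped.items())
    }


# Hypothetical rows, shaped like operations read back from a spreadsheet.
rows = [
    {"service": "ExampleCRM", "resource": "contacts", "method": "post", "endpoint": "/v1/contacts"},
    {"service": "ExampleCRM", "resource": "contacts", "method": "get", "endpoint": "/v1/contacts"},
    {"service": "ExampleCRM", "resource": "deals", "method": "get", "endpoint": "/v1/deals"},
]

schema = build_schema(rows)
print(json.dumps(schema, indent=2))
```

Sorting is a design choice here, not something the workflow is known to do; it keeps the uploaded file easy to compare between runs.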

Setup requires accounts for Apify, Google Sheets and Drive, Qdrant, and Google Gemini. Add credentials in n8n, point nodes to your sheet and Drive folder, and confirm your Qdrant collection. Expect large time savings, consistent outputs, and a reusable API knowledge base. The workflow suits vendor reviews, partner onboarding, and building internal API catalogs at scale.

What are the key features?

  • Pulls service lists and writes results with Google Sheets nodes
  • Uses Apify search and scraper via HTTP Request with batching and memory settings
  • Filters, removes duplicates, and splits large pages into 4k chunks for better parsing
  • Creates embeddings with Google Gemini and stores them in Qdrant for semantic search
  • Classifies content to confirm real API docs before extraction
  • Extracts endpoints, methods, and parameters with Google Gemini prompts and templates
  • Merges and cleans operations, then builds a custom JSON schema with a Code node
  • Uploads final schema to Google Drive and tracks phase status with EventRouter and Wait nodes
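The dedupe and chunking features above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the workflow's own logic: "4k" is read as 4,000 characters, the 200-character overlap is an added assumption, and the `url` and `text` keys are hypothetical field names.

```python
def dedupe_pages(pages):
    """Keep the first page seen for each URL, ignoring fragments and trailing slashes."""
    seen = set()
    unique = []
    for page in pages:
        key = page["url"].split("#")[0].rstrip("/")
        if key not in seen:
            seen.add(key)
            unique.append(page)
    return unique


def split_into_chunks(text, chunk_size=4000, overlap=200):
    """Split text into fixed-size character chunks that overlap slightly."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


# Hypothetical scraped pages; the first two point at the same document.
pages = [
    {"url": "https://example.com/docs/api/", "text": "a" * 10000},
    {"url": "https://example.com/docs/api#auth", "text": "a" * 10000},
    {"url": "https://example.com/docs/webhooks", "text": "short page"},
]

unique_pages = dedupe_pages(pages)
chunks = split_into_chunks(unique_pages[0]["text"])
print(len(unique_pages), [len(c) for c in chunks])
```

The overlap means an endpoint description that straddles a chunk boundary still appears whole in at least one chunk, as long as it is shorter than the overlap.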

What are the benefits?

  • Reduce API research time from 3 hours to 15 minutes per service
  • Automate 80 percent of reading and data entry from developer pages
  • Improve accuracy by 30 percent with consistent extraction and dedupe steps
  • Handle 50 or more services in one run with batch scraping
  • Unify Google Sheets, Apify, Qdrant, Google Gemini, and Google Drive in one flow
  • Create reusable JSON schemas that standardize vendor onboarding

How do you set it up?

  1. Import the template into n8n: Create a new workflow in n8n > Click the three dots menu > Select 'Import from File' > Choose the downloaded JSON file.
  2. You'll need accounts with Google Sheets, Google Drive, Apify, Qdrant and Google Gemini. See the Tools Required section below for details on each service.
  3. In the n8n credentials manager, create Google Sheets and Google Drive OAuth credentials. Double-click the related nodes, choose Create new credential, and follow the on-screen steps to connect your Google account.
  4. In Apify, create an API token. For the Search node, set HTTP Header auth with your token. For the Scraper node, set Query auth with the token as shown in the node fields.
  5. In Google AI Studio, create an API key for Google Gemini. Open each Gemini node, click Create new credential, and paste the key. Keep model names as models/gemini-1.5-flash-latest and models/text-embedding-004 unless you must change them.
  6. For Qdrant, create a cluster and API key. In the Qdrant vector store node, enter the base URL, API key, and the target collection name. If the collection does not exist, create it in Qdrant first.
  7. Open each Google Sheets node and point it to your spreadsheet with the service list. Confirm column names match what the node expects, such as service, url, and status fields.
  8. Open the Google Drive upload node and set the target folder. Confirm your Google user has upload permission to that folder.
  9. Check the HTTP Request nodes for batching and memory settings. If pages are large, keep memory at 2048 MB and batch size at 2 to avoid timeouts.
  10. Click Test workflow to run a small sample. Watch the Research Result, Extract Result, and Generate Result nodes to confirm each phase completes.
  11. Validate outputs: review Qdrant collections, check the appended rows in Google Sheets, and open the JSON schema file in Google Drive.
  12. Troubleshooting: if scraping returns empty results, verify Apify actor inputs and your token. If Qdrant returns auth errors, recheck the base URL and key. If the Drive upload fails, confirm the folder ID and Google credential scope.
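Steps 4 and 9 describe two ways to pass the Apify token and the memory and batch settings. The sketch below builds, but does not send, an equivalent request in Python, which can help when checking what the HTTP Request nodes should contain. The actor ID `example-user~example-scraper` and the `startUrls` input are placeholders, and the choice of the `run-sync-get-dataset-items` endpoint is an assumption; check the template's nodes for the actors and endpoints it actually calls.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

APIFY_BASE = "https://api.apify.com/v2"


def build_actor_run_request(actor_id, token, actor_input, memory_mb=2048, auth="header"):
    """Build (without sending) a POST request that runs an Apify actor."""
    params = {"memory": memory_mb}
    headers = {"Content-Type": "application/json"}
    if auth == "header":
        headers["Authorization"] = f"Bearer {token}"  # header-based auth
    elif auth == "query":
        params["token"] = token  # query-string auth
    else:
        raise ValueError("auth must be 'header' or 'query'")
    url = f"{APIFY_BASE}/acts/{actor_id}/run-sync-get-dataset-items?{urlencode(params)}"
    body = json.dumps(actor_input).encode("utf-8")
    return Request(url, data=body, headers=headers, method="POST")


def batches(items, size=2):
    """Yield consecutive groups of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


# Placeholder values for illustration only.
urls = ["https://example.com/docs/a", "https://example.com/docs/b", "https://example.com/docs/c"]
requests_to_send = [
    build_actor_run_request(
        "example-user~example-scraper",
        "YOUR_APIFY_TOKEN",
        {"startUrls": [{"url": u} for u in batch]},
    )
    for batch in batches(urls, size=2)
]
for req in requests_to_send:
    print(req.get_method(), req.full_url)
```

Header auth keeps the token out of URLs that may end up in logs, which is one reason to prefer it where the node allows a choice.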

Tools Required

n8n: $24 / mo, or $20 / mo billed annually, to use n8n in the cloud. The self-hosted n8n Community Edition is free.

Apify

Free plan: $0 / mo with $5 monthly platform credits; API access via token

Google Drive

Drive API: $0 (no additional cost; quota-limited)

Google Gemini

Free tier: $0 via the Gemini API; for example, Gemini 2.5 Flash-Lite is free up to 1,000 requests/day (15 RPM, 250k TPM). Paid usage starts at $0.10 per 1M input tokens and $0.40 per 1M output tokens.

Google Sheets

Free: $0 (Google Sheets API usage has no additional cost; quota limits apply)

Qdrant

Free tier: $0, 1 GB free cluster (no credit card), accessible via REST/gRPC API
