n8n

How to Automate Sitemap Content Discovery?

Need a fast way to find files and links on your site? This setup reads a sitemap URL, turns it into a clean list, and keeps only the items you care about, such as PDFs. It helps marketing and SEO teams build content inventories and spot resources for campaigns without manual copying and pasting.

Here is how it runs from start to finish. A manual trigger starts the run. A Set node holds one field called sitemapUrl. The HTTP Request node downloads the sitemap file from the web. The XML node converts the XML into JSON and normalizes the keys so they are easy to read. Split Out breaks the urlset.url array into single items. The Filter node returns only links that match your rules. Out of the box it keeps only .pdf links, which is great for asset audits.
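For readers who want to see the same flow outside n8n, here is a minimal Python sketch of the fetch, convert, split, and filter steps. It is an illustration and not the template's actual implementation: the function names, the sample sitemap, and the "ends with .pdf" rule are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Namespace defined by the sitemaps.org protocol for <urlset> documents.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def fetch_sitemap(sitemap_url: str) -> bytes:
    """Download the raw sitemap (the HTTP Request step). Needs network access."""
    with urlopen(sitemap_url, timeout=30) as response:
        return response.read()


def extract_urls(sitemap_xml) -> list[str]:
    """Parse the XML and return one URL string per <url><loc> entry."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text and loc.text.strip()
    ]


def filter_urls(urls: list[str], suffix: str = ".pdf") -> list[str]:
    """Keep URLs ending with the suffix (case-sensitive; an assumed rule)."""
    return [url for url in urls if url.endswith(suffix)]


# Hypothetical sample standing in for a downloaded sitemap.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/files/guide.pdf</loc></url>
  <url><loc>https://example.com/blog/post</loc></url>
  <url><loc>https://example.com/files/pricing.pdf</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in filter_urls(extract_urls(SAMPLE_SITEMAP)):
        print(url)
```

To try it against a live site, pass the result of fetch_sitemap to extract_urls in place of the sample string.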

Getting started is simple. Add your sitemap address to the Set node and adjust the Filter rules for your link patterns. Expect big time savings on content audits and fewer mistakes from manual scans. Common uses include collecting gated resources, mapping old files for redirects, and tracking compliance documents across large sites.

What are the key features?

  • Manual trigger lets you run audits on demand
  • Set node stores a single sitemapUrl so changes are easy
  • HTTP Request fetches the sitemap.xml from the web
  • XML to JSON conversion with trim and normalize for clean keys
  • Split Out turns the urlset.url array into one item per link
  • Filter returns only URLs that match your rules, such as .pdf
  • Default filter focuses on PDF files for quick asset reviews
  • No credentials needed for public sitemaps
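To make the urlset.url path more concrete, the sketch below shows roughly what the converted data might look like and how a dotted path can be walked to produce one item per link. The sample data, the split_out helper, and the "contains .pdf" check are hypothetical and only mimic the idea behind the Split Out and Filter steps; the exact output of your sitemap may contain other fields.

```python
def split_out(data: dict, field_path: str) -> list:
    """Follow a dotted path such as 'urlset.url' and return its entries as a list."""
    current = data
    for key in field_path.split("."):
        if not isinstance(current, dict) or key not in current:
            raise KeyError(f"Path segment '{key}' not found in '{field_path}'")
        current = current[key]
    # Depending on converter settings, a sitemap with a single <url> may
    # come through as one object instead of a list, so wrap it here.
    return current if isinstance(current, list) else [current]


# Hypothetical converted output; real sitemaps may carry more fields per entry.
converted = {
    "urlset": {
        "url": [
            {"loc": "https://example.com/", "lastmod": "2024-01-15"},
            {"loc": "https://example.com/files/guide.pdf", "lastmod": "2024-02-01"},
        ]
    }
}

items = split_out(converted, "urlset.url")
pdf_links = [item["loc"] for item in items if ".pdf" in item["loc"]]
```

Each entry in items corresponds to one link, and pdf_links holds only the entries whose loc value contains .pdf.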

What are the benefits?

  • Reduce manual site audit time from 2 hours to 5 minutes
  • Streamline asset discovery by 80 percent using simple filters
  • Improve link accuracy by 90 percent by reading from the sitemap
  • Handle thousands of URLs without slowing down your review
  • Cut copy and paste work and lower the chance of missed links

How do you set it up?

  1. Import the template into n8n: Create a new workflow in n8n > Click the three dots menu > Select 'Import from File' > Choose the downloaded JSON file.
  2. Open the Set sitemap URL node and replace the sitemapUrl value with your full sitemap link. Use the main sitemap or a section sitemap if needed.
  3. Open the Get Sitemap node. Confirm the URL field uses the expression {{$json.sitemapUrl}}. Leave the method as GET and click Execute Node to check for a 200 status.
  4. Open the Convert Sitemap to JSON node and keep the options on: trim, normalize, merge attributes, ignore attributes, and normalize tags. Run the node to preview the JSON.
  5. Open the Split Out node and make sure fieldToSplitOut is set to urlset.url. If your sitemap has a different structure, update this path to match the JSON keys you see.
  6. Open the Filter URLs node and adjust the conditions. The default looks for .pdf links. Add more rules for folders, file types, or keywords as needed.
  7. Click Execute workflow to run the full chain. Inspect the output of the Filter URLs node to see the final list of links.
  8. Validation tips: Try a second sitemap to compare results. Check the Get Sitemap response code. If you see 403 or 404, verify the URL and that the file is public.
  9. Troubleshooting: If no results appear, recheck the Split Out path against the Convert Sitemap to JSON node output. If some links are missing, update the Filter conditions or turn off case-sensitive matching.
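The two troubleshooting checks above can be illustrated with a short Python sketch. The first helper lists every dotted path that leads to an array, which is the kind of path Split Out needs. The second shows why a case-sensitive rule can miss links such as Guide.PDF. Both helpers and the sample data are hypothetical, and the sitemap index layout shown is an assumed example of "a different structure".

```python
def find_list_paths(data, prefix: str = "") -> list[str]:
    """Return the dotted path of every list found inside nested dicts."""
    paths = []
    if isinstance(data, dict):
        for key, value in data.items():
            path = f"{prefix}.{key}" if prefix else key
            if isinstance(value, list):
                paths.append(path)
            else:
                paths.extend(find_list_paths(value, path))
    return paths


def url_matches(url: str, pattern: str, case_sensitive: bool = True) -> bool:
    """Check whether the pattern occurs in the URL, optionally ignoring case."""
    if not case_sensitive:
        url, pattern = url.lower(), pattern.lower()
    return pattern in url


# Hypothetical converted output for a sitemap index, which lists
# child sitemaps instead of pages, so 'urlset.url' would not exist.
index_output = {
    "sitemapindex": {
        "sitemap": [
            {"loc": "https://example.com/sitemap-pages.xml"},
            {"loc": "https://example.com/sitemap-files.xml"},
        ]
    }
}

if __name__ == "__main__":
    print(find_list_paths(index_output))
    print(url_matches("https://example.com/Guide.PDF", ".pdf"))
    print(url_matches("https://example.com/Guide.PDF", ".pdf", case_sensitive=False))
```

If the first helper reports a path other than urlset.url, that path is the likely value for the Split Out field.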

Tools Required

n8n in the cloud costs $24 / mo, or $20 / mo when billed annually. The local or self-hosted n8n Community Edition is free.
