Turn long tax-rule PDFs into a simple chat assistant your team can use. The build loads public documents, organizes them by chapter and section, and answers questions with clear citations to the source section. Finance and compliance teams can find the right clause in minutes.
The flow starts with a manual run that downloads a zip of PDFs, unzips it, and extracts the text. Pattern matching identifies chapter and section labels, and the text is then split into 2,000-character chunks, each tagged with its chapter and section as metadata. Embeddings are created with Mistral AI and saved to a Qdrant collection. A batching loop with a batch size of 5 and a short wait between batches helps avoid rate limits. A chat webhook drives an OpenAI agent with two tools: Ask, which runs a semantic search through the Qdrant Search API, and Search, which fetches exact sections through the Qdrant Scroll API. Chat memory keeps context for follow-up questions.
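The indexing steps above can be sketched in plain Python. This is a minimal illustration, not the workflow's actual nodes: the heading patterns, the function names, and the embed_and_store callable are assumptions made for the example, and the one-second wait is a placeholder because the description only says "short". Only the 2,000-character chunk size and the batch size of 5 come from the description.

```python
import re
import time
from typing import Callable, Iterator

# Hypothetical heading patterns; real documents may label chapters and sections differently.
CHAPTER_RE = re.compile(r"^CHAPTER\s+(\d+)\b", re.IGNORECASE)
SECTION_RE = re.compile(r"^Sec(?:tion|\.)\s+([\d.]+)", re.IGNORECASE)


def split_with_metadata(text: str, chunk_size: int = 2000) -> list[dict]:
    """Split text into chunks of at most chunk_size characters, tagged by chapter and section."""
    chunks: list[dict] = []
    chapter = None
    section = None
    buffer: list[str] = []

    def flush() -> None:
        # Emit the buffered lines under the chapter/section labels currently in effect.
        body = "\n".join(buffer).strip()
        buffer.clear()
        for start in range(0, len(body), chunk_size):
            chunks.append({
                "text": body[start:start + chunk_size],
                "chapter": chapter,
                "section": section,
            })

    for line in text.splitlines():
        stripped = line.strip()
        chap = CHAPTER_RE.match(stripped)
        sec = SECTION_RE.match(stripped)
        if chap or sec:
            flush()  # close the previous block before the labels change
            if chap:
                chapter, section = chap.group(1), None
            else:
                section = sec.group(1)
        buffer.append(line)
    flush()
    return chunks


def batches(items: list, size: int = 5) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def index_in_batches(
    chunks: list[dict],
    embed_and_store: Callable[[list[dict]], None],  # hypothetical: embeds a batch and writes it to the vector store
    batch_size: int = 5,
    wait_seconds: float = 1.0,  # placeholder value for the "short wait"
) -> None:
    """Process chunks in small batches, pausing between batches to stay under rate limits."""
    for batch in batches(chunks, batch_size):
        embed_and_store(batch)
        time.sleep(wait_seconds)
```

Flushing the buffer before the labels change keeps each chunk attached to the section it actually came from, which is what lets the assistant cite a chapter and section alongside a snippet.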
You need a Qdrant endpoint, a Mistral AI key, and an OpenAI key. Set your collection name and Qdrant base URL, then run the indexing flow once to load the data. After that, send chat questions to the webhook to get relevant snippets, or the full text of a section on request. Teams can cut research time, give more consistent answers, and reuse the same design for any policy or rulebook library.
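For orientation, the two retrieval paths correspond to two Qdrant REST endpoints: points/search for semantic matches and points/scroll for filtered, exact lookups. The sketch below only builds the request URL and JSON body for each, using the base URL and collection name from the setup step. The QdrantConfig class, the function names, and the payload keys metadata.chapter and metadata.section are assumptions; use whatever keys your indexing run actually stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QdrantConfig:
    base_url: str    # your Qdrant endpoint (placeholder value in the examples)
    collection: str  # your collection name (placeholder value in the examples)


def search_request(cfg: QdrantConfig, query_vector: list[float], limit: int = 5) -> tuple[str, dict]:
    """Build a semantic-search request: nearest stored chunks to the query embedding."""
    url = f"{cfg.base_url.rstrip('/')}/collections/{cfg.collection}/points/search"
    body = {"vector": query_vector, "limit": limit, "with_payload": True}
    return url, body


def scroll_request(cfg: QdrantConfig, chapter: str, section: str, limit: int = 50) -> tuple[str, dict]:
    """Build an exact-lookup request: every stored chunk of one chapter and section."""
    url = f"{cfg.base_url.rstrip('/')}/collections/{cfg.collection}/points/scroll"
    body = {
        "filter": {
            "must": [
                # Assumed payload keys; adjust to match the metadata written at index time.
                {"key": "metadata.chapter", "match": {"value": chapter}},
                {"key": "metadata.section", "match": {"value": section}},
            ]
        },
        "limit": limit,
        "with_payload": True,
    }
    return url, body
```

Each returned pair can be sent as a POST with any HTTP client, adding an api-key header if your Qdrant instance requires one. Search needs an embedding of the question, so it suits open-ended queries; scroll needs only the labels, so it suits requests for the complete text of a known section.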