RAG query rewriting is a technique in retrieval-augmented generation where a user’s input is transformed into one or more improved retrieval queries to increase the relevance and coverage of documents fetched for grounding a model’s answer.
What is RAG Query Rewriting?
In many RAG systems, the raw user question is a poor search query for a vector database or keyword index. It may be ambiguous, underspecified, or dependent on earlier conversational turns, all of which hurt retrieval. Query rewriting uses an LLM or rules to convert the user message into a retrieval-oriented form. This can include resolving coreferences, adding entities that the conversation implies but the message omits, expanding acronyms, generating multiple sub-queries, or producing a structured query with filters. Some systems generate a hypothetical answer or a list of key points and retrieve against that text, while others generate a set of focused questions that cover different aspects of the task. The rewriting step typically runs before retrieval, and the rewritten queries, not the original user text, are embedded or searched. Good designs preserve user intent, avoid introducing new facts, and log both the original and rewritten queries for debugging.
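The flow described above (rewrite first, search with the rewritten queries, log both forms) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: `rewrite_then_retrieve`, `toy_rewrite`, and `toy_search` are hypothetical names, the rewriter is a rule-based stand-in for an LLM call, and the "index" is a two-document dictionary scored by word overlap.

```python
import logging
from dataclasses import dataclass
from typing import Callable, List, Tuple

logger = logging.getLogger("rag.rewrite")


@dataclass
class RewriteResult:
    original: str          # the raw user message
    rewritten: List[str]   # the queries actually sent to the index


def rewrite_then_retrieve(
    user_text: str,
    rewrite: Callable[[str], List[str]],
    search: Callable[[str, int], List[str]],
    top_k: int = 3,
) -> Tuple[RewriteResult, List[str]]:
    """Rewrite the user text, search with the rewrites, and log both forms."""
    # Fall back to the original text if the rewriter returns nothing.
    queries = rewrite(user_text) or [user_text]
    logger.info("original=%r rewritten=%r", user_text, queries)

    seen, hits = set(), []
    for query in queries:
        for doc_id in search(query, top_k):
            if doc_id not in seen:  # de-duplicate across queries, keep order
                seen.add(doc_id)
                hits.append(doc_id)
    return RewriteResult(user_text, queries), hits


def toy_rewrite(text: str) -> List[str]:
    # Assumption: stands in for an LLM call; expands one known acronym.
    return [text.replace("SSO", "single sign-on (SSO)")]


def toy_search(query: str, top_k: int) -> List[str]:
    # Assumption: a toy keyword index scored by shared words.
    corpus = {
        "doc-sso": "configure single sign-on for your workspace",
        "doc-billing": "update billing details and invoices",
    }
    words = set(query.lower().split())
    scored = [(len(words & set(text.split())), doc_id)
              for doc_id, text in corpus.items()]
    ranked = sorted(scored, reverse=True)[:top_k]
    return [doc_id for score, doc_id in ranked if score > 0]


result, hits = rewrite_then_retrieve("How do I set up SSO?", toy_rewrite, toy_search)
```

In this toy setup the raw question shares no words with the indexed text, so searching with it directly finds nothing, while the rewritten query matches the sign-on document. Keeping `original` and `rewritten` together in one record makes it easier to trace a bad answer back to a bad rewrite.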
Where it is used and why it matters
Query rewriting is used in enterprise knowledge assistants, support bots, research copilots, and multi-step agents that retrieve repeatedly. It matters because retrieval quality often limits answer quality in grounded systems: if the right passage is never retrieved, the generator cannot ground its answer in it. Better queries reduce irrelevant context, increase recall of the right passages, and lower hallucination risk. They can also reduce cost when they let the system retrieve fewer, more relevant chunks, which shortens the context sent to the generator.
Types
- Contextual rewrite: Incorporate chat history and resolve pronouns.
- Multi-query expansion: Produce several diverse queries to improve recall.
- Filtered rewrite: Add metadata constraints like product, region, and time.
- Decomposition: Split a complex question into sub-questions for targeted retrieval.
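Two of the types above can be sketched together: a filtered rewrite represented as a structured query, and several queries (from multi-query expansion or decomposition) merged into one ranking. The merge step here uses reciprocal rank fusion, which is one common choice and not something the text above prescribes. `RetrievalQuery`, `CORPUS`, and `search` are hypothetical names, and the corpus and its metadata are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RetrievalQuery:
    text: str
    filters: Dict[str, str] = field(default_factory=dict)  # metadata constraints


# Assumption: a toy corpus mapping doc_id -> (text, metadata).
CORPUS = {
    "kb-1": ("reset password for mail account", {"product": "mail", "region": "eu"}),
    "kb-2": ("reset password for cloud console", {"product": "cloud", "region": "eu"}),
    "kb-3": ("mail account locked after failed sign in", {"product": "mail", "region": "eu"}),
}


def search(query: RetrievalQuery, top_k: int = 5) -> List[str]:
    """Apply metadata filters, then rank the remaining docs by word overlap."""
    words = set(query.text.lower().split())
    scored = []
    for doc_id, (text, meta) in CORPUS.items():
        if any(meta.get(key) != value for key, value in query.filters.items()):
            continue  # document fails a metadata constraint
        overlap = len(words & set(text.split()))
        if overlap:
            scored.append((-overlap, doc_id))
    return [doc_id for _, doc_id in sorted(scored)[:top_k]]


def reciprocal_rank_fusion(ranked_lists: List[List[str]], k: int = 60) -> List[str]:
    """Merge rankings: each doc scores sum(1 / (k + rank)) over the lists it appears in."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc_id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Three rewrites of one user question, all constrained to the mail product.
queries = [
    RetrievalQuery("reset mail password", {"product": "mail"}),
    RetrievalQuery("account locked sign in", {"product": "mail"}),
    RetrievalQuery("locked mail account", {"product": "mail"}),
]
fused = reciprocal_rank_fusion([search(q) for q in queries])
```

The filter keeps the cloud-console document out of every result list, and fusion rewards documents that rank well across several rewrites over a document that ranks first for only one.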
FAQs
1. Is query rewriting always beneficial?
Not always. Poor rewriting can drift from the user intent and harm retrieval. Evaluate on real queries, comparing retrieval results with and without the rewrite step, before relying on it.
2. How do you prevent the rewrite from adding facts?
Give the rewriter strict instructions (for example, to use only terms that appear in the user message or chat history), keep rewrites short, and validate rewrites against allowed entities or metadata.
3. How is this different from RAG chunking?
Chunking prepares documents for indexing, while query rewriting changes the search query used to fetch chunks.
4. Can query rewriting be done without an LLM?
Yes. Rules, templates, and synonym expansion can help, but LLM-based rewriting usually handles ambiguity and chat context better.
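Two of the answers above lend themselves to a short sketch: rule-based rewriting without an LLM (question 4) and validating a rewrite against allowed entities (question 2). The lookup tables, the entity list, and every function name below are hypothetical. The entity check only catches names present in the supplied vocabulary, so it is a partial guard, not a complete one.

```python
import re
from typing import Dict, Iterable, List, Set

# Assumption: illustrative lookup tables; a real system would load its own.
ACRONYMS: Dict[str, str] = {"SSO": "single sign-on", "MFA": "multi-factor authentication"}
SYNONYMS: Dict[str, List[str]] = {"login": ["sign in"], "error": ["failure"]}


def expand_acronyms(query: str) -> str:
    """Append the long form after each known all-caps acronym."""
    def repl(match):
        token = match.group(0)
        return f"{token} ({ACRONYMS[token]})" if token in ACRONYMS else token
    return re.sub(r"\b[A-Z]{2,}\b", repl, query)


def synonym_variants(query: str) -> List[str]:
    """Return the query plus one variant per matching synonym."""
    variants = [query]
    for word, alternatives in SYNONYMS.items():
        pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
        if pattern.search(query):
            for alt in alternatives:
                variants.append(pattern.sub(lambda _m, a=alt: a, query))
    return variants


def rule_based_rewrite(query: str) -> List[str]:
    return synonym_variants(expand_acronyms(query))


def find_entities(text: str, known_entities: Iterable[str]) -> Set[str]:
    """Return the known entities that occur in the text as whole words."""
    lowered = text.lower()
    return {entity for entity in known_entities
            if re.search(r"\b" + re.escape(entity.lower()) + r"\b", lowered)}


def keep_if_grounded(original: str, rewritten: str, context: str,
                     known_entities: Iterable[str], max_words: int = 30) -> str:
    """Fall back to the original if the rewrite is too long or names a new entity."""
    known = list(known_entities)
    allowed = find_entities(original + " " + context, known)
    introduced = find_entities(rewritten, known) - allowed
    if introduced or len(rewritten.split()) > max_words:
        return original
    return rewritten


variants = rule_based_rewrite("SSO login error")
```

Falling back to the original message when validation fails is one conservative policy; a system could equally retry the rewrite or flag the query for review.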