QCon London 2026: Reliable Retrieval for Production AI Systems
Briefly

"Most failures in RAG systems stem from indexing and retrieval, rather than the language model itself. Enterprise documents often have complex layouts like tables and infographics, and simply converting them to plain text can strip away important structure, causing misread numbers or misinterpreted tables. To fix this, she [Chu] built a pipeline combining traditional text extraction with visual-language models that understand layouts."
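The hybrid parsing approach can be sketched minimally. Everything here is illustrative: the `Page` shape, the word-count threshold, and the `vlm_parse` callable are assumptions, since the talk did not publish implementation details.

```python
from dataclasses import dataclass

@dataclass
class Page:
    text: str          # output of a cheap plain-text extractor
    has_table: bool    # layout flag, e.g. from a layout-detection step

def parse_page(page: Page, vlm_parse) -> str:
    """Route simple pages through plain text extraction and fall back to a
    visual-language model for pages with complex layouts.

    `vlm_parse` is a hypothetical callable standing in for a real VLM."""
    # Pages with tables, or with suspiciously little extracted text,
    # get the layout-aware (but slower and costlier) VLM parse.
    if page.has_table or len(page.text.split()) < 20:
        return vlm_parse(page)
    return page.text

# Usage with a stand-in VLM:
mock_vlm = lambda p: "[markdown table reconstructed by VLM]"
simple = Page(text="word " * 50, has_table=False)
table_page = Page(text="Q1 5 Q2 7", has_table=True)
print(parse_page(table_page, mock_vlm))  # routed to the VLM
```

The design point is cost control: the expensive layout-aware model runs only on the pages where plain text extraction would destroy structure.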
"Even with modern language models, chunking content is necessary to avoid overwhelming the model and increasing costs. Chu tested different methods and found that breaking documents into sections worked best for her dataset, reaching high accuracy, though she stressed that the right strategy depends on the specific data."
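Section-based chunking of the kind Chu found most effective can be sketched as follows; the markdown-heading split rule is an assumption for illustration, not her actual implementation.

```python
import re

def chunk_by_sections(markdown_text: str) -> list[str]:
    """Split a markdown document into one chunk per heading-delimited
    section, keeping each heading with the text that follows it."""
    # Zero-width lookahead splits *before* each heading line, so the
    # heading stays attached to its own section.
    parts = re.split(r"(?m)^(?=#{1,6} )", markdown_text)
    return [p.strip() for p in parts if p.strip()]

doc = """# Annual Report
Intro text.

## Revenue
Revenue grew 5%.

## Risk
Risk commentary here.
"""
chunks = chunk_by_sections(doc)
print(len(chunks))  # 3 sections
```

Splitting at section boundaries keeps each chunk semantically self-contained, which is why it can outperform fixed-size windows on documents that have clear headings.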
"Standard retrieval systems rely on vector similarity, but this can miss important context, like the timing of a document. Her system added temporal scoring to favor newer documents and a routing layer to decide whether to retrieve relevant information."
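Temporal scoring of this kind can be sketched as a recency-weighted blend with vector similarity. The exponential decay, the half-life, and the `alpha` blend weight are all illustrative assumptions; the talk did not publish the exact formula.

```python
from datetime import date

def recency_weight(doc_date: date, today: date,
                   half_life_days: float = 180.0) -> float:
    """Exponential decay: a document loses half its temporal weight
    every half_life_days (an assumed parameter)."""
    age = (today - doc_date).days
    return 0.5 ** (age / half_life_days)

def combined_score(similarity: float, doc_date: date, today: date,
                   alpha: float = 0.7) -> float:
    # Blend vector similarity with recency; alpha controls how much
    # pure relevance dominates the temporal signal.
    return alpha * similarity + (1 - alpha) * recency_weight(doc_date, today)

today = date(2026, 3, 1)
fresh = combined_score(0.80, date(2026, 2, 1), today)
stale = combined_score(0.85, date(2024, 2, 1), today)
print(fresh > stale)  # the newer, slightly less similar document wins
```

With these parameters, a month-old document at 0.80 similarity outranks a two-year-old document at 0.85, which is the behavior temporal scoring is meant to produce.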
Rabobank's production AI search system serves over 300 users searching 10,000 internal documents through a RAG pipeline. The system ingests documents via parsing, chunking, and embedding, retrieves relevant chunks, and generates answers with an LLM. The key challenges lie in document quality, retrieval relevance, and evaluation. Accurate parsing is critical: enterprise documents contain complex layouts such as tables and infographics that plain-text conversion destroys, and visual-language models help preserve that structure. Chunking by document sections proved most effective for accuracy. Standard vector-similarity retrieval misses contextual factors such as document recency, so temporal scoring and a routing layer were added to improve relevance and system performance.
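The routing layer that decides whether retrieval is needed at all can be sketched as a toy classifier. This heuristic is purely illustrative; a production router would typically be an LLM call or a trained classifier, and the talk did not describe its logic.

```python
def should_retrieve(query: str) -> bool:
    """Toy routing layer: skip the vector store for small talk and only
    run retrieval when the query looks document-seeking.

    A hypothetical heuristic, not the system's actual router."""
    smalltalk = {"hi", "hello", "thanks", "thank you", "bye"}
    q = query.strip().lower()
    if q in smalltalk:
        return False
    # Anything else goes to retrieval in this simplified sketch.
    return True

print(should_retrieve("hello"))                   # False: no retrieval needed
print(should_retrieve("What was 2024 revenue?"))  # True: hit the index
```

Skipping retrieval for queries that do not need it saves latency and cost, and avoids injecting irrelevant chunks that can degrade the generated answer.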
Read at InfoQ