Modern RAG — Agentic, Graph, Memory.
Modern Retrieval for Enterprise AI.
Vector databases are foundational, but modern RAG is so much more. Agentic retrieval, knowledge graphs, persistent LLM memory, self-reflection, and multi-modal understanding — all in one open, self-hosted platform.
The Modern RAG Stack
Simple keyword search was just the beginning. Today's enterprise AI demands a layered retrieval strategy — combining semantic vectors, knowledge graphs, agentic orchestration, and persistent memory into a single coherent system.
Retrieval is no longer a single lookup. Autonomous agents decide when to retrieve, which sources to query, how to refine results, and whether to iterate. They reason across multiple hops and distributed knowledge bases, with self-correction loops that improve answer quality.
Knowledge graphs enhance semantic retrieval with structured relationships. Entities, concepts, and their connections form a semantic web that captures context that vectors alone miss. Traverse relationship edges to discover insights no flat index can surface.
Persistent, evolving memory that learns from every interaction. Short-term session context, long-term user preferences, and episodic memory of past queries. Your AI remembers who you are, what you've asked, and how you prefer answers.
The model reflects on its own retrieved context, checking for relevance, hallucination, and completeness before generating. If retrieved passages are insufficient, it triggers a new retrieval cycle. Built-in quality gates reject bad retrievals.
When retrieval quality is low, Corrective RAG (CRAG) doesn't give up: it reformulates queries, searches alternative sources, or decomposes the question into sub-queries. Quality gates evaluate retrieved documents before generation.
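The corrective loop described above can be sketched in a few lines of Python. This is a minimal illustration of the general pattern, not the platform's implementation: `corrective_retrieve`, the `retrieve`, `grade`, and `reformulate` callables, the `threshold` value, and the toy `DOCS` data are all hypothetical stand-ins for a search backend, a relevance grader, and a query rewriter.

```python
def corrective_retrieve(query, retrieve, grade, reformulate,
                        threshold=0.5, max_rounds=3):
    """Retry retrieval with reformulated queries until passages pass the quality gate.

    retrieve, grade, and reformulate are caller-supplied callables
    (hypothetical stand-ins, not a real library API).
    """
    attempt = query
    for round_no in range(1, max_rounds + 1):
        passages = retrieve(attempt)
        # Quality gate: grade every passage against the ORIGINAL question.
        accepted = [p for p in passages if grade(query, p) >= threshold]
        if accepted:
            return accepted, round_no
        # Nothing passed the gate, so rewrite the query and try again.
        attempt = reformulate(attempt)
    return [], max_rounds


# Toy stand-ins so the sketch runs end to end.
DOCS = {"vacation policy": ["Employees accrue 2 vacation days per month."]}

def retrieve(q):
    return DOCS.get(q, [])

def grade(q, passage):
    return 1.0 if "vacation" in passage else 0.0

def reformulate(q):
    return "vacation policy"

passages, rounds = corrective_retrieve("PTO rules", retrieve, grade, reformulate)
```

In this toy run the first query finds nothing, so the loop reformulates once and succeeds on the second round. A production gate would likely use an LLM or a trained grader rather than a substring check.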
Recursive abstractive processing condenses document clusters into a hierarchy of summaries. Retrieval happens at multiple abstraction levels, from raw chunks to high-level topic summaries, so each query is matched at the right granularity.
Layered Retrieval Architecture
Layered retrieval that adapts to your data, your queries, and your domain — from hybrid search to agentic orchestration.
Dense vector embeddings + sparse keyword search (BM25, SPLADE) combined through reciprocal rank fusion. Semantic meaning meets lexical precision. No query falls through the cracks.
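Reciprocal rank fusion is simple enough to show directly. The sketch below assumes each retriever returns an ordered list of document IDs. The function name, the sample IDs, and the constant `k=60` (a commonly used default) are illustrative assumptions rather than settings taken from the platform.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense_hits = ["doc_a", "doc_b", "doc_c"]   # hypothetical vector-search results
sparse_hits = ["doc_b", "doc_d", "doc_a"]  # hypothetical BM25 results
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

Documents that appear in both lists rise to the top. Because only ranks are used, the dense and sparse scores never need to be put on a common scale.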
Entity extraction builds a dynamic knowledge graph from your documents. Queries traverse relationships to find information that no vector similarity can surface — turning disconnected facts into connected knowledge.
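To make the traversal idea concrete, here is a minimal sketch that walks a small in-memory graph breadth-first and collects the relationship triples within a hop limit. The `GRAPH` contents, the `traverse` function, and the `max_hops` parameter are hypothetical. A real deployment would query a graph store populated by entity extraction.

```python
from collections import deque

# Hypothetical graph: entity -> list of (relation, neighbor) edges.
GRAPH = {
    "Acme Corp": [("acquired", "Widget Ltd")],
    "Widget Ltd": [("supplies", "Globex")],
    "Globex": [("based_in", "Lisbon")],
}


def traverse(graph, start, max_hops=2):
    """Breadth-first walk returning (source, relation, target) triples
    for every edge that starts within max_hops - 1 hops of the start entity."""
    seen = {start}
    queue = deque([(start, 0)])
    triples = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # hop budget exhausted; do not expand this node
        for relation, neighbor in graph.get(node, []):
            triples.append((node, relation, neighbor))
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return triples


facts = traverse(GRAPH, "Acme Corp", max_hops=2)
```

Starting from "Acme Corp", two hops connect it to "Globex" through "Widget Ltd", a link that a similarity search over isolated chunks could easily miss.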
Cross-encoder re-rankers score initial results for precision. Multi-source fusion combines results from vector, keyword, graph, and SQL queries into a single ranked list before passing to the LLM.
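The two-stage flow above (gather candidates from several sources, then re-score them) might look like the following sketch. The `overlap_score` function is a deliberately crude stand-in for a cross-encoder, and the `rerank` function, the source names, and the sample passages are all hypothetical.

```python
def rerank(query, candidates_by_source, score_pair, top_n=3):
    """Merge candidates from several sources, drop duplicates, and re-score
    each (query, passage) pair with score_pair, a stand-in for a cross-encoder."""
    merged = []
    for passages in candidates_by_source.values():
        for passage in passages:
            if passage not in merged:
                merged.append(passage)
    # Highest-scoring passages first; ties keep their merge order.
    scored = sorted(merged, key=lambda p: score_pair(query, p), reverse=True)
    return scored[:top_n]


def overlap_score(query, passage):
    """Toy scorer: fraction of query words present in the passage."""
    q = set(query.lower().split())
    return len(q & set(passage.lower().split())) / len(q)


candidates = {
    "vector": ["invoices are due in 30 days", "payments accept wire transfer"],
    "keyword": ["invoices are due in 30 days", "late invoices incur a fee"],
}
top = rerank("when are invoices due", candidates, overlap_score, top_n=2)
```

Swapping `overlap_score` for a real cross-encoder changes only the scoring callable. The merge and de-duplication steps stay the same.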
Semantic chunking respects document boundaries: paragraphs, sections, tables. Small-to-big retrieval matches on fine-grained chunks but passes the broader surrounding context to the LLM. Contextual retrieval enriches each chunk with a short description of where it sits in its source document before indexing.
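Small-to-big retrieval can be illustrated with a toy index in which every sentence-sized chunk remembers its parent section. The matching here uses plain token overlap purely for demonstration, where a real system would use embeddings. `PARENTS`, `build_index`, and `small_to_big` are hypothetical names.

```python
import re

# Hypothetical corpus: parent sections keyed by ID.
PARENTS = {
    "sec-1": "Refund policy. Refunds are issued within 14 days. Contact billing to start a claim.",
    "sec-2": "Shipping policy. Orders ship within 2 business days. Tracking is sent by email.",
}


def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(parents):
    """Split each parent into sentence-sized chunks that remember their parent ID."""
    index = []
    for parent_id, text in parents.items():
        for sentence in re.split(r"(?<=\.)\s+", text):
            if sentence:
                index.append((parent_id, sentence))
    return index


def small_to_big(query, index, parents, top_k=1):
    """Rank small chunks by token overlap, then return their parent sections."""
    q = tokens(query)
    ranked = sorted(index, key=lambda item: len(q & tokens(item[1])), reverse=True)
    parent_ids = []
    for parent_id, _ in ranked[:top_k]:
        if parent_id not in parent_ids:  # several chunks may share one parent
            parent_ids.append(parent_id)
    return [parents[pid] for pid in parent_ids]


context = small_to_big("how many days until refunds are issued",
                       build_index(PARENTS), PARENTS)
```

The query matches a single sentence, but the LLM receives the whole refund section, including the follow-up instruction about contacting billing.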
Persistent memory across sessions. User-level memory stores facts, preferences, and history. Session memory maintains conversation state. Episodic memory recalls past interactions. Your AI builds a relationship with each user over time.
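One way to picture these memory scopes is as three small stores keyed by user and session. The sketch below keeps everything in process memory for brevity, where a real system would persist it. The `MemoryStore` class, its method names, and the `session_window` parameter are hypothetical.

```python
from collections import defaultdict, deque


class MemoryStore:
    """In-memory sketch of three memory scopes; a real system would persist these."""

    def __init__(self, session_window=4):
        # Long-term user memory: user_id -> {key: value}
        self.user_facts = defaultdict(dict)
        # Episodic memory: user_id -> list of past (query, answer) pairs
        self.episodes = defaultdict(list)
        # Session memory: session_id -> only the most recent turns
        self.sessions = defaultdict(lambda: deque(maxlen=session_window))

    def remember_fact(self, user_id, key, value):
        self.user_facts[user_id][key] = value

    def log_turn(self, user_id, session_id, query, answer):
        self.sessions[session_id].append((query, answer))
        self.episodes[user_id].append((query, answer))

    def context_for(self, user_id, session_id):
        """Bundle what a prompt builder would need for the next turn."""
        return {
            "facts": dict(self.user_facts[user_id]),
            "recent_turns": list(self.sessions[session_id]),
            "episode_count": len(self.episodes[user_id]),
        }


memory = MemoryStore(session_window=2)
memory.remember_fact("u1", "answer_style", "concise")
for i in range(3):
    memory.log_turn("u1", "s1", f"question {i}", f"answer {i}")
ctx = memory.context_for("u1", "s1")
```

The session window drops the oldest turn once it is full, while the episodic log keeps every interaction for later recall.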
Autonomous agents plan retrieval strategies, select tools, evaluate results, and iterate. Multi-hop reasoning decomposes complex questions into sub-queries, retrieves for each, and synthesizes a coherent answer.
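Multi-hop decomposition can be shown with a toy example in which the second sub-query depends on the answer to the first. The plan is hard-coded here, whereas an agent would ask an LLM to generate it. `FACTS`, `decompose`, `extract_entity`, and `multi_hop` are hypothetical names, and final answer synthesis is left to the model.

```python
# Hypothetical fact store standing in for any retrieval backend.
FACTS = {
    "who founded Initech": "Initech was founded by Pat Lee.",
    "where was Pat Lee born": "Pat Lee was born in Porto.",
}


def decompose(question):
    """Hard-coded plan for illustration; an agent would ask an LLM to produce it.
    '{0}' marks a slot filled from the previous hop's answer."""
    return ["who founded Initech", "where was {0} born"]


def extract_entity(answer):
    """Toy extractor: the last two words, minus the final period."""
    return " ".join(answer.rstrip(".").split()[-2:])


def multi_hop(question):
    """Run each sub-query in order, feeding the previous answer into the next."""
    evidence, entity = [], ""
    for template in decompose(question):
        sub_query = template.format(entity)
        answer = FACTS.get(sub_query)
        if answer is None:
            break  # a fuller agent would reformulate or try another source
        evidence.append(answer)
        entity = extract_entity(answer)
    return evidence


evidence = multi_hop("Where was the founder of Initech born?")
```

Neither stored fact answers the original question on its own. The collected evidence, passed to the LLM together, supports the synthesized answer.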
Vector Databases: Still the Foundation
Semantic search with Qdrant remains the core retrieval engine, with sub-200ms queries across millions of vectors. But modern RAG layers on top: graph relationships, persistent memory, agentic orchestration, and self-reflection. Pragmatismo integrates them all into a single, self-hosted platform. No SaaS markups, no per-seat fees, no data leaving your network.
Ready for Modern RAG?
From basic retrieval to agentic orchestration with persistent memory — deploy the full modern RAG stack on your own infrastructure.
Related Reading
From the Pragmatismo blog
The Illusion of Intelligence
Why LLMs are high-dimensional autocomplete engines and how combining them with deterministic BASIC logic creates enterprise-grade reliability.
Standardized AI Templates
Pre-configured orchestration blueprints for RAG, advanced retrieval, and domain-specific AI across Education, Finance, Healthcare, and more.
What Is an LLM?
A practical guide to tokens, parameters, context windows, RAG architecture, and the sovereign pivot toward open-weight models.