Give Claude Code a Memory: Connecting RAG via MCP
Stop re-explaining context every session.
Every Claude Code session starts the same way: explain the codebase, explain the architecture, explain what you’re building. It works, but it’s friction. What if Claude Code could just know?
The Problem
We’re a small team building multiple products—Nova (enterprise AI assistant), Herald (social media automation), shared infrastructure. Each has its own patterns, decisions, and context. Every time we start a new Claude Code session, we’re re-explaining things that are already documented somewhere.
The context files help. CLAUDE.md in each repo gives Claude Code a starting point. But it’s static. It can’t answer “how does Nova handle RAG queries?” or “what’s our approach to multi-tenant isolation?” without us pasting in the relevant docs.
The Solution: MCP + RAG
MCP (Model Context Protocol) lets you give Claude Code access to external tools. RAG (Retrieval-Augmented Generation) lets you search a knowledge base semantically. Combine them: Claude Code can now query our internal documentation, architecture docs, and code context on demand.
Here’s what it looks like in practice:
You: How does Herald handle the approval workflow?
Claude: [calls search("Herald approval workflow")]
→ Returns relevant chunks from Herald docs
→ Claude synthesizes: "Herald uses a Slack-based approval flow
where generated replies are sent to a channel with approve/reject
buttons. Approved content is posted via the X API..."
No copy-pasting. No “let me find that doc.” Claude Code just knows.
How We Built It
The Stack
- PostgreSQL + pgvector: Vector storage for embeddings
- Amazon Titan Embeddings: 1024-dimension vectors
- Hybrid Search: 50% semantic + 30% keyword + 20% recency
- Python MCP Server: Thin wrapper exposing search tools
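The 50/30/20 blend above can be expressed as a small scoring helper. This is a sketch for illustration; in our server the blending actually happens inside the SQL query, and the chunk fields here are hypothetical stand-ins for the columns it produces:

```python
def hybrid_score(vector_score: float, keyword_score: float,
                 recency_score: float) -> float:
    """Blend the three signals: 50% semantic, 30% keyword, 20% recency.

    Each input is assumed to be normalized to [0, 1].
    """
    return 0.5 * vector_score + 0.3 * keyword_score + 0.2 * recency_score


def rank_chunks(chunks: list[dict], n_results: int = 5) -> list[dict]:
    """Sort candidate chunks by blended score, best first."""
    return sorted(
        chunks,
        key=lambda c: hybrid_score(c["vector_score"],
                                   c["keyword_score"],
                                   c["recency_score"]),
        reverse=True,
    )[:n_results]
```

Note how the weights change outcomes: a chunk with perfect keyword and recency scores but weak semantic similarity can still outrank a semantically strong but stale one.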
The MCP Server
The server is simple—under 150 lines. It exposes two tools:
@server.list_tools()
async def list_tools() -> list[Tool]:
    return [
        Tool(
            name="search",
            description="Search the RAG knowledge base for relevant context.",
            inputSchema={
                "type": "object",
                "properties": {
                    "query": {"type": "string"},
                    "tenant_id": {"type": "string", "default": "wildpines"},
                    "n_results": {"type": "integer", "default": 5}
                },
                "required": ["query"]
            }
        ),
        Tool(
            name="list_agents",
            description="List available knowledge bases.",
            # ...
        ),
    ]
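The other half of the server is the tool dispatcher, registered with the SDK's @server.call_tool() decorator. Here's a simplified sketch of its shape; run_search is a hypothetical stand-in for the Titan embedding call plus the hybrid SQL query, injected so the handler stays testable:

```python
import json


async def call_tool(name: str, arguments: dict, run_search) -> str:
    """Dispatch an MCP tool call to the matching handler.

    `run_search(query, tenant_id, n_results)` is assumed to embed the
    query and execute the hybrid search, returning a list of row dicts.
    """
    if name == "search":
        rows = await run_search(
            query=arguments["query"],
            tenant_id=arguments.get("tenant_id", "wildpines"),
            n_results=arguments.get("n_results", 5),
        )
        return json.dumps(rows)
    if name == "list_agents":
        # Illustrative: the real handler reads tenants from the database.
        return json.dumps(["wildpines", "herald", "nova"])
    raise ValueError(f"Unknown tool: {name}")
```

The defaults here mirror the inputSchema above, so Claude Code can call search with just a query string.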
When Claude Code calls search, we run a hybrid query against PostgreSQL:
SELECT
    content,
    (vector_score * 0.5) + (keyword_score * 0.3) + (recency_score * 0.2) AS combined
FROM rag_chunks
WHERE tenant_id = $1
ORDER BY combined DESC
LIMIT $2
The hybrid approach matters. Pure vector search misses exact matches (like the class name “RAGClient”). Pure keyword search misses semantic similarity (“document retrieval” ≈ “RAG”). The recency weight keeps fresh content ranked higher.
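Of the three sub-scores, recency is the least obvious to normalize into [0, 1]. One common approach is exponential decay; this is a sketch, and the 30-day half-life is an assumption, not what our SQL necessarily uses:

```python
import math
from datetime import datetime, timezone


def recency_score(updated_at: datetime, half_life_days: float = 30.0) -> float:
    """Exponential decay: 1.0 for brand-new content, 0.5 after one
    half-life, approaching 0 for stale chunks.

    Assumes `updated_at` is timezone-aware UTC. The half-life is a
    tuning knob: shorter values penalize old docs more aggressively.
    """
    age = datetime.now(timezone.utc) - updated_at
    age_days = age.total_seconds() / 86400
    return 0.5 ** (age_days / half_life_days)
```

An equivalent expression can live directly in the SQL query so recency is scored in the same pass as the vector and keyword signals.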
Connecting to Claude Code
Add to ~/.claude/mcp.json:
{
  "mcpServers": {
    "wildpines-rag": {
      "command": "poetry",
      "args": ["run", "wildpines-mcp"],
      "cwd": "/path/to/wildpines-core",
      "env": {
        "DATABASE_URL": "postgresql://...",
        "AWS_REGION": "us-east-2"
      }
    }
  }
}
Restart Claude Code. It now has access to the search and list_agents tools.
What We Index
We’re selective about what goes into RAG:
- Architecture docs: System design, data flows, integration patterns
- Dev plans: What we’re building and why
- Methodology: Our processes and principles
- Code (selective): Key modules, not every file
We skip: node_modules, generated files, binary assets, anything that changes constantly.
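The include/skip rules above boil down to a path filter run before ingestion. A minimal sketch, with illustrative patterns rather than our exact list:

```python
from pathlib import PurePosixPath

# Directories and suffixes excluded from indexing (illustrative).
SKIP_DIRS = {"node_modules", "dist", "build", ".git", "__pycache__"}
SKIP_SUFFIXES = (".png", ".jpg", ".lock", ".min.js")


def should_index(path: str) -> bool:
    """Return True if a file belongs in the RAG index."""
    p = PurePosixPath(path)
    if any(part in SKIP_DIRS for part in p.parts):
        return False
    if p.name.endswith(SKIP_SUFFIXES):
        return False
    return True
```

Keeping the filter strict pays off twice: embedding costs stay low, and search results aren't diluted by generated noise.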
Multi-Tenant Isolation
Different products get different tenants:
# Herald context
results = search("approval workflow", tenant_id="herald")
# Nova context
results = search("RAG query implementation", tenant_id="nova")
# Shared methodology
results = search("code review process", tenant_id="wildpines")
Claude Code can query across tenants when needed, or scope to a specific product.
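A cross-tenant query can be a thin loop over per-tenant searches, merged by combined score. This is a hypothetical helper, not part of our server; search_fn stands in for the search tool:

```python
def search_all(query: str, tenants: list[str], search_fn,
               n_results: int = 5) -> list[dict]:
    """Query each tenant's knowledge base and merge results, best first.

    `search_fn(query, tenant_id)` is assumed to return row dicts with a
    'combined' score, matching the hybrid SQL's output.
    """
    merged = []
    for tenant in tenants:
        for row in search_fn(query, tenant):
            merged.append({**row, "tenant_id": tenant})
    return sorted(merged, key=lambda r: r["combined"], reverse=True)[:n_results]
```

One caveat with merging this way: combined scores are only comparable across tenants if each corpus is embedded and normalized the same way, which holds here since every tenant shares the same pipeline.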
The Results
Session startup is faster. Instead of explaining our RAG implementation for the 10th time, we just ask Claude Code to look it up. It finds the relevant architecture doc, reads it, and proceeds with full context.
The bigger win: we’re building institutional knowledge that persists. When a new team member joins, they get the same RAG access. When we revisit a decision six months later, the context is there.
Building something similar? The MCP Python SDK makes it easy. Start with a single search tool and expand from there.