Give Claude Code a Memory: Connecting RAG via MCP

Every Claude Code session starts the same way: explain the codebase, explain the architecture, explain what you’re building. It works, but it’s friction. What if Claude Code could just know?

The Problem

We’re a small team building multiple products—Nova (enterprise AI assistant), Herald (social media automation), shared infrastructure. Each has its own patterns, decisions, and context. Every time we start a new Claude Code session, we’re re-explaining things that are already documented somewhere.

The context files help. CLAUDE.md in each repo gives Claude Code a starting point. But it’s static. It can’t answer “how does Nova handle RAG queries?” or “what’s our approach to multi-tenant isolation?” without us pasting in the relevant docs.

The Solution: MCP + RAG

MCP (Model Context Protocol) lets you give Claude Code access to external tools. RAG (Retrieval-Augmented Generation) lets you search a knowledge base semantically. Combine them: Claude Code can now query our internal documentation, architecture docs, and code context on demand.

Here’s what it looks like in practice:

You: How does Herald handle the approval workflow?

Claude: [calls search("Herald approval workflow")]
→ Returns relevant chunks from Herald docs
→ Claude synthesizes: "Herald uses a Slack-based approval flow
   where generated replies are sent to a channel with approve/reject
   buttons. Approved content is posted via the X API..."

No copy-pasting. No “let me find that doc.” Claude Code just knows.

How We Built It

The Stack

PostgreSQL + pgvector: Vector storage for embeddings
Amazon Titan Embeddings: 1024-dimension vectors
Hybrid Search: 50% semantic + 30% keyword + 20% recency
Python MCP Server: Thin wrapper exposing search tools

The MCP Server

The server is simple—under 150 lines. It exposes two tools:

@server.list_tools()
async def list_tools() -> list[Tool]:
    return [
        Tool(
            name="search",
            description="Search the RAG knowledge base for relevant context.",
            inputSchema={
                "type": "object",
                "properties": {
                    "query": {"type": "string"},
                    "tenant_id": {"type": "string", "default": "wildpines"},
                    "n_results": {"type": "integer", "default": 5}
                },
                "required": ["query"]
            }
        ),
        Tool(
            name="list_agents",
            description="List available knowledge bases.",
            # ...
        ),
    ]

When Claude Code calls search, we run a hybrid query against PostgreSQL:

SELECT
    content,
    (vector_score * 0.5) + (keyword_score * 0.3) + (recency_score * 0.2) as combined
FROM rag_chunks
WHERE tenant_id = $1
ORDER BY combined DESC
LIMIT $2

The hybrid approach matters. Pure vector search misses exact matches (“RAGClient” the class name). Pure keyword search misses semantic similarity (“document retrieval” ≈ “RAG”). Recency keeps fresh content ranked higher.

Connecting to Claude Code

Add to ~/.claude/mcp.json:

{
  "mcpServers": {
    "wildpines-rag": {
      "command": "poetry",
      "args": ["run", "wildpines-mcp"],
      "cwd": "/path/to/wildpines-core",
      "env": {
        "DATABASE_URL": "postgresql://...",
        "AWS_REGION": "us-east-2"
      }
    }
  }
}

Restart Claude Code. Now it has access to search and list_agents tools.

What We Index

We’re selective about what goes into RAG:

Architecture docs: System design, data flows, integration patterns
Dev plans: What we’re building and why
Methodology: Our processes and principles
Code (selective): Key modules, not every file

We skip: node_modules, generated files, binary assets, anything that changes constantly.

Multi-Tenant Isolation

Different products get different tenants:

# Herald context
results = search("approval workflow", tenant_id="herald")

# Nova context
results = search("RAG query implementation", tenant_id="nova")

# Shared methodology
results = search("code review process", tenant_id="wildpines")

Claude Code can query across tenants when needed, or scope to a specific product.

The Results

Session startup is faster. Instead of explaining our RAG implementation for the 10th time, we just ask Claude Code to look it up. It finds the relevant architecture doc, reads it, and proceeds with full context.

The bigger win: we’re building institutional knowledge that persists. When a new team member joins, they get the same RAG access. When we revisit a decision six months later, the context is there.

Building something similar? The MCP Python SDK makes it easy. Start with a single search tool and expand from there.