Code Search: How Agents Search Across Snap’s Codebase

This is Part 2 of a three-part series on how Snap is building AI-assisted software development workflows. Part 1 introduced CodePal, our AI code reviewer. This post is about Code Search, the retrieval layer that lets both agents and engineers explore and understand all of Snap's code, fast. In Part 3, we'll cover the fleet of autonomous remote agents that has changed how we write and ship code at Snap.

Snap's codebase spans thousands of repos across multiple organizations: millions of files and close to a terabyte of source. Until last year, our code search was a single-VM regex tool that couldn't scale past one machine's memory, went dark during reindexing, and had no story for AI agents. Meanwhile, AI agents were multiplying fast, and the infrastructure they'd need to work across that codebase didn't exist yet.

So we built it, agent-first, ahead of demand rather than in reaction to it. We replaced the old tool with a sharded code search platform powered by Zoekt and exposed over MCP: it indexes every repo across our organizations and stays current within minutes of a push to the default branch. It's now the first major piece of the context substrate agents need to do real coding work, and the foundation for cross-repo navigation, for AI agents and engineers alike.

This is the story of why we built it, the decisions that surprised us, and what worked.

The Trigger: You Can't Git Clone Your Way to an Answer

What turned code search into an urgent infrastructure problem at Snap was the rise of coding agents. A local IDE agent like Claude Code, Cursor, or Codex CLI can lean on the user's working tree, but only for the one repo that's checked out; it still can't find prior art across the thousands of others without cloning each one. Remote agents, chiefly our own internal ones, have no working tree to lean on at all.

Snap's code isn't a single monorepo. It's thousands of repos across multiple organizations. Worse, the agent usually has nothing obvious to clone in the first place. A remote task tends to arrive as a vague prompt, a bug from a Jira ticket, or a cryptic user-facing error, none of which name a repo, and often the person who reported it doesn't know which one is at fault either. Across thousands of repos, even diagnosing where the problem lives is impossible by cloning at random and grepping one repo at a time. While GitHub's built-in code search is a helpful starting point, it didn't quite meet the specific requirements for our scale and organizational structure.

An agent has to answer two questions before it can do useful work:

Which repo(s) should I be editing? A natural-language prompt like "fix the rate limiter in the upload service" doesn't tell the agent which of our hundreds of services owns the upload path.
Is there prior art for this? Common patterns recur across services. Without cross-repo search, agents reinvent solutions or, worse, propose changes that conflict with how the rest of the org already does things. And reinventing isn't free: a from-scratch implementation carries blind spots someone else already ran into, so you burn time iterating toward an answer that's already tried and tested in another repo.

A human engineer answers both questions from years of accumulated context. An agent starts every task cold; search is the only way it can acquire that context.

Neither problem is solvable inside a single repo. Code search has to be infrastructure.

Inspiration: Claude Code's Grep

When we started designing this, the prevailing assumption in the AI tooling space was that code retrieval meant RAG. Embed every chunk of every file, store in a vector DB, query with a natural-language prompt, retrieve top-k chunks, feed them to the model.

The thing that changed our minds was watching how Claude Code actually works in a local checkout. It doesn't query a vector store. It runs grep. It runs it many times. It greps for an error string, reads a few hits, refines its query, greps again. The model itself decides what to search for, evaluates results, and iterates. The retrieval system is dumb. The intelligence is in the loop.

This worked surprisingly well: fresh by construction, exact when you need it, and no retrieval infrastructure to babysit. (We bet against RAG deliberately; more on that below.)

We asked the obvious question: what if remote agents could grep across all of Snap's code, the way Claude Code greps a local checkout? That's the platform we set out to build.

Two Surfaces: Exact Search and AI Search

We exposed code search through two modes.

Exact search is regex / literal / symbol search across all indexed repos. It has to be fast: we set a sub-second p95 target and beat it handily. This is the "grep," the deterministic, fast primitive that engineers and agents both rely on. It supports filtering by repo, path, language, and case sensitivity, and returns matches with surrounding context lines.
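As an illustration only (this is not the production implementation), the behavior described above can be sketched in a few lines of Python. The in-memory `corpus` dict, the `Hit` type, and the `exact_search` signature are all hypothetical, and the language filter is left out for brevity.

```python
import re
from dataclasses import dataclass

@dataclass
class Hit:
    repo: str
    path: str
    line_no: int    # 1-based line number of the matching line
    context: list   # matching line plus its surrounding lines

def exact_search(corpus, pattern, repo=None, path_prefix=None,
                 case_sensitive=True, context_lines=1):
    """Regex search over {(repo, path): text}, with optional filters."""
    rx = re.compile(pattern, 0 if case_sensitive else re.IGNORECASE)
    hits = []
    for (r, p), text in sorted(corpus.items()):
        if repo is not None and r != repo:
            continue
        if path_prefix is not None and not p.startswith(path_prefix):
            continue
        lines = text.splitlines()
        for i, line in enumerate(lines):
            if rx.search(line):
                lo = max(0, i - context_lines)
                hi = min(len(lines), i + context_lines + 1)
                hits.append(Hit(r, p, i + 1, lines[lo:hi]))
    return hits

# Hypothetical two-file corpus.
corpus = {
    ("upload-svc", "auth/header.go"):
        "package auth\nfunc parseAuthHeader(h string) {}\n// end",
    ("feed-svc", "api/handler.go"):
        "package api\nfunc FeedHandler() {}\n",
}
hits = exact_search(corpus, r"func .*Handler")
```

The result is deterministic: the same pattern and filters over the same corpus always return the same hits, which is the property the agent loop relies on.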

AI search answers natural-language questions like "Where do we handle OAuth token refresh?" or "Which service owns the upload retry policy?" Internally, AI search is not RAG. It's an agent loop: the model receives the question, plans a sequence of exact searches, evaluates results, refines, and synthesizes an answer with citations. A typical AI search runs 5–10 underlying exact searches before producing an answer.

The important point: AI search is built on top of exact search. We didn't build two retrieval systems. We built one fast retrieval primitive and let an agent compose it into something semantic.
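The shape of that loop can be sketched as follows. This is a simplified illustration under stated assumptions: `plan_next` stands in for the model call, `grep` stands in for exact search, and the scripted planner, file names, and return format are invented for the example.

```python
import re

def ai_search(question, plan_next, exact_search, max_searches=10):
    """Ask the planner for a query, run it, feed the results back, repeat.

    plan_next(question, history) returns the next query string, or None
    once the planner judges it has enough context.
    """
    history = []  # (query, hits) pairs seen so far
    for _ in range(max_searches):
        query = plan_next(question, history)
        if query is None:
            break
        history.append((query, exact_search(query)))
    citations = sorted({hit for _, hits in history for hit in hits})
    return {"searches": len(history), "citations": citations}

# Hypothetical corpus and stand-in search primitive.
FILES = {
    "auth-svc/token.go": "func refreshOAuthToken() {}",
    "auth-svc/login.go": "func login() { refreshOAuthToken() }",
    "feed-svc/feed.go": "func render() {}",
}

def grep(query):
    return [path for path, text in FILES.items() if re.search(query, text)]

def scripted_planner(question, history):
    # A real planner would be a model call; this one refines once, then stops.
    if not history:
        return r"OAuth"
    if len(history) == 1:
        return r"func refreshOAuthToken"
    return None

answer = ai_search("Where do we handle OAuth token refresh?", scripted_planner, grep)
```

Nothing in the loop is retrieval-specific beyond the single `exact_search` callable, so a stronger planner improves results without any change to the search side.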

Why we didn't build RAG

This was the most debated decision in the design phase, so it's worth being direct about why.

First, in fairness to RAG: it has genuine strengths, and the biggest is speed. For a vague, natural-language query with no keyword to anchor on, a single embedding lookup returns plausible candidates in one hop. Our agent loop instead fires off several exact searches and reasons between them, so a semantic query answered that way takes seconds, not milliseconds. We weighed that and still chose exact search as the foundation, because for code, the failure that hurts most is a confident wrong answer. We deliberately traded some speed for accuracy: a fast result is worthless if it sends the agent to the wrong code.

Stateless over stateful. A vector database is a stateful system you have to operate, secure, scale, and keep consistent with the source-of-truth code. We already had to run an index pipeline for exact search; adding an embedding pipeline and a vector store on top would have roughly doubled our operational surface area for marginal user-facing benefit. The cost is real, too: embeddings burn GPU compute to generate and re-generate as code changes, and the vectors need a database to host and scale, none of which exact search needs. Every stateful component you avoid is one you don't have to operate.
Engineers and agents both want exact matches more often than fuzzy ones. When someone searches parseAuthHeader, they want every occurrence of that exact string. Vector similarity will surface "related" functions that aren't the one they asked for. And many of those queries aren't fixed strings at all; they're regex patterns like func .*Handler or parseAuth.*, which a vector store has no way to express, while exact search runs them natively.
The model is the smart part. The whole bet of agentic systems is that the model can plan, evaluate, and iterate. RAG flattens that into a single non-iterative retrieval step. Agentic search keeps the iteration loop intact and lets the model decide when it has enough context. And the bet compounds: models keep getting better at reasoning and orchestrating multi-step work, and because the intelligence lives in the loop, our search inherits those gains directly: a stronger model means better search with nothing for us to change. RAG can't ride that curve; its ceiling is the retriever, not the reasoner.

We may add embeddings later for specific use cases (e.g., genuinely semantic queries with no good keyword anchor). But it'll be additive, not the foundation.

The Architecture, Briefly

A few decisions worth calling out for systems readers.

Zoekt for the index. Zoekt is a trigram-based code search engine: it memory-maps its indices, is extremely fast for regex over large corpora, and has a battle-tested binary format. Just as important, it's language-agnostic: it indexes text straight from a git tree, so a repo in any language (Go, Java, C++, Objective-C, Swift, Kotlin, Python, TypeScript, proto, Starlark, and more) is indexed with zero per-language setup: no build integration, no compiler plugin, no language server. Across thousands of heterogeneous repos that's decisive. Anything that needs a working build to index (like Kythe) is a non-starter at this scale. We didn't try to write our own.
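For readers new to trigram indexing, here is a toy Python sketch of the core idea. It is not Zoekt's implementation or on-disk format: the `TrigramIndex` class is hypothetical, it handles literal queries only, and real engines, Zoekt included, do considerably more (regex planning, ranking, memory-mapped storage).

```python
from collections import defaultdict

def trigrams(text):
    """All overlapping 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}

class TrigramIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # trigram -> ids of docs containing it
        self.docs = {}

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for gram in trigrams(text):
            self.postings[gram].add(doc_id)

    def candidates(self, literal):
        """Docs containing every trigram of the literal (a superset of matches)."""
        grams = trigrams(literal)
        if not grams:  # literal shorter than 3 chars: the index cannot prune
            return set(self.docs)
        return set.intersection(*(self.postings.get(g, set()) for g in grams))

    def search(self, literal):
        # The index only narrows the field; a real substring check confirms.
        return sorted(d for d in self.candidates(literal) if literal in self.docs[d])

idx = TrigramIndex()
idx.add("auth/header.go", "func parseAuthHeader(h string) {}")
idx.add("feed/handler.go", "func FeedHandler() {}")
idx.add("notes.txt", "abcd bcde")
```

The index works on raw text, which is why no per-language setup is needed: `idx.search("parseAuth")` only ever inspects documents that contain all of the query's trigrams.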

Sharded by repo. We split the corpus across multiple high-memory shards. Each repo lives entirely on one shard. Sharding by repo keeps query semantics simple: a regex either matches a file or it doesn't, with no cross-shard merging at the file level. New repos land on whichever shard holds the least data, so shards stay balanced by size, not repo count.
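The placement rule is simple enough to sketch directly. The shard names, repo names, and sizes below are made up, and `place_repo` is a hypothetical helper, not the production scheduler.

```python
def place_repo(shards, repo, size_bytes):
    """Put a new repo on the shard currently holding the least data."""
    # Ties are broken by shard name so placement is deterministic.
    target = min(shards, key=lambda name: (sum(shards[name].values()), name))
    shards[target][repo] = size_bytes
    return target

# shard -> {repo: index size}; units are arbitrary for the example.
shards = {
    "shard-0": {"monorepo": 900},
    "shard-1": {"svc-a": 100, "svc-b": 150, "svc-c": 50},
}
first = place_repo(shards, "svc-d", 400)   # shard-1 holds 300, less than 900
second = place_repo(shards, "svc-e", 300)  # shard-1 holds 700, still less
third = place_repo(shards, "svc-f", 300)   # shard-1 holds 1000, so shard-0 wins
```

Note that shard-1 keeps receiving repos while it already has three times as many as shard-0: the comparison is on bytes held, not on repo count.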

Shards serve locally, storage holds the truth. Each shard downloads the indices it's assigned and serves from its own local copy, while shared storage keeps the source of truth. Moving a repo to a different shard just means that shard downloads what it needs. No copying data shard to shard.

A manifest is the source of truth for shard membership. Each shard has a manifest in object storage listing the repos it should be serving and where each repo's current index lives. Shards poll this manifest periodically and reconcile to match it.
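A reconcile step of this kind can be sketched as a diff between the manifest and local state. The dict shapes and the `indices/...` locations are assumptions made for the example; the real manifest format is not described in this post.

```python
def reconcile(manifest, local):
    """Compute what a shard must do to match its manifest.

    manifest: {repo: index_location} the shard should be serving.
    local:    {repo: index_location} the shard is serving right now.
    """
    to_download = {r: loc for r, loc in manifest.items() if local.get(r) != loc}
    to_drop = sorted(r for r in local if r not in manifest)
    return to_download, to_drop

def apply_manifest(manifest, local):
    to_download, to_drop = reconcile(manifest, local)
    local.update(to_download)  # stands in for fetching index files from storage
    for repo in to_drop:
        del local[repo]
    return local

manifest = {"svc-a": "indices/svc-a/v2", "svc-b": "indices/svc-b/v1"}
local = {"svc-a": "indices/svc-a/v1", "svc-c": "indices/svc-c/v7"}
plan = reconcile(manifest, local)
apply_manifest(manifest, local)
```

Because the shard converges on whatever the manifest says, running the step again after it has converged is a no-op, which is what makes periodic polling safe.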

Zero-downtime updates. By default every update stays on the same shard: a repo gets a new index version, and the shard it already lives on downloads that version alongside the old one, then cuts over in a single atomic swap. A query always hits exactly one complete index: never a half-written one, never none. There's no overlap window and no cross-shard coordination; the swap is instant and local to the one shard that owns the repo.
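One way to picture the cutover is a single reference swap: the new version is fully prepared first, and queries always read through one pointer. The sketch below is an illustration of that pattern in Python, with a hypothetical `RepoIndexSlot` class and toy in-memory indexes; it is not how the shards are actually implemented.

```python
import threading

class RepoIndexSlot:
    """Holds the index a shard serves for one repo and swaps versions whole."""

    def __init__(self, index):
        self._current = index
        self._lock = threading.Lock()  # serializes publishers, not readers

    def search(self, needle):
        index = self._current  # take one reference; a later swap cannot change it
        return [path for path, text in index["files"].items() if needle in text]

    def publish(self, new_index):
        # new_index is complete before this call; the cutover is one assignment.
        with self._lock:
            old, self._current = self._current, new_index
        return old["version"]

slot = RepoIndexSlot({"version": 1, "files": {"a.go": "func Old() {}"}})
before = slot.search("Old")
replaced = slot.publish({"version": 2, "files": {"a.go": "func New() {}"}})
after = slot.search("New")
```

A query that started against version 1 finishes against version 1, and the next query sees version 2; there is no state in which a reader observes a partial index.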

Scaling and rebalancing. A repo only moves to a different shard when we scale the fleet up or down, or rebalance to even out shard sizes (say, when a large repo is added). It happens a handful of times a year, if that. That's the one case where a repo's home shard changes, and we handle it with an add-first, remove-later overlap: the new shard starts serving the repo before the old one drops it (after a grace period), so the repo is always served by at least one shard during the move. For the brief window it lives on two shards at once, the orchestrator deduplicates the overlapping hits and the caller never sees a duplicate.
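The deduplication step during the overlap window might look like the following. This sketch assumes both shards serve the same index version for the overlapping repo, so identical hits share a `(repo, path, line)` key; the hit shape and `merge_shard_hits` are hypothetical.

```python
def merge_shard_hits(per_shard_hits):
    """Merge hits from all shards, dropping repeats from a repo served twice."""
    seen = set()
    merged = []
    for shard in sorted(per_shard_hits):
        for hit in per_shard_hits[shard]:
            key = (hit["repo"], hit["path"], hit["line"])
            if key not in seen:
                seen.add(key)
                merged.append(hit)
    return merged

# svc-a is mid-move, so both shards return the same hit for it.
per_shard = {
    "shard-0": [{"repo": "svc-a", "path": "main.go", "line": 10},
                {"repo": "svc-b", "path": "x.go", "line": 3}],
    "shard-1": [{"repo": "svc-a", "path": "main.go", "line": 10},
                {"repo": "svc-c", "path": "y.go", "line": 7}],
}
merged = merge_shard_hits(per_shard)
```

If the two shards could briefly hold different versions of the same repo, keying on the repo alone and preferring one shard would be the safer variant.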

Only re-index what changed. A workflow walks every repo, checks whether its latest commit matches what was last indexed, and re-indexes only the ones that actually changed. Of thousands of repos, only a small fraction change in a typical week, so most of the work is the cheap "skip" path. For the repos that do change, we go a level deeper: most pushes touch only a handful of files, so instead of rebuilding the index from scratch we build a delta of just what changed and stack it on the previous one. A one-line change to a giant monorepo costs a one-line re-index, not a full rebuild.
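Both levels of that optimization can be sketched in Python. Everything here is illustrative: file contents stand in for index entries, `None` is used as a tombstone for deleted files, and the function names are invented.

```python
TOMBSTONE = None  # marks a path deleted by a delta

def plan_reindex(head_commits, indexed_commits):
    """Split repos into those whose HEAD moved and those to skip."""
    changed = sorted(r for r, sha in head_commits.items()
                     if indexed_commits.get(r) != sha)
    skipped = sorted(r for r in head_commits if r not in changed)
    return changed, skipped

def build_delta(old_tree, new_tree):
    """Only the paths that were added, modified, or deleted."""
    delta = {p: c for p, c in new_tree.items() if old_tree.get(p) != c}
    delta.update({p: TOMBSTONE for p in old_tree if p not in new_tree})
    return delta

def resolve(layers):
    """View of base plus stacked deltas; later layers shadow earlier ones."""
    view = {}
    for layer in layers:
        for path, content in layer.items():
            if content is TOMBSTONE:
                view.pop(path, None)
            else:
                view[path] = content
    return view

changed, skipped = plan_reindex(
    {"svc-a": "c3", "svc-b": "b1", "svc-new": "n1"},
    {"svc-a": "c2", "svc-b": "b1"},
)
old_tree = {"a.go": "v1", "b.go": "v1", "c.go": "v1"}
new_tree = {"a.go": "v2", "b.go": "v1", "d.go": "v1"}
delta = build_delta(old_tree, new_tree)
```

The delta's size tracks the size of the push rather than the size of the repo, and `resolve([old_tree, delta])` reproduces the new tree exactly.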

Built For Agents From Day One

A few things we did specifically because the dominant caller is an agent, not a browser tab.

MCP server. We expose code search as an MCP server, so any MCP-capable agent can call it with one config line. That includes Claude Code, Cursor, Codex, Casper (our internal remote coding agent), and CodePal (our AI code reviewer). Agents get code_search as a tool and code files as resources. No custom integration per agent.

Identity propagation. Agents inherit their human user's repo access. A search only returns hits the calling user is allowed to see.
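In its simplest form, identity propagation means the search runs with the human's permissions even when an agent makes the call. The sketch below assumes a hypothetical `on_behalf_of` field on the request and a plain user-to-repos mapping, and it filters hits after the fact; a real system could equally restrict the query before it runs.

```python
# Hypothetical access table: user -> repos they may read.
REPO_ACCESS = {
    "alice": {"upload-svc", "feed-svc"},
    "bob": {"feed-svc"},
}

def search_as(user, hits, repo_access):
    """Keep only hits in repos the given user is allowed to see."""
    allowed = repo_access.get(user, set())
    return [h for h in hits if h["repo"] in allowed]

def agent_search(request, hits, repo_access):
    # The agent has no access of its own; it carries the human's identity.
    return search_as(request["on_behalf_of"], hits, repo_access)

all_hits = [
    {"repo": "upload-svc", "path": "limiter.go"},
    {"repo": "feed-svc", "path": "rank.go"},
]
visible = agent_search({"agent": "remote-agent", "on_behalf_of": "bob"},
                       all_hits, REPO_ACCESS)
```

An unknown user falls through to an empty allow set, so the default is to return nothing rather than everything.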

Integration with CodePal, Our AI Review Tool

The first production consumer of code search wasn't an engineer at a search box. It was CodePal, our AI code review tool (covered in Part 1 of this series). CodePal is a first-class client of the code search read path, calling the same orchestrator that serves the web UI.

The cross-repo blind spot

CodePal builds deep context for a pull request (PR) by resolving the symbols a diff touches and pulling in the files that define and use them. That analysis is scoped to the repository the PR lives in. A function signature change rarely stays contained: a method renamed in a shared library can have callers scattered across dozens of other services, and a reviewer looking only at the diff, human or AI, has no way to see them. Closing that gap is the whole point of the integration.

So we gave the review model code search as a set of tools it can call mid-review. The differentiated capability is cross-repo blast radius. The model can ask "who calls this symbol, across every repo at Snap?" and get back the call sites, even when they live in repositories the PR never touches. When the model sees a call site its change would break, it can file a finding on the PR, which a human then acts on before merge. This extends the review beyond the code changed in the pull request itself to the downstream systems that change would affect, breakage that previously could only be caught through human experience.

A Finding is a Judgment, Not a Detector Output

There is no automatic path from a search hit to a posted finding. A tool result comes back to the model as just another piece of context. The only way a finding gets created is the model deciding to read the callers, judging that its change breaks them, and choosing to file, and that finding then runs the same validation and verifier gauntlet as every other finding (Part 1 covers the verifier, which retracts anything it can't substantiate). The reviewer is making a judgment call, not running a compiler. Findings land before merge only because reviews run on open PRs, not because anything is build-verified.

The Tools

The deterministic primitive is code_search, a regex and literal search over the indexed corpus, paired with a tool to fetch file contents. On top of that sits a small family of higher-level lookups the model reaches for more often, the most important being find_callers (and relatives like find_consumers_of). Worth noting for anyone building something similar: these higher-level tools are not separate indexes. Each one frames a templated question and routes it through the same agentic search loop described earlier in this post, the one that plans a sequence of exact searches and synthesizes a result. find_callers is the search agent answering a who-calls question with a tuned prompt and a bounded result set, not a precomputed call graph. The fast literal primitive and the agent loop are the only two retrieval mechanisms underneath all of it.
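The "templated question plus bounded result set" pattern can be sketched like this. The prompt wording, the `make_tool` factory, and the fake search agent are all invented for illustration; only the tool names come from the text above.

```python
# Hypothetical prompt templates, one per higher-level tool.
TEMPLATES = {
    "find_callers": "List every call site of the symbol `{symbol}` across all repos.",
    "find_consumers_of": "List every repo that imports or depends on `{symbol}`.",
}

def make_tool(name, search_agent, max_results=20):
    """Wrap the shared search agent as a named tool with a fixed question shape."""
    template = TEMPLATES[name]

    def tool(symbol):
        question = template.format(symbol=symbol)
        return search_agent(question)[:max_results]  # bounded result set

    return tool

asked = []

def fake_search_agent(question):
    # Stand-in for the agentic search loop; records what it was asked.
    asked.append(question)
    return ["svc-a/main.go:10", "svc-b/client.go:42", "svc-c/job.go:7"]

find_callers = make_tool("find_callers", fake_search_agent, max_results=2)
callers = find_callers("parseAuthHeader")
```

Adding a new lookup under this design means adding a template, not building or maintaining another index.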

Indexing the PR's Own Commit

Searching other repos is the headline, but the review also has to search the PR's own changed code, including symbols the PR just added. So when code-search tools are enabled, CodePal requests an on-demand index of the exact commit under review. Crucially, that index builds in parallel with the symbol resolution and context-building steps CodePal already runs for every review, so the work overlaps rather than stacking up. For a typical change the index is ready by the time the review needs it, and it often adds little or no wall-clock time. Indexing the exact commit is what keeps searches over the PR's own changes accurate.
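The overlap can be illustrated with `asyncio`: kick off the index build as a background task, do the context work, and only then wait for the index. The coroutine names, sleep durations, and identifiers below are placeholders, not CodePal's actual pipeline.

```python
import asyncio

events = []  # records the order in which steps start and finish

async def build_commit_index(commit):
    events.append("index:start")
    await asyncio.sleep(0.01)  # stands in for indexing the commit
    events.append("index:done")
    return f"index@{commit}"

async def build_review_context(pr):
    events.append("context:start")
    await asyncio.sleep(0.02)  # stands in for symbol resolution
    events.append("context:done")
    return f"context for {pr}"

async def prepare_review(pr, commit):
    # Start indexing in the background, then do the context work meanwhile.
    index_task = asyncio.create_task(build_commit_index(commit))
    context = await build_review_context(pr)
    index = await index_task  # usually already finished by this point
    return context, index

context, index = asyncio.run(prepare_review("example-pr", "abc123"))
```

Because both steps are in flight at once, the wall-clock cost is roughly the longer of the two rather than their sum.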

A Note on Freshness

Cross-repo lookups resolve against each repository's most recently indexed default branch. In production that index is event-driven rather than daily: a push to a default branch is picked up and re-indexed within seconds to a few minutes, so a review reasons about call sites that are nearly live. The main exception is a brand-new repository that hasn't been admitted to the index yet, which waits for the daily batch that handles onboarding and shard assignment.

Why This is the Worked Example

CodePal is the reason we argue code search belongs in the infrastructure layer rather than inside any single tool. Build the fast, exact retrieval layer well, expose it through one orchestrator that enforces access and freshness in a single place, and the products that sit on top of it, cross-repo review included, get cheaper to build. Part 1 covered CodePal on its own terms; this is the seam where the two systems meet.

Results

A few numbers from the first months of production. The agent-first bet we opened with didn't stay hypothetical for long:

Almost all Code Search traffic comes from agents, and that share is still rising.
Sub-second p95 latency for search.
All orgs indexed, with per-user access control.
Zero downtime during index updates.

Qualitatively, the wins we didn't fully predict:

Remote agents stopped cloning. When a remote agent needs to do anything cross-repo, it searches instead of clones. Several agent workflows that were previously infeasible just work now.
"Which repo owns this?" became a one-shot question. Onboarding engineers, on-call responders, and agents all ask this constantly. Previously, the answer lived in institutional knowledge.
Cross-repo refactors became cheaper to plan. Knowing every call site of a function, across thousands of repos, in under a second changes how you scope a change.

What's Next

A few directions we're working on:

Symbol navigation. Real "jump to definition" / "find references" across the org, not regex approximations.
Hybrid retrieval where it actually helps. For genuinely semantic queries with no keyword anchor, we'll layer embedding-based retrieval on top of exact search and merge results. Additive, not replacing.
Tighter agent loops. More integrations with internal agents that need to plan changes across many repos.

Takeaways

If you're considering similar infrastructure, three things stood out for us:

For code, exact search is the foundation. Build that first, well. Add semantic layers on top only when you have a use case that genuinely needs them.
Agents are first-class users. Designing the API for agents (MCP, identity propagation, multi-query patterns) made it better for humans too.
Stateless services beat stateful ones for ops. Every stateful component you avoid is one you don't have to operate, secure, or migrate. Keeping the index files in shared storage and tracking them with a simple list of what lives where got us a long way.

We made a bet: build code search as agent-first infrastructure ahead of demand, before the agents that would lean on it had fully arrived. That bet paid off. Today almost all code search traffic comes from agents, and work that was infeasible a year ago is now routine: a remote agent pins down which of thousands of repos a vague task even belongs to, maps a change's blast radius across every caller, and reuses prior art instead of reinventing it.

The architectural bet paid off just as well. One fast exact-search primitive, composed by the model instead of a stateful RAG pipeline, gave us infrastructure that's cheap to operate and that gets better on its own: every gain in model reasoning makes our search smarter with nothing for us to change. Get that foundation right, and everything built on top of it gets cheaper, faster, and more capable, including the fleet of autonomous agents we'll cover in Part 3, which lean on code search to move across all of Snap's code at a scale no engineer could by hand.