Wiki scaling — search architecture and growth path
How Hussh wiki search and serving work across Drive source, GCS/Firestore indexes, MCP, and the reader; what breaks as the corpus grows; and how the architecture scales without redeploying for content CRUD.
TL;DR: Hussh wiki now uses a hosted serving plane: Drive is the private source workspace, Cloud Storage stores derived public/private snapshots, Firestore stores metadata and indexes, and MCP is the read/write capability boundary. The scaling story still follows clean tiers: keep content editing human-friendly in Drive, keep public reads off Drive fanout, preserve Git as a scheduled audit/export mirror during the stability window, and move heavier search/rerank work to dedicated services only when usage justifies it.
Status as of 2026-05-26: Production is live on Cloud Run. The current corpus sync publishes the wiki corpus into GCS/Firestore. Ordinary content CRUD should not require a reader redeploy or a per-write Git commit; code/runtime changes still do.
Relations
- Hussh wiki — scalable app + MCP architecture — the foundational architecture this scales.
- LLM Wiki pattern — landscape & R&D — Karpathy's three-layer pattern.
- Wiki search — dual-audience UX + AX — the human-vs-agent UX side.
1. How search works today
The wiki has three layers — storage / index / serving — each replaceable independently.
Storage layer — Drive source plus hosted snapshots
The source workspace is the private Shared Drive. It stores markdown, raw captures, private files, and private artifacts under a permission boundary limited to owners and the runtime account. Every page still has:
- `name`, `description`, `type` (user/feedback/project/reference), `visibility` (public/private)
- A `## Relations` block listing curated outgoing links
- A `**Status as of YYYY-MM-DD**` line for time-decay tracking
- A `## Sources` block citing source provenance
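The per-page conventions above lend themselves to a mechanical check. The following is a minimal sketch, assuming frontmatter has already been parsed into a plain object; `validatePage` and its error strings are hypothetical names for illustration, not the wiki's actual lint tool.

```typescript
// Illustrative validator for the per-page conventions listed above.
// Function name and error strings are assumptions, not the real lint tool.
const PAGE_TYPES = ["user", "feedback", "project", "reference"];
const VISIBILITIES = ["public", "private"];

function validatePage(frontmatter: Record<string, unknown>, body: string): string[] {
  const problems: string[] = [];

  // name and description must be non-empty strings.
  for (const key of ["name", "description"]) {
    const value = frontmatter[key];
    if (typeof value !== "string" || value.trim() === "") {
      problems.push(`missing ${key}`);
    }
  }

  const pageType = frontmatter["type"];
  if (typeof pageType !== "string" || PAGE_TYPES.indexOf(pageType) === -1) {
    problems.push("invalid type");
  }

  const visibility = frontmatter["visibility"];
  if (typeof visibility !== "string" || VISIBILITIES.indexOf(visibility) === -1) {
    problems.push("invalid visibility");
  }

  // Required body blocks; headings are matched as whole lines.
  if (!/^## Relations\s*$/m.test(body)) problems.push("missing ## Relations block");
  if (!/\*\*Status as of \d{4}-\d{2}-\d{2}\*\*/.test(body)) problems.push("missing Status as of line");
  if (!/^## Sources\s*$/m.test(body)) problems.push("missing ## Sources block");

  return problems;
}
```

An empty result means the page satisfies every convention; anything else is a list of human-readable problems that a sync job could use to quarantine the page.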
The runtime serving layer is not Drive. Validated public/private snapshots are written to Cloud Storage, and Firestore stores path, title, visibility, keywords, chronology, aliases, checksums, publish state, and sensitivity state. Git remains the scheduled audit/export/bootstrap mirror during the stability window.
Index layer — Firestore/search indexes first, heavier retrieval later
The current serving plane builds public and private indexes from the same validated page snapshots:
- Metadata index: Firestore stores visibility, type, keywords, aliases, sort dates, and publish state.
- Search index: Cloud Storage stores public/private search payloads generated from title, description, TL;DR, body, relations, and keywords.
- Display relevance: raw search scores stay useful for ranking, but reader-facing relevance is capped so the UI never reports above 100%.
- Future retrieval: qmd/vector/rerank or managed vector stores remain an upgrade path when corpus size and query volume justify the extra operational cost.
                 ┌─ BM25 lexical scores ──┐
query ──────────►│                        │── RRF ──► top-30 ──► LLM rerank ──► final ranking
                 └─ Vector cosine scores ─┘

The current principle is boring and fast: use hosted indexes for normal reads, avoid Drive fanout on public traffic, and add semantic retrieval only when the simpler index stops being enough.
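The RRF step in the diagram above can be sketched in a few lines. This is a generic reciprocal rank fusion, not the wiki's or qmd's implementation; the function name, the `k = 60` constant (a commonly used default), and the input shape (one ranked list of page IDs per retriever) are assumptions.

```typescript
// Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank),
// where rank starts at 1. k = 60 is a commonly used default (assumed here).
function reciprocalRankFusion(rankings: string[][], k: number = 60, topN: number = 30): string[] {
  const scores = new Map<string, number>();

  for (const ranking of rankings) {
    ranking.forEach((docId, index) => {
      const contribution = 1 / (k + index + 1);
      scores.set(docId, (scores.get(docId) ?? 0) + contribution);
    });
  }

  // Highest fused score first, truncated to the rerank budget (top-30 in the diagram).
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, topN)
    .map(([docId]) => docId);
}

// Usage: fuse a lexical ranking with a vector ranking.
const fused = reciprocalRankFusion([
  ["a", "b", "c"], // hypothetical BM25 order
  ["b", "c", "d"], // hypothetical vector order
]);
// fused is ["b", "c", "a", "d"]: pages found by both retrievers rise to the top.
```

Because RRF only uses ranks, it needs no score normalization between the lexical and vector sides, which is the main reason it suits a hybrid pipeline.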
Serving layer — MCP server + reader
- MCP server: TypeScript / Express / `StreamableHTTPServerTransport`. It enforces auth, visibility, read/write tools, Drive sync, and GCS/Firestore publishing.
- Tools: anonymous users see 8 public read tools; authenticated owners see 17 tools including write, capture, and artifact actions.
- Reader: Next.js 16, calls MCP from server components and route handlers. The reader does not read markdown from disk.
- Sync: MCP writes and Drive changes publish derived snapshots into Cloud Storage/Firestore. Scheduled reconcile catches missed webhooks and keeps indexes fresh.
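One way to picture the scheduled reconcile is as a checksum diff between the source workspace and the hosted index. The sketch below assumes both sides can be summarized as a map from page path to checksum; `planReconcile` and `SyncPlan` are hypothetical names, and the real sync job may work differently.

```typescript
interface SyncPlan {
  publish: string[];   // new or changed pages to write into the hosted snapshots
  retract: string[];   // pages that no longer exist in the source
  unchanged: string[]; // pages whose checksum already matches
}

// source:    path -> checksum of the validated source page (assumed shape)
// published: path -> checksum recorded in the hosted index (assumed shape)
function planReconcile(source: Map<string, string>, published: Map<string, string>): SyncPlan {
  const plan: SyncPlan = { publish: [], retract: [], unchanged: [] };

  source.forEach((checksum, path) => {
    if (published.get(path) === checksum) {
      plan.unchanged.push(path);
    } else {
      plan.publish.push(path);
    }
  });

  published.forEach((_checksum, path) => {
    if (!source.has(path)) {
      plan.retract.push(path);
    }
  });

  return plan;
}
```

A reconcile that only touches the `publish` and `retract` lists stays cheap when nothing changed, which is what makes it safe to run on a schedule as a backstop for missed webhooks.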
How wiki_search answers a query end-to-end
- Client calls `wiki_search { query, max_results, type_filter }` over MCP HTTP.
- MCP chooses the public or private index based on auth tier.
- Search runs over the hosted index and applies visibility/type filters.
- Results return clean reader URLs, public-safe descriptions, keywords, and capped display relevance.
- Private pages, private Drive IDs, private aliases, and private-only relations stay out of anonymous responses.
The critical performance property: page/list/search reads come from GCS/Firestore indexes and cached MCP responses, not Drive listing or repo scans per request.
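The flow above can be sketched as a pure function over already-scored index entries. Everything here is an assumption made for illustration: the field names, the `/wiki/` reader URL shape, the default of 10 results, and the idea that a raw score of 1.0 maps to 100% before capping.

```typescript
interface ScoredEntry {
  path: string;
  title: string;
  description: string;
  type: string;
  score: number;    // raw ranking score; assumed able to exceed 1.0 after boosts
  driveId?: string; // private identifier that must never leave the server
}

interface SearchHit {
  url: string;
  title: string;
  description: string;
  relevancePercent: number;
}

interface SearchRequest {
  authenticated: boolean;
  typeFilter?: string;
  maxResults?: number;
}

// Cap reader-facing relevance at 100% while raw scores stay free for ranking.
function displayRelevance(rawScore: number): number {
  return Math.max(0, Math.min(100, Math.round(rawScore * 100)));
}

function answerSearch(
  indexes: { publicIndex: ScoredEntry[]; privateIndex: ScoredEntry[] },
  request: SearchRequest
): SearchHit[] {
  // The auth tier decides which index is consulted at all.
  const candidates = request.authenticated ? indexes.privateIndex : indexes.publicIndex;
  const limit = request.maxResults ?? 10;

  return candidates
    .filter((entry) => request.typeFilter === undefined || entry.type === request.typeFilter)
    .sort((a, b) => b.score - a.score) // filter() returned a copy, so the index is not mutated
    .slice(0, limit)
    .map((entry) => ({
      // Only public-safe fields are copied; driveId is deliberately dropped.
      url: `/wiki/${entry.path}`,
      title: entry.title,
      description: entry.description,
      relevancePercent: displayRelevance(entry.score),
    }));
}
```

Choosing the index before filtering, rather than filtering one combined index by visibility, means an anonymous request never touches private entries even if a later filter has a bug.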
2. Scaling axes — what grows independently
Four axes grow at different rates. Plan against each separately:
| Axis | Today | Likely 6 months | 2 years |
|---|---|---|---|
| Data size (pages) | ~60 | ~600 | 6,000+ |
| Read throughput (queries/sec) | ~0.01 (you, occasional) | ~1 (team of 10 active) | ~50+ (multi-tenant) |
| Write throughput (edits/min) | ~1 (Claude appends after sources) | ~10 (team + agent ingestion) | ~100+ (cron pipelines) |
| Concurrent users | 1 | 10–50 | 100s–1000s |
The current architecture handles today's column comfortably. Each tier below describes what breaks first as you cross a column boundary.
3. Scaling tiers — what breaks, what to do
Tier 0 — single user, ≤ 500 pages (today)
Status: working. Nothing to change.
- qmd reindex completes in <5 s for the whole corpus.
- Searches return in 200–400 ms cold, <50 ms warm.
- Reader serves all routes from a single Next.js dev server.
- One MCP server process is plenty.
Investments: none structural. Keep the schema discipline tight (Status as of, ## Sources, visibility:) so that scaling later is cheap.
Tier 1 — team-shared, ≤ 5,000 pages
What breaks first:
- `wiki_list` returns the whole directory in one shot; at 5K pages this is a 1–5 MB JSON payload per call.
- `wiki_lint` rebuilds the entire edge index every call; O(N²) for cross-references.
- `qmd embed` on save scales with chunk count; debounced rebuild gets noticeable.
- Single localhost server can't be shared across machines.
Fixes (in order of leverage):
- Pagination on `wiki_list`: add `limit` + `offset` to the input schema. Default to the first 50.
- Async incremental indexing: chokidar already debounces; switch to `qmd update --incremental` (qmd 2.x supports it) instead of a full re-embed.
- Lint partitioning: run `wiki_lint` per section (`scope: "wiki/people"`) by default; add a `wiki_lint_full` for the cross-section graph pass, run nightly via a scheduled job.
- MCP server moves off localhost: deploy to a small VM behind HTTPS. Add session-aware transport (`sessionIdGenerator`) so each user gets stateful streaming. Add per-user auth (API keys → OAuth).
- Reader gets a CDN edge cache: Next.js static export to Vercel/Cloudflare Pages for the public wiki; SSR-on-edge for the private wiki; the MCP endpoint is the only origin call.
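The pagination fix is the simplest of these to sketch. The version below assumes the page list is already available as an array; the `limit`, `offset`, and default of 50 come from the text, while `MAX_LIMIT`, `nextOffset`, and the function name are hypothetical additions.

```typescript
interface ListInput {
  limit?: number;
  offset?: number;
}

interface ListPage<T> {
  items: T[];
  total: number;
  nextOffset: number | null; // null when there is nothing left to fetch
}

const DEFAULT_LIMIT = 50; // "Default to the first 50"
const MAX_LIMIT = 200;    // assumed upper bound so one call cannot request everything

function paginate<T>(all: T[], input: ListInput = {}): ListPage<T> {
  // Clamp caller input so bad values degrade to something safe.
  const limit = Math.min(MAX_LIMIT, Math.max(1, Math.floor(input.limit ?? DEFAULT_LIMIT)));
  const offset = Math.max(0, Math.floor(input.offset ?? 0));

  const items = all.slice(offset, offset + limit);
  const end = offset + items.length;

  return {
    items,
    total: all.length,
    nextOffset: end < all.length ? end : null,
  };
}
```

Returning `nextOffset` lets an agent loop until it sees `null` without computing offsets itself. Offset pagination can skip or repeat items if pages are added mid-scan; a cursor keyed on path would avoid that, at the cost of a slightly more complex schema.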
What stays the same: qmd is still the search engine. BM25 + vector + LLM rerank still works at 5K pages on a single VM with 8 GB RAM.
Tier 2 — multi-tenant, ≤ 50,000 pages
What breaks first:
- Local GGUF embeddings on a single machine: latency creeps to 1–2 s per query as the corpus grows.
- BM25 + vector indices on disk get large; cold-start of a new server replica means re-embedding.
- Write contention: many agents writing simultaneously through `wiki_write`/`wiki_patch` need coordination.
- Memory pressure: the LLM reranker holds a model in RAM; can't horizontally scale read replicas without separating it.
Fixes (in order of leverage):
- Move embeddings to a managed vector store: pgvector / Qdrant / Turbopuffer / Pinecone. Keep BM25 local (cheap, fast) and call out for vector + rerank. Embeddings get computed once at ingest and cached in the vector DB; servers become stateless.
- Separate read/write paths: read replicas (multiple stateless MCP servers behind a load balancer) hit a shared vector store + a read-only mirror of the markdown. Write path goes through a single coordinator that handles `wiki_write/patch/link` and publishes change events to a queue. Read replicas pick up changes async.
- Hot/cold tiering: pages with `Status as of` < 90 days old + high access count stay in the hot index (full BM25 + vector + rerank). Cold pages move to a single keyword-only BM25 index. The `wiki_search` tool unions the two.
- Move the LLM reranker out-of-process: run as a dedicated GPU service or use a hosted reranker (Cohere Rerank, Voyage, etc.). Read replicas call it over RPC.
- Move append-only `log.md`: at this scale, `log.md` is gigabytes. Move to a proper append-only store (SQLite WAL, S3 object-per-day, or a time-series DB). `wiki_log_query` becomes a query against that store.
- Backups + DR: git repo gets snapshotted to S3 daily. Vector DB gets its own backup cadence. Test restore quarterly.
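The hot/cold rule in the list above reduces to a small classification function. The 90-day window comes from the text; the access-count threshold is not defined there, so `HOT_MIN_ACCESS_COUNT = 10` is a placeholder assumption, as are the function and constant names.

```typescript
const HOT_MAX_AGE_DAYS = 90;     // from the tiering rule above
const HOT_MIN_ACCESS_COUNT = 10; // assumption: "high access count" is not quantified

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// statusAsOf is the page's "Status as of" date in YYYY-MM-DD form.
function classifyIndexTier(statusAsOf: string, accessCount: number, now: Date): "hot" | "cold" {
  const ageDays = (now.getTime() - new Date(statusAsOf).getTime()) / MS_PER_DAY;

  // An unparseable date yields NaN, which fails the comparison and lands in "cold".
  const isFresh = ageDays < HOT_MAX_AGE_DAYS;
  const isPopular = accessCount >= HOT_MIN_ACCESS_COUNT;

  return isFresh && isPopular ? "hot" : "cold";
}
```

Defaulting to "cold" on bad input is a deliberate choice in this sketch: a page with a broken status line stays searchable by keyword but does not consume vector and rerank capacity.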
Tier 3 — large-scale public, 500,000+ pages
What breaks first:
- Single git repo can't hold 500K markdown files efficiently.
- Lint and relations graph are O(N²) without partitioning.
- Search rerank latency is the user-facing bottleneck even with managed vector stores.
Fixes (these are real distributed-systems decisions; sketches not specs):
- Shard the corpus: by directory tree (`wiki/people/*` shard, `wiki/entities/*` shard) or by tenant if multi-customer. Each shard has its own qmd-equivalent and write coordinator.
- Search aggregator: a thin layer that fans search out to all shards, applies a final RRF + rerank across shard top-Ks, and returns. This scatter-gather pattern is common in distributed search engines (Elasticsearch, Meilisearch, etc.).
- Caching tier in front: Redis / Upstash for hot queries. The 60s client cache becomes a 60s server cache, keyed by `(query, type_filter, visibility, user)`.
- CDN-fronted reader: every wiki page gets a static-export prerender; private pages render via signed URL on the CDN edge.
- Streaming ingest: agents writing pages publish to a queue; a writer pool consumes, validates schema, writes to git, indexes incrementally. Backpressure visible in queue depth.
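The caching tier in the list above depends on getting the key right, since a key that omits visibility or user could serve a private result to an anonymous caller. Below is a minimal sketch using an in-memory map as a stand-in for Redis / Upstash; the class, the function names, and the explicit `nowMs` parameter (used to keep the example deterministic) are assumptions.

```typescript
interface SearchCacheKey {
  query: string;
  typeFilter: string | null;
  visibility: "public" | "private";
  user: string | null; // null for anonymous callers
}

// JSON-encoding the tuple avoids the collisions a naive string join allows.
function toCacheKey(key: SearchCacheKey): string {
  return JSON.stringify([key.query.trim(), key.typeFilter, key.visibility, key.user]);
}

// In-memory stand-in for a shared cache such as Redis / Upstash.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;

  constructor(ttlMs: number = 60000) {
    this.ttlMs = ttlMs; // 60 s, matching the cache window in the text
  }

  set(key: string, value: V, nowMs: number): void {
    this.entries.set(key, { value, expiresAt: nowMs + this.ttlMs });
  }

  get(key: string, nowMs: number): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (nowMs >= entry.expiresAt) {
      this.entries.delete(key); // lazily evict expired entries
      return undefined;
    }
    return entry.value;
  }
}
```

With a real Redis backend the TTL would be set on the key itself and the `nowMs` parameter would disappear; the key construction is the part that carries over unchanged.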
4. Deployment topology
The current production topology is a hosted wiki app, not a laptop-only wiki:
Humans + MCP-aware agents
        |
        v
wiki.hushh.ai reader ----> mcp.hushh.ai MCP server
                                  |
                                  +--> Drive private source workspace
                                  +--> Cloud Storage public/private snapshots
                                  +--> Firestore metadata and indexes
                                  +--> GitHub daily audit/export/bootstrap mirror

The important split:
- Drive is for authoring and review.
- Cloud Storage/Firestore is for fast serving and search/list indexes.
- MCP owns auth, tool capability, write governance, and sync.
- Git remains recovery and provenance, not the intended long-term hot database.
Future topologies should evolve this model rather than return to Drive fanout or repo scans on every request.
5. Concrete next-steps menu
In leverage order:
- Keep content CRUD on MCP/Drive and derived publishing. Do not redeploy the reader for page edits.
- Keep public reads on GCS/Firestore indexes. Do not make Drive the public database.
- Keep repo markdown until export, rollback, diff/history, and owner review are fully hosted and proven.
- Add richer search only when the hosted index stops satisfying real queries.
- Add write queues only if concurrent MCP writes become a measurable problem.
- Add analytics-driven content dashboards before adding heavyweight retrieval infrastructure.
6. What you don't have to worry about
A few things scale gracefully without intervention:
- Markdown files: the source format is lightweight. The bottleneck is not file size; it is permissioning, indexing, sync, and retrieval semantics.
- Schema correctness: the lint tool catches drift early; `Status as of` discipline + `visibility:` tags + `## Sources` blocks make every page self-describing.
- MCP protocol: Streamable HTTP scales like any HTTPS API as long as the server stays stateless; the session-aware transport from Tier 1 additionally needs sticky routing or shared session state.
- Authoring throughput: Claude (or any LLM agent) writing pages is bounded by your review cycle, not the system. The wiki can absorb writes faster than you can validate them.
7. Cost model (rough orders of magnitude)
| Tier | Compute | Storage | Bandwidth | Net cost / month |
|---|---|---|---|---|
| Current — Cloud Run + GCS/Firestore | Cloud Run services | GCS + Firestore + Drive | Low public traffic | Low, dominated by Cloud Run and sync calls |
| Richer search | same + search worker | vector/search index | moderate | increases only when semantic retrieval is added |
| Large public | autoscaled read path | larger indexes + CDN/cache | higher | dominated by public traffic and search/rerank |
These figures are rough. Once a vector DB is added (Tier 2), it dominates cost; at Tier 3, traffic dominates.
8. Observability checklist
- MCP server: structured logs (request ID, tool name, latency, error code).
- Serving sync: Drive scan count, pages published, pages retracted, quarantines, and duration.
- Firestore/GCS: index freshness, object generations, and self-repair count.
- Reader: client-side error reporting (Sentry) + Web Vitals (LCP, INP, CLS).
- Health endpoint: `/readyz` and MCP tool checks verify serving mode and connectivity.
- Audit trail: MCP writes emit structured audit logs immediately; the backup export job preserves Git audit continuity while the hosted metadata plane stabilizes.
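For the MCP server item in the checklist above, a structured log line can be as small as one JSON object per tool call. The field set (request ID, tool name, latency, error code) follows the checklist; the exact key names, the `ToolCall` shape, and the use of the error's `name` as the error code are assumptions for illustration.

```typescript
interface ToolCall {
  requestId: string;
  tool: string;
  startedAtMs: number;
  finishedAtMs: number;
  error?: unknown; // present only when the tool call failed
}

// Serialize one tool call as a single JSON log line.
function toLogLine(call: ToolCall): string {
  let errorCode: string | null = null;
  if (call.error !== undefined) {
    // Assumed convention: use the error's name, falling back to a generic code.
    errorCode = call.error instanceof Error ? call.error.name : "UNKNOWN";
  }

  return JSON.stringify({
    requestId: call.requestId,
    tool: call.tool,
    latencyMs: call.finishedAtMs - call.startedAtMs,
    errorCode,
  });
}
```

One line per call with stable keys keeps the logs queryable by tool and by latency, for example in Cloud Logging, without a custom parser.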
Sources
- MCP implementation notes — current server architecture, capabilities, and transport.
- Serving implementation notes — GCS/Firestore helper layer.
- Serving sync implementation notes — Drive and filesystem sync into hosted snapshots.
- Hussh wiki architecture — current architecture.
- Karpathy LLM Wiki gist (raw / wiki / schema layering), referenced in `wiki/about/llm-wiki-pattern.md`.