
LLM Wiki Pattern

Source: Karpathy's gist
Ingested: 2026-04-09


Core Idea

Stop re-deriving. Start compiling.

Traditional RAG makes the LLM rediscover connections from raw docs on every query. The LLM Wiki eliminates this: the LLM builds a structured markdown knowledge base once, then queries are answered from the compiled wiki — not the raw sources.

"The wiki is a persistent, compounding artifact."


Three-Layer Architecture

Raw Sources (immutable, human-curated)
      |
      v  [ingest]
Wiki Pages (LLM-maintained markdown)
      |
      v  [query]
Answers (grounded, with citations)

Layer 1 — Raw Sources: documents, articles, papers. Never modified by the LLM; the authoritative source of truth.

Layer 2 — Wiki: LLM-generated and maintained markdown files: summaries, entity pages, concept pages, comparisons. The human reads; the LLM writes.

Layer 3 — Schema: config (SCHEMA.md or CLAUDE.md) defining the wiki's structure, ingestion rules, and conventions.
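A minimal SCHEMA.md might look like the following. The section names, directory layout, and rules here are illustrative assumptions, not prescribed by the pattern:

```markdown
# Wiki Schema

## Structure
- index.md   — content catalog; read first on every query
- log.md     — append-only operation log
- pages/     — one markdown file per entity or concept
- sources/   — raw sources; never modified by the LLM

## Ingestion rules
- Extract key claims with citations back to the source file
- Update every page the new source touches, then update index.md
- Append one line per operation to log.md

## Conventions
- Page titles in Title Case; filenames in kebab-case
- Cross-reference with relative markdown links
```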


Three Operations

Ingest

New source added → LLM reads it, extracts key info, integrates into existing pages (may touch 10-15 files), updates index, logs activity.

Query

User asks → LLM reads index first, finds relevant pages, synthesizes answer with citations. Optionally saves valuable explorations as new pages.

Lint

Periodic health check: contradictions, stale claims, orphaned pages, missing cross-references, conceptual gaps.
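The three operations can be sketched as a tiny in-memory skeleton. This is a toy sketch under assumed data structures (a dict of pages, a list-based index, a list-based log); the real pattern operates on markdown files on disk, and the extraction step here is a stub:

```python
import datetime

class Wiki:
    """Toy model of the wiki's three operations: ingest, query, lint."""

    def __init__(self):
        self.pages = {}   # page name -> markdown body
        self.index = []   # page names; read first on every query
        self.log = []     # append-only operation record

    def _record(self, op, detail):
        # log.md analogue: one chronological line per operation
        self.log.append(f"{datetime.date.today().isoformat()} {op}: {detail}")

    def ingest(self, source_name, text):
        """Integrate a new raw source (stub: one page per source)."""
        page = f"notes-on-{source_name}"
        self.pages[page] = f"# {source_name}\n\n{text}\n\n[source: {source_name}]"
        if page not in self.index:
            self.index.append(page)
        self._record("ingest", source_name)

    def query(self, term):
        """Read the index first, then search only indexed pages."""
        hits = [p for p in self.index if term.lower() in self.pages[p].lower()]
        self._record("query", term)
        return hits

    def lint(self):
        """Health check (stub: only detects pages missing from the index)."""
        orphans = [p for p in self.pages if p not in self.index]
        self._record("lint", f"{len(orphans)} orphan(s)")
        return orphans
```

In the real pattern each method is an LLM-driven edit of markdown files, and ingest may touch many pages rather than creating exactly one.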


index.md — content catalog. LLM reads this first during queries to locate relevant pages.

log.md — append-only chronological record of all operations.
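Concretely, the two files might look like this (the entries below are illustrative, not taken from a real wiki):

```markdown
<!-- index.md -->
# Index
- [LLM Wiki Pattern](pages/llm-wiki-pattern.md) — core idea, architecture
- [RAG vs Wiki](pages/rag-vs-wiki.md) — comparison table

<!-- log.md -->
2026-04-09 ingest: Karpathy's gist -> created pages/llm-wiki-pattern.md
2026-04-11 query: "why not RAG?" -> saved pages/rag-vs-wiki.md
```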


Why it beats RAG

| RAG | LLM Wiki |
| --- | --- |
| Re-derives on every query | Compiled once, queried fast |
| Context window filled with raw chunks | Context filled with synthesized knowledge |
| No accumulation | Compounds over time |
| Hard to synthesize across many docs | Cross-references pre-built |

v2 Extensions (rohitg00)

  • Memory lifecycle: confidence scoring, supersession, gradual forgetting
  • Knowledge graph: typed entities (people, projects, libs) + relationships (uses, depends-on, contradicts)
  • Automation: event-driven hooks, auto-ingest, scheduled consolidation
  • Consolidation tiers: working memory → episodic → semantic → procedural

Implementation in this Wiki

This wiki IS the implementation. Hosted at wiki.mukhayyar.my.id.

To ingest new content: tell Ductor "wiki ingest [URL or content]" — it will create/update pages and update log.md.


Recent Finds

Updated: 2026-04-11

Single-Agent LLMs Outperform Multi-Agent Systems Under Equal Token Budgets (arXiv 2604.02460)

April 2, 2026. Using the Data Processing Inequality as a formal lens, this paper shows that when reasoning-token budgets are held equal across single-agent and multi-agent setups, single-agent systems consistently match or beat multi-agent on multi-hop reasoning (tested on Qwen3, DeepSeek-R1-Distill-Llama, Gemini 2.5). The damning finding: most reported multi-agent benchmark gains are measurement artifacts — extra unaccounted compute and context injection disguised as architectural advantage. Implication for anyone building multi-agent pipelines: benchmark honestly by equalizing total thinking tokens. Multi-agent is not inherently smarter — it's often just burning more tokens.

Multi-Agent Orchestration for Exascale Materials Screening (arXiv 2604.07681)

April 9, 2026. A productive counterpoint: multi-agent does win when the problem is parallelism-limited, not reasoning-limited. Hierarchical planner-executor architecture (gpt-oss-120b) deployed on the Aurora supercomputer to screen metal-organic frameworks for water harvesting at exascale. The planner decomposes the search space; executor agents run asynchronously. Result: high task completion rates and low orchestration overhead. Synthesis with arXiv 2604.02460: single-agent beats multi-agent in reasoning depth; multi-agent beats single-agent in throughput over embarrassingly parallel tasks. The line is now clearly drawn.

Efficient Inference of Large Vision-Language Models — Survey (arXiv 2603.27960)

March 30, 2026. Comprehensive survey of LVLM inference acceleration organized into four axes: (1) visual token compression (pruning, merging, spatiotemporal aggregation — the central problem is quadratic attention cost over long visual token sequences), (2) memory & serving (KV-cache paging, continuous batching), (3) architecture (sparse MoE, cross-modal projector optimization, hardware-aware attention kernels), (4) advanced decoding (speculative decoding adapted for multimodal). Most useful as a reference taxonomy for anyone optimizing a VLM pipeline. Open problems flagged: streaming video inference and expert routing efficiency in sparse MoE cross-modal models.

SHAPE: Stage-Aware Hierarchical Advantage for LLM Reasoning (arXiv 2604.06636)

April 8, 2026. SHAPE formalizes LLM reasoning as a trajectory through a "state space of empirical solvability" and introduces a hierarchical credit assignment mechanism. At the segment level, it uses a stage-aware advantage function to reward efficient breakthroughs in low-solvability states (the hard parts); at the token level, it uses potential-based shaping to avoid rewarding verbosity. Result: 3% average accuracy gain on math reasoning benchmarks with 30% reduced token consumption — addressing the token-length inflation problem that plagues chain-of-thought fine-tuning. Key insight: current process supervision rewards steps without distinguishing meaningful progress from verbose padding.
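The token-level term described above matches the classic potential-based shaping construction (Ng et al.); the notation here is mine, not lifted from the paper:

```latex
\tilde{r}_t = r_t + \gamma\,\Phi(s_{t+1}) - \Phi(s_t)
```

where \(\Phi\) is a potential over reasoning states. Shaping of this form provably leaves the optimal policy unchanged, which is why it can discourage verbose padding without distorting which solutions are rewarded.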

On Step Length Confounding in LLM Reasoning Data Selection (arXiv 2604.06834)

April 8, 2026. Identifies a subtle but critical flaw in how reasoning datasets are curated: step length is confounded with reasoning quality in most selection pipelines. Longer reasoning chains score higher on standard selection metrics (perplexity-based or reward-model-based), but longer ≠ better. The paper shows that removing this confound and selecting for compact, correct reasoning steps improves downstream fine-tuned model quality. Practical implication: if you're training on synthetic reasoning data (GPT-4 / o3 generated), blind length-based filtering actively degrades model quality.

RLHF: A Statistical Perspective (arXiv 2604.02507)

Establishes a statistical foundation for RLHF by addressing how to combine abundant but biased AI-generated labels with limited high-quality human feedback. Derives recovery guarantees for human-aligned preferences under mixed-data regimes. Key insight: the bottleneck in scaling RLHF is not the RL algorithm but the quality and quantity of human preference signal — this paper provides the theoretical grounding for hybrid data strategies.

GIFT: Group-Relative Implicit Fine-Tuning (arXiv 2510.23868)

Reformulates the RL fine-tuning objective as a normalized MSE loss between implicit and explicit reward signals, eliminating intractable normalization constants and clipping mechanisms found in PPO/GRPO. GIFT produces lower-variance gradients than standard RLHF baselines. Practical impact: reduces per-step training instability without sacrificing policy expressiveness, making RLHF more accessible for smaller research teams.

LLMOrbit: A Circular Taxonomy of Large Language Models (arXiv 2601.14053)

Maps the multi-agent LLM landscape and introduces a circular taxonomy organizing models by capability tier and deployment role. Covers scaling walls, agentic reasoning patterns, and system-level engineering challenges. Notable: introduces the concept of dynamic agent topology rewiring — multi-agent systems that restructure their communication graph during reasoning, rather than fixing it at design time.