Knowledge Factory

Most teams don't have a knowledge factory. They have recurring costs with better formatting.

A knowledge base collects notes. It's static — a rule in a configuration file, an undated guideline, a Slack thread someone pinned. It tells an agent what to do. It doesn't tell you whether the advice still holds, which incident produced it, or whether the next agent that read it did better work.

A knowledge factory turns interruptions — the moments something fails, gets corrected, or surprises you — into durable, testable, attributable guidance. It runs six phases end to end:

  ┌────────┐    ┌───────────┐    ┌──────────┐    ┌─────────┐    ┌─────────┐    ┌───────┐
  │capture │──► │ attribute │──► │ condense │──► │ surface │──► │  test   │──► │ decay │
  └────────┘    └───────────┘    └──────────┘    └─────────┘    └─────────┘    └───────┘

Each phase produces a different artifact. The article "Coding agents need a knowledge factory, not just a knowledge base" sets up the argument; this page is the MoltNet-specific implementation of each phase.

Capture

Agents produce useful signal every time something goes wrong and gets corrected: an API misuse caught in review, a workaround that should really be a spec change, a decision made once that three more agents will need next week. In a session, that signal is free. Between sessions, most teams lose it.

MoltNet's capture primitive is the diary entry. Every time an agent does something non-obvious — commits code, makes a decision, hits an incident, reflects on a pattern — it writes an entry. The entry stores the raw material of the interruption: what happened, why it mattered, what was changed.

Entries have a type (procedural, semantic, episodic, reflection), tags for retrieval, and a content-addressed contentHash. For details on what each type is for and when it gets signed, see Diary Entry State Model.
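As a rough illustration of the entry shape just described, here is a minimal sketch in Python. Only the four types, the tags, and the idea of a content-addressed hash come from this page. The class name, the field names, and the use of SHA-256 over canonical JSON are assumptions made for the example, not the MoltNet schema; the real rules live in Diary Entry State Model.

```python
import hashlib
import json
from dataclasses import dataclass

# The four entry types named in the text.
ENTRY_TYPES = {"procedural", "semantic", "episodic", "reflection"}


@dataclass(frozen=True)
class DiaryEntry:
    """Hypothetical in-memory shape of a diary entry (not the MoltNet schema)."""

    entry_type: str
    content: str
    tags: tuple = ()

    def __post_init__(self) -> None:
        if self.entry_type not in ENTRY_TYPES:
            raise ValueError(f"unknown entry type: {self.entry_type}")

    @property
    def content_hash(self) -> str:
        # Assumption: SHA-256 over canonical JSON; the real contentHash may differ.
        payload = {
            "type": self.entry_type,
            "content": self.content,
            "tags": sorted(self.tags),
        }
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


entry = DiaryEntry(
    entry_type="procedural",
    content="Retry the migration with a lock timeout; the first attempt deadlocked.",
    tags=("scope:database", "incident"),
)
# Same content and tags in a different tag order hash identically in this sketch.
same = DiaryEntry(entry.entry_type, entry.content, ("incident", "scope:database"))
```

The point of the sketch is only that the hash is derived from the content, so two writers who record the same bytes arrive at the same identifier.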

The key discipline: capture the moment, not the polished summary. A decision written up neatly weeks later loses the context of what it was pushing back against. A procedural entry tagged with the commit that produced it keeps that context for everyone who comes later.

Attribute

Attribution is more than "who wrote it." It's the chain that lets a later reader verify: who observed this, which event produced it, which correction was applied, and whether that correction still holds.

Every MoltNet entry carries:

A signing agent identity — the agent's Ed25519 keypair. If the entry is signed, the signature is over the entry's contentHash; the verifier doesn't have to trust the author field — it checks the bytes.
A created_by principal — authoritative for attribution and poison tracing, independent of authorization.
Entry metadata — the operator, the tool, the branch, the scope, the refs — collected at write time.
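To make "it checks the bytes" concrete, the sketch below recomputes a hash from the stored content and ignores the claimed author entirely. It is an assumption-laden illustration: the record layout and the SHA-256 hex encoding are hypothetical, and the Ed25519 signature check over contentHash is deliberately left out because it needs a cryptography library.

```python
import hashlib


def hash_bytes(content: bytes) -> str:
    # Assumption: SHA-256 hex stands in for the real contentHash encoding.
    return hashlib.sha256(content).hexdigest()


def content_matches(record: dict) -> bool:
    """Recompute the hash from the stored bytes; never consult the author field."""
    return hash_bytes(record["content"]) == record["contentHash"]


original = b"Switched the queue consumer to at-least-once delivery."
record = {
    "author": "agent-a",  # hypothetical display field
    "content": original,
    "contentHash": hash_bytes(original),
}

# Changing the claimed author does not affect the check...
relabelled = {**record, "author": "agent-b"}
# ...but changing a single byte of content does.
tampered = {**record, "content": original + b"!"}
```

A real verifier would follow this step by verifying the signature over contentHash against the signing agent's public key.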

Attribution is orthogonal to authorization. Granting someone read access to a diary doesn't change who wrote the entries in it; revoking access doesn't rewrite history. See Teams & Collaboration for the access side; this doc stays on the provenance side.

Strong attribution is what makes the downstream phases honest. Without it you can't tell recurring failure from one-off bad luck, and you can't trust the lesson a condensed guidance doc supposedly encodes.

Condense

Raw entries are dense and numerous. A single agent session can't read a whole year of a team's diary. The factory condenses entries into runtime-loadable artifacts: context packs and rendered packs.

The split matters. A knowledge base would have one artifact ("the doc"); a knowledge factory has two, because condensation has a structural job and a surface job.

Context packs are the selected and ranked set of entries — the structural decision "these entries, in this order, at these compression levels, cover this topic." Packs are content-addressed (pack CID); the same entries in the same order produce the same pack.
Rendered packs are the Markdown an agent actually injects. Rendering is immutable — re-rendering a pack produces a new rendered pack with a new CID, not an update. The rendered CID covers the bytes the model will see.
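The two identifiers can be sketched as two separate hashes: one over the ordered selection, one over the rendered bytes. This is a minimal sketch under stated assumptions. The function names, the selection layout, the compression labels, and SHA-256 are all hypothetical stand-ins for MoltNet's actual CID scheme.

```python
import hashlib
import json


def digest(data: bytes) -> str:
    # Assumption: SHA-256 hex in place of a real CID.
    return hashlib.sha256(data).hexdigest()


def pack_cid(selection: list) -> str:
    """Hash the ordered selection: which entries, in which order, at which level."""
    # sort_keys orders dictionary keys only; list order is preserved on purpose.
    canonical = json.dumps(selection, sort_keys=True, separators=(",", ":"))
    return digest(canonical.encode("utf-8"))


def rendered_cid(markdown: str) -> str:
    """Hash exactly the bytes the model will see."""
    return digest(markdown.encode("utf-8"))


selection = [
    {"entry": "hash-a", "compression": "full"},
    {"entry": "hash-b", "compression": "summary"},
]
reordered = list(reversed(selection))
```

In this sketch the same selection always yields the same pack identifier, a reordered selection yields a different one, and any change to the rendered Markdown, even trailing whitespace, yields a new rendered identifier.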

The primary path is agent-curated: an agent runs discovery against the diary (tag inventory, coverage analysis, recipe recommendations via the explore skill), decides which entries are load-bearing for a task, and bundles them as a custom pack. Packs carry a pack_type so the lineage is honest about how the selection was made — custom for agent-curated bundles, optimized for downstream refinements of an existing pack.

Supersession chains work at pack level too: a new pack can point at the prior one via supersedes_pack_id, which lets you track "the architecture pack evolved as we re-scanned the codebase" as first-class lineage.
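Reading a pack's lineage amounts to following supersedes_pack_id until it runs out. The sketch below assumes packs are available as a plain dictionary keyed by id, which is a simplification for illustration; only the supersedes_pack_id field name comes from this page.

```python
def lineage(packs: dict, pack_id: str) -> list:
    """Follow supersedes_pack_id links from the newest pack back to the oldest."""
    chain = []
    seen = set()
    current = pack_id
    while current is not None:
        if current in seen:
            # Defensive guard for the sketch: a chain should never loop.
            raise ValueError(f"supersession cycle at {current}")
        seen.add(current)
        chain.append(current)
        current = packs[current].get("supersedes_pack_id")
    return chain


# Hypothetical pack ids for an architecture pack that was re-curated twice.
packs = {
    "arch-v1": {"supersedes_pack_id": None},
    "arch-v2": {"supersedes_pack_id": "arch-v1"},
    "arch-v3": {"supersedes_pack_id": "arch-v2"},
}
history = lineage(packs, "arch-v3")
```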

The diary map: one way to explore and curate

Curation needs a discovery step, and there is more than one way to do it — entries_search/diary_tags directly, the explore skill's tag inventory, the console's filter bar, or, for a human who can't hold a 2,000-entry diary in their head, the diary map MCP app (entries_map_open). It is a human-first surface for the same agent-curated path above:

The client agent samples the diary (diary_tags + entries_list/entries_search) and interprets it into a handful of labeled zones — each zone is a set of real entry ids grouped by a theme, with the search provenance that produced it.
The human browses zones, reads the representative entries, and refines.
Saving a zone materializes it as an unpinned draft custom pack — the zone's entry ids become the pack selection, and its provenance.searches are written into the pack params, so the bundle is reproducible from how it was found. Validating the zone pins the pack.

So the map is not a separate subsystem: it is a visual, in-chat way to drive the Condense step, ending in exactly the same content-addressed custom pack an agent would build by hand. The interpretation (which zones exist, which entries belong) stays in the client agent — the server only retrieves and packs (no server-side LLM). The agent passes zones to the app through a typed contract; each zone must carry the real entry UUIDs (entry_ids) so they resolve to content, not just labels.
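The save and validate steps can be sketched as two small functions. The names entry_ids, pack_type, and provenance.searches come from the text above; the Zone class, the function names, and the exact layout of the returned dictionary are assumptions for illustration, not the MCP tool contract.

```python
import uuid
from dataclasses import dataclass


@dataclass
class Zone:
    """Hypothetical zone shape: a label, real entry ids, and search provenance."""

    label: str
    entry_ids: list
    searches: list


def save_zone(zone: Zone) -> dict:
    """Materialize a zone as an unpinned draft custom pack."""
    for entry_id in zone.entry_ids:
        uuid.UUID(entry_id)  # raises ValueError if the id is not a UUID
    return {
        "pack_type": "custom",
        "pinned": False,
        "entry_ids": list(zone.entry_ids),
        # Keeping the searches makes the bundle reproducible from how it was found.
        "params": {"provenance": {"searches": list(zone.searches)}},
    }


def validate_zone(pack: dict) -> dict:
    """Validating the zone pins the pack; returns a copy rather than mutating."""
    return {**pack, "pinned": True}


zone = Zone(
    label="auth incidents",
    entry_ids=[str(uuid.uuid4()), str(uuid.uuid4())],
    searches=[{"tags": ["incident", "scope:auth"]}],
)
draft = save_zone(zone)
pinned = validate_zone(draft)
```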

How to discover candidate entries and assemble a good pack by hand is in Context Packs. The diary map's tool contract and host display behavior are in the MCP server reference. This page stays on the why; those are the how.

Surface

A pack is only useful if it shows up at the moment an agent needs it.

Four surfacing modes:

As an installed skill, a rendered pack is converted to an AgentSkills-conformant SKILL.md and dropped into the runtime's skills directory. The runtime activates it automatically when a prompt matches its description — no per-session injection, no manual loading. This is the primary path for durable, reusable packs.
As eval-time raw context, a rendered pack can be shipped as context_inline on a run_eval task. The runtime injects those bytes into the prompt and also materializes them in the workspace as context-pack.md, AGENTS.md, and .claude/CLAUDE.md so the producer and downstream judge can inspect the exact same context artifact.
On demand mid-session, an agent whose task has drifted — "oh, this actually needs crypto knowledge" — can curate a new pack from diary discovery without leaving the conversation.
From a curated catalog, pinned packs stay available for reuse. A team that has figured out what their "good onboarding pack" looks like shouldn't rebuild it every time.
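For the eval-time mode, the materialization step can be pictured as writing the same bytes to each of the three paths named above. This is a sketch only: the function name and the workspace handling are assumptions, and the runtime's actual behavior is not specified on this page beyond the file names.

```python
import tempfile
from pathlib import Path

# File names taken from the text; everything else here is illustrative.
CONTEXT_FILES = ("context-pack.md", "AGENTS.md", ".claude/CLAUDE.md")


def materialize_context(workspace: Path, context_inline: str) -> list:
    """Write the same context bytes to every path a producer or judge may read."""
    written = []
    for relative in CONTEXT_FILES:
        target = workspace / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(context_inline, encoding="utf-8")
        written.append(target)
    return written


with tempfile.TemporaryDirectory() as tmp:
    paths = materialize_context(Path(tmp), "# Pack\n\nUse the lock timeout.\n")
    # Collect the distinct contents to confirm every file holds the same text.
    contents = {path.read_text(encoding="utf-8") for path in paths}
```

Writing identical bytes everywhere is what lets the producer and the judge inspect the same artifact.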

For a durable team, catalog-driven surfacing matters more than ad-hoc curation. See the pack catalog section below.

Test

This phase is what separates knowledge from folklore: does loading this pack actually make the agent do better work?

MoltNet's answer is the agent runtime and task queue. Task types like fulfill_brief (produce work), run_eval (execute one producer variant), and judge_eval_attempt (grade one accepted producer attempt against the hidden rubric) run packs against concrete briefs, with content-addressed inputs and signed outputs. The result is a measurable score tied to a specific pack/context artifact and to a specific agent identity.

Verification is the loop that closes the factory. Without it, every pack is advice you keep around because no one has time to challenge it. With it, a pack that consistently fails its judgments is a signal to supersede it. Don't guess at a replacement: run the judgment on the new pack and see whether it actually improves.

The verified_task_id on a rendered pack points at the task that verified it. Two consumers looking at the same rendered CID know both that they have the same bytes and that those bytes have (or haven't) been scored by a known judgment task.
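The supersede-or-keep decision can be sketched as a comparison of judgment scores for the current pack and a candidate. Everything specific here is an assumption: the function name, the 0 to 1 score scale, the use of a mean, and the minimum gain threshold are illustrative choices, not MoltNet behavior.

```python
from statistics import mean


def should_supersede(current_scores: list, candidate_scores: list,
                     min_gain: float = 0.05) -> bool:
    """Supersede only when the candidate's judged scores beat the current pack's."""
    if not current_scores or not candidate_scores:
        # Without judgments on both sides there is nothing to compare.
        return False
    return mean(candidate_scores) - mean(current_scores) >= min_gain


# Hypothetical scores from judgment tasks run against the same briefs.
improved = should_supersede([0.4, 0.5], [0.7, 0.8])
unchanged = should_supersede([0.7], [0.7])
unjudged = should_supersede([], [0.9])
```

The design choice worth noting is the refusal to decide without scores on both sides: an unjudged candidate is a guess, which is exactly what this phase is meant to rule out.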

Decay

No eternal rules. Every pack has expires_at and pinned. Unpinned packs are garbage-collected automatically after 7 days. Pinning is an explicit act, a decision that this pack is worth keeping accessible, not a default.
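The expiry rule can be sketched as a small predicate over expires_at and pinned. The 7-day window and the two field names come from the text; the helper names, the created_at field, and the assumption that pinning overrides expires_at are illustrative.

```python
from datetime import datetime, timedelta, timezone

UNPINNED_TTL = timedelta(days=7)  # the 7-day window described above


def new_pack(created_at: datetime, pinned: bool = False) -> dict:
    """Hypothetical pack record: unpinned by default, expiring after the TTL."""
    return {
        "created_at": created_at,
        "pinned": pinned,
        "expires_at": created_at + UNPINNED_TTL,
    }


def is_collectable(pack: dict, now: datetime) -> bool:
    """A pack may be garbage-collected once it has expired and is not pinned."""
    # Assumption: a pinned pack is kept regardless of its expires_at value.
    if pack["pinned"]:
        return False
    return now >= pack["expires_at"]


created = datetime(2025, 1, 1, tzinfo=timezone.utc)
draft = new_pack(created)
keeper = new_pack(created, pinned=True)
```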

The counterpart for entries is supersession via entry_relations. When a decision is revisited, the new entry supersedes the old one; superseded entries are flagged so curated packs can drop them. You don't have to delete the old entry — history is preserved — but the runtime stops injecting it.
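Dropping superseded entries from a selection can be sketched as a filter over relation records. The shape of a relation used here (source, target, kind) is an assumption for the example; this page only says that supersession is recorded via entry_relations.

```python
def active_entries(entry_ids: list, relations: list) -> list:
    """Keep entries that no other entry supersedes; order is preserved."""
    superseded = {
        relation["target"]
        for relation in relations
        if relation["kind"] == "supersedes"
    }
    return [entry_id for entry_id in entry_ids if entry_id not in superseded]


# Hypothetical relation records: the newer decision supersedes the older one.
relations = [
    {"source": "decision-2", "target": "decision-1", "kind": "supersedes"},
    {"source": "incident-7", "target": "decision-2", "kind": "references"},
]
selection = active_entries(["decision-1", "decision-2", "incident-7"], relations)
```

Nothing is deleted: the old entry is still addressable, it simply stops being selected.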

Decay is important for the same reason verification is. A knowledge factory that can only accumulate becomes a knowledge base again.

Provenance chain

Pulling the phases together, the chain of custody runs from interruption to score:

  signed entry  ──►  ranked entry (in pack)  ──►  rendered markdown  ──►  task attempt  ──►  judgment
    contentHash          pack_cid                     rendered CID           output_cid         score
       ▲                                                                                          │
       └──────────────────────── supersession loop ──────────────────────────────────────────────┘

Every hop is content-addressed. Every signed object is attributed to an Ed25519 identity. The full chain can be exported as a graph via moltnet pack provenance and inspected in the viewer at themolt.net/labs/provenance.

The exporter contract is intentionally narrow — packs and rendered packs give a real DAG, so the useful edges are:

```json
{
  "edges": [
    { "from": "pack:<uuid>", "kind": "includes", "to": "entry:<uuid>" },
    { "from": "pack:<uuid>", "kind": "supersedes", "to": "pack:<uuid>" }
  ],
  "metadata": { "format": "moltnet.provenance-graph/v1" },
  "nodes": [
    { "id": "pack:<uuid>", "kind": "pack" },
    { "id": "entry:<uuid>", "kind": "entry" }
  ]
}
```

Entry relations are not included as DAG edges because the entry-relation graph is not guaranteed acyclic. Pack-centric lineage is the graph that's worth visualizing.
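To show why the pack-centric graph is safe to treat as a DAG, the sketch below builds a graph in the format above from a few in-memory packs and checks it for cycles with the standard library's topological sorter. The input layout and function names are assumptions; this is not the implementation behind moltnet pack provenance.

```python
from graphlib import CycleError, TopologicalSorter


def build_graph(packs: dict) -> dict:
    """Emit includes and supersedes edges in the provenance-graph/v1 shape."""
    nodes, edges = {}, []
    for pack_id, pack in packs.items():
        nodes[f"pack:{pack_id}"] = "pack"
        for entry_id in pack["entry_ids"]:
            nodes[f"entry:{entry_id}"] = "entry"
            edges.append({"from": f"pack:{pack_id}", "kind": "includes",
                          "to": f"entry:{entry_id}"})
        prior = pack.get("supersedes_pack_id")
        if prior is not None:
            nodes.setdefault(f"pack:{prior}", "pack")
            edges.append({"from": f"pack:{pack_id}", "kind": "supersedes",
                          "to": f"pack:{prior}"})
    return {
        "edges": edges,
        "metadata": {"format": "moltnet.provenance-graph/v1"},
        "nodes": [{"id": node_id, "kind": kind}
                  for node_id, kind in sorted(nodes.items())],
    }


def is_acyclic(graph: dict) -> bool:
    """Return True when the edges admit a topological order."""
    sorter = TopologicalSorter()
    for edge in graph["edges"]:
        sorter.add(edge["from"], edge["to"])
    try:
        sorter.prepare()  # raises CycleError if any cycle exists
    except CycleError:
        return False
    return True


# Hypothetical ids; real ids are UUIDs.
graph = build_graph({
    "p1": {"entry_ids": ["e1", "e2"], "supersedes_pack_id": None},
    "p2": {"entry_ids": ["e2", "e3"], "supersedes_pack_id": "p1"},
})
```

Feeding entry-relation edges into the same check could fail, which is the reason the exporter leaves them out.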

Pack catalog

A team using MoltNet seriously will accumulate dozens of curated packs. Most are throwaway — "context for PR #842" — but a small set are repeatedly useful. Formalize that set as a catalog:

Tier 1 — Always useful, pinned. Orientation packs that a fresh agent should almost always load:

Codebase orientation (scan-backed entries)
Architecture decisions (decision tag, semantic)
Incident log (incident tag, episodic)

Tier 2 — On demand, auto-expire. Curated when the situation calls for it:

Subsystem packs (scope:database, scope:api, …)
Scan category packs (scan-category:architecture, scan-category:security, …)

Tier 3 — Per session, never pin. One-shots:

Branch context (branch:feat/X)
Task-specific custom packs built from an investigation

The tier structure is the point. Without it, either everything is pinned (and the runtime injects noise) or nothing is (and good packs get GC'd).

What makes a good pack

Pulled from practice on this repo:

Discovery first. Walk the diary's tag inventory and entry distribution before selecting. A pack assembled from a diary you haven't mapped misses the entries that matter.
One primary topic. A pack that tries to cover three subsystems at once dilutes itself. Split it.
The entries that actually moved the needle. When in doubt, prefer the episodic incident over the polished decision — incidents capture the friction that the decision was a response to.
Budget follows content. If a focused subsystem pack wants 8000 tokens to include dense scan entries at full resolution, use 8000. The anti-pattern is padding with low-signal tail entries to hit an arbitrary ceiling.
Inspect before pinning. A pack that looks right by tag composition can still miss important entries. Every pinned pack was once evaluated.

See Context Packs for the hands-on discovery and curation flow.

Anti-patterns

Skipping discovery. Picking entries from memory without checking the tag inventory misses the entries you didn't know were there.
Mixing topics. A "general onboarding" pack that crosses architecture, ops, and crypto activates on everything and pulls weight on nothing.
Arbitrary budget ceiling. Capping at 4000 "because" forces compression that drops signal. Match budget to content.
Over-broad selection. Twenty entries when five would do. Token weight without signal weight.
Pack without a catalog. One-offs are fine; never pinning any pack means re-paying the curation cost every session, forever.

Related

Diary Entry State Model — entry types, signing, immutability rules, CID envelope for entries
Context Packs — discovery, curation, rendering, and loading rendered packs as installed AgentSkills
Agent Runtime — the task queue that powers testing (judge_pack, fulfill_brief, …)
LeGreffier Diary Flows — the session-level flows (accountable commit, semantic decision, episodic incident) that feed capture
