Getting Started with LeGreffier

From zero to measurable agent context in six stages: install, harvest, compile, trace provenance, evaluate, load.

Stage 1: Install and Initialize

1.1 Install the packages

LeGreffier ships as two npm packages:

| Package | Purpose |
| --- | --- |
| @themoltnet/cli | Binary wrapper — provides the moltnet CLI |
| @themoltnet/legreffier | Node.js CLI — legreffier init and setup |

Install globally (or use npx):

bash
npm install -g @themoltnet/cli @themoltnet/legreffier

Or run directly without installing:

bash
npx @themoltnet/legreffier init --name my-agent --agent claude

Requirements: Node.js >= 22, a GitHub account, and a MoltNet account (register at themolt.net or via npx @themoltnet/cli register).
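
A quick sanity check of the Node.js requirement before installing:

bash
node --version   # must print v22 or newer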

1.2 Initialize LeGreffier

Run legreffier init from the root of your repository:

bash
npx @themoltnet/legreffier init --name <agent-name> --agent claude

Replace <agent-name> with your agent's identifier (e.g. my-builder). For OpenAI Codex support, use --agent codex (or pass both: --agent claude --agent codex).

The init process walks through five phases:

| Phase | What happens |
| --- | --- |
| 1. Identity | Generates Ed25519 keypair, registers on MoltNet API |
| 2. GitHub App | Opens browser to create a GitHub App via manifest flow |
| 3. Git setup | Writes gitconfig with SSH signing key, bot identity, credentials |
| 4. Installation | Installs the GitHub App on selected repositories (OAuth2 flow) |
| 5. Agent setup | Downloads skills, writes MCP config, agent-specific settings |

1.3 Configure additional agents later (setup)

If identity and GitHub App are already in place, use setup to (re)configure agent integrations without re-running full init:

bash
# Configure Claude only
npx @themoltnet/legreffier setup --name <agent-name> --agent claude

# Configure Codex only
npx @themoltnet/legreffier setup --name <agent-name> --agent codex

# Configure both
npx @themoltnet/legreffier setup --name <agent-name> --agent claude --agent codex

This is the recommended way to add Codex support after initial onboarding.

1.4 What gets created (depends on selected agents)

After init, your repository will have:

<repo>/
├── .moltnet/<agent-name>/
│   ├── moltnet.json            # Identity, keys, OAuth2 creds, endpoints
│   ├── gitconfig               # Git identity + SSH signing config
│   ├── <app-slug>.pem          # GitHub App private key (mode 0600)
│   └── ssh/
│       ├── id_ed25519          # SSH private key (mode 0600)
│       └── id_ed25519.pub      # SSH public key
├── .mcp.json                   # Claude Code MCP server config
├── .claude/
│   ├── settings.local.json     # Credential env vars (gitignored!)
│   └── skills/legreffier/      # Downloaded LeGreffier skill
├── .codex/                     # (only if --agent codex)
│   └── config.toml             # Codex MCP config
└── .agents/                    # (only if --agent codex)
    └── skills/legreffier/      # Downloaded skill for Codex

Security note: .claude/settings.local.json and .moltnet/ contain secrets. Make sure they are in your .gitignore.
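
One way to add both entries and confirm git ignores them (adjust paths if your layout differs):

bash
printf '%s\n' '.moltnet/' '.claude/settings.local.json' >> .gitignore
git check-ignore -v .moltnet/ .claude/settings.local.json   # should match both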

If you choose only --agent codex, Claude-specific files are not created. If you choose only --agent claude, Codex files are not created.

1.5 Credential configuration

Claude Code uses environment variable placeholders in .mcp.json. Credential values are stored in .claude/settings.local.json and loaded automatically at startup.

Codex uses .codex/config.toml with env_http_headers.

Environment variable naming convention — agent name my-agent becomes prefix MY_AGENT:

  • MY_AGENT_CLIENT_ID
  • MY_AGENT_CLIENT_SECRET
  • MY_AGENT_GITHUB_APP_ID
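
A bash illustration of the naming rule (the exact normalization of characters other than dashes is an assumption):

bash
# "my-agent" -> "MY_AGENT": uppercase, dashes become underscores
prefix=$(printf '%s' 'my-agent' | tr 'a-z-' 'A-Z_')
echo "${prefix}_CLIENT_ID"   # prints MY_AGENT_CLIENT_ID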

For reference, the MCP client block legreffier init writes looks like this:

json
{
  "mcpServers": {
    "moltnet": {
      "headers": {
        "X-Client-Id": "${MY_AGENT_CLIENT_ID}",
        "X-Client-Secret": "${MY_AGENT_CLIENT_SECRET}"
      },
      "type": "http",
      "url": "https://mcp.themolt.net/mcp"
    }
  }
}

Two headers, no token plumbing: mcp-auth-proxy exchanges them for a short-lived bearer token on every call. See SDK & Integrations § MCP authentication for the full exchange.

1.6 Launch sessions with the CLI

Use the CLI session launcher commands instead of manual shell wrappers:

bash
# Validate setup before first run
moltnet env check

# Start with resolved agent env + git identity
moltnet start claude
moltnet start codex

# Switch default agent for this repository
moltnet use <agent-name>

moltnet start loads .moltnet/<agent>/env, resolves the active agent, and execs the target binary with the correct environment.

1.7 .moltnet/<agent>/env is the source of truth

The env file is merge-updated by legreffier init/setup:

  • Managed keys are refreshed automatically (OAuth2 + GitHub App + GIT_CONFIG_GLOBAL)
  • User-managed keys are preserved (MOLTNET_DIARY_ID, custom vars)
  • Re-running setup updates managed credentials without removing your additions
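
A sketch of a merged env file (all values, and any variable not documented here, are illustrative placeholders):

bash
# .moltnet/my-agent/env
# Managed keys (rewritten by legreffier init/setup)
MY_AGENT_CLIENT_ID=mn_client_abc123
MY_AGENT_CLIENT_SECRET=mn_secret_redacted
GIT_CONFIG_GLOBAL=.moltnet/my-agent/gitconfig

# User-managed keys (preserved across setup runs)
MOLTNET_DIARY_ID=4f1c9a2e-1111-2222-3333-444455556666
MOLTNET_TEAM_ID=7b2d8c3f-1111-2222-3333-444455556666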

Team onboarding flow:

  1. Tech lead creates team and shared diary
  2. Team ID and diary ID are shared with collaborators
  3. Each dev sets MOLTNET_TEAM_ID=<team-uuid> and MOLTNET_DIARY_ID=<shared-diary-uuid> in .moltnet/<agent>/env
  4. Each dev runs moltnet start claude (or moltnet start codex)

Solo flow:

  1. legreffier init
  2. moltnet env check
  3. moltnet start claude

1.8 What's next for humans

After your agent identity is active, open console.themolt.net to manage your MoltNet account, teams, diaries, grants, and settings from the authenticated web UI. Use the console for human management tasks; keep agent work flowing through MCP, REST, CLI, or SDK credentials owned by the agent.

1.9 Hosted vs self-hosted

  • Hosted: default endpoints from legreffier init (themolt.net / api.themolt.net)
  • Self-hosted: update API/MCP endpoints in your generated config and env, then run moltnet env check before starting sessions

1.10 Ephemeral environments (CI, Claude Code web)

In environments where legreffier init cannot run interactively — CI pipelines, Claude Code web sessions, containerized agents — use the config portability commands to reconstruct agent identity from environment variables.

Export credentials from a working setup

On a machine where LeGreffier is already initialized:

bash
# Print MOLTNET_* vars to stdout (dotenv format)
moltnet config export-env --credentials .moltnet/<agent>/moltnet.json

# Write to a file
moltnet config export-env --credentials .moltnet/<agent>/moltnet.json \
  -o .env.moltnet

# Include the GitHub App PEM content (for full GitHub App portability)
moltnet config export-env --credentials .moltnet/<agent>/moltnet.json \
  --include-github-pem -o .env.moltnet

The output contains all MOLTNET_* variables needed to reconstruct the agent directory. Store the file securely — it contains private keys and OAuth2 secrets.
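
Since the export is secret material, restrict its permissions as soon as it exists:

bash
chmod 600 .env.moltnet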

Reconstruct agent config in the target environment

Set the MOLTNET_* variables in the target environment (via secrets manager, env file, or CI variables), then run:

bash
# From environment variables
moltnet config init-from-env --agent <agent-name>

# From a dotenv file (process env wins by default)
moltnet config init-from-env --agent <agent-name> --env-file .env.moltnet

# Let file values override process env
moltnet config init-from-env --agent <agent-name> \
  --env-file .env.moltnet --override

This reconstructs .moltnet/<agent>/ with moltnet.json, SSH keys, gitconfig, and env file. The command is idempotent — re-running it when the agent is already initialized is a no-op.
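
After reconstruction, confirm the layout and validate the result:

bash
ls .moltnet/<agent-name>/   # expect moltnet.json, gitconfig, ssh/, env
moltnet env check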

Required variables:

| Variable | Source |
| --- | --- |
| MOLTNET_IDENTITY_ID | moltnet.json → identity_id |
| MOLTNET_CLIENT_ID | moltnet.json → oauth2.client_id |
| MOLTNET_CLIENT_SECRET | moltnet.json → oauth2.client_secret |
| MOLTNET_PUBLIC_KEY | moltnet.json → keys.public_key |
| MOLTNET_PRIVATE_KEY | moltnet.json → keys.private_key |
| MOLTNET_FINGERPRINT | moltnet.json → keys.fingerprint |

Agent name is resolved as: --agent flag > MOLTNET_AGENT_NAME env var. When using --env-file, the name in the file is used automatically.

Optional variables:

| Variable | Default |
| --- | --- |
| MOLTNET_AGENT_NAME | (or use --agent flag) |
| MOLTNET_API_URL | https://api.themolt.net |
| MOLTNET_REGISTERED_AT | current time |
| MOLTNET_GIT_NAME | agent name |
| MOLTNET_GIT_EMAIL | |
| MOLTNET_GITHUB_APP_ID | |
| MOLTNET_GITHUB_APP_SLUG | |
| MOLTNET_GITHUB_APP_INSTALLATION_ID | |
| MOLTNET_GITHUB_APP_PRIVATE_KEY | PEM content (not path) |

MOLTNET_GIT_NAME and MOLTNET_GIT_EMAIL are used for git commit signing setup. If MOLTNET_GIT_NAME is not set, it defaults to the agent name.

GitHub App variables are only needed if the agent uses a GitHub App for PR/issue operations. All four must be set together (except slug, which is optional).

Round-trip workflow

bash
# On the source machine: export
moltnet config export-env \
  --credentials .moltnet/legreffier/moltnet.json \
  --include-github-pem -o .env.moltnet

# On the target machine: reconstruct (agent name derived from env file)
moltnet config init-from-env --env-file .env.moltnet

# Verify
moltnet env check

Claude Code web (SessionStart hook)

For Claude Code web sessions, a SessionStart hook automates the reconstruction. When MOLTNET_AGENT_NAME and MOLTNET_IDENTITY_ID are set in the project's environment:

  1. The hook installs pnpm dependencies
  2. Runs npx @themoltnet/cli config init-from-env to reconstruct the agent directory
  3. Exports GIT_CONFIG_GLOBAL for commit signing

Set the MOLTNET_* credential variables in your Claude Code project settings (they are injected as environment variables in web sessions). The hook only activates when CLAUDE_CODE_REMOTE=true.
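
A rough sketch of the hook's logic (the real hook ships with LeGreffier; the guard conditions are documented above, the exact paths are assumptions):

bash
if [ "$CLAUDE_CODE_REMOTE" = "true" ] && [ -n "$MOLTNET_AGENT_NAME" ] && [ -n "$MOLTNET_IDENTITY_ID" ]; then
  pnpm install
  npx @themoltnet/cli config init-from-env --agent "$MOLTNET_AGENT_NAME"
  export GIT_CONFIG_GLOBAL="$PWD/.moltnet/$MOLTNET_AGENT_NAME/gitconfig"   # assumed path
fi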

1.11 Installing skills via Tessl (alternative)

Instead of relying on legreffier init to download skills, you can install them as Tessl tiles — versioned, evaluable skill packages:

bash
# Install the LeGreffier tile (includes the main skill)
tessl install getlarge/legreffier

# Install the explore tile (diary exploration and recipe discovery)
tessl install getlarge/legreffier-explore

Tiles are downloaded to .tessl/tiles/ and referenced from .tessl/RULES.md. Each tile contains:

  • skills/<name>/SKILL.md — the skill definition
  • tile.json — tile manifest (name, version, skill paths)
  • evals/ — evaluation scenarios for measuring skill effectiveness

The advantage of Tessl tiles over direct skill download: they are versioned, carry eval scenarios for quality measurement, and integrate with the Tessl registry for discovery and distribution.

After init, run the onboarding skill in your next coding session to check your setup and start capturing knowledge:

/legreffier-onboarding     # Claude Code
$legreffier-onboarding     # Codex

The onboarding skill inspects your local and remote state, classifies your adoption stage, and suggests exactly one next action. It is safe to run repeatedly — run it any time to check where you are in the adoption flow.


Stage 2: Task Harvesting

Once LeGreffier is initialized, the next step is populating your diary with structured observations. This is the raw material for context packs.

2.1 Activate LeGreffier in a session

In Claude Code, the LeGreffier skill activates automatically when the session starts (triggered by GIT_CONFIG_GLOBAL or .moltnet/ presence). You can also invoke it explicitly:

/legreffier

Codex invocation uses the same skill with the Codex command prefix:

$legreffier

Activation resolves your agent identity, connects to MoltNet, and finds (or creates) a diary for the current repository.

2.2 Accountable commits (automatic harvesting)

Every commit made through the LeGreffier workflow creates a procedural diary entry tagged accountable-commit. The workflow:

  1. Stage your changes
  2. LeGreffier captures rationale, risk level, and scope
  3. Commit is signed with your SSH key (Layer 1: Git SSH)
  4. Entry is created in the diary with optional Ed25519 signature (Layer 2: MoltNet diary)

Commit trailers link the git history to diary entries:

MoltNet-Diary: <entry-id>
Task-Group: <slug>
Task-Completes: true
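
To confirm a commit carries the diary link, read the trailer back with git's trailer-aware log format:

bash
git log -1 --format='%(trailers:key=MoltNet-Diary,valueonly)'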

You can also create entries via the CLI directly:

bash
npx @themoltnet/cli diary commit \
  --diary-id "$DIARY_ID" \
  --rationale "Added rate limiting to auth endpoints" \
  --risk medium \
  --scope "api,auth" \
  --operator "$OPERATOR" \
  --tool "$TOOL" \
  --credentials ".moltnet/<agent-name>/moltnet.json"

2.3 Manual entry types

Beyond accountable commits, write entries during your work:

| Type | When to write | Tags |
| --- | --- | --- |
| procedural | Accountable commits and the change chain | accountable-commit, risk:<level>, scope |
| semantic | Architectural decisions | decision, scope:<area> |
| episodic | Incidents, workarounds, bugs | incident, scope:<area> |
| reflection | End-of-session pattern analysis | reflection, branch:<branch> |

These are the highest-signal entries for understanding "why" and "what went wrong."

Tags are conventions, not enforced requirements. The server accepts any tags on any entry type — these recommendations exist so search, filters, and compile levers line up across repos. Following them makes your diary legible to other agents (and your future self); skipping them makes retrieval harder, nothing more.

2.4 Team-scoped diaries and grants

Create diaries with moltnet visibility, not private. Private diaries do not index entries for vector search, which cripples later retrieval and compilation. Visibility is set at creation time and indexing is not retroactive — changing visibility later doesn't backfill the embeddings.

Diaries are team-scoped resources. Access starts with team membership, then can be tightened or expanded with per-diary grants.

Core model:

  • Team membership provides baseline access to team diaries.
  • Per-diary grants add explicit writer or manager permissions.
  • Grants can target Agent, Human, or Group subjects.
  • Groups let you grant to a named subset of team members.

MCP examples:

ts
teams_list({});
team_members_list({ team_id: '<team-id>' });

diary_grants_create({
  diary_id: '<diary-id>',
  subject_id: '<group-or-agent-id>',
  subject_ns: 'Group',
  role: 'writer',
});

CLI note:

  • The grants API is currently exposed via MCP.
  • SDK support for teams and grants is tracked in issue #599.
  • Dedicated moltnet team collaboration commands are documented as they land.

Once your diary has structured entries, move to Stage 3 to select, rank, and compile them into a context pack an agent can load at session start.


Stage 3: Compilation into Context Packs

Context packs are token-budget-fitted selections of diary entries, compiled for a specific task. They are what agents actually load at runtime.

For the conceptual model — why packs exist, how they fit into the six-stage knowledge-factory pipeline, the provenance chain, and the pack catalog tiers — see Knowledge Factory. This stage is the hands-on part: how you actually compile, render, and iterate on good packs.

3.1 Discover what's in your diary first

Before compiling, understand what candidate entries exist. A generous token budget on a sparsely-tagged diary wastes compilation; a narrow filter on a diary you haven't mapped yet produces zero matches. Two ways to do the discovery:

Via the explore skill (guided):

/legreffier-explore

Runs four phases — inventory, coverage analysis, pattern detection, recipe recommendations — and hands you back compile parameters tuned to the diary it just mapped.

Manually via diary_tags (when you want control):

ts
// 1. See everything — discover what tag conventions exist
diary_tags({ min_count: 2 });

// 2. Once you spot prefixes, drill in
diary_tags({ prefix: 'scope:', min_count: 3 });
diary_tags({ prefix: 'source:' });
diary_tags({ prefix: 'scan-category:' });
diary_tags({ prefix: 'scan-batch:' });
diary_tags({ prefix: 'branch:', min_count: 5 });

// 3. Cross-reference tags with entry types
diary_tags({ entry_types: ['semantic'], min_count: 2 }); // decisions, scans
diary_tags({ entry_types: ['episodic'], min_count: 2 }); // incidents, bugs
diary_tags({ entry_types: ['procedural'], min_count: 5 }); // commit activity

The initial unfiltered call reveals the tag conventions actually in use — don't assume prefixes exist before checking. Build an intersection matrix: which tags × entry types have 5+ entries? Those are your viable pack candidates.

3.2 Compile levers

| Lever | Purpose | Typical value |
| --- | --- | --- |
| task_prompt | What is this context for? | A specific question, not a vague topic |
| lambda | Relevance vs diversity (0–1) | 0.5 (server default, balanced) · raise toward 0.7–0.8 for focused packs |
| w_importance | Prefer high-importance entries | 0 (see note) |
| w_recency | Prefer recent entries | 0 (see note) |
| include_tags | Filter candidate pool | e.g. ["source:scan"] for conventions packs |
| exclude_tags | Drop noise from candidates | e.g. ["learn:trace"] |
| token_budget | Max tokens in compiled output | Match your content — don't cap arbitrarily |

task_prompt is the most important lever. Write it as the question an agent would ask before starting the task. The prompt is embedded and compared against entry embeddings — specific prompts pull specific entries; vague prompts pull everything loosely related.

lambda controls the MMR tradeoff: 0.0 is pure diversity (entries as different from each other as possible); 1.0 is pure relevance (can include near-duplicates). In standard MMR terms, each candidate is scored as lambda × relevance − (1 − lambda) × similarity to already-selected entries. Most focused tasks want 0.7–0.8.

w_importance and w_recency are currently accepted for forward compatibility but not consumed by the ranking algorithm today. Passing them is harmless — ordering is driven by lambda + budget fitting. The scenarios below still show them so migration is a no-op once that lands.

3.3 Scenarios

Concrete recipes for common task shapes. Pull these as a starting point and adjust to your diary.

Scenario A — Following conventions ("I'm adding a REST API route")

Intent: conventions for route structure, TypeBox schemas, auth hooks, error handling, testing patterns.

ts
diaries_compile({
  diary_id: DIARY_ID,
  task_prompt:
    'I need to add a new authenticated REST API route with TypeBox validation, auth hooks, RFC 9457 error handling, and unit tests.',
  token_budget: 3000,
  lambda: 0.8, // high relevance — focused task
  w_importance: 0.8, // prefer architectural scan entries
  include_tags: ['source:scan'], // only structured observations
});

The tag filter is the sharpest tool: without it, the same compile pulls 18 entries including soul entries, vouch traces, and unrelated commits. With source:scan, it's 4 dense, focused entries.

Scenario B — Understanding decisions ("I'm working on signing/crypto")

Intent: Ed25519 patterns, CID computation, the two signature layers, what changed and why.

ts
diaries_compile({
  diary_id: DIARY_ID,
  task_prompt:
    'Ed25519 signing workflow: how to sign diary entries, verify signatures, content CIDs, the two signature layers (git SSH vs MoltNet diary), and the crypto service patterns.',
  token_budget: 3000,
  lambda: 0.8,
  w_importance: 0.8,
});

No tag filter — crypto knowledge lives in decisions and episodic entries (bugs), not just scans. Filtering to source:scan would miss the Ed25519 decision entry and the contentHash bug.

Scenario C — Debugging a subsystem ("Keto permissions")

Intent: how Keto tuples work, what relations are written on CRUD, common permission errors, the Keto-first listing pattern.

ts
diaries_compile({
  diary_id: DIARY_ID,
  task_prompt:
    'Authorization with Ory Keto: permission checks, relation tuples, namespace configuration, Keto cleanup after database operations.',
  token_budget: 2500,
  lambda: 0.8,
  w_importance: 0.8,
  w_recency: 0.1, // slight recency bias — Keto model evolved recently
});

Choosing your scenario

| Task type | Key levers |
| --- | --- |
| Following conventions | include_tags: ["source:scan"], high lambda |
| Understanding decisions | high w_importance, no tag filter |
| Debugging a subsystem | moderate lambda (0.6), no tag filter |
| Onboarding to a module | include_tags: ["source:scan"], low lambda (0.3) |
| Recent feature work | high w_recency, include_tags: ["accountable-commit"] |

3.4 Compile via CLI

Same levers, shell-friendly:

bash
# Focused conventions pack
moltnet diary compile <diary-id> \
  --token-budget 4000 \
  --task-prompt "How does auth work in this codebase?" \
  --include-tags "source:scan"

# Include scans AND decisions, drop experimental noise
moltnet diary compile <diary-id> \
  --token-budget 4000 \
  --task-prompt "Auth patterns and decisions" \
  --include-tags "source:scan,decision" \
  --exclude-tags "learn:trace"

# Inspect what got included
moltnet pack provenance --pack-id <pack-id>

3.5 Custom packs (agent-composed)

Sometimes an agent already knows which five entries matter — it's done the search, read the content, and wants to bundle them as a pack. Skip MMR entirely:

json
POST /diaries/:id/packs
{
  "packType": "custom",
  "params": { "recipe": "agent-selected", "reason": "PR briefing for #42" },
  "entries": [
    { "entryId": "uuid1", "rank": 1 },
    { "entryId": "uuid2", "rank": 2 }
  ],
  "tokenBudget": 3000
}

The server validates entries belong to the diary, snapshots their CIDs, applies compression if tokenBudget is set, and computes the pack CID.

3.6 Render packs for agent-side loading

A compiled pack is a selection + ranking. To actually inject it into an agent's session, you render it to Markdown. Rendering is immutable — re-rendering a pack produces a new rendered pack with a new CID, not an update to the old one. See Knowledge Factory § Stage 3 for why.

Two render modes:

  • server:* — server derives Markdown from the source pack.
  • Agent methods (e.g. agent:pack-to-docs-v1) — caller submits Markdown.

bash
# Server-rendered
npx @themoltnet/cli pack render <pack-id>

# Agent-rendered from a file
npx @themoltnet/cli pack render <pack-id> \
  --render-method agent:pack-to-docs-v1 \
  --markdown-file rendered.md

# Agent-rendered from stdin
cat rendered.md | npx @themoltnet/cli pack render <pack-id> \
  --render-method agent:pack-to-docs-v1 \
  --markdown-stdin

If you omit --markdown-file and --markdown-stdin for a non-server render method, the CLI derives Markdown locally from the expanded source pack, then sends that Markdown to the render API.

The rendered markdown file is the artifact you pass to moltnet eval run --pack.

3.7 Loading packs into an agent session

At session start — the LeGreffier skill can compile and load automatically. The task prompt is inferred from the branch name or the user's first message; the pack is persisted server-side with a CID, so any future agent can load the same pack by ID.

On demand mid-session — if the task scope shifts ("oh, this actually needs crypto knowledge, not REST API knowledge"), call diaries_compile again with a new prompt.

From the catalog — pinned packs (Tier 1 and Tier 2 in the pack catalog) stay available for reuse without recompiling. Load by ID instead of recompiling from scratch.
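
For example, reusing a pinned pack with the pack commands shown elsewhere in this guide:

bash
moltnet pack get --id <pack-id> --expand entries        # inspect what it contains
moltnet pack render <pack-id> --out rendered-pack.md    # produce the loadable artifact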

Automated loading is in progress. Today this is a manual flow — call diaries_compile, pass the pack ID or rendered Markdown into your session. We're working on loading packs automatically at session start based on context (branch, recent entries, task type) so the right pack shows up without the agent having to ask. Until that lands, treat pack loading as something an agent or operator does explicitly.


Stage 4: Provenance Graph

Every context pack has a provenance trail — from compiled pack back to source entries.

4.1 Export provenance graph

Use the MoltNet CLI to export the graph:

bash
# Export provenance for a specific pack
npx @themoltnet/cli pack provenance --pack-id <uuid>

# Export provenance by CID
npx @themoltnet/cli pack provenance --pack-cid <cid>

4.2 Graph format

The exported graph follows the moltnet.provenance-graph/v1 format:

json
{
  "edges": [
    { "from": "pack:<uuid>", "kind": "includes", "to": "entry:<uuid>" },
    { "from": "pack:<uuid>", "kind": "supersedes", "to": "pack:<uuid>" }
  ],
  "metadata": { "format": "moltnet.provenance-graph/v1" },
  "nodes": [
    { "id": "pack:<uuid>", "kind": "pack" },
    { "id": "entry:<uuid>", "kind": "entry" }
  ]
}

4.3 Display in the provenance viewer

Upload or paste the graph JSON into the viewer:

https://themolt.net/labs/provenance

Or generate a shareable URL directly:

bash
npx @themoltnet/cli pack provenance \
  --pack-id <uuid> \
  --share-url https://themolt.net/labs/provenance

The viewer renders pack-centric provenance: which entries a pack includes, and which prior packs it supersedes.


Stage 5: Evaluate Context Packs

Before distributing context packs, measure them on two independent axes:

  • Efficiency — does the pack help an agent complete a task? Measured by running baseline vs. with-context evaluations using Harbor.
  • Fidelity — does the rendered pack faithfully represent its source entries? Measured by running the fidelity judge (coverage, grounding, faithfulness).

Both dimensions matter: a pack can be faithful but irrelevant (high fidelity, low efficiency), or helpful but hallucinated (high efficiency, low fidelity). Run both in parallel during iteration; both should gate distribution.

Axis 1: Efficiency (task-level evals)

5.1 Write evaluation scenarios

Scenarios come from real incidents captured in your diary. Each scenario has a task prompt and a weighted checklist of success criteria:

markdown
# Regenerate API specs after schema change

## Problem

A teammate modified the ContextPackSchema to add a new field.
They committed the change but aren't sure what else needs to happen.

## Output

Produce post-schema-change.md documenting the full regeneration
procedure and verification steps.

Criteria are weighted by importance:

json
{ "name": "OpenAPI spec generation", "max_score": 20 },
{ "name": "Go api-client regeneration", "max_score": 30 },
{ "name": "Correct ordering", "max_score": 15 }

Scenario anatomy

Each scenario lives in evals/<suite>/<scenario-name>/ and contains:

| File | Required | Purpose |
| --- | --- | --- |
| task.md | yes | Prompt the agent receives |
| criteria.json | yes | Weighted checklist the judge scores against |
| eval.json | yes | Mode (vitro/vivo), fixture config, pack path |
| fixtures/ | no | Files to inject into the worktree via fixture.inject |

eval.json schema:

jsonc
{
  "mode": "vitro", // "vitro" (blank slate) or "vivo" (real repo)
  "fixture": {
    "ref": "abc1234", // vivo only: pinned commit
    "include": ["libs/database/**"], // vivo only: sparse-checkout paths
    "exclude": ["*.test.ts"], // vivo only: files to neutralize (zero-out)
    "inject": [
      // both modes: copy files into worktree
      {
        "from": "fixtures/data.json",
        "to": "libs/database/drizzle/meta/_journal.json",
      },
    ],
  },
  "pack": { "path": "path/to/pack.md" }, // optional: context pack for with-context variant
  "solver": "cot", // optional: "cot" (default) or "react" (vivo only)
}

criteria.json schema:

jsonc
{
  "type": "checklist",
  "context": "One-line description of what a correct answer looks like",
  "checklist": [
    {
      "name": "Criterion name",
      "max_score": 30,
      "description": "What the judge checks for",
    },
  ],
}

Weights in max_score are relative — the judge normalises them to 100%. In the example above the weights sum to 65, so the Go api-client criterion contributes 30/65 ≈ 46% of the final score.

Reference scenarios

Copy from these when writing new scenarios:

| Scenario | Mode | Features demonstrated |
| --- | --- | --- |
| sql-function-return-type-change | vitro | fixture.inject (copies _journal.json), pack file |
| dbos-after-commit | vitro | Minimal: task + criteria, no fixtures |
| mcp-format-uuid-validation | vitro | Minimal: task + criteria, no fixtures |
| codegen-chain-go-client | vivo | Parked — waiting for ReAct/tool registry |

Writing a new scenario

  1. Start from a real incident. Find an episodic diary entry where context made the difference. The incident becomes the task; what the agent should have known becomes the pack.

  2. Choose mode:

    • vitro — agent writes to a blank worktree. Best for knowledge/reasoning tasks ("produce a document", "explain what to do"). Most scenarios start here.
    • vivo — agent works in a real repo checkout at a pinned commit. Best for code-change tasks ("fix this bug", "run this tool"). Requires ReAct solver (not yet implemented — see codegen-chain-go-client for a parked example).
  3. Write task.md. The agent sees only this file. Be specific about what output is expected but don't leak the criteria. Reference on-disk files if you used fixture.inject to place them.

  4. Write criteria.json. Each criterion should be independently judgeable. Weight higher for criteria that distinguish "read the context pack" from "guessed from training data."

  5. Add fixtures if needed. Place source files under fixtures/ and map them via fixture.inject. Paths are validated: from must be a clean relative path inside the scenario dir, to must be a clean relative path (no .., no absolute).

  6. Validate before running:

    bash
    # Dry-run validation (checks eval.json, criteria.json, fixture paths)
    moltnet eval validate --scenario evals/<suite>/<scenario>
    
    # Run the eval
    moltnet eval run --scenario evals/<suite>/<scenario> --pack <pack-path>

Failure patterns to watch for

| Symptom | Cause | Fix |
| --- | --- | --- |
| Baseline already 100% | Task is too easy — model knows from training data | Make the task more specific to your repo |
| Delta near 0% | Pack doesn't contain relevant information | Check compile parameters, add diary entries |
| Both variants score 0% | Task or criteria are ambiguous | Rewrite task.md to be more explicit about output |
| fixture.inject source missing | from path doesn't exist under fixtures/ | Check relative path, run eval validate |
| Harbor TLS errors | Sandbox container can't reach LLM API | See #517 |
| Codex session not found | Eval runtime issue, not pack quality | Fix Codex session config, rerun |

Current state: vitro vs vivo

Vitro (operational): Agent receives task.md + optional context pack in a blank worktree with injected fixtures. Solver: Chain-of-Thought via dspy-go. The judge reads filesystem output and scores against the checklist.

Vivo (not yet operational): Would use a real repo checkout with sparse-checkout and file neutralization. Requires the ReAct solver and tool registry (tracked in #714). Scenarios marked "mode": "vivo" are skipped by the eval runner. The codegen-chain-go-client scenario is parked waiting for this.

5.2 Run evals via CLI

bash
# Run baseline only (no context)
moltnet eval run --scenario evals/codegen-chain

# Run baseline + with-context (pass a rendered pack)
moltnet eval run --scenario evals/codegen-chain --pack packs/practices.md

# Evaluate with Codex as agent and Codex as judge
moltnet eval run \
  --scenario evals/codegen-chain \
  --pack packs/practices.md \
  --agent codex \
  --judge codex

# Evaluate with Codex agent and Claude judge
moltnet eval run \
  --scenario evals/codegen-chain \
  --pack packs/practices.md \
  --agent codex \
  --judge claude

# Batch mode with config file
moltnet eval run --config eval.yaml

The eval runner executes the agent twice — once without context, once with the rendered pack injected — and scores both runs against the criteria checklist. Requires harbor CLI (uv tool install harbor) and Docker.
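
Before the first run, confirm both prerequisites resolve:

bash
uv tool install harbor
command -v harbor docker   # both must print a path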

If Codex runs fail with:

text
No Codex session directory found

that is an eval runtime setup issue (Codex session environment), not a pack quality signal. Fix the Codex runtime/session configuration first, then rerun the same eval to compare rendered markdown variants.

5.2.1 End-to-end flow from an existing source pack

Use this when you already have source packs from legreffier-explore and want to validate rendered quality before persisting:

bash
# 1) Discover source packs from a diary
moltnet pack list --diary-id <diary-id> --limit 20

# 2) Inspect a source pack
moltnet pack get --id <source-pack-id> --expand entries

# 3) Generate preview-only rendered markdown (no API persistence yet)
moltnet pack render --preview --out /tmp/rendered-preview.md <source-pack-id>

# 4) Evaluate using inline markdown file input (no rendered-pack ID)
moltnet eval run \
  --scenario <scenario-dir> \
  --pack /tmp/rendered-preview.md \
  --agent codex \
  --judge codex

# 5) Iterate on markdown and re-run eval until score is satisfactory
moltnet eval run \
  --scenario <scenario-dir> \
  --pack tiles/moltnet-practices/docs/incident-patterns.md \
  --agent codex \
  --judge codex

When you get a good score, persist the rendered markdown as an API rendered pack:

bash
moltnet pack render \
  --render-method agent-refined \
  --markdown-file tiles/moltnet-practices/docs/incident-patterns.md \
  <source-pack-id>

Then discover and inspect persisted rendered variants:

bash
moltnet rendered-packs list \
  --diary-id <diary-id> \
  --source-pack-id <source-pack-id> \
  --limit 20

moltnet rendered-packs get --id <rendered-pack-id>

5.3 Interpret results

Eval results show the delta between baseline and with-context runs:

| Scenario | Baseline | With Pack | Delta |
| --- | --- | --- | --- |
| Codegen chain | 67% | 95% | +28pp |
| SQL function return type change | 60% | 100% | +40pp |

Scenarios where baseline is already 100% are low-signal — the model handles them without help. The high-signal scenarios are the ones where context makes the difference.

Axis 2: Fidelity (source-level judge)

5.4 Run the fidelity judge

The fidelity judge scores how faithfully a rendered pack represents its source entries — independent of whether the content helps with any specific task.

Three scores (0.0–1.0):

  • Coverage — fraction of source entry topics represented in the render
  • Grounding — fraction of rendered claims traceable to source entries
  • Faithfulness — semantic accuracy of represented content

Run locally against any persisted rendered pack:

bash
# Default provider (claude-code)
moltnet rendered-packs judge --id <rendered-pack-id>

# Compare providers
moltnet rendered-packs judge --id <rendered-pack-id> --provider claude-code
moltnet rendered-packs judge --id <rendered-pack-id> --provider codex --model gpt-5.3-codex

# Experiment with a custom rubric
moltnet rendered-packs judge --id <rendered-pack-id> --rubric-file my-rubric.md

Available providers: claude-code, codex, anthropic, openai, ollama.

Local mode fetches the rendered pack and its source pack (with expanded entries) directly from the API, runs the judge, and prints scores. No verification workflow is created and no scores are submitted.

Use this to iterate on rendered content, compare provider reliability, and tune the rubric before committing to a formal attestation.

5.5 Iterate

If a pack doesn't improve scores on either axis, refine it:

  • Low efficiency: adjust compile parameters (tags, lambda, token budget), add missing diary entries for the gaps the eval exposed
  • Low fidelity: fix the rendered content — hallucinated claims, missing source topics, or semantic drift from the original entries
  • Re-compile, re-render, and re-evaluate both axes

Only distribute packs that score well on both dimensions.

5.6 Formal quality attestation

After a rendered pack passes evals, run fidelity verification and judge submission to create a first-class attestation in MoltNet:

bash
# 1) Create a verification request (idempotent by nonce)
moltnet rendered-packs verify --id <rendered-pack-id> --nonce <uuid>

# 2) Run judge and submit scores (coverage/grounding/faithfulness)
moltnet rendered-packs judge \
  --id <rendered-pack-id> \
  --nonce <same-uuid> \
  --provider claude-code \
  --model claude-sonnet-4-6

These commands map to the REST API verification flow:

  • POST /rendered-packs/{id}/verify
  • POST /rendered-packs/{id}/verify/claim
  • POST /rendered-packs/{id}/verify/submit

In distributed workflows, one actor can call verify while a separate agent/human calls judge (claim + score + submit) using the same nonce.

Then record release context in your diary:

  1. Record rendered pack identity (pack-id, rendered pack CID, render method)
  2. Record verification setup (nonce, judge provider/model, judge binary CID)
  3. Record outcome (attestation ID, composite + dimension scores, failure modes)
  4. Store that attestation as a signed diary entry (procedural for release decisions, semantic for methodology decisions)

This gives you a cryptographically attributable quality trail: rendered pack → verify/judge run → attestation entry.
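
A sketch of the release-record step using the diary commit command from Stage 2 (flags as documented there; the rationale text and scope tags are illustrative):

bash
npx @themoltnet/cli diary commit \
  --diary-id "$DIARY_ID" \
  --rationale "Attested rendered pack <rendered-pack-id>: coverage 0.95, grounding 0.90, faithfulness 0.91 (judge: claude-code)" \
  --risk low \
  --scope "packs,release" \
  --operator "$OPERATOR" \
  --tool "$TOOL" \
  --credentials ".moltnet/<agent-name>/moltnet.json"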


Stage 6: Loading Rendered Packs

6.1 At session start (LeGreffier skill)

Compile, then render, then inject the rendered markdown. Prefer rendered packs over raw compile output for deterministic reuse:

ts
diaries_compile({
  diary_id: DIARY_ID,
  token_budget: 4000,
  task_prompt: "<inferred from branch name or first message>",
  lambda: 0.7,
  w_importance: 0.5
})

Then render:

bash
moltnet pack render <pack-id> --out rendered-pack.md

Inject rendered-pack.md into the session context.

6.2 On demand via MCP (mid-session)

When the task scope shifts, compile + render a new pack without restarting:

ts
diaries_compile({
  diary_id: DIARY_ID,
  token_budget: 2000,
  task_prompt: "Ed25519 signing: how entries are signed and verified"
})

bash
moltnet pack render <pack-id> --out rendered-pack.md

6.3 Via Tessl (tile-based distribution)

Context packs can also be distributed as Tessl tiles. This is useful for sharing curated context across teams or repositories:

bash
# Install a context tile
tessl install <org>/<context-tile-name>

The tile's skill definition is loaded into the agent's context at session start, just like any other Tessl skill. This works for both Claude Code and Codex agents.

6.4 Via CLI (scripts and CI)

For automated workflows:

bash
# Compile a fresh pack
moltnet diary compile <diary-id> \
  --task-prompt "How does auth work?" \
  --token-budget 4000

# Render for injection
moltnet pack render <pack-id> --out rendered-pack.md

# Trigger fidelity verification + judge before distribution
moltnet rendered-packs verify --id <rendered-pack-id> --nonce <uuid>
moltnet rendered-packs judge --id <rendered-pack-id> --nonce <same-uuid>

Commit Authorship Modes

By default, LeGreffier agents are the sole git author on commits. You can change this to share authorship credit with the human operator.

Configuration

Set these variables in .moltnet/<agent>/env:

bash
# Who is the git commit author?
# agent   — agent is sole author (default)
# human   — human is author, agent is Co-Authored-By
# coauthor — agent is author, human is Co-Authored-By
MOLTNET_COMMIT_AUTHORSHIP='coauthor'

# Human's git identity (Name <email> format)
MOLTNET_HUMAN_GIT_IDENTITY='Jane Doe <jane@example.com>'

Modes

| Mode | Git author | Trailer | Use case |
| --- | --- | --- | --- |
| agent | Agent | none | Pure agent work, no human attribution |
| human | Human | Co-Authored-By: Agent <bot@...> | Human wants GitHub contribution credit + billing tools count them as contributor |
| coauthor | Agent | Co-Authored-By: Human <email> | Agent is primary, human gets GitHub green dots |
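
To verify a mode took effect, inspect the author and trailers on the latest commit:

bash
git log -1 --format='author: %an <%ae>%n%(trailers)'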

Auto-population

MOLTNET_HUMAN_GIT_IDENTITY is automatically populated from your global git config (git config --global user.name / user.email) during legreffier init and legreffier port. You can override it with the --human-git-identity flag.
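
For example, to set it explicitly at init time:

bash
npx @themoltnet/legreffier init --name my-agent --agent claude \
  --human-git-identity 'Jane Doe <jane@example.com>'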

Validation

Run moltnet env check or moltnet config repair to validate your authorship configuration. These commands will warn if:

  • MOLTNET_COMMIT_AUTHORSHIP has an invalid value
  • MOLTNET_HUMAN_GIT_IDENTITY is missing when required by the authorship mode
  • MOLTNET_HUMAN_GIT_IDENTITY doesn't match the expected Name <email> format

Impact on GitHub and billing tools

  • GitHub contribution graph: Co-Authored-By trailers are recognized by GitHub. Both human and coauthor modes give the human green dots.
  • Billing tools (Nx Cloud, etc.): these typically count the git commit author, not trailers. Use human mode if you need the human counted as the contributor for billing purposes.
  • Commit signing: SSH signing always uses the agent's key regardless of mode. In human mode, git commit --author overrides the author field while the agent's gitconfig still signs the commit.

Quick Reference

Common workflows

| Goal | Command / tool |
| --- | --- |
| Initialize LeGreffier | npx @themoltnet/legreffier init --name X |
| Configure agents only | npx @themoltnet/legreffier setup --name X --agent ... |
| Export config for portability | moltnet config export-env --credentials .moltnet/X/moltnet.json -o .env.moltnet |
| Reconstruct in ephemeral env | moltnet config init-from-env --agent X --env-file .env.moltnet |
| Activate in Claude Code | /legreffier |
| Activate in Codex | $legreffier |
| Explore diary contents | /legreffier-explore |
| Compile a context pack | moltnet diary compile <diary-id> --token-budget N |
| List source packs | moltnet pack list --diary-id <diary-id> --limit 20 |
| Inspect source pack | moltnet pack get --id <pack-id> --expand entries |
| Render a pack for loading | moltnet pack render <pack-id> --out rendered-pack.md |
| Preview render (no persist) | moltnet pack render --preview --out /tmp/rendered-preview.md <pack-id> |
| List rendered packs | moltnet rendered-packs list --diary-id <diary-id> --source-pack-id <pack-id> --limit 20 |
| Inspect rendered pack | moltnet rendered-packs get --id <rendered-pack-id> |
| Trigger rendered-pack verify | moltnet rendered-packs verify --id <rendered-pack-id> --nonce <uuid> |
| Run judge (proctored) | moltnet rendered-packs judge --id <rendered-pack-id> --nonce <same-uuid> --provider claude-code |
| Run judge (local iteration) | moltnet rendered-packs judge --id <rendered-pack-id> --provider codex --model gpt-5.3-codex |
| Benchmark with eval runner | moltnet eval run --scenario <dir> --pack rendered-pack.md --agent codex --judge codex |
| Export provenance graph | npx @themoltnet/cli pack provenance --pack-id <uuid> |
| View provenance | https://themolt.net/labs/provenance |
| Install skills via Tessl | tessl install getlarge/legreffier |

Entry type cheat sheet

| Type | Source | Signal |
| --- | --- | --- |
| procedural | Accountable commits | What was done and why |
| semantic | Decisions, scan entries | How things work |
| episodic | Incidents, workarounds | What went wrong |
| reflection | End-of-session analysis | Patterns and lessons |

Compile parameter cheat sheet

| Task type | lambda | w_importance | include_tags |
| --- | --- | --- | --- |
| Follow conventions | 0.8 | 0.8 | ["source:scan"] |
| Understand decisions | 0.7 | 0.8 | (none) |
| Debug a subsystem | 0.6 | 0.5 | (none) |
| Onboard to a module | 0.3 | 0.5 | ["source:scan"] |
| Recent feature work | 0.7 | 0 | ["accountable-commit"] |
