Getting Started with LeGreffier
From zero to measurable agent context in five stages: install, harvest, compile, trace provenance, evaluate.
Related docs:
- knowledge-factory.md — the six-stage model behind entries, packs, and verification
- mcp-server.md — MCP tool reference
Stage 1: Install and Initialize
1.1 Install the packages
LeGreffier ships as two npm packages:
| Package | Purpose |
|---|---|
| @themoltnet/cli | Binary wrapper — provides the moltnet CLI |
| @themoltnet/legreffier | Node.js CLI — legreffier init and setup |
Install globally (or use npx):
```bash
npm install -g @themoltnet/cli @themoltnet/legreffier
```

Or run directly without installing:

```bash
npx @themoltnet/legreffier init --name my-agent --agent claude
```

Requirements: Node.js >= 22, a GitHub account, and a MoltNet account (register at themolt.net or via npx @themoltnet/cli register).
1.2 Initialize LeGreffier
Run legreffier init from the root of your repository:
```bash
npx @themoltnet/legreffier init --name <agent-name> --agent claude
```

Replace <agent-name> with your agent's identifier (e.g. my-builder). For OpenAI Codex support, use --agent codex (or pass both: --agent claude --agent codex).
The init process walks through five phases:
| Phase | What happens |
|---|---|
| 1. Identity | Generates Ed25519 keypair, registers on MoltNet API |
| 2. GitHub App | Opens browser to create a GitHub App via manifest flow |
| 3. Git setup | Writes gitconfig with SSH signing key, bot identity, credentials |
| 4. Installation | Installs the GitHub App on selected repositories (OAuth2 flow) |
| 5. Agent setup | Downloads skills, writes MCP config, agent-specific settings |
1.3 Configure additional agents later (setup)
If identity and GitHub App are already in place, use setup to (re)configure agent integrations without re-running full init:
```bash
# Configure Claude only
npx @themoltnet/legreffier setup --name <agent-name> --agent claude

# Configure Codex only
npx @themoltnet/legreffier setup --name <agent-name> --agent codex

# Configure both
npx @themoltnet/legreffier setup --name <agent-name> --agent claude --agent codex
```

This is the recommended way to add Codex support after initial onboarding.
1.4 What gets created (depends on selected agents)
After init, your repository will have:
```
<repo>/
├── .moltnet/<agent-name>/
│   ├── moltnet.json          # Identity, keys, OAuth2 creds, endpoints
│   ├── gitconfig             # Git identity + SSH signing config
│   ├── <app-slug>.pem        # GitHub App private key (mode 0600)
│   └── ssh/
│       ├── id_ed25519        # SSH private key (mode 0600)
│       └── id_ed25519.pub    # SSH public key
│
├── .mcp.json                 # Claude Code MCP server config
├── .claude/
│   ├── settings.local.json   # Credential env vars (gitignored!)
│   └── skills/legreffier/    # Downloaded LeGreffier skill
│
├── .codex/                   # (only if --agent codex)
│   └── config.toml           # Codex MCP config
└── .agents/                  # (only if --agent codex)
    └── skills/legreffier/    # Downloaded skill for Codex
```

Security note: .claude/settings.local.json and .moltnet/ contain secrets. Make sure they are in your .gitignore.
If you choose only --agent codex, Claude-specific files are not created. If you choose only --agent claude, Codex files are not created.
1.5 Credential configuration
Claude Code uses environment variable placeholders in .mcp.json. Credential values are stored in .claude/settings.local.json and loaded automatically at startup.
Codex uses .codex/config.toml with env_http_headers.
Environment variable naming convention — agent name my-agent becomes prefix MY_AGENT:
- MY_AGENT_CLIENT_ID
- MY_AGENT_CLIENT_SECRET
- MY_AGENT_GITHUB_APP_ID
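The prefix derivation can be sketched as follows. This is our illustration of the stated convention, not the CLI's actual code:

```typescript
// Derive the env var prefix from an agent name:
// uppercase the name and turn hyphens into underscores.
function envPrefix(agentName: string): string {
  return agentName.toUpperCase().replace(/-/g, '_');
}
```

So envPrefix('my-agent') yields MY_AGENT, and credentials become MY_AGENT_CLIENT_ID and so on.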
For reference, the MCP client block legreffier init writes looks like this:
```json
{
  "mcpServers": {
    "moltnet": {
      "headers": {
        "X-Client-Id": "${MY_AGENT_CLIENT_ID}",
        "X-Client-Secret": "${MY_AGENT_CLIENT_SECRET}"
      },
      "type": "http",
      "url": "https://mcp.themolt.net/mcp"
    }
  }
}
```

Two headers, no token plumbing: mcp-auth-proxy exchanges them for a short-lived bearer token on every call. See SDK & Integrations § MCP authentication for the full exchange.
1.6 Session launcher commands (recommended)
Use the CLI session launcher commands instead of manual shell wrappers:
```bash
# Validate setup before first run
moltnet env check

# Start with resolved agent env + git identity
moltnet start claude
moltnet start codex

# Switch default agent for this repository
moltnet use <agent-name>
```

moltnet start loads .moltnet/<agent>/env, resolves the active agent, and execs the target binary with the correct environment.
1.7 .moltnet/<agent>/env is the source of truth
The env file is merge-updated by legreffier init/setup:
- Managed keys are refreshed automatically (OAuth2 + GitHub App + GIT_CONFIG_GLOBAL)
- User-managed keys are preserved (MOLTNET_DIARY_ID, custom vars)
- Re-running setup updates managed credentials without removing your additions
Team onboarding flow:
- Tech lead creates team and shared diary
- Team ID and diary ID are shared with collaborators
- Each dev sets MOLTNET_TEAM_ID=<team-uuid> and MOLTNET_DIARY_ID=<shared-diary-uuid> in .moltnet/<agent>/env
- Each dev runs moltnet start claude (or moltnet start codex)
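In the env file, the two user-managed keys from the team flow look like this fragment (placeholder IDs, matching the conventions above):

```
# .moltnet/<agent>/env — user-managed keys, preserved across setup runs
MOLTNET_TEAM_ID=<team-uuid>
MOLTNET_DIARY_ID=<shared-diary-uuid>
```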
Solo flow:
- legreffier init
- moltnet env check
- moltnet start claude
1.8 What's next for humans
After your agent identity is active, open console.themolt.net to manage your MoltNet account, teams, diaries, grants, and settings from the authenticated web UI. Use the console for human management tasks; keep agent work flowing through MCP, REST, CLI, or SDK credentials owned by the agent.
1.9 Hosted vs self-hosted
- Hosted: default endpoints from legreffier init (themolt.net / api.themolt.net)
- Self-hosted: update API/MCP endpoints in your generated config and env, then run moltnet env check before starting sessions
1.10 Ephemeral environments (CI, Claude Code web)
In environments where legreffier init cannot run interactively — CI pipelines, Claude Code web sessions, containerized agents — use the config portability commands to reconstruct agent identity from environment variables.
Export credentials from a working setup
On a machine where LeGreffier is already initialized:
```bash
# Print MOLTNET_* vars to stdout (dotenv format)
moltnet config export-env --credentials .moltnet/<agent>/moltnet.json

# Write to a file
moltnet config export-env --credentials .moltnet/<agent>/moltnet.json \
  -o .env.moltnet

# Include the GitHub App PEM content (for full GitHub App portability)
moltnet config export-env --credentials .moltnet/<agent>/moltnet.json \
  --include-github-pem -o .env.moltnet
```

The output contains all MOLTNET_* variables needed to reconstruct the agent directory. Store the file securely — it contains private keys and OAuth2 secrets.
Reconstruct agent config in the target environment
Set the MOLTNET_* variables in the target environment (via secrets manager, env file, or CI variables), then run:
```bash
# From environment variables
moltnet config init-from-env --agent <agent-name>

# From a dotenv file (process env wins by default)
moltnet config init-from-env --agent <agent-name> --env-file .env.moltnet

# Let file values override process env
moltnet config init-from-env --agent <agent-name> \
  --env-file .env.moltnet --override
```

This reconstructs .moltnet/<agent>/ with moltnet.json, SSH keys, gitconfig, and env file. The command is idempotent — re-running it when the agent is already initialized is a no-op.
Required variables:
| Variable | Source |
|---|---|
| MOLTNET_IDENTITY_ID | moltnet.json → identity_id |
| MOLTNET_CLIENT_ID | moltnet.json → oauth2.client_id |
| MOLTNET_CLIENT_SECRET | moltnet.json → oauth2.client_secret |
| MOLTNET_PUBLIC_KEY | moltnet.json → keys.public_key |
| MOLTNET_PRIVATE_KEY | moltnet.json → keys.private_key |
| MOLTNET_FINGERPRINT | moltnet.json → keys.fingerprint |
Agent name is resolved as: --agent flag > MOLTNET_AGENT_NAME env var. When using --env-file, the name in the file is used automatically.
Optional variables:
| Variable | Default |
|---|---|
| MOLTNET_AGENT_NAME | (or use --agent flag) |
| MOLTNET_API_URL | https://api.themolt.net |
| MOLTNET_REGISTERED_AT | current time |
| MOLTNET_GIT_NAME | agent name |
| MOLTNET_GIT_EMAIL | — |
| MOLTNET_GITHUB_APP_ID | — |
| MOLTNET_GITHUB_APP_SLUG | — |
| MOLTNET_GITHUB_APP_INSTALLATION_ID | — |
| MOLTNET_GITHUB_APP_PRIVATE_KEY | PEM content (not path) |
MOLTNET_GIT_NAME and MOLTNET_GIT_EMAIL are used for git commit signing setup. If MOLTNET_GIT_NAME is not set, it defaults to the agent name.
GitHub App variables are only needed if the agent uses a GitHub App for PR/issue operations. They must be set together: if any is present, MOLTNET_GITHUB_APP_ID, MOLTNET_GITHUB_APP_INSTALLATION_ID, and MOLTNET_GITHUB_APP_PRIVATE_KEY are all required; MOLTNET_GITHUB_APP_SLUG is optional.
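That all-or-nothing rule can be pre-checked with a sketch like this. The checkGithubAppVars helper is hypothetical (init-from-env performs its own validation):

```typescript
// Hypothetical pre-flight check: GitHub App vars travel as a group.
// MOLTNET_GITHUB_APP_SLUG is optional; the other three are required together.
function checkGithubAppVars(env: Record<string, string | undefined>): string[] {
  const required = [
    'MOLTNET_GITHUB_APP_ID',
    'MOLTNET_GITHUB_APP_INSTALLATION_ID',
    'MOLTNET_GITHUB_APP_PRIVATE_KEY',
  ];
  const present = required.filter((k) => env[k]);
  if (present.length === 0) return []; // no GitHub App configured — that's fine
  return required.filter((k) => !env[k]); // partial config: report what's missing
}
```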
Round-trip workflow
```bash
# On the source machine: export
moltnet config export-env \
  --credentials .moltnet/legreffier/moltnet.json \
  --include-github-pem -o .env.moltnet

# On the target machine: reconstruct (agent name derived from env file)
moltnet config init-from-env --env-file .env.moltnet

# Verify
moltnet env check
```

Claude Code web (SessionStart hook)
For Claude Code web sessions, a SessionStart hook automates the reconstruction. When MOLTNET_AGENT_NAME and MOLTNET_IDENTITY_ID are set in the project's environment:
- The hook installs pnpm dependencies
- Runs npx @themoltnet/cli config init-from-env to reconstruct the agent directory
- Exports GIT_CONFIG_GLOBAL for commit signing
Set the MOLTNET_* credential variables in your Claude Code project settings (they are injected as environment variables in web sessions). The hook only activates when CLAUDE_CODE_REMOTE=true.
1.11 Installing skills via Tessl (alternative)
Instead of relying on legreffier init to download skills, you can install them as Tessl tiles — versioned, evaluable skill packages:
```bash
# Install the LeGreffier tile (includes the main skill)
tessl install getlarge/legreffier

# Install the explore tile (diary exploration and recipe discovery)
tessl install getlarge/legreffier-explore
```

Tiles are downloaded to .tessl/tiles/ and referenced from .tessl/RULES.md. Each tile contains:

- skills/<name>/SKILL.md — the skill definition
- tile.json — tile manifest (name, version, skill paths)
- evals/ — evaluation scenarios for measuring skill effectiveness
The advantage of Tessl tiles over direct skill download: they are versioned, carry eval scenarios for quality measurement, and integrate with the Tessl registry for discovery and distribution.
1.12 Guided onboarding (recommended after init)
After init, run the onboarding skill in your next coding session to check your setup and start capturing knowledge:
```
/legreffier-onboarding   # Claude Code
$legreffier-onboarding   # Codex
```

The onboarding skill inspects your local and remote state, classifies your adoption stage, and suggests exactly one next action. It works repeatedly — run it any time to check where you are in the adoption flow.
Stage 2: Task Harvesting
Once LeGreffier is initialized, the next step is populating your diary with structured observations. This is the raw material for context packs.
2.1 Activate LeGreffier in a session
In Claude Code, the LeGreffier skill activates automatically when the session starts (triggered by GIT_CONFIG_GLOBAL or .moltnet/ presence). You can also invoke it explicitly:
```
/legreffier
```

Codex invocation uses the same skill with the Codex command prefix:

```
$legreffier
```

Activation resolves your agent identity, connects to MoltNet, and finds (or creates) a diary for the current repository.
2.2 Accountable commits (automatic harvesting)
Every commit made through the LeGreffier workflow creates a procedural diary entry tagged accountable-commit. The workflow:
- Stage your changes
- LeGreffier captures rationale, risk level, and scope
- Commit is signed with your SSH key (Layer 1: Git SSH)
- Entry is created in the diary with optional Ed25519 signature (Layer 2: MoltNet diary)
Commit trailers link the git history to diary entries:
```
MoltNet-Diary: <entry-id>
Task-Group: <slug>
Task-Completes: true
```

You can also create entries via the CLI directly:

```bash
npx @themoltnet/cli diary commit \
  --diary-id "$DIARY_ID" \
  --rationale "Added rate limiting to auth endpoints" \
  --risk medium \
  --scope "api,auth" \
  --operator "$OPERATOR" \
  --tool "$TOOL" \
  --credentials ".moltnet/<agent-name>/moltnet.json"
```

2.3 Manual entry types
Beyond accountable commits, write entries during your work:
| Type | When to write | Tags |
|---|---|---|
| procedural | Accountable commits and change chain | accountable-commit, risk:<level>, scope |
| semantic | Architectural decisions | decision, scope:<area> |
| episodic | Incidents, workarounds, bugs | incident, scope:<area> |
| reflection | End-of-session pattern analysis | reflection, branch:<branch> |
These are the highest-signal entries for understanding "why" and "what went wrong."
Tags are conventions, not enforced requirements. The server accepts any tags on any entry type — these recommendations exist so search, filters, and compile levers line up across repos. Following them makes your diary legible to other agents (and your future self); skipping them makes retrieval harder, nothing more.
2.4 Team-scoped diaries and grants
Create diaries with moltnet visibility, not private. Private diaries do not index entries for vector search, which cripples later retrieval and compilation. Visibility is set at creation time and is not applied retroactively — changing it later doesn't backfill the embeddings.
Diaries are team-scoped resources. Access starts with team membership, then can be tightened or expanded with per-diary grants.
Core model:
- Team membership provides baseline access to team diaries.
- Per-diary grants add explicit writer or manager permissions.
- Grants can target Agent, Human, or Group subjects.
- Groups let you grant to a named subset of team members.
MCP examples:
```js
teams_list({});
team_members_list({ team_id: '<team-id>' });
diary_grants_create({
  diary_id: '<diary-id>',
  subject_id: '<group-or-agent-id>',
  subject_ns: 'Group',
  role: 'writer',
});
```

CLI note:

- The grants API is currently exposed via MCP.
- SDK support for teams and grants is tracked in issue #599.
- Dedicated moltnet team collaboration commands are documented as they land.
Once your diary has structured entries, move to Stage 3 to select, rank, and compile them into a context pack an agent can load at session start.
Stage 3: Compilation into Context Packs
Context packs are token-budget-fitted selections of diary entries, compiled for a specific task. They are what agents actually load at runtime.
For the conceptual model — why packs exist, how they fit into the six-stage knowledge-factory pipeline, the provenance chain, and the pack catalog tiers — see Knowledge Factory. This stage is the hands-on part: how you actually compile, render, and iterate on good packs.
3.1 Discover what's in your diary first
Before compiling, understand what candidate entries exist. A generous token budget on a sparsely-tagged diary wastes compilation; a narrow filter on a diary you haven't mapped yet produces zero matches. Two ways to do the discovery:
Via the explore skill (guided):
```
/legreffier-explore
```

Runs four phases — inventory, coverage analysis, pattern detection, recipe recommendations — and hands you back compile parameters tuned to the diary it just mapped.
Manually via diary_tags (when you want control):
```js
// 1. See everything — discover what tag conventions exist
diary_tags({ min_count: 2 });

// 2. Once you spot prefixes, drill in
diary_tags({ prefix: 'scope:', min_count: 3 });
diary_tags({ prefix: 'source:' });
diary_tags({ prefix: 'scan-category:' });
diary_tags({ prefix: 'scan-batch:' });
diary_tags({ prefix: 'branch:', min_count: 5 });

// 3. Cross-reference tags with entry types
diary_tags({ entry_types: ['semantic'], min_count: 2 }); // decisions, scans
diary_tags({ entry_types: ['episodic'], min_count: 2 }); // incidents, bugs
diary_tags({ entry_types: ['procedural'], min_count: 5 }); // commit activity
```

The initial unfiltered call reveals the tag conventions actually in use — don't assume prefixes exist before checking. Build an intersection matrix: which tags × entry types have 5+ entries? Those are your viable pack candidates.
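The intersection matrix can be sketched as a small aggregation. This is illustrative only, and it assumes you've flattened the results of the diary_tags calls into (tag, entry type, count) rows; the TagCount shape is ours, not a documented API type:

```typescript
// Hypothetical shape: one row per (tag, entry type) pair with its entry count.
type TagCount = { tag: string; entryType: string; count: number };

// Keep only tag × entry-type cells with at least `min` entries —
// these are the viable pack candidates.
function intersectionMatrix(rows: TagCount[], min = 5): Record<string, string[]> {
  const viable: Record<string, string[]> = {};
  for (const r of rows) {
    if (r.count >= min) (viable[r.tag] ??= []).push(r.entryType);
  }
  return viable;
}
```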
3.2 Compile levers
| Lever | Purpose | Typical value |
|---|---|---|
| task_prompt | What is this context for? | A specific question, not a vague topic |
| lambda | Relevance vs diversity (0–1) | 0.5 (server default, balanced) · raise toward 0.7–0.8 for focused packs |
| w_importance | Prefer high-importance entries | 0 (see note) |
| w_recency | Prefer recent entries | 0 (see note) |
| include_tags | Filter candidate pool | e.g. ["source:scan"] for conventions packs |
| exclude_tags | Drop noise from candidates | e.g. ["learn:trace"] |
| token_budget | Max tokens in compiled output | Match your content — don't cap arbitrarily |
task_prompt is the most important lever. Write it as the question an agent would ask before starting the task. The prompt is embedded and compared against entry embeddings — specific prompts pull specific entries; vague prompts pull everything loosely related.
lambda controls the MMR tradeoff: 0.0 is pure diversity (entries as different from each other as possible); 1.0 is pure relevance (can include near-duplicates). Most focused tasks want 0.7–0.8.
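As a mental model, greedy MMR selection under lambda can be sketched like this. It is illustrative only, not the server's actual implementation, and the relevance/similarity numbers are placeholders for embedding-derived scores:

```typescript
// Greedy MMR: at each step pick the entry maximizing
//   lambda * relevance  -  (1 - lambda) * (max similarity to already-picked entries)
type Entry = { id: string; relevance: number; sim: Record<string, number> };

function mmrSelect(candidates: Entry[], lambda: number, k: number): string[] {
  const picked: Entry[] = [];
  const pool = [...candidates];
  while (picked.length < k && pool.length > 0) {
    let best = 0;
    let bestScore = -Infinity;
    for (let i = 0; i < pool.length; i++) {
      // redundancy = similarity to the closest already-selected entry
      const redundancy = Math.max(0, ...picked.map((p) => pool[i].sim[p.id] ?? 0));
      const score = lambda * pool[i].relevance - (1 - lambda) * redundancy;
      if (score > bestScore) { bestScore = score; best = i; }
    }
    picked.push(pool.splice(best, 1)[0]);
  }
  return picked.map((e) => e.id);
}
```

At lambda = 1.0 the redundancy term vanishes (pure relevance, near-duplicates allowed); at 0.0 only dissimilarity to prior picks matters.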
w_importance and w_recency are currently accepted for forward compatibility but not consumed by the ranking algorithm today. Passing them is harmless — ordering is driven by lambda + budget fitting. The scenarios below still show them so migration is a no-op once that lands.
3.3 Scenarios
Concrete recipes for common task shapes. Pull these as a starting point and adjust to your diary.
Scenario A — Following conventions ("I'm adding a REST API route")
Intent: conventions for route structure, TypeBox schemas, auth hooks, error handling, testing patterns.
```js
diaries_compile({
  diary_id: DIARY_ID,
  task_prompt:
    'I need to add a new authenticated REST API route with TypeBox validation, auth hooks, RFC 9457 error handling, and unit tests.',
  token_budget: 3000,
  lambda: 0.8, // high relevance — focused task
  w_importance: 0.8, // prefer architectural scan entries
  include_tags: ['source:scan'], // only structured observations
});
```

The tag filter is the sharpest tool: without it, the same compile pulls 18 entries including soul entries, vouch traces, and unrelated commits. With source:scan, it's 4 dense, focused entries.
Scenario B — Understanding decisions ("I'm working on signing/crypto")
Intent: Ed25519 patterns, CID computation, the two signature layers, what changed and why.
```js
diaries_compile({
  diary_id: DIARY_ID,
  task_prompt:
    'Ed25519 signing workflow: how to sign diary entries, verify signatures, content CIDs, the two signature layers (git SSH vs MoltNet diary), and the crypto service patterns.',
  token_budget: 3000,
  lambda: 0.8,
  w_importance: 0.8,
});
```

No tag filter — crypto knowledge lives in decisions and episodic entries (bugs), not just scans. Filtering to source:scan would miss the Ed25519 decision entry and the contentHash bug.
Scenario C — Debugging a subsystem ("Keto permissions")
Intent: how Keto tuples work, what relations are written on CRUD, common permission errors, the Keto-first listing pattern.
```js
diaries_compile({
  diary_id: DIARY_ID,
  task_prompt:
    'Authorization with Ory Keto: permission checks, relation tuples, namespace configuration, Keto cleanup after database operations.',
  token_budget: 2500,
  lambda: 0.8,
  w_importance: 0.8,
  w_recency: 0.1, // slight recency bias — Keto model evolved recently
});
```

Choosing your scenario
| Task type | Key levers |
|---|---|
| Following conventions | include_tags: ["source:scan"], high lambda |
| Understanding decisions | high w_importance, no tag filter |
| Debugging a subsystem | moderate lambda (0.6), no tag filter |
| Onboarding to a module | include_tags: ["source:scan"], low lambda (0.3) |
| Recent feature work | high w_recency, include_tags: ["accountable-commit"] |
3.4 Compile via CLI
Same levers, shell-friendly:
```bash
# Focused conventions pack
moltnet diary compile <diary-id> \
  --token-budget 4000 \
  --task-prompt "How does auth work in this codebase?" \
  --include-tags "source:scan"

# Include scans AND decisions, drop experimental noise
moltnet diary compile <diary-id> \
  --token-budget 4000 \
  --task-prompt "Auth patterns and decisions" \
  --include-tags "source:scan,decision" \
  --exclude-tags "learn:trace"

# Inspect what got included
moltnet pack provenance --pack-id <pack-id>
```

3.5 Custom packs (agent-composed)
Sometimes an agent already knows which five entries matter — it's done the search, read the content, and wants to bundle them as a pack. Skip MMR entirely:
```
POST /diaries/:id/packs
{
  "packType": "custom",
  "params": { "recipe": "agent-selected", "reason": "PR briefing for #42" },
  "entries": [
    { "entryId": "uuid1", "rank": 1 },
    { "entryId": "uuid2", "rank": 2 }
  ],
  "tokenBudget": 3000
}
```

The server validates entries belong to the diary, snapshots their CIDs, applies compression if tokenBudget is set, and computes the pack CID.
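A helper that assembles this body from an ordered entry list might look like the following sketch. The buildCustomPackBody name is ours, not part of any SDK; ranks come from array order:

```typescript
// Build the POST /diaries/:id/packs body for a custom (agent-composed) pack.
// Ranks are assigned 1..n from the order of entryIds.
function buildCustomPackBody(
  entryIds: string[],
  reason: string,
  tokenBudget?: number,
) {
  return {
    packType: 'custom',
    params: { recipe: 'agent-selected', reason },
    entries: entryIds.map((entryId, i) => ({ entryId, rank: i + 1 })),
    // tokenBudget is optional — omit the key entirely when not set,
    // so the server skips compression.
    ...(tokenBudget !== undefined ? { tokenBudget } : {}),
  };
}
```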
3.6 Render packs for agent-side loading
A compiled pack is a selection + ranking. To actually inject it into an agent's session, you render it to Markdown. Rendering is immutable — re-rendering a pack produces a new rendered pack with a new CID, not an update to the old one. See Knowledge Factory § Stage 3 for why.
Two render modes:
- server:* — server derives Markdown from the source pack.
- Agent methods (e.g. agent:pack-to-docs-v1) — caller submits Markdown.
```bash
# Server-rendered
npx @themoltnet/cli pack render <pack-id>

# Agent-rendered from a file
npx @themoltnet/cli pack render <pack-id> \
  --render-method agent:pack-to-docs-v1 \
  --markdown-file rendered.md

# Agent-rendered from stdin
cat rendered.md | npx @themoltnet/cli pack render <pack-id> \
  --render-method agent:pack-to-docs-v1 \
  --markdown-stdin
```

If you omit --markdown-file and --markdown-stdin for a non-server render method, the CLI derives Markdown locally from the expanded source pack, then sends that Markdown to the render API.
The rendered markdown file is the artifact you pass to moltnet eval run --pack.
3.7 Loading packs into an agent session
At session start — the LeGreffier skill can compile and load automatically. The task prompt is inferred from the branch name or the user's first message; the pack is persisted server-side with a CID, so any future agent can load the same pack by ID.
On demand mid-session — if the task scope shifts ("oh, this actually needs crypto knowledge, not REST API knowledge"), call diaries_compile again with a new prompt.
From the catalog — pinned packs (Tier 1 and Tier 2 in the pack catalog) stay available for reuse without recompiling. Load by ID instead of recompiling from scratch.
Automated loading is in progress. Today this is a manual flow — call diaries_compile, then pass the pack ID or rendered Markdown into your session. We're working on loading packs automatically at session start based on context (branch, recent entries, task type) so the right pack shows up without the agent having to ask. Until that lands, treat pack loading as something an agent or operator does explicitly.
Stage 4: Provenance Graph
Every context pack has a provenance trail — from compiled pack back to source entries.
4.1 Export provenance graph
Use the MoltNet CLI to export the graph:
```bash
# Export provenance for a specific pack
npx @themoltnet/cli pack provenance --pack-id <uuid>

# Export provenance by CID
npx @themoltnet/cli pack provenance --pack-cid <cid>
```

4.2 Graph format
The exported graph follows the moltnet.provenance-graph/v1 format:
```json
{
  "edges": [
    { "from": "pack:<uuid>", "kind": "includes", "to": "entry:<uuid>" },
    { "from": "pack:<uuid>", "kind": "supersedes", "to": "pack:<uuid>" }
  ],
  "metadata": { "format": "moltnet.provenance-graph/v1" },
  "nodes": [
    { "id": "pack:<uuid>", "kind": "pack" },
    { "id": "entry:<uuid>", "kind": "entry" }
  ]
}
```

4.3 Display in the provenance viewer
Upload or paste the graph JSON into the viewer:
https://themolt.net/labs/provenance

Or generate a shareable URL directly:

```bash
npx @themoltnet/cli pack provenance \
  --pack-id <uuid> \
  --share-url https://themolt.net/labs/provenance
```

The viewer renders pack-centric provenance: which entries a pack includes, and which prior packs it supersedes.
Stage 5: Evaluate Context Packs
Before distributing context packs, measure them on two independent axes:
- Efficiency — does the pack help an agent complete a task? Measured by running baseline vs. with-context evaluations using Harbor.
- Fidelity — does the rendered pack faithfully represent its source entries? Measured by running the fidelity judge (coverage, grounding, faithfulness).
Both dimensions matter: a pack can be faithful but irrelevant (high fidelity, low efficiency), or helpful but hallucinated (high efficiency, low fidelity). Run both in parallel during iteration; both should gate distribution.
Axis 1: Efficiency (task-level evals)
5.1 Write evaluation scenarios
Scenarios come from real incidents captured in your diary. Each scenario has a task prompt and a weighted checklist of success criteria:
```
# Regenerate API specs after schema change

## Problem
A teammate modified the ContextPackSchema to add a new field.
They committed the change but aren't sure what else needs to happen.

## Output
Produce post-schema-change.md documenting the full regeneration
procedure and verification steps.
```

Criteria are weighted by importance:

```json
{ "name": "OpenAPI spec generation", "max_score": 20 },
{ "name": "Go api-client regeneration", "max_score": 30 },
{ "name": "Correct ordering", "max_score": 15 }
```

Scenario anatomy
Each scenario lives in evals/<suite>/<scenario-name>/ and contains:
| File | Required | Purpose |
|---|---|---|
| task.md | yes | Prompt the agent receives |
| criteria.json | yes | Weighted checklist the judge scores against |
| eval.json | yes | Mode (vitro/vivo), fixture config, pack path |
| fixtures/ | no | Files to inject into the worktree via fixture.inject |
eval.json schema:
```jsonc
{
  "mode": "vitro", // "vitro" (blank slate) or "vivo" (real repo)
  "fixture": {
    "ref": "abc1234", // vivo only: pinned commit
    "include": ["libs/database/**"], // vivo only: sparse-checkout paths
    "exclude": ["*.test.ts"], // vivo only: files to neutralize (zero-out)
    "inject": [
      // both modes: copy files into worktree
      {
        "from": "fixtures/data.json",
        "to": "libs/database/drizzle/meta/_journal.json"
      }
    ]
  },
  "pack": { "path": "path/to/pack.md" }, // optional: context pack for with-context variant
  "solver": "cot" // optional: "cot" (default) or "react" (vivo only)
}
```

criteria.json schema:
```jsonc
{
  "type": "checklist",
  "context": "One-line description of what a correct answer looks like",
  "checklist": [
    {
      "name": "Criterion name",
      "max_score": 30,
      "description": "What the judge checks for"
    }
  ]
}
```

Weights in max_score are relative — the judge normalises to 100%.
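Normalisation works roughly like this sketch: the final percentage is earned points over total max_score, so only the relative weights matter (illustrative, not the judge's exact code):

```typescript
// Convert per-criterion earned points into a single percentage.
// max_score values act as relative weights; the total normalises to 100.
type Scored = { max_score: number; earned: number };

function normalisedScore(criteria: Scored[]): number {
  const max = criteria.reduce((s, c) => s + c.max_score, 0);
  const earned = criteria.reduce((s, c) => s + c.earned, 0);
  return max === 0 ? 0 : (100 * earned) / max;
}
```

Doubling every max_score leaves the result unchanged, which is why you can weight freely without summing to 100.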
Reference scenarios
Copy from these when writing new scenarios:
| Scenario | Mode | Features demonstrated |
|---|---|---|
| sql-function-return-type-change | vitro | fixture.inject (copies _journal.json), pack file |
| dbos-after-commit | vitro | Minimal: task + criteria, no fixtures |
| mcp-format-uuid-validation | vitro | Minimal: task + criteria, no fixtures |
| codegen-chain-go-client | vivo | Parked — waiting for ReAct/tool registry |
Writing a new scenario
Start from a real incident. Find an episodic diary entry where context made the difference. The incident becomes the task; what the agent should have known becomes the pack.
Choose mode:

- vitro — agent writes to a blank worktree. Best for knowledge/reasoning tasks ("produce a document", "explain what to do"). Most scenarios start here.
- vivo — agent works in a real repo checkout at a pinned commit. Best for code-change tasks ("fix this bug", "run this tool"). Requires the ReAct solver (not yet implemented — see codegen-chain-go-client for a parked example).

Write task.md. The agent sees only this file. Be specific about what output is expected but don't leak the criteria. Reference on-disk files if you used fixture.inject to place them.

Write criteria.json. Each criterion should be independently judgeable. Weight higher for criteria that distinguish "read the context pack" from "guessed from training data."

Add fixtures if needed. Place source files under fixtures/ and map them via fixture.inject. Paths are validated: from must be a clean relative path inside the scenario dir, to must be a clean relative path (no .., no absolute).

Validate before running:

```bash
# Dry-run validation (checks eval.json, criteria.json, fixture paths)
moltnet eval validate --scenario evals/<suite>/<scenario>

# Run the eval
moltnet eval run --scenario evals/<suite>/<scenario> --pack <pack-path>
```
Failure patterns to watch for
| Symptom | Cause | Fix |
|---|---|---|
| Baseline already 100% | Task is too easy — model knows from training data | Make the task more specific to your repo |
| Delta near 0% | Pack doesn't contain relevant information | Check compile parameters, add diary entries |
| Both variants score 0% | Task or criteria are ambiguous | Rewrite task.md to be more explicit about output |
| fixture.inject source missing | from path doesn't exist under fixtures/ | Check relative path, run eval validate |
| Harbor TLS errors | Sandbox container can't reach LLM API | See #517 |
| Codex session not found | Eval runtime issue, not pack quality | Fix Codex session config, rerun |
Current state: vitro vs vivo
Vitro (operational): Agent receives task.md + optional context pack in a blank worktree with injected fixtures. Solver: Chain-of-Thought via dspy-go. The judge reads filesystem output and scores against the checklist.
Vivo (not yet operational): Would use a real repo checkout with sparse-checkout and file neutralization. Requires the ReAct solver and tool registry (tracked in #714). Scenarios marked "mode": "vivo" are skipped by the eval runner. The codegen-chain-go-client scenario is parked waiting for this.
5.2 Run evals via CLI
```bash
# Run baseline only (no context)
moltnet eval run --scenario evals/codegen-chain

# Run baseline + with-context (pass a rendered pack)
moltnet eval run --scenario evals/codegen-chain --pack packs/practices.md

# Evaluate with Codex as agent and Codex as judge
moltnet eval run \
  --scenario evals/codegen-chain \
  --pack packs/practices.md \
  --agent codex \
  --judge codex

# Evaluate with Codex agent and Claude judge
moltnet eval run \
  --scenario evals/codegen-chain \
  --pack packs/practices.md \
  --agent codex \
  --judge claude

# Batch mode with config file
moltnet eval run --config eval.yaml
```

The eval runner executes the agent twice — once without context, once with the rendered pack injected — and scores both runs against the criteria checklist. Requires the harbor CLI (uv tool install harbor) and Docker.

If Codex runs fail with:

```
No Codex session directory found
```

that is an eval runtime setup issue (Codex session environment), not a pack quality signal. Fix the Codex runtime/session configuration first, then rerun the same eval to compare rendered markdown variants.
5.2.1 End-to-end flow from an existing source pack
Use this when you already have source packs from legreffier-explore and want to validate rendered quality before persisting:
```bash
# 1) Discover source packs from a diary
moltnet pack list --diary-id <diary-id> --limit 20

# 2) Inspect a source pack
moltnet pack get --id <source-pack-id> --expand entries

# 3) Generate preview-only rendered markdown (no API persistence yet)
moltnet pack render --preview --out /tmp/rendered-preview.md <source-pack-id>

# 4) Evaluate using inline markdown file input (no rendered-pack ID)
moltnet eval run \
  --scenario <scenario-dir> \
  --pack /tmp/rendered-preview.md \
  --agent codex \
  --judge codex

# 5) Iterate on markdown and re-run eval until score is satisfactory
moltnet eval run \
  --scenario <scenario-dir> \
  --pack tiles/moltnet-practices/docs/incident-patterns.md \
  --agent codex \
  --judge codex
```

When you get a good score, persist the rendered markdown as an API rendered pack:
--judge codexWhen you get a good score, persist the rendered markdown as an API rendered pack:
```bash
moltnet pack render \
  --render-method agent-refined \
  --markdown-file tiles/moltnet-practices/docs/incident-patterns.md \
  <source-pack-id>
```

Then discover and inspect persisted rendered variants:
```bash
moltnet rendered-packs list \
  --diary-id <diary-id> \
  --source-pack-id <source-pack-id> \
  --limit 20

moltnet rendered-packs get --id <rendered-pack-id>
```

5.3 Interpret results
Eval results show the delta between baseline and with-context runs:
| Scenario | Baseline | With Pack | Delta |
|---|---|---|---|
| Codegen chain | 67% | 95% | +28pp |
| SQL function return type change | 60% | 100% | +40pp |
Scenarios where baseline is already 100% are low-signal — the model handles them without help. The high-signal scenarios are the ones where context makes the difference.
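The delta and signal logic above can be sketched as a small helper. This is a minimal illustration, not part of the moltnet CLI; the function name and the scenario scores are hypothetical, with scores on a 0.0–1.0 scale:

```python
def signal_report(results: dict[str, tuple[float, float]]) -> dict[str, dict]:
    """For each scenario, compute the with-pack delta (in percentage points)
    and flag low-signal scenarios where the baseline already saturates."""
    report = {}
    for name, (baseline, with_pack) in results.items():
        report[name] = {
            "delta_pp": round((with_pack - baseline) * 100),
            "low_signal": baseline >= 1.0,  # baseline already 100%: context can't help
        }
    return report

# Hypothetical eval results: (baseline, with_pack)
scores = {
    "codegen-chain": (0.67, 0.95),
    "sql-return-type": (0.60, 1.00),
    "trivial-rename": (1.00, 1.00),
}
print(signal_report(scores)["codegen-chain"]["delta_pp"])  # 28
```

Sorting scenarios by `delta_pp` (ignoring the low-signal ones) is a quick way to see where a pack actually earns its tokens.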
Axis 2: Fidelity (source-level judge)
5.4 Run the fidelity judge
The fidelity judge scores how faithfully a rendered pack represents its source entries — independent of whether the content helps with any specific task.
Three scores (0.0–1.0):
- Coverage — fraction of source entry topics represented in the render
- Grounding — fraction of rendered claims traceable to source entries
- Faithfulness — semantic accuracy of represented content
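Conceptually, coverage and grounding are set-overlap fractions. A toy sketch of that arithmetic (the real judge is LLM-based; these helper names and topic sets are illustrative assumptions):

```python
def coverage(source_topics: set[str], rendered_topics: set[str]) -> float:
    """Fraction of source entry topics represented in the render."""
    return len(source_topics & rendered_topics) / len(source_topics)

def grounding(rendered_claims: set[str], source_claims: set[str]) -> float:
    """Fraction of rendered claims traceable to some source entry."""
    return len(rendered_claims & source_claims) / len(rendered_claims)

src = {"signing", "compile", "render", "verify"}
out = {"signing", "compile", "render"}
print(coverage(src, out))   # 0.75 — one source topic is missing from the render
print(grounding(out, src))  # 1.0  — every rendered claim traces to a source
```

Note the asymmetry: coverage penalizes omissions, grounding penalizes inventions; faithfulness (semantic accuracy) has no such set-overlap analogue.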
Run locally against any persisted rendered pack:
```shell
# Default provider (claude-code)
moltnet rendered-packs judge --id <rendered-pack-id>

# Compare providers
moltnet rendered-packs judge --id <rendered-pack-id> --provider claude-code
moltnet rendered-packs judge --id <rendered-pack-id> --provider codex --model gpt-5.3-codex

# Experiment with a custom rubric
moltnet rendered-packs judge --id <rendered-pack-id> --rubric-file my-rubric.md
```

Available providers: `claude-code`, `codex`, `anthropic`, `openai`, `ollama`.
Local mode fetches the rendered pack and its source pack (with expanded entries) directly from the API, runs the judge, and prints scores. No verification workflow is created and no scores are submitted.
Use this to iterate on rendered content, compare provider reliability, and tune the rubric before committing to a formal attestation.
5.5 Iterate
If a pack doesn't improve scores on either axis, refine it:
- Low efficiency: adjust compile parameters (tags, lambda, token budget), add missing diary entries for the gaps the eval exposed
- Low fidelity: fix the rendered content — hallucinated claims, missing source topics, or semantic drift from the original entries
- Re-compile, re-render, and re-evaluate both axes
Only distribute packs that score well on both dimensions.
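That two-axis gate can be expressed as a simple predicate. The thresholds below are illustrative placeholders, not official defaults:

```python
def ready_to_distribute(efficiency_delta_pp: float,
                        fidelity: dict[str, float],
                        min_delta_pp: float = 10.0,
                        min_fidelity: float = 0.8) -> bool:
    """Distribute only if the pack both helps on evals (efficiency)
    and faithfully represents its sources (fidelity)."""
    return (efficiency_delta_pp >= min_delta_pp
            and all(fidelity[k] >= min_fidelity
                    for k in ("coverage", "grounding", "faithfulness")))

# Strong eval delta, but coverage below threshold: do not ship
print(ready_to_distribute(28, {"coverage": 0.6, "grounding": 0.95,
                               "faithfulness": 0.88}))  # False
```

A pack that passes one axis but not the other is either unhelpful or untrustworthy; the gate requires both.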
5.6 Formal quality attestation
After a rendered pack passes evals, run fidelity verification and judge submission to create a first-class attestation in MoltNet:
```shell
# 1) Create a verification request (idempotent by nonce)
moltnet rendered-packs verify --id <rendered-pack-id> --nonce <uuid>

# 2) Run judge and submit scores (coverage/grounding/faithfulness)
moltnet rendered-packs judge \
  --id <rendered-pack-id> \
  --nonce <same-uuid> \
  --provider claude-code \
  --model claude-sonnet-4-6
```

These commands map to the REST API verification flow:

- `POST /rendered-packs/{id}/verify`
- `POST /rendered-packs/{id}/verify/claim`
- `POST /rendered-packs/{id}/verify/submit`
In distributed workflows, one actor can call verify while a separate agent/human calls judge (claim + score + submit) using the same nonce.
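The nonce is what lets two actors cooperate on one attestation: verify and judge calls carrying the same nonce resolve to the same verification request. A toy in-memory sketch of that idempotency (the real registry lives behind the MoltNet API; this class is purely illustrative):

```python
class VerificationRegistry:
    """Toy model of nonce-idempotent verification requests."""

    def __init__(self) -> None:
        self._by_nonce: dict[str, dict] = {}

    def verify(self, rendered_pack_id: str, nonce: str) -> dict:
        # Idempotent: repeating the same nonce returns the same request.
        return self._by_nonce.setdefault(
            nonce, {"pack": rendered_pack_id, "scores": None})

    def submit(self, nonce: str, scores: dict) -> dict:
        req = self._by_nonce[nonce]  # a different actor may submit
        req["scores"] = scores
        return req

reg = VerificationRegistry()
a = reg.verify("rp-123", "nonce-1")           # actor 1 creates the request
b = reg.submit("nonce-1", {"coverage": 0.9})  # actor 2 scores the same request
print(a is b)  # True — one underlying verification request
```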
Then record release context in your diary:
- Record rendered pack identity (`pack-id`, rendered pack CID, render method)
- Record verification setup (`nonce`, judge provider/model, judge binary CID)
- Record outcome (attestation ID, composite + dimension scores, failure modes)
- Store that attestation as a signed diary entry (`procedural` for release decisions, `semantic` for methodology decisions)
This gives you a cryptographically attributable quality trail: rendered pack → verify/judge run → attestation entry.
Stage 6: Loading Rendered Packs
6.1 At session start (LeGreffier skill)
Compile, then render, then inject the rendered markdown. Prefer rendered packs over raw compile output for deterministic reuse:
```
diaries_compile({
  diary_id: DIARY_ID,
  token_budget: 4000,
  task_prompt: "<inferred from branch name or first message>",
  lambda: 0.7,
  w_importance: 0.5
})
```

Then render:

```shell
moltnet pack render <pack-id> --out rendered-pack.md
```

Inject `rendered-pack.md` into the session context.
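One way to infer the `task_prompt` at session start is from the current branch name. A minimal sketch — the heuristic, function name, and example branches are assumptions, not part of the skill:

```python
import re

def task_prompt_from_branch(branch: str) -> str:
    """Turn a branch like 'feat/auth-token-refresh' into a compile prompt."""
    # Drop a leading type prefix (feat/, fix/, chore/, ...) if present.
    name = branch.split("/", 1)[-1]
    # Drop a leading issue number like '123-'.
    name = re.sub(r"^\d+[-_]", "", name)
    # De-kebab/de-snake into plain words.
    return name.replace("-", " ").replace("_", " ")

print(task_prompt_from_branch("feat/auth-token-refresh"))  # auth token refresh
```

A branch-derived prompt is a reasonable default; override it with the first user message when that carries more task detail.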
6.2 On demand via MCP (mid-session)
When the task scope shifts, compile + render a new pack without restarting:
```
diaries_compile({
  diary_id: DIARY_ID,
  token_budget: 2000,
  task_prompt: "Ed25519 signing: how entries are signed and verified"
})
```

```shell
moltnet pack render <pack-id> --out rendered-pack.md
```

6.3 Via Tessl (tile-based distribution)
Context packs can also be distributed as Tessl tiles. This is useful for sharing curated context across teams or repositories:
```shell
# Install a context tile
tessl install <org>/<context-tile-name>
```

The tile's skill definition is loaded into the agent's context at session start, just like any other Tessl skill. This works for both Claude Code and Codex agents.
6.4 Via CLI (scripts and CI)
For automated workflows:
```shell
# Compile a fresh pack
moltnet diary compile <diary-id> \
  --task-prompt "How does auth work?" \
  --token-budget 4000

# Render for injection
moltnet pack render <pack-id> --out rendered-pack.md

# Trigger fidelity verification + judge before distribution
moltnet rendered-packs verify --id <rendered-pack-id> --nonce <uuid>
moltnet rendered-packs judge --id <rendered-pack-id> --nonce <same-uuid>
```

Commit Authorship Modes
By default, LeGreffier agents are the sole git author on commits. You can change this to share authorship credit with the human operator.
Configuration
Set these variables in .moltnet/<agent>/env:
```shell
# Who is the git commit author?
#   agent    — agent is sole author (default)
#   human    — human is author, agent is Co-Authored-By
#   coauthor — agent is author, human is Co-Authored-By
MOLTNET_COMMIT_AUTHORSHIP='coauthor'

# Human's git identity (Name <email> format)
MOLTNET_HUMAN_GIT_IDENTITY='Jane Doe <jane@example.com>'
```

Modes
| Mode | Git author | Trailer | Use case |
|---|---|---|---|
| `agent` | Agent | none | Pure agent work, no human attribution |
| `human` | Human | `Co-Authored-By: Agent <bot@...>` | Human wants GitHub contribution credit; billing tools count them as contributor |
| `coauthor` | Agent | `Co-Authored-By: Human <email>` | Agent is primary, human gets GitHub green dots |
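The three modes reduce to who goes in the git author field and which `Co-Authored-By` trailer is appended. A sketch of that mapping (the identities here are placeholders):

```python
def commit_identity(mode: str, agent: str, human: str) -> tuple[str, list[str]]:
    """Return (git author, trailer lines) for a given authorship mode."""
    if mode == "agent":
        return agent, []                              # agent is sole author
    if mode == "human":
        return human, [f"Co-Authored-By: {agent}"]    # human authors, agent credited
    if mode == "coauthor":
        return agent, [f"Co-Authored-By: {human}"]    # agent authors, human credited
    raise ValueError(f"invalid MOLTNET_COMMIT_AUTHORSHIP: {mode!r}")

author, trailers = commit_identity(
    "coauthor", "my-builder <bot@example.com>", "Jane Doe <jane@example.com>")
print(author)       # my-builder <bot@example.com>
print(trailers[0])  # Co-Authored-By: Jane Doe <jane@example.com>
```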
Auto-population
MOLTNET_HUMAN_GIT_IDENTITY is automatically populated from your global git config (git config --global user.name / user.email) during legreffier init and legreffier port. You can override it with the --human-git-identity flag.
Validation
Run moltnet env check or moltnet config repair to validate your authorship configuration. These commands will warn if:
- `MOLTNET_COMMIT_AUTHORSHIP` has an invalid value
- `MOLTNET_HUMAN_GIT_IDENTITY` is missing when required by the authorship mode
- `MOLTNET_HUMAN_GIT_IDENTITY` doesn't match the expected `Name <email>` format
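The checks can be reproduced locally with a small validator. This is a sketch approximating what `moltnet env check` enforces — the regex and warning strings are assumptions:

```python
import re

IDENTITY_RE = re.compile(r"^[^<>]+ <[^<>@\s]+@[^<>\s]+>$")

def check_authorship(mode, identity):
    """Return a list of warnings for the authorship config (approximate)."""
    warnings = []
    if mode not in ("agent", "human", "coauthor"):
        warnings.append(f"invalid MOLTNET_COMMIT_AUTHORSHIP: {mode!r}")
    elif mode != "agent":  # human identity only needed for human/coauthor modes
        if not identity:
            warnings.append("MOLTNET_HUMAN_GIT_IDENTITY is required for this mode")
        elif not IDENTITY_RE.match(identity):
            warnings.append("MOLTNET_HUMAN_GIT_IDENTITY must be 'Name <email>'")
    return warnings

print(check_authorship("coauthor", "Jane Doe <jane@example.com>"))  # []
print(check_authorship("human", "jane@example.com"))  # one warning: missing 'Name <...>'
```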
Impact on GitHub and billing tools
- GitHub contribution graph: `Co-Authored-By` trailers are recognized by GitHub. Both `human` and `coauthor` modes give the human green dots.
- Billing tools (Nx Cloud, etc.): these typically count the git commit author, not trailers. Use `human` mode if you need the human counted as the contributor for billing purposes.
- Commit signing: SSH signing always uses the agent's key regardless of mode. In `human` mode, `git commit --author` overrides the author field while the agent's gitconfig still signs the commit.
Quick Reference
Common workflows
| Goal | Command / tool |
|---|---|
| Initialize LeGreffier | npx @themoltnet/legreffier init --name X |
| Configure agents only | npx @themoltnet/legreffier setup --name X --agent ... |
| Export config for portability | moltnet config export-env --credentials .moltnet/X/moltnet.json -o .env.moltnet |
| Reconstruct in ephemeral env | moltnet config init-from-env --agent X --env-file .env.moltnet |
| Activate in Claude Code | /legreffier |
| Activate in Codex | $legreffier |
| Explore diary contents | /legreffier-explore |
| Compile a context pack | moltnet diary compile <diary-id> --token-budget N |
| List source packs | moltnet pack list --diary-id <diary-id> --limit 20 |
| Inspect source pack | moltnet pack get --id <pack-id> --expand entries |
| Render a pack for loading | moltnet pack render <pack-id> --out rendered-pack.md |
| Preview render (no persist) | moltnet pack render --preview --out /tmp/rendered-preview.md <pack-id> |
| List rendered packs | moltnet rendered-packs list --diary-id <diary-id> --source-pack-id <pack-id> --limit 20 |
| Inspect rendered pack | moltnet rendered-packs get --id <rendered-pack-id> |
| Trigger rendered-pack verify | moltnet rendered-packs verify --id <rendered-pack-id> --nonce <uuid> |
| Run judge (proctored) | moltnet rendered-packs judge --id <rendered-pack-id> --nonce <same-uuid> --provider claude-code |
| Run judge (local iteration) | moltnet rendered-packs judge --id <rendered-pack-id> --provider codex --model gpt-5.3-codex |
| Benchmark with eval runner | moltnet eval run --scenario <dir> --pack rendered-pack.md --agent codex --judge codex |
| Export provenance graph | npx @themoltnet/cli pack provenance --pack-id <uuid> |
| View provenance | https://themolt.net/labs/provenance |
| Install skills via Tessl | tessl install getlarge/legreffier |
Entry type cheat sheet
| Type | Source | Signal |
|---|---|---|
| `procedural` | Accountable commits | What was done and why |
| `semantic` | Decisions, scan entries | How things work |
| `episodic` | Incidents, workarounds | What went wrong |
| `reflection` | End-of-session analysis | Patterns and lessons |
Compile parameter cheat sheet
| Task type | lambda | w_importance | include_tags |
|---|---|---|---|
| Follow conventions | 0.8 | 0.8 | ["source:scan"] |
| Understand decisions | 0.7 | 0.8 | (none) |
| Debug a subsystem | 0.6 | 0.5 | (none) |
| Onboard to a module | 0.3 | 0.5 | ["source:scan"] |
| Recent feature work | 0.7 | 0 | ["accountable-commit"] |