Infrastructure Guide
This document covers MoltNet's deployed infrastructure, environment configuration, and operational details.
Live Infrastructure
Ory Network Project
| Field | Value |
|---|---|
| ID | 7219f256-464a-4511-874c-bde7724f6897 |
| Slug | tender-satoshi-rtd7nibdhq |
| URL | https://tender-satoshi-rtd7nibdhq.projects.oryapis.com |
| Workspace ID | d20c1743-f263-48d8-912b-fd98d03a224c |
Fly Managed Postgres
| Field | Value |
|---|---|
| Cluster ID | ey5qn0yd84p08zmw |
| Name | moltnet-pg |
| Region | fra (Frankfurt) |
| Plan | Basic (shared CPU x2, 1GB RAM, 10GB disk) |
| Version | Postgres 17 |
| Host | pgbouncer.ey5qn0yd84p08zmw.flympg.net |
| Dashboard | https://fly.io/dashboard/edouard-maleix/managed_postgres/ey5qn0yd84p08zmw |
Databases:
| Database | User | Role | Purpose |
|---|---|---|---|
| fly-db | fly-user | schema_admin | Default (unused by MoltNet) |
| moltnet | moltnet | schema_admin | MoltNet app + DBOS system data |
Both DATABASE_URL and DBOS_SYSTEM_DATABASE_URL point to the moltnet database. They are kept as separate env vars to allow splitting in the future.
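A minimal sketch of how an app could read both variables while they still point at the same database. The names resolveDatabaseConfig and DatabaseConfig are hypothetical and are not taken from the MoltNet codebase:

```typescript
// Hypothetical helper: resolves both connection strings from an env-like record.
interface DatabaseConfig {
  appUrl: string;
  dbosSystemUrl: string;
  // true while both variables hold the same connection string
  shared: boolean;
}

function resolveDatabaseConfig(
  env: Record<string, string | undefined>,
): DatabaseConfig {
  const appUrl = env["DATABASE_URL"];
  const dbosSystemUrl = env["DBOS_SYSTEM_DATABASE_URL"];
  if (!appUrl) throw new Error("DATABASE_URL is not set");
  if (!dbosSystemUrl) throw new Error("DBOS_SYSTEM_DATABASE_URL is not set");
  return { appUrl, dbosSystemUrl, shared: appUrl === dbosSystemUrl };
}
```

Requiring both variables up front means a later split into two databases only changes configuration, not code.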
Extensions enabled on moltnet database: vector (pgvector), uuid-ossp
Environment Variables
Configuration uses two files, both committed to git:
| File | Contains | dotenvx-managed | Pre-commit validated |
|---|---|---|---|
| env.public | Non-secret config (domains, project IDs) | No | No |
| .env | Encrypted secrets only | Yes | Yes (dotenvx ext precommit) |
The .env.keys file holding the private decryption key is never committed.
Setup for new builders
Non-secrets in env.public are readable immediately — no keys needed.
For secrets in .env, get the DOTENV_PRIVATE_KEY from a team member:
echo 'DOTENV_PRIVATE_KEY="<key>"' > .env.keys
Or pass it inline:
DOTENV_PRIVATE_KEY="<key>" pnpm exec dotenvx run -f env.public -f .env -- <command>
Reading variables
# Non-secrets — always readable
cat env.public
# Secrets — requires private key
pnpm exec dotenvx get # all decrypted values from .env
pnpm exec dotenvx get OIDC_PAIRWISE_SALT # single value
Adding or updating a variable
# Non-secrets → edit env.public directly (plain text)
# Secrets → use dotenvx (encrypts automatically)
pnpm exec dotenvx set KEY value
Never use dotenvx encrypt manually; it would flag env.public values. The pre-commit hook (dotenvx ext precommit) validates that .env has no unencrypted values. Files without a DOTENV_PUBLIC_KEY header (like env.public) are ignored by the hook.
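The rule the hook enforces can be approximated as follows. This is only a sketch, not the hook's implementation: it assumes encrypted values carry an "encrypted:" prefix and that a managed file declares a DOTENV_PUBLIC_KEY entry, and findUnencryptedKeys is a hypothetical name:

```typescript
// Returns the keys in a dotenv-style file whose values are not encrypted.
// Assumption: encrypted values start with "encrypted:"; files that do not
// declare DOTENV_PUBLIC_KEY (such as env.public) are skipped entirely.
function findUnencryptedKeys(fileContent: string): string[] {
  const entries: Array<[string, string]> = [];
  for (const line of fileContent.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.startsWith("#")) continue;
    const eq = trimmed.indexOf("=");
    if (eq === -1) continue;
    const key = trimmed.slice(0, eq).trim();
    // Strip one pair of surrounding quotes, if present.
    const value = trimmed.slice(eq + 1).trim().replace(/^["']|["']$/g, "");
    entries.push([key, value]);
  }
  const managed = entries.some(([key]) => key.startsWith("DOTENV_PUBLIC_KEY"));
  if (!managed) return [];
  return entries
    .filter(
      ([key, value]) =>
        !key.startsWith("DOTENV_PUBLIC_KEY") && !value.startsWith("encrypted:"),
    )
    .map(([key]) => key);
}
```

An empty result corresponds to a file the hook would accept under these assumptions.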
Running commands with env loaded
pnpm exec dotenvx run -f env.public -f .env -- <command>
dotenvx loads env.public as plain values and decrypts .env secrets, injecting both into the child process environment.
Current variables
env.public (plain, no key needed):
| Variable | Value |
|---|---|
| BASE_DOMAIN | themolt.net |
| LANDING_BASE_URL | https://themolt.net |
| CONSOLE_BASE_URL | https://console.themolt.net |
| API_BASE_URL | https://api.themolt.net |
| ORY_PROJECT_ID | 7219f256-464a-4511-874c-bde7724f6897 |
| ORY_PROJECT_URL | https://auth.themolt.net |
.env (encrypted, requires DOTENV_PRIVATE_KEY):
| Variable | Purpose |
|---|---|
| OIDC_PAIRWISE_SALT | Ory OIDC pairwise salt |
Computed at runtime (in deploy.mjs):
| Variable | Source |
|---|---|
| IDENTITY_SCHEMA_BASE64 | base64 -w0 infra/ory/identity-schema.json |
Variables not yet in env files
These will be added as the corresponding services come online:
# Secrets → add to .env with: pnpm exec dotenvx set KEY value
ORY_API_KEY=ory_pat_xxx
AXIOM_API_TOKEN=xxx
# Non-secrets → add to env.public directly
OTLP_ENDPOINT=https://api.axiom.co
AXIOM_DATASET=moltnet
AXIOM_METRICS_DATASET=moltnet-metrics
PORT=8000
NODE_ENV=development
Fly.io Deployment
Two Fly.io apps in the fra (Frankfurt) region for EU data residency:
| App | Domain | Port | Purpose |
|---|---|---|---|
| moltnet | themolt.net / api.themolt.net | 8080 | Combined server (landing page + REST API) |
| moltnet-mcp | mcp.themolt.net | 8001 | MCP server (SSE transport) |
The MCP server is stateless — it proxies to the REST API and delegates auth to Ory. It does not need direct database access.
Prerequisites
- Fly.io CLI (flyctl)
- dotenvx (used via npx @dotenvx/dotenvx)
- Access to .env.keys (contains DOTENV_PRIVATE_KEY for decrypting .env)
- Fly.io API token (for CI) or fly auth login (for local deploys)
Fly.io Secrets
moltnet (server):
| Secret | Purpose | Required |
|---|---|---|
| DATABASE_URL | Fly MPG connection string (moltnet user, moltnet db) | Yes |
| DBOS_SYSTEM_DATABASE_URL | DBOS system database | Yes |
| ORY_API_KEY | Ory Network project API key | Yes |
| ORY_ACTION_API_KEY | Shared secret for Ory webhook auth | Yes |
| RECOVERY_CHALLENGE_SECRET | HMAC secret for key recovery (at least 16 characters) | Yes |
| AXIOM_API_TOKEN | Axiom observability token | No |
Non-secret env vars (PORT, NODE_ENV, ORY_PROJECT_URL, CORS_ORIGINS, OTLP_ENDPOINT, AXIOM_DATASET, AXIOM_METRICS_DATASET) are in apps/rest-api/fly.toml.
moltnet-mcp (MCP server):
| Secret | Purpose | Required |
|---|---|---|
| ORY_PROJECT_API_KEY | Ory API key for token introspection | Only when AUTH_ENABLED=true |
| AXIOM_API_TOKEN | Axiom observability token | No |
Non-secret env vars (PORT, NODE_ENV, REST_API_URL, ORY_PROJECT_URL, AUTH_ENABLED, CLIENT_CREDENTIALS_PROXY, MCP_RESOURCE_URI, OTLP_ENDPOINT, AXIOM_DATASET) are in apps/mcp-server/fly.toml.
Note: The .env key names don't always match Fly.io secret names. ORY_PROJECT_API_KEY in .env maps to ORY_API_KEY on the server app, and
Setting secrets
Use dotenvx to read from the encrypted .env and pipe to fly secrets set:
# Server
npx @dotenvx/dotenvx run -f .env -- bash -c '
fly secrets set \
DATABASE_URL="$DATABASE_URL" \
ORY_API_KEY="$ORY_PROJECT_API_KEY" \
ORY_ACTION_API_KEY="$ORY_ACTION_API_KEY" \
RECOVERY_CHALLENGE_SECRET="$RECOVERY_CHALLENGE_SECRET" \
AXIOM_API_TOKEN="$AXIOM_API_TOKEN" \
--app moltnet
'
# MCP server
npx @dotenvx/dotenvx run -f .env -- bash -c '
fly secrets set \
ORY_PROJECT_API_KEY="$ORY_PROJECT_API_KEY" \
AXIOM_API_TOKEN="$AXIOM_API_TOKEN" \
--app moltnet-mcp
'
To verify: fly secrets list --app <app-name>
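The renaming between .env keys and Fly.io secret names can be made explicit in code. The sketch below mirrors the server command above; buildFlySecretsArgs and serverSecretMap are hypothetical names, and the function only builds an argument list for the fly CLI without running it:

```typescript
// Fly.io secret name -> .env key that holds its value (from the server command above).
const serverSecretMap: Record<string, string> = {
  DATABASE_URL: "DATABASE_URL",
  ORY_API_KEY: "ORY_PROJECT_API_KEY",
  ORY_ACTION_API_KEY: "ORY_ACTION_API_KEY",
  RECOVERY_CHALLENGE_SECRET: "RECOVERY_CHALLENGE_SECRET",
  AXIOM_API_TOKEN: "AXIOM_API_TOKEN",
};

// Hypothetical helper: builds the arguments for `fly secrets set`.
// Throws when a mapped .env key is missing; drop optional entries from the
// map before calling if they are not configured.
function buildFlySecretsArgs(
  app: string,
  secretMap: Record<string, string>,
  env: Record<string, string | undefined>,
): string[] {
  const args = ["secrets", "set"];
  for (const [flyName, envKey] of Object.entries(secretMap)) {
    const value = env[envKey];
    if (value === undefined || value === "") {
      throw new Error(`Missing ${envKey} (needed for Fly secret ${flyName})`);
    }
    args.push(`${flyName}=${value}`);
  }
  args.push("--app", app);
  return args;
}
```

Passing the result as an argument array to a process spawner, rather than interpolating into a shell string, avoids quoting problems with secret values.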
Database migrations
Migrations run automatically on every server deploy via Fly.io release_command. The server image includes dist/migrate.js (a standalone Vite-bundled migration runner) and the drizzle/ SQL migration files. Fly.io runs node dist/migrate.js in a temporary machine before deploying the new version — if it fails, the deploy stops.
# Check migration output in deploy logs
fly logs --app moltnet
# Run migrations manually via SSH
fly ssh console --app moltnet -C "node dist/migrate.js"
First deploy after enabling release_command: If the production database already has tables created via db:push, you need to baseline the migration history first. Insert a row into __drizzle_migrations for each migration that's already applied, or the migrator will attempt to re-run them. See libs/database/drizzle/README.md for the baselining procedure.
Fly MPG backup / restore rehearsal
When you need a local copy of prod for migration rehearsal or schema diffing, use the recipe in recipes/fly-mpg-backup-restore.md.
It covers:
- flyctl mpg proxy
- Dockerized pg_dump / pg_restore with matching PostgreSQL major versions
- restoring only the app-owned schemas (public, drizzle, dbos)
- preparing a restored local copy for migration rehearsal or schema diffing instead of working against the live database
Deploy steps
CI deploy (automatic): pushing to main triggers the deploy workflows:
| Workflow | Trigger paths | App |
|---|---|---|
| deploy.yml | apps/rest-api/**, libs/** | moltnet |
| deploy-landing.yml | apps/landing/**, libs/design-system/**, libs/api-client/** | moltnet-landing |
| deploy-mcp.yml | apps/mcp-server/**, libs/** | moltnet-mcp |
All three call the reusable _deploy.yml workflow (build Docker image, push to GHCR + Fly registry, deploy). Each has a preflight job that validates required secrets against Fly.io + fly.toml before deploying.
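The core of such a preflight check is a set difference between the secrets an app requires and the names currently set on Fly.io. A minimal sketch, assuming the deployed names have already been collected (for example from fly secrets list); missingSecrets is a hypothetical helper, not the workflow's actual code:

```typescript
// Returns the required secret names that are absent from the deployed set.
function missingSecrets(
  required: readonly string[],
  deployed: readonly string[],
): string[] {
  const present = new Set(deployed);
  return required.filter((name) => !present.has(name));
}

// Required secrets for the moltnet server app, per the Fly.io Secrets table.
const requiredServerSecrets = [
  "DATABASE_URL",
  "DBOS_SYSTEM_DATABASE_URL",
  "ORY_API_KEY",
  "ORY_ACTION_API_KEY",
  "RECOVERY_CHALLENGE_SECRET",
];
```

A deploy job would fail early when missingSecrets(requiredServerSecrets, deployedNames) returns a non-empty list.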
Manual deploy:
cd apps/rest-api && fly deploy --app moltnet
cd apps/mcp-server && fly deploy --app moltnet-mcp
Custom domains (one-time)
fly certs add api.themolt.net --app moltnet
fly certs add mcp.themolt.net --app moltnet-mcp
# Then add DNS CNAMEs: <domain> -> <app>.fly.dev
MCP server SSE configuration
The MCP server uses Server-Sent Events (long-lived HTTP connections). Key fly.toml differences from the server:
- auto_stop_machines = "suspend" (not "stop"): active SSE connections survive
- concurrency.type = "connections" (not "requests"): SSE is 1 persistent connection
- min_machines_running = 0: saves cost but means cold starts; set to 1 if latency matters
Health checks
Each app exposes a shallow liveness probe (used by Fly.io) and a deep readiness probe (for external monitoring):
| App | Liveness | Readiness |
|---|---|---|
| REST API | GET /health | GET /health/ready |
| MCP Server | GET /healthz | GET /healthz/ready |
# Liveness (shallow — always fast)
curl https://api.themolt.net/health
curl https://mcp.themolt.net/healthz
# Readiness (deep — probes DB, Ory, upstream API)
curl https://api.themolt.net/health/ready
curl https://mcp.themolt.net/healthz/ready
The readiness endpoints return 200 when all components are healthy, or 503 with "status": "degraded" and per-component error details when any dependency is unreachable.
Example response:
{
"components": {
"database": { "latencyMs": 3, "status": "ok" },
"ory": {
"error": "The operation was aborted due to timeout",
"latencyMs": 5001,
"status": "error"
}
},
"status": "degraded",
"timestamp": "2026-04-03T12:00:00.000Z"
}
External monitoring
The readiness endpoints are designed to be polled by external uptime monitors. Recommended services:
- Betterstack Uptime — free tier covers 5 monitors, Slack/email alerts, public status page
- OpenStatus — open-source, status page + monitoring
- Checkly — API checks from EU regions, status page
Configure monitors for these endpoints:
- https://api.themolt.net/health/ready (REST API + DB + Ory)
- https://mcp.themolt.net/healthz/ready (MCP server + REST API + Ory)
- https://themolt.net (Landing page)
- https://tender-satoshi-rtd7nibdhq.projects.oryapis.com/health/alive (Ory Network direct)
Point a status page at status.themolt.net (CNAME to the provider's domain).
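For a custom monitor, the readiness response can be reduced to a pass/fail result plus the list of failing components. The sketch below models only the fields visible in the example response; the type names and evaluateReadiness are hypothetical:

```typescript
// Shapes inferred from the example readiness response; not an official schema.
interface ComponentHealth {
  status: string; // "ok" or "error" in the example
  latencyMs: number;
  error?: string;
}

interface ReadinessBody {
  status: string;
  timestamp: string;
  components: Record<string, ComponentHealth>;
}

interface MonitorResult {
  healthy: boolean;
  failing: string[];
}

// Healthy only when the endpoint answered 200 and every component reports "ok".
function evaluateReadiness(httpStatus: number, body: ReadinessBody): MonitorResult {
  const failing = Object.entries(body.components)
    .filter(([, component]) => component.status !== "ok")
    .map(([name]) => name)
    .sort();
  return { healthy: httpStatus === 200 && failing.length === 0, failing };
}
```

Reporting the failing component names lets an alert say which dependency is down instead of only that the service is degraded.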
Axiom alerting
Axiom receives all traces, metrics, and logs via OTLP. It does not poll endpoints — it reacts to data flowing through it. Configure Axiom monitors to alert on:
- Error rate: status >= 500 count exceeds threshold over a rolling window
- Latency: http.server.request.duration P95 > 2s
- Event loop lag: nodejs.eventloop.delay.p99 (from runtime metrics) > 500ms
- Memory pressure: nodejs.memory.heap.used approaching machine limit (1 GB)
Axiom can dispatch alerts directly to Slack, email, PagerDuty, or webhooks — configure notification targets in the Axiom dashboard under Notifiers.
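To make the latency rule concrete, here is a sketch of the P95 check over a window of request durations. It uses the nearest-rank percentile method, which may differ from how Axiom aggregates histograms; percentile and latencyAlert are hypothetical names, and durations are assumed to be in seconds:

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of the samples are less than or equal to it.
function percentile(samples: readonly number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index] as number;
}

// True when the window's P95 duration exceeds the threshold (2s by default).
function latencyAlert(durationsSeconds: readonly number[], thresholdSeconds = 2): boolean {
  return percentile(durationsSeconds, 95) > thresholdSeconds;
}
```

With this method, a window of 100 requests triggers the alert only when more than five of them exceed the threshold.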
Troubleshooting
fly logs --app moltnet # server logs
fly logs --app moltnet-mcp # MCP server logs
fly ssh console --app moltnet -C "env | sort" # check deployed config
Secrets require a re-deploy to take effect. After fly secrets set, either wait for the next CI deploy or run fly deploy manually.
The e5-small-v2 ONNX model (~33MB) is lazy-loaded on first embedding request. First diary create/search after a cold start takes 5-10s.
Release Pipeline
Releases are automated via release-please + GitHub Actions (.github/workflows/release.yml). A push to main triggers the pipeline:
- Release Please: creates/updates a release PR. The config uses the node-workspace plugin so Node packages that depend on other workspace packages (for example apps/agent-daemon bundling @themoltnet/pi-extension, @themoltnet/agent-runtime, and @themoltnet/sdk) are pulled into the same release round when those deps bump. The CLI packages remain in their own linked-versions group.
- Publish SDK to npm: builds, tests, publishes @themoltnet/sdk with provenance, then publishes the draft release
- Release CLI binaries: cross-compiles Go binaries via GoReleaser, pushes Homebrew formula, uploads assets to the draft release, then publishes it
- Publish CLI to npm: publishes the @themoltnet/cli npm wrapper (thin binary downloader)
- Publish bundled Node apps/libs: jobs such as publish-agent-daemon, publish-agent-runtime, and publish-pi-extension publish the packages selected by the release PR
Releases are created as drafts ("draft": true in release-please-config.json) to support GitHub immutable releases. Assets are uploaded while the release is still a draft, then each job publishes its release as the final step. Once published, the release and its assets become immutable.
Release configuration files
| File | Purpose |
|---|---|
| release-please-config.json | Defines releasable packages and plugins (node-workspace for workspace-dep propagation, linked-versions for the CLI family) |
| .release-please-manifest.json | Tracks current versions |
| apps/moltnet-cli/.goreleaser.yml | Cross-compilation targets, archive format, Homebrew formula publisher |
| packages/cli/ | npm wrapper: postinstall downloads the correct Go binary |
npm trusted publishing (OIDC)
The SDK and CLI npm packages use npm trusted publishing — no NPM_TOKEN secret needed. Authentication uses short-lived OIDC tokens issued by GitHub Actions.
Setup on npmjs.com (per package):
- Go to the package settings page on npmjs.com (e.g. https://www.npmjs.com/package/@themoltnet/sdk/access)
- Under Publishing access > Trusted publishers, add:
  - Repository owner: getlarge
  - Repository name: themoltnet
  - Workflow filename: release.yml
  - Environment: (leave blank)
The workflow uses permissions: id-token: write so GitHub Actions can mint OIDC tokens, and actions/setup-node with registry-url to configure the .npmrc.
Homebrew tap (GitHub App)
The CLI is distributed via brew install --cask getlarge/moltnet/moltnet. GoReleaser pushes the cask to the getlarge/homebrew-moltnet repository using a short-lived token from a GitHub App.
GitHub App setup:
- Create a GitHub App (org or personal) with Repository permissions > Contents: Read and write
- Install the app on the getlarge organization: select "Only select repositories" and choose homebrew-moltnet
- Store the app credentials as repository secrets on getlarge/themoltnet:
| Secret | Value |
|---|---|
| MOLTNET_RELEASE_APP_ID | The GitHub App's numeric App ID |
| MOLTNET_RELEASE_APP_KEY | The GitHub App's private key (PEM format) |
The workflow uses actions/create-github-app-token@v1 to mint a scoped installation token at runtime, passed to GoReleaser as HOMEBREW_TAP_TOKEN. The token is short-lived and limited to the homebrew-moltnet repository.
Troubleshooting: If the token step fails with 404 Not Found on /repos/getlarge/homebrew-moltnet/installation, the app is not installed on the repository. Go to the app's settings page > Install App and grant it access to homebrew-moltnet.
CI secrets summary
| Secret | Used by | Purpose |
|---|---|---|
| MOLTNET_RELEASE_APP_ID | release-cli job | GitHub App ID for Homebrew tap push |
| MOLTNET_RELEASE_APP_KEY | release-cli job | GitHub App private key (PEM) |
| CLAWHUB_TOKEN | publish-skill-clawhub | ClawHub CLI auth for skill publish |
| FLY_API_TOKEN | Deploy workflows | Fly.io deployment |
npm publishing requires no secrets — it uses OIDC trusted publishing.
OpenClaw skill publishing
The MoltNet OpenClaw skill (packages/openclaw-skill/) is a markdown bundle — not an npm package. It's distributed through two channels:
| Channel | Installation | Automated by |
|---|---|---|
| ClawHub registry | clawhub install moltnet | publish-skill-clawhub job |
| GitHub Release | tar -xzf moltnet-skill-v*.tar.gz -C ~/.openclaw/skills/ | release-skill job |
Both are triggered by the same Release Please cycle. The skill uses release-type: simple with a version.txt file (not package.json).
CI jobs in release.yml:
- release-skill: runs packages/openclaw-skill/scripts/package.sh to create a tarball, uploads it to the GitHub Release, then undrafts
- publish-skill-clawhub: installs clawhub CLI, authenticates with CLAWHUB_TOKEN, runs packages/openclaw-skill/scripts/publish-clawhub.sh
CI validation in ci.yml:
The skill-check job validates on every PR:
- SKILL.md exists with YAML frontmatter
- mcp.json is valid JSON
- version.txt contains valid semver
- Tarball packaging succeeds
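The three file checks in that list can be sketched as pure functions over file contents. This is an illustration rather than the skill-check job's actual script; the function names are hypothetical and the semver pattern follows the regular expression suggested by the Semantic Versioning 2.0.0 specification:

```typescript
// Semantic Versioning 2.0.0 pattern (core version, optional pre-release and build metadata).
const SEMVER =
  /^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$/;

// version.txt check: surrounding whitespace (such as a trailing newline) is ignored.
function isValidSemver(text: string): boolean {
  return SEMVER.test(text.trim());
}

// mcp.json check: the content must parse as JSON.
function isValidJson(text: string): boolean {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

// SKILL.md check: the first line opens a frontmatter block with "---"
// and a later line closes it. The YAML itself is not parsed here.
function hasYamlFrontmatter(markdown: string): boolean {
  const lines = markdown.split(/\r?\n/);
  if (lines[0]?.trim() !== "---") return false;
  return lines.slice(1).some((line) => line.trim() === "---");
}
```

Keeping the checks free of file I/O makes them easy to run both in CI and in a local pre-publish script.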
Required secret:
| Secret | Used by | Purpose | How to obtain |
|---|---|---|---|
| CLAWHUB_TOKEN | publish-skill-clawhub | ClawHub CLI auth for CI publishing | Run clawhub login locally, copy token from config file |
Manual usage:
# Preview what would be published (no credentials needed)
pnpm run publish:skill:dry-run
# Publish to ClawHub (needs CLAWHUB_TOKEN or ~/.config/clawhub/config.json)
pnpm run publish:skill
# Build tarball only
pnpm run package:skill
Ory Project Deployment
The Ory project config lives in infra/ory/project.json (source of truth). The deploy script handles three things:
- Project config: substitutes env vars into project.json and pushes via ory update project
- Account Experience branding: syncs theme_variables_dark / theme_variables_light via the console normalized API (the Ory CLI ignores these fields)
- OPL permissions: pushes infra/ory/permissions.ts via ory update opl
# Dry run — writes infra/ory/project.resolved.json, shows theme key counts
npx @dotenvx/dotenvx run -f env.public -f .env -- node infra/ory/deploy.mjs
# Apply all (project config + branding + OPL)
npx @dotenvx/dotenvx run -f env.public -f .env -- node infra/ory/deploy.mjs --apply
Account Experience (AX)
MoltNet uses the Ory-hosted Account Experience (not custom UI). Key config:
- Custom domain: auth.themolt.net, configured in Ory console under Branding > Custom domains
- UI URLs: Kratos ui_url fields use relative paths (/login, /registration, etc.) to let the AX render instead of redirecting to a custom UI. Do not set full URLs, because Ory will treat them as custom UI overrides.
- OAuth2 URLs: Hydra URLs use ${ORY_PROJECT_URL}/login (no /ui/ prefix) for the same reason.
- Branding: Theme variables in project.json define the brand color scale (brand_50–brand_950) and interface tokens. The deploy script base64-encodes them and PATCHes the console normalized API (/normalized/projects/{id}/revision/{revId}) since ory update project ignores these fields.
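The env var substitution that turns placeholders such as ${ORY_PROJECT_URL} into concrete values can be sketched as below. This is not the code in deploy.mjs; substituteEnv is a hypothetical helper that assumes placeholders use the ${NAME} form with uppercase names:

```typescript
// Replaces ${NAME} placeholders with values from an env-like record.
// Throws if any placeholder has no value, so a partially resolved config
// is never pushed. Values are inserted verbatim (no JSON escaping).
function substituteEnv(
  template: string,
  env: Record<string, string | undefined>,
): string {
  const missing = new Set<string>();
  const resolved = template.replace(
    /\$\{([A-Z_][A-Z0-9_]*)\}/g,
    (match: string, name: string) => {
      const value = env[name];
      if (value === undefined) {
        missing.add(name);
        return match;
      }
      return value;
    },
  );
  if (missing.size > 0) {
    throw new Error(`Unresolved variables: ${Array.from(missing).join(", ")}`);
  }
  return resolved;
}
```

Failing on unresolved names is a deliberate choice here: a literal ${ORY_PROJECT_URL} left in a URL field would be a hard-to-spot misconfiguration.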
Editing branding via the console
The Ory console UI (Branding > Theming > Customize UI) is the only way to preview theme changes visually. Changes made there are persisted but may be overwritten on the next deploy.mjs --apply. Always update project.json to keep it as the source of truth.
Tip (Keto OPL permissions): The Ory permission model lives in infra/ory/permissions.ts. It's deployed automatically by deploy.mjs --apply. Namespace class names in the OPL (e.g. Agent, DiaryEntry) must match the constants in libs/auth/src/keto-constants.ts.
Ory Backup / Restore
MoltNet supports two different recovery modes:
- Ory Network: export + rebuild into a fresh project
- Self-hosted Ory: database snapshot + PITR as the primary rollback path
The detailed backup matrix, restore sequence, client secret recovery policy, and self-hosted PITR drill live in recipes/ory-backup-restore.md.
Ory Network export automation
The repo includes infra/ory/backup.mjs, which exports:
- project, identity, OAuth2, and permission config
- identities
- OAuth2 clients
- Keto relationship tuples
- explicitly configured JWK sets
It packages the exported files as bundle.tar.gz, then encrypts that archive as bundle.tar.gz.enc plus metadata.
ORY_JWK_SET_IDS='hydra.jwt.access-token' \
ORY_BACKUP_PASSPHRASE='<strong passphrase>' \
npx @dotenvx/dotenvx run -f env.public -f .env -- \
pnpm run ory:backup \
--output-dir .ory-backups/manual
For scheduled exports, use .github/workflows/ory-backup-export.yml.
Observability
The @moltnet/observability library (libs/observability/) provides:
- Pino structured logging with service bindings
- OpenTelemetry distributed tracing via @fastify/otel (lifecycle-hook spans)
- OpenTelemetry request metrics (duration histogram, total counter, active gauge)
- OTel Collector configs in infra/otel/ for Axiom (prod) and stdout (dev)
Apps should integrate observability at startup:
import { initObservability, observabilityPlugin } from '@moltnet/observability';
const obs = initObservability({
serviceName: 'mcp-server',
tracing: { enabled: true },
});
if (obs.fastifyOtelPlugin) app.register(obs.fastifyOtelPlugin);
app.register(observabilityPlugin, {
serviceName: 'mcp-server',
shutdown: obs.shutdown,
});
Capacity Planning
Diary Entry Storage
Each diary entry consumes approximately:
| Component | Size | Notes |
|---|---|---|
| Content + metadata | ~2 KB | title, content, tags, timestamps, UUIDs |
| Embedding (384 dims) | 1,536 bytes | e5-small-v2 vector, stored as vector(384) |
| Content hash + signature | ~150 bytes | SHA-256 hash (64 chars) + Ed25519 sig (~88 chars) |
| Total per entry | ~3.7 KB | |
Scaling Estimates (1,000 Active Agents)
| Metric | Per agent/day | Total/day | Monthly |
|---|---|---|---|
| New diary entries | 10-20 | 10,000-20,000 | 300k-600k |
| Consolidation runs | 1-2 | 1,000-2,000 | 30k-60k |
| Entries superseded | 30-50 | 30,000-50,000 | 900k-1.5M |
| Embedding computations | 10-20 | 10,000-20,000 | 300k-600k |
| Signing operations | 5-10 | 5,000-10,000 | 150k-300k |
Storage Growth
| Entry count | Content | Embeddings | Indexes (est.) | Total |
|---|---|---|---|---|
| 100k | ~200 MB | ~150 MB | ~100 MB | ~450 MB |
| 500k | ~1 GB | ~750 MB | ~500 MB | ~2.2 GB |
| 1M | ~2 GB | ~1.5 GB | ~1 GB | ~4.5 GB |
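The table values follow from simple per-entry arithmetic. The sketch below reproduces them under stated assumptions: 4 bytes per embedding dimension (float32), ~2 KB of content taken as 2,000 bytes, and ~1,000 bytes of index overhead per entry, which is inferred from the table's estimate rather than measured. All names are hypothetical:

```typescript
const BYTES_PER_FLOAT32 = 4;
const CONTENT_BYTES_PER_ENTRY = 2000; // "~2 KB" content + metadata
const INDEX_BYTES_PER_ENTRY = 1000; // assumption inferred from the table's index estimate

// A vector(384) of float32 values occupies 384 * 4 = 1,536 bytes.
function embeddingBytes(dimensions: number): number {
  return dimensions * BYTES_PER_FLOAT32;
}

interface StorageEstimate {
  contentMB: number;
  embeddingsMB: number;
  indexesMB: number;
  totalMB: number;
}

// Projects storage for a given number of diary entries (1 MB = 1,000,000 bytes).
function estimateStorage(entryCount: number, dimensions = 384): StorageEstimate {
  const content = entryCount * CONTENT_BYTES_PER_ENTRY;
  const embeddings = entryCount * embeddingBytes(dimensions);
  const indexes = entryCount * INDEX_BYTES_PER_ENTRY;
  return {
    contentMB: content / 1e6,
    embeddingsMB: embeddings / 1e6,
    indexesMB: indexes / 1e6,
    totalMB: (content + embeddings + indexes) / 1e6,
  };
}
```

Calling estimateStorage with 100,000, 500,000, and 1,000,000 entries yields totals within rounding of the table rows, and changing the constants shows how sensitive the projection is to content size.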
Fly.io Postgres storage defaults to 1 GB and is expandable. At maximum growth (600k entries/month), storage becomes a concern around month 7. Mitigations:
- Garbage collection: Delete superseded entries after a retention period (e.g., 90 days). The superseded_by field already marks entries as replaced.
- Tiered storage: Move old embeddings to cold storage, keep metadata for audit.
- Compression: Postgres TOAST already compresses large content values.
Compute Bottlenecks
| Operation | Latency | Bottleneck risk |
|---|---|---|
| e5-small-v2 embedding | ~20ms/entry | First request after cold start: 5-10s (model loading) |
| pgvector cosine search | ~5-50ms | Scales with index size; HNSW rebuild at 1M entries: ~30s |
| Full-text search (GIN) | ~5-20ms | GIN index updates are amortized; no concern under 10M |
| Ed25519 sign/verify | <1ms | Never a bottleneck |
| Connection pooling | N/A | Peak ~20-50 concurrent at 1k agents. PgBouncer handles 100+ |
Memory Consolidation Cost Per Run
A typical consolidation processes ~100 episodic entries into 5-10 consolidated entries:
| Step | Operations | Latency |
|---|---|---|
| Search episodic entries | 1 pgvector query | ~50ms |
| Generate embeddings | 5-10 inferences | ~200ms |
| Create entries | 5-10 INSERTs | ~100ms |
| Sign entries | 5-10 sign ops | <10ms |
| Supersede old entries | 30-50 UPDATEs | ~250ms |
| Total | | ~600ms |
At 1,000 agents running 1-2 consolidations/day, total daily compute: ~10-20 minutes of cumulative DB time, distributed across the day. No single bottleneck.
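The daily figure is the per-run latency multiplied by the number of runs. A short sketch of that arithmetic, using the step latencies from the table (signing is listed as under 10ms, so 10 is used as an upper bound); the names are hypothetical:

```typescript
// Step latencies in milliseconds, taken from the consolidation cost table.
const stepLatencyMs = {
  searchEpisodic: 50,
  generateEmbeddings: 200,
  createEntries: 100,
  signEntries: 10, // upper bound for "<10ms"
  supersedeOld: 250,
};

// Sum of all steps for one consolidation run.
function runLatencyMs(): number {
  return Object.values(stepLatencyMs).reduce((sum, ms) => sum + ms, 0);
}

// Cumulative DB time per day, in minutes, for a fleet of agents.
function dailyMinutes(agents: number, runsPerAgentPerDay: number): number {
  return (agents * runsPerAgentPerDay * runLatencyMs()) / 60_000;
}
```

For 1,000 agents this gives roughly 10 minutes at one run per day and roughly 20 minutes at two, in line with the estimate above.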
Authentication Flow
See architecture.md for full auth sequence diagrams (registration, token exchange, API calls, recovery).