Marianne AI Compose: The Definitive Reference¶
Purpose: Comprehensive reference for AI assistants and developers working on Marianne. This document should be kept current by the
docs-generatorscore.Last updated: 2026-04-07
What Marianne IS¶
Marianne is a declarative orchestration framework that turns YAML score definitions into resilient, resumable, self-improving AI execution pipelines. You write a score; Marianne decomposes it into sheets, assigns each to an instrument (Claude Code, Gemini CLI, Aider, Codex CLI, Cline, Goose, or any config-driven CLI tool), validates outputs against acceptance criteria, learns from outcomes, and feeds knowledge forward through a specification corpus.
The musical metaphor is load-bearing architecture, not decoration:
| Musical Term | System Concept | Implementation |
|---|---|---|
| Score | Job configuration | JobConfig — the YAML file defining what to execute |
| Sheet | Execution unit | SheetState — one stage of work within a score |
| Movement | Named phase | MovementDef — logical grouping of sheets (Planning, Implementation, Review) |
| Voice | Parallel instance | Fan-out instance within a movement |
| Concert | Job chain | Jobs spawning jobs via on_success hooks |
| Conductor | Daemon process | mzt start — the long-running process that orchestrates everything |
| Baton | Execution engine | BatonCore — event-driven dispatch: decides WHEN and HOW MUCH |
| Musician | Sheet executor | sheet_task() — plays once, reports result (never retries or decides) |
| Instrument | AI backend | InstrumentProfile — Claude Code, Gemini CLI, Aider, etc. |
| Technique | Tool/MCP/skill | How you play the instrument — tools, MCP servers, skill files |
| Preamble | Positional identity | Dynamic header telling agents who they are in the score |
| Cadenza | Per-sheet injection | Files injected into specific sheets (context, skills, tools) |
| Prelude | Global injection | Files injected into every sheet |
| Libretto | Specification corpus | .marianne/spec/ — project knowledge injected into agent prompts |
| Passage | Spec fragment | Tagged excerpt from the libretto, filtered per-sheet |
| Fermata | Escalation pause | Holds execution for human or AI judgment |
| Tempo | Execution rate | Pacing, rate limits, backpressure — never failure conditions |
The Complete Architecture¶
┌─────────────────────────────────────────┐
│ Score (YAML) │
│ name, sheets, movements, instruments, │
│ prompt template, validations, hooks │
└──────────────────┬──────────────────────┘
│ mzt run
┌──────────────────▼──────────────────────┐
│ CLI Layer (Typer + Rich) │
│ 35 commands: run, status, resume, pause, │
│ validate, diagnose, instruments, ... │
└──────────────────┬──────────────────────┘
│ IPC (Unix socket + JSON-RPC 2.0)
┌──────────────────────────────────────▼──────────────────────────────────────┐
│ Conductor (mzt start) │
│ │
│ ┌────────────────────────────────────────────────────────────────────────┐ │
│ │ Baton Engine (event-driven core) │ │
│ │ ┌──────────┬──────────┬───────────┬──────────┬──────────────────┐ │ │
│ │ │ Dispatch │ Timer │ Backend │ State │ Prompt │ │ │
│ │ │ ready() │ Wheel │ Pool │ Persist │ Renderer │ │ │
│ │ │ DAG-aware│ all timing│ per-inst │ SQLite │ 9-layer │ │ │
│ │ └──────────┴──────────┴───────────┴──────────┴──────────────────┘ │ │
│ └────────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────┬───────────┬──────────┬──────────┬───────────┬─────────────┐ │
│ │ Manager │ Registry │ Rate │Backpress.│ Learning │ Event │ │
│ │ job CRUD │ SQLite │ Coordin. │ load mgmt│ Hub │ Bus │ │
│ │ lifecycle│ persist │ cross-job│ memory │ patterns │ pub/sub │ │
│ └──────────┴───────────┴──────────┴──────────┴───────────┴─────────────┘ │
│ │
│ ┌──────────┬──────────────────────────────────────────────────────────┐ │
│ │IPC Server│ Supporting: Health, Monitor, PGroup, Detection, Output, │ │
│ │Unix sock │ Observer/Recorder, Clone, System Probe, Profiler │ │
│ └──────────┴──────────────────────────────────────────────────────────┘ │
└────────┬───────────────────────────────────────────────────────────────────┘
│
│ Musicians (one per sheet execution)
│
┌────────▼────────────────────────────────────────────────────────────────────┐
│ Musician → Instrument │
│ │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │ Config-Driven Instruments (YAML profiles) │ │
│ │ claude-code · gemini-cli · codex-cli · cline-cli · aider · goose │ │
│ │ + organization profiles (~/.marianne/instruments/) │ │
│ │ + venue profiles (.marianne/instruments/) │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │ Native Backends (Python implementations) │ │
│ │ Claude CLI · Anthropic API · Ollama · Recursive Light │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
│ │
│ Techniques: tools, MCP servers, skills (injected via prelude/cadenza) │
└─────────────────────────────────────────────────────────────────────────────┘
│
┌────────▼────────────────────────────────────────────────────────────────────┐
│ Supporting Systems │
│ │
│ ┌──────────┐ ┌──────────┐ ┌───────────┐ ┌──────────┐ ┌──────────────────┐ │
│ │ State │ │ Learning │ │ Spec │ │Validation│ │ Notifications │ │
│ │ JSON │ │ Global │ │ Corpus │ │ 5 types │ │ Desktop │ │
│ │ SQLite │ │ Store │ │ (Libretto)│ │ + retry │ │ Slack │ │
│ │ Memory │ │ (SQLite) │ │ Rosetta │ │ + cond. │ │ Webhook │ │
│ └──────────┘ └──────────┘ └───────────┘ └──────────┘ └──────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
The Execution Flow (What Actually Happens)¶
When you run mzt run job.yaml:
-
Parse & Validate — YAML parsed into Pydantic
JobConfigwith 51+ config models. Fan-out expansion happens at parse time (e.g., movement 2 withvoices: 5becomes sheets 2–6). Dependencies validated for cycles. -
Conductor Routing — CLI checks for a running conductor via Unix socket. If found, job is submitted to the conductor for execution. If not, the command exits with an error (conductor is required). Only
--dry-runandmzt validatework without a conductor. -
Baton Registration — The conductor's baton engine registers the job: builds sheet execution states, extracts the dependency graph, auto-registers instruments with concurrency limits, and resolves per-sheet instrument assignments.
-
Instrument Resolution — For each sheet, the baton resolves which instrument to use via precedence cascade:
- Per-sheet assignment (
per_sheet_instruments) - Batch assignment (
instrument_map) - Movement-level assignment (
movements.N.instrument) - Score-level default (
instrument:) -
System default (Claude Code)
-
Event-Driven Dispatch — The baton's main loop waits for events, then dispatches ready sheets:
-
Musician Execution — Each dispatched sheet spawns a musician (async task) that:
- Acquires a backend from the
BackendPool - Receives a pre-rendered prompt (9-layer assembly)
- Plays once: sends prompt to instrument, collects output
-
Reports result back to baton inbox (never retries or decides)
-
Per-Sheet State Machine:
-
Validation — 5 validation types check outputs:
file_exists,file_modified,content_contains,content_regex,command_succeeds. Conditional validation viacondition:. Retry with delay for filesystem race conditions. -
Error Classification — Multi-phase: structured JSON errors → exit code/signal → regex fallback across 40 error codes in 8 categories. Rate limits get parsed reset times and are treated as tempo changes (the baton pauses that instrument while other instruments continue).
-
Learning Aggregation — On completion (or failure — survivorship bias fix), outcomes flow to
GlobalLearningStore(~/.marianne/global-learning.db). Patterns detected, merged, trust-scored. -
Post-Success Hooks — Concert chaining: completed jobs can spawn new jobs via
on_successhooks routed through the conductor IPC.fresh: trueprevents infinite loops. Zero-work guard as defense-in-depth.
Defense-in-depth example: When a job chains to itself, the child loads the parent's COMPLETED state. Without
--fresh, it executes zero sheets and triggers on_success again — infinite loop. Marianne fixes this at two independent layers:--freshdeletes state (root cause), AND a zero-work guard skips hooks when nothing was done (symptom prevention). Either fix alone is sufficient.
The Conductor (mzt start)¶
The conductor is a long-running process that manages all job execution. Every mzt run command routes through the conductor via Unix socket IPC.
Quick Start¶
# Start the conductor (foreground, for development)
mzt start --foreground
# Start the conductor (background, production)
mzt start
# Submit a job
mzt run my-job.yaml
# Check conductor health
mzt conductor-status
# List jobs
mzt list # Active jobs only
mzt list --all # Include completed/failed
# Stop the conductor (ONLY when no jobs are running)
mzt stop
Key Components¶
| Component | File | Purpose |
|---|---|---|
| BatonCore | daemon/baton/core.py |
Event-driven execution engine — the heart |
| BatonAdapter | daemon/baton/adapter.py |
Bridges conductor and baton (6 surface areas) |
| BackendPool | daemon/baton/backend_pool.py |
Manages backend instances per instrument |
| TimerWheel | daemon/baton/timer.py |
Unified timing: retry, rate limit, pacing, stale, cron |
| PromptRenderer | daemon/baton/prompt.py |
9-layer prompt assembly pipeline |
| Dispatch | daemon/baton/dispatch.py |
Finds sheets ready to execute, respects limits |
| Musician | daemon/baton/musician.py |
Single-attempt sheet execution: play once, report |
| Manager | daemon/manager.py |
Job lifecycle, concurrency limits |
| Registry | daemon/registry.py |
SQLite-backed persistent job tracking |
| Job Service | daemon/job_service.py |
Core execution, decoupled from CLI |
| IPC Server | daemon/ipc/server.py |
Unix socket + JSON-RPC 2.0 |
| Rate Coordinator | daemon/rate_coordinator.py |
Cross-job rate limit state |
| Backpressure | daemon/backpressure.py |
Memory-based load management |
| Monitor | daemon/monitor.py |
CPU/memory/process tracking |
| Learning Hub | daemon/learning_hub.py |
Cross-job pattern sharing |
| Observer/Recorder | daemon/observer_recorder.py |
Event recording for dashboards |
| Clone | daemon/clone.py |
Isolated test conductors |
| Detection | daemon/detect.py |
Auto-detect running conductor |
Conductor Clones¶
Safe testing without touching production:
mzt start --conductor-clone # Default clone
mzt start --conductor-clone=staging # Named clone "staging"
mzt run score.yaml --conductor-clone # Submit to clone
mzt status --conductor-clone=staging # Query clone
Each clone gets isolated paths:
- Socket: /tmp/marianne-clone-{name}.sock
- PID: /tmp/marianne-clone-{name}.pid
- State DB: ~/.marianne/clone-{name}-state.db
- Log: /tmp/marianne-clone-{name}.log
Job IDs¶
Job IDs are human-friendly: the config file stem (e.g., quality-continuous). Duplicate names get -2, -3 suffixes.
The Baton Engine¶
The baton is the event-driven execution engine at the heart of the conductor. Named after the conductor's baton in an orchestra — it doesn't decide what to play (the score does) or how to play (the musicians do). It controls when and how much.
Why Baton Replaced the Old Runner¶
| Aspect | Old Runner (JobRunner) | Baton Engine |
|---|---|---|
| Execution model | Monolithic — one async task per job | Event-driven — single loop across ALL jobs |
| Scheduling | Polling loop (asyncio.sleep(0.1)) |
Event-based with zero polling |
| Timing | 8 scattered mechanisms | Single unified Timer Wheel |
| Rate limits | Burns retries, then kills the job | Tempo changes — pauses instrument, others continue |
| Multi-job | Each job gets its own runner | Single baton manages all jobs simultaneously |
| State | In-memory only | Persists to SQLite for restart recovery |
Baton Event Types (20+)¶
From Musicians: SheetAttemptResult (full execution report), SheetSkipped
From Timer Wheel: RetryDue, RateLimitExpired, StaleCheck, CronTick, JobTimeout, PacingComplete
From External Commands: PauseJob, ResumeJob, CancelJob, ConfigReloaded, ShutdownRequested
From Observer: ProcessExited (backend process died), ResourceAnomaly (memory/CPU pressure)
Rate Limits: RateLimitHit (tempo change), RateLimitExpired (check recovery)
Escalation (Fermata): EscalationNeeded, EscalationResolved, EscalationTimeout
Sheet Execution States¶
PENDING → READY → DISPATCHED → RUNNING → COMPLETED | FAILED | SKIPPED | CANCELLED | WAITING | RETRY_SCHEDULED | FERMATA
Instrument State Tracking¶
Per-instrument health with circuit breaker: CLOSED (healthy) → OPEN (unhealthy) → HALF_OPEN (probing). Rate limit tracking, failure/success counters, concurrency enforcement.
The Instrument System¶
Instruments are AI backends. Marianne ships with 6 config-driven instrument profiles and bridges 4 native Python backends, giving 10+ instruments out of the box. Adding a new instrument is ~30 lines of YAML, not ~300 lines of Python.
Built-in Instruments¶
| Instrument | Kind | Capabilities | Default Model |
|---|---|---|---|
| claude-code | CLI | tool_use, file_editing, shell_access, vision, mcp, structured_output, streaming, thinking | claude-sonnet-4-5-20250929 |
| gemini-cli | CLI | tool_use, file_editing, shell_access, vision, structured_output | gemini-2.5-pro |
| codex-cli | CLI | tool_use, file_editing, shell_access, mcp, structured_output, session_resume, streaming | o3 |
| cline-cli | CLI | tool_use, file_editing, shell_access, mcp, structured_output, thinking, session_resume | — |
| aider | CLI | file_editing, shell_access | — |
| goose | CLI | tool_use, file_editing, shell_access, mcp, structured_output, session_resume, streaming | — |
Native Backends (Python implementations)¶
| Backend | Module | Purpose |
|---|---|---|
| Claude CLI | backends/claude_cli.py |
Direct Claude CLI integration (legacy) |
| Anthropic API | backends/anthropic_api.py |
Direct API calls |
| Ollama | backends/ollama.py |
Local model execution |
| Recursive Light | backends/recursive_light.py |
Recursive self-improvement framework |
Profile Loading Cascade¶
Profiles are loaded in order; later directories override earlier ones:
- Built-in —
src/marianne/instruments/builtins/(shipped with Marianne) - Organization —
~/.marianne/instruments/(user-wide customization) - Venue —
.marianne/instruments/(project-specific profiles)
Instrument Profile Schema¶
Each YAML profile defines:
- Identity:
name,display_name,description,kind(cli or http) - Capabilities: Set of strings (
tool_use,file_editing,shell_access,vision,mcp,structured_output,streaming,thinking,session_resume,code_mode) - Models: List with context window, cost per 1K tokens, max output tokens, max concurrent
- CLI specifics: executable, subcommand, prompt delivery (flag or stdin), output parsing (text/json/jsonl), error detection patterns, environment variable filtering, process isolation
- HTTP specifics: endpoint, auth scheme (designed, not yet implemented)
Per-Sheet Instrument Assignment¶
Instruments can be assigned at multiple levels with cascading precedence:
# Score-level default
instrument: claude-code
instrument_config:
timeout_seconds: 1800
instrument_fallbacks: [gemini-cli, aider]
# Named instrument aliases
instruments:
fast-writer:
profile: gemini-cli
config:
model: gemini-2.5-flash
timeout_seconds: 300
deep-thinker:
profile: claude-code
config:
timeout_seconds: 3600
# Movement-level override
movements:
1:
name: "Planning"
instrument: deep-thinker
2:
name: "Implementation"
instrument: fast-writer
voices: 3
# Per-sheet override (highest precedence)
sheet:
per_sheet_instruments:
{1: 'claude-code', 5: 'gemini-cli'}
per_sheet_instrument_config:
{3: {model: 'gemini-2.5-flash'}}
per_sheet_fallbacks:
{3: ['gemini-cli', 'ollama']}
instrument_map:
{'gemini-cli': [1, 2, 3], 'claude-code': [4, 5, 6]}
Resolution precedence (highest wins):
1. per_sheet_instruments — explicit per-sheet
2. instrument_map — batch assignment
3. movements.N.instrument — movement-level
4. instrument: — score-level default
PluginCliBackend¶
The PluginCliBackend is the universal CLI instrument executor. It:
- Builds CLI commands from profile specifications (executable, flags, prompt delivery)
- Manages subprocess lifecycle via
asyncio.create_subprocess_exec - Parses output according to format (text, JSON, JSONL with event filtering)
- Extracts token counts via dot-path (with wildcard aggregation for multi-model routing)
- Classifies errors using profile-defined regex patterns
- Filters environment variables to prevent credential leakage between instruments
- Supports process group isolation for MCP cleanup
BackendPool¶
The baton's BackendPool manages backend instances:
- CLI instruments: One backend per concurrent sheet (subprocess isolation), returned to free list for reuse
- HTTP instruments: Singleton per instrument (connection pooling internal)
- Lazy creation on first acquire
- Tracks in-flight instances for concurrency enforcement
Named Movements¶
Movements are named phases within a score. They replace raw sheet numbers with semantic meaning.
movements:
1:
name: "Planning"
instrument: claude-code
2:
name: "Implementation"
voices: 3 # Fan-out: 3 parallel instances
instrument: gemini-cli
3:
name: "Review"
instrument: claude-code
Template variables:
- {{ movement }} — movement number
- {{ total_movements }} — total movements before fan-out expansion
- {{ voice }} — instance number within movement (1-indexed)
- {{ voice_count }} — total voices in this movement
Movements are aliases for the existing stage/instance system: movement = stage, voice = instance, voice_count = fan_count. Both vocabularies work in templates.
The 9-Layer Prompt Assembly Pipeline¶
Every sheet prompt is assembled through 9 layers by the PromptRenderer:
| Layer | Content | Purpose |
|---|---|---|
| 1. Preamble | Positional identity + retry status | "You are sheet 5 of 12" |
| 2. Template | Jinja2 rendering with all variables | The core instructions |
| 3. Skills/Tools | Prelude/cadenza (category=skill/tool) | Methodology and available actions |
| 4. Context | Prelude/cadenza (category=context) | Background knowledge |
| 5. Spec Fragments | From specification corpus (libretto) | Project conventions, constraints |
| 6. Failure History | Previous sheet failures | "Don't repeat these mistakes" |
| 7. Learned Patterns | From learning store | "This approach worked before" |
| 8. Validation Requirements | Formatted as success checklist | "Your output must pass these checks" |
| 9. Completion Suffix | Appended in completion mode | Recovery guidance for partial passes |
The musician receives the pre-rendered prompt. Rendering is stateless per job, enabling independent testing of each layer.
Preamble¶
Dynamic headers that tell agents their identity:
First run:
<marianne-preamble>
You are sheet N of M in a Marianne concert.
Workspace: /path/to/workspace
Other sheets may execute concurrently — coordinate via workspace files.
Your prompt describes intent, not a prescription. Use your judgment.
Success: all validation requirements pass on the first automated check.
Write all outputs to your workspace. Exit with no background processes.
</marianne-preamble>
Retry:
<marianne-preamble>
RETRY #2
Previous attempt failed validation. Study workspace for evidence.
You are sheet N of M in a Marianne concert.
...
</marianne-preamble>
Cadenza and Prelude Injections¶
# Prelude: injected into ALL sheets
prelude:
- file: "shared-context.md"
as: context
# Cadenza: injected into specific sheets
cadenza:
3:
- file: "{{ workspace }}/security-checklist.md"
as: skill
- file: "{{ workspace }}/api-docs.md"
as: context
Injection categories: context (background knowledge), skill (methodology), tool (available actions).
The Specification Corpus (Libretto)¶
The libretto is Marianne's project knowledge base, stored in .marianne/spec/. It provides per-sheet context about the project being worked on.
Spec Files¶
| File | Lines | Purpose |
|---|---|---|
intent.yaml |
374 | WHY — goals, trade-offs, escalation criteria, vision |
architecture.yaml |
571 | WHAT — system design, components, invariants, state model |
conventions.yaml |
485 | HOW — code patterns, naming, testing, package structure |
constraints.yaml |
384 | MUST/MUST-NOT — hard boundaries, resource limits, compatibility |
quality.yaml |
307 | GOOD ENOUGH — acceptance criteria, validation, testing approach |
Total: ~2,100 lines of structured project knowledge.
How It Works¶
-
Loading:
SpecCorpusLoader.load()reads YAML and Markdown files from.marianne/spec/. Each becomes aSpecFragment(name, content, tags, kind). Files sorted alphabetically for deterministic ordering. -
Filtering: Per-sheet tag filtering via
spec_tags: {sheet_num: ["tag1", "tag2"]}. A fragment matches if it shares at least one tag. Empty filter = all fragments. -
Injection: Filtered fragments are rendered into the prompt at Layer 5 of the 9-layer pipeline, between context injections and failure history.
-
Budget gating: Fragments respect context window budget to avoid overwhelming the instrument's token limit.
The Rosetta Pattern Corpus¶
A self-perpetuating discovery engine that finds, validates, and documents orchestration patterns across multiple domains. Not aspirational — a working corpus with 56 patterns across 4 iterations.
Structure¶
scores/rosetta-corpus/
├── INDEX.md # Master index
├── forces.md # 10 generative forces
├── glossary.md # Domain terminology
├── selection-guide.md # Pattern selection decision tree
├── review-integration.md # Iteration history
├── awaiting.md # Patterns awaiting conductor primitives
├── questions.md # Open research questions
└── patterns/ # 56 pattern files
├── fan-out-synthesis.md
├── immune-cascade.md
├── mission-command.md
├── shipyard-sequence.md
└── ... (52 more)
Key Concepts¶
- 10 Generative Forces: Why patterns exist — Information Asymmetry, Finite Resources, Partial Failure, Exponential Defect Cost, Producer-Consumer Mismatch, Instrument-Task Fit, Convergence Imperative, Accumulated Signal, Structured Disagreement, Progressive Commitment
- 11 Generators/Moves: Structural mechanisms independently invented across 3+ domains
- 56 Working Patterns: Each includes core dynamic, when to use, Marianne YAML examples, failure modes, composition relationships
- Status markers: Working (viable today) vs. Aspirational (future capability dependent)
- 6 Rosetta proof scores in
scores/rosetta-corpus/proof-scores/demonstrate patterns in practice
The Learning System (Marianne's Brain)¶
Marianne doesn't just retry — it learns.
Pattern Detection¶
8 pattern types extracted from outcomes: validation failures, retry successes, completion mode effectiveness, first-attempt successes, confidence patterns, semantic failures, output patterns (regex against stdout/stderr), and error code patterns.
Pattern Lifecycle¶
Trust Scoring¶
trust = 0.5 (base prior)
+ success_rate × 0.3
- failure_rate × 0.4
+ age_factor × 0.2
± quarantine adjustment
Laplace smoothing is critical. New patterns get effectiveness = (successes + 0.5) / (total + 1). Without the +0.5 prior, a pattern that succeeds once has 100% effectiveness and dominates. The prior makes new patterns start neutral (0.5) and converge to their true rate.
Pattern Application¶
Epsilon-greedy: with probability epsilon (default 15%), lower-priority patterns are included to collect effectiveness data. Prevents cold-start death where new patterns never get tested.
Cross-Job Coordination¶
- Rate limit broadcasting: Job A hits rate limit → records to SQLite → Job B checks before retrying
- Pattern discovery broadcasting: TTL-based (5 min) real-time sharing between concurrent jobs
- Learned wait times: Average successful recovery waits, bounded and requiring minimum samples
Entropy Monitoring¶
Shannon entropy over pattern application distribution. Low entropy triggers automatic response: boost exploration budget, revisit quarantined patterns. Prevents convergence collapse to a single dominant pattern.
The Old Execution Runner¶
The pre-baton execution engine (src/marianne/execution/runner/) still exists and is used as a fallback. It consists of 7 mixins + 1 base class via multiple inheritance (8 classes across 10 files):
| Class | File | Responsibility |
|---|---|---|
JobRunnerBase |
base.py |
Base class, shared state, initialization |
SheetExecutionMixin |
sheet.py (~3,400 lines) |
Core sheet execution and validation |
LifecycleMixin |
lifecycle.py |
Job run modes (sequential, parallel) |
RecoveryMixin |
recovery.py |
Self-healing, retry, circuit breaker |
CostMixin |
cost.py |
Token/cost tracking and limits |
ContextBuildingMixin |
context.py |
Cross-sheet context assembly |
IsolationMixin |
isolation.py |
Git worktree management |
PatternsMixin |
patterns.py |
Learning pattern queries and feedback |
Supporting modules: models.py (data models), __init__.py (exports).
Score Anatomy¶
Every score needs 3 required top-level fields plus optional configuration:
name: "job-name" # REQUIRED: unique identifier
sheet: # REQUIRED: how work is divided
size: 1 # items per sheet
total_items: 9 # total items = 9 sheets when size=1
prompt: # REQUIRED: what the AI should do
template: | # Jinja2 template with {{sheet_num}}, etc.
...
Everything else has sensible defaults.
Key Template Variables¶
{{ sheet_num }} - Current sheet number (1-indexed)
{{ total_sheets }} - Total number of sheets
{{ start_item }} - First item number for this sheet
{{ end_item }} - Last item number for this sheet
{{ workspace }} - Workspace directory path
{{ stage }} - Original stage number (fan-out aware)
{{ instance }} - Fan-out instance (1-indexed)
{{ fan_count }} - Total instances for this stage
{{ movement }} - Movement number (alias for stage)
{{ voice }} - Voice number (alias for instance)
{{ voice_count }} - Voices in movement (alias for fan_count)
{{ total_movements }} - Total movements before expansion
{{ previous_outputs }} - Dict of previous sheet stdout (cross_sheet)
{{ previous_files }} - Dict of captured file contents (cross_sheet)
{{ skipped_upstream }} - Whether upstream sheets were skipped
+ any custom variables from prompt.variables
Validation System — 5 Types¶
validations:
- type: file_exists
path: "{workspace}/result.md"
- type: file_modified
path: "{workspace}/TRACKING.md"
- type: content_contains
path: "{workspace}/result.md"
pattern: "IMPLEMENTATION_COMPLETE: yes"
- type: content_regex
path: "{workspace}/result.md"
pattern: "FIXES_APPLIED: [0-9]+"
- type: command_succeeds
command: "pytest -x -q --tb=no"
Important: Validation paths use Python format strings with single braces: {workspace}. Template prompts use Jinja2 double braces: {{ workspace }}. Mixing them causes silent failures.
Key validation features:
- Conditional: condition: "sheet_num >= 6" — applies only to matching sheets
- Retry with delay: retry_count: 3, retry_delay_ms: 200 — for filesystem race conditions
The 6 Score Archetypes¶
- Simple Task (1–3 sheets, linear) — Quick one-off tasks
- Multi-Phase Pipeline (5–10 sheets, strict dependencies) — Refactoring, building features
- Expert Review (parallel fan-out + synthesis) — Code review, quality improvement
- Self-Improving Opus (9 sheets, 6 movements, recursive) — Recursive self-improvement
- Concert Chain (jobs spawning jobs) — Infinite improvement loops
- Issue-Driven Fixer (dynamic scope) — Bug fixing, addressing deferred issues
Advanced Features¶
| Feature | Config Key | Example |
|---|---|---|
| Named movements | movements: {1: {name: "Planning"}} |
Semantic phase names |
| Multi-instrument | instruments: {fast: {profile: gemini-cli}} |
Named instrument aliases |
| Per-sheet instruments | sheet.per_sheet_instruments: {5: gemini-cli} |
Sheet-level override |
| Instrument fallbacks | instrument_fallbacks: [gemini-cli, aider] |
Try alternatives on failure |
| Fan-out parallelism | movements.2.voices: 5 or sheet.fan_out: {2: 5} |
Parallel instances |
| Sheet dependencies (DAG) | sheet.dependencies: {3: [1, 2]} |
Fan-in after parallel |
| Cross-sheet context | cross_sheet.auto_capture_stdout: true |
Pass outputs forward |
| Worktree isolation | isolation.enabled: true |
Parallel-safe git ops |
| Concert chaining | on_success: [{type: run_job}] |
Self-chaining loops |
| Workspace lifecycle | workspace_lifecycle.archive_on_fresh: true |
Clean restarts |
| Cost limits | cost_limits.max_cost_per_job: 100 |
Budget enforcement |
| Timeout overrides | backend.timeout_overrides: {5: 7200} |
Per-sheet timeouts |
| Allowed tools | backend.allowed_tools: [Read, Grep] |
Restrict agent tools |
| Spec corpus tags | spec_tags: {3: [security, constraints]} |
Per-sheet knowledge filtering |
| Skip conditions | skip_when: {5: "movement == 2"} |
Conditional sheet skipping |
Error Classification¶
40 structured error codes across 8 categories:
| Category | Codes | Examples |
|---|---|---|
| E0xx Execution | 7 | TIMEOUT, KILLED, CRASHED, INTERRUPTED, OOM, STALE, UNKNOWN |
| E1xx Rate Limit | 4 | RATE_LIMIT_API, RATE_LIMIT_CLI, CAPACITY_EXCEEDED, QUOTA_EXHAUSTED |
| E2xx Validation | 5 | FILE_MISSING, CONTENT_MISMATCH, COMMAND_FAILED, TIMEOUT, GENERIC |
| E3xx Configuration | 6 | INVALID, MISSING_FIELD, PATH_NOT_FOUND, PARSE_ERROR, MCP_ERROR, CLI_MODE_ERROR |
| E4xx State | 4 | CORRUPTION, LOAD_FAILED, SAVE_FAILED, VERSION_MISMATCH |
| E5xx Backend | 5 | CONNECTION, AUTH, RESPONSE, TIMEOUT, NOT_FOUND |
| E6xx Preflight | 4 | PATH_MISSING, PROMPT_TOO_LARGE, WORKING_DIR_INVALID, VALIDATION_SETUP |
| E9xx Network | 5 | CONNECTION_FAILED, DNS_ERROR, SSL_ERROR, TIMEOUT, UNKNOWN |
Each error code carries a retry behavior classification: rate_limit, transient, validation, auth, network, timeout.
Multi-phase classification pipeline: structured JSON → exit code/signal → regex patterns → priority-based tiebreaking across error codes with rate limit reset time parsing.
Strengths¶
Resilience Engineering¶
- Checkpoint after every sheet — crash anywhere, resume exactly
- Zombie detection via PID checking (not time-based — jobs can run for days)
- Atomic state saves (temp file + rename)
- Circuit breaker prevents cascading failures
- Graceful shutdown on SIGINT/SIGTERM; live config reload on SIGHUP
- Self-healing: auto-diagnosis + remediation when retries exhausted
- Rate limits are tempo changes, not failures — other instruments continue
Multi-Instrument Orchestration¶
- 10+ instruments out of the box (6 config-driven profiles + 4 native backends)
- Config-driven: new instruments in ~30 lines of YAML
- Per-sheet assignment with cascading precedence
- Named instrument aliases with movement-level overrides
- Instrument fallback chains (designed, not yet fully implemented)
- Credential isolation via environment variable filtering
- BackendPool with per-instrument concurrency management
Observability¶
- Structured logging via structlog with context propagation
- 20+ baton event types published via async event bus
- 40 error codes across 8 categories
- Execution history in SQLite for post-mortem
- Cost tracking with token-level granularity
- Web dashboard with SSE for real-time updates
mzt diagnosefor comprehensive failure analysis
Composability¶
- Instrument profiles: plug in any CLI tool via YAML
- State backends: JSON for simplicity, SQLite for queries, Memory for tests
- 5 validation types with conditional application and retry
- Notification channels: desktop, Slack, webhook
- Everything is YAML-configurable with sensible defaults
- 9-layer prompt assembly with per-sheet knowledge injection
Learning System¶
- Cross-workspace pattern sharing via SQLite
- Trust scoring with quarantine lifecycle
- Epsilon-greedy exploration prevents local optima
- Entropy monitoring prevents convergence collapse
- 8 pattern types with Laplace-smoothed effectiveness priors
Parallel Safety¶
- Git worktree isolation (~24ms overhead, shared objects)
- Locking during execution with stale lock recovery
- State mutex for concurrent sheet writes
- DAG-aware batch scheduling
Known Weaknesses¶
- Resource Consumption — Each CLI instrument process loads all its plugins. Multiple concurrent processes can saturate memory.
- Claude-Focused Legacy — Despite multiple instruments, prompt templating and error classification were tuned for Claude. Instrument-specific adaptations are evolving.
- Learning Complexity — SQLite database with 8 pattern types across 16 store modules, hard to debug when patterns misbehave.
- Single-Machine — No distributed execution. Learning store is local SQLite.
- No Streaming — Batch-oriented. Long sheets provide minimal feedback beyond byte counters.
- Old Runner Complexity — The pre-baton
JobRunneris 7 mixins + 1 base via multiple inheritance. Understanding full flow requires reading 8 files sharing implicit state throughself. - HTTP Instruments — Designed in the profile schema but not yet implemented in the backend.
By The Numbers¶
| Metric | Value |
|---|---|
| Source files | 258 Python files |
| Test files | 362 |
| Test functions | 11,000+ |
| CLI commands | 35 mzt subcommands |
| Packages | 96 directories under src/marianne/ |
| Config models | 51+ Pydantic models |
| Error codes | 40 across 8 categories |
| Learning store modules | 16 |
| Pattern types | 8 |
| Baton event types | 20+ |
| Instruments | 10+ (6 config-driven profiles + 4 native backends) |
| State backends | 3 (JSON, SQLite, Memory) |
| Notification channels | 3 (Desktop, Slack, Webhook) |
| Validation types | 5 |
| Example scores | 43 (37 top-level + 6 Rosetta examples) |
| Rosetta patterns | 56 working patterns |
| Spec corpus | ~2,100 lines across 5 files |
| Prompt assembly layers | 9 |
| Self-evolution cycles | 24+ completed autonomously |
The Vision: Federated AGI Infrastructure¶
VISION.md reveals Marianne's trajectory: infrastructure for collaborative intelligence. The Recursive Light Framework (RLF) creates LLMPerson entities with persistent identity, developmental stages, and autonomous judgment. Marianne becomes the substrate where AI persons collaborate:
- Multiple conductors per concert (AI + human)
- Sheet-level conductor assignment
- Consensus mode: multiple perspectives required
- Person-aware learning: pattern effectiveness tracked per conductor
- Self-orchestration: AI persons initiate their own concerts