Instrument Guide¶

Marianne uses instruments to execute scores. An instrument is any AI tool that can receive a prompt and produce output — Claude Code, Gemini CLI, Codex CLI, Aider, Goose, or any CLI tool you configure. The conductor assigns musicians (AI agents) to instruments and manages execution across all of them.

This guide covers how to use existing instruments, how to add your own, and how the instrument system works.

Quick Reference¶

# See what instruments are available
mzt instruments list

# Check if a specific instrument is ready
mzt instruments check gemini-cli

# Full environment health check
mzt doctor

Built-in Instruments¶

Marianne ships with 10 instruments: 4 native backends (built into the Python code) and 6 config-driven profiles (YAML files).

Native Backends¶

These are built into Marianne and require no configuration beyond installing the tool and authenticating:

Name	Tool	Auth
`claude_cli`	Claude CLI (`claude`)	`claude login`
`anthropic_api`	Anthropic Messages API	`ANTHROPIC_API_KEY` env var
`ollama`	Ollama local server	None (runs locally)
`recursive_light`	Recursive Light Framework	RLF credentials

Config-Driven Profiles¶

These ship as YAML profiles bundled with Marianne and are loaded at conductor startup:

Name	Tool	Auth
`claude-code`	Claude Code CLI	`claude login`
`gemini-cli`	Google Gemini CLI	`GOOGLE_API_KEY` or `gcloud auth`
`codex-cli`	OpenAI Codex CLI	`OPENAI_API_KEY` or `CODEX_API_KEY`
`cline-cli`	Cline CLI	Provider API key
`aider`	Aider	Provider API key (`OPENAI_API_KEY`, etc.)
`goose`	Block's Goose	Provider API key

To check which instruments are available on your system:

mzt instruments list

Using Instruments in Scores¶

The `instrument:` Field¶

Specify which instrument to use with the instrument: field at the top level of your score:

name: my-score
workspace: ./workspaces/my-score

instrument: gemini-cli

sheet:
  size: 1
  total_items: 3

prompt:
  template: |
    Write a summary of {{ workspace }}/input.md

The `backend:` Field (Legacy)¶

The older backend: field continues to work unchanged:

backend:
  type: claude_cli
  skip_permissions: true
  timeout_seconds: 600

Both instrument: and backend: select the tool that executes your score. Setting both in the same score is a validation error. New scores should prefer instrument:.

Instrument Configuration¶

Override instrument defaults with the instrument_config: field:

instrument: gemini-cli
instrument_config:
  model: gemini-2.5-flash       # Use the cheaper model
  timeout_seconds: 600           # Shorter timeout

These overrides are flat key-value pairs that adjust the resolved instrument profile without replacing it.
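As a rough illustration of "adjust without replacing", the sketch below treats the profile defaults and the score's instrument_config as flat dictionaries and layers one over the other. The helper name `apply_overrides` and the default values are hypothetical; Marianne's internal merge logic may differ.

```python
def apply_overrides(profile_defaults: dict, instrument_config: dict) -> dict:
    """Return a new settings dict with score-level overrides layered on top.

    Hypothetical helper: keys absent from instrument_config keep their
    profile value, so the profile is adjusted rather than replaced.
    """
    return {**profile_defaults, **instrument_config}


# Illustrative defaults only; real values come from the instrument profile.
defaults = {"model": "gemini-2.5-pro", "timeout_seconds": 1800, "kind": "cli"}
overrides = {"model": "gemini-2.5-flash", "timeout_seconds": 600}

resolved = apply_overrides(defaults, overrides)
print(resolved)
```

Keys that the score does not mention (here `kind`) pass through unchanged, and the original defaults dictionary is not mutated.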

Adding Your Own Instruments¶

Any CLI tool that accepts a prompt and produces output can become a Marianne instrument. You write a YAML profile describing the tool's CLI interface and drop it in a directory Marianne scans.

Profile Directories¶

Marianne loads instrument profiles from three directories, in order:

Built-in — shipped with Marianne (lowest precedence)
Organization — ~/.marianne/instruments/ (shared across all projects)
Venue — .marianne/instruments/ (project-specific, highest precedence)

Later directories override earlier ones on name collision. This lets you customize a built-in profile for your project without modifying Marianne's source.
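The override rule can be pictured as a layered dictionary merge. The following is a minimal sketch, not Marianne's loader: `merge_profiles` is a hypothetical name, and each layer is assumed to be an already-parsed mapping from profile name to profile data.

```python
def merge_profiles(layers: list) -> dict:
    """Merge profile layers given in precedence order (lowest first).

    Each layer maps a profile name to its parsed contents. A later layer
    replaces an earlier profile of the same name as a whole.
    """
    registry = {}
    for layer in layers:
        registry.update(layer)
    return registry


# Stand-ins for the parsed contents of the three directories.
builtin = {"gemini-cli": {"source": "built-in"}, "aider": {"source": "built-in"}}
organization = {"gemini-cli": {"source": "organization"}}
venue = {"gemini-cli": {"source": "venue"}, "my-tool": {"source": "venue"}}

registry = merge_profiles([builtin, organization, venue])
for name in sorted(registry):
    print(name, "<-", registry[name]["source"])
```

In this sketch a colliding profile is replaced entirely rather than merged field by field; whether Marianne merges at a finer granularity is not stated here, so treat that as an assumption.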

Writing a Profile¶

Here is a minimal profile for a hypothetical CLI tool:

# ~/.marianne/instruments/my-tool.yaml

name: my-tool
display_name: "My Tool"
description: "Custom CLI agent for my project"
kind: cli

capabilities:
  - file_editing
  - shell_access

default_timeout_seconds: 1800

cli:
  command:
    executable: my-tool           # Binary name on PATH
    prompt_flag: "--prompt"       # How to pass the prompt
    auto_approve_flag: "--yes"    # How to skip confirmation dialogs
  output:
    format: text                  # Capture stdout as the result
  errors:
    rate_limit_patterns:
      - "rate.?limit"
      - "429"

Save it to ~/.marianne/instruments/my-tool.yaml, then verify:

mzt instruments check my-tool

Profile Reference¶

Top-Level Fields¶

Field	Required	Description
`name`	Yes	Unique identifier used in score YAML (`instrument: my-tool`)
`display_name`	Yes	Human-readable name for CLI output
`description`	No	Short description of the tool
`kind`	Yes	`cli` (v1) or `http` (v1.1+)
`capabilities`	No	Set of capability tags (see below)
`models`	No	List of available models with pricing and context windows
`default_model`	No	Model to use when none specified in the score
`default_timeout_seconds`	No	Default execution timeout (default: 1800)

Capability Tags¶

Capabilities describe what an instrument can do. They are informational in v1; future versions of the conductor will use them for instrument selection.

Tag	Meaning
`tool_use`	Can call external tools
`file_editing`	Can read and write files
`shell_access`	Can execute shell commands
`vision`	Can process images
`mcp`	Supports Model Context Protocol servers
`structured_output`	Can produce JSON output
`streaming`	Supports streaming responses
`thinking`	Has extended reasoning/thinking mode
`session_resume`	Can resume previous sessions
`code_mode`	Supports code-mode techniques (v1.1+)

`cli.command` — How to Build the Command¶

Field	Required	Default	Description
`executable`	Yes		Binary name (must be on PATH)
`subcommand`	No		Subcommand, e.g. `exec` for Codex
`prompt_flag`	No		Flag for the prompt (`-p`, `--message`). `null` = positional argument
`model_flag`	No		Flag for model selection (`--model`)
`auto_approve_flag`	No		Flag for auto-approval (`--yolo`, `--yes`)
`output_format_flag`	No		Flag for output format (`--output-format`, `--json`)
`output_format_value`	No		Value for output format flag (`json`). `null` = boolean flag
`system_prompt_flag`	No		Flag for system prompt
`allowed_tools_flag`	No		Flag for restricting tools
`mcp_config_flag`	No		Flag for MCP server configuration
`timeout_flag`	No		Flag for per-execution timeout
`working_dir_flag`	No		Flag for working directory. `null` = subprocess cwd
`extra_flags`	No	`[]`	Fixed flags always appended
`env`	No	`{}`	Environment variables. `${VAR}` references expand from `os.environ`
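To make the `${VAR}` expansion in `env` concrete, here is a small sketch of one way such references could be resolved. The function name `expand_env` is hypothetical, and replacing an unset variable with an empty string is an assumption; Marianne may instead raise an error or leave the reference untouched.

```python
import re
from typing import Mapping

# Matches ${NAME} where NAME is a conventional environment variable name.
_VAR_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(env_spec: Mapping[str, str], environ: Mapping[str, str]) -> dict:
    """Expand ${VAR} references in the profile's env mapping.

    Assumption: an unset variable expands to an empty string.
    """
    def substitute(match):
        return environ.get(match.group(1), "")

    return {key: _VAR_REF.sub(substitute, value) for key, value in env_spec.items()}


# In real use the second argument would be os.environ; a plain dict keeps
# the example deterministic.
fake_environ = {"INTERNAL_AGENT_TOKEN": "s3cret"}
expanded = expand_env({"AGENT_TOKEN": "${INTERNAL_AGENT_TOKEN}"}, fake_environ)
print(expanded)
```

Only the braced form is handled in this sketch; a bare `$VAR` is left as-is.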

`cli.output` — How to Parse the Result¶

Field	Required	Default	Description
`format`	No	`text`	`text`, `json`, or `jsonl`
`result_path`	No		JSON dot-path to response text (`result`, `response`)
`error_path`	No		JSON dot-path to error message (`error.message`)
`completion_event_type`	No		For JSONL: event type signaling completion
`completion_event_filter`	No		For JSONL: additional key-value filter
`input_tokens_path`	No		JSON dot-path to input token count
`output_tokens_path`	No		JSON dot-path to output token count

Output format modes:

text — Stdout is the result. No structured parsing. Use this for tools without JSON output (like Aider).
json — Parse stdout as JSON. Extract the response via result_path (dot notation: key.subkey, key[0], key.* for wildcard).
jsonl — Split stdout into JSON lines. Find the completion event matching completion_event_type and completion_event_filter.
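The two structured modes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Marianne's parser: `resolve_path` and `find_completion_event` are hypothetical names, the `*` wildcard is assumed to return the first child that resolves, the event type is assumed to live under a `type` key, and the first matching event is assumed to win.

```python
import json
import re

# A path step is either a key name or a bracketed list index.
_STEP = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def _walk(node, steps):
    if not steps:
        return node
    head, rest = steps[0], steps[1:]
    if head == "*":
        # Assumption: try each child in order, return the first that resolves.
        if isinstance(node, dict):
            children = list(node.values())
        elif isinstance(node, list):
            children = node
        else:
            children = []
        for child in children:
            found = _walk(child, rest)
            if found is not None:
                return found
        return None
    if isinstance(head, int):
        if isinstance(node, list) and head < len(node):
            return _walk(node[head], rest)
        return None
    if isinstance(node, dict) and head in node:
        return _walk(node[head], rest)
    return None


def resolve_path(data, path):
    """Follow a dot-path such as 'output.text' or 'choices[0].text'."""
    steps = [int(index) if index else name for name, index in _STEP.findall(path)]
    return _walk(data, steps)


def find_completion_event(stdout, event_type, event_filter=None, type_key="type"):
    """Return the first JSON line whose type and filter keys all match."""
    for line in stdout.splitlines():
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank lines and non-JSON noise
        if not isinstance(event, dict) or event.get(type_key) != event_type:
            continue
        if all(event.get(k) == v for k, v in (event_filter or {}).items()):
            return event
    return None


doc = {"output": {"text": "done"}, "usage": {"prompt_tokens": 42}, "choices": [{"text": "hi"}]}
print(resolve_path(doc, "output.text"))
print(resolve_path(doc, "choices[0].text"))
print(resolve_path(doc, "*.prompt_tokens"))

stream = '{"type": "progress", "pct": 50}\nnot json\n{"type": "result", "status": "ok", "text": "final"}\n'
print(find_completion_event(stream, "result", {"status": "ok"}))
```

Running a captured sample of your tool's output through a helper like this is a quick way to confirm that a result_path or token path points where you expect before putting it in a profile.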

`cli.errors` — How to Detect Failures¶

Field	Default	Description
`success_exit_codes`	`[0]`	Exit codes that indicate success
`rate_limit_patterns`	`[]`	Regex patterns in stderr/stdout indicating rate limiting
`auth_error_patterns`	`[]`	Regex patterns indicating auth failures

These patterns supplement Marianne's built-in error classifier. When a pattern matches, the error is classified as RATE_LIMIT or AUTH_FAILURE and handled accordingly (rate limits pause the instrument; auth failures fail immediately).
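The pattern check described above can be approximated as follows. This is a sketch only: `classify_by_patterns` is a hypothetical name, checking rate-limit patterns before auth patterns is an assumption, and so is case-insensitive matching.

```python
import re


def classify_by_patterns(text, errors_config, ignore_case=True):
    """Return 'RATE_LIMIT', 'AUTH_FAILURE', or None for combined stdout/stderr text.

    Assumptions: rate-limit patterns are checked first, and matching is
    case-insensitive unless ignore_case is False.
    """
    flags = re.IGNORECASE if ignore_case else 0
    checks = (
        ("RATE_LIMIT", "rate_limit_patterns"),
        ("AUTH_FAILURE", "auth_error_patterns"),
    )
    for category, key in checks:
        for pattern in errors_config.get(key, []):
            if re.search(pattern, text, flags):
                return category
    return None


errors_config = {
    "rate_limit_patterns": ["rate.?limit", "throttled"],
    "auth_error_patterns": ["unauthorized", "token.*expired"],
}

for sample in (
    "Error: request throttled, retry later",
    "401 Unauthorized",
    "Token has expired",
    "Segmentation fault",
):
    print(repr(sample), "->", classify_by_patterns(sample, errors_config))
```

A `None` result here stands for "no profile pattern matched", at which point the built-in classifier would take over.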

`models` — Available Models¶

Each model entry describes capacity and pricing:

models:
  - name: gemini-2.5-pro
    context_window: 1000000      # Max context in tokens
    cost_per_1k_input: 0.00125   # USD per 1K input tokens
    cost_per_1k_output: 0.005    # USD per 1K output tokens
    max_output_tokens: 65536     # Max output tokens (null if unlimited)

Model metadata enables cost tracking in mzt status and context budget calculation. If you omit models, cost tracking shows $0.00 and context budget uses a conservative default.
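The pricing fields imply a simple per-thousand-token calculation. The sketch below shows that arithmetic using the prices from the example entry above; the function name `estimate_cost` is hypothetical, and how Marianne rounds or aggregates costs is not covered here.

```python
def estimate_cost(model: dict, input_tokens: int, output_tokens: int) -> float:
    """Estimate USD cost from per-1K-token prices in a model entry."""
    input_cost = (input_tokens / 1000) * model["cost_per_1k_input"]
    output_cost = (output_tokens / 1000) * model["cost_per_1k_output"]
    return input_cost + output_cost


model = {
    "name": "gemini-2.5-pro",
    "cost_per_1k_input": 0.00125,
    "cost_per_1k_output": 0.005,
}

# 12,000 input tokens and 3,000 output tokens:
# 12 * 0.00125 + 3 * 0.005 = 0.015 + 0.015 = 0.03 USD
cost = estimate_cost(model, input_tokens=12_000, output_tokens=3_000)
print(f"${cost:.4f}")
```

With no pricing fields available there is nothing to multiply by, which is why a profile without a models section reports $0.00.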

How the Instrument System Works¶

Loading Order¶

At conductor startup:

Native instruments are registered first (4 built-in Python backends)
Built-in YAML profiles are loaded from Marianne's bundled instruments directory
Organization profiles from ~/.marianne/instruments/ override built-ins
Venue profiles from .marianne/instruments/ override everything

The result is a single InstrumentRegistry mapping names to profiles. When a score references instrument: gemini-cli, the conductor looks up that name in the registry and creates a PluginCliBackend configured from the profile.

Score Resolution¶

When a score is submitted, the instrument is resolved:

If the score has instrument: — look up the name in the registry
If the score has backend: — use the native backend directly
If neither — default to claude_cli

Every path produces a Backend instance that the conductor uses to execute sheets.
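The resolution rules above can be sketched as a small decision function. The name `resolve_instrument`, the returned tuple shape, and the exception types are all hypothetical; they only illustrate the order of the checks.

```python
def resolve_instrument(score: dict, registry: dict) -> tuple:
    """Decide which backend a score should use.

    Returns a (source, name) pair, where source is 'registry' for an
    instrument: lookup and 'native' for a backend: or default resolution.
    """
    has_instrument = "instrument" in score
    has_backend = "backend" in score
    if has_instrument and has_backend:
        raise ValueError("score sets both instrument: and backend:")
    if has_instrument:
        name = score["instrument"]
        if name not in registry:
            raise KeyError(f"unknown instrument: {name}")
        return ("registry", name)
    if has_backend:
        return ("native", score["backend"]["type"])
    return ("native", "claude_cli")  # documented default


registry = {"gemini-cli": {}, "claude-code": {}}

print(resolve_instrument({"instrument": "gemini-cli"}, registry))
print(resolve_instrument({"backend": {"type": "ollama"}}, registry))
print(resolve_instrument({}, registry))
```

Rejecting the both-fields case up front mirrors the validation error mentioned for scores that set instrument: and backend: together.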

Command Construction¶

For CLI instruments, the PluginCliBackend builds the command from the profile:

[executable] [subcommand] [auto_approve_flag] [output_format_flag value]
[model_flag model_name] [prompt_flag] <prompt> [...extra_flags]

The prompt is passed via prompt_flag (or as a positional argument if prompt_flag is null). The backend handles output parsing, token extraction, and error detection based on the profile configuration.
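A minimal sketch of that assembly order, assuming the `cli.command` section has been parsed into a dictionary. `build_command` is a hypothetical name and this is not the actual PluginCliBackend code; flags from the reference table that do not appear in the template above (system prompt, MCP config, and so on) are left out.

```python
from typing import Optional


def build_command(command_cfg: dict, prompt: str, model: Optional[str] = None) -> list:
    """Assemble an argv list in the documented order."""
    argv = [command_cfg["executable"]]
    if command_cfg.get("subcommand"):
        argv.append(command_cfg["subcommand"])
    if command_cfg.get("auto_approve_flag"):
        argv.append(command_cfg["auto_approve_flag"])
    if command_cfg.get("output_format_flag"):
        argv.append(command_cfg["output_format_flag"])
        # A null output_format_value means the flag is a boolean switch.
        if command_cfg.get("output_format_value") is not None:
            argv.append(command_cfg["output_format_value"])
    if model and command_cfg.get("model_flag"):
        argv += [command_cfg["model_flag"], model]
    # A null prompt_flag means the prompt is a positional argument.
    if command_cfg.get("prompt_flag"):
        argv.append(command_cfg["prompt_flag"])
    argv.append(prompt)
    argv.extend(command_cfg.get("extra_flags", []))
    return argv


command_cfg = {
    "executable": "internal-agent",
    "prompt_flag": "--task",
    "model_flag": "--model",
    "auto_approve_flag": "--non-interactive",
    "output_format_flag": "--format",
    "output_format_value": "json",
}

# "example-model" is a placeholder model name.
argv = build_command(command_cfg, "Fix the failing test", model="example-model")
print(argv)
```

Building an argv list rather than a single shell string means the prompt never needs shell quoting, which matters for multi-line prompts.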

Error Handling¶

Marianne classifies execution errors into categories:

RATE_LIMIT — Detected via rate_limit_patterns or HTTP 429. The conductor pauses the instrument and schedules a retry when it recovers. Rate limits do not count as failures.
AUTH_FAILURE — Detected via auth_error_patterns. The sheet fails immediately (no retry).
TRANSIENT — Timeouts, killed processes, temporary failures. The conductor retries with exponential backoff.
EXECUTION_ERROR — Other non-zero exit codes. Retried up to max_retries.
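For a sense of what exponential backoff looks like for TRANSIENT errors, here is a sketch of a delay schedule. The base delay, the cap, and the function name `backoff_delays` are hypothetical values chosen for illustration; the conductor's real parameters are not documented in this guide.

```python
def backoff_delays(max_retries: int, base_seconds: float = 2.0, cap_seconds: float = 60.0) -> list:
    """Return the wait before each retry: base * 2**attempt, capped.

    All three parameters are illustrative assumptions.
    """
    return [min(cap_seconds, base_seconds * 2 ** attempt) for attempt in range(max_retries)]


# How each category is handled, paraphrased from the list above.
ACTIONS = {
    "RATE_LIMIT": "pause the instrument, retry on recovery (not counted as a failure)",
    "AUTH_FAILURE": "fail the sheet immediately",
    "TRANSIENT": "retry with exponential backoff",
    "EXECUTION_ERROR": "retry up to max_retries",
}

print(backoff_delays(6))
```

The cap keeps a long run of transient failures from pushing the wait time out indefinitely.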

Examples¶

Using Gemini CLI for a Research Score¶

name: research-with-gemini
workspace: ./workspaces/research

instrument: gemini-cli
instrument_config:
  model: gemini-2.5-flash    # Cheaper for research tasks

sheet:
  size: 1
  total_items: 3

prompt:
  template: |
    {% if sheet_num == 1 %}
    Research the topic and write an outline in {{ workspace }}/outline.md
    {% elif sheet_num == 2 %}
    Expand the outline into a full report at {{ workspace }}/report.md
    {% else %}
    Review and polish {{ workspace }}/report.md for clarity and accuracy
    {% endif %}

validations:
  - type: file_exists
    path: "{workspace}/report.md"
    condition: "sheet_num >= 2"

Custom Instrument for a Private Tool¶

# .marianne/instruments/internal-agent.yaml
name: internal-agent
display_name: "Internal Agent"
description: "Company internal coding agent"
kind: cli

capabilities:
  - file_editing
  - shell_access
  - tool_use

default_timeout_seconds: 3600

cli:
  command:
    executable: internal-agent
    prompt_flag: "--task"
    model_flag: "--model"
    auto_approve_flag: "--non-interactive"
    output_format_flag: "--format"
    output_format_value: "json"
    env:
      AGENT_TOKEN: "${INTERNAL_AGENT_TOKEN}"
  output:
    format: json
    result_path: "output.text"
    input_tokens_path: "usage.prompt_tokens"
    output_tokens_path: "usage.completion_tokens"
  errors:
    rate_limit_patterns:
      - "rate.?limit"
      - "throttled"
    auth_error_patterns:
      - "unauthorized"
      - "token.*expired"

Then use it in a score:

instrument: internal-agent

Troubleshooting¶

Instrument not found¶

mzt instruments check my-tool
  Binary: my-tool ✗ not found

The executable is not on your PATH. Either install the tool or specify the full path in your instrument profile's executable field.

Rate limits not detected¶

If your instrument hits rate limits but Marianne doesn't detect them, add the text of the rate limit message to cli.errors.rate_limit_patterns. Each pattern is a regular expression:

errors:
  rate_limit_patterns:
    - "rate.?limit"           # matches "rate limit", "rate_limit"
    - "429"                   # HTTP status code in output
    - "quota.?exceeded"       # quota limit messages
    - "too.?many.?requests"   # common pattern
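Before adding patterns to a profile, it can help to try them against captured output. The snippet below is a hedged sketch using Python's `re` module; `first_match` is a hypothetical helper, the sample lines are invented, and case-insensitive matching is an assumption about how the patterns are applied.

```python
import re

patterns = ["rate.?limit", "429", "quota.?exceeded", "too.?many.?requests"]


def first_match(line, candidates=patterns):
    """Return the first pattern that matches the line, or None."""
    for pattern in candidates:
        if re.search(pattern, line, re.IGNORECASE):
            return pattern
    return None


samples = [
    "Error: rate limit reached",
    "rate_limit_error",
    "HTTP 429 Too Many Requests",
    "Quota exceeded for project",
    "connection reset by peer",
    "processed 14290 tokens",  # contains the digits 429
]

for line in samples:
    print(repr(line), "->", first_match(line))
```

The last sample shows that a bare `429` also matches inside longer numbers. If that causes false positives with your tool, a word-bounded pattern such as `\b429\b` is one way to tighten it.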

No cost tracking¶

If mzt status shows $0.00 for all sheets, your instrument profile likely has no models section with pricing. Add model entries with cost_per_1k_input and cost_per_1k_output to enable cost tracking.

Token counts not extracted¶

If token usage is zero, check that cli.output.input_tokens_path and cli.output.output_tokens_path point to the correct JSON paths in your tool's output. Use the wildcard syntax (key.*) for nested structures where the exact key varies.