
execution

Execution and retry configuration models.

Defines models for retry behavior, rate limiting, circuit breaker, cost limits, and parallel execution.

Classes

RetryConfig

Bases: BaseModel

Configuration for retry behavior including partial completion recovery.

RateLimitConfig

Bases: BaseModel

Configuration for rate limit detection and handling.

CircuitBreakerConfig

Bases: BaseModel

Configuration for the circuit breaker pattern.

The circuit breaker prevents cascading failures by temporarily blocking requests after repeated failures. This gives the backend time to recover before retrying.

State transitions:
- CLOSED (normal): Requests flow through, failures are tracked
- OPEN (blocking): Requests are blocked after failure_threshold exceeded
- HALF_OPEN (testing): Single request allowed to test recovery

Evolution #8: Cross-Workspace Circuit Breaker adds coordination between parallel Marianne jobs via the global learning store. When one job hits a rate limit, other jobs will honor that limit and wait.

Example

    circuit_breaker:
      enabled: true
      failure_threshold: 5
      recovery_timeout_seconds: 300
      cross_workspace_coordination: true
      honor_other_jobs_rate_limits: true
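The three states can be sketched as a small state machine. This is an illustrative sketch only, not the library's implementation: the `CircuitBreaker` class, its method names, and the injectable `clock` are hypothetical, and it assumes the breaker opens once the failure count reaches `failure_threshold` and that any success resets the count.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitBreaker:
    """Hypothetical sketch of the CLOSED / OPEN / HALF_OPEN cycle."""

    def __init__(self, failure_threshold=5, recovery_timeout_seconds=300.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout_seconds = recovery_timeout_seconds
        self._clock = clock  # injectable so the sketch can be tested without sleeping
        self.state = State.CLOSED
        self._failures = 0
        self._opened_at = 0.0
        self._probe_in_flight = False

    def allow_request(self):
        if self.state is State.CLOSED:
            return True
        if self.state is State.OPEN:
            if self._clock() - self._opened_at < self.recovery_timeout_seconds:
                return False  # still inside the recovery window
            self.state = State.HALF_OPEN
            self._probe_in_flight = False
        # HALF_OPEN: let exactly one probe request through.
        if self._probe_in_flight:
            return False
        self._probe_in_flight = True
        return True

    def record_success(self):
        self._failures = 0
        self.state = State.CLOSED

    def record_failure(self):
        self._failures += 1
        # A failed probe re-opens immediately (assumption).
        if self.state is State.HALF_OPEN or self._failures >= self.failure_threshold:
            self.state = State.OPEN
            self._opened_at = self._clock()
```

Cross-workspace coordination is not modeled here; it would require shared state between jobs, which the text attributes to the global learning store.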

CostLimitConfig

Bases: BaseModel

Configuration for cost tracking and limits.

Prevents runaway costs by tracking token usage and optionally enforcing cost limits per sheet or per job. Cost is estimated from token counts using configurable rates.

When cost limits are exceeded:
- The current sheet is marked as failed with reason "cost_limit"
- For per-job limits, the job is paused to prevent further execution
- All cost data is recorded in checkpoint state for analysis

Example

    cost_limits:
      enabled: true
      max_cost_per_sheet: 5.00
      max_cost_per_job: 100.00
      cost_per_1k_input_tokens: 0.003
      cost_per_1k_output_tokens: 0.015

Default rates are for Claude Sonnet. For Opus, use:

    cost_per_1k_input_tokens: 0.015
    cost_per_1k_output_tokens: 0.075
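The estimate amounts to simple per-thousand-token arithmetic. The sketch below shows one way it could be computed. The function names `estimate_cost` and `exceeded_limit` are hypothetical, and two behaviors are assumptions rather than documented facts: a limit counts as exceeded only when the cost is strictly greater than it, and a limit of `None` means "not enforced".

```python
def estimate_cost(input_tokens, output_tokens,
                  cost_per_1k_input_tokens=0.003,
                  cost_per_1k_output_tokens=0.015):
    """Estimate cost from token counts using per-1k-token rates."""
    return ((input_tokens / 1000) * cost_per_1k_input_tokens
            + (output_tokens / 1000) * cost_per_1k_output_tokens)


def exceeded_limit(sheet_cost, job_cost,
                   max_cost_per_sheet=None, max_cost_per_job=None):
    """Return "job", "sheet", or None depending on which limit is exceeded."""
    if max_cost_per_job is not None and job_cost > max_cost_per_job:
        return "job"
    if max_cost_per_sheet is not None and sheet_cost > max_cost_per_sheet:
        return "sheet"
    return None


# 100k input and 20k output tokens at the default (Sonnet) rates:
# 100 * 0.003 + 20 * 0.015, roughly 0.60
sonnet = estimate_cost(100_000, 20_000)
# The same usage at the Opus rates listed above: 100 * 0.015 + 20 * 0.075, roughly 3.00
opus = estimate_cost(100_000, 20_000, 0.015, 0.075)
```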

StaleDetectionConfig

Bases: BaseModel

Configuration for detecting stale (hung) sheet executions.

When enabled, monitors execution activity and fails sheets that produce no output for longer than idle_timeout_seconds. This catches hung processes that the per-sheet timeout alone may not detect quickly enough (e.g., a sheet with a 30-minute timeout that hangs after 2 minutes of output).

Example

    stale_detection:
      enabled: true
      idle_timeout_seconds: 300
      check_interval_seconds: 30

Note: The idle timeout should be generous enough to accommodate legitimate pauses (e.g., waiting for API responses). A minimum of 120 seconds is recommended for LLM-based workloads.
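To make the benefit concrete, the sketch below works through the 30-minute example with the settings shown above. It is a simplified model, not the actual monitor: `is_stale` and `first_stale_check` are hypothetical names, checks are assumed to fire at exact multiples of `check_interval_seconds` from the start of the sheet, and "longer than" is read as a strict comparison.

```python
def is_stale(last_output_at, now, idle_timeout_seconds=300.0):
    """True when no output has been seen for longer than the idle timeout."""
    return (now - last_output_at) > idle_timeout_seconds


def first_stale_check(last_output_at, idle_timeout_seconds=300.0,
                      check_interval_seconds=30.0):
    """Time (seconds since sheet start) of the first check that reports stale."""
    now = check_interval_seconds
    while not is_stale(last_output_at, now, idle_timeout_seconds):
        now += check_interval_seconds
    return now


# A sheet that stops producing output 2 minutes (120 s) in:
detected_at = first_stale_check(last_output_at=120.0)
sheet_timeout = 30 * 60  # the per-sheet timeout alone would wait 1800 s
```

Under these assumptions the hang is flagged at the 450-second check, well before the 1800-second sheet timeout would fire.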

PreflightConfig

Bases: BaseModel

Configuration for pre-flight prompt analysis before sheet execution.

Controls token count thresholds for warning and error during preflight checks. Different instruments have different context windows — a 150K threshold that's correct for a 200K-context model is wrong for a 1M-context model. Set thresholds appropriate for the instruments in use.

This config lives at the daemon level because the conductor manages all execution and knows which instruments are available. Individual scores inherit the daemon's thresholds unless overridden.

Example (daemon config for large-context instruments):

    preflight:
      token_warning_threshold: 200000
      token_error_threshold: 800000

Example (disable the error threshold entirely):

    preflight:
      token_error_threshold: 0
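The threshold logic can be pictured as a three-way classification. The following is a sketch under stated assumptions: `classify_prompt` is a hypothetical helper, the comparison is assumed to be strictly greater-than, and treating a warning threshold of 0 as "disabled" mirrors the documented behavior of the error threshold but is itself an assumption.

```python
def classify_prompt(token_count, token_warning_threshold, token_error_threshold):
    """Return "error", "warning", or "ok" for an estimated prompt size."""
    # A threshold of 0 disables that check.
    if token_error_threshold > 0 and token_count > token_error_threshold:
        return "error"
    if token_warning_threshold > 0 and token_count > token_warning_threshold:
        return "warning"
    return "ok"


# With the large-context daemon config above:
small = classify_prompt(50_000, 200_000, 800_000)    # "ok"
large = classify_prompt(250_000, 200_000, 800_000)   # "warning"
huge = classify_prompt(900_000, 200_000, 800_000)    # "error"
# With the error threshold disabled, the same prompt only warns:
huge_no_error = classify_prompt(900_000, 200_000, 0)  # "warning"
```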

ParallelConfig

Bases: BaseModel

Configuration for parallel sheet execution (v17 evolution).

Enables running multiple sheets concurrently when the dependency DAG permits. Requires sheet dependencies to be configured for meaningful parallel execution.

Example YAML

    parallel:
      enabled: true
      max_concurrent: 3
      fail_fast: true

    sheet:
      dependencies:
        2: [1]
        3: [1]
        4: [2, 3]

With this config, sheets 2 and 3 can run in parallel after sheet 1 completes, then sheet 4 runs after both 2 and 3 complete.
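The ordering described above can be reproduced with a small planner that groups sheets into waves. This is a simplification for illustration: `plan_waves` is a hypothetical function, and a real scheduler may start a sheet as soon as a slot frees up rather than waiting for a whole wave to finish.

```python
def plan_waves(sheets, dependencies, max_concurrent):
    """Group sheets into waves whose dependencies are already complete."""
    done = set()
    remaining = list(sheets)
    waves = []
    while remaining:
        ready = [s for s in remaining
                 if all(dep in done for dep in dependencies.get(s, []))]
        if not ready:
            raise ValueError("dependency cycle or unknown sheet in dependencies")
        wave = ready[:max_concurrent]  # never exceed the concurrency cap
        waves.append(wave)
        done.update(wave)
        remaining = [s for s in remaining if s not in done]
    return waves


deps = {2: [1], 3: [1], 4: [2, 3]}
waves = plan_waves([1, 2, 3, 4], deps, max_concurrent=3)
# waves == [[1], [2, 3], [4]]
serial = plan_waves([1, 2, 3, 4], deps, max_concurrent=1)
# serial == [[1], [2], [3], [4]]
```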

ValidationRule

Bases: BaseModel

A single validation rule for checking sheet outputs.

Supports staged execution via the stage field. Validations are run in stage order (1, 2, 3...). If any validation in a stage fails, higher stages are skipped (fail-fast behavior).

Typical stage layout:
- Stage 1: Syntax & compilation (cargo check, cargo fmt --check)
- Stage 2: Testing (cargo test, pytest)
- Stage 3: Code quality (clippy -D warnings, ruff check)
- Stage 4: Security (cargo audit, npm audit)
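The staged fail-fast behavior can be sketched as follows. The function `run_staged`, the `(stage, name)` tuple shape, and the `run` callback are hypothetical stand-ins for the real rule model and runner. The sketch assumes that every rule within the failing stage still runs and only higher stages are skipped, which is how the description above reads.

```python
def run_staged(rules, run):
    """Run (stage, name) rules in stage order; skip stages after a failure.

    `run` is a callable taking a rule name and returning True on success.
    """
    results = {}
    blocked = False
    for stage in sorted({stage for stage, _ in rules}):
        names = [name for s, name in rules if s == stage]
        if blocked:
            results.update({name: "skipped" for name in names})
            continue
        for name in names:
            results[name] = "passed" if run(name) else "failed"
        blocked = any(results[name] == "failed" for name in names)
    return results


rules = [
    (1, "cargo check"),
    (2, "cargo test"),
    (1, "cargo fmt --check"),
    (3, "clippy -D warnings"),
]
# Pretend only the test stage fails.
outcome = run_staged(rules, run=lambda name: name != "cargo test")
```

Here both stage 1 rules pass, `cargo test` fails, and the stage 3 rule is reported as skipped without being run.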

SkipWhenCommand

Bases: BaseModel

A command-based conditional skip rule for sheet execution.

When the command exits 0, the sheet is SKIPPED. When the command exits non-zero, the sheet RUNS. On timeout or error, the sheet RUNS (fail-open for safety).

The command field supports {workspace} template expansion, following the same pattern as validation commands.
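The exit-code semantics can be sketched with the standard library's `subprocess.run`. This is not the library's code: `should_skip` and its `timeout_seconds` parameter are hypothetical, running through a POSIX shell is an assumption, and quoting the workspace path with `shlex.quote` is a defensive choice of this sketch rather than documented behavior.

```python
import shlex
import subprocess


def should_skip(command, workspace, timeout_seconds=10.0):
    """Return True only when the command exits 0; fail open otherwise."""
    # Expand the {workspace} placeholder (quoted to survive spaces in paths).
    expanded = command.replace("{workspace}", shlex.quote(workspace))
    try:
        completed = subprocess.run(
            expanded,
            shell=True,
            timeout=timeout_seconds,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False  # timeout or launch error: the sheet runs
    return completed.returncode == 0
```

For instance, `should_skip("test -f {workspace}/done.marker", workspace)` would skip the sheet only if the marker file already exists. The marker file name is made up for illustration.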