baton
The Baton — Marianne's event-driven execution heart.
The baton is the conductor's primary tool. It doesn't decide what to play — the score does. It doesn't decide how to play — the musicians do. The baton controls when and how much: tempo, dynamics, cues, and fermatas.
The baton replaces the current monolithic execution model where
JobService.start_job() runs all sheets sequentially. Instead, the baton
manages sheets across all jobs in a single event-driven loop, dispatching
them to execution when they're ready and the system can handle them.
Package layout::

    events.py        - All BatonEvent types (dataclasses)
    timer.py         - Timer wheel (priority queue of future events)
    state.py         - Baton state models (sheet/instrument/job tracking)
    core.py          - Event inbox, main loop, sheet registry
    musician.py      - Single-attempt sheet execution (play once, report)
    backend_pool.py  - Per-instrument backend instance management
    adapter.py       - Wires baton into conductor (step 28)
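The layout describes timer.py as a priority queue of future events. A minimal sketch of that idea follows, assuming a plain min-heap keyed on fire time. `TimerWheelSketch`, `schedule`, and `pop_due` are hypothetical names for illustration and are not taken from the real module.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Any


@dataclass(order=True)
class _Scheduled:
    fire_at: float
    seq: int
    event: Any = field(compare=False)  # payload is never compared


class TimerWheelSketch:
    """Hypothetical stand-in for timer.py: a min-heap of future events."""

    def __init__(self) -> None:
        self._heap: list = []
        self._seq = itertools.count()  # tie-breaker keeps insertion order stable

    def schedule(self, fire_at: float, event: Any) -> None:
        heapq.heappush(self._heap, _Scheduled(fire_at, next(self._seq), event))

    def pop_due(self, now: float) -> list:
        """Return every event whose fire time is <= now, earliest first."""
        due = []
        while self._heap and self._heap[0].fire_at <= now:
            due.append(heapq.heappop(self._heap).event)
        return due


wheel = TimerWheelSketch()
wheel.schedule(30.0, "RetryDue(sheet=2)")
wheel.schedule(10.0, "RateLimitExpired(claude-code)")
wheel.schedule(60.0, "JobTimeout(j1)")
due_events = wheel.pop_due(now=30.0)
```

In a design like this, the due events would be pushed into the same inbox that the main loop reads, so timers and external events share one ordering point.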
Classes¶
BatonAdapter
¶
BatonAdapter(*, event_bus=None, max_concurrent_sheets=10, state_sync_callback=None, persist_callback=None)
Bridges the conductor (JobManager) and the baton (BatonCore).
The adapter owns:
- A BatonCore instance (event loop, sheet registry, state machine)
- A mapping of job_id → Sheet[] (for prompt rendering at dispatch time)
- Active musician tasks (asyncio.Task per dispatched sheet)
The adapter does NOT own:
- The BackendPool (injected by the manager)
- The EventBus (injected by the manager)
- The CheckpointState (managed by the manager's state backend)
Usage::

    adapter = BatonAdapter(event_bus=bus)
    adapter.set_backend_pool(pool)

    # Register a job
    adapter.register_job("j1", sheets, deps)

    # Run the baton (blocks until shutdown)
    await adapter.run()

Initialize the BatonAdapter.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| event_bus | EventBus \| None | Optional EventBus for publishing events to subscribers. | None |
| max_concurrent_sheets | int | Global concurrency ceiling for dispatch. | 10 |
| state_sync_callback | StateSyncCallback \| None | Deprecated; kept for backward compat. Phase 2 uses persist_callback instead. | None |
| persist_callback | PersistCallback \| None | Called with job_id after significant state transitions (terminal, dispatch) to persist CheckpointState to the registry. Replaces the sync layer. | None |
Source code in src/marianne/daemon/baton/adapter.py
Attributes¶
Functions¶
set_backend_pool
¶
Inject the BackendPool for backend acquisition.
Must be called before dispatching any sheets.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pool | BackendPool | The backend pool from the manager. | required |
Source code in src/marianne/daemon/baton/adapter.py
register_job
¶
register_job(job_id, sheets, dependencies, *, max_cost_usd=None, max_retries=3, max_completion=5, escalation_enabled=False, self_healing_enabled=False, prompt_config=None, parallel_enabled=False, cross_sheet=None, pacing_seconds=0.0, live_sheets=None)
Register a job with the baton for event-driven execution.
Converts Sheet entities to SheetExecutionState and registers them with the baton's sheet registry.
Phase 2: when live_sheets is provided, uses those SheetState
objects directly instead of creating new ones. This ensures the
baton writes to the same objects that live in _live_states,
eliminating the need for a sync layer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| job_id | str | Unique job identifier (conductor job_id). | required |
| sheets | list[Sheet] | Sheet entities from build_sheets(). | required |
| dependencies | dict[int, list[int]] | Dependency graph {sheet_num: [dep_nums]}. | required |
| max_cost_usd | float \| None | Optional per-job cost limit. | None |
| max_retries | int | Max normal retry attempts per sheet. | 3 |
| max_completion | int | Max completion mode attempts per sheet. | 5 |
| escalation_enabled | bool | Enter fermata on exhaustion. | False |
| self_healing_enabled | bool | Try self-healing on exhaustion. | False |
| prompt_config | PromptConfig \| None | Optional PromptConfig for full prompt rendering. When provided, creates a PromptRenderer for this job that handles the complete 9-layer prompt assembly pipeline. | None |
| parallel_enabled | bool | Whether parallel execution is enabled (for preamble concurrency warning). | False |
| cross_sheet | CrossSheetConfig \| None | Optional CrossSheetConfig for cross-sheet context (F-210). When provided, the adapter collects previous sheet outputs and workspace files at dispatch time. | None |
Source code in src/marianne/daemon/baton/adapter.py
deregister_job
¶
Remove a job from the adapter and baton.
Cleans up all per-job state including active tasks.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| job_id | str | The job to remove. | required |
Source code in src/marianne/daemon/baton/adapter.py
recover_job
¶
recover_job(job_id, sheets, dependencies, checkpoint, *, max_cost_usd=None, max_retries=3, max_completion=5, escalation_enabled=False, self_healing_enabled=False, prompt_config=None, parallel_enabled=False, cross_sheet=None, pacing_seconds=0.0, live_sheets=None)
Recover a job from a checkpoint after conductor restart.
Rebuilds baton state from the persisted CheckpointState. Terminal sheets (completed, failed, skipped) keep their status. In-progress sheets are reset to PENDING because their musicians died when the conductor restarted. Attempt counts are preserved to avoid infinite retries.
Design invariant: Checkpoint is the source of truth. The baton rebuilds from checkpoint, not the reverse.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| job_id | str | Unique job identifier. | required |
| sheets | list[Sheet] | Sheet entities from build_sheets(), built from the same config that produced the original job. | required |
| dependencies | dict[int, list[int]] | Dependency graph {sheet_num: [dep_nums]}. | required |
| checkpoint | CheckpointState | Persisted CheckpointState loaded from workspace. | required |
| max_cost_usd | float \| None | Optional per-job cost limit. | None |
| max_retries | int | Max normal retry attempts per sheet. | 3 |
| max_completion | int | Max completion mode attempts per sheet. | 5 |
| escalation_enabled | bool | Enter fermata on exhaustion. | False |
| self_healing_enabled | bool | Try self-healing on exhaustion. | False |
| prompt_config | PromptConfig \| None | Optional PromptConfig for prompt rendering. | None |
| parallel_enabled | bool | Whether parallel execution is enabled. | False |
| cross_sheet | CrossSheetConfig \| None | Optional CrossSheetConfig for cross-sheet context (F-210). | None |
Source code in src/marianne/daemon/baton/adapter.py
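The recovery rules described for recover_job (terminal sheets keep their status, in-progress sheets return to PENDING, attempt counts survive) can be sketched as a pure function. This is an illustration under stated assumptions: `CheckpointSheet` and `recovered_state` are hypothetical names, and the lowercase status strings stand in for the real status enum.

```python
from dataclasses import dataclass

# Terminal statuses named in the recover_job description.
TERMINAL = {"completed", "failed", "skipped"}


@dataclass
class CheckpointSheet:
    """Hypothetical persisted record for one sheet."""
    status: str
    attempts: int


def recovered_state(saved: CheckpointSheet) -> CheckpointSheet:
    """Rebuild one sheet's state from the checkpoint (the source of truth)."""
    if saved.status in TERMINAL:
        # Finished work is never replayed.
        return CheckpointSheet(saved.status, saved.attempts)
    # The musician died with the conductor, so the sheet must run again,
    # but the attempt count is kept so retries stay bounded.
    return CheckpointSheet("pending", saved.attempts)


checkpoint = {
    1: CheckpointSheet("completed", 1),
    2: CheckpointSheet("dispatched", 2),
    3: CheckpointSheet("pending", 0),
}
rebuilt = {num: recovered_state(s) for num, s in checkpoint.items()}
```

Because the function only reads the checkpoint and returns new objects, the direction of the rebuild matches the stated invariant: the baton is derived from the checkpoint, never the reverse.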
get_sheet
¶
Get a Sheet entity for a registered job.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| job_id | str | The job identifier. | required |
| sheet_num | int | The sheet number. | required |
Returns:
| Type | Description |
|---|---|
| Sheet \| None | The Sheet entity, or None if not found. |
Source code in src/marianne/daemon/baton/adapter.py
wait_for_completion
async
¶
Wait until a job reaches terminal state.
Blocks until all sheets in the job are completed, failed, skipped, or cancelled. Used by the manager's _run_job_task to await baton execution.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| job_id | str | The job to wait for. | required |
Returns:
| Type | Description |
|---|---|
| bool | True if all sheets completed successfully, False if any failed. |
Raises:
| Type | Description |
|---|---|
| KeyError | If the job is not registered. |
Source code in src/marianne/daemon/baton/adapter.py
has_completed_sheets
¶
Check if any sheet in the job reached COMPLETED status.
F-145: Used by the manager to set completed_new_work after baton execution. The zero-work guard for concert chaining needs to know whether any sheet completed new work — not just whether all sheets succeeded.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| job_id | str | The job to check. | required |
Returns:
| Type | Description |
|---|---|
| bool | True if at least one sheet completed, False otherwise. |
Source code in src/marianne/daemon/baton/adapter.py
clear_instrument_rate_limit
¶
Clear instrument rate limit state in the baton core.
Delegates to BatonCore.clear_instrument_rate_limit(). Also
moves WAITING sheets back to PENDING so they can be re-dispatched.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| instrument | str \| None | Instrument name to clear, or | None |
Returns:
| Type | Description |
|---|---|
| int | Number of instruments whose rate limit was cleared. |
Source code in src/marianne/daemon/baton/adapter.py
publish_sheet_skipped
async
¶
Publish a sheet skip event to the EventBus.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| event | SheetSkipped | The sheet skip event. | required |
Source code in src/marianne/daemon/baton/adapter.py
publish_job_event
async
¶
Publish a job-level event to the EventBus.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| job_id | str | The job identifier. | required |
| event_name | str | Event name (e.g., "job.started", "job.completed"). | required |
| data | dict[str, Any] \| None | Optional event data. | None |
Source code in src/marianne/daemon/baton/adapter.py
run
async
¶
Run the baton's event loop with dispatch integration.
Processes events from the inbox, updates state, dispatches ready sheets, and publishes events to the EventBus.
Runs until the baton receives a ShutdownRequested event.
Source code in src/marianne/daemon/baton/adapter.py
shutdown
async
¶
Gracefully shut down the adapter.
Cancels all active musician tasks and closes the backend pool.
Source code in src/marianne/daemon/baton/adapter.py
BackendPool
¶
Manages Backend instances for per-sheet execution.
The baton acquires a backend before dispatching a sheet and releases it after the sheet completes (or fails). The pool enforces per-instrument concurrency by tracking in-flight instances.
Usage::

    pool = BackendPool(registry)

    # Dispatch a sheet
    backend = await pool.acquire("claude-code", working_directory=ws)
    try:
        result = await backend.execute(prompt)
    finally:
        await pool.release("claude-code", backend)

    # Job done
    await pool.close_all()

Source code in src/marianne/daemon/baton/backend_pool.py
Functions¶
acquire
async
¶
Acquire a Backend instance for an instrument.
For CLI instruments: returns a free instance if available, otherwise creates a new one. For HTTP instruments: returns the shared singleton (creating it on first call).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| instrument_name | str | Name of the instrument (from registry). | required |
| model | str \| None | Optional model override for this execution. | None |
| working_directory | Path \| None | Working directory for the backend. | None |
Returns:
| Type | Description |
|---|---|
| Backend | A Backend instance ready for execution. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the instrument is not registered. |
| RuntimeError | If the pool has been closed. |
Source code in src/marianne/daemon/baton/backend_pool.py
release
async
¶
Release a Backend instance back to the pool.
For CLI instruments: the backend goes back to the free list for reuse. For HTTP instruments: no-op (the singleton stays active).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| instrument_name | str | The instrument name used in | required |
| backend | Backend | The Backend instance to release. | required |
Source code in src/marianne/daemon/baton/backend_pool.py
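The acquire/release behaviour described above (a free list for CLI instruments, a shared singleton for HTTP instruments, and an in-flight counter) could look roughly like the sketch below. It is an illustration only: `PoolSketch`, `FakeBackend`, and the `"http-api"` instrument name are hypothetical, the registry lookup and ValueError path are omitted, and counting HTTP acquisitions as in-flight is an assumption.

```python
import asyncio
from collections import defaultdict


class FakeBackend:
    """Hypothetical placeholder for a real Backend."""

    def __init__(self, instrument: str) -> None:
        self.instrument = instrument


class PoolSketch:
    def __init__(self, http_instruments: set) -> None:
        self._http = http_instruments
        self._free = defaultdict(list)       # CLI instances waiting for reuse
        self._singletons = {}                # one shared instance per HTTP instrument
        self._in_flight = defaultdict(int)
        self._closed = False

    async def acquire(self, name: str) -> FakeBackend:
        if self._closed:
            raise RuntimeError("pool is closed")
        self._in_flight[name] += 1           # assumption: HTTP counts as in-flight too
        if name in self._http:
            if name not in self._singletons:
                self._singletons[name] = FakeBackend(name)
            return self._singletons[name]
        free = self._free[name]
        return free.pop() if free else FakeBackend(name)

    async def release(self, name: str, backend: FakeBackend) -> None:
        self._in_flight[name] -= 1
        if name not in self._http:           # HTTP singleton simply stays active
            self._free[name].append(backend)

    def in_flight_count(self, name: str) -> int:
        return self._in_flight[name]


async def demo() -> tuple:
    pool = PoolSketch(http_instruments={"http-api"})
    a = await pool.acquire("claude-code")
    b = await pool.acquire("claude-code")
    busy = pool.in_flight_count("claude-code")
    await pool.release("claude-code", a)
    c = await pool.acquire("claude-code")    # reuses the released instance
    h1 = await pool.acquire("http-api")
    h2 = await pool.acquire("http-api")
    return busy, c is a, b is a, h1 is h2
```

Running `asyncio.run(demo())` exercises both paths: the CLI instance released first is handed out again, while the two HTTP acquisitions return the same object.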
in_flight_count
¶
How many backends are currently acquired for this instrument.
Used by the baton's dispatch logic to enforce per-instrument concurrency limits.
Source code in src/marianne/daemon/baton/backend_pool.py
total_in_flight
¶
close_all
async
¶
Close all Backend instances and mark the pool as closed.
Called at job completion, cancellation, or conductor shutdown.
After this call, acquire() raises RuntimeError.
Source code in src/marianne/daemon/baton/backend_pool.py
BatonCore
¶
The baton's event-driven execution core.
Manages the event inbox, processes events, tracks sheet state across all jobs, resolves ready sheets, and coordinates dispatch.
The baton does NOT own backend execution — it decides WHEN to dispatch, not HOW. Sheet execution is delegated to the musician (via dispatch callbacks registered by the conductor).
Usage::

    baton = BatonCore()
    baton.register_job("j1", sheets, deps)

    # Run the main loop (blocks until shutdown)
    await baton.run()

    # Or process events manually (for testing)
    await baton.handle_event(some_event)

Initialize the baton core.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| timer | Any \| None | Optional TimerWheel for scheduling retry delays. When None, retries are set to RETRY_SCHEDULED without actual timer events (tests or manual event injection). | None |
| inbox | Queue[BatonEvent] \| None | Optional pre-created event queue. When provided, allows the caller to share the queue with other components (e.g., TimerWheel) before BatonCore is constructed. When None, a new queue is created. | None |
Source code in src/marianne/daemon/baton/core.py
Attributes¶
running_sheet_count
property
¶
Number of sheets currently in 'dispatched' status.
Functions¶
drain_fallback_events
¶
Return and clear collected InstrumentFallback events.
Called by the adapter after each event cycle to publish fallback events to the EventBus for observability.
Source code in src/marianne/daemon/baton/core.py
register_instrument
¶
Register an instrument for tracking.
If already registered, returns the existing state (idempotent).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Instrument name (matches InstrumentProfile.name). | required |
| max_concurrent | int | Maximum concurrent sheets on this instrument. | 4 |
Returns:
| Type | Description |
|---|---|
| InstrumentState | The InstrumentState for the instrument. |
Source code in src/marianne/daemon/baton/core.py
set_model_concurrency
¶
Set per-model concurrency limit from instrument profile data.
Called during adapter initialization from loaded InstrumentProfiles.
Source code in src/marianne/daemon/baton/core.py
get_instrument_state
¶
build_dispatch_config
¶
Build a DispatchConfig from the current instrument state.
This bridges the gap between the baton's instrument tracking and the dispatch logic's configuration needs. Called before each dispatch cycle.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| max_concurrent_sheets | int | Global concurrency ceiling. | 10 |
Returns:
| Type | Description |
|---|---|
| DispatchConfig | DispatchConfig with rate-limited instruments, open circuit breakers, and per-instrument concurrency limits derived from the current InstrumentState. |
Source code in src/marianne/daemon/baton/core.py
set_job_cost_limit
¶
Set a per-job cost limit. The baton pauses the job when exceeded.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| job_id | str | The job to set the limit for. | required |
| max_cost_usd | float | Maximum total cost in USD. | required |
Source code in src/marianne/daemon/baton/core.py
get_rate_limited_instruments
¶
Get the set of currently rate-limited instrument names.
Used by dispatch logic to skip rate-limited instruments.
Source code in src/marianne/daemon/baton/core.py
clear_instrument_rate_limit
¶
Clear rate limit state on one or all instruments.
Resets rate_limited to False and rate_limit_expires_at
to None. Also moves any WAITING sheets on the cleared
instrument(s) back to PENDING so they can be re-dispatched.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| instrument | str \| None | Instrument name to clear, or | None |
Returns:
| Type | Description |
|---|---|
| int | Number of instruments whose rate limit was cleared. |
Source code in src/marianne/daemon/baton/core.py
get_open_circuit_breakers
¶
Get the set of instruments with open circuit breakers.
Used by dispatch logic to skip unhealthy instruments.
Source code in src/marianne/daemon/baton/core.py
set_sheet_cost_limit
¶
Set a per-sheet cost limit. The baton fails the sheet when exceeded.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| job_id | str | The job containing the sheet. | required |
| sheet_num | int | The sheet number. | required |
| max_cost_usd | float | Maximum cost in USD for this sheet. | required |
Source code in src/marianne/daemon/baton/core.py
calculate_retry_delay
¶
Calculate retry delay using exponential backoff.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| attempt | int | 0-based attempt index (0 = first retry). | required |
Returns:
| Type | Description |
|---|---|
| float | Delay in seconds, clamped to |
Source code in src/marianne/daemon/baton/core.py
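As a rough illustration of exponential backoff with a clamp, the sketch below doubles a base delay per attempt and caps the result. The base, factor, and ceiling values are placeholders chosen for the example; the actual constants and clamp bounds used by calculate_retry_delay are not stated in this page.

```python
def retry_delay_sketch(
    attempt: int,
    base_seconds: float = 10.0,   # hypothetical base delay
    factor: float = 2.0,          # hypothetical growth factor
    max_seconds: float = 300.0,   # hypothetical ceiling
) -> float:
    """Exponential backoff: base * factor**attempt, capped at max_seconds."""
    if attempt < 0:
        raise ValueError("attempt is a 0-based index and cannot be negative")
    return min(base_seconds * factor ** attempt, max_seconds)


# With these placeholder values the delays grow 10, 20, 40, 80, ... then cap.
delays = [retry_delay_sketch(n) for n in range(7)]
```

The cap matters for long retry chains: without it, a handful of failures would push the next attempt hours into the future.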
register_job
¶
register_job(job_id, sheets, dependencies, *, escalation_enabled=False, self_healing_enabled=False, pacing_seconds=0.0)
Register a job's sheets with the baton for scheduling.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| job_id | str | Unique job identifier. | required |
| sheets | dict[int, SheetExecutionState] | Map of sheet_num → SheetExecutionState. | required |
| dependencies | dict[int, list[int]] | Map of sheet_num → list of dependency sheet_nums. Sheets not in this map have no dependencies. | required |
| escalation_enabled | bool | Whether to enter fermata on exhaustion. | False |
| self_healing_enabled | bool | Whether to try healing on exhaustion. | False |
| pacing_seconds | float | Inter-sheet delay after each completion (from | 0.0 |
Source code in src/marianne/daemon/baton/core.py
deregister_job
¶
Remove a job from the baton's tracking.
Cleans up all per-job state including cost limit entries to prevent memory leaks in long-running conductors (F-062).
Source code in src/marianne/daemon/baton/core.py
get_sheet_state
¶
Get the scheduling state for a specific sheet.
Source code in src/marianne/daemon/baton/core.py
is_job_paused
¶
is_job_complete
¶
Check if all sheets in a job are in terminal state.
Source code in src/marianne/daemon/baton/core.py
get_ready_sheets
¶
Find sheets that are ready to dispatch.
A sheet is ready when:
1. Status is 'pending' or 'ready'
2. All dependencies are satisfied (completed or skipped)
3. The job is not paused
Source code in src/marianne/daemon/baton/core.py
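The three readiness conditions listed for get_ready_sheets translate into a small predicate. The sketch below assumes sheet status is available as a plain string per sheet number. `ready_sheets` is a hypothetical helper, and it deliberately ignores anything not listed above (rate limits, pacing, concurrency).

```python
# Dependency statuses that count as satisfied, per condition 2.
SATISFIED = {"completed", "skipped"}


def ready_sheets(statuses: dict, dependencies: dict, paused: bool) -> list:
    """Return sheet numbers that meet all three readiness conditions."""
    if paused:                                   # condition 3
        return []
    ready = []
    for num, status in sorted(statuses.items()):
        if status not in {"pending", "ready"}:   # condition 1
            continue
        # Sheets missing from the dependency map have no dependencies.
        deps = dependencies.get(num, [])
        if all(statuses.get(dep) in SATISFIED for dep in deps):  # condition 2
            ready.append(num)
    return ready


statuses = {1: "completed", 2: "pending", 3: "pending", 4: "skipped", 5: "ready"}
deps = {2: [1], 3: [2], 5: [1, 4]}
runnable = ready_sheets(statuses, deps, paused=False)
```

Sheet 3 stays out of the result because its dependency (sheet 2) is still pending, while sheet 5 qualifies since a skipped dependency counts as satisfied.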
handle_event
async
¶
Process a single event. Updates state but does NOT dispatch.
This is the core decision-making method. Each event type has a specific handler that updates sheet state.
Per the baton spec: handler exceptions are logged, not re-raised. The baton continues processing subsequent events.
Source code in src/marianne/daemon/baton/core.py
run
async
¶
The baton's main event loop.
Processes events from the inbox, updates state, and dispatches ready sheets. Runs until a ShutdownRequested event is received.
Source code in src/marianne/daemon/baton/core.py
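The loop shape described for run() and handle_event() (read from the inbox, update state, run a dispatch cycle, stop on shutdown, log handler errors instead of raising them) can be sketched with an asyncio.Queue. All class names here are hypothetical stand-ins, and the dispatch cycle is reduced to a counter.

```python
import asyncio
import logging
from dataclasses import dataclass

logger = logging.getLogger("baton.sketch")


@dataclass
class RetryDueSketch:
    sheet_num: int


@dataclass
class ShutdownSketch:
    graceful: bool = True


class LoopSketch:
    def __init__(self) -> None:
        self.inbox = asyncio.Queue()
        self.handled = []
        self.dispatch_cycles = 0

    async def handle_event(self, event) -> None:
        """Update state only; handler exceptions are logged, not re-raised."""
        try:
            if isinstance(event, RetryDueSketch) and event.sheet_num < 1:
                raise ValueError("unknown sheet")
            self.handled.append(event)
        except Exception:
            logger.exception("handler failed for %r", event)

    async def run(self) -> None:
        while True:
            event = await self.inbox.get()
            await self.handle_event(event)
            if isinstance(event, ShutdownSketch):
                break
            self.dispatch_cycles += 1  # stand-in for dispatching ready sheets


async def demo() -> tuple:
    baton = LoopSketch()  # created inside the running loop
    for ev in (RetryDueSketch(2), RetryDueSketch(0), ShutdownSketch()):
        baton.inbox.put_nowait(ev)
    await baton.run()
    return len(baton.handled), baton.dispatch_cycles
```

The second event fails inside its handler, yet the loop keeps going and still reaches the shutdown event, which is the property the spec note about handler exceptions is meant to guarantee.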
get_diagnostics
¶
Get diagnostic information for a job.
Returns a dict with sheet counts by status, instrument state, and other debugging information. Returns None if job not found.
Source code in src/marianne/daemon/baton/core.py
CancelJob
dataclass
¶
Cancel all sheets for a job and deregister it from the baton.
In-flight sheet tasks are cancelled. The job is marked as cancelled in CheckpointState.
CircuitBreakerRecovery
dataclass
¶
Timer fired — check if a circuit-broken instrument can accept a probe.
When a circuit breaker trips OPEN, the baton schedules this event via the timer wheel. On fire, the instrument transitions from OPEN to HALF_OPEN, allowing one probe request through. The next dispatch cycle picks up any PENDING sheets blocked by the dead-ended fallback chain.
If the probe succeeds, the breaker closes. If it fails, the breaker reopens and a new recovery timer is scheduled with increased backoff.
GH#169: Without this, sheets whose entire fallback chain is circuit-broken stay PENDING forever — the score appears RUNNING but is dead.
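The breaker lifecycle described above (trip OPEN, recovery timer moves to HALF_OPEN, probe outcome closes or reopens) can be modelled as a small state machine. This is a sketch under assumptions: the threshold of 5 mirrors the `circuit_breaker_threshold=5` default shown for InstrumentState, while the 30-second starting backoff and the doubling rule are placeholders, since the page only says the backoff increases.

```python
from enum import Enum


class Breaker(str, Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class BreakerSketch:
    def __init__(self, threshold: int = 5, backoff_seconds: float = 30.0) -> None:
        self.state = Breaker.CLOSED
        self.consecutive_failures = 0
        self.threshold = threshold
        self.backoff_seconds = backoff_seconds  # hypothetical starting backoff

    def record_failure(self) -> None:
        if self.state is Breaker.HALF_OPEN:
            # Probe failed: reopen and wait longer before the next probe.
            self.state = Breaker.OPEN
            self.backoff_seconds *= 2           # assumed growth rule
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.threshold:
            self.state = Breaker.OPEN

    def on_recovery_timer(self) -> None:
        """What a CircuitBreakerRecovery event would trigger."""
        if self.state is Breaker.OPEN:
            self.state = Breaker.HALF_OPEN      # allow one probe through

    def record_success(self) -> None:
        self.consecutive_failures = 0
        self.state = Breaker.CLOSED
```

Without the `on_recovery_timer` transition, an OPEN breaker would never admit a probe, which is the stuck-PENDING situation GH#169 describes.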
ConfigReloaded
dataclass
¶
Config has changed for a job (SIGHUP, mzt modify, resume -c).
The baton rebuilds pending sheets from the new config. Completed sheets are preserved. Cost limits may be reset if they changed.
CronTick
dataclass
¶
Timer fired — a cron-scheduled job should be submitted.
The baton submits the configured score as a new job and schedules the next tick. If a previous run is still active, this tick is skipped.
DispatchRetry
dataclass
¶
Internal signal to retry dispatch after a backpressure delay.
When the baton encounters backpressure during dispatch, it schedules this event via the timer wheel rather than blocking.
EscalationNeeded
dataclass
¶
A sheet execution requires composer judgment — enter fermata.
The baton pauses the job's dispatch and notifies the composer (human or AI) via configured channels. A timeout timer is scheduled.
EscalationResolved
dataclass
¶
The composer has made a decision on a fermata.
The baton applies the decision and resumes dispatching for the job.
Arrives via IPC (job.resolve_escalation method).
EscalationTimeout
dataclass
¶
Timer fired — no escalation response received within the deadline.
The baton defaults to the safe action: fail the sheet (not the job) and resume dispatching for other sheets.
JobTimeout
dataclass
¶
Timer fired — a job has exceeded its wall-clock time limit.
The baton cancels all remaining sheets for this job.
PacingComplete
dataclass
¶
Timer fired — the inter-sheet pacing delay for a job has elapsed.
The baton clears the pacing flag, allowing the next sheet to dispatch.
Implements pause_between_sheets_seconds from score config.
PauseJob
dataclass
¶
Pause dispatching for a job. In-flight sheets continue to completion.
No new sheets are dispatched until ResumeJob is received. Retry timers are preserved — when resumed, scheduled retries fire normally.
ProcessExited
dataclass
¶
Observer detected that a backend process died unexpectedly.
The baton checks if this was a sheet's backend process and, if so, marks the sheet as crashed — faster than waiting for timeout.
RateLimitExpired
dataclass
¶
Timer fired — check if the rate-limited instrument is available again.
If the instrument is still unavailable, the baton schedules another timer. If available, sheets waiting on this instrument become ready.
RateLimitHit
dataclass
¶
An instrument hit a rate limit. NOT a failure — a tempo change.
The baton marks the instrument as rate-limited and schedules a timer for recovery. Sheets targeting this instrument move to waiting state. Other instruments are completely unaffected.
ResourceAnomaly
dataclass
¶
Observer/monitor detected a resource pressure event.
Critical severity triggers backpressure — the baton stops dispatching new sheets and lets running sheets drain.
ResumeJob
dataclass
¶
Resume dispatching for a paused job, optionally with new config.
When new_config is provided, pending sheets are rebuilt from the
new config. Completed sheets are preserved. Failed sheets being
retried use the new config for the retry.
RetryDue
dataclass
¶
Timer fired — a previously failed sheet is ready for retry.
The baton moves the sheet from retry-scheduled to ready state. The next dispatch cycle will pick it up.
SheetAttemptResult
dataclass
¶
SheetAttemptResult(job_id, sheet_num, instrument_name, attempt, execution_success=True, exit_code=None, duration_seconds=0.0, validations_passed=0, validations_total=0, validation_pass_rate=0.0, validation_details=None, error_classification=None, error_message=None, rate_limited=False, rate_limit_wait_seconds=None, cost_usd=0.0, input_tokens=0, output_tokens=0, model_used=None, stdout_tail='', stderr_tail='', timestamp=time())
A musician reports the result of a single sheet attempt.
This is the central event in the baton's event loop. The musician plays once and reports in full detail. The conductor (baton) decides what happens next — retry, completion mode, healing, escalation, or accept.
Rate limits are NOT failures. When rate_limited is True, the baton
re-queues the sheet for when the instrument recovers. No retry budget
is consumed.
Attributes¶
validation_pass_rate
class-attribute
instance-attribute
¶
Percentage of validations that passed (0.0-100.0).
CRITICAL CONTRACT (F-018): Set to 100.0 when execution succeeds with no validation rules OR when all validations pass. The baton treats the default (0.0) as "all validations failed" and will retry.
A musician that reports execution_success=True with
validations_total=0 but leaves this at 0.0 will trigger
unnecessary retries until max_retries is exhausted.
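One way a musician could honour the F-018 contract is to compute the rate through a single helper, so the 0.0 default is never reported by accident. `pass_rate` is a hypothetical function for illustration, and returning 0.0 for a failed execution with no validation rules is an assumption not stated above.

```python
def pass_rate(
    execution_success: bool, validations_passed: int, validations_total: int
) -> float:
    """Percentage of validations passed, in the range 0.0 to 100.0."""
    if validations_total == 0:
        # No validation rules: success must report 100.0, never the 0.0 default.
        return 100.0 if execution_success else 0.0
    return 100.0 * validations_passed / validations_total


rate_no_rules = pass_rate(True, 0, 0)
rate_partial = pass_rate(True, 3, 4)
```

Routing every SheetAttemptResult through a helper like this keeps the "succeeded with no validations" case from being mistaken for "all validations failed".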
rate_limit_wait_seconds
class-attribute
instance-attribute
¶
Parsed wait duration from the API's rate limit error message.
When set, the baton uses this instead of the default 60s for scheduling the recovery timer. This is the actual duration the API told us to wait.
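The two rules around rate limits (no retry budget consumed, and the parsed wait overriding the 60s default) can be combined in one decision function. The sketch below is illustrative: `after_attempt` and its tuple return shape are hypothetical, and the lowercase status strings only echo the WAITING and RETRY_SCHEDULED states mentioned elsewhere on this page.

```python
from typing import Optional, Tuple

DEFAULT_WAIT_SECONDS = 60.0  # default recovery wait named in the text above


def after_attempt(
    retries_used: int,
    rate_limited: bool,
    rate_limit_wait_seconds: Optional[float],
    succeeded: bool,
) -> Tuple[str, int, Optional[float]]:
    """Return (next_status, retries_used, recovery_wait_seconds)."""
    if rate_limited:
        wait = (
            rate_limit_wait_seconds
            if rate_limit_wait_seconds is not None
            else DEFAULT_WAIT_SECONDS
        )
        # A tempo change, not a failure: the retry count is left untouched.
        return ("waiting", retries_used, wait)
    if succeeded:
        return ("completed", retries_used, None)
    # Only a genuine failure spends retry budget.
    return ("retry_scheduled", retries_used + 1, None)
```

For example, `after_attempt(1, True, 12.5, False)` keeps the count at 1 and waits 12.5 seconds, whereas the same call with `rate_limited=False` moves the count to 2.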
SheetSkipped
dataclass
¶
A sheet was skipped due to skip_when condition or start_sheet override.
The baton propagates skip state to dependents — downstream sheets that depend on a skipped sheet receive a skip sentinel, not empty string.
ShutdownRequested
dataclass
¶
The conductor is shutting down (SIGTERM, mzt stop).
When graceful is True, the baton waits for in-flight sheets
to complete (up to the configured drain timeout) before stopping.
When False, sheets are cancelled immediately.
StaleCheck
dataclass
¶
Timer fired — check if a running sheet has gone stale.
If no output progress has been received within the configured idle timeout, the baton kills the stale sheet and reschedules or fails it.
PromptRenderer
¶
Renders full prompts for baton musicians.
Bridges the PromptBuilder pipeline with the baton's Sheet-based execution model. Handles template rendering, injection resolution, preamble assembly, and validation requirements — everything the old runner's context mixin did, but without the runner dependency.
Create one per job (or share across jobs with the same config).
Call render() per sheet dispatch.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| prompt_config | PromptConfig | The job's PromptConfig (template, variables, etc.). | required |
| total_sheets | int | Total sheets in the job (for preamble). | required |
| total_stages | int | Total stages before fan-out expansion (for template aliases). | required |
| parallel_enabled | bool | Whether parallel execution is enabled (for preamble). | required |
Source code in src/marianne/daemon/baton/prompt.py
Functions¶
render
¶
Render a full prompt for a sheet execution.
Performs the complete 9-layer prompt assembly: template rendering -> injection resolution -> optional layers -> validation requirements -> completion suffix.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sheet | Sheet | The Sheet entity to render for. | required |
| attempt_context | AttemptContext | Context from the baton (attempt number, mode). | required |
| patterns | list[str] \| None | Optional learned pattern descriptions to inject. | None |
| failure_history | list[HistoricalFailure] \| None | Optional historical failures from previous sheets. | None |
| spec_fragments | list[SpecFragment] \| None | Optional spec corpus fragments to inject. | None |
Returns:
| Type | Description |
|---|---|
| RenderedPrompt | RenderedPrompt with fully rendered prompt and preamble. |
Source code in src/marianne/daemon/baton/prompt.py
RenderedPrompt
dataclass
¶
Output of the prompt rendering pipeline.
Contains both the rendered prompt (for backend execution) and the preamble (set on the backend before execution). Separating them allows the musician to configure the backend correctly.
AttemptContext
dataclass
¶
AttemptContext(attempt_number, mode, completion_prompt_suffix=None, healing_context=None, previous_results=None, learned_patterns=None, total_sheets=1, total_movements=1, previous_outputs=dict(), previous_files=dict())
Context provided by the conductor to the musician for a single attempt.
Each dispatch carries this context so the musician knows:
- Which attempt this is (1 = first try, 2+ = retry)
- What mode to operate in (normal/completion/healing)
- Any extra context for non-normal modes
- Learned patterns from the learning store (instrument-scoped)
- Previous attempt results (for failure history injection)
Attributes¶
attempt_number
instance-attribute
¶
1-based attempt number. First try = 1, first retry = 2, etc.
completion_prompt_suffix
class-attribute
instance-attribute
¶
For completion mode: appended to the prompt to fix partial failures.
healing_context
class-attribute
instance-attribute
¶
For healing mode: diagnostic context from self-healing analysis.
previous_results
class-attribute
instance-attribute
¶
Previous attempt results for failure history injection into prompts.
learned_patterns
class-attribute
instance-attribute
¶
Patterns from the learning store, scoped to this instrument.
total_sheets
class-attribute
instance-attribute
¶
Total concrete sheet count in the job (for preamble and template vars).
total_movements
class-attribute
instance-attribute
¶
Total movement count in the job (for template vars).
previous_outputs
class-attribute
instance-attribute
¶
Stdout outputs from completed sheets. Keys are sheet numbers (1-indexed). Populated by the adapter from completed sheet attempt results.
previous_files
class-attribute
instance-attribute
¶
File contents captured via capture_files patterns. Keys are file paths. Populated by the adapter from workspace files matching CrossSheetConfig patterns.
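As a rough illustration of the attributes above, the following sketch builds a context for a first retry. The dataclass is a stand-in that mirrors the documented signature; the field types and the `AttemptMode` member values are assumptions, not Marianne's actual definitions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class AttemptMode(str, Enum):
    # Member values are assumed; the reference only names the members.
    NORMAL = "normal"
    COMPLETION = "completion"
    HEALING = "healing"


@dataclass
class AttemptContext:
    """Stand-in mirroring the documented signature (types are guesses)."""

    attempt_number: int
    mode: AttemptMode
    completion_prompt_suffix: str | None = None
    healing_context: dict[str, Any] | None = None
    previous_results: list[Any] | None = None
    learned_patterns: list[str] | None = None
    total_sheets: int = 1
    total_movements: int = 1
    # Mutable defaults use a factory so each context gets its own dict.
    previous_outputs: dict[int, str] = field(default_factory=dict)
    previous_files: dict[str, str] = field(default_factory=dict)


# Second attempt (first retry), carrying sheet 1's stdout forward.
ctx = AttemptContext(
    attempt_number=2,
    mode=AttemptMode.NORMAL,
    total_sheets=5,
    previous_outputs={1: "build ok"},
)
is_retry = ctx.attempt_number > 1
```

Because the mode enum also derives from `str`, a member compares equal to its plain string value, which is convenient when the context is serialized.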
AttemptMode
¶
Bases: str, Enum
The mode a sheet attempt runs in.
- NORMAL: standard execution (first try or retry)
- COMPLETION: partial validation passed, trying to complete
- HEALING: self-healing after retry exhaustion
BatonJobState
dataclass
¶
The baton's per-job tracking during a performance.
Contains all sheet states for a job, plus job-level flags (paused, pacing, cost tracking).
Attributes¶
paused
class-attribute
instance-attribute
¶
Whether dispatching is paused for this job.
pacing_active
class-attribute
instance-attribute
¶
Whether inter-sheet pacing delay is currently active.
sheets
class-attribute
instance-attribute
¶
Map of sheet_num → SheetExecutionState.
terminal_count
property
¶
Number of sheets in a terminal status (completed, failed, skipped).
Functions¶
register_sheet
¶
get_sheet
¶
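The per-job bookkeeping described above might look roughly like this. It is a sketch only: `SheetState` and `JobState` are hypothetical stand-ins, the status strings are assumed from the "completed, failed, skipped" wording, and the `register_sheet` / `get_sheet` signatures are guesses since this reference does not show them.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumed status strings; the reference lists these three as terminal.
TERMINAL_STATUSES = frozenset({"completed", "failed", "skipped"})


@dataclass
class SheetState:
    """Hypothetical stand-in for SheetExecutionState."""

    sheet_num: int
    status: str = "pending"


@dataclass
class JobState:
    """Hypothetical stand-in for BatonJobState."""

    job_id: str
    paused: bool = False
    pacing_active: bool = False
    sheets: dict[int, SheetState] = field(default_factory=dict)

    def register_sheet(self, sheet: SheetState) -> None:
        self.sheets[sheet.sheet_num] = sheet

    def get_sheet(self, sheet_num: int) -> SheetState | None:
        return self.sheets.get(sheet_num)

    @property
    def terminal_count(self) -> int:
        # Count sheets whose status can no longer change.
        return sum(1 for s in self.sheets.values() if s.status in TERMINAL_STATUSES)


job = JobState(job_id="j1")
for num, status in [(1, "completed"), (2, "running"), (3, "skipped")]:
    job.register_sheet(SheetState(sheet_num=num, status=status))
```

A job could be treated as finished once `terminal_count` equals `len(job.sheets)`, though whether the baton uses exactly that check is not stated here.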
CircuitBreakerState
¶
Bases: str, Enum
Three-state circuit breaker for per-instrument health tracking.
- CLOSED: healthy, accepting requests
- OPEN: unhealthy, rejecting requests, waiting for recovery timer
- HALF_OPEN: probing, allowing one request to test recovery
InstrumentState
dataclass
¶
InstrumentState(name, max_concurrent, running_count=0, rate_limited=False, rate_limit_expires_at=None, circuit_breaker=CLOSED, consecutive_failures=0, circuit_breaker_threshold=5, circuit_breaker_recovery_at=None)
Per-instrument state tracking for rate limits, circuit breakers, and concurrency.
The baton tracks each instrument's health independently. Rate limits on claude-code don't affect gemini-cli. Circuit breaker thresholds are per-instrument. Concurrency limits come from the InstrumentProfile.
Attributes¶
max_concurrent
instance-attribute
¶
Maximum concurrent sheets on this instrument (from InstrumentProfile).
running_count
class-attribute
instance-attribute
¶
Number of currently running sheets on this instrument.
rate_limited
class-attribute
instance-attribute
¶
Whether this instrument is currently rate-limited.
rate_limit_expires_at
class-attribute
instance-attribute
¶
Monotonic time when the rate limit is expected to clear.
circuit_breaker
class-attribute
instance-attribute
¶
Current circuit breaker state.
consecutive_failures
class-attribute
instance-attribute
¶
Consecutive failures across all jobs for this instrument.
circuit_breaker_threshold
class-attribute
instance-attribute
¶
Number of consecutive failures to trip the circuit breaker.
circuit_breaker_recovery_at
class-attribute
instance-attribute
¶
Monotonic time for circuit breaker recovery check.
is_available
property
¶
Whether this instrument can accept new sheets.
Available when not rate-limited and circuit breaker is not open. Half-open allows one probe request through.
Functions¶
record_success
¶
Record a successful execution on this instrument.
Resets consecutive failures. If circuit breaker is half-open, closes it (the probe succeeded).
Source code in src/marianne/daemon/baton/state.py
record_failure
¶
Record a failed execution on this instrument.
Increments consecutive failures. If threshold reached, opens the circuit breaker. If already half-open, reopens it.
Source code in src/marianne/daemon/baton/state.py
to_dict
¶
Serialize to a dict for SQLite persistence.
Source code in src/marianne/daemon/baton/state.py
from_dict
classmethod
¶
Restore from a serialized dict.
Source code in src/marianne/daemon/baton/state.py
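The transitions described for `record_success`, `record_failure`, and `is_available` can be sketched as a small state machine. This is an illustrative stand-in, not the code in state.py: the class names, the shortened field names, and the enum values are assumptions, and concurrency limits and recovery timers are left out.

```python
from dataclasses import dataclass
from enum import Enum


class BreakerState(str, Enum):
    # Values are assumed; the reference only names the three states.
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


@dataclass
class InstrumentHealth:
    """Hypothetical, trimmed-down stand-in for InstrumentState."""

    name: str
    consecutive_failures: int = 0
    threshold: int = 5
    rate_limited: bool = False
    breaker: BreakerState = BreakerState.CLOSED

    @property
    def is_available(self) -> bool:
        # Available when not rate-limited and the breaker is not open.
        return not self.rate_limited and self.breaker is not BreakerState.OPEN

    def record_success(self) -> None:
        self.consecutive_failures = 0
        if self.breaker is BreakerState.HALF_OPEN:
            self.breaker = BreakerState.CLOSED  # the probe succeeded

    def record_failure(self) -> None:
        self.consecutive_failures += 1
        # A failed probe reopens the breaker; otherwise trip at the threshold.
        if (
            self.breaker is BreakerState.HALF_OPEN
            or self.consecutive_failures >= self.threshold
        ):
            self.breaker = BreakerState.OPEN


health = InstrumentHealth(name="claude-code", threshold=3)
for _ in range(3):
    health.record_failure()
tripped = health.breaker

# Simulate the recovery timer moving the breaker to half-open, then a good probe.
health.breaker = BreakerState.HALF_OPEN
health.record_success()
```

Each instrument would hold its own instance, so a tripped breaker on one instrument leaves the others available.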
TimerHandle
dataclass
¶
Opaque handle returned by schedule(). Used for cancellation.
Attributes:
| Name | Type | Description |
|---|---|---|
| fire_at | float | Monotonic time when the event should fire. |
| event | BatonEvent | The BatonEvent to deliver to the inbox. |
TimerWheel
¶
Priority queue of future events with a background drain task.
Usage::
inbox = asyncio.Queue()
wheel = TimerWheel(inbox)
# Schedule a retry in 30 seconds
handle = wheel.schedule(30.0, RetryDue(job_id="j1", sheet_num=5))
# Cancel if no longer needed
wheel.cancel(handle)
# Run the drain task (usually as asyncio.create_task(wheel.run()))
await wheel.run() # blocks forever, fires events into inbox
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| inbox | Queue[Any] | The asyncio.Queue that receives fired events. This is the baton's event inbox: the same queue that receives musician results, external commands, etc. | required |
Source code in src/marianne/daemon/baton/timer.py
Attributes¶
Functions¶
schedule
¶
Schedule an event to fire after delay_seconds.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| delay_seconds | float | Seconds from now. Clamped to >= 0. | required |
| event | BatonEvent | The BatonEvent to deliver when the timer fires. | required |
Returns:
| Type | Description |
|---|---|
| TimerHandle | A TimerHandle that can be passed to |
Source code in src/marianne/daemon/baton/timer.py
cancel
¶
Cancel a scheduled timer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| handle | TimerHandle | The handle returned by | required |
Returns:
| Type | Description |
|---|---|
| bool | True if the timer was pending and is now cancelled. |
| bool | False if it was already cancelled or already fired. |
Source code in src/marianne/daemon/baton/timer.py
snapshot
¶
Export pending (non-cancelled) timers for persistence.
Returns a list of (fire_at, event) tuples — the data needed to reconstruct the timer wheel after a restart.
Source code in src/marianne/daemon/baton/timer.py
run
async
¶
Background drain task — fire timers into the inbox.
This coroutine runs indefinitely. Cancel the task to stop it. It sleeps until the next timer is due (or a wake signal arrives), fires all due timers, and repeats.
Source code in src/marianne/daemon/baton/timer.py
shutdown
async
¶
Fire all pending timers immediately and stop.
Used during graceful shutdown — ensures no events are lost. Events are placed into the inbox in fire_at order.
Source code in src/marianne/daemon/baton/timer.py
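The schedule, cancel, snapshot, and run behavior described above can be approximated with a heap and an asyncio wake event. This is a minimal sketch under stated assumptions, not the implementation in timer.py: `MiniTimerWheel` and `Handle` are hypothetical names, plain strings stand in for BatonEvent objects, and `shutdown()` is omitted.

```python
from __future__ import annotations

import asyncio
import heapq
import itertools
import time
from dataclasses import dataclass, field
from typing import Any


@dataclass(order=True)
class Handle:
    """Hypothetical handle; ordered by fire time, then insertion order."""

    fire_at: float
    seq: int
    event: Any = field(compare=False)
    cancelled: bool = field(default=False, compare=False)
    fired: bool = field(default=False, compare=False)


class MiniTimerWheel:
    def __init__(self, inbox: asyncio.Queue) -> None:
        self._inbox = inbox
        self._heap: list[Handle] = []
        self._seq = itertools.count()
        self._wake = asyncio.Event()

    def schedule(self, delay_seconds: float, event: Any) -> Handle:
        fire_at = time.monotonic() + max(0.0, delay_seconds)  # clamp to >= 0
        handle = Handle(fire_at, next(self._seq), event)
        heapq.heappush(self._heap, handle)
        self._wake.set()  # the new timer may be earlier than the current sleep
        return handle

    def cancel(self, handle: Handle) -> bool:
        if handle.cancelled or handle.fired:
            return False
        handle.cancelled = True  # lazily skipped when it reaches the heap top
        return True

    def snapshot(self) -> list[tuple[float, Any]]:
        return [(h.fire_at, h.event) for h in sorted(self._heap) if not h.cancelled]

    async def run(self) -> None:
        while True:
            now = time.monotonic()
            while self._heap and self._heap[0].fire_at <= now:
                handle = heapq.heappop(self._heap)
                if not handle.cancelled:
                    handle.fired = True
                    await self._inbox.put(handle.event)
            timeout = self._heap[0].fire_at - now if self._heap else None
            self._wake.clear()
            try:
                await asyncio.wait_for(self._wake.wait(), timeout)
            except asyncio.TimeoutError:
                pass  # the next timer is due


async def demo() -> tuple[list[str], int]:
    inbox: asyncio.Queue = asyncio.Queue()
    wheel = MiniTimerWheel(inbox)
    drain = asyncio.create_task(wheel.run())
    wheel.schedule(0.02, "retry-b")
    wheel.schedule(0.01, "retry-a")
    wheel.cancel(wheel.schedule(0.01, "never-delivered"))
    pending = len(wheel.snapshot())  # cancelled timers are excluded
    fired = [await inbox.get(), await inbox.get()]
    drain.cancel()
    return fired, pending


fired_events, pending_count = asyncio.run(demo())
```

Cancellation here only flags the handle, so `cancel()` stays O(1) and the drain loop discards flagged entries when they surface.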
Functions¶
sheet_task
async
¶
sheet_task(*, job_id, sheet, backend, attempt_context, inbox, total_sheets=1, total_movements=1, rendered_prompt=None, preamble=None, cost_per_1k_input=None, cost_per_1k_output=None, instrument_override=None)
Execute a single sheet attempt and report the result.
This is the musician's entire job. Play once, report in full detail. The conductor (baton) decides what happens next.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| job_id | str | The job this sheet belongs to. | required |
| sheet | Sheet | The sheet to execute (prompt, validations, timeout, etc.). | required |
| backend | Backend | The backend to execute through. | required |
| attempt_context | AttemptContext | Context from the conductor (attempt number, mode, etc.). | required |
| inbox | Queue[SheetAttemptResult] | The baton's event inbox to report results to. | required |
| total_sheets | int | Total concrete sheets in the job (for template variables). | 1 |
| total_movements | int | Total movements in the job (for template variables). | 1 |
| rendered_prompt | str \| None | Optional pre-rendered prompt from PromptRenderer. When provided, the musician uses this directly instead of calling _build_prompt(). This enables the full 9-layer prompt assembly pipeline including spec fragments, learned patterns, and failure history. | None |
| preamble | str \| None | Optional pre-built preamble. Set on the backend via set_preamble() before execution. Only used when rendered_prompt is also provided (the PromptRenderer separates them). | None |
| cost_per_1k_input | float \| None | Cost per 1000 input tokens (USD) from the instrument profile's ModelCapacity. None uses hardcoded fallback. | None |
| cost_per_1k_output | float \| None | Cost per 1000 output tokens (USD) from the instrument profile's ModelCapacity. None uses hardcoded fallback. | None |
| instrument_override | str \| None | When set, used as the instrument_name in the SheetAttemptResult instead of sheet.instrument_name. Required after instrument fallback: the Sheet entity keeps the original instrument but the baton's SheetExecutionState tracks the fallback instrument. Without this, attempt results credit the wrong instrument for success/failure tracking. | None |
Never raises — all exceptions are caught and reported via the inbox.
Source code in src/marianne/daemon/baton/musician.py
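The "play once, report, never raise" contract can be sketched as follows. This is not the code in musician.py: `AttemptResult` is a hypothetical, trimmed-down stand-in for SheetAttemptResult, and `play_once` takes a bare coroutine function in place of a real backend and sheet.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class AttemptResult:
    """Hypothetical stand-in for SheetAttemptResult."""

    job_id: str
    sheet_num: int
    attempt: int
    success: bool
    error: str | None = None


async def play_once(
    job_id: str,
    sheet_num: int,
    attempt: int,
    execute: Callable[[], Awaitable[None]],
    inbox: asyncio.Queue,
) -> None:
    """Run one attempt and always report the outcome to the inbox."""
    try:
        await execute()
        result = AttemptResult(job_id, sheet_num, attempt, success=True)
    except Exception as exc:
        # Failures become data for the conductor instead of propagating.
        # Task cancellation (a BaseException) is deliberately not caught here.
        result = AttemptResult(
            job_id, sheet_num, attempt, success=False,
            error=f"{type(exc).__name__}: {exc}",
        )
    await inbox.put(result)


async def failing_backend() -> None:
    raise RuntimeError("backend crashed")


async def demo() -> AttemptResult:
    inbox: asyncio.Queue = asyncio.Queue()
    await play_once("j1", 5, 1, failing_backend, inbox)
    return await inbox.get()


report = asyncio.run(demo())
```

Reporting failures through the same queue as successes keeps retry and healing decisions in one place, the baton's event loop.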