code_mode ¶

Code mode execution — sandboxed execution of agent-generated code.

When free-tier models (OpenRouter) produce code blocks instead of tool calls, the code mode executor runs them in a bwrap sandbox with workspace access. This is the bridge that gives non-MCP-native instruments access to the technique system.

The executor:

1. Receives classified code blocks from the TechniqueRouter
2. Writes the code to a temp file in the workspace
3. Wraps execution in bwrap sandbox (via SandboxWrapper)
4. Captures stdout, stderr, and written files
5. Returns results for injection into the sheet output
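The steps above can be sketched in a few lines. This is a minimal illustration, not the module's implementation: `run_block` is a hypothetical helper, it handles Python only, and it skips the bwrap wrapping in step 3 so that it runs anywhere.

```python
import asyncio
import sys
import tempfile
from pathlib import Path


async def run_block(code: str, workspace: Path) -> tuple[int, str, str]:
    """Write code to a temp file in the workspace, run it, capture output."""
    # Step 2: write the code to a temp file inside the workspace.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".py", dir=workspace, delete=False
    ) as handle:
        handle.write(code)
        code_file = Path(handle.name)
    try:
        # Step 3 would wrap this command in bwrap; the sketch runs it directly.
        proc = await asyncio.create_subprocess_exec(
            sys.executable,
            str(code_file),
            cwd=workspace,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
        # Step 4: capture stdout and stderr.
        stdout, stderr = await proc.communicate()
        # Step 5: hand the outcome back to the caller.
        return proc.returncode, stdout.decode(), stderr.decode()
    finally:
        # Remove the temp file whether or not the run succeeded.
        code_file.unlink(missing_ok=True)
```

Calling `asyncio.run(run_block("print('hello')", some_existing_dir))` returns the exit code together with the captured streams.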

For MCP-native instruments (claude-code, gemini-cli), code mode is optional. These instruments have native tool use. Code mode is the primary execution path for instruments that lack tool-use support.

Execution timeout: configurable, defaults to 30s. A bwrap subprocess starts in ~4ms — the overhead is negligible.
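One common way to enforce such a timeout with asyncio is shown below. It is a sketch under assumptions: `run_with_timeout` is a hypothetical name, the 30-second default mirrors the figure stated above, and the real executor may enforce its limit differently.

```python
import asyncio
import sys


async def run_with_timeout(argv: list[str], timeout_seconds: float = 30.0) -> str:
    """Return 'success', 'failure', or 'timeout' for a subprocess run."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.DEVNULL,
        stderr=asyncio.subprocess.DEVNULL,
    )
    try:
        await asyncio.wait_for(proc.wait(), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        # Kill the child and reap it so no zombie process is left behind.
        proc.kill()
        await proc.wait()
        return "timeout"
    return "success" if proc.returncode == 0 else "failure"
```

The three return values correspond to the SUCCESS, FAILURE, and TIMEOUT statuses described below.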

See: design spec section 8.2 (Code Mode Execution)

Classes¶

CodeExecutionStatus ¶

Bases: str, Enum

Result status for a code mode execution.

Attributes¶

SUCCESS `class-attribute` `instance-attribute` ¶

SUCCESS = 'success'

Code ran successfully (exit code 0).

FAILURE `class-attribute` `instance-attribute` ¶

FAILURE = 'failure'

Code ran but exited with non-zero status. Also returned, with no exit code, when the block is empty or its language is unsupported.

TIMEOUT `class-attribute` `instance-attribute` ¶

TIMEOUT = 'timeout'

Code execution exceeded the timeout.

SANDBOX_ERROR `class-attribute` `instance-attribute` ¶

SANDBOX_ERROR = 'sandbox_error'

Sandbox setup or teardown failed. Also returned when the code file cannot be written to the workspace.
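Because the enum derives from both `str` and `Enum`, its members behave like plain strings in comparisons and serialization. The snippet below redefines the enum locally so that it runs on its own; the values are copied from the attributes listed above.

```python
import json
from enum import Enum


class CodeExecutionStatus(str, Enum):
    """Local copy of the documented enum, for illustration only."""

    SUCCESS = "success"
    FAILURE = "failure"
    TIMEOUT = "timeout"
    SANDBOX_ERROR = "sandbox_error"


# Members compare equal to their string values.
equal_to_string = CodeExecutionStatus.SUCCESS == "success"
# A raw string converts back to the matching member.
parsed = CodeExecutionStatus("timeout")
# Members serialize to JSON without a custom encoder.
payload = json.dumps({"status": CodeExecutionStatus.FAILURE})
```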

CodeExecutionResult `dataclass` ¶

CodeExecutionResult(status, exit_code=None, stdout='', stderr='', duration_seconds=0.0, error_message=None, files_written=list())

Result of executing agent-generated code in a sandbox.

Contains the execution outcome, captured output, and any files written by the code.
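The `files_written=list()` in the signature suggests a `default_factory`, which gives every result its own list. The sketch below reconstructs the shape from the signature alone: `ResultSketch` is a hypothetical stand-in, and the field types (including `str` for `status` and `list[str]` for `files_written`) are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ResultSketch:
    """Hypothetical stand-in mirroring the documented constructor signature."""

    status: str
    exit_code: Optional[int] = None
    stdout: str = ""
    stderr: str = ""
    duration_seconds: float = 0.0
    error_message: Optional[str] = None
    # default_factory creates a fresh list per instance (types are assumed).
    files_written: list[str] = field(default_factory=list)


first = ResultSketch(status="success")
second = ResultSketch(status="success")
first.files_written.append("out.txt")
```

Appending to `first.files_written` leaves `second.files_written` empty, which a shared mutable default would not guarantee.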

CodeModeExecutor ¶

CodeModeExecutor(*, workspace, timeout_seconds=_DEFAULT_TIMEOUT_SECONDS, use_sandbox=True)

Runs agent-generated code blocks in sandboxed subprocesses.

Each code block is written to a temp file and run through the appropriate interpreter. When bwrap is available, execution is sandboxed. When bwrap is unavailable, execution still proceeds (with a warning) — the deep fallback philosophy applies to the execution layer too.
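The fallback described above could look roughly like this. It is a sketch, not the SandboxWrapper's actual logic: `build_command` is a hypothetical helper, and the bwrap flags are one plausible set (read-only host filesystem, writable workspace, no network), not the flags this module necessarily passes.

```python
import shutil
from pathlib import Path


def build_command(
    interpreter: str,
    code_file: Path,
    workspace: Path,
    use_sandbox: bool = True,
) -> list[str]:
    """Prefix the interpreter command with bwrap when it is available."""
    inner = [interpreter, str(code_file)]
    bwrap = shutil.which("bwrap") if use_sandbox else None
    if bwrap is None:
        # Fallback: run unsandboxed rather than refusing to run.
        return inner
    return [
        bwrap,
        "--ro-bind", "/", "/",  # read-only view of the host filesystem
        "--bind", str(workspace), str(workspace),  # writable workspace
        "--dev", "/dev",
        "--proc", "/proc",
        "--unshare-net",  # no network inside the sandbox
        "--die-with-parent",
        "--chdir", str(workspace),
        *inner,
    ]
```

A real implementation would also log a warning on the fallback path, as the description above notes.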

Usage::

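import asyncio
from pathlib import Path

async def main() -> None:
    # The workspace must already exist, or the constructor raises ValueError.
    executor = CodeModeExecutor(
        workspace=Path("/tmp/agent-ws"),
    )

    block = CodeBlock(language="python", code="print('hello')")
    result = await executor.execute(block)

    if result.status == CodeExecutionStatus.SUCCESS:
        print(result.stdout)

asyncio.run(main())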

Initialize the code mode executor.

Parameters:

Name	Type	Description	Default
`workspace`	`Path`	The agent's workspace directory. Code runs with this as its working directory.	required
`timeout_seconds`	`float`	Maximum time for code execution.	`_DEFAULT_TIMEOUT_SECONDS`
`use_sandbox`	`bool`	Whether to use bwrap sandbox. When False, code runs directly (useful for testing or when bwrap is unavailable).	`True`

Source code in src/marianne/execution/code_mode.py

def __init__(
    self,
    *,
    workspace: Path,
    timeout_seconds: float = _DEFAULT_TIMEOUT_SECONDS,
    use_sandbox: bool = True,
) -> None:
    """Initialize the code mode executor.

    Args:
        workspace: The agent's workspace directory. Code runs
            with this as its working directory.
        timeout_seconds: Maximum time for code execution.
        use_sandbox: Whether to use bwrap sandbox. When False,
            code runs directly (useful for testing or when bwrap
            is unavailable).
    """
    if not workspace.is_dir():
        raise ValueError(f"workspace must be an existing directory: {workspace}")

    self._workspace = workspace
    self._timeout = timeout_seconds
    self._use_sandbox = use_sandbox

Attributes¶

workspace `property` ¶

workspace

The workspace directory for code execution.

Functions¶

execute `async` ¶

execute(block)

Run a single code block.

Writes the code to a temp file, runs it through the appropriate interpreter, captures output, and returns the result.

Parameters:

Name	Type	Description	Default
`block`	`CodeBlock`	The code block to execute.	required

Returns:

Type	Description
`CodeExecutionResult`	Execution result with status, output, and metadata.

Source code in src/marianne/execution/code_mode.py

async def execute(self, block: CodeBlock) -> CodeExecutionResult:
    """Run a single code block.

    Writes the code to a temp file, runs it through the appropriate
    interpreter, captures output, and returns the result.

    Args:
        block: The code block to execute.

    Returns:
        Execution result with status, output, and metadata.
    """
    if not block.code.strip():
        return CodeExecutionResult(
            status=CodeExecutionStatus.FAILURE,
            error_message="Empty code block",
        )

    interpreter = _INTERPRETERS.get(block.language.lower())
    if interpreter is None:
        return CodeExecutionResult(
            status=CodeExecutionStatus.FAILURE,
            error_message=f"Unsupported language: {block.language}",
        )

    # Write code to a temp file in the workspace
    suffix = _file_suffix(block.language)
    try:
        code_file = self._write_code_file(block.code, suffix)
    except OSError as e:
        return CodeExecutionResult(
            status=CodeExecutionStatus.SANDBOX_ERROR,
            error_message=f"Failed to write code file: {e}",
        )

    try:
        return await self._run_code(interpreter, code_file, block.language)
    finally:
        # Clean up temp file
        try:
            code_file.unlink(missing_ok=True)
        except OSError:
            pass
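The listing above relies on `_INTERPRETERS` and `_file_suffix`, which are not shown on this page. The following sketch shows one shape they might take; the table contents, the `.txt` fallback suffix, and the names `INTERPRETERS_SKETCH`, `resolve_interpreter`, and `file_suffix_sketch` are all assumptions made for illustration.

```python
from typing import Optional

# Hypothetical language tables; the real module's entries are not shown here.
INTERPRETERS_SKETCH: dict[str, list[str]] = {
    "python": ["python3"],
    "bash": ["bash"],
    "sh": ["sh"],
}
SUFFIXES_SKETCH: dict[str, str] = {"python": ".py", "bash": ".sh", "sh": ".sh"}


def resolve_interpreter(language: str) -> Optional[list[str]]:
    """Case-insensitive lookup; None signals an unsupported language."""
    return INTERPRETERS_SKETCH.get(language.lower())


def file_suffix_sketch(language: str) -> str:
    """Pick a file extension for the temp file, with an assumed fallback."""
    return SUFFIXES_SKETCH.get(language.lower(), ".txt")
```

Lowercasing the language before the lookup matches the `block.language.lower()` call in `execute`, so `Python` and `python` resolve to the same interpreter.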

execute_all `async` ¶

execute_all(blocks)

Run multiple code blocks sequentially.

Each block runs in order. If a block fails, subsequent blocks still execute (independent execution model — each block is self-contained).

Parameters:

Name	Type	Description	Default
`blocks`	`list[CodeBlock]`	Code blocks to execute.	required

Returns:

Type	Description
`list[CodeExecutionResult]`	List of results, one per block.

Source code in src/marianne/execution/code_mode.py

async def execute_all(
    self, blocks: list[CodeBlock],
) -> list[CodeExecutionResult]:
    """Run multiple code blocks sequentially.

    Each block runs in order. If a block fails, subsequent blocks
    still execute (independent execution model — each block is
    self-contained).

    Args:
        blocks: Code blocks to execute.

    Returns:
        List of results, one per block.
    """
    results: list[CodeExecutionResult] = []
    for block in blocks:
        result = await self.execute(block)
        results.append(result)
    return results
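Since a failed block does not stop the ones after it, a caller that needs an overall verdict has to inspect the returned list. The sketch below illustrates this with stand-ins: `fake_execute` is a hypothetical substitute for `CodeModeExecutor.execute`, and plain strings replace `CodeBlock` and `CodeExecutionResult`.

```python
import asyncio
from collections import Counter


async def fake_execute(code: str) -> str:
    """Stand-in for CodeModeExecutor.execute; returns a status string."""
    return "failure" if "raise" in code else "success"


async def execute_all_sketch(blocks: list[str]) -> list[str]:
    """Run every block in order, collecting one status per block."""
    results: list[str] = []
    for code in blocks:
        # No early exit: a failed block does not stop the ones after it.
        results.append(await fake_execute(code))
    return results


statuses = asyncio.run(
    execute_all_sketch(["print(1)", "raise ValueError", "print(2)"])
)
# Summarize the outcomes so the caller can decide whether to retry.
summary = Counter(statuses)
all_succeeded = summary["success"] == len(statuses)
```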

Functions¶

render_code_mode_error ¶

render_code_mode_error(result)

Render a code execution failure for retry context injection.

When code mode execution fails, this renders the error in a format that helps the agent adjust on retry. Injected into the sheet's output context.

Parameters:

Name	Type	Description	Default
`result`	`CodeExecutionResult`	The failed execution result.	required

Returns:

Type	Description
`str`	Markdown-formatted error context.

Source code in src/marianne/execution/code_mode.py

def render_code_mode_error(result: CodeExecutionResult) -> str:
    """Render a code execution failure for retry context injection.

    When code mode execution fails, this renders the error in a format
    that helps the agent adjust on retry. Injected into the sheet's
    output context.

    Args:
        result: The failed execution result.

    Returns:
        Markdown-formatted error context.
    """
    lines = [
        "## Code Execution Failed",
        "",
        f"**Status:** {result.status.value}",
    ]

    if result.exit_code is not None:
        lines.append(f"**Exit code:** {result.exit_code}")

    if result.error_message:
        lines.append(f"**Error:** {result.error_message}")

    if result.stderr:
        lines.extend([
            "",
            "**stderr:**",
            "```",
            result.stderr[:2000],  # Truncate for prompt context
            "```",
        ])

    lines.extend([
        "",
        "Please review the error and adjust your code. Common issues:",
        "- Missing imports",
        "- Incorrect file paths (use /workspace as base)",
        "- Type errors",
        "- Permission errors (sandbox restricts network access)",
    ])

    return "\n".join(lines)
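A caller might combine this renderer with the results of `execute_all` to build retry context for the failed blocks only. The sketch below uses stand-ins so that it runs on its own: `FakeResult`, `render_stub`, and `retry_context` are hypothetical names, and `render_stub` emits only the first lines of the real output.

```python
from dataclasses import dataclass


@dataclass
class FakeResult:
    """Minimal stand-in for CodeExecutionResult."""

    status: str
    stderr: str = ""


def render_stub(result: FakeResult) -> str:
    """Stand-in for render_code_mode_error (heading and status only)."""
    return f"## Code Execution Failed\n\n**Status:** {result.status}"


def retry_context(results: list[FakeResult]) -> str:
    """Join rendered errors for every result that did not succeed."""
    failed = [r for r in results if r.status != "success"]
    return "\n\n".join(render_stub(r) for r in failed)


context = retry_context([FakeResult("success"), FakeResult("timeout")])
```

When every block succeeds, `retry_context` returns an empty string, so nothing needs to be injected.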
