code_mode

Code mode execution — sandboxed execution of agent-generated code.

When free-tier models (OpenRouter) produce code blocks instead of tool calls, the code mode executor runs them in a bwrap sandbox with workspace access. This is the bridge that gives non-MCP-native instruments access to the technique system.

The executor:

1. Receives classified code blocks from the TechniqueRouter
2. Writes the code to a temp file in the workspace
3. Wraps execution in a bwrap sandbox (via SandboxWrapper)
4. Captures stdout, stderr, and written files
5. Returns results for injection into the sheet output
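The steps above can be sketched in a few lines of asyncio. This is a minimal illustration, not the module's implementation: `run_block` is a hypothetical name, the sandbox wrapping of step 3 and the timeout are left out, and written files are detected by a simple before/after directory listing, which is an assumption about how that capture could work.

```python
import asyncio
import sys
import tempfile
from pathlib import Path


async def run_block(code: str, workspace: Path) -> dict:
    """Hypothetical helper: write code to the workspace, run it, report the outcome."""
    before = {p.name for p in workspace.iterdir()}

    # Step 2: write the code to a temp file inside the workspace.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".py", dir=workspace, delete=False
    ) as handle:
        handle.write(code)
        code_file = Path(handle.name)

    try:
        # Step 3 would prefix this command with bwrap; omitted in this sketch.
        proc = await asyncio.create_subprocess_exec(
            sys.executable,
            str(code_file),
            cwd=workspace,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
        # Step 4: capture stdout and stderr.
        stdout, stderr = await proc.communicate()
    finally:
        # The temp file is removed whether or not the run succeeded.
        code_file.unlink(missing_ok=True)

    # Step 4 (continued): anything new in the workspace was written by the code.
    after = {p.name for p in workspace.iterdir()}

    # Step 5: return a result the caller can inject into the sheet output.
    return {
        "exit_code": proc.returncode,
        "stdout": stdout.decode(),
        "stderr": stderr.decode(),
        "files_written": sorted(after - before),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as ws:
        print(asyncio.run(run_block("print('hello')", Path(ws))))
```

Because the snapshot is taken before the temp file is created and again after it is deleted, the script itself never shows up in `files_written`.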

For MCP-native instruments (claude-code, gemini-cli), code mode is optional because they have native tool use. For instruments that lack tool-use support, code mode is the primary execution path.

The execution timeout is configurable and defaults to 30 seconds. A bwrap subprocess starts in roughly 4 ms, so the sandbox overhead is negligible next to the code being run.
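One common way to enforce such a timeout on a subprocess is `asyncio.wait_for` followed by an explicit kill. The sketch below shows that pattern under stated assumptions: `run_with_timeout` is a hypothetical name, and the plain string statuses stand in for the `CodeExecutionStatus` values documented further down. Whether the module uses exactly this mechanism is not shown on this page.

```python
import asyncio
import sys
from typing import Optional


async def run_with_timeout(
    argv: list[str], timeout_seconds: float = 30.0
) -> tuple[str, Optional[int]]:
    """Hypothetical helper: run argv and report (status, exit_code)."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        await asyncio.wait_for(proc.communicate(), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        proc.kill()        # stop the runaway process
        await proc.wait()  # reap it so no zombie is left behind
        return "timeout", None
    status = "success" if proc.returncode == 0 else "failure"
    return status, proc.returncode


if __name__ == "__main__":
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(asyncio.run(run_with_timeout(slow, timeout_seconds=0.5)))
```

Killing and then awaiting the process matters: cancelling `communicate()` alone leaves the child running.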

See: design spec section 8.2 (Code Mode Execution)

Classes

CodeExecutionStatus

Bases: str, Enum

Result status for a code mode execution.

Attributes
SUCCESS class-attribute instance-attribute
SUCCESS = 'success'

Code ran successfully (exit code 0).

FAILURE class-attribute instance-attribute
FAILURE = 'failure'

Code ran but exited with non-zero status.

TIMEOUT class-attribute instance-attribute
TIMEOUT = 'timeout'

Code execution exceeded the timeout.

SANDBOX_ERROR class-attribute instance-attribute
SANDBOX_ERROR = 'sandbox_error'

Sandbox setup or teardown failed.
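Because the enum derives from both `str` and `Enum`, its members compare equal to their string values and serialize as plain strings. The sketch below re-declares the enum locally so it runs on its own; `status_for` is a hypothetical helper added only to show how an exit code might map to a status.

```python
import json
from enum import Enum


class CodeExecutionStatus(str, Enum):
    """Local copy of the documented members, for illustration only."""

    SUCCESS = "success"
    FAILURE = "failure"
    TIMEOUT = "timeout"
    SANDBOX_ERROR = "sandbox_error"


def status_for(exit_code: int, timed_out: bool = False) -> CodeExecutionStatus:
    """Hypothetical mapping from a finished process to a status."""
    if timed_out:
        return CodeExecutionStatus.TIMEOUT
    if exit_code == 0:
        return CodeExecutionStatus.SUCCESS
    return CodeExecutionStatus.FAILURE


print(CodeExecutionStatus.SUCCESS == "success")  # True
print(CodeExecutionStatus("timeout").name)       # TIMEOUT
print(json.dumps({"status": status_for(1)}))     # {"status": "failure"}
```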

CodeExecutionResult dataclass

CodeExecutionResult(status, exit_code=None, stdout='', stderr='', duration_seconds=0.0, error_message=None, files_written=list())

Result of executing agent-generated code in a sandbox.

Contains the execution outcome, captured output, and any files written by the code.
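The `files_written=list()` default in the signature above is how a `default_factory` field is rendered: each instance gets its own fresh list. A rough reconstruction of the dataclass illustrates this. The field names and defaults come from the signature; the type annotations are assumptions, since they are not shown on this page.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CodeExecutionResult:
    """Illustrative stand-in; annotations are guesses, defaults match the signature."""

    status: str  # the real class presumably takes a CodeExecutionStatus
    exit_code: Optional[int] = None
    stdout: str = ""
    stderr: str = ""
    duration_seconds: float = 0.0
    error_message: Optional[str] = None
    files_written: list = field(default_factory=list)


ok = CodeExecutionResult(status="success", exit_code=0, stdout="hello\n")
failed = CodeExecutionResult(status="failure", error_message="Empty code block")

ok.files_written.append("report.md")
print(failed.files_written)  # []
print(failed.exit_code)      # None
```

Only `status` is required, which is why early failures such as an empty code block can be reported without an exit code.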

CodeModeExecutor

CodeModeExecutor(*, workspace, timeout_seconds=_DEFAULT_TIMEOUT_SECONDS, use_sandbox=True)

Runs agent-generated code blocks in sandboxed subprocesses.

Each code block is written to a temp file and run through the appropriate interpreter. When bwrap is available, execution is sandboxed. When bwrap is unavailable, execution still proceeds unsandboxed, with a warning: the deep fallback philosophy applies to the execution layer too.
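The fallback described above can be sketched as a command builder that wraps the interpreter in bwrap when it is found and otherwise returns the bare command with a warning. Everything here is an assumption for illustration: `build_command` is a hypothetical name, and the bwrap flags are a plausible choice (read-only root, writable workspace, no network), not the flags SandboxWrapper actually passes. The real wrapper may also mount the workspace at a different path, such as /workspace.

```python
import shutil
import warnings
from pathlib import Path
from typing import Callable, Optional


def build_command(
    interpreter: str,
    code_file: Path,
    workspace: Path,
    *,
    use_sandbox: bool = True,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Hypothetical helper: return argv, sandboxed with bwrap when possible."""
    inner = [interpreter, str(code_file)]
    if not use_sandbox:
        return inner

    bwrap = which("bwrap")
    if bwrap is None:
        # Deep fallback: run anyway, but make the missing isolation visible.
        warnings.warn("bwrap not found; running code without a sandbox", RuntimeWarning)
        return inner

    ws = str(workspace)
    return [
        bwrap,
        "--ro-bind", "/", "/",  # whole filesystem visible, read-only
        "--dev", "/dev",
        "--proc", "/proc",
        "--bind", ws, ws,       # workspace stays writable
        "--unshare-net",        # no network inside the sandbox
        "--die-with-parent",
        "--chdir", ws,
        *inner,
    ]
```

Injecting `which` keeps the function testable on machines without bwrap installed.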

Usage:

executor = CodeModeExecutor(
    workspace=Path("/tmp/agent-ws"),
)

block = CodeBlock(language="python", code="print('hello')")
result = await executor.execute(block)

if result.status == CodeExecutionStatus.SUCCESS:
    print(result.stdout)

Initialize the code mode executor.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| workspace | Path | The agent's workspace directory. Code runs with this as its working directory. | required |
| timeout_seconds | float | Maximum time for code execution. | _DEFAULT_TIMEOUT_SECONDS |
| use_sandbox | bool | Whether to use bwrap sandbox. When False, code runs directly (useful for testing or when bwrap is unavailable). | True |

Source code in src/marianne/execution/code_mode.py
def __init__(
    self,
    *,
    workspace: Path,
    timeout_seconds: float = _DEFAULT_TIMEOUT_SECONDS,
    use_sandbox: bool = True,
) -> None:
    """Initialize the code mode executor.

    Args:
        workspace: The agent's workspace directory. Code runs
            with this as its working directory.
        timeout_seconds: Maximum time for code execution.
        use_sandbox: Whether to use bwrap sandbox. When False,
            code runs directly (useful for testing or when bwrap
            is unavailable).
    """
    if not workspace.is_dir():
        raise ValueError(f"workspace must be an existing directory: {workspace}")

    self._workspace = workspace
    self._timeout = timeout_seconds
    self._use_sandbox = use_sandbox
Attributes
workspace property
workspace

The workspace directory for code execution.

Functions
execute async
execute(block)

Run a single code block.

Writes the code to a temp file, runs it through the appropriate interpreter, captures output, and returns the result.
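Before anything is written to disk, `execute` rejects empty blocks and unsupported languages. The sketch below mirrors those two pre-flight checks and their error messages. The interpreter table is hypothetical: the contents of the module's `_INTERPRETERS` mapping and `_file_suffix` helper are not shown on this page, so `EXAMPLE_INTERPRETERS`, `Plan`, and `plan_execution` are stand-ins.

```python
from typing import NamedTuple, Optional

# Hypothetical table: language -> (interpreter, file suffix).
EXAMPLE_INTERPRETERS = {
    "python": ("python3", ".py"),
    "bash": ("bash", ".sh"),
}


class Plan(NamedTuple):
    interpreter: Optional[str]
    suffix: Optional[str]
    error: Optional[str]


def plan_execution(language: str, code: str) -> Plan:
    """Decide how a block would run, or why it would be rejected."""
    if not code.strip():
        return Plan(None, None, "Empty code block")
    entry = EXAMPLE_INTERPRETERS.get(language.lower())  # lookup is case-insensitive
    if entry is None:
        return Plan(None, None, f"Unsupported language: {language}")
    interpreter, suffix = entry
    return Plan(interpreter, suffix, None)


print(plan_execution("Python", "print('hi')"))
# Plan(interpreter='python3', suffix='.py', error=None)
print(plan_execution("cobol", "DISPLAY 'HI'.").error)
# Unsupported language: cobol
```

In the real method both rejections come back as a FAILURE result rather than an exception, so callers handle them the same way as a non-zero exit.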

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| block | CodeBlock | The code block to execute. | required |

Returns:

| Type | Description |
| --- | --- |
| CodeExecutionResult | Execution result with status, output, and metadata. |

Source code in src/marianne/execution/code_mode.py
async def execute(self, block: CodeBlock) -> CodeExecutionResult:
    """Run a single code block.

    Writes the code to a temp file, runs it through the appropriate
    interpreter, captures output, and returns the result.

    Args:
        block: The code block to execute.

    Returns:
        Execution result with status, output, and metadata.
    """
    if not block.code.strip():
        return CodeExecutionResult(
            status=CodeExecutionStatus.FAILURE,
            error_message="Empty code block",
        )

    interpreter = _INTERPRETERS.get(block.language.lower())
    if interpreter is None:
        return CodeExecutionResult(
            status=CodeExecutionStatus.FAILURE,
            error_message=f"Unsupported language: {block.language}",
        )

    # Write code to a temp file in the workspace
    suffix = _file_suffix(block.language)
    try:
        code_file = self._write_code_file(block.code, suffix)
    except OSError as e:
        return CodeExecutionResult(
            status=CodeExecutionStatus.SANDBOX_ERROR,
            error_message=f"Failed to write code file: {e}",
        )

    try:
        return await self._run_code(interpreter, code_file, block.language)
    finally:
        # Clean up temp file
        try:
            code_file.unlink(missing_ok=True)
        except OSError:
            pass
execute_all async
execute_all(blocks)

Run multiple code blocks sequentially.

Each block runs in order. If a block fails, subsequent blocks still execute (independent execution model — each block is self-contained).
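The independent execution model can be seen with a stand-in executor: a failing block in the middle does not stop the ones after it. In this sketch `fake_execute` is a hypothetical replacement for `CodeModeExecutor.execute`, and plain strings stand in for blocks and results.

```python
import asyncio


async def fake_execute(block: str) -> str:
    """Stand-in executor: any block containing 'boom' fails."""
    await asyncio.sleep(0)  # yield to the event loop, as a real subprocess call would
    return "failure" if "boom" in block else "success"


async def run_all(blocks, execute):
    """Same loop shape as execute_all: sequential, no early exit on failure."""
    results = []
    for block in blocks:
        results.append(await execute(block))
    return results


statuses = asyncio.run(run_all(["print(1)", "boom()", "print(3)"], fake_execute))
print(statuses)  # ['success', 'failure', 'success']
```

Since nothing is raised on failure, the caller has to inspect every result. Each block also runs as its own subprocess, so variables defined in one block are presumably not visible to the next.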

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| blocks | list[CodeBlock] | Code blocks to execute. | required |

Returns:

| Type | Description |
| --- | --- |
| list[CodeExecutionResult] | List of results, one per block. |

Source code in src/marianne/execution/code_mode.py
async def execute_all(
    self, blocks: list[CodeBlock],
) -> list[CodeExecutionResult]:
    """Run multiple code blocks sequentially.

    Each block runs in order. If a block fails, subsequent blocks
    still execute (independent execution model — each block is
    self-contained).

    Args:
        blocks: Code blocks to execute.

    Returns:
        List of results, one per block.
    """
    results: list[CodeExecutionResult] = []
    for block in blocks:
        result = await self.execute(block)
        results.append(result)
    return results

Functions

render_code_mode_error

render_code_mode_error(result)

Render a code execution failure for retry context injection.

When code mode execution fails, this renders the error in a format that helps the agent adjust on retry. Injected into the sheet's output context.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| result | CodeExecutionResult | The failed execution result. | required |

Returns:

| Type | Description |
| --- | --- |
| str | Markdown-formatted error context. |

Source code in src/marianne/execution/code_mode.py
def render_code_mode_error(result: CodeExecutionResult) -> str:
    """Render a code execution failure for retry context injection.

    When code mode execution fails, this renders the error in a format
    that helps the agent adjust on retry. Injected into the sheet's
    output context.

    Args:
        result: The failed execution result.

    Returns:
        Markdown-formatted error context.
    """
    lines = [
        "## Code Execution Failed",
        "",
        f"**Status:** {result.status.value}",
    ]

    if result.exit_code is not None:
        lines.append(f"**Exit code:** {result.exit_code}")

    if result.error_message:
        lines.append(f"**Error:** {result.error_message}")

    if result.stderr:
        lines.extend([
            "",
            "**stderr:**",
            "```",
            result.stderr[:2000],  # Truncate for prompt context
            "```",
        ])

    lines.extend([
        "",
        "Please review the error and adjust your code. Common issues:",
        "- Missing imports",
        "- Incorrect file paths (use /workspace as base)",
        "- Type errors",
        "- Permission errors (sandbox restricts network access)",
    ])

    return "\n".join(lines)