ollama

Ollama backend for local model execution with MCP tool support.

Enables Marianne to use local Ollama models with translated MCP tool schemas. Implements the Backend protocol with an agentic loop for multi-turn tool calling.

Architecture Decision: ADR-001 specifies MCPProxyService as in-process subprocess manager, not a separate proxy process. This backend integrates with it for tool execution during the agentic loop.

Classes

OllamaFunctionDef

Bases: TypedDict

Function definition within an Ollama tool.

OllamaToolDef

Bases: TypedDict

Ollama tool definition (OpenAI-style function calling format).
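As a sketch of what this shape looks like in practice, here is a tool definition in the OpenAI-style format Ollama's `/api/chat` accepts (the tool name and parameters are illustrative, not from the source):

```python
# Illustrative OllamaToolDef-shaped payload: a "function" wrapper around
# a name, description, and JSON Schema parameters object.
tool_def = {
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file from disk",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}
```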

OllamaRawFunction

Bases: TypedDict

Raw function payload inside a tool call from Ollama response.

OllamaRawToolCall

Bases: TypedDict

Raw tool call object from Ollama chat response.

OllamaChatResponse

Bases: TypedDict

Top-level Ollama /api/chat response shape.
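For reference, a `/api/chat` response carrying a tool call looks roughly like the following (key names follow Ollama's API; all values are made up). Note that Ollama returns `arguments` as a parsed object rather than a JSON string:

```python
# Illustrative OllamaChatResponse-shaped payload with one tool call.
chat_response = {
    "model": "llama3.1:8b",
    "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {"function": {"name": "read_file", "arguments": {"path": "README.md"}}}
        ],
    },
    "done": True,
}
```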

OllamaChatOptions

Bases: TypedDict

Options sub-object for Ollama chat requests.

OllamaChatRequest

Bases: TypedDict

Request payload for Ollama /api/chat endpoint.

ToolCall dataclass

ToolCall(id, name, arguments)

Represents a tool call from the model.

OllamaMessage dataclass

OllamaMessage(role, content, tool_calls=None, tool_call_id=None)

A message in the Ollama conversation format.

Functions
to_dict
to_dict()

Convert to Ollama API format.

Source code in src/marianne/backends/ollama.py
def to_dict(self) -> dict[str, Any]:
    """Convert to Ollama API format."""
    msg: dict[str, Any] = {"role": self.role, "content": self.content}
    if self.tool_calls:
        msg["tool_calls"] = self.tool_calls
    if self.tool_call_id:
        msg["tool_call_id"] = self.tool_call_id
    return msg

OllamaBackend

OllamaBackend(base_url='http://localhost:11434', model='llama3.1:8b', timeout=300.0, num_ctx=32768, keep_alive='5m', max_tool_iterations=10, mcp_proxy=None)

Bases: HttpxClientMixin, Backend

Backend for Ollama model execution with tool translation.

Implements the Backend protocol for local Ollama models. Supports:

- MCP tool schema translation to Ollama function format
- Multi-turn agentic loop for tool calling
- Health checks via /api/tags endpoint

Example usage

backend = OllamaBackend(
    base_url="http://localhost:11434",
    model="llama3.1:8b",
)
result = await backend.execute("Write a hello world function")

Initialize Ollama backend.

Parameters:

- base_url (str, default 'http://localhost:11434'): Ollama server URL
- model (str, default 'llama3.1:8b'): Model to use (must support tool calling)
- timeout (float, default 300.0): Request timeout in seconds
- num_ctx (int, default 32768): Context window size (recommend >= 32768 for Claude Code tools)
- keep_alive (str, default '5m'): Keep model loaded duration (e.g., "5m", "1h")
- max_tool_iterations (int, default 10): Maximum tool call iterations per execution
- mcp_proxy (MCPProxyService | None, default None): Optional MCPProxyService for tool execution
Source code in src/marianne/backends/ollama.py
def __init__(
    self,
    base_url: str = "http://localhost:11434",
    model: str = "llama3.1:8b",
    timeout: float = 300.0,
    num_ctx: int = 32768,
    keep_alive: str = "5m",
    max_tool_iterations: int = 10,
    mcp_proxy: MCPProxyService | None = None,
) -> None:
    """Initialize Ollama backend.

    Args:
        base_url: Ollama server URL (default: http://localhost:11434)
        model: Model to use (must support tool calling)
        timeout: Request timeout in seconds
        num_ctx: Context window size (recommend >= 32768 for Claude Code tools)
        keep_alive: Keep model loaded duration (e.g., "5m", "1h")
        max_tool_iterations: Maximum tool call iterations per execution
        mcp_proxy: Optional MCPProxyService for tool execution
    """
    self.base_url = base_url.rstrip("/")
    self.model = model
    self.timeout = timeout
    self.num_ctx = num_ctx
    self.keep_alive = keep_alive
    self.max_tool_iterations = max_tool_iterations
    self.mcp_proxy = mcp_proxy
    self._working_directory: Path | None = None
    self._preamble: str | None = None
    self._prompt_extensions: list[str] = []

    # HTTP client lifecycle via shared mixin
    self._init_httpx_mixin(self.base_url, self.timeout, connect_timeout=10.0)
Attributes
name property
name

Human-readable backend name.

Functions
from_config classmethod
from_config(config)

Create backend from configuration.

Parameters:

- config (BackendConfig, required): Backend configuration with ollama settings

Returns:

- OllamaBackend: Configured OllamaBackend instance

Source code in src/marianne/backends/ollama.py
@classmethod
def from_config(cls, config: BackendConfig) -> OllamaBackend:
    """Create backend from configuration.

    Args:
        config: Backend configuration with ollama settings

    Returns:
        Configured OllamaBackend instance
    """
    ollama_cfg = config.ollama
    return cls(
        base_url=ollama_cfg.base_url,
        model=ollama_cfg.model,
        timeout=ollama_cfg.timeout_seconds,
        num_ctx=ollama_cfg.num_ctx,
        keep_alive=ollama_cfg.keep_alive,
        max_tool_iterations=ollama_cfg.max_tool_iterations,
    )
set_preamble
set_preamble(preamble)

Set the dynamic preamble for the next execution.

Source code in src/marianne/backends/ollama.py
def set_preamble(self, preamble: str | None) -> None:
    """Set the dynamic preamble for the next execution."""
    self._preamble = preamble
set_prompt_extensions
set_prompt_extensions(extensions)

Set prompt extensions for the next execution.

Source code in src/marianne/backends/ollama.py
def set_prompt_extensions(self, extensions: list[str]) -> None:
    """Set prompt extensions for the next execution."""
    self._prompt_extensions = [e for e in extensions if e.strip()]
execute async
execute(prompt, *, timeout_seconds=None)

Execute a prompt and return the result.

Runs the agentic loop if tools are available via MCPProxyService, otherwise performs a simple completion.

Parameters:

- prompt (str, required): The prompt to send to Ollama
- timeout_seconds (float | None, default None): Per-call timeout override. Ollama uses the httpx client-level timeout from __init__; per-call override is logged but not enforced.

Returns:

- ExecutionResult: ExecutionResult with output and metadata

Source code in src/marianne/backends/ollama.py
async def execute(
    self,
    prompt: str,
    *,
    timeout_seconds: float | None = None,
) -> ExecutionResult:
    """Execute a prompt and return the result.

    Runs the agentic loop if tools are available via MCPProxyService,
    otherwise performs a simple completion.

    Args:
        prompt: The prompt to send to Ollama
        timeout_seconds: Per-call timeout override. Ollama uses the httpx
            client-level timeout from ``__init__``; per-call override is
            logged but not enforced.

    Returns:
        ExecutionResult with output and metadata
    """
    if timeout_seconds is not None:
        _logger.debug(
            "timeout_override_ignored",
            backend="ollama",
            requested=timeout_seconds,
            actual=self.timeout,
        )
    start_time = time.monotonic()
    started_at = utc_now()

    _logger.debug(
        "ollama_execute_start",
        model=self.model,
        prompt_length=len(prompt),
        has_mcp_proxy=self.mcp_proxy is not None,
    )

    try:
        # Build initial messages, injecting preamble/extensions
        if self._preamble or self._prompt_extensions:
            parts: list[str] = []
            if self._preamble:
                parts.append(self._preamble)
            parts.append(prompt)
            if self._prompt_extensions:
                parts.append("\n".join(self._prompt_extensions))
            messages = [OllamaMessage(role="user", content="\n".join(parts))]
        else:
            messages = [OllamaMessage(role="user", content=prompt)]

        # Get tools if MCP proxy is available
        tools: list[OllamaToolDef] = []
        mcp_degraded: str | None = None
        if self.mcp_proxy:
            try:
                mcp_tools = await self.mcp_proxy.list_tools()
                tools = self._translate_tools_to_ollama(mcp_tools)
                _logger.debug("tools_loaded", tool_count=len(tools))
            except (OSError, ConnectionError, TimeoutError, httpx.HTTPError) as e:
                mcp_degraded = (
                    f"[MCP DEGRADED] Tool loading failed ({type(e).__name__}: {e}); "
                    "running in non-agentic mode. "
                    "Check MCP server connectivity and configuration."
                )
                _logger.warning(
                    "mcp_tool_load_failed.falling_back_to_non_agentic",
                    error=str(e),
                    error_type=type(e).__name__,
                    hint=mcp_degraded,
                )

        # Run agentic loop if tools available, else simple completion
        if tools:
            result = await self._agentic_loop(messages, tools)
        else:
            result = await self._simple_completion(messages)

        duration = time.monotonic() - start_time
        result.duration_seconds = duration
        result.started_at = started_at
        result.model = self.model

        # Surface MCP degradation in result so callers can detect it
        if mcp_degraded:
            result.stderr = (
                f"{result.stderr}\n{mcp_degraded}" if result.stderr
                else mcp_degraded
            )
            if not result.error_message:
                result.error_message = mcp_degraded

        _logger.info(
            "ollama_execute_complete",
            success=result.success,
            duration_seconds=duration,
            input_tokens=result.input_tokens,
            output_tokens=result.output_tokens,
        )

        return result

    except httpx.ConnectError as e:
        duration = time.monotonic() - start_time
        _logger.error("ollama_connection_error", error=str(e))
        return ExecutionResult(
            success=False,
            stdout="",
            stderr=f"Connection error: {e}",
            duration_seconds=duration,
            started_at=started_at,
            error_type="connection",
            error_message=str(e),
            model=self.model,
        )

    except httpx.TimeoutException as e:
        duration = time.monotonic() - start_time
        _logger.error("ollama_timeout", error=str(e))
        return ExecutionResult(
            success=False,
            stdout="",
            stderr=f"Timeout: {e}",
            duration_seconds=duration,
            started_at=started_at,
            exit_reason="timeout",
            error_type="timeout",
            error_message=str(e),
            model=self.model,
        )

    except Exception as e:
        duration = time.monotonic() - start_time
        _logger.exception("ollama_execute_error", error=str(e))
        raise
health_check async
health_check()

Check if Ollama is available and model is loaded.

Uses /api/tags to verify Ollama is running and configured model exists.

Returns:

- bool: True if healthy, False otherwise

Source code in src/marianne/backends/ollama.py
async def health_check(self) -> bool:
    """Check if Ollama is available and model is loaded.

    Uses /api/tags to verify Ollama is running and configured model exists.

    Returns:
        True if healthy, False otherwise
    """
    try:
        client = await self._get_client()
        response = await client.get("/api/tags", timeout=10.0)

        if response.status_code != 200:
            _logger.warning(
                "ollama_health_check_failed",
                status_code=response.status_code,
            )
            return False

        data = response.json()
        models = data.get("models", [])

        # Check if our model is available
        model_base = self.model.split(":")[0]
        available = any(
            entry.get("name", "").startswith(model_base)
            for entry in models
        )

        if not available:
            _logger.warning(
                "ollama_model_not_found",
                model=self.model,
                available_models=[entry.get("name") for entry in models],
            )

        return available

    except (httpx.HTTPError, OSError, ValueError) as e:
        _logger.warning("ollama_health_check_error", error=str(e), exc_info=True)
        return False
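The model-matching step is a prefix match on the base name before the ":tag", so "llama3.1:8b" is considered available when any "llama3.1:*" variant is installed. Extracted as a standalone function (the name is illustrative):

```python
def model_available(configured: str, installed: list[str]) -> bool:
    # Mirrors health_check()'s matching: strip the ":tag" suffix from
    # the configured model, then prefix-match against installed names.
    base = configured.split(":")[0]
    return any(name.startswith(base) for name in installed)
```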
close async
close()

Close HTTP client and release resources.

Source code in src/marianne/backends/ollama.py
async def close(self) -> None:
    """Close HTTP client and release resources."""
    await self._close_httpx_client()

Functions