ollama

Ollama backend for local model execution with MCP tool support.

Enables Marianne to use local Ollama models with translated MCP tool schemas. Implements the Backend protocol with an agentic loop for multi-turn tool calling.

Architecture Decision: ADR-001 specifies MCPProxyService as in-process subprocess manager, not a separate proxy process. This backend integrates with it for tool execution during the agentic loop.

Classes

OllamaFunctionDef

Bases: TypedDict

Function definition within an Ollama tool.

OllamaToolDef

Bases: TypedDict

Ollama tool definition (OpenAI-style function calling format).
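As a sketch of what this shape looks like in practice, here is a tool definition in the OpenAI-style format Ollama's `/api/chat` accepts (the tool name and parameters are illustrative, not from the source):

```python
# Illustrative OllamaToolDef-shaped payload: a "function" wrapper around
# a name, description, and JSON Schema parameters object.
tool_def = {
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file from disk",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}
```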

OllamaRawFunction

Bases: TypedDict

Raw function payload inside a tool call from Ollama response.

OllamaRawToolCall

Bases: TypedDict

Raw tool call object from Ollama chat response.

OllamaChatResponse

Bases: TypedDict

Top-level Ollama /api/chat response shape.
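For reference, a `/api/chat` response carrying a tool call looks roughly like the following (key names follow Ollama's API; all values are made up). Note that Ollama returns `arguments` as a parsed object rather than a JSON string:

```python
# Illustrative OllamaChatResponse-shaped payload with one tool call.
chat_response = {
    "model": "llama3.1:8b",
    "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {"function": {"name": "read_file", "arguments": {"path": "README.md"}}}
        ],
    },
    "done": True,
}
```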

OllamaChatOptions

Bases: TypedDict

Options sub-object for Ollama chat requests.

OllamaChatRequest

Bases: TypedDict

Request payload for Ollama /api/chat endpoint.

ToolCall dataclass

ToolCall(id, name, arguments)

Represents a tool call from the model.

OllamaMessage dataclass

OllamaMessage(role, content, tool_calls=None, tool_call_id=None)

A message in the Ollama conversation format.

Functions
to_dict
to_dict()

Convert to Ollama API format.

Source code in src/marianne/backends/ollama.py
def to_dict(self) -> dict[str, Any]:
    """Convert to Ollama API format."""
    msg: dict[str, Any] = {"role": self.role, "content": self.content}
    if self.tool_calls:
        msg["tool_calls"] = self.tool_calls
    if self.tool_call_id:
        msg["tool_call_id"] = self.tool_call_id
    return msg

OllamaBackend

OllamaBackend(base_url='http://localhost:11434', model='llama3.1:8b', timeout=300.0, num_ctx=32768, keep_alive='5m', max_tool_iterations=10, mcp_proxy=None)

Bases: HttpxClientMixin, Backend

Backend for Ollama model execution with tool translation.

Implements the Backend protocol for local Ollama models. Supports:

- MCP tool schema translation to Ollama function format
- Multi-turn agentic loop for tool calling
- Health checks via /api/tags endpoint

Example usage

backend = OllamaBackend(
    base_url="http://localhost:11434",
    model="llama3.1:8b",
)
result = await backend.execute("Write a hello world function")

Initialize Ollama backend.

Parameters:

- base_url (str, default 'http://localhost:11434'): Ollama server URL
- model (str, default 'llama3.1:8b'): Model to use (must support tool calling)
- timeout (float, default 300.0): Request timeout in seconds
- num_ctx (int, default 32768): Context window size (recommend >= 32768 for Claude Code tools)
- keep_alive (str, default '5m'): Keep model loaded duration (e.g., "5m", "1h")
- max_tool_iterations (int, default 10): Maximum tool call iterations per execution
- mcp_proxy (MCPProxyService | None, default None): Optional MCPProxyService for tool execution
Source code in src/marianne/backends/ollama.py
def __init__(
    self,
    base_url: str = "http://localhost:11434",
    model: str = "llama3.1:8b",
    timeout: float = 300.0,
    num_ctx: int = 32768,
    keep_alive: str = "5m",
    max_tool_iterations: int = 10,
    mcp_proxy: MCPProxyService | None = None,
) -> None:
    """Initialize Ollama backend.

    Args:
        base_url: Ollama server URL (default: http://localhost:11434)
        model: Model to use (must support tool calling)
        timeout: Request timeout in seconds
        num_ctx: Context window size (recommend >= 32768 for Claude Code tools)
        keep_alive: Keep model loaded duration (e.g., "5m", "1h")
        max_tool_iterations: Maximum tool call iterations per execution
        mcp_proxy: Optional MCPProxyService for tool execution
    """
    self.base_url = base_url.rstrip("/")
    self.model = model
    self.timeout = timeout
    self.num_ctx = num_ctx
    self.keep_alive = keep_alive
    self.max_tool_iterations = max_tool_iterations
    self.mcp_proxy = mcp_proxy
    self._working_directory: Path | None = None
    self._preamble: str | None = None
    self._prompt_extensions: list[str] = []

    # HTTP client lifecycle via shared mixin
    self._init_httpx_mixin(self.base_url, self.timeout, connect_timeout=10.0)
Attributes
name property
name

Human-readable backend name.

Functions
from_config classmethod
from_config(config)

Create backend from configuration.

Parameters:

- config (BackendConfig, required): Backend configuration with ollama settings

Returns:

- OllamaBackend: Configured OllamaBackend instance

Source code in src/marianne/backends/ollama.py
@classmethod
def from_config(cls, config: BackendConfig) -> OllamaBackend:
    """Create backend from configuration.

    Args:
        config: Backend configuration with ollama settings

    Returns:
        Configured OllamaBackend instance
    """
    ollama_cfg = config.ollama
    return cls(
        base_url=ollama_cfg.base_url,
        model=ollama_cfg.model,
        timeout=ollama_cfg.timeout_seconds,
        num_ctx=ollama_cfg.num_ctx,
        keep_alive=ollama_cfg.keep_alive,
        max_tool_iterations=ollama_cfg.max_tool_iterations,
    )
set_preamble
set_preamble(preamble)

Set the dynamic preamble for the next execution.

Source code in src/marianne/backends/ollama.py
def set_preamble(self, preamble: str | None) -> None:
    """Set the dynamic preamble for the next execution."""
    self._preamble = preamble
set_prompt_extensions
set_prompt_extensions(extensions)

Set prompt extensions for the next execution.

Source code in src/marianne/backends/ollama.py
def set_prompt_extensions(self, extensions: list[str]) -> None:
    """Set prompt extensions for the next execution."""
    self._prompt_extensions = [e for e in extensions if e.strip()]
execute async
execute(prompt, *, timeout_seconds=None)

Execute a prompt and return the result.

Runs the agentic loop if tools are available via MCPProxyService, otherwise performs a simple completion.

Parameters:

- prompt (str, required): The prompt to send to Ollama
- timeout_seconds (float | None, default None): Per-call timeout override. Ollama uses the httpx client-level timeout from __init__; per-call override is logged but not enforced.

Returns:

- ExecutionResult: ExecutionResult with output and metadata

Source code in src/marianne/backends/ollama.py
async def execute(
    self,
    prompt: str,
    *,
    timeout_seconds: float | None = None,
) -> ExecutionResult:
    """Execute a prompt and return the result.

    Runs the agentic loop if tools are available via MCPProxyService,
    otherwise performs a simple completion.

    Args:
        prompt: The prompt to send to Ollama
        timeout_seconds: Per-call timeout override. Ollama uses the httpx
            client-level timeout from ``__init__``; per-call override is
            logged but not enforced.

    Returns:
        ExecutionResult with output and metadata
    """
    if timeout_seconds is not None:
        _logger.debug(
            "timeout_override_ignored",
            backend="ollama",
            requested=timeout_seconds,
            actual=self.timeout,
        )
    start_time = time.monotonic()
    started_at = utc_now()

    _logger.debug(
        "ollama_execute_start",
        model=self.model,
        prompt_length=len(prompt),
        has_mcp_proxy=self.mcp_proxy is not None,
    )

    try:
        # Build initial messages, injecting preamble/extensions
        if self._preamble or self._prompt_extensions:
            parts: list[str] = []
            if self._preamble:
                parts.append(self._preamble)
            parts.append(prompt)
            if self._prompt_extensions:
                parts.append("\n".join(self._prompt_extensions))
            messages = [OllamaMessage(role="user", content="\n".join(parts))]
        else:
            messages = [OllamaMessage(role="user", content=prompt)]

        # Get tools if MCP proxy is available
        tools: list[OllamaToolDef] = []
        mcp_degraded: str | None = None
        if self.mcp_proxy:
            try:
                mcp_tools = await self.mcp_proxy.list_tools()
                tools = self._translate_tools_to_ollama(mcp_tools)
                _logger.debug("tools_loaded", tool_count=len(tools))
            except (OSError, ConnectionError, TimeoutError, httpx.HTTPError) as e:
                mcp_degraded = (
                    f"[MCP DEGRADED] Tool loading failed ({type(e).__name__}: {e}); "
                    "running in non-agentic mode. "
                    "Check MCP server connectivity and configuration."
                )
                _logger.warning(
                    "mcp_tool_load_failed.falling_back_to_non_agentic",
                    error=str(e),
                    error_type=type(e).__name__,
                    hint=mcp_degraded,
                )

        # Run agentic loop if tools available, else simple completion
        if tools:
            result = await self._agentic_loop(messages, tools)
        else:
            result = await self._simple_completion(messages)

        duration = time.monotonic() - start_time
        result.duration_seconds = duration
        result.started_at = started_at
        result.model = self.model

        # Surface MCP degradation in result so callers can detect it
        if mcp_degraded:
            result.stderr = (
                f"{result.stderr}\n{mcp_degraded}" if result.stderr
                else mcp_degraded
            )
            if not result.error_message:
                result.error_message = mcp_degraded

        _logger.info(
            "ollama_execute_complete",
            success=result.success,
            duration_seconds=duration,
            input_tokens=result.input_tokens,
            output_tokens=result.output_tokens,
        )

        return result

    except httpx.ConnectError as e:
        duration = time.monotonic() - start_time
        _logger.error("ollama_connection_error", error=str(e))
        return ExecutionResult(
            success=False,
            stdout="",
            stderr=f"Connection error: {e}",
            duration_seconds=duration,
            started_at=started_at,
            error_type="connection",
            error_message=str(e),
            model=self.model,
        )

    except httpx.TimeoutException as e:
        duration = time.monotonic() - start_time
        _logger.error("ollama_timeout", error=str(e))
        return ExecutionResult(
            success=False,
            stdout="",
            stderr=f"Timeout: {e}",
            duration_seconds=duration,
            started_at=started_at,
            exit_reason="timeout",
            error_type="timeout",
            error_message=str(e),
            model=self.model,
        )

    except Exception as e:
        duration = time.monotonic() - start_time
        _logger.exception("ollama_execute_error", error=str(e))
        raise
health_check async
health_check()

Check if Ollama is available and model is loaded.

Uses /api/tags to verify Ollama is running and configured model exists.

Returns:

- bool: True if healthy, False otherwise

Source code in src/marianne/backends/ollama.py
async def health_check(self) -> bool:
    """Check if Ollama is available and model is loaded.

    Uses /api/tags to verify Ollama is running and configured model exists.

    Returns:
        True if healthy, False otherwise
    """
    try:
        client = await self._get_client()
        response = await client.get("/api/tags", timeout=10.0)

        if response.status_code != 200:
            _logger.warning(
                "ollama_health_check_failed",
                status_code=response.status_code,
            )
            return False

        data = response.json()
        models = data.get("models", [])

        # Check if our model is available
        model_base = self.model.split(":")[0]
        available = any(
            entry.get("name", "").startswith(model_base)
            for entry in models
        )

        if not available:
            _logger.warning(
                "ollama_model_not_found",
                model=self.model,
                available_models=[entry.get("name") for entry in models],
            )

        return available

    except (httpx.HTTPError, OSError, ValueError) as e:
        _logger.warning("ollama_health_check_error", error=str(e), exc_info=True)
        return False
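The model-matching step is a prefix match on the base name before the ":tag", so "llama3.1:8b" is considered available when any "llama3.1:*" variant is installed. Extracted as a standalone function (the name is illustrative):

```python
def model_available(configured: str, installed: list[str]) -> bool:
    # Mirrors health_check()'s matching: strip the ":tag" suffix from
    # the configured model, then prefix-match against installed names.
    base = configured.split(":")[0]
    return any(name.startswith(base) for name in installed)
```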
close async
close()

Close HTTP client and release resources.

Source code in src/marianne/backends/ollama.py
async def close(self) -> None:
    """Close HTTP client and release resources."""
    await self._close_httpx_client()

Functions