
tokens

Token estimation and budget tracking for prompt assembly.

Provides a centralized token estimation utility and a budget tracker that enforces context window limits during prompt construction. This is the single source of truth for token estimation — all other modules (preflight, backends) should import from here rather than maintaining their own ratios.

The estimation uses a conservative chars-per-token ratio (3.5) that deliberately overestimates token counts by ~15%. This is intentional: underestimation causes context window overflow (agent gets truncated mid-instruction), while overestimation merely wastes budget (safe).
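The arithmetic behind that ~15% figure can be checked directly. This is a sketch of the ratio math only, assuming English text averages roughly 4 characters per actual token (a common rule of thumb, not something this module asserts):

```python
import math

# The module's documented ratio: 3.5 characters per estimated token.
CHARS_PER_TOKEN = 3.5

text = "Assemble the prompt from template, patterns, and specs."
estimated = math.ceil(len(text) / CHARS_PER_TOKEN)

# At ~4 chars per real English token, the estimate runs high by
# about 4 / 3.5 - 1, i.e. roughly 14-15% -- erring toward overestimation.
baseline = len(text) / 4
assert estimated > baseline
```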

Classes

BudgetAllocation dataclass

BudgetAllocation(component, tokens)

A single allocation within the token budget.

Attributes
component instance-attribute

Name of the prompt component (e.g., 'template', 'patterns', 'specs').

tokens instance-attribute

Estimated token count for this allocation.

TokenBudgetTracker dataclass

TokenBudgetTracker(window_size, _allocations=list())

Tracks token budget usage during prompt assembly.

Enforces context window limits by tracking allocations as prompt components are added. Each allocation is named (e.g., 'template', 'patterns', 'specs') for diagnostic visibility via breakdown().

The tracker prevents silent over-allocation: allocate() returns False when content would exceed the remaining budget, and can_fit() checks without side effects.
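The accept/reject contract can be sketched with a minimal stand-in for the tracker. This is a simplified re-implementation for illustration, not the module's actual code; a fixed 3.5 chars-per-token ratio stands in for estimate_tokens():

```python
import math
from dataclasses import dataclass, field

_CHARS_PER_TOKEN = 3.5


def estimate_tokens(text: str) -> int:
    return math.ceil(len(text) / _CHARS_PER_TOKEN) if text else 0


@dataclass
class MiniTracker:
    """Simplified stand-in mirroring the documented accept/reject contract."""

    window_size: int
    _allocations: list = field(default_factory=list)

    @property
    def allocated(self) -> int:
        return sum(tokens for _, tokens in self._allocations)

    def remaining(self) -> int:
        return max(0, self.window_size - self.allocated)

    def can_fit(self, text: str) -> bool:
        # Check only -- no state change.
        return estimate_tokens(text) <= self.remaining()

    def allocate(self, text: str, component: str) -> bool:
        tokens = estimate_tokens(text)
        if tokens > self.remaining():
            return False  # rejected; state untouched
        self._allocations.append((component, tokens))
        return True


tracker = MiniTracker(window_size=20)
assert tracker.allocate("x" * 35, "template")   # ceil(35 / 3.5) = 10 tokens
assert not tracker.allocate("y" * 70, "specs")  # 20 tokens > 10 remaining
assert tracker.remaining() == 10                # rejection left the budget intact
```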

Attributes
window_size instance-attribute

Total token budget (effective context window).

allocated property

Total tokens allocated so far.

Functions
__post_init__
__post_init__()

Validate window_size is non-negative.

Source code in src/marianne/core/tokens.py
def __post_init__(self) -> None:
    """Validate window_size is non-negative."""
    if self.window_size < 0:
        raise ValueError(f"window_size must be >= 0, got {self.window_size}")
remaining
remaining()

Tokens remaining in the budget.

Returns:

Type Description
int

Non-negative remaining token count. Never goes below 0, even if over-allocation somehow occurred.

Source code in src/marianne/core/tokens.py
def remaining(self) -> int:
    """Tokens remaining in the budget.

    Returns:
        Non-negative remaining token count. Never goes below 0
        even if over-allocation somehow occurred.
    """
    return max(0, self.window_size - self.allocated)
utilization
utilization()

Fraction of budget used (0.0 to 1.0).

Returns:

Type Description
float

Utilization ratio. Returns 0.0 for zero-budget trackers to avoid division by zero.

Source code in src/marianne/core/tokens.py
def utilization(self) -> float:
    """Fraction of budget used (0.0 to 1.0).

    Returns:
        Utilization ratio. Returns 0.0 for zero-budget trackers
        to avoid division by zero.
    """
    if self.window_size == 0:
        return 0.0
    return min(1.0, self.allocated / self.window_size)
can_fit
can_fit(text)

Check if content fits within the remaining budget.

Does not modify the tracker state.

Parameters:

Name Type Description Default
text Any

Content to check.

required

Returns:

Type Description
bool

True if the estimated token count fits within remaining budget.

Source code in src/marianne/core/tokens.py
def can_fit(self, text: Any) -> bool:
    """Check if content fits within the remaining budget.

    Does not modify the tracker state.

    Args:
        text: Content to check.

    Returns:
        True if the estimated token count fits within remaining budget.
    """
    tokens = estimate_tokens(text)
    return tokens <= self.remaining()
allocate
allocate(text, component)

Allocate tokens for a prompt component.

If the content fits within the remaining budget, the allocation is recorded and True is returned. If it does not fit, the allocation is rejected and False is returned — no state is modified.

Parameters:

Name Type Description Default
text Any

Content to allocate budget for.

required
component str

Name of the prompt component (for diagnostics).

required

Returns:

Type Description
bool

True if allocation succeeded, False if it would exceed budget.

Source code in src/marianne/core/tokens.py
def allocate(
    self,
    text: Any,
    component: str,
) -> bool:
    """Allocate tokens for a prompt component.

    If the content fits within the remaining budget, the allocation is
    recorded and True is returned. If it does not fit, the allocation
    is rejected and False is returned — no state is modified.

    Args:
        text: Content to allocate budget for.
        component: Name of the prompt component (for diagnostics).

    Returns:
        True if allocation succeeded, False if it would exceed budget.
    """
    tokens = estimate_tokens(text)
    if tokens > self.remaining():
        _logger.debug(
            "budget_allocation_rejected",
            component=component,
            requested_tokens=tokens,
            remaining_tokens=self.remaining(),
            window_size=self.window_size,
        )
        return False

    self._allocations.append(BudgetAllocation(component=component, tokens=tokens))
    _logger.debug(
        "budget_allocated",
        component=component,
        tokens=tokens,
        remaining=self.remaining(),
        utilization=f"{self.utilization():.1%}",
    )
    return True
breakdown
breakdown()

Get per-component token allocation breakdown.

Returns:

Type Description
dict[str, int]

Dict mapping component names to their allocated token counts. Components with multiple allocations are summed.

Source code in src/marianne/core/tokens.py
def breakdown(self) -> dict[str, int]:
    """Get per-component token allocation breakdown.

    Returns:
        Dict mapping component names to their allocated token counts.
        Components with multiple allocations are summed.
    """
    result: dict[str, int] = {}
    for alloc in self._allocations:
        result[alloc.component] = result.get(alloc.component, 0) + alloc.tokens
    return result
reset
reset()

Clear all allocations, restoring the full budget.

Source code in src/marianne/core/tokens.py
def reset(self) -> None:
    """Clear all allocations, restoring the full budget."""
    self._allocations.clear()

Functions

estimate_tokens

estimate_tokens(text)

Estimate token count for arbitrary input.

Converts the input to a string representation and applies a conservative chars-per-token ratio. The estimate deliberately overestimates to prevent context window overflow.

Accepted types: str, dict, list, None. Other types are coerced via str().

Warning: CJK / Non-Latin Text Underestimation

The _CHARS_PER_TOKEN = 3.5 ratio is calibrated for English text. CJK characters (Chinese, Japanese, Korean) typically tokenize to approximately 1 token per character, meaning this function underestimates CJK token counts by 3.5-7x. For example, 600 CJK characters produce ~172 estimated tokens but consume 600-1200 actual tokens. This can cause context window overflow for non-English content.

Fix planned: InstrumentProfile.ModelCapacity will provide per-model tokenizers or script-aware estimation ratios.
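The numbers in the warning follow directly from the ratio; this sketch reproduces them (assuming, per the warning, roughly 1-2 actual tokens per CJK character):

```python
import math

cjk_text = "例" * 600  # 600 CJK characters
estimated = math.ceil(len(cjk_text) / 3.5)

# Estimate: ceil(600 / 3.5) = 172 tokens.
assert estimated == 172

# At ~1-2 actual tokens per CJK character, real usage is 600-1200 tokens,
# i.e. a 3.5-7x underestimate -- enough to overflow the context window.
assert 600 / estimated > 3.4
assert 1200 / estimated < 7.0
```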

Parameters:

Name Type Description Default
text Any

Input to estimate. Strings are measured directly. Dicts and lists are serialized to JSON. None returns 0.

required

Returns:

Type Description
int

Estimated token count (always >= 0).

Source code in src/marianne/core/tokens.py
def estimate_tokens(text: Any) -> int:
    """Estimate token count for arbitrary input.

    Converts the input to a string representation and applies a conservative
    chars-per-token ratio. The estimate deliberately overestimates to prevent
    context window overflow.

    Accepted types: ``str``, ``dict``, ``list``, ``None``. Other types are
    coerced via ``str()``.

    .. warning:: CJK / Non-Latin Text Underestimation

       The ``_CHARS_PER_TOKEN = 3.5`` ratio is calibrated for English text.
       CJK characters (Chinese, Japanese, Korean) typically tokenize to
       approximately 1 token per character, meaning this function
       underestimates CJK token counts by 3.5-7x. For example, 600 CJK
       characters produce ~172 estimated tokens but consume 600-1200 actual
       tokens. This can cause context window overflow for non-English content.

       Fix planned: InstrumentProfile.ModelCapacity will provide per-model
       tokenizers or script-aware estimation ratios.

    Args:
        text: Input to estimate. Strings are measured directly. Dicts and lists
            are serialized to JSON. None returns 0.

    Returns:
        Estimated token count (always >= 0).
    """
    if text is None:
        return 0

    if isinstance(text, str):
        content = text
    elif isinstance(text, (dict, list)):
        try:
            content = json.dumps(text, default=str)
        except (ValueError, TypeError):
            content = str(text)
    else:
        content = str(text)

    if not content:
        return 0

    return math.ceil(len(content) / _CHARS_PER_TOKEN)
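The type-handling branches above can be exercised with a mirror of the documented logic (a local re-implementation for illustration, so the example stands alone without importing the module):

```python
import json
import math

CHARS_PER_TOKEN = 3.5  # the module's documented ratio


def estimate(text) -> int:
    # Mirror of the documented branches: None -> 0, str measured directly,
    # dict/list serialized to JSON, everything else coerced via str().
    if text is None:
        return 0
    if isinstance(text, str):
        content = text
    elif isinstance(text, (dict, list)):
        content = json.dumps(text, default=str)
    else:
        content = str(text)
    return math.ceil(len(content) / CHARS_PER_TOKEN) if content else 0


assert estimate(None) == 0
assert estimate("") == 0
assert estimate("hello world") == 4                    # ceil(11 / 3.5)
assert estimate({"k": "v"}) == estimate('{"k": "v"}')  # dicts measured as JSON
```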

get_effective_window_size

get_effective_window_size(model=None, instrument=None)

Get the effective input token budget for a model/instrument combination.

Returns the context window size minus output token reservation. When both model and instrument are provided, returns the minimum of the two — the instrument may impose a stricter limit than the model's native window.

For unknown models/instruments, returns a conservative default.

Parameters:

Name Type Description Default
model str | None

Model name or identifier. None uses the default window.

None
instrument str | None

Instrument (backend) name. None imposes no instrument limit. Unknown instruments impose no additional limit.

None

Returns:

Type Description
int

Effective input token budget (always > 0).

Source code in src/marianne/core/tokens.py
def get_effective_window_size(
    model: str | None = None,
    instrument: str | None = None,
) -> int:
    """Get the effective input token budget for a model/instrument combination.

    Returns the context window size minus output token reservation. When both
    model and instrument are provided, returns the minimum of the two — the
    instrument may impose a stricter limit than the model's native window.

    For unknown models/instruments, returns a conservative default.

    Args:
        model: Model name or identifier. None uses the default window.
        instrument: Instrument (backend) name. None imposes no instrument limit.
            Unknown instruments impose no additional limit.

    Returns:
        Effective input token budget (always > 0).
    """
    # Resolve model window (None if model not provided or unknown)
    model_window = _resolve_model_window(model)

    # Resolve instrument window (None if instrument not provided or unknown)
    instrument_window: int | None = None
    if instrument is not None:
        instrument_window = _resolve_instrument_window(instrument)

    # Resolution: both known → min; one known → that one; neither → default
    if model_window is not None and instrument_window is not None:
        return min(model_window, instrument_window)
    if instrument_window is not None:
        return instrument_window
    if model_window is not None:
        return model_window
    return _DEFAULT_EFFECTIVE_WINDOW
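The resolution order can be sketched with hypothetical lookup tables. The model, instrument, and window values below are placeholders for illustration; the real tables and default live in src/marianne/core/tokens.py:

```python
# Hypothetical values -- not the module's actual tables.
MODEL_WINDOWS = {"model-a": 128_000}
INSTRUMENT_WINDOWS = {"cli-backend": 32_000}
DEFAULT_EFFECTIVE_WINDOW = 16_000


def effective_window(model=None, instrument=None) -> int:
    model_w = MODEL_WINDOWS.get(model) if model is not None else None
    inst_w = INSTRUMENT_WINDOWS.get(instrument) if instrument is not None else None

    # Both known -> min (the instrument may be stricter than the model);
    # one known -> that one; neither -> conservative default.
    if model_w is not None and inst_w is not None:
        return min(model_w, inst_w)
    if inst_w is not None:
        return inst_w
    if model_w is not None:
        return model_w
    return DEFAULT_EFFECTIVE_WINDOW


assert effective_window("model-a", "cli-backend") == 32_000  # min wins
assert effective_window("model-a") == 128_000
assert effective_window("unknown-model") == 16_000           # falls to default
```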