
tokens

Token estimation and budget tracking for prompt assembly.

Provides a centralized token estimation utility and a budget tracker that enforces context window limits during prompt construction. This is the single source of truth for token estimation — all other modules (preflight, backends) should import from here rather than maintaining their own ratios.

The estimation uses a conservative chars-per-token ratio (3.5) that deliberately overestimates token counts by ~15%. This is intentional: underestimation causes context window overflow (agent gets truncated mid-instruction), while overestimation merely wastes budget (safe).
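The arithmetic behind that ~15% figure can be checked directly. This is a sketch of the ratio math only, assuming English text averages roughly 4 characters per actual token (a common rule of thumb, not something this module asserts):

```python
import math

# The module's documented ratio: 3.5 characters per estimated token.
CHARS_PER_TOKEN = 3.5

text = "Assemble the prompt from template, patterns, and specs."
estimated = math.ceil(len(text) / CHARS_PER_TOKEN)

# At ~4 chars per real English token, the estimate runs high by
# about 4 / 3.5 - 1, i.e. roughly 14-15% -- erring toward overestimation.
baseline = len(text) / 4
assert estimated > baseline
```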

Classes

BudgetAllocation dataclass

BudgetAllocation(component, tokens)

A single allocation within the token budget.

Attributes
component instance-attribute

Name of the prompt component (e.g., 'template', 'patterns', 'specs').

tokens instance-attribute

Estimated token count for this allocation.

TokenBudgetTracker dataclass

TokenBudgetTracker(window_size, _allocations=list())

Tracks token budget usage during prompt assembly.

Enforces context window limits by tracking allocations as prompt components are added. Each allocation is named (e.g., 'template', 'patterns', 'specs') for diagnostic visibility via breakdown().

The tracker prevents silent over-allocation: allocate() returns False when content would exceed the remaining budget, and can_fit() checks without side effects.
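The accept/reject contract can be sketched with a minimal stand-in for the tracker. This is a simplified re-implementation for illustration, not the module's actual code; a fixed 3.5 chars-per-token ratio stands in for estimate_tokens():

```python
import math
from dataclasses import dataclass, field

_CHARS_PER_TOKEN = 3.5


def estimate_tokens(text: str) -> int:
    return math.ceil(len(text) / _CHARS_PER_TOKEN) if text else 0


@dataclass
class MiniTracker:
    """Simplified stand-in mirroring the documented accept/reject contract."""

    window_size: int
    _allocations: list = field(default_factory=list)

    @property
    def allocated(self) -> int:
        return sum(tokens for _, tokens in self._allocations)

    def remaining(self) -> int:
        return max(0, self.window_size - self.allocated)

    def can_fit(self, text: str) -> bool:
        # Check only -- no state change.
        return estimate_tokens(text) <= self.remaining()

    def allocate(self, text: str, component: str) -> bool:
        tokens = estimate_tokens(text)
        if tokens > self.remaining():
            return False  # rejected; state untouched
        self._allocations.append((component, tokens))
        return True


tracker = MiniTracker(window_size=20)
assert tracker.allocate("x" * 35, "template")   # ceil(35 / 3.5) = 10 tokens
assert not tracker.allocate("y" * 70, "specs")  # 20 tokens > 10 remaining
assert tracker.remaining() == 10                # rejection left the budget intact
```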

Attributes
window_size instance-attribute

Total token budget (effective context window).

allocated property

Total tokens allocated so far.

Functions
__post_init__
__post_init__()

Validate window_size is non-negative.

Source code in src/marianne/core/tokens.py
def __post_init__(self) -> None:
    """Validate window_size is non-negative."""
    if self.window_size < 0:
        raise ValueError(f"window_size must be >= 0, got {self.window_size}")
remaining
remaining()

Tokens remaining in the budget.

Returns:

Type Description
int

Non-negative remaining token count. Never goes below 0, even if over-allocation somehow occurred.

Source code in src/marianne/core/tokens.py
def remaining(self) -> int:
    """Tokens remaining in the budget.

    Returns:
        Non-negative remaining token count. Never goes below 0
        even if over-allocation somehow occurred.
    """
    return max(0, self.window_size - self.allocated)
utilization
utilization()

Fraction of budget used (0.0 to 1.0).

Returns:

Type Description
float

Utilization ratio. Returns 0.0 for zero-budget trackers to avoid division by zero.

Source code in src/marianne/core/tokens.py
def utilization(self) -> float:
    """Fraction of budget used (0.0 to 1.0).

    Returns:
        Utilization ratio. Returns 0.0 for zero-budget trackers
        to avoid division by zero.
    """
    if self.window_size == 0:
        return 0.0
    return min(1.0, self.allocated / self.window_size)
can_fit
can_fit(text)

Check if content fits within the remaining budget.

Does not modify the tracker state.

Parameters:

Name Type Description Default
text Any

Content to check.

required

Returns:

Type Description
bool

True if the estimated token count fits within remaining budget.

Source code in src/marianne/core/tokens.py
def can_fit(self, text: Any) -> bool:
    """Check if content fits within the remaining budget.

    Does not modify the tracker state.

    Args:
        text: Content to check.

    Returns:
        True if the estimated token count fits within remaining budget.
    """
    tokens = estimate_tokens(text)
    return tokens <= self.remaining()
allocate
allocate(text, component)

Allocate tokens for a prompt component.

If the content fits within the remaining budget, the allocation is recorded and True is returned. If it does not fit, the allocation is rejected and False is returned — no state is modified.

Parameters:

Name Type Description Default
text Any

Content to allocate budget for.

required
component str

Name of the prompt component (for diagnostics).

required

Returns:

Type Description
bool

True if allocation succeeded, False if it would exceed budget.

Source code in src/marianne/core/tokens.py
def allocate(
    self,
    text: Any,
    component: str,
) -> bool:
    """Allocate tokens for a prompt component.

    If the content fits within the remaining budget, the allocation is
    recorded and True is returned. If it does not fit, the allocation
    is rejected and False is returned — no state is modified.

    Args:
        text: Content to allocate budget for.
        component: Name of the prompt component (for diagnostics).

    Returns:
        True if allocation succeeded, False if it would exceed budget.
    """
    tokens = estimate_tokens(text)
    if tokens > self.remaining():
        _logger.debug(
            "budget_allocation_rejected",
            component=component,
            requested_tokens=tokens,
            remaining_tokens=self.remaining(),
            window_size=self.window_size,
        )
        return False

    self._allocations.append(BudgetAllocation(component=component, tokens=tokens))
    _logger.debug(
        "budget_allocated",
        component=component,
        tokens=tokens,
        remaining=self.remaining(),
        utilization=f"{self.utilization():.1%}",
    )
    return True
breakdown
breakdown()

Get per-component token allocation breakdown.

Returns:

Type Description
dict[str, int]

Dict mapping component names to their allocated token counts. Components with multiple allocations are summed.

Source code in src/marianne/core/tokens.py
def breakdown(self) -> dict[str, int]:
    """Get per-component token allocation breakdown.

    Returns:
        Dict mapping component names to their allocated token counts.
        Components with multiple allocations are summed.
    """
    result: dict[str, int] = {}
    for alloc in self._allocations:
        result[alloc.component] = result.get(alloc.component, 0) + alloc.tokens
    return result
reset
reset()

Clear all allocations, restoring the full budget.

Source code in src/marianne/core/tokens.py
def reset(self) -> None:
    """Clear all allocations, restoring the full budget."""
    self._allocations.clear()

Functions

estimate_tokens

estimate_tokens(text)

Estimate token count for arbitrary input.

Converts the input to a string representation and applies a conservative chars-per-token ratio. The estimate deliberately overestimates to prevent context window overflow.

Accepted types: str, dict, list, None. Other types are coerced via str().

Warning: CJK / Non-Latin Text Underestimation

The _CHARS_PER_TOKEN = 3.5 ratio is calibrated for English text. CJK characters (Chinese, Japanese, Korean) typically tokenize to approximately 1 token per character, meaning this function underestimates CJK token counts by 3.5-7x. For example, 600 CJK characters produce ~172 estimated tokens but consume 600-1200 actual tokens. This can cause context window overflow for non-English content.

Fix planned: InstrumentProfile.ModelCapacity will provide per-model tokenizers or script-aware estimation ratios.
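The numbers in the warning follow directly from the ratio; this sketch reproduces them (assuming, per the warning, roughly 1-2 actual tokens per CJK character):

```python
import math

cjk_text = "例" * 600  # 600 CJK characters
estimated = math.ceil(len(cjk_text) / 3.5)

# Estimate: ceil(600 / 3.5) = 172 tokens.
assert estimated == 172

# At ~1-2 actual tokens per CJK character, real usage is 600-1200 tokens,
# i.e. a 3.5-7x underestimate -- enough to overflow the context window.
assert 600 / estimated > 3.4
assert 1200 / estimated < 7.0
```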

Parameters:

Name Type Description Default
text Any

Input to estimate. Strings are measured directly. Dicts and lists are serialized to JSON. None returns 0.

required

Returns:

Type Description
int

Estimated token count (always >= 0).

Source code in src/marianne/core/tokens.py
def estimate_tokens(text: Any) -> int:
    """Estimate token count for arbitrary input.

    Converts the input to a string representation and applies a conservative
    chars-per-token ratio. The estimate deliberately overestimates to prevent
    context window overflow.

    Accepted types: ``str``, ``dict``, ``list``, ``None``. Other types are
    coerced via ``str()``.

    .. warning:: CJK / Non-Latin Text Underestimation

       The ``_CHARS_PER_TOKEN = 3.5`` ratio is calibrated for English text.
       CJK characters (Chinese, Japanese, Korean) typically tokenize to
       approximately 1 token per character, meaning this function
       underestimates CJK token counts by 3.5-7x. For example, 600 CJK
       characters produce ~172 estimated tokens but consume 600-1200 actual
       tokens. This can cause context window overflow for non-English content.

       Fix planned: InstrumentProfile.ModelCapacity will provide per-model
       tokenizers or script-aware estimation ratios.

    Args:
        text: Input to estimate. Strings are measured directly. Dicts and lists
            are serialized to JSON. None returns 0.

    Returns:
        Estimated token count (always >= 0).
    """
    if text is None:
        return 0

    if isinstance(text, str):
        content = text
    elif isinstance(text, (dict, list)):
        try:
            content = json.dumps(text, default=str)
        except (ValueError, TypeError):
            content = str(text)
    else:
        content = str(text)

    if not content:
        return 0

    return math.ceil(len(content) / _CHARS_PER_TOKEN)
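The type-handling branches above can be exercised with a mirror of the documented logic (a local re-implementation for illustration, so the example stands alone without importing the module):

```python
import json
import math

CHARS_PER_TOKEN = 3.5  # the module's documented ratio


def estimate(text) -> int:
    # Mirror of the documented branches: None -> 0, str measured directly,
    # dict/list serialized to JSON, everything else coerced via str().
    if text is None:
        return 0
    if isinstance(text, str):
        content = text
    elif isinstance(text, (dict, list)):
        content = json.dumps(text, default=str)
    else:
        content = str(text)
    return math.ceil(len(content) / CHARS_PER_TOKEN) if content else 0


assert estimate(None) == 0
assert estimate("") == 0
assert estimate("hello world") == 4                    # ceil(11 / 3.5)
assert estimate({"k": "v"}) == estimate('{"k": "v"}')  # dicts measured as JSON
```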

get_effective_window_size

get_effective_window_size(model=None, instrument=None)

Get the effective input token budget for a model/instrument combination.

Returns the context window size minus output token reservation. When both model and instrument are provided, returns the minimum of the two — the instrument may impose a stricter limit than the model's native window.

For unknown models/instruments, returns a conservative default.

Parameters:

Name Type Description Default
model str | None

Model name or identifier. None uses the default window.

None
instrument str | None

Instrument (backend) name. None imposes no instrument limit. Unknown instruments impose no additional limit.

None

Returns:

Type Description
int

Effective input token budget (always > 0).

Source code in src/marianne/core/tokens.py
def get_effective_window_size(
    model: str | None = None,
    instrument: str | None = None,
) -> int:
    """Get the effective input token budget for a model/instrument combination.

    Returns the context window size minus output token reservation. When both
    model and instrument are provided, returns the minimum of the two — the
    instrument may impose a stricter limit than the model's native window.

    For unknown models/instruments, returns a conservative default.

    Args:
        model: Model name or identifier. None uses the default window.
        instrument: Instrument (backend) name. None imposes no instrument limit.
            Unknown instruments impose no additional limit.

    Returns:
        Effective input token budget (always > 0).
    """
    # Resolve model window (None if model not provided or unknown)
    model_window = _resolve_model_window(model)

    # Resolve instrument window (None if instrument not provided or unknown)
    instrument_window: int | None = None
    if instrument is not None:
        instrument_window = _resolve_instrument_window(instrument)

    # Resolution: both known → min; one known → that one; neither → default
    if model_window is not None and instrument_window is not None:
        return min(model_window, instrument_window)
    if instrument_window is not None:
        return instrument_window
    if model_window is not None:
        return model_window
    return _DEFAULT_EFFECTIVE_WINDOW
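The resolution order can be sketched with hypothetical lookup tables. The model, instrument, and window values below are placeholders for illustration; the real tables and default live in src/marianne/core/tokens.py:

```python
# Hypothetical values -- not the module's actual tables.
MODEL_WINDOWS = {"model-a": 128_000}
INSTRUMENT_WINDOWS = {"cli-backend": 32_000}
DEFAULT_EFFECTIVE_WINDOW = 16_000


def effective_window(model=None, instrument=None) -> int:
    model_w = MODEL_WINDOWS.get(model) if model is not None else None
    inst_w = INSTRUMENT_WINDOWS.get(instrument) if instrument is not None else None

    # Both known -> min (the instrument may be stricter than the model);
    # one known -> that one; neither -> conservative default.
    if model_w is not None and inst_w is not None:
        return min(model_w, inst_w)
    if inst_w is not None:
        return inst_w
    if model_w is not None:
        return model_w
    return DEFAULT_EFFECTIVE_WINDOW


assert effective_window("model-a", "cli-backend") == 32_000  # min wins
assert effective_window("model-a") == 128_000
assert effective_window("unknown-model") == 16_000           # falls to default
```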