Marianne Learning Architecture
Status: Phase 1 IMPLEMENTED and evolved through 24+ autonomous self-improvement cycles. Original design Phases 2-6 remain unbuilt proposals.
Created: 2025-12-27
Initial Implementation: 2026-01-14
Last Updated: 2026-04-07
Overview
Marianne's learning system aggregates execution outcomes and patterns across all workspaces, enabling Marianne to learn from every job and improve retry strategies, error handling, and pattern detection over time. All learning data stays local in a SQLite database at ~/.marianne/global-learning.db. The system has evolved significantly beyond its original Phase 1 design through autonomous self-improvement cycles.
Reading guide: This document has two clearly separated parts. Part 1 describes working, tested code (~11,400 lines) that ships today — file paths and module names are verifiable against the codebase. Part 2 is the original design document for features that have not been built — the file paths, CLI commands, and config structures described there do not exist. Every section is labelled. If you are configuring or debugging Marianne, only Part 1 applies.
Part 1: Implemented System
Everything in this section describes working code. File paths and module names are verifiable.
Components
The learning system comprises ~11,400 lines of implementation across two packages:
Core Learning Package (src/marianne/learning/)
| Module | LOC | Purpose |
|---|---|---|
| `global_store.py` | 89 | Re-exports from modular store package (backward compat) |
| `store/` (16 modules) | 7,268 | SQLite-backed global learning store (see below) |
| `patterns.py` | 1,267 | Pattern extraction, matching, and application |
| `aggregator.py` | 581 | Cross-workspace pattern merging |
| `error_hooks.py` | 456 | Error classification learning and adaptive wait times |
| `migration.py` | 450 | Workspace outcome import to global store |
| `judgment.py` | 452 | Learning-informed decision making |
| `weighter.py` | 304 | Priority calculation (recency + effectiveness) |
| `outcomes.py` | 403 | Execution outcome recording |
Global Learning Store (src/marianne/learning/store/)
The store was originally a monolithic ~5,136-line module. It has been modularized into 16 files using a mixin architecture:
| Module | LOC | Purpose |
|---|---|---|
| `base.py` | 922 | SQLite connection, schema, migrations, WAL mode |
| `models.py` | 656 | Dataclasses and enums (PatternRecord, ExecutionRecord, etc.) |
| `patterns_crud.py` | 678 | Pattern create/read/update/delete |
| `patterns_query.py` | 289 | Pattern search and filtering |
| `patterns_trust.py` | 248 | Trust scoring for patterns (v19 evolution) |
| `patterns_quarantine.py` | 180 | Quarantine lifecycle (pending/quarantined/validated/retired) |
| `patterns_lifecycle.py` | 272 | Pattern state transitions |
| `patterns_broadcast.py` | 200 | Cross-workspace pattern sharing |
| `patterns_success_factors.py` | 269 | Metacognitive pattern reflection (v22) |
| `budget.py` | 975 | Exploration budget management (v23) |
| `drift.py` | 1,025 | Effectiveness and epistemic drift detection |
| `executions.py` | 714 | Execution outcome recording and querying |
| `escalation.py` | 288 | Escalation decision recording |
| `rate_limits.py` | 236 | Cross-workspace rate limit coordination |
| `patterns.py` | 78 | Pattern mixin aggregation |
| `__init__.py` | 238 | Package exports |
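The mixin layout and WAL setup described above can be illustrated with a toy store. This is a minimal sketch under stated assumptions, not the real code: the class names `StoreBase`, `PatternCrudMixin`, `PatternQueryMixin`, and `ToyLearningStore`, the one-table schema, and the temporary database path are all hypothetical. Only the general idea (a base class owning the SQLite connection in WAL mode, with behaviour added by mixins) comes from the tables above.

```python
import sqlite3
import tempfile
from pathlib import Path


class StoreBase:
    """Owns the connection and schema (the role base.py plays above)."""

    def __init__(self, db_path: Path) -> None:
        self._conn = sqlite3.connect(str(db_path))
        # WAL mode lets readers keep working while one writer appends.
        self._conn.execute("PRAGMA journal_mode=WAL")
        # Hypothetical schema, far smaller than the real one.
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS patterns ("
            "name TEXT PRIMARY KEY, status TEXT NOT NULL DEFAULT 'pending')"
        )
        self._conn.commit()

    def close(self) -> None:
        self._conn.close()


class PatternCrudMixin:
    """Hypothetical mixin holding write operations."""

    def add_pattern(self, name: str) -> None:
        self._conn.execute(
            "INSERT OR IGNORE INTO patterns (name) VALUES (?)", (name,)
        )
        self._conn.commit()


class PatternQueryMixin:
    """Hypothetical mixin holding read operations."""

    def pattern_names(self) -> list:
        rows = self._conn.execute("SELECT name FROM patterns ORDER BY name")
        return [row[0] for row in rows]


class ToyLearningStore(PatternCrudMixin, PatternQueryMixin, StoreBase):
    """Composes the mixins over the shared base, one concern per class."""


db_file = Path(tempfile.mkdtemp()) / "toy-learning.db"
store = ToyLearningStore(db_file)
store.add_pattern("validation_markers_missing")
journal_mode = store._conn.execute("PRAGMA journal_mode").fetchone()[0]
names = store.pattern_names()
store.close()
```

Each mixin relies only on `self._conn`, so a concern can be split into its own file without touching the others, which is the maintainability argument the mixin design rests on.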
Daemon Integration
| Module | LOC | Purpose |
|---|---|---|
| `daemon/learning_hub.py` | ~120 | Centralized store for all daemon jobs |
| `daemon/semantic_analyzer.py` | 486 | AI-powered pattern analysis |
The LearningHub maintains a single GlobalLearningStore instance shared across all concurrent jobs. Pattern discoveries in Job A are immediately visible to Job B. Periodic persistence (60-second heartbeat) replaces per-write flushes.
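The shared-store and heartbeat behaviour described above can be sketched with `asyncio`. Everything named here (`ToyStore`, `ToyLearningHub`, `flush`, the final flush on shutdown) is a hypothetical stand-in rather than the actual `LearningHub` API; the sketch only shows why one in-memory store gives instant cross-job visibility while persistence happens on a timer.

```python
import asyncio
from typing import Optional


class ToyStore:
    """Stand-in for GlobalLearningStore: shared state plus an explicit flush."""

    def __init__(self) -> None:
        self.patterns = set()
        self.flush_count = 0

    def flush(self) -> None:
        # The real store would persist to SQLite here; we only count calls.
        self.flush_count += 1


class ToyLearningHub:
    """Holds one store for all jobs and flushes it on a heartbeat."""

    def __init__(self, heartbeat_seconds: float = 60.0) -> None:
        self.store = ToyStore()
        self._heartbeat_seconds = heartbeat_seconds
        self._task: Optional[asyncio.Task] = None

    async def start(self) -> None:
        self._task = asyncio.create_task(self._heartbeat())

    async def _heartbeat(self) -> None:
        while True:
            await asyncio.sleep(self._heartbeat_seconds)
            self.store.flush()

    async def stop(self) -> None:
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass
        # Assumption: flush once more on shutdown so nothing is lost.
        self.store.flush()


async def demo():
    hub = ToyLearningHub(heartbeat_seconds=0.01)  # shortened for the demo
    await hub.start()
    hub.store.patterns.add("retry_after_rate_limit")  # written by "job A"
    seen = "retry_after_rate_limit" in hub.store.patterns  # read by "job B"
    await asyncio.sleep(0.05)
    await hub.stop()
    return seen, hub.store.flush_count


seen_by_job_b, flush_count = asyncio.run(demo())
```

Because both jobs hold a reference to the same object, no write has to reach disk before the other job can see it; the heartbeat only bounds how much could be lost on a crash.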
Configuration (src/marianne/core/config/learning.py)
The LearningConfig Pydantic model provides these implemented configuration fields:
| Field | Default | Purpose |
|---|---|---|
| `enabled` | `true` | Master switch for learning |
| `outcome_store_type` | `"json"` | Backend type (json or sqlite) |
| `outcome_store_path` | `None` | Custom path (default: workspace/.marianne-outcomes.json) |
| `min_confidence_threshold` | `0.3` | Below this triggers escalation |
| `high_confidence_threshold` | `0.7` | Above this uses completion mode |
| `escalation_enabled` | `false` | Enable low-confidence escalation |
| `use_global_patterns` | `true` | Query global store for patterns |
| `exploration_rate` | `0.15` | Epsilon-greedy exploration rate |
| `exploration_min_priority` | `0.05` | Floor for exploration candidates |
| `entropy_alert_threshold` | `0.5` | Low-diversity alert trigger |
| `entropy_check_interval` | `100` | Check every N applications |
| `auto_apply_enabled` | `false` | High-trust auto-apply (deprecated flat field) |
| `auto_apply_trust_threshold` | `0.85` | Trust score for auto-apply (deprecated flat field) |
| `exploration_budget` | `ExplorationBudgetConfig()` | Dynamic budget (v23) |
| `entropy_response` | `EntropyResponseConfig()` | Auto entropy response (v23) |
| `auto_apply` | `None` | Structured auto-apply config (v22, replaces flat fields) |
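To make the interaction of the two confidence thresholds and `escalation_enabled` concrete, here is a small sketch. It uses a plain dataclass as a stand-in for the Pydantic model, and the `route` function, the `"standard"` label, and the strict comparisons are assumptions made for illustration; only the field names and defaults come from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfidenceThresholds:
    """Stand-in for three LearningConfig fields, with the documented defaults."""

    min_confidence_threshold: float = 0.3
    high_confidence_threshold: float = 0.7
    escalation_enabled: bool = False


def route(confidence: float, cfg: ConfidenceThresholds) -> str:
    """Pick a handling mode for one decision (hypothetical helper)."""
    if confidence > cfg.high_confidence_threshold:
        return "completion"
    # Escalation only fires when the master flag for it is switched on.
    if confidence < cfg.min_confidence_threshold and cfg.escalation_enabled:
        return "escalate"
    # Assumed fallback label for everything in between.
    return "standard"


defaults = ConfidenceThresholds()
with_escalation = ConfidenceThresholds(escalation_enabled=True)
modes = [
    route(0.9, defaults),
    route(0.5, defaults),
    route(0.1, defaults),
    route(0.1, with_escalation),
]
```

With the defaults, a low-confidence decision does not escalate, because `escalation_enabled` is `false` unless set explicitly.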
Additional config models defined in the same file:
- `ExplorationBudgetConfig`: Dynamic exploration budget with floor/ceiling/decay (v23)
- `EntropyResponseConfig`: Automatic diversity injection when entropy drops (v23)
- `AutoApplyConfig`: Trust-aware autonomous pattern application (v22)
- `GroundingConfig`: External grounding hooks (file checksum validation)
- `GroundingHookConfig`: Individual hook configuration (currently supports `file_checksum` type)
- `CheckpointConfig`: Proactive pre-execution checkpoints (v21)
- `CheckpointTriggerConfig`: Trigger conditions for checkpoints (sheet numbers, keywords, retry count)
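The floor/ceiling/decay budget and the entropy-triggered response listed above could work roughly as follows. This is a sketch under assumptions: the function names, the normalisation of entropy by the number of distinct patterns, and the numeric values for `floor`, `ceiling`, `decay`, and `boost` are hypothetical. Only the 0.5 alert threshold mirrors the documented `entropy_alert_threshold` default.

```python
import math
from collections import Counter


def normalized_entropy(applications: list) -> float:
    """Shannon entropy of pattern usage, scaled to the range [0, 1]."""
    counts = Counter(applications)
    if len(counts) < 2:
        return 0.0  # a single pattern (or none) means no diversity
    total = sum(counts.values())
    entropy = -sum(
        (c / total) * math.log2(c / total) for c in counts.values()
    )
    return entropy / math.log2(len(counts))


def next_budget(
    budget: float,
    entropy: float,
    *,
    floor: float = 0.05,      # hypothetical value
    ceiling: float = 0.5,     # hypothetical value
    decay: float = 0.9,       # hypothetical value
    boost: float = 0.1,       # hypothetical value
    alert_threshold: float = 0.5,
) -> float:
    """Boost the budget when diversity is low, otherwise decay to the floor."""
    if entropy < alert_threshold:
        return min(ceiling, budget + boost)
    return max(floor, budget * decay)


skewed = ["a"] * 19 + ["b"]   # one pattern dominates
balanced = ["a", "b"] * 10    # usage is evenly spread
budget_after_skew = next_budget(0.15, normalized_entropy(skewed))
budget_after_balance = next_budget(0.15, normalized_entropy(balanced))
```

The clamp on both sides is the point of the design: exploration never switches off entirely and never takes over selection.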
Architecture
┌──────────────────────────────────────────────────────────────────────────┐
│                            Marianne Instance                             │
│                                                                          │
│  ┌──────────────┐    ┌──────────────┐    ┌─────────────────────────────┐ │
│  │    Runner    │───▶│   Pattern    │───▶│    Global Learning Store    │ │
│  │  (executes)  │    │  Aggregator  │    │    (~/.marianne/global-     │ │
│  └──────────────┘    └──────────────┘    │     learning.db)            │ │
│         │                   │            └─────────────────────────────┘ │
│         │                   │                           │                │
│         ▼                   ▼                           ▼                │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │                        Error Learning Hooks                        │  │
│  │  - Records error classifications and recoveries                    │  │
│  │  - Learns adaptive wait times from recovery success                │  │
│  │  - Shares learned delays across workspaces                         │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│                                     │                                    │
│                                     ▼                                    │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │                        Daemon Learning Hub                         │  │
│  │  - Single GlobalLearningStore shared across concurrent jobs        │  │
│  │  - Instant cross-job pattern visibility                            │  │
│  │  - 60-second heartbeat persistence                                 │  │
│  │  - Semantic analysis of patterns (AI-powered)                      │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
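The "learns adaptive wait times from recovery success" box in the diagram can be illustrated with a small sketch. The class `AdaptiveWaits`, its methods, the use of a median, the 30-second default, and the error code `E101` are all hypothetical; the actual logic in `error_hooks.py` is not shown in this document.

```python
from collections import defaultdict
from statistics import median


class AdaptiveWaits:
    """Remembers which waits preceded a successful recovery, per error code."""

    def __init__(self, default_wait: float = 30.0) -> None:
        self._default = default_wait
        self._successful_waits = defaultdict(list)

    def record_recovery(
        self, error_code: str, waited_seconds: float, recovered: bool
    ) -> None:
        # Only waits that actually led to recovery inform future delays.
        if recovered:
            self._successful_waits[error_code].append(waited_seconds)

    def suggested_wait(self, error_code: str) -> float:
        waits = self._successful_waits.get(error_code)
        # Median keeps one unusually long wait from inflating the suggestion.
        return float(median(waits)) if waits else self._default


hooks = AdaptiveWaits()
hooks.record_recovery("E101", 10.0, recovered=True)
hooks.record_recovery("E101", 20.0, recovered=True)
hooks.record_recovery("E101", 60.0, recovered=True)
hooks.record_recovery("E101", 5.0, recovered=False)  # ignored: no recovery
learned = hooks.suggested_wait("E101")
fallback = hooks.suggested_wait("E999")
```

Keeping these observations in the global store rather than per workspace is what would let a delay learned in one workspace apply in another.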
Pattern Lifecycle
Patterns follow a quarantine lifecycle (v19 evolution):
PENDING → VALIDATED (proven effective through repeated application)
PENDING → QUARANTINED (flagged for review due to failures)
QUARANTINED → VALIDATED (rehabilitated after investigation)
QUARANTINED → RETIRED (permanently deactivated)
VALIDATED → RETIRED (no longer relevant)
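The five transitions listed above form a small state machine. The sketch below encodes exactly those transitions; the names `PatternStatus`, `ALLOWED_TRANSITIONS`, and `transition` are illustrative and may not match what `patterns_quarantine.py` or `patterns_lifecycle.py` actually define.

```python
from enum import Enum


class PatternStatus(Enum):
    PENDING = "pending"
    QUARANTINED = "quarantined"
    VALIDATED = "validated"
    RETIRED = "retired"


# One entry per arrow in the lifecycle list; RETIRED is terminal.
ALLOWED_TRANSITIONS = {
    PatternStatus.PENDING: {PatternStatus.VALIDATED, PatternStatus.QUARANTINED},
    PatternStatus.QUARANTINED: {PatternStatus.VALIDATED, PatternStatus.RETIRED},
    PatternStatus.VALIDATED: {PatternStatus.RETIRED},
    PatternStatus.RETIRED: set(),
}


def transition(current: PatternStatus, target: PatternStatus) -> PatternStatus:
    """Return the new status, or raise if the move is not in the lifecycle."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(
            f"illegal transition: {current.value} -> {target.value}"
        )
    return target


status = PatternStatus.PENDING
status = transition(status, PatternStatus.QUARANTINED)
status = transition(status, PatternStatus.VALIDATED)
```

Note that, read literally, the list offers no path from PENDING straight to RETIRED and none out of RETIRED, so the sketch rejects both.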
Trust and Exploration
The system balances exploitation of known-good patterns with exploration of unproven ones:
- Epsilon-greedy selection: `exploration_rate` (default 0.15) determines how often lower-priority patterns are tried.
- Dynamic budget (v23): When entropy drops, the exploration budget auto-boosts. When entropy is healthy, it decays toward a floor.
- Trust scoring (v19): Patterns accumulate trust through successful applications. When auto-apply is enabled (it is off by default), patterns whose trust score exceeds the threshold (default 0.85) can be applied without human confirmation.
- Drift detection: Tracks effectiveness drift (are patterns degrading?) and epistemic drift (is the system's knowledge becoming stale?).
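Epsilon-greedy selection as described in the first bullet can be sketched in a few lines. The `Candidate` type and `select_pattern` function are hypothetical; the two defaults mirror the documented `exploration_rate` and `exploration_min_priority` fields, and the rule that exploration draws uniformly from non-best candidates above the floor is an assumption.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    priority: float


def select_pattern(
    candidates: list,
    rng: random.Random,
    exploration_rate: float = 0.15,
    exploration_min_priority: float = 0.05,
) -> Candidate:
    """Usually exploit the top pattern; sometimes try a lower-priority one."""
    if not candidates:
        raise ValueError("no candidates to select from")
    best = max(candidates, key=lambda c: c.priority)
    # Patterns below the floor are never explored, however unlucky the draw.
    explorable = [
        c for c in candidates
        if c is not best and c.priority >= exploration_min_priority
    ]
    if explorable and rng.random() < exploration_rate:
        return rng.choice(explorable)
    return best


pool = [
    Candidate("proven_fix", 0.90),
    Candidate("untested_idea", 0.20),
    Candidate("near_noise", 0.01),
]
rng = random.Random(0)
always_exploit = {select_pattern(pool, rng, exploration_rate=0.0).name
                  for _ in range(50)}
always_explore = {select_pattern(pool, rng, exploration_rate=1.0).name
                  for _ in range(50)}
```

Passing the random generator in explicitly keeps the selection reproducible in tests, which matters for a component whose behaviour is deliberately non-deterministic.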
CLI Commands
# View global patterns
mzt patterns-list [--min-priority 0.0] [--limit N]
# Why patterns succeed (metacognitive analysis)
mzt patterns-why
# Pattern diversity metrics
mzt patterns-entropy
# Exploration budget status
mzt patterns-budget
# Learning statistics
mzt learning-stats
# Learning insights
mzt learning-insights
# Effectiveness drift detection
mzt learning-drift
# Epistemic drift detection
mzt learning-epistemic-drift
# Recent learning activity
mzt learning-activity
# Export patterns
mzt learning-export [--output FILE]
# Record evolution trajectory
mzt learning-record-evolution
# Entropy system status
mzt entropy-status
Test Coverage
The learning system has extensive test coverage across 16 dedicated test files (~19,500 lines):
- `test_global_learning.py`: Comprehensive store tests (patterns, executions, trust, quarantine)
- `test_learning_executions.py`: Execution recording and similarity matching
- `test_learning_budget.py`: Exploration budget dynamics and entropy response
- `test_learning_store_base.py`: SQLite schema, migrations, WAL mode
- `test_learning_aggregator.py`: Outcome aggregation into global patterns
- `test_learning_drift.py`: Effectiveness and epistemic drift detection
- `test_learning_weighter.py`: Priority calculations
- `test_learning_migration_judgment.py`: Outcome store migration
- `test_error_learning_hooks.py`: Error code recovery learning
- `test_learning_store_fk_migration.py`: Foreign key constraint handling
- `test_learning_e2e.py`: End-to-end pattern detection and application
- `test_learning_export_filtering.py`: Data export with filtering
- `test_daemon_learning_hub.py`: Async hub lifecycle
- `test_cli_learning.py`: CLI command validation
- `test_cli_learning_export.py`: Export command validation
- `test_learning_store_priority_and_fk.py`: Priority calculations and FK migrations
Key Decisions
| Decision | Choice | Rationale |
|---|---|---|
| Global Store Location | SQLite at `~/.marianne/global-learning.db` | Single-file, no server, WAL mode for concurrent access |
| Aggregation Trigger | Immediate on job completion | Patterns available instantly for next job |
| Pattern Weighting | Combined recency + effectiveness | Recent successes weighted higher than old ones |
| Error Learning | Hook-based extension of ErrorClassifier | Non-invasive integration with existing error handling |
| Store Architecture | Mixin-based modular design | Original 5,136-line monolith was unmaintainable |
| Daemon Integration | Centralized LearningHub | Single store instance prevents SQLite lock contention |
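The "combined recency + effectiveness" weighting in the table can be made concrete with a sketch. The formula below is an assumption, not the one in `weighter.py`: it uses Laplace-smoothed success rate for effectiveness, exponential half-life decay for recency, and a linear blend, with a hypothetical 30-day half-life and 0.4 recency weight.

```python
def priority(
    successes: int,
    failures: int,
    age_days: float,
    *,
    half_life_days: float = 30.0,  # hypothetical value
    recency_weight: float = 0.4,   # hypothetical value
) -> float:
    """Blend how well a pattern worked with how recently it was seen."""
    # Laplace smoothing keeps a pattern with no history at a neutral 0.5.
    effectiveness = (successes + 1) / (successes + failures + 2)
    # Recency halves every half_life_days, starting from 1.0 at age zero.
    recency = 0.5 ** (age_days / half_life_days)
    return recency_weight * recency + (1.0 - recency_weight) * effectiveness


fresh = priority(successes=8, failures=2, age_days=0)
stale = priority(successes=8, failures=2, age_days=90)
unproven = priority(successes=0, failures=0, age_days=0)
```

Under this blend, two patterns with the same track record rank differently once one of them has not been seen for a while, which matches the stated rationale that recent successes should outweigh old ones.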
Evolution History
The learning system has evolved through Marianne's autonomous self-improvement cycles. Key evolutions visible in the code:
| Evolution | Feature Added |
|---|---|
| v8 | Cross-workspace rate limit coordination |
| v11 | Escalation learning loop |
| v12 | Goal drift detection |
| v14 | Real-time pattern broadcasting |
| v19 | Pattern quarantine lifecycle, trust scoring, provenance tracking |
| v21 | Epistemic drift detection, proactive checkpoints, pattern entropy monitoring |
| v22 | Metacognitive pattern reflection (success factors), trust-aware autonomous application |
| v23 | Exploration budget maintenance, automatic entropy response |
These evolutions extended the system far beyond its original Phase 1 design. The proposed Phases 2-6 (anonymization, GitHub contribution, sync) were never implemented; instead, the system evolved in a different direction — toward self-awareness and autonomous pattern management.
Part 2: Design Proposals (Not Implemented)
Everything below this line describes aspirational designs from the original 2025-12-27 design document (Phases 2-6). None of this code exists. The following proposed files were never created: `adaptation.py`, `preflight.py`, `improvements.py`, `quality.py`, `anonymize.py`, `contribute.py`, `sync.py`, `database.py`. The config structures and CLI commands shown below are proposals, not references to real code.
The original design envisioned six phases. Only Phase 1 (local learning foundation) was implemented, and the actual implementation diverged significantly from the original design, evolving through 24+ autonomous self-improvement cycles into the system described in Part 1. Phases 2-6 remain unbuilt.
Proposed: Preflight Learning (Before Execution)
Query the local database for similar past executions before running a sheet. Identify patterns that led to failures and apply learned mitigations automatically.
# PROPOSED config (does not exist)
learning:
  preflight:
    enabled: true
    check_similar_failures: true
    apply_learned_patterns: true
    warn_on_risky_patterns: true
Proposed preflight output:
Sheet 3 preflight:
! Similar sheet failed 2/3 times historically
! Pattern detected: "validation_markers_missing" (80% failure rate)
Applied mitigation: Added explicit marker format to prompt
Adjusted timeout: 1800s -> 2400s (based on timing patterns)
Proposed: Mid-Run Adaptation (During Execution)
On validation failure, query patterns for this failure type and apply learned recovery strategies to modify the prompt for retry.
# PROPOSED config (does not exist)
learning:
  mid_run:
    enabled: true
    adapt_on_failure: true
    max_adaptations: 2
Proposed: Post-Run Improvement Detection
Analyze completed jobs for improvement opportunities. Generate suggestions, filter through quality gates, and queue for contribution.
Proposed deliverables:
- src/marianne/learning/improvements.py — Improvement detection
- src/marianne/learning/quality.py — Quality gates
Proposed: Anonymization Layer
Strip PII from patterns before any external sharing. Hash identifiers for correlation without exposure.
Fields to strip: PIDs, usernames, absolute paths, API keys, environment variables, hostnames, IP addresses, stdout/stderr content.
Fields to hash: job names, workspace paths, project paths.
Fields to keep: pattern names, validation types, success rates, retry counts, config structure (no values).
Proposed deliverables:
- src/marianne/learning/anonymize.py — Anonymization logic
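Since `anonymize.py` was never written, the following is purely an illustration of how the strip/hash/keep rules above could fit together. The field keys, the salted SHA-256 truncated to 16 hex characters, and the allow-list approach (anything not explicitly kept or hashed is dropped) are all assumptions.

```python
import hashlib

# Hypothetical key names for the "hash" and "keep" categories above.
HASH_FIELDS = {"job_name", "workspace_path", "project_path"}
KEEP_FIELDS = {"pattern_name", "validation_type", "success_rate", "retry_count"}


def anonymize(record: dict, salt: str) -> dict:
    """Return a copy that is safe to share under the proposed rules."""
    cleaned = {}
    for key, value in record.items():
        if key in HASH_FIELDS:
            # Same salt and input give the same token, so records can be
            # correlated without exposing the original identifier.
            digest = hashlib.sha256(f"{salt}:{value}".encode("utf-8"))
            cleaned[key] = digest.hexdigest()[:16]
        elif key in KEEP_FIELDS:
            cleaned[key] = value
        # Everything else (usernames, PIDs, stdout, unknown keys) is dropped.
    return cleaned


raw = {
    "job_name": "nightly-build",
    "username": "alice",
    "stdout": "token=abc123",
    "pattern_name": "validation_markers_missing",
    "success_rate": 0.8,
}
shared = anonymize(raw, salt="local-secret")
shared_again = anonymize(raw, salt="local-secret")
```

An allow-list fails closed: a new sensitive field added later is stripped by default instead of leaking until someone remembers to add it to a deny-list.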
Proposed: GitHub Contribution Pipeline
Automatically generate PRs from locally-learned improvements:
- Improvement detected (pattern X causes failure, modification Y fixes it)
- Quality gate (minimum sample size, minimum confidence, not already contributed)
- Anonymization (strip PII, hash identifiers, normalize paths)
- Contribution preparation (generate diff, write evidence summary)
- GitHub PR creation (fork, branch, commit, open PR)
- Human review by maintainer
Proposed deliverables:
- src/marianne/learning/contribute.py — GitHub contribution logic
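The quality-gate step of the proposed pipeline names three checks: minimum sample size, minimum confidence, and not already contributed. A hypothetical sketch of that gate follows; the `Improvement` type, the function name, and the threshold values of 10 samples and 0.8 confidence are invented for illustration, since the proposal does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Improvement:
    pattern_name: str
    sample_size: int
    confidence: float


def passes_quality_gate(
    improvement: Improvement,
    already_contributed: set,
    min_samples: int = 10,        # hypothetical threshold
    min_confidence: float = 0.8,  # hypothetical threshold
) -> bool:
    """Apply the three checks named in the proposed pipeline."""
    if improvement.pattern_name in already_contributed:
        return False
    if improvement.sample_size < min_samples:
        return False
    return improvement.confidence >= min_confidence


contributed = {"timeout_too_short"}
candidates = [
    Improvement("validation_markers_missing", 25, 0.91),
    Improvement("timeout_too_short", 40, 0.95),   # duplicate
    Improvement("rare_flake", 3, 0.99),           # too few samples
    Improvement("weak_signal", 50, 0.55),         # low confidence
]
queued = [c.pattern_name for c in candidates
          if passes_quality_gate(c, contributed)]
```

Running the gate before anonymization and PR creation would keep low-evidence suggestions from ever reaching a maintainer's review queue.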
Proposed: Learning Sync
Pull merged learnings from the GitHub repository and apply to the local database.
Proposed CLI command: mzt learning sync
Proposed deliverables:
- src/marianne/learning/sync.py — Learning synchronization
Proposed: Centralized Learning Store
Three options were considered:
Option A: Shared PostgreSQL — Real-time sharing, centralized statistics. Cons: requires network, privacy concerns, hosting cost.
Option B: Git-Based Federation (recommended for start) — Works offline, full transparency, human review via PRs. Cons: async, requires GitHub access.
Option C: Hybrid — Local SQLite syncs periodically to central PostgreSQL, materialized to git for transparency.
Open Design Questions
- Should code changes ever be auto-contributed? Current proposal: No, always requires PR review.
- How to handle conflicting learnings? Different instances may learn opposite things. Needs conflict resolution.
- Rate limiting contributions? Prevent spam from misconfigured instances.
- Versioning learnings? Patterns may become obsolete as Marianne evolves.
Initial implementation 2026-01-14. System evolved through 24+ autonomous cycles (v8-v23). Design proposals (Phases 2-6) remain unimplemented as of 2026-04-07.