collector
Central profiler orchestrator for the Marianne daemon.
ProfilerCollector is the heart of the profiling subsystem. It runs a
periodic collection loop that gathers system metrics, per-process data,
GPU stats, and strace summaries into SystemSnapshot objects. These
are persisted (SQLite + JSONL), fed to the AnomalyDetector, and
published to the EventBus for downstream consumers.
Lifecycle::

    collector = ProfilerCollector(config, monitor, pgroup, event_bus, manager)
    await collector.start()
    ...
    snapshot = await collector.collect_snapshot()
    ...
    await collector.stop()

Classes
ProfilerCollector
Central orchestrator for the daemon profiler subsystem.
Coordinates:
- Periodic metric collection (system + per-process + GPU + strace)
- SQLite + JSONL persistence via MonitorStorage
- Heuristic anomaly detection via AnomalyDetector
- EventBus integration for monitor.anomaly events
- Process lifecycle tracking via sheet.started/completed/failed
Parameters
config:
Profiler configuration (interval, storage paths, thresholds).
monitor:
The daemon's ResourceMonitor for system-level metrics.
pgroup:
The daemon's ProcessGroupManager for child process enumeration.
event_bus:
The daemon's EventBus for publishing anomaly events and
subscribing to sheet lifecycle events.
manager:
Optional JobManager for mapping PIDs to job_id/sheet_num
and reading running job / active sheet counts.
Source code in src/marianne/daemon/profiler/collector.py
Functions
start
async
Initialize storage, subscribe to events, start collection loop.
stop
async
Stop collection loop, detach strace, unsubscribe from events.
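The start/stop pair described above can be sketched with a background asyncio task. This is a minimal illustration, not the actual implementation: PeriodicCollector, interval_seconds, and snapshots_taken are hypothetical names, and event subscription, storage setup, and strace detachment are omitted.

```python
import asyncio
from typing import Optional


class PeriodicCollector:
    """Hypothetical stand-in showing only loop management."""

    def __init__(self, interval_seconds: float) -> None:
        self._interval = interval_seconds
        self._task: Optional[asyncio.Task] = None
        self.snapshots_taken = 0

    async def collect_snapshot(self) -> None:
        # Placeholder: a real collector would gather metrics here.
        self.snapshots_taken += 1

    async def _loop(self) -> None:
        # Collect, then sleep, until the task is cancelled by stop().
        while True:
            await self.collect_snapshot()
            await asyncio.sleep(self._interval)

    async def start(self) -> None:
        # Calling start() twice does not create a second loop.
        if self._task is None:
            self._task = asyncio.create_task(self._loop())

    async def stop(self) -> None:
        if self._task is None:
            return
        self._task.cancel()
        try:
            await self._task
        except asyncio.CancelledError:
            pass  # Expected: the loop task was cancelled on purpose.
        self._task = None
```

Awaiting the cancelled task inside stop() means the loop has fully exited before stop() returns, which is one way to make teardown ordering predictable.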
collect_snapshot
async
Gather all metrics into a single SystemSnapshot.
Steps:
1. System memory via SystemProbe
2. Per-process metrics via psutil (with PID → job mapping)
3. GPU metrics via GpuProbe
4. Load average via os.getloadavg()
5. Strace summaries for attached PIDs
6. Pressure level from BackpressureController
7. Running jobs / active sheets from JobManager
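A reduced sketch of the gathering pattern in the steps above, covering only the load average, the PID → job mapping, and the running-job count. SnapshotSketch, read_load_average, and collect_snapshot_sketch are hypothetical names, and the real SystemSnapshot fields are not shown in this page. Treating an unavailable probe as None rather than failing the whole snapshot is an assumption of this sketch.

```python
import asyncio
import os
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SnapshotSketch:
    """Hypothetical, simplified stand-in for SystemSnapshot."""

    timestamp: float
    load_avg: Optional[tuple] = None
    processes: list = field(default_factory=list)
    running_jobs: int = 0


def read_load_average() -> Optional[tuple]:
    # os.getloadavg() is unavailable on some platforms and may raise OSError.
    try:
        return os.getloadavg()
    except (AttributeError, OSError):
        return None


async def collect_snapshot_sketch(pid_to_job: dict) -> SnapshotSketch:
    # Attach a job_id to each PID (assumed mapping shape: {pid: job_id}).
    processes = [
        {"pid": pid, "job_id": job_id}
        for pid, job_id in sorted(pid_to_job.items())
    ]
    return SnapshotSketch(
        timestamp=time.time(),
        load_avg=read_load_average(),
        processes=processes,
        running_jobs=len(set(pid_to_job.values())),
    )
```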
get_resource_context_for_pid
Get current resource context for a specific PID.
Returns a dict suitable for embedding in sheet event data:
rss_mb, cpu_pct, syscall_hotspot, anomalies_active.
If the PID is not found in the latest snapshot, returns a dict with all values set to None/empty.
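The lookup-with-fallback behavior can be illustrated as follows. Only the four key names come from the description above. The snapshot layout (a "processes" list and an "anomalies" list), the function names, and the choice of which fields are None versus an empty list are assumptions made for this sketch.

```python
from typing import Optional


def empty_resource_context() -> dict:
    # Assumed fallback shape: scalar fields None, anomaly list empty.
    return {
        "rss_mb": None,
        "cpu_pct": None,
        "syscall_hotspot": None,
        "anomalies_active": [],
    }


def resource_context_for_pid(latest_snapshot: Optional[dict], pid: int) -> dict:
    """Look up one PID in a snapshot dict (hypothetical layout)."""
    if latest_snapshot is None:
        return empty_resource_context()
    for proc in latest_snapshot.get("processes", []):
        if proc.get("pid") == pid:
            return {
                "rss_mb": proc.get("rss_mb"),
                "cpu_pct": proc.get("cpu_pct"),
                "syscall_hotspot": proc.get("syscall_hotspot"),
                "anomalies_active": list(latest_snapshot.get("anomalies", [])),
            }
    # PID not present in the latest snapshot.
    return empty_resource_context()
```

Returning the same keys in both cases lets event consumers read the dict without checking which branch produced it.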
get_resource_context
Get general resource context (not PID-specific).
Useful when no specific PID is available for the event.
get_latest_snapshot
Return the latest snapshot as a JSON-serializable dict.
Used by the daemon.top IPC method.
get_jsonl_path
Return the JSONL streaming log path.
Used by the daemon.top.stream IPC method.
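For readers unfamiliar with JSONL, the format is one JSON object per line, which suits append-only streaming. The sketch below shows a writer and a reader for such a file. append_snapshot and read_snapshots are hypothetical helpers, and the actual record layout written by the profiler is not documented here.

```python
import json
from pathlib import Path


def append_snapshot(path: Path, snapshot: dict) -> None:
    # One compact JSON document per line; the file is opened in append mode.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(snapshot, sort_keys=True) + "\n")


def read_snapshots(path: Path) -> list:
    # A missing file is treated as an empty log; blank lines are skipped.
    if not path.exists():
        return []
    with path.open("r", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```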
get_recent_events
Return recent process events as JSON-serializable dicts.
Used by the daemon.events IPC method.