pgroup
Process group management for orphan prevention.
Addresses issue #38: cascading crashes from orphaned MCP server processes. When the daemon is the process group leader, all child processes (backends, validators, MCP servers) belong to the same pgid, enabling clean group-wide shutdown via os.killpg().
Lifecycle
- setup() — create new process group (daemon becomes leader)
- cleanup_orphans() — scan daemon's child tree for leaked processes
- reap_orphaned_backends() — system-wide scan for leaked backend children
- kill_all_children() — shutdown: signal entire group
- atexit handler — last-resort cleanup if shutdown is interrupted
The distinction between cleanup_orphans() and reap_orphaned_backends(): cleanup_orphans() walks the daemon's own child tree via psutil. This misses processes that were reparented to init (PID 1) after their parent (e.g. Claude CLI) exited — which is exactly how MCP/LSP server leaks happen. reap_orphaned_backends() scans ALL processes owned by the current user for known orphan patterns.
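To make the system-wide scan concrete, here is a minimal, Linux-only sketch that reads /proc directly using only the standard library. The names `read_ppid` and `reparented_candidates` and the `patterns` argument are hypothetical. They are not this module's API. The sketch only reports candidate PIDs: processes owned by the current user, with parent PID 0 or 1, whose command line contains one of the patterns. It deliberately sends no signals. Where a child subreaper is in use (for example a container init), orphans may be reparented to a PID other than 1, so a ppid check alone is not a reliable orphan test.

```python
import os


def read_ppid(pid: int) -> int:
    """Return the parent PID of `pid` from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat", "rb") as f:
        data = f.read()
    # Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    # or ')', so split only after the LAST closing parenthesis.
    rest = data[data.rindex(b")") + 2:].split()
    return int(rest[1])  # rest[0] is the state letter, rest[1] is the ppid


def reparented_candidates(patterns: tuple[str, ...]) -> list[int]:
    """List this user's processes parented by PID 0/1 whose cmdline matches a pattern.

    Report-only: nothing is signaled.
    """
    uid = os.getuid()
    found: list[int] = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/self or /proc/meminfo
        pid = int(entry)
        try:
            # /proc/<pid> is normally owned by the process's effective UID.
            if os.stat(f"/proc/{pid}").st_uid != uid:
                continue
            if read_ppid(pid) not in (0, 1):
                continue
            with open(f"/proc/{pid}/cmdline", "rb") as f:
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
        except OSError:
            continue  # the process exited or became unreadable mid-scan
        if any(p in cmdline for p in patterns):
            found.append(pid)
    return found
```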
Classes
ProcessGroupManager
Manages the daemon's process group to prevent orphans.
The daemon calls setup() early in its lifecycle to become the process group leader. During shutdown, kill_all_children() sends SIGTERM to the entire group, ensuring no child process (including deeply nested MCP servers) survives the daemon.
An atexit handler provides last-resort cleanup even if the normal shutdown path is skipped.
Source code in src/marianne/daemon/pgroup.py
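Here is a minimal sketch of once-only wiring that lets the normal shutdown path and the atexit handler share one cleanup routine. `register_last_resort` is a hypothetical helper and is not part of pgroup.py. Per the Python documentation, atexit handlers do not run when the process is killed by a signal Python does not handle, or when os._exit() is called. This is therefore a backstop rather than a guarantee.

```python
import atexit
import threading
from typing import Callable


def register_last_resort(cleanup: Callable[[], None]) -> Callable[[], None]:
    """Register `cleanup` with atexit and return a run-once wrapper.

    Call the returned wrapper from the normal shutdown path; the atexit
    invocation at interpreter exit then becomes a no-op.
    """
    lock = threading.Lock()
    done = False

    def run_once() -> None:
        nonlocal done
        with lock:
            if done:
                return
            done = True
        cleanup()  # runs outside the lock so a slow cleanup cannot block callers

    atexit.register(run_once)
    return run_once
```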
Functions
track_backend_pid
Register a backend process PID for orphan tracking.
When a backend (claude, gemini-cli, etc.) spawns a process, call this with the process PID. On cleanup, any surviving children of dead tracked PIDs are killed as orphans — regardless of what they're called. This replaces cmdline pattern matching with ancestry-based detection.
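The ancestry-based idea can be sketched as a pure function over a process snapshot. The sketch assumes that the caller recorded `snapshot`, a mapping from each child PID to its parent PID, while the tracked backends were still alive. Once a backend exits, its children are reparented and that link is lost. `find_orphans` is an illustrative name, and the real bookkeeping in pgroup.py may differ. The sketch ignores PID reuse. A production version would also compare process start times.

```python
from collections import defaultdict
from collections.abc import Iterable, Mapping


def find_orphans(
    tracked: Iterable[int],
    snapshot: Mapping[int, int],
    alive: set[int],
) -> list[int]:
    """Return live descendants of tracked backend PIDs that have exited.

    `snapshot` maps child PID -> parent PID, recorded while the backends ran.
    """
    children: defaultdict[int, list[int]] = defaultdict(list)
    for pid, ppid in snapshot.items():
        children[ppid].append(pid)

    # Start from every tracked backend that is no longer running.
    stack = [pid for pid in tracked if pid not in alive]
    descendants: set[int] = set()
    while stack:
        parent = stack.pop()
        for child in children.get(parent, []):
            if child not in descendants:  # guards against cycles from PID reuse
                descendants.add(child)
                stack.append(child)

    # Dead intermediates are walked through but only live processes are returned.
    return sorted(pid for pid in descendants if pid in alive)
```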
untrack_backend_pid
Remove a backend PID from tracking after clean exit.
setup
Create a new process group with the daemon as leader.
Must be called early in daemon startup, before spawning any child processes. Idempotent — safe to call multiple times.
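Here is a minimal sketch of an idempotent setup step, assuming a POSIX system. `setup_process_group` is a hypothetical name. The actual setup() may use a different call; os.setsid() would also create a new group, plus a new session. Children spawned afterwards inherit the group ID unless they move themselves out with setsid() or setpgid().

```python
import os


def setup_process_group() -> int:
    """Make this process the leader of its own process group (idempotent).

    Returns the resulting process group ID.
    """
    pid = os.getpid()
    if os.getpgrp() == pid:
        # Already a group leader (this also covers session leaders, for which
        # setpgid() would fail with EPERM).
        return pid
    # setpgid(0, 0) moves the calling process into a new group whose ID is its PID.
    os.setpgid(0, 0)
    return os.getpgrp()
```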
kill_all_children
Send a signal to every process in the daemon's process group except the daemon itself.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sig | int | Signal number to send (default SIGTERM for graceful stop). | SIGTERM |
Returns:
| Type | Description |
|---|---|
| int | The process group ID that was signaled, or 0 if no signal was sent. |
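One way to signal the group "except ourselves" is to ignore the signal in the daemon for the duration of the os.killpg() call. The kernel then discards the copy addressed to the daemon. The sketch below works under that assumption and is not the module's implementation. It has three limitations:

- It must run in the main thread, which Python requires for signal.signal().
- It cannot shield the caller from SIGKILL or SIGSTOP.
- Any matching signal that another process sends to the daemon during that window is lost.

```python
import os
import signal

_UNCATCHABLE = {signal.SIGKILL, signal.SIGSTOP}


def kill_all_children(sig: int = signal.SIGTERM) -> int:
    """Signal every process in our group except ourselves; return the pgid or 0."""
    if sig in _UNCATCHABLE:
        # These cannot be ignored, so killpg would hit this process too.
        raise ValueError("SIGKILL/SIGSTOP cannot be shielded from the caller")
    pgid = os.getpgrp()
    if pgid != os.getpid():
        # Not the group leader: the group may contain unrelated processes.
        return 0
    # Ignore the signal while sending it to the group, so the kernel
    # discards the copy addressed to this process.
    previous = signal.signal(sig, signal.SIG_IGN)
    try:
        os.killpg(pgid, sig)
    finally:
        # signal.signal() returns None for handlers installed outside Python;
        # fall back to the default action in that case.
        signal.signal(sig, previous if previous is not None else signal.SIG_DFL)
    return pgid
```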
cleanup_orphans
Find and clean up orphaned child processes in the daemon's tree.
Detects two categories:

1. Zombie children: reaped via waitpid.
2. Orphaned MCP servers: processes whose parent has died (reparented to init/PID 1) that still match MCP patterns.
Note: This only scans the daemon's own child tree. For processes that escaped the tree entirely (reparented to init), use reap_orphaned_backends() which does a system-wide scan.
Returns:
| Type | Description |
|---|---|
| list[int] | List of PIDs that were cleaned up. |
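The zombie half of cleanup_orphans() can be sketched with os.waitpid() and WNOHANG. `reap_zombies` is a hypothetical helper. Waiting on -1 reaps any child, which can steal exit statuses from subprocess.Popen objects that expect to reap their own process. A real implementation might therefore restrict itself to known PIDs.

```python
import os


def reap_zombies() -> list[int]:
    """Reap every child that has already exited, without blocking."""
    reaped: list[int] = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # ECHILD: this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped
```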
reap_orphaned_backends
System-wide scan for orphaned backend child processes.
.. warning:: DISABLED: this method is a no-op.

   The F-481 rewrite removed cmdline pattern filtering and replaced it with ancestry-only detection (ppid in {0, 1}). Without filtering, this kills EVERY user-owned process parented by init/systemd, including the user's systemd session manager, terminal emulators, and dbus. On WSL2, killing systemd --user cascades into systemd-poweroff.service and shuts down the entire VM (observed 9 times, exit code 9, all terminals dead).
The replacement is per-job PID tracking in the conductor DB (see composer-notes.yaml "PROCESS CLEANUP SIMPLIFICATION"). Until that's implemented, orphaned MCP/LSP servers from dead backends accumulate but don't crash the system.
Returns:
| Type | Description |
|---|---|
| list[int] | Empty list (no-op). |