gpu_probe ¶

GPU probing for the Marianne daemon profiler.

Provides a GpuProbe class that follows the same "try primary → fallback → graceful None" pattern as SystemProbe. Priority chain:

1. pynvml: NVIDIA Management Library Python bindings (fast, no subprocess)
2. nvidia-smi: shell subprocess fallback
3. No GPU data: silent skip, returns empty list

All methods are static. _pynvml_available is set once at import time.
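
The priority chain and the import-time flag can be illustrated with a minimal sketch. The names `PYNVML_AVAILABLE` and `first_successful` are hypothetical and are not part of the module; only the general "try each probe in order, return an empty list if all fail" shape comes from the description above.

```python
from typing import Callable

# Hypothetical stand-in for the module's _pynvml_available flag:
# evaluated once, when this module is imported.
try:
    import pynvml  # noqa: F401
    PYNVML_AVAILABLE = True
except ImportError:
    PYNVML_AVAILABLE = False


def first_successful(probes: list[Callable[[], list]]) -> list:
    """Return the result of the first probe that does not raise, else []."""
    for probe in probes:
        try:
            return probe()
        except Exception:
            # A failing probe is not fatal; fall through to the next one.
            continue
    return []
```

Because the flag is computed at import time, installing pynvml while the process is running would presumably not be noticed until the process restarts.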

Classes¶

GpuMetric `dataclass` ¶

GpuMetric(index, utilization_pct, memory_used_mb, memory_total_mb, temperature_c)

Snapshot of a single GPU's current state.
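
As a rough sketch, a dataclass with the five fields from the signature above might look like the following. The field names come from the signature; the field types and the `memory_fraction` helper are assumptions made for illustration and may differ from the real class.

```python
from dataclasses import dataclass


@dataclass
class GpuMetric:
    """Illustrative re-creation; field types are assumed, not confirmed."""

    index: int                # GPU ordinal
    utilization_pct: float    # compute utilization, percent
    memory_used_mb: float     # memory in use, MB
    memory_total_mb: float    # total memory, MB
    temperature_c: float      # temperature, degrees Celsius


def memory_fraction(metric: GpuMetric) -> float:
    """Hypothetical helper: fraction of GPU memory in use (0.0 if total is 0)."""
    if metric.memory_total_mb <= 0:
        return 0.0
    return metric.memory_used_mb / metric.memory_total_mb
```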

GpuProbe ¶

GPU resource probes following the SystemProbe pattern.

Each probing method tries pynvml first, then falls back to nvidia-smi. It returns an empty list when no GPU is available; callers treat that as "no GPU present", not as an error.

Functions¶

get_gpu_metrics `staticmethod` ¶

get_gpu_metrics()

Get current metrics for all GPUs.

Priority:

1. pynvml (fast, in-process)
2. nvidia-smi subprocess fallback
3. Empty list (no GPU / no drivers)

Returns:

Type	Description
`list[GpuMetric]`	List of GpuMetric, one per GPU. Empty if no GPU available.

Source code in src/marianne/daemon/profiler/gpu_probe.py

@staticmethod
def get_gpu_metrics() -> list[GpuMetric]:
    """Get current metrics for all GPUs.

    Priority:
        1. pynvml (fast, in-process)
        2. nvidia-smi subprocess fallback
        3. Empty list (no GPU / no drivers)

    Returns:
        List of GpuMetric, one per GPU.  Empty if no GPU available.
    """
    if _pynvml_available:
        try:
            return GpuProbe._probe_pynvml()
        except Exception:
            _logger.debug("pynvml_probe_failed", exc_info=True)
    # Fallback to nvidia-smi
    try:
        return GpuProbe._probe_nvidia_smi_sync()
    except Exception:
        _logger.debug("nvidia_smi_probe_failed", exc_info=True)
    return []
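
The body of `_probe_nvidia_smi_sync` is not shown on this page. The sketch below is one plausible way such a fallback could work, assuming it queries nvidia-smi in CSV mode and parses the rows. `GpuRow`, `parse_nvidia_smi_csv`, and `query_nvidia_smi` are hypothetical names; the `--query-gpu` and `--format=csv,noheader,nounits` options are standard nvidia-smi flags.

```python
import subprocess
from dataclasses import dataclass

# Fields requested from nvidia-smi, in the order they are parsed below.
QUERY_FIELDS = "index,utilization.gpu,memory.used,memory.total,temperature.gpu"


@dataclass
class GpuRow:
    """Hypothetical record mirroring the five GpuMetric fields."""

    index: int
    utilization_pct: float
    memory_used_mb: float
    memory_total_mb: float
    temperature_c: float


def parse_nvidia_smi_csv(text: str) -> list[GpuRow]:
    """Parse 'csv,noheader,nounits' output; skip rows that do not parse."""
    rows: list[GpuRow] = []
    for line in text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 5:
            continue
        try:
            index = int(parts[0])
            util, used, total, temp = (float(p) for p in parts[1:])
        except ValueError:
            # Non-numeric values such as "[N/A]" make the row unusable.
            continue
        rows.append(GpuRow(index, util, used, total, temp))
    return rows


def query_nvidia_smi(timeout_s: float = 5.0) -> list[GpuRow]:
    """Run nvidia-smi and parse its output; raises if the tool is missing or fails."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,
    )
    return parse_nvidia_smi_csv(result.stdout)
```

Keeping the parser separate from the subprocess call means it can be exercised with canned text on machines that have no GPU.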

get_gpu_metrics_async `async` `staticmethod` ¶

get_gpu_metrics_async()

Async variant of get_gpu_metrics.

Uses asyncio.create_subprocess_exec for the nvidia-smi fallback so it doesn't block the event loop. The pynvml path is still called directly, in-process.

Returns:

Type	Description
`list[GpuMetric]`	List of GpuMetric, one per GPU. Empty if no GPU available.

Source code in src/marianne/daemon/profiler/gpu_probe.py

@staticmethod
async def get_gpu_metrics_async() -> list[GpuMetric]:
    """Async variant of get_gpu_metrics.

    Uses ``asyncio.create_subprocess_exec`` for the nvidia-smi fallback
    so it doesn't block the event loop.

    Returns:
        List of GpuMetric, one per GPU.  Empty if no GPU available.
    """
    if _pynvml_available:
        try:
            return GpuProbe._probe_pynvml()
        except Exception:
            _logger.debug("pynvml_probe_failed", exc_info=True)
    try:
        return await GpuProbe._probe_nvidia_smi_async()
    except Exception:
        _logger.debug("nvidia_smi_async_probe_failed", exc_info=True)
    return []
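
To illustrate the non-blocking fallback, here is a minimal sketch of running a command with `asyncio.create_subprocess_exec` and capturing its output. `run_and_capture` is a hypothetical helper, not the module's `_probe_nvidia_smi_async`, and the timeout handling is an assumption. The demo runs the Python interpreter rather than nvidia-smi so that it works on machines without a GPU.

```python
import asyncio
import sys


async def run_and_capture(*argv: str, timeout_s: float = 5.0) -> str:
    """Run a command without blocking the event loop and return its stdout."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, _ = await asyncio.wait_for(proc.communicate(), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Do not leave a stuck child process behind.
        proc.kill()
        await proc.wait()
        raise
    if proc.returncode != 0:
        raise RuntimeError(f"{argv[0]} exited with code {proc.returncode}")
    return stdout.decode()


if __name__ == "__main__":
    # Stand-in command that prints one CSV row, as nvidia-smi might.
    text = asyncio.run(
        run_and_capture(sys.executable, "-c", "print('0, 37, 2048, 16384, 61')")
    )
    print(text.strip())
```

While the child process runs, the event loop remains free to service other coroutines, which is the point of preferring this over a blocking `subprocess.run` call inside async code.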

is_available `staticmethod` ¶

is_available()

Check whether any GPU probing method is available.

Returns True if pynvml is importable OR nvidia-smi is on PATH.

Source code in src/marianne/daemon/profiler/gpu_probe.py

@staticmethod
def is_available() -> bool:
    """Check whether any GPU probing method is available.

    Returns True if pynvml is importable OR nvidia-smi is on PATH.
    """
    if _pynvml_available:
        return True
    return shutil.which("nvidia-smi") is not None
