Linux System Metrics Details
Consult the reference for the metrics reported by the System Metrics Source on Linux.
Events generated by the System Metrics Source carry metrics metadata that designates dimension and metric fields. The host field contains the hostname and is included as a dimension in all of them. The collectors are described in the sections below.
In the Source’s configuration modal, you can set the level of detail for each type of metrics:
- Basic enables minimal metrics, averaged or aggregated.
- All enables full, detailed metrics, specified for individual CPUs, interfaces, and so on.
- Custom displays sub-menus and buttons from which you can choose a level of detail (Basic, All, Custom, or Disabled) for each type of event.
- Disabled means that no metrics will be generated.
Basic and Custom have different meanings depending on event type and will be described under each section below.
The tables outline the metrics emitted for each mode (Basic or Custom) and where applicable, the dimensions (to indicate where the metrics are coming from).
System
With System Metrics enabled, Cribl Edge captures load averages, uptime, and CPU count. The Custom option allows you to include process metrics that reflect the number of processes in each state.
Metrics for the overall system include the following:
| Name | Description | Type | Dimensions | Mode |
|---|---|---|---|---|
node_uname_info | Labeled system information as provided by the uname system call. | Counter | release, sysname, version | Basic |
node_cpu_count | The number of CPU cores. | Gauge | release, sysname, version | Basic |
node_uptime_seconds | System uptime in seconds. | Counter | N/A | Basic |
node_boot_time_seconds | Node boot time in Unix time. | Counter | N/A | Basic |
node_time_seconds | System time in seconds. | Counter | N/A | Basic |
node_load1 | 1m load average. | Gauge | N/A | Basic |
node_load5 | 5m load average. | Gauge | N/A | Basic |
node_load15 | 15m load average. | Gauge | N/A | Basic |
node_open_fds | Open file descriptors. | Counter | N/A | Basic |
node_processes_state_all | Total number of processes in different states. | Gauge | state | Basic |
node_processes_threads | Allocated threads in system. | Gauge | state | Basic |
node_procs_blocked | Number of processes blocked waiting for I/O to complete. | Gauge | state | Basic |
node_procs_running | Number of processes in runnable state. | Gauge | state | Basic |
node_processes_state | Number of processes in each state. | Gauge | state | Custom: Process metrics |
CPU
Cribl Edge captures active, user, system, idle, and iowait percentages over all CPUs, with options to add per-CPU metrics and raw time counters for each state.
Metrics for CPUs include the following:
| Name | Description | Type | Dimensions | Mode |
|---|---|---|---|---|
node_cpu_percent_active_all | Percent all the CPUs spent in activity. | Gauge | N/A | Basic |
node_cpu_seconds_active_all_total | Seconds all the CPUs spent in activity (excluding idle and wait). | Counter | N/A | Custom: CPU time metrics |
node_cpu_seconds_active_total | Seconds each CPU spent in activity (excluding idle and wait). | Counter | cpu | Custom: CPU time metrics |
node_cpu_seconds_all_total | Seconds for all CPUs usage. | Counter | mode | Custom: CPU time metrics |
node_cpu_seconds_total | Seconds for each CPU’s usage. | Counter | cpu, mode | Custom: CPU time metrics |
node_cpu_percent_active | Percent each CPU spent in activity. | Gauge | cpu | Custom: Per CPU or Detailed metrics |
node_cpu_percent_all | Percent CPU usage for all. | Gauge | mode,user | Custom: Detailed metrics |
node_cpu_percent | Percent CPU usage for each CPU. | Gauge | cpu,mode,user | Custom: Per CPU or Detailed metrics |
- The Per CPU option adds metrics with the cpu dimension.
- The Detailed metrics option adds metrics with the mode dimension set to: irq, softirq, steal, guest, guest_nice, and nice.
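To illustrate how the percentage gauges relate to the CPU time counters, the following minimal sketch derives an active-CPU percentage from two samples of per-mode cumulative seconds. The function name `percent_active`, the sample dictionaries, and the choice to treat `idle` and `iowait` as the inactive modes are assumptions made for illustration; this is not presented as Cribl Edge's actual calculation.

```python
def percent_active(prev, curr, inactive_modes=("idle", "iowait")):
    """Return the percent of CPU time spent active between two samples.

    prev and curr map a mode name (user, system, idle, ...) to cumulative
    seconds, in the style of a counter such as node_cpu_seconds_all_total.
    """
    # Time accumulated in each mode between the two samples.
    deltas = {mode: curr[mode] - prev.get(mode, 0.0) for mode in curr}
    total = sum(deltas.values())
    if total <= 0:
        return 0.0  # no time elapsed, or the counters were reset
    # Assumption: everything except idle and iowait counts as activity.
    active = sum(v for mode, v in deltas.items() if mode not in inactive_modes)
    return 100.0 * active / total


# Hypothetical sample values, in seconds.
prev = {"user": 100.0, "system": 50.0, "idle": 800.0, "iowait": 50.0}
curr = {"user": 106.0, "system": 52.0, "idle": 810.0, "iowait": 52.0}
print(percent_active(prev, curr))  # 40.0
```

Working from deltas rather than raw totals is what makes the result describe the most recent interval instead of the average since boot.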
Memory
With System Metrics enabled, Cribl Edge captures memory metrics including total, used, available, swap_free, and swap_total, with the option to toggle all memory states.
Metrics for memory include the following:
| Name | Description | Type | Dimensions | Mode |
|---|---|---|---|---|
node_memory_MemTotal_bytes | Memory information field MemTotal_bytes. | Gauge | N/A | Basic |
node_memory_Used_bytes | Used memory in bytes. | Gauge | N/A | Basic |
node_memory_Used_percent | Percent used memory. | Gauge | N/A | Basic |
node_memory_MemAvailable_bytes | Memory information field MemAvailable_bytes. | Gauge | N/A | Basic |
node_memory_MemAvailable_percent | Percent memory available. | Gauge | N/A | Basic |
node_memory_SwapFree_bytes | Memory information field SwapFree_bytes. | Gauge | N/A | Basic |
node_memory_SwapTotal_bytes | Memory information field SwapTotal_bytes. | Gauge | N/A | Basic |
node_vmstat_oom_kill | /proc/vmstat information field oom_kill. | Gauge | N/A | Basic |
node_vmstat_pgfault | /proc/vmstat information field pgfault. | Gauge | N/A | Basic |
node_vmstat_pgmajfault | /proc/vmstat information field pgmajfault. | Gauge | N/A | Basic |
node_vmstat_pgpgin | /proc/vmstat information field pgpgin. | Gauge | N/A | Basic |
node_vmstat_pgpgout | /proc/vmstat information field pgpgout. | Gauge | N/A | Basic |
node_vmstat_pswpin | /proc/vmstat information field pswpin. | Gauge | N/A | Basic |
node_vmstat_pswpout | /proc/vmstat information field pswpout. | Gauge | N/A | Basic |
node_memory_Active_bytes | Memory information field Active_bytes. | Gauge | N/A | Custom: Detailed metrics |
node_memory_Buffers_bytes | Memory information field Buffers_bytes. | Gauge | N/A | Custom: Detailed metrics |
node_memory_Cached_bytes | Memory information field Cached_bytes. | Gauge | N/A | Custom: Detailed metrics |
node_memory_Dirty_bytes | Memory information field Dirty_bytes. | Gauge | N/A | Custom: Detailed metrics |
node_memory_MemFree_bytes | Memory information field MemFree_bytes. | Gauge | N/A | Custom: Detailed metrics |
node_memory_SwapCached_bytes | Memory information field SwapCached_bytes. | Gauge | N/A | Custom: Detailed metrics |
Network
With System Metrics enabled, Cribl Edge captures bytes, packets, errors, and network connections over all interfaces. The Custom option allows you to filter interfaces, and to decide whether to select per-interface metrics and generate protocol metrics.
Metrics for networks include the following:
| Name | Description | Type | Dimensions | Mode |
|---|---|---|---|---|
node_network_receive_bytes_all_total | Network device statistic receive_bytes. | Counter | N/A | Basic |
node_network_receive_errs_all_total | Network device statistic receive_errs. | Counter | N/A | Basic |
node_network_receive_packets_all_total | Network device statistic receive_packets. | Counter | N/A | Basic |
node_network_transmit_bytes_all_total | Network device statistic transmit_bytes. | Counter | N/A | Basic |
node_network_transmit_errs_all_total | Network device statistic transmit_errs. | Counter | N/A | Basic |
node_network_transmit_packets_all_total | Network device statistic transmit_packets. | Counter | N/A | Basic |
node_socket_tcp_established_total | TCP established connections. | Counter | N/A | Basic |
node_network_receive_bytes_total | Network device statistic receive_bytes per interface. | Counter | device | Custom: Per Interface |
node_network_receive_errs_total | Network device statistic receive_errs per interface. | Counter | device | Custom: Per Interface |
node_network_receive_packets_total | Network device statistic receive_packets per interface. | Counter | device | Custom: Per Interface |
node_network_transmit_bytes_total | Network device statistic transmit_bytes per interface. | Counter | device | Custom: Per Interface |
node_network_transmit_errs_total | Network device statistic transmit_errs per interface. | Counter | device | Custom: Per Interface |
node_network_transmit_packets_total | Network device statistic transmit_packets per interface. | Counter | device | Custom: Per Interface |
node_network_receive_drop_all_total | Network device statistic receive_drop. | Counter | N/A | Custom: Detailed Metrics |
node_network_receive_drop_total | Network device statistic receive_drop per interface. | Counter | device | Custom: Detailed Metrics |
node_network_transmit_drop_all_total | Network device statistic transmit_drop. | Counter | N/A | Custom: Detailed Metrics |
node_network_transmit_drop_total | Network device statistic transmit_drop per interface. | Counter | device | Custom: Detailed Metrics |
node_socket_tcp_syn_sent_total | TCP sent packets total. | Counter | N/A | Custom: Detailed Metrics |
node_socket_tcp_syn_recv_total | TCP received packets total. | Counter | N/A | Custom: Detailed Metrics |
node_socket_tcp_fin_wait1_total | Total connections waiting for termination request from remote TCP. | Counter | N/A | Custom: Detailed Metrics |
node_socket_tcp_fin_wait2_total | Active TCP connections to be shut down. | Counter | N/A | Custom: Detailed Metrics |
node_socket_tcp_time_wait_total | Length of time to pass to be sure the remote TCP received the acknowledgement to terminate. | Counter | N/A | Custom: Detailed Metrics |
node_socket_tcp_close_total | Total TCP sockets with closed connections. | Counter | N/A | Custom: Detailed Metrics |
node_socket_tcp_last_ack_total | Total TCP sockets in state before the TCP connection is closed. | Counter | N/A | Custom: Detailed Metrics |
node_socket_tcp_listen_total | Total TCP sockets waiting for a connection request from any remote TCP/port. | Counter | N/A | Custom: Detailed Metrics |
node_socket_tcp_closing_total | Total TCP sockets waiting for connection termination request acknowledgement. | Counter | N/A | Custom: Detailed Metrics |
node_socket_tcp_none_total | Number of TCP sockets with no connections. | Counter | N/A | Custom: Detailed Metrics |
node_socket_udp_total | Number of UDP sockets in use. | Counter | N/A | Custom: Detailed Metrics |
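Most of the network metrics above are cumulative counters, so a throughput figure has to be derived from two successive samples. The sketch below shows one common way to do that. The helper name `counter_rate`, its arguments, and the reset handling are assumptions for illustration and are not part of the System Metrics Source.

```python
def counter_rate(prev_value, prev_time, curr_value, curr_time):
    """Return the per-second rate between two samples of a cumulative counter."""
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr_value - prev_value
    if delta < 0:
        # The counter went backwards. Assume it restarted from zero
        # (for example, after a reboot) and count from zero.
        delta = curr_value
    return delta / elapsed


# Hypothetical samples of node_network_receive_bytes_all_total, 10 seconds apart.
print(counter_rate(1_000_000, 0.0, 1_600_000, 10.0))  # 60000.0
```

The same approach applies to any metric in these tables whose Type is Counter, such as the disk read and write totals.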
Disk
With System Metrics enabled, Cribl Edge captures disk usage (in percent), bytes read and written, and read and write operations over all mounted disks. The Custom option allows you to filter by device, mountpoint, and filesystem type, and to decide whether to select per-device metrics and generate detailed metrics.
Metrics for Disk include the following:
| Name | Description | Type | Dimensions | Mode |
|---|---|---|---|---|
node_disk_reads_completed_all_total | Total number of reads completed successfully. | Counter | N/A | Basic |
node_disk_read_bytes_all_total | Total number of bytes read successfully. | Counter | N/A | Basic |
node_disk_writes_completed_all_total | Total number of writes completed successfully. | Counter | N/A | Basic |
node_disk_written_bytes_all_total | Total number of bytes written successfully. | Counter | N/A | Basic |
node_filesystem_size_bytes_all | Filesystem size in bytes. | Gauge | N/A | Basic |
node_filesystem_avail_bytes_all | Filesystem space available to non-root users in bytes. | Gauge | N/A | Basic |
node_filesystem_used_bytes_all | Filesystem used space in bytes. | Gauge | N/A | Basic |
node_filesystem_used_percent_all | Percent filesystem used space. | Gauge | N/A | Basic |
node_filesystem_files_free_all | Filesystem free file nodes. | Gauge | N/A | Basic |
node_filesystem_files_used_all | Filesystem total used file nodes. | Gauge | N/A | Basic |
node_filesystem_files_used_percent_all | Percent of filesystem file nodes used across all disks. | Gauge | N/A | Basic |
node_filesystem_used_bytes | Filesystem used space in bytes per disk. | Gauge | device, fstype, mountpoint | Custom: Per device metrics or Detailed metrics |
node_filesystem_used_percent | Percent Filesystem used per disk. | Gauge | device, fstype, mountpoint | Custom: Per device metrics or Detailed metrics |
node_disk_reads_completed_total | Total number of reads completed successfully per disk. | Counter | device | Custom: Per device metrics |
node_disk_read_bytes_total | Total number of bytes read successfully per disk. | Counter | device, fstype, mountpoint | Custom: Per device metrics |
node_disk_writes_completed_total | Total number of writes completed successfully per disk. | Counter | device, fstype, mountpoint | Custom: Per device metrics |
node_disk_written_bytes_total | Total number of bytes written successfully per disk. | Counter | device, fstype, mountpoint | Custom: Per device metrics |
node_disk_discards_completed_all_total | Total number of discards completed successfully. | Counter | N/A | Custom: Detailed Metrics |
node_disk_discards_completed_total | Total number of discards completed successfully per disk. | Counter | device, fstype, mountpoint | Custom: Per device metrics or Detailed metrics |
node_disk_discards_merged_all_total | Total number of discards merged. | Counter | N/A | Custom: Detailed metrics |
node_disk_read_time_seconds_total | Total number of seconds spent by all reads per disk. | Counter | device, fstype, mountpoint | Custom: Detailed metrics |
node_disk_write_time_seconds_all_total | Total number of seconds spent by all writes. | Counter | N/A | Custom: Detailed Metrics |
node_disk_read_time_seconds_all_total | Total number of seconds spent by all reads. | Counter | N/A | Custom: Detailed Metrics |
node_disk_write_time_seconds_total | Total number of seconds spent by all writes per disk. | Counter | device, fstype, mountpoint | Custom: Per device metrics or Detailed metrics |
node_disk_reads_merged_all_total | Total number of reads merged. | Counter | N/A | Custom: Detailed Metrics |
node_disk_writes_merged_all_total | Total number of writes merged. | Counter | N/A | Custom: Detailed metrics |
node_disk_reads_merged_total | Total number of reads merged per disk. | Counter | device, fstype, mountpoint | Custom: Per device metrics or Detailed metrics |
node_disk_writes_merged_total | Total number of writes merged per disk. | Counter | device, fstype, mountpoint | Custom: Per device metrics or Detailed metrics |
node_disk_discards_merged_total | Total number of discards merged per disk. | Counter | device, fstype, mountpoint | Custom: Per device metrics or Detailed metrics |
node_disk_io_time_seconds_all_total | Total seconds spent doing I/Os. | Counter | N/A | Custom: Detailed Metrics |
node_disk_io_time_seconds_total | Total seconds spent doing I/Os per disk. | Counter | device, fstype, mountpoint | Custom: Per device metrics or Detailed metrics |
node_disk_io_now_all | The number of I/Os currently in progress. | Gauge | N/A | Custom: Detailed Metrics |
node_disk_io_now | The number of I/Os currently in progress per disk. | Counter | device, fstype, mountpoint | Custom: Per device metrics or Detailed metrics |
node_disk_io_time_weighted_seconds_all_total | Weighted number of seconds spent doing I/Os per device. | Counter | device, fstype, mountpoint | Custom: Per device metrics or Detailed metrics |
node_filesystem_size_bytes | Filesystem size in bytes per disk. | Gauge | device, fstype, mountpoint | Custom: Per device metrics or Detailed metrics |
node_filesystem_avail_bytes | Filesystem space available to non-root users in bytes per disk. | Gauge | device, fstype, mountpoint | Custom: Per device metrics or Detailed metrics |
node_filesystem_files_free | Filesystem free file nodes per disk. | Gauge | device, fstype, mountpoint | Custom: Per device metrics or Detailed metrics |
node_filesystem_files_used_percent | Percent of filesystem file nodes used per disk. | Gauge | device, fstype, mountpoint | Custom: Per device metrics or Detailed metrics |
Container
With System Metrics enabled, Cribl Edge generates Docker information with CPU, memory, network, and disk metrics for running containers. Optionally, you can customize which containers to generate metrics from.
Metrics for Container include the following:
| Name | Description | Type | Dimensions | Mode |
|---|---|---|---|---|
container_start_time_seconds | Unix time (seconds) when the container was started. | Counter | N/A | Basic |
container_finish_time_seconds | Unix time (seconds) when the container was stopped. Only for non-running containers. | Counter | N/A | Basic |
container_fs_reads_bytes_all_total | Total bytes read for all disk devices. | Counter | N/A | Basic |
container_memory_usage_percent | Percent of available memory being used. | Gauge | N/A | Basic |
container_network_receive_bytes_all_total | Total bytes received for all network interfaces. | Counter | N/A | Basic |
container_network_receive_errors_all_total | Total number of errors received for all network interfaces. | Counter | N/A | Basic |
container_network_receive_packets_all_total | Total number of packets received for all network interfaces. | Counter | N/A | Basic |
container_network_transmit_bytes_all_total | Total bytes transmitted for all network interfaces. | Counter | N/A | Basic |
container_network_transmit_errors_all_total | Total number of errors transmitted for all network interfaces. | Counter | N/A | Basic |
container_network_transmit_packets_all_total | Total number of packets transmitted for all network interfaces. | Counter | N/A | Basic |
container_memory_total_bytes | Total number of memory bytes available. | Counter | N/A | Basic |
container_cpu_user_seconds_total | Number of seconds the container has been on the CPU running user code. | Counter | cpu | Custom: Per device metrics or Detailed metrics |
container_cpu_system_seconds_total | Number of seconds the container has been on the CPU running kernel code. | Counter | cpu | Custom: Per device metrics or Detailed metrics |
container_fs_reads_bytes_total | Total bytes read per device. | Counter | device | Custom: Per device metrics or Detailed metrics |
container_fs_writes_bytes_all_total | Total bytes written for all disk devices. | Counter | N/A | Custom: Detailed metrics |
container_fs_writes_bytes_total | Total bytes written per device. | Counter | device | Custom: Per device metrics or Detailed metrics |
container_fs_reads_all_total | Total number of read operations for all disk devices. | Counter | N/A | Custom: Detailed metrics |
container_fs_writes_total | Total number of write operations per device. | Counter | device | Custom: Per device metrics or Detailed metrics |
container_memory_mapped_file | Size of memory-mapped files, in bytes. | Counter | N/A | Custom: Detailed metrics |
container_memory_max_usage_bytes | Highest seen value of the container_memory_usage_bytes metric. | Counter | N/A | Custom: Detailed metrics |
container_memory_pgin | Total number of memory page-in events. | Counter | N/A | Custom: Detailed metrics |
container_mem.pgpgout | Total number of memory pages paged out. | Counter | N/A | Custom: Detailed metrics |
container_memory_pgfault | Total number of page faults. | Counter | N/A | Custom: Detailed metrics |
container_memory_pgmajfault | Total number of major page faults. | Counter | N/A | Custom: Detailed metrics |
container_memory_usage_bytes | Number of memory bytes used. | Counter | N/A | Custom: Detailed metrics |
container_network_receive_dropped_all_total | Total number of receives dropped for all network interfaces. | Counter | N/A | Custom: Detailed metrics |
container_network_transmit_dropped_all_total | Total number of transmits dropped for all network interfaces. | Counter | N/A | Custom: Detailed metrics |
Process Metrics
With Process Metrics enabled, Cribl Edge captures process-specific metrics from Linux servers and reports them as events. This allows you to monitor specific processes on Cribl.Cloud instances. You can generate events for any process object.
For information on how to configure the System Metrics Source to generate process-specific metrics, check out the Process Metrics section of the System Metrics page.
Process-specific metrics are not affected by the Host Metrics detail setting.
Process-specific metrics include the following:
| Name | Description | Type | Dimensions |
|---|---|---|---|
process_num_threads | The number of threads. | Gauge | process_cmdline, process_set, process_uid, process_gid, process_service |
process_open_filedesc | The number of file descriptors. | Gauge | process_cmdline, process_set, process_uid, process_gid, process_service |
process_write_bytes | The number of bytes which this process caused to be sent to the storage layer. | Counter | process_cmdline, process_set, process_uid, process_gid, process_service |
process_read_bytes | The number of bytes this process actually fetched from the storage layer. This number is accurate for block-backed filesystems. | Counter | process_cmdline, process_set, process_uid, process_gid, process_service |
process_major_page_faults | The number of major faults for this process that required loading a memory page from disk. | Counter | process_cmdline, process_set, process_uid, process_gid, process_service |
process_minor_page_faults | The number of minor faults for this process that have not required loading a memory page from disk. | Counter | process_cmdline, process_set, process_uid, process_gid, process_service |
process_voluntary_context_switches | The number of voluntary context switches. | Counter | process_cmdline, process_set, process_uid, process_gid, process_service |
process_nonvoluntary_context_switches | The number of involuntary context switches. | Counter | process_cmdline, process_set, process_uid, process_gid, process_service |
process_cpu_usage | The process’s CPU usage, expressed as a percentage of total CPU power. | Gauge | process_cmdline, process_set, process_uid, process_gid, process_service |
process_cpu_seconds | The process’s CPU usage, based on user time and system time. | Counter | process_cmdline, process_set, process_uid, process_gid, process_service |
process_resident_memory_bytes | The amount of memory used, in bytes. Includes the pages that count toward text, data, or stack space. Does not include pages that haven’t been demand-loaded in or are swapped out. | Gauge | process_cmdline, process_set, process_uid, process_gid, process_service |
process_virtual_memory_bytes | The process’s virtual memory size. | Gauge | process_cmdline, process_set, process_uid, process_gid, process_service |
process_swapped_memory_bytes | The process’s swapped memory size. | Gauge | process_cmdline, process_set, process_uid, process_gid, process_service |
process_memory_bytes | The total amount of memory used by the process, in bytes. | Gauge | process_cmdline, process_set, process_uid, process_gid, process_service |
process_memory_usage | The total amount of memory used by the process, as a percentage of total memory. | Gauge | process_cmdline, process_set, process_uid, process_gid, process_service |
process_start_time | The time that the process started, derived by adding the start time to the boot time, making it relative to epoch. | Gauge | process_cmdline, process_set, process_uid, process_gid, process_service |
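The derivation described for process_start_time can be sketched as follows. On Linux, a process's start time is exposed in `/proc/[pid]/stat` as clock ticks since boot, and the boot time is exposed as `btime` in `/proc/stat`. Whether Cribl Edge reads exactly these sources is an assumption, and the function name `process_start_epoch` and the sample values are hypothetical.

```python
def process_start_epoch(boot_time_seconds, start_ticks, ticks_per_second=100):
    """Convert a process start time in clock ticks since boot to Unix time.

    ticks_per_second is typically 100 on Linux. On a live system it can be
    read with os.sysconf("SC_CLK_TCK") instead of relying on the default.
    """
    # Ticks since boot -> seconds since boot, then shift by the boot time
    # so that the result is relative to the Unix epoch.
    return boot_time_seconds + start_ticks / ticks_per_second


# Hypothetical values: boot at 1700000000, process started 123400 ticks later.
print(process_start_epoch(1_700_000_000, 123_400))  # 1700001234.0
```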
GPU Metrics
With GPU Metrics enabled, Cribl Edge captures temperature, utilization, memory, power, clock, and throttle metrics from NVIDIA GPUs.
The All option adds per-GPU events with identifying dimensions (gpu_index, gpu_name, gpu_uuid) and detailed PCIe, ECC, encoder, and power metrics.
In Basic mode, or when Per GPU metrics is disabled in Custom mode,
the collector emits a single aggregated metric event for all GPUs during each collection interval.
These metrics use the same name with the _all suffix, such as node_gpu_temperature_celsius_all.
Across GPUs, those _all values use sum, average, or maximum, depending on how the metric is defined.
The node_gpu_count gauge appears only on the aggregated event and does not have an _all equivalent.
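The aggregation described above can be pictured with a short sketch that collapses per-GPU samples into one event with `_all` names. The text does not say which function (sum, average, or maximum) applies to which metric, so the `AGGREGATIONS` mapping, the `aggregate_gpus` helper, and the sample values below are hypothetical and for illustration only.

```python
from statistics import mean

# Hypothetical mapping from metric name to aggregation function. The actual
# choice per metric is not documented here, so treat these as placeholders.
AGGREGATIONS = {
    "node_gpu_memory_used_bytes": sum,
    "node_gpu_utilization_gpu_percent": mean,
    "node_gpu_temperature_celsius": max,
}


def aggregate_gpus(per_gpu):
    """Collapse a list of per-GPU metric dicts into one aggregated event."""
    # node_gpu_count has no _all suffix and exists only on the aggregate.
    event = {"node_gpu_count": len(per_gpu)}
    for name, func in AGGREGATIONS.items():
        values = [gpu[name] for gpu in per_gpu if name in gpu]
        if values:
            event[name + "_all"] = func(values)
    return event


# Hypothetical samples from two GPUs.
per_gpu = [
    {"node_gpu_memory_used_bytes": 4_000,
     "node_gpu_utilization_gpu_percent": 30,
     "node_gpu_temperature_celsius": 61},
    {"node_gpu_memory_used_bytes": 6_000,
     "node_gpu_utilization_gpu_percent": 50,
     "node_gpu_temperature_celsius": 70},
]
print(aggregate_gpus(per_gpu))
```

With these samples, the aggregated event reports a count of 2, a summed memory value of 10000, an average utilization of 40, and a maximum temperature of 70.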
| Name | Description | Type | Mode |
|---|---|---|---|
node_gpu_count | Number of NVIDIA GPUs detected (aggregated event only). | Gauge | Basic |
node_gpu_temperature_celsius | GPU core temperature in degrees Celsius. | Gauge | All or Custom: Per GPU |
node_gpu_temperature_memory_celsius | GPU memory temperature in degrees Celsius. | Gauge | All or Custom: Per GPU |
node_gpu_fan_speed_percent | Fan speed as a percent of maximum. | Gauge | All or Custom: Per GPU |
node_gpu_utilization_gpu_percent | Percent utilization of GPU compute. | Gauge | All or Custom: Per GPU |
node_gpu_utilization_memory_percent | Percent utilization of GPU memory. | Gauge | All or Custom: Per GPU |
node_gpu_utilization_encoder_percent | Percent utilization of the hardware video encoder. | Gauge | All or Custom: Per GPU |
node_gpu_utilization_decoder_percent | Percent utilization of the hardware video decoder. | Gauge | All or Custom: Per GPU |
node_gpu_power_draw_watts | Instantaneous GPU power draw in watts (see also average and instantaneous detail metrics). | Gauge | All or Custom: Per GPU |
node_gpu_power_limit_watts | Configured GPU power limit in watts. | Gauge | All or Custom: Per GPU |
node_gpu_memory_total_bytes | Total GPU framebuffer memory size in bytes. | Gauge | All or Custom: Per GPU |
node_gpu_memory_used_bytes | GPU memory used in bytes. | Gauge | All or Custom: Per GPU |
node_gpu_memory_free_bytes | GPU memory free in bytes. | Gauge | All or Custom: Per GPU |
node_gpu_memory_reserved_bytes | GPU memory reserved but not actively used in bytes. | Gauge | All or Custom: Per GPU |
node_gpu_clocks_gr_mhz | GPU graphics clock in megahertz. | Gauge | All or Custom: Per GPU |
node_gpu_clocks_sm_mhz | Streaming multiprocessor clock in megahertz. | Gauge | All or Custom: Per GPU |
node_gpu_clocks_mem_mhz | GPU memory clock in megahertz. | Gauge | All or Custom: Per GPU |
node_gpu_clocks_video_mhz | Video clock in megahertz. | Gauge | All or Custom: Per GPU |
node_gpu_clocks_max_gr_mhz | Maximum advertised graphics clock in megahertz. | Gauge | All or Custom: Per GPU |
node_gpu_clocks_max_mem_mhz | Maximum advertised memory clock in megahertz. | Gauge | All or Custom: Per GPU |
node_gpu_clocks_max_sm_mhz | Maximum advertised streaming multiprocessor clock in megahertz. | Gauge | All or Custom: Per GPU |
node_gpu_throttle_gpu_idle | Whether the GPU idle clock throttle reason is active; 1 = active, 0 = not active. | Gauge | All or Custom: Per GPU |
node_gpu_throttle_applications_clocks_setting | Whether the applications clocks-setting throttle reason is active; 1 = active, 0 = not active. | Gauge | All or Custom: Per GPU |
node_gpu_throttle_sw_power_cap | Whether the software power cap throttle reason is active; 1 = active, 0 = not active. | Gauge | All or Custom: Per GPU |
node_gpu_throttle_hw_thermal_slowdown | Whether the hardware thermal slowdown throttle reason is active; 1 = active, 0 = not active. | Gauge | All or Custom: Per GPU |
node_gpu_throttle_hw_power_brake_slowdown | Whether the hardware power brake slowdown throttle reason is active; 1 = active, 0 = not active. | Gauge | All or Custom: Per GPU |
node_gpu_throttle_sw_thermal_slowdown | Whether the software thermal slowdown throttle reason is active; 1 = active, 0 = not active. | Gauge | All or Custom: Per GPU |
node_gpu_throttle_sync_boost | Whether the synchronous boost throttle reason is active; 1 = active, 0 = not active. | Gauge | All or Custom: Per GPU |
node_gpu_throttle_hw_slowdown | Whether the hardware slowdown throttle reason is active; 1 = active, 0 = not active. | Gauge | All or Custom: Per GPU |
node_gpu_pcie_gen_current | Current PCIe link generation. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_pcie_gen_gpucurrent | Negotiated PCIe link generation reported by the GPU. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_pcie_gen_max | Maximum PCIe link generation. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_pcie_gen_gpumax | Maximum PCIe link generation supported by the GPU. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_pcie_gen_hostmax | Maximum PCIe link generation supported by the host. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_pcie_width_current | Current PCIe link width in lanes. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_pcie_width_max | Maximum PCIe link width in lanes. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_accounting_buffer_size | Process accounting statistics buffer size in KiB. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_utilization_jpeg_percent | JPEG engine utilization in percent. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_utilization_ofa_percent | OFA engine utilization in percent. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_encoder_session_count | Number of NVENC encoder sessions. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_encoder_average_fps | Encoder throughput in average frames per second. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_encoder_average_latency_us | Encoder latency in microseconds. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_corrected_volatile_device_memory | Corrected ECC error count (volatile) for device memory. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_corrected_volatile_dram | Corrected ECC error count (volatile) for DRAM. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_corrected_volatile_register_file | Corrected ECC error count (volatile) for register file. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_corrected_volatile_l1_cache | Corrected ECC error count (volatile) for L1 cache. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_corrected_volatile_l2_cache | Corrected ECC error count (volatile) for L2 cache. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_corrected_volatile_texture_memory | Corrected ECC error count (volatile) for texture memory. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_corrected_volatile_cbu | Corrected ECC error count (volatile) for CBU. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_corrected_volatile_sram | Corrected ECC error count (volatile) for SRAM. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_corrected_volatile_total | Corrected ECC error count (volatile) for memory (total). | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_corrected_aggregate_device_memory | Corrected ECC error count (aggregate lifetime) for device memory. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_corrected_aggregate_dram | Corrected ECC error count (aggregate lifetime) for DRAM. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_corrected_aggregate_register_file | Corrected ECC error count (aggregate lifetime) for register file. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_corrected_aggregate_l1_cache | Corrected ECC error count (aggregate lifetime) for L1 cache. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_corrected_aggregate_l2_cache | Corrected ECC error count (aggregate lifetime) for L2 cache. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_corrected_aggregate_texture_memory | Corrected ECC error count (aggregate lifetime) for texture memory. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_corrected_aggregate_cbu | Corrected ECC error count (aggregate lifetime) for CBU. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_corrected_aggregate_sram | Corrected ECC error count (aggregate lifetime) for SRAM. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_corrected_aggregate_total | Corrected ECC error count (aggregate lifetime) for memory (total). | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_uncorrected_volatile_device_memory | Uncorrected ECC error count (volatile) for device memory. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_uncorrected_volatile_dram | Uncorrected ECC error count (volatile) for DRAM. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_uncorrected_volatile_register_file | Uncorrected ECC error count (volatile) for register file. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_uncorrected_volatile_l1_cache | Uncorrected ECC error count (volatile) for L1 cache. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_uncorrected_volatile_l2_cache | Uncorrected ECC error count (volatile) for L2 cache. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_uncorrected_volatile_texture_memory | Uncorrected ECC error count (volatile) for texture memory. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_uncorrected_volatile_cbu | Uncorrected ECC error count (volatile) for CBU. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_uncorrected_volatile_sram | Uncorrected ECC error count (volatile) for SRAM. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_uncorrected_volatile_total | Uncorrected ECC error count (volatile) for memory (total). | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_uncorrected_aggregate_device_memory | Uncorrected ECC error count (aggregate lifetime) for device memory. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_uncorrected_aggregate_dram | Uncorrected ECC error count (aggregate lifetime) for DRAM. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_uncorrected_aggregate_register_file | Uncorrected ECC error count (aggregate lifetime) for register file. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_uncorrected_aggregate_l1_cache | Uncorrected ECC error count (aggregate lifetime) for L1 cache. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_uncorrected_aggregate_l2_cache | Uncorrected ECC error count (aggregate lifetime) for L2 cache. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_uncorrected_aggregate_texture_memory | Uncorrected ECC error count (aggregate lifetime) for texture memory. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_uncorrected_aggregate_cbu | Uncorrected ECC error count (aggregate lifetime) for CBU. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_uncorrected_aggregate_sram | Uncorrected ECC error count (aggregate lifetime) for SRAM. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_uncorrected_aggregate_total | Uncorrected ECC error count (aggregate lifetime) for memory (total). | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_uncorrected_volatile_sram_parity | Uncorrected ECC error count (volatile) for SRAM parity. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_uncorrected_volatile_sram_secded | Uncorrected ECC error count (volatile) for SRAM SEC-DED. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_uncorrected_aggregate_sram_parity | Uncorrected ECC error count (aggregate lifetime) for SRAM parity. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_uncorrected_aggregate_sram_secded | Uncorrected ECC error count (aggregate lifetime) for SRAM SEC-DED. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_uncorrected_aggregate_sram_threshold_exceeded | Uncorrected ECC error count (aggregate lifetime) for SRAM threshold exceeded. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_uncorrected_aggregate_sram_l2 | Uncorrected ECC error count (aggregate lifetime) for SRAM L2. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_uncorrected_aggregate_sram_sm | Uncorrected ECC error count (aggregate lifetime) for SRAM SM. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_uncorrected_aggregate_sram_mcu | Uncorrected ECC error count (aggregate lifetime) for SRAM MCU. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_uncorrected_aggregate_sram_pcie | Uncorrected ECC error count (aggregate lifetime) for SRAM PCIe. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_ecc_uncorrected_aggregate_sram_other | Uncorrected ECC error count (aggregate lifetime) for SRAM (other). | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_retired_pages_sbe | Number of framebuffer pages retired after single-bit errors. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_retired_pages_dbe | Number of framebuffer pages retired after double-bit errors. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_remapped_rows_correctable | Number of DRAM rows requiring remapping after correctable ECC. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_remapped_rows_uncorrectable | Number of DRAM rows requiring remapping after uncorrectable ECC. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_remapped_rows_histogram_max | Remapped-row histogram count for worst-case mapping impact. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_remapped_rows_histogram_high | Remapped-row histogram count for high mapping impact. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_remapped_rows_histogram_partial | Remapped-row histogram count for partial mapping impact. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_remapped_rows_histogram_low | Remapped-row histogram count for low mapping impact. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_remapped_rows_histogram_none | Remapped-row histogram count for no measurable mapping impact. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_temperature_throttle_celsius | GPU throttle temperature threshold (T.Limit) in degrees Celsius. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_power_draw_average_watts | Average GPU power draw in watts. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_power_draw_instant_watts | Instantaneous GPU power draw in watts. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_enforced_power_limit_watts | Enforced GPU power limit in watts. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_power_default_limit_watts | Default GPU power limit in watts. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_power_min_limit_watts | Minimum GPU power limit setting in watts. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_power_max_limit_watts | Maximum GPU power limit setting in watts. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_module_power_draw_average_watts | Average module-level power draw in watts. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_module_power_draw_instant_watts | Instantaneous module-level power draw in watts. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_module_power_limit_watts | Module-level power limit in watts. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_module_enforced_power_limit_watts | Enforced module-level power limit in watts. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_module_power_default_limit_watts | Default module-level power limit in watts. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_module_power_min_limit_watts | Minimum module-level power limit setting in watts. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_module_power_max_limit_watts | Maximum module-level power limit setting in watts. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_power_smoothing_primary_floor_watts | Power-smoothing primary floor in watts. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_power_smoothing_secondary_floor_watts | Power-smoothing secondary floor in watts. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_power_smoothing_min_primary_activation_offset | Minimum activation offset for the primary power floor. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_power_smoothing_min_primary_activation_point | Minimum activation point for the primary power floor. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_power_smoothing_window_multiplier | Power-smoothing window multiplier. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_power_smoothing_curr_secondary_floor_watts | Active profile secondary floor in watts. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_power_smoothing_curr_primary_act_win_multiplier | Active profile primary-floor activation-window multiplier. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_power_smoothing_curr_primary_tar_win_multiplier | Active profile primary-floor target-window multiplier. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_power_smoothing_curr_primary_act_offset | Primary-floor activation offset for the active smoothing profile. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_power_smoothing_admin_secondary_floor_watts | Admin override secondary floor in watts. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_power_smoothing_admin_primary_act_win_multiplier | Admin primary activation-window multiplier override. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_power_smoothing_admin_primary_tar_win_multiplier | Admin primary target-window multiplier override. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_power_smoothing_admin_primary_act_offset | Admin primary activation offset. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_clocks_applications_gr_mhz | Requested application-driven graphics clock in megahertz. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_clocks_applications_mem_mhz | Requested application-driven memory clock in megahertz. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_clocks_default_applications_gr_mhz | Default application graphics clock in megahertz. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_clocks_default_applications_mem_mhz | Default application memory clock in megahertz. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_protected_memory_total_bytes | Total protected VRAM capacity in bytes. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_protected_memory_used_bytes | Protected VRAM used in bytes. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_protected_memory_free_bytes | Protected VRAM free in bytes. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_throttle_counter_sw_power_cap | Count of software power-cap clock-throttle events. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_throttle_counter_sync_boost | Count of synchronous-boost throttle events. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_throttle_counter_sw_thermal_slowdown | Count of software thermal slowdown events. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_throttle_counter_hw_thermal_slowdown | Count of hardware thermal slowdown events. | Gauge | All or Custom: Per GPU and Detailed |
node_gpu_throttle_counter_hw_power_brake_slowdown | Count of hardware power-brake slowdown events. | Gauge | All or Custom: Per GPU and Detailed |
gpu_serial | Board serial number. | Property | All or Custom: Per GPU |
driver_version | Installed NVIDIA driver version. | Property | All or Custom: Per GPU |
pci_bus_id | Full PCI bus identifier in domain:bus:device.function format. | Property | All or Custom: Per GPU |
pci_bus | PCI bus number. | Property | All or Custom: Per GPU |
pci_device | PCI device number on the bus. | Property | All or Custom: Per GPU |
pci_device_id | PCI vendor and device ID. | Property | All or Custom: Per GPU |
persistence_mode | Whether persistence mode is enabled. | Property | All or Custom: Per GPU |
pstate | Current GPU performance state (P0 is maximum performance). | Property | All or Custom: Per GPU |
pci_domain | PCI domain number. | Property | All or Custom: Per GPU and Detailed |
pci_base_class | PCI base class code. | Property | All or Custom: Per GPU and Detailed |
pci_sub_class | PCI sub-class code. | Property | All or Custom: Per GPU and Detailed |
pci_sub_device_id | PCI subsystem device ID. | Property | All or Custom: Per GPU and Detailed |
vgpu_driver_cap_heterogenous_multi_vgpu | Whether the driver supports heterogeneous multi-vGPU. | Property | All or Custom: Per GPU and Detailed |
vgpu_device_cap_fractional_multi_vgpu | Whether the device supports fractional multi-vGPU. | Property | All or Custom: Per GPU and Detailed |
vgpu_device_cap_heterogeneous_time_slice_profile | Whether the device supports heterogeneous time-slice profiles. | Property | All or Custom: Per GPU and Detailed |
vgpu_device_cap_heterogeneous_time_slice_sizes | Whether the device supports heterogeneous time-slice sizes. | Property | All or Custom: Per GPU and Detailed |
vgpu_device_cap_homogeneous_placements | Whether the device supports homogeneous vGPU placements. | Property | All or Custom: Per GPU and Detailed |
vgpu_device_cap_mig_time_slicing | Whether the device supports MIG time-slicing. | Property | All or Custom: Per GPU and Detailed |
vgpu_device_cap_mig_time_slicing_mode | Current MIG time-slicing mode. | Property | All or Custom: Per GPU and Detailed |
display_mode | Whether the display feature is enabled. | Property | All or Custom: Per GPU and Detailed |
display_attached | Whether a display is physically attached to this GPU. | Property | All or Custom: Per GPU and Detailed |
display_active | Whether a display is actively receiving output from this GPU. | Property | All or Custom: Per GPU and Detailed |
addressing_mode | Memory addressing mode. | Property | All or Custom: Per GPU and Detailed |
accounting_mode | Whether process accounting is enabled. | Property | All or Custom: Per GPU and Detailed |
driver_model_current | Active driver model; typically N/A on Linux. | Property | All or Custom: Per GPU and Detailed |
driver_model_pending | Pending driver model after reboot; typically N/A on Linux. | Property | All or Custom: Per GPU and Detailed |
vbios_version | Video BIOS version string. | Property | All or Custom: Per GPU and Detailed |
inforom_img | InfoROM image version. | Property | All or Custom: Per GPU and Detailed |
inforom_oem | InfoROM OEM object version. | Property | All or Custom: Per GPU and Detailed |
inforom_ecc | InfoROM ECC object version. | Property | All or Custom: Per GPU and Detailed |
inforom_pwr | InfoROM power management object version. | Property | All or Custom: Per GPU and Detailed |
inforom_checksum_validation | InfoROM data integrity check result. | Property | All or Custom: Per GPU and Detailed |
gpu_recovery_action | Recommended recovery action following a GPU error. | Property | All or Custom: Per GPU and Detailed |
reset_status_reset_required | Whether a GPU reset is required to clear an error condition. | Property | All or Custom: Per GPU and Detailed |
reset_status_drain_and_reset_recommended | Whether a drain-and-reset is the recommended recovery procedure. | Property | All or Custom: Per GPU and Detailed |
gom_current | Current GPU operation mode (All On, Compute, or Low DP). | Property | All or Custom: Per GPU and Detailed |
gom_pending | Pending GPU operation mode, applied after the next reboot. | Property | All or Custom: Per GPU and Detailed |
clocks_throttle_reasons_supported | Bitmask of clock throttle reasons supported by this GPU. | Property | All or Custom: Per GPU and Detailed |
clocks_throttle_reasons_active | Bitmask of clock throttle reasons currently active. | Property | All or Custom: Per GPU and Detailed |
compute_mode | Compute access mode (Default, Exclusive Thread, Prohibited, or Exclusive Process). | Property | All or Custom: Per GPU and Detailed |
compute_cap | CUDA compute capability in major.minor format. | Property | All or Custom: Per GPU and Detailed |
dram_encryption_mode_current | Current DRAM encryption mode. | Property | All or Custom: Per GPU and Detailed |
dram_encryption_mode_pending | Pending DRAM encryption mode, applied after the next reboot. | Property | All or Custom: Per GPU and Detailed |
ecc_mode_current | Current ECC mode. | Property | All or Custom: Per GPU and Detailed |
ecc_mode_pending | Pending ECC mode, applied after the next reboot. | Property | All or Custom: Per GPU and Detailed |
retired_pages_pending | Whether pending page retirements require a reboot to take effect. | Property | All or Custom: Per GPU and Detailed |
remapped_rows_pending | Whether pending row remappings require a reboot to take effect. | Property | All or Custom: Per GPU and Detailed |
remapped_rows_failure | Whether a row remapping failure has been recorded. | Property | All or Custom: Per GPU and Detailed |
power_management | Whether power management is supported for this GPU. | Property | All or Custom: Per GPU and Detailed |
power_smoothing_supported | Whether delayed power smoothing is supported. | Property | All or Custom: Per GPU and Detailed |
mig_mode_current | Current MIG (Multi-Instance GPU) mode. | Property | All or Custom: Per GPU and Detailed |
mig_mode_pending | Pending MIG mode, applied after the next reboot. | Property | All or Custom: Per GPU and Detailed |
gsp_mode_current | Current GSP firmware mode. | Property | All or Custom: Per GPU and Detailed |
gsp_mode_default | Default GSP firmware mode. | Property | All or Custom: Per GPU and Detailed |
c2c_mode | Current chip-to-chip interconnect (C2C) mode. | Property | All or Custom: Per GPU and Detailed |
fabric_state | NVLink fabric state. | Property | All or Custom: Per GPU and Detailed |
fabric_status | NVLink fabric status. | Property | All or Custom: Per GPU and Detailed |
fabric_clique_id | NVLink fabric clique identifier. | Property | All or Custom: Per GPU and Detailed |
fabric_cluster_uuid | NVLink fabric cluster unique identifier. | Property | All or Custom: Per GPU and Detailed |
fabric_health_summary | Overall NVLink fabric health summary. | Property | All or Custom: Per GPU and Detailed |
fabric_health_bandwidth | NVLink fabric bandwidth health status. | Property | All or Custom: Per GPU and Detailed |
fabric_health_route_recovery_in_progress | Whether NVLink fabric route recovery is in progress. | Property | All or Custom: Per GPU and Detailed |
fabric_health_route_unhealthy | Whether any NVLink fabric routes are unhealthy. | Property | All or Custom: Per GPU and Detailed |
fabric_health_access_timeout_recovery | Whether access-timeout recovery is in progress on the NVLink fabric. | Property | All or Custom: Per GPU and Detailed |
fabric_health_incorrect_configuration | Whether the NVLink fabric has an incorrect configuration. | Property | All or Custom: Per GPU and Detailed |
fabric_health_partition_assigned | Whether the NVLink fabric partition has been assigned. | Property | All or Custom: Per GPU and Detailed |
platform_chassis_serial_number | Serial number of the chassis this GPU is installed in. | Property | All or Custom: Per GPU and Detailed |
platform_slot_number | Chassis slot number for this GPU module. | Property | All or Custom: Per GPU and Detailed |
platform_tray_index | Tray index within the chassis for this GPU module. | Property | All or Custom: Per GPU and Detailed |
platform_host_id | Host identifier in the platform topology. | Property | All or Custom: Per GPU and Detailed |
platform_peer_type | Type of the platform peer connection. | Property | All or Custom: Per GPU and Detailed |
platform_module_id | Module identifier within the platform. | Property | All or Custom: Per GPU and Detailed |
platform_gpu_fabric_guid | NVLink fabric GUID assigned to this GPU in the platform topology. | Property | All or Custom: Per GPU and Detailed |
hostname | Hostname of the system as reported by nvidia-smi. | Property | All or Custom: Per GPU and Detailed |
timestamp | Collection timestamp fetched from nvidia-smi. Used internally to maintain CSV column alignment; not emitted on any event. | Query | Basic |
count | Total number of GPUs returned by nvidia-smi. Produces the node_gpu_count gauge on the aggregated event; not emitted as a separate field. | Query | Basic |
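
The `timestamp` and `count` rows above describe query fields that are consumed during collection rather than emitted as event fields. As a rough illustration only, the following Python sketch shows how CSV rows shaped like `nvidia-smi` output could be split into per-GPU properties plus one aggregated `node_gpu_count` value. This is not Cribl Edge's implementation: the `FIELDS` order, the `parse_gpu_csv` helper, and the sample rows are all hypothetical.

```python
import csv
import io

# Hypothetical column order. The actual query list used by the collector
# is not documented here, so treat these names as an assumption.
FIELDS = ["timestamp", "count", "gpu_serial", "driver_version", "pstate"]


def parse_gpu_csv(text):
    """Split nvidia-smi style CSV rows into an aggregate count and per-GPU properties."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    per_gpu = []
    gpu_count = 0
    for row in reader:
        if not row:
            continue  # skip blank lines
        record = dict(zip(FIELDS, (cell.strip() for cell in row)))
        # timestamp only keeps the columns aligned; it is dropped from the output.
        record.pop("timestamp", None)
        # count feeds the aggregated gauge and is not kept as a per-GPU field.
        gpu_count = int(record.pop("count"))
        per_gpu.append(record)
    return {"node_gpu_count": gpu_count}, per_gpu


# Made-up sample rows, shaped like headerless CSV output.
SAMPLE = (
    "2024/01/01 00:00:00.000, 2, 0000000000001, 550.00, P0\n"
    "2024/01/01 00:00:00.000, 2, 0000000000002, 550.00, P8\n"
)

aggregate, gpus = parse_gpu_csv(SAMPLE)
print(aggregate)
print(gpus)
```

In this sketch, every row repeats the same `count` value, so the last one read is used for the aggregate. The remaining columns stay attached to their own GPU record, which mirrors the Per GPU properties listed in the table.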