About Cribl Insights

Cribl Insights provides unified observability and monitoring for your Cribl deployment. It aggregates operational, system, and data metrics, enabling you to quickly identify issues, optimize resource usage, and maintain system health across distributed environments.

What Is Insights?

Insights is a monitoring and analytics layer integrated into Cribl products. It collects and visualizes key metrics from Pipelines, system components, and alerting mechanisms. Insights supports real-time and historical analysis, helping you understand system behavior, detect anomalies, and troubleshoot efficiently.

  • Availability: Insights is available to Enterprise Cribl.Cloud organizations on the Main Workspace.
  • Data retention: Two days. Upgrade options are available under Workspace > Details.

Also see the Introducing Cribl Insights blog post.

Why Use Insights?

  • Operational Visibility: View Pipeline throughput, latency, error rates, and resource consumption.
  • Proactive Monitoring: Set up monitors and notifications to detect failures, bottlenecks, or abnormal patterns before they impact downstream systems.
  • Root Cause Analysis: Drill into system and data metrics to isolate issues and accelerate incident response.
  • Capacity Planning: Analyze trends to inform scaling decisions and optimize infrastructure costs.

Insights Components

Insights is organized into three areas that together provide end-to-end operational visibility.

System presents control-plane and data-plane health for Nodes, including CPU and memory utilization, queue depth, backpressure, error and retry rates, and component availability. Use it to identify saturation before it impacts Pipelines, to plan capacity, and to drive root-cause analysis when infrastructure issues cascade into data symptoms.

Data surfaces the health and behavior of your data flows across Sources, Routes, Pipelines, and Destinations. It highlights throughput, freshness (event age), drops, gaps, and schema drift to help you detect partial failures and format changes early, validate transformations, and localize flow interruptions. Freshness shows how long it takes events to reach each point in the flow. A higher maximum freshness value means older events, which indicates delayed or stale data. Typical signals include events/bytes in and out, drop rates, freshness min/max, and parse/transform timings, enabling rapid triage of data-path regressions.
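
To make the freshness signal concrete, the following is a minimal sketch of how minimum and maximum event age could be derived from timestamps. It is an illustration only, not how Insights computes the metric: the `Event` class, its `event_time` and `observed_time` fields, and the `freshness_stats` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical fields, assumed for illustration only.
    event_time: float     # epoch seconds when the event was generated
    observed_time: float  # epoch seconds when it reached the measurement point


def freshness_stats(events):
    """Return (min, max) event age in seconds, or None if there are no events."""
    ages = [e.observed_time - e.event_time for e in events]
    if not ages:
        return None
    return min(ages), max(ages)


events = [
    Event(event_time=1000.0, observed_time=1002.0),  # 2 s old
    Event(event_time=1001.0, observed_time=1001.5),  # 0.5 s old
    Event(event_time=940.0, observed_time=1003.0),   # 63 s old: a delayed event
]

fresh_min, fresh_max = freshness_stats(events)
print(f"freshness min={fresh_min:.1f}s max={fresh_max:.1f}s")
```

In this sample, a single delayed event pushes the maximum to 63 seconds while the minimum stays at 0.5 seconds, which is why the maximum is the more useful indicator of stale data.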

Alerts centralizes detection logic and delivery. You define monitors over Insights metrics and events (for example, “no data for N minutes” or sustained latency spikes), and route notifications to channels such as Slack, email, or webhooks with grouping and deduplication. This consolidates alert lifecycle management and reduces mean time to recovery by coupling detectors with the same time-aligned signals you investigate in Data and System views.
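
The "no data for N minutes" condition and notification deduplication described above can be sketched as follows. This is a conceptual illustration under stated assumptions, not the Insights implementation or API: `no_data_alert`, `Deduplicator`, and the alert key `"source:syslog"` are hypothetical names, and timestamps are plain epoch seconds.

```python
def no_data_alert(last_event_time, now, threshold_minutes):
    """Return True when no event has arrived within the threshold window."""
    if last_event_time is None:
        # Nothing has ever been received, so treat it as "no data".
        return True
    return (now - last_event_time) > threshold_minutes * 60


class Deduplicator:
    """Suppress repeat notifications while the same alert key keeps firing."""

    def __init__(self):
        self._firing = set()

    def should_notify(self, key, firing):
        if not firing:
            # The condition cleared, so a later recurrence notifies again.
            self._firing.discard(key)
            return False
        if key in self._firing:
            return False
        self._firing.add(key)
        return True


dedup = Deduplicator()
last_seen = 9_000.0  # the last event arrived 1000 s before the first check
notifications = []
for now in (10_000.0, 10_060.0, 10_120.0):
    firing = no_data_alert(last_seen, now, threshold_minutes=15)
    notifications.append(dedup.should_notify("source:syslog", firing))

print(notifications)
```

The condition is true on all three checks because the gap exceeds 15 minutes each time, but only the first check produces a notification. The later two are suppressed until the condition clears.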