Distributed Collection
The Distributed Collection blueprint is designed for massive-scale endpoint environments. It separates the collection tier (Cribl Edge) from the processing tier (Cribl Stream) so that even when thousands of endpoints generate events at the Source, the core processing engine remains stable, efficient, and unburdened by raw ingestion tasks.
Topology Overview
This blueprint is built upon the Distributed (Multi-Worker Group) Topology. It creates a functional and physical split between your Fleet of endpoints and your centralized processing hub.
- Collection tier (Cribl Edge): Thousands of lightweight Edge nodes act as the “Spokes.” They are deployed directly on endpoints (servers, laptops, or VMs) to discover, collect, and forward logs or metrics.
- Processing tier (Cribl Stream): A central Worker Group (or multiple Worker Groups) acts as the “Hub.” It receives the data from the Edge Nodes, performs heavy enrichment (GeoIP, Lookups, Redaction), and routes it to final Destinations.
- Unified control plane: A single Leader manages both the Fleets and Worker Groups, providing a single pane of glass for global observability.
Overlays
This architecture is the primary use case for combining logical and transit overlays to manage the hand-off between tiers:
- Cribl Edge and Stream: Defines the “Push” relationship where Edge Nodes are configured to forward data to a dedicated Stream entry point.
- Hub and Spoke: Provides the logical structure for centralizing thousands of data sources into a manageable, horizontally scalable processing cluster.
- Worker Group to Worker Group Bridging: This is the functional “bridge” between the tiers. It uses Cribl HTTP or Cribl TCP to ensure the hand-off is encrypted, compressed, and load-balanced across the processing tier, as sketched in the configuration example below.
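As a concrete reference point, the Edge-to-Stream hand-off is typically expressed as a Cribl TCP (or Cribl HTTP) Destination on the Edge side. The sketch below is illustrative only: the Destination ID, hostname, and exact field names are assumptions and should be validated against the Destination reference for your Cribl version.

```yaml
# Illustrative Edge-side Destination (outputs.yml-style) for the Edge-to-Stream hand-off.
# IDs, hostnames, and field keys are assumptions -- confirm against your version's
# Cribl TCP Destination reference before use.
outputs:
  stream-hub:                                  # hypothetical Destination ID
    type: cribl_tcp                            # Cribl TCP transit to the Stream Worker Group
    loadBalanced: true                         # spread events across the processing tier
    hosts:
      - host: stream-workers.example.internal  # hypothetical Stream ingress / load-balancer address
        port: 10300                            # adjust to the port your Cribl TCP Source listens on
    compression: gzip                          # compress the Worker-to-Worker transit link
    tls:
      disabled: false                          # keep the hand-off encrypted in transit
```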
Operational Guardrails
To maintain a validated global deployment, you must adhere to these combined connectivity, management, and scaling standards:
- Fleet Management Scale: When managing a high volume of endpoints, use Subfleets to organize configurations by OS, function, or region. This prevents a single configuration error from impacting your entire global endpoint footprint.
- Worker-to-Worker compression: For the Worker-to-Worker Transit Overlay, always enable compression. This significantly reduces the bandwidth required to move data from the Edge tier to the central hub, especially over limited or metered links.
- Processing Isolation: Avoid performing heavy lookups or complex regex directly on Edge Nodes. Move “expensive” processing to the Cribl Stream processing tier to preserve endpoint performance and stability.
- Hop Limits: Maintain a single-hop architecture between the Edge tier and the Stream tier. Do not daisy-chain collection points, as this introduces unnecessary latency and management complexity.
- Connectivity Resilience: Ensure Edge Nodes have a stable management path to the Leader (Port 4200), as sketched below. For roaming endpoints, use Cribl.Cloud or a secure internet-facing ingress to maintain configuration heartbeats.
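To make the management-path guardrail concrete, the sketch below shows an Edge node's Leader connection in a cribl.yml-style distributed section. The hostname, token variable, and exact keys are assumptions; check the distributed deployment reference for your version before relying on them.

```yaml
# Illustrative Edge-node Leader settings (cribl.yml-style). Keys, hostname, and
# token handling are assumptions -- validate against your version's documentation.
distributed:
  mode: managed-edge               # run this node as a Leader-managed Edge node
  master:
    host: leader.example.internal  # hypothetical Leader (or Cribl.Cloud) address
    port: 4200                     # management/heartbeat port to the Leader
    tls:
      disabled: false              # encrypt the management path
    authToken: ${CRIBL_AUTH_TOKEN} # supplied via environment, not hard-coded
```

An equivalent approach in many deployments is to pass the Leader address and auth token via the CRIBL_DIST_MASTER_URL environment variable at install time; confirm the exact URL format for your version.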
For more considerations, see Strategic Architectures (Bridging, Edge, and Hybrid Flow).