Organize Your Workloads (Overlays)

If a Topology is the physical foundation of your deployment (the servers and their locations), an Overlay is the logical floor plan that defines how you actually use that space.

Overlays are specialized patterns layered onto a Distributed: Multi-Worker Group/Fleet topology. While a topology describes where the software lives, an overlay defines the rules, roles, and data flows required to meet specific business or technical goals.

Why Use Overlays?

Topologies provide the capacity to process data, but Overlays provide the strategy. You use them to:

  • Define purpose: Assign specific roles to Worker Groups (such as “This group only handles high-volume Syslog”); see the sketch after this list for one way to picture this kind of role assignment.
  • Enforce governance: Control how data crosses security or sovereignty boundaries.
  • Optimize performance: Isolate different types of workloads so one doesn’t interfere with another.
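To make the “define purpose” and “enforce governance” items more concrete, here is a minimal conceptual sketch of how an overlay maps Worker Groups to roles and regions and then selects a group for an incoming Source. This is illustrative Python, not Cribl configuration: the WorkerGroup model, the select_group function, and the group, role, and region names are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical overlay roles for Worker Groups; these labels mirror the
# functional split (push, pull, replay) but are not Cribl-defined identifiers.
PUSH, PULL, REPLAY = "push", "pull", "replay"

@dataclass(frozen=True)
class WorkerGroup:
    name: str
    role: str    # workload type this group is dedicated to (functional split)
    region: str  # location boundary (regional/geo split)

GROUPS = [
    WorkerGroup("wg-syslog-us", PUSH, "us-east"),
    WorkerGroup("wg-pull-us", PULL, "us-east"),
    WorkerGroup("wg-replay-eu", REPLAY, "eu-west"),
]

def select_group(workload: str, region: str) -> WorkerGroup:
    """Pick the Worker Group whose role and region match an incoming Source.

    A functional split keeps push, pull, and replay traffic on separate
    groups; a regional split keeps data inside its sovereignty boundary.
    """
    for group in GROUPS:
        if group.role == workload and group.region == region:
            return group
    raise LookupError(f"no Worker Group serves {workload} workloads in {region}")

if __name__ == "__main__":
    # A high-volume Syslog sender in us-east lands on its dedicated push group.
    print(select_group(PUSH, "us-east").name)  # -> wg-syslog-us
```

In a real deployment this decision is expressed through your Worker Group design and Source routing rather than application code; the sketch only captures the selection logic an overlay encodes.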

Overlay Patterns

  • Functional split: Partitions Worker Groups by workload type (push, pull, replay). This allows Worker Groups to scale independently and run more efficiently, and it isolates failures so they do not interrupt the live data streams.
  • Regional or geo split: Divides Worker Groups by physical location (region, data center, or sovereignty boundary). This helps meet regulatory compliance (like GDPR) and reduces cross-region egress costs.
  • Worker Group to Worker Group bridging: Provides a secure mechanism using Cribl HTTP/TCP to route data across trust boundaries. It also enforces schema governance and compliance before data transfer.
  • Cribl Edge and Stream: Uses Edge Nodes near the Source to handle local collection, filtering, and buffering. This maximizes efficiency and resiliency before forwarding curated data to Cribl Stream Worker Groups.
  • Hub-and-spoke with Core Worker Group: Uses spoke Worker Groups to ingest data locally, while a central Core Worker Group performs global normalization, enrichment, and controlled routing to all final Destinations.
  • Replay-first: Prioritizes long-term durability and cost control by initially writing all raw data to low-cost object storage. It then feeds downstream platforms with data using real-time subsets and flexible on-demand replay.
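The Replay-first pattern lends itself to a small data-flow sketch. The Python below is a conceptual model, not Cribl code: ObjectStore, ingest, replay, and the siem list are hypothetical stand-ins that show raw events landing in low-cost storage first, a filtered real-time subset reaching the downstream platform, and an on-demand replay backfilling it later.

```python
from typing import Callable, Iterable, List

Event = dict

class ObjectStore:
    """Stand-in for low-cost object storage (for example, an S3-style bucket)."""

    def __init__(self) -> None:
        self._objects: List[Event] = []

    def write(self, event: Event) -> None:
        self._objects.append(event)

    def scan(self, predicate: Callable[[Event], bool]) -> Iterable[Event]:
        return [e for e in self._objects if predicate(e)]

    def __len__(self) -> int:
        return len(self._objects)

def ingest(events: Iterable[Event], store: ObjectStore,
           realtime_filter: Callable[[Event], bool],
           downstream: List[Event]) -> None:
    """Replay-first ingest: durably land every raw event, then forward only
    the real-time subset the downstream platform needs right now."""
    for event in events:
        store.write(event)            # durability and cost control come first
        if realtime_filter(event):
            downstream.append(event)  # curated live feed

def replay(store: ObjectStore, predicate: Callable[[Event], bool],
           downstream: List[Event]) -> None:
    """On-demand replay: backfill the downstream platform from raw storage."""
    downstream.extend(store.scan(predicate))

if __name__ == "__main__":
    store = ObjectStore()
    siem: List[Event] = []

    ingest(
        [{"sev": "info"}, {"sev": "error"}, {"sev": "debug"}],
        store,
        realtime_filter=lambda e: e["sev"] == "error",  # live subset only
        downstream=siem,
    )
    assert len(store) == 3 and len(siem) == 1  # all raw data kept, one event forwarded

    # Later, an investigation needs the debug data back.
    replay(store, lambda e: e["sev"] == "debug", siem)
    assert len(siem) == 2
```

The two asserts capture the point of the pattern: the object store always holds the complete raw record, while the downstream platform receives only what it needs, when it needs it.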