CVA Operational Guardrails
These requirements provide the structural foundation for the architecture archetypes outlined in the CVA Matrix. Use these technical criteria to build resilient, scalable, and secure environments that align with our core deployment benchmarks.
Guardrails for Foundational Architectures (Vanilla Deployments)
These guardrails apply to standard, single-Worker Group topologies, which serve as the starting point for most deployments.
Vanilla Cloud Guardrails
Vanilla Cloud is the simplest, most operationally streamlined deployment for a cloud-native environment. In this setup, Cribl hosts and manages the control plane, so you can shift your focus entirely to managing license limits and data egress patterns within the cloud environment.
In Cribl.Cloud, Cribl-managed Worker Groups are the default unit of scale. You should treat these as your primary building block, designing your Pipelines and Routes specifically around the ingest limits and Worker Process caps enforced by your license tier, rather than treating processes as unbounded resources. For details, see Cribl.Cloud Architecture Planning.
For outbound traffic, avoid pinning connections to a single endpoint. Instead, rely on Destination-side load balancing or use round-robin DNS to spread connections across multiple IPs. For details, see Load Balancing.
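As a minimal illustration of the round-robin approach, the Python sketch below resolves a Destination hostname to all of its A records and rotates new connections across them. The hostname and port are placeholders, not values from this architecture.

```python
import itertools
import socket

# Placeholder Destination endpoint; substitute your own hostname and port.
DEST_HOST = "dest.example.com"
DEST_PORT = 9997

def resolve_all_ips(host: str, port: int) -> list[str]:
    """Return every IPv4 address the hostname currently resolves to."""
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})

def connection_targets(host: str, port: int):
    """Yield (ip, port) pairs in round-robin order so no single endpoint is pinned."""
    for ip in itertools.cycle(resolve_all_ips(host, port)):
        yield ip, port

if __name__ == "__main__":
    targets = connection_targets(DEST_HOST, DEST_PORT)
    for _ in range(4):
        ip, port = next(targets)
        print(f"next connection goes to {ip}:{port}")
        # socket.create_connection((ip, port), timeout=5).close()
```

In practice, Destination-side load balancing inside Cribl Stream achieves the same spread without custom tooling; the sketch only makes the distribution behavior concrete.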
When configuring firewalls for hybrid components, use the “Leader NLB IPs” and “Ingress/Egress IPs” found in your Cribl.Cloud Workspace settings. For details, see Workspaces.
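To sanity-check that your firewall rules actually permit those published addresses, a quick reachability probe such as the sketch below can help. The IPs and ports are placeholders; copy the real values from your Workspace settings.

```python
import socket

# Placeholder values; replace with the "Leader NLB IPs" and "Ingress/Egress IPs"
# shown in your Cribl.Cloud Workspace settings.
ALLOWLISTED_ENDPOINTS = [
    ("198.51.100.10", 4200),  # control plane
    ("198.51.100.11", 443),   # UI/API
]

def probe(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for host, port in ALLOWLISTED_ENDPOINTS:
        status = "reachable" if probe(host, port) else "blocked or unreachable"
        print(f"{host}:{port} -> {status}")
```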
Internal transfers between Stream components (Worker Group to Worker Group) use Cribl protocols and are billed once at initial ingest, so you do not need to worry about double-counting data volume for internal routing.
Vanilla On-Prem Guardrails
The Vanilla On-Prem archetype offers a straightforward, self-managed solution for smaller, stable sites. The entire control plane, including the Leader Node, is hosted within your own environment, so you assume full responsibility for the resilience and high availability of the system.
Leader High Availability (HA)
If your organization has strict Recovery Time Objective (RTO) and Recovery Point Objective (RPO) requirements, meaning you cannot tolerate significant downtime or configuration loss, you must implement the full Leader HA architecture.
The documented architecture uses an “active-active proxy” model where you deploy a standby Leader Node and place both the active and standby Leaders behind a single load balancer. This configuration is essential for high availability because:
- Seamless Failover: The load balancer ensures immediate failover by automatically redirecting all Worker and UI traffic to the healthy standby node if the active one fails.
- Simplified Routing: The design maintains single-writer semantics while allowing the standby node to proxy UI and API requests.
To implement this HA architecture, configure your load balancer as follows:
- Health Check: Configure the load balancer to health-check port `9000` on both Leaders.
- Control Plane Routing: Route all Worker control-plane traffic on port `4200` to the active Leader.
- Connectivity: Ensure Leader-to-Leader communication is allowed on ports `9000` and `4200`.
For detailed information on implementing this design, see High Availability Architecture.
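The sketch below shows one way to exercise the health-check behavior described above from outside the load balancer: it probes port 9000 and port 4200 on both Leaders and reports which node answers. The hostnames are placeholders for your active and standby Leaders.

```python
import socket

# Placeholder Leader hostnames; replace with your active and standby Leaders.
LEADERS = ["leader-a.example.internal", "leader-b.example.internal"]
HEALTH_PORT = 9000    # UI/API port the load balancer health-checks
CONTROL_PORT = 4200   # Worker control-plane port routed to the active Leader

def is_listening(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if the host accepts a TCP connection on the given port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for leader in LEADERS:
        health = "ok" if is_listening(leader, HEALTH_PORT) else "down"
        control = "ok" if is_listening(leader, CONTROL_PORT) else "down"
        print(f"{leader}: health({HEALTH_PORT})={health} control({CONTROL_PORT})={control}")
```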
Configuration Management and Recovery
If full HA is not required, you must at a minimum implement a robust configuration management strategy. This requires configuring Packs or using a remote Git repository to version and protect your configuration. This strategy will enable rapid reconstruction of the Leader if the host fails. For more details, see Configuration Management.
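Cribl Stream can sync its configuration to a remote Git repository natively, so the sketch below is only an external illustration of the idea: version the configuration directory and push it off-host on a schedule so the Leader can be rebuilt quickly. The directory path, remote, and branch names are assumptions.

```python
import subprocess
from datetime import datetime, timezone

# Assumed values; adjust to your install and repository layout.
CONFIG_DIR = "/opt/cribl/local"   # configuration directory to version
REMOTE = "origin"                 # Git remote that already exists for this repo
BRANCH = "master"

def git(*args: str) -> subprocess.CompletedProcess:
    """Run a git command inside the configuration directory."""
    return subprocess.run(["git", "-C", CONFIG_DIR, *args], check=False)

def backup_config() -> None:
    """Commit any local configuration changes and push them to the remote."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    git("add", "-A")
    # `diff --cached --quiet` exits non-zero only when something is staged.
    if git("diff", "--cached", "--quiet").returncode != 0:
        git("commit", "-m", f"config backup {stamp}")
        git("push", REMOTE, BRANCH)

if __name__ == "__main__":
    backup_config()
```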
Vanilla Hybrid Guardrails
Vanilla Hybrid consists of Customer-managed Workers that feed data into a central, cloud-managed Cribl.Cloud Leader. This archetype offers flexibility, but because control-plane traffic depends on the public internet, success hinges on robust connectivity and firewall discipline.
- The primary operational guardrail is the requirement to allowlist the Cribl.Cloud Leader NLB IPs and required ports to ensure your Workers and Edge Nodes can reliably reach the control plane. The Cribl.Cloud Leader’s static access details (found in Workspaces > Access) exist specifically for this purpose. For details, see Workspaces.
- For implementation, add the Workspace “Leader NLB IPs” to your firewall allowlist and ensure control-plane traffic can reach the Cloud URL on port `4200` (see the verification sketch after this list).
- When designing data egress from your hybrid Workers, keep the network path simple and uniform per site, ideally using a single TLS/mTLS path, to reduce operational friction and troubleshooting complexity. For details, see Static External IPs for the Leader.
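One quick way to confirm that the control-plane path is open and terminating on the expected certificate is a TLS handshake probe like the sketch below. The hostname is a placeholder for the Cloud URL shown in your Workspace's access details.

```python
import socket
import ssl

# Placeholder; use the Cloud URL from your Workspace's access details.
LEADER_HOST = "example-workspace.cribl.cloud"
CONTROL_PORT = 4200

def tls_probe(host: str, port: int, timeout: float = 5.0) -> dict:
    """Complete a TLS handshake and return basic details about the server certificate."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
            return {
                "tls_version": tls.version(),
                "subject": cert.get("subject"),
                "expires": cert.get("notAfter"),
            }

if __name__ == "__main__":
    print(tls_probe(LEADER_HOST, CONTROL_PORT))
```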
Cribl.Cloud/Hybrid (not Vanilla) Guardrails
Non-Vanilla Cribl.Cloud/Hybrid manages multiple customer-managed Worker Groups across distinct regions or functions, moving beyond the single-Worker Group setup. This complexity requires strict guardrails and safe limits for the resulting data flows.
- In these advanced environments, you may need to route data between Worker Groups for isolation or tiering. However, you must limit these Worker Group-to-Worker Group chains to a maximum of one internal hop.
- Reference architectures that rely on these flows, such as unified ingest combined with a separate replay tier, add this single extra hop strictly for functional reasons. Avoid creating deep, daisy-chained Pipelines, as each additional hop introduces new queues, latency, and potential failure points.
- Also, because these links often traverse different network zones (such as Cloud to on-prem, or Region A to Region B), you must secure every Worker-to-Worker path with TLS or mTLS, treating them as first-class production data paths rather than informal internal plumbing.
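Where you terminate these links yourself, the client-side mTLS context in the sketch below captures the intent: each side presents a certificate and verifies the other. The certificate paths, hostname, and port are placeholders.

```python
import socket
import ssl

# Placeholder certificate material and downstream Worker Group endpoint.
CA_CERT = "/etc/pki/cribl/ca.pem"
CLIENT_CERT = "/etc/pki/cribl/worker-group-a.pem"
CLIENT_KEY = "/etc/pki/cribl/worker-group-a.key"
DOWNSTREAM = ("wg-egress.example.internal", 10300)

def build_mtls_context() -> ssl.SSLContext:
    """Build a client context that verifies the server and presents a client certificate."""
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=CA_CERT)
    context.load_cert_chain(certfile=CLIENT_CERT, keyfile=CLIENT_KEY)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context

if __name__ == "__main__":
    context = build_mtls_context()
    with socket.create_connection(DOWNSTREAM, timeout=5) as sock:
        with context.wrap_socket(sock, server_hostname=DOWNSTREAM[0]) as tls:
            print(f"mTLS established with {DOWNSTREAM[0]} using {tls.version()}")
```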
Functional Architectures (Workload Separation)
These guardrails apply when you need to evolve beyond a single Worker Group to handle mixed or high-volume workloads.
Small/Medium Push-Pull Mix Guardrails
Data sources behave differently on the network: Push Sources demand immediate attention, while Pull Sources operate on a schedule. Mixing them often leads to resource contention.
- You will achieve better stability if you separate push and pull workloads into distinct Worker Groups. Push traffic (like Syslog or HEC) requires stable listeners and consistent backpressure, whereas Pull workloads (like SQS or APIs) are governed by collector concurrency.
- To implement this, create at least two Worker Groups: a Push Worker Group fronted by a non-sticky load balancer, and a Pull Worker Group for queue/API collectors.
- This separation works because it allows you to independently tune the Worker Process counts and Job Limits for each Worker Group. For example, you can dedicate CPU power for fast data ingestion (Push) and set higher concurrency limits for managing parallel connections (Pull), ensuring neither workload compromises the other’s performance.
For details, see Data Resilience and Workload Architecture.
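The back-of-the-envelope sketch below illustrates why separate tuning matters: the push Worker Group is sized from sustained throughput, while the pull Worker Group is sized from collector concurrency. Every number here is an illustrative assumption, not a Cribl-published figure.

```python
import math

# Illustrative assumptions; measure per-process throughput in your own environment.
GB_PER_DAY_PER_PROCESS = 200          # sustained push throughput per Worker Process
PUSH_INGEST_GB_PER_DAY = 4000         # expected Syslog/HEC volume
PULL_COLLECTORS = 12                  # scheduled queue/API collectors to run
CONCURRENT_JOBS_PER_PROCESS = 2       # assumed job limit per Worker Process

def size_push_group(ingest_gb_per_day: float) -> int:
    """Size the push Worker Group purely on sustained throughput."""
    return math.ceil(ingest_gb_per_day / GB_PER_DAY_PER_PROCESS)

def size_pull_group(collectors: int) -> int:
    """Size the pull Worker Group on job concurrency rather than raw throughput."""
    return math.ceil(collectors / CONCURRENT_JOBS_PER_PROCESS)

if __name__ == "__main__":
    print("push Worker Processes needed:", size_push_group(PUSH_INGEST_GB_PER_DAY))
    print("pull Worker Processes needed:", size_pull_group(PULL_COLLECTORS))
```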
All-Cloud (Large Worker Group) Guardrails
This guardrail defines the operational signals that indicate when vertical scaling must stop and horizontal scaling must begin, even in a pure Cribl.Cloud environment where resource provisioning seems limitless.
- In scaling scenarios, job limits and system metrics serve as your critical early warning signals. When a single large Worker Group becomes too “hot”, indicated by sustained high CPU, deep queues, or hitting job concurrency caps, you should not simply add more Worker Processes. Instead, the mandated approach is to partition the workload into additional Worker Groups along functional lines, such as separating live ingest from replay. For details, see Functional Split Overlay.
- The key is to proactively reassign Sources and Destinations to these new functional Worker Groups before you hit the managed caps. This strategy keeps your scaling predictable and reduces the blast radius; if one high-volume Source causes issues, it is contained within its specific Worker Group rather than bringing down the entire deployment.
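As a sketch of how those signals might be watched, the snippet below evaluates a metrics snapshot against thresholds and flags when a Worker Group should be partitioned rather than grown. The metric names, values, and thresholds are placeholders; feed it from your own monitoring.

```python
from dataclasses import dataclass

@dataclass
class GroupHealth:
    """Snapshot of the signals that drive the grow-versus-split decision."""
    cpu_utilization: float        # 0.0-1.0, sustained average across the group
    queue_depth_events: int       # backlog sitting in persistent/in-memory queues
    running_jobs: int
    job_concurrency_cap: int

# Illustrative thresholds; tune them to your environment and license caps.
CPU_THRESHOLD = 0.80
QUEUE_THRESHOLD = 1_000_000

def should_split(health: GroupHealth) -> bool:
    """Return True when the Worker Group is running hot and should be partitioned
    along functional lines instead of simply adding Worker Processes."""
    hot_cpu = health.cpu_utilization >= CPU_THRESHOLD
    deep_queues = health.queue_depth_events >= QUEUE_THRESHOLD
    at_job_cap = health.running_jobs >= health.job_concurrency_cap
    return hot_cpu or deep_queues or at_job_cap

if __name__ == "__main__":
    snapshot = GroupHealth(cpu_utilization=0.85, queue_depth_events=250_000,
                           running_jobs=10, job_concurrency_cap=10)
    print("partition this workload:", should_split(snapshot))
```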
Large Push/Pull (Multi-Push) Guardrails
This guardrail is designed for large push/pull environments, prioritizing a dedicated strategy for Syslog traffic because of its high volume and its tendency to create “hot spots” on specific Worker Processes.
- Syslog streams should be treated as their own workload due to connection persistence and volume. Deploying a dedicated Syslog Worker Group allows you to independently tune persistent queues and Pipeline behavior for these sensitive streams.
- You must mitigate TCP pinning, where a massive volume of traffic sticks to a single Worker Process. To do this, front the syslog Workers with a non-sticky load balancer, use multiple VIPs, or enable Cribl’s native TCP load-balancing features. For details, see Syslog to Cribl Stream Reference Architecture.
- In Cribl.Cloud, plan your port usage carefully, as the platform exposes a finite set of pre-enabled ports; keeping high-volume Syslog on hybrid Workers often offers maximum flexibility.
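One way to spot TCP pinning before it degrades the Worker Group is to compare per-process traffic shares, as in the sketch below. The byte counts are stand-ins for whatever your metrics pipeline reports for each Worker Process over the same interval.

```python
# Hypothetical bytes received per Worker Process over one interval.
BYTES_PER_PROCESS = {
    "wp-0": 58_000_000_000,
    "wp-1": 2_000_000_000,
    "wp-2": 1_500_000_000,
    "wp-3": 1_200_000_000,
}

def pinned_processes(byte_counts: dict[str, int], skew_factor: float = 3.0) -> list[str]:
    """Flag processes receiving far more than their fair share of syslog traffic."""
    fair_share = sum(byte_counts.values()) / len(byte_counts)
    return [wp for wp, received in byte_counts.items() if received > skew_factor * fair_share]

if __name__ == "__main__":
    hot = pinned_processes(BYTES_PER_PROCESS)
    if hot:
        print("possible TCP pinning on:", ", ".join(hot))
        print("consider a non-sticky load balancer, more VIPs, or sender-side rebalancing")
    else:
        print("traffic looks evenly spread across Worker Processes")
```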
Pull-Heavy Sources Guardrails
This guardrail is optimized for environments dominated by pull-heavy, queue-based Sources (like SQS, Kinesis, or Event Hubs), recognizing that managing high concurrency is more critical than simply managing high CPU.
- Throughput for these queue-backed Sources is governed by partition counts and consumer groups rather than raw network bandwidth. Therefore, you must use pull-optimized Worker Groups sized based on throughput units and job limits, rather than focusing solely on CPU resources. For details, see Source Architecture.
- For implementation, it is essential to partition these Worker Groups along namespaces, accounts, or data domains. This isolation prevents the issue of “noisy neighbors”, where a sudden spike in one account queue consumes all the resources, thereby starving the consumption needed for other business-critical domains.
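The sketch below makes the sizing logic concrete: useful concurrency is bounded by the partition count, not by bandwidth, so consumers beyond one per partition add nothing. All of the figures are illustrative assumptions.

```python
import math

# Illustrative figures for one queue-backed Source (e.g., Kinesis or Event Hubs).
PARTITIONS = 32                        # shards/partitions on the queue
EVENTS_PER_SEC_PER_PARTITION = 1_000   # expected arrival rate per partition
EVENTS_PER_SEC_PER_CONSUMER = 4_000    # measured consumer throughput (assumption)

def useful_consumers(partitions: int, partition_rate: float, consumer_rate: float) -> int:
    """Consumers needed to keep up, capped at the partition count,
    because concurrency beyond one consumer per partition is wasted."""
    needed = math.ceil(partitions * partition_rate / consumer_rate)
    return min(needed, partitions)

if __name__ == "__main__":
    n = useful_consumers(PARTITIONS, EVENTS_PER_SEC_PER_PARTITION,
                         EVENTS_PER_SEC_PER_CONSUMER)
    print(f"size the pull Worker Group for about {n} concurrent consumers")
```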
Strategic Architectures (Bridging, Edge, and Hybrid Flow)
These guardrails address advanced integration architectures, focusing on how data crosses boundaries between environments, zones, or tiers.
General Worker Group to Worker Group Guardrails
This guardrail addresses routing data between Worker Groups to introduce clear boundaries and isolation, while actively mitigating the latency and risk that every hop adds to the Pipeline.
- While internal Cribl HTTP/TCP Destinations allow you to route events between Worker Groups, you should limit designs to a single hop between the ingestion tier and the egress tier. Deeper chains add queues, latency, and potential failure points that make troubleshooting difficult.
- Treat the hand-off between Worker Groups as a formal internal interface. Define an explicit schema agreement regarding expected fields and formats so downstream systems (Destinations) are insulated from upstream changes.
- Also, secure all such internal paths using TLS or mTLS, treating them with the same security rigor as external internet-facing links. For details, see Worker Group to Worker Group Bridging.
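To make that hand-off contract enforceable rather than informal, a lightweight field check like the sketch below can run in the ingestion tier before events are forwarded. The required fields are hypothetical; define them from your own schema agreement.

```python
# Hypothetical contract for events crossing the Worker Group boundary.
REQUIRED_FIELDS = {"_time", "host", "sourcetype", "index"}

def missing_fields(event: dict) -> set[str]:
    """Return the contract fields that are absent from an event."""
    return REQUIRED_FIELDS - event.keys()

if __name__ == "__main__":
    sample = {"_time": 1700000000, "host": "fw01", "sourcetype": "syslog"}
    gaps = missing_fields(sample)
    if gaps:
        print("event breaks the inter-group schema agreement; missing:", sorted(gaps))
    else:
        print("event satisfies the hand-off contract")
```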
Hybrid Feed to Cribl.Cloud Guardrails
This guideline is specifically designed to address the “data gravity” problem: using on-prem infrastructure to stage and forward data before lifting it to Cribl.Cloud for analytics and retention.
- In this setup, your on-prem Workers act as an ingest and staging tier that forwards data to Cribl.Cloud via secure private links (VPN or Private Link).
- The Cribl.Cloud-Managed Worker Group must then act as a strategic branching point.
- You should configure the Cribl.Cloud-Managed Worker Group to bifurcate the data stream: Route a filtered, cost-optimized subset to real-time analytics (the “Hot Path”) and send the full-fidelity raw data directly to low-cost object storage (the “Warm Path”) for long-term retention.
- The hybrid control plane relies on the Cloud Workspace URL and Leader NLB IPs for allow-listing (ports `4200`/`443`), keeping the connectivity model explicit and auditable.
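The bifurcation itself is simple routing logic. The sketch below shows its shape with a hypothetical filter: every event is retained on the warm path, and only the cost-optimized subset also goes to the hot path. The destination names and filter criteria are placeholders for your own Routes.

```python
# Hypothetical routing predicate; in Cribl Stream this lives in your Routes/Pipelines.
HOT_PATH_SOURCETYPES = {"auth", "firewall", "ids"}

def route(event: dict) -> list[str]:
    """Return the destinations for one event: full-fidelity retention on the warm
    path, plus real-time analytics for the filtered hot-path subset."""
    destinations = ["warm_object_storage"]       # long-term, low-cost retention
    if event.get("sourcetype") in HOT_PATH_SOURCETYPES:
        destinations.append("hot_analytics")     # filtered, cost-optimized subset
    return destinations

if __name__ == "__main__":
    for event in ({"sourcetype": "auth"}, {"sourcetype": "netflow"}):
        print(event["sourcetype"], "->", route(event))
```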
Cribl Edge Fleet to Worker Group Guardrails
This guideline is designed for extending visibility to resource-limited endpoints using Cribl Edge, ensuring collected data is efficiently forwarded to downstream Worker Groups without overwhelming the Source.
- Cribl Edge is designed for resilience and local collection, not heavy data transformation. To maintain performance at the endpoint, use Fleets and Subfleets to manage configuration at scale, but keep the Edge processing logic lean.
- Edge Nodes should focus only on collection, simple filtering, and basic redaction. You should offload complex operations, such as heavy enrichment, joins, or replay workflows, to the downstream Stream Worker Groups, where you have dedicated compute resources.
- Align internal links to documented default ports (such as `4200` for the control plane and `10200`/`10300` for internal data) and secure them with TLS/mTLS.
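The division of labor described above can be pictured as two stages: a lean edge-side step (filtering and basic redaction) and a heavier downstream step (enrichment and joins) that runs in the Stream Worker Group. The field names, redaction pattern, and lookup table in this sketch are hypothetical.

```python
import re

# Hypothetical redaction pattern and enrichment lookup.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
ASSET_OWNERS = {"web01": "platform-team", "db01": "data-team"}

def edge_process(event: dict):
    """Lean Edge-side logic: drop noise, redact obvious PII, forward the rest."""
    if event.get("severity") == "debug":
        return None                                   # simple filtering at the edge
    event["message"] = SSN_PATTERN.sub("<redacted>", event.get("message", ""))
    return event

def stream_process(event: dict) -> dict:
    """Heavier downstream logic: enrichment and joins run in the Stream Worker Group."""
    event["owner"] = ASSET_OWNERS.get(event.get("host", ""), "unknown")
    return event

if __name__ == "__main__":
    raw = {"host": "web01", "severity": "info", "message": "user 123-45-6789 logged in"}
    lean = edge_process(raw)
    if lean is not None:
        print(stream_process(lean))
```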