Transport Choice and Tuning
Within the framework of Cribl Validated Architectures (CVAs), the choice of transport protocol is the foundation of data integrity. While UDP was historically favored for its low overhead, the CVAs prioritize TCP and TLS for both on-prem and Cribl.Cloud environments. This shift addresses the complexities of distributed systems, including the following requirements.
- Reliability and backpressure: TCP provides a “closed-loop” communication system via an acknowledgment mechanism that UDP lacks. This allows Cribl Worker/Edge Nodes to exert backpressure during high-volume events, such as security breaches or system failures, signaling the upstream sender to throttle its transmission rate. Unlike stateless UDP, this mechanism prevents silent data loss and accounts for every log, even when the Pipeline reaches capacity.
- Security (TLS): CVA guidelines assume telemetry will cross untrusted network segments. TLS encryption protects sensitive logs from interception, while mutual authentication (mTLS) guarantees only authorized senders can inject data into your environment.
- Session-based monitoring: TCP maintains a persistent connection, providing real-time visibility into the health of every data Source. This stateful approach lets you monitor active sessions and immediately identify issues, such as connection flapping or handshake failures, that are invisible over stateless UDP.
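The backpressure behavior described above can be demonstrated with a short Python sketch, using a local stream-socket pair as a stand-in for a TCP link:

```python
import socket

# When the receiver stops reading, the kernel's send/receive buffers fill
# and further sends block (or raise BlockingIOError in non-blocking mode).
# That blocking is the feedback signal that lets a receiver throttle its
# sender. UDP offers no such signal: sendto() keeps "succeeding" while
# excess datagrams are silently dropped.
sender, receiver = socket.socketpair()
sender.setblocking(False)

sent = 0
try:
    while True:
        sent += sender.send(b"x" * 4096)  # receiver never reads
except BlockingIOError:
    pass  # buffers are full: the sender has been pushed back on

print(f"backpressure after {sent} bytes buffered")
sender.close()
receiver.close()
```

The same experiment with a UDP socket would never block: the datagrams would simply vanish once the receive buffer overflowed.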
While the CVAs prioritize stateful protocols, UDP remains a necessary exception for legacy hardware and specific high-volume scenarios. Its use is strictly limited to devices that lack TCP support, or to cases where high-throughput requirements outweigh the need for 100% delivery and small, measurable data loss is acceptable.
In these instances, UDP should only be deployed over short, local network segments where packet loss can be tightly monitored and mitigated by persistent queues (PQ) to maintain as much durability as the protocol allows.
For details, see TCP or UDP Transport?
On-Prem Transport and Tuning
In customer-managed environments, architectural performance hinges on eliminating kernel-level bottlenecks. The primary challenge in high-volume on-prem deployments is TCP pinning, where a single, long-lived TCP session is locked to a single CPU core, artificially capping throughput regardless of available system resources.
Default Transports
For customer-managed (on-prem or hybrid) Worker Groups and Edge Fleets, the defaults for syslog and most push-based telemetry are as follows:
- Use TCP with TLS from senders to Cribl Stream Workers or Edge Node syslog Sources.
- Use HTTPS for REST/HEC-style Sources (for example, Splunk HEC, Cribl HTTP, custom HTTP senders).
UDP syslog is constrained to:
- Devices that only support UDP.
- High-volume senders where small, measurable data loss is acceptable and you need to avoid TCP pinning overhead.
- Short, local network segments where you can monitor packet loss tightly and use PQ for resilience.
OS-Level Tuning
For high-volume or high-connection-count on-prem deployments, tune the OS to prevent kernel-level bottlenecks:
- Increase TCP/UDP receive buffers to reduce packet drops for busy UDP syslog or NetFlow/IPFIX inputs.
- Raise per-process and system-wide file-descriptor limits so API and Worker processes can accept many concurrent connections without hitting EMFILE.
- Size connection-tracking (conntrack) tables for worst-case flow counts, or disable conntrack selectively in extreme cases where it becomes a bottleneck.
Failing to tune these parameters results in “invisible” packet drops at the OS layer, where the kernel discards data before the Cribl application even has the opportunity to process it. For details, see OS Tuning for Large Deployments.
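A hedged sketch of probing two of these limits from Python on a Linux host (the `resource` module is Unix-only, and the sysctl names in the comments are standard Linux ones):

```python
import resource  # Unix-only
import socket

# Request a large UDP receive buffer. The kernel caps the grant at
# net.core.rmem_max, so a grant far below the request means the sysctl
# still needs raising (e.g. sysctl -w net.core.rmem_max=33554432).
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 32 * 1024 * 1024)
granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(f"UDP receive buffer granted: {granted} bytes")
sock.close()

# File-descriptor headroom: raise this process's soft limit toward the
# hard limit so many concurrent connections don't hit EMFILE.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
print(f"file descriptors: soft raised from {soft} to {hard}")
```

Note that on Linux, `getsockopt(SO_RCVBUF)` reports double the requested value because the kernel reserves bookkeeping overhead; the point is to compare the grant against the request, not to match it exactly.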
Cribl Configuration Tuning
At the Cribl layer, tune concurrency and backpressure to match Source behavior and Destination capacity. Treat Source and Destination concurrency limits (including pull-Source concurrency and long-lived connection caps) as primary controls for protecting latency, CPU, and network stability.
Configure concurrency per Source (for example, Syslog, Cribl HTTP/TCP, Splunk TCP, HEC):
- Adjust Max active connections and, where applicable, Requests-per-socket limit to spread load across Worker Processes and avoid pinning a single connection to one Worker. For details, see Number of Connections.
- Configure an Idle timeout for long-lived TCP connections to prevent idle sockets from consuming resources indefinitely.
Configure concurrency per Destination (for example, Syslog, Cribl HTTP/TCP, object store, SIEM):
- Adjust Max concurrent connections and related load-balancing/pooling settings on Cribl HTTP/TCP Destinations so they don’t open too many simultaneous connections to on-prem or third-party systems, and so traffic is evenly distributed across those connections.
- Set timeouts and retry behavior to match downstream SLAs to prevent aggressive reconnection attempts from overwhelming struggling destinations during service degradations.
Enable PQ for durability: turn on persistent queues for critical Sources and for streaming Destinations that feed SIEMs, observability platforms, or data lakes, so you can absorb transient downstream slowness or outages without losing data.
For details, see Data Resilience and Workload Architecture.
For high-throughput TCP syslog, also enable the Syslog Source TCP Load Balancing option on distributed Worker Nodes to mitigate TCP pinning and scale throughput with CPU cores via a dedicated load-balancer process. For details, see Mitigate TCP Pinning.
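The pinning effect, and what the load-balancer process buys you, can be illustrated with a toy Python model (round-robin here is a simplification; the numbers are illustrative, not Cribl internals):

```python
from collections import Counter
from itertools import cycle

# One busy, long-lived TCP session delivering 1000 events.
# Pinned: every event lands on the same Worker Process, so one CPU core
# does all the work. Balanced: a load-balancer process fans events out
# across all Worker Processes, so throughput scales with cores.
WORKERS = ["wp0", "wp1", "wp2", "wp3"]
events = [f"evt{i}" for i in range(1000)]

pinned = Counter(WORKERS[0] for _ in events)                   # all on wp0
balanced = Counter(w for w, _ in zip(cycle(WORKERS), events))  # round-robin

print("pinned:  ", dict(pinned))    # {'wp0': 1000}
print("balanced:", dict(balanced))  # 250 events per process
```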
Cribl.Cloud Transport and Tuning
In Cribl.Cloud, the architecture shifts from kernel tuning to secure ingress management. Because data typically crosses the public internet to reach cloud endpoints, security and session persistence are the primary concerns.
Mandatory Encryption
All traffic entering Cribl.Cloud must use TCP/TLS or HTTPS. Raw UDP syslog lacks the encryption necessary for internet transit and is highly susceptible to Source-address spoofing and data interception; therefore, it is unsupported for direct Cribl.Cloud ingress.
For any traffic crossing the Internet or other untrusted networks into Cribl.Cloud:
- Syslog senders: Use TCP and TLS to Cribl.Cloud syslog Sources (for example, port 6514 for TLS syslog in Cribl.Cloud).
- HTTP/REST/HEC senders: Use HTTPS to Cribl.Cloud HTTP or HEC endpoints, including the Splunk HEC Source and Cribl HTTP Source.
For details, see Syslog TLS to Cribl.Cloud (Palo Alto Example) and Secure Cribl.Cloud with TLS and mTLS.
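As an illustration, a minimal TLS syslog sender in Python might look like the following. The endpoint hostname is a placeholder for your Workspace's ingest endpoint, and octet-counted framing (RFC 5425) is one common framing for syslog over TLS:

```python
import socket
import ssl

def frame_syslog(msg: str) -> bytes:
    """Octet-counted framing (RFC 5425): '<byte-count> <message>'."""
    body = msg.encode("utf-8")
    return str(len(body)).encode() + b" " + body

def send_tls_syslog(host: str, msg: str, port: int = 6514) -> None:
    # create_default_context() verifies the server's certificate chain,
    # which is the point of TLS transit across the public internet.
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(frame_syslog(msg))

# Placeholder endpoint -- substitute your Workspace's ingest host:
# send_tls_syslog("in.example.cribl.cloud",
#                 "<134>1 2024-01-01T00:00:00Z host app - - - hello")
```

For mTLS, the same `SSLContext` would additionally load a client certificate and key via `ctx.load_cert_chain(...)` so the Source can authenticate the sender.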
UDP Scope for Cribl.Cloud
Do not send raw UDP syslog across the public Internet to Cribl.Cloud. If a device only supports UDP:
- Terminate UDP on a local Edge Node or hybrid Stream Worker Group.
- Forward from there to Cribl.Cloud via Cribl HTTP or Cribl TCP Destinations, using TLS end-to-end. This effectively “wraps” the insecure local traffic in a TLS-encrypted tunnel for safe and authenticated transit to Cribl.Cloud.
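A minimal sketch of this relay pattern, with the relay logic separated from the transport so a TLS endpoint (a placeholder here) can be swapped in:

```python
import socket

# The role an Edge Node or hybrid Worker Group plays: read UDP syslog
# datagrams off the local segment and write them, newline-delimited,
# into a reliable stream socket bound for Cribl.Cloud.

def relay_datagrams(udp_sock: socket.socket,
                    stream: socket.socket,
                    max_msgs: int) -> int:
    """Forward up to max_msgs datagrams from UDP into a stream socket."""
    forwarded = 0
    for _ in range(max_msgs):
        data, _addr = udp_sock.recvfrom(65535)
        stream.sendall(data + b"\n")  # lossy UDP becomes reliable stream
        forwarded += 1
    return forwarded

# In production, `stream` would be a TLS-wrapped TCP connection to a
# Cribl TCP or syslog TLS Source (hostname and port are hypothetical):
#   ctx = ssl.create_default_context()
#   tcp = socket.create_connection(("in.example.cribl.cloud", 6514))
#   stream = ctx.wrap_socket(tcp, server_hostname="in.example.cribl.cloud")
```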
Managed Optimization
Because Cribl manages the OS for Cribl-managed Worker Groups, you cannot change kernel parameters directly. Focus tuning on:
Source configuration (per Stream Worker Group):
- Concurrency: Max connections and Requests-per-socket limit for syslog, HEC, and HTTP Sources.
- TLS settings: Use Cribl-provided certificates for Cribl.Cloud Sources where possible.
- PQ: For eligible Sources in Cribl.Cloud, PQ sizing is automatically constrained (for example, up to 1 GB per Source per Worker Process); you can enable or disable PQ per Source based on durability requirements.
Destination configuration:
- Connection pooling and limits for Cribl HTTP/TCP Destinations that relay to on-prem or third-party targets.
- Timeouts and retry behavior on HTTP-family Destinations (including Cribl HTTP, object stores, and APIs) to handle long-latency or intermittently reachable targets gracefully.
For details, see About Destination Backpressure Triggers.
Hybrid Transport Strategy
Hybrid topologies (on-prem Workers/Edge Nodes to the Cribl.Cloud Leader) must maintain protocol consistency across trust boundaries. The strategy is to confine high-risk, low-visibility protocols (UDP) to local network segments while using robust, stateful tunnels for long-haul transit.
Cross-Boundary Links
Between on-prem Worker Groups/Fleets and Cribl.Cloud, or between Cribl.Cloud and SaaS/SIEM/object stores, use Cribl HTTP or Cribl TCP Destination/Source pairs over TLS-protected TCP.
This avoids double-billing and provides a consistent, authenticated transport for inter-Organization and inter-Workspace data movement.
Limit UDP to short, controlled paths such as:
- From a device to the local Edge Node.
- From a device to a local on-prem Worker Group.
From there, forward the data via TCP and TLS (Cribl HTTP, Cribl TCP, or syslog TCP/TLS).
This pattern ensures that:
- Any lossy behavior of UDP is confined to a local, observable network segment.
- Long-haul links benefit from TCP congestion control, retry semantics, and TLS encryption.
Handling High-Latency Paths
For WAN/VPN paths between Cribl components and downstream systems (for example, Cribl.Cloud to an on-prem SIEM, or on-prem Worker Nodes to cloud object storage):
- Increase Destination timeouts and retry intervals to reflect realistic round-trip times and maintenance windows.
- Rely on PQs at the sending side (Worker Nodes or Edge Nodes) to provide durability while the downstream is unreachable or slow, rather than trying to keep connections open indefinitely.
- For syslog-heavy environments, combine PQ with the guidance in Syslog to Cribl Stream Reference Architecture and Syslog Best Practices to balance durability, cost, and acceptable loss.
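As an illustrative sketch of the retry posture above (not Cribl's internal algorithm), capped exponential backoff with jitter keeps reconnection attempts from hammering a distant or degraded Destination while PQ buffers events in the meantime:

```python
import random
import time

def backoff_schedule(attempts: int, base: float = 2.0,
                     cap: float = 300.0) -> list[float]:
    """Per-attempt wait times in seconds, doubling up to a cap."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]

def send_with_retry(send, payload, attempts: int = 5) -> bool:
    """Try a delivery callable; back off with jitter between failures."""
    for wait in backoff_schedule(attempts):
        try:
            send(payload)
            return True
        except ConnectionError:
            time.sleep(wait * random.uniform(0.5, 1.0))  # jittered wait
    return False  # exhausted: leave the event in PQ for a later flush
```

The cap matters on long-haul links: without it, backoff can grow past maintenance windows and delay recovery long after the Destination is healthy again.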