These docs are for Cribl Stream 4.10 and are no longer actively maintained.
See the latest version (4.14).
About Load Balancing
Cribl Stream will attempt to load-balance outbound data as fairly as possibly across all endpoints. For example, if FQDNs/hostnames are used as the Destination addresses, and each resolves to 5 (unique) IPs, then each Worker Process will have its # of outbound connections = {# of IPs x # of FQDNs} for purposes of the Destination.
Data is sent by all Worker Processes to a randomly selected endpoint, and the amount sent to each connection depends on these parameters:
- Respective destination weight: If a Load Weight is set to 0, Cribl Stream will not try to connect to that IP (nor to any IPs resolved for that FQDN). When a connection is blocked, Cribl Stream will apply a 10% penalty to the load weight, and will select the connection with the lower adjusted load.
- Respective destination historical data: By default, historical data is tracked for 300s. Cribl Stream uses this data to influence the traffic sent to each destination, to ensure that differences decay over time, and that total ratios converge towards configured weights.
Sources share a single Worker Process for load balancing, so you don’t need to dedicate a separate Worker Process for load balancing for each Source. Load balancing Worker Processes have a high throughput rate because they only handle Event Breakers.
On Destinations that support load balancing, Cribl recommends enabling the feature, even if you only have one hostname – because this hostname can expand to multiple IPs. If you must disable load balancing, then to ensure robust connections, enable the alternative Advanced Settings > Round-robin DNS option. This will allow Cribl Stream to follow the DNS host’s IP rotation, preventing connection pinning.
If a request fails, Cribl Stream will resend the data to a different endpoint. Cribl Stream will block only if all endpoints are experiencing problems.
Example: Equal Distribution
Suppose we have two connections, A and B, each with weight of 1 (i.e., they are configured to receive equal amounts of data). Suppose further that the load-balance stats period is set at the default 300s and – to make things easy – for each period, there are 200 events of equal size (bytes) that need to be balanced.
| Interval | Time Range | Events to balance | 
|---|---|---|
| 1 | time=0s —> time=300s | 200 | 
Both A and B start this interval with 0 historical stats each.
Let’s assume that, due to various circumstances, 200 events are “balanced” as follows:
A = 120 events and B = 80 events – a difference of 40 events and a ratio of 1.5:1.
| Interval | Time Range | Events to balance | 
|---|---|---|
| 2 | time=300s —> time=600s | 200 | 
At the beginning of interval 2, the load-balancing algorithm will look back to the previous interval stats and carry half of the receiving stats forward. In other words, connection A will start the interval with 60 and connection B with 40. To determine how many events A and B will receive during this next interval, Cribl Stream will use their weights and their stats as follows:
Total number of events: events to balance + stats carried forward = 200 + 60 + 40 = 300.
Number of events per each destination (weighted): 300/2 = 150 (they’re equal, due to equal weight).
Number of events to send to each destination A: 150 - 60 = 90 and B: 150 - 40 = 110.
Totals at the end of interval 2: A=120+90=210, B=80+110=190, a difference of 20 events and a ratio of 1.1:1.
Over the subsequent intervals, the difference becomes exponentially less pronounced, and eventually insignificant. Thus, the load gets balanced fairly.
Example: Unequal Distribution
Suppose you have three connections, A, B, and C, weighted to receive unequal amounts of data:
- A: 1
- B: 2
- C: 7
Assume that the load-balance stats period is set at the default 300s and for each period, 200 events of equal size (bytes) need to be balanced:
| Interval | Time Range | Events to balance | 
|---|---|---|
| 1 | time=0s —> time=300s | 200 | 
All three connections (A, B, and C) start this interval with 0 historical stats each.
Let’s assume that 200 events are “balanced” as follows:
A = 30 events, B = 50 events, and C = 120 events – a difference of 90 events and a ratio of 3:5:12.
| Interval | Time Range | Events to balance | 
|---|---|---|
| 2 | time=600s —> time=600s | 200 | 
At the beginning of interval 2, the load-balancing algorithm will look back to the previous interval stats and carry half of the receiving stats forward. This means that connection A will start the interval with 15, connection B with 25, and connection C with 60. To determine how many events each connection will receive during this next interval, Cribl Stream will use their weights and their stats as follows:
Total number of events: events to balance + stats carried forward = 200 + 15 + 25 + 60 = 300.
Number of events per each destination (weighted): 300 / (1 + 2 + 7) = 300 / 10 = 30.
Number of events to send to each destination A: 30 - 15 = 15; B: 60 - 25 = 35; C: 210 - 60 = 150.
Totals at the end of interval 2: A: 30 + 15 = 45; B: 50 + 35 = 85; C: 120 + 150 = 270.
The difference between the highest connection (C) and the lowest (A) is 225 events. The ratio at the end of interval 2 is approximately 1:1.89:6.
Over the subsequent intervals, the difference becomes exponentially less pronounced, and eventually insignificant. Thus, the load gets balanced fairly.