These docs are for Cribl Stream 4.6 and are no longer actively maintained.
See the latest version (4.13).
About Load Balancing
Cribl Stream will attempt to load-balance outbound data as fairly as possibly across all endpoints. For example, if FQDNs/hostnames are used as the Destination addresses, and each resolves to 5 (unique) IPs, then each Worker Process will have its # of outbound connections = {# of IPs x # of FQDNs} for purposes of the Destination.
Data is sent by all Worker Processes to a randomly selected endpoint, and the amount sent to each connection depends on these parameters:
- Respective destination weight: If a Load Weight is set to
0
, Cribl Stream will not try to connect to that IP (nor to any IPs resolved for that FQDN). When a connection is blocked, Cribl Stream will apply a 10% penalty to the load weight, and will select the connection with the lower adjusted load. - Respective destination historical data: By default, historical data is tracked for 300s. Cribl Stream uses this data to influence the traffic sent to each destination, to ensure that differences decay over time, and that total ratios converge towards configured weights.
On Destinations that support load balancing, Cribl recommends enabling the feature, even if you only have one hostname – because this hostname can expand to multiple IPs. If you must disable load balancing, then to ensure robust connections, enable the alternative Advanced Settings > Round-robin DNS option. This will allow Cribl Stream to follow the DNS host’s IP rotation, preventing connection pinning.
Example
Suppose we have two connections, A and B, each with weight of 1
(i.e., they are configured to receive equal amounts of data). Suppose further that the load-balance stats period is set at the default 300s
and – to make things easy – for each period, there are 200 events of equal size (Bytes) that need to be balanced.
Interval | Time Range | Events to be dispensed |
---|---|---|
1 | time=0s —> time=300s | 200 |
Both A and B start this interval with 0 historical stats each.
Let’s assume that, due to various circumstances, 200 events are “balanced” as follows:
A = 120 events
and B = 80 events
– a difference of 40 events and a ratio of 1.5:1.
Interval | Time Range | Events to be dispensed |
---|---|---|
2 | time=300s —> time=600s | 200 |
At the beginning of interval 2, the load-balancing algorithm will look back to the previous interval stats and carry half of the receiving stats forward. I.e., connection A will start the interval with 60 and connection B with 40. To determine how many events A and B will receive during this next interval, Cribl Stream will use their weights and their stats as follows:
Total number of events: events to be dispensed + stats carried forward = 200 + 60 + 40 = 300
.
Number of events per each destination (weighed): 300/2 = 150
(they’re equal, due to equal weight).
Number of events to send to each destination A: 150 - 60 = 90
and B: 150 - 40 = 110
.
Totals at end of interval 2: A=120+90=210
, B=80+110=190
, a difference of 20 events and a ratio of 1.1:1.
Over the subsequent intervals, the difference becomes exponentially less pronounced, and eventually insignificant. Thus, the load gets balanced fairly.
If a request fails, Cribl Stream will resend the data to a different endpoint. Cribl Stream will block only if all endpoints are experiencing problems.