Splunk is a streaming Destination type, and with the Splunk Load Balanced output, you can load-balance data out to multiple Splunk receivers.
Cribl LogStream will attempt to load-balance outbound data as fairly as possible across all receivers (listed as Destinations in the GUI). If FQDNs/hostnames are used as the Destination address and each resolves to, for example, 5 (unique) IPs, then for purposes of the SplunkLB output, each worker process will open a number of outbound connections equal to # of IPs x # of FQDNs. Data is sent by all worker processes to all receivers simultaneously, and the amount sent to each receiver depends on these parameters:
- Respective destination weight.
- Respective destination historical data.
By default, historical data is tracked for 300s. LogStream uses this data to influence the traffic sent to each destination, to ensure that differences decay over time, and that total ratios converge towards configured weights.
Suppose we have two receivers, A and B, each with a weight of 1 (i.e., they are configured to receive equal amounts of data). Suppose further that the load-balance stats period is set at the default 300s and, to make things easy, that in each period there are 200 events of equal size (in bytes) that need to be balanced.
Interval 1, time=0s ---> time=300s: 200 events to be dispensed.
Both A and B start this interval with 0 historical stats each.
Let's assume that, due to various circumstances, the 200 events are "balanced" as follows: A = 120 events and B = 80 events, a difference of 40 events and a ratio of 1.5:1.
Interval 2, time=300s ---> time=600s: 200 events to be dispensed.
At the beginning of interval 2, the load-balancing algorithm will look back to the previous interval stats and carry half of the receiving stats forward. I.e., receiver A will start the interval with 60 and receiver B with 40. To determine how many events A and B will receive during this next interval, LogStream will use their weights and their stats as follows:
- Total number of events: events to be dispensed + stats carried forward = 200 + 60 + 40 = 300.
- Number of events per destination (weighted): 300/2 = 150 (they're equal, due to equal weight).
- Number of events to send to each destination: A: 150 - 60 = 90 and B: 150 - 40 = 110.
Totals at end of interval 2: A = 120 + 90 = 210 and B = 80 + 110 = 190, a difference of 20 events and a ratio of 1.1:1.
Over the subsequent intervals, the difference becomes exponentially less pronounced, and eventually insignificant. Thus, the load gets balanced fairly.
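The arithmetic in the example above can be reproduced in a few lines. This is a minimal sketch of the calculation as described in this section, not LogStream's actual implementation. The function name `plan_interval` and the clamp at zero are illustrative assumptions.

```python
def plan_interval(weights, carried, events):
    """Return how many events each receiver should get in the next interval.

    weights: dict receiver -> configured load weight
    carried: dict receiver -> stats carried forward from the previous interval
    events:  number of events to dispense in this interval
    """
    total = events + sum(carried.values())
    total_weight = sum(weights.values())
    plan = {}
    for name, weight in weights.items():
        target = total * weight / total_weight  # weighted share of the total
        # Assumption: never plan a negative number of events.
        plan[name] = max(0.0, target - carried.get(name, 0))
    return plan


# Interval 1 outcome from the example: A received 120 events, B received 80.
interval_1 = {"A": 120, "B": 80}
# Half of each receiver's stats is carried into interval 2.
carried = {name: count / 2 for name, count in interval_1.items()}

plan = plan_interval({"A": 1, "B": 1}, carried, events=200)
totals = {name: interval_1[name] + plan[name] for name in plan}
print(plan)    # {'A': 90.0, 'B': 110.0}
print(totals)  # {'A': 210.0, 'B': 190.0}
```

Changing the weights (for example, `{"A": 2, "B": 1}`) shows how the weighted share shifts while the carried-forward stats still pull each receiver back toward its target.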
To configure load balancing, first select Data > Destinations, then select Splunk > Load Balanced from the Data Destinations page's tiles or left menu. Then click Add New to open the Load Balanced > New Destination modal, which provides the following fields.
Output ID: Enter a unique name to identify this Splunk LB Destination definition.
Indexer Discovery: When toggled to Yes, enables automatic discovery of indexers in an indexer clustering environment. See Indexer Discovery for the resulting UI options displayed below. When set to No (the default), displays the Destinations section below.
Exclude current host IPs: Exclude all IPs of the current host from the list of any resolved hostnames. Defaults to
Backpressure behavior: Select whether to block, drop, or queue events when all receivers in this group are exerting backpressure. Defaults to Block. When set to Persistent Queue, adds the Persistent Queue Settings section (left tab) to the modal.
The Destinations section appears only when Indexer Discovery has its default No setting. Here, you specify a known set of Splunk receivers on which to load-balance data.
Click + Add Destination to specify more receivers on new rows. Each row provides the following fields:
Address: Hostname of the Splunk receiver. Optionally, you can paste in a comma-separated list, in
Port: Port number to send data to.
TLS: Whether to inherit TLS configs from group setting, or disable TLS. Defaults to
TLS servername: Servername to use if establishing a TLS connection. If not specified, defaults to connection host (if not an IP). Otherwise, uses the global TLS settings.
Load weight: The weight to apply to this Destination for load-balancing purposes.
Setting the Indexer Discovery toggle to Yes displays the following fields instead of the Destinations section:
Site: Clustering site from which indexers need to be discovered. In the case of a single-site cluster, default is the default entry.
Cluster Master URI: Full URI of Splunk Cluster Master, in the format: scheme://host:port. (Worker Nodes normally access the Cluster Master on port 8089 to get the list of currently online indexers.)
Auth token: Authentication token required to authenticate to Cluster Master for indexer discovery.
Refresh period: Time interval (in seconds) between two consecutive fetches of indexer list from Cluster Master. Defaults to
For complete instructions on enabling token authentication on the Splunk Cluster Master, see Splunk's Enable or Disable Token Authentication documentation. This option requires Splunk 7.3.0 or higher, and requires the following capabilities:
Be sure to give the token an Expiration setting well in the future, whether you use Relative Time or Absolute Time. Otherwise, the token will inherit Splunk's default expiration time of +30d (30 days in the future), which will cause indexer discovery to fail.
If you have a failover site configured on Splunk's Cluster Master, Cribl respects this configuration, and forwards the data to the failover site in case of site failure.
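Before saving an Indexer Discovery configuration, it can help to check that the Cluster Master URI really has the scheme://host:port shape. The following is a minimal sketch using only Python's standard library. The function name `check_master_uri`, the example hostname, and the restriction to `http`/`https` schemes are assumptions made for illustration, not something LogStream enforces.

```python
from urllib.parse import urlsplit


def check_master_uri(uri):
    """Return (scheme, host, port) if uri looks like scheme://host:port.

    Raises ValueError otherwise.
    """
    parts = urlsplit(uri)
    # Assumption: the management endpoint is reached over http or https.
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unexpected or missing scheme in {uri!r}")
    if not parts.hostname or parts.port is None:
        raise ValueError(f"expected scheme://host:port, got {uri!r}")
    return parts.scheme, parts.hostname, parts.port


# Hypothetical Cluster Master address on the usual management port.
print(check_master_uri("https://cm.example.com:8089"))
# ('https', 'cm.example.com', 8089)
```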
This section is displayed when the Backpressure behavior is set to Persistent Queue.
Max file size: The maximum size to store in each queue file before closing it. Enter a numeral with units of KB, MB, etc. Defaults to
Max queue size: The maximum amount of disk space the queue is allowed to consume. Once this limit is reached, queueing is stopped, and data blocking is applied. Enter a numeral with units of KB, MB, etc.
Queue file path: The location for the persistent queue files. This will be of the form:
your/path/here/<worker-id>/<output-id>. Defaults to
Compression: Codec to use to compress the persisted data, once a file is closed. Defaults to
Gzip is also available.
Enabled defaults to
No. When toggled to
Validate server certs: Reject certificates that are not authorized by a CA in the CA certificate path, or by another trusted CA (e.g., the system's CA). Defaults to
Server name (SNI): Server name for the SNI (Server Name Indication) TLS extension. This must be a host name, not an IP address.
Certificate name: The name of the predefined certificate.
CA certificate path: Path on client containing CA certificates (in PEM format) to use to verify the server's cert. Path can reference
Private key path (mutual auth): Path on client containing the private key (in PEM format) to use. Path can reference $ENV_VARS. Use only if mutual auth is required.
Certificate path (mutual auth): Path on client containing certificates (in PEM format) to use. Path can reference $ENV_VARS. Use only if mutual auth is required.
Passphrase: Passphrase to use to decrypt private key.
Minimum TLS version: Optionally, select the minimum TLS version to use when connecting.
Maximum TLS version: Optionally, select the maximum TLS version to use when connecting.
Single PEM File
If you have a single
cert sections, enter this file's path in all of these fields above: CA certificate path, Private key path (mutual auth), and Certificate path (mutual auth).
Connection timeout: Amount of time (in milliseconds) to wait for the connection to establish, before retrying. Defaults to
Write timeout: Amount of time (in milliseconds) to wait for a write to complete, before assuming connection is dead. Defaults to
Pipeline: Pipeline to process data before sending the data out using this output.
System fields: A list of fields to automatically add to events that use this output. By default, includes cribl_pipe (identifying the LogStream Pipeline that processed the event). Supports wildcards. Other options include:
- cribl_host: LogStream Node that processed the event.
- cribl_wp: LogStream Worker Process that processed the event.
- cribl_input: LogStream Source that processed the event.
- cribl_output: LogStream Destination that processed the event.
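To illustrate how a wildcard entry in System fields might select several of the field names listed above, here is a small sketch. It assumes shell-style wildcard matching via Python's `fnmatch` module. LogStream's exact matching rules are not described in this section, and `expand_system_fields` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# System fields named in this section.
AVAILABLE = ["cribl_pipe", "cribl_host", "cribl_wp", "cribl_input", "cribl_output"]


def expand_system_fields(patterns):
    """Return the available field names matched by any pattern, in listed order."""
    return [name for name in AVAILABLE
            if any(fnmatchcase(name, pattern) for pattern in patterns)]


print(expand_system_fields(["cribl_pipe"]))  # ['cribl_pipe']
print(expand_system_fields(["cribl_*put"]))  # ['cribl_input', 'cribl_output']
```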
Output Multi Metrics: Toggle this slider to Yes to output multiple-measurement metric data points. (Supported in Splunk 8.0 and above, this format enables sending multiple metrics in a single event, improving the efficiency of your Splunk capacity.)
Minimize in-flight data loss: If set to Yes (the default), LogStream will check whether the indexer is shutting down and, if so, stop sending data. This helps minimize data loss during shutdown.
DNS resolution period (seconds): Re-resolve any hostnames after each interval of this many seconds, and pick up destinations from A records. Defaults to
Load balance stats period (seconds): Lookback traffic history period. Defaults to 300 seconds. Note that if multiple receivers are behind a hostname (i.e., multiple A records), all resolved IPs will inherit the weight of the host, unless each IP is specified separately. In Cribl LogStream load balancing, IP settings take priority over those from hostnames.
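The weight-inheritance rule can be sketched as a small lookup. This is an illustration of the rule as stated here, not LogStream code. The function name `resolve_weights`, the hostname, and the IP addresses are all hypothetical.

```python
def resolve_weights(host_weights, ip_weights, dns):
    """Map each resolved IP to its effective load weight.

    host_weights: dict hostname -> weight configured on the hostname row
    ip_weights:   dict IP -> weight configured on a separate IP row
    dns:          dict hostname -> list of IPs from its A records
    """
    effective = {}
    for host, ips in dns.items():
        for ip in ips:
            effective[ip] = host_weights[host]  # inherit the hostname's weight
    effective.update(ip_weights)                # explicit IP rows take priority
    return effective


weights = resolve_weights(
    host_weights={"idx.example.com": 2},
    ip_weights={"192.0.2.11": 5},
    dns={"idx.example.com": ["192.0.2.10", "192.0.2.11"]},
)
print(weights)  # {'192.0.2.10': 2, '192.0.2.11': 5}
```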
Max connections: Constrains the number of concurrent indexer connections, per Worker Process, to limit memory utilization. If set to a number > 0, then on every DNS resolution period (or indexer discovery), LogStream will randomly select this subset of discovered IPs to connect to. LogStream will rotate IPs in future resolution periods, monitoring weight and historical data to ensure fair load balancing of events among IPs.
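The random subset selection can be pictured with the sketch below. It covers a single resolution period only and does not model the rotation or the weight and history tracking across periods. `pick_connections` is a hypothetical name, and treating 0 as "no limit" is an assumption inferred from the "> 0" wording.

```python
import random


def pick_connections(discovered_ips, max_connections, rng=random):
    """Choose which IPs to connect to for one resolution period."""
    # Assumption: 0 (or a limit at least as large as the list) means "use all".
    if max_connections <= 0 or max_connections >= len(discovered_ips):
        return list(discovered_ips)
    return rng.sample(list(discovered_ips), max_connections)


ips = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"]
print(pick_connections(ips, 2))  # two randomly chosen IPs from the list
print(pick_connections(ips, 0))  # all four IPs
```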
Nested field serialization: Specifies whether and how to serialize nested fields into index-time fields. Select
None (the default) or
Auth token: Optionally, enter a shared secret to use when establishing a connection to a Splunk indexer configured with the same secret.
Throttling: Throttle rate, in bytes per second. Multiple byte units such as KB, MB, GB, etc., are also allowed. E.g., 42 MB. Default value of 0 indicates no throttling. When throttling is engaged, excess data will be dropped only if Backpressure behavior is set to Drop events. (Data will be blocked for all other Backpressure behavior settings.)
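As a rough illustration of the Throttling rule, the sketch below converts a value such as 42 MB into bytes per second and reports what happens to excess data for a given Backpressure behavior. The helper names are hypothetical, and the use of 1024-based multiples is an assumption, since this section does not say how LogStream interprets the units.

```python
import re

# Assumption: units are binary multiples (1 KB = 1024 bytes).
UNITS = {"": 1, "B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def parse_rate(text):
    """Convert a throttle setting such as '42 MB' into bytes per second."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*", text)
    if not match or match.group(2).upper() not in UNITS:
        raise ValueError(f"cannot parse throttle rate {text!r}")
    return int(float(match.group(1)) * UNITS[match.group(2).upper()])


def on_excess(backpressure_behavior):
    """Excess data is dropped only for 'Drop events'; otherwise it is blocked."""
    return "drop" if backpressure_behavior == "Drop events" else "block"


print(parse_rate("42 MB"))             # 44040192
print(parse_rate("0"))                 # 0, i.e. no throttling
print(on_excess("Persistent Queue"))   # block
```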
To connect to Splunk Cloud, you might need to extract the private and public key from the Splunk-provided Splunk Cloud Certificate, which is typically bundled in an app. Use the following steps:
Step 1. Test connectivity to Splunk Cloud, using the Root CA certificate:
openssl s_client -CAfile path_to_ca.pem -connect hostnameToSplunkCloud:9997
Step 2. Extract the private key from the Splunk Cloud Certificate. At the prompt, you will need the sslPassword value from the outputs.conf file bundled with the Splunk Cloud app. Using Elliptic Curve keys:
openssl ec -in path_to_server_cert.pem -out private.pem
If you are using RSA keys, instead use:
openssl rsa -in path_to_server_cert.pem -out private.pem
Step 3. Extract the public certificate from the server certificate file:
openssl x509 -in path_to_server_cert.pem -out server.pem
Step 4. In the LogStream Destination's TLS Settings (Client Side) section, enter the following:
- CA Certificate Path: Path to CA Certificate.
- Private Key Path (mutual auth): Path to private.pem (Step 2 above).
- Certificate Path (mutual auth): Path to server.pem (Step 3 above).
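To sanity-check the extracted files outside LogStream, you could load them into a client-side TLS context with Python's standard `ssl` module. This is a sketch under the assumption that the three paths map to the CA certificate, `server.pem`, and `private.pem` from the steps above. `build_tls_context` is a hypothetical helper, not part of LogStream or Splunk.

```python
import ssl


def build_tls_context(ca_path=None, cert_path=None, key_path=None, passphrase=None):
    """Build a client-side TLS context from PEM file paths."""
    # With ca_path=None, the system's trusted CAs are used instead.
    context = ssl.create_default_context(cafile=ca_path)
    if cert_path:
        # Present a client certificate only when mutual auth is required.
        # load_cert_chain raises an error if the certificate and key do not match.
        context.load_cert_chain(certfile=cert_path, keyfile=key_path,
                                password=passphrase)
    return context


# With no arguments, this only verifies the server against the system CAs.
context = build_tls_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # True
```

Calling `build_tls_context("path_to_ca.pem", "server.pem", "private.pem")` would fail early if the key and certificate from Steps 2 and 3 do not belong together, which is a quick way to catch an extraction mistake.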
Data sent to Splunk is not compressed.
If events have a LogStream internal field called __criblMetrics, they'll be forwarded to Splunk as metric events.
If events do not have a _raw field, they'll be serialized to JSON prior to sending to Splunk.
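The two rules above can be summarized in a short sketch. It is a simplification for illustration only: the function names are hypothetical, and how LogStream treats other internal fields during JSON serialization is not covered here.

```python
import json


def is_metric_event(event):
    """Events carrying the internal __criblMetrics field are sent as metrics."""
    return "__criblMetrics" in event


def to_splunk_payload(event):
    """Return the text to send for one event."""
    if "_raw" in event:
        return event["_raw"]
    # No _raw field: fall back to serializing the whole event as JSON.
    return json.dumps(event)


print(to_splunk_payload({"_raw": "GET /index.html 200", "host": "web01"}))
# GET /index.html 200
print(to_splunk_payload({"host": "web01", "status": 200}))
# {"host": "web01", "status": 200}
```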