Home / Stream/ Destinations/Elasticsearch

Elasticsearch

Cribl Stream can send events to an Elasticsearch cluster using the Bulk API. As of v.3.3, Cribl Stream supports Elastic data streams.

Type: Streaming | TLS Support: Configurable | PQ Support: Yes

Configuring Cribl Stream to Output to Elasticsearch

From the top nav, click Manage, then select a Worker Group to configure. Next, you have two options:

To configure via the graphical QuickConnect UI, click Routing > QuickConnect (Stream) or Collect (Edge). Next, click + Add Destination at right. From the resulting drawer’s tiles, select Elasticsearch. Next, click either + Add Destination or (if displayed) Select Existing. The resulting drawer will provide the options below.

Or, to configure via the Routing UI, click Data > Destinations (Stream) or More > Destinations (Edge). From the resulting page’s tiles or the Destinations left nav, select Elasticsearch. Next, click New Destination to open a New Destination modal that provides the options below.

General Settings

Output ID: Enter a unique name to identify this Elasticsearch Destination definition.

Load balancing: Set to No by default. When toggled to Yes, see Load Balancing Settings below. With the default No setting, if you notice that Cribl Stream is not sending data to all possible IP addresses, enable Advanced Settings > Round-robin DNS.

Bulk API URL or Cloud ID: Specify either an Elasticsearch cluster or Elastic Cloud deployment to send events to. For an Elasticsearch cluster, enter a URL (e.g., http://<myElasticCluster>:9200/_bulk). For an Elastic Cloud deployment, enter its Cloud ID. This setting is not available when Load balancing is enabled.

Index or data stream: Enter a JavaScript expression that evaluates to the name of the Elastic data stream or Elastic index where you want events to go. The expression is evaluated for each event; can evaluate to a constant value; and must be enclosed in quotes or backticks. An event’s __index field can overwrite the index or data stream name.

Authentication Settings

Authentication enabled: When toggled to Yes, use the Authentication method buttons to select one of these options:

Manual: Enter your credentials directly in the resulting Username and Password fields.

Secret: Exposes a Credentials secret drop-down, in which you can select a stored secret that references the credentials described above. A Create link is available to store a new, reusable secret.

Manual API Key: Exposes an API key field to directly enter your Elasticsearch API key.

Secret API Key: Exposes an API key (text secret) drop-down, in which you can select a stored text secret that references your Elasticsearch API key. A Create link is available to store a new, reusable secret.

Optional Settings

Type: Specify document type to use for events. An event’s __type field can overwrite this value.

Backpressure behavior: Specify whether to block, drop, or queue events when all receivers are exerting backpressure. Defaults to Block.

Tags: Optionally, add tags that you can use for filtering and grouping at the final destination. Use a tab or hard return between (arbitrary) tag names.

Load Balancing Settings

Enabling the Load balancing slider displays the following controls:

Exclude Current Host IPs

This slider determines whether to exclude all IPs of the current host from the list of any resolved hostnames. Defaults to No.

Bulk API URLs

The Bulk API URLs table is where you specify a known set of receivers on which to load-balance data. Click + Add URL to specify more receivers on new rows. Each row provides the following fields:

URL: Specify the URL to an Elastic node to send events to – e.g., http://elastic:9200/_bulk

Load weight: Specify a weight to apply to the receiver for load-balancing purposes.

The final column provides an X button to delete any row from the table.

When you first enable load balancing, or if you edit the load weight once your data is load–balanced, give the logic time to settle. The changes might take a few seconds to register.

Persistent Queue Settings

This tab is displayed when the Backpressure behavior is set to Persistent Queue.

On Cribl-managed Cribl.Cloud Workers (with an Enterprise plan), this tab exposes only the Clear Persistent Queue button. A maximum queue size of 1 GB disk space is automatically allocated per Worker Process.

Max file size: The maximum data volume to store in each queue file before closing it. Enter a numeral with units of KB, MB, etc. Defaults to 1 MB.

Max queue size: The maximum amount of disk space the queue is allowed to consume. Once this limit is reached, Cribl Stream stops queueing and applies the fallback Queue‑full behavior. Enter a numeral with units of KB, MB, etc.

Queue file path: The location for the persistent queue files. Defaults to $CRIBL_HOME/state/queues. To this value, Cribl Stream will append /<worker‑id>/<output‑id>.

Compression: Codec to use to compress the persisted data, once a file is closed. Defaults to None; Gzip is also available.

Queue-full behavior: Whether to block or drop events when the queue is exerting backpressure (because disk is low or at full capacity). Block is the same behavior as non-PQ blocking, corresponding to the Block option on the Backpressure behavior drop-down. Drop new data throws away incoming data, while leaving the contents of the PQ unchanged.

Clear persistent queue: Click this button if you want to flush out files that are currently queued for delivery to this Destination. A confirmation modal will appear. (Appears only after Output ID has been defined.)

Processing Settings

Post‑Processing

Pipeline: Pipeline to process data before sending the data out using this output.

System fields: A list of fields to automatically add to events that use this output. By default, includes cribl_pipe (identifying the Cribl Stream Pipeline that processed the event). Supports wildcards. Other options include:

  • cribl_host – Cribl Stream Node that processed the event.
  • cribl_wp – Cribl Stream Worker Process that processed the event.
  • cribl_input – Cribl Stream Source that processed the event.
  • cribl_output – Cribl Stream Destination that processed the event.

Advanced Settings

Validate server certs: Toggle to Yes to reject certificates that are not authorized by a CA in the CA certificate path, nor by another trusted CA (e.g., the system’s CA).

Round-robin DNS: Toggle on to enable round-robin DNS lookup across multiple IP addresses, IPv4 and IPv6. When a DNS server resolves a Fully Qualified Domain Name (FQDN) to multiple IP addresses, Cribl Stream will sequentially use each address in the order they are returned by the DNS server for subsequent connection attempts. (This option is visible only when the General Settings > Load balancing option is set to No.)

Compress: Toggle to Yes to compress the payload body before sending.

Request timeout: Amount of time (in seconds) to wait for a request to complete before aborting it. Defaults to 30.

Request concurrency: Maximum number of concurrent requests per Worker Process. When Cribl Stream hits this limit, it begins throttling traffic to the downstream service. Defaults to 5. Minimum: 1. Maximum: 32.

Max body size (KB): Maximum size of the request body before compression. Defaults to 4096 KB. The actual request body size might exceed the specified value because the Destination adds bytes when it writes to the downstream receiver. Cribl recommends that you experiment with the Max body size value until downstream receivers reliably accept all events.

Flush period (s): Maximum time between requests. Low values could cause the payload size to be smaller than its configured maximum. Defaults to 1.

Extra HTTP headers: Name-value pairs to pass as additional HTTP headers.

Extra parameters: Name-value pairs to pass as additional parameters. If you are using Elastic ingest pipelines, specify an extra parameter whose name is pipeline and whose value is the name of your pipeline, similar to these examples.

Elastic version: Determines how to format events. For Elastic Cloud, you must explicitly set version 7.x. For other Elasticsearch clusters, the Auto default will discover the downstream Elasticsearch version automatically, but you have the option to explicitly set version 6.x or 7.x.

Elastic pipeline: To send data to an Elastic Ingest pipeline, optionally enter that pipeline’s name as a constant. Or, enter a JavaScript expression that evaluates outgoing events, and sends matching events to the desired Elastic Ingest pipeline(s). For example, the expression sourcetype=='access_common'?'cribl_pipeline':undefined matches events whose sourcetype is access_common, and sends them to an Elastic Ingest pipeline named cribl_pipeline.

The next two fields appear only when the General Settings > Load balancing option is set to Yes.

DNS resolution period (seconds): Re-resolve any hostnames each time this interval recurs, and pick up destinations from the A records. Defaults to 600 seconds.

Load balance stats period (seconds): Lookback traffic history period. Defaults to 300 seconds.

Failed request logging mode: Determines which data is logged when a request fails. Use the drop-down to select one of these options:

  • None (default).
  • Payload.
  • Payload + Headers. Use the Safe Headers field below to specify the headers to log. If you leave that field empty, all headers are redacted, even with this setting.

Safe headers: List the headers you want to log, in plain text.

Environment: If you’re using GitOps, optionally use this field to specify a single Git branch on which to enable this configuration. If empty, the config will be enabled everywhere.

Field Normalization

When sending data to Elasticsearch, this Destination applies the following field normalization to align with the Elasticsearch schema:

  • Renames _time to @timestamp, maintaining millisecond precision.
  • Renames host to host.name.

See also our Elasticsearch Source documentation’s Field Normalization section.

Internal Fields

Cribl Stream uses a set of internal fields to assist in forwarding data to a Destination.

Fields for this Destination:

  • __id
  • __type
  • __index

How Does Load Balancing Work

Cribl Stream will attempt to load-balance outbound data as fairly as possibly across all URLs. For example, if FQDNs/hostnames are used as the Destination addresses, and each resolves to 5 (unique) IPs, then each Worker Process will have its # of outbound connections = {# of IPs x # of FQDNs} for purposes of the Destination.

Data is sent by all Worker Processes to all URLs simultaneously, and the amount sent to each URL depends on these parameters:

  1. Respective destination weight.
  2. Respective destination historical data.

By default, historical data is tracked for 300s. Cribl Stream uses this data to influence the traffic sent to each destination, to ensure that differences decay over time, and that total ratios converge towards configured weights.

If multiple receivers are behind a hostname – i.e., multiple A records – then all resolved IPs will inherit the weight of the host, unless each IP is specified separately. In Cribl Stream load balancing, IP settings take priority over those from hostnames.

If a request fails, Cribl Stream will resend the data to a different endpoint. Cribl Stream will block only if all endpoints are experiencing problems.

Example

Suppose we have two receivers, A and B, each with weight of 1 (i.e., they are configured to receive equal amounts of data). Suppose further that the load-balance stats period is set at the default 300s and – to make things easy – for each period, there are 200 events of equal size (Bytes) that need to be balanced.

IntervalTime RangeEvents to be dispensed
1time=0s —> time=300s200

Both A and B start this interval with 0 historical stats each.

Let’s assume that, due to various circumstances, 200 events are “balanced” as follows: A = 120 events and B = 80 events – a difference of 40 events and a ratio of 1.5:1.

IntervalTime RangeEvents to be dispensed
2time=300s —> time=600s200

At the beginning of interval 2, the load-balancing algorithm will look back to the previous interval stats and carry half of the receiving stats forward. I.e., receiver A will start the interval with 60 and receiver B with 40. To determine how many events A and B will receive during this next interval, Cribl Stream will use their weights and their stats as follows:

Total number of events: events to be dispensed + stats carried forward = 200 + 60 + 40 = 300. Number of events per each destination (weighed): 300/2 = 150 (they’re equal, due to equal weight). Number of events to send to each destination A: 150 - 60 = 90 and B: 150 - 40 = 110.

Totals at end of interval 2: A=120+90=210, B=80+110=190, a difference of 20 events and a ratio of 1.1:1.

Over the subsequent intervals, the difference becomes exponentially less pronounced, and eventually insignificant. Thus, the load gets balanced fairly.

Notes on HTTP-based Outputs

  • Cribl Stream will attempt to use keepalives to reuse a connection for multiple requests. After two minutes of the first use, the connection will be thrown away, and a new one will be reattempted. This is to prevent sticking to a particular destination when there is a constant flow of events.

  • If the server does not support keepalives (or if the server closes a pooled connection while idle), a new connection will be established for the next request.

  • When resolving the Destination’s hostname with load balancing disabled, Cribl Stream will pick the first IP in the list for use in the next connection. Enable Round-robin DNS to better balance distribution of events between Elasticsearch cluster nodes.