Cribl LogStream Documentation (v.3.1.1)

Persistent Queues

LogStream's persistent queuing (PQ) feature helps minimize data loss if a downstream receiver is unreachable. PQ provides durability by writing data to disk for the duration of the outage, and forwarding it upon recovery.

Persistent queues are implemented on the outbound side, meaning that each Source can take advantage of a Destination's queue.

How Does Persistent Queuing Work?

Each LogStream output has an in-memory queue that helps it absorb temporary imbalances between inbound and outbound data rates. For example, if there is an inbound burst of data, the output will store events in the queue, and then send them at the rate the receiver can accept (as opposed to blocking or dropping them). Only when this queue is full will the output impose backpressure upstream.
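The buffering behavior described above can be sketched as a bounded queue. This is a minimal illustration, not LogStream's implementation: the class name `BoundedQueue`, its methods, and the capacity of 3 are all hypothetical.

```python
from collections import deque


class BoundedQueue:
    """Hypothetical in-memory buffer; not LogStream's actual implementation."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._events = deque()

    def offer(self, event) -> bool:
        """Try to enqueue; returning False signals backpressure (queue full)."""
        if len(self._events) >= self.capacity:
            return False
        self._events.append(event)
        return True

    def drain(self, max_events: int) -> list:
        """Remove up to max_events in arrival order, as the receiver allows."""
        batch = []
        while self._events and len(batch) < max_events:
            batch.append(self._events.popleft())
        return batch


queue = BoundedQueue(capacity=3)
accepted = [queue.offer(n) for n in range(5)]  # inbound burst of 5 events
sent = queue.drain(max_events=2)               # receiver accepts 2 for now
```

In this sketch the first three events are buffered and the last two are refused, which is the point at which a real output would apply its configured backpressure behavior.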

Life Without PQ

Backpressure behavior can be configured to one of Block, Drop Events, or (on Destinations that support it) Persistent Queue. In Block mode, the output will refuse to accept new data until the receiver is ready.

The system will propagate block "signals" all the way back to the sender (assuming that the sender supports backpressure, too). In general, TCP-based senders support backpressure, but this is not guaranteed: Each upstream application's developer is responsible for ensuring that the application stops sending data once LogStream stops sending TCP acknowledgments back to it.

In Drop mode, the Destination will discard new events until the receiver is ready. In some environments, the in-memory queues and their block/drop behavior are acceptable.
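As a rough sketch of the difference between the two modes, the function below decides what happens to one incoming event when the in-memory buffer is full. The names `Backpressure` and `handle_event` are hypothetical and exist only for this illustration.

```python
from enum import Enum


class Backpressure(Enum):
    BLOCK = "block"
    DROP = "drop"


def handle_event(buffer: list, capacity: int, event, mode: Backpressure) -> str:
    """Return what happened to the event: 'queued', 'blocked', or 'dropped'."""
    if len(buffer) < capacity:
        buffer.append(event)
        return "queued"
    if mode is Backpressure.BLOCK:
        # The event is refused, not lost: the sender must hold it and retry.
        return "blocked"
    # Drop mode: the event is discarded and the buffer is left unchanged.
    return "dropped"


buffer = []
first = handle_event(buffer, 1, "a", Backpressure.BLOCK)
second = handle_event(buffer, 1, "b", Backpressure.BLOCK)
third = handle_event(buffer, 1, "c", Backpressure.DROP)
```

Under Block, whether the event survives depends on the sender honoring the refusal. Under Drop, the event is lost regardless of what the sender does.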

PQ + FIFO = Durability

Persistent queues serve environments where more durability is required (e.g., outages last longer than memory queues can sustain), or where upstream senders do not support backpressure (e.g., ephemeral/network senders).

Engaging persistent queues in these scenarios can help minimize data loss. Once the in-memory queue is full, the LogStream Destination will write its data to disk. Then, when the receiver is ready, the output will start draining the queues in FIFO (first in, first out) fashion.
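The spill-to-disk and FIFO drain sequence can be sketched as follows. This is a simplified model under stated assumptions: the class `SpillQueue`, the single newline-delimited JSON file, and the memory capacity of 2 are hypothetical and do not reflect LogStream's on-disk format.

```python
import json
import os
import tempfile
from collections import deque


class SpillQueue:
    """Hypothetical queue that overflows from memory to a file on disk."""

    def __init__(self, mem_capacity: int, directory: str) -> None:
        self.mem_capacity = mem_capacity
        self.memory = deque()
        self.path = os.path.join(directory, "queue.ndjson")
        self.on_disk = 0

    def put(self, event: dict) -> None:
        # Once anything is on disk, keep appending there so order is preserved.
        if self.on_disk == 0 and len(self.memory) < self.mem_capacity:
            self.memory.append(event)
            return
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")
        self.on_disk += 1

    def drain(self) -> list:
        """Return every queued event, oldest first, and empty the queue."""
        events = list(self.memory)
        self.memory.clear()
        if self.on_disk:
            with open(self.path, encoding="utf-8") as f:
                events.extend(json.loads(line) for line in f)
            os.remove(self.path)
            self.on_disk = 0
        return events


with tempfile.TemporaryDirectory() as tmp:
    q = SpillQueue(mem_capacity=2, directory=tmp)
    for n in range(5):            # receiver is down; 5 events arrive
        q.put({"seq": n})
    spilled = q.on_disk           # events that overflowed to disk
    drained = [e["seq"] for e in q.drain()]  # receiver is back
```

Events held in memory are older than anything written to disk, so draining memory first and then the file preserves first-in, first-out order.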

Persistent Queue Details and Constraints

Persistent queues are:

  • Available on the output side (i.e., after processing).
  • Engaged only when all of that output's receivers are blocking.
  • Drained when at least one receiver can accept data.
  • Not infinite in size. I.e., if data cannot be delivered out, you might run out of disk space.
  • Not able to fully protect in cases of application failure. E.g., in-memory data might get lost if a crash occurs.
  • Not able to protect in cases of hardware failure. E.g., disk failure, corruption, or machine/host loss.
  • TLS-encrypted only for data in flight, and only on Destinations where TLS is supported and enabled. To encrypt data at rest, including disk writes/reads, you must configure encryption on the underlying storage volume(s).
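The engage and drain conditions in the list above reduce to an "all" versus "any" check across an output's receivers. The following sketch assumes each receiver's state is available as a boolean (True meaning it is currently blocking); the function names are hypothetical.

```python
from typing import Sequence


def should_engage_pq(receivers_blocked: Sequence[bool]) -> bool:
    """Engage the queue only when every receiver is blocking."""
    return len(receivers_blocked) > 0 and all(receivers_blocked)


def can_drain_pq(receivers_blocked: Sequence[bool]) -> bool:
    """Drain the queue as soon as at least one receiver accepts data."""
    return any(not blocked for blocked in receivers_blocked)


all_down = [True, True, True]
one_up = [True, False, True]
```

With `all_down`, the queue engages and cannot drain. With `one_up`, the queue does not engage, and any previously queued data can start draining.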

Persistent Queue Support

The following LogStream Destinations support Persistent Queuing:

  • Splunk Single Instance
  • Splunk Load Balanced
  • Splunk HEC
  • Kinesis
  • CloudWatch Logs
  • SQS
  • Azure Monitor Logs
  • Azure Event Hubs
  • StatsD
  • StatsD Extended
  • Graphite
  • TCP JSON
  • Syslog
  • Elasticsearch
  • Honeycomb
  • InfluxDB
  • Wavefront
  • SignalFx

Configuring Persistent Queuing

Persistent queuing is configured individually for each output that supports it. To enable it, go to the output's (Destination's) configuration page and set the Backpressure behavior control to Persistent Queue. This exposes the following additional controls:

  • Max file size: The maximum size to store in each queue file before closing it. Enter a numeral with units of KB, MB, etc. Defaults to 1 MB.

  • Max queue size: The maximum amount of disk space that the queue is allowed to consume, on each Worker Process. Once this limit is reached, queuing stops, and the Queue-full behavior setting (below) is applied. Enter a numeral with units of KB, MB, etc.

  • Queue file path: The location for the persistent queue files. This will be of the form: your/path/here/<worker-id>/<output-id>. Defaults to $CRIBL_HOME/state/queues.

  • Compression: Codec to use to compress the persisted data, once a file is closed. Defaults to None; Gzip is also available.

  • Queue-full behavior: Determines whether to block or drop events when the queue is exerting backpressure (because disk is low or at full capacity). Block is the same behavior as non-PQ blocking, corresponding to the Block option on the Backpressure behavior drop-down. Drop new data throws away incoming data, while leaving the contents of the PQ unchanged.
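To make the controls above concrete, here is a small model of the settings, with a helper that converts size strings to bytes and another that builds the per-output queue directory. Everything here is an illustrative assumption rather than LogStream's configuration schema: the field names, the `5 GB` queue size, the `block` queue-full default, and the use of binary units (1 KB = 1024 bytes) are not stated in this page. Only the `1 MB` file size, the `None` compression default, and the default path come from the text.

```python
import re
from dataclasses import dataclass

# Assumption: binary units. The page does not say whether KB means 1000 or 1024.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_size(text: str) -> int:
    """Convert a string such as '1 MB' or '500KB' to a byte count."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?B)\s*", text, re.IGNORECASE)
    if not match:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit.upper()])


@dataclass
class PQSettings:
    """Hypothetical container mirroring the UI controls listed above."""

    max_file_size: str = "1 MB"                  # default per the text
    max_queue_size: str = "5 GB"                 # illustrative value only
    base_path: str = "$CRIBL_HOME/state/queues"  # default per the text
    compression: str = "none"                    # default per the text
    queue_full_behavior: str = "block"           # illustrative value only

    def queue_dir(self, worker_id: str, output_id: str) -> str:
        """Build <base_path>/<worker-id>/<output-id>."""
        return "/".join([self.base_path.rstrip("/"), worker_id, output_id])


settings = PQSettings()
file_bytes = parse_size(settings.max_file_size)
queue_path = settings.queue_dir("0", "my-output")
```

Because Max queue size applies to each Worker Process, total disk usage for one output on a host can approach that limit multiplied by the number of Worker Processes.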

Minimum Free Disk Space

For queuing to operate properly, you must provide sufficient disk space. You configure the minimum disk space in global ⚙️ Settings (lower left) > General Settings > Limits > Min Free Disk Space. If available disk space falls below this threshold, LogStream will stop maintaining persistent queues, and data loss will begin. The default minimum is 5 GB. In distributed mode, be sure to set this on your Worker Nodes (rather than on the Leader Node).
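If you want to monitor this threshold from outside LogStream, a check along these lines may help. It is a minimal sketch using Python's standard `shutil.disk_usage`; the function name and the use of binary units for the 5 GB default are assumptions, and the path you pass should be the volume that holds your queue files.

```python
import shutil

# 5 GB default from the text; binary units (1024**3) are an assumption.
MIN_FREE_BYTES = 5 * 1024**3


def has_min_free_space(path: str, min_free: int = MIN_FREE_BYTES) -> bool:
    """Return True if the volume containing `path` has at least min_free bytes free."""
    return shutil.disk_usage(path).free >= min_free


# Example: check the current directory's volume against the default threshold.
ok = has_min_free_space(".")
```

A False result indicates the volume is already below the threshold at which, per the text above, persistent queues stop being maintained.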
