Persistent queuing (PQ) is a feature that helps minimize data loss if a downstream receiver is unreachable. Durability is provided by writing data to disk for the duration of the outage, and forwarding it upon recovery.
Each LogStream output has an in-memory queue that helps it absorb temporary imbalances between inbound and outbound data rates. For example, during an inbound burst of data, the output will store events in the queue, and emit them at the rate the receiver can consume (as opposed to blocking or dropping them). Only when this queue is full will the output impose backpressure upstream.
Backpressure behavior can be configured to either block or drop. In block mode, the output will refuse to accept new data until the receiver is ready. The system will propagate block "signals" all the way back to the sender (assuming the sender also supports backpressure). In drop mode, the output will discard new events until the receiver is ready.
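The difference between the two modes can be modeled with a bounded queue. This is a toy Python sketch, not LogStream code; `BoundedQueue`, `offer`, and `drain` are hypothetical names chosen for illustration:

```python
from collections import deque

class BoundedQueue:
    """Toy model of an output's in-memory queue with configurable
    backpressure behavior: 'block' or 'drop' (not LogStream internals)."""

    def __init__(self, max_events, mode="block"):
        self.q = deque()
        self.max_events = max_events
        self.mode = mode
        self.dropped = 0

    def offer(self, event):
        """Return True if the event was accepted. When the queue is full:
        in 'drop' mode the event is discarded; in 'block' mode the caller
        is expected to retry later (backpressure propagates upstream)."""
        if len(self.q) < self.max_events:
            self.q.append(event)
            return True
        if self.mode == "drop":
            self.dropped += 1  # silently discard until the receiver recovers
        return False

    def drain(self, n):
        """Receiver is ready again: deliver up to n events, FIFO."""
        return [self.q.popleft() for _ in range(min(n, len(self.q)))]
```

In block mode, a `False` return means "hold your data and retry"; in drop mode it means the event is already gone, which is why drop mode suits senders that cannot pause.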
In some environments, the in-memory queues and their block/drop behavior are acceptable. Persistent queues serve environments where more durability is required (e.g., outages last longer than memory queues can sustain), or where upstream senders do not support backpressure (e.g., ephemeral/network senders).
Engaging persistent queues in these scenarios can help minimize data loss. Once the in-memory queue is full, the LogStream output will write its data to disk. Then, when the receiver is ready, the output will start draining the queues in FIFO (first in, first out) fashion.
Persistent queues are:
- Available at the output side (i.e., after processing).
- Engaged only when all of the receivers of that output exert blocking.
- Drained when at least one receiver can accept data.
- Not infinite in size. I.e., if data cannot be delivered, you might run out of disk space.
- Not able to fully protect in cases of application failure. E.g., in-memory data might get lost if a crash occurs.
- Not able to protect in cases of hardware failure. E.g., disk failure, corruption, or machine/host loss.
The following LogStream Destinations support Persistent Queuing:
- Splunk Single Instance
- Splunk Load Balanced
- Splunk HEC
- Cloudwatch Logs
- Azure Monitor Logs
- Azure Event Hubs
- StatsD Extended
- TCP JSON
Persistent Queueing is configured individually for each output that supports it. To enable persistent queueing, go to the output's (Destination's) configuration page and set the Backpressure Behavior control to Persistent Queueing. This exposes the following additional controls:
Max file size: The maximum size to store in each queue file before closing it. Enter a numeral with units of KB, MB, etc. Defaults to
Max queue size: The maximum amount of disk space the queue is allowed to consume. Once this limit is reached, queueing is stopped, and data blocking is applied. Enter a numeral with units of KB, MB, etc.
Queue file path: The location for the persistent queue files. This will be of the form:
your/path/here/<worker-id>/<output-id>. Defaults to
Compression: Codec to use to compress the persisted data, once a file is closed. Defaults to
Gzip is also available.
Minimum Free Disk Space
Sufficient disk space is required for queuing to operate properly. You configure the minimum disk space in Settings > General Settings > Limits > Min Free Disk Space. If available disk space falls below this threshold, LogStream will stop maintaining persistent queues, and data loss will begin. The default is 5GB. Be sure to set this on your worker nodes rather than on the master node when in distributed mode.
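The threshold check can be pictured as follows. This is a sketch assuming the documented 5GB default; `can_persist` is a hypothetical helper, not a LogStream API:

```python
import shutil

MIN_FREE_DISK_BYTES = 5 * 1024**3  # mirrors the documented 5GB default

def can_persist(path, min_free=MIN_FREE_DISK_BYTES):
    """Return False when the filesystem holding the queue path has less
    free space than the configured threshold - the condition under which
    persistent queuing would stop and data loss would begin."""
    free = shutil.disk_usage(path).free
    return free >= min_free
```

Because the check applies to the filesystem holding the queue files, it is the Worker Nodes' disks (where queues actually live) that the setting protects.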