Persistent Queues
Cribl Edge’s persistent queuing (PQ) feature helps minimize data loss if a downstream receiver or internal process is unreachable or slow to respond. PQ provides durability by writing data to disk for the duration of the outage, and then forwarding it upon recovery.
Persistent queues can be enabled:
- On Push Sources.
- On Streaming Destinations. (Sources can also take advantage of a Destination’s queue to keep data flowing.)
The PQ feature may be restricted based on your Cribl license. See the Pricing page for details.
Persistent Queues Supplement In‑Memory Queues
Persistent queues engage under different conditions on the Destination side versus the Source side.
Destination Side
On each Cribl Edge Destination, an in-memory buffer helps the Destination absorb temporary imbalances between inbound and outbound data rates. For example, if there is an inbound burst of data, the Destination will store events in its buffer, and will then output them at a rate that the downstream receiver can accept.
Only when this buffer is full will the Destination impose backpressure upstream. (This threshold varies per Destination type.) This is where persistent queues help safeguard your data.
Life Without PQ
You can configure each Destination’s backpressure behavior as Block, Drop Events, or (on Destinations that support it) Persistent Queue.
In Block mode, the output will refuse to accept new data until the receiver is ready. The system will propagate block “signals” all the way back to the sender (assuming that the sender supports backpressure, too). In general, TCP-based senders support backpressure, but this is not guaranteed: each upstream application’s developer is responsible for ensuring that the application stops sending data once Cribl Edge stops sending TCP acknowledgments back to it.
In Drop mode, the Destination will discard new events until the receiver is ready. In some environments, the in-memory queues and their block or drop behavior are acceptable.
PQ = Durability
Persistent queues serve environments where more durability is required (for example, when outages last longer than in-memory queues can sustain), or where upstream senders do not support backpressure (for example, ephemeral/network senders).
Engaging persistent queues in these scenarios can help minimize data loss. Once the in-memory buffer is full, the Source or Destination will write its data to disk. Then, when the downstream receiver or Cribl process is ready, the Source/Destination will start draining its queue.
By default, after data flow is re-established, a Destination will forward events in FIFO (first in, first out) order: It will send out earlier queued events before newly arriving events. However, in Cribl Edge 4.1 and later, you can instead prioritize new events by disabling Destinations’ Strict ordering control.
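To illustrate the difference, here is a minimal, hypothetical sketch (not Cribl Edge’s actual implementation) of how strict ordering changes which events drain first once the receiver recovers:

```python
from collections import deque

# Illustrative sketch only - not Cribl Edge code. "queued" holds events written
# to the persistent queue during the outage; "incoming" holds events arriving
# after the receiver recovers.
def drain(queued, incoming, strict_ordering):
    while queued or incoming:
        if strict_ordering:
            source = queued if queued else incoming    # FIFO: backlog first
        else:
            source = incoming if incoming else queued  # prioritize new events
        yield source.popleft()

print(list(drain(deque(["q1", "q2"]), deque(["n1", "n2"]), strict_ordering=True)))
# ['q1', 'q2', 'n1', 'n2']
print(list(drain(deque(["q1", "q2"]), deque(["n1", "n2"]), strict_ordering=False)))
# ['n1', 'n2', 'q1', 'q2']
```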
Source Side
Push Sources’ config modals also provide a PQ option. When you enable PQ, you can choose between two trigger conditions:
- Always On mode will use PQ as a buffer for all events.
- Smart mode will engage PQ only upon backpressure from downstream receivers or Cribl processes.
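As a rough illustration (conceptual pseudologic only, not Cribl Edge source code), the two modes differ only in when an event passes through the disk queue:

```python
# Conceptual pseudologic only - the mode names mirror the UI options above;
# everything else is a simplification.
def route_event(mode, backpressure_detected):
    if mode == "Always On":
        return "write to persistent queue, then process"
    if mode == "Smart":
        return "write to persistent queue" if backpressure_detected else "process directly"
    raise ValueError(f"unknown PQ mode: {mode}")

print(route_event("Always On", backpressure_detected=False))  # always buffered on disk
print(route_event("Smart", backpressure_detected=True))       # queued only under backpressure
```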
Source and/or Destination PQ?
If your Source(s) and Destination(s) both support persistent queues, which side should you enable? If you prioritize maximum data retention and delivery over performance, Cribl recommends that you enable both Source PQ and Destination PQ.
When PQ is engaged, throughput will be somewhat slower. But in exchange for this extra latency, you’ll minimize your risk of data loss.
Because of this latency penalty, it is redundant to enable PQ on a Source whose upstream sender is configured to safeguard events in its own disk buffer.
Persistent Queue Details and Constraints
Persistent queues are:
- Available on Push Sources.
- Available on the output side (that is, after processing) of all streaming Destinations, with these exceptions: Syslog and Graphite (when you select UDP as the outbound protocol), and SNMP Trap.
- Implemented at the Worker Process level, with independent sizing configuration and dynamic engagement per Worker Process.
- With load-balanced Destinations (such as Splunk Load Balanced, Splunk HEC, Elasticsearch, TCP JSON, and Syslog with TCP), engaged only when all of the Destination’s receivers are blocking data flow. (Here, a single live receiver will prevent PQ from engaging on the corresponding Destination.) Note that PQ will not engage if at least one outbound TCP socket connection is active, even if the receiver is sending a backpressure signal.
- For Kafka-based Destinations (such as Kafka, Confluent Cloud, Amazon MSK, and Azure Event Hubs), engaged when the brokers can’t be reached.
- On Destinations, engaged only when receivers are down, unreachable, blocking, or throwing a serious error (such as a connection reset). Destination-side PQ is not designed to engage when receivers’ data consumption rate simply slows down.
- Drained when at least one receiver can accept data.
- Not infinite in size: if data cannot be delivered downstream, you might run out of disk space.
- Not able to fully protect in cases of application failure. For example, if a crash occurs, in-memory data might be lost.
- Not able to protect in cases of hardware failure, such as disk failure, corruption, or machine/host loss.
- TLS-encrypted only for data in flight, and only on Destinations where TLS is supported and enabled. To encrypt data at rest, including disk writes/reads, you must configure encryption on the underlying storage volume(s).
Beyond the failure scenarios above: If you turn off PQ on a Source or Destination (or shut down the whole Source/Destination) before the disk queue has drained, the queued events will be orphaned on disk.
Configuring Persistent Queueing
The following sections outline configuring PQ for Cribl.Cloud Workers managed by Cribl, and then for on-prem and hybrid Cribl.Cloud Workers.
Configuring PQ for Cribl-Managed Cribl.Cloud Workers
Persistent queuing may be supported for Cribl-managed Workers in Cribl.Cloud deployments, depending on your Cribl license. Configuration is considerably simpler with Cribl-managed Workers than with hybrid and on-prem Workers (both covered below). You configure PQ individually on each Source and Destination that supports it.
Sources
- Click a Source to open its configuration modal.
- Select the Persistent Queue Settings tab.
- Toggle Enable Persistent Queue to on.
On Cribl-managed Workers, this tab exposes only the Enable Persistent Queue toggle. When enabled, PQ is automatically configured in Always On mode, with a maximum queue size of 1 GB of data per PQ-enabled Source, per Worker Process. The 1 GB limit is on uncompressed inbound data, and no compression is applied to the queue.
These options are not configurable. If you want to finely configure the maximum queue size, compression, PQ mode, and other options, use a hybrid Group.
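For capacity planning, note that the fixed 1 GB limit multiplies across Worker Processes and PQ-enabled Sources. A rough, hypothetical example (the process and Source counts below are assumptions):

```python
# Worst-case Source-side PQ disk usage on one Cribl-managed Worker.
pq_limit_gb = 1           # fixed per PQ-enabled Source, per Worker Process
worker_processes = 4      # hypothetical Worker Process count
pq_enabled_sources = 3    # hypothetical number of Sources with PQ enabled

print(f"Worst case: {pq_limit_gb * worker_processes * pq_enabled_sources} GB")  # 12 GB
```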
Destinations
- Click a Destination to open its configuration modal.
- Set Backpressure behavior to Persistent Queueing.
On Cribl-managed Workers, the resulting Persistent Queue Settings tab exposes only a Clear Persistent Queue button. Once enabled, PQ is automatically configured with a maximum queue size of 1 GB of uncompressed data per PQ‑enabled Destination, per Worker Process. The 1 GB limit is on uncompressed outbound data, and no compression is applied to the queue. If the queue fills up, Cribl Edge will block data flow.
These options are not configurable. If you want to finely configure the maximum queue size, compression, queue-full fallback behavior, and other options, use a hybrid Group.
If you want to disable persistent queuing:
- Click the Clear Persistent Queue button.
- Follow the resulting link to make sure the queue fully drains.
- In the config modal’s General Settings, select a different Backpressure behavior, then re-save the config.
Logs
Cribl internal logs for Cribl-managed Workers might include entries labeled Revived PQ Worker Process. These indicate where a Cribl-managed Worker was newly assigned to drain a persistent queue left over from a shut-down Worker Node. You can use these log events to troubleshoot Cloud PQ issues.
Configuring PQ for On-Prem and Hybrid Workers
Persistent queuing may be supported for Workers in on-prem deployments and for hybrid Cribl.Cloud Workers, depending on your Cribl license. (Cribl-managed Cloud Workers have a simpler configuration.) You configure persistent queueing individually for each Source and Destination that supports it.
On Sources:
- Click a Source to open its configuration modal.
- Select Persistent Queue Settings.
- Toggle Enable Persistent Queue to on.
On Destinations:
- Click a Destination to open its configuration modal.
- Set Backpressure behavior to Persistent Queueing.
These selections expose the additional controls outlined below.
For an example of optimizing several PQ throttling options listed below, see our Configure Cribl Stream to Avoid Data Loss With Persistent Queuing blog post.
Source-Side PQ Only
Mode: Select a condition for engaging persistent queues.
- Always On: This default option will always write events to the persistent queue, before forwarding them to Cribl Edge’s data processing engine.
- Smart: This option will engage PQ only when the Source detects backpressure from Cribl Edge’s data processing engine.
Max buffer size: The maximum number of events to hold in memory before reporting backpressure to the sender and writing the queue to disk. Defaults to 1000. (This buffer is per connection, not just per Worker Process – and this can dramatically expand memory usage.)
Commit frequency: The number of events to send downstream before committing that Cribl Edge has read them. Defaults to 42.
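Because the Max buffer size applies per connection, memory usage scales with the number of concurrent senders. A rough, hypothetical estimate (connection count and event size are assumptions):

```python
# Worst-case in-memory buffer estimate for one Source.
max_buffer_events = 1000   # Max buffer size default
connections = 250          # hypothetical concurrent sender connections
avg_event_bytes = 1024     # assumed average event size

worst_case_mb = max_buffer_events * connections * avg_event_bytes / 1024**2
print(f"Worst-case buffered memory: ~{worst_case_mb:.0f} MB")  # ~244 MB
```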
Always On versus Smart Mode
In Cribl Edge 4.1 and later, Source-side PQ defaults to Always On mode when you configure a new Source. Always On offers the highest data durability, by minimizing backpressure and potential data loss on senders. However, this mode can slightly slow down data throughput, and can require you to provision more machines and/or faster disks.
If you are upgrading from a pre-4.1 version where Smart mode was the default, this change does not affect your existing Sources’ configurations. However:
- If you create new Sources programmatically, and you want to enforce the previous Smart mode, you’ll need to update your existing code.
- You can optimize Workers’ startup connections and CPU load at Fleet Settings > Worker Processes.
Consider switching to Always On mode if you find Smart mode causing Source-side PQ to engage frequently, and you’ve configured PQ with a large Max file size (such as 1 GB). A second option is to retain Smart mode, but reduce the Max file size to a value no higher than the default 1 MB.
(Source-side PQ excessively triggers backpressure only when it reaches the last queued file and that file is large. Keeping this threshold low prevents that condition.)
Common PQ Settings (Source and Destination Sides)
Max file size: The maximum data volume to store in each queue file before closing it and (optionally) applying the configured Compression. Enter a numeral with units of KB, MB, and so forth. If not specified, Cribl Edge applies the default 1 MB.
Max queue size: The maximum amount of disk space that the queue is allowed to consume, on each Worker Process. Once this limit is reached, this Source or Destination will stop queueing data; Sources will block, and Destinations will apply your configured Queue-full behavior. Enter a numeral with units of KB, MB, and so forth. If not specified, the implicit 0 default will enable Cribl Edge to fill all available disk space on the volume.
Queue file path: The location for the persistent queue files. Defaults to $CRIBL_HOME/state/queues. To this value, Cribl Edge will append /<worker-id>/<output-id>.
Compression: Codec to use to compress the persisted data, once a file is closed. Defaults to None; Gzip is also available.
Max Queue Size with Compression
If you enable Compression and also enter a Max queue size limit, be aware of how these options interact:
- Cribl Edge currently applies the Max queue size limit against the uncompressed data volume entering the queue.
- With this combination, Cribl recommends that you set the Max queue size limit higher than the volume’s total available disk space – again disregarding compression. This will maximize queue saturation and minimize data loss. (For details, see Known Issues.)
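To see how these limits add up on disk, here is a hypothetical worked example (the queue size, process count, and compression ratio are all assumptions):

```python
# Destination-side PQ disk math. Max queue size applies per Worker Process and,
# per the note above, is measured against uncompressed data.
max_queue_size_gb = 10    # configured Max queue size, per Worker Process
worker_processes = 4      # hypothetical Worker Process count
compression_ratio = 0.2   # assumed Gzip ratio (compressed size / uncompressed size)

uncompressed_limit_gb = max_queue_size_gb * worker_processes
print(f"Volume counted against the limit: {uncompressed_limit_gb} GB (uncompressed)")
print(f"Approximate space used on disk:   {uncompressed_limit_gb * compression_ratio:.0f} GB")
```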
Destination-Side PQ Only
Queue-full behavior: Determines whether to block or drop events when the queue is exerting backpressure (because disk is low or at full capacity). Block is the same behavior as non-PQ blocking, corresponding to the Block option on the Backpressure behavior drop-down. Drop new data throws away incoming data, while leaving the contents of the PQ unchanged.
Clear persistent queue: Click this button if you want to delete the files that are currently queued for delivery to this Destination. A confirmation modal will appear. (Appears only after Output ID has been defined.)
This section’s remaining controls are available in Cribl Edge 4.1 and later.
Strict ordering: The default Yes position enables FIFO (first in, first out) event forwarding. When receivers recover, Cribl Edge will send earlier queued events before forwarding newly arrived events. To instead prioritize new events before draining the queue, toggle this off. Doing so will expose this additional control:
- Drain rate limit (EPS): Optionally, set a throttling rate (in events per second) on writing from the queue to receivers. (The default 0 value disables throttling.) Throttling the queue’s drain rate can boost the throughput of new/active connections, by reserving more resources for them. You can further optimize Workers’ startup connections and CPU load at Fleet Settings > Worker Processes.
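If you set a drain rate limit, you can estimate how long a backlog will take to empty. A quick, hypothetical calculation (the backlog size is an assumption):

```python
# How long a throttled queue takes to drain, ignoring newly arriving events.
backlog_events = 5_000_000   # hypothetical events waiting in the queue
drain_rate_eps = 10_000      # configured Drain rate limit (EPS)

print(f"Estimated drain time: {backlog_events / drain_rate_eps / 60:.1f} minutes")  # ~8.3
```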
Minimum Free Disk Space
For queuing to operate properly, you must provide sufficient disk space. In distributed mode, you configure the minimum disk space to support queues (and other features) on each Fleet, at Fleet Settings > General Settings > Limits > Min free disk space. If available disk space falls below this threshold, Cribl Edge will stop maintaining persistent queues, and data loss will begin. The default minimum is 5 GB.
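When planning disk capacity, confirm that worst-case queue growth still leaves this minimum free. A rough, hypothetical check (all values except the 5 GB default are assumptions):

```python
# Does worst-case Destination-side queueing leave the Min free disk space intact?
volume_gb = 100            # hypothetical disk size on the Worker Node
min_free_disk_gb = 5       # Fleet Settings default described above
max_queue_size_gb = 10     # configured Max queue size, per Worker Process
worker_processes = 8       # hypothetical Worker Process count

headroom_gb = volume_gb - max_queue_size_gb * worker_processes
status = "OK" if headroom_gb >= min_free_disk_gb else "below Min free disk space"
print(f"Headroom after worst-case queueing: {headroom_gb} GB ({status})")
```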
Persistent Queue Support by Destination Type
Persistent Queue support, behavior, and triggers vary by Destination type, as summarized below.
HTTP-Based Destinations
HTTP-based Destinations handle backpressure based on HTTP response codes. The following conditions will trigger PQ:
- Connection failure.
- HTTP 500 responses.
- Data overload – sending more data than the Destination will accept.
HTTP 400 response errors will not engage PQ, and Cribl Edge will simply drop corresponding events. Cribl Edge cannot retry these requests, because they have been flagged as “bad,” and would just fail again. If you see 400 errors, these often indicate a need to correct your Destination’s configuration.
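The sketch below illustrates this classification in Python. It is conceptual only (not Cribl Edge’s internal retry logic), and treating the whole 500 range as retryable is an assumption for illustration:

```python
from typing import Optional

# Conceptual classification of delivery outcomes for an HTTP-based Destination.
def classify_response(status_code: Optional[int]) -> str:
    if status_code is None:           # connection failure
        return "queue for retry (PQ engages)"
    if status_code >= 500:            # server-side error, e.g. HTTP 500
        return "queue for retry (PQ engages)"
    if 400 <= status_code < 500:      # flagged as "bad" - retrying would fail again
        return "drop event (check the Destination's configuration)"
    return "delivered"

for code in (None, 500, 400, 200):
    print(code, "->", classify_response(code))
```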
HTTP-based Destinations include:
- WebHook
- New Relic Logs & Metrics
- New Relic Events
- Open Telemetry
- Sumo Logic
- DataDog
- Elastic
- Honeycomb
- Prometheus
- Splunk HEC
- Signal FX
- Wavefront
- Google Chronicle
- Loki
TCP Load-Balanced Destinations
TCP load-balanced Destinations can be configured with one or multiple receivers. If one or more receivers go down, Cribl Edge will continue sending data to any healthy receivers. The following conditions will trigger PQ:
- Connection errors on all receivers. As long as even one of the Destination’s receivers is healthy, Cribl Edge will redirect data to that receiver, and will not engage PQ.
- Data overload – sending more data than the Destination will accept.
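Expressed as a minimal sketch (illustrative only; the receiver names are hypothetical), the first condition – all receivers down – looks like this:

```python
# PQ engages only when no receiver in the load-balanced group is healthy.
def pq_should_engage(receiver_health):
    return not any(receiver_health.values())

print(pq_should_engage({"receiver-a": False, "receiver-b": True}))   # False - one healthy receiver
print(pq_should_engage({"receiver-a": False, "receiver-b": False}))  # True  - all receivers down
```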
TCP load-balanced Destinations include:
- Splunk Load Balanced
- TCP JSON
- Syslog
Destinations Without PQ Support
Filesystem-based Destinations, and Destinations that use the UDP protocol, do not support PQ. These include:
- Amazon S3 Compatible Stores
- Data Lakes > Amazon S3
- Data Lakes > Amazon Security Lake
- Azure Blob Storage
- Filesystem/NFS
- Google Cloud Storage
- MinIO
- SNMP Trap
- StatsD (with UDP)
- StatsD Extended (with UDP)
- Syslog (with UDP)
Filesystem-based Destinations do not support PQ because they already persist events to disk, before sending them to their final destinations.
UDP-based Destinations do not support PQ because the protocol is not reliable. Cribl Edge gets no indication whether the receiver received an event.
Other Destinations
Other Destinations with a single receiver generally engage PQ based on these trigger conditions:
- Connection errors.
- Failure to send an event (for any reason).
Getting Notified about PQ Status
Depending on your Cribl license, you may be able to configure Notifications to be sent when Persistent Queue files exceed a configurable percentage of allocated storage, or when they reach the queue-full state described above.
These Notifications always appear in Cribl Edge’s UI and internal logs, and you can also send them to external systems. For setup details, see Destination-State Notifications.