Sources Overview

Cribl Stream can receive continuous data input from various Sources, including Splunk, HTTP, Elastic Beats, Kinesis, Kafka, TCP JSON, and many others. Sources can receive data from either IPv4 or IPv6 addresses.

Push and Pull Sources

Sources Summary

Cribl Stream supports a variety of HTTP-, TCP-, and UDP-based Sources. All HTTP-based Sources are proxyable.

Collector Sources

Collectors ingest data intermittently in on-demand bursts (“ad hoc collection”), on preset schedules, or by “replaying” data from local or remote stores.

All Collector Sources are proxyable.

These Sources can ingest data only if a Leader is active:

For background and instructions on using Collectors, see:

Check out the example REST Collector configurations in Cribl’s Collector Templates repository. For many popular Collectors, the repo provides configurations (with companion Event Breakers, and event samples in some cases) that you can import into your Cribl Stream instance, saving the time you’d have spent building them yourself.

Push Sources

Supported data Sources that send to Cribl Stream.

These Sources can continue ingesting data even if no Leader is active:

Data from these Sources is normally sent to a set of Cribl Stream Workers through a load balancer. Some Sources, such as Splunk forwarders, have native load-balancing capabilities, so you should point these directly at Cribl Stream.

Pull Sources

Supported Sources that Cribl Stream fetches data from.

These Sources can ingest data only if a Leader is active:

System Sources

System Sources supply information that Cribl Stream generates about itself, or move data among Workers within your Cribl Stream deployment.

Internal Sources

Like System Sources, Internal Sources supply information generated by Cribl Stream about itself. Unlike System Sources, however, they don’t count toward license usage.

Configuring and Managing Sources

For each Source type, you can create multiple definitions, depending on your requirements.

From the top nav, click Manage, then select a Worker Group to configure. Next, you have two options:

To configure via the graphical QuickConnect UI, click Routing > QuickConnect (Stream) or Collect (Edge). Next, click Add Source at left. From the resulting drawer’s tiles, select the desired Source. Next, click either Add Destination or (if displayed) Select Existing.

Or, to configure via the Routing UI, click Data > Sources (Stream) or More > Sources (Edge). From the resulting page’s tiles or left nav, select the desired Source. Next, click New Source to open a New Source modal.

To edit any Source’s definition in a JSON text editor, click Manage as JSON at the bottom of the New Source modal, or on the Configure tab when editing an existing Source. You can directly edit multiple values, and you can use the Import and Export buttons to copy and modify existing Source configurations as .json files.
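
Because exported configurations are ordinary JSON, you can also script bulk changes. Below is a minimal sketch, assuming an exported TCP JSON Source saved as tcp_json_source.json; the id and port field names are illustrative assumptions, so check your own export for the exact schema:

```python
import json

# Load a Source definition previously saved with the Export button.
with open("tcp_json_source.json") as f:
    config = json.load(f)

# Clone the definition under a new id on a different port, then write it
# out for re-import via the Import button. The field names here are
# assumptions; confirm them against your own exported file.
config["id"] = "tcp_json_source_copy"
config["port"] = 10071

with open("tcp_json_source_copy.json", "w") as f:
    json.dump(config, f, indent=2)
```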

Sensitive information in a JSON configuration is redacted when you export it.

Capturing Source Data

To capture data from a single enabled Source, you can bypass the Preview pane, and instead capture directly from a Manage Sources page. Just click the Live button beside the Source you want to capture.

To capture live data, you must have Worker Nodes registered to the Worker Group for which you’re viewing events. You can view registered Worker Nodes from the Status tab in the Source.

Source > Live button

You can also start an immediate capture from within an enabled Source’s config modal, by clicking the modal’s Live Data tab.

Source modal > Live Data tab

Monitoring Source Status

Each Source’s configuration modal offers two tabs for monitoring: Status and Charts.

Status Tab

The Status tab provides details about the Workers in the group and their status. An icon shows whether the Worker is operating normally.

You can click each Worker’s row to see specific information, for example, to identify issues when the Source displays an error. The specific set of information provided depends on the Source type. The data represents only process 0 for each Worker Node.

The Status tab’s content loads live when you open the tab, and displays only once all the data is ready. If the group contains many busy Workers, or Workers located far from the Leader, there can be a delay before you see any information.

The statistics presented are reset when the Worker restarts.

Charts Tab

The Charts tab presents a visualization of the recent activity on the Source. The following data is available:

  • Events in
  • Thruput in (events per second)
  • Bytes in
  • Thruput in (bytes per second)

This data (in contrast with the Status tab) loads almost instantly, and it persists when a Worker restarts.

Preconfigured Sources

To accelerate your setup, Cribl Stream ships with several common Sources configured for typical listening ports, but not switched on. Open, clone (if desired), modify, and enable any of these preconfigured Sources to get started quickly:

System and Internal Sources:

In a preconfigured Source’s configuration, never change the Address field, even though the UI shows it as editable. If you change this field’s value, the Source will not work as expected.

After you create a Source and deploy the changes, it can take a few minutes for the Source to become available in Cribl.Cloud’s load balancer. However, Cribl Stream opens the port and can receive data immediately.

Cribl.Cloud Ports and TLS Configurations

Cribl.Cloud provides several data Sources and ports already enabled for you, plus 11 additional TCP ports (20000-20010) that you can use to add and configure more Sources.

The Cribl.Cloud portal’s Data Sources tab displays the pre‑enabled Sources, their endpoints, the reserved and available ports, and protocol details. For each existing Source listed here, Cribl recommends using the preconfigured endpoint and port to send data into Cribl Stream.
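
Before pointing a sender at one of these ports, you may want to verify that the port is reachable and serving the expected TLS certificate. Here is a minimal sketch, using a hypothetical Cribl.Cloud hostname and one of the additional TCP ports; substitute the values shown on your portal’s Data Sources tab:

```python
import socket
import ssl

# Hypothetical endpoint and port; use the values from your own portal.
HOST = "default.main.example.cribl.cloud"
PORT = 20000

context = ssl.create_default_context()

# Open a TLS connection and print the negotiated protocol version and
# certificate subject, confirming reachability and the TLS handshake.
with socket.create_connection((HOST, PORT), timeout=10) as sock:
    with context.wrap_socket(sock, server_hostname=HOST) as tls:
        print("TLS version:", tls.version())
        print("Certificate subject:", tls.getpeercert().get("subject"))
```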

Available ports and TLS certificates

Cribl HTTP and Cribl TCP Sources/Destinations

Use the Cribl HTTP Destination and Source, and/or the Cribl TCP Destination and Source, to relay data between Worker Nodes connected to the same Leader. This traffic does not count against your ingestion quota, so this routing prevents double-billing. (For related details, see Exemptions from License Quotas.)

Backpressure Behavior and Persistent Queues

By default, a Cribl Stream Source responds to backpressure by blocking incoming data. Backpressure is triggered when an in-memory buffer is full and/or when downstream Destinations/receivers are unavailable. The Source refuses to accept new data until it can flush its buffer.

Blocking propagates back to the sender, if the sender supports backpressure, as the TCP sketch below illustrates. Note that UDP senders (including SNMP Traps and some syslog senders) do not provide this support. In this situation, Cribl Stream will simply drop new events until the Source can process them.
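
For a TCP sender, sustained blocking eventually fills the connection’s send buffer, so writes stall. Here is a minimal sketch of a sender that surfaces this condition instead of hanging indefinitely, assuming a hypothetical TCP JSON Source address:

```python
import json
import socket

# Hypothetical Cribl Stream TCP JSON Source; substitute your own host/port.
HOST, PORT = "stream-workers.example.com", 10070

event = (json.dumps({"message": "hello from the sender"}) + "\n").encode()

sock = socket.create_connection((HOST, PORT))
# A send timeout turns sustained backpressure (Stream not reading, so the
# TCP window closes and sendall() blocks) into a visible error that the
# sender can react to.
sock.settimeout(5)
try:
    for _ in range(1_000_000):
        sock.sendall(event)
except socket.timeout:
    print("Send blocked for 5s: the Source is applying backpressure; "
          "buffer or retry instead of dropping events.")
finally:
    sock.close()
```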

Persistent Queues

Push Sources’ config modals provide a Persistent Queue Settings option to minimize loss of inbound streaming data. Here, the Source will write data to disk until its in-memory buffer recovers. Then, it will drain the disk-queued data in FIFO (first in, first out) order.

When you enable Source PQ, you can choose between two trigger conditions: Smart Mode will engage PQ upon backpressure from Destinations, whereas Always On Mode will use PQ as a buffer for all events.

For details about the PQ option and these modes, see Persistent Queues.

Persistent queues, when engaged, slow down data throughput somewhat. It is redundant to enable PQ on a Source whose upstream sender is configured to safeguard events in its own disk buffer.
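
If you manage Sources as exported JSON (as shown earlier), the PQ settings appear in the definition as well. Below is a minimal sketch that enables Always On mode; the pqEnabled and pq field names and values are assumptions, so verify them against your own exported file:

```python
import json

# Continuing the export/import workflow shown earlier: enable persistent
# queueing on an exported Source definition. Field names are assumptions;
# confirm them against your own export.
with open("tcp_json_source.json") as f:
    config = json.load(f)

config["pqEnabled"] = True
config["pq"] = {
    "mode": "always",  # assumed value; "smart" would engage PQ only on backpressure
}

with open("tcp_json_source_pq.json", "w") as f:
    json.dump(config, f, indent=2)
```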

Other Backpressure Options

The S3 Source provides a configurable Advanced Settings > Socket timeout option, to prevent data loss (partial downloading of logs) during backpressure delays.

Diagnosing Backpressure Errors

When backpressure affects HTTP Sources (Splunk HEC, HTTP/S, Raw HTTP/S, and Kinesis Firehose), Cribl Stream internal logs will show a 503 error code.
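
A sender can treat that 503 as a retry signal rather than a hard failure. Here is a minimal sketch for a Splunk HEC Source, with a hypothetical endpoint URL and token (the /services/collector/event path follows the standard HEC convention; confirm your Source’s configured endpoint), backing off exponentially while backpressure persists:

```python
import json
import time
import urllib.error
import urllib.request

# Hypothetical HEC endpoint and token; substitute your own values.
URL = "https://stream-workers.example.com:8088/services/collector/event"
TOKEN = "your-hec-token"

def send_event(event: dict, retries: int = 5) -> None:
    """POST an event to a Splunk HEC Source, backing off on 503s."""
    body = json.dumps({"event": event}).encode()
    for attempt in range(retries):
        req = urllib.request.Request(
            URL,
            data=body,
            headers={"Authorization": f"Splunk {TOKEN}",
                     "Content-Type": "application/json"},
        )
        try:
            with urllib.request.urlopen(req) as resp:
                resp.read()
                return
        except urllib.error.HTTPError as err:
            if err.code != 503:
                raise  # not backpressure; surface other errors
            # 503 signals backpressure: wait, then retry with backoff.
            time.sleep(2 ** attempt)
    raise RuntimeError("Source still under backpressure after retries")

send_event({"message": "hello"})
```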