Splunk Search
Cribl Stream supports collecting search results from Splunk queries, ranging from simple to complex, including real-time searches. This page covers how to configure the Collector.
Configuring a Splunk Search Collector
From the top nav, click Manage, then select a Worker Group to configure. Next, select Data > Sources, then select Collectors > Splunk Search from the Manage Sources page’s tiles or left nav. Click New Collector to open the Splunk Search > New Collector modal, which provides the following options and fields.
The sections described below are spread across several tabs. Click the tab links at left to navigate among tabs. Click Save when you’ve configured your Collector.
Collector Sources currently cannot be selected or enabled in the QuickConnect UI.
Collector Settings
The Collector Settings determine how data is collected before processing.
Collector ID: Unique ID for this Collector. E.g., `splunk2search`.
Search endpoint: The REST API endpoint used to conduct a search. Defaults to `services/search/jobs/export`.
Output mode: Format of the returned output. Defaults to JSON. To parse the returned JSON, add the Cribl Event Breaker, which parses newline-delimited events, in the Event Breakers tab.
A Splunk search can also return events in the more compact CSV format. To use CSV, set the Output mode to CSV and specify the CSV Event Breaker in the Event Breakers tab.
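To illustrate the difference, here is a hypothetical sample of both output modes (the field names and the exact envelope shape are illustrative, not guaranteed):

```
# Output mode JSON: one JSON object per line
{"preview":false,"result":{"_time":"2022-01-14T12:00:00Z","level":"error","channel":"myApp"}}
{"preview":false,"result":{"_time":"2022-01-14T12:00:05Z","level":"error","channel":"myApp"}}

# Output mode CSV: a header row followed by data rows
_time,level,channel
2022-01-14T12:00:00Z,error,myApp
2022-01-14T12:00:05Z,error,myApp
```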
Search
In the Search drop-down, enter your query parameters. (A sketch of the resulting search request follows this list.)
Search: Enter the Splunk query. For example: `index=myAppLogs level=error channel=myApp` or `| mstats avg(myStat) as myStat WHERE index=myStatsIndex`.
Search head: Enter the search head base URL. The default is `https://localhost:8089`.
Earliest: You can enter the earliest time boundary for the search. This may be an exact or relative time. For example: `2022-01-14T12:00:00Z` or `-16m@m`.
Latest: You can enter the latest time boundary for the search. This may be an exact or relative time. For example: `2022-01-14T12:00:00Z` or `-16m@m`.
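Putting the Search fields together, here is a minimal sketch of the kind of export request the Collector issues against the search head. The query, credentials, and time boundaries below are placeholder assumptions; note that Splunk's export endpoint expects a plain search to begin with the `search` command (generating commands such as `| mstats` are passed as-is).

```typescript
// Minimal sketch (Node.js 18+): an export request assembled from the
// Search fields above. All values below are illustrative assumptions.
const searchHead = "https://localhost:8089";      // Search head
const endpoint = "services/search/jobs/export";   // Search endpoint

const body = new URLSearchParams({
  search: "search index=myAppLogs level=error channel=myApp",
  earliest_time: "-16m@m",                        // Earliest
  latest_time: "now",                             // Latest
  output_mode: "json",                            // Output mode
});

const resp = await fetch(`${searchHead}/${endpoint}`, {
  method: "POST",
  headers: {
    // Placeholder Basic credentials; see Authentication below.
    Authorization: "Basic " + Buffer.from("admin:changeme").toString("base64"),
  },
  body,
});
console.log(await resp.text()); // newline-delimited JSON results
```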
Authentication
In the Authentication drop-down, use the buttons to select one of these options. (A sketch of the corresponding request headers follows this list.)
None: Don’t use authentication. Compatible with REST servers like AWS, where you embed a secret directly in the request URL.
Basic: Displays Username and Password fields for you to enter HTTP Basic authentication credentials.
Basic (credentials secret): Provide username and password credentials referenced by a secret. Select a stored text secret in the resulting Credentials secret drop-down, or click Create to configure a new secret.
Bearer Token: Provide the token value configured and generated in Splunk.
Bearer Token (text secret): Provide the Bearer Token referenced by a secret. Select a stored text secret in the resulting Token (text secret) drop-down, or click Create to configure a new secret.
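For reference, the Basic and Bearer Token options correspond to standard HTTP Authorization headers. A hypothetical sketch (the credentials and token value are placeholders):

```typescript
// Placeholder credentials; never hard-code real secrets.
const basicHeader = "Basic " + Buffer.from("myUser:myPassword").toString("base64");
const bearerHeader = "Bearer eyJrIjoi..."; // token value generated in Splunk
// Either string is sent as the request's Authorization header.
```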
Optional Settings
Extra parameters: Optional HTTP request parameters to append to the request URL. These refine or narrow the request. Click + Add Parameter to add parameters as key-value pairs:
- Name: Field name.
- Value: JavaScript expression to compute the field’s value (can be a constant).
Extra headers: Click + Add Header to (optionally) add collection request headers as key-value pairs:
- Name: Header name.
- Value: JavaScript expression to compute the header’s value (can be a constant), as illustrated below.
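In both lists above, each Value field holds a single JavaScript expression that Cribl Stream evaluates when building the request. These example expressions are purely illustrative:

```typescript
// Each line is one example Value expression (all hypothetical):
'json'                          // a constant string
Math.floor(Date.now() / 1000)   // computed: current time in epoch seconds
```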
In the Request Timeout (secs) field, you can set a maximum time period (in seconds) for an HTTP request to complete before Cribl Stream treats it as timed out. Defaults to `0`, which disables timeout metering.
When running a real-time search, you must set Request Timeout (secs) to a nonzero value to avoid leaving the Collector stuck in a perpetually running state. The timeout stops the search after the allocated period of time.
Round-robin DNS: Toggle on to enable round-robin DNS lookup across multiple IP addresses, IPv4 and IPv6. When a DNS server resolves a Fully Qualified Domain Name (FQDN) to multiple IP addresses, Cribl Stream will sequentially use each address in the order they are returned by the DNS server for subsequent connection attempts.
Disable time filter: Toggle to `Yes` if your Run or Schedule configuration specifies a date range and no events are being collected. This will disable the Collector’s event time filtering to prevent timestamp conflicts.
Tags: Optionally, add tags that you can use for filtering and grouping in Cribl Stream. Use a tab or hard return between (arbitrary) tag names.
Result Settings
The Result Settings determine how Cribl Stream transforms and routes the collected data.
Custom Command
In this section, you can pass the data from this input to an external command for processing, before the data continues downstream. (A minimal example command follows the fields below.)
Enabled: Defaults to `No`. Toggle to `Yes` to enable the custom command.
Command: Enter the command that will consume the data (via `stdin`) and will process its output (via `stdout`).
Arguments: Click + Add Argument to add each argument to the command. You can drag arguments vertically to resequence them.
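For example, the command can be any executable that reads events from stdin and writes results to stdout. A minimal, purely illustrative Node.js sketch (the redaction pattern is a hypothetical example):

```typescript
// redact.ts - reads events on stdin, masks a hypothetical
// "password=" value, and writes the result to stdout.
import * as readline from "node:readline";

const rl = readline.createInterface({ input: process.stdin });
rl.on("line", (line) => {
  process.stdout.write(line.replace(/password=\S+/g, "password=***") + "\n");
});
```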
Event Breakers
In this section, you can apply event breaking rules to convert data streams to discrete events.
Event Breaker rulesets: A list of event breaking rulesets that will be applied, in order, to the input data stream. Defaults to `System Default Rule`.
Event Breaker buffer timeout: How long (in milliseconds) the Event Breaker will wait for new data to be sent to a specific channel, before flushing out the data stream, as-is, to the Routes. Minimum `10` ms, default `10000` (10 sec), maximum `43200000` (12 hours).
Fields
In this section, you can add Fields to each event, using Eval-like functionality.
Name: Field name.
Value: JavaScript expression to compute the field’s value (can be a constant).
Result Routing
Send to Routes: If set to `Yes` (the default), Cribl Stream will send events to normal routing and event processing. Toggle to `No` to select a specific Pipeline/Destination combination. The `No` setting exposes these two additional fields:
- Pipeline: Select a Pipeline to process results.
- Destination: Select a Destination to receive results.
The default `Yes` setting instead exposes this field:
- Pre-processing Pipeline: Pipeline to process results before sending to Routes. Optional.
This field is always exposed:
- Throttling: Rate (in bytes per second) to throttle while writing to an output. Also takes values with multiple-byte units, such as `KB`, `MB`, `GB`, etc. (Example: `42 MB`.) Default value of `0` indicates no throttling.
You might disable Send to Routes when configuring a Collector that will connect data from a specific Source to a specific Pipeline and Destination. One use case might be a Splunk Search Collector that gathers a known, simple type of data. This approach keeps the Collector’s configuration self-contained and separate from Cribl Stream’s routing table for live data, potentially simplifying the Routes structure.
Advanced Settings
Advanced Settings enable you to customize post-processing and administrative options.
Environment: If you’re using GitOps, optionally use this field to specify a single Git branch on which to enable this configuration. If empty, the config will be enabled everywhere.
Time to live: How long to keep the job’s artifacts on disk after job completion. This also affects how long a job is listed in Job Inspector. Defaults to `4h`.
Remove Discover fields: List of fields to remove from the Discover results. This is useful when discovery returns sensitive fields that should not be exposed in the Jobs user interface. You can specify wildcards (such as `aws*`).
Resume job on boot: Toggle to `Yes` to resume ad hoc collection jobs if Cribl Stream restarts during the jobs’ execution.
How Cribl Stream Pulls Data
This Collector will gather data from the specified Search head URL. If you enable scheduled collection, searches will repeat on the interval specified in the Schedule modal’s Cron schedule field.
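For example (an illustrative schedule, not a default), a standard cron expression in that field such as:

```
*/15 * * * *    # run the collection job every 15 minutes
```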
A single Worker executes each collection job. If the Leader goes down, search jobs in progress will complete, but future scheduled searches will not run until the Leader relaunches.