Get Your Data into Cribl Search
You can use the Get Data In workflow to quickly connect your data Sources to Cribl Search without owning or configuring Cribl Stream. Then, use Datatyping to filter and shape data, and organize your data into Datasets. This workflow is intended to be used end-to-end as a fast, but not highly configurable, way to ingest and search data.
For a more granular and configurable, but more complex, setup that requires Cribl Stream, see Connect to External Data.
Prerequisites
You must have at least one lakehouse engine configured to add data Sources. See lakehouse engines in Cribl Search.
Workflow Overview
This is a three-step workflow to configure data sources and start searching:
- Connect your data sources.
- Use Datatypes to filter and shape data.
- Use Datasets to organize your data.
Get Your Data In
To configure the flow, first connect your data Sources. This is the data you’ll target with your searches.
- In Cribl Search, select Data.
- On the Get Data In tab, select Add Source and choose a Source type.
For step-by-step instructions for each Source, see Sources.
Confirm that Data is Flowing
Once you have a Source configured, you can confirm that data is flowing:
- Select Live Data to open the Live Data window.
- Adjust the filters to narrow the results.
Filters
| Filter | Description |
|---|---|
| Objects | Filter by Source, Datatype, or Dataset. |
| Source | Show only events from a specific configured Source (for example, a particular syslog or HTTP input). |
| Datatypes | Show only events that match selected Datatypes (for example, aws_vpcflow, zeek_json), regardless of Source. |
| Datasets | Show only events being written into specific Datasets, so you can verify Dataset routing. |
Capture Level
| Level | Description |
|---|---|
| At source | Capture events as they arrive from the Source, before any processing. |
| After datatyping | Capture events after Datatype rules have run and fields are parsed. |
| After Dataset detection | Capture events after both datatyping and Dataset routing, reflecting what will be stored. |
Size and Duration
| Setting | Description |
|---|---|
| Maximum events to capture | The maximum number of events the live capture returns. Limit: 10,000 events. |
| Capture time (seconds) | How long the capture runs. Limit: 3600 seconds. |
Types
| Type | Description |
|---|---|
| Uncategorized | Events not categorized into a specific Datatype. |
| Default AI Auto-datatype | Events that did not match any custom Datatype rule and were categorized by the default AI auto-datatype. |
| Orphaned | Events that no longer have a valid Lakehouse Dataset assigned. Cribl Search treats an event as orphaned when a Dataset rule routed it to a Dataset that no longer exists. |
| Dropped | Events discarded before they reached a downstream destination, whether by a Drop rule, a devnull rule, or backpressure. |
Filter and Shape Data with Datatypes
A Datatype is a reusable definition that tells Cribl Search how to recognize a specific type of event and parse it into fields for fast, schema-aware queries. In the Get Data In flow, datatyping occurs immediately after Cribl Search ingests data from the Source.
Default AI Auto-Datatypes
Cribl Search includes a default AI auto-datatype that matches all data by default using a wildcard (*). The lakehouse
engine uses this AI-generated v2 Datatype and its associated auto ruleset to classify events after it has tried any
user-defined Datatype rules.
Create a v2 Datatype Rule
- In Cribl Search, select Datatyping (auto).
- Select Add Datatype Rule.
- Give your rule a name, an optional description, and a Kusto expression to match the data.
- Select an existing v2 Datatype from the Datatype drop-down.
- Select Add.
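For illustration, a rule's Kusto expression might key on the raw event payload. This is a sketch only; the _raw field is a common Cribl convention, and the string matches are hypothetical:

```kusto
// Hypothetical match expression for a v2 Datatype rule.
// Matches events whose raw payload looks like VPC Flow Log records.
_raw has "ACCEPT" or _raw has "REJECT"
```

Any event matching this expression would be parsed using the Datatype you select in the drop-down.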
For full details and configuration options, see Shape Your Data and Datatypes.
Organize Data with Datasets
Lakehouse Datasets determine what data Cribl Search retains, how long it is stored, and which engine backs the search experience.
Add a Dataset Rule
In the Get Data In flow, go to the Organize Your Data step.
- Select Add Dataset Rule to create a new Dataset rule.
- Edit a rule by selecting it in the table.
- Reorder rules by dragging them; rules are evaluated top-down.
For each rule, you can configure:
- Name: A short identifier for the rule.
- Description: Optional notes about what the rule does.
- Kusto expression: The filter that selects events (for example, datatype == "aws_vpcflow").
- Send data to: Either Dataset (choose a Dataset from the list) or Drop (discard matching events).
Place more specific rules above broader ones. For example, put a rule for host == "api-server" above * so API
traffic is routed first.
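A sketch of that ordering, with hypothetical match expressions and Dataset names shown as comments:

```kusto
// Rule 1 (evaluated first): route API server events to their own Dataset.
host == "api-server"    // Send data to: Dataset api_traffic (hypothetical name)

// Rule 2 (catch-all, evaluated last): everything else.
*                       // Send data to: Dataset default_logs (hypothetical name)
```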
For more configuration details and organization tips, see Organize Your Data.
View Orphaned Data
Events that do not match any rule, or that match a rule pointing to a deleted or invalid Dataset, appear as Orphaned. In the Get Data In flow, you can:
- View Live Data - See a sample of orphaned events as they arrive.
- View Last 24h - Run a search for orphaned events from the past 24 hours.
Use these to inspect what is not being routed correctly and add or adjust rules as needed.
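After adjusting rules, one way to confirm that events now land where you expect is to search the target Dataset directly. A minimal sketch, assuming a hypothetical Dataset named api_traffic and Cribl Search's Kusto-style operators:

```kusto
// Sketch: verify routing by searching the destination Dataset
// over the past 24 hours. Substitute your own Dataset name.
dataset="api_traffic"
| where _time > ago(24h)
| limit 100
```

If the search returns the events you expected, the Dataset rule is matching and routing correctly.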