
Get Your Data into Cribl Search

You can use the Get Data In workflow to quickly connect your data Sources to Cribl Search without owning or configuring Cribl Stream. Then use Datatypes to filter and shape your data, and Datasets to organize it. This workflow is intended to be used end-to-end as a fast, but not highly configurable, way to ingest and search data.

For a more granular, configurable, but more complex setup that requires Cribl Stream, see Connect to External Data.


Prerequisites

You must have at least one lakehouse engine configured to add data Sources. See lakehouse engines in Cribl Search.


Workflow Overview

This is a three-step workflow to configure data sources and start searching:

  1. Connect your data sources.
  2. Use Datatypes to filter and shape data.
  3. Use Datasets to organize your data.

Get Your Data In

To configure the flow, first connect your data Sources. This is the data you’ll target with your searches.

  1. In Cribl Search, select Data.
  2. On the Get Data In tab, select Add Source and choose a Source type.

For step-by-step instructions for each Source, see Sources.

Confirm that Data is Flowing

Once you have a Source configured, you can confirm that data is flowing:

  1. Select Live Data to open the Live Data window.
  2. Adjust the filters to narrow the results.

Filters

  • Objects: Filter by Source, Datatype, or Dataset.
  • Source: Show only events from a specific configured Source (for example, a particular syslog or HTTP input).
  • Datatypes: Show only events that match selected Datatypes (for example, aws_vpcflow, zeek_json), regardless of Source.
  • Datasets: Show only events being written into specific Datasets, so you can verify Dataset routing.

Capture Level

  • At source: Capture events as they arrive from the Source, before any processing.
  • After datatyping: Capture events after Datatype rules have run and fields are parsed.
  • After Dataset detection: Capture events after both datatyping and Dataset routing, reflecting what will be stored.

Size and Duration

  • Maximum events to capture: The maximum number of events the live data capture will return. Limit: 10,000 events.
  • Capture time (seconds): How long the capture runs. Limit: 3600 seconds.

Types

  • Uncategorized: Events not categorized into a specific Datatype.
  • Default AI Auto-datatype: Events that did not match any custom Datatype rule and were categorized by the default AI auto-datatype.
  • Orphaned: Events that no longer have a valid Lakehouse Dataset assigned. Cribl Search treats an event as orphaned when a Dataset rule routed it to a Dataset that no longer exists.
  • Dropped: Events you intentionally discarded (by a Drop rule, devnull rule, or backpressure) before they reached a downstream destination.

Filter and Shape Data with Datatypes

A Datatype is a reusable definition that tells Cribl Search how to recognize a specific type of event and parse it into fields for fast, schema-aware queries. In the Get Your Data In flow, datatyping occurs directly after Cribl Search ingests data from the Source.
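Once events are parsed into fields, you can query them in a schema-aware way. As a minimal sketch (the Dataset name `my_lakehouse` is a hypothetical placeholder; substitute one of your own Datasets):

```kusto
// Hypothetical query: search a Dataset and narrow results to events
// that were parsed by one Datatype, then cap the result size.
dataset="my_lakehouse"
| where datatype == "aws_vpcflow"
| limit 100
```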

Default AI Auto-Datatypes

Cribl Search includes a default AI auto-datatype that matches all data by default using a wildcard (*). The lakehouse engine uses this AI-generated v2 Datatype and its associated auto ruleset to classify events after it has tried any user-defined Datatype rules.

Create a v2 Datatype Rule

  1. In Cribl Search, select Datatyping (auto).
  2. Select Add Datatype Rule.
  3. Give your rule a name, an optional description, and a Kusto expression to match the data.
  4. Select an existing v2 Datatype from the Datatype drop-down.
  5. Select Add.
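The Kusto expression in step 3 is a boolean filter evaluated against each event. For illustration only (the field name `source` and the match string are assumptions; adjust them to your data):

```kusto
// Hypothetical match expression for a Datatype rule:
// select events from a VPC Flow log source that carry a raw payload.
source contains "vpcflow" and isnotempty(_raw)
```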

For full details and configuration options, see Shape Your Data and Datatypes.


Organize Data with Datasets

Lakehouse Datasets determine what data Cribl Search retains, how long it is stored, and which engine backs the search experience.

Add a Dataset Rule

In the Get Data In flow, go to the Organize Your Data step.

  1. Select Add Dataset Rule to create a new Dataset rule.
  2. Edit a rule by selecting it in the table.
  3. Reorder rules by dragging them; rules are evaluated top-down.

For each rule, you can configure:

  • Name: A short identifier for the rule.
  • Description: Optional notes about what the rule does.
  • Kusto expression: The filter that selects events (for example, datatype == "aws_vpcflow").
  • Send data to: Either Dataset (choose a Dataset from the list) or Drop (discard matching events).

Place more specific rules above broader ones. For example, put a rule for host == "api-server" above * so API traffic is routed first.
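An ordered ruleset following that advice might look like this sketch (the field name `host`, the value `api-server`, and the Dataset names are hypothetical):

```kusto
// Rule 1 (most specific) -> Send data to: Dataset "api_traffic"
host == "api-server"

// Rule 2 (catch-all, evaluated last) -> Send data to: Dataset "default_logs"
*
```

Because rules are evaluated top-down, an event from `api-server` matches Rule 1 and never reaches the catch-all.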

For more configuration details and organization tips, see Organize Your Data.

View Orphaned Data

Events that do not match any rule, or that match a rule pointing to a deleted or invalid Dataset, appear as Orphaned. In the Get Data In flow, you can:

  • View Live Data: See a sample of orphaned events as they arrive.
  • View Last 24h: Run a search for orphaned events from the past 24 hours.

Use these to inspect what is not being routed correctly and add or adjust rules as needed.


Next Steps