

Shape Data with Datatype Rules

Parse and structure data being ingested into Cribl Search, so you can search it fast from the start.


Highlights
  • Try Auto-Datatyping first, then handle uncategorized events with your own Datatype rules.
  • Point your rules at stock v2 Datatypes, or define custom Datatypes.
  • For detailed info on how Datatyping works (with or without lakehouse engines), see Datatypes in Cribl Search.

Datatyping in Lakehouse Engines

When your data flows from Sources into a lakehouse engine, it’s parsed into structured events through a process called Datatyping. Here’s how you can use it to shape your data:

  1. Use Auto-Datatyping: Let Cribl Search assign Datatypes automatically.
  2. Check for uncategorized data: See if any events were missed.
  3. Define your own Datatype rules: To handle the uncategorized data, map specific patterns to the existing stock Datatypes.
  4. Add custom Datatypes: To parse data not covered by the stock Datatypes, edit or add entirely new Datatypes.
Datatyping flow in Cribl Search lakehouse engines

To manage Datatyping in lakehouse engines: from the Cribl.Cloud top bar, select Products > Search > Data > Get Data In > 2. Datatyping (auto).

Datatyping in lakehouse engines

1. Parse Automatically (Auto-Datatyping)

By default, Cribl Search automatically analyzes incoming events and assigns a matching Datatype. This requires no configuration on your part, and covers many common log types and data formats.

For most types of data, Auto-Datatyping just works.

2. Check for Uncategorized Data

Data that doesn’t match any Datatype displays as Uncategorized.

  1. On the Cribl.Cloud top bar, select Products > Search > Data > Get Data In > Datatyping.
  2. Under Uncategorized Data, select:
    • View Live Data to see a sample of uncategorized data as it arrives.
    • View Last 24h to run a search for uncategorized data from the past 24 hours.
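To decide which rules to write first, it can help to see which Sources produce the most uncategorized events. The Python sketch below tallies a sample of events by __inputId. It is a hypothetical helper, not part of Cribl Search: it assumes each event is a dict, that a hypothetical `assigned_datatype` key holds the Datatype that was applied, and that the hypothetical value `"uncategorized"` marks a miss. Adjust both to whatever your exported search results actually contain.

```python
from collections import Counter

# Hypothetical marker for events that no Datatype matched.
UNCATEGORIZED = "uncategorized"

def uncategorized_by_source(events):
    """Count uncategorized events per __inputId, most frequent first.

    Assumes each event is a dict with an '__inputId' key and a hypothetical
    'assigned_datatype' key holding the Datatype that was applied.
    """
    counts = Counter(
        e.get("__inputId", "<unknown>")
        for e in events
        if e.get("assigned_datatype") == UNCATEGORIZED
    )
    return counts.most_common()

sample = [
    {"__inputId": "syslog:edge_fw", "assigned_datatype": UNCATEGORIZED},
    {"__inputId": "syslog:edge_fw", "assigned_datatype": UNCATEGORIZED},
    {"__inputId": "http_raw:app", "assigned_datatype": UNCATEGORIZED},
    {"__inputId": "splunk_hec:idx", "assigned_datatype": "some_stock_datatype"},
]
print(uncategorized_by_source(sample))
# [('syslog:edge_fw', 2), ('http_raw:app', 1)]
```

The Source at the top of the list is usually the best candidate for your first Datatype rule.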

3. Add Datatype Rules

To handle data that Auto-Datatyping leaves uncategorized, define your own rules that map specific data patterns to stock v2 Datatypes or custom v2 Datatypes.

  1. On the Cribl.Cloud top bar, select Products > Search > Data > Get Data In > Datatyping.
  2. Select Add Datatype Rule. Name and describe your rule.
  3. In Kusto expression to match, enter a KQL expression that matches a subset of the uncategorized data.

    For details, see Datatype Rule Expressions.

  4. From Datatype, select the Datatype to assign. You can choose from stock v2 Datatypes and your custom v2 Datatypes.
  5. Make sure that Enabled in the top right corner is checked, and confirm with Add.

If you add more rules, drag them to change the order. Rules run top-down, and the first match wins. Put more specific rules above broader ones.
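The ordering behavior can be modeled in a few lines. This is a minimal Python sketch, not Cribl Search code: each rule's KQL expression is stood in for by a Python predicate, disabled rules are skipped (mirroring the Enabled checkbox), and the first enabled rule that matches decides the Datatype. The rule names, Source IDs, and Datatype names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class DatatypeRule:
    name: str
    matches: Callable[[dict], bool]  # stand-in for the rule's KQL expression
    datatype: str                    # hypothetical Datatype name
    enabled: bool = True

def assign_datatype(event: dict, rules: list) -> Optional[str]:
    """Return the Datatype of the first enabled matching rule, or None."""
    for rule in rules:  # rules are evaluated top-down
        if rule.enabled and rule.matches(event):
            return rule.datatype  # first match wins; later rules are ignored
    return None  # no rule matched: the event stays uncategorized

rules = [
    # Specific rule first: one Source plus a keyword.
    DatatypeRule("fw-denies",
                 lambda e: e["__inputId"] == "syslog:edge_fw" and "DENY" in e["_raw"],
                 "my_fw_deny"),
    # Broader rule below it: any syslog Source.
    DatatypeRule("any-syslog",
                 lambda e: e["__inputId"].startswith("syslog:"),
                 "my_generic_syslog"),
]

event = {"__inputId": "syslog:edge_fw", "_raw": "DENY tcp 10.0.0.1 -> 10.0.0.2"}
print(assign_datatype(event, rules))  # my_fw_deny
```

If the two rules were swapped, the broad any-syslog rule would claim this event first, and the more specific fw-denies rule would never fire.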

Datatype Rule Expressions

Point your Datatype KQL expressions at the fields that are guaranteed to exist before Datatype assignment:

Field | Description
_raw | Raw event text before Datatyping.
__inputId | Source identifier in type:id format. Supported types: cribl_http, datadog_agent, elastic, http_raw, open_telemetry, prometheus_rw, splunk, splunk_hec, syslog, tcp, tcpjson, wef, wiz_webhook. Example: syslog:my_source_id.
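When you script checks against rule targets (for example, before writing a rule that covers a whole Source type), it can help to split an __inputId value into its two parts. The Python sketch below does that and validates the type against the supported types listed above. It is a hypothetical helper, not a Cribl API, and it assumes the type part never contains a colon, so any further colons are kept in the ID.

```python
# Source types listed as supported for __inputId on this page.
SUPPORTED_TYPES = {
    "cribl_http", "datadog_agent", "elastic", "http_raw", "open_telemetry",
    "prometheus_rw", "splunk", "splunk_hec", "syslog", "tcp", "tcpjson",
    "wef", "wiz_webhook",
}

def split_input_id(input_id: str) -> tuple:
    """Split an __inputId value of the form 'type:id' into (type, id)."""
    # partition() splits on the first colon only (assumption: types have none).
    source_type, sep, source_id = input_id.partition(":")
    if not sep or not source_type or not source_id:
        raise ValueError(f"expected 'type:id', got {input_id!r}")
    if source_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported Source type {source_type!r}")
    return source_type, source_id

print(split_input_id("syslog:my_source_id"))  # ('syslog', 'my_source_id')
```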

You can copy __inputId or other fields from the arriving events. To see them, select Live Data on the Datatyping page.

You can:

  • Create KQL expressions that evaluate to true for matching events and false otherwise.
  • Set case-insensitive conditions using = and wildcards (*).
  • Pipe into | where ..., | find ..., or | search ... for richer logic.

But:

  • You can’t use expressions that aggregate or reshape data (such as stats or project).
  • You can’t use let or set statements.

See the examples below for typical use cases. For full reference on the Cribl Search implementation of KQL, see Language Reference.

Typical use cases:
  • Text in _raw
  • One Source
  • Text + Source
  • Source Family
  • Chained Filters
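In a rule, each of these use cases (Text in _raw, One Source, Text + Source, Source Family, Chained Filters) is a KQL expression. As a rough illustration of the logic each one expresses, the Python predicates below test the two guaranteed fields, _raw and __inputId. They are hypothetical and are not KQL; the Source IDs and search terms are made up, so translate the idea into a KQL expression when you write the rule.

```python
# Hypothetical predicates mirroring the five use cases. Each takes an event
# dict carrying the two fields guaranteed before Datatyping: _raw and __inputId.

def text_in_raw(e):      # Text in _raw: a keyword appears anywhere in the raw text
    return "checkout" in e["_raw"].lower()

def one_source(e):       # One Source: exactly one __inputId
    return e["__inputId"] == "http_raw:shop_api"

def text_and_source(e):  # Text + Source: both conditions must hold
    return one_source(e) and text_in_raw(e)

def source_family(e):    # Source Family: every Source of one type
    return e["__inputId"].split(":", 1)[0] == "syslog"

def chained_filters(e):  # Chained Filters: each filter narrows the previous result
    filters = (
        source_family,
        lambda x: "sshd" in x["_raw"],
        lambda x: "failed password" in x["_raw"].lower(),
    )
    return all(f(e) for f in filters)

event = {"__inputId": "syslog:bastion", "_raw": "sshd[42]: Failed password for root"}
hits = [f.__name__ for f in (text_in_raw, one_source, text_and_source,
                             source_family, chained_filters) if f(event)]
print(hits)  # ['source_family', 'chained_filters']
```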

Combine these patterns to catch multiple log formats with a single rule:

  • HTTP 5xx Errors
  • Memory Exhaustion
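One way to picture a combined rule for HTTP 5xx errors and memory exhaustion is a single test that accepts either pattern. The Python sketch below is a hypothetical model, not a KQL rule: the regular expressions for a 5xx status after a quoted request line and for common out-of-memory messages are illustrative assumptions, so adjust them to your actual log formats.

```python
import re

# Illustrative patterns (assumptions): an access-log status code 500-599 that
# follows the quoted request line, and common out-of-memory phrases.
HTTP_5XX = re.compile(r'" 5\d\d ')
MEMORY_EXHAUSTION = re.compile(
    r"out of memory|OutOfMemoryError|memory exhausted", re.IGNORECASE
)

def matches_combined_rule(raw: str) -> bool:
    """True if the raw event looks like an HTTP 5xx error OR a memory exhaustion."""
    return bool(HTTP_5XX.search(raw) or MEMORY_EXHAUSTION.search(raw))

lines = [
    '10.0.0.5 - - [01/Jan/2025:00:00:01 +0000] "GET /api HTTP/1.1" 503 0',
    "java.lang.OutOfMemoryError: Java heap space",
    '10.0.0.5 - - [01/Jan/2025:00:00:02 +0000] "GET / HTTP/1.1" 200 512',
]
results = [matches_combined_rule(line) for line in lines]
print(results)  # [True, True, False]
```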

4. Add Custom Datatypes

To handle uncategorized data that’s not covered by the stock v2 Datatypes:

  1. Add a custom v2 Datatype.
  2. Create a Datatype rule that points at your custom Datatype.

Fix Timezone Offsets

Cribl Search parses each event and interprets its timestamp. If the event includes timezone information, Cribl Search uses it. Otherwise, it assumes UTC.
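This behavior can be sketched with Python's standard datetime module. It models the rule described above and is not Cribl Search's parser: an explicit offset in the timestamp is honored, and a timestamp without one is treated as UTC. The second comparison shows why offsets matter: if a source actually writes local time at UTC-5 without an offset, its events land five hours away from the real instant. The timestamps and the UTC-5 offset are made-up examples.

```python
from datetime import datetime, timezone

def interpret(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp; assume UTC when it carries no offset."""
    dt = datetime.fromisoformat(ts)
    if dt.tzinfo is None:  # no timezone information in the event
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)

with_offset = interpret("2025-03-01T09:00:00-05:00")  # offset honored
naive = interpret("2025-03-01T09:00:00")              # same wall clock, assumed UTC

print(with_offset.isoformat())  # 2025-03-01T14:00:00+00:00
print(naive.isoformat())        # 2025-03-01T09:00:00+00:00
print(with_offset - naive)      # 5:00:00
```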

You can deal with timezone offsets at different stages:

Override Cribl Search Processing

To bypass Cribl Search timestamping, Datatyping, or Dataset rules, add override fields to your events before they reach Cribl Search. You can add these in Cribl Stream, or in any upstream sender.

Field | What Cribl Search Does
_time | Skips timestamp parsing and uses this value (in UTC) as the event time.
datatype | Skips Datatype rules and applies the specified stock or custom v2 Datatype. If the Datatype doesn’t exist, falls back to Auto-Datatyping.
isParsed (use with datatype) | Uses the fields already on the event instead of reparsing _raw. Always pair with datatype; without it, events fall back to Auto-Datatyping. Use when data is fully shaped upstream.
dataset | Skips Dataset rules and routes directly to the specified Dataset. If the Dataset doesn’t exist, routes to main with _dataset_reason = "does not exist".
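The precedence described in the table above can be summarized as a decision function. The Python sketch below is a hypothetical model of that documented behavior, not Cribl Search code: `known_datatypes` and `known_datasets` stand in for what exists in your workspace, the returned dict is an illustrative plan, and the stages that run when a field is absent are only named, not implemented. The Datatype and Dataset names are made up.

```python
def plan_processing(event: dict, known_datatypes: set, known_datasets: set) -> dict:
    """Describe which stages run for an event, based on its override fields."""
    plan = {}

    # _time: skip timestamp parsing and use the value as the event time (UTC).
    plan["timestamp"] = "use _time" if "_time" in event else "parse timestamp"

    # datatype (+ isParsed): skip Datatype rules when the Datatype exists.
    # isParsed without datatype has no effect, so such events take the first branch.
    dt = event.get("datatype")
    if dt is None:
        plan["datatype"] = "Auto-Datatyping, then Datatype rules"
    elif dt not in known_datatypes:
        plan["datatype"] = "fall back to Auto-Datatyping"
    elif event.get("isParsed"):
        plan["datatype"] = f"{dt}, keep existing fields"
    else:
        plan["datatype"] = f"{dt}, parse _raw"

    # dataset: skip Dataset rules; a Dataset that doesn't exist routes to main.
    ds = event.get("dataset")
    if ds is None:
        plan["dataset"] = "apply Dataset rules"
    elif ds in known_datasets:
        plan["dataset"] = ds
    else:
        plan["dataset"] = 'main (_dataset_reason = "does not exist")'
    return plan

event = {"_time": 1735689600, "datatype": "my_app_json", "isParsed": True,
         "dataset": "typo_dataset"}
plan = plan_processing(event, {"my_app_json"}, {"app_logs"})
print(plan["dataset"])  # main (_dataset_reason = "does not exist")
```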

For pre-parsed events from Cribl Stream or Cribl Edge, set both datatype and isParsed. If you also route by Dataset upstream, set dataset, too. See Ingest Cribl Stream/Edge Data into Cribl Search.
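For a sender other than Cribl Stream or Cribl Edge, the same idea amounts to attaching the override fields to each event before it is sent. The Python sketch below only builds the JSON payload; delivering it to a Source is out of scope. The Datatype name `my_app_json`, the Dataset name `app_logs`, expressing `_time` as epoch seconds, and sending `isParsed` as a JSON boolean are all assumptions to check against Ingest Cribl Stream/Edge Data into Cribl Search and your own configuration.

```python
import json
from datetime import datetime, timezone
from typing import Optional

def add_overrides(fields: dict, when: datetime, datatype: str,
                  dataset: Optional[str] = None) -> dict:
    """Return a copy of an already-parsed event with override fields added."""
    if when.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime so _time is unambiguous")
    event = dict(fields)               # leave the caller's dict untouched
    event["_time"] = when.timestamp()  # assumed format: epoch seconds (UTC-based)
    event["datatype"] = datatype       # skip Datatype rules...
    event["isParsed"] = True           # ...and keep these fields; always paired with datatype
    if dataset is not None:
        event["dataset"] = dataset     # only if you also route by Dataset upstream
    return event

parsed = {"level": "error", "service": "checkout", "msg": "payment timeout"}
evt = add_overrides(parsed, datetime(2025, 1, 1, tzinfo=timezone.utc),
                    datatype="my_app_json", dataset="app_logs")
print(json.dumps(evt))
```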

Next Steps

Now that you’ve shaped your data, organize it into Datasets.