
Shape Data with Datatype Rules

Parse and structure data being ingested into Cribl Search, so you can search it fast from the start.


Highlights
  • Try Auto-Datatyping first, then handle uncategorized events with your own Datatype rules.
  • Point your rules at stock v2 Datatypes, or define custom Datatypes.
  • For detailed info on how Datatyping works (with or without lakehouse engines), see Datatypes in Cribl Search.

Datatyping in Lakehouse Engines

When your data flows from Sources into a lakehouse engine, it’s parsed into structured events through a process called Datatyping. Here’s how you can use it to shape your data:

  1. Use Auto-Datatyping: First, let Cribl Search assign Datatypes automatically.
  2. Check for uncategorized data: See if any events were missed.
  3. Define your own Datatype rules: To handle the uncategorized data, map specific patterns to the existing stock Datatypes.
  4. Add custom Datatypes: To parse data not covered by the stock Datatypes, edit or add entirely new Datatypes.

Want to know more about Datatyping? See Datatypes in Cribl Search.

1. Parse Automatically (Auto-Datatyping)

By default, Cribl Search automatically analyzes incoming events and assigns a matching Datatype. This requires no configuration on your part, and covers many common log types and data formats.

For most types of data, Auto-Datatyping just works.

2. Check for Uncategorized Data

Data that doesn’t match any Datatypes displays as Uncategorized.

  1. On the Cribl.Cloud top bar, select Products > Search > Data > Get Data In > Datatyping (auto).
  2. Under Uncategorized Data, select:
    • View Live Data to see a sample of uncategorized data as it arrives.
    • View Last 24h to run a search for uncategorized data from the past 24 hours.

3. Add Datatype Rules

To handle data uncategorized by Auto-Datatyping, define your own rules that map specific data patterns to stock v2 Datatypes or custom v2 Datatypes.

  1. On the Cribl.Cloud top bar, select Products > Search > Data > Get Data In > Datatyping (auto).
  2. Select Add Datatype Rule. Name and describe your rule.
  3. In Kusto expression to match, enter a KQL expression that matches a subset of the uncategorized data.

    For details, see Datatype Rule Expressions.

  4. From Datatype, select the Datatype to assign. You can choose from stock v2 Datatypes and your custom v2 Datatypes.
  5. Make sure that Enabled in the top right corner is checked, and confirm with Add.
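For example, a rule targeting Cisco ASA messages arriving over syslog might use a match expression like the following in step 3 (the %ASA- pattern and Source type here are illustrative; adapt them to your data):

```kql
_raw = "*%ASA-*" | where __inputId startswith "syslog:"
```

In step 4, you would then select whichever stock or custom v2 Datatype fits that data.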

Datatype Rule Expressions

Point your Datatype KQL expressions at the fields that are guaranteed to exist before Datatype assignment:

Field        Description
_raw         Raw event text before Datatyping.
__inputId    Source identifier in type:id format.

Supported types: cribl_http, datadog_agent, elastic, http_raw, open_telemetry, prometheus_rw, splunk, splunk_hec, syslog, tcp, tcpjson, wef, wiz_webhook.

Example: syslog:my_source_id.

You can:

  • Create KQL expressions that evaluate to true/false for matching events.
  • Set case-insensitive conditions using = and wildcards (*).
  • Pipe into | where ..., | find ..., or | search ... for richer logic.

But:

  • You can’t use expressions that aggregate or reshape data (such as stats or project).
  • You can’t use let or set statements.
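As a quick sketch of these constraints (the Source type is illustrative), the first expression below is a valid rule, while the commented-out forms would be rejected:

```kql
// Valid: evaluates to true/false per event, using only _raw and __inputId
_raw = "*error*" | where __inputId startswith "tcp:"

// Invalid: stats aggregates events
// _raw = "*error*" | stats count() by __inputId

// Invalid: project reshapes events
// _raw = "*error*" | project __inputId
```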

See the examples below for typical use cases. For full reference on the Cribl Search implementation of KQL, see Language Reference.

Text in _raw

_raw = "*failed login*"

Matches events that contain failed login (case-insensitive) anywhere in their raw text:

# Event 1
Mar 19 12:01:44 auth0 sshd[2041]: failed login for user admin from 10.0.0.5

# Event 2
2026-03-19T08:15:00Z WARN Failed Login attempt on account jdoe

One Source

__inputId = "syslog:my_source_id"

Matches all events from a syslog Source with ID my_source_id.

Text + Source

_raw = "*timeout*" | where __inputId = "open_telemetry:otel"

Matches events from an OpenTelemetry Source with ID otel that contain timeout (case-insensitive) anywhere in their raw text:

# Event 1
{"severity":"ERROR","body":"upstream timeout after 30s","resource":{"service.name":"api-gw"}}

# Event 2
{"severity":"WARN","body":"connection timeout to redis-01:6379","resource":{"service.name":"cache"}}

Source Family

| where __inputId startswith "cribl_http:"

Matches any event from all Cribl HTTP Sources, regardless of their ID.

Chained Filters

_raw = "*sshd*" | search "Failed password" or "Invalid user"

Matches events that contain sshd (case-insensitive) in their raw text and whose full text includes Failed password or Invalid user (or both):

# Event 1
Mar 19 14:22:01 prod-01 sshd[3421]: Failed password for root from 203.0.113.14 port 54321 ssh2

# Event 2
Mar 19 14:22:03 prod-01 sshd[3422]: Invalid user deploy from 198.51.100.7 port 48291

4. Add Custom Datatypes

To handle uncategorized data that’s not covered by the stock v2 Datatypes:

  1. Add a custom v2 Datatype.
  2. Create a Datatype rule that points at your custom Datatype.

Timestamps and Timezones

Cribl Search parses each event and interprets the timestamp. If the event includes timezone info, Search uses it. If not, Search assumes UTC.

To fix timezone offsets:

  • From Cribl Stream: Set the _time field in Stream to the correct time. Search uses this value and ignores the timestamp in the event.
  • Without Stream: Create a Datatype in Cribl Search that includes the correct timezone label. This corrects the offset during Datatype processing.
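For example, this syslog-style event (illustrative) carries no timezone indicator, so Search interprets its timestamp as 08:15:00 UTC:

```
Mar 19 08:15:00 app01 worker[112]: job started
```

If the emitting host actually logs in local time, use one of the two fixes above to correct the offset.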

Use Override Fields

When sending data to Cribl Search, you can use these override fields:

Field      Effect
dataset    Forces Cribl Search to ignore Dataset rules and route directly to the specified Dataset. If the Dataset does not exist, events route to main with _dataset_reason set to “does not exist.”
datatype   Forces Cribl Search to ignore Datatype rules and use the specified Datatype. If the Datatype doesn’t exist, events run through Auto-Datatyping with normal matching logic.
_time      Forces Cribl Search to ignore the time in the event and use the _time value from Stream (in UTC).
isParsed   Tells Cribl Search to skip Datatyping and use fields sent from Stream. Use this when you already parse data in Stream and don’t want to parse it again.
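As a sketch, an event arriving from Stream might carry these overrides as top-level fields (the payload shape and values here are illustrative, not an exact wire format):

```json
{
  "_raw": "Mar 19 14:22:01 prod-01 sshd[3421]: Failed password for root",
  "_time": 1773500521,
  "dataset": "security_logs",
  "datatype": "syslog"
}
```

With dataset and datatype present, this event bypasses Dataset and Datatype rule matching and lands directly in security_logs with the syslog Datatype (assuming both exist).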

Next Steps

Now that you’ve shaped your data, organize it into Datasets.