
Shape Data with Datatype Rules

Parse and structure data being ingested into Cribl Search, so you can search it fast from the start.


Highlights
  • Try Auto-Datatyping first, then handle uncategorized events with your own Datatype rules.
  • Point your rules at stock v2 Datatypes, or define custom Datatypes.
  • For detailed info on how Datatyping works (with or without lakehouse engines), see Datatypes in Cribl Search.

Datatyping in Lakehouse Engines

When your data flows from Sources into a lakehouse engine, it’s parsed into structured events through a process called Datatyping. Here’s how you can use it to shape your data:

  1. Use Auto-Datatyping: First, let Cribl Search assign Datatypes automatically.
  2. Check for uncategorized data: See if any events were missed.
  3. Define your own Datatype rules: To handle the uncategorized data, map specific patterns to the existing stock Datatypes.
  4. Add custom Datatypes: To parse data not covered by the stock Datatypes, edit or add entirely new Datatypes.

Want to know more about Datatyping? See Datatypes in Cribl Search.

1. Parse Automatically (Auto-Datatyping)

By default, Cribl Search automatically analyzes incoming events and assigns a matching Datatype. This requires no configuration on your part, and covers many common log types and data formats.

For most types of data, Auto-Datatyping just works.

2. Check for Uncategorized Data

Data that doesn’t match any Datatype displays as Uncategorized.

  1. On the Cribl.Cloud top bar, select Products > Search > Data > Get Data In > Datatyping.
  2. Under Uncategorized Data, select:
    • View Live Data to see a sample of uncategorized data as it arrives.
    • View Last 24h to run a search for uncategorized data from the past 24 hours.

3. Add Datatype Rules

To handle data uncategorized by Auto-Datatyping, define your own rules that map specific data patterns to stock v2 Datatypes or custom v2 Datatypes.

  1. On the Cribl.Cloud top bar, select Products > Search > Data > Get Data In > Datatyping.
  2. Select Add Datatype Rule. Name and describe your rule.
  3. In Kusto expression to match, enter a KQL expression that matches a subset of the uncategorized data.

    For details, see Datatype Rule Expressions.

  4. From Datatype, select the Datatype to assign. You can choose from stock v2 Datatypes and your custom v2 Datatypes.
  5. Make sure that Enabled in the top right corner is checked, and confirm with Add.

Datatype Rule Expressions

Point your Datatype KQL expressions at the fields that are guaranteed to exist before Datatype assignment:

Field       Description
_raw        Raw event text before Datatyping.
__inputId   Source identifier in type:id format.

Supported types: cribl_http, datadog_agent, elastic, http_raw, open_telemetry, prometheus_rw, splunk, splunk_hec, syslog, tcp, tcpjson, wef, wiz_webhook.

Example: syslog:my_source_id.

You can:

  • Create KQL expressions that evaluate to true/false for matching events.
  • Set case-insensitive conditions using = and wildcards (*).
  • Pipe into | where ..., | find ..., or | search ... for richer logic.

But:

  • You can’t use expressions that aggregate or reshape data (such as stats or project).
  • You can’t use let or set statements.

See the examples below for typical use cases. For full reference on the Cribl Search implementation of KQL, see Language Reference.

Text in _raw

_raw = "*failed login*"

Matches events that contain failed login (case-insensitive) anywhere in their raw text:

# Event 1
Mar 19 12:01:44 auth0 sshd[2041]: failed login for user admin from 10.0.0.5

# Event 2
2026-03-19T08:15:00Z WARN Failed Login attempt on account jdoe

One Source

__inputId = "syslog:my_source_id"

Matches all events from a syslog Source with ID my_source_id.

Text + Source

_raw = "*timeout*" | where __inputId = "open_telemetry:otel"

Matches events from an OpenTelemetry Source with ID otel that contain timeout (case-insensitive) anywhere in their raw text:

# Event 1
{"severity":"ERROR","body":"upstream timeout after 30s","resource":{"service.name":"api-gw"}}

# Event 2
{"severity":"WARN","body":"connection timeout to redis-01:6379","resource":{"service.name":"cache"}}

Source Family

| where __inputId startswith "cribl_http:"

Matches any event from all Cribl HTTP Sources, regardless of their ID.

Chained Filters

_raw = "*sshd*" | search "Failed password" or "Invalid user"

Matches events that contain sshd (case-insensitive) in their raw text and whose full text includes Failed password or Invalid user, or both:

# Event 1
Mar 19 14:22:01 prod-01 sshd[3421]: Failed password for root from 203.0.113.14 port 54321 ssh2

# Event 2
Mar 19 14:22:03 prod-01 sshd[3422]: Invalid user deploy from 198.51.100.7 port 48291
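The = operator in these expressions matches case-insensitively, with * standing for any run of characters anywhere in the text. As an illustration of that matching behavior only (a Python sketch, not Cribl code), the first example behaves roughly like this:

```python
import re

def wildcard_match(pattern: str, text: str) -> bool:
    """Illustrative sketch: case-insensitive match where '*' matches any run of characters."""
    # Escape regex metacharacters, then turn the escaped '*' back into '.*'.
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.fullmatch(regex, text, flags=re.IGNORECASE | re.DOTALL) is not None

print(wildcard_match("*failed login*",
                     "Mar 19 12:01:44 auth0 sshd[2041]: failed login for user admin"))  # True
print(wildcard_match("*failed login*",
                     "2026-03-19T08:15:00Z WARN Failed Login attempt on account jdoe"))  # True
```

Because the pattern starts and ends with *, the phrase can appear anywhere in the event, in any letter case.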

4. Add Custom Datatypes

To handle uncategorized data that’s not covered by the stock v2 Datatypes:

  1. Add a custom v2 Datatype.
  2. Create a Datatype rule that points at your custom Datatype.

Fix Timezone Offsets

Cribl Search parses each event and interprets its timestamp. If the event includes timezone information, Cribl Search uses it. Otherwise, it assumes UTC.
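For example, the two timestamps from the sample events above would be interpreted differently. This Python sketch illustrates the rule (it is not Cribl Search's actual parser, and the year on the syslog-style timestamp is an assumption for illustration):

```python
from datetime import datetime, timezone

# No timezone info in the event: the timestamp is assumed to be UTC.
naive = datetime.strptime("Mar 19 12:01:44 2026", "%b %d %H:%M:%S %Y")
as_utc = naive.replace(tzinfo=timezone.utc)

# Timezone info present in the event: it is used as-is.
with_offset = datetime.fromisoformat("2026-03-19T08:15:00+02:00")

print(as_utc.isoformat())                    # 2026-03-19T12:01:44+00:00
print(with_offset.astimezone(timezone.utc))  # 2026-03-19 06:15:00+00:00
```

If your sender emits local times without an offset, the assumed-UTC interpretation will shift event times; the override fields below let you correct this upstream.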

You can deal with timezone offsets at different stages:

Override Cribl Search Processing

Use override fields to bypass the Cribl Search timestamping, Datatyping, or Dataset rules. Add these fields in Cribl Stream, or in any upstream sender, before your events reach Cribl Search.

Field      What Cribl Search Does
_time      Skips timestamp parsing and uses this value (in UTC) as the event time.
datatype   Skips Datatype rules and uses the specified Datatype. If the Datatype doesn’t exist, events fall back to Auto-Datatyping.
isParsed   Skips Datatyping entirely and uses the fields already on the event. Use this when you parse data upstream and don’t want Cribl Search to parse it again.
dataset    Skips Dataset rules and routes directly to the specified Dataset. If the Dataset doesn’t exist, routes to main with _dataset_reason = "does not exist".
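As a sketch of what an upstream sender might attach, here is a hypothetical pre-parsed event carrying the override fields from the table above. The field names come from the table; the event shape, the other field values, and the epoch-seconds format for _time are illustrative assumptions, not a documented payload:

```python
import json

# Hypothetical pre-parsed event as an upstream sender might emit it.
event = {
    "_raw": "Mar 19 14:22:01 prod-01 sshd[3421]: Failed password for root",
    "host": "prod-01",           # already parsed upstream
    "program": "sshd",           # already parsed upstream
    # Override fields recognized by Cribl Search:
    "_time": 1742394121,         # event time (epoch-seconds format assumed here), treated as UTC
    "isParsed": True,            # skip Datatyping; keep the fields already on the event
    "dataset": "auth_logs",      # route straight to this Dataset (falls back to main if it doesn't exist)
}

print(json.dumps(event, indent=2))
```

In Cribl Stream, you could add these fields with an Eval function before the events reach Cribl Search.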

Next Steps

Now that you’ve shaped your data, organize it into Datasets.