
Federated Search v2

Use v2 Datatypes and Datasets to speed up your federated searches. See what’s currently supported.

Highlights
v2 Datatypes are gradually replacing v1 Datatypes.
Federated Datasets currently support v2 Datatypes for NDJSON, Delimited Text (CSV), and Parquet formats, on S3 and Azure Blob.
Certain limitations apply.

What Are v2 Datatypes?

v2 Datatypes are the next-generation Datatypes in Cribl Search. They power high-speed lakehouse engines and let you use the new, high-performance architecture for federated queries into object storage such as Amazon S3 or Azure Blob Storage.

While v2 Datatypes work with lakehouse engines out of the box, support for federated Datasets is rolling out gradually and requires some reconfiguration on your part.

For basic info on v2 Datatypes, see v2 Datatypes in Cribl Search.

For info on Datatypes in general, see Datatypes in Cribl Search.

No Action Required For Now

You don’t have to migrate your existing federated Datasets at this point. However, we recommend using v2 Datatypes for new Datasets where supported.

Why Switch from v1 to v2

Use v2 Datatypes and Datasets to get:

Faster queries: Filtering and parsing happen closer to the data, replacing the older rule-chain model.
Clearer mapping: Each path and glob pattern maps directly to a Datatype, so you always know which parsing applies.
Future-proofed: Switching now aligns you with where federated search is going, as well as with high-speed lakehouse engines.

What’s Supported Today

You can use v2 Datasets with:

Cribl-hosted Search Datasets: v2 Datatypes work with lakehouse engines by default.
Amazon S3 and Azure Blob Storage federated Datasets: Data formatted as NDJSON, delimited text (CSV), or Parquet (see limitations).
Cribl Lake Datasets: Newline-delimited JSON and Parquet objects in Lake storage, configured with filter rows on the Dataset (see Dataset Type and Filters for Lake Datasets). DDSS Lake Datasets must remain v1 Datasets.

For full details on what’s supported, see Current Limitations.

Switch a Federated Dataset from v1 to v2

You can’t directly migrate from v1 to v2, but you can clone your existing v1 Dataset and modify it.

1. Check What’s Supported

Start by reviewing the limitations to understand current support.

2. Clone and Reconfigure Your Dataset

Clone your existing v1 Dataset.
Switch the cloned Dataset’s Type from v1 to v2.
Configure the bucket/container path(s) and other settings.
For Amazon S3, see v2 Dataset Configuration for S3.
For Azure Blob, see v2 Dataset Configuration for Azure Blob.
For Cribl Lake, see Dataset Type and Filters for Lake Datasets.
Confirm with Save.
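As a rough mental model of what cloning and reconfiguring does, the sketch below copies a v1 Dataset, flips its Type to v2, and swaps the v1 rule chain for explicit glob-to-Datatype filters. It is a sketch under stated assumptions: the dictionary keys (`id`, `type`, `paths`, `datatype_rules`, `filters`) and the function `clone_as_v2` are hypothetical and are not Cribl Search's actual API or config schema. You perform these steps in the UI.

```python
import copy

# Hypothetical Dataset shape for illustration only. These keys are NOT
# Cribl Search's actual API or config schema.
v1_dataset = {
    "id": "s3_web_logs",
    "type": "v1",
    "paths": ["s3://example-bucket/logs/"],
    "datatype_rules": ["rule_a", "rule_b"],  # stands in for the v1 rule chain
}


def clone_as_v2(v1: dict, new_id: str, filters: list) -> dict:
    """Return a v2 copy of a v1 Dataset, leaving the original untouched."""
    if v1.get("type") != "v1":
        raise ValueError("source Dataset must be v1")
    # v2 Datasets support exactly one bucket/container path.
    if len(v1.get("paths", [])) != 1:
        raise ValueError("v2 Datasets support only one bucket/container path")
    clone = copy.deepcopy(v1)  # clone first; never edit the v1 Dataset in place
    clone["id"] = new_id
    clone["type"] = "v2"
    # The v1 rule chain is replaced by explicit (glob, Datatype ID) filters.
    clone.pop("datatype_rules", None)
    clone["filters"] = [{"glob": g, "datatype": d} for g, d in filters]
    return clone


v2_dataset = clone_as_v2(v1_dataset, "s3_web_logs_v2", [("**/*.ndjson", "my_ndjson_v2")])
```

Because the clone is a deep copy, the original v1 Dataset keeps working while you test the v2 version side by side.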

3. Run a Search to Verify

Run a test search against your new v2 Dataset. If the datatype field in the results shows the v2 Datatype you configured, the switch worked.

Dataset configuration may be cached briefly after saving, so if results look off at first, wait a moment and re-run.
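If you script this check, a small retry loop covers the brief caching window. This is a minimal sketch, assuming you already have some way to run the test search and get events back as dictionaries: `run_search` is a hypothetical callable you supply, and the field name `datatype` is taken from the step above. The attempt count and wait time are arbitrary placeholders.

```python
import time
from typing import Callable, List


def verify_v2_dataset(
    run_search: Callable[[], List[dict]],  # hypothetical: runs your test search
    expected_datatype: str,
    attempts: int = 3,
    wait_seconds: float = 5.0,
) -> bool:
    """Re-run a test search until every event carries the expected datatype.

    The retries cover the short period in which the saved Dataset
    configuration may still be cached.
    """
    for attempt in range(attempts):
        events = run_search()
        if events and all(e.get("datatype") == expected_datatype for e in events):
            return True
        if attempt < attempts - 1:
            time.sleep(wait_seconds)  # give the cached config time to refresh
    return False
```

An empty result counts as a failed check, since it tells you nothing about which Datatype was applied.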

Current Limitations of Federated Search v2

Support for v2 Datasets on Amazon S3, Azure Blob Storage, and Cribl Lake is expanding gradually. Read on to see the current status.

Supported Data Formats

You can currently use the following data formats with v2 federated Datasets:

JSON Newline Delimited
Delimited Text
Parquet

Other formats aren’t supported yet.

Cribl Lake Formats

For Cribl Lake v2 Datasets, you match newline-delimited JSON and Parquet objects using the Filters table (a glob plus a Datatype ID per row), with up to two rows for mixed storage. DDSS Lake Datasets must remain v1 Datasets. See Dataset Type and Filters for Lake Datasets.
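To make the two-row limit concrete, the sketch below models a Lake Filters table as (glob, Datatype ID) pairs and reports problems before you save. The glob patterns and Datatype IDs shown are hypothetical placeholders, not values Cribl Lake is known to use, and `validate_lake_filters` is an illustrative helper, not part of Cribl Search.

```python
# Hypothetical filter rows for a mixed-storage Lake v2 Dataset:
# (glob, Datatype ID). The globs and IDs are placeholders.
LAKE_FILTERS = [
    ("**/*.parquet", "lake_parquet_v2"),
    ("**/*.json.gz", "lake_ndjson_v2"),
]

MAX_LAKE_FILTER_ROWS = 2  # the row limit described above


def validate_lake_filters(rows):
    """Return a list of problems in a Lake v2 Filters table (empty if none)."""
    problems = []
    if len(rows) > MAX_LAKE_FILTER_ROWS:
        problems.append(f"{len(rows)} rows exceeds the {MAX_LAKE_FILTER_ROWS}-row limit")
    for glob, datatype_id in rows:
        # Each row needs both halves: a glob and the Datatype it maps to.
        if not glob or not datatype_id:
            problems.append(f"row {glob!r} needs both a glob and a Datatype ID")
    return problems
```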

If _raw contains valid JSON, Cribl Search can extract fields from that JSON. For guidance on when to store only _raw versus sending structured top-level fields, see Structure Events for Cribl Lake and Manage Lake Datasets.
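The idea of extracting fields from a JSON `_raw` can be illustrated with the standard library. This is a minimal sketch, not Cribl Search's implementation: `extract_raw_fields` is a hypothetical helper, and the precedence rule (existing top-level fields win over keys parsed from `_raw`) is an assumption made for the example.

```python
import json


def extract_raw_fields(event: dict) -> dict:
    """Illustrative only: merge top-level keys from a JSON _raw into the event.

    Assumption: fields already present on the event take precedence over
    keys parsed from _raw. Non-JSON or non-object _raw leaves the event as-is.
    """
    raw = event.get("_raw")
    if not isinstance(raw, str):
        return dict(event)
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return dict(event)  # _raw is not valid JSON; nothing to extract
    if not isinstance(parsed, dict):
        return dict(event)  # e.g. a JSON array has no named fields
    return {**parsed, **event}  # later keys win, so event fields override
```

Storing structured top-level fields avoids relying on this kind of extraction at search time, which is the trade-off the linked pages discuss.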

Unavailable v2 Datatype Options for Federated Search

If you configure a v2 Datatype with the following options, you won’t be able to apply that Datatype to federated Datasets:

When you’re creating a federated Dataset, the Cribl Search UI hides any v2 Datatypes that use these configurations.

One Bucket/Container Path per Dataset

Each v2 Dataset on Amazon S3, Azure Blob Storage, or Cribl Lake supports only one bucket/container path. Creating multiple paths via the API will cause search errors.

You can add multiple filters to that path to handle different file types (e.g., JSON and CSV). Filters are applied in order, so place specific patterns before general ones.
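Ordered filters behave like a first-match lookup, which is why a general pattern placed first can hide a specific one. The sketch below assumes first-match-wins evaluation over full object paths. The filter list and Datatype IDs are hypothetical, and it uses Python's `fnmatch.fnmatchcase` for brevity; note that `fnmatch`'s `*` also matches `/`, which may differ from Cribl Search's own glob rules.

```python
from fnmatch import fnmatchcase

# Hypothetical filters on a single bucket path: (glob, Datatype ID).
# Specific pattern first, general patterns after it.
FILTERS = [
    ("*/errors/*.json", "error_json_v2"),
    ("*.json", "generic_ndjson_v2"),
    ("*.csv", "generic_csv_v2"),
]


def pick_datatype(object_path, filters):
    """Return the Datatype ID of the first filter whose glob matches, else None."""
    for glob, datatype_id in filters:
        if fnmatchcase(object_path, glob):
            return datatype_id  # first match wins; later filters are never checked
    return None
```

For example, `pick_datatype("bucket/app/errors/e1.json", FILTERS)` returns `"error_json_v2"`, but if you swap the first two filters, the general `*.json` pattern matches first and the error files get `"generic_ndjson_v2"` instead.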

Glob Patterns Match the Full Object Path

Glob patterns match the entire path (e.g., bucket/prefix/folder/file.csv), not just the filename. Use recursive patterns like **/*.csv to include files in subfolders.

Cribl Search doesn’t validate patterns on save, so test with a small data subset first.
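Because patterns aren't validated on save, it can help to sanity-check them offline against a few real object paths. The sketch below translates a glob into a regular expression anchored to the whole path. The semantics are assumptions, not confirmed Cribl Search behavior: `**/` matches zero or more whole folders, `*` matches within a single path segment, and `?` matches one non-`/` character. `glob_to_regex` and `matches` are hypothetical helpers.

```python
import re


def glob_to_regex(pattern: str) -> "re.Pattern":
    """Translate a glob into a regex for the full object path.

    Assumed semantics (not confirmed for Cribl Search):
      **/  zero or more whole folders
      **   anything, including '/'
      *    any run of characters within one path segment
      ?    one character other than '/'
    """
    i, out = 0, []
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:[^/]*/)*")
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def matches(pattern: str, object_path: str) -> bool:
    # fullmatch: the pattern must cover the entire path, not just the filename.
    return glob_to_regex(pattern).fullmatch(object_path) is not None
```

Under these assumptions, `matches("*.csv", "bucket/prefix/folder/file.csv")` is `False` while `matches("**/*.csv", "bucket/prefix/folder/file.csv")` is `True`, which mirrors the advice above to use recursive patterns for files in subfolders.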

CSV Headers Are Parsed as Events

There’s no setting to mark a header row. If your CSV files have one, it will appear as the first data event instead of being used as column names.
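The effect is easy to reproduce with Python's `csv` module. This sketch imitates a parser that has fixed column names but no header-row setting, so every row, including the header, becomes an event. The column names and sample data are hypothetical, and `parse_csv_events` is an illustrative helper rather than how Cribl Search parses CSV internally.

```python
import csv
import io

# Hypothetical column names configured on the Datatype.
COLUMNS = ["ts", "host", "status"]


def parse_csv_events(text: str, columns: list) -> list:
    """Map every row to the configured columns, with no header-row handling."""
    reader = csv.reader(io.StringIO(text))
    return [dict(zip(columns, row)) for row in reader]


sample = "timestamp,hostname,code\n2024-01-01T00:00:00Z,web01,200\n"
events = parse_csv_events(sample, COLUMNS)
# events[0] is the file's header row, parsed as data:
# {'ts': 'timestamp', 'host': 'hostname', 'status': 'code'}
```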
