Federated Search v2
Use v2 Datatypes and Datasets to speed up your federated searches. See what’s currently supported.
Highlights
- v2 Datatypes are gradually replacing v1 Datatypes.
- Federated Datasets currently support v2 Datatypes for NDJSON, Delimited Text (CSV), and Parquet formats, on S3 and Azure Blob.
- Certain limitations apply.
What Are v2 Datatypes?
v2 Datatypes are the next-generation Datatypes in Cribl Search. They power high-speed lakehouse engines and let you use the new, high-performance architecture for federated queries into object storage such as Amazon S3 or Azure Blob Storage.
While v2 Datatypes work with lakehouse engines out of the box, support for federated Datasets is rolling out gradually and requires some reconfiguration on your part.
For basic info on v2 Datatypes, see v2 Datatypes in Cribl Search.
For info on Datatypes in general, see Datatypes in Cribl Search.
No Action Required For Now
You don’t have to migrate your existing federated Datasets at this point. However, we recommend using v2 Datatypes for new Datasets where supported.
Why Switch from v1 to v2
Use v2 Datatypes and Datasets to get:
- Faster queries: Filtering and parsing happen closer to the data, replacing the older rule-chain model.
- Clearer mapping: Each path and glob pattern maps directly to a Datatype, so you always know which parsing applies.
- Future-proofing: Switching now aligns you with where federated search is heading, and with high-speed lakehouse engines.
What’s Supported Today
You can use v2 Datasets with:
- Cribl-hosted Search Datasets: v2 Datatypes work with lakehouse engines by default.
- Amazon S3 and Azure Blob Storage federated Datasets, when formatted as NDJSON, delimited text (CSV), or Parquet (see limitations).
- Cribl Lake Datasets: Newline-delimited JSON and Parquet objects in Lake storage, configured with filter rows on the Dataset (see Dataset Type and Filters for Lake Datasets). DDSS Lake Datasets are supported only as v1 Datasets.
For full details on what’s supported, see Current Limitations.
Switch a Federated Dataset from v1 to v2
You can’t directly migrate from v1 to v2, but you can clone your existing v1 Dataset and modify it.
1. Check What’s Supported
Start by reviewing the limitations to understand current support.
2. Clone and Reconfigure Your Dataset
- Clone your existing v1 Dataset.
- Switch the cloned Dataset’s Type from v1 to v2.
- Configure the bucket/container path (v2 Datasets support only one) and other settings.
For Amazon S3, see v2 Dataset Configuration for S3.
For Azure Blob, see v2 Dataset Configuration for Azure Blob.
For Cribl Lake, see Dataset Type and Filters for Lake Datasets.
- Confirm with Save.
3. Run a Search to Verify
Run a test search against your new v2 Dataset. If the results show the expected value in the datatype field, the switch worked.
Dataset configuration may be cached briefly after saving, so if results look off at first, wait a moment and re-run.
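If you verify switched Datasets in a script, the check-and-retry logic above can be sketched as follows. This is a minimal sketch, not a Cribl API: `run_search` is a hypothetical callable you supply (however you execute a search and get events back as dicts), and it assumes the field is named `datatype` and that you know the Datatype ID you expect.

```python
import time
from typing import Any, Callable, Iterable, Mapping


def verify_v2_dataset(
    run_search: Callable[[], Iterable[Mapping[str, Any]]],
    expected_datatype: str,
    attempts: int = 3,
    wait_seconds: float = 5.0,
) -> bool:
    """Re-run a test search until every event carries the expected datatype.

    `run_search` is hypothetical: it stands in for whatever client you use
    to run a search against the new v2 Dataset.
    """
    for attempt in range(attempts):
        events = list(run_search())
        # Pass only if the search returned events and all of them match.
        if events and all(e.get("datatype") == expected_datatype for e in events):
            return True
        if attempt < attempts - 1:
            # Configuration may be cached briefly after saving, so wait and retry.
            time.sleep(wait_seconds)
    return False
```

Requiring a non-empty result avoids treating "no events at all" as a successful switch.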
Current Limitations of Federated Search v2
Support for v2 Datasets on Amazon S3, Azure Blob Storage, and Cribl Lake is expanding gradually. Read on to see the current status.
Supported Data Formats
You can currently use the following data formats with v2 federated Datasets:
- JSON Newline Delimited (NDJSON)
- Delimited Text (CSV)
- Parquet
Other formats aren’t supported yet.
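Before planning a switch, it can help to inventory which objects in a bucket or container fall into these formats. The sketch below assumes your objects use conventional file extensions; the extension-to-format table is hypothetical, since Cribl Search determines the format from the Datatype assigned to each filter, not from the extension.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from conventional extensions to the formats that
# v2 federated Datasets currently support. Inventory aid only.
SUPPORTED_EXTENSIONS = {
    ".ndjson": "JSON Newline Delimited",
    ".jsonl": "JSON Newline Delimited",
    ".csv": "Delimited Text",
    ".parquet": "Parquet",
}


def classify(keys):
    """Split object keys into ({key: format}, [unsupported keys]) by extension."""
    supported, unsupported = {}, []
    for key in keys:
        fmt = SUPPORTED_EXTENSIONS.get(PurePosixPath(key).suffix.lower())
        if fmt is None:
            unsupported.append(key)
        else:
            supported[key] = fmt
    return supported, unsupported
```

Objects that land in the unsupported list are candidates to keep on a v1 Dataset for now.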
Cribl Lake Formats
For Cribl Lake v2 Datasets, you match newline-delimited JSON and Parquet objects using the Filters table (a glob plus a Datatype ID per row), with up to two rows when storage mixes both formats. DDSS Lake Datasets require v1 Datasets. See Dataset Type and Filters for Lake Datasets.
Unavailable v2 Datatype Options for Federated Search
If you configure a v2 Datatype with the following options, you won’t be able to apply that Datatype to federated Datasets:
When you’re creating a federated Dataset, the Cribl Search UI hides any v2 Datatypes that use these configurations.
One Bucket/Container Path per Dataset
Each v2 Dataset on Amazon S3, Azure Blob Storage, or Cribl Lake supports only one bucket/container path. Creating multiple paths via the API will cause search errors.
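If you create or update v2 Datasets through the API, a pre-flight check can catch the multiple-path case before it reaches Cribl Search. This is a minimal sketch: the `paths` key is hypothetical, so adapt it to the request body your API client actually sends.

```python
def check_single_path(dataset_config: dict) -> None:
    """Raise ValueError if a v2 Dataset config lists more than one path.

    The "paths" key is a hypothetical field name, not Cribl's API schema.
    """
    paths = dataset_config.get("paths", [])
    if len(paths) > 1:
        raise ValueError(
            f"v2 Datasets support only one bucket/container path; got {len(paths)}"
        )
```

Running this before each API call turns a later search error into an immediate, local failure.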
You can add multiple filters to that path to handle different file types (e.g., JSON and CSV). Filters are applied in order, so place specific patterns before general ones.
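Reading "applied in order" as "the first matching filter wins", the ordering rule can be sketched like this. The filter rows and Datatype IDs are hypothetical, and Python's `fnmatch` is used only as a stand-in matcher; its `*` also matches `/`, which may differ from how Cribl Search evaluates globs.

```python
from fnmatch import fnmatchcase

# Hypothetical filter rows: (glob, Datatype ID). Order matters.
FILTERS = [
    ("my-bucket/logs/*.csv", "my_csv_datatype"),  # specific pattern first
    ("my-bucket/logs/*", "my_ndjson_datatype"),   # general fallback last
]


def resolve_datatype(object_path, filters=FILTERS):
    """Return the Datatype ID of the first filter whose glob matches, else None."""
    for glob, datatype_id in filters:
        if fnmatchcase(object_path, glob):
            return datatype_id
    return None
```

With the general pattern listed first, `my-bucket/logs/a.csv` would resolve to the NDJSON Datatype and never reach the CSV row, which is why specific patterns go first.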
Glob Patterns Match the Full Object Path
Glob patterns match the entire path (e.g., bucket/prefix/folder/file.csv), not just the filename. Use recursive patterns like **/*.csv to include files in subfolders.
Cribl Search doesn’t validate patterns on save, so test with a small data subset first.
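Because patterns aren't validated on save, you can test them locally against a few real object paths first. The sketch below assumes a common glob convention (`*` and `?` stop at `/`, `**/` spans zero or more folders); Cribl Search's exact matching rules may differ, so treat it as an approximation.

```python
import re


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a path glob into a regex (assumed semantics, see lead-in)."""
    i, out = 0, []
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more whole folders
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")  # anything, including '/'
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")  # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")  # one character within a segment
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def matches(pattern: str, object_path: str) -> bool:
    # fullmatch: the glob must cover the entire object path, not just the filename.
    return glob_to_regex(pattern).fullmatch(object_path) is not None
```

Under these rules, `matches("*.csv", "bucket/prefix/file.csv")` is False because `*` cannot cross folder boundaries, while `**/*.csv` matches at any depth.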
CSV Headers Are Parsed as Events
There’s no setting to mark a header row. If your CSV files have one, it will appear as the first data event instead of being used as column names.
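To see the effect, compare reading every row as an event with header-aware parsing. This Python sketch uses the standard `csv` module only to mimic the outcome; it is not how Cribl Search parses Delimited Text, and the sample data is hypothetical.

```python
import csv
import io

# Hypothetical object contents that start with a header row.
SAMPLE = "host,status\nweb-01,200\nweb-02,500\n"


def rows_as_events(text):
    # Every line, including the header, becomes its own event.
    return list(csv.reader(io.StringIO(text)))


def rows_with_header(text):
    # Header-aware parsing: the first row supplies column names instead.
    return list(csv.DictReader(io.StringIO(text)))
```

With `rows_as_events`, the first of three events is `["host", "status"]`; `rows_with_header` yields two events keyed by column name. Expect results shaped like the former, with the header as an ordinary event.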