Federated Search v2
Use v2 Datatypes and Datasets to speed up your federated searches. See what’s currently supported.
Highlights
- v2 Datatypes are gradually replacing v1 Datatypes.
- Federated Datasets currently support v2 Datatypes for NDJSON and delimited text (such as CSV) on Amazon S3 and Azure Blob.
- Certain limitations apply.
What Are v2 Datatypes?
Next-generation Datatypes in Cribl Search. They power high-speed lakehouse engines, and let you use the new, high-performance architecture for federated queries into object storage like Amazon S3 or Azure Blob.
Whereas v2 Datatypes work with lakehouse engines out of the box, support for federated Datasets is rolling out gradually, and requires some reconfiguration on your part.
For basic info on v2 Datatypes, see v2 Datatypes in Cribl Search.
For info on Datatypes in general, see Datatypes in Cribl Search.
No Action Required For Now
You don’t have to migrate your existing federated Datasets at this point. However, we recommend using v2 Datatypes for new Datasets where supported.
Why Switch from v1 to v2
Use v2 Datatypes and Datasets to get:
- Faster queries: Filtering and parsing happen closer to the data, replacing the older rule-chain model.
- Clearer mapping: Each path and glob pattern maps directly to a Datatype, so you always know which parsing applies.
- Future-proofing: Switching now aligns your Datasets with the direction of federated search and with high-speed lakehouse engines.
What’s Supported Today
As of Cribl Search 4.17.0, you can use v2 Datatypes with Amazon S3 and Azure Blob Storage Datasets. Supported data formats are limited to NDJSON and delimited text.
For full details on what’s supported, see Current Limitations.
Switch a Federated Dataset from v1 to v2
You can’t directly migrate from v1 to v2, but you can clone your existing v1 Dataset and modify it.
1. Check What’s Supported
Start by reviewing the limitations to understand current support.
2. Clone and Reconfigure Your Dataset
- Clone your existing v1 Dataset.
- Switch the cloned Dataset’s Type from v1 to v2.
- Configure the bucket/container path(s) and other settings.
For Amazon S3, see v2 Dataset Configuration for S3.
For Azure Blob, see v2 Dataset Configuration for Azure Blob.
- Confirm with Save.
3. Verify by Running a Search
Run a test search against your new v2 Dataset. If the results include the correct datatype field, the switch worked.
Dataset configuration may be cached briefly after saving, so if results look off at first, wait a moment and re-run.
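As a rough way to check the result of step 3, the sketch below counts the datatype values found in a set of search results. It assumes you have exported the results as NDJSON text (one JSON event per line). The function name and the sample events are hypothetical, and only the datatype field name comes from this page.

```python
import json

def datatype_counts(ndjson_text):
    """Count events per `datatype` value in exported NDJSON results.

    Events without a `datatype` field are counted under "<missing>".
    """
    counts = {}
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        event = json.loads(line)
        key = event.get("datatype", "<missing>")
        counts[key] = counts.get(key, 0) + 1
    return counts

# Hypothetical exported results, for illustration only.
sample = "\n".join([
    '{"datatype": "my_ndjson", "msg": "a"}',
    '{"datatype": "my_ndjson", "msg": "b"}',
    '{"msg": "c"}',
])
print(datatype_counts(sample))
```

If most events land under "<missing>" or under an unexpected Datatype, recheck the Dataset's path and filter settings before re-running the search.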
Current Limitations of Federated Search v2
Support for v2 federated Datasets is expanding gradually. Read on to see the current status.
Only NDJSON and Delimited Text Are Supported
You can currently use only JSON Newline Delimited and Delimited Text data formats. Other formats, such as Parquet, are not supported.
Cribl Lake Datasets Don’t Support v2
You can’t currently use v2 Datatypes with Cribl Lake Datasets, even for JSON formats.
Some v2 Datatype Options Are Not Available for Federated Search
If you configure a v2 Datatype with the following options, you won’t be able to apply that Datatype to federated Datasets:
When you’re creating a federated Dataset, the Cribl Search UI hides any v2 Datatypes that use these configurations.
One Bucket/Container Path per Dataset
Each v2 federated Dataset supports only one bucket/container path. Creating multiple paths via the API will cause search errors.
You can add multiple filters to that path to handle different file types (e.g., JSON and CSV). Filters are applied in order, so place specific patterns before general ones.
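The ordering rule can be illustrated with a small first-match-wins sketch. This is not Cribl Search's implementation: the filter list, Datatype names, and the pick_datatype helper are hypothetical, and Python's fnmatchcase (where * also matches slashes) stands in for whatever glob engine Cribl Search uses.

```python
from fnmatch import fnmatchcase

# Hypothetical (pattern, datatype) pairs; all names are illustrative only.
FILTERS = [
    ("*/audit/*.csv", "audit_csv"),   # specific pattern first
    ("*.csv", "generic_csv"),         # general fallback for other CSV files
    ("*.json", "generic_ndjson"),
]

def pick_datatype(path, filters):
    """Return the Datatype of the first filter whose pattern matches, else None."""
    for pattern, datatype in filters:
        if fnmatchcase(path, pattern):
            return datatype
    return None

print(pick_datatype("bucket/logs/audit/a.csv", FILTERS))
print(pick_datatype("bucket/logs/app/a.csv", FILTERS))
```

If the general *.csv entry were listed first, it would also claim the audit files, and the more specific filter would never apply.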
Glob Patterns Match the Full Object Path
Glob patterns match the entire path (e.g., bucket/prefix/folder/file.csv), not just the filename. Use recursive patterns like **/*.csv to include files in subfolders.
Cribl Search doesn’t validate patterns on save, so test with a small data subset first.
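Because patterns aren't validated on save, it can help to dry-run them against a few sample object paths first. The sketch below is a minimal glob matcher under assumed semantics: **/ spans zero or more folders, while * and ? stay within a single path segment. Cribl Search's exact glob rules may differ, and both helper names are hypothetical.

```python
import re

def glob_to_regex(pattern):
    """Translate a glob into a compiled regex (assumed semantics, see above)."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")   # zero or more whole folders
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")         # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")      # anything within one segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")       # one character within a segment
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))

def matches(pattern, path):
    """True only if the pattern matches the entire object path."""
    return glob_to_regex(pattern).fullmatch(path) is not None

path = "bucket/prefix/folder/file.csv"
print(matches("*.csv", path))      # filename-only pattern vs. full path
print(matches("**/*.csv", path))   # recursive pattern vs. full path
```

Under these assumptions, a filename-only pattern such as *.csv does not match a nested object path, while **/*.csv does.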
CSV Headers Are Parsed as Events
There’s no setting to mark a header row. If your CSV files have one, it will appear as the first data event instead of being used as column names.
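To see what this looks like in practice, the following sketch parses a small CSV sample without any header handling. The sample data, the column names col1 and col2, and the helper name are hypothetical. The field names that Cribl Search actually assigns are not specified here.

```python
import csv
import io

# Hypothetical CSV content whose first line is a header row.
SAMPLE = "host,status\nweb-1,200\nweb-2,500\n"

# Hypothetical column names, standing in for a Datatype's configured fields.
COLUMNS = ["col1", "col2"]

def parse_without_header(text, columns):
    """Parse every row as a data event; no row is treated as a header."""
    reader = csv.reader(io.StringIO(text))
    return [dict(zip(columns, row)) for row in reader]

events = parse_without_header(SAMPLE, COLUMNS)
for event in events:
    print(event)
```

The first event carries the header text (host, status) as ordinary values, so plan for one extra event per file when counting or aggregating results.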