Managing Datasets
Learn how to create, edit, partition, datatype, and delete Datasets.
This topic primarily covers Datasets that you populate from Cribl Stream, using the Cribl Lake Destination. However, for selected types of data, you have the option to create and populate a Dataset using Cribl Lake Direct Access.
You can also send Cribl Search results to Cribl Lake by using the Search export operator. Data ingested from Search into the Lake typically arrives with field names and values already parsed, and this parsing is a prerequisite for the Lakehouse performance boost.
Create a New Dataset
To create a new Dataset in Cribl Lake:
- In the sidebar, select Datasets.
- Select Add Dataset above the Dataset table.
- Enter an ID for the new Dataset and, optionally, a Description.
  The identifier can contain letters (lowercase or uppercase), numbers, and underscores, and can be no longer than 512 characters. You can’t change the identifier once a Dataset is created, so make sure it is meaningful and relevant to the data you plan to store in it.
  The Dataset identifier must be unique and must not match any existing Cribl Search Datasets. Duplicate identifiers can cause problems when searching Cribl Lake data. (For an example of checking these constraints, see the sketch after these steps.)
- Define the Retention period. Cribl Lake will store data in the Dataset for the time duration you enter here. The allowed range is anywhere from 1 day to 10 years.
  If you plan to link the Dataset to a Lakehouse, we recommend setting the retention period to 30 days or longer, as Lakehouses cover a rolling window of up to 30 days.
- Select the Data format: either JSON (the default) or Parquet (a columnar storage format ideal for big data analytics). See Parquet for information on how to best search Parquet data.
- Optionally, select the Lakehouse you want to link to the Dataset. When you select a Lakehouse at this point, the Dataset will become a mirrored Dataset.
- Optionally, in Advanced Settings, select up to three Lake Partitions for the Dataset.
- If you are linking the Dataset to a Lakehouse, then in Advanced Settings, you can optionally select up to five Lakehouse Indexed fields.
- Confirm with Save.
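If you generate Dataset IDs programmatically, you can check them against the constraints above before creating anything. The following is a minimal sketch of that check, assuming only the documented rules (letters, numbers, underscores, at most 512 characters); the function name is illustrative, and Cribl Lake still performs its own validation, including the uniqueness check.

```python
import re

# Documented constraints: letters (either case), numbers, and underscores,
# with a maximum length of 512 characters.
DATASET_ID_PATTERN = re.compile(r"[A-Za-z0-9_]{1,512}")

def is_valid_dataset_id(dataset_id: str) -> bool:
    """Check a proposed Dataset ID against the documented character and length rules.

    This is a local sanity check only (hypothetical helper). Cribl Lake performs
    its own validation, and uniqueness against existing Cribl Search Datasets
    still has to be confirmed in the product.
    """
    return DATASET_ID_PATTERN.fullmatch(dataset_id) is not None

print(is_valid_dataset_id("prod_web_logs"))   # True
print(is_valid_dataset_id("prod web logs"))   # False: spaces are not allowed
```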

You will now be able to select this new Dataset by using its identifier in a Cribl Lake Destination or Cribl Lake Collector.
Only Organization Owners or Admins can add or modify Cribl Lake Datasets.
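If you manage configuration as code, Dataset creation can also be scripted against the Cribl REST API using the same permissions. The sketch below is only an outline of that approach: the endpoint path, payload field names, and base URL are assumptions for illustration, not the documented API contract, so verify them against the Cribl API reference before use.

```python
import requests

# All of the following are assumptions for illustration -- verify the real
# endpoint, fields, and auth flow in the Cribl API reference:
BASE_URL = "https://example.cribl.cloud"                        # your Workspace URL
DATASETS_PATH = "/api/v1/products/lake/lakes/default/datasets"  # hypothetical path
API_TOKEN = "..."  # bearer token for an Organization Owner or Admin

payload = {
    "id": "prod_web_logs",        # immutable once the Dataset exists
    "description": "Web access logs",
    "retentionPeriodInDays": 90,  # hypothetical field name; allowed range is 1 day to 10 years
    "format": "json",             # or "parquet"
}

response = requests.post(
    f"{BASE_URL}{DATASETS_PATH}",
    json=payload,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    timeout=30,
)
response.raise_for_status()
print(response.json())
```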
Edit a Dataset
To edit an existing Dataset:
- In the sidebar, select Datasets.
- Select the Dataset you want to edit and change the desired information, including description, retention period, partitions, and assigned Lakehouse. You can’t modify the identifier of an existing Dataset.
- Confirm with Save.
Change the Retention Period
If you choose to change an existing Dataset’s retention period, this affects all data in the Dataset.
If you increase the retention period, data currently in the Dataset will adopt the new retention period.
If you decrease the retention period, data older than the new time window will be lost. Make sure that this is your intention before you save the change.
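Before decreasing retention, it can help to compute the cutoff implied by the new window, since events older than that point will be removed. A minimal sketch, with an example retention value:

```python
from datetime import datetime, timedelta, timezone

# Example only: a Dataset's retention is being reduced to 30 days.
new_retention_days = 30

# Events with timestamps older than this cutoff would no longer be retained.
cutoff = datetime.now(timezone.utc) - timedelta(days=new_retention_days)
print(f"Events older than {cutoff.isoformat()} will be lost after the change.")
```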
Delete a Dataset
You can delete a Dataset either from the Dataset table, or from an individual Dataset’s page.
To delete a Dataset from the Dataset table:
- In the sidebar, select Datasets.
- Select the check box next to the Dataset(s) you want to delete.
- Select Delete Selected Datasets.
Alternatively, select the Dataset’s row in the Dataset table, and then select Delete.
Scheduled Deletion
A Dataset is not deleted instantly. Instead, once you select it for deletion, it will be marked with “Deletion in progress”. At this point you can no longer edit it, or connect it to Collectors or Destinations. Data that is older than 24 hours will be removed at midnight UTC the following day, while more recent data may be deleted after that time. Once a Dataset is marked for deletion, you are no longer charged for any data that it contains.
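As a rough guide to the schedule described above, the following sketch computes the next midnight UTC after a Dataset is marked for deletion, which is the earliest point at which data older than 24 hours is removed. Treat it as an approximation, not a guarantee of exact timing.

```python
from datetime import datetime, timedelta, timezone

# Earliest removal pass for data older than 24 hours:
# midnight UTC on the day after the Dataset is marked for deletion.
marked_at = datetime.now(timezone.utc)
next_midnight_utc = (marked_at + timedelta(days=1)).replace(
    hour=0, minute=0, second=0, microsecond=0
)
print(f"First removal pass no earlier than {next_midnight_utc.isoformat()}")
```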
You cannot delete built-in Datasets, or Datasets that have any connected Collectors or Destinations.
Lake Partitions and Lakehouse Indexed Fields
Each Dataset can have up to three partitions configured, and, if you’re using Lakehouse, up to five Lakehouse indexed fields, to speed up searching and replaying data from the Lake.
A typical use case for partitions and indexed fields is to accelerate queries on hostname or sourcetype when you are receiving data from multiple hosts or sources.
You can use partitions and indexed fields both in Search queries and in the filter for runs of the Cribl Lake Collector.
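Conceptually, a partition groups data that shares the same field value, so a search or Collector run that filters on that value only reads the matching slice of the Dataset rather than all of it. The toy sketch below illustrates that idea in plain Python; the grouping shown is purely illustrative and is not Cribl Lake’s actual storage layout.

```python
from collections import defaultdict

# Toy events carrying a candidate partition field.
events = [
    {"hostname": "web-01", "_raw": "GET /index.html 200"},
    {"hostname": "web-02", "_raw": "GET /login 302"},
    {"hostname": "web-01", "_raw": "GET /favicon.ico 404"},
]

# Group events by partition value, analogous to keeping each hostname's
# data together in partitioned storage.
partitions = defaultdict(list)
for event in events:
    partitions[event["hostname"]].append(event)

# A query or Collector run filtered on hostname only touches one group
# instead of scanning every event.
for event in partitions["web-01"]:
    print(event["_raw"])
```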
Which Fields to Use as Partitions and Indexed Fields?
The order in which you configure partitions or indexed fields for a Dataset influences the speed of search and replay. For the best results, list broader partitions or Lakehouse indexed fields first.
We also do not recommend partitions or indexed fields that:
- Contain PII (personally identifiable information).
- Contain objects.
- Have a name starting with an underscore (such as _raw).
For partitions, additionally avoid using fields that:
- Have high cardinality values.
- Are nullable (that is, can have the value of null).
Don’t use source or dataset as partitions or Lakehouse Indexed fields. These are internal fields reserved for Cribl, and are not supported as partitions/indexed fields.
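If you are unsure whether a field makes a good partition or indexed field, a quick pass over a sample of events can flag most of the problems listed above. The sketch below is illustrative only; the helper name and the cardinality threshold are arbitrary choices, not Cribl-defined limits, and it assumes you can sample events as Python dictionaries.

```python
RESERVED_FIELDS = {"source", "dataset"}  # internal fields reserved for Cribl (see the note above)

def vet_partition_candidate(events, field, max_unique_ratio=0.5):
    """Return reasons the field is a poor partition choice (empty list if none found).

    The 0.5 cardinality threshold is an arbitrary example, not a Cribl-defined limit.
    """
    problems = []
    if field in RESERVED_FIELDS:
        problems.append("reserved internal field")
    if field.startswith("_"):
        problems.append("name starts with an underscore")

    values = [event.get(field) for event in events]
    if any(value is None for value in values):
        problems.append("field is nullable in the sample")
    if any(isinstance(value, (dict, list)) for value in values):
        problems.append("field contains objects")
    elif events and len(set(values)) / len(events) > max_unique_ratio:
        problems.append("high cardinality in the sample")
    return problems

# Two hosts repeated across 100 sample events: low cardinality, no nulls.
sample = [{"hostname": f"web-{i % 2:02d}"} for i in range(100)]
print(vet_partition_candidate(sample, "hostname"))  # [] -- looks like a reasonable candidate
```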
Datatypes
Cribl Lake Datasets, like regular Cribl Search Datasets, can be associated with Datatypes. Datatypes help separate Dataset data into discrete events, timestamp them, and parse them as needed.
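As a mental model of what a Datatype does, the sketch below breaks a raw block of text into line-delimited events and extracts a timestamp from each one. It is purely illustrative Python with a made-up log format; actual Datatype Rulesets are configured in Cribl Search, not written as code.

```python
import re
from datetime import datetime, timezone

raw = """2024-05-01T12:00:00Z host=web-01 GET /index.html 200
2024-05-01T12:00:05Z host=web-02 GET /login 302"""

TIMESTAMP_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z")

# "Event breaking": split the raw block into one event per line.
for line in raw.splitlines():
    match = TIMESTAMP_RE.match(line)
    # "Timestamping": parse the leading ISO timestamp if present.
    ts = (
        datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
        if match
        else None
    )
    print(ts, "|", line)
```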
To configure Datatypes for a Cribl Lake Dataset:
- On the top bar, select Products, and then select Search.
- In the sidebar, select Data.
- In the list of Datasets, select your Cribl Lake Dataset.
- Select the Processing tab.
- In the Datatypes list, add desired Rulesets via the Add Datatype Ruleset button. (The order of Rulesets matters.)
- When the list of Rulesets is complete, confirm with Save.
For more information about using Datatypes and Rulesets in searches, see Cribl Search Datatypes.