Manage Lake Datasets
Create, edit, partition, assign Datatypes to, and delete Lake Datasets.
This topic primarily covers Lake Datasets that you populate from Cribl Stream, using the Cribl Lake Destination. However, for selected types of data, you have the option to create and populate a Lake Dataset using Cribl Lake Direct Access.
You can also send Cribl Search results to Cribl Lake by using the Search export operator. Data ingested from Search to Lake will typically have field names and values pre-parsed, and such parsing is a prerequisite for achieving the Lakehouse performance boost.
Create a New Lake Dataset
To create a new Lake Dataset:
In the Cribl Lake sidebar, select Datasets.
Select Add Dataset above the Dataset table.
Enter an ID for the new Dataset and, optionally, a Description.
The identifier can contain letters (lowercase or uppercase), numbers, and underscores, and can be no longer than 512 characters.
You can’t change the identifier once a Dataset is created, so make sure it is meaningful and relevant to the data you plan to store in it.
The Lake Dataset identifier must be unique and must not match any existing Cribl Search Datasets. Duplicate identifiers can cause problems when searching Cribl Lake data.
Define the Retention period. Cribl Lake will store data in the Dataset for the time duration you enter here. The allowed range is anywhere from 1 day to 10 years.
If you plan to link the Lake Dataset to a Lakehouse, we recommend setting the Dataset’s retention period to 30 days or longer, as Lakehouses cover a rolling window of up to 30 days.
Select the Data format.
- JSON is the default.
- Parquet is a columnar storage format ideal for big data analytics (for tips on searching Parquet data, see Parquet).
- DDSS is available for a Dataset that you will populate using Direct Access.
Optionally, select the Lakehouse you want to link to the Dataset. When you select a Lakehouse at this point, the Lake Dataset will become a mirrored Dataset. (You cannot select this with a Dataset populated using Direct Access.)
Optionally, in Advanced Settings, select up to three Lake Partitions for the Dataset.
If you are linking the Dataset to a Lakehouse, then in Advanced Settings, you can optionally select up to five Lakehouse Indexed fields.
Confirm with Save.

You will now be able to select this new Dataset by using its identifier in a Cribl Lake Destination or Cribl Lake Collector.
Only Organization Owners or Admins can add (above) or modify (below) Lake Datasets.
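Because an identifier cannot be changed after the Dataset is created, it can be worth checking a candidate identifier before you enter it. The following is a minimal sketch in Python, based only on the naming rules listed in the steps above. The function name `is_valid_dataset_id` and the `existing_ids` parameter are hypothetical and are not part of any Cribl API. Requiring at least one character, and treating uniqueness as an exact, case-sensitive match, are assumptions.

```python
import re

# Rules from the steps above: letters (either case), numbers, and underscores,
# no longer than 512 characters. The minimum length of 1 is an assumption.
_DATASET_ID_PATTERN = re.compile(r"[A-Za-z0-9_]{1,512}")


def is_valid_dataset_id(dataset_id: str, existing_ids: set[str]) -> bool:
    """Return True if dataset_id follows the naming rules and is not already used.

    existing_ids is a hypothetical set holding the identifiers of your current
    Lake Datasets and Cribl Search Datasets. Exact-match comparison is an
    assumption, not documented behavior.
    """
    if _DATASET_ID_PATTERN.fullmatch(dataset_id) is None:
        return False
    return dataset_id not in existing_ids
```

For example, `is_valid_dataset_id("web_logs_2024", {"main"})` passes, while `"web-logs"` fails because of the hyphen.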
Edit a Lake Dataset
To edit an existing Lake Dataset:
- In the Cribl Lake sidebar, select Datasets.
- Select the Dataset you want to edit and change the desired information, including description, retention period, partitions, and assigned Lakehouse. You can’t modify the identifier of an existing Lake Dataset.
- Confirm with Save.
Change a Lake Dataset’s Retention Period
If you change the retention period of an existing Lake Dataset, this affects all data in the Dataset.
If you increase the retention period, data currently in the Dataset will adopt the new retention period.
If you decrease the retention period, data older than the new time window will be lost. Make sure that this is your intention before you save the change.
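To estimate the impact of a shorter retention period before saving it, you can compare event timestamps against the new time window. The sketch below is illustrative only: `events_outside_retention` is a hypothetical helper, the timestamps are made-up sample values, and the exact boundary behavior (strictly older than the cutoff) is an assumption rather than documented Cribl Lake behavior.

```python
from datetime import datetime, timedelta, timezone


def events_outside_retention(event_times, retention_days, now):
    """Return the timestamps that are older than the retention window.

    Assumption: an event counts as expired when its timestamp is strictly
    earlier than (now - retention_days).
    """
    cutoff = now - timedelta(days=retention_days)
    return [t for t in event_times if t < cutoff]


# Sample data: events that are 5, 45, and 120 days old.
now = datetime(2024, 6, 30, tzinfo=timezone.utc)
sample_events = [now - timedelta(days=age) for age in (5, 45, 120)]

# With a 180-day retention period nothing falls outside the window;
# reducing it to 30 days would drop the two older events.
kept_all = events_outside_retention(sample_events, 180, now)
lost_after_decrease = events_outside_retention(sample_events, 30, now)
```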
Delete a Lake Dataset
You can delete a Lake Dataset either from the Dataset table, or from an individual Dataset’s page.
To delete a Lake Dataset from the Dataset table:
- In the Cribl Lake sidebar, select Datasets.
- Select the check box next to the Dataset(s) you want to delete.
- Select Delete Selected Datasets.
Alternatively, select the Dataset’s row in the Dataset table to open its page, and then select Delete.
Scheduled Deletion
A Lake Dataset is not deleted instantly. Instead, once you select it for deletion, it will be marked with “Deletion in progress”. At this point you can no longer edit it, or connect it to Collectors or Destinations. Data that is older than 24 hours will be removed at midnight UTC the following day, while more recent data may be deleted after that time. Once a Lake Dataset is marked for deletion, you are no longer charged for any data that it contains.
You cannot delete built-in Lake Datasets, or Lake Datasets that have any connected Collectors or Destinations.
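The timing described above can be sketched as a small calculation. This example assumes that "midnight UTC the following day" means 00:00 UTC on the calendar day after the Dataset is marked for deletion; that reading is an interpretation of the text, and `earliest_removal_time` is a hypothetical helper, not a Cribl function.

```python
from datetime import datetime, timedelta, timezone


def earliest_removal_time(marked_at: datetime) -> datetime:
    """Return 00:00 UTC on the day after marked_at (interpreted in UTC).

    Assumption: this is when data older than 24 hours is first removed.
    More recent data may be deleted later than this.
    """
    if marked_at.tzinfo is None:
        raise ValueError("marked_at must be timezone-aware")
    marked_utc = marked_at.astimezone(timezone.utc)
    next_day = marked_utc.date() + timedelta(days=1)
    return datetime(next_day.year, next_day.month, next_day.day, tzinfo=timezone.utc)


# A Dataset marked for deletion at 15:30 UTC on March 10.
removal = earliest_removal_time(datetime(2024, 3, 10, 15, 30, tzinfo=timezone.utc))
```

Converting to UTC first matters when the marking time is recorded in a local time zone, because the UTC calendar day can differ from the local one.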
Lake Partitions and Lakehouse-Indexed Fields
Each Lake Dataset can have up to three partitions configured, and, if you’re using a Lakehouse, up to five Lakehouse-indexed fields, to speed up searching and replaying data from Cribl Lake.
A typical use case for partitions and indexed fields is to accelerate searches on hostname or sourcetype when you are receiving data from multiple hosts or sources.
You can use partitions and indexed fields both in Search queries and in the filter for runs of the Cribl Lake Collector.
Which Fields to Use as Partitions and Indexed Fields?
To ensure that search and replay work best, make sure you provide broader partitions or Lakehouse-indexed fields first. The order in which you configure partitions or indexed fields for a Lake Dataset influences the speed of search and replay of the data.
We also do not recommend partitions or indexed fields that:
- Contain PII (personally identifiable information).
- Contain objects.
- Have a name starting with an underscore (such as _raw).
For partitions, additionally avoid using fields that:
- Have high cardinality values.
- Are nullable (that is, can have the value null).
Don’t use source or dataset as partitions or Lakehouse-indexed fields. These are internal fields reserved for Cribl, and are not supported as partitions/indexed fields.
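Some of these recommendations can be checked from field names alone. The sketch below is a hypothetical pre-flight check: the name `check_fields` and the message wording are illustrative and are not part of Cribl. It covers only the count limits, the underscore rule, and the reserved names. Whether a field contains PII or objects, has high cardinality, or is nullable depends on your data and is not checked here.

```python
# Limits and reserved names taken from the text above.
RESERVED_FIELDS = {"source", "dataset"}
MAX_PARTITIONS = 3
MAX_INDEXED_FIELDS = 5


def check_fields(fields, limit):
    """Return a list of problems found in an ordered list of field names.

    limit is MAX_PARTITIONS for partitions or MAX_INDEXED_FIELDS for
    Lakehouse-indexed fields. An empty list means no problems were found
    by these name-based checks.
    """
    problems = []
    if len(fields) > limit:
        problems.append(f"too many fields: {len(fields)} > {limit}")
    for name in fields:
        if name in RESERVED_FIELDS:
            problems.append(f"{name}: reserved internal field")
        elif name.startswith("_"):
            problems.append(f"{name}: name starts with an underscore")
    return problems


# Broader field first, as recommended above.
ok = check_fields(["sourcetype", "hostname"], MAX_PARTITIONS)
bad = check_fields(["_raw", "source"], MAX_PARTITIONS)
```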
Datatypes for Lake Datasets
Lake Datasets, like regular Cribl Search Datasets, can be associated with Datatypes. Datatypes help separate Dataset data into discrete events, timestamp them, and parse them as needed.
To configure Datatypes for a Lake Dataset:
- On the top bar, select Products, and then select Search.
- In the sidebar, select Data.
- In the list of Datasets, select your Lake Dataset.
- Select the Processing tab.
- In the Datatypes list, add desired Rulesets via the Add Datatype Ruleset button. (The order of Rulesets matters.)
- When the list of Rulesets is complete, confirm with Save.
For more information about using Datatypes and Rulesets in searches, see Cribl Search Datatypes.