Cribl Lakehouses
To speed up searching data from Cribl Lake, you can assign specific Lake Datasets to Cribl Lakehouses. Lakehouses are a parallel option to regular Cribl Lake Datasets. When searching Lakehouses, Cribl Search returns results with much shorter latency than when searching regular Lake Datasets.
When you assign a Dataset to a Lakehouse, data ingested into the Dataset is simultaneously sent to a cache storage system, which allows for quicker retrieval. Data sent to the Lakehouse ages out after up to 30 days, calculated from the time of ingest into the Lakehouse.
For examples of how a Lakehouse can enable efficient storage and fast search of high-volume recent telemetry data, see: Lakehouse with VPC Flow Logs, Lakehouse with CloudTrail Logs, and Lakehouse with CloudFront Logs.
Availability
Cribl Lakehouse is available only in Cribl.Cloud Organizations on certain plans.
Lakehouse Sizes
Each Lakehouse has a size. The size defines the amount of data that can be ingested into and stored in the Lakehouse. When sizing your Lakehouse, consider your expected data ingest for Datasets associated with that Lakehouse. Then, select a size based on this estimate. (An undersized Lakehouse will be able to cache fewer days of data.)
The following sizes are available:
Size | Capacity (per day) |
---|---|
Small | 600 GB |
Medium | 1200 GB |
Large | 2400 GB |
XLarge | 4800 GB |
XXLarge | 9600 GB |
3XLarge (Contact Support) | 14 TB |
6XLarge (Contact Support) | 28 TB |
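To make the sizing decision concrete, here is a minimal sketch in Python that picks the smallest size whose daily capacity covers an estimated ingest volume. The helper name is illustrative, and the GB figures for the two largest sizes assume decimal units (1 TB = 1000 GB); the capacities themselves come from the table above.

```python
# Daily ingest capacity per Lakehouse size, from the table above.
# The two largest sizes assume decimal units (1 TB = 1000 GB).
LAKEHOUSE_CAPACITY_GB_PER_DAY = {
    "Small": 600,
    "Medium": 1200,
    "Large": 2400,
    "XLarge": 4800,
    "XXLarge": 9600,
    "3XLarge": 14_000,  # contact Support
    "6XLarge": 28_000,  # contact Support
}

def suggest_lakehouse_size(expected_gb_per_day: float) -> str:
    """Return the smallest size whose daily capacity covers the estimate.

    Relies on the dict preserving insertion order (Python 3.7+),
    so sizes are checked from smallest to largest.
    """
    for size, capacity in LAKEHOUSE_CAPACITY_GB_PER_DAY.items():
        if expected_gb_per_day <= capacity:
            return size
    raise ValueError("Estimated ingest exceeds the largest available size")

print(suggest_lakehouse_size(1500))  # -> Large
```

Leaving some headroom above your estimate is prudent, since sustained ingest beyond capacity degrades both ingest and search speed, as described next.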
Exceeding Lakehouse Capacity
When data flowing into a Lakehouse exceeds the Lakehouse's capacity, both ingest speed and search speed degrade. To prevent these impacts, avoid sustained excessive loads. If they do occur, consider expanding the Lakehouse's capacity, as covered in the next section.
Resize a Lakehouse
After creating a Lakehouse, if you decide you need more or less storage capacity, you can resize it.
To do so, on the Lakehouses page, select the existing Lakehouse and choose a different size.
When you resize a Lakehouse, allow some time for its resources to be reprovisioned. During this period, Lakehouse-cached searching will be unavailable, so any search against Datasets connected to the Lakehouse will proceed at the speed of a regular search. (Cribl Search will also consume credits for CPU time, because these searches are not covered by Lakehouse flat billing.) After the resize operation is complete, your Lakehouse-cached data will be searchable again.
Lakehouse Retention
Each Lakehouse covers a rolling window of up to 30 days, calculated from the time of ingest into the Lakehouse.
Lake Dataset Retention | Lakehouse Retention |
---|---|
1 day | 1 day |
30 days | 30 days |
120 days | 30 days |
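In other words, the effective cache window is the Dataset's retention capped at 30 days. A minimal sketch of that rule (the function name is illustrative):

```python
LAKEHOUSE_MAX_RETENTION_DAYS = 30  # rolling window, from time of ingest

def lakehouse_retention_days(dataset_retention_days: int) -> int:
    """Effective Lakehouse retention: Dataset retention, capped at 30 days."""
    return min(dataset_retention_days, LAKEHOUSE_MAX_RETENTION_DAYS)

for days in (1, 30, 120):
    print(days, "->", lakehouse_retention_days(days))  # 1, 30, 30
```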
Add a New Lakehouse
A Dataset can be assigned to only one Lakehouse. However, a Lakehouse can have more than one Dataset assigned to it.
To add a new Lakehouse, as an Organization Owner:
- In the Cribl Lake sidebar, select Lakehouses.
- Select Add Lakehouse.
- Enter an ID for the new Lakehouse and, optionally, a Description.
- Define the Size of the Lakehouse, based on expected ingest volume.
- You can change the size later.
- There are two larger Lakehouse sizes: 3XLarge and 6XLarge. To use one of these sizes, Contact Support.
The new Lakehouse will be fully active after it has been provisioned, which might take up to an hour.

Link a Lake Dataset to a Lakehouse
You can link one or more Lake Datasets to each Lakehouse. To do so, you need to be an Organization Admin or Lake Admin.
Linking Datasets is possible only after the Lakehouse has been fully provisioned.
You can link Datasets in two ways, which determine how the Dataset will behave in relation to the Lakehouse:
- Link an existing Dataset, to cache only data ingested into the Dataset from that point forward.
- Link a new Dataset, to mirror the Dataset’s contents.
Link an Existing Lake Dataset to a Lakehouse
To link an existing Dataset to a Lakehouse:
- In the sidebar, select Datasets.
- On the resulting Datasets page, select the Dataset you want to assign to a Lakehouse.
- In the Lakehouse drop-down, select the Lakehouse you want to assign the Dataset to.
- To get the most out of the Lakehouse, set the Dataset's Retention period to 30 days or longer, as Lakehouses cover a rolling window of up to 30 days.

When you assign an existing Lake Dataset to a Lakehouse, only data newly ingested to that Dataset will be sent to the Lakehouse.
If you link an empty existing Dataset to a Lakehouse, it will behave like a mirrored Dataset.
Link a New Lake Dataset to a Lakehouse
To link a Dataset to a Lakehouse while creating the Dataset:
- In the sidebar, select Datasets.
- Select Add Dataset above the Dataset table.
- Configure the Dataset as described in Create a New Dataset.
- In the Lakehouse drop-down, select the Lakehouse you want to link to.
Assigning a Dataset to a Lakehouse during creation results in a mirrored Dataset.
Mirrored Datasets
When you assign a Dataset to a Lakehouse while creating the Dataset, the Lakehouse will mirror the contents of the Dataset.
Data sent to a mirrored Dataset will be ingested into the Lakehouse regardless of the event's `event_time`. Search will always use the Lakehouse when searching a mirrored Dataset, unless you explicitly turn it off by setting the `lakehouse` option to `off`.
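The difference between a mirrored Dataset and one linked after creation can be summarized in a small sketch that models the caching rules described above. The types and field names here are illustrative, not Cribl internals:

```python
from dataclasses import dataclass

@dataclass
class Event:
    event_time: float   # the event's original timestamp
    ingest_time: float  # when the event arrived in the Dataset

def cached_in_lakehouse(event: Event, mirrored: bool,
                        linked_at: float) -> bool:
    """Model of which events land in the Lakehouse cache."""
    if mirrored:
        # Mirrored Datasets cache every event, regardless of event_time.
        return True
    # Datasets linked after creation cache only data ingested from then on.
    return event.ingest_time >= linked_at
```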
Insert Data into a Lakehouse
To populate a Lakehouse, see Prepare Data for Use with Lakehouse.
Delete a Lakehouse
To delete a Lakehouse:
- In the sidebar, select Lakehouses.
- Select the check box next to the Lakehouse(s) you want to delete.
- Select Delete Selected Lakehouses.
A Lakehouse is not deleted instantly. Instead, once you select the Lakehouse for deletion, it will be marked with “Deletion in progress”. At this point, you can no longer edit the Lakehouse, nor connect it to Datasets.
Data that is older than 24 hours will be removed at midnight UTC the following day, while more-recent data might be deleted after that time. Once a Lakehouse is marked for deletion, you are no longer charged to maintain any data that it contains.
When you remove a Lakehouse, its data will still be available to be searched and replayed from its standard Lake Dataset.
Lakehouse Status
A Lakehouse can have one of the following statuses:
Status | Description |
---|---|
Ready | Lakehouse can be linked to Datasets and used. You can resize it. |
Provisioning | Lakehouse is being prepared; you cannot resize it or link it to Datasets. |
Delayed | Provisioning the Lakehouse has encountered an issue, but it will move to Ready status when the issue is resolved. While the Lakehouse is Delayed, you can resize it, but you can’t link it to Datasets. |
Terminated | Lakehouse has been marked for deletion. |
Monitor Lakehouse Usage
To track your usage of Lakehouses, and to make informed decisions about resizing them, you can display charts that present data per Lakehouse.
Select the sidebar’s Monitoring link. On the resulting page, use the drop-down at the upper right to select among Lakehouses. You’ll see the following charts.
- Events: Number of events ingested into the Lakehouse.
- Throughput: Number of bytes ingested into the Lakehouse.
- Queries per Minute: Number of queries performed on the Lakehouse.
- Replication Latency: Latency of ingesting data into the Lakehouse.

On the Lakehouses page, you can see summary data for all configured Lakehouses. Ingestion Rate measures the past 24 hours, compared with each Lakehouse’s configured capacity.
Max Throughput graphically compares actual throughput to a guideline representing the recommended maximum, again based on the configured capacity.
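If you export these metrics, a simple headroom check like the sketch below can flag Lakehouses that are candidates for resizing. The 80% warning threshold is an illustrative choice, not a Cribl recommendation:

```python
def utilization(ingested_gb_last_24h: float,
                capacity_gb_per_day: float) -> float:
    """Fraction of configured daily capacity used over the past 24 hours."""
    return ingested_gb_last_24h / capacity_gb_per_day

# Example: a Large (2400 GB/day) Lakehouse that ingested 2000 GB yesterday.
if utilization(2000, 2400) > 0.8:
    print("Nearing capacity; consider resizing before ingest and search degrade")
```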
