Lakehouse Engines in Cribl Search

Add a lakehouse engine so you can ingest data directly into Cribl Search.


Highlights
  • Lakehouse engines store and accelerate your data within Cribl Search.
  • Storage auto-scales with your data volume and retention settings.
  • Choose an engine size that covers your raw, uncompressed daily ingest. Resize or add more engines as needed.

Lakehouse Engines Store and Accelerate Data

When you send data into Cribl Search with Sources, it’s parsed by Datatypes and stored in Search Datasets on a lakehouse engine.

A lakehouse engine is a storage-plus-compute unit that ingests, stores, and accelerates your data inside Cribl Search. It can keep your data hot for up to 10 years, with no storage tiering to manage.

Each lakehouse engine:

  • Accepts data from one or more Sources.
  • Uses Datatypes to break events into fields.
  • Stores data in Search Datasets until their retention expires.
  • Powers fast, schema-aware searches and AI workflows over that stored data.

Lakehouse Engine Sizes

To choose the right lakehouse engine size, think about the amount of raw, uncompressed data you expect per day, then include headroom for spikes and growth.

If your ingest rate changes, or you experience ingest or search latency, you can resize your lakehouse engine. If the available sizes are not enough, you can add more lakehouse engines to distribute the workload.

Lakehouse Engine Sizes Available

The ingest rate limit applies only to raw incoming data, not to fields you add or transform during processing.

| Lakehouse Engine Size | Maximum Ingest per Day |
|---|---|
| Nano | 75 GB |
| Micro | 150 GB |
| X-Small | 300 GB |
| Small | 600 GB |
| Medium | 1,200 GB |
| Large | 2,400 GB |
| X-Large | 4,800 GB |
| 2X-Large | 9,600 GB |
| 3X-Large (Contact Support) | 14 TB |
| 4X-Large (Contact Support) | 19 TB |
| 5X-Large (Contact Support) | 24 TB |
| 6X-Large (Contact Support) | 28 TB |
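
As a rough illustration of the sizing guidance, the following Python sketch picks the smallest self-service size whose daily limit covers an expected raw ingest plus headroom. The function name `pick_engine_size` and the 25% default headroom are assumptions made for this example, not Cribl recommendations. The limits are copied from the table of available sizes, and sizes that require contacting Support are left out.

```python
# Self-service sizes and their maximum raw ingest per day, in GB.
# Sizes from 3X-Large upward require contacting Support, so they are omitted.
ENGINE_SIZES_GB = [
    ("Nano", 75),
    ("Micro", 150),
    ("X-Small", 300),
    ("Small", 600),
    ("Medium", 1200),
    ("Large", 2400),
    ("X-Large", 4800),
    ("2X-Large", 9600),
]


def pick_engine_size(raw_gb_per_day, headroom=0.25):
    """Return the smallest listed size that covers ingest plus headroom.

    `headroom` is a fraction (0.25 means 25% extra) and is an assumed
    default for illustration only. Returns None when the target exceeds
    the largest self-service size.
    """
    if raw_gb_per_day < 0 or headroom < 0:
        raise ValueError("raw_gb_per_day and headroom must be non-negative")
    target_gb = raw_gb_per_day * (1 + headroom)
    for name, limit_gb in ENGINE_SIZES_GB:
        if target_gb <= limit_gb:
            return name
    return None


if __name__ == "__main__":
    for daily_gb in (100, 500, 9000):
        size = pick_engine_size(daily_gb)
        print(daily_gb, "GB/day ->", size or "larger size or additional engines needed")
```

A `None` result means the target is above 2X-Large, in which case the options described earlier apply: contact Support about a larger size, or add more lakehouse engines to distribute the workload.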

Lakehouse Engine Compression Ratio

Cribl Search compresses ingested data at rest. The exact compression ratio depends on many factors, including the shape and content of your events, but it typically falls between 10:1 and 12:1.

Lakehouse Engine Billing

Because engine size acts as a hard limit on ingest, your costs are bounded, with no surprises from traffic spikes. You can scale your lakehouse engine up or down at any time to match your actual data needs.

With each lakehouse engine, you’re charged for two things:

| Component | Billing Basis | How It’s Measured |
|---|---|---|
| Engine size | Maximum data ingest per day. | Measured at ingest, before compression, separately from any upstream Stream or Edge processing. |
| Storage | Amount of data retained over time. This auto-scales with your data volume and retention periods. | Measured after compression. Estimated compression ratio is 10:1 to 12:1. |

To estimate and optimize storage, set an individual retention period for each of your Search Datasets. See Plan Your Search Datasets for details.
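
For a back-of-the-envelope storage estimate, the sketch below multiplies each Search Dataset's raw daily volume by its retention period and divides by a compression ratio. It assumes steady daily ingest and that each Dataset already holds a full retention window. The function name, the Dataset names, and the volumes are hypothetical. Only the 10:1 to 12:1 range comes from this page, and actual compression depends on your data.

```python
def estimate_storage_gb(datasets, compression_ratio):
    """Estimate steady-state compressed storage in GB.

    `datasets` maps a Dataset name to (raw_gb_per_day, retention_days).
    Assumes constant daily ingest and a full retention window.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    raw_retained_gb = sum(
        gb_per_day * retention_days
        for gb_per_day, retention_days in datasets.values()
    )
    return raw_retained_gb / compression_ratio


# Hypothetical Datasets: (raw GB per day, retention in days).
datasets = {
    "firewall_logs": (100, 30),
    "audit_logs": (20, 365),
}

# Bracket the estimate with the typical ratios quoted on this page.
low_gb = estimate_storage_gb(datasets, 12)   # better compression
high_gb = estimate_storage_gb(datasets, 10)  # worse compression
print(f"Estimated storage: {low_gb:.0f} to {high_gb:.0f} GB")
```

In this example, the long-retention Dataset dominates the total even though its daily volume is smaller, which is why retention is set per Search Dataset.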

To see how engine size and storage translate to costs, see Cribl Search Pricing.

See also: How Lakehouse and Federated Engines Are Billed.

Ingest Is Measured at the Engine

A lakehouse engine counts every byte it receives, regardless of any processing that happened upstream. The engine has no visibility into what your data looked like before it arrived.

This matters most when you process data in Cribl Stream or Cribl Edge before sending it to a lakehouse engine. Common processing patterns can inflate the byte count that the engine sees:

  • Parsing raw events into structured formats like JSON or CSV.
  • Enriching events with extra fields or lookup values.
  • Keeping a copy of _raw alongside the parsed event.

For example, if Stream receives 1 GB of raw data and reshapes it into a more verbose format, the lakehouse engine can end up ingesting 2 GB or 3 GB. Stream’s upstream ingest number and the engine’s ingest number measure different things, so they don’t have to match.
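
To see why the byte count grows, the following Python sketch compares the size of a raw event with the size of the same event after hypothetical upstream processing: parsing key=value pairs into JSON, adding an enrichment field, and keeping a copy of _raw. The sample event, the `site` field, and the `reshape` function are invented for illustration and do not represent how Stream or Edge process data.

```python
import json


def byte_size(text):
    """Size of the text in bytes when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def reshape(raw):
    """Hypothetical upstream processing: parse, enrich, and keep _raw."""
    # Parse key=value tokens into fields.
    fields = dict(token.split("=", 1) for token in raw.split() if "=" in token)
    # Enrich with an extra field (illustrative value).
    fields["site"] = "nyc-dc1"
    # Keep a copy of the original event alongside the parsed fields.
    fields["_raw"] = raw
    return json.dumps(fields)


def inflation_factor(raw, processed):
    """Ratio of processed bytes to raw bytes."""
    return byte_size(processed) / byte_size(raw)


raw_event = "<134>Jan 12 10:15:32 fw01 TRAFFIC src=10.0.0.5 dst=8.8.8.8 action=allow"
processed_event = reshape(raw_event)

print("raw bytes:      ", byte_size(raw_event))
print("processed bytes:", byte_size(processed_event))
print("inflation:      ", round(inflation_factor(raw_event, processed_event), 2))
```

Because the processed event contains the full raw event plus field names, values, and JSON punctuation, it is always larger than the input here. Running a similar comparison on a sample of your own events gives a rough multiplier to apply when sizing an engine behind Stream or Edge.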

If you don’t need Stream or Edge processing before storage, you can send data straight to a lakehouse engine. Cribl Search has native Sources such as Syslog, Splunk HEC, OpenTelemetry, and Raw HTTP that keep the engine’s ingest count aligned with the raw data volume you expect. Search also parses incoming events through Auto-Datatyping and Custom Datatypes v2, so in many cases you’re better off skipping Stream or Edge pre-processing altogether.

Lakehouse Engine Retention

You set retention for each Search Dataset individually, not for the lakehouse engine as a whole. Each Search Dataset can keep data for 1 day to 10 years, and its storage scales accordingly. After the retention period ends, Cribl Search deletes the data.

For details, see Create Search Datasets and Organize Data with Dataset Rules.

Add a New Lakehouse Engine

Search Admins and above can add lakehouse engines from the Cribl Search Engines tab.

  1. On the Cribl.Cloud top bar, select Products > Search > Data.
  2. Select the Engines tab, then Add Engine.
    Adding a lakehouse engine in Cribl Search
  3. Give your engine an ID (for example, palo_alto_logs) unique across your Workspace. You won’t be able to change it later.

    The main ID is reserved.

  4. Set the Lakehouse Engine Size. You can resize it later if needed.
  5. Confirm with Save.

When the lakehouse engine status is Ready, you can create Search Datasets and connect your Sources.

Check Lakehouse Engine Status

You can check the status of a lakehouse engine in the Cribl Search Engines tab.

  1. From the Cribl.Cloud top bar, select Products > Search > Data > Engines.
  2. Check the Status column.

| Status | Meaning |
|---|---|
| Provisioning | Setting up the engine. |
| Delayed | Setup is taking longer than expected. |
| Failed | Engine hit an error and can’t recover. |
| Ready | Engine is fully operational. |
| Blocked | Engine is down and trying to recover. |
| Resizing | Engine size is being changed. |
| Terminated | Engine is being deleted. |

Resize a Lakehouse Engine

Search Admins and above can resize lakehouse engines from the Cribl Search Engines tab.

  1. On the Cribl.Cloud top bar, select Products > Search > Data > Engines.
  2. Select the lakehouse engine you want to resize.
  3. Set the new lakehouse engine Size. See Lakehouse Engine Sizes.
  4. Confirm with Save.

Wait until the lakehouse engine status changes from Resizing to Ready again.
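
If you automate around a resize, the waiting step can be sketched as a polling loop like the one below. This page does not describe an API for reading engine status, so `get_status` is a hypothetical callable that you would have to supply yourself. The poll interval and timeout are arbitrary defaults, and the status names are taken from the status table on this page.

```python
import time

# Statuses from which the engine is not expected to become Ready.
TERMINAL_FAILURES = {"Failed", "Terminated"}


def wait_until_ready(get_status, poll_seconds=30.0, timeout_seconds=3600.0,
                     sleep=time.sleep):
    """Poll a caller-supplied status function until it returns "Ready".

    `get_status` is a hypothetical callable returning the status string.
    Raises RuntimeError on Failed or Terminated, TimeoutError on timeout.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status == "Ready":
            return status
        if status in TERMINAL_FAILURES:
            raise RuntimeError(f"engine ended in status {status!r}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"engine still {status!r} after {waited:.0f} s")
        # Provisioning, Delayed, Blocked, and Resizing: keep waiting.
        sleep(poll_seconds)
        waited += poll_seconds


if __name__ == "__main__":
    # Simulated status sequence standing in for a real status lookup.
    simulated = iter(["Resizing", "Resizing", "Ready"])
    print(wait_until_ready(lambda: next(simulated), poll_seconds=0.01))
```

Treating Blocked as a status to wait through follows the table's description that the engine is trying to recover. A stricter script could stop on Blocked instead.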

Delete a Lakehouse Engine

Search Admins and above can delete lakehouse engines from the Cribl Search Engines tab. This is irreversible.

When you delete a lakehouse engine, here’s what happens to its Search Datasets:

  • All Search Datasets on the engine are removed, along with the data they contain.
  • If the engine hosts the main Dataset, main moves to another lakehouse engine that you pick, but all of its data is wiped. main keeps its ID, retention, and schema, but you start ingesting data from scratch.

To delete a lakehouse engine:

  1. On the Cribl.Cloud top bar, select Products > Search > Data > Engines.
  2. Select the lakehouse engine you want to delete.
  3. Select Delete Engine.
  4. If the engine hosts the main Dataset, select another lakehouse engine in Move Dataset to, then select Next. If you don’t have another lakehouse engine, add one first.
  5. Type DELETE to confirm, then select Delete Engine again.

Next Steps

Now that your lakehouse engine is ready, create Search Datasets to organize your data and set retention.