Home / Search/ Troubleshooting/Lakehouse Search Differences

Lakehouse Search Differences

Identify Cribl Search behavior that differs when searching Lakehouses versus standard Cribl Lake Datasets.


When you run Cribl Search queries against a Lakehouse-assigned Dataset, some behavior and results differ from a corresponding search executed against a Cribl Lake Dataset without Lakehouse caching. This page identifies operators, functions, data types, and other details to keep in mind when searching Lakehouses.

You can disable using Lakehouse for a specific query by using the set statement with the lakehouse option set to off.

Unlimited Events with sort

The sort operator’s MaxNoOfOutputEvents parameter defaults to unlimited with Lakehouse, rather than to 10000 events as in a regular search. However, you can set this limit parameter as desired.

Strict Boolean Comparisons

Boolean comparisons in Lakehouse searches are strongly typed. Assume a query condition like field1==field2, in which one field is a boolean, while the other is a boolean-compatible value (such as "yes", "y", "t", "true", or "1" for true; or "no", "f", "false", or "0" for false).

In a mixed comparison – for example, where field1’s value is true and field2’s value is "yes" – note that field1==field2 will not match. (When searching the same data without Lakehouse caching, looser typing allows these values to match.)

Statistical Aggregations with Booleans

A statistical aggregation that aggregates mathematically on a boolean value will return a null in Lakehouse searches. (Running the same aggregation without Lakehouse caching will return a numeric value – either 1 for true or 0 for false.)

This null return value applies to the following functions when their Expression argument takes a boolean value: avg, percentile, stdev, stdevp, sum, variance, and variancep. (A boolean value in these functions’ Predicate argument does not cause a null return value.)

The null return value also occurs with a boolean value in the following functions’ sole Expression argument: avgif, stdevif, sumif, and varianceif.

dcount Excludes null Values

When applying the dcount aggregation function, Lakehouse searches do not count null values toward the total, whereas searches without Lakehouse caching do.

This means that any null values in your data will cause the dcount value to be off by 1, compared to running the same query against the same data without Lakehouse caching (or in a different data store).

Extra null Column with timestats

Applying the timestats aggregation function to a search against a Lakehouse-assigned Dataset might return a null column when only empty values are found. Searches without Lakehouse caching do not return this column.

Higher dedup and eventstats Result Counts

Applying the dedup or eventstats operator to a search against a Lakehouse-assigned Dataset can return a higher result count, compared to searching the same data without Lakehouse caching. This is because dedup and eventstats in Lakehouse mode can process unlimited events, compared to the 50,000-event limit applied to dedup and eventstats in searches without caching.

Different extract type=regex overwrite Behavior

When using the extract operator with type=regex and the overwrite option for existing event fields, the behavior is different with and without Lakehouse caching. Without Lakehouse, if overwrite is set to false, the extraction will convert any existing field values to an array. However, Lakehouse searches will omit any existing field values, regardless of whether overwrite is set to true or false.

For example, if you extract key src_ip with value 10.2.3.3, and the field already exists with value 10.1.2.2, a non-Lakehouse search with ow=false will yield a field value of ["10.1.2.2", "10.2.3.3"]. But a Lakehouse search will yield a field value of 10.2.3.3.

To consistently preserve the field’s original value, you can assign that value to a different variable earlier in the query.