Lakehouse Search Differences
Identify Cribl Search operators, functions, and data types that behave differently when searching Lakehouses.
When you run Cribl Search queries against a Lakehouse-assigned Dataset, some behavior and results differ from a corresponding search executed against a Cribl Lake Dataset without Lakehouse caching. This page identifies operators, functions, data types, and other details to keep in mind when searching Lakehouses.
Unlimited Events with sort
The sort operator’s MaxNoOfOutputEvents parameter defaults to unlimited with Lakehouse, rather than to 10000 events as in a regular search. However, you can set this limit parameter as desired.
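As a rough illustration of the default described above, the following Python sketch models the output cap. It is a toy model under stated assumptions, not Cribl's implementation: the function sort_events, its lakehouse flag, and the idea that the cap keeps the first N events after sorting are all hypothetical.

```python
from typing import Any, Optional

# Default cap quoted in the text for a regular (non-Lakehouse) search.
REGULAR_DEFAULT_LIMIT = 10000


def sort_events(
    events: list[dict[str, Any]],
    key: str,
    lakehouse: bool,
    max_output_events: Optional[int] = None,
) -> list[dict[str, Any]]:
    """Toy model of the sort output cap (hypothetical helper, not a Cribl API)."""
    # With no explicit limit, only the regular search falls back to a cap.
    if max_output_events is None and not lakehouse:
        max_output_events = REGULAR_DEFAULT_LIMIT
    ordered = sorted(events, key=lambda e: e[key])
    # Assumption: the cap keeps the first N events after sorting.
    return ordered if max_output_events is None else ordered[:max_output_events]


events = [{"n": i} for i in range(12000, 0, -1)]
regular_count = len(sort_events(events, "n", lakehouse=False))
lakehouse_count = len(sort_events(events, "n", lakehouse=True))
capped_count = len(sort_events(events, "n", lakehouse=True, max_output_events=500))
```

In this model, the regular search returns 10000 of the 12000 events, the Lakehouse search returns all of them, and an explicit limit applies in either mode.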
Strict Boolean Comparisons
Boolean comparisons in Lakehouse searches are strongly typed. Assume a query condition like field1==field2, in which one field is a boolean, while the other is a boolean-compatible value (such as "yes", "y", "t", "true", or "1" for true; or "no", "f", "false", or "0" for false).
In a mixed comparison – for example, where field1’s value is true and field2’s value is "yes" – note that field1==field2 will not match. (When searching the same data without Lakehouse caching, looser typing allows these values to match.)
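The difference between the two comparison modes can be sketched in Python. This is only an illustrative model: the helper names (as_bool, loose_equals, strict_equals) are hypothetical, and treating the boolean-compatible strings as case-sensitive is an assumption.

```python
from typing import Any

# Boolean-compatible strings listed in the text above.
TRUE_STRINGS = {"yes", "y", "t", "true", "1"}
FALSE_STRINGS = {"no", "f", "false", "0"}


def as_bool(value: Any) -> Any:
    """Coerce a boolean-compatible string to bool; return other values unchanged."""
    if isinstance(value, str):
        if value in TRUE_STRINGS:
            return True
        if value in FALSE_STRINGS:
            return False
    return value


def loose_equals(a: Any, b: Any) -> bool:
    """Model of a search without Lakehouse caching: mixed operands are coerced."""
    # Coerce only when exactly one side is a real boolean (the mixed case).
    if isinstance(a, bool) != isinstance(b, bool):
        a, b = as_bool(a), as_bool(b)
    return a == b


def strict_equals(a: Any, b: Any) -> bool:
    """Model of a Lakehouse search: operands must share a type to match."""
    return type(a) is type(b) and a == b


mixed_loose = loose_equals(True, "yes")    # looser typing lets these match
mixed_strict = strict_equals(True, "yes")  # strong typing: no match
same_type_strict = strict_equals(True, True)
```

If a query depends on such mixed comparisons, normalizing both fields to the same type before comparing avoids the discrepancy in this model.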
Statistical Aggregations with Booleans
A statistical aggregation that aggregates mathematically on a boolean value will return a null in Lakehouse searches. (Running the same aggregation without Lakehouse caching will return a numeric value – either 1 for true or 0 for false.)
This null return value applies to the following functions when their Expression argument takes a boolean value: avg, percentile, stdev, stdevp, sum, variance, and variancep. (A boolean value in these functions’ Predicate argument does not cause a null return value.)
The null return value also occurs with a boolean value in the following functions’ sole Expression argument: avgif, stdevif, sumif, and varianceif.
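The following Python sketch illustrates the behavior described in this section for sum and avg. It is a simplified model, not Cribl's implementation: the function names and the lakehouse flag are hypothetical, and the sketch assumes that any boolean in the aggregated values is enough to produce null.

```python
from typing import Optional, Sequence, Union

Value = Union[bool, int, float]


def _has_boolean(values: Sequence[Value]) -> bool:
    return any(isinstance(v, bool) for v in values)


def sum_agg(values: Sequence[Value], lakehouse: bool) -> Optional[float]:
    """Toy model of sum over an Expression that may yield booleans."""
    if lakehouse and _has_boolean(values):
        return None  # Lakehouse: aggregating on a boolean returns null
    # Without caching, true counts as 1 and false as 0.
    return float(sum(int(v) if isinstance(v, bool) else v for v in values))


def avg_agg(values: Sequence[Value], lakehouse: bool) -> Optional[float]:
    """Toy model of avg, built on the same rule as sum_agg."""
    total = sum_agg(values, lakehouse)
    return None if total is None else total / len(values)


flags = [True, False, True, True]
sum_regular = sum_agg(flags, lakehouse=False)
sum_lakehouse = sum_agg(flags, lakehouse=True)
avg_regular = avg_agg(flags, lakehouse=False)
avg_lakehouse = avg_agg(flags, lakehouse=True)
```

In this model, converting the boolean to a number before aggregating would give the same numeric result in both modes.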
dcount Excludes null Values
When applying the dcount aggregation function, Lakehouse searches do not count null values toward the total, whereas searches without Lakehouse caching do.
This means that if your data contains any null values, the dcount result will be lower by 1 than when running the same query against the same data without Lakehouse caching (or in a different data store).
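The off-by-1 effect can be shown with a small Python model. This is a sketch only: the dcount function below is a hypothetical exact distinct count with a lakehouse flag, and it says nothing about how Cribl Search actually computes dcount.

```python
from typing import Hashable, Optional, Sequence


def dcount(values: Sequence[Optional[Hashable]], lakehouse: bool) -> int:
    """Toy exact distinct count; None stands in for null."""
    distinct = set(values)
    if lakehouse:
        # Lakehouse searches do not count null toward the total.
        distinct.discard(None)
    return len(distinct)


hosts = ["web-1", "web-2", None, "web-1", None]
regular_total = dcount(hosts, lakehouse=False)   # web-1, web-2, null
lakehouse_total = dcount(hosts, lakehouse=True)  # web-1, web-2
```

However many null values appear, they count as a single distinct value, so the two totals differ by exactly 1 when nulls are present and not at all when they are absent.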
Extra null Column with timestats
Applying the timestats aggregation function to a search against a Lakehouse-assigned Dataset might return a null column when only empty values are found. Searches without Lakehouse caching do not return this column.
Higher dedup and eventstats Result Counts
Applying the dedup or eventstats operator to a search against a Lakehouse-assigned Dataset can return a higher result count, compared to searching the same data without Lakehouse caching. This is because dedup and eventstats in Lakehouse mode can process unlimited events, compared to the 50,000-event limit applied to dedup and eventstats in searches without caching.
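To see why the result counts diverge, consider this Python sketch of dedup. It is a toy model under stated assumptions: the dedup function and its lakehouse flag are hypothetical, and applying the 50,000-event limit by truncating the input is an assumption about how the limit behaves.

```python
from typing import Any

# Event limit quoted in the text for searches without caching.
NON_CACHED_EVENT_LIMIT = 50_000


def dedup(events: list[dict[str, Any]], key: str, lakehouse: bool) -> list[dict[str, Any]]:
    """Keep the first event seen for each distinct value of key."""
    # Assumption: without caching, only the first 50,000 events are processed.
    window = events if lakehouse else events[:NON_CACHED_EVENT_LIMIT]
    seen: set[Any] = set()
    unique: list[dict[str, Any]] = []
    for event in window:
        if event[key] not in seen:
            seen.add(event[key])
            unique.append(event)
    return unique


events = [{"id": i} for i in range(60_000)]  # every id is distinct
regular_results = len(dedup(events, "id", lakehouse=False))
lakehouse_results = len(dedup(events, "id", lakehouse=True))
```

With 60,000 distinct events, the model returns 50,000 results without caching and all 60,000 in Lakehouse mode. For inputs under the limit, both modes return the same count.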