Home / Search/ Language Reference/ Functions/ Statistical Functions/dcountif

dcountif

The dcountif aggregation function calculates an estimate of the number of distinct values of Expression of rows for which Predicate evaluates to true.

Use this function with the summarize, eventstats, and timestats operators.

Syntax

    dcountif( Expression, Predicate, [, Accuracy] )

Arguments

  • Expression: Expression used for aggregation calculation. Use any scalar expression that returns a bool value. Wildcards are not supported for field names.
  • Predicate: Expression used to filter rows.
  • Accuracy: Optional. An integer that defines the requested estimation accuracy. If unspecified, the default value is 1.

Results

Returns an estimate of the number of distinct values of Expression of rows for which Predicate evaluates to true in the group.

dcountif could return an error in cases where all or none of the rows pass the Predicate expression.

Example

This example summarizes the estimated cardinality of destination ports 80 and 443, by destination address:

dataset="cribl_search_sample" 
| summarize distinctCountWebPorts=dcountif(dstport, dstport in (80,443)) by dstaddr

Estimation Accuracy

This function uses a variant of the HyperLogLog (HLL) algorithm, which does a stochastic estimation of set cardinality. The algorithm provides a knob that can be used to balance accuracy and execution time per memory size:

AccuracyError (%)Entry count
01.6212
10.8214
20.4216
30.28217
40.2218

The Entry count column is the number of 1-byte counters in the HLL implementation.

The algorithm includes some provisions for doing a perfect count (zero error), if the set cardinality is small enough:

  • When the accuracy level is 1 – 1,000 values are returned
  • When the accuracy level is 2 – 8,000 values are returned

The error bound is probabilistic, not a theoretical bound. The value is the standard deviation of error distribution (the sigma), and 99.7% of the estimations will have a relative error of under 3 x sigma.

The following image shows the probability distribution function of the relative estimation error, in percentages, for all supported accuracy settings:

Search page
Search page