Home / Search/ Search Your Data/Common Query Examples

Common Query Examples

Learn common ways of searching data.


Show n Rows

Let’s see some data. What’s in a random sample of five rows?

dataset="cribl_search_sample"
 | take 5

Select a Subset of Fields

Use project to pick out only the fields you want.

dataset="cribl_search_sample"
 | project method, source, status, url

Returns:

methodsourcestatusurl
GETfile://opt/cribl/log/access.log200/api/v1/health

Filter by Expression

We’ll use an expression to find specific field and value matches. Let’s see only GET events in the access.log:

dataset="cribl_internal_logs" source="*/access.log" method="GET"
 | project method, source, status, url

Returns:

methodsourcestatusurl
GETfile://opt/cribl/log/access.log200/api/v1/health
GETfile://opt/cribl/log/access.log

Compute Derived Fields

Create a new field by computing a value in every row:

dataset="cribl_search_sample"
 | limit 5
 | extend query=query()

This query uses the context function query().

It’s possible to reuse a column name and assign a calculation result to the same column.

Example:

 | extend x = x + 1, y = x
 | extend x = x + 1

Aggregate Groups of Rows

To count the number of events by src we’ll use the count function.

dataset="cribl_search_sample"
 | limit 1000
 | summarize cnt=count() by srcaddr

Returns:

srccnt
10.255.0.2482978

The summarize operator groups together rows that have the same values in the by clause, and then uses an aggregation function (for example, count) to combine each group in a single row. In this case, there’s a row for each src and a field for the count of rows in that src.

You can use several aggregation functions in one summarize operator to produce several computed columns. For example, we could get the count of events per src, and the sum of unique types of events per src.

dataset="cribl_search_sample" dataSource="access*"
 | limit 1000
 | summarize eventCount=count(), useragentCount=dcount(useragent) by clientip

Returns:

srceventCountmessage
10.255.0.24856102

In the results of a summarize operator:

  • Each field is named in the by clause.
  • Each computed expression has a field.
  • Each combination of by values has a row.

Display Multiple Series

Use multiple values in a summarize by clause to create a separate row for each combination of values:

dataset="cribl_search_sample" dataSource="access_combined"
 | limit 1000
 | summarize count() by host, clientip

Display a Time Series

The timestats operator aggregates events by time periods or bins, which is excellent for displaying a time series. Here we slice the results into one-minute sections, or bins:

dataset="cribl_search_sample"
 | limit 1000
 | timestats span=1m

Display Results as Events or Tables

Use the render operator to display results as a list of events or a table.

To display results as a list of events under the Events tab:

dataset=$vt_dummy event<100
 | extend parity=iif(event%2==0, 'even', 'odd')
 | project event, parity
 | render table

To display results as a table under the Chart tab:

dataset=$vt_dummy event<100
 | extend parity=iif(event%2==0, 'even', 'odd')
 | project event, parity
 | render event

Search Cribl Lake

You can search Cribl Lake Datasets the same way you use regular Cribl Search Datasets:

dataset="my_lake_dataset"

See Searching Cribl Lake for more examples.

Search Arrays

In this example, arr1 is an array of strings, and arr2 is an array of numbers:

dataset="ClickhouseData"
arr1[0]='s1'
arr2[0]=1
| limit 1000

As shown here, you can reference specific elements in an array – using zero-based index notation – to retrieve only the data you specify. Your query must specify the field name or data point at each array index. (Cribl Search does not support querying on the index to discover array members.)

Where a Dataset Provider returns array columns as strings, Cribl Search parses these columns back to arrays.

Search Nested JSON Objects

In this example, bar is a field nested within three levels of JSON objects:

dataset="my_warehouse" json1.obj2.nestedObj2.nestedObj3='bar'
 | limit 1000

Subqueries

Subqueries allow you to generate different sets of results in a single search. This lets you perform dynamic lookups with no need to create static, stored lookup tables.

You can use subqueries to append one set of results to another, through the union operator. You can also use them to merge data coming from different Datasets and Dataset Providers, with the help of the join operator.

There are two types of subqueries:

Inline Subqueries

An inline subquery is nested inside another query, using round brackets ().

You may find inline subqueries easier to write than let subqueries, but:

  • You can reference their results only as vectors, using union or join.
  • If you opt to use the cribl operator, you must explicitly include it in the inline subquery.
// main query
dataset=$vt_dummy event<10
 | extend foo=42
 | union (
    // inline subquery
    cribl dataset=$vt_dummy event<10
     | extend bar=24
 )

You could achieve the same result as above by using a let subquery:

let subquery = dataset=$vt_dummy event<10
 | extend bar=24;

dataset=$vt_dummy event<10
 | extend foo=42
 | union subquery

The following example uses an inline subquery to join two different scopes of data:

dataset=$vt_dummy event<10
 | extend foo=42
 | join (
    // inline subquery
    cribl dataset=$vt_dummy event<10
     | extend bar=24
 ) on event;

let Subqueries

You define let subqueries at the beginning of a query, using let statements. You can then reference them by name, like this:

// each let statement ends with a semicolon
let x = 1000;

dataset="cribl_search_sample"
 | limit x

Depending on what a let subquery evaluates to, you can reference it as a vector or as a scalar.

Similarly to inline subqueries, you can use let subqueries with the join and union operators:

let subquery = dataset="$vt_dummy" event < 10
 | extend bar=24;

dataset="$vt_dummy" event < 10
 | extend foo=42
 | join subquery on event

let subqueries can also reference one another, for example:

let subquery = dataset="$vt_dummy" event < 1
 | extend bar=24;

dataset="$vt_dummy" event < 10
 | extend foo=42, subqueryBar=subquery.bar

Reference a let Subquery as a Vector

When a let subquery evaluates to a vector, you can use its results to filter the results of your main query, with help of the in/!in/in~/!in~ operators.

Mind that this functionality is not supported by the implicit cribl operator, so you need to use search, find, or where. For example:

// dummy list of blocklisted IPs
let pretendBlocklistedIps = dataset="cribl_search_sample" dataSource="vpcflowlogs"
 | limit 10
 | distinct srcaddr
 | project blocklistedIp=srcaddr;

dataset="cribl_search_sample" dataSource="vpcflowlogs"
 | where srcaddr in pretendBlocklistedIps.blocklistedIp;

Reference a let Subquery as a Scalar

To reference a let subquery as a scalar, mind the following:

  • The subquery needs to produce exactly one event.
  • You must specify a field, using dot notation (subquery1.text). You can also reference a nested field (subquery1.headers.test).
  • The field you specify must be a scalar value, not an object.
  • You can’t specify fields that don’t exist.

See the following example, which calculates the 95th percentile of requests in the Cribl internal logs.

let num_percentile = 95;

let response_time_percentile = dataset="cribl_internal_logs" method status response_time
 | summarize total=count(), response_time=percentile(response_time, num_percentile);

let long_responses = dataset="cribl_internal_logs" method status response_time
 | where response_time > response_time_percentile.response_time
 | summarize count=count(), maxResponseTime=max(response_time);

print strcat('There are ', long_responses.count, ' requests in the ', num_percentile, ' percentile. Total requests were: ', response_time_percentile.total, '. Highest response time was: ', long_responses.maxResponseTime, 'ms');

Subqueries and Time Ranges

A subquery can have its own time range. However, the time range of the main query (either set in the query itself, or configured through the UI) still applies to the final results.

For example, the following search won’t get you any earlysample results, because they’re outside of the time range of the main query:

// subquery time range between 50 and 45 minutes ago
let earlysample = dataset="cribl_search_sample" earliest=-50m@m latest=-45m@m
 | extend stage="one"
 | limit 100;

// main query time range between 10 and 5 minutes ago
dataset="cribl_search_sample" earliest=-10m@m latest=-5m@m
 | extend stage="two"
 | limit 100
 | union earlysample

To go around this, you can apply that second time range to another subquery, and then let the main query contain both time ranges:

let earlysample = dataset="cribl_search_sample" earliest=-50m@m latest=-45m@m
 | extend stage="one"
 | limit 100;

let latesample = dataset="cribl_search_sample" earliest=-10m@m latest=-5m@m
 | extend stage="two"
 | limit 100;

dataset="cribl_search_sample" earliest=-50m@m latest=-5m@m
 | limit 100
 | union earlysample, latesample