Optimize Searches
Get strategies and tips for optimizing your searches.
The practices on this page help your searches run faster and use fewer resources.
Constrain Initial Query with Limit, Count, or Time Window
Until you’re certain how large your Dataset is, add a limit, count(), or time limit to your initial query.
Instead of this:
dataset="cribl_search_sample"
Do this:
dataset="cribl_search_sample" | limit 1000
Or this:
dataset="cribl_search_sample" dataSource="access_combined"
| limit 1000
| summarize count() by host, clientip
Or this, to constrain results to a 2-month window:
dataset="cribl_search_sample" earliest=-2mon
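These constraints can also be combined. A minimal sketch, assuming the time modifier and the limit operator compose the way the examples above suggest:

```
dataset="cribl_search_sample" earliest=-2mon | limit 1000
```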
Unstructured Searches Are Fastest
Unstructured term searches (a bare quoted string) return faster than structured searches (a field="value" match). Consider using a broad term search, with a limit, in initial investigations.
Instead of this:
dataset="my_dataset" email="jim@company.com" | limit 1000
Do this:
dataset="my_dataset" "jim@company.com" | limit 1000
Place Filters Early in Queries
Try to push filtering expressions to the left-most side of your query wherever possible.
Instead of this:
dataset="<my-datasource>"
| where tenantId == "foo-bar-12345"
| where proc == "bash"
| where data_source == "stdout"
Do this:
dataset="<my-datasource>" tenantId="foo-bar-12345" proc="bash" data_source="stdout"
Place Intensive Operations Last
To minimize costs, try to move functions like lookup and sort to the right-most part of a query. First, filter or aggregate your Dataset to reduce the data volume sent to these functions.
Instead of this:
dataset="<my-datasource>" dataSource="VPC Flow Logs" | lookup service_names on dst_port | summarize count() by service_name
Do this:
dataset="<my-datasource>" dataSource="VPC Flow Logs" | summarize count() by dst_port | lookup service_names on dst_port
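The reordered query groups by dst_port rather than service_name, so ports that map to the same service appear as separate rows. If one row per service is needed, a second aggregation over the already-reduced result can roll them up. A sketch, assuming that summarize accepts named aggregations (the column name events is illustrative, not from the original) and that the lookup adds a service_name field:

```
dataset="<my-datasource>" dataSource="VPC Flow Logs"
| summarize events=count() by dst_port
| lookup service_names on dst_port
| summarize events=sum(events) by service_name
```

The lookup still runs once per distinct port rather than once per raw event, and the final summarize only touches that small intermediate result.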
Summarize to Search Faster
Any search that returns a large number of raw events to the UI will be slower than one that returns a summarized result set.
Instead of this:
dataset="my_data"
Do this:
dataset="my_data" | summarize count() by cid
Summarize Before Join
When using joins with aggregation operators, it’s more efficient to summarize-then-join rather than join-then-summarize.
Instead of this:
let URLMethods=dataset="access_common_data";
dataset="my-datasource" | join URLMethods on URL | summarize count() by method
Do this:
let URLMethods=dataset="access_common_data" | summarize count() by URL, method;
dataset="my-datasource" | summarize count() by URL, method | join URLMethods on URL
Search Faster with Comma Syntax
Many operators can evaluate several expressions in a single call if you separate the expressions with commas instead of repeating the operator across multiple pipes.
Instead of this:
... | extend field1="foo" | extend field2="bar" | extend field3="pike"
Do this:
... | extend field1="foo", field2="bar", field3="pike"
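The same pattern plausibly applies to operators other than extend. A sketch for summarize, assuming it accepts several comma-separated aggregations and that the events carry bytes and host fields (both hypothetical here):

```
dataset="my_data" | summarize count(), sum(bytes) by host
```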
Move Partition Tokens to Queries
Token-based partitions in your Dataset can drastically increase your search time if the directory paths are very broad. If you instead specify the tokens as part of your search, this will reduce the search span to only those sub-trees. Ideally, keep time to the left-most portion of your path, and keep tokens to the right wherever possible, as shown here.
If this is your Dataset definition:
data/${dataSource}/${_time:%Y}/${_time:%m}/${_time:%d}/${_time:%H}
Then instead of this:
dataset="myDataset" | summarize count() by destination | where dataSource="cisco"
Do this:
dataset="myDataset" dataSource="cisco" | summarize count() by destination
Send Exclusively to Cribl Stream to Speed Up Large Result Sets
When search results expand beyond several thousand events, sending the results to Cribl Stream via the send operator is faster than returning events to the Cribl Search UI.
Instead of this:
dataset="my_data" | send tee=true
Do this:
dataset="my_data" | send
Optimize Parquet with project
When searching Parquet files, use project in the second query clause to narrow the subsequent expressions to only your fields of interest.
Instead of this:
dataset="a-parquet-datasource" | summarize sum(bytes) by customer, account
Do this:
dataset="a-parquet-datasource" | project bytes, customer, account | summarize sum(bytes) by customer, account