Cribl - Docs

Getting started with Cribl LogStream

Sampling

Sampling at index-time

Let's say that you want to troubleshoot and analyze highly verbose, high-volume data (for example, CDN logs, ELB access logs, or VPC flow logs), but you are concerned about storage requirements and search performance. With Sampling, you can bring in enough events for your analysis to remain statistically significant while still being able to do all the necessary troubleshooting.
See the example below, or for more details: Access Logs and Firewall Logs

Sampling Example

Let's use the out-of-the-box Sampling function to sample all events from sourcetype=='access_combined' where status is 200 at 5:1 (and all others at 1:1). This should lower the volume of verbose successes (200s) while still bringing in ALL potentially erroneous events (400s, 500s, etc.) that can be used for troubleshooting.

  • First, make sure you have a route and pipeline configured to match the desired events.
  • Next, let's add a Regex Extract function, extract the status field from _raw, and call it __status. Remember, fields that start with __ are special fields in Cribl and can be used anywhere in a pipeline.
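
To make the extraction step concrete, here is a minimal Python sketch of what pulling the status out of _raw amounts to. It is not the Regex Extract function's actual configuration: the pattern, the extract_status helper, and the sample event are all assumptions for illustration, based on the usual combined access log layout in which the three-digit status follows the quoted request line.

```python
import re

# Hypothetical pattern (an assumption, not Cribl's built-in regex):
# in a combined-format access log, the status code is the three-digit
# number that follows the closing quote of the request line.
STATUS_RE = re.compile(r'"\s(?P<status>\d{3})\s')


def extract_status(event: dict) -> dict:
    """Return a copy of the event with __status set when the pattern matches."""
    match = STATUS_RE.search(event.get("_raw", ""))
    enriched = dict(event)
    if match:
        # Stored as an int here so it can be compared with 200 directly.
        enriched["__status"] = int(match.group("status"))
    return enriched


# Illustrative event; the log line is made up for this example.
sample_event = {
    "sourcetype": "access_combined",
    "_raw": '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
            '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
            '"http://www.example.com/start.html" "Mozilla/4.08"',
}

print(extract_status(sample_event)["__status"])  # prints 200
```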

Next, let's add a Sampling function and scope it to all events where sourcetype=='access_combined'. Let's apply a filter condition of __status == 200 and a Sample Rate of 5.
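
The effect of this rule can be sketched in Python as follows. This is only an illustration of N:1 sampling under stated assumptions: the make_sampler helper is hypothetical, it keeps the first of every N matching events using a simple counter (the product's actual selection method is not described here), and it leaves events that do not match the filter untouched.

```python
def make_sampler(scope, condition, rate):
    """Build a function that keeps 1 of every `rate` events matching the rule.

    scope and condition are predicates over an event dict. Events that do
    not match both are passed through unchanged (an assumption).
    """
    seen = 0

    def sample(event):
        nonlocal seen
        if not (scope(event) and condition(event)):
            return event  # not subject to this rule: effectively 1:1
        seen += 1
        if (seen - 1) % rate != 0:
            return None  # dropped by sampling
        # Record the rate on the surviving event, mirroring sampled::<rate>.
        return {**event, "sampled": rate}

    return sample


sample = make_sampler(
    scope=lambda e: e.get("sourcetype") == "access_combined",
    condition=lambda e: e.get("__status") == 200,
    rate=5,
)

# Ten successes and three server errors, all made up for the example.
events = (
    [{"sourcetype": "access_combined", "__status": 200} for _ in range(10)]
    + [{"sourcetype": "access_combined", "__status": 500} for _ in range(3)]
)
kept = [e for e in map(sample, events) if e is not None]
print(len(kept))  # prints 5: two sampled 200s plus all three 500s
```

A counter is used instead of random selection so the result is reproducible; a random draw with probability 1/rate would give the same volume on average.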

To confirm that sampling works, compare the event count of all 200s before and after. In addition, each time an event passes through the Sampling function, an index-time field is added to it: sampled::<rate>. You can use that field in your statistical functions as necessary.
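
As one example of using the sampled field in a statistic, the sketch below estimates the original per-status event counts by weighting each surviving event by its sample rate. The estimate_original_counts helper, the status key, and the sample data are hypothetical, and events without a sampled field are assumed to have been kept 1:1.

```python
from collections import Counter


def estimate_original_counts(events, key="status"):
    """Estimate pre-sampling counts per `key` value.

    Each event stands for `sampled` original events; events with no
    sampled field are assumed to represent only themselves.
    """
    totals = Counter()
    for event in events:
        totals[event[key]] += event.get("sampled", 1)
    return dict(totals)


# Made-up post-sampling data: two 200s kept at 5:1, three 500s kept 1:1.
indexed_events = [
    {"status": 200, "sampled": 5},
    {"status": 200, "sampled": 5},
    {"status": 500},
    {"status": 500},
    {"status": 500},
]

print(estimate_original_counts(indexed_events))  # prints {200: 10, 500: 3}
```

The estimate for sampled groups is approximate: it recovers the original count exactly only when that count is a multiple of the rate.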