Let's say that you wanted to troubleshoot with and analyze highly verbose, voluminous data (for example, CDN logs, ELB access logs, or VPC Flow Logs), but you were concerned about storage requirements and search performance. With Sampling, you can bring in enough samples that your analysis remains statistically significant, while still doing all the troubleshooting necessary.
See the example below, and for more details, see: Access Logs and Firewall Logs
Let's use the out-of-the-box Sampling function to sample all events whose `status` is `200` at 5:1, that is, keep one of every five (and keep all others at 1:1). This should lower the volume of verbose successes (200s) while still bringing in ALL potentially erroneous events (400s, 500s, etc.) that can be used for troubleshooting.
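To see how much a 5:1 rule on 200s saves, it helps to work the arithmetic through. The Python sketch below assumes a hypothetical traffic mix (90% of events are 200s); the share is an illustrative input, not a measured figure, and `expected_volume_fraction` is a made-up helper name.

```python
def expected_volume_fraction(share_200: float, rate_200: int) -> float:
    """Fraction of the original event volume that survives sampling.

    share_200: fraction of events whose status is 200 (hypothetical input).
    rate_200:  N in an N:1 sample rate for 200s; all other events are kept 1:1.
    """
    kept_200 = share_200 / rate_200   # one out of every N successes is kept
    kept_other = 1.0 - share_200      # every non-200 event is kept
    return kept_200 + kept_other

# Hypothetical mix: 90% of requests return 200.
print(expected_volume_fraction(0.9, 5))
```

With that mix, roughly 28% of the original volume remains (0.9 / 5 + 0.1), and none of the non-200 events are dropped.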
- First, make sure you have a route and pipeline configured to match the desired events.
- Next, let's add a Regex Extract function, extract the status field from `_raw`, and call it `__status`. Remember, fields that start with `__` are special fields in Cribl and can be used anywhere in a pipeline.
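The extraction step can be approximated outside Cribl to check a pattern against sample data. This is a minimal Python sketch, assuming Apache "combined" format lines where the status code is the three-digit number right after the quoted request; the pattern and the `extract_status` helper are hypothetical, not Cribl's built-in configuration. Python spells named groups `(?P<name>...)`; a JavaScript-style regex, as Cribl expressions use, would write `(?<__status>...)` instead.

```python
import re

# Hypothetical pattern: the status code follows the quoted request line.
STATUS_RE = re.compile(r'"[^"]*" (?P<__status>\d{3}) ')

def extract_status(event: dict) -> dict:
    """Copy the status code from event['_raw'] into event['__status'], if found."""
    match = STATUS_RE.search(event["_raw"])
    if match:
        event["__status"] = match.group("__status")
    return event

raw = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
       '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
       '"http://www.example.com/start.html" "Mozilla/4.08"')
print(extract_status({"_raw": raw})["__status"])  # 200
```

Note that the captured value is a string (`"200"`), which matters when comparing it later.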
Next, let's add a Sampling function and scope it to all events where `sourcetype=='access_combined'`. Let's apply a filter condition of `__status == 200` and a Sample Rate of 5.
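A rough Python model of this step follows, assuming each event is a dict that already carries `__status` as a string. The function name `sample_events` and the deterministic every-Nth selection are illustrative simplifications; Cribl's Sampling function may choose which events to keep differently. Kept 200s get a `sampled` key to mirror the `sampled::<rate>` field that the Sampling function adds.

```python
def sample_events(events, rate=5):
    """Yield events, keeping only 1 of every `rate` in-scope 200s."""
    successes_seen = 0
    for event in events:
        in_scope = event.get("sourcetype") == "access_combined"
        # Regex captures are strings, so compare against "200".
        if in_scope and event.get("__status") == "200":
            if successes_seen % rate == 0:    # keep the 1st, 6th, 11th, ... success
                yield {**event, "sampled": rate}
            successes_seen += 1
        else:
            yield event                       # everything else passes 1:1

events = ([{"sourcetype": "access_combined", "__status": "200"}] * 12
          + [{"sourcetype": "access_combined", "__status": "500"}] * 3)
kept = list(sample_events(events))
print(len(kept))  # 6
```

Of the twelve 200s, three survive; all three 500s pass through untouched.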
To confirm that sampling works, compare the event count of all 200s before and after. In addition, each time an event goes through the Sampling function, an index-time field `sampled::<rate>` is added to it. You can use that field in your statistical functions as necessary.
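One common use of the rate field is to estimate pre-sampling counts: weight each kept event by the rate it was sampled at. The Python sketch below assumes each event is a dict and that an event with no `sampled` field was kept 1:1; `estimated_counts` is a hypothetical helper, not a Cribl or search-tool function.

```python
from collections import Counter

def estimated_counts(events):
    """Estimate pre-sampling counts per status by weighting each event by its rate."""
    totals = Counter()
    for event in events:
        # Index-time fields may arrive as strings, so coerce the rate to int.
        totals[event["__status"]] += int(event.get("sampled", 1))
    return dict(totals)

kept = [{"__status": "200", "sampled": 5}] * 3 + [{"__status": "500"}] * 3
print(estimated_counts(kept))  # {'200': 15, '500': 3}
```

The estimate for 200s is only approximate when sampling is random, but the counts for unsampled statuses stay exact.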