This guide walks you through planning, installing, and configuring a single-instance deployment of Cribl LogStream. You'll capture some realistic sample log data, and then use LogStream's built-in Functions to redact, parse, refine, and shrink the data.
By the end of this guide, you'll have assembled all of LogStream's basic building blocks: a Source, Route, Pipeline, several Functions, and a Destination. You can complete this tutorial using LogStream's included sample data, without connections to – or licenses on – any inbound or outbound services.
Assuming a cold start (from initial LogStream download and installation), this guide might take about an hour. But you can work through it in chunks, and LogStream will persist your work between sessions.
If you've already downloaded, installed, and launched LogStream, skip ahead to Add a Source.
The minimum requirements for running this tutorial are the same as for a LogStream production single-instance deployment.
- Linux 64-bit, kernel >= 3.10 and glibc >= 2.17
- Examples: Ubuntu 16.04, Debian 9, RHEL 7, CentOS 7, SUSE Linux Enterprise Server 12+, Amazon Linux 2014.03+
- Tested so far on Ubuntu (14.04, 16.04, 18.04, and 20.04), CentOS 7.9, and Amazon Linux 2
- 4+ physical cores, 8+ GB RAM – all beyond your basic OS/VM requirements
- 5GB free disk space (more if persistent queuing is enabled)
We assume that 1 physical core is equivalent to 2 virtual/hyperthreaded CPUs (vCPUs) on Intel/Xeon or AMD processors; and to 1 (higher-throughput) vCPU on Graviton2/ARM64 processors.
- Firefox 65+, Chrome 70+, Safari 12+, Microsoft Edge
By default, LogStream listens on port 9000 for inbound HTTP traffic, including the UI. You can override these defaults as needed.
For higher processing volumes, users typically enable LogStream's Distributed Deployment option. While beyond the scope of this tutorial, that option has a few additional requirements, which we list here for planning purposes:
- Port 4200 must be available on the Leader Node for Workers' communications.
- Git must be installed on the Leader Node, to manage configuration changes.
See Sizing and Scaling for further details about configuring LogStream to handle large data streams.
Download the latest version of LogStream at https://cribl.io/download/.
Un-tar the resulting .tgz file in a directory of your choice (e.g., /opt/). Here's general syntax, then a specific example:
tar xvzf cribl-<version>-<build>-<arch>.tgz
tar xvzf cribl-2.3.1-1d4e05c5-linux-x64.tgz
You'll now have LogStream installed in a cribl subdirectory, by default: /opt/cribl/. We'll refer to this cribl subdirectory throughout this documentation as $CRIBL_HOME.
In your terminal, switch to the $CRIBL_HOME/bin directory (e.g., /opt/cribl/bin). Here, you can start, stop, and verify the LogStream server using these basic ./cribl CLI commands:
- Start: ./cribl start
- Stop: ./cribl stop
- Get status: ./cribl status
For other available commands, see CLI Reference.
Next, in your browser, open http://localhost:9000 and log in with the default credentials.
Register your copy of LogStream if desired.
After registering, you'll be prompted to change the default password.
Each LogStream Source represents a data input. Options include Splunk, Elastic Beats, Kinesis, Kafka, syslog, HTTP, TCP JSON, and others.
For this tutorial, we'll enable a LogStream built-in datagen (i.e., data generator) that generates a stream of realistic sample log data.
From LogStream's top menu, select Sources.
From the Data Sources page's tiles or left menu, select Datagen.
(You can use the search box to jump to the Datagen tile.)
Click + Add New to open the New Datagen source pane.
In the Input ID field, name this Source businessevent.
In the Data Generator File drop-down, select businessevents.log.
This generates sample log events for a business scenario. We'll look at their structure shortly, in Capture and Filter Sample Data.
Click Save. The On slider in the Enabled column indicates that your Datagen Source has started generating sample data.
Each LogStream Destination represents a data output. Options include Splunk, Kafka, Kinesis, InfluxDB, Snowflake, Databricks, TCP JSON, and others.
For this tutorial, we'll use LogStream's built-in DevNull Destination. This simply discards events – not very exciting! But it simulates a real output, so it provides a configuration-free quick start for testing LogStream setups. It's ideal for our purposes.
To verify that DevNull is enabled, let's walk through setting up a Destination, then setting it up as LogStream's default output:
From LogStream's top menu, select Destinations.
Select DevNull from the Data Destinations page's tiles or left menu.
(You can use the search box to jump to the DevNull tile.)
On the resulting devnull row, look for the Live indicator under Enabled. This confirms that the DevNull Destination is ready to accept events.
From the Data Destinations page's left nav, select the Default Destination at the top.
On the resulting Manage Default Destination page, verify that the Default Output ID drop-down points to the devnull Destination we just examined.
We've now set up data flow on both sides. Is data flowing? Let's check.
From the top menu, select Monitoring. (On very narrow displays, you might need to select it from the ••• overflow menu.) This opens a summary dashboard, where you should see a steady flow of data in and out of LogStream. The left graph shows events in/out. The right graph shows bytes in/out.
Monitoring displays data from the preceding 24 hours. You can use the Monitoring submenu to open detailed displays of LogStream components, collection jobs and tasks, and LogStream's own internal logs. Click Sources on the lower submenu to switch to this view:
This is a compact display of each Source's inbound events and bytes as a sparkline. You can click each Source's Expand button (highlighted at right) to zoom up detailed graphs.
Click Destinations on the lower submenu. This displays a similar sparklines view, where you can confirm data flow out to the devnull Destination.
With confidence that we've got data flowing, let's send it through a LogStream Pipeline, where we can add Functions to refine the raw data.
A Pipeline is a stack of LogStream Functions that process data. Pipelines are central to refining your data, and also provide a central LogStream workspace – so let's get one going.
From the top menu, select Pipelines.
You now have a two-pane view, with business on the left and party on the right: a Pipelines list on the left, and Sample Data controls on the right. (We'll capture some sample data momentarily.)
At the Pipelines pane's upper right, click + Pipeline, then select Create Pipeline.
In the new Pipeline's ID field, enter a unique identifier. (For this tutorial, any short, memorable name will do.)
Optionally, enter a Description of this Pipeline's purpose.
Your empty Pipeline now prompts you to preview data, add Functions, and attach a Route. So let's capture some data to preview.
The right Sample Data pane provides multiple tools for grabbing data from multiple places (inbound streams, copy/paste, and uploaded files); for previewing and testing data transformations as you build them; and for saving and reloading sample files.
Since we've already got live (simulated) data flowing in from the datagen Source we built, let's grab some of that data.
In the right pane, click Capture New.
Click Capture, then accept the drop-down's defaults – click Start.
When the modal finishes populating with events, click Save as Sample File.
In the SAMPLE FILE SETTINGS pop-up, change the generated File Name to a name you'll recognize later.
Click Save. This saves to the File Name you entered above, and closes the modal. You're now previewing the captured events in the right pane. (Note that this pane's Preview Simple tab now has focus.)
Click Show more to expand one or more events.
By skimming the key-value pairs within the data's
_raw fields, you'll notice the scenario underlying this preview data (provided by the
businessevents.log datagen): these are business logs from a mobile-phone provider.
To set up our next step, find at least one
marketState K=V pair. Having captured and examined this raw data, let's use this K=V pair to crack open LogStream's most basic data-transformation tool, Filtering.
Click the right pane's Sample Data tab.
Again click Capture New.
In the Capture Sample Data modal, replace the Filter Expression field's default true value with a simple regex that matches the marketState pairs you noticed above.
You can also click the Expand button at the Filter Expression field's right edge to open a modal to validate your expression. The adjacent drop-down enables you to restore previously used expressions.
Click Capture, then Start.
Using the Capture drop-down's default limits of 10 seconds and 10 events, you'll notice that with this filter applied, it takes much longer for LogStream to capture 10 matching events.
Click Cancel (and confirm your selection) to discard this filtered data and close the modal.
On the right pane's Sample Data tab, click Simple beside your saved sample file. This restores our preview of our original, unfiltered capture. We're ready to transform this sample data in more interesting ways, by building out our Pipeline's Functions.
Each Function has a Filter field, whose value defaults to true. As we just saw with data capture, you can replace this value with an expression that scopes the Function down to particular matching events.
In this Pipeline, we'll use some of LogStream's core Functions to:
- Redact (mask) sensitive data
- Extract (parse) the
_rawfield's key-value pairs as separate fields.
- Add a new field.
- Delete the original
_rawfield, now that we've extracted its contents.
- Rename a field for better legibility.
In the right Preview pane, notice that each event includes a social key, whose value is a (fictitious) raw Social Security number. Before this data goes any further through our Pipeline, let's use LogStream's
Mask Function to swap in an md5 hash of each SSN.
In the left Pipelines pane, click + Function.
Search for Mask, then click it.
In the new Function's Masking Rules, click into the Match Regex field.
Enter or paste a regex that simply looks for the digits following each social= key.
In Replace Expression, paste the following hash function. The backticks are literal:
Note that Apply to Fields defaults to
_raw. This is what we want to target, so we'll accept this default.
Click Save. You'll immediately notice some obvious changes:
The Preview pane has switched from its IN to its OUT tab, to show you the outbound effect of the Pipeline you just saved.
Each event's _raw field has changed color, to indicate that it's undergone some redactions.
Now locate at least one event's Show more link, and click to expand it. You can verify that the
social values have now been hashed.
Having redacted sensitive data, we'll next use a Parser Function to lift up all the
_raw field's key-value pairs as fields:
In the left Pipelines pane, click + Function.
Search for Parser, then click it.
Leave the Operation Mode set to its default.
Set the Type to Key=Value Pairs.
Leave the Source Field set to its _raw default, then click Save.
You should see the Preview pane instantly light up with a lot more fields, parsed from
_raw. You now have rich structured data, but not all of this data is particularly interesting: Note how many fields have NA ("Not Applicable") values. We can enhance the Parser Function to ignore fields with these values.
In the Function's Fields Filter Expression field (near the bottom), enter this negation expression:
Note the single-quoted value. If you type (rather than paste) this expression, watch how typeahead matches the first quote you type.
Click Save, and watch the Preview pane.
Several fields should disappear – such as
ReplyTo. The remaining fields should display meaningful values. Congratulations! Your log data is already starting to look better-organized and less bloated.
If you didn't see the fields change, slide the Parser Function Off, click Save below, and watch the Preview pane change. Using these toggles, you can preserve structure as you test and troubleshoot each Function's effect.
Note that each Function also has a Final toggle, defaulting to Off. Enabling Final anywhere in the Functions stack will prevent data from flowing to any Functions lower in the UI.
Be sure to toggle the Function back On, and click Save again, before you proceed!
Next, let's add an extra field, and conditionally infer its value from existing values. We'll also remove the
_raw field, now that it's redundant. To add and remove fields, the Eval Function is our pal.
Let's assume we want to enrich our data by identifying the manufacturer of a certain popular phone handset. We can infer this from the existing
phoneType field that we've lifted up for each event.
In the left Pipelines pane, click + Function.
Search for Eval, then click it.
Click into the new Function's Evaluate Fields table.
Here you add new fields to events, defining each field as a key-value pair. If we needed more key-value pairs, we could click + Add Field for more rows.
In Name, enter: phoneCompany
In Value Expression, enter this JS ternary expression that tests phoneType's value:
phoneType.startsWith('iPhone') ? 'Apple' : 'Other'
(Note the ? and : operators, and the single-quoted values.)
Click Save. Examine some events in the Preview pane; each should now contain a phoneCompany field that matches its phoneType.
Now that we've parsed out all of the _raw field's data, it can go. Deleting a (large) redundant field will give us cleaner events, and reduced load on downstream resources.
Still in the Eval Function, click into Remove Fields. Type _raw and press Tab or Enter, then click Save.
The Preview pane's diff view should now show each event's
_raw field stripped out.
Our log data has now been cleansed, structured, enriched, and slimmed-down. Let's next look at how to make it more legible, by giving fields simpler names.
In the left Pipelines pane, click + Function. (This rhythm should now be familiar to you.)
Search for Rename, then click it.
Click into the new Function's Rename Fields table.
This has the same structure you saw above in Eval: Each row defines a key-value pair.
In Current Name, enter the longhaired existing field name: conversationId
In New Name, enter the simplified field name: ID
Watch any event's conversationId field in the Preview pane as you click Save at left. This field should change to ID in all events.
We've already refined our data substantially. To further slim it down, a Pipeline can entirely remove events that aren't of interest for a particular downstream service.
As the "Pipeline" name implies, your LogStream installation can have multiple Pipelines, each configured to send out a data stream tailored to a particular Destination. This helps you get the right data in the right places most efficiently.
Here, let's drop all events for customers who use prepaid monthly phone service (i.e., not postpaid):
In the left Pipelines pane, click
Drop, then click it.
Click into the new Function's Filter field.
Replace the default true value with a JS negation expression that matches only the prepaid events we want to drop.
Now scroll through the right Preview pane. Depending on your data sample, you should now see multiple events struck out and faded – indicating that LogStream will drop them before forwarding the data.
Torture the data enough, and it will confess. By what factor have our transformations refined our data's volume? Let's check.
In the right Preview pane, click the Basic Statistics button:
Even without the removal of the
_raw field (back in Eval) and the dropped events, you should see a substantial % reduction in the Full Event Length.
Woo hoo! Before we wrap up our configuration: If you're curious about individual Functions' independent contribution to the data reduction shown here, you can test it now. Use the toggle Off > Save > Basic Statistics sequence to check various changes.
We've now built a complete, functional Pipeline. But so far, we've tested its effects only on the static data sample we captured earlier. To get dynamic data flowing through a Pipeline, we need to filter that data in, by defining a LogStream Route.
At the Pipelines page's top left, click Attach Pipeline to Route.
This displays the Routes page. It's structured very similarly to the Pipelines page, so the rhythm here should feel familiar.
Enter a unique, meaningful Route Name.
Leave the Filter field set to its true default, allowing it to deliver all events.
Because a Route delivers events to a Pipeline, it offers a first stage of filtering. In production, you'd typically configure each Route to filter events by appropriate criteria, using boolean AND (&&) and OR (||) operators.
Set the Pipeline drop-down to the Pipeline we just configured.
Set the Output drop-down to either default or devnull. This doesn't matter here, because we've set default as a pointer to devnull. In production, you'd set this carefully.
You can leave the Description empty, and leave Final set to Yes.
Grab the new Route by its left handle, and drag it above the default Route, so that our new Route will process events first. You should see something like the screenshot below.
Click Save to save the new Route to the Routing table.
The sparklines should immediately confirm that data is flowing through your new Route:
To confirm data flow through the whole system we've built, select Monitoring > Routes from LogStream's top menu and examine your new Route's sparklines. Also select Monitoring > Pipelines and examine your new Pipeline's throughput.
Look at you! Give yourself a pat on the back! In this short, scenic tour – with no hit to your cloud-services charges – you've built a simple but complete LogStream system, exercising all of its basic components:
- Downloaded, installed, and run LogStream.
- Configured a Source to hook up an input.
- Configured a Destination to feed an output.
- Monitored data throughput, and checked it twice.
- Built a Pipeline.
- Configured LogStream Functions to redact, parse, enrich, trim, rename, and drop event data.
- Added and attached a Route to get data flowing through our Pipeline.
Interested in guided walk-throughs of more-advanced LogStream features? We suggest that next, you check out:
LogStream Sandboxes: Work through general and specific scenarios in containers, with terminal access and free, hosted data inputs and outputs.
Use Cases documentation: Bring your own services to build solutions to specific challenges.
Cribl Concept: Pipelines – Video showing how to build and use Pipelines at multiple LogStream stages.
Cribl Concept: Routing – Video about using Routes to send different data through different paths.
Oh yeah – you've still got the LogStream server running, with its businessevents.log datagen still firing events. If you'd like to shut these down for now, in reverse order:
Go to Data > Sources > Datagens.
Slide businessevent to Off, and click Save. (Refer back to the screenshot above.)
In your terminal's $CRIBL_HOME/bin directory, shut down the server with: ./cribl stop
That's it! Enjoy using LogStream.