
Getting Started Guide

This guide walks you through planning, installing, and configuring a single-instance deployment of Cribl LogStream. You'll capture some realistic sample log data, and then use LogStream's built-in Functions to redact, parse, refine, and shrink the data.

By the end of this guide, you'll have assembled all of LogStream's basic building blocks: a Source, Route, and Pipeline, several Functions, and a Destination. You can complete this tutorial using LogStream's included sample data, without connections to – or licenses on – any inbound or outbound services.

Assuming a cold start (from initial LogStream download and installation), this guide might take about an hour. But you can work through it in chunks, and LogStream will persist your work between sessions.

👍

If you've already downloaded, installed, and launched LogStream, skip ahead to Add a Source.

Requirements for this Tutorial

The minimum requirements for running this tutorial are the same as for a LogStream production single-instance deployment.

OS

  • Linux: RedHat, CentOS, Ubuntu, Amazon Linux (64-bit)

System

  • +4 physical cores = +8 vCPUs; +8GB RAM – all beyond your basic OS/VM requirements
  • 5GB free disk space (more if persistent queuing is enabled)

ℹ

We assume that 1 physical core is equivalent to 2 virtual/hyperthreaded CPUs (vCPUs). For details, see Recommended AWS, Azure, and GCP Instance Types.

Browser Support

  • Firefox 65+, Chrome 70+, Safari 12+, Microsoft Edge

Network Ports

By default, LogStream listens on the following ports:

  • UI (default): 9000
  • HTTP Inbound (default): 10080
  • User options: other data ports, as required

You can override these defaults as needed.

Plan for Production

For higher processing volumes, users typically enable LogStream's Distributed Deployment option. While beyond the scope of this tutorial, that option has a few additional requirements, which we list here for planning purposes:

  • Port 4200 must be available on the Master Node for Workers' communications.
  • Git (1.8.3.1 or higher) must be installed on the Master Node, to manage configuration changes.

See Sizing and Scaling for further details about configuring LogStream to handle large data streams.

Download and Install LogStream

Download the latest version of LogStream at https://cribl.io/download/.

Un-tar the resulting .tgz file in a directory of your choice (e.g., /opt/). Here's the general syntax, followed by a specific example:

tar xvzf cribl-<version>-<build>-<arch>.tgz
tar xvzf cribl-2.3.1-1d4e05c5-linux-x64.tgz

You'll now have LogStream installed in a cribl subdirectory, by default: /opt/cribl/. We'll refer to this cribl subdirectory throughout this documentation as $CRIBL_HOME.

Run LogStream

In your terminal, switch to the $CRIBL_HOME/bin directory (e.g., /opt/cribl/bin). Here, you can start, stop, and verify the LogStream server using these basic ./cribl CLI commands:

  • Start: ./cribl start
  • Stop: ./cribl stop
  • Get status: ./cribl status

👍

For other available commands, see CLI Reference.

Next, in your browser, open http://<hostname>:9000 (e.g., http://localhost:9000) and log in with default credentials (admin, admin).

Register your copy of LogStream to receive a free decoder ring.

After registering, you'll be prompted to change the default password.

And actually, you don't need the decoder ring! You're now ready to configure a working LogStream installation – with a Source, Destination, Pipeline, and Route – and to assemble several built-in Functions to refine sample log data.

Get Data Flowing

Add a Source

Each LogStream Source represents a data input. Options include Splunk, Elastic Beats, Kinesis, Kafka, syslog, HTTP, TCP JSON, and others.

For this tutorial, we'll enable a LogStream built-in datagen (i.e., data generator) that generates a stream of realistic sample log data.

Adding a datagen Source

  1. From LogStream's top menu, select Data > Sources.

  2. From the Data Sources page's tiles or left menu, select Datagens.

    (You can use the search box to jump to the Datagens tile.)

  3. Click Add New to open the New Datagen source pane.

  4. In the Input ID field, name this Source businessevent.

  5. In the Data Generator File drop-down, select businessevent.log.

    This datagen generates log events for a business scenario. We'll look at their structure shortly, in Capture and Filter Sample Data.

  6. Click Save.

The On slider in the Enabled column indicates that your datagen Source has started generating sample data.

Configuring a datagen Source

Add a Destination

Each LogStream Destination represents a data output. Options include Splunk, Kafka, Kinesis, InfluxDB, Snowflake, Databricks, TCP JSON, and others.

For this tutorial, we'll use LogStream's built-in DevNull Destination. This simply discards events – not very exciting! But it simulates a real output, so it provides a configuration-free quick start for testing LogStream setups. It's ideal for our purposes.

To verify that DevNull is enabled, let's walk through setting up a Destination, then setting it up as LogStream's default output:

  1. From LogStream's top menu, select Data > Destinations.

  2. Select DevNull from the Data Destinations page's tiles or left menu.

    (You can use the search box to jump to the DevNull tile.)

  3. On the resulting devnull row, look for the Live indicator under Enabled. This confirms that the DevNull Destination is ready to accept events.

  4. From the Data Destinations page's left nav, select the Default Destination at the top.

  5. On the resulting Manage Default Destination page, verify that the Default Output ID drop-down points to the devnull Destination we just examined.

We've now set up data flow on both sides. Is data flowing? Let's check.

Monitor Data Throughput

From the top menu, select Monitoring. This opens a summary dashboard, where you should see a steady flow of data in and out of LogStream. The left graph shows events in/out. The right graph shows bytes in/out.

Monitoring dashboard

Monitoring displays data from the preceding 24 hours. You can use the Monitoring submenu to open detailed displays of LogStream components, collection jobs and tasks, and LogStream's own internal logs. Click Sources on the lower (white) submenu to switch to this view:

Monitoring Sources

This is a compact display of each Source's inbound events and bytes as a sparkline. You can click each Source's Expand button (highlighted at right) to zoom up detailed graphs.

Click Destinations on the lower submenu. This displays a similar sparklines view, where you can confirm data flow out to the devnull Destination:

Monitoring Destinations

With confidence that we've got data flowing, let's send it through a LogStream Pipeline, where we can add Functions to refine the raw data.

Create a Pipeline

A Pipeline is a stack of LogStream Functions that process data. Pipelines are central to refining your data, and also provide a central LogStream workspace – so let's get one going.

  1. From the top menu, select Pipelines.

    You now have a two-pane view, with business on the left and party on the right – that is, a Pipelines list on the left and Sample Data controls on the right. (We'll capture some sample data momentarily.)

  2. At the Pipelines pane's upper right, click + Add Pipeline, then select Create Pipeline.

  3. In the new Pipeline's ID field, enter a unique identifier. (For this tutorial, you might use slicendice.)

  4. Optionally, enter a Description of this Pipeline's purpose.

  5. Click Save.

Your empty Pipeline now prompts you to preview data, add Functions, and attach a Route. So let's capture some data to preview.

Pipeline prompt for population

Capture and Filter Sample Data

The right Sample Data pane provides multiple tools for grabbing data from multiple places (inbound streams, copy/paste, and uploaded files); for previewing and testing data transformations as you build them; and for saving and reloading sample files.

Since we've already got live (simulated) data flowing in from the datagen Source we built, let's grab some of that data.

Capture New Data

  1. In the right pane, click Capture New.

  2. In the Capture Sample Data modal, immediately change the generated File Name to a name you'll recognize, like be_raw.log.

  3. Click Capture, then accept the drop-down's defaults – click Start.

  4. Click Save as Sample File. This saves to the File Name you entered above. You're now previewing the events in the right pane. (Note that this pane's Preview Simple tab now has focus.)

  5. Click Show more to expand one or more events.

By skimming the key-value pairs within the data's _raw fields, you'll notice the scenario underlying this preview data (provided by the businessevent.log datagen): these are business logs from a mobile-phone provider.

To set up our next step, find at least one marketState K=V pair. Having captured and examined this raw data, let's use this K=V pair to crack open LogStream's most basic data-transformation tool, Filtering.

Filter Data and Manage Sample Files

  1. Click the right pane's Sample Data tab.

  2. Again click Capture New.

  3. In the Capture Sample Data modal, replace the Filter Expression field's default true value with this simple regex: _raw.match(/marketState=TX/)

    We're going to Texas! If you type this in, rather than pasting it, notice how LogStream provides typeahead assist to complete a well-formed JavaScript expression.

    You can also click the Expand button at the Filter Expression field's right edge to open a modal to validate your expression. The adjacent drop-down enables you to restore previously used expressions.

  4. Click Capture, then Start.

    Using the Capture drop-down's default limits of 10 seconds and 10 events, you'll notice that with this filter applied, it takes much longer for LogStream to capture 10 matching events.

  5. Click Cancel to discard this filtered data and close the modal.

  6. On the right pane's Sample Data tab, click Simple beside be_raw.log.

This restores our preview of our original, unfiltered capture. We're ready to transform this sample data in more interesting ways, by building out our Pipeline's Functions.
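
As an aside, the Filter Expression we just used is plain JavaScript evaluated against each event. Here's a minimal standalone Node.js sketch of that check – the event object below is illustrative, not LogStream's internal representation:

// Hypothetical event, shaped like the datagen's output
const event = { _raw: 'conversationId=1234 marketState=TX phoneType=iPhone11' };

// Same expression as the Filter Expression field; a truthy result keeps the event
const keep = event._raw.match(/marketState=TX/);

console.log(Boolean(keep));   // true for Texas events, false otherwise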

Refine Data with Functions

Functions are pieces of JavaScript code that LogStream invokes on each event that passes through them. By default, this means all events – each Function has a Filter field whose value defaults to true. As we just saw with data capture, you can replace this value with an expression that scopes the Function down to particular matching events.

In this Pipeline, we'll use some of LogStream's core Functions to:

  • Redact (mask) sensitive data.
  • Extract (parse) the _raw field's key-value pairs as separate fields.
  • Add a new field.
  • Delete the original _raw field, now that we've extracted its contents.
  • Rename a field for better legibility.

Mask: Redact Sensitive Data

In the right Preview pane, notice that each event includes a social key, whose value is a (fictitious) raw Social Security number. Before this data goes any further through our Pipeline, let's use LogStream's Mask Function to swap in an md5 hash of each SSN.

  1. In the left Pipelines pane, click + Add Function.

  2. Search for Mask, then click it.

  3. In the new Function's Masking Rules, click into the Match Regex field.

  4. Enter or paste this regex, which simply looks for digits following social=: (social=)(\d+)

  5. In Replace Expression, paste the following hash function. The backticks are literal: `${g1}${C.Mask.md5(g2)}`

  6. Note that Apply to Fields defaults to _raw. This is what we want to target, so we'll accept this default.

  7. Click Save.

You'll immediately notice some obvious changes:

  • The Preview pane has switched from its IN to its OUT tab, to show you the outbound effect of the Pipeline you just saved.

  • Each event's _raw field has changed color, to indicate that it's undergone some redactions.

Now locate at least one event's Show more link, and click to expand it. You can verify that the social values have now been hashed.

Mask Function and hashed result
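
If you're curious what that Masking Rule is doing, here's a small Node.js sketch that reproduces the same capture-group substitution. It stands in for LogStream's built-in C.Mask.md5() with Node's crypto module, so treat it as a conceptual illustration rather than LogStream's implementation:

const crypto = require('crypto');

// Stand-in for C.Mask.md5()
const md5 = (s) => crypto.createHash('md5').update(String(s)).digest('hex');

// Hypothetical raw event text containing a (fictitious) SSN
let raw = 'accountType=PostPaid social=123456789 marketState=TX';

// Same idea as the Masking Rule: keep group 1 (social=), hash group 2 (the digits)
raw = raw.replace(/(social=)(\d+)/, (match, g1, g2) => `${g1}${md5(g2)}`);

console.log(raw);   // social= is now followed by a 32-character md5 hash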

Parser: Extract Events

Having redacted sensitive data, we'll next use a Parser function to lift up all the _raw field's key-value pairs as fields:

  1. In the left Pipelines pane, click + Add Function.

  2. Search for Parser, then click it.

  3. Leave the Operation Mode set to its Extract default.

  4. Set the Type to Key=Value Pairs.

  5. Leave the Source Field set to its _raw default.

  6. Click Save.

Parser configured to extract K=V pairs from _raw

You should see the Preview pane instantly light up with a lot more fields, parsed from _raw. You now have rich structured data, but not all of this data is particularly interesting: Note how many fields have NA ("Not Applicable") values. We can enhance the Parser Function to ignore fields with NA values.

  1. In the Function's Fields Filter Expression field (near the bottom), enter this negation expression: value!='NA'

    Note the single-quoted value. If you type (rather than paste) this expression, watch how typeahead matches the first quote you type.

  2. Click Save, and watch the Preview pane.

Filtering the Parser Function to ignore fields with 'NA' values

Several fields should disappear – such as credits, EventConversationID, and ReplyTo. The remaining fields should display meaningful values. Congratulations! Your log data is already starting to look better-organized and less bloated.
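
Conceptually, this extract-and-filter step behaves like the Node.js sketch below. It isn't LogStream's Parser implementation – just a plain-JavaScript illustration of splitting K=V pairs and discarding the 'NA' ones:

// Hypothetical _raw content containing some NA values
const raw = 'conversationId=1234 phoneType=iPhone11 credits=NA marketState=TX ReplyTo=NA';

const fields = {};
for (const pair of raw.split(' ')) {
  const [key, value] = pair.split('=');
  if (value !== 'NA') fields[key] = value;   // same test as the Fields Filter Expression
}

console.log(fields);   // { conversationId: '1234', phoneType: 'iPhone11', marketState: 'TX' }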

👍

Missed It?

If you didn't see the fields change, slide the Parser Function Off, click Save below, and watch the Preview pane change. Using these toggles, you can preserve structure as you test and troubleshoot each Function's effect.

Note that each Function also has a Final toggle, defaulting to Off. Enabling Final anywhere in the Functions stack will prevent data from flowing to any Functions lower in the UI.

Be sure to toggle the Function back On, and click Save again, before you proceed!

Toggling a Function off and on

Next, let's add an extra field, and conditionally infer its value from existing values. We'll also remove the _raw field, now that it's redundant. To add and remove fields, the Eval Function is our pal.

Eval: Add and Remove Fields

Let's assume we want to enrich our data by identifying the manufacturer of a certain popular phone handset. We can infer this from the existing phoneType field that we've lifted up for each event.

Add Field (Enrich)

  1. In the left Pipelines pane, click + Add Function.

  2. Search for Eval, then click it.

  3. Click into the new Function's Evaluate Fields table.

    Here you add new fields to events, defining each field as a key-value pair. If we needed more key-value pairs, we could click + Add Field for more rows.

  4. In Name, enter: phoneCompany.

  5. In Value Expression, enter this JS ternary expression that tests phoneType's value:
    phoneType.startsWith('iPhone') ? 'Apple' : 'Other' (Note the ? and : operators, and the single-quoted values.)

  6. Click Save. Examine some events in the Preview pane, and each should now contain a phoneCompany field that matches its phoneType.

Adding a field to enrich data
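
The Value Expression is ordinary JavaScript evaluated against each event's fields. In plain Node.js terms, the enrichment amounts to something like this (the event object is illustrative):

// Hypothetical parsed event
const event = { phoneType: 'iPhone11', accountType: 'PostPaid' };

// Same ternary as the Value Expression, assigned to the new phoneCompany field
event.phoneCompany = event.phoneType.startsWith('iPhone') ? 'Apple' : 'Other';

console.log(event.phoneCompany);   // 'Apple'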

Remove Field (Shrink Data)

Now that we've parsed out all of the _raw field's data – it can go. Deleting a (large) redundant field will give us cleaner events, and reduced load on downstream resources.

  1. Still in the Eval Function, click into Remove Fields.

  2. Type: _raw and press Tab or Enter.

  3. Click Save.

The Preview pane's diff view should now show each event's _raw field stripped out.

Removing a field to streamline data

Our log data has now been cleansed, structured, enriched, and slimmed-down. Let's next look at how to make it more legible, by giving fields simpler names.

Rename: Refine Field Names

  1. In the left Pipelines pane, click + Add Function.

    This rhythm should now be familiar to you.

  2. Search for Rename, then click it.

  3. Click into the new Function's Rename Fields table.

    This has the same structure you saw above in Eval: Each row defines a key-value pair.

  4. In Current Name, enter the longhaired existing field name: conversationId.

  5. In New Name, enter the simplified field name: ID.

  6. Watch any event's conversationId field in the Preview pane as you click Save at left. This field should change to ID in all events.
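
For the record, the Rename Function's effect on each event is roughly equivalent to this tiny JavaScript sketch (illustrative only):

// Hypothetical event after parsing and enrichment
const event = { conversationId: '1234', phoneCompany: 'Apple' };

// Copy the value to the new key, then drop the old key
event.ID = event.conversationId;
delete event.conversationId;

console.log(event);   // { phoneCompany: 'Apple', ID: '1234' }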

Drop: Remove Unneeded Events

We've already refined our data substantially. To further slim it down, a Pipeline can entirely remove events that aren't of interest for a particular downstream service.

👍

As the "Pipeline" name implies, your LogStream installation can have multiple Pipelines, each configured to send out a data stream tailored to a particular Destination. This helps you get the right data in the right places most efficiently.

Here, let's drop all events for customers who use prepaid monthly phone service (i.e., not postpaid):

  1. In the left Pipelines pane, click + Add Function.

  2. Search for Drop, then click it.

  3. Click into the new Function's Filter field.

  4. Replace the default true value with this JS negation expression: accountType!='PostPaid'

  5. Click Save.

Now scroll through the right Preview pane. Depending on your data sample, you should now see multiple events struck out and faded – indicating that LogStream will drop them before forwarding the data.
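
The Drop Function's Filter works like every other Filter field: a JavaScript expression evaluated per event, with matching events removed from the stream. A minimal sketch of the effect, using made-up events:

// Hypothetical events
const events = [
  { accountType: 'PostPaid', phoneType: 'iPhone11' },
  { accountType: 'PrePaid',  phoneType: 'G4' },
];

// The Drop Function removes events for which its Filter evaluates to true
const dropFilter = (e) => e.accountType !== 'PostPaid';
const remaining = events.filter((e) => !dropFilter(e));

console.log(remaining.length);   // 1: only the PostPaid event continues downstream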

A Second Look at Our Data

Torture the data enough, and it will confess. By what factor have our transformations refined our data's volume? Let's check.

In the right Preview pane, click the Basic Statistics button:

Displaying Basic Statistics

Even without the removal of the _raw field (back in Eval) and the dropped events, you should see a substantial % reduction in the Full Event Length.

Woo hoo! Before we wrap up our configuration: If you're curious about individual Functions' independent contribution to the data reduction shown here, you can test it now. Use the Toggle Off > Save > Basic Statistics sequence to check various changes.

Add and Attach a Route

We've now built a complete, functional Pipeline. But so far, we've tested its effects only on the static data sample we captured earlier. To get dynamic data flowing through a Pipeline, we need to filter that data in, by defining a LogStream Route.

  1. At the Pipelines page's top left, click Attach Pipeline to Route.

    This displays the Routes page. It's structured very similarly to the Pipelines page, so the rhythm here should feel familiar.

  2. Click + Add Route.

  3. Enter a unique, meaningful Route Name, like demo.

  4. Leave the Filter field set to its true default, allowing it to deliver all events.

    Because a Route delivers events to a Pipeline, it offers a first stage of filtering. In production, you'd typically configure each Route to filter events by appropriate source, sourcetype, index, host, _time, or other characteristics. The Filter field accepts JavaScript expressions, including AND (&&) and OR (||) operators.

  5. Set the Pipeline drop-down to our configured slicendice Pipeline.

  6. Set the Output drop-down to either devnull or default.

    This doesn't matter, because we've set default as a pointer to devnull. In production, you'd set this carefully.

  7. You can leave the Description empty, and leave Final set to Yes.

  8. Grab the new Route by its left handle, and drag it above the default Route, so that our new Route will process events first. You should see something like the screenshot below.

  9. Click Save to save the new Route to the routing table.

Configuring and adding a Route
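
For planning purposes, a production Route Filter typically combines several conditions. The sketch below shows a hypothetical filter of that shape, evaluated in Node.js against an illustrative event (the field names and values are examples, not from the businessevent datagen):

// Hypothetical event fields
const event = { sourcetype: 'access_combined', host: 'web-prod-01', index: 'web' };

// A production-style Route Filter combining conditions with &&
const match = event.sourcetype === 'access_combined' && event.host.startsWith('web-prod');

console.log(match);   // true: this event would be delivered to the Route's Pipeline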

The sparklines should immediately confirm that data is flowing through your new Route:

Live Routes

To confirm data flow through the whole system we've built, select Monitoring > Routes from LogStream's top menu and examine demo.

Monitoring data flow through Routes

Also select Monitoring > Pipelines and examine slicendice.

Monitoring data flow through Pipelines

What Have We Done?

Look at you! Give yourself a pat on the back! In this short, scenic tour – with no hit to your cloud-services charges – you've built a simple but complete LogStream system, exercising all of its basic components:

  • Downloaded, installed, and run LogStream.
  • Configured a Source to hook up an input.
  • Configured a Destination to feed an output.
  • Monitored data throughput, and checked it twice.
  • Built a Pipeline.
  • Configured LogStream Functions to redact, parse, enrich, trim, rename, and drop event data.
  • Added and attached a Route to get data flowing through our Pipeline.

Next Steps

Interested in guided walk-throughs of more-advanced LogStream features? We suggest that next, you check out:

  • LogStream Sandboxes: Work through general and specific scenarios in containers, with terminal access and free, hosted data inputs and outputs.

  • Use Cases documentation: Bring your own services to build solutions to specific challenges.

  • Cribl Concept: Pipelines – Video showing how to build and use Pipelines at multiple LogStream stages.

  • Cribl Concept: Routing – Video about using Routes to send different data through different paths.

Cleaning Up

Oh yeah, you've still got the LogStream server running, with its businessevent.log datagen still firing events. If you'd like to shut these down for now, in reverse order:

  1. Go to Data > Sources > Datagens.

  2. Slide businessevent to Off, and click Save. (Refer back to the screenshot above.)

  3. In your terminal's $CRIBL_HOME/bin directory, shut down the server with: ./cribl stop

That's it! Enjoy using LogStream.
