Syslog Best Practices

Cribl Stream can process a syslog stream directly. Moving from existing syslog-ng or rsyslog servers to Cribl Stream replaces those solutions with one that is fully supported and easily managed.

Processing syslog in Cribl Stream allows you to readily address these common challenges of ingesting data from syslog senders:

  • Architecture: Cribl Stream routes the syslog stream directly, immediately, and securely to the destinations of your choice, reducing latency and management overhead.
  • Volume: Removing redundant data and unnecessary fields in Cribl Stream typically reduces volume 20–30% overall. It also optimizes the data for downstream services like Splunk or Elasticsearch.
  • Timestamp handling: Cribl Stream intelligently processes events sent from different time zones. It can embed a new, consistent timestamp, and can auto-correct timestamps that are off by an exact number of hours.
  • Severity/Facility accuracy: Each syslog event begins with a priority value, an integer in angle brackets that encodes Facility and Severity as defined in the Syslog Protocol. Cribl Stream translates this code (e.g., <165>, <0>) into the correct Facility and Severity values (see the sketch after this list).
  • Metadata: Cribl Stream can automatically set metadata fields, including sourcetype and index.
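
The arithmetic behind that translation is fixed by the Syslog Protocol: the priority value equals Facility × 8 + Severity. Here is a minimal TypeScript sketch of the decoding (the decodePriority function is our own illustration, not part of Cribl Stream):

    // Decode a syslog priority value such as <165> into its Facility and Severity.
    // Per RFC 5424, priority = (facility * 8) + severity.
    function decodePriority(pri: number): { facility: number; severity: number } {
      return { facility: Math.floor(pri / 8), severity: pri % 8 };
    }

    decodePriority(165); // { facility: 20 (local4), severity: 5 (notice) }
    decodePriority(0);   // { facility: 0 (kern),    severity: 0 (emergency) }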

This tutorial outlines best practices for replacing your syslog server with Cribl Stream. To go even a bit deeper, check out this Cribl Office Hours video.

The Goal: Optimizing Syslog Events

By default, a Cribl Stream Syslog Source produces eight fields: _time, appname, facility (numeric), facilityName (text), host, message, severity (numeric), and severityName (text).
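
For illustration only, a hypothetical parsed event carrying those eight fields might look like the following (all values are invented):

    {
      _time: 1700000000.123,
      appname: "sshd",
      facility: 4,
      facilityName: "auth",
      host: "10.0.1.25",
      message: "Accepted publickey for admin from 10.0.2.7 port 51234 ssh2",
      severity: 6,
      severityName: "info"
    }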

Default, parsed syslog event

While this default parsing makes the output much more readable, we haven’t saved any volume – and we now have redundant pairs of fields (numeric versus text) that represent facility and severity.

Our next logical step is to streamline syslog events to something more like this:

Optimized syslog event

This accomplishes all of the following:

  • Extracts the essentials.
  • Removes the redundancies.
  • Adds a new field to identify the Cribl Stream Pipeline (which we’re about to build).
  • Adds metadata that the Destination needs.
  • Shrinks the outbound _raw payload to just its message component.
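
Putting those points together, a hypothetical optimized version of the event above might look something like this (the names and values of the Pipeline-identifier field and the metadata fields are illustrative; use whatever your Destinations expect):

    {
      _raw: "Accepted publickey for admin from 10.0.2.7 port 51234 ssh2",
      _time: 1700000000.123,
      host: "10.0.1.25",
      appname: "sshd",
      severityName: "info",
      facilityName: "auth",
      cribl_pipe: "syslog_optimize",
      index: "netops",
      sourcetype: "linux:syslog"
    }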

Once we optimize syslog in this way, we can achieve still further efficiencies by dropping or downsampling frequent events, and by balancing high-volume syslog inputs across Cribl Stream worker processes.

Overall Architecture

Syslog data, especially when sent via UDP, is best collected as close to the source as possible. Ideally, you should capture syslog data at its origin. (This is true of syslog in general, not just syslog processed in Cribl Stream.)

Also, because syslog senders have no built-in load balancing, Cribl strongly recommends using a load balancer to distribute the load across multiple Worker Nodes.

Example architecture for a single site or location

The load balancer shown above is dedicated only to communication between the syslog senders and the Stream Worker Nodes. Any load balancing between the Worker Group and its Stream Leader would be handled by a separate load balancer.

Load Balancing

When configuring a load balancer to be fed by syslog senders, start with these basic principles:

Avoid Stickiness

In this context, stickiness is when traffic from a given source is always sent to the same destination IP. The optimal configuration of the load balancer avoids “sticky” behavior, and instead spreads the workload across Cribl Stream Worker Nodes as evenly as possible.

Use API Calls to Do Health Checks

If UDP data is being sent, the load balancer has no way to automatically detect whether the destination is up. Configure the load balancer to use API calls to the Worker Nodes to check the health status of each Node (see Health Endpoint).
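
The exact configuration depends on your load balancer, but the check itself is an HTTP GET against each Worker Node's API. Below is a rough TypeScript sketch of that probe, assuming the Worker API's default port (9000) and the health path described under Health Endpoint; verify both against your own deployment:

    // Probe a Worker Node the way a load balancer health check would.
    // Assumes the Worker API listens on its default port, 9000.
    async function workerIsHealthy(workerHost: string): Promise<boolean> {
      try {
        const res = await fetch(`http://${workerHost}:9000/api/v1/health`);
        return res.ok; // HTTP 200 means the Node can keep receiving traffic
      } catch {
        return false;  // timeout or refused connection: pull the Node from rotation
      }
    }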

Listen on Port 514

If possible, configure the load balancer to listen on port 514, and then relay traffic to the Worker Nodes on port 9514. Here’s why:

  • Many syslog senders are hard-coded to send only to port 514, so you need to support them.
  • Cribl Stream itself should not run as root, and therefore cannot listen on ports lower than 1024 without additional OS-level steps (like SETCAP).

A Port for Each Device Class

As explained above, for syslog devices that can send only to port 514, you can configure the load balancer to relay to destination port 9514. But what about the many syslog senders that do support sending to ports other than 514? It’s a best practice among Cribl customers to use a dedicated receiving port for each class of device in their environment. While this takes a little more effort to set up, it enables you to hard-code metadata (and optionally, Pipelines) for the events from that data source. Examples include:

  • 1517: VMware ESXi logs.
  • 1521: Palo Alto Firewall.
  • 1522: F5 load balancers.

We’ll see how to set per-port metadata in the Adding Processing Pipelines section.

UDP Versus TCP

For any given syslog device, you might need to choose between using UDP or TCP. Which is best depends on the characteristics of the sender. Here are some basic guidelines:

  • For single, high-volume senders (over 300GB/day), use UDP if possible. For both the sender and Cribl Stream, UDP imposes lower overhead than TCP. The stateless nature of UDP allows each log event to be directed to a different worker thread than the last. This ensures maximum utilization of the Worker Nodes. See Sizing and Scaling for more details.
  • For lower-volume senders, use TCP if the sender supports it.
  • For all other use cases, use UDP.

Pipeline Planning

In Cribl Stream, we speak of pre-processing Pipelines, processing Pipelines (or just plain Pipelines), and post-processing Pipelines. Cribl recommends combining the first two in an arrangement that we’ve found to be optimal for syslog.

Pre-Processing Syslog Sources

Attach a pre-processing Pipeline to most (or all) of your Syslog Sources. Use the pre-processing Pipeline to apply the same set of ingest-time processing that all syslog events will need, regardless of what subsequently happens on different subsets of the data.

Syslog data is a classic example of where a pre-processing Pipeline is useful: Unlike processing Pipelines, pre-processing Pipelines attach directly to a Source, enabling you to standardize or normalize what comes in. This way, you avoid having to implement that same functionality and logic separately in each processing Pipeline associated with a Syslog Source.
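
In other words, the event flow looks roughly like this:

    syslog sender
      → Syslog Source (+ pre-processing Pipeline: shared normalization for all syslog)
      → Route filters (match on metadata or content)
      → processing Pipeline (subset-specific transformations)
      → Destination (+ optional post-processing Pipeline)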

Pipeline for Selected Syslog Data Subsets

Configure dedicated Pipelines (and Routes!) for each distinct subset of data that arrives via syslog senders. These subsets might include DHCP logs from the router, traffic logs from the firewall, operational logs from load balancers, and virtualization logs from on-prem hypervisors.

Certain kinds of subset-specific processing have become Cribl Stream best practices; the two use cases later in this tutorial walk through representative examples.

Unless you’re new to Cribl Stream, you’ve already created your own Pipelines, so we’re not going to review that here. (If you are new to Cribl Stream, consider running through the free Cribl Stream Fundamentals sandbox course ASAP.)

Importing the Pre-Processing Pipeline

Even before setting up a Cribl Stream Syslog Source, you’ll want to install the Cribl Pack for Syslog Input (a.k.a. cribl‑syslog‑input Pack), which provides the required pre-processing Pipeline.

If this is your first time installing from the Cribl Pack Dispensary, see the full directions with screenshots. Otherwise:

  1. Open the Cribl Pack Dispensary’s Cribl Pack for Syslog Input page.
  2. Under Releases at right, click the link for the latest release.
  3. In the resulting Assets section, right-click the .crbl file to download it locally.
  4. In Cribl Stream, select a Worker Group, then select Processing > Packs.
  5. Click the Add New button, and select Import from file.
  6. Select the Pack you downloaded, and follow the prompts from there. In most cases, when installing a Dispensary Pack, you should not enter an override value for New Pack ID.
  7. Review the Pack’s README file, available in the Pack’s Settings link and also online.

Let’s examine what this Syslog Input Pack provides, starting with the Routes page.

Routes page in the Cribl Pack for Syslog Input

The first Route matches based on inputId. Anything arriving via a Cribl Stream Syslog Source will match this filter, as long as you’ve configured the Source to use the Pack, as shown below.

On first installation of the Pack, the Pipeline defaults to commonly used configurations. Cribl strongly recommends reviewing the Pipeline’s internal documentation to understand which features are enabled. This documentation consists of Comments for each section, and a Description for each Function to explain what’s happening in that step.

The Pipeline’s internal documentation

Examining this documentation should make it clear what changes (if any) are needed to suit your deployment environment. Go ahead and make those changes now.

The Lookup File

The Pack ships with a lookup file called SyslogLookup.csv, whose contents you should replace as necessary. To access the file, navigate to Processing > Packs, select the cribl‑syslog‑input Pack, and click Knowledge.

Stock SyslogLookup.csv file, to be filled with customer data
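
The exact column set is defined by the Pack (check its README), but the general idea is to map each sending host to the metadata you want attached to its events. A purely hypothetical example:

    host,sourcetype,index,__timezone
    10.0.1.1,cisco:ios,netops,America/New_York
    fw-east.example.com,pan:traffic,netops,UTC
    esxi-01.example.com,vmware:esxilog,vmware-esxi,America/Chicago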

Creating Syslog Sources

Now that you’ve imported the pre-processing Pipeline, the next step is to ensure that you have the Syslog Sources you need.

Configure the in_syslog Source

Cribl Stream ships with a Syslog Source named in_syslog, which is preconfigured to listen for both UDP and TCP traffic on Port 9514. You can clone or directly modify this Source to further configure it, and then enable it.

Use the in_syslog Source for syslog senders that are hard-coded to send to port 514, as described above. In the QuickConnect UI: Click Add Source. From the resulting drawer’s tiles, select [Push >] Syslog. Click Select Existing, then in_syslog.

Or, in the Data Routes UI: From the top nav of a Cribl Stream instance or Group, select Data > Sources. From the resulting page’s tiles or the Sources left nav, select [Push >] Syslog. In the Manage Sources page, click in_syslog.

Configure the fields and options as follows:

General Settings

  • Enabled: Toggle to Yes.
  • UDP Port and TCP Port: Assuming you’re following the guidelines in this tutorial, you’ll have a load balancer relaying incoming traffic from port 514 to port 9514. Therefore, leave the in_syslog Source configured to listen on its default port, 9514, for both TCP and UDP.

TLS Settings (TCP Only)

  • Enabling TLS is strongly recommended if the Source will be receiving data across the Internet.

Processing Settings

  • Metadata: There is no need to add fields here, because for generic senders coming in on port 9514, the Pack will set the meta-information via the SyslogLookup.csv file. Instead, edit the lookup file to add content appropriate to your deployment.
  • Pre-processing: From the Pipeline drop-down, select PACK: cribl-syslog-input (Syslog Preprocessing).

Create a Source for Each Device Class

As explained above, you should now create a Syslog Source for each vendor/class of syslog sender. Create each Syslog Source as follows:

In the QuickConnect UI: Click Add Source. From the resulting drawer’s tiles, select [Push >] Syslog. Click Add New.

If you use QuickConnect, remember that each Source/Destination pair will be parallel and independent.

Or, in the Data Routes UI: From the top nav of a Cribl Stream instance or Group, select Data > Sources. From the resulting page’s tiles or the Sources left nav, select [Push >] Syslog. Click New Source to open a New Source modal.

Configure the fields and options as follows:

General Settings

  • Input ID: This field specifies the name for the Source, which will appear in the __inputId field as well as on Monitoring pages that show Source information. Common examples include in_syslog_cisco_switch for Cisco switches, in_syslog_f5 for F5 load balancers, and so on.
  • UDP Port and TCP Port: Enter the dedicated port you have chosen for the device class, and use UDP or TCP according to the recommendations above.

TLS Settings (TCP Only)

  • Enabling TLS is strongly recommended if the Source will be receiving data across the Internet.

Processing Settings

  • Metadata: Select Fields (Metadata) and add the fields appropriate for the intended Destination(s), such as sourcetype, index, __timezone, and any other meta-information you want to tie to the sender’s hostname or IP address. (See the example after this list.)
Metadata fields tied to a dedicated-port Syslog source
  • Pre-processing: From the Pipeline drop-down, select PACK: cribl-syslog-input (Syslog Preprocessing).
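
For example, a dedicated F5 Source such as in_syslog_f5 (listening on port 1522, per the plan above) might carry metadata like the following. The values are examples only; keep in mind that the Value column takes an expression, so literal strings need quotes:

    Name          Value expression
    sourcetype    'f5:bigip:syslog'
    index         'netops'
    __timezone    'America/New_York'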

Adding Processing Pipelines

Congratulations! You’ve gotten a nice grounding in syslog processing, and you’ve seen methods with which you can:

  • Properly architect the deployment environment, where syslog sender data travels through a load balancer, which then distributes it across a Cribl Stream Worker Group.
  • Import a Pack from the Cribl Pack Dispensary.
  • Create new Cribl Stream Syslog Sources which use the Cribl Pack for Syslog Input.
  • Hard-code meta-information for specific ports, or use the lookup file to map meta-information for specific hosts.

Your logical next step: using Pipelines and Routes, transform syslog data in a way that makes sense for your particular syslog sender(s).

We’ll look at two use cases: one that connects a dedicated Source to a Destination with QuickConnect, and one that uses Routes to split a mixed stream.

The point is to see how Cribl Stream offers different approaches for different scenarios. For configuring one dedicated Source to receive a given, single dataset, QuickConnect is ideal. But to accommodate a mix of data arriving on the same Source and port, data that needs to be divided into subsets and processed differently by different Pipelines, we need Routes.

Use Case A: QuickConnect a Syslog Sender to a Source

Of the many possible examples, we’ll focus on reducing the volume of VMware ESXi syslog data by dropping events of debug severity.

Data reduction will be significant, because debug severity events can make up 80-90% of syslog data sent from ESXi servers. Not only that: Reducing the data volume reduces the CPU load and storage used to process the data, and searches will respond faster. In a world where we’re searching for needles in a haystack, dropping 80% of the hay makes everything better.

To set this up:

  1. In your Worker Group, navigate to Processing > Pipelines, then click Add Pipeline.
  2. Name the new Pipeline DropNoisySyslog or something similar.
  3. Click Comment and enter a good description of what you’re doing. (This is an important best practice!)
  4. Add a Drop Function with the Filter severityName=='debug'. (You could also use sampling, or dynamic sampling if you wanted to be fancy about this; see the sketch after these steps.)
  5. Click Save and we’re done.
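
To make the intent concrete, here is the equivalent logic in TypeScript. The first function mirrors the Drop Function's filter; the second shows roughly what a sampling alternative would do (the 1-in-10 ratio is arbitrary):

    // Mirrors the Drop Function: discard any event whose severityName is 'debug'.
    function shouldDrop(event: { severityName?: string }): boolean {
      return event.severityName === 'debug';
    }

    // Sampling alternative: keep roughly 1 in 10 debug events instead of none.
    function shouldKeep(event: { severityName?: string }): boolean {
      if (event.severityName !== 'debug') return true; // never sample non-debug events
      return Math.random() < 0.1;
    }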

Next, we need to direct VMware ESXi data to our Pipeline. We’ll do this using QuickConnect, which is the fastest way to configure a new Source, tie it to a Destination, and assign pre-processing and processing Pipelines to that Source. (We could also do this with Routes; that simply requires a few more configuration steps.)

In the QuickConnect UI:

  1. Click Add Source.
  2. From the resulting drawer’s tiles, select [Push >] Syslog.
  3. Click Add New.
  4. Configure the fields and options as follows:
    • Name: in_syslog_ESXi.
    • TCP: 1517. This is the TCP port on which ESXi servers send their logs.
    • Pre-Processing > Pack: Select cribl-syslog-input.
    • Fields (metadata): Add a sourcetype field with value esxilogs, and an index field with value vmware‑esxi.
Adding metadata appropriate for ESXi
  5. Click Save.
  6. On the main QuickConnect page, drag a connector from in_syslog_ESXi to each Destination you want to receive these events. When prompted, select Pipeline, select your DropNoisySyslog, and click Save.
  7. Commit and deploy, and you’re all set!

Use Case B: Route Multiple Syslog Senders to a Source

Let’s imagine you have incoming data from a router. This data is tagged with sourcetype=myrouter, and it includes a mix of DHCP actions, login/authentication attempts, and firewall traffic logs.

Our goal is to send each of the three data subsets – DHCP, login, and firewall – to its own Pipeline.

We know that the Cribl Pack for Syslog Input has a Lookup Function that – for data arriving on the default port – should return meta-information such as sourcetype or index. We can combine this metadata with some matching of strings within the data, in Route filters that direct the right data to the right Pipeline.

Although the myrouter device doesn’t actually exist, the example that follows shows the art of the possible. We’ll spell out the procedures as if the scenario were real. Perhaps one day you’ll use it as a template for an actual deployment.

Create a Pipeline for Each Data Subset

  • Create three Pipelines, naming each one for a data subset: myrouter‑dhcp, myrouter‑auth, and myrouter‑fw‑traffic.
  • Leave the Pipelines empty for now; we’ll add Functions later on.

Create a Route for Each Data Subset

  1. Navigate to Routing > Data Routes.
  2. Click Add Route.
  3. Configure a Route as specified in the table below:
Name                 Filter                                             Pipeline
myrouter-dhcp        sourcetype=='myrouter' && _raw.match('dhcpd3:')    myrouter-dhcp
myrouter-auth        sourcetype=='myrouter' && _raw.match('login')      myrouter-auth
myrouter-fw-traffic  sourcetype=='myrouter' && _raw.match('TRAFFIC')    myrouter-fw-traffic
myrouter-other       sourcetype=='myrouter'                             DropNoisySyslog
  4. Repeat the preceding steps until you have configured four new Routes – one for each of the three data subsets, plus a final Route to drop events that don’t match our Filters (i.e., noise).

All Routes should have the Final flag set to Yes, and the Output set to whatever Destination you think is appropriate for the given data subset.

  5. Click the myrouter-dhcp Route’s ••• (Options) menu, then select Group Actions > Create Group.
  6. Name the new Group myrouter.
  7. Drag the Routes you created above into the new Group. (Be sure that myrouter-other is listed last.)
  8. Drag the new Group to whichever vertical position makes the most sense for your deployment.

Once you’ve completed all the above steps and collapsed your new Routes, you should see something like this:

Example of Routes in a Group, each tied to the same sender
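
With Final set to Yes on every Route, the Group behaves like a first-match-wins dispatcher, which is why myrouter-other must come last. A conceptual TypeScript sketch of that behavior (not Cribl’s actual implementation):

    // Conceptual only: events are tested against Routes in order; the first
    // matching Route (with Final: Yes) claims the event and evaluation stops.
    type Route = { name: string; filter: (e: any) => boolean; pipeline: string };

    function dispatch(event: any, routes: Route[]): string | undefined {
      for (const r of routes) {
        if (r.filter(event)) return r.pipeline; // Final: Yes, so stop here
      }
      return undefined; // nothing in this Group matched; the event moves on
    }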

Next, edit each of the myrouter-<data-subset> Pipelines:

  • Use an Eval Function to modify the sourcetype (and perhaps other metadata), as sketched after this list.
  • Use appropriate Functions to accomplish the enrichment, suppression, volume reduction, and/or other transformations that the data needs.
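
For instance, in the myrouter-dhcp Pipeline, the Eval Function’s field assignments might look like this (values are illustrative; as with metadata, each value is an expression, so quote literal strings):

    Name         Value expression
    sourcetype   'myrouter:dhcp'
    index        'network'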

Takeaways from This Routing / Pipeline Use Case

  • Routing provides the flexibility needed when dealing with multiple datasets from a single sender, or when a single Source receives multiple datasets.
  • Ensure that you have a dedicated Pipeline (and Route) for each discrete dataset that you want to manipulate in some way (e.g., fw-traffic).
  • Ensure that you have a final Route that matches the general data from the overall sourcetype.
  • Use the Routing page’s Groups option to create logical containers for sets of Routes that belong together.