Single-Instance/​Basic Deployment

Getting started with Cribl Stream on a single instance


For small-volume or light processing environments – or for test or evaluation use cases – a single instance of Cribl Stream might be sufficient to serve all inputs, event processing, and outputs. This page outlines how to implement a single-instance deployment.

This page also provides system requirements and startup procedures for a distributed deployment.

Architecture

Requirements

OS

  • Linux kernel >= 3.10 and glibc >= 2.17.

Example distributions: Ubuntu 16.04, Debian 9, RHEL 7, CentOS Linux 7, 8, or CentOS Stream 9, SUSE Linux Enterprise Server 12, Amazon Linux 2.

Tested so far on Ubuntu (14.04, 16.04, 18.04, and 20.04), CentOS 7.9, and Amazon Linux 2.
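
To check a host against the kernel and glibc minimums above, a short script along these lines might help. It is a minimal sketch, not part of Cribl Stream: the version_ge helper is a hypothetical name, and the script assumes GNU sort (for the -V option) and a glibc-based system where getconf GNU_LIBC_VERSION is available.

```shell
#!/bin/sh
# Sketch: compare this host's kernel and glibc versions with the minimums above.

# version_ge A B: succeeds when version A >= version B (relies on GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Kernel release, e.g. "5.15.0-91-generic" -> "5.15.0"
kernel=$(uname -r | cut -d- -f1)
# Prints e.g. "glibc 2.31"; produces nothing on non-glibc systems.
glibc=$(getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}')

if version_ge "$kernel" 3.10; then
    echo "kernel $kernel: OK"
else
    echo "kernel $kernel: older than 3.10"
fi

if [ -n "$glibc" ] && version_ge "$glibc" 2.17; then
    echo "glibc $glibc: OK"
else
    echo "glibc ${glibc:-not detected}: does not meet 2.17"
fi
```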

System

  • 1 GHz (or faster), 64-bit CPU.
  • 4+ physical cores, 8 GB+ RAM – all beyond your basic OS/VM requirements.
  • 5 GB free disk space (more if persistent queuing is enabled).
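
The following sketch reports a host's CPU count, memory, and free disk space next to the minimums listed above. The check_min helper is a hypothetical name, and checking the filesystem that holds /opt is an assumption about where you plan to install. Note that nproc counts logical CPUs rather than physical cores, so treat the CPU line as a rough indicator only.

```shell
#!/bin/sh
# Sketch: report CPU, memory, and free disk against the minimums listed above.

# check_min LABEL ACTUAL REQUIRED: print one status line for a resource.
check_min() {
    if [ "$2" -ge "$3" ]; then status=OK; else status=LOW; fi
    echo "$1: $2 (minimum $3) $status"
}

# nproc counts logical CPUs (vCPUs), not physical cores.
cpus=$(nproc 2>/dev/null || echo 0)
# MemTotal is in kB; round to the nearest GB, since the kernel reserves a little.
mem_gb=$(awk '/^MemTotal:/ {print int($2 / 1048576 + 0.5)}' /proc/meminfo 2>/dev/null || echo 0)
# Available space in GB on the filesystem holding /opt (or / if /opt is absent).
target=/opt
[ -d "$target" ] || target=/
disk_gb=$(df -Pk "$target" 2>/dev/null | awk 'NR == 2 {print int($4 / 1048576)}')

check_min "logical CPUs" "${cpus:-0}" 4
check_min "RAM (GB)" "${mem_gb:-0}" 8
check_min "free disk (GB)" "${disk_gb:-0}" 5
```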

Browser

  • The five most-recent versions of Chrome, Firefox, Safari, and Microsoft Edge.

SELinux Support

Here, enforcing mode is supported, but not required.

All quantities listed above are minimum requirements. To fulfill these requirements using cloud-based virtual machines, see Recommended AWS, Azure, and GCP Instance Types.

System Requirements Details

We assume that 1 physical core is equivalent to 2 virtual/​hyperthreaded CPUs (vCPUs) on Intel/Xeon or AMD processors; and to 1 (higher-throughput) vCPU on Graviton2/ARM64 processors. For additional details, see Estimating Number of Cores.
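
The core-to-vCPU assumption above can be written as a small conversion helper. This is an illustrative sketch only: vcpus_for_cores is a hypothetical function name, and the architecture strings are the values that uname -m typically reports on Linux.

```shell
#!/bin/sh
# vcpus_for_cores CORES ARCH: vCPUs needed to match CORES physical cores,
# using the ratios assumed in the paragraph above.
vcpus_for_cores() {
    case "$2" in
        x86_64|amd64)  echo $(( $1 * 2 )) ;;  # 2 hyperthreaded vCPUs per core
        aarch64|arm64) echo "$1" ;;           # 1 vCPU per core
        *) echo "unknown architecture: $2" >&2; return 1 ;;
    esac
}

vcpus_for_cores 4 x86_64    # prints 8
vcpus_for_cores 4 aarch64   # prints 4
```

So the 4-core minimum corresponds to an 8-vCPU instance on Intel/Xeon or AMD, and a 4-vCPU instance on Graviton2/ARM64.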

Your total memory allocation per host must accommodate all Worker Processes’ memory usage, plus host OS requirements. Each Worker Process might use up to the maximum heap size, plus some Node.js overhead that isn’t part of heap, plus a third memory draw that will scale upward with configuration details like your type and number of Destinations. Always monitor your Nodes’ memory usage – and especially, check for new requirements after configuration changes like adding new Destinations. For additional details, see Estimating Memory Requirements.

Setting the CRIBL_HOME Environment Variable

The CRIBL_HOME environment variable is set inside the Cribl Stream application, but not in your terminal. If you want to use $CRIBL_HOME in a shell, you can:

  • Assign it once, using the export command: export CRIBL_HOME=/opt/cribl
  • Set it as a default, by adding it to your terminal profile file.
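
As one way to set the default, the sketch below appends the export line to a profile file only if it is not already there. The function name persist_cribl_home is hypothetical, and which profile file applies (for example ~/.bashrc or ~/.profile) depends on your shell.

```shell
#!/bin/sh
# persist_cribl_home PROFILE_FILE INSTALL_DIR
# Appends an export line to the profile file unless one is already present.
persist_cribl_home() {
    profile=$1
    home_dir=$2
    if grep -q '^export CRIBL_HOME=' "$profile" 2>/dev/null; then
        echo "CRIBL_HOME already set in $profile"
    else
        echo "export CRIBL_HOME=$home_dir" >> "$profile"
        echo "Added CRIBL_HOME=$home_dir to $profile"
    fi
}
```

For example, a bash user might run persist_cribl_home ~/.bashrc /opt/cribl and then open a new terminal for the change to take effect.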

FIPS Mode Requirements

Federal Information Processing Standards (FIPS) is a set of US government standards and guidelines for information security. You can deploy Cribl Stream in FIPS mode. This mainly restricts the cryptographic algorithms used within Cribl Stream, and also enforces stricter password requirements.

See the FIPS Mode topic for system and password requirements, and instructions for running in FIPS mode.

Installing on Linux

  • Install the package on your instance of choice. Download it here.
  • Ensure that required ports are available (see Network Ports).
  • Un-tar in a directory of choice, e.g., in the /opt/ directory: tar xvzf cribl-<version>-<build>-<arch>.tgz
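
The extraction step can be wrapped in a small function that also confirms the result. This is a sketch under assumptions: install_cribl is a hypothetical name, and the archive is assumed to unpack into a top-level cribl/ directory, consistent with the /opt/cribl example path used on this page.

```shell
#!/bin/sh
# install_cribl TARBALL [DEST]: unpack the downloaded package under DEST (default /opt).
install_cribl() {
    tarball=$1
    dest=${2:-/opt}
    [ -f "$tarball" ] || { echo "no such file: $tarball" >&2; return 1; }
    mkdir -p "$dest" || return 1
    tar xzf "$tarball" -C "$dest" || return 1
    # Assumption: the archive contains a top-level cribl/ directory.
    if [ -x "$dest/cribl/bin/cribl" ]; then
        echo "CRIBL_HOME=$dest/cribl"
    else
        echo "unexpected archive layout under $dest" >&2
        return 1
    fi
}
```

Call it with the path to the package you downloaded, for example install_cribl cribl-<version>-<build>-<arch>.tgz /opt, running as a user with write access to the destination.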

Installing Cribl Stream and Cribl Edge on the Same Host

You can run an Edge Node on a Cribl Stream Leader Node, or an Edge Node and a Worker Node on the same host. For details, see Installing Cribl Edge and Cribl Stream on the Same Host.

Running

To run Cribl Stream in FIPS mode, do not use the commands below right away; instead, first consult the FIPS Mode topic.

Go to the $CRIBL_HOME/bin directory, where the package was extracted (e.g.: /opt/cribl/bin). Here, you can use ./cribl to:

  • Start: ./cribl start
  • Stop: ./cribl stop
  • Reload: ./cribl reload
  • Restart: ./cribl restart
  • Get status: ./cribl status
  • Switch a distributed deployment to single-instance mode: ./cribl mode-single (uses the default address:port 0.0.0.0:9000)

Executing the restart or stop command cancels any currently running collection jobs. For other available commands, see CLI Reference.

Next, go to http://<hostname>:9000 and log in with default credentials (admin:admin). You can now start configuring Cribl Stream with Sources and Destinations, or start creating Routes and Pipelines.

In the case of an API port conflict, the process will retry binding for 10 minutes before exiting.

Shutdown and Restart Sequence

When a Worker Process receives an explicit shutdown command, it follows this sequence:

  1. Shuts down internal system communications: stops receiving any commands from the API Process or distributed Leader.
  2. Shuts down the input Sources.
  3. When the input stream ends, receives a signal event from Cribl Stream’s event processor to flush out any stateful Pipeline Functions (such as Aggregations, Sampling, Dynamic Sampling, and Suppress).
  4. Waits for 10 seconds, to allow data to finish flowing through the streams processing engine. This wait is designed to allow all Destinations to flush out remaining data. However, any data not flushed within this interval – e.g., because of an error on downstream receivers – will be lost.
  5. Exits.
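
Because shutdown includes a 10-second drain interval, scripts that stop Cribl Stream and then act on its files might want to confirm that the process has actually exited. The helper below is a generic sketch, not a Cribl utility: wait_for_exit is a hypothetical name, and it polls with kill -0, which only works for processes you have permission to signal.

```shell
#!/bin/sh
# wait_for_exit PID TIMEOUT_SECONDS
# Succeeds once the process is gone; fails if it is still running at timeout.
wait_for_exit() {
    pid=$1
    remaining=$2
    while kill -0 "$pid" 2>/dev/null; do
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$(( remaining - 1 ))
    done
    return 0
}
```

One possible use is to read the PID from $CRIBL_HOME/pid/cribl.pid before running ./cribl stop, and then call wait_for_exit with that PID and a timeout somewhat above 10 seconds. The PID file location is inferred from the unit file shown later on this page, and the 15-second margin some operators might choose is an assumption, not a documented value.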

Shutdown/Restart with PQ

Enabling Persistent Queues, on Destinations that support it, generally helps ensure data delivery to your downstream systems. However, note that when a Worker Process restarts, there is a potential for duplicate events to be sent through such Destinations.

This is because PQ doesn’t mark events as safe to discard until they’ve been handed off to the host OS to send out. So if the Worker Process exits at the final step above before all events have flushed, the final handful of events will not have been marked as committed and removed. Upon restart, Cribl Stream will still see them, and will resend them.

Enabling Start on Boot

Cribl Stream ships with a CLI utility that can update your system’s configuration to start Cribl Stream at system boot time. The basic format to invoke this utility is:

[sudo] $CRIBL_HOME/bin/cribl boot-start [enable|disable] [options] [args]

You will need to run this command as root, or with sudo. For options and arguments, see the CLI Reference.

Most Linux distributions now use systemd to start processes at boot, while older distributions might still use initd. If you are not sure which service should be configured at startup, check with your Linux administrator. Then follow the corresponding procedure below.

Using systemd

To enable Cribl Stream to start at boot time with systemd, you need to run the boot‑start command. Make sure you first create any user you want to specify to run Cribl Stream. E.g., to run Cribl Stream on boot as existing user cribl, you’d use:

sudo $CRIBL_HOME/bin/cribl boot-start enable -m systemd -u cribl

This will install a unit file (as shown below) named cribl.service, and will start Cribl Stream at boot time as user cribl. A ‑configDir option can be used to specify where to install the unit file. If not specified, this location defaults to /etc/systemd/system/.

If necessary, change ownership for the Cribl Stream installation:

[sudo] chown -R cribl $CRIBL_HOME

Next, use the enable command to ensure that the service starts on system boot:

[sudo] systemctl enable cribl

To disable starting at boot time, run the following command:

sudo $CRIBL_HOME/bin/cribl boot-start disable

Other available systemctl commands are:

systemctl [start|stop|restart|status] cribl

Note the file’s default 65536 hard limit on maximum open file descriptors (known as a ulimit). The minimum recommended value is 65536. Linux tracks this per user account. You can view the current soft ulimit for max open file descriptors with $ ulimit -n while logged in as the same user running the cribl binary.

Installed systemd File
[Unit]
Description=Systemd service file for Cribl Stream.
After=network.target

[Service]
Type=forking
User=cribl
Restart=always
RestartSec=5
LimitNOFILE=65536
PIDFile=/install/path/to/cribl/pid/cribl.pid
ExecStart=/install/path/to/cribl/bin/cribl start
ExecStop=/install/path/to/cribl/bin/cribl stop
ExecReload=/install/path/to/cribl/bin/cribl reload
TimeoutSec=60

[Install]
WantedBy=multi-user.target
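
To compare the limit configured in the unit file with the limit in effect for your current shell, you could use something like the following sketch. The unit_nofile_limit helper is a hypothetical name, and the unit path assumes the default install location mentioned above.

```shell
#!/bin/sh
# unit_nofile_limit UNIT_FILE: print the numeric LimitNOFILE value in a unit file.
unit_nofile_limit() {
    sed -n 's/^LimitNOFILE=\([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1
}

unit=/etc/systemd/system/cribl.service
if [ -r "$unit" ]; then
    echo "unit file limit: $(unit_nofile_limit "$unit")"
fi
# Soft limit for the current shell; run this as the user that runs cribl.
echo "soft limit for this shell: $(ulimit -n)"
```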

Persisting Overrides on systemd

By default, disabling and re-enabling boot start will regenerate the cribl.service file. To persist any overrides – such as proxy or privileged port usage – use this command:

systemctl edit cribl

This opens a text editor that prompts you to enter overrides, then saves them to a persistent file at:

/etc/systemd/system/cribl.service.d/override.conf

Do NOT Run Cribl Stream as Root!

If Cribl Stream needs to listen on privileged ports (those below 1024), it will need privileged access. You can enable this on systemd by adding this configuration key to your override.conf file:

[Service]
AmbientCapabilities=CAP_NET_BIND_SERVICE

If you want to add extra capabilities, such as reading certain resources (e.g., /var/log/*), add CAP_DAC_READ_SEARCH in a space-separated format, as follows:

[Service]
AmbientCapabilities=CAP_NET_BIND_SERVICE CAP_DAC_READ_SEARCH
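
If you manage hosts with scripts rather than an interactive editor, the same override can be written directly to the drop-in file. This is a sketch of that alternative, not the procedure described above: write_cribl_override is a hypothetical name, and it overwrites any existing override.conf in the target directory.

```shell
#!/bin/sh
# write_cribl_override DROPIN_DIR CAPABILITY...
# Writes an override.conf granting the listed capabilities (space-separated).
write_cribl_override() {
    dropin_dir=$1
    shift
    mkdir -p "$dropin_dir" || return 1
    # "$*" joins the remaining arguments with single spaces.
    printf '[Service]\nAmbientCapabilities=%s\n' "$*" > "$dropin_dir/override.conf"
}
```

Run as root, a call such as write_cribl_override /etc/systemd/system/cribl.service.d CAP_NET_BIND_SERVICE would produce the first fragment shown above. Afterward, run systemctl daemon-reload and restart the service so that systemd picks up the change.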

Using initd

To enable Cribl Stream to start at boot time with initd, you need to run the boot‑start command. If the user that you want to run Cribl Stream does not exist, create it before running the command. E.g., to run Cribl Stream on boot as user cribl:

sudo $CRIBL_HOME/bin/cribl boot-start enable -m initd -u cribl

This will install an init.d script in /etc/init.d/cribl.init.d, and will start Cribl Stream at boot time as user cribl. A ‑configDir option can be used to specify where to install the script. If not specified, this location defaults to /etc/init.d.

If necessary, change ownership for the Cribl Stream installation:

[sudo] chown -R cribl $CRIBL_HOME

To disable starting at boot time, run the following command:

sudo $CRIBL_HOME/bin/cribl boot-start disable

To control Cribl Stream, you can use the following initd commands:

service cribl [start|stop|restart|status]

Persisting Overrides on initd

Notes on preserving required permissions across restarts and upgrades:

Do NOT Run Cribl Stream as Root!

If Cribl Stream is required to listen on privileged ports (those below 1024), it will need privileged access. On a Linux system with POSIX capabilities, you can achieve this by adding the CAP_NET_BIND_SERVICE capability. For example:  # setcap cap_net_bind_service=+ep $CRIBL_HOME/bin/cribl

On some OS versions (such as CentOS), you must add an -i switch to the setcap command. For example:  # setcap -i cap_net_bind_service=+ep $CRIBL_HOME/bin/cribl

Important: Upgrading Cribl Stream will remove the CAP_NET_BIND_SERVICE capability from the cribl executable, so you’ll need to re‑run the appropriate setcap command after each upgrade.

Upon starting the Cribl Stream server, a bind EACCES 0.0.0.0:<port> error in the API or Worker logs (depending on the service) might indicate that setcap did not successfully execute.
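
To confirm whether the capability is still present, for example after an upgrade, you can inspect the executable with getcap. The sketch below assumes getcap (from the libcap tools) is installed and on your PATH. The has_bind_cap helper is a hypothetical name, and the /opt/cribl fallback is just the example path used on this page.

```shell
#!/bin/sh
# has_bind_cap GETCAP_OUTPUT: succeeds if the text names cap_net_bind_service.
has_bind_cap() {
    printf '%s\n' "$1" | grep -q 'cap_net_bind_service'
}

cribl_bin="${CRIBL_HOME:-/opt/cribl}/bin/cribl"
if command -v getcap >/dev/null 2>&1 && [ -e "$cribl_bin" ]; then
    if has_bind_cap "$(getcap "$cribl_bin")"; then
        echo "cap_net_bind_service is set on $cribl_bin"
    else
        echo "cap_net_bind_service is missing on $cribl_bin; re-run setcap"
    fi
fi
```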

System Proxy Configuration

For details on configuring Cribl Stream to send and receive data through proxy servers, see our System Proxy Configuration topic.

Scaling Up

A single-instance installation can be configured to scale up and utilize as many resources on the host as required. See Sizing and Scaling for details.

Anti-Virus Exceptions

If you are running anti-virus software on a Cribl Stream instance’s host OS, here are general guidelines for minimizing accidental blockage of Cribl Stream’s normal operation.

Your overall goals are to prevent the anti-virus software from locking any files while Cribl Stream needs to write to them, and from triggering any changes that Cribl Stream would detect as needing to be committed.

First, if Persistent Queues are enabled on any Destinations, exclude any directories that these Destinations write to. This is especially relevant if you’re writing queues to any custom locations outside of $CRIBL_HOME.

Next, for any non-streaming Destinations that you’ve configured, exclude their staging paths.

Next, exclude these subdirectories of $CRIBL_HOME:

  • state/
  • log/
  • .git/ (usually only exists on Leader Nodes)
  • groups/ (on Leader Nodes)
  • local/ (on Workers or Leader)

Finally, avoid scanning any processes. Except for the queueing/staging directories already listed above, Cribl Stream runs everything in memory, so scanning process memory will slow down Cribl Stream’s processing and reduce throughput.
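
To turn the $CRIBL_HOME subdirectories listed above into absolute paths that you can paste into your anti-virus exclusion settings, a sketch like this might be convenient. The function name cribl_av_exclusions is hypothetical, and the output does not include Persistent Queue or staging directories, which you would need to add from your own Destination settings.

```shell
#!/bin/sh
# cribl_av_exclusions CRIBL_HOME: print the subdirectories listed above as absolute paths.
cribl_av_exclusions() {
    for sub in state log .git groups local; do
        echo "$1/$sub/"
    done
}

cribl_av_exclusions "${CRIBL_HOME:-/opt/cribl}"
```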