Sizing and Scaling

A Cribl LogStream installation can be scaled up within a single instance and/or scaled out across multiple instances. Scaling allows for:

  • Increased data volumes.
  • Increased processing complexity.
  • Increased deployment availability.
  • Increased number of destinations.

Scale Up

A LogStream installation can be configured to scale up and utilize as many resources on the host as required. In a single-instance deployment, you govern resource allocation through the global ⚙️ Settings (lower left) > System > Worker Processes section.

In a distributed deployment, you allocate resources per Worker Group. Navigate to Groups > group-name > Settings > Worker Processes.

Either way, these controls are available:

  • Process count: Indicates the number of Worker Processes to spawn. Positive numbers specify an absolute number of Workers. Negative numbers specify the number of Workers relative to the number of CPUs in the system, like this:
    {number of CPUs available} minus {this setting}. The default is -2.
    A 0 setting is interpreted as 1 Worker Process. (LogStream corrects for excessive negative offsets by guaranteeing at least 1 Worker Process.)

  • Memory (MB): Amount of memory available to each Worker Process, in MB. Defaults to 2048. (See Estimating Memory Requirements below.)

📘

Throughout these guidelines, we assume that 1 physical core is equivalent to 2 virtual/hyperthreaded CPUs (vCPUs).

For example, assuming a Cribl LogStream system with 6 physical cores (12 vCPUs):

  • If Process count is set to 4, then the system will spawn exactly 4 processes.
  • If Process count is set to -2, then the system will spawn 10 processes (12-2).
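
The rules above, including the guardrails noted below, can be summarized in a short sketch. This is illustrative only (the function name and inputs are ours, not LogStream code), but it mirrors the documented behavior: positive values are absolute, 0 means 1 Worker Process, negative values are relative to the vCPU count, and the result is clamped between 1 and the number of available vCPUs.

```python
def effective_worker_processes(process_count: int, vcpus: int) -> int:
    """Illustrative model of the Process count setting (not LogStream code)."""
    if process_count > 0:
        spawned = process_count           # positive: absolute number of Workers
    elif process_count == 0:
        spawned = 1                       # 0 is interpreted as 1 Worker Process
    else:
        spawned = vcpus + process_count   # negative: relative to vCPU count
    spawned = max(spawned, 1)             # guardrail: at least 1 Worker Process
    return min(spawned, vcpus)            # guardrail: never more than available vCPUs

assert effective_worker_processes(4, 12) == 4     # absolute count
assert effective_worker_processes(-2, 12) == 10   # 12 - 2
assert effective_worker_processes(-20, 12) == 1   # excessive offset corrected to 1
```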

See Capacity and Performance Considerations below for CPU utilization.

📘

LogStream incorporates guardrails that prevent spawning more processes than available vCPUs.

It's important to understand that Worker Processes operate in parallel, i.e., independently of each other. This means that:

  1. Data coming in on a single connection will be handled by a single Worker Process. To get the full benefits of multiple Worker Processes, data should come in over multiple connections.

    E.g., it's better to have 5 connections to TCP 514, each bringing in 200GB/day, than one connection at 1TB/day (see the sketch after this list).

  2. Each Worker Process will maintain and manage its own outputs. E.g., if an instance with 2 Worker Processes is configured with a Splunk output, then the Splunk destination will see 2 inbound connections.
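
To illustrate point 1, here is a minimal sender-side sketch that spreads events across several TCP connections instead of one, so that a multi-Process Worker can balance the inbound load. The hostname, connection count, and event source are placeholder assumptions; only the port (514) comes from the example above.

```python
import itertools
import socket

HOST, PORT, CONNECTIONS = "logstream.example.com", 514, 5  # HOST is a placeholder

# Open several connections; LogStream can assign each one to a different
# Worker Process, instead of funneling all traffic through a single Process.
socks = [socket.create_connection((HOST, PORT)) for _ in range(CONNECTIONS)]

events = (f"event {i}\n".encode() for i in range(10_000))  # stand-in event source
for sock, event in zip(itertools.cycle(socks), events):
    sock.sendall(event)  # round-robin events across the open connections

for sock in socks:
    sock.close()
```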

Capacity and Performance Considerations

As with most data processing applications, Cribl LogStream's expected resource utilization will be commensurate with the type of processing that is occurring. For instance, a Function that adds a static field to an event will likely perform faster than one that applies a regex to find and replace a string. At the time of this writing:

  • A Worker Process will utilize up to 1 physical core, or 2 vCPUs.
  • Processing performance is proportional to CPU clock speed.
  • All processing happens in-memory.
  • Processing does not require significant disk allocation.

Estimating Core Requirements

Current guidance for capacity planning is: Allocate 1 physical core for each 400GB/day of IN+OUT throughput. So, to estimate the number of cores needed: Sum your expected input and output volume, then divide by 400GB.

  • Example 1: 100GB IN -> 100GB out to each of 3 destinations = 400GB total = 1 physical core.
  • Example 2: 3TB IN -> 1TB out = 4TB total = 10 physical cores.
  • Example 3: 4TB IN -> full 4TB to Destination A, plus 2TB to Destination B = 10TB total = 25 physical cores.
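
The examples above reduce to a single calculation. Here is a short sketch (the function and constant names are ours) that reproduces all three:

```python
import math

GB_PER_CORE_PER_DAY = 400  # guidance: 1 physical core per 400GB/day of IN+OUT

def physical_cores_needed(in_gb_per_day: float, out_gb_per_day: float) -> int:
    """Sum expected IN and OUT volume, divide by 400GB/day per core, round up."""
    return math.ceil((in_gb_per_day + out_gb_per_day) / GB_PER_CORE_PER_DAY)

assert physical_cores_needed(100, 3 * 100) == 1            # Example 1
assert physical_cores_needed(3_000, 1_000) == 10           # Example 2
assert physical_cores_needed(4_000, 4_000 + 2_000) == 25   # Example 3
```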

Estimating Memory Requirements

The general guideline for memory allocation is to start with the default 2048 MB (2 GB) per Worker Process, and then add more memory as you find that you're hitting limits.

Memory use is consumed per component, per Worker Process, as follows:

  1. Lookups are loaded into memory.
  2. Memory is allocated to in-memory buffers to hold data to be delivered to downstream services.
  3. Stateful Functions (Aggregations and Suppress) consume memory proportional to the rate of data throughput.
  4. The Aggregations Function's memory consumption further increases with the number of Group by's.
  5. The Suppress Function's memory use further increases with the cardinality of events matching the Key expression. A higher rate of distinct event values will consume more memory.
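
As a starting point, you can derive a baseline figure from the Worker Process count and the per-Process default, then add headroom for the per-component consumers listed above. A rough sketch (the function name is ours, and the headroom is workload-dependent rather than a fixed Cribl figure):

```python
DEFAULT_MB_PER_PROCESS = 2048  # default Memory (MB) setting per Worker Process

def baseline_memory_mb(worker_processes: int,
                       mb_per_process: int = DEFAULT_MB_PER_PROCESS) -> int:
    """Baseline instance memory before lookups, buffers, and stateful Functions."""
    return worker_processes * mb_per_process

# E.g., 10 Worker Processes at the 2048 MB default need ~20 GB, before
# accounting for lookup tables, output buffers, or Aggregations/Suppress state.
print(baseline_memory_mb(10))  # 20480
```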

Recommended AWS, Azure, and GCP Instance Types

You could meet the requirement above with multiples of the following instances:

AWS – Compute Optimized Instances. For other options, see here.

  • Minimum: c5d.2xlarge (4 physical cores, 8 vCPUs) or c5.2xlarge (4 physical cores, 8 vCPUs)
  • Recommended: c5d.4xlarge or higher (8 physical cores, 16 vCPUs), or c5.4xlarge or higher (8 physical cores, 16 vCPUs)

Azure – Compute Optimized Instances

  • Minimum: Standard_F8s_v2 (4 physical cores, 8 vCPUs)
  • Recommended: Standard_F16s_v2 or higher (8 physical cores, 16 vCPUs)

GCP – Compute Optimized Instances

  • Minimum: c2-standard-8 (4 physical cores, 8 vCPUs) or n2-standard-8 (4 physical cores, 8 vCPUs)
  • Recommended: c2-standard-16 or higher (8 physical cores, 16 vCPUs), or n2-standard-16 or higher (8 physical cores, 16 vCPUs)

In all cases, reserve at least 5GB disk storage per instance, and more if persistent queuing is enabled.

Scale Out

When data volume, processing needs, or other requirements exceed what a single instance can sustain, a Cribl LogStream deployment can span multiple nodes. This is known as a distributed deployment, and it can be configured and managed centrally by a single master instance. See Distributed Deployment for more details.
