Scaling

A Cribl installation can be scaled up within a single instance and/or out across multiple instances. Scaling allows for:

  • Handling data volumes of any size.
  • Increased processing complexity.
  • Increased deployment availability.

Scale Up

A single-instance Cribl installation can be configured to scale up and utilize as many resources on the host as required. Resource allocation is governed through the Worker Processes settings in General Settings.

Memory (MB): Amount of memory available to each worker process, in MB. Defaults to 2048.
Process Count: The number of worker processes to spawn. Each worker process will utilize up to 1 CPU core. Negative numbers can be used to set the number of workers relative to the number of CPUs in the system: any value less than 1 is interpreted as the number of CPUs minus the absolute value of the setting.

For example, assuming a Cribl LogStream system with 12 CPUs:

  • If Process Count is set to 4, then the system will spawn 4 processes, using up to 4 CPU cores, leaving 8 free.
  • If Process Count is set to -2, then the system will spawn 10 processes (12-2), using up to 10 CPU cores, leaving 2 free.

There are guardrails in place that prevent spawning more processes than CPU cores.
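
To make the interpretation concrete, here is a minimal TypeScript sketch of the resolution logic described above. The function name and the exact clamping behavior are illustrative assumptions, not LogStream's actual implementation.

```typescript
import * as os from "os";

// A minimal sketch (not LogStream's actual code) of how a Process Count
// setting could resolve to an effective number of worker processes.
function resolveWorkerCount(processCount: number): number {
  const cpus = os.cpus().length;
  // Values below 1 are relative to the CPU count: e.g., -2 means
  // "all CPUs except 2".
  const requested = processCount >= 1 ? processCount : cpus + processCount;
  // Guardrail: at least 1 process, and never more than the CPU count.
  return Math.min(Math.max(requested, 1), cpus);
}

// On a 12-CPU host:
//   resolveWorkerCount(4)  -> 4  (8 cores left free)
//   resolveWorkerCount(-2) -> 10 (12 - 2, leaving 2 free)
```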

It's important to understand that worker processes operate in parallel, i.e. independently of each other. This means that:

  1. If data comes in over a single connection (socket), it will be processed by a single worker process. To get the full benefit of multiple worker processes, data should arrive over multiple connections.
    E.g., it is better to have five connections each bringing in 200GB/day than one connection doing 1TB/day.

  2. Each worker process will maintain and manage its own outputs. E.g., if an instance with 2 worker processes is configured with a Splunk output, then the Splunk destination will see 2 inbound connections.
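
As an illustration of point 1, the sketch below spreads a sender's load over several TCP connections instead of one, so LogStream can fan the inbound streams out across its worker processes. The host, port, and connection count are hypothetical values for illustration only.

```typescript
import * as net from "net";

// Hypothetical endpoint and connection count, for illustration only.
const HOST = "logstream.example.com";
const PORT = 10060;
const CONNECTIONS = 5;

// Open several TCP connections so the inbound streams can be
// distributed across worker processes.
const sockets: net.Socket[] = Array.from({ length: CONNECTIONS }, () =>
  net.createConnection({ host: HOST, port: PORT })
);

let next = 0;

// Round-robin events across the connections so no single worker process
// has to absorb the entire feed.
function send(event: string): void {
  sockets[next].write(event + "\n");
  next = (next + 1) % sockets.length;
}
```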

Capacity and Performance Considerations

Like most data processing applications, Cribl LogStream's expected resource utilization will be commensurate with the type of processing that is occurring. For instance, a function that adds a static ingest-time field to an event will likely perform faster than one that applies a regex to find and replace a string. At the time of this writing:

  • A worker process will use up to 1 physical CPU core (i.e., 2 vCPUs).
  • All processing happens in memory.
  • Processing does not require significant disk allocation.

Current guidance for capacity planning: allocate 1 CPU for each 250-300GB/day of ingested data.
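
Applied as back-of-the-envelope arithmetic, a 3TB/day deployment would need on the order of 10-12 CPUs. A quick sizing sketch (the function and its conservative rounding are illustrative assumptions, not an official calculator):

```typescript
// Back-of-the-envelope sizing from the 250-300GB/day-per-CPU guidance.
// Uses the conservative end of the range (250 GB/day) and rounds up.
function cpusNeeded(gbPerDay: number, gbPerCpuPerDay = 250): number {
  return Math.ceil(gbPerDay / gbPerCpuPerDay);
}

console.log(cpusNeeded(3000)); // 3TB/day -> 12 CPUs
```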

Scale Out

When data volume, processing needs, or other requirements exceed what a single instance can sustain, a Cribl LogStream deployment can span multiple nodes. This is known as a Distributed Deployment, and it can be configured and managed centrally by a single master instance. See the Distributed Deployment section for more details.
