/ / / / /

Amazon MSK Destination

Cribl Stream supports sending data to an Amazon Managed Streaming for Apache Kafka (MSK) topic.

Type: Streaming | TLS Support: Configurable | PQ Support: Yes
Kafka uses a binary protocol over TCP. It does not support HTTP proxies, so Cribl Stream must send events directly to receivers. You might need to adjust your firewall rules to allow this traffic.

Configuring Cribl Stream to Output to Amazon MSK

On the top bar, select Products, and then select Cribl Stream. Under Worker Groups, select a Worker Group. Next, you have two options:
- To configure via QuickConnect, navigate to Routing > QuickConnect (Stream) or Collect (Edge). Select Add Destination and select the Destination you want from the list, choosing either Select Existing or Add New.
- To configure via the Routes, select Data > Destinations or More > Destinations (Edge). Select the Destination you want. Next, select Add Destination.
In the New Destination modal, configure the following under General Settings:
- Output ID: Enter a unique name to identify this Amazon MSK Destination definition. If you clone this Destination, Cribl Stream will add -CLONE to the original Output ID.
- Description: Optionally, enter a description.
- Bootstrap servers: List of Kafka bootstrap servers to connect to (for example, localhost:9092).
- Topic: The topic on which to publish events. Can be overwritten using the event’s __topicOut field.
- Region: From the drop-down, select the name of the AWS Region where your Amazon MSK cluster is located.
Next, you can configure the following Optional Settings:
- Acknowledgments: Select the number of required acknowledgments. Defaults to Leader.
- Record data format: Format to use to serialize events before writing to Kafka. Defaults to JSON. When set to Protobuf, the Protobuf Format Settings section (left tab) becomes available.
- Compression: Codec to compress the data before sending to Kafka. Select None, Gzip, Snappy, or LZ4. Defaults to Gzip.
  Cribl strongly recommends enabling compression. Doing so improves Cribl Stream’s performance, enabling faster data transfer using less bandwidth.
- Backpressure behavior: Select whether to block, drop, or queue incoming events when all receivers are exerting backpressure. Defaults to Block.
- Tags: Optionally, add tags that you can use to filter and group Destinations on the Destinations page. These tags aren’t added to processed events. Use a tab or hard return between (arbitrary) tag names.
Optionally, you can adjust the Persistent Queue, TLS, Authentication, Processing, Retries, and Advanced settings outlined in the sections below.
Select Save, then Commit & Deploy.

Persistent Queue Settings

The Persistent Queue Settings tab displays when the Backpressure behavior option in General settings is set to Persistent Queue. Persistent queue buffers and preserves incoming events when a downstream Destination has an outage or experiences backpressure.

Before enabling persistent queue, learn more about persistent queue behavior and how to optimize it with your system:

On Cribl-managed Cloud Workers (with an Enterprise plan), this tab exposes only the destructive Clear Persistent Queue button (described at the end of this section). A maximum queue size of 1 GB disk space is automatically allocated per PQ-enabled Destination, per Worker Process. The 1 GB limit is on outbound uncompressed data, and no compression is applied to the queue.
This limit is not configurable. If the queue fills up, Cribl Stream/Edge will block outbound data. To configure the queue size, compression, queue-full fallback behavior, and other options below, use a hybrid Group.

Mode: Use this menu to select when Cribl Stream/Edge engages the persistent queue in response to backpressure events from this Destination. The options are:

Mode	Description
Error	Queues and stores data on a disk only when the Destination is in an error state.
Backpressure	After the Destination has been in a backpressure state for a specified amount of time, Cribl Stream/Edge queues and stores data to a disk until the backpressure event resolves.
Always on	Cribl Stream/Edge immediately queues and stores all data on a disk for all events, even when there is no backpressure.

If a Worker/Edge Node starts with an invalid Mode setting, it automatically switches to Error mode. This might happen if the Worker/Edge Node is running a version that does not support other modes (older than 4.9.0), or if it encounters a nonexistent value in YAML configuration files.

Max file size: The maximum data volume to store in each queue file before closing it. Enter a numeral with units of KB, MB, etc. Defaults to 1 MB.

Max queue size: The maximum amount of disk space that the queue can consume on each Worker Process. When the queue reaches this limit, the Destination stops queueing data and applies the Queue-full behavior. Defaults to 5 GB. This field accepts positive numbers with units of KB, MB, GB, and so on. You can set it as high as 1 TB, unless you’ve configured a different Worker Process PQ size limit on the Group Settings/Fleet Settings page.

Queue file path: The location for the persistent queue files. Defaults to $CRIBL_HOME/state/queues. Cribl Stream/Edge will append /<worker-id>/<output-id> to this value.

Compression: Set the codec to use when compressing the persisted data after closing a file. Defaults to None. Gzip is also available.

Queue-full behavior: Whether to block or drop events when the queue begins to exert backpressure. A queue begins to exert backpressure when the disk is low or at full capacity. This setting has two options:

Block: The output will refuse to accept new data until the receiver is ready. The system will return block signals back to the sender.
Drop new data: Discard all new events until the backpressure event has resolved and the receiver is ready.

Backpressure duration limit: When Mode is set to Backpressure, this setting controls how long to wait during network slowdowns before activating queues. A shorter duration enhances critical data loss prevention, while a longer duration helps avoid unnecessary queue transitions in environments with frequent, brief network fluctuations. The default value is 30 seconds.

Strict ordering: Toggle on (default) to enable FIFO (first in, first out) event forwarding, ensuring Cribl Stream/Edge sends earlier queued events first when receivers recover. The persistent queue flushes every 10 seconds in this mode. Toggle off to prioritize new events over queued events, configure a custom drain rate for the queue, and display this option:

Drain rate limit (EPS): Optionally, set a throttling rate (in events per second) on writing from the queue to receivers. (The default 0 value disables throttling.) Throttling the queue drain rate can boost the throughput of new and active connections by reserving more resources for them. You can further optimize Worker startup connections and CPU load in the Group Settings/Fleet Settings > Worker Processes settings.

Clear Persistent Queue: For Cloud Enterprise only, click this button if you want to delete the files that are currently queued for delivery to this Destination. If you click this button, a confirmation modal appears. Clearing the queue frees up disk space by permanently deleting the queued data, without delivering it to downstream receivers. This button only appears after you define the Output ID.

Use the Clear Persistent Queue button with caution to avoid data loss. See Steps to Safely Disable and Clear Persistent Queues for more information.

Calculating the Time PQ Will Take to Engage

PQ will not engage until Cribl Stream has exhausted all attempts to send events to the Kafka receiver. This can take several minutes if requests continue to fail or time out.

To calculate the longest possible time this can take, multiply the values of Request timeout and Max retries. For the default values (60 seconds and 5, respectively), this would be 60 seconds times 5 retries = 300 seconds, or 5 minutes.

TLS Settings (Client Side)

For Amazon MSK Sources and Destinations:
IAM is the only type of authentication that Cribl Stream supports.
Because IAM auth requires TLS, TLS is automatically enabled.

Validate server certs: Require client to reject any connection that is not authorized by a CA in the CA certificate path, or by another trusted CA (such as the system’s CA). Defaults to toggled on.

Server name (SNI): Leave this field blank. See Connecting to Kafka below.

Certificate name: The name of the predefined certificate.

CA certificate path: Path on client containing CA certificates (in PEM format) to use to verify the server’s cert. Path can reference $ENV_VARS.

Private key path (mutual auth): Path on client containing the private key (in PEM format) to use. Path can reference $ENV_VARS. Use only if mutual auth is required.

Certificate path (mutual auth): Path on client containing certificates in (PEM format) to use. Path can reference $ENV_VARS. Use only if mutual auth is required.

Passphrase: Passphrase to use to decrypt private key.

Minimum TLS version: Optionally, select the minimum TLS version to use when connecting.

Maximum TLS version: Optionally, select the maximum TLS version to use when connecting.

Authentication

Use the Authentication method drop-down to select an AWS authentication method.

Auto: This default option uses the AWS SDK for JavaScript to automatically obtain credentials in the following order of attempts:

IAM Roles for Amazon EC2: Loaded from AWS Identity and Access Management (IAM) roles attached to an EC2 instance.
Shared Credentials File: Loaded from the shared credentials file (~/.aws/credentials).
Environment Variables: Loaded from environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
JSON File on Disk: Loaded from a JSON file on disk.
Other Credential-Provider Classes: Other credential-provider classes provided by the AWS SDK for JavaScript.

The Auto method works both when running on AWS and in other environments where the necessary credentials are available through one of the above methods.

SSO Providers

When using the auto authentication method, you can leverage SSO providers like SAML and Okta to issue temporary credentials. These credentials should be set in the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. The AWS SDK will then use these environment variables to authenticate.

Manual: If not running on AWS, you can select this option to enter a static set of user-associated IAM credentials (your access key and secret key) directly or by reference. This is useful for Workers not in an AWS VPC, such as those running in a private cloud. The Manual option exposes these corresponding additional fields:

Access key: Enter your AWS access key. If not present, will fall back to the env.AWS_ACCESS_KEY_ID environment variable, or to the metadata endpoint for IAM role credentials.
Secret key: Enter your AWS secret key. If not present, will fall back to the env.AWS_SECRET_ACCESS_KEY environment variable, or to the metadata endpoint for IAM credentials.

Secret: If not running on AWS, you can select this option to supply a stored secret that references an AWS access key and secret key. The Secret option exposes this additional field:

Secret key pair: Use the drop-down to select an API key/secret key pair that you’ve configured in Cribl Stream’s secrets manager. A Create link is available to store a new, reusable secret.

Assume Role

When using Assume Role to access resources in a different region than Cribl Stream, you can target the AWS Security Token Service (STS) endpoint specific to that region by using the CRIBL_AWS_STS_REGION environment variable on your Worker Node. Setting an invalid region results in a fallback to the global STS endpoint.

Enable for MSK: Toggle on to use Assume Role credentials to access MSK.

AssumeRole ARN: Enter the Amazon Resource Name (ARN) of the role to assume.

External ID: Enter the External ID to use when assuming role. This is required only when assuming a role that requires this ID in order to delegate third-party access. For details, see AWS’ documentation.

Duration (seconds): Duration of the Assumed Role’s session, in seconds. Minimum is 900 (15 minutes). Maximum is 43200 (12 hours). Defaults to 3600 (1 hour).

Processing Settings

Post-Processing

Pipeline: Pipeline or Pack to process data before sending the data out using this output.

System fields: A list of fields to automatically add to events that use this output. By default, includes cribl_pipe (identifying the Cribl Stream Pipeline that processed the event). Supports wildcards. Other options include:

cribl_host - Cribl Stream Node that processed the event.
cribl_input - Cribl Stream Source that processed the event.
cribl_output - Cribl Stream Destination that processed the event.
cribl_route - Cribl Stream Route (or QuickConnect) that processed the event.
cribl_wp - Cribl Stream Worker Process that processed the event.

Retries

These settings provide flexibility in handling retries for failed messages, allowing you to balance between quick retries and avoiding excessive load on the system.

The default configuration starts with a 300 milliseconds retry interval, doubles the interval after each retry, and caps the maximum retry interval at 30 seconds. The system will attempt to retry the request up to five times before considering it a failure.

Max retries: Maximum number of times to retry a failed request before the message fails. Defaults to 5. Enter 0 to not retry at all. For example, if set to 5, the system will attempt to retry the request up to five times before considering it a failure.

Initial retry interval (ms): Initial value used to calculate the retry interval, in milliseconds. This value determines the starting point for the retry delay. The default (and minimum) is 300 ms (three seconds). The maximum allowed value is 600000 ms (10 minutes). For example, if set to 1000 ms, the first retry will occur after one second.

Backoff multiplier: Set the backoff multiplier (2-20) to control the retry frequency for failed messages. The multiplier is used in an exponential backoff formula to increase the delay between retries. For faster retries, use a lower multiplier. For slower retries with more delay between attempts, use a higher multiplier. For example, with an initial retry interval of 1000 ms and a multiplier of 2, the retry intervals will be 1,000 ms, 2,000 ms, 4,000 ms, and so on. See the Kafka documentation for details.

Backoff limit (ms): The maximum wait time for a retry, in milliseconds. This setting caps the exponential backoff delay to prevent excessively long wait times. The default (and minimum) value is 30000 ms (30 seconds), and the maximum is 180000 ms (180 seconds). For example, if the calculated retry interval exceeds 180,000 ms, the retry will occur after 180,000 ms instead.

Advanced Settings

Max record size (KB, uncompressed): Maximum size (KB) of each record batch before compression. Setting should be < message.max.bytes settings in Kafka brokers. Defaults to 768.

Max events per batch: Maximum number of events in a batch before forcing a flush. Defaults to 1000.

Flush period (sec): Maximum time between requests. Low values could cause the payload size to be smaller than its configured maximum. Defaults to 1.

Connection timeout (ms): Maximum time to wait for a connection to complete successfully. Defaults to 10000 ms (10 seconds). Valid range is 1000 to 3600000 ms (1 second to 1 hour).

Request timeout (ms): Maximum time to wait for Kafka to respond to a request. Defaults to 60000 ms (1 minute).

Authentication timeout (ms): Maximum time to wait for Kafka to respond to an authentication request. Defaults to 1000 (1 second).

Reauthentication threshold (ms): If the broker requires periodic reauthentication, this setting defines how long before the reauthentication timeout Cribl Stream initiates the reauthentication. Defaults to 10000 (10 seconds).

A small value for this setting, combined with high network latency, might prevent the Destination from reauthenticating before the Kafka broker closes the connection.

A large value might cause the Destination to send reauthentication messages too soon, wasting bandwidth.

The Kafka setting connections.max.reauth.ms controls the reuthentication threshold on the Kafka side.

Environment: If you’re using GitOps, optionally use this field to specify a single Git branch on which to enable this configuration. If empty, the config will be enabled everywhere.

Protobuf Format Settings

Definition set: From the drop-down, select OpenTelemetry. This makes the Object type setting available.

Object type: From the drop-down, select Logs, Metrics, or Traces.

Working with Protobufs

This Destination supports supports Binary Protobuf payload encoding. The Protobufs it sends can encode traces, metrics, or logs, the three types of telemetry data defined in the OpenTelemetry Project’s Data Sources documentation:

A trace tracks the progression of a single request.
- Each trace is a tree of spans.
- A span object represents the work being done by the individual services, or components, involved in a request as that request flows through a system.
A metric provides aggreggated statistical information.
- A metric contains individual measurements called data points.
A log, in OpenTelemetry terms, is “any data that is not part of a distributed trace or a metric”.

When configuring Pipelines (including pre-processing and post-processing Pipelines), you need to ensure that events sent to this Destination conform to the relevant Protobuf specification:

For traces, opentelemetry/proto/trace/v1/trace.proto.
For metrics, opentelemetry/proto/metrics/v1/metrics.proto.
For logs, opentelemetry/proto/logs/v1/logs.proto.

This Destination will drop non-conforming events. If the Destination encounters a parsing error when trying to convert an event to OTLP, it discards the event and Cribl Stream logs the error.

Connecting to Kafka

Leave the TLS Settings > Server name (SNI) field blank

In Cribl Stream’s Kafka-based Sources and Destinations (including this one), the Kafka library that Cribl Stream uses manages SNI (Server Name Indication) without any input from Cribl Stream. Therefore, you should leave the TLS Settings > Server name (SNI) field blank.
Setting this field in the Cribl Stream UI can cause traffic to be routed to the wrong brokers, because it interferes with the Kafka library’s operation.

Connecting to a Kafka cluster involves using hostnames for two key components: bootstrap servers and brokers.

Brokers are servers that comprise the storage layer in a Kafka cluster. Bootstrap servers handle the initial connection to the Kafka cluster, and then return the list of brokers. A broker list can run into the hundreds. Every Kafka cluster has a bootstrap.servers property, defined as either a single hostname:port K-V pair, or a list of them. If Cribl Stream tries to connect via one bootstrap server and that fails, Cribl Stream then tries another one on the list.

In the General Settings > Bootstrap servers list, you can enter either the hostnames of brokers that your Kafka server has been configured to use, or, the hostnames of one or more bootstrap servers. If Kafka returns a list of brokers that’s longer than the list you entered, Cribl Stream keeps the full list internally. Cribl Stream neither saves the list nor makes it available in the UI. The connection process simply starts at the beginning whenever the Source or Destination is started.

Here’s an overview of the connection process:

From the General Settings > Bootstrap servers list - where each broker is listed as a hostname and port - Cribl Stream takes a hostname and resolves it to an IP address.
Cribl Stream makes a connection to that IP address. Notwithstanding the fact that Cribl Stream resolved one particular hostname to that IP address, there may be many services running at that IP address - each with its own distinct hostname.
Cribl Stream establishes TLS security for the connection.

Although SNI is managed by the Kafka library rather than in the Cribl Stream UI, you might want to know how it fits into the connection process. The purpose of the SNI is to specify one hostname - that is, service - among many that might be running on a given IP address within a Kafka cluster. Excluding the other services is one way that TLS makes the connection more secure.

Working with Event Timestamps

By default, when an incoming event reaches this Destination, the underlying Kafka JS library adds a timestamp field set to the current time at that moment. The library gets the current time from the Date.now() JavaScript method. Events that this Destination sends to downstream Kafka receivers include this timestamp field.

Some production situations require the timestamp value to be the time an incoming event was originally created by an upstream sender, not the current time of its arrival at this Destination. For example, suppose the events were originally created over a wide time range, and arrive at the Destination hours or days later. In this case, the original creation time might be important to retain in events Cribl Stream sends to downstream Kafka receivers.

Here is how to satisfy such a requirement:

In a Pipeline that routes events to this Destination, use an Eval Function to add a field named __kafkaTime and write the incoming event’s original creation time into that field. The value of __kafkaTime must be in UNIX epoch time (either seconds or milliseconds since the UNIX epoch - both will work). For context, see this explanation of the related fields _time and __origTime.
This Destination will recognize the __kafkaTime field and write its value into the timestamp field. (This is similar to the way you can use the __topicOut field to overwrite a topic setting, as described above.)

If the __kafkaTime field is not present, the Destination will apply the default behavior (using Date.now()) described previously.

Internal Fields

Cribl Stream uses a set of internal fields to assist in forwarding data to a Destination.

Fields for this Destination:

__topicOut
__key
__headers
__keySchemaIdOut
__valueSchemaIdOut

Troubleshooting

The Destination’s configuration modal has helpful tabs for troubleshooting:

Live Data: Try capturing live data to see real-time events as they flow through the Destination. On the Live Data tab, click Start Capture to begin viewing real-time data.

Logs: Review and search the logs that provide detailed information about the delivery process, including any errors or warnings that may have occurred.

Test: Ensures that the Destination is correctly set up and reachable. Verify that sample events are sent correctly by clicking Run Test.

You can also view the Monitoring page that provides a comprehensive overview of data volume and rate, helping you identify delivery issues. Analyze the graphs showing events and bytes in/out over time.

Amazon MSK Destination

Configuring Cribl Stream to Output to Amazon MSK

Persistent Queue Settings

Calculating the Time PQ Will Take to Engage

TLS Settings (Client Side)

Authentication

SSO Providers

Assume Role

Processing Settings

Post-Processing

Retries

Advanced Settings

Protobuf Format Settings

Working with Protobufs

Connecting to Kafka

Leave the TLS Settings > Server name (SNI) field blank

Working with Event Timestamps

Internal Fields

Troubleshooting

Common Resources

Amazon MSK Destination

Configuring Cribl Stream to Output to Amazon MSK​

Persistent Queue Settings​

Calculating the Time PQ Will Take to Engage​

TLS Settings (Client Side)​

Authentication​

SSO Providers​

Assume Role​

Processing Settings​

Post-Processing​

Retries​

Advanced Settings​

Protobuf Format Settings​

Working with Protobufs​

Connecting to Kafka​

Leave the TLS Settings > Server name (SNI) field blank​

Working with Event Timestamps​

Internal Fields​

Troubleshooting​

Common Resources

Configuring Cribl Stream to Output to Amazon MSK

Persistent Queue Settings

Calculating the Time PQ Will Take to Engage

TLS Settings (Client Side)

Authentication

SSO Providers

Assume Role

Processing Settings

Post-Processing

Retries

Advanced Settings

Protobuf Format Settings

Working with Protobufs

Connecting to Kafka

Leave the TLS Settings > Server name (SNI) field blank

Working with Event Timestamps

Internal Fields

Troubleshooting