These docs are for Cribl Edge 4.3 and are no longer actively maintained.
See the latest version (4.14).
Amazon MSK
Cribl Edge supports sending data to an Amazon Managed Streaming for Apache Kafka (MSK) topic.
Type: Streaming | TLS Support: Configurable | PQ Support: Yes
Kafka uses a binary protocol over TCP. It does not support HTTP proxies, so Cribl Edge must send events directly to receivers. You might need to adjust your firewall rules to allow this traffic.
Configuring Cribl Edge to Output to Kafka
From the top nav, click Manage, then select a Fleet to configure. Next, you have two options:
To configure via the graphical QuickConnect UI, click Routing > QuickConnect (Stream) or Collect (Edge). Next, click Add Destination at right. From the resulting drawer’s tiles, select Amazon MSK. Next, click either Add Destination or (if displayed) Select Existing. The resulting drawer will provide the options below.
Or, to configure via the Routing UI, click Data > Destinations (Stream) or More > Destinations (Edge). From the resulting page’s tiles or the Destinations left nav, select Amazon MSK. Next, click Add Destination to open a New Destination modal that provides the options below.
General Settings
Output ID: Enter a unique name to identify this Amazon MSK Destination definition.
Brokers: List of Kafka brokers to connect to. (E.g., kafkaBrokerHost:9092.)
Topic: The topic on which to publish events. Can be overwritten using the event’s __topicOut field.
Region: From the drop-down, select the name of the AWS Region where your Amazon MSK cluster is located.
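The Topic setting's per-event override can be pictured with a short sketch. This is an illustration only, not Cribl Edge's implementation: the function name resolve_topic, the sample events, and the topic names are hypothetical. Only the __topicOut field name comes from the setting described above.

```python
def resolve_topic(event: dict, configured_topic: str) -> str:
    """Pick the topic for one event.

    Assumption: a non-empty string in __topicOut wins; otherwise the
    Destination's configured Topic is used.
    """
    override = event.get("__topicOut")
    if isinstance(override, str) and override:
        return override
    return configured_topic


# Hypothetical events: the first carries an override, the second does not.
events = [
    {"_raw": "login ok", "__topicOut": "auth-events"},
    {"_raw": "disk at 91%"},
]
topics = [resolve_topic(e, "edge-default") for e in events]
print(topics)
```

In practice, a Pipeline would set __topicOut on the events that need a different topic, and all other events would go to the configured Topic.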
Optional Settings
Acknowledgments: Select the number of required acknowledgments. Defaults to Leader.
Record data format: Format to use to serialize events before writing to Kafka. Defaults to JSON.
Compression: Codec to compress the data before sending to Kafka. Select None, Gzip, Snappy, or LZ4. Defaults to Gzip.
Cribl strongly recommends enabling compression. Doing so improves Cribl Edge’s performance, enabling faster data transfer using less bandwidth.
Backpressure behavior: Select whether to block, drop, or queue incoming events when all receivers are exerting backpressure. Defaults to Block.
Tags: Optionally, add tags that you can use to filter and group Destinations in Cribl Edge’s Manage Destinations page. These tags aren’t added to processed events. Use a tab or hard return between (arbitrary) tag names.
Persistent Queue Settings
This tab is displayed when the Backpressure behavior is set to Persistent Queue.
On Cribl-managed Cribl.Cloud Workers (with an Enterprise plan), this tab exposes only the Clear Persistent Queue button. A maximum queue size of 1 GB disk space is automatically allocated per PQ‑enabled Destination, per Worker Process. The 1 GB limit is on outbound uncompressed data, and no compression is applied to the queue.
This limit is not configurable. If the queue fills up, Cribl Edge will block outbound data. To configure the queue size, compression, queue-full fallback behavior, and other options below, use a hybrid Group.
Max file size: The maximum data volume to store in each queue file before closing it. Enter a numeral with units of KB, MB, etc. Defaults to 1 MB.
Max queue size: The maximum amount of disk space the queue is allowed to consume. Once this limit is reached, queueing is stopped and data blocking is applied. Enter a numeral with units of KB, MB, etc.
Queue file path: The location for the persistent queue files. This will be of the form: your/path/here/<worker‑id>/<output‑id>. Defaults to: $CRIBL_HOME/state/queues.
Compression: Codec to use to compress the persisted data, once a file is closed. Defaults to None; Gzip is also available.
Queue-full behavior: Whether to block or drop events when the queue is exerting backpressure (because disk is low or at full capacity). Block is the same behavior as non-PQ blocking, corresponding to the Block option on the Backpressure behavior drop-down. Drop new data throws away incoming data, while leaving the contents of the PQ unchanged.
Strict ordering: The default Yes position enables FIFO (first in, first out) event forwarding. When receivers recover, Cribl Edge will send earlier queued events before forwarding newly arrived events. To instead prioritize new events before draining the queue, toggle this off. Doing so will expose this additional control:
- Drain rate limit (EPS): Optionally, set a throttling rate (in events per second) on writing from the queue to receivers. (The default 0 value disables throttling.) Throttling the queue’s drain rate can boost the throughput of new/active connections, by reserving more resources for them. You can further optimize Workers’ startup connections and CPU load at Fleet Settings > Worker Processes.
Clear persistent queue: Click this button if you want to flush out files that are currently queued for delivery to this Destination. A confirmation modal will appear. (Appears only after Output ID has been defined.)
Calculating the Time PQ Will Take to Engage
PQ will not engage until Cribl Edge has exhausted all attempts to send events to the Kafka receiver. This can take several minutes if requests continue to fail or time out.
To calculate the longest possible time this can take, multiply the values of Advanced Settings > Request timeout and Max retries. For the default values (60 seconds and 5, respectively), this would be 60 seconds times 5 retries = 300 seconds, or 5 minutes.
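The same arithmetic can be written as a small helper for trying other values. This is a minimal sketch that mirrors only the multiplication described above; the function name max_pq_engage_seconds is hypothetical, and the sketch does not model connection timeouts or any delay between retries.

```python
def max_pq_engage_seconds(request_timeout_ms: int = 60_000, max_retries: int = 5) -> float:
    """Upper bound, in seconds, before PQ engages: Request timeout x Max retries."""
    return (request_timeout_ms / 1000) * max_retries


default_bound = max_pq_engage_seconds()          # defaults: 60 s x 5 retries
tighter_bound = max_pq_engage_seconds(30_000, 3) # 30 s x 3 retries
print(default_bound, tighter_bound)  # 300.0 90.0
```

Lowering either setting shortens the wait before PQ engages, at the cost of giving slow brokers less time to respond.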
TLS Settings (Client Side)
For Amazon MSK Sources and Destinations:
- IAM is the only type of authentication that Cribl Edge supports.
- Because IAM auth requires TLS, TLS is automatically enabled.
Validate server certs: Reject certificates that are not authorized by a CA in the CA certificate path, or by another trusted CA (e.g., the system’s CA). Defaults to Yes.
Server name (SNI): Leave this field blank. See Connecting to Kafka below.
Certificate name: The name of the predefined certificate.
CA certificate path: Path on client containing CA certificates (in PEM format) to use to verify the server’s cert. Path can reference $ENV_VARS.
Private key path (mutual auth): Path on client containing the private key (in PEM format) to use. Path can reference $ENV_VARS. Use only if mutual auth is required.
Certificate path (mutual auth): Path on client containing certificates (in PEM format) to use. Path can reference $ENV_VARS. Use only if mutual auth is required.
Passphrase: Passphrase to use to decrypt private key.
Minimum TLS version: Optionally, select the minimum TLS version to use when connecting.
Maximum TLS version: Optionally, select the maximum TLS version to use when connecting.
Authentication
Use the Authentication Method buttons to select an AWS authentication method.
Auto: This default option uses the AWS instance’s metadata service to automatically obtain short-lived credentials from the IAM role attached to an EC2 instance, local credentials, sidecar, or other source. The attached IAM role grants Cribl Edge Workers access to authorized AWS resources. This option can also use the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. It works only when running on AWS.
Manual: If not running on AWS, you can select this option to enter a static set of user-associated IAM credentials (your access key and secret key) directly or by reference. This is useful for Workers not in an AWS VPC, e.g., those running a private cloud. The Manual option exposes these corresponding additional fields:
- Access key: Enter your AWS access key. If not present, Cribl Edge will fall back to the env.AWS_ACCESS_KEY_ID environment variable, or to the metadata endpoint for IAM role credentials.
- Secret key: Enter your AWS secret key. If not present, Cribl Edge will fall back to the env.AWS_SECRET_ACCESS_KEY environment variable, or to the metadata endpoint for IAM role credentials.
Secret: If not running on AWS, you can select this option to supply a stored secret that references an AWS access key and secret key. The Secret option exposes this additional field:
- Secret key pair: Use the drop-down to select an API key/secret key pair that you’ve configured in Cribl Edge’s secrets manager. A Create link is available to store a new, reusable secret.
Assume Role
Enable for MSK: Toggle on to use Assume Role credentials to access MSK.
AssumeRole ARN: Enter the Amazon Resource Name (ARN) of the role to assume.
External ID: Enter the External ID to use when assuming role. This is required only when assuming a role that requires this ID in order to delegate third-party access. For details, see AWS’ documentation.
Duration (seconds): Duration of the Assumed Role’s session, in seconds. Minimum is 900 (15 minutes). Maximum is 43200 (12 hours). Defaults to 3600 (1 hour).
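The Duration limits above can be checked before a value is entered. The following is a minimal sketch, assuming you want to validate a value in your own tooling; the function and constant names are hypothetical, and only the numeric limits come from the setting's description.

```python
from typing import Optional

# Limits taken from the Duration (seconds) description above.
MIN_DURATION_S = 900       # 15 minutes
MAX_DURATION_S = 43_200    # 12 hours
DEFAULT_DURATION_S = 3_600 # 1 hour


def validate_duration(seconds: Optional[int] = None) -> int:
    """Return a usable session duration, or raise if it is out of range."""
    if seconds is None:
        return DEFAULT_DURATION_S
    if not MIN_DURATION_S <= seconds <= MAX_DURATION_S:
        raise ValueError(
            f"duration must be between {MIN_DURATION_S} and {MAX_DURATION_S} seconds, got {seconds}"
        )
    return seconds


print(validate_duration())      # default applies when nothing is entered
print(validate_duration(7_200)) # a 2-hour session is within range
```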
Processing Settings
Post‑Processing
Pipeline: Pipeline to process data before sending the data out using this output.
System fields: A list of fields to automatically add to events that use this output. By default, includes cribl_pipe (identifying the Cribl Edge Pipeline that processed the event). Supports wildcards. Other options include:
- cribl_host – Cribl Edge Node that processed the event.
- cribl_input – Cribl Edge Source that processed the event.
- cribl_output – Cribl Edge Destination that processed the event.
- cribl_route – Cribl Edge Route (or QuickConnect) that processed the event.
- cribl_wp – Cribl Edge Worker Process that processed the event.
Advanced Settings
Max record size (KB, uncompressed): Maximum size (KB) of each record batch before compression. Set this lower than the message.max.bytes setting on your Kafka brokers. Defaults to 768.
Max events per batch: Maximum number of events in a batch before forcing a flush. Defaults to 1000.
Flush period (sec): Maximum time between requests. Low values could cause the payload size to be smaller than its configured maximum. Defaults to 1.
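How Max record size, Max events per batch, and Flush period interact can be sketched as a small buffer that flushes when any of the three limits is reached. This is an illustrative model, not Cribl Edge's code: the BatchBuffer class, its method names, and the use of JSON length as the event size are all assumptions.

```python
import json
import time


class BatchBuffer:
    """Toy batcher: flush on size, event count, or elapsed time (whichever comes first)."""

    def __init__(self, max_events=1000, max_record_kb=768, flush_period_s=1.0,
                 clock=time.monotonic):
        self.max_events = max_events
        self.max_bytes = max_record_kb * 1024
        self.flush_period_s = flush_period_s
        self._clock = clock
        self._events = []
        self._bytes = 0
        self._started = clock()

    def _drain(self):
        batch, self._events, self._bytes = self._events, [], 0
        return batch

    def add(self, event: dict) -> list:
        """Buffer one event; return the list of batches that became ready (often empty)."""
        ready = []
        size = len(json.dumps(event).encode("utf-8"))
        # Size limit: flush first if this event would push the batch over the cap.
        if self._events and self._bytes + size > self.max_bytes:
            ready.append(self._drain())
        if not self._events:
            self._started = self._clock()  # the flush timer starts with the first event
        self._events.append(event)
        self._bytes += size
        # Count limit or flush period reached: send what has accumulated.
        elapsed = self._clock() - self._started
        if len(self._events) >= self.max_events or elapsed >= self.flush_period_s:
            ready.append(self._drain())
        return ready


# Deterministic demo with a fake clock (hypothetical values).
now = [0.0]
buf = BatchBuffer(max_events=3, flush_period_s=1.0, clock=lambda: now[0])
by_count = [buf.add({"n": i}) for i in (1, 2, 3)][-1]  # third event triggers a flush
buf.add({"n": 4})
now[0] = 1.5                                           # flush period has elapsed
by_time = buf.add({"n": 5})
print(by_count, by_time)
```

The sketch shows why a low Flush period can produce batches smaller than the configured maximums: the timer fires before the count or size limit is reached.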
Connection timeout (ms): Maximum time to wait for a connection to complete successfully. Defaults to 10000 ms (10 seconds). Valid range is 1000 to 3600000 ms (1 second to 1 hour).
Request timeout (ms): Maximum time to wait for Kafka to respond to a request. Defaults to 60000 ms (1 minute).
Max retries: Maximum number of times to retry a failed request before the message fails. Defaults to 5; enter 0 to not retry at all.
Authentication timeout (ms): Maximum time to wait for Kafka to respond to an authentication request. Defaults to 1000 (1 second).
Reauthentication threshold (ms): If the broker requires periodic reauthentication, this setting defines how long before the reauthentication timeout Cribl Edge initiates the reauthentication. Defaults to 10000 (10 seconds).
A small value for this setting, combined with high network latency, might prevent the Destination from reauthenticating before the Kafka broker closes the connection.
A large value might cause the Destination to send reauthentication messages too soon, wasting bandwidth.
The Kafka setting connections.max.reauth.ms controls the reauthentication threshold on the Kafka side.
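The trade-off around the Reauthentication threshold can be made concrete with a little arithmetic. This is a minimal sketch under the assumption that reauthentication starts at the broker's reauthentication timeout minus the threshold; the function name and the one-hour session lifetime are hypothetical.

```python
def reauth_delay_ms(session_lifetime_ms: int, threshold_ms: int = 10_000) -> int:
    """Milliseconds to wait after authenticating before starting reauthentication.

    Clamped at 0: a threshold larger than the session lifetime means
    reauthenticating immediately.
    """
    return max(0, session_lifetime_ms - threshold_ms)


one_hour_ms = 3_600_000  # hypothetical broker-side session lifetime
print(reauth_delay_ms(one_hour_ms))             # default threshold of 10 seconds
print(reauth_delay_ms(one_hour_ms, 1_800_000))  # large threshold: reauthenticates at the half-hour mark
```

With the default threshold, reauthentication starts 10 seconds before the session would expire, so round-trip latency needs to fit comfortably inside that window.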
Environment: If you’re using GitOps, optionally use this field to specify a single Git branch on which to enable this configuration. If empty, the config will be enabled everywhere.
Connecting to Kafka
Leave the TLS Settings > Server name (SNI) field blank
In Cribl Edge’s Kafka-based Sources and Destinations (including this one), the Kafka library that Cribl Edge uses manages SNI (Server Name Indication) without any input from Cribl Edge. Therefore, you should leave the TLS Settings > Server name (SNI) field blank.
Setting this field in the Cribl Edge UI can cause traffic to be routed to the wrong brokers, because it interferes with the Kafka library’s operation.
Connecting to a Kafka cluster entails working with hostnames for brokers and bootstrap servers.
Brokers are servers that comprise the storage layer in a Kafka cluster. Bootstrap servers handle the initial connection to the Kafka cluster, and then return the list of brokers. A broker list can run into the hundreds. Every Kafka cluster has a bootstrap.servers property, defined as either a single hostname:port K-V pair, or a list of them. If Cribl Edge tries to connect via one bootstrap server and that fails, Cribl Edge then tries another one on the list.
In the General Settings > Brokers list, you can enter either the hostnames of brokers that your Kafka server has been configured to use, or the hostnames of one or more bootstrap servers. If Kafka returns a list of brokers that’s longer than the list you entered, Cribl Edge keeps the full list internally. Cribl Edge neither saves the list nor makes it available in the UI. The connection process simply starts at the beginning whenever the Source or Destination is started.
Here’s an overview of the connection process:
- From the General Settings > Brokers list – where each broker is listed as a hostname and port – Cribl Edge takes a hostname and resolves it to an IP address. 
- Cribl Edge makes a connection to that IP address. Notwithstanding the fact that Cribl Edge resolved one particular hostname to that IP address, there may be many services running at that IP address – each with its own distinct hostname. 
- Cribl Edge establishes TLS security for the connection. 
Although SNI is managed by the Kafka library rather than in the Cribl Edge UI, you might want to know how it fits into the connection process. The purpose of the SNI is to specify one hostname – i.e., service – among many that might be running on a given IP address within a Kafka cluster. Excluding the other services is one way that TLS makes the connection more secure.
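The three connection steps, and the point at which SNI is sent, can be sketched with Python's standard socket and ssl modules. This is a generic TLS client illustration, not the Kafka library that Cribl Edge uses; the function names are hypothetical, and the sketch stops after the TLS handshake without speaking the Kafka protocol.

```python
import socket
import ssl
from typing import Tuple


def parse_broker(entry: str) -> Tuple[str, int]:
    """Split a Brokers list entry such as 'kafkaBrokerHost:9092' into host and port."""
    host, sep, port = entry.strip().rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected hostname:port, got {entry!r}")
    return host, int(port)


def connect_with_tls(entry: str, timeout_s: float = 10.0) -> ssl.SSLSocket:
    """Resolve, connect, then establish TLS, naming the intended host via SNI."""
    host, port = parse_broker(entry)
    # Step 1: resolve the hostname to an IP address.
    family, socktype, proto, _canon, sockaddr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    )[0]
    # Step 2: open a TCP connection to that IP address.
    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout_s)
    context = ssl.create_default_context()  # verifies certs against the system CAs
    try:
        sock.connect(sockaddr)
        # Step 3: TLS handshake. server_hostname is sent as SNI, telling the
        # server which of its hosted services this client wants.
        return context.wrap_socket(sock, server_hostname=host)
    except OSError:
        sock.close()
        raise


host, port = parse_broker("kafkaBrokerHost:9092")
print(host, port)
```

Because a TLS client supplies the hostname it connected to as the SNI value in this way, a manual override in the UI is unnecessary.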
Internal Fields
Cribl Edge uses a set of internal fields to assist in forwarding data to a Destination.
Fields for this Destination:
- __topicOut
- __key
- __headers
- __keySchemaIdOut
- __valueSchemaIdOut