These docs are for Cribl Edge 4.7 and are no longer actively maintained.
See the latest version (4.14).
Amazon Data Firehose Source
Cribl Edge supports receiving data from Amazon Data Firehose delivery streams via the HTTP endpoint destination option.
Type: Push | TLS Support: YES | Event Breaker Support: No
This Source supports gzip-compressed inbound data when the Content-Encoding: gzip connection header is set.
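As a rough illustration for local testing, the following Python sketch builds a gzip-compressed request body together with the matching Content-Encoding header. The JSON layout (requestId, timestamp, and base64-encoded records) follows the Amazon Data Firehose HTTP endpoint delivery format as AWS commonly documents it. Treat that layout and the helper name build_gzip_request as assumptions, not as a statement of what Cribl Edge requires.

```python
import base64
import gzip
import json
import time


def build_gzip_request(events, request_id):
    """Return (headers, body) for a gzip-compressed, Firehose-style POST.

    The payload layout is an assumption based on the AWS HTTP endpoint
    delivery format; adjust it to whatever your sender actually emits.
    """
    payload = {
        "requestId": request_id,
        "timestamp": int(time.time() * 1000),  # epoch milliseconds
        "records": [
            {"data": base64.b64encode(e.encode("utf-8")).decode("ascii")}
            for e in events
        ],
    }
    body = gzip.compress(json.dumps(payload).encode("utf-8"))
    headers = {
        "Content-Type": "application/json",
        # Signals that the body is gzip-compressed.
        "Content-Encoding": "gzip",
        "Content-Length": str(len(body)),
    }
    return headers, body


headers, body = build_gzip_request(["hello", "world"], "req-0001")
# Round-trip the body to confirm it decompresses to the original payload.
decoded = json.loads(gzip.decompress(body))
```

The sketch only constructs the request; it does not send anything over the network.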
AWS Configuration
Configuring the endpoint and port settings for integrating Cribl with Amazon Data Firehose varies depending on whether you’re using Cribl.Cloud or a non-Cribl.Cloud deployment.
- Cribl.Cloud:
  - Configure the AWS destination HTTP endpoint URL to your Cribl.Cloud endpoint https://default.main.<your-org-id>.cribl.cloud on port 443. Replace <your-org-id> with your actual organization ID.
  - Configure the Cribl Edge Amazon Data Firehose Source to listen on port 10443 internally.
- Non-Cribl.Cloud:
  - Configure the AWS destination HTTP endpoint URL to the public URL of your Cribl Edge instance.
  - Configure the Cribl Edge Amazon Data Firehose Source to either listen on port 443 with elevated privileges or use a reverse proxy to forward traffic from port 443 to a non-privileged port, like 10443.
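On the AWS side, the endpoint settings above could be expressed roughly as follows. This Python sketch only builds the dictionary that the Firehose CreateDeliveryStream API accepts as HttpEndpointDestinationConfiguration; it does not call AWS. The organization ID, ARNs, endpoint name, and access key are placeholders. Using the Firehose access key to carry the value of one of this Source's Auth tokens is an assumption to verify in your own deployment.

```python
def cribl_cloud_url(org_id):
    """Build the Cribl.Cloud endpoint URL for a given organization ID."""
    # HTTPS implies port 443, so no explicit port is appended.
    return f"https://default.main.{org_id}.cribl.cloud"


def http_endpoint_config(url, access_key, role_arn, backup_bucket_arn):
    """Assemble an HttpEndpointDestinationConfiguration dictionary."""
    return {
        "EndpointConfiguration": {
            "Url": url,
            "Name": "cribl-edge",      # placeholder display name
            "AccessKey": access_key,   # assumed to match a Source auth token
        },
        # Ask Firehose to gzip request bodies before sending them.
        "RequestConfiguration": {"ContentEncoding": "GZIP"},
        "RoleARN": role_arn,
        # Firehose requires an S3 location for failed or backed-up records.
        "S3Configuration": {
            "RoleARN": role_arn,
            "BucketARN": backup_bucket_arn,
        },
    }


config = http_endpoint_config(
    cribl_cloud_url("example-org"),
    access_key="replace-with-auth-token",
    role_arn="arn:aws:iam::123456789012:role/example-firehose-role",
    backup_bucket_arn="arn:aws:s3:::example-backup-bucket",
)
```

If you use boto3, a dictionary like this could be passed as the HttpEndpointDestinationConfiguration argument of the Firehose client's create_delivery_stream call. For a non-Cribl.Cloud deployment, pass your instance's public URL instead of calling cribl_cloud_url.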
 
Configuring Cribl Edge to Receive Data over HTTP(S) from Amazon Data Firehose
From the top nav, click Manage, then select a Fleet to configure. Next, you have two options:
To configure via the graphical QuickConnect UI, click Routing > QuickConnect (Stream) or Collect (Edge). Next, click Add Source at left. From the resulting drawer’s tiles, select [Push >] Amazon > Firehose. Next, click either Add Destination or (if displayed) Select Existing. The resulting drawer will provide the options below.
Or, to configure via the Routing UI, click Data > Sources (Stream) or More > Sources (Edge). From the resulting page’s tiles or left nav, select [Push >] Amazon > Firehose. Next, click New Source to open a New Source modal that provides the options below.
General Settings
Input ID: Enter a unique name to identify this Source definition. If you clone this Source, Cribl Edge will add -CLONE to the original Input ID.
Address: Address to bind on. Defaults to 0.0.0.0 (all addresses).
Port: Enter the port number to listen on. In Cribl.Cloud, use port 10443, as this is where the data is processed internally.
Authentication Settings
Auth tokens: Shared secrets to be provided by any client (Authorization: <token>). Click Generate to create a new secret. If empty, unauthenticated access will be permitted.
Optional Settings
Tags: Optionally, add tags that you can use to filter and group Sources in Cribl Edge’s Manage Sources page. These tags aren’t added to processed events. Use a tab or hard return between (arbitrary) tag names.
TLS Settings (Server Side)
Enabled defaults to No. When toggled to Yes:
Certificate name: Name of the predefined certificate.
Private key path: Server path containing the private key (in PEM format) to use. Path can reference $ENV_VARS.
Passphrase: Passphrase to use to decrypt private key.
Certificate path: Server path containing certificates (in PEM format) to use. Path can reference $ENV_VARS.
CA certificate path: Server path containing CA certificates (in PEM format) to use. Path can reference $ENV_VARS.
Authenticate client (mutual auth): Require clients to present their certificates. Used to perform mutual authentication using SSL certs. Defaults to No. When toggled to Yes:
- Validate client certs: Reject certificates that are not authorized by a CA in the CA certificate path, or by another trusted CA (e.g., the system’s CA). Defaults to No.
- Common Name: Regex that a peer certificate’s subject attribute must match in order to connect. Defaults to .*. Matches on the substring after CN=. As needed, escape regex tokens to match literal characters. (For example, to match the subject CN=worker.cribl.local, you would enter: worker\.cribl\.local.) If the subject attribute contains Subject Alternative Name (SAN) entries, the Source will check the regex against all of those but ignore the Common Name (CN) entry (if any). If the certificate has no SAN extension, the Source will check the regex against the single name in the CN.
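To see why escaping matters in the Common Name regex, consider this small Python sketch. It uses Python's re module purely for illustration; Cribl Edge evaluates the pattern with its own regex engine, and whether the match is anchored is not stated here, so the sketch uses an unanchored search as an assumption. The helper names_to_check is hypothetical and mirrors only the SAN-versus-CN selection rule described above.

```python
import re


def names_to_check(common_name, san_entries):
    """SAN entries take precedence; the CN is used only when no SANs exist."""
    return list(san_entries) if san_entries else [common_name]


escaped = re.compile(r"worker\.cribl\.local")
unescaped = re.compile(r"worker.cribl.local")

exact = "worker.cribl.local"
lookalike = "workerXcribl-local"

# Both patterns accept the intended name.
exact_ok = bool(escaped.search(exact)) and bool(unescaped.search(exact))
# Only the unescaped pattern accepts the lookalike, because "." matches any character.
lookalike_unescaped = bool(unescaped.search(lookalike))
lookalike_escaped = bool(escaped.search(lookalike))

with_san = names_to_check("worker.cribl.local", ["edge-01.cribl.local"])
without_san = names_to_check("worker.cribl.local", [])
```

Here the unescaped pattern also accepts workerXcribl-local, which is usually not what you want when restricting which peers may connect.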
Minimum TLS version: Optionally, select the minimum TLS version to accept from connections.
Maximum TLS version: Optionally, select the maximum TLS version to accept from connections.
Persistent Queue Settings
In this section, you can optionally specify persistent queue storage, using the following controls. This will buffer and preserve incoming events when a downstream Destination is down, or exhibiting backpressure.
On Cribl-managed Cribl.Cloud Workers (with an Enterprise plan), this tab exposes only the Enable Persistent Queue toggle. If enabled, PQ is automatically configured in Always On mode, with a maximum queue size of 1 GB disk space allocated per PQ‑enabled Source, per Worker Process.
The 1 GB limit is on uncompressed inbound data, and no compression is applied to the queue. This limit is not configurable. For configurable queue size, compression, mode, and other options below, use a hybrid Group.
Enable Persistent Queue: Defaults to No. When toggled to Yes:
Mode: Select a condition for engaging persistent queues.
- Always On: This default option will always write events to the persistent queue, before forwarding them to Cribl Edge’s data processing engine.
- Smart: This option will engage PQ only when the Source detects backpressure from Cribl Edge’s data processing engine.
Max buffer size: The maximum number of events to hold in memory before reporting backpressure to the sender and writing the queue to disk. Defaults to 1000. (This buffer is per connection, not just per Worker Process – and this can dramatically expand memory usage.)
Commit frequency: The number of events to send downstream before committing that Stream has read them. Defaults to 42.
Max file size: The maximum data volume to store in each queue file before closing it and (optionally) applying the configured Compression. Enter a numeral with units of KB, MB, etc. If not specified, Cribl Edge applies the default 1 MB.
Max queue size: The maximum amount of disk space that the queue is allowed to consume on each Worker Process. Once this limit is reached, this Source will stop queueing data and block incoming data. Required, and defaults to 5 GB. Accepts positive numbers with units of KB, MB, GB, etc. Can be set as high as 1 TB, unless you’ve configured a different Max PQ size per Worker Process in Fleet Settings.
Queue file path: The location for the persistent queue files. Defaults to $CRIBL_HOME/state/queues. To this field’s specified path, Cribl Edge will append /<worker-id>/inputs/<input-id>.
Compression: Optional codec to compress the persisted data after a file is closed. Defaults to None; Gzip is also available.
In Cribl Edge 4.1 and later, Source-side PQ’s default Mode is Always On, to best ensure events’ delivery. For details on optimizing this selection, see Always On versus Smart Mode.
You can optimize Workers’ startup connections and CPU load at Fleet Settings > Worker Processes.
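Because Max buffer size applies per connection and Max queue size applies per Worker Process, it can help to estimate upper bounds before enabling PQ. The following Python sketch is simple arithmetic under those two statements. The function names, the connection count, and the Worker Process count are hypothetical, and the results are worst-case ceilings, not measurements.

```python
def worst_case_buffered_events(max_buffer_size, connections):
    """Upper bound on in-memory events: the buffer is held per connection."""
    return max_buffer_size * connections


def worst_case_queue_gb(max_queue_size_gb, worker_processes, pq_sources=1):
    """Upper bound on PQ disk use: the limit applies per Source, per Worker Process."""
    return max_queue_size_gb * worker_processes * pq_sources


# Defaults from this page (1000 events, 5 GB) with assumed load figures.
events_in_memory = worst_case_buffered_events(1000, connections=50)
disk_gb = worst_case_queue_gb(5, worker_processes=4)
```

With these assumed figures, up to 50,000 events could be held in memory and up to 20 GB of disk could be consumed by this Source's queues on one host.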
Processing Settings
Fields
In this section, you can add Fields to each event using Eval-like functionality.
- Name: Field name.
- Value: JavaScript expression to compute field’s value, enclosed in quotes or backticks. (Can evaluate to a constant.)
Pre-Processing
In this section’s Pipeline drop-down list, you can select a single existing Pipeline to process data from this input before the data is sent through the Routes.
Advanced Settings
Show originating IP: Toggle to Yes when clients are connecting through a proxy that supports the X-Forwarded-For header to keep the client’s original IP address on the event instead of the proxy’s IP address. This setting affects how the Source handles the __srcIpPort field.
Capture request headers: Toggle this to Yes to add request headers to events, in the __headers field.
Health check endpoint: Toggle to Yes to enable a health check endpoint specific to this Source, http(s)://<host>:<port>/cribl_health. A 200 HTTP response code is returned when the Source is healthy. Otherwise, two errors you could receive are:
- ECONNRESET, where the Source failed to initialize due to not having listeners on the port.
- 503 or Server is busy, max active connections reached, which indicates that there are too many connections per Worker Process.
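A monitoring script could probe this endpoint and translate the outcomes listed above. The following Python sketch uses only the standard library. The function names are hypothetical, and the mapping covers only the three outcomes this page describes; anything else is reported as unknown.

```python
import urllib.error
import urllib.request


def describe_health(status=None, error=None):
    """Translate a health check outcome into a short description."""
    if status == 200:
        return "healthy"
    if status == 503:
        return "too many connections per Worker Process"
    if isinstance(error, ConnectionResetError):
        return "Source failed to initialize (no listeners on the port)"
    return "unknown"


def check_health(base_url, timeout=5):
    """GET <base_url>/cribl_health and describe the result."""
    url = base_url.rstrip("/") + "/cribl_health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return describe_health(status=resp.status)
    except urllib.error.HTTPError as exc:
        # Non-2xx responses such as 503 arrive as HTTPError.
        return describe_health(status=exc.code)
    except urllib.error.URLError as exc:
        # Connection-level failures are wrapped; the cause is in exc.reason.
        return describe_health(error=exc.reason)
    except ConnectionResetError as exc:
        # A reset while reading the response is raised unwrapped.
        return describe_health(error=exc)
```

For example, check_health("https://edge.example.com:10443") would query a hypothetical host on the port configured for this Source.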
Max active requests: Maximum number of active requests allowed for this Source, per Worker Process. Defaults to 256. Enter 0 for unlimited.
Activity log sample rate: Determines how often request activity is logged at the info level. The default 100 value logs every 100th request; a 1 value would log every request; a 10 value would log every 10th request; etc.
Max requests per socket: The maximum number of requests Cribl Edge should allow on one socket before instructing the client to close the connection. Defaults to 0 (unlimited). See Balancing Connection Reuse Against Request Distribution below.
Socket timeout (seconds): How long Cribl Edge should wait before assuming that an inactive socket has timed out. The default 0 value means wait forever.
Request timeout (seconds): How long to wait for an incoming request to complete before aborting it. The default 0 value means wait indefinitely.
Keep-alive timeout (seconds): After the last response is sent, Cribl Edge will wait this long for additional data before closing the socket connection. Defaults to 5 seconds; minimum is 1 second; maximum is 600 seconds (10 minutes).
The longer the Keep‑alive timeout, the more Cribl Edge will reuse connections. The shorter the timeout, the closer Cribl Edge gets to creating a new connection for every request. When request frequency is high, you can use longer timeouts to reduce the number of connections created, which mitigates the associated cost.
Environment: If you’re using GitOps, optionally use this field to specify a single Git branch on which to enable this configuration. If empty, the config will be enabled everywhere.
Balancing Connection Reuse Against Request Distribution
Max requests per socket allows you to limit the number of HTTP requests an upstream client can send on one network connection. Once the limit is reached, Cribl Edge uses HTTP headers to inform the client that it must establish a new connection to send any more requests. (Specifically, Cribl Edge sets the HTTP Connection header to close.) After that, if the client disregards what Cribl Edge has asked it to do and tries to send another HTTP request over the existing connection, Cribl Edge will respond with an HTTP status code of 503 Service Unavailable.
Use this setting to strike a balance between connection reuse by the client, and distribution of requests among one or more Edge Node processes by Cribl Edge:
- When a client sends a sequence of requests on the same connection, that is called connection reuse. Because connection reuse benefits client performance by avoiding the overhead of creating new connections, clients have an incentive to maximize connection reuse. 
- Meanwhile, a single process on that Edge Node will handle all the requests of a single network connection, for the lifetime of the connection. When receiving a large overall set of data, Cribl Edge performs better when the workload is distributed across multiple Edge Node processes. In that situation, it makes sense to limit connection reuse. 
There is no one-size-fits-all solution, because of variation in the size of the payload a client sends with a request and in the number of requests a client wants to send in one sequence. Start by estimating how long connections will stay open. To do this, multiply the typical time that requests take to process (based on payload size) by the number of requests the client typically wants to send.
If the result is 60 seconds or longer, set Max requests per socket to force the client to create a new connection sooner. This way, more data can be spread over more Edge Node processes within a given unit of time.
For example: Suppose a client tries to send thousands of requests over a very few connections that stay open for hours on end. By setting a relatively low Max requests per socket, you can ensure that the same work is done over more, shorter-lived connections distributed between more Edge Node processes, yielding better performance from Cribl Edge.
A final point to consider is that one Cribl Edge Source can receive requests from more than one client, making it more complicated to determine an optimal value for Max requests per socket.
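The estimate described in this section can be sketched in Python as follows. The first function is the multiplication from the text. The second is a heuristic of our own, not a Cribl formula: it picks the largest request count that keeps a connection at or below a target lifetime, using the 60-second threshold mentioned above as the default. The per-request time and request count are assumed example values.

```python
def connection_lifetime_seconds(seconds_per_request, requests_per_sequence):
    """Estimated time a connection stays open for one sequence of requests."""
    return seconds_per_request * requests_per_sequence


def suggested_max_requests(seconds_per_request, target_seconds=60):
    """Largest request count keeping a connection at or below target_seconds.

    Heuristic starting point only; tune against real traffic.
    """
    return max(1, int(target_seconds / seconds_per_request))


# Assumed example: 0.25 s per request, 2000 requests per sequence.
lifetime = connection_lifetime_seconds(0.25, 2000)
limit = suggested_max_requests(0.25)
```

In this example the estimated lifetime is 500 seconds, well over 60, so a Max requests per socket value around 240 would be a reasonable first setting to test. When several clients with different payload sizes share one Source, run the estimate for each and start from the most conservative result.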
Connected Destinations
Select Send to Routes to enable conditional routing, filtering, and cloning of this Source’s data via the Routing table.
Select QuickConnect to send this Source’s data to one or more Destinations via independent, direct connections.
Internal Fields
Cribl Edge uses a set of internal fields to assist in handling of data. These “meta” fields are not part of an event, but they are accessible, and functions can use them to make processing decisions.
Fields for this Source:
- __headers – Added only when Advanced Settings > Capture request headers is set to Yes.
- __inputId
- __raw
- __srcIpPort – See details below.
Records also include the following metadata specific to the Amazon Data Firehose service:
- __firehoseArn: The Amazon Resource Name (ARN) of the Amazon Data Firehose delivery stream that served as the data source.
- __firehoseReqId: A unique identifier assigned to the specific data ingestion request processed by Amazon Data Firehose.
- __firehoseEndpoint: The service endpoint URL of the Amazon Data Firehose service that received the data.
- __firehoseToken: An authentication token associated with the Amazon Data Firehose data ingestion request, used for source validation.
Overriding __srcIpPort with Client IP/Port
The __srcIpPort field’s value contains the IP address and (optionally) port of the Amazon Data Firehose client sending data to this Source.
When any proxies (including load balancers) lie between the Amazon Data Firehose client and the Source, the last proxy adds an X‑Forwarded‑For header whose value is the IP/port of the original client. With multiple proxies, this header’s value will be an array, whose first item is the original client IP/port.
If X‑Forwarded‑For is present, and Advanced Settings > Show originating IP is set to No, the original client IP/port in this header will override the value of __srcIpPort.
If Show originating IP is set to Yes, the X‑Forwarded‑For header’s contents will not override the __srcIpPort value. (Here, the upstream proxy can convey the client IP/port without using this header.)
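To make the header's shape concrete, this Python sketch extracts the original client entry from an X-Forwarded-For value, assuming the usual comma-separated form in which the first item is the original client. The function name original_client is hypothetical, and the sketch does not model how the Source decides whether to apply the value to __srcIpPort.

```python
def original_client(x_forwarded_for):
    """Return the first entry of an X-Forwarded-For value, or None if absent."""
    if not x_forwarded_for:
        return None
    first = x_forwarded_for.split(",")[0].strip()
    return first or None


# The addresses below are documentation-range examples.
single_proxy = original_client("203.0.113.7")
multi_proxy = original_client("203.0.113.7, 10.0.0.2, 10.0.0.3")
missing = original_client(None)
```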
Limitations
If you set the optional IntervalInSeconds and/or SizeInMBs parameters in the Amazon Data Firehose BufferingHints API, beware of selecting extreme values (toward the ends of the API’s supported ranges). These can send more bytes than Cribl Edge can buffer, causing Cribl Edge to send HTTP 500 error responses to Amazon Data Firehose.
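One way to catch risky settings before deploying a delivery stream is a small pre-flight check like the Python sketch below. The ranges shown (1 to 64 MB and 0 to 900 seconds) are assumptions about the Firehose HTTP endpoint buffering limits and should be verified against the current AWS API reference. The 10% margin that defines "extreme" is arbitrary, and the function names are hypothetical.

```python
# Assumed API ranges for HTTP endpoint destinations; verify against AWS docs.
ASSUMED_RANGES = {
    "SizeInMBs": (1, 64),
    "IntervalInSeconds": (0, 900),
}


def near_range_end(value, low, high, margin=0.1):
    """True when value lies within `margin` of either end of [low, high]."""
    span = high - low
    return value <= low + span * margin or value >= high - span * margin


def review_buffering_hints(hints, ranges=ASSUMED_RANGES):
    """Return the names of hints whose values sit near a range end."""
    return [
        name
        for name, (low, high) in ranges.items()
        if name in hints and near_range_end(hints[name], low, high)
    ]


moderate = review_buffering_hints({"SizeInMBs": 16, "IntervalInSeconds": 300})
extreme = review_buffering_hints({"SizeInMBs": 64, "IntervalInSeconds": 300})
```

With these assumptions, the moderate settings produce no warnings, while a SizeInMBs of 64 is flagged for review.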
Troubleshooting
The Source’s configuration modal has helpful tabs for troubleshooting:
Live Data: Try capturing live data to see real-time events as they are ingested. On the Live Data tab, click Start Capture to begin viewing real-time data.
Logs: Review and search the logs that provide detailed information about the ingestion process, including any errors or warnings that may have occurred.
You can also view the Monitoring page that provides a comprehensive overview of data volume and rate, helping you identify ingestion issues. Analyze the graphs showing events and bytes in/out over time.