/ / / /

ClickHouse Destination

The ClickHouse Destination enables seamless integration with ClickHouse. This Destination routes observability data, including logs, metrics, and traces, to ClickHouse for enhanced monitoring and analysis.

Data is formatted and transmitted via HTTP requests, offering both synchronous and asynchronous insert options for flexibility. Authentication is secured with Basic or SSL User Certificate methods, while robust retry settings ensure reliable data delivery. Flexible data mapping, including automatic and manual field-to-column mapping, guarantees structured ingestion. To optimize performance, the integration supports batching and asynchronous inserts, leveraging ClickHouse’s capabilities for efficient data delivery.

To get started, connecting to ClickHouse is simple. Grab your ClickHouse instance’s HTTP interface URL, database name, and table name, then add these details to your Destination settings. Need more detailed instructions or configuration options? See below!

Type: Streaming | TLS Support: Configurable | PQ Support: Yes

Configure a ClickHouse Destination

In Cribl Edge, set up a ClickHouse Destination.

On the top bar, select Products, and then select Cribl Edge. Under Fleets, select a Fleet. Next, you have two options:
- To configure via QuickConnect, navigate to Routing > QuickConnect (Stream) or Collect (Edge). Select Add Destination and select the Destination you want from the list, choosing either Select Existing.
- To configure via the Routes, select Data > Destinations (Stream) or More > Destinations (Edge). Select the Destination you want. Next, select Add Destination.
In the Destination modal, configure the following under General Settings:
- Output ID: Enter a unique name to identify this output definition. If you clone this Destination, Cribl Edge will add -CLONE to the original Output ID. A Description is optional.
- URL: The URL of your ClickHouse instance’s HTTP interface. This should include the protocol (http or https) and the port number if it’s not the default. Example: http://localhost:8123.
- ClickHouse database: Enter the name of the ClickHouse database where the target table resides. Ensure that the database exists and the authenticated user has the necessary permissions to write to it.
- ClickHouse table: Enter the name of the target table. The name can contain letters (A-Z, a-z), numbers (0-9), and underscores _, and must start with either a letter or underscore. The table must exist and have a schema compatible with the data being sent.
- Format: Choose the format for events sent to ClickHouse. Choose between:
  - JSONCompactEachRowWithNames (default): Optimized for efficiency by minimizing data size, making it ideal for transmitting data to ClickHouse over networks where bandwidth is a concern.
  - JSONEachRow: Standard JSON format with extra spacing, useful for debugging or human inspection. If the time spent reformatting data into the compact format exceeds the cost of transmitting more data, JSONEachRow may be preferable. For example, in scenarios where the ClickHouse instance runs on the same machine as Cribl, the network cost is negligible, and the time spent reformatting doesn’t provide significant benefits.
- Authentication type: ClickHouse supports several authentication methods for its HTTP interface.
  - None: Don’t use authentication.
  - Basic: Displays Username and Password fields for you to enter HTTP Basic authentication credentials.
  - Basic (credentials secret): This option exposes a Credentials secret drop-down to select a stored text secret that contains your username and password credentials referenced by a secret. A Create link is available to store a new, reusable secret.
  - SSL User Certificate: Use to securely connect to ClickHouse. Provide the username for certificate authentication. This method ensures a higher level of security by using SSL certificates to verify the identity of the client.
- Mapping: Determines how fields from Cribl events are mapped to ClickHouse columns. Options include Automatic and Custom.
  - Automatic: Uses default mappings.
    - Exclude fields: A list of fields to exclude from the data sent to ClickHouse. This can help reduce the size of the data and avoid sending unnecessary information.
  - Custom: Manually specify column mappings. Select Retrieve table columns to connect to a publicly accessible ClickHouse instance, authenticate using the provided credentials, and fetch the table schema. The retrieved column names and types will be automatically populated in the Column Mapping table, allowing you to focus on mapping the appropriate data values.
    - The Column Mapping table allows you to define how event data fields are mapped to ClickHouse table columns. This is essential for ensuring that data is correctly ingested into ClickHouse with the appropriate structure and types. See Data Types in ClickHouse for details.
      - Column name: The name of the column in the ClickHouse table. This should match the column name defined in the ClickHouse schema.
      - Column type: The data type of the column in the ClickHouse table (such as, String, Int32, DateTime). This ensures that the data is stored in the correct format.
      - Column value: A JavaScript expression that defines how to extract or transform the event data to fit into the specified column. This allows for flexible data manipulation and transformation before ingestion.
- Async: The Async inserts toggle enables asynchronous inserts, allowing ClickHouse to buffer data in memory before writing it to disk. This can improve performance but may risk data loss if the server crashes before the data is written.
  - Wait for async inserts waits for the ClickHouse server to confirm that the data has been fully written to disk before proceeding. This ensures a higher level of data integrity by reducing the risk of data loss if the server encounters an issue before full insertion. This setting balances throughput and data assurance, making it suitable for high-volume logging where occasional data loss is acceptable.
- Optional Settings:
  - Backpressure behavior: Whether to Block, Drop, or Persistent Queue events when all receivers are exerting backpressure.
  - Tags: Optionally, add tags that you can use to filter and group Destinations in Cribl Edge’s Manage Destinations page. These tags aren’t added to processed events. Use a tab or hard return between (arbitrary) tag names.
If you’re using TLS, select TLS Settings (Client Side) on the left tabs and toggle Use TLS to Yes. See the TLS Settings section for details. Defaults to No.
The Processing Settings, Retries, and Advanced Settings provide granular control over how data is handled and transmitted.
- Processing Settings allow you to define pre-processing and post-processing Pipelines, ensuring that events conform to the relevant Protobuf specifications for traces, metrics, and logs. This includes adding metadata, applying transformations, and conditioning the data before it is sent to the destination.
- Retries allow you to specify how the system should handle different HTTP response status codes and timeouts, ensuring reliable data delivery.
- Advanced Settings offer additional configuration options that enhance the flexibility and performance of data transmission.
Select Save, then Commit & Deploy.
Use the Test tab in the Destination’s configuration modal to validate the configuration and connectivity of your Destination.

TLS Settings (Client Side)

Server name (SNI): Server name for the SNI (Server Name Indication) TLS extension. This must be a host name, not an IP address.

Minimum TLS version: Optionally, select the minimum TLS version to use when connecting.

Maximum TLS version: Optionally, select the maximum TLS version to use when connecting.

Certificate name: The name of the predefined certificate.

CA certificate path: Path on client containing CA certificates (in PEM format) to use to verify the server’s cert. Path can reference $ENV_VARS.

Private key path (mutual auth): Path on client containing the private key (in PEM format) to use. Path can reference $ENV_VARS. Use only if mutual auth is required.

Certificate path (mutual auth): Path on client containing certificates in (PEM format) to use. Path can reference $ENV_VARS. Use only if mutual auth is required.

Passphrase: Passphrase to use to decrypt private key.

Persistent Queue Settings

This tab is displayed when the Backpressure behavior is set to Persistent Queue.

On Cribl-managed Cloud Workers (with certain plan/license tiers), this tab exposes only the destructive Clear Persistent Queue button (described below in this section). A maximum queue size of 1 GB disk space is automatically allocated per PQ-enabled Destination, per Worker Process. The 1 GB limit is on outbound uncompressed data, and no compression is applied to the queue.
This limit is not configurable. If the queue fills up, Cribl Edge will block outbound data. To configure the queue size, compression, queue-full fallback behavior, and other options below, use a hybrid Group.

Mode: Choose the condition for engaging persistent queues. Options include Smart and Always On.

Smart: Engages persistent queues only when the Destination detects backpressure. This mode helps optimize performance by using disk storage only when necessary.
Always On: Always write events into the persistent queue before forwarding them. This mode ensures maximum data durability but may require more disk space.

Max file size: The maximum data volume to store in each queue file before closing it. Enter a numeral with units of KB, MB, and so on. Defaults to 1 MB.

Max queue size: The maximum amount of disk space the queue is allowed to consume. Once this limit is reached, queueing is stopped and data blocking is applied. Enter a numeral with units of KB, MB, and so on.

Queue file path: The location for the persistent queue files. This will be of the form: your/path/here/<worker-id>/<output-id>. Defaults to: $CRIBL_HOME/state/queues.

Compression: Codec to use to compress the persisted data, once a file is closed. Defaults to None. Gzip is also available.

Queue-full behavior: Whether to block or drop events when the queue is exerting backpressure (because disk is low or at full capacity). Block is the same behavior as non-PQ blocking, corresponding to the Block option on the Backpressure behavior drop-down. Drop new data throws away incoming data, while leaving the contents of the PQ unchanged.

Clear Persistent Queue: Select this “panic” button if you want to delete the files that are currently queued for delivery to this Destination. A confirmation modal will appear - because this will free up disk space by permanently deleting the queued data, without delivering it to downstream receivers. (Appears only after Output ID has been defined.)

Strict ordering: The default Yes position enables FIFO (first in, first out) event forwarding. When receivers recover, Cribl Edge will send earlier queued events before forwarding newly arrived events. To instead prioritize new events before draining the queue, toggle this off. Doing so will expose this additional control:

Drain rate limit (EPS): Optionally, set a throttling rate (in events per second) on writing from the queue to receivers. (The default 0 value disables throttling.) Throttling the queue’s drain rate can boost the throughput of new/active connections, by reserving more resources for them. You can further optimize Workers’ startup connections and CPU load at Fleet Settings > Worker Processes.

Processing Settings

Post-Processing

Pipeline: Pipeline to process data before sending the data out using this output.

System fields: A list of fields to automatically add to events that use this output - both metric events, as dimensions; and log events, as labels. Supports wildcards.

By default, includes cribl_pipe (Cribl Edge Pipeline that processed the event).

Other options include:

cribl_host - Cribl Edge Node that processed the event.
cribl_input - Cribl Edge Source that processed the event.
cribl_output - Cribl Edge Destination that processed the event.
cribl_route - Cribl Edge Route (or QuickConnect) that processed the event.
cribl_wp - Cribl Edge Worker Process that processed the event.

Retries

Honor Retry-After header: Whether to honor a Retry-After header, provided that the header specifies a delay no longer than 180 seconds. Cribl Stream/Edge limits the delay to 180 seconds even if the Retry-After header specifies a longer delay. When enabled, any Retry-After header received takes precedence over all other options configured in the Retries section. When disabled, all Retry-After headers are ignored.

Settings for failed HTTP requests: When you want to automatically retry requests that receive particular HTTP response status codes, use these settings to list those response codes.

For any HTTP response status codes that are not explicitly configured for retries, Cribl Stream/Edge applies the following rules:

Status Code	Action
Any in the `1xx`, `3xx`, or `4xx` series	Drop the request
Any in the `5xx` series	Retry the request

Upon receiving a response code that’s on the list, Cribl Stream/Edge first waits for a set time interval called the Pre-backoff interval and then begins retrying the request. Time between retries increases based on an exponential backoff algorithm whose base is the Backoff multiplier, until the backoff multiplier reaches the Backoff limit (ms). At that point, Cribl Stream/Edge continues retrying the request without increasing the time between retries any further.

If the sender (which manages the connection to the Destination) is at capacity, it will not accept any incoming events. These incoming events originate internally from a previous stage of the data flow when Destinations send outbound requests to their respective external services, and they include retry requests and new requests. Any events that were already in transit when the sender reached capacity will continue to be processed downstream.

Sender capacity is freed up when an outgoing request succeeds or encounters a non-retryable error. When the sender has available capacity again, it will resume accepting incoming events. This capacity management is influenced by the number of active connections and configured limits, such as concurrency and buffer sizes. If a Pipeline sends events faster than the Destination can process, the buffers may fill up, leading to backpressure and Sender at capacity warnings. This backpressure prevents the sender from accepting additional requests until capacity is restored.

By default, this Destination has no response codes configured for automatic retries. For each response code you want to add to the list, select Add Setting and configure the following settings:

HTTP status code: A response code that indicates a failed request, for example 429 (Too Many Requests) or 503 (Service Unavailable).
Pre-backoff interval (ms): The amount of time to wait before beginning retries, in milliseconds. Defaults to 1000 (one second).
Backoff multiplier: The base for the exponential backoff algorithm. A value of 2 (the default) means that Cribl Stream/Edge will retry after 2 seconds, then 4 seconds, then 8 seconds, and so on.
Backoff limit (ms): The maximum backoff interval Cribl Stream/Edge should apply for its final retry, in milliseconds. Default (and minimum) is 10,000 (10 seconds); maximum is 180,000 (180 seconds, or 3 minutes).

Retry timed-out HTTP requests: When you want to automatically retry requests that have timed out, toggle this control on to display the following settings for configuring retry behavior:

Pre-backoff interval (ms): The amount of time to wait before beginning retries, in milliseconds. Defaults to 1000 (one second).
Backoff multiplier: The base for the exponential backoff algorithm. A value of 2 (the default) means that Cribl Stream/Edge will retry after 2 seconds, then 4 seconds, then 8 seconds, and so on.
Backoff limit (ms): The maximum backoff interval Cribl Stream/Edge should apply for its final retry, in milliseconds. Default (and minimum) is 10,000 (10 seconds); maximum is 180,000 (180 seconds, or 3 minutes).

Advanced Settings

Validate server certs: Reject certificates that are not authorized by a CA in the CA certificate path, nor by another trusted CA (for example, the system’s CA). Defaults to Yes.

Round-robin DNS: Toggle on to enable round-robin DNS lookup across multiple IP addresses, IPv4 and IPv6. When a DNS server resolves a Fully Qualified Domain Name (FQDN) to multiple IP addresses, Cribl Edge will sequentially use each address in the order they are returned by the DNS server for subsequent connection attempts.

Compress: Compresses the payload body before sending. Defaults to Yes (recommended).

Request timeout: Amount of time (in seconds) to wait for a request to complete before aborting it. Defaults to 30.

Request concurrency: Maximum number of concurrent requests per Worker Process. When Cribl Edge hits this limit, it begins throttling traffic to the downstream service. Defaults to 5. Minimum: 1. Maximum: 32.

Max body size (KB): Maximum size of the request body before compression. Defaults to 4096 KB. You can set this limit to as high as 500 MB (512000 KB). Be aware that high values can cause high memory usage per Worker Node, especially if a dynamically constructed URL (see Internal Fields) causes this Destination to send events to more than one URL. The actual request body size might exceed the specified value because the Destination adds bytes when it writes to the downstream receiver. Cribl recommends that you experiment with the Max body size value until downstream receivers reliably accept all events.

Max events per request: Maximum number of events to include in the request body. The 0 default allows unlimited events.

Flush period (sec): Maximum time between requests. Low values could cause the payload size to be smaller than its configured maximum. Defaults to 1.

Extra HTTP headers: Name-value pairs to pass to all events as additional HTTP headers. Values will be sent encrypted. You can also add headers dynamically on a per-event basis in the __headers field. See Internal Fields below.

Failed request logging mode: Use this drop-down to determine which data should be logged when a request fails. Select among None (the default), Payload, or Payload + Headers. With this last option, Cribl Edge will redact all headers, except non-sensitive headers that you declare below in Safe headers.

Safe headers: Add headers to declare them as safe to log in plaintext. (Sensitive headers such as authorization will always be redacted, even if listed here.) Use a tab or hard return to separate header names.

Environment: If you’re using GitOps, optionally use this field to specify a single Git branch on which to enable this configuration. If empty, the config will be enabled everywhere.

Troubleshoot

The Destination’s configuration modal has helpful tabs for troubleshooting:

Live Data: Try capturing live data to see real-time events as they flow through the Destination. On the Live Data tab, click Start Capture to begin viewing real-time data.

Logs: Review and search the logs that provide detailed information about the delivery process, including any errors or warnings that may have occurred.

Test: Ensures that the Destination is correctly set up and reachable. Verify that sample events are sent correctly by clicking Run Test.

You can also view the Monitoring page that provides a comprehensive overview of data volume and rate, helping you identify delivery issues. Analyze the graphs showing events and bytes in/out over time.

ClickHouse Destination

Configure a ClickHouse Destination​

TLS Settings (Client Side)​

Persistent Queue Settings​

Processing Settings​

Post-Processing​

Retries​

Advanced Settings​

Troubleshoot​

Common Resources