Amazon S3 Compatible Stores Destination
S3 is a non-streaming Destination type. Cribl Edge does not have to run on AWS in order to deliver data to S3.
Type: Non-Streaming | TLS Support: Yes | PQ Support: No
Stores that are S3-compatible will also work with this Destination type. For example, the S3 Destination can be adapted to send data to services like Databricks and Snowflake, for which Cribl Edge currently has no preconfigured Destination.
For guidance in configuring these adaptations, contact Cribl Support. For general guidance, see our Amazon S3 Better Practices guide. To replay data you’ve previously exported to S3-compatible stores, see our Using S3 Storage and Replay guide and our interactive Data Collection & Replay sandbox.
Configure Cribl Edge to Output to S3 Destinations
In Cribl Edge, configure Amazon S3:
On the top bar, select Products, and then select Cribl Edge. Under Fleets, select a Fleet. Next, you have two options:
- To configure via QuickConnect, navigate to Routing > QuickConnect (Stream) or Collect (Edge). Select Add Destination and select the Destination you want from the list, choosing either Select Existing or Add New.
- To configure via the Routes, select Data > Destinations or More > Destinations (Edge). Select the Destination you want. Next, select Add Destination.
In the New Destination modal, configure the following under General Settings:
- Output ID: Enter a unique name to identify this S3 definition. If you clone this Destination, Cribl Edge will add `-CLONE` to the original Output ID.
- Description: Optionally, enter a description.
- S3 bucket name: Name of the destination S3 bucket. This value can be a constant, or a JavaScript expression that will be evaluated only at init time. For example, referencing a Global Variable: `myBucket-${C.vars.myVar}` (see also the sketch after this list). Event-level variables are not available for JavaScript expressions, because the bucket name is evaluated only at Destination initialization. If you want to use event-level variables in file paths, Cribl recommends specifying them in the Partitioning expression field (described below), because that expression is evaluated for each file.
- Staging location: Filesystem location in which to locally buffer files before compressing and moving to final destination. Cribl recommends that this location be stable and high-performance.
The Staging location field is not displayed or available on Cribl.Cloud-managed Edge Nodes.
- Key prefix: Root directory to prepend to path before uploading. Enter either a constant, or a JS expression (enclosed in single quotes, double quotes, or backticks) that will be evaluated only at init time.
- Data format: The output data format defaults to `JSON`. `Raw` and `Parquet` are also available. Selecting `Parquet` (supported only on Linux, not Windows) exposes a Parquet Settings left tab, where you must configure certain options in order to export data in Parquet format.
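As an illustration of init-time expressions, the following sketch shows values you might enter for S3 bucket name and Key prefix. Both the Global Variable name (`dept`) and the values are hypothetical:

```javascript
// Hypothetical field values; `dept` is an assumed Global Variable.
// Both expressions are evaluated once, at Destination initialization – not per event.

// S3 bucket name (constant or JavaScript expression):
logs-${C.vars.dept}

// Key prefix (a JS expression enclosed in quotes or backticks):
`cribl-exports/${C.vars.dept}/`
```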
Next, you can configure the following Optional Settings:
Region: Region where the S3 bucket is located.

You must grant your IAM role or user the `GetBucketLocation` permission to the S3 bucket when no Region is selected. See Amazon S3 Permissions for an example.

Partitioning expression: JavaScript expression that defines how files are partitioned and organized. Default is date-based. If blank, Cribl Edge will fall back to the event’s `__partition` field value (if present); or otherwise to the root directory of the Output Location and Staging Location. For a sample expression, see the sketch following these settings.

Compression: Data compression format used before moving to final destination. Defaults to `gzip` (recommended). This setting is not available when Data format is set to `Parquet`.

File name prefix expression: The output filename prefix. Must be a JavaScript expression (which can evaluate to a constant), enclosed in quotes or backticks. Defaults to `CriblOut`.

File name suffix expression: The output filename suffix. Must be a JavaScript expression (which can evaluate to a constant), enclosed in quotes or backticks. Defaults to `` `.${C.env["CRIBL_WORKER_ID"]}.${__format}${__compression === "gzip" ? ".gz" : ""}` ``, where `__format` can be `json` or `raw`, and `__compression` can be `none` or `gzip`.

To prevent files from being overwritten, Cribl appends a random sequence of 6 characters to the end of their names. This also applies to prefix and suffix expressions in file names. For example, if you set the File name prefix expression to `CriblExec` and the File name suffix expression to `.csv`, the file name might display as `CriblExec-adPRWM.csv`, with `adPRWM` appended.

Backpressure behavior: Select whether to block or drop events when all receivers are exerting backpressure. (Causes might include an accumulation of too many files needing to be closed.) Defaults to `Block`.

Tags: Optionally, add tags that you can use to filter and group Destinations on the Destinations page. These tags aren’t added to processed events. Use a tab or hard return between (arbitrary) tag names.
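For example, a partitioning expression like the following sketch would group files by event date and source host. This is an illustration only: it assumes your events carry a `host` field, and it uses the built-in `C.Time.strftime()` helper to format the event’s `_time`:

```javascript
// Hypothetical Partitioning expression: one directory per day, subdivided by host.
// Assumes events carry `_time` (epoch seconds) and a `host` field.
`${C.Time.strftime(_time, '%Y/%m/%d')}/${host}`
```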
Optionally, configure any Authentication, Assume Role, Processing, Parquet, and Advanced settings outlined in the sections below.
Select Save, then Commit & Deploy.
Authentication
Use the Authentication method drop-down to select one of these options:
Auto: This default option uses the AWS SDK for JavaScript to automatically obtain credentials in the following order of attempts:
- IAM Roles for Amazon EC2: Loaded from AWS Identity and Access Management (IAM) roles attached to an EC2 instance.
- Shared Credentials File: Loaded from the shared credentials file (`~/.aws/credentials`).
- Environment Variables: Loaded from environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`.
- JSON File on Disk: Loaded from a JSON file on disk.
- Other Credential-Provider Classes: Other credential-provider classes provided by the AWS SDK for JavaScript.

The `Auto` method works both when running on AWS and in other environments where the necessary credentials are available through one of the above methods.
SSO Providers
When using the auto authentication method, you can leverage SSO providers like SAML and Okta to issue temporary credentials. These credentials should be set in the environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`. The AWS SDK will then use these environment variables to authenticate.
Manual: If not running on AWS, you can select this option to enter a static set of user-associated IAM credentials (your access key and secret key) directly or by reference. This is useful for Workers not in an AWS VPC, for example, those running a private cloud. The Manual option exposes these corresponding additional fields:
- Access key: Enter your AWS access key. If not present, will fall back to the `env.AWS_ACCESS_KEY_ID` environment variable, or to the metadata endpoint for IAM role credentials.
- Secret key: Enter your AWS secret key. If not present, will fall back to the `env.AWS_SECRET_ACCESS_KEY` environment variable, or to the metadata endpoint for IAM credentials.

The values for Access key and Secret key can be a constant or a JavaScript expression (such as `${C.env.MY_VAR}`). Enclose values that contain special characters (like `/`) or environment variables in single quotes or backticks.
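As a sketch of how these fields can be populated, the following hypothetical values show an expression that reads an environment variable, and a constant that needs quoting because it contains special characters:

```javascript
// Hypothetical field values for Manual authentication.

// Access key – read from an environment variable, enclosed in backticks:
`${C.env.MY_ACCESS_KEY}`

// Secret key – a constant containing special characters (like "/"), in single quotes:
'EXAMPLEsecret/key+value'
```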
Secret: If not running on AWS, you can select this option to supply a stored secret that references an AWS access key and secret key. The Secret option exposes this additional field:
- Secret key pair: Use the drop-down to select an API key/secret key pair that you’ve configured in Cribl Edge’s secrets manager. A Create link is available to store a new, reusable secret.
Assume Role
When using Assume Role to access resources in a different Region than Cribl Stream, you can target the AWS Security Token Service (STS) endpoint specific to that Region by using the `CRIBL_AWS_STS_REGION` environment variable on your Edge Node. Setting an invalid Region results in a fallback to the global STS endpoint.
Enable for S3: Toggle on to use Assume Role credentials to access S3.
AssumeRole ARN: Enter the Amazon Resource Name (ARN) of the role to assume.
External ID: Enter the External ID to use when assuming role. This is required only when assuming a role that requires this ID in order to delegate third-party access. For details, see AWS’ documentation.
Duration (seconds): Duration of the Assumed Role’s session, in seconds. Minimum is 900 (15 minutes). Maximum is 43200 (12 hours). Defaults to 3600 (1 hour).
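For reference, an AssumeRole ARN follows the standard AWS format; the account ID and role name below are placeholders:

```javascript
// Hypothetical AssumeRole ARN value (account ID and role name are placeholders):
'arn:aws:iam::123456789012:role/CriblS3WriteRole'
```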
Processing Settings
Post‑Processing
Pipeline: Pipeline or Pack to process data before sending the data out using this output.
System fields: A list of fields to automatically add to events that use this output. By default, includes `cribl_pipe` (identifying the Cribl Edge Pipeline that processed the event). Supports `c*` wildcards. Other options include:

- `cribl_host` – Cribl Edge Node that processed the event.
- `cribl_input` – Cribl Edge Source that processed the event.
- `cribl_output` – Cribl Edge Destination that processed the event.
- `cribl_route` – Cribl Edge Route (or QuickConnect) that processed the event.
- `cribl_wp` – Cribl Edge Worker Process that processed the event.
Parquet Settings
To write out Parquet files, note that:
- On Linux, you can use the Cribl Edge CLI’s `parquet` command to view a Parquet file, its metadata, or its schema.
- Cribl Edge Workers support Parquet only when running on Linux, not on Windows.
- See Working with Parquet for pointers on how to avoid problems such as data mismatches.
- In Cribl Edge 4.3 and newer, the S3 Collector supports ingesting data in the Parquet format. Therefore, data that you export in Parquet format can be replayed via the Collector.
Automatic schema: Toggle on to automatically generate a Parquet schema based on the events of each Parquet file that Cribl Edge writes. When toggled off (the default), exposes the following additional field:
- Parquet schema: Select a schema from the drop-down.
If you need to modify a schema or add a new one, follow the instructions in our Parquet Schemas topic. These steps will propagate the freshest schema back to this drop-down.
Parquet version: Determines which data types are supported, and how they are represented. Defaults to `2.6`; `2.4` and `1.0` are also available.

Data page version: Serialization format for data pages. Defaults to `V2`. If your toolchain includes a Parquet reader that does not support `V2`, use `V1`.

Group row limit: The number of rows that every group will contain. The final group can contain a smaller number of rows. Defaults to `10000`.

Page size: Set the target memory size for page segments. Generally, set lower values to improve reading speed, or set higher values to improve compression. Value must be a positive integer smaller than the Row group size value, with appropriate units. Defaults to `1 MB`.

Log invalid rows: Toggle on to output up to 20 unique rows that were skipped due to data format mismatch. Log level must be set to `debug` for output to be visible.
Write statistics: Leave toggled on (the default) if you have Parquet tools configured to view statistics – these profile an entire file in terms of minimum/maximum values within data, numbers of nulls, etc.
Write page indexes: Leave toggled on (the default) if your Parquet reader uses statistics from Page Indexes to enable page skipping. One Page Index contains statistics for one data page.
Write page checksum: Toggle on if you have configured Parquet tools to verify data integrity using the checksums of Parquet pages.
Metadata (optional): The metadata of files the Destination writes will include the properties you add here as key-value pairs. For example, one way to tag events as belonging to the OCSF category for security findings would be to set Key to `OCSF Event Class` and Value to `2001`.
Advanced Settings
File size limit (MB): Maximum uncompressed output file size. Files of this size will be closed and moved to final output location. Defaults to `32`.

File open time limit (sec): Maximum amount of time to write to a file. Files open for longer than this limit will be closed and moved to final output location. Defaults to `300`.

Idle time limit (sec): Maximum amount of time to keep inactive files open. Files open for longer than this limit will be closed and moved to final output location. Defaults to `30`.

Open file limit: Maximum number of files to keep open concurrently. When this limit is exceeded, on any individual Worker Process, Cribl Edge will close the oldest open files, and move them to the final output location. Defaults to `100`.
Cribl Edge will close files when any of the four above conditions is met.
Compression level: Specifies the level of compression to apply before moving files to the final destination. Higher compression levels reduce file size but increase CPU usage. Choose from `Best Speed`, `Normal`, or `Best Compression`. The default level, `Best Speed`, prioritizes faster compression over smaller file sizes.
Staging file limit: Maximum number of files that the Destination will allow to wait for upload before it applies backpressure. Defaults to `100`; minimum is `10`; maximum is `4200`. For context, see Troubleshooting below.
Concurrent file parts limit: Maximum number of parts to upload in parallel per file. A value of `1` tells the Destination to send the whole file at once. When set to `2` or above, IAM permissions must include those required for multipart uploads. Defaults to `4`; highest allowed value is `10`.
Disk space protection: Specifies whether to `Block` (default) or `Drop` incoming events when the disk space falls below the globally defined Min free disk space amount.
Add Output ID: When toggled on (default), adds the Output ID field’s value to the staging location’s file path. This ensures that each Destination’s logs will write to its own bucket.
For a Destination originally configured in a Cribl Edge version below 2.4.0, the Add Output ID behavior will be switched off on the backend, regardless of this toggle’s state. This is to avoid losing any files pending in the original staging directory, upon Cribl Edge upgrade and restart. To enable this option for such Destinations, Cribl’s recommended migration path is:
- Clone the Destination.
- Redirect the Routes referencing the original Destination to instead reference the new, cloned Destination.
This way, the original Destination will process pending files (after an idle timeout), and the new, cloned Destination will process newly arriving events with Add output ID enabled.
Remove empty staging directories: When toggled on (the default), Cribl Edge deletes empty staging directories after moving files. This prevents the proliferation of orphaned empty directories. When enabled, exposes this additional option:
- Staging cleanup period: How often (in seconds) to delete empty directories when Remove empty staging directories is enabled. Defaults to `300` seconds (every 5 minutes). Minimum configurable interval is `10` seconds; maximum is `86400` seconds (every 24 hours).
Enable dead-lettering: Toggle this on to set a maximum number of retries, and to move files to a designated directory when write failures exceed that limit. This prevents data flow blockage and excessive error logging due to undeliverable files. When enabled, exposes two additional fields:
- Dead-letter location: Specify the storage location for undeliverable files. Defaults to `$CRIBL_HOME/state/outputs/dead-letter`.
- Maximum retry limit: Configure the retry limit for failed file deliveries. This setting defines how many times the system will attempt to move a file to its intended location before it is deemed undeliverable and placed in the dead-letter directory. Defaults to `20`.
Endpoint: S3 service endpoint. If empty, Cribl Edge will automatically construct the endpoint from the AWS Region. To access the AWS S3 endpoints, use the path-style URL format. You don’t need to specify the bucket name in the URL, because Cribl Edge will automatically add it to the URL path. For details, see AWS’ Path‑Style Requests topic.
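For illustration, Endpoint values typically look like the following; the MinIO hostname here is a placeholder:

```javascript
// Hypothetical Endpoint values (path-style; Cribl Edge appends the bucket name to the URL path):
'https://s3.us-west-2.amazonaws.com'   // an AWS regional S3 endpoint
'https://minio.example.org:9000'       // a self-hosted S3-compatible store
```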
Object ACL: Object ACL (Access Control List) to assign to uploaded objects.
Storage class: Select a storage class for uploaded objects. Defaults to `Standard`. The other options are: `Reduced Redundancy Storage`; `Standard, Infrequent Access`; `One Zone, Infrequent Access`; `Intelligent Tiering`; `Glacier Flexible Retrieval`; `Glacier Instant Retrieval`; or `Glacier Deep Archive`.

Server-side encryption: Encryption type for uploaded objects – used to enable encryption on data at rest. Defaults to no encryption; the other options are `Amazon S3 Managed Key` or `AWS KMS Managed Key`.
AWS S3 always encrypts data in transit using HTTPS, with default one-way authentication from server to clients. With other S3-compatible stores (such as our native MinIO Destination), use an `https://` URL to invoke in-transit encryption. Two-way authentication is not required to get encryption; enabling it additionally requires clients to possess a certificate.
Signature version: Signature version to use for signing S3 requests. Defaults to `v4`.
Reuse connections: Whether to reuse connections between requests. Toggling on (default) can improve performance.
Reject unauthorized certificates: Whether to reject certificates that cannot be verified against a valid Certificate Authority (for example, self-signed certificates). Defaults to toggled on.
Verify if bucket exists: Toggle off if you can access files within the bucket, but not the bucket itself. This resolves errors of the form: `error initializing...Bucket does not exist`. For context, see Troubleshooting below.
Environment: If you’re using GitOps, optionally use this field to specify a single Git branch on which to enable this configuration. If empty, the config will be enabled everywhere.
IAM Permissions
The following permissions are needed to write to an Amazon S3 bucket:
- `s3:GetBucketLocation` – Needed if Region is not configured.
- `s3:ListBucket` – Needed unless Verify if bucket exists is toggled off.
- `s3:PutObject`
If your Destination needs to do multipart uploads to S3, two more permissions are needed:
- `kms:GenerateDataKey`
- `kms:Decrypt`
See the AWS documentation.
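As a rough sketch (not an exact policy supplied by Cribl), an IAM policy granting the S3 permissions listed above might look like the following, with `YOUR_BUCKET_NAME` as a placeholder. If you use multipart uploads against a KMS-encrypted bucket, you would also grant the two `kms:` actions above on the relevant KMS key:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CriblBucketAccess",
      "Effect": "Allow",
      "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
      "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME"
    },
    {
      "Sid": "CriblWriteObjects",
      "Effect": "Allow",
      "Action": ["s3:PutObject"],
      "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*"
    }
  ]
}
```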
Temporary Access via SSO Provider
You can use Okta or SAML to provide access to S3 buckets using temporary security credentials.
Internal Fields
Cribl Edge uses a set of internal fields to assist in forwarding data to a Destination.
Field for this Destination:
- `__partition`
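If you want to drive partitioning from a Pipeline rather than from this Destination’s Partitioning expression, you could set `__partition` upstream – for example, with an Eval Function. A minimal sketch, assuming events carry `host` and `sourcetype` fields:

```javascript
// Hypothetical value expression for an Eval Function that adds a `__partition` field.
// The Destination falls back to this field when its Partitioning expression is blank.
`${host}/${sourcetype}`
```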
Proxying Requests
If you need to proxy HTTP/S requests, see System Proxy Configuration.
Troubleshooting
The Destination’s configuration modal has helpful tabs for troubleshooting:
Live Data: Try capturing live data to see real-time events as they flow through the Destination. On the Live Data tab, click Start Capture to begin viewing real-time data.
Logs: Review and search the logs that provide detailed information about the delivery process, including any errors or warnings that may have occurred.
Test: Ensures that the Destination is correctly set up and reachable. Verify that sample events are sent correctly by clicking Run Test.
You can also view the Monitoring page that provides a comprehensive overview of data volume and rate, helping you identify delivery issues. Analyze the graphs showing events and bytes in/out over time.
Cribl University offers an Advanced Troubleshooting > Destination Integrations: S3 short course. To follow the direct course link, first log into your Cribl University account. (To create an account, select the Sign up link. You’ll need to click through a short Terms & Conditions presentation, with chill music, before proceeding to courses – but Cribl’s training is always free of charge.) Once logged in, check out other useful Advanced Troubleshooting short courses and Troubleshooting Criblets.
Common Issues
“Bucket does not exist - self signed certificate”
Example Error Text:
message: Bucket does not exist - self signed certificate
stack: Error: self signed certificate
at TLSSocket.onConnectSecure (node:_tls_wrap:1535:34)
at TLSSocket.emit (node:events:513:28)
at TLSSocket.emit (node:events:489:12)
at TLSSocket._fini...
tmpPath: /opt/cribl/s3_bucket/cribl/2023/03/28/CriblOut-XTNGh5.0.json.tmp
Can occur when:
- the Destination uses a self-signed cert that has not yet been trusted by Cribl Stream.
- the user does not have appropriate permissions to view the bucket.
- the authentication token generated by your computer has a timestamp that is out of sync with the server’s time, resulting in an authentication failure.
Recommendation
Disable Advanced Settings > Reject unauthorized certificates; or get the sender’s cert and add it to the `NODE_EXTRA_CA_CERTS` path for validation; or verify that the user has appropriate permissions.
Misconfigured or Nonexistent Buckets
If the S3 bucket that the Destination is trying to send files to does not exist or is misconfigured, the Destination logs will reflect the following pattern:
Upon initialization, the Destination will log an error that says the bucket cannot be found. This error will not be logged if Advanced Settings > Verify if bucket exists is turned off.
Every time the Destination “rolls” a new file out of staging to try to upload it to S3, the Destination will log an error, until the number of files rolled exceeds the value of Advanced Settings > Staging file limit.
Once the Destination has begun to apply backpressure:
- The Destination will log a backpressure warning.
- Events will stop flowing through the Destination.
- The Destination will stop creating staging files.
- Although the Destination will periodically attempt to upload the staged files, it will not log any additional failures.