Scality Destination

Scality is an on-premises S3-compatible object storage platform. Its RING and ARTESCA products expose an S3-compatible API that lets applications store and retrieve objects in buckets. See the Scality CloudServer documentation for protocol details. The Scality Destination sends data to a Scality bucket using that S3-compatible API.

Type: Non-Streaming | TLS Support: Yes | PQ Support: No

Configure Cribl Stream to Output to Scality

In Cribl Stream, configure the Scality Destination:

  1. On the top bar, select Products, and then select Cribl Stream. Under Worker Groups, select a Worker Group. Next, you have two options:
    • To configure via QuickConnect, navigate to Routing > QuickConnect. Select Add Destination and select the Destination you want from the list, choosing either Select Existing or Add New.
    • To configure via Routes, select Data > Destinations. Select the Destination you want. Next, select Add Destination.
  2. In the New Destination modal, configure the following under General Settings:
    • Output ID: Enter a unique name to identify this Scality definition. If you clone this Destination, Cribl Stream will add -CLONE to the original Output ID.
    • Description: Optionally, enter a description.
    • Scality endpoint: URL of your Scality RING or ARTESCA S3-compatible endpoint (for example, https://s3.scality.example.com). Both HTTP and HTTPS are accepted.
    • Scality bucket name: JavaScript expression that resolves to the Destination Scality bucket name. The expression must evaluate to a string and is evaluated only once, when the Destination initializes. For a fixed bucket name, enter a quoted string such as 'my-scality-bucket'. For a name derived from a global variable, use a backtick-enclosed JavaScript template literal such as `myBucket-${C.vars.myVar}`.

      Event-level variables are not available for this field, because the bucket name is evaluated only at Destination initialization. If you want to use event-level variables in file paths, we recommend specifying them in the Partitioning expression field in Optional Settings, because the Partitioning expression is evaluated for each file. For general expression syntax and examples, see Cribl Expression Syntax.

    • Staging location: Filesystem location in which to locally buffer files before compressing and moving to final destination. We recommend that this location be stable and high-performance.

      The Staging location field is not displayed or available on Cribl-managed Worker Groups in Cribl.Cloud.

    • Data format: The output data format defaults to JSON. Raw and Parquet are also available. Selecting Parquet exposes a Parquet Settings left tab, where you must configure certain options in order to export data in Parquet format.
  3. Next, you can configure the following Optional Settings:
    • Key prefix: Root directory to prepend to path before uploading. Enter either a constant, or a JS expression (enclosed in single quotes, double quotes, or backticks) that will be evaluated only at init time.

    • Partitioning expression: JavaScript expression that defines how files are partitioned and organized. Default is date-based. If blank, Cribl Stream will fall back to the event’s __partition field value (if present); or otherwise to the root directory of the Output Location and Staging Location.

    • Compression: Data compression format used before moving to the final destination. Defaults to gzip (recommended). This setting is not available when Data format is set to Parquet.

    • File name prefix expression: The output filename prefix. Must be a JavaScript expression (which can evaluate to a constant), enclosed in single quotes, double quotes, or backticks. Defaults to CriblOut.

    • File name suffix expression: The output filename suffix. Must be a JavaScript expression (which can evaluate to a constant), enclosed in single quotes, double quotes, or backticks. Defaults to `.${C.env["CRIBL_WORKER_ID"]}.${__format}${__compression === "gzip" ? ".gz" : ""}`, where __format can be json or raw, and __compression can be none or gzip.

      To prevent files from being overwritten, Cribl Stream adds a random sequence of 6 characters to each file name, between the prefix and the suffix. This happens even when you set custom prefix and suffix expressions.

      For example, if you set the File name prefix expression to CriblExec and set the File name suffix expression to .csv, the file name might display as CriblExec-adPRWM.csv, where adPRWM is the random sequence.

    • Backpressure behavior: Select whether to block or drop events when all receivers are exerting backpressure. (Causes might include an accumulation of too many files needing to be closed.) Defaults to Block. See more information about Destination backpressure behavior.

    • Tags: Optionally, add tags that you can use to filter and group Destinations on the Destinations page. These tags aren’t added to processed events. Use a tab or hard return between (arbitrary) tag names.

  4. Optionally, configure any Authentication, Processing, Parquet, and Advanced settings outlined in the sections below.
  5. Select Save, then Commit & Deploy.
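
The File name prefix expression and File name suffix expression settings in step 3 can be sketched in plain JavaScript. This is only an illustration, not Cribl Stream's implementation: the C object, the worker ID value "0", and the defaultSuffix, randomToken, and buildFileName helpers are stand-ins invented for this sketch. The prefix-token-suffix layout is an assumption inferred from the CriblExec-adPRWM.csv example above.

```javascript
// Hypothetical stand-in for Cribl's global C object; only C.env is modeled here.
const C = { env: { CRIBL_WORKER_ID: "0" } };

// Evaluates the default File name suffix expression documented above.
function defaultSuffix(__format, __compression) {
  return `.${C.env["CRIBL_WORKER_ID"]}.${__format}${__compression === "gzip" ? ".gz" : ""}`;
}

// Hypothetical helper: returns `length` random alphanumeric characters.
function randomToken(length = 6) {
  const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
  let token = "";
  for (let i = 0; i < length; i++) {
    token += alphabet[Math.floor(Math.random() * alphabet.length)];
  }
  return token;
}

// Assumed layout: prefix, a hyphen, the random token, then the suffix.
function buildFileName(prefix, suffix, token = randomToken()) {
  return `${prefix}-${token}${suffix}`;
}

console.log(defaultSuffix("json", "gzip"));                // .0.json.gz
console.log(buildFileName("CriblExec", ".csv", "adPRWM")); // CriblExec-adPRWM.csv
```

Passing the token explicitly, as in the last line, makes the result reproducible; omitting it yields a different name on every call.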

Authentication

This Destination uses access key and secret key credentials only. Use the Authentication method drop-down to select Manual or Secret.

The Manual option exposes these corresponding additional fields:

  • Access key: Enter the access key.
  • Secret key: Enter the secret key.

The values for Access key and Secret key can be a constant or a JavaScript expression (such as `${C.env.MY_VAR}`).

Enclose values that contain special characters (for example, /) or environment variables in single quotes or backticks.

The Secret option allows you to supply a stored secret that references an access key and secret key. It exposes this additional field:

  • Secret key pair: Use the drop-down to select an API key/secret key pair that you’ve configured in Cribl Stream’s secrets manager. A Create link is available to store a new, reusable secret.

Processing Settings

Post-Processing

Pipeline: Pipeline or Pack to process data before sending the data out using this output.

System fields: A list of fields to automatically add to events that use this output. By default, includes cribl_pipe (identifying the Cribl Stream Pipeline that processed the event). Supports c* wildcards. Other options include:

  • cribl_host - Cribl Stream Node that processed the event.
  • cribl_input - Cribl Stream Source that processed the event.
  • cribl_output - Cribl Stream Destination that processed the event.
  • cribl_pipe - Cribl Stream Pipeline that processed the event.
  • cribl_wp - Cribl Stream Worker Process that processed the event.
  • cribl_route - Cribl Stream Route (or QuickConnect) that processed the event.
  • cribl_group - Cribl Stream Worker Group of the node that processed the event.
  • cribl_mode - Cribl Stream mode of the node that processed the event.

Parquet Settings

To write out Parquet files, note that:

  • You can use the Cribl Stream CLI’s parquet command to view a Parquet file, its metadata, or its schema.
  • See Parquet Schemas for pointers on how to avoid problems such as data mismatches.
  • You can use the S3 Collector to ingest Parquet data from a Scality bucket. This ensures that any data you export in Parquet format can be replayed and re-processed by Cribl at a later time.

Automatic schema: Toggle on to automatically generate a Parquet schema based on the events of each Parquet file that Cribl Stream writes. When toggled off (the default), exposes the following additional fields:

  • Parquet schema: Select a schema from the drop-down.

If you need to modify a schema or add a new one, follow the instructions in our Parquet Schemas topic. These steps will propagate the freshest schema back to this drop-down.

  • Parquet version: Determines which data types are supported, and how they are represented. Defaults to 2.6; 2.4 and 1.0 are also available.
  • Data page version: Serialization format for data pages. Defaults to V2. If your toolchain includes a Parquet reader that does not support V2, use V1.
  • Group row limit: The number of rows that every group will contain. The final group can contain a smaller number of rows. Defaults to 10000.
  • Page size: Set the target memory size for page segments. Generally, set lower values to improve reading speed, or set higher values to improve compression. Value must be a positive integer smaller than the Row group size value, with appropriate units. Defaults to 1 MB.
  • Log invalid rows: Toggle on to output up to 20 unique rows that were skipped due to data format mismatch. Log level must be set to debug for output to be visible.
  • Write statistics: Leave toggled on (the default) if you have Parquet tools configured to view statistics - these profile an entire file in terms of minimum/maximum values within data, numbers of nulls, etc.
  • Write page indexes: Leave toggled on (the default) if your Parquet reader uses statistics from Page Indexes to enable page skipping. One Page Index contains statistics for one data page.
  • Write page checksum: Toggle on if you have configured Parquet tools to verify data integrity using the checksums of Parquet pages.
  • Metadata (optional): The metadata of files the Destination writes will include the properties you add here as key-value pairs. For example, one way to tag events as belonging to the OCSF category for security findings would be to set Key to OCSF Event Class and Value to 2001.

Advanced Settings

File size limit (MB): Maximum uncompressed output file size. Files of this size will be closed and moved to final output location. Defaults to 32.

Open file limit: Maximum number of files to keep open concurrently. When this limit is exceeded, on any individual Worker Process, Cribl Stream will close the oldest open files, and move them to the final output location. Defaults to 100.

File open time limit (sec): Maximum amount of time to write to a file. Files open for longer than this limit will be closed and moved to final output location. Defaults to 300.

Compression level: Compression level to apply before moving files to the final destination. Defaults to Best Speed. This setting is not available when Data format is set to Parquet.

Staging file limit: Maximum number of files that the Destination will allow to wait for upload before it applies backpressure. Defaults to 100; minimum is 10; maximum is 4200.

Idle time limit (sec): Maximum amount of time to keep inactive files open. Files open for longer than this limit will be closed and moved to final output location. Defaults to 30.

Cribl Stream will close a file as soon as any one of these limits is reached.
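
The interaction of the size, open time, and idle time limits can be sketched as follows. This is a minimal model, not Cribl Stream's code: the file record shape, the limitsReached function, the use of 1024 * 1024 bytes per MB, and the choice of >= comparisons are all assumptions made for illustration. The Open file limit, which closes the oldest open files per Worker Process, is not modeled.

```javascript
// Defaults taken from the Advanced Settings described above.
const DEFAULT_LIMITS = {
  fileSizeLimitMB: 32,       // File size limit (MB)
  fileOpenTimeLimitSec: 300, // File open time limit (sec)
  idleTimeLimitSec: 30,      // Idle time limit (sec)
};

// Hypothetical file record: { bytesWritten, openedAtMs, lastWriteAtMs }.
// Returns the names of all limits the file has reached (empty array = keep open).
function limitsReached(file, nowMs, limits = DEFAULT_LIMITS) {
  const reasons = [];
  if (file.bytesWritten >= limits.fileSizeLimitMB * 1024 * 1024) reasons.push("size");
  if (nowMs - file.openedAtMs >= limits.fileOpenTimeLimitSec * 1000) reasons.push("open time");
  if (nowMs - file.lastWriteAtMs >= limits.idleTimeLimitSec * 1000) reasons.push("idle time");
  return reasons;
}

// A 5 MB file opened at t=0 whose last write happened at t=20 s.
const file = { bytesWritten: 5 * 1024 * 1024, openedAtMs: 0, lastWriteAtMs: 20000 };
console.log(limitsReached(file, 40000)); // [] (open 40 s, idle 20 s)
console.log(limitsReached(file, 60000)); // [ 'idle time' ] (idle 40 s)
```

In this model a file is closed when the returned array is non-empty, which mirrors the "any limit" rule above.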

Disk space protection: Specifies whether to Block (default) or Drop incoming events when the disk space falls below the globally defined Min free disk space amount.

Concurrent file parts upload limit: Maximum number of parts to upload in parallel per file. A value of 1 tells the Destination to send the whole file at once. When set to 2 or above, the Destination will use multipart uploads. Defaults to 4; highest allowed value is 10.

Add output ID: When toggled on (default), adds the Output ID field’s value to the staging location’s file path. This ensures that each Destination stages its files in its own directory.

Remove empty staging directories: When toggled on (the default), Cribl Stream deletes empty staging directories after moving files. This prevents the proliferation of orphaned empty directories. When enabled, exposes these additional options:

  • Staging cleanup period: How often (in seconds) to delete empty directories when Remove empty staging directories is enabled. Defaults to 300 seconds (every 5 minutes). Minimum configurable interval is 10 seconds; maximum is 86400 seconds (every 24 hours).
  • Directory batch size: Specifies how many directories are scanned per batch during automatic cleanup of empty staging directories. Accepts values between 10 and 10000. Defaults to 1000. Higher values speed up cleanup but increase memory usage.

Force close on shutdown: Toggle on to force all staged files to close during an orderly shutdown. This triggers immediate upload or finalization of any in-progress files, regardless of Idle time, File open time, or File size limits. Use this option to minimize data loss risk in ephemeral or auto-scaling environments. This could result in a higher number of small files, as files are closed before reaching their configured thresholds. Ensure your Worker Group Drain timeout is set high enough to allow all staged data to be finalized before shutdown completes.

Enable dead-lettering: Toggle on to set a maximum number of retries, and to move files to a designated directory when write failures exceed that limit. This prevents data flow blockage and excessive error logging due to undeliverable files. When enabled, exposes two additional fields:

  • Dead-letter location: Specify the storage location for undeliverable files. Defaults to $CRIBL_HOME/state/outputs/dead-letter.
  • Retry limit: Configure the retry limit for failed file deliveries. This setting defines how many times the system will attempt to move a file to its intended location before it is deemed undeliverable and placed in the dead-letter directory. Defaults to 20.

Enable retry backoff: Toggle on to apply exponential backoff with jitter when file uploads fail repeatedly.
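
To make exponential backoff with jitter and the dead-letter Retry limit concrete, here is a hedged sketch. This page does not document the delays Cribl Stream actually uses, so BASE_DELAY_MS and MAX_DELAY_MS are hypothetical values, and the "full jitter" variant shown is one common approach chosen for illustration. Only the Retry limit default of 20 comes from the settings above. The function names are invented for this example.

```javascript
// Hypothetical parameters: this page does not document Cribl's actual delays.
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 60000;
const RETRY_LIMIT = 20; // Default Retry limit from the dead-lettering settings.

// "Full jitter" backoff: a random delay between 0 and the capped exponential ceiling.
// `random` is injectable so the result can be made deterministic.
function backoffDelayMs(attempt, random = Math.random) {
  const ceiling = Math.min(MAX_DELAY_MS, BASE_DELAY_MS * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Decides what to do after a failed upload.
// `attempt` counts the failures so far for this file, starting at 1.
function nextAction(attempt, deadLetteringEnabled = true) {
  if (deadLetteringEnabled && attempt > RETRY_LIMIT) {
    return { action: "dead-letter" };
  }
  return { action: "retry", delayMs: backoffDelayMs(attempt) };
}

console.log(backoffDelayMs(3, () => 0.5));  // 4000 (half of the 8000 ms ceiling)
console.log(backoffDelayMs(10, () => 0.5)); // 30000 (ceiling capped at 60000 ms)
console.log(nextAction(21).action);         // dead-letter
```

The jitter spreads retries from many Worker Processes over time, so they do not all hit the Scality endpoint at the same moment after an outage.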

Region: The S3 region name configured for your Scality RING or ARTESCA instance. Most on-premises Scality deployments do not require a region. Leave blank unless your administrator has configured one.

Signature version: Signature version to use for signing Scality requests. Defaults to v4. Only change this if your Scality endpoint explicitly requires a different version.

Reuse connections: When toggled on (default), the Destination keeps open HTTP connections alive and reuses them across requests, reducing connection overhead.

Reject unauthorized certificates: When toggled on (default), Cribl Stream rejects certificates that cannot be verified against a trusted Certificate Authority, enforcing strict TLS validation. Toggle off only when connecting to a Scality endpoint that uses a self-signed or internal CA certificate.

Verify if bucket exists: Toggle off if your access key has permission to write objects but not to list the bucket (it has s3:PutObject but lacks s3:ListBucket).

Environment: If you’re using GitOps, optionally use this field to specify a single Git branch on which to enable this configuration. If empty, the config will be enabled everywhere.

S3 API Permissions

Scality enforces S3-compatible access for object operations. To allow Cribl Stream to write data, the access key you use must have permissions that include the following actions where applicable:

  • s3:ListBucket - Needed unless Verify if bucket exists is toggled off.
  • s3:PutObject
  • s3:AbortMultipartUpload and s3:ListMultipartUploadParts - Needed when Concurrent file parts upload limit is set to 2 or above (the default is 4).

See the Scality documentation for additional information on configuring S3 credentials and access permissions.
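
The relationship between the Destination's settings and the actions listed above can be expressed as a small helper. This is a sketch for planning access policies only: requiredS3Actions and its option names are hypothetical and are not part of any Cribl or Scality API.

```javascript
// Returns the S3 actions listed above for a given Destination configuration.
// Defaults mirror this page: bucket verification on, 4 concurrent parts.
function requiredS3Actions({ verifyBucketExists = true, concurrentPartsLimit = 4 } = {}) {
  const actions = ["s3:PutObject"];
  if (verifyBucketExists) {
    actions.push("s3:ListBucket");
  }
  // A limit of 2 or above means the Destination uses multipart uploads.
  if (concurrentPartsLimit >= 2) {
    actions.push("s3:AbortMultipartUpload", "s3:ListMultipartUploadParts");
  }
  return actions;
}

// Write-only key, whole-file uploads: only s3:PutObject is needed.
console.log(
  requiredS3Actions({ verifyBucketExists: false, concurrentPartsLimit: 1 }).join(", ")
); // s3:PutObject
```

With the default settings, the helper returns all four actions.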

Internal Fields

Cribl Stream uses a set of internal fields to assist in forwarding data to a Destination.

Field for this Destination:

  • __partition

Proxying Requests

If you need to proxy HTTP/S requests, see System Proxy Configuration.

Troubleshooting

The Destination’s configuration modal has helpful tabs for troubleshooting:

Live Data: Try capturing live data to see real-time events as they flow through the Destination. On the Live Data tab, click Start Capture to begin viewing real-time data.

Logs: Review and search the logs that provide detailed information about the delivery process, including any errors or warnings that may have occurred.

Test: Ensures that the Destination is correctly set up and reachable. Verify that sample events are sent correctly by clicking Run Test.

You can also view the Monitoring page that provides a comprehensive overview of data volume and rate, helping you identify delivery issues. Analyze the graphs showing events and bytes in/out over time.

Common Issues

Bucket does not exist - self-signed certificate

Example Error Text:

message: Bucket does not exist - self signed certificate
stack: Error: self signed certificate
    at TLSSocket.onConnectSecure (node:_tls_wrap:1535:34)
    at TLSSocket.emit (node:events:513:28)
    at TLSSocket.emit (node:events:489:12)
    at TLSSocket._fini...
tmpPath: /opt/cribl/scality_bucket/cribl/2023/03/28/CriblOut-XTNGh5.0.json.tmp

Can occur when:

  • The Destination uses a self-signed cert that has not yet been trusted by Cribl Stream.
  • The user does not have appropriate permissions to view the bucket.
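  • The authentication token generated by your computer has a timestamp that is out of sync with the server’s time, resulting in an authentication failure.

Recommendation

Take the action that matches the cause:

  • For a self-signed certificate, either obtain the Scality endpoint’s certificate and add it to the file referenced by the NODE_EXTRA_CA_CERTS environment variable so that Cribl Stream can validate it, or toggle off Advanced Settings > Reject unauthorized certificates.
  • Verify that the user has appropriate permissions on the bucket.
  • Verify that the clock on the Cribl Stream host is synchronized with the Scality server’s clock.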

Misconfigured or Nonexistent Buckets

If the bucket that the Destination is trying to send files to does not exist or is misconfigured, the Destination logs will reflect the following pattern:

  1. Upon initialization, the Destination will log an error that says the bucket cannot be found. This error will not be logged if Advanced Settings > Verify if bucket exists is toggled off.

  2. Every time the Destination “rolls” a new file out of staging to try to upload it to the destination, the Destination will log an error, until the number of files rolled exceeds the value of Advanced Settings > Staging file limit.

Once the Destination has begun to apply backpressure:

  • The Destination will log a backpressure warning.
  • Events will stop flowing through the Destination.
  • The Destination will stop creating staging files.
  • Although the Destination will periodically attempt to upload the staged files, it will not log any additional failures.
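
The point at which backpressure starts can be modeled roughly as below. This is a simplified sketch under stated assumptions: handleEvent and pendingUploads are hypothetical names, the limit of 100 is the documented default for Staging file limit, and the block or drop outcome follows the Backpressure behavior setting.

```javascript
const STAGING_FILE_LIMIT = 100; // Default Staging file limit.

// Hypothetical model: `pendingUploads` is the number of rolled files
// still waiting in staging for a successful upload.
function handleEvent(pendingUploads, backpressureBehavior = "block") {
  // Backpressure applies only once the number of waiting files exceeds the limit.
  if (pendingUploads <= STAGING_FILE_LIMIT) return "accept";
  return backpressureBehavior === "drop" ? "drop" : "block";
}

console.log(handleEvent(100));         // accept
console.log(handleEvent(101));         // block
console.log(handleEvent(101, "drop")); // drop
```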