Dell PowerScale OneFS Destination
Dell PowerScale OneFS provides scale-out file and object storage. Its S3-compatible API lets applications store and retrieve objects in buckets on the cluster. See Dell’s PowerScale OneFS S3 API Guide for protocol details. The Dell PowerScale OneFS Destination sends data to a OneFS bucket using that S3-compatible API.
Type: Non-Streaming | TLS Support: Yes | PQ Support: No
Configure Cribl Stream to Output to Dell PowerScale OneFS
In Cribl Stream, configure the Dell PowerScale OneFS Destination:
- On the top bar, select Products, and then select Cribl Stream. Under Worker Groups, select a Worker Group. Next, you have two options:
- To configure via QuickConnect, navigate to Routing > QuickConnect. Select Add Destination and select the Destination you want from the list, choosing either Select Existing or Add New.
- To configure via the Routes, select Data > Destinations. Select the Destination you want. Next, select Add Destination.
- In the New Destination modal, configure the following under General Settings:
- Output ID: Enter a unique name to identify this Dell PowerScale OneFS definition. If you clone this Destination, Cribl Stream will add -CLONE to the original Output ID.
- Description: Optionally, enter a description.
- Dell PowerScale OneFS endpoint: Base S3-compatible endpoint URL for the PowerScale cluster, without the bucket name. Cribl Stream builds the full request path (including bucket and object key) from this endpoint and your other settings. For example, https://powerscale.example.com:9021. Dell PowerScale deployments often expose S3 on non-standard ports such as 9021 or 8080, so be sure to include the correct port for your environment.
- Dell PowerScale OneFS bucket name: JavaScript expression that resolves to the name of the destination Dell PowerScale OneFS bucket. The expression must evaluate to a string and is evaluated only once, when the Destination initializes. For a fixed bucket name, enter a quoted string such as 'my-dell-bucket'. For a name derived from a global variable, use a JavaScript template literal such as `myBucket-${C.vars.myVar}`.
Event-level variables are not available for this field, because the bucket name is evaluated only at Destination initialization. If you want to use event-level variables in file paths, we recommend specifying them in the Partitioning expression field (described in the next step), because this is evaluated for each file. For general expression syntax and examples, see Cribl Expression Syntax.
- Staging location: Filesystem location in which to locally buffer files before compressing and moving to the final destination. We recommend that this location be stable and high-performance.
The Staging location field is not displayed or available on Cribl.Cloud-managed Worker Nodes.
- Data format: The output data format defaults to JSON. Raw and Parquet are also available. Selecting Parquet exposes a Parquet Settings left tab, where you must configure certain options in order to export data in Parquet format.
- Next, you can configure the following Optional Settings:
Key prefix: Root directory to prepend to path before uploading. Enter either a constant, or a JS expression (enclosed in single quotes, double quotes, or backticks) that will be evaluated only at init time.
Partitioning expression: JavaScript expression that defines how files are partitioned and organized. Default is date-based. If blank, Cribl Stream will fall back to the event’s __partition field value (if present); or otherwise to the root directory of the Output Location and Staging Location.
Compression: Data compression format used before moving to final destination. Defaults to gzip (recommended). This setting is not available when Data format is set to Parquet.
File name prefix expression: The output filename prefix. Must be a JavaScript expression (which can evaluate to a constant), enclosed in quotes or backticks. Defaults to CriblOut.
File name suffix expression: The output filename suffix. Must be a JavaScript expression (which can evaluate to a constant), enclosed in quotes or backticks. Defaults to `.${C.env["CRIBL_WORKER_ID"]}.${__format}${__compression === "gzip" ? ".gz" : ""}`, where __format can be json or raw, and __compression can be none or gzip.
To prevent files from being overwritten, Cribl appends a random sequence of 6 characters to the end of their names. This also applies to prefix and suffix expressions in file names.
For example, if you set the File name prefix expression to CriblExec and set the File name suffix expression to .csv, the file name might display as CriblExec-adPRWM.csv with adPRWM appended.
Backpressure behavior: Select whether to block or drop events when all receivers are exerting backpressure. (Causes might include an accumulation of too many files needing to be closed.) Defaults to Block. See more information about Destination backpressure behavior.
Tags: Optionally, add tags that you can use to filter and group Destinations on the Destinations page. These tags aren’t added to processed events. Use a tab or hard return between (arbitrary) tag names.
- Optionally, configure any Authentication, Processing, Parquet, and Advanced settings outlined in the sections below.
- Select Save, then Commit & Deploy.
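As a rough illustration of how the expression fields above evaluate, the following plain JavaScript sketch runs the example bucket name expression and the default File name suffix expression outside of Cribl Stream. The C object, its values, and the fileName helper are stand-ins invented for this example. In the product, C, __format, and __compression are supplied by Cribl Stream, and the random sequence is generated internally, so treat the alphabet and random source here as assumptions.

```javascript
// Hypothetical stand-in for Cribl's global C object; the values are made up.
const C = {
  vars: { myVar: "prod" },
  env: { CRIBL_WORKER_ID: "0" },
};

// Bucket name: a template literal, evaluated once at initialization.
const bucketName = `myBucket-${C.vars.myVar}`;

// Default File name suffix expression from the text, wrapped in a function
// so that __format and __compression can be varied.
function defaultSuffix(__format, __compression) {
  return `.${C.env["CRIBL_WORKER_ID"]}.${__format}${__compression === "gzip" ? ".gz" : ""}`;
}

// Illustrative assembly: prefix, a hyphen, 6 random characters, then suffix.
// The alphabet and random source are assumptions, not Cribl's generator.
function fileName(prefix, suffix, random = Math.random) {
  const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
  let seq = "";
  for (let i = 0; i < 6; i++) {
    seq += alphabet[Math.floor(random() * alphabet.length)];
  }
  return `${prefix}-${seq}${suffix}`;
}

console.log(bucketName);                     // myBucket-prod
console.log(defaultSuffix("json", "gzip"));  // .0.json.gz
console.log(defaultSuffix("raw", "none"));   // .0.raw
console.log(fileName("CriblExec", ".csv"));  // for example, CriblExec-adPRWM.csv
```

Because the bucket name is computed once, changing C.vars.myVar after initialization would not change the bucket in this model, which mirrors why event-level variables belong in the Partitioning expression instead.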
Authentication
This Destination supports static S3-compatible credentials only. Use the Authentication method drop-down to select Manual or Secret Key. Automatic IAM credentials and AssumeRole are not available for this Destination.
The Manual option exposes these corresponding additional fields:
- Access key: Enter the access key.
- Secret key: Enter the secret key.
The values for Access key and Secret key can be a constant or a JavaScript expression (such as `${C.env.MY_VAR}`).
Enclose values that contain special characters (for example, /) or environment variables in single quotes or backticks.
The Secret Key option allows you to supply a stored secret that references an access key and secret key. It exposes this additional field:
- Secret key pair: Use the drop-down to select an API key/secret key pair that you’ve configured in Cribl Stream’s secrets manager. A Create link is available to store a new, reusable secret.
Processing Settings
Post-Processing
Pipeline: Pipeline or Pack to process data before sending the data out using this output.
System fields: A list of fields to automatically add to events that use this output. By default, includes cribl_pipe (identifying the Cribl Stream Pipeline that processed the event). Supports c* wildcards. Other options include:
- cribl_host - Cribl Stream Node that processed the event.
- cribl_input - Cribl Stream Source that processed the event.
- cribl_output - Cribl Stream Destination that processed the event.
- cribl_pipe - Cribl Stream Pipeline that processed the event.
- cribl_wp - Cribl Stream Worker Process that processed the event.
- cribl_route - Cribl Stream Route (or QuickConnect) that processed the event.
- cribl_group - Cribl Stream Worker Group of the node that processed the event.
- cribl_mode - Cribl Stream mode of the node that processed the event.
Parquet Settings
To write out Parquet files, note that:
- You can use the Cribl Stream CLI’s parquet command to view a Parquet file, its metadata, or its schema.
- See Parquet Schemas for pointers on how to avoid problems such as data mismatches.
- You can use the S3 Collector to ingest Parquet data from a Dell PowerScale OneFS bucket. This ensures that any data you export in Parquet format can be replayed and re-processed by Cribl at a later time.
Automatic schema: Toggle on to automatically generate a Parquet schema based on the events of each Parquet file that Cribl Stream writes. When toggled off (the default), exposes the following additional fields:
- Parquet schema: Select a schema from the drop-down.
If you need to modify a schema or add a new one, follow the instructions in our Parquet Schemas topic. These steps will propagate the freshest schema back to this drop-down.
- Parquet version: Determines which data types are supported, and how they are represented. Defaults to 2.6; 2.4 and 1.0 are also available.
- Data page version: Serialization format for data pages. Defaults to V2. If your toolchain includes a Parquet reader that does not support V2, use V1.
- Group row limit: The number of rows that every group will contain. The final group can contain a smaller number of rows. Defaults to 10000.
- Page size: Set the target memory size for page segments. Generally, set lower values to improve reading speed, or set higher values to improve compression. Value must be a positive integer smaller than the Row group size value, with appropriate units. Defaults to 1 MB.
- Log invalid rows: Toggle on to output up to 20 unique rows that were skipped due to data format mismatch. Log level must be set to debug for output to be visible.
- Write statistics: Leave toggled on (the default) if you have Parquet tools configured to view statistics. These profile an entire file in terms of minimum/maximum values within data, numbers of nulls, etc.
- Write page indexes: Leave toggled on (the default) if your Parquet reader uses statistics from Page Indexes to enable page skipping. One Page Index contains statistics for one data page.
- Write page checksum: Toggle on if you have configured Parquet tools to verify data integrity using the checksums of Parquet pages.
- Metadata (optional): The metadata of files the Destination writes will include the properties you add here as key-value pairs. For example, one way to tag events as belonging to the OCSF category for security findings would be to set Key to OCSF Event Class and Value to 2001.
Advanced Settings
File size limit (MB): Maximum uncompressed output file size. Files of this size will be closed and moved to final output location. Defaults to 32.
Open file limit: Maximum number of files to keep open concurrently. When this limit is exceeded, on any individual Worker Process, Cribl Stream will close the oldest open files, and move them to the final output location. Defaults to 100.
File open time limit (sec): Maximum amount of time to write to a file. Files open for longer than this limit will be closed and moved to final output location. Defaults to 300.
Compression level: Compression level to apply before moving files to the final destination. Defaults to Best Speed.
Staging file limit: Maximum number of files that the Destination will allow to wait for upload before it applies backpressure. Defaults to 100; minimum is 10; maximum is 4200.
Idle time limit (sec): Maximum amount of time to keep inactive files open. Files open for longer than this limit will be closed and moved to final output location. Defaults to 30.
Cribl Stream will close files when any of the four limits above (File size limit, Open file limit, File open time limit, or Idle time limit) is reached.
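The interplay of the File size limit, Open file limit, File open time limit, and Idle time limit can be sketched as follows. This is a simplified model rather than Cribl Stream's implementation: the limits object, the file record shape, and the filesToClose function are hypothetical names, and only the default values come from the settings described above.

```javascript
// Limits mirror the documented defaults; the property names are hypothetical.
const limits = {
  fileSizeLimitMB: 32,
  openFileLimit: 100,
  fileOpenTimeLimitSec: 300,
  idleTimeLimitSec: 30,
};

// Each open file is modeled as { name, sizeMB, openedAt, lastWriteAt },
// with times in seconds. Returns the names of files that should be closed.
function filesToClose(openFiles, nowSec, lim = limits) {
  const close = new Set();
  for (const f of openFiles) {
    if (f.sizeMB >= lim.fileSizeLimitMB) close.add(f.name);                // reached size limit
    if (nowSec - f.openedAt > lim.fileOpenTimeLimitSec) close.add(f.name); // open too long
    if (nowSec - f.lastWriteAt > lim.idleTimeLimitSec) close.add(f.name);  // idle too long
  }
  // Open file limit: close the oldest remaining files until within the limit.
  const remaining = openFiles
    .filter((f) => !close.has(f.name))
    .sort((a, b) => a.openedAt - b.openedAt);
  const excess = remaining.length - lim.openFileLimit;
  for (let i = 0; i < excess; i++) close.add(remaining[i].name);
  return [...close];
}

const open = [
  { name: "a", sizeMB: 32, openedAt: 990, lastWriteAt: 999 }, // at the size limit
  { name: "b", sizeMB: 1, openedAt: 600, lastWriteAt: 995 },  // open for 400 s
  { name: "c", sizeMB: 1, openedAt: 950, lastWriteAt: 960 },  // idle for 40 s
  { name: "d", sizeMB: 1, openedAt: 990, lastWriteAt: 999 },  // stays open
];
console.log(filesToClose(open, 1000).join(", ")); // a, b, c
```

In this model the three per-file checks are independent, while the Open file limit applies across all files on a Worker Process and evicts the oldest first.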
Disk space protection: Specifies whether to Block (default) or Drop incoming events when the disk space falls below the globally defined Min free disk space amount.
Concurrent file parts limit: Maximum number of parts to upload in parallel per file. A value of 1 tells the Destination to send the whole file at once. When set to 2 or above, the Destination will use multipart uploads. Defaults to 4; highest allowed value is 10.
Add Output ID: When toggled on (default), adds the Output ID field’s value to the staging location’s file path. This ensures that each Destination’s logs will write to its own bucket.
For a Destination originally configured in a Cribl Stream version below 2.4.0, the Add Output ID behavior will be switched off on the backend, regardless of this toggle’s state. This is to avoid losing any files pending in the original staging directory, upon Cribl Stream upgrade and restart. To enable this option for such Destinations, Cribl’s recommended migration path is:
- Clone the Destination.
- Redirect the Routes referencing the original Destination to instead reference the new, cloned Destination.
This way, the original Destination will process pending files (after an idle timeout), and the new, cloned Destination will process newly arriving events with Add Output ID enabled.
Remove empty staging directories: When toggled on (the default), Cribl Stream deletes empty staging directories after moving files. This prevents the proliferation of orphaned empty directories. When enabled, exposes these additional options:
- Staging cleanup period: How often (in seconds) to delete empty directories when Remove empty staging directories is enabled. Defaults to 300 seconds (every 5 minutes). Minimum configurable interval is 10 seconds; maximum is 86400 seconds (every 24 hours).
- Directory batch size: Specifies how many directories are scanned per batch during automatic cleanup of empty staging directories. Set between 10 and 10000; defaults to 1000. Higher values speed up cleanup but increase memory usage.
Force close on shutdown: Toggle on to force all staged files to close during an orderly shutdown. This triggers immediate upload or finalization of any in-progress files, regardless of Idle time, File open time, or File size limits. Use this option to minimize data loss risk in ephemeral or auto-scaling environments. This could result in a higher number of small files, as files are closed before reaching their configured thresholds. Ensure your Worker Group Drain timeout is set high enough to allow all staged data to be finalized before shutdown completes.
Enable dead-lettering: Toggle on to set a maximum number of retries, and to move files to a designated directory when write failures exceed that limit. This prevents data flow blockage and excessive error logging due to undeliverable files. When enabled, exposes two additional fields:
- Dead-letter location: Specify the storage location for undeliverable files. Defaults to $CRIBL_HOME/state/outputs/dead-letter.
- Maximum retry limit: Configure the retry limit for failed file deliveries. This setting defines how many times the system will attempt to move a file to its intended location before it is deemed undeliverable and placed in the dead-letter directory. Defaults to 20.
Enable retry backoff: Toggle on to apply exponential backoff with jitter when file uploads fail repeatedly.
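To make the combination of retry backoff and dead-lettering concrete, here is a minimal sketch of exponential backoff with "full" jitter, paired with a retry limit. The base delay (1000 ms), the cap (60000 ms), and all function names are assumptions chosen for illustration; the settings above do not state the delays Cribl Stream actually uses. The sketch also assumes a file is dead-lettered once its failure count exceeds the Maximum retry limit.

```javascript
// Upper bound of the delay before a retry: doubles each attempt, then caps.
// baseMs and capMs are assumed values, not documented Cribl Stream defaults.
function delayCeilingMs(attempt, baseMs = 1000, capMs = 60000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Full jitter: pick a uniformly random delay in [0, ceiling).
function backoffDelayMs(attempt, random = Math.random) {
  return Math.floor(random() * delayCeilingMs(attempt));
}

// Decide what happens after the Nth consecutive failure for one file.
function afterFailure(failures, maxRetries = 20, random = Math.random) {
  return failures > maxRetries
    ? { action: "dead-letter" }
    : { action: "retry", delayMs: backoffDelayMs(failures - 1, random) };
}

console.log([0, 1, 2, 3, 4, 5, 6].map((n) => delayCeilingMs(n)).join(", "));
// 1000, 2000, 4000, 8000, 16000, 32000, 60000
console.log(afterFailure(21).action); // dead-letter
```

Jitter spreads retries from many Worker Processes over time, so that they do not all hit the cluster at the same instant after an outage.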
Region: Region where the Dell PowerScale OneFS bucket is located. This field is typically left blank for on-premises OneFS deployments, and only needs a value if your OneFS administrator has configured a region in the S3 service settings.
Object ACL: Object ACL (Access Control List) to assign to uploaded objects. Supported values depend on your OneFS S3 service configuration. See the PowerScale OneFS S3 API Guide for the ACL values your version supports.
Signature version: Signature version to use for signing Dell PowerScale OneFS requests. Use v4 for OneFS 9.0 and later. If you are running an older OneFS version that does not support Signature Version 4, use v2.
Reuse connections: Whether to reuse connections between requests. Toggling on (default) can improve performance.
Reject unauthorized certificates: Whether to reject certificates that cannot be verified against a valid Certificate Authority (for example, self-signed certificates). On-premises OneFS clusters commonly use self-signed certificates. Defaults to toggled on.
Verify if bucket exists: Toggle off if your access key has permission to write objects but not to list the bucket (it has s3:PutObject but lacks s3:ListBucket).
Environment: If you’re using GitOps, optionally use this field to specify a single Git branch on which to enable this configuration. If empty, the config will be enabled everywhere.
S3 API Permissions
Dell PowerScale OneFS enforces S3-compatible access for object operations. To allow Cribl Stream to write data, the access key you use must have permissions (configured in OneFS for that S3 user or role) that include the following actions where applicable:
- s3:GetBucketLocation - Needed if Region is not configured.
- s3:ListBucket - Needed unless Verify if bucket exists is toggled off.
- s3:PutObject
- s3:AbortMultipartUpload and s3:ListMultipartUploadParts - Needed when Concurrent file parts limit is set to 2 or above (the default is 4). Availability depends on your OneFS version and administrator-defined policies.
See PowerScale OneFS S3 API Guide for additional information on configuring S3 credentials and access permissions.
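The conditions above can be summarized in a small helper that lists the actions a given configuration would need. The function and its property names (region, verifyBucketExists, concurrentFileParts) are hypothetical and only mirror the UI fields. The sketch also assumes that Verify if bucket exists is on unless you toggle it off.

```javascript
// Returns the S3 actions needed for a given (hypothetical) settings object,
// following the conditions listed in this section.
function requiredActions({ region = "", verifyBucketExists = true, concurrentFileParts = 4 } = {}) {
  const actions = ["s3:PutObject"];                       // always needed to write objects
  if (!region) actions.push("s3:GetBucketLocation");      // Region not configured
  if (verifyBucketExists) actions.push("s3:ListBucket");  // bucket existence check
  if (concurrentFileParts >= 2) {                         // multipart uploads in use
    actions.push("s3:AbortMultipartUpload", "s3:ListMultipartUploadParts");
  }
  return actions;
}

console.log(requiredActions().join(", "));
// s3:PutObject, s3:GetBucketLocation, s3:ListBucket, s3:AbortMultipartUpload, s3:ListMultipartUploadParts
console.log(
  requiredActions({ region: "us-east-1", verifyBucketExists: false, concurrentFileParts: 1 }).join(", ")
);
// s3:PutObject
```

A helper like this can be handy when asking a OneFS administrator for the narrowest set of permissions that still lets the Destination work.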
Internal Fields
Cribl Stream uses a set of internal fields to assist in forwarding data to a Destination.
Field for this Destination:
__partition
Proxying Requests
If you need to proxy HTTP/S requests, see System Proxy Configuration.
Troubleshooting
The Destination’s configuration modal has helpful tabs for troubleshooting:
Live Data: Try capturing live data to see real-time events as they flow through the Destination. On the Live Data tab, click Start Capture to begin viewing real-time data.
Logs: Review and search the logs that provide detailed information about the delivery process, including any errors or warnings that may have occurred.
Test: Ensures that the Destination is correctly set up and reachable. Verify that sample events are sent correctly by clicking Run Test.
You can also view the Monitoring page that provides a comprehensive overview of data volume and rate, helping you identify delivery issues. Analyze the graphs showing events and bytes in/out over time.
Common Issues
Bucket does not exist - self-signed certificate
Example Error Text:
message: Bucket does not exist - self signed certificate
stack: Error: self signed certificate
at TLSSocket.onConnectSecure (node:_tls_wrap:1535:34)
at TLSSocket.emit (node:events:513:28)
at TLSSocket.emit (node:events:489:12)
at TLSSocket._fini...
tmpPath: /opt/cribl/powerscale_bucket/cribl/2023/03/28/CriblOut-XTNGh5.0.json.tmp
Can occur when:
- The Destination uses a self-signed cert that has not yet been trusted by Cribl Stream.
- The user does not have appropriate permissions to view the bucket.
- The authentication token generated by your computer has a timestamp that is out of sync with the server’s time, resulting in an authentication failure.
Recommendation
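Toggle off Advanced Settings > Reject unauthorized certificates, or obtain the OneFS cluster’s certificate and add it to the file referenced by the NODE_EXTRA_CA_CERTS environment variable so that Cribl Stream can validate it. Otherwise, verify that the user has appropriate permissions on the bucket, and that the Worker Node’s clock is in sync with the cluster’s.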
Misconfigured or Nonexistent Buckets
If the bucket that the Destination is trying to send files to does not exist or is misconfigured, the Destination logs will reflect the following pattern:
Upon initialization, the Destination will log an error that says the bucket cannot be found. This error will not be logged if Advanced Settings > Verify if bucket exists is toggled off.
Every time the Destination “rolls” a new file out of staging to try to upload it to the destination, the Destination will log an error, until the number of files rolled exceeds the value of Advanced Settings > Staging file limit.
Once the Destination has begun to apply backpressure:
- The Destination will log a backpressure warning.
- Events will stop flowing through the Destination.
- The Destination will stop creating staging files.
- Although the Destination will periodically attempt to upload the staged files, it will not log any additional failures.
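The logging pattern described above can be modeled with a toy simulation in which every upload fails, as it would with a missing bucket. The function name and the log strings are invented for this sketch and are not actual Cribl Stream log messages. The default of 100 matches the Staging file limit default.

```javascript
// Toy model: each rolled file fails to upload and stays in staging.
// Once the staged count exceeds the limit, backpressure starts and
// no further files are rolled, so no further errors are logged.
function simulateFailedRolls(rolls, stagingFileLimit = 100) {
  const log = [];
  let staged = 0;
  for (let i = 1; i <= rolls; i++) {
    staged += 1;
    log.push(`error: upload failed for rolled file ${i}`);
    if (staged > stagingFileLimit) {
      log.push("warn: backpressure applied; no new staging files");
      break; // events stop flowing through the Destination
    }
  }
  return log;
}

console.log(simulateFailedRolls(10, 3).join("\n"));
// error: upload failed for rolled file 1
// error: upload failed for rolled file 2
// error: upload failed for rolled file 3
// error: upload failed for rolled file 4
// warn: backpressure applied; no new staging files
```

If the logs show a burst of upload errors followed by a single backpressure warning and then silence, this model suggests checking the bucket name and credentials before anything else.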