
OpenAI Compliance Logs Source

The OpenAI Compliance Logs Source in Cribl Stream collects enterprise observability and compliance data from OpenAI’s Compliance Logs API. It is intended for organizations using ChatGPT Enterprise that need to feed ChatGPT workspace activity into data loss prevention (DLP) or SIEM tools, as well as to process that data in Cribl Stream before delivery to downstream systems.

Type: Pull | TLS Support: Yes | Event Breaker Support: Yes
TLS is enabled via HTTPS on the underlying OpenAI Compliance Logs REST APIs.

Prerequisites

Before you configure this Source, you need:

A Cribl Stream deployment (Cloud or self-hosted).
ChatGPT Enterprise (or equivalent OpenAI workspace offering) with access to the Compliance Logs export APIs.
A compliance API token (or other credential) that OpenAI expects in request headers, with permission to list and download compliance log exports for your organization.
Network connectivity from Cribl Workers to OpenAI’s compliance API hosts over HTTPS (direct or via an HTTP/S proxy).

How the OpenAI Compliance Logs Source Works

When a collection job runs, the Source:

Calls the Compliance Logs export API with your configured authentication headers and selected event types.
Receives a list of files (or equivalent export descriptors) that represent batches of compliance data for the requested types and time window.
Downloads and ingests those files so each record becomes an event in Cribl Stream.
Tracks state so repeated polls avoid large gaps or unnecessary duplication, consistent with other REST Collectors.

You can then route, filter, enrich, and format events for your downstream Destinations, such as SIEM, observability, storage, or ticketing tools.

Configuring an OpenAI Compliance Logs Source

On the top bar, select Products, and then select Cribl Stream. Under Worker Groups, select a Worker Group. Next, you have two options:
- To configure via QuickConnect, navigate to Routing > QuickConnect. Select Add Source and select the Source you want from the list, choosing either Select Existing or Add New.
- To configure via the Routes, select Data > Sources. Select the Source you want. Next, select Add Source.
In the Source modal, configure the following under General Settings:
- Input ID: Enter a unique name for this Source. If you clone it, Cribl Stream appends -CLONE to the original Input ID.
- Description: Optionally, enter a description (for example, which event types or downstream SIEM this instance feeds).
- API key: Select or create a stored text secret.
- Account type: Choose whether this Source collects compliance logs for a single ChatGPT workspace or for your entire organization. The default is Workspace.
- Workspace ID: If using Workspace as the Account type, enter the ID of the ChatGPT workspace to collect logs from. This must be in UUID format.
- Organization ID: If using Organization as the Account type, enter the ID of the ChatGPT organization to collect logs from. Example format is org-XXXXXXXXXXXXXXXXXXXXXXXX.
- Event types: One or more compliance log categories to collect.
- Cron schedule: Collector jobs will run on this schedule in the Leader’s time zone.
  - Earliest time: Relative to the current time. Format: [+|-]<time_integer><time_unit>.
  - Latest time: Relative to the current time. Format: [+|-]<time_integer><time_unit>.
  - Job timeout: Maximum time the job is allowed to run (for example, 30, 45s, 15m). Enter 0 for unlimited time.
- Tags: Optionally, add UI tags to group this Source in Stream. Tags are not added to events.
Under Event types, use the multi-select dropdown to choose one or more event_type values to pass to the export API. Only selected types are requested in each collection run. See Event types for the supported values.
Configure Scheduling and State Tracking for the Collector job as needed. Defaults are set for typical compliance polling.
Optionally configure Processing Settings, Retries, and Advanced Settings.
Under Connected Destinations, choose Send to Routes and/or QuickConnect depending on how you want data to leave this Source.
Click Save, then Commit & Deploy.
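The ID formats described for Workspace ID and Organization ID can be checked before you paste values into the Source. The following is a minimal sketch, not part of Cribl Stream: the function name validateAccountId is hypothetical, the UUID pattern assumes the canonical 8-4-4-4-12 hexadecimal layout, and the organization pattern assumes only the org- prefix followed by alphanumeric characters, as inferred from the example format above.

```javascript
// Hypothetical pre-flight check for the ID fields; not part of Cribl Stream.
// Assumption: a Workspace ID is a canonical 8-4-4-4-12 hexadecimal UUID.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
// Assumption: an Organization ID is "org-" followed by alphanumeric characters,
// inferred only from the example format shown in the settings list.
const ORG_RE = /^org-[A-Za-z0-9]+$/;

function validateAccountId(accountType, id) {
  if (accountType === 'Workspace') return UUID_RE.test(id);
  if (accountType === 'Organization') return ORG_RE.test(id);
  throw new Error(`Unknown account type: ${accountType}`);
}

// Usage: Workspace is the default Account type.
const workspaceOk = validateAccountId('Workspace', '123e4567-e89b-12d3-a456-426614174000'); // true
const workspaceBad = validateAccountId('Workspace', 'my-workspace'); // false
```

A check like this only catches formatting mistakes. It cannot confirm that the ID exists or that the API key is authorized for it.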

Event Types

The Compliance Logs export API accepts an event_type parameter so you can scope each export to specific log categories. Select the types you need using the multi-select dropdown. The Source includes only the selected types in its requests.

Supported values include:

Value	Typical use
`APP_LOG`	Application-level activity in the ChatGPT workspace.
`APP_AUTH_LOG`	Application authentication events.
`AUDIT_LOG`	Administrative and audit trail events.
`AUTH_LOG`	User or session authentication events.
`CHATGPT_PLUGIN_SPREADSHEET`	Plugin activity in the ChatGPT workspace.
`CODEX_LOG`	Codex-related usage and events.
`CODEX_SECURITY_LOG`	Security-focused Codex events.
`CONVERSATION_MESSAGE`	Conversation message-level records (often the richest content for review workflows).

When Account type is set to Organization, only APP_LOG and APP_AUTH_LOG are available. The remaining types are available for Workspace accounts only.

Select the minimum set of types that satisfy your DLP or SIEM policy. Requesting every type increases volume and API work.
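The Account type restriction above can be expressed as a small lookup. This is an illustrative sketch only: the helper names are hypothetical, and it assumes that Workspace accounts can select all eight listed values, while Organization accounts are limited to APP_LOG and APP_AUTH_LOG.

```javascript
// Event type values copied from the table above.
const ALL_EVENT_TYPES = [
  'APP_LOG', 'APP_AUTH_LOG', 'AUDIT_LOG', 'AUTH_LOG',
  'CHATGPT_PLUGIN_SPREADSHEET', 'CODEX_LOG', 'CODEX_SECURITY_LOG',
  'CONVERSATION_MESSAGE',
];
const ORGANIZATION_EVENT_TYPES = ['APP_LOG', 'APP_AUTH_LOG'];

// Assumption: Workspace accounts may select any listed type.
function allowedEventTypes(accountType) {
  return accountType === 'Organization' ? ORGANIZATION_EVENT_TYPES : ALL_EVENT_TYPES;
}

// Split a planned selection into types the account can and cannot request.
function splitSelection(accountType, selected) {
  const allowed = new Set(allowedEventTypes(accountType));
  return {
    accepted: selected.filter((t) => allowed.has(t)),
    rejected: selected.filter((t) => !allowed.has(t)),
  };
}

const plan = splitSelection('Organization', ['APP_LOG', 'AUDIT_LOG']);
// plan.accepted is ['APP_LOG'] and plan.rejected is ['AUDIT_LOG']
```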

Processing Settings

Event Breakers: Select an Event Breaker ruleset to split incoming raw, unstructured data into distinct events before they enter Routes and Pipelines. The OpenAI Compliance Logs ruleset is selected by default, but you can add more rulesets.

Fields: Add fields to each event using the Eval Function. Values can be JavaScript expressions or constants. Fields defined here normally override same-named fields on the event unless you choose to let event fields win.

Pre-Processing: When Send to Routes is enabled, optionally select a Pipeline (or Pack) to run on collected events before they enter Routes.

Retries

Adjust how failed HTTP requests are retried.

Retry type: Backoff (default), Static, or Disabled.

Initial retry interval (ms): Delay before the first retry after a failure. Max 20,000 ms. 0 means retry immediately until Retry limit is reached.

Retry limit: Max retries per failed request (default 5, max 20). 0 disables retries.

Backoff multiplier: Base for exponential backoff (default 2).

Retry HTTP codes: Defaults typically include 429, 500, and 503. Non-2xx responses are errors; tune this list per OpenAI’s behavior.

Honor Retry-After header: When enabled (default), honor Retry-After up to the product maximum (longer delays may be ignored). Stream logs the delay when applicable.

Retry connection timeout / Retry connection reset: Optionally retry on ETIMEDOUT or ECONNRESET for more resilient long downloads.
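To see how Retry type, Initial retry interval, Backoff multiplier, and Retry limit interact, the sketch below computes the wait before each retry. It assumes the common exponential form (initial interval multiplied by the multiplier raised to the attempt number) and ignores Retry-After and any product-side cap. The function name and the 1,000 ms default interval are hypothetical, chosen only for illustration.

```javascript
// Illustrative delay schedule; the formula is an assumption, not Cribl's code.
function retryDelays({ retryType = 'Backoff', initialMs = 1000, multiplier = 2, limit = 5 } = {}) {
  // Disabled, or a Retry limit of 0, means no retries at all.
  if (retryType === 'Disabled' || limit === 0) return [];
  const delays = [];
  for (let attempt = 0; attempt < limit; attempt++) {
    // Static repeats the same interval; Backoff grows it exponentially.
    delays.push(retryType === 'Static' ? initialMs : initialMs * multiplier ** attempt);
  }
  return delays;
}

const backoff = retryDelays();
// backoff is [1000, 2000, 4000, 8000, 16000]
const fixed = retryDelays({ retryType: 'Static', limit: 3 });
// fixed is [1000, 1000, 1000]
```

Under these assumptions, five retries add up to 31 seconds of waiting per failed request, which is worth comparing against the Job timeout.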

Advanced Settings

Request timeout (seconds): Max time to wait for a request (default 300; 0 means wait indefinitely).

Time to live: How long Collector job artifacts remain on disk and in Job Inspector (default often 4h).

Page limit: Maximum number of pages the Collector will request in a single job. When the limit is reached, pagination stops even if the API has more data. Set to 0 for no explicit page cap.
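The Page limit behavior can be pictured as a bounded pagination loop. This is a simplified, synchronous sketch: fetchPage, its cursor argument, and the items and next fields are hypothetical stand-ins, not the shape of the OpenAI API or of Cribl Stream internals.

```javascript
// Request pages until the API has no more data or the page limit is reached.
function collectPages(fetchPage, pageLimit) {
  const pages = [];
  let cursor = null;
  // A pageLimit of 0 means no explicit page cap.
  while (pageLimit === 0 || pages.length < pageLimit) {
    const { items, next } = fetchPage(cursor);
    pages.push(items);
    if (next == null) break; // the API reports no further pages
    cursor = next;
  }
  return pages;
}

// Hypothetical API stub that serves four pages, numbered 0 through 3.
const fakeFetchPage = (cursor) => {
  const n = cursor ?? 0;
  return { items: [`record-${n}`], next: n < 3 ? n + 1 : null };
};

const capped = collectPages(fakeFetchPage, 2); // stops after 2 pages
const uncapped = collectPages(fakeFetchPage, 0); // reads all 4 pages
```

In the capped run, the remaining two pages are simply not requested in that job, so a low limit combined with a wide time window can leave data uncollected.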

State tracking: Customize how the Source tracks time or cursor state to avoid overlaps and gaps between jobs. See State Tracking.

Environment: For GitOps, optionally limit this config to a single Git branch.

Scheduling

The Collector’s Cron schedule controls when export/list/download jobs run.

Earliest time and Latest time: Relative time range for events to collect, using the same syntax as other REST Sources.

Job timeout: Maximum runtime for a job (30, 45s, 15m, and so on). Minimum granularity is often 10 seconds. Default is 300. A value of 0 means no timeout.

Log Level: Verbosity for task logs (use higher verbosity temporarily for troubleshooting).

Earliest time and Latest time accept relative times in the following format:

[+|-]<time_integer><time_unit>@<snap-to_time_unit>

Syntax reference:

Syntax element	Values supported
Offset	`-` past, `+` future, or omit with `now`.
`<time_integer>`	Integer, or omit with `now`.
`<time_unit>`	`now`, or `s`, `m`, `h`, `d`, `w`, `mon`, `q`, `y`.
`@<snap-to_time_unit>`	Optional snap-to unit (see below).

Rules:

Earliest must not be later than Latest.
Values without units are interpreted as seconds (for example, -1 = -1s).

Snap-to-time syntax

@ rounds down from the evaluated time. For example:

@d - start of the current day.
+128m@h - forward 128 minutes, then snap back to the hour boundary.

Week/month/quarter/year snaps (@w, @w1-@w6, @mon, @q, @y) behave like other Cribl Stream Collectors.
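The relative time and snap-to rules can be tried out with a small resolver. This is a deliberately simplified sketch, not Cribl Stream's parser: it handles only the s, m, h, and d units, snaps in UTC, treats an unsigned offset as a future offset, and uses hypothetical function names.

```javascript
// Milliseconds per supported unit. Assumption: w, mon, q, and y are omitted.
const UNIT_MS = { s: 1000, m: 60000, h: 3600000, d: 86400000 };

// Resolve strings such as "now", "-1h", "-1", "@d", or "+128m@h" to epoch ms.
function resolveRelativeTime(expr, nowMs) {
  const match = /^(?:now|([+-]?)(\d+)([smhd])?)?(?:@([smhd]))?$/.exec(expr);
  if (!match || expr === '') throw new Error(`Unsupported relative time: ${expr}`);
  // A missing unit defaults to seconds, so "-1" means "-1s".
  const [, sign, amount, unit = 's', snap] = match;
  let t = nowMs;
  if (amount !== undefined) {
    const offset = Number(amount) * UNIT_MS[unit];
    t += sign === '-' ? -offset : offset;
  }
  if (snap) t -= t % UNIT_MS[snap]; // round down to the unit boundary (UTC)
  return t;
}

// Enforce the rule that Earliest must not be later than Latest.
function assertValidRange(earliest, latest, nowMs) {
  if (resolveRelativeTime(earliest, nowMs) > resolveRelativeTime(latest, nowMs)) {
    throw new Error('Earliest must not be later than Latest');
  }
}

const now = Date.UTC(2024, 3, 5, 12, 34, 56); // 2024-04-05T12:34:56Z
const snapped = new Date(resolveRelativeTime('+128m@h', now)).toISOString();
// snapped is '2024-04-05T14:00:00.000Z'
```

Here 12:34:56 plus 128 minutes is 14:42:56, which then rounds down to 14:00:00.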

Working with State Tracking

You can configure the Source to track state, either by time or another arbitrary value. This can help prevent overlaps between jobs, where subsequent runs may return some of the same results as previous runs. Similarly, it can help prevent gaps in data by allowing a run to pick up from where the last run ended.

State update expression: JavaScript expression that defines how to update the state from an event. Use the event’s data and the current state to compute the new state.

State merge expression: JavaScript expression that defines which state to keep when merging a task’s newly reported state with the previously saved state. Evaluates prevState and newState variables, resolving to the state to keep.

The default values for these fields are configured to track state by the latest lastEndTime field found in events gathered in a collection run.

Understanding State Expression Fields

The State update and State merge expressions control how state is derived from a collection run and how it is merged with existing state, respectively. They’re preconfigured to work with the common use case of tracking state by latest lastEndTime, but you may need to update them for other use cases. To understand what these fields do, let’s break down the default values.

State update expression

This expression has a default value of:

__collectible != null && (__collectible.end_time && {lastEndTime: __collectible.end_time > (state.lastEndTime || '') ? __collectible.end_time : state.lastEndTime})

The __collectible.end_time field is null or undefined when the Collector could not parse a time for the event. In that case the state should not change, so the expression relies on short-circuit evaluation: when __collectible or __collectible.end_time is missing, the && chain resolves to a falsy, non-object value and no state update takes place.

State values must resolve to an object, such as:

{ "lastEndTime": "2024-04-05T12:00:00Z" }

If the expression does not resolve to an object, Cribl Stream will ignore the result.

When __collectible.end_time is present, the object literal {lastEndTime: __collectible.end_time > (state.lastEndTime || '') ? __collectible.end_time : state.lastEndTime} compares state.lastEndTime with __collectible.end_time and keeps whichever value is greater.
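To see the default State update expression in action outside Cribl Stream, you can wrap it in a function and feed it sample values. The wrapper name updateState and the sample timestamps are hypothetical. The comparison is a plain string comparison, so this sketch assumes all timestamps share the same ISO 8601 UTC format.

```javascript
// The default expression, unchanged, wrapped so it can run in plain JavaScript.
const updateState = (__collectible, state) =>
  __collectible != null && (__collectible.end_time && {lastEndTime: __collectible.end_time > (state.lastEndTime || '') ? __collectible.end_time : state.lastEndTime});

const saved = { lastEndTime: '2024-04-05T12:00:00Z' };

const newer = updateState({ end_time: '2024-04-05T13:00:00Z' }, saved);
// newer is { lastEndTime: '2024-04-05T13:00:00Z' }

const older = updateState({ end_time: '2024-04-05T11:00:00Z' }, saved);
// older is { lastEndTime: '2024-04-05T12:00:00Z' }, so the saved value wins

const missing = updateState({}, saved);
// missing is undefined: not an object, so the result is ignored

const none = updateState(null, saved);
// none is false: also not an object, so the result is ignored
```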

State Merge Expression

This expression has a default value of:

(prevState.lastEndTime || '') >= (newState.lastEndTime || '') ? prevState : newState

It compares prevState (the state that was previously saved) to newState (the state reported from the most recent collection task), keeping the state with the greatest lastEndTime value.
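The default State merge expression can be exercised the same way. In this sketch the wrapper name mergeState and the sample states are hypothetical. It folds the states reported by several tasks into the previously saved state, which is one plausible way to picture the merge, not a description of Cribl Stream internals.

```javascript
// The default expression, unchanged, wrapped so it can run in plain JavaScript.
const mergeState = (prevState, newState) =>
  (prevState.lastEndTime || '') >= (newState.lastEndTime || '') ? prevState : newState;

// Hypothetical states reported by three tasks, in arrival order.
const taskStates = [
  { lastEndTime: '2024-04-05T12:30:00Z' },
  { lastEndTime: '2024-04-05T13:00:00Z' },
  { lastEndTime: '2024-04-05T12:45:00Z' },
];
const savedState = { lastEndTime: '2024-04-05T12:00:00Z' };

// Merge each reported state into the running result, starting from savedState.
const merged = taskStates.reduce(mergeState, savedState);
// merged.lastEndTime is '2024-04-05T13:00:00Z'
```

Because the comparison uses >=, a tie keeps prevState, and the arrival order of the task states does not change which lastEndTime is kept.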

Managing State

Select Manage State to view, modify, or delete a state. For more information, see Manage State.


API limits and large exports

Compliance exports can produce many files or large payloads. If OpenAI returns rate-limit responses (for example, HTTP 429), rely on Retry settings and a less aggressive schedule, and reduce the number of event types or narrow the time window per job. Watch job logs and OpenAI’s documented quotas for your workspace tier.

Proxying requests

To send HTTPS traffic through a corporate proxy, see System Proxy Configuration.

Internal fields

Stream attaches metadata you can read in Functions:

__collectible - metadata about the collection job.
__collectStats - per-request statistics.

Troubleshooting

Live Data: On the Source modal, use Live Data and Start Capture to preview events as they are ingested. See Capture Source data.

Logs: Use the Logs tab on the job or Source for request errors, auth failures, and download issues.

Monitoring: Use the Monitoring page to correlate drops in events or bytes with schedule or API errors.

Response errors

Non-2xx HTTP responses from the configured endpoints are generally treated as errors. Exceptions may apply when only some subtasks fail or when the HTTP client follows redirects (3xx) according to library behavior. See job messages for details.

If authentication fails, verify the token, header names, and any organization or workspace identifiers against OpenAI’s current compliance logs documentation.
