OpenAI Compliance Logs Source
The OpenAI Compliance Logs Source in Cribl Stream collects enterprise observability and compliance data from OpenAI’s Compliance Logs API. It is intended for organizations using ChatGPT Enterprise that need to feed ChatGPT workspace activity into data loss prevention (DLP) or SIEM tools, as well as to process that data in Cribl Stream before delivery to downstream systems.
Type: Pull | TLS Support: Yes | Event Breaker Support: Yes
TLS is enabled via HTTPS on the underlying OpenAI Compliance Logs REST APIs.
Prerequisites
Before you configure this Source, you need:
- A Cribl Stream deployment (Cloud or self-hosted).
- ChatGPT Enterprise (or equivalent OpenAI workspace offering) with access to the Compliance Logs export APIs.
- A compliance API token (or other credential) that OpenAI expects in request headers, with permission to list and download compliance log exports for your organization.
- Network connectivity from Cribl Workers to OpenAI’s compliance API hosts over HTTPS (direct or via an HTTP/S proxy).
How the OpenAI Compliance Logs Source Works
When a collection job runs, the Source:
- Calls the Compliance Logs export API with your configured authentication headers and event types selection.
- Receives a list of files (or equivalent export descriptors) that represent batches of compliance data for the requested types and time window.
- Downloads and ingests those files so each record becomes an event in Cribl Stream.
- Tracks state so repeated polls avoid large gaps or unnecessary duplication, consistent with other REST Collectors.
You can then route, filter, enrich, and format events for your downstream Destinations, such as SIEM, observability, storage, or ticketing tools.
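The ingest and state-tracking steps above can be sketched roughly as follows. This is a minimal illustration, not the Source's actual implementation: the function name `ingestExportFiles`, the newline-delimited JSON file body, and the per-file `end_time` property are all assumptions made for the example.

```javascript
// Hypothetical sketch: turn already-downloaded export files into events
// and compute the next state marker.
// Assumptions (not confirmed by OpenAI or Cribl documentation): each file body
// is newline-delimited JSON, and each file descriptor has an ISO 8601 end_time.
function ingestExportFiles(files, state = {}) {
  const events = [];
  let lastEndTime = state.lastEndTime || '';
  for (const file of files) {
    for (const line of file.body.split('\n')) {
      if (line.trim() === '') continue; // skip blank lines
      events.push(JSON.parse(line)); // one record becomes one event
    }
    // Same-format ISO 8601 UTC strings sort chronologically as plain strings.
    if (file.end_time > lastEndTime) lastEndTime = file.end_time;
  }
  return { events, state: { lastEndTime } };
}

const result = ingestExportFiles(
  [
    { end_time: '2024-04-05T11:00:00Z', body: '{"id":1}\n{"id":2}\n' },
    { end_time: '2024-04-05T12:00:00Z', body: '{"id":3}\n' },
  ],
  { lastEndTime: '2024-04-05T10:00:00Z' }
);
console.log(result.events.length, result.state.lastEndTime); // 3 2024-04-05T12:00:00Z
```

Carrying the newest end time forward is what lets the next run start where this one ended instead of re-reading the same window.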
Configuring an OpenAI Compliance Logs Source
- On the top bar, select Products, and then select Cribl Stream. Under Worker Groups, select a Worker Group. Next, you have two options:
  - To configure via QuickConnect, navigate to Routing > QuickConnect. Select Add Source and select the Source you want from the list, choosing either Select Existing or Add New.
  - To configure via the Routes, select Data > Sources. Select the Source you want. Next, select Add Source.
- In the Source modal, configure the following under General Settings:
  - Input ID: Enter a unique name for this Source. If you clone it, Cribl Stream appends `-CLONE` to the original Input ID.
  - Description: Optionally, enter a description (for example, which event types or downstream SIEM this instance feeds).
  - API key: Select or create a stored text secret.
  - Account type: Choose whether this Source collects compliance logs for a single ChatGPT workspace or for your entire organization. The default is `Workspace`.
  - Workspace ID: If using `Workspace` as the Account type, enter the ID of the ChatGPT workspace to collect logs from. This must be in UUID format.
  - Organization ID: If using `Organization` as the Account type, enter the ID of the ChatGPT organization to collect logs from. Example format is `org-XXXXXXXXXXXXXXXXXXXXXXXX`.
  - Event types: One or more compliance log categories to collect.
  - Cron schedule: Collector jobs will run on this schedule in the Leader’s time zone.
    - Earliest time: Relative to the current time. Format: `[+|-]<time_integer><time_unit>`.
    - Latest time: Relative to the current time. Format: `[+|-]<time_integer><time_unit>`.
    - Job timeout: Maximum time the job is allowed to run (for example, `30`, `45s`, `15m`). Enter `0` for unlimited time.
  - Tags: Optionally, add UI tags to group this Source in Stream. Tags are not added to events.
- Under Event types, use the multi-select dropdown to choose one or more `event_type` values to pass to the export API. Only selected types are requested in each collection run. See Event types for the supported values.
- Configure Scheduling and State Tracking for the Collector job as needed. Defaults are set for typical compliance polling.
- Optionally configure Processing Settings, Retries, and Advanced Settings.
- Under Connected Destinations, choose Send to Routes and/or QuickConnect depending on how you want data to leave this Source.
- Click Save, then Commit & Deploy.
Event Types
The Compliance Logs export API accepts an event_type parameter so you can scope each export to specific log categories. Select the types you need using the multi-select dropdown. The Source includes only the selected types in its requests.
Supported values include:
| Value | Typical use |
|---|---|
| `APP_LOG` | Application-level activity in the ChatGPT workspace. |
| `APP_AUTH_LOG` | Application authentication events. |
| `AUDIT_LOG` | Administrative and audit trail events. |
| `AUTH_LOG` | User or session authentication events. |
| `CHATGPT_PLUGIN_SPREADSHEET` | Plugin activity in the ChatGPT workspace. |
| `CODEX_LOG` | Codex-related usage and events. |
| `CODEX_SECURITY_LOG` | Security-focused Codex events. |
| `CONVERSATION_MESSAGE` | Conversation message-level records (often the richest content for review workflows). |
When Account type is set to `Organization`, only `APP_LOG` and `APP_AUTH_LOG` are available. The remaining types are available for `Workspace` accounts only.
Select the minimum set of types that satisfy your DLP or SIEM policy. Requesting every type increases volume and API work.
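As a small illustration of the Account type restriction above, the following sketch flags selected event types that an `Organization` account cannot collect. The helper name `typesUnavailableForOrganization` is hypothetical; the Source enforces this in its UI, and this code only mirrors the documented rule.

```javascript
// Event types documented as available when Account type is Organization.
const ORGANIZATION_EVENT_TYPES = new Set(['APP_LOG', 'APP_AUTH_LOG']);

// Hypothetical helper: return the selected types that an Organization
// account cannot request, preserving the order of the selection.
function typesUnavailableForOrganization(selected) {
  return selected.filter((type) => !ORGANIZATION_EVENT_TYPES.has(type));
}

console.log(typesUnavailableForOrganization(['APP_LOG', 'AUDIT_LOG', 'CODEX_LOG']));
// [ 'AUDIT_LOG', 'CODEX_LOG' ]
```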
Processing Settings
Event Breakers: Select an Event Breaker ruleset to split incoming raw, unstructured data into distinct events before they enter Routes and Pipelines. The OpenAI Compliance Logs ruleset is selected by default, but you can add more rulesets.
Fields: Add fields to each event using the Eval Function. Values can be JavaScript expressions or constants. Fields defined here normally override same-named fields on the event unless you choose to let event fields win.
Pre-Processing: When Send to Routes is enabled, optionally select a Pipeline (or Pack) to run on collected events before they enter Routes.
Retries
Adjust how failed HTTP requests are retried.
Retry type: Backoff (default), Static, or Disabled.
Initial retry interval (ms): Delay before the first retry after a failure. Max 20,000 ms. 0 means retry immediately until Retry limit is reached.
Retry limit: Max retries per failed request (default 5, max 20). 0 disables retries.
Backoff multiplier: Base for exponential backoff (default 2).
Retry HTTP codes: Defaults typically include 429, 500, and 503. Non-2xx responses are errors; tune this list per OpenAI’s behavior.
Honor Retry-After header: When enabled (default), honor Retry-After up to the product maximum (longer delays may be ignored). Stream logs the delay when applicable.
Retry connection timeout / Retry connection reset: Optionally retry on ETIMEDOUT or ECONNRESET for more resilient long downloads.
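To get a feel for how Retry type, Initial retry interval, Retry limit, and Backoff multiplier interact, the sketch below computes the wait before each retry. It assumes the conventional exponential formula (interval times multiplier to the power of the attempt index) and an illustrative 1,000 ms initial interval; Cribl Stream's exact formula, any ceiling on individual delays, and the real default interval are not stated here, so treat the numbers as illustrative.

```javascript
// Hypothetical sketch of a retry delay schedule.
// type: 'backoff' | 'static' | 'disabled' (mirrors the Retry type options).
// initialMs = 1000 is an assumed value for illustration only.
function retryDelays({ type = 'backoff', initialMs = 1000, limit = 5, multiplier = 2 } = {}) {
  if (type === 'disabled' || limit === 0) return []; // no retries at all
  const delays = [];
  for (let attempt = 0; attempt < limit; attempt++) {
    // Static keeps the same wait; backoff grows it geometrically.
    delays.push(type === 'static' ? initialMs : initialMs * multiplier ** attempt);
  }
  return delays;
}

console.log(retryDelays()); // [ 1000, 2000, 4000, 8000, 16000 ]
console.log(retryDelays({ type: 'static', limit: 3 })); // [ 1000, 1000, 1000 ]
```

With the defaults described above (limit 5, multiplier 2), a failing request under this assumed formula would spend about 31 seconds waiting in total before giving up, which is worth weighing against the Job timeout.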
Advanced Settings
Request timeout (seconds): Max time to wait for a request (default 300; 0 means wait indefinitely).
Time to live: How long Collector job artifacts remain on disk and in Job Inspector (default often 4h).
Page limit: Maximum number of pages the Collector will request in a single job. When the limit is reached, pagination stops even if the API has more data. Set to 0 for no explicit page cap.
State tracking: Customize how the Source tracks time or cursor state to avoid overlaps and gaps between jobs. See State Tracking.
Environment: For GitOps, optionally limit this config to a single Git branch.
Scheduling
The Collector’s Cron schedule controls when export/list/download jobs run.
Earliest time and Latest time: Relative time range for events to collect, using the same syntax as other REST Sources.
Job timeout: Maximum runtime for a job (30, 45s, 15m, and so on). Minimum granularity is often 10 seconds. Default is 300. A value of 0 means no timeout.
Log Level: Verbosity for task logs (use higher verbosity temporarily for troubleshooting).
Earliest time and Latest time accept relative time expressions in this format: `[+|-]<time_integer><time_unit>@<snap-to_time_unit>`
Syntax reference:
| Syntax element | Values supported |
|---|---|
| Offset | `-` past, `+` future, or omit with `now`. |
| `<time_integer>` | Integer, or omit with `now`. |
| `<time_unit>` | `now`, or `s`, `m`, `h`, `d`, `w`, `mon`, `q`, `y`. |
| `@<snap-to_time_unit>` | Optional snap-to unit (see below). |
Rules:
- Earliest must not be later than Latest.
- Values without units are interpreted as seconds (for example, `-1` = `-1s`).
Snap-to-time syntax
@ rounds down from the evaluated time. For example:
- `@d` - start of the current day.
- `+128m@h` - forward 128 minutes, then snap back to the hour boundary.
Week/month/quarter/year snaps (@w, @w1-@w6, @mon, @q, @y) behave like other Cribl Stream Collectors.
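The offset and snap rules above can be sketched as a small resolver. This is a simplified illustration, not Cribl Stream's parser: the function name `resolveRelativeTime` is hypothetical, only the `s`, `m`, `h`, and `d` units are handled, and snapping is done in UTC (week, month, quarter, and year units are left out).

```javascript
// Milliseconds per supported unit (w, mon, q, y are intentionally omitted).
const UNIT_MS = { s: 1000, m: 60000, h: 3600000, d: 86400000 };

// Hypothetical resolver: apply the offset to nowMs, then snap down (in UTC).
function resolveRelativeTime(expr, nowMs) {
  const match = /^(?:now|([+-])(\d+)([smhd])?)?(?:@([smhd]))?$/.exec(expr);
  if (!match || expr === '') throw new Error(`Unsupported expression: ${expr}`);
  // A missing unit falls back to seconds, so -1 is treated as -1s.
  const [, sign, amount, unit = 's', snap] = match;
  let t = nowMs;
  if (sign) t += (sign === '-' ? -1 : 1) * Number(amount) * UNIT_MS[unit];
  if (snap) t -= t % UNIT_MS[snap]; // round down to the unit boundary
  return t;
}

const now = Date.UTC(2024, 3, 5, 12, 30, 15); // 2024-04-05T12:30:15Z
console.log(new Date(resolveRelativeTime('@d', now)).toISOString());      // 2024-04-05T00:00:00.000Z
console.log(new Date(resolveRelativeTime('+128m@h', now)).toISOString()); // 2024-04-05T14:00:00.000Z
console.log(new Date(resolveRelativeTime('-1', now)).toISOString());      // 2024-04-05T12:30:14.000Z
```

The offset is applied before the snap, which is why `+128m@h` lands on 14:00 rather than on 12:00 plus 128 minutes.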
Working with State Tracking
You can configure the Source to track state, either by time or another arbitrary value. This can help prevent overlaps between jobs, where subsequent runs may return some of the same results as previous runs. Similarly, it can help prevent gaps in data by allowing a run to pick up from where the last run ended.
State update expression: JavaScript expression that defines how to update the state from an event. Use the event’s data and the current state to compute the new state.
State merge expression: JavaScript expression that defines which state to keep when merging a task’s newly reported state with the previously saved state. Evaluates prevState and newState variables, resolving to the state to keep.
The default values for these fields are configured to track state by the latest lastEndTime field found in events gathered in a collection run.
Understanding State Expression Fields
The State update and State merge expressions control how state is derived from a collection run and how it is merged with existing state, respectively. They’re preconfigured to work with the common use case of tracking state by latest lastEndTime, but you may need to update them for other use cases. To understand what these fields do, let’s break down the default values.
State update expression
This expression has a default value of:
__collectible != null && (__collectible.end_time && {lastEndTime: __collectible.end_time > (state.lastEndTime || '') ? __collectible.end_time : state.lastEndTime})
The `__collectible.end_time` field is null or undefined when the Collector could not parse a time for the event. In that case state should not be updated, so the expression relies on short-circuit evaluation: the `&&` chain stops at the missing value and never produces a new state object.
State values must resolve to an object, such as:
{ "lastEndTime": "2024-04-05T12:00:00Z" }
If the expression does not resolve to an object, Cribl Stream will ignore the result.
When `__collectible.end_time` is present, the object literal `{lastEndTime: __collectible.end_time > (state.lastEndTime || '') ? __collectible.end_time : state.lastEndTime}` compares `state.lastEndTime` to the `__collectible.end_time` value and keeps whichever is greater.
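To see the default State update expression in action, the sketch below wraps it unchanged in a plain function and runs it against a few inputs. The wrapper `updateState` is hypothetical (Cribl Stream evaluates the expression itself), and the timestamps are assumed to be ISO 8601 UTC strings in a uniform format, which is what makes the string comparison chronological.

```javascript
// Hypothetical wrapper around the default State update expression.
function updateState(__collectible, state) {
  return __collectible != null && (__collectible.end_time && {lastEndTime: __collectible.end_time > (state.lastEndTime || '') ? __collectible.end_time : state.lastEndTime});
}

// First run: no saved state yet, so the event's end_time becomes the state.
console.log(updateState({ end_time: '2024-04-05T12:00:00Z' }, {}));
// { lastEndTime: '2024-04-05T12:00:00Z' }

// An older event does not move the marker backwards.
console.log(updateState({ end_time: '2024-04-05T11:00:00Z' }, { lastEndTime: '2024-04-05T12:00:00Z' }));
// { lastEndTime: '2024-04-05T12:00:00Z' }

// Missing end_time or missing __collectible: the result is not an object,
// so it would be ignored and state left as-is.
console.log(updateState({}, { lastEndTime: '2024-04-05T12:00:00Z' })); // undefined
console.log(updateState(null, {})); // false
```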
State Merge Expression
This expression has a default value of:
(prevState.lastEndTime || '') >= (newState.lastEndTime || '') ? prevState : newState
It compares prevState (the state that was previously saved) to newState (the state reported from the most recent collection task), keeping the state with the greatest lastEndTime value.
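The default State merge expression can be exercised the same way. In this sketch the wrapper `mergeState` and the sample task states are hypothetical; folding several task results with `reduce` is just one way to illustrate that the greatest `lastEndTime` survives regardless of the order in which tasks report.

```javascript
// Hypothetical wrapper around the default State merge expression.
function mergeState(prevState, newState) {
  return (prevState.lastEndTime || '') >= (newState.lastEndTime || '') ? prevState : newState;
}

// States reported by three tasks, arriving out of order.
const reported = [
  { lastEndTime: '2024-04-05T12:00:00Z' },
  { lastEndTime: '2024-04-05T14:00:00Z' },
  { lastEndTime: '2024-04-05T13:00:00Z' },
];

// Start from an empty saved state and merge each report in turn.
const saved = reported.reduce(mergeState, {});
console.log(saved); // { lastEndTime: '2024-04-05T14:00:00Z' }
```

Because the comparison uses `>=`, a tie keeps the previously saved state rather than replacing it with an identical one.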
Managing State
Select Manage State to view, modify, or delete a state. For more information, see Manage State.
API limits and large exports
Compliance exports can produce many files or large payloads. If OpenAI returns rate-limit responses (for example, HTTP 429), rely on Retry settings and a less aggressive schedule, and reduce the number of event types or narrow the time window per job. Watch job logs and OpenAI’s documented quotas for your workspace tier.
Proxying requests
To send HTTPS traffic through a corporate proxy, see System Proxy Configuration.
Internal fields
Stream attaches metadata you can read in Functions:
- `__collectible` - metadata about the collection job.
- `__collectStats` - per-request statistics.
Troubleshooting
Live Data: On the Source modal, use Live Data and Start Capture to preview events as they are ingested. See Capture Source data.
Logs: Use the Logs tab on the job or Source for request errors, auth failures, and download issues.
Monitoring: Use the Monitoring page to correlate drops in events or bytes with schedule or API errors.
Response errors
Non-2xx HTTP responses from the configured endpoints are generally treated as errors. Exceptions may apply when only some subtasks fail or when the HTTP client follows redirects (3xx) according to library behavior. See job messages for details.
If authentication fails, verify the token, header names, and any organization or workspace identifiers against OpenAI’s current compliance logs documentation.