Onboarding Data Sources to Cortex XSIAM with Cribl Stream
This guide outlines a practical, end-to-end process for onboarding third-party data sources into Cortex XSIAM using Cribl Stream, and explains how datasets, parsers, and data models (XDM) work together.
High-Level Onboarding Flow
Start from the XSIAM perspective.
- Get certified on using XSIAM as an Analyst or Engineer. Get certified on Cribl Stream. (It’s free!)
- Conduct a data source assessment so you have a solid understanding of which data sources are supported by XSIAM out of the box. Everything about ingest is centered on XSIAM, whether you use XSIAM connectors or Cribl Stream to route data.
- Identify the expected event format and whether an XSIAM data model (XDM) exists for the target data source.
- If the data model expects a specific format or field set, Cribl Stream must emit that. In general, XSIAM expects events to look as they did when produced by the source. If no Content Pack exists in the Palo Alto XSIAM Marketplace that includes a data model, create a data model in XSIAM.
Format and tag events correctly in Cribl Stream.
- Use an Event Breaker to ensure events are formatted to meet XSIAM’s expectations.
- For all events you send to XSIAM, set `__sourceIdentifier`, `__vendor`, and `__product`.
- The Cortex XSIAM Destination maps these to the HTTP headers XSIAM uses for parsing and data model mapping.
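The Cribl-side preparation can be sketched in Python. This is a simplified model, not Cribl's implementation: in Cribl Stream you would configure an Event Breaker ruleset and an Eval function instead, and the `acme`/`firewall` identifier values below are hypothetical examples.

```python
# Sketch of what an Event Breaker plus an Eval function accomplish:
# split a raw buffer into discrete events, then tag each event with
# the identifier fields the Cortex XSIAM Destination expects.
def break_and_tag(buffer: str, source_identifier: str, vendor: str, product: str):
    events = []
    for line in buffer.strip().splitlines():  # newline-delimited breaker rule
        events.append({
            "_raw": line,
            "__sourceIdentifier": source_identifier,
            "__vendor": vendor,
            "__product": product,
        })
    return events

tagged = break_and_tag(
    "Jan 01 00:00:00 host app: event one\nJan 01 00:00:01 host app: event two",
    source_identifier="acme_firewall",  # hypothetical example values
    vendor="acme",
    product="firewall",
)
```

In a real Pipeline, the Event Breaker handles the splitting at the Source and an Eval function sets the three fields; the sketch only illustrates the end state of each event.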
Send data to Cortex XSIAM.
- Configure the Cortex XSIAM Destination in Cribl Stream and route the relevant Sources to it.
- In XSIAM, validate delivery using the Investigator (XQL search).
Confirm dataset and data model behavior.
- Data is written to Cortex Data Lake (the underlying store) and is searchable in the XSIAM Investigator.
- For XSIAM analytics, the dataset must be mapped to an XSIAM data model (XDM). This mapping is what surfaces `XDM.*` fields for analytics, correlation, and stitching.
- In practice, it does not matter whether you use a standard XSIAM connector, the XSIAM Broker VM, or the Cribl XSIAM integration. The important consideration is whether events arrive in the right format with the required identifiers.
Verify XDM mapping (when a data model exists).
- In XSIAM data model rules, look for `dataset = vendor_product_raw`.
- If the dataset is missing:
- Install the right content pack from the XSIAM Marketplace, or
- Build your own data model rules to map into XDM.
- If no data model exists and you do not add one, data is still fully searchable in the XSIAM Investigator; it is just not normalized into XDM for analytics.
- To confirm `XDM.*` fields are populated, run a query like `datamodel dataset = vendor_product_raw`.
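Both verification queries follow directly from the vendor and product names you set in Cribl Stream. A small helper can template them; this is a hypothetical sketch assuming the `vendor_product_raw` dataset naming convention described above:

```python
def verification_queries(vendor: str, product: str) -> dict:
    """Build the XQL queries used to check a dataset and its XDM mapping.

    Assumes the vendor_product_raw naming convention; vendor and product
    are whatever you set in Cribl Stream for the source.
    """
    dataset = f"{vendor}_{product}_raw"
    return {
        # Confirms raw events are landing in the dataset.
        "raw": f"dataset = {dataset}",
        # Confirms XDM.* fields are populated via data model rules.
        "xdm": f"datamodel dataset = {dataset}",
    }

queries = verification_queries("acme", "firewall")  # hypothetical source
```

Run the `raw` query first; if it returns events but the `xdm` query returns nothing, the dataset exists but no data model rules are mapping it.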
How XSIAM Ingests and Normalizes Events
XSIAM ingests events in the following sequence:
Raw log ingested. Events arrive through a Collector, Broker VM, or the Cribl integration endpoint.
Parsing and dataset assignment.
XSIAM’s parser tries to match an existing parser.
On match, XSIAM sets `dataset = "vendor_product_raw"`.
If no parser matches and the payload is JSON, CEF, or LEEF syslog with key-value fields, XSIAM can still write KV pairs to Cortex Data Lake.
If no parser matches and the payload is plain text, XSIAM often writes at least `_time` and `_raw` or `_raw_log`, depending on the Source. See Understanding How Raw Fields Are Stored for more information.
Write to Cortex Data Lake. Parsed and/or raw fields land in Cortex Data Lake. Data models are not applied at this stage; the fields remain as raw and parsed values.
Schema-at-read normalization. XSIAM normalizes to the Cortex Data Model (XDM) at read time. Analytics, stitching, and most higher-level use cases use this XDM layer.
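The parse-then-store step of the sequence above can be sketched as a decision function. This is a simplification under stated assumptions: parser matching and format detection are internal to XSIAM and are modeled here as boolean inputs, and the field names follow the storage conventions described later in this guide.

```python
def ingest(event: str, parser_match: bool, structured: bool, vendor: str, product: str):
    """Simplified model of XSIAM's parse-then-write step.

    parser_match and structured stand in for XSIAM's internal
    parser matching and format detection.
    """
    record = {}
    if parser_match:
        # A matching parser assigns the dataset and extracts fields.
        record["dataset"] = f"{vendor}_{product}_raw"
    elif structured:
        # JSON/CEF/LEEF key-value payloads: KV pairs still get written.
        record["_raw_json"] = event
    else:
        # Plain text with no parser: at minimum the raw line is stored.
        record["_raw_log"] = event
    return record
```

Note that XDM normalization is absent from this sketch on purpose: it happens at read time, not at write time.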
Configuring the Cribl XSIAM Destination
Destination Behavior
- Cribl provides a dedicated Cortex XSIAM Destination and the Palo Alto XSIAM Pack on the Dispensary.
- The Destination validates and maps required fields to HTTP headers XSIAM expects, and handles batching, rate limits, and JSON for ingestion.
For more information, see:
- The Cribl Cortex XSIAM Destination documentation.
- Ingest data from Cribl in the XSIAM documentation.
Required Identifiers and Headers
Cribl Stream sends identifiers to XSIAM as HTTP headers. These headers tell XSIAM which integration the events use, the vendor and product, and how to send traffic to the right parsers, datasets, and (where available) data models. In the Pipeline, set and preserve:
- `__sourceIdentifier` - Required for all events. Maps to the `Source-Identifier` header.
- `__inputId` - Required; set automatically by Cribl Stream. Do not remove this field. Maps to the `Integration-Identifier` header.
- `__vendor` - Maps to the `vendor` header.
- `__product` - Maps to the `product` header.
- `format` - Usually set by the Cortex XSIAM Destination; often `raw` or `json`.
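The field-to-header mapping the Destination performs can be sketched as follows. The header names come from the list above; the validation logic and example values are a hypothetical illustration, not Cribl's actual implementation.

```python
# Hypothetical sketch of the internal-field -> HTTP-header mapping that
# the Cortex XSIAM Destination applies before sending events.
FIELD_TO_HEADER = {
    "__sourceIdentifier": "Source-Identifier",
    "__inputId": "Integration-Identifier",
    "__vendor": "vendor",
    "__product": "product",
}

def build_headers(event: dict) -> dict:
    # __sourceIdentifier and __inputId are required for every event.
    for required in ("__sourceIdentifier", "__inputId"):
        if required not in event:
            raise ValueError(f"missing required field: {required}")
    return {header: event[field]
            for field, header in FIELD_TO_HEADER.items()
            if field in event}

headers = build_headers({
    "__sourceIdentifier": "acme_firewall",  # hypothetical example
    "__inputId": "in_syslog",               # set automatically by Cribl Stream
    "__vendor": "acme",
    "__product": "firewall",
})
```

If a Pipeline function drops `__inputId` or `__sourceIdentifier`, delivery fails or events arrive without the identifiers XSIAM needs for parser and data model routing, which is why the sketch treats them as hard requirements.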
UUIDs and Built-In Analytics
Some XSIAM data sources use data source UUIDs to enable advanced, built-in analytics (such as baselining, ML, and specialized stitching) alongside what is written to Cortex Data Lake.
If a documented UUID exists for the source, set it on the Cribl XSIAM Destination. XSIAM can then apply the right parsers, data models, and analytics tied to that UUID.
For reference, see Data source UUIDs in the XSIAM documentation.
Configuration Patterns
- Data sources with an assigned UUID: Look up the UUID in the XSIAM documentation. In the Cribl XSIAM Destination, set the UUID and make sure `__vendor` and `__product` match the documented values for parsers and data models.
- Data sources without an assigned UUID: Find the XSIAM content pack in the Marketplace. From the pack's parser and data model rules, read the expected `__vendor` and `__product`. In Cribl Stream, set those and a consistent `__sourceIdentifier`. Examples and defaults appear in the Palo Alto XSIAM Pack and the Data source UUIDs page, including the Other/non-XSIAM integration UUID. XSIAM uses the headers to route to the correct `dataset` and parsing or modeling for that pack.
- First-party Palo Alto Networks data: Do not send Palo Alto first-party data (for example, NGFW or native Cortex XDR agent telemetry) through Cribl if you depend on all out-of-the-box advanced analytics. Ingesting that data directly into Cortex Data Lake / Strata Logging Service uses data-source-specific analytics and full stitching; an intermediate path can alter or miss some of that processing.
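The three patterns reduce to one decision: is the data first-party, and if not, is there a documented UUID for the source? A hypothetical sketch of that decision (the function and its return shape are illustrative only; UUIDs come from the XSIAM Data source UUIDs page):

```python
def destination_config(vendor, product, uuid=None, first_party=False):
    """Choose a Cribl XSIAM Destination configuration per the three patterns.

    Hypothetical helper, not a real Cribl or XSIAM API.
    """
    if first_party:
        # Pattern 3: first-party Palo Alto data should go directly to
        # Cortex Data Lake / Strata Logging Service, not through Cribl.
        raise ValueError("route first-party Palo Alto data directly, not via Cribl")
    config = {"vendor": vendor, "product": product}
    if uuid:
        # Pattern 1: documented UUID ties into parsers, models, analytics.
        config["uuid"] = uuid
    else:
        # Pattern 2: rely on vendor/product values from the content pack.
        config["uuid"] = None
    return config

cfg = destination_config("acme", "firewall")  # hypothetical non-UUID source
```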
Understanding How Raw Fields Are Stored
When you troubleshoot ingest, it helps to know how the original payload is stored.
Field Breakdown
- `_raw_log` - Most common for text or unstructured sources. Holds the line as a string, as received. Common for syslog, CEF, LEEF, CSV, and similar. Functions such as `regextract()`, `split()`, and `arrayindex()` on text usually operate on `_raw_log`.
- `_raw_json` - Used for JSON ingests: APIs, HTTP collectors, cloud sources, Strata Logging Service with JSON, and so on. You can use `json_extract_scalar()`, JSON arrow syntax, and similar without parsing from a string.
- `_raw` - An internal field, including in `xdr_data` and some native Palo Alto datasets. It reflects XSIAM's internal view of the event. Exact use varies by dataset.
When Each Applies
The field you see is determined by two factors:
- the ingestion path (Broker VM, HTTP collector, XDR collector, Cribl, and so on), and
- the format (declared or detected).
In general:
| Incoming Data Format | Typical Ingestion | XSIAM Target Field | Data Type |
|---|---|---|---|
| Unstructured Text (Syslog, CEF, LEEF) | Broker VM, Syslog Collector | _raw_log | String |
| Structured JSON | API, HTTP Collector, Cloud Integrations | _raw_json | Object |
| Agent-Native Data | XDR Collector / Native Agent | _raw | Internal/Varies |
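The table above can be codified as a lookup. This is a simplification for troubleshooting: the format keys are made-up labels for the table rows, and the default to `_raw_log` is an assumption, since text storage is the most common fallback described above.

```python
# Simplified codification of the table above: which raw field XSIAM
# typically writes for each incoming data format. Keys are illustrative.
TARGET_FIELD = {
    "unstructured_text": "_raw_log",   # syslog, CEF, LEEF -> string
    "structured_json": "_raw_json",    # APIs, HTTP collectors -> object
    "agent_native": "_raw",            # XDR Collector / native agent
}

def raw_field_for(fmt: str) -> str:
    # Assumption: unknown formats fall back to plain-text storage.
    return TARGET_FIELD.get(fmt, "_raw_log")
```

When a query against `_raw_log` returns nothing for a source you know is flowing, checking `_raw_json` (or vice versa) per this mapping is a quick first troubleshooting step.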
Cribl and XSIAM Resources
- Palo Alto XSIAM Pack (Pipelines and samples) - Cortex XSIAM Destination configuration support.
- Ingest data from Cribl - XSIAM documentation.
- Cortex XSIAM Destination - Cribl product documentation.
- Data source UUIDs - XSIAM documentation.
- Palo Alto content GitHub (Packs, parsers, data models) - Reference content.