
Onboarding Data Sources to Cortex XSIAM with Cribl Stream

This guide outlines a practical, end-to-end process for onboarding third-party data sources into Cortex XSIAM using Cribl Stream, and explains how datasets, parsers, and data models (XDM) work together.

High-Level Onboarding Flow

  1. Start from the XSIAM perspective.

    • Get certified on Cortex XSIAM as an Analyst or Engineer, and get certified on Cribl Stream (it’s free!).
    • Conduct a data source assessment so you have a solid understanding of which data sources are supported by XSIAM out of the box. Everything about ingest is centered on XSIAM, whether you use XSIAM connectors or Cribl Stream to route data.
    • Identify the expected event format and whether an XSIAM data model (XDM) exists for the target data source.
    • If the data model expects a specific format or field set, Cribl Stream must emit that. In general, XSIAM expects events to look as they did when produced by the source. If no Content Pack exists in the Palo Alto XSIAM Marketplace that includes a data model, create a data model in XSIAM.
  2. Format and tag events correctly in Cribl Stream.

    • Use an Event Breaker to ensure events are formatted to meet XSIAM’s expectations.
    • For all events you send to XSIAM, set:
      • __sourceIdentifier
      • __vendor
      • __product
    • The Cortex XSIAM Destination maps these to the HTTP headers XSIAM uses for parsing and data model mapping.
  3. Send data to Cortex XSIAM.

    • Configure the Cortex XSIAM Destination in Cribl Stream and route the relevant Sources to it.
    • In XSIAM, validate delivery using the Investigator (XQL search).
  4. Confirm dataset and data model behavior.

    • Data is written to Cortex Data Lake (the underlying store) and is searchable in the XSIAM Investigator.
    • For XSIAM analytics, the dataset must be mapped to an XSIAM data model (XDM). This mapping is what surfaces fields such as XDM.* for analytics, correlation, and stitching.
    • In practice, it does not matter whether you use a standard XSIAM connector, the XSIAM Broker VM, or the Cribl XSIAM integration. The important consideration is whether events arrive in the right format with the required identifiers.
  5. Verify XDM mapping (when a data model exists).

    • In XSIAM data model rules, look for:
      • dataset = vendor_product_raw
    • If the dataset is missing:
      • Install the right content pack from the XSIAM Marketplace, or
      • Build your own data model rules to map into XDM.
    • If no data model exists and you do not add one, data is still fully searchable in the XSIAM Investigator; it is just not normalized into XDM for analytics.
    • To confirm XDM.* fields are populated, run a query like:
    datamodel dataset = vendor_product_raw
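The tagging described in step 2 can be sketched as a simple field update. In Cribl Stream you would normally set these with an Eval function in the Pipeline rather than code; the sample event and the cisco/asa values below are hypothetical examples only.

```python
# Hypothetical event as it might look mid-Pipeline; values are examples only.
event = {"_raw": "<134>Oct 10 12:00:00 fw01 %ASA-6-302013: Built outbound TCP"}

# Set the identifiers XSIAM needs (step 2 above). In Cribl Stream this is
# typically done with an Eval function, not code.
event.update({
    "__sourceIdentifier": "asa_syslog",   # example value
    "__vendor": "cisco",                  # must match the content pack
    "__product": "asa",
})
```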

How XSIAM Ingests and Normalizes Events

XSIAM ingests events in the following sequence:

  1. Raw log ingested. Events arrive through a Collector, Broker VM, or the Cribl integration endpoint.

  2. Parsing and dataset assignment.

    • XSIAM’s parser tries to match an existing parser.

    • On match, XSIAM sets:

      dataset = "vendor_product_raw"
    • If no parser matches and the payload is JSON, CEF, or LEEF (syslog with key-value fields), XSIAM can still write KV pairs to Cortex Data Lake.

    • If no parser matches and the payload is plain text, XSIAM often writes at least _time and _raw or _raw_log, depending on the Source. See Understanding How Raw Fields Are Stored for more information.

  3. Write to Cortex Data Lake. Parsed and/or raw fields land in Cortex Data Lake. Data models are not applied at this stage; the stored fields remain in their raw and parsed form.

  4. Schema-at-read normalization. XSIAM normalizes to the Cortex Data Model (XDM) at read time. Analytics, stitching, and most higher-level use cases use this XDM layer.
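The dataset naming convention from step 2 above can be sketched as follows. This is illustrative only; XSIAM’s exact normalization rules for vendor and product names may differ.

```python
def dataset_name(vendor: str, product: str) -> str:
    """Derive the raw dataset name following the vendor_product_raw
    convention described above (a sketch; XSIAM's exact normalization
    rules may differ)."""
    def norm(s: str) -> str:
        # Lowercase and replace separators, a common normalization pattern.
        return s.strip().lower().replace(" ", "_").replace("-", "_")
    return f"{norm(vendor)}_{norm(product)}_raw"

print(dataset_name("Cisco", "ASA"))  # cisco_asa_raw
```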

Configuring the Cribl XSIAM Destination

Destination Behavior

For more information, see:

Required Identifiers and Headers

Cribl Stream sends identifiers to XSIAM as HTTP headers. These headers tell XSIAM which integration the events use and the vendor and product, and route traffic to the right parsers, datasets, and (where available) data models. In the Pipeline, set and preserve:

  • __sourceIdentifier - Required for all events. Maps to the Source-Identifier header.
  • __inputId - Required; set automatically by Cribl Stream. Do not remove this field. Maps to the Integration-Identifier header.
  • __vendor - Maps to the vendor header.
  • __product - Maps to the product header.
  • format - Usually set by the Cortex XSIAM Destination, often raw or json.
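The field-to-header mapping above can be sketched as follows. This is an illustration of what the Cortex XSIAM Destination does with these fields, not Cribl’s actual implementation.

```python
def to_xsiam_headers(event: dict) -> dict:
    """Map Cribl internal fields to the HTTP headers listed above.
    A minimal sketch of the Destination's behavior, not its real code."""
    return {
        "Source-Identifier": event["__sourceIdentifier"],   # required for all events
        "Integration-Identifier": event["__inputId"],       # set automatically by Cribl
        "vendor": event.get("__vendor", ""),
        "product": event.get("__product", ""),
    }
```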

UUIDs and Built-In Analytics

Some XSIAM data sources use data source UUIDs to enable advanced, built-in analytics (such as baselining, ML, and specialized stitching) alongside what is written to Cortex Data Lake.

If a documented UUID exists for the source, set it on the Cribl XSIAM Destination. XSIAM can then apply the right parsers, data models, and analytics tied to that UUID.

For reference, see Data source UUIDs in the XSIAM documentation.

Configuration Patterns

  • Data sources with an assigned UUID: Look up the UUID in the XSIAM documentation. In the Cribl XSIAM Destination, set the UUID and make sure __vendor and __product match the documented values for parsers and data models.

  • Data sources without an assigned UUID: Find the XSIAM content pack in the Marketplace. From the pack’s parser and data model rules, read the expected __vendor and __product. In Cribl Stream, set those and a consistent __sourceIdentifier. Examples and defaults appear in the Palo Alto XSIAM Pack and the Data source UUIDs page, including the Other/non-XSIAM integration UUID. XSIAM uses the headers to route to the correct dataset and parsing or modeling for that pack.

  • First-party Palo Alto Networks data: Do not send Palo Alto first-party data (for example, NGFW, native Cortex XDR agent telemetry) through Cribl if you depend on all out-of-the-box advanced analytics. Ingesting that data directly into Cortex Data Lake / Strata Logging Service uses data-source-specific analytics and full stitching. An intermediate path can alter or miss some of that processing.
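The first two patterns above can be summarized in a sketch. The function and the key names here are purely illustrative and are not real Cribl Destination configuration keys.

```python
def xsiam_identifiers(vendor: str, product: str, source_id: str, uuid: str = None) -> dict:
    """Collect the identifiers to set for a data source (a sketch only;
    these are not actual Cribl Destination config keys)."""
    ids = {
        "__vendor": vendor,              # must match the content pack / UUID docs
        "__product": product,
        "__sourceIdentifier": source_id, # choose a consistent value
    }
    if uuid is not None:
        ids["uuid"] = uuid               # documented data source UUID, if one exists
    return ids
```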

Understanding How Raw Fields Are Stored

When you troubleshoot ingest, it helps to know how the original payload is stored.

Field Breakdown

  • _raw_log - Most common for text or unstructured sources. Holds the line as a string, as received. Common for syslog, CEF, LEEF, CSV, and similar. Functions such as regextract(), split(), and arrayindex() on text usually use _raw_log.
  • _raw_json - Used for JSON ingests: APIs, HTTP collectors, cloud sources, Strata Logging Service with JSON, and so on. You can use json_extract_scalar(), JSON arrow syntax, and similar without parsing from a string.
  • _raw - An internal field, including in xdr_data and some native Palo Alto datasets. It reflects XSIAM’s internal view of the event. Exact use varies by dataset.
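The practical difference between the text and JSON fields can be illustrated in Python; the XQL functions named above have rough stdlib analogues, and the sample payloads below are made up.

```python
import re

# Illustrative payloads; field names follow the breakdown above.
raw_log = "<134>Oct 10 12:00:00 fw01 action=allow src=10.0.0.5"  # like _raw_log (string)
raw_json = {"action": "allow", "src": "10.0.0.5"}                # like _raw_json (object)

# Text fields such as _raw_log need string functions (regextract()-style in XQL):
src_from_log = re.search(r"src=(\S+)", raw_log).group(1)

# JSON fields such as _raw_json can be addressed directly
# (json_extract_scalar()-style in XQL), with no string parsing:
src_from_json = raw_json["src"]

print(src_from_log, src_from_json)  # 10.0.0.5 10.0.0.5
```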

When Each Applies

The field you see is determined by two factors:

  • The ingestion path (Broker VM, HTTP collector, XDR collector, Cribl, and so on), and
  • The format (declared or detected).

In general:

Incoming Data Format                  | Typical Ingestion                       | XSIAM Target Field | Data Type
Unstructured Text (Syslog, CEF, LEEF) | Broker VM, Syslog Collector             | _raw_log           | String
Structured JSON                       | API, HTTP Collector, Cloud Integrations | _raw_json          | Object
Agent-Native Data                     | XDR Collector / Native Agent            | _raw               | Internal/Varies

Cribl and XSIAM Resources