Cribl LogStream – Docs


Download entire manual as PDF - v2.4.5

Regex Extract

Description

The Regex Extract Function extracts fields using regex named groups. (In Splunk, these will be index-time fields.) Fields that start with __ (double underscore) are special fields in Cribl LogStream. They are ephemeral: they can be used by any Function downstream, but will not be added to events, and will not exit the Pipeline.
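As a plain-JavaScript illustration of that rule (not LogStream's implementation), the hypothetical helper stripInternalFields below drops every __-prefixed key, which is roughly what happens to such fields when an event leaves the Pipeline:

```javascript
// Hypothetical helper, for illustration only: keys starting with "__" are
// treated as internal and removed before the event is emitted.
function stripInternalFields(event) {
  const out = {};
  for (const [key, value] of Object.entries(event)) {
    if (!key.startsWith('__')) out[key] = value;
  }
  return out;
}

// __fields can be read by downstream Functions, but is not emitted.
const inPipeline = { _raw: 'a=1 b=2', __fields: 'a=1 b=2', a: '1' };
const emitted = stripInternalFields(inPipeline);
console.log(emitted); // { _raw: 'a=1 b=2', a: '1' }
```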

Usage

Filter: Filter expression (JS) that selects data to be fed through the Function. Defaults to true, meaning that all events will be evaluated.

Description: Simple description of this Function. Defaults to empty.

Final: If toggled to Yes, stops data from being fed to downstream Functions. Defaults to No.

Regex: Regex literal. Must contain named capturing groups, e.g., (?<foo>bar). Can contain special _NAME_N and _VALUE_N capturing groups, which extract both the name and the value of a field, e.g., (?<_NAME_0>[^\s=]+)=(?<_VALUE_0>[^\s]+). Defaults to empty. See Examples below.
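To make the two kinds of groups concrete, here is a minimal sketch in plain JavaScript. The function fieldsFromMatch is hypothetical, and its handling of _NAME_N/_VALUE_N pairs is an assumed model of the behavior described above, not LogStream's source code:

```javascript
// Sketch (assumed semantics): convert the groups of one regex match into fields.
// Ordinary named groups map straight to fields; each _NAME_<n> group supplies a
// field name whose value comes from the matching _VALUE_<n> group.
function fieldsFromMatch(groups) {
  const fields = {};
  for (const [group, value] of Object.entries(groups)) {
    if (value === undefined) continue; // optional group that did not match
    const pair = /^_NAME_(\d+)$/.exec(group);
    if (pair) {
      const pairedValue = groups[`_VALUE_${pair[1]}`];
      if (pairedValue !== undefined) fields[value] = pairedValue;
    } else if (!/^_VALUE_\d+$/.test(group)) {
      fields[group] = value;
    }
  }
  return fields;
}

const namedMatch = /(?<foo>bar)/.exec('foobar');
console.log(fieldsFromMatch(namedMatch.groups)); // { foo: 'bar' }

const pairMatch = /(?<_NAME_0>[^\s=]+)=(?<_VALUE_0>[^\s]+)/.exec('dc=23 abc=xyz');
console.log(fieldsFromMatch(pairMatch.groups)); // { dc: '23' }
```

A single exec without the global flag only finds the first pair; applying the regex repeatedly is what the Max exec setting governs.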

Additional regex: Click + Add Regex to chain extra regex conditions.

Source field: Field on which to perform regex field extraction. Nested addressing is supported. Defaults to _raw.

Advanced Settings

Max exec: The maximum number of times to apply the Regex to the source field when the global flag is set, or when using _NAME_N and _VALUE_N capturing groups. Named capturing groups will always use a value of 1. Defaults to 100.
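A rough model of that cap in plain JavaScript: the hypothetical execAll below applies a regex globally and stops after maxExec matches. The loop is a standard RegExp.exec pattern, not LogStream internals:

```javascript
// Sketch: apply a regex globally, stopping after maxExec matches.
// maxExec mirrors the "Max exec" setting (default 100).
function execAll(regex, input, maxExec = 100) {
  const flags = regex.flags.includes('g') ? regex.flags : regex.flags + 'g';
  const re = new RegExp(regex.source, flags);
  const matches = [];
  let m;
  while (matches.length < maxExec && (m = re.exec(input)) !== null) {
    matches.push(m);
    if (m[0] === '') re.lastIndex++; // avoid looping forever on empty matches
  }
  return matches;
}

const pairs = /(?<_NAME_0>[^\s=]+)=(?<_VALUE_0>[^\s]+)/g;
const found = execAll(pairs, 'metric1=23 metric2=42 dc=23 abc=xyz', 2);
console.log(found.map((match) => match[0])); // [ 'metric1=23', 'metric2=42' ]
```

With the default cap of 100, the same call returns all four pairs.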

Field name format expression: JavaScript expression to format field names when _NAME_N and _VALUE_N capturing groups are used. E.g., to append _XX to all field names, use: `${name}_XX` (backticks are literal). If not specified, names will be sanitized using this regex: /^[_0-9]+|[^a-zA-Z0-9_]+/g. The original field name is available in the global variable name.
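The sketch below models both paths in plain JavaScript. The sanitizing pattern is the one quoted above; the replacement string is not documented here, so the underscore default in the hypothetical sanitizeName is an assumption. The format expression is modeled as a template literal with the original name bound to name:

```javascript
// Hypothetical sanitizer: the pattern comes from the docs, but the replacement
// string ('_' by default here) is an assumption.
function sanitizeName(name, replacement = '_') {
  return name.replace(/^[_0-9]+|[^a-zA-Z0-9_]+/g, replacement);
}

// A format expression such as `${name}_XX`, modeled as a function of name.
const formatName = (name) => `${name}_XX`;

// Use the format expression if one is given, otherwise sanitize.
function fieldName(rawName, format) {
  return format ? format(rawName) : sanitizeName(rawName);
}

console.log(fieldName('src-ip'));              // src_ip
console.log(fieldName('metric1', formatName)); // metric1_XX
```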

Overwrite existing fields: Whether to overwrite existing event fields with extracted values. If set to No (the default), Regex Extract will create array fields if applied multiple times, or if the fields already exist. (E.g., if src_ip is extracted in an input Pipeline, where it is assigned a value of 10.1.2.2, and is extracted again in a processing Pipeline with a value of 10.2.3.3, then the resulting field is ["10.1.2.2", "10.2.3.3"].) If toggled to Yes, extracted values replace the existing field values.
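The two behaviors can be sketched as follows. setField is a hypothetical helper, and appending to an array that already exists is an assumption about repeated extractions, not something this page spells out:

```javascript
// Sketch (not LogStream's code): overwrite replaces the value; otherwise an
// existing value and the new value are combined into an array.
function setField(event, name, value, overwrite = false) {
  if (overwrite || !(name in event)) {
    event[name] = value;
  } else {
    const prev = event[name];
    // Assumption: an existing array is extended rather than nested.
    event[name] = Array.isArray(prev) ? [...prev, value] : [prev, value];
  }
  return event;
}

const e = { src_ip: '10.1.2.2' };  // extracted in an input Pipeline
setField(e, 'src_ip', '10.2.3.3'); // extracted again, overwrite = No
console.log(e.src_ip); // [ '10.1.2.2', '10.2.3.3' ]
```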

Examples

Example 1

Assume a simple event that looks like this: metric1=23 metric2=42 dc=23 abc=xyz

Extract only the metric1 field:

Regex: metric1=(?<metric1>\d+)
Result: metric1:"23"
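The same extraction in plain JavaScript, as a sketch outside LogStream. Regex captures are strings, which matches the quoted "23" in the result:

```javascript
// Example 1 in plain JavaScript: the named group becomes a field on the event.
const event = { _raw: 'metric1=23 metric2=42 dc=23 abc=xyz' };
const match = /metric1=(?<metric1>\d+)/.exec(event._raw);
if (match) Object.assign(event, match.groups);
console.log(event.metric1, typeof event.metric1); // 23 string
```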

Example 2

Use the first line of the sample here:
https://github.com/weeb-cribl/cribl-samples/blob/master/parser/functions/parser/cisco_estreamer.log

Extract all k=v pairs, and add an _XX suffix to the end of the extracted fields:

Regex: (?<_NAME_0>[^\s]+)=(?<_VALUE_0>[^\s]+)
Field Name Format Expression: `${name}_XX`
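Since the eStreamer sample line is not reproduced on this page, the sketch below reuses the Example 1 event as a stand-in input. It applies the same global k=v regex and models the format expression as a template literal; it illustrates the idea and is not LogStream's code:

```javascript
// Stand-in input borrowed from Example 1 (not the eStreamer sample).
const raw = 'metric1=23 metric2=42 dc=23 abc=xyz';
const re = /(?<_NAME_0>[^\s]+)=(?<_VALUE_0>[^\s]+)/g;

const fields = {};
for (const m of raw.matchAll(re)) {
  // Field name format expression `${name}_XX`, applied to each _NAME_0 value.
  const name = m.groups._NAME_0;
  fields[`${name}_XX`] = m.groups._VALUE_0;
}
console.log(fields);
// { metric1_XX: '23', metric2_XX: '42', dc_XX: '23', abc_XX: 'xyz' }
```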

Result:

Example 3

This example uses syntax similar to Example 2's, but with a more complex event structure.

In the right Sample Data pane, click Paste and insert the following sample:

<134>1 2020-12-22T17:06:08Z CORP_INT_NLB CheckPoint 18160 - [action:"Accept"; conn_direction:"Internal"; flags:"4606212"; ifdir:"inbound"; ifname:"bond2.1025"; logid:"0"; loguid:"{0x5fe25889,0x0,0x80ad57cd,0xeb91c0c3}"; origin:"192.168.20.54"; originsicname:"CN=TST32-VSX0-FW-DC-01_tst302-shd,O=CORP-SEC-SHRD-CMA..t7xpcz"; sequencenum:"3"; time:"1608656768"; version:"5"; __policy_id_tag:"product=VPN-1 & FireWall-1[db_tag={15E4B45A-663B-5B49-BD59-CD9B9F21AA16};mgmt=SHRDFW01CON;date=1608236862;policy_name=TEST-SHRD-POL\]"; dst:"192.168.79.20"; log_delay:"1608656768"; layer_name:"TEST-SHRD-POL Security"; layer_uuid:"e914c2f3-d7bd-4a77-8e7a-7a5e403447aa"; match_id:"1"; parent_rule:"0"; rule_action:"Accept"; rule_uid:"001ab86d-d201-4b61-9b64-0fede1a9f059"; product:"VPN-1 & FireWall-1"; proto:"17"; s_port:"45519"; service:"123"; service_id:"ntp-udp"; src:"192.168.79.22"; ]

This event is from a CheckPoint Firewall CMA system. With this type of event structure, properly extracting each event field into a separate metadata field requires two-stage processing, so we'll use two Regex Extract Functions.

The first Regex Extract Function splits the event to separate the actual data from the header information. We'll split after the CheckPoint 18160 string by capturing everything between the [ and ] into the ephemeral __fields field:

Regex: \[(?<__fields>.*)\]
Source: _raw
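A plain-JavaScript sketch of this first stage, run against a shortened excerpt of the sample above (most fields omitted). It shows why the greedy .* matters: it runs past the escaped \] inside __policy_id_tag to the final ]:

```javascript
// Shortened excerpt of the CheckPoint sample above (most fields omitted).
// String.raw keeps the backslash in \] as it appears in the event.
const raw = String.raw`<134>1 2020-12-22T17:06:08Z CORP_INT_NLB CheckPoint 18160 - [action:"Accept"; __policy_id_tag:"product=VPN-1 & FireWall-1[db_tag={15E4B45A-663B-5B49-BD59-CD9B9F21AA16};mgmt=SHRDFW01CON;date=1608236862;policy_name=TEST-SHRD-POL\]"; src:"192.168.79.22"; ]`;

// Stage 1: capture everything between the first "[" and the last "]".
const m = /\[(?<__fields>.*)\]/.exec(raw);
const event = { _raw: raw };
if (m) event.__fields = m.groups.__fields;

console.log(event.__fields.startsWith('action:"Accept";'));    // true
console.log(event.__fields.endsWith('src:"192.168.79.22"; ')); // true
```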

Next, add a second Regex Extract Function to extract all key:value pairs from __fields:

Regex: (?<_NAME_0>[^ :]+):(?<_VALUE_0>[^;]+);
Source: __fields
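The second stage, sketched in plain JavaScript on a shortened __fields value taken from the sample. It mirrors the _NAME_0/_VALUE_0 regex above; field sanitization and other LogStream behavior are left out:

```javascript
// Stage 2 input: a shortened __fields value (a few pairs from the sample).
const __fields = 'action:"Accept"; conn_direction:"Internal"; layer_name:"TEST-SHRD-POL Security"; src:"192.168.79.22"; ';

// Stage 2: every name:value; pair, using the regex from the doc with the g flag.
const re = /(?<_NAME_0>[^ :]+):(?<_VALUE_0>[^;]+);/g;
const fields = {};
for (const m of __fields.matchAll(re)) {
  fields[m.groups._NAME_0] = m.groups._VALUE_0;
}
console.log(fields);
// { action: '"Accept"', conn_direction: '"Internal"', layer_name: '"TEST-SHRD-POL Security"', src: '"192.168.79.22"' }
```

With this pattern the extracted values keep their surrounding double quotes. Also, under plain JavaScript regex semantics, the __policy_id_tag value in the full sample contains semicolons of its own, so it is cut off at the first one.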

Results:


For further examples, see Using Cribl to Analyze DNS Logs in Real Time – Part 2.



