Regex Extract
Regex Extract uses regex capture groups to pull structured data out of raw event strings.
Use this Function to:
- Structure raw logs: Turn long, unformatted strings into specific fields like a timestamp, severity level, or a message.
- Extract names and values from strings: Use regex patterns to identify and parse key-value pairs from an event into separate fields. This can be useful for automating key-value discovery, allowing you to find and create fields dynamically without manually defining every field name.
- Clean and reduce data: Isolate the important parts of a log message while dropping unnecessary data.
This Function uses the JavaScript (ECMAScript) regex engine.
How the Regex Extract Function Works
The Regex Extract Function processes data in the following order:
Ingest the event from the fields you supply: Using the field(s) you specify in the Source field setting, the Function retrieves the event string and converts the field to a string before applying the regex. This applies to numbers and booleans too, which means that `42` becomes `"42"`.
- The default Source field value is `_raw`, which means this Function will apply regex to the string contained in the `_raw` field of every event.
- If the Source field value is blank (null), the Function will silently fail.
Evaluate regex patterns: The Function compares the event string against each configured regex pattern, up to the iteration limit defined by the Max exec setting (default 100).
Process each successful match: For each match found, the Function processes the capture group as follows:
Named capture groups: Creates fields using the group name as the field name and the captured text as the value. For example, if your event contains `...disk_stack 2025-07-28 14:30...`, the pattern `(?<timestamp>\d{4}-\d{2}-\d{2})` extracts the date and creates a field called `timestamp` with the value `2025-07-28`. See Single Named Capture Group for a usage example.

Key-value capture pairs: Uses the special `_NAME_N` and `_VALUE_N` tokens within your regex string to dynamically extract multiple fields at once. When the Function identifies these tokens, it treats them as a coordinated pair: the text captured by `_NAME_N` becomes the field name, and the text captured by `_VALUE_N` becomes the corresponding value. For example, if an event contains `user=john`, the pattern `(?<_NAME_1>\w+)=(?<_VALUE_1>\w+)` identifies `user` as the name and `john` as the value, creating a new event field: `user: "john"`. See Key-Value Pairs From Multiple Fields for a usage example.
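In plain JavaScript terms, the name/value pairing behaves roughly like this (a sketch of the documented behavior, not Cribl's internals):

```javascript
// Sketch (not Cribl internals): apply the pattern from the example above,
// then use the _NAME_1 capture as the field name and _VALUE_1 as its value.
const field_match = "user=john".match(/(?<_NAME_1>\w+)=(?<_VALUE_1>\w+)/);
const field = { [field_match.groups._NAME_1]: field_match.groups._VALUE_1 };
// field is { user: "john" }
```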
Apply field name formatting or sanitization: The Function evaluates and modifies extracted field names using the regex defined in the Field name format expression setting.
Manage existing fields: The Function uses the Overwrite existing fields toggle to determine how to handle existing fields. When a field already exists in the event:
- If overwrite is toggled on, the Function replaces the field’s value.
- If overwrite is toggled off, the Function retains both values as an array containing the original and new values.
Return the enriched event: The enriched event is returned with all newly extracted fields added to the top of the event.
Considerations
When using this Function, keep these considerations in mind:
- Fields that start with `__` (double underscore) are special in Cribl Stream. They are ephemeral, which means any Function downstream can use them, but they will only be serialized to Cribl internal Destinations.
- Some source types don't have a `_raw` field. For those sources, choose another field that contains the event data you want to extract. For example, parquet format logs do not have a `_raw` field.
- If the Source field value is blank (null), the Regex Extract Function will silently fail.
- This Function uses the JavaScript (ECMAScript) regex engine. If you test patterns in external tools such as Regex101, select the ECMAScript (JavaScript) flavor. For more information, see Regex Flavor.
- The optional regex flags affect how matches are performed. In particular, the global match flag `g` causes Cribl Stream to continue checking the rest of the string after any match. If multiple matches occur, the resulting field will contain an array of matches. The other flags operate as standard in the ECMAScript regex library.

| Regex Flag | Purpose |
| --- | --- |
| `g` | Cribl Stream only: Continue checking the rest of the string after any match |
| `m` | Multiline |
| `i` | Case insensitive |
| `s` | Dot all |
| `y` | Sticky |
| `u` | Unicode |
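Because Cribl Stream follows ECMAScript semantics, the effect of the `g` flag can be reproduced in plain JavaScript (a sketch of the behavior, not Cribl's implementation):

```javascript
// With g, matching continues past the first hit; the repeated captures
// mirror how Cribl Stream collects multiple matches into an array.
const flagSample = "dc=23 dc=42 dc=7";
const values = [...flagSample.matchAll(/dc=(?<dc>\d+)/g)].map(m => m.groups.dc);
// values is ["23", "42", "7"]
```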
Configure the Regex Extract Function
To configure this Function:
Add a Source and Destination to begin generating data. See Integrations for more information.
Add a new Pipeline or open an existing Pipeline. Add or capture sample data to test your Pipeline as you build it. See Pipelines for more information.
At the top of the Pipeline, select Add Function and search for `Regex Extract`, then select it.

In the Regex Extract modal, configure the following general settings:
Filter: A JavaScript filter expression that selects which data to process through the Function. Defaults to `true`, meaning it evaluates all data events. If you don't want to use the default, you can use a JavaScript expression to apply this Function to specific data events. See Build Custom Logic to Route and Process Your Data for examples.

Description: A brief description of how this Function modifies your data to help other users understand its purpose later. Defaults to empty.
Final: When enabled, stops data from continuing to downstream Functions for additional processing. Default: Off. Toggle Final on if Regex Extract is the last Function in your Pipeline.
Regex: Regex literal. Must contain named capturing groups: `(?<foo>bar)`, where `foo` is the field name and `bar` is the pattern to match. Can contain special `_NAME_N` and `_VALUE_N` capturing groups, which extract both the name and value of a field: `(?<_NAME_0>[^\s=]+)=(?<_VALUE_0>[^\s]+)`. Defaults to empty. See Usage Examples for more information.

Additional regex: Select Add Regex to chain extra regex conditions. The expressions are evaluated in order from first to last.
Source field: The field on which to perform regex field extraction. This field supports nested addressing. Defaults to `_raw`, which means this Function will apply regex to the string contained in the `_raw` field of every event. If the Source field value is blank (null), the Function will silently fail.
Overwrite existing fields: Toggle to indicate whether to keep or overwrite existing field values. If toggled on, Regex Extract will replace existing field values with the new extracted value. If toggled off (default), the Function will convert existing fields to an array and keep both the existing and new values. For example, if you extract `src_ip` in an input Pipeline where the assigned value is `10.1.2.2`, and in a processing Pipeline with a value of `10.2.3.3`, then the resulting field is `["10.1.2.2", "10.2.3.3"]`.
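A minimal sketch of the keep-both behavior (an illustration of the documented semantics, not Cribl's actual code):

```javascript
// Hypothetical helper: add an extracted field, honoring the overwrite toggle.
function addField(event, name, value, overwrite) {
  if (overwrite || !(name in event)) {
    event[name] = value; // replace or create the field
  } else {
    event[name] = [].concat(event[name], value); // keep both as an array
  }
  return event;
}

const merged = addField({ src_ip: "10.1.2.2" }, "src_ip", "10.2.3.3", false);
// merged.src_ip is ["10.1.2.2", "10.2.3.3"]
```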
Configure additional settings as needed. For more information, see: Additional Settings.
Test that you configured the Function correctly by comparing sample incoming data with outgoing data in the Data Preview pane and Pipeline Diagnostics tool. See Data Preview and Metrics View for more information.
Additional Settings
Max exec: The maximum number of times the regex pattern should loop through the source field to find matches. This applies when the global match flag `g` is enabled or when using iterative capturing groups (`_NAME_N` / `_VALUE_N`). Named groups always execute once. Iterative groups will extract up to the specified limit. Defaults to 100.
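How an iteration cap like Max exec bounds extraction can be sketched in plain JavaScript (the cap value and loop are illustrative, not Cribl's code):

```javascript
// Stop iterating once the cap is reached, even if more matches remain.
const kvPattern = /(?<k>\w+)=(?<v>\w+)/g;
const maxExec = 2; // illustrative limit; Cribl Stream's default is 100
const out = {};
let hit;
let count = 0;
while (count < maxExec && (hit = kvPattern.exec("a=1 b=2 c=3")) !== null) {
  out[hit.groups.k] = hit.groups.v;
  count++;
}
// out is { a: "1", b: "2" } -- c=3 is never reached
```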
Field name format expression: A JavaScript expression used to transform or prefix the field names generated by `_NAME_N` capturing groups. Use the global `name` variable to reference the discovered field name. You can also access event data using `__e.<fieldName>`. For example, to add a prefix to all discovered fields, use: `pre_${name}`. To add a suffix, use: `${name}_XX`.
If left blank, Cribl Stream automatically sanitizes names by removing invalid characters (such as spaces or special symbols) to ensure they are valid JavaScript identifiers.
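The default sanitization can be approximated as follows (an assumed equivalent for illustration, not Cribl's exact rules):

```javascript
// Replace characters that are invalid in a JavaScript identifier with _
const sanitize = name => name.replace(/\W/g, "_");
sanitize("disk usage%"); // yields "disk_usage_"
```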
Usage Examples
This section provides a variety of examples of typical regex patterns that could be used with the Regex Extract Function.
Single Field From Simple Event
This example demonstrates how to isolate a single piece of data from a string of multiple key-value pairs. This pattern is ideal when you have a noisy log line but only need to promote one specific value into a searchable field for downstream analysis.
By using a named capture group (indicated by the (?<name>pattern) syntax), you tell Cribl Stream exactly what to call the new field and which part of the string to capture.
Sample event:
{
"_raw": "metric1=23, metric2=42, dc=23, abc=xyz"
}

Regex:
To extract only the `metric1` field, use: `metric1=(?<metric1>\d+)`
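You can check this pattern in any JavaScript console before adding it to the Function (a quick verification outside Cribl Stream):

```javascript
// The named group (?<metric1>...) becomes the key in match.groups.
const sample = "metric1=23, metric2=42, dc=23, abc=xyz";
const metricMatch = sample.match(/metric1=(?<metric1>\d+)/);
// metricMatch.groups.metric1 is "23"
```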
Resulting output:
{
"_raw": "metric1=23, metric2=42, dc=23, abc=xyz",
"metric1": "23"
}

Single Named Capture Group
This example illustrates how to use a Named Capture Group to isolate a specific piece of data. In this case, the goal is to extract a date from a larger string and promote it to a top-level field. This is the most common method for structuring specific parts of an event when you know the exact format of the data you need.
Sample event:
{
"_raw": "<134>1 2025-07-28T14:30:15Z host-01 conn_id=992 message=\"Connection accepted\""
}

Regex:
(?<timestamp>\d{4}-\d{2}-\d{2})
Resulting output:
The Function finds the pattern within the _raw event and creates a new field. The regex only looks for the date; the rest of the event remains intact.
{
"_raw": "<134>1 2025-07-28T14:30:15Z host-01 conn_id=992 message=\"Connection accepted\"",
"timestamp": "2025-07-28"
}

Extract Multiple Fields
This example covers the full schema extraction pattern. While the previous example showed how to grab a single value from a string, this pattern can add structure to an unstructured log line.
Use this approach when you have a predictable log format (like NGINX, Apache, or Syslog) and you want to transform the entire raw string into a set of discrete, labeled fields. By mapping every segment of the log to a named capture group, you turn a difficult-to-search text block into a fully indexed JSON object.
Sample event:
{
"_raw": "462559d4a487[471]: 172.23.0.6 - - [26/Feb/2024:20:22:38 +0000] \"GET /catalog/view/javascript/jquery/jquery-3.7.1.min.js HTTP/1.1\" 200 87533"
}Regex:
This regex pattern parses a log entry and decomposes the string into individual, searchable fields. It works by anchoring to the start of the line (^) and the end ($), ensuring it captures the entire log entry rather than just a fragment.
^(?<container_id>[^\s]+)\[(?<process_id>\d+)\]:\s+(?<remote_host>[^\s]+)\s+(?<remote_user>-)\s+(?<auth_user>-)\s+\[(?<timestamp>[^\]]+)\]\s+"(?<request_method>\w+)\s+(?<requested_url>[^\s]+)\s+(?<http_version>[^"]+)"\s+(?<status>\d+)\s+(?<bytes>.+)$
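Because Cribl Stream uses the ECMAScript engine, the same pattern can be verified in Node.js before deploying it (a verification sketch, run outside Cribl Stream):

```javascript
// Apply the full-schema pattern from above to the sample event.
const logLine = '462559d4a487[471]: 172.23.0.6 - - [26/Feb/2024:20:22:38 +0000] "GET /catalog/view/javascript/jquery/jquery-3.7.1.min.js HTTP/1.1" 200 87533';
const logPattern = /^(?<container_id>[^\s]+)\[(?<process_id>\d+)\]:\s+(?<remote_host>[^\s]+)\s+(?<remote_user>-)\s+(?<auth_user>-)\s+\[(?<timestamp>[^\]]+)\]\s+"(?<request_method>\w+)\s+(?<requested_url>[^\s]+)\s+(?<http_version>[^"]+)"\s+(?<status>\d+)\s+(?<bytes>.+)$/;
const groups = logLine.match(logPattern).groups;
// groups.remote_host is "172.23.0.6"; groups.status is "200"
```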
Resulting output:
When this regex is applied to the sample event, Cribl Stream populates the metadata with the following fields. Notice how the capture group names in the regex, such as `(?<remote_host>)`, become the keys in the output.
{
"_raw": "462559d4a487[471]: 172.23.0.6 - - [26/Feb/2024:20:22:38 +0000] \"GET /catalog/view/javascript/jquery/jquery-3.7.1.min.js HTTP/1.1\" 200 87533",
"container_id": "462559d4a487",
"process_id": "471",
"remote_host": "172.23.0.6",
"remote_user": "-",
"auth_user": "-",
"timestamp": "26/Feb/2024:20:22:38 +0000",
"request_method": "GET",
"requested_url": "/catalog/view/javascript/jquery/jquery-3.7.1.min.js",
"http_version": "HTTP/1.1",
"status": "200",
"bytes": "87533"
}

Key-Value Pairs From Multiple Fields
This example covers the iterative discovery pattern. While the previous examples mapped specific parts of a string to static field names, this pattern scans a large block of text and automatically extracts every key-value pair it finds.
This approach is essential for high-volume logs (like firewall or proxy data) where the number and order of fields can vary from one event to the next. Instead of writing a massive regex to account for every possible field, you can use the special iterative capture groups `_NAME_N` and `_VALUE_N`.
By combining this regex with a Field name format expression, you can dynamically transform the discovered keys (such as appending a suffix) to prevent field collisions or to identify the source of the data.
Sample event:
{
"_raw": "rec_type=71 rec_type_simple=RNA dest_port=443 snmp_out=0 netflow_src=\"00000000-0000-0000-0000-000000000000\" ssl_server_cert_status=\"Not Checked\" dest_ip=172.20.115.42 sec_intel_event=No mac_address=00:00:00:00:00:00 dest_bytes=3746 dest_autonomous_system=0 security_context=00000000000000000000000000000000 src_port=41925 web_app=Unknown url=https://outlook.ssg.petsmart.com url_reputation=\"Risk unknown\" first_pkt_sec=1543598207 vlan_id=0 ssl_flow_error=0 ssl_actual_action=Unknown has_ipv6=1 monitor_rule_6=N/A monitor_rule_7=N/A monitor_rule_4=N/A monitor_rule_5=N/A monitor_rule_2=N/A monitor_rule_3=N/A ips_count=0 monitor_rule_1=N/A dest_tos=0 src_ip=192.168.228.5 referenced_host=\"\" iface_ingress=DMZ3.30 monitor_rule_8=0 event_subtype=1 fw_rule_reason=N/A event_type=1003 ssl_version=Unknown dns_resp_id=0 sensor=ssg-inet-fpr-ftd-fw01 sec_zone_egress=Inside src_tos=0 client_app=\"SSL client\" snmp_in=0 user=Unknown ssl_flow_messages=0 iface_egress=inside http_referrer=\"\" src_pkts=0 event_desc=\"Flow Statistics\" event_usec=0 client_version=\"\" fw_rule_action=Allow ssl_cert_fingerprint=0000000000000000000000000000000000000000 ssl_url_category=0 file_count=0 sec_zone_ingress=DMZ3 instance_id=6 src_bytes=1013 src_ip_country=unknown ssl_cipher_suite=TLS_NULL_WITH_NULL_NULL user_agent=\"\" http_response=0 src_mask=0 dest_mask=0 sec_intel_ip=N/A netbios_domain=\"\" tcp_flags=0 dns_rec_id=0 fw_policy=\"SSG INET Access Control Policy\" last_pkt_sec=1543598207 legacy_ip_address=0.0.0.0 ip_proto=TCP connection_id=21378 dest_pkts=0 app_proto=HTTPS ssl_flow_status=Unknown ssl_rule_id=0 ssl_session_id=0000000000000000000000000000000000000000000000000000000000000000 dns_query=\"\" rec_type_desc=\"Connection Statistics\" url_category=Unknown fw_rule=\"Outbound Web\" src_autonomous_system=0 ssl_flow_flags=0 ip_layer=0 event_sec=1543598205 ssl_ticket_id=0000000000000000000000000000000000000000 sinkhole_uuid=00000000-0000-0000-0000-000000000000 
dest_ip_country=unknown ssl_expected_action=Unknown num_ioc=0 dns_ttl=0 ssl_policy_id=00000000000000000000000000000000 ssl_server_name=\"\""
}

Use a regex to extract all key-value pairs, then use the Field name format expression setting to append an `_XX` suffix to each extracted field:
- Regex: `(?<_NAME_0>[\w-]+)="?(?<_VALUE_0>(?<=")[^"]*|\S*)`
- Field name format expression: `${name}_XX`
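In plain JavaScript, the iterative extraction plus suffix formatting behaves roughly like this (a sketch of the documented behavior, applied to a shortened sample for readability):

```javascript
// Shortened sample; the real event carries many more pairs.
const kvSample = 'rec_type=71 ssl_server_cert_status="Not Checked" dest_ip=172.20.115.42';
const kvRegex = /(?<_NAME_0>[\w-]+)="?(?<_VALUE_0>(?<=")[^"]*|\S*)/g;
const fields = {};
for (const m of kvSample.matchAll(kvRegex)) {
  // Emulate the ${name}_XX format expression on each discovered name.
  fields[`${m.groups._NAME_0}_XX`] = m.groups._VALUE_0;
}
// fields: { rec_type_XX: "71", ssl_server_cert_status_XX: "Not Checked",
//           dest_ip_XX: "172.20.115.42" }
```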
Resulting output:
{
"_raw": "rec_type=71 rec_type_simple=RNA dest_port=443 snmp_out=0 netflow_src=\"00000000-0000-0000-0000-000000000000\" ssl_server_cert_status=\"Not Checked\" dest_ip=172.20.115.42 sec_intel_event=No mac_address=00:00:00:00:00:00 dest_bytes=3746 dest_autonomous_system=0 security_context=00000000000000000000000000000000 src_port=41925 web_app=Unknown url=https://outlook.ssg.petsmart.com url_reputation=\"Risk unknown\" first_pkt_sec=1543598207 vlan_id=0 ssl_flow_error=0 ssl_actual_action=Unknown has_ipv6=1 monitor_rule_6=N/A monitor_rule_7=N/A monitor_rule_4=N/A monitor_rule_5=N/A monitor_rule_2=N/A monitor_rule_3=N/A ips_count=0 monitor_rule_1=N/A dest_tos=0 src_ip=192.168.228.5 referenced_host=\"\" iface_ingress=DMZ3.30 monitor_rule_8=0 event_subtype=1 fw_rule_reason=N/A event_type=1003 ssl_version=Unknown dns_resp_id=0 sensor=ssg-inet-fpr-ftd-fw01 sec_zone_egress=Inside src_tos=0 client_app=\"SSL client\" snmp_in=0 user=Unknown ssl_flow_messages=0 iface_egress=inside http_referrer=\"\" src_pkts=0 event_desc=\"Flow Statistics\" event_usec=0 client_version=\"\" fw_rule_action=Allow ssl_cert_fingerprint=0000000000000000000000000000000000000000 ssl_url_category=0 file_count=0 sec_zone_ingress=DMZ3 instance_id=6 src_bytes=1013 src_ip_country=unknown ssl_cipher_suite=TLS_NULL_WITH_NULL_NULL user_agent=\"\" http_response=0 src_mask=0 dest_mask=0 sec_intel_ip=N/A netbios_domain=\"\" tcp_flags=0 dns_rec_id=0 fw_policy=\"SSG INET Access Control Policy\" last_pkt_sec=1543598207 legacy_ip_address=0.0.0.0 ip_proto=TCP connection_id=21378 dest_pkts=0 app_proto=HTTPS ssl_flow_status=Unknown ssl_rule_id=0 ssl_session_id=0000000000000000000000000000000000000000000000000000000000000000 dns_query=\"\" rec_type_desc=\"Connection Statistics\" url_category=Unknown fw_rule=\"Outbound Web\" src_autonomous_system=0 ssl_flow_flags=0 ip_layer=0 event_sec=1543598205 ssl_ticket_id=0000000000000000000000000000000000000000 sinkhole_uuid=00000000-0000-0000-0000-000000000000 
dest_ip_country=unknown ssl_expected_action=Unknown num_ioc=0 dns_ttl=0 ssl_policy_id=00000000000000000000000000000000 ssl_server_name=\"\"",
"rec_type_XX": "71",
"rec_type_simple_XX": "RNA",
"dest_port_XX": "443",
"snmp_out_XX": "0",
"netflow_src_XX": "00000000-0000-0000-0000-000000000000",
"ssl_server_cert_status_XX": "Not Checked",
"dest_ip_XX": "172.20.115.42",
"sec_intel_event_XX": "No",
"url_XX": "https://outlook.ssg.petsmart.com",
"fw_policy_XX": "SSG INET Access Control Policy",
"app_proto_XX": "HTTPS"
}

Multi-Stage Extraction for Complex Events
This example builds on the previous one to demonstrate the nested extraction pattern. This strategy is essential for complex logs (like Check Point or Cisco ASA) where the data you need is encapsulated within a specific part of a larger header.
In many enterprise logs, the raw string contains a fixed header followed by a payload of key-value pairs wrapped in brackets or quotes. Trying to extract everything with a single regex often leads to matching errors or performance issues.
Instead, use a two-stage approach:
Isolation: Use a primary regex operation to extract the bracketed payload and store it in a temporary, internal field (prefixed with `__`).

Iterative parsing: Use a secondary regex operation targeting that temporary field to break out the individual key-value pairs.
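The two stages can be sketched end-to-end in plain JavaScript (an illustration of the pattern using a trimmed event, not Cribl's implementation):

```javascript
// Trimmed sample; the real Check Point event carries many more pairs.
const rawEvent = '<134>1 2020-12-22T17:06:08Z CORP_INT_NLB CheckPoint 18160 - [action:"Accept"; proto:"17"; src:"192.168.79.22"; ]';

// Stage 1 (isolation): capture the bracketed payload into __fields.
const __fields = rawEvent.match(/\[(?<__fields>.*)\]/).groups.__fields;

// Stage 2 (iterative parsing): break __fields into key-value pairs.
const parsed = {};
for (const m of __fields.matchAll(/(?<_NAME_0>[^ :]+):(?<_VALUE_0>[^;]+);/g)) {
  parsed[m.groups._NAME_0] = m.groups._VALUE_0;
}
// parsed: { action: '"Accept"', proto: '"17"', src: '"192.168.79.22"' }
// Note the values retain their quotes, matching the documented output.
```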
Sample event:
{
"_raw": "<134>1 2020-12-22T17:06:08Z CORP_INT_NLB CheckPoint 18160 - [action:\"Accept\"; conn_direction:\"Internal\"; flags:\"4606212\"; ifdir:\"inbound\"; ifname:\"bond2.1025\"; logid:\"0\"; loguid:\"{0x5fe25889,0x0,0x80ad57cd,0xeb91c0c3}\"; origin:\"192.168.20.54\"; originsicname:\"CN=TST32-VSX0-FW-DC-01_tst302-shd,O=CORP-SEC-SHRD-CMA..t7xpcz\"; sequencenum:\"3\"; time:\"1608656768\"; version:\"5\"; __policy_id_tag:\"product=VPN-1 & FireWall-1[db_tag={15E4B45A-663B-5B49-BD59-CD9B9F21AA16};mgmt=SHRDFW01CON;date=1608236862;policy_name=TEST-SHRD-POL\\\\]\"; dst:\"192.168.79.20\"; log_delay:\"1608656768\"; layer_name:\"TEST-SHRD-POL Security\"; layer_uuid:\"e914c2f3-d7bd-4a77-8e7a-7a5e403447aa\"; match_id:\"1\"; parent_rule:\"0\"; rule_action:\"Accept\"; rule_uid:\"001ab86d-d201-4b61-9b64-0fede1a9f059\"; product:\"VPN-1 & FireWall-1\"; proto:\"17\"; s_port:\"45519\"; service:\"123\"; service_id:\"ntp-udp\"; src:\"192.168.79.22\"; ]"
}

Regex:
The first Regex Extract Function separates the actual data from the header information. It splits the string after the `CheckPoint 18160` string by capturing everything between the `[` and `]`:
- Regex: `\[(?<__fields>.*)\]`
- Source: `_raw`
The second Regex Extract Function extracts all key-value pairs:
- Regex: `(?<_NAME_0>[^ :]+):(?<_VALUE_0>[^;]+);`
- Source: `__fields`
Resulting output:
{
"_raw": "<134>1 2020-12-22T17:06:08Z CORP_INT_NLB CheckPoint 18160 - [action:\"Accept\"; conn_direction:\"Internal\"; flags:\"4606212\"; ifdir:\"inbound\"; ifname:\"bond2.1025\"; logid:\"0\"; loguid:\"{0x5fe25889,0x0,0x80ad57cd,0xeb91c0c3}\"; origin:\"192.168.20.54\"; originsicname:\"CN=TST32-VSX0-FW-DC-01_tst302-shd,O=CORP-SEC-SHRD-CMA..t7xpcz\"; sequencenum:\"3\"; time:\"1608656768\"; version:\"5\"; __policy_id_tag:\"product=VPN-1 & FireWall-1[db_tag={15E4B45A-663B-5B49-BD59-CD9B9F21AA16};mgmt=SHRDFW01CON;date=1608236862;policy_name=TEST-SHRD-POL\\\\]\"; dst:\"192.168.79.20\"; log_delay:\"1608656768\"; layer_name:\"TEST-SHRD-POL Security\"; layer_uuid:\"e914c2f3-d7bd-4a77-8e7a-7a5e403447aa\"; match_id:\"1\"; parent_rule:\"0\"; rule_action:\"Accept\"; rule_uid:\"001ab86d-d201-4b61-9b64-0fede1a9f059\"; product:\"VPN-1 & FireWall-1\"; proto:\"17\"; s_port:\"45519\"; service:\"123\"; service_id:\"ntp-udp\"; src:\"192.168.79.22\"; ]",
"action": "\"Accept\"",
"conn_direction": "\"Internal\"",
"flags": "\"4606212\"",
"ifdir": "\"inbound\"",
"ifname": "\"bond2.1025\"",
"logid": "\"0\"",
"loguid": "\"{0x5fe25889,0x0,0x80ad57cd,0xeb91c0c3}\"",
"origin": "\"192.168.20.54\"",
"dst": "\"192.168.79.20\"",
"layer_name": "\"TEST-SHRD-POL Security\"",
"proto": "\"17\"",
"s_port": "\"45519\"",
"src": "\"192.168.79.22\"",
"__fields": "action:\"Accept\"; conn_direction:\"Internal\"; flags:\"4606212\"; ifdir:\"inbound\"; logid:\"0\"; loguid:\"{0x5fe25889,0x0,0x80ad57cd,0xeb91c0c3}\"; origin:\"192.168.20.54\"; dst:\"192.168.79.20\"; layer_name:\"TEST-SHRD-POL Security\"; proto: \"17\"; s_port: \"45519\"; src: \"192.168.79.22\""
}