Lookup

The Lookup Function enriches events with external fields, using lookup table files in CSV, compressed .csv.gz, or binary .mmdb format.

Usage

Filter: Filter expression (JS) that selects data to feed through the Function. Defaults to true, meaning it evaluates all events.

Description: Simple description of this Function. Defaults to empty.

Final: If toggled to Yes, stops feeding data to the downstream Functions. Defaults to No.

Lookup file path (.csv, .csv.gz): Path to the lookup file. Select an existing file that you’ve uploaded via Cribl Stream’s UI at Knowledge > Lookups Library, or specify the path. You can reference environment variables via $, e.g.: $CRIBL_HOME/file.csv.

When you configure this field via a distributed deployment’s Leader Node, Cribl Stream will swap $CRIBL_HOME/groups/<groupname>/ for $CRIBL_HOME when validating whether the file exists. In this case, the default upload path changes from $CRIBL_HOME/data/lookups (single-instance deployments) to $CRIBL_HOME/groups/<groupname>/data/lookups/ (distributed deployments).

Match mode: Defines the format of the lookup file, and indicates the matching logic that will be performed. Defaults to Exact.

Match type: For CIDR and Regex Match modes, this attribute refines how to resolve multiple matches. First match will return the first matching entry. Most specific will scan all entries, finding the most specific match. All will return all matches in the output, as arrays. (Defaults to First match. Not displayed for Exact Match mode.)

Lookup fields (.csv): Field(s) that should be used to key into the lookup table.

  • Lookup field name in event: Exact field name as it appears in events. Nested addressing supported.
  • Corresponding field name in lookup: The field name as it appears in the lookup file. Defaults to the Lookup field name in event value. This input is optional.

Case-Sensitive / Multiple Matches

Lookups are case-sensitive by default. (See the Ignore case option below.)

If the lookup file contains duplicate key names with different values, all Match modes of this Function will use only the value in the key’s final instance, ignoring all preceding instances.
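The duplicate-key behavior described above can be modeled with a short Python sketch. This is not Cribl Stream's implementation: the file contents, field names, and the `load_exact_table` helper are all hypothetical, and the sketch only mirrors the documented outcome that the final instance of a key wins.

```python
import csv
import io

# Hypothetical lookup content: the key "web01" appears twice.
LOOKUP_CSV = """host,owner
web01,alice
db01,bob
web01,carol
"""

def load_exact_table(text, key_field):
    """Build a key -> row dict; a later row overwrites an earlier one."""
    table = {}
    for row in csv.DictReader(io.StringIO(text)):
        table[row[key_field]] = row
    return table

table = load_exact_table(LOOKUP_CSV, "host")
print(table["web01"]["owner"])
```

If you need every value for a repeated key, deduplicate or merge the rows before uploading the file, because earlier instances are not visible to the lookup.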

Output field(s): Field(s) to add to events after matching the lookup table. Defaults to all if not specified.

  • Output field name from lookup: Field name, as it appears in the lookup file.
  • Lookup field name in event: Field name to add to event. Defaults to the lookup field name. This input is optional. Nested addressing is supported.

Advanced Settings

Reload period (sec): To periodically check the underlying file for mod-time changes, and reload the file if necessary, change the default -1 value (disabled) to a positive integer representing the check interval in seconds.
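As a rough illustration of a mod-time check, here is a minimal Python sketch. The `ReloadingLookup` class, the file name, and the sample rows are hypothetical, and Cribl Stream's actual reload logic is not shown in this document. The sketch re-reads the file only when its modification time differs from the one recorded at the last load.

```python
import os
import tempfile

class ReloadingLookup:
    """Hypothetical helper: re-reads a file only when its mod time changes."""

    def __init__(self, path):
        self.path = path
        self.mtime = None
        self.lines = []

    def check(self):
        """Call once per reload period; returns True if the file was re-read."""
        mtime = os.stat(self.path).st_mtime_ns
        if mtime == self.mtime:
            return False
        with open(self.path, encoding="utf-8") as fh:
            self.lines = fh.read().splitlines()
        self.mtime = mtime
        return True

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "threats.csv")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("ip,list\n203.0.113.7,blocklist\n")

    lookup = ReloadingLookup(path)
    first = lookup.check()   # initial load
    second = lookup.check()  # file unchanged, so no reload

    # Simulate an external process (e.g., a cron job) updating the file.
    with open(path, "a", encoding="utf-8") as fh:
        fh.write("198.51.100.9,blocklist\n")
    st = os.stat(path)
    # Push the mod time forward so the change is visible on coarse clocks.
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns + 1_000_000_000))
    third = lookup.check()   # mod time changed, so the file is re-read

print(first, second, third)
```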

In distributed deployments, enabling a Reload period can generate conflicts with configuration updates, causing Pipelines to skip executing some Lookup Functions. Cribl recommends that you enable it only for lookup files not managed by Cribl Stream’s UI, or lookup files that can be updated by an external process. (E.g., a threat list that you update via a cron job.)

For lookup files that are managed by Cribl Stream’s UI, a distributed Cribl Stream deployment will override this setting as necessary – skipping checks to prevent conflicts that could trigger skipped lookups. These restrictions do not apply to single-instance deployments.

Ignore case: Ignore case when performing Match mode: Exact lookups. Defaults to No.
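The effect of this option on an Exact lookup can be sketched in Python as follows. The `exact_lookup` function and the sample table are hypothetical and serve only to contrast the two settings; they do not reflect how Cribl Stream stores lookup keys.

```python
def exact_lookup(table, value, ignore_case=False):
    """Return the row keyed by value, optionally ignoring case."""
    if ignore_case:
        # A real implementation would fold keys once at load time;
        # folding per call keeps this sketch short.
        folded = {key.lower(): row for key, row in table.items()}
        return folded.get(value.lower())
    return table.get(value)

table = {"Web01": {"owner": "alice"}}

print(exact_lookup(table, "web01"))                    # default: case-sensitive
print(exact_lookup(table, "web01", ignore_case=True))  # Ignore case: Yes
```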

Add to raw event: Whether to append the looked-up values to the _raw field, as key=value pairs. Defaults to No.
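A minimal Python sketch of appending looked-up values to _raw is shown below. The `add_to_raw` helper is hypothetical. The text above specifies only key=value pairs, so the single-space separator and the absence of quoting are assumptions made for illustration.

```python
def add_to_raw(event, looked_up):
    """Return a copy of event with looked-up fields added and appended to _raw.

    Assumption: pairs are separated by single spaces and values are unquoted.
    """
    out = dict(event, **looked_up)
    pairs = " ".join(f"{key}={value}" for key, value in looked_up.items())
    if pairs:
        out["_raw"] = f"{event.get('_raw', '')} {pairs}".strip()
    return out

event = {"_raw": "login ok"}
result = add_to_raw(event, {"location": "US"})
print(result["_raw"])
```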

Examples

Example 1: Regex Lookups

Assign a sourcetype field to events if their _raw field matches a particular regex.

paloalto.csv
regex,sourcetype
"^[^,]+,[^,]+,[^,]+,THREAT",pan:threat
"^[^,]+,[^,]+,[^,]+,TRAFFIC",pan:traffic
"^[^,]+,[^,]+,[^,]+,SYSTEM",pan:system

Match mode: Regex

Match type: First match

When using the Lookup Function with Regex and First match, ensure that your lookup file contains no empty lines, not even at the bottom. Any empty row will cause the Function to always return true.

Lookup field name in event: _raw

Corresponding field name in lookup: regex
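The warning about empty lines can be illustrated with a short Python sketch. It rests on an assumption that the text above does not confirm: that an empty row is read as an empty pattern. An empty regex matches any string, so a scan that stops at the first matching row would never reach the rows after it. The `first_regex_match` helper and the `unexpected` value are hypothetical.

```python
import re

def first_regex_match(rows, text):
    """Return the value of the first row whose pattern matches text."""
    for pattern, value in rows:
        if re.search(pattern, text):
            return value
    return None

rows = [
    ("", "unexpected"),  # assumed result of an empty row: an empty pattern
    ("^[^,]+,[^,]+,[^,]+,THREAT", "pan:threat"),
]

# The empty pattern matches even a line that is not a THREAT record.
print(first_regex_match(rows, "a,b,c,TRAFFIC"))
# Without the empty row, the same line matches nothing.
print(first_regex_match(rows[1:], "a,b,c,TRAFFIC"))
```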

Events before and after
### BEFORE:

{"_raw": "Sep 20 13:03:55 PA-VM 1,2018/09/20 13:03:58,FOOBAR,TRAFFIC,end,2049,2018/09/20 13:03:58,34.217.108.226,10.0.0.102,34.217.108.226,10.0.2.65,splunk,,,incomplete,vsys1,untrusted,trusted,ethernet1/3,ethernet1/2,log-forwarding-default,2018/09/20 13:03:58,574326,1,53722,8088,53722,8088,0x400064,tcp,allow,296,296,0,4,2018/09/20 13:03:45,7,any,0,730277,0x0,United States,10.0.0.0-10.255.255.255,0,4,0,aged-out,0,0,0,0,,PA-VM,from-policy,,,0,,0,,N/A,0,0,0,0"}
{"_raw": "Sep 20 13:03:55 PA-VM 1,2018/09/20 13:03:58,FOOBAR,THREAT,end,2049,2018/09/20 13:03:58,34.217.108.226,10.0.0.102,34.217.108.226,10.0.2.65,splunk,,,incomplete,vsys1,untrusted,trusted,ethernet1/3,ethernet1/2,log-forwarding-default,2018/09/20 13:03:58,574326,1,53722,8088,53722,8088,0x400064,tcp,allow,296,296,0,4,2018/09/20 13:03:45,7,any,0,730277,0x0,United States,10.0.0.0-10.255.255.255,0,4,0,aged-out,0,0,0,0,,PA-VM,from-policy,,,0,,0,,N/A,0,0,0,0"}


### AFTER:

{"_raw": "Sep 20 13:03:55 PA-VM 1,2018/09/20 13:03:58,FOOBAR,TRAFFIC,end,2049,2018/09/20 13:03:58,34.217.108.226,10.0.0.102,34.217.108.226,10.0.2.65,splunk,,,incomplete,vsys1,untrusted,trusted,ethernet1/3,ethernet1/2,log-forwarding-default,2018/09/20 13:03:58,574326,1,53722,8088,53722,8088,0x400064,tcp,allow,296,296,0,4,2018/09/20 13:03:45,7,any,0,730277,0x0,United States,10.0.0.0-10.255.255.255,0,4,0,aged-out,0,0,0,0,,PA-VM,from-policy,,,0,,0,,N/A,0,0,0,0",
  "sourcetype": "pan:traffic"
  }
{"_raw": "Sep 20 13:03:55 PA-VM 1,2018/09/20 13:03:58,FOOBAR,THREAT,end,2049,2018/09/20 13:03:58,34.217.108.226,10.0.0.102,34.217.108.226,10.0.2.65,splunk,,,incomplete,vsys1,untrusted,trusted,ethernet1/3,ethernet1/2,log-forwarding-default,2018/09/20 13:03:58,574326,1,53722,8088,53722,8088,0x400064,tcp,allow,296,296,0,4,2018/09/20 13:03:45,7,any,0,730277,0x0,United States,10.0.0.0-10.255.255.255,0,4,0,aged-out,0,0,0,0,,PA-VM,from-policy,,,0,,0,,N/A,0,0,0,0",
  "sourcetype": "pan:threat"
  }
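
For readers who want to trace Example 1 by hand, the following Python sketch reproduces the matching logic under stated assumptions. It is not Cribl Stream's code: the helper names are hypothetical, the _raw values are shortened versions of the events above, and the standard `csv` module stands in for whatever parser the product uses.

```python
import csv
import io
import re

# Contents of paloalto.csv from the example above.
PALOALTO_CSV = '''regex,sourcetype
"^[^,]+,[^,]+,[^,]+,THREAT",pan:threat
"^[^,]+,[^,]+,[^,]+,TRAFFIC",pan:traffic
"^[^,]+,[^,]+,[^,]+,SYSTEM",pan:system
'''

def load_regex_rows(text):
    """Parse the CSV and compile each regex, keeping file order."""
    return [(re.compile(row["regex"]), row["sourcetype"])
            for row in csv.DictReader(io.StringIO(text))]

def lookup_sourcetype(rows, event):
    """First match: add sourcetype from the first row whose regex matches _raw."""
    for pattern, sourcetype in rows:
        if pattern.search(event["_raw"]):
            return dict(event, sourcetype=sourcetype)
    return event  # no match: the event passes through unchanged

rows = load_regex_rows(PALOALTO_CSV)
traffic = {"_raw": "Sep 20 13:03:55 PA-VM 1,2018/09/20 13:03:58,FOOBAR,TRAFFIC,end,2049"}
threat = {"_raw": "Sep 20 13:03:55 PA-VM 1,2018/09/20 13:03:58,FOOBAR,THREAT,end,2049"}

print(lookup_sourcetype(rows, traffic)["sourcetype"])
print(lookup_sourcetype(rows, threat)["sourcetype"])
```

The regexes are quoted in the CSV because they contain commas; without the quotes, each pattern would be split across several columns.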

Example 2: CIDR Lookups

Assign a location field to events if their destination_ip field matches a particular CIDR range.

paloaltoips.csv
range,location
10.0.0.0/24,San Francisco
10.0.0.0/16,California
10.0.0.0/8,US

Match mode: CIDR

Match type: See options below

Lookup field name in event: destination_ip

Corresponding field name in lookup: range

In Match mode: CIDR with Match type: Most specific, the lookup will implicitly search for matches from most specific to least specific. There is no need to pre-sort data.

Note that Match mode: CIDR with Match type: First match is likely the most performant option with large lookups. You can use it as an alternative to Most specific, provided the file is sorted with the most specific/relevant entries first. This mode still performs a table scan, top to bottom.
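
The three Match types can be compared with a small Python sketch built on the standard `ipaddress` module. The `cidr_lookup` function and its `match_type` strings are hypothetical names for illustration, not Cribl Stream's implementation. It assumes that "most specific" means the longest matching prefix.

```python
import ipaddress

# Rows from paloaltoips.csv, in file order.
ROWS = [
    ("10.0.0.0/24", "San Francisco"),
    ("10.0.0.0/16", "California"),
    ("10.0.0.0/8", "US"),
]

def cidr_lookup(rows, ip, match_type="first"):
    """Return the location(s) for ip under the given match type."""
    if match_type not in ("first", "most_specific", "all"):
        raise ValueError(f"unknown match type: {match_type}")
    addr = ipaddress.ip_address(ip)
    hits = []
    for cidr, location in rows:
        net = ipaddress.ip_network(cidr)
        if addr in net:
            if match_type == "first":
                return location  # stop scanning at the first hit
            hits.append((net.prefixlen, location))
    if not hits:
        return None
    if match_type == "most_specific":
        # Assumption: the longest prefix is the most specific match.
        return max(hits, key=lambda hit: hit[0])[1]
    return [location for _, location in hits]

for match_type in ("first", "most_specific", "all"):
    print(match_type, cidr_lookup(ROWS, "10.0.0.102", match_type))
```

With the rows in reverse order, `first` would return US while `most_specific` would still return San Francisco, which is why First match depends on how the file is sorted.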

Events before and after
### BEFORE:

{"_raw": "Sep 20 13:03:55 PA-VM 1, 2018/09/20 13:03:58,FOOBAR,TRAFFIC,end,2049,2018/09/20 13:03:58,34.217.108.226,10.0.0.102,34.217.108.226,10.0.2.65,splunk,,,incomplete,vsys1,untrusted,trusted,ethernet1/3,ethernet1/2,log-forwarding-default,2018/09/20 13:03:58,574326,1,53722,8088,53722,8088,0x400064,tcp,allow,296,296,0,4,2018/09/20 13:03:45,7,any,0,730277,0x0,United States,10.0.0.0-10.255.255.255,0,4,0,aged-out,0,0,0,0,,PA-VM,from-policy,,,0,,0,,N/A,0,0,0,0",
  "destination_ip": "10.0.0.102"
  }
  
### AFTER with Match Type: First Match
 
{"_raw": "Sep 20 13:03:55 PA-VM 1, 2018/09/20 13:03:58,FOOBAR,TRAFFIC,end,2049,2018/09/20 13:03:58,34.217.108.226,10.0.0.102,34.217.108.226,10.0.2.65,splunk,,,incomplete,vsys1,untrusted,trusted,ethernet1/3,ethernet1/2,log-forwarding-default,2018/09/20 13:03:58,574326,1,53722,8088,53722,8088,0x400064,tcp,allow,296,296,0,4,2018/09/20 13:03:45,7,any,0,730277,0x0,United States,10.0.0.0-10.255.255.255,0,4,0,aged-out,0,0,0,0,,PA-VM,from-policy,,,0,,0,,N/A,0,0,0,0",
  "destination_ip": "10.0.0.102",
  "location": "San Francisco"
  }
  
### AFTER with Match Type: Most Specific
 
{"_raw": "Sep 20 13:03:55 PA-VM 1, 2018/09/20 13:03:58,FOOBAR,TRAFFIC,end,2049,2018/09/20 13:03:58,34.217.108.226,10.0.0.102,34.217.108.226,10.0.2.65,splunk,,,incomplete,vsys1,untrusted,trusted,ethernet1/3,ethernet1/2,log-forwarding-default,2018/09/20 13:03:58,574326,1,53722,8088,53722,8088,0x400064,tcp,allow,296,296,0,4,2018/09/20 13:03:45,7,any,0,730277,0x0,United States,10.0.0.0-10.255.255.255,0,4,0,aged-out,0,0,0,0,,PA-VM,from-policy,,,0,,0,,N/A,0,0,0,0",
  "destination_ip": "10.0.0.102",
  "location": "San Francisco"
  }

### AFTER with Match Type: All
 
{"_raw": "Sep 20 13:03:55 PA-VM 1, 2018/09/20 13:03:58,FOOBAR,TRAFFIC,end,2049,2018/09/20 13:03:58,34.217.108.226,10.0.0.102,34.217.108.226,10.0.2.65,splunk,,,incomplete,vsys1,untrusted,trusted,ethernet1/3,ethernet1/2,log-forwarding-default,2018/09/20 13:03:58,574326,1,53722,8088,53722,8088,0x400064,tcp,allow,296,296,0,4,2018/09/20 13:03:45,7,any,0,730277,0x0,United States,10.0.0.0-10.255.255.255,0,4,0,aged-out,0,0,0,0,,PA-VM,from-policy,,,0,,0,,N/A,0,0,0,0",
  "destination_ip": "10.0.0.102",
  "location": [
    "San Francisco",
    "California",
    "US"
  ]}
