Lookups are data tables that can be used in Cribl LogStream to enrich events as they're processed by the Lookup Function. You can access the Lookups library under Knowledge > Lookups, which provides a management interface for all lookups.
This library is searchable, and each lookup can be tagged as necessary. There's full support for .csv files. Compressed files are supported, but must be in gzip format (.gz extension). You can add files in MaxMind database (.mmdb) binary format, but you cannot edit these binary files through LogStream's UI.
In single-instance deployments, all files handled by the interface are stored in $CRIBL_HOME/data/lookups. In distributed deployments, the storage path on the Master Node is $CRIBL_HOME/groups/<groupname>/data/lookups/ for each Worker Group.
For large and/or frequently replicated lookup files, you might want to bypass the Lookups Library UI and instead manually place the files in a different location. This can both reduce deploy traffic and prevent errors with LogStream's default Git integration. For details, see Managing Large Lookups.
To upload or create a new lookup file, click + Add New, then click the appropriate option from the drop-down.
To re-upload, expand, edit, or delete an existing .csv or .gz lookup file, click its row on the Lookups page. (No editing option is available for .mmdb files; you can only re-upload or delete these.)
In the resulting modal, you can edit files in Table or Text mode. However, Text mode is disabled for files larger than 1 MB.
For large lookup files, you'll need to provide extra memory beyond basic requirements for LogStream and the OS. To determine how much extra memory to add to a Worker Node for a lookup, use this formula:
Lookup file's uncompressed size (MB) * 2.25 (to 2.75) * numWorkerProcesses = Extra RAM required for lookup
For example, if you have a lookup file that is 1 GB (1,000 MB) on disk, and three Worker Processes, you could use the midpoint value of 2.50 as the multiplier:
1,000 * 2.50 * 3 = 7,500
In this case, the Node's server or VM would need an extra 7,500 MB (7.5 GB) to accommodate the lookup file across all three Worker Processes.
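The calculation above can be sketched as a small helper. This is purely illustrative arithmetic; the function name and signature are not part of LogStream:

```python
def extra_ram_mb(uncompressed_mb, num_worker_processes, multiplier=2.50):
    """Estimate the extra RAM (in MB) needed to hold a lookup file in
    memory across all Worker Processes. The multiplier ranges from
    roughly 2.25 (more columns) to 2.75 (fewer columns); 2.50 is the
    midpoint."""
    return uncompressed_mb * multiplier * num_worker_processes

# A 1,000 MB lookup file with three Worker Processes, midpoint multiplier:
extra_ram_mb(1000, 3)  # → 7500.0 MB (7.5 GB)
```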
We've cited a squishy range of 2.25–2.75 for the multiplier, because we've found that it varies inversely with the number of columns in the lookup file:
- The fewer columns, the higher the extra memory overhead (2.75 multiplier).
- The more columns, the lower the overhead (2.25 multiplier).
In Cribl's testing:
- 5 columns required a multiplier of 2.75.
- 10 columns required a multiplier of only 2.25.
These are general (not exact) guidelines, and this multiplier depends primarily on the lookup table's number of columns: the memory overhead imposed by each additional row appears to decline when more columns are present in the data.
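If you want a starting multiplier for a column count between (or beyond) the two tested points, one hedged approach is to interpolate linearly between them and clamp to the cited range. The helper below is an illustration of that idea only, not a Cribl-provided formula:

```python
def estimate_multiplier(num_columns):
    """Rough multiplier estimate, linearly interpolated between the two
    tested data points (5 columns -> 2.75, 10 columns -> 2.25), and
    clamped to the cited 2.25-2.75 range outside that span."""
    m = 2.75 - 0.10 * (num_columns - 5)
    return max(2.25, min(2.75, m))

estimate_multiplier(5)   # → 2.75
estimate_multiplier(10)  # → 2.25
```

Remember that these remain general guidelines; validate any estimate against observed memory usage on your Worker Nodes.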