These docs are for Cribl Stream 4.1 and are no longer actively maintained.
See the latest version (4.11).
File Monitor
The File Monitor Source collects log (text) files, and generates events from lines or records in those files. This Source is available on Stream and Edge, but not on Cloud.
Available in Cribl.Cloud: No | Type: System | TLS Support: N/A | Event Breaker Support: Yes
Discovering and Filtering Files to Monitor
To produce its initial list of files to monitor, the File Monitor Source runs a discovery procedure at a configurable Polling interval. The Source then applies an Allowed list to filter the initial list down into its final form. Then, for each file on the list, the Source compares current state with previously-stored state. This comparison determines whether the File Monitor Source will actually watch a given file for a given polling interval, or just ignore the file.
In the simplest case, the Source discovers a file for which it has no stored state. This means that the file has just been created, and needs to be monitored. See the Examples for other possibilities, along with a description of the Status tab, which displays state information for all files being monitored.
How does the File Monitor Source discover files in the first place? You have the choice of two Discovery Modes: Auto or Manual.
In Auto mode, the Source automatically discovers files that running processes have open for writing. Auto mode collects logs from (the discovered) active files that match the path/allowlist.
Auto mode is useful to detect which files are being written to. For example, if you enter an allowlist of
*log
, Auto mode will find all the active logs within that directory, no matter what’s writing to it, and it will exclude any rotated logs.
Cribl Edge currently supports Auto mode only when running on Linux, not on Windows.
In Manual mode, the Source discovers files based on the specified directory and depth.
Manual mode is often used to collect logs in a folder where the process that creates them rotates them when they get large. For example, if you specify a path of
/var/log
and an allowlist of*/messages*
, Manual mode will capture the current/var/log/messages
, as well as older logs from/var/log/messages*
.
If you are using a tool like
rsync
to transfer files in chunks, the File Monitor Source might start collecting files before the transfer is complete. To prevent this, configure your tool to transfer files serially. (For example, use:rsync --append
.)
Configuring a File Monitor Source
From the top nav, click Manage, then select a Worker Group to configure. Next, you have two options:
To configure via the graphical QuickConnect UI, click Routing > QuickConnect (Stream) or Collect (Edge). Next, click Add Source at left. From the resulting drawer’s tiles, select [System and Internal >] File Monitor. Next, click either Add Destination or (if displayed) Select Existing. The resulting drawer will provide the options below.
Or, to configure via the Routing UI, click Data > Sources (Stream) or More > Sources (Edge). From the resulting page’s tiles or left nav, select [System and Internal >] File Monitor. Next, click New Source to open a New Source modal that provides the options below.
General Settings
Enabled: Toggle this slider to Yes
to enable the Source.
Input ID: Enter a unique name to identify this File Monitor Source definition.
Discovery Mode: Use the buttons to select one of these options:
Auto: Tells the Source to automatically discover files that running processes have open for writing.
Manual: Tells the Source to discover the files within the Search path (a directory) that you specify, down to the Max depth. You must enter an absolute path (no wildcards) for the Search path, as File Monitor searches files in a single directory in Manual mode.
By default, the Max depth field is empty. This means that the Source will search subdirectories, and their subdirectories, and so on, without limit. If you specify
0
, the Source will discover only the top-level files within the Search path. For1
, the Source will discover files one level down.
Both modes allow you to set the Polling interval, which otherwise defaults to 10 seconds.
Optional Settings
Filename allowlist: Wildcard syntax, and the exclamation mark (!
) for negation, are allowed. For example, you can use !*cribl*access.log
to prevent the Source from discovering Cribl Stream’s own log files. The default filters are */log/*
and *log
.
Always think through how your Search path and Filename allowlist settings interact: If you’re not careful, you can inadvertently create overlapping monitor groups that deliver duplicates of some files. See Using Allowlists Effectively below.
Max age duration: Optionally, specify a maximum age of files to monitor. This Source will filter events with timestamps newer than the configured duration. (Where files don’t have a parsable timestamp, it will set their events’ timestamp to Now
.) Enter a duration in a format like 30s
, 4h
, 3d
, or 1w
. Defaults to an empty field, which applies no age filters.
Check file modification times: If toggled to Yes
, this Source will skip files with modification times older than the configured Max age duration.
When you set a Max age duration threshold, the Source will open every file and read through all its contents to seek a timestamp. For best performance, also enable the Check file modification times toggle to skip older files.
Collect from end: If toggled to Yes
, then upon Cribl Stream’s next startup, this Source will skip to the end of new files. (It will run discovery once, adding any newly discovered files to the state store as if they had already been consumed.) After that initial run, when this Source next discovers new files, it will read those new files from the head.
Tags: Optionally, add tags that you can use for filtering and grouping in Cribl Stream. Use a tab or hard return between (arbitrary) tag names.
Processing Settings
Event Breakers
Event Breaker rulesets: A list of event breaking rulesets that will be applied to the input data stream before the data is sent through the Routes. Defaults to System Default Rule
.
Event Breaker buffer timeout: How long (in milliseconds) the Event Breaker will wait for new data to be sent to a specific channel, before flushing out the data stream, as-is, to the Routes. Minimum 10
ms, default 10000
(10 sec.), maxiumum 43200000
(12 hours).
Fields
In this section, you can add Fields to each event, using Eval-like functionality.
Name: Field name.
Value: JavaScript expression to compute field’s value, enclosed in quotes or backticks. (Can evaluate to a constant.)
Pre-Processing
In this section’s Pipeline drop-down list, you can select a single existing Pipeline to process data from this input before the data is sent through the Routes.
Advanced Settings
Environment: If you’re using GitOps, optionally use this field to specify a single Git branch on which to enable this configuration. If empty, the config will be enabled everywhere.
Idle timeout: Time, in seconds, before an idle file is closed. Defaults to 300
sec. (5 minutes).
Connected Destinations
Select Send to Routes to enable conditional routing, filtering, and cloning of this Source’s data via the Routing table.
Select QuickConnect to send this Source’s data to one or more Destinations via independent, direct connections.
This Source defaults to QuickConnect.
Internal Fields
Cribl Stream uses a set of internal fields to assist in handling of data. These “meta” fields are not part of an event, but they are accessible, and Functions can use them to make processing decisions.
For the File Monitor Source, you can use internal fields to enrich events with container and host metadata. The following internal fields are available:
__baseFilename
__source
/source
__raw
__inputId
container_id
container_path
host_path
Monitoring Renamed Files’ State
The File Monitor Source uses a simple scheme to find “backlog” files corresponding to a matching file: Tell it to tail foo.log
, and it will look for foo.log.[0-9]
and scrape those, too.
This Source keeps hashes that correspond to the start point within the file, and to the last-read point. This way, if Cribl Stream is stopped, files are rotated, and Cribl Stream is restarted, the File Monitor Source can find the resume point in those backlog files.
Similarly, if you rename the file within the same backlog scheme (e.g., foo.log
to foo.log.0
), the File Monitor Source can resume at the right place. However, if you renamed foo.log
to bar.log
(a deviation from the expected naming scheme), this Source could not find its resume point.
Using Allowlists Effectively
Done properly, file monitoring collects exactly the files you need to see, without duplication. You want to avoid inadvertently creating overlapping monitor groups that deliver duplicates, because this can “run the meter” twice for affected files, potentially causing license issues.
The key is knowing how to craft allowlists.
Crafting Allowlists: Principles
To get the results you want from file monitoring, always apply the following principles.
Write allowlists to explicitly include the full path of files you want to collect.
The search path itself is not excluded from the match. For example, if you want to collect the file /var/log/messages
when you’ve set your search path to /var/log
, your filename allowlist needs to be either /var/log/messages
, /var/*/messages
, or */log/messages
.
Exclusions (!
) also need to address the full file path. For example, if you want to collect everything from /var/log/*
except for /var/log/apache/*
, your allowlist must include one of the following:
!/var/log/apache/*
!*/apache/*
!*/log/apache/*
Wildcards (*
) match any part of the file path:
*log
matches/var/foo/bar/myfile.log
.*.log
matches anything ending in.log
./var/log/*
matches anything in/var/log
, subject to the Max Depth setting.
Put explicit matches (and exclusions) at the beginning of the allowlist, ahead of wildcard matches.
For example, an allowlist of /var/log/*
followed by !/var/log/apache/*
will fail to exclude the file /var/log/apache/foo
because that file matches the first allowlist entry.
In allowlists, order matters!
Be as specific as possible, whenever possible, to prevent accidental matches.
Each of the following examples shows how to avoid matching unwanted files within the directories that you’re monitoring.
To collect /var/log/boot.log
(and older versions of boot.log.*
) as well as /var/log/messages
:
Search path | Filename allowlist |
---|---|
/var/log | /var/log/boot.log , /var/log/messages |
To collect everything in /var/log
while excluding /var/log/apache/*
:
Search path | Filename allowlist |
---|---|
/var/log | !/var/log/apache/* , /var/log/* |
Examples of Allowlists
Below are typical scenarios where good allowlist technique will save you from duplication headaches.
Example: Excluding the System’s Own Logs
Recall that you’ll often create a set of File Monitor Sources, each of which monitors one file. This is not the right approach in an example like this one, where Discovery Mode is set to Auto. Using Auto mode in a file monitor along with literally any other file monitor can cause unintended, duplicate collection. Use Auto mode carefully!
Here’s the example:
Suppose you added a File Monitor Source on a Linux machine, set your Discovery Mode to Auto, and specified your home directory as the Search path (e.g., /home/bogart/
).
You do not want to monitor Cribl Stream’s own log files, or any log files generated by Chrome, the web browser you’re using. To exclude them, you add !*cribl*access.log
and !*chrome*
to your Allowlist. You do want to monitor any other log files that the Source can discover, so you also add *log
.
After awhile, the Status tab looks like this:

Example: Fixing Overlapping Monitor Groups
Here’s an example of the kind of overlapping monitor groups you should avoid:
Search path | Max depth | Filename allowlist |
---|---|---|
/var | (none) | */log/* |
/var/log/apache | 0 | * |
The above configuration would collect all the files in /var/log/apache/
twice. A better solution would be the following:
Search path | Max depth | Filename allowlist |
---|---|---|
/var/log | (none) | !*/apache/* |
/var/log/apache | 0 | * |