Version: 3.2

Scheduling and Running

Once you've configured a Collector, you can either run it immediately ("ad hoc") to collect data, or schedule it to run on a recurring interval. Scheduling requires some extra configuration upfront, so we cover this option first.

For ad hoc collection, you can configure whether a job interrupted by an unintended LogStream shutdown will automatically resume upon LogStream restart.

Regardless of this configuration, explicitly restarting or stopping LogStream cancels any currently running jobs. This applies to the ./cribl restart and ./cribl stop CLI commands, as well as to the UI's global ⚙️ Settings (lower left) > Controls > Restart option.

A scheduled job interrupted by a shutdown (whether explicit or unintended) will not resume upon restart.

Schedule Configuration

Click Schedule beside a configured Collector to display the Schedule configuration modal. This provides the following controls.

Enabled: Slide to Yes to enable this collection schedule.

The scheduled job will keep running on this schedule indefinitely, unless you toggle Enabled back to No. The No setting preserves the schedule's configuration, but prevents its execution.

Cron schedule: A cron schedule on which to run this job.

  • The Estimated schedule below this field shows the next few collection runs, as examples of the cron interval you've scheduled.

Skippable: Skippable jobs can be delayed up to their next run time if the system is hitting concurrency limits. Defaults to Yes.

Skippable Jobs and Concurrency Limits

If toggled to Yes, the Skippable option obeys these concurrency limits in global ⚙️ Settings (lower left) > General Settings > Job Limits:

  • Concurrent Job Limit
  • Concurrent Scheduled Job Limit

See Job Limits for details on these and other limits that you can set in global ⚙️ Settings.

When the above limits delay a Skippable job:

  • The Skippable job will be granted slightly higher priority than non-Skippable jobs.

  • If the job receives resources to run before its next scheduled run, LogStream will run the delayed job, then snap back to the original cron schedule.

  • If resources do not free up before the next scheduled run: LogStream will skip the delayed run, and snap back to the original cron schedule.

Set Skippable to No if you absolutely must have all your data, for compliance or other reasons. In this case, LogStream will build up a backlog of jobs to run.

You can think of Skippable: No as behaving more like the TCP protocol, with Skippable: Yes behaving more like UDP.
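
The Skippable rules above can be summarized as a small decision function. This is a minimal sketch, not LogStream's scheduler: the function name, its parameters, and the returned labels are all hypothetical, and it assumes that a job whose resources free up exactly at the next scheduled run time counts as skipped.

```python
from datetime import datetime
from typing import Optional


def resolve_delayed_run(
    skippable: bool,
    resources_free_at: Optional[datetime],
    next_scheduled_run: datetime,
) -> str:
    """Hypothetical helper (not a LogStream API): classify a delayed run."""
    if not skippable:
        # Skippable: No. The run is never dropped; it waits in a backlog.
        return "backlog"
    if resources_free_at is not None and resources_free_at < next_scheduled_run:
        # Resources arrived before the next scheduled run: run the delayed job.
        return "run-late"
    # Resources never freed up in time: drop this run.
    return "skip"


next_run = datetime(2020, 4, 20, 13, 0)
print(resolve_delayed_run(True, datetime(2020, 4, 20, 12, 30), next_run))
print(resolve_delayed_run(True, None, next_run))
print(resolve_delayed_run(False, None, next_run))
```

In both Skippable outcomes ("run-late" and "skip"), later runs follow the original cron schedule.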

Max Concurrent Runs: Sets the maximum number of instances of this scheduled job that may simultaneously run.

All collection jobs are constrained by the following options in global ⚙️ Settings (lower left) > General Settings > Job Limits:

  • Concurrent Task Limit
  • Max Task Usage Percentage

Run Configuration and Shared Settings

Most of the remaining fields and options below are shared with the Run configuration modal, which you can open by clicking Run beside a configured Collector.

Mode

Depending on your requirements, you can schedule or run a collector in these modes:

Preview

In the Preview mode, a collection job will return only a sample subset of matching results (e.g., 100 events). This is very useful in cases when users need a data sample to:

  • Ensure that the correct data comes in.
  • Iterate on Filter expressions.
  • Capture a sample to iterate on Pipelines.

Schedule configuration omits the Preview option, because Preview is designed for immediate analysis and decision making. To configure a Scheduled Job with high confidence, you can first manually run Preview jobs with the same Collector, to verify that you're collecting the data you expect.

Preview Settings

In Preview mode, you can optionally configure these options:

Capture time (sec): Maximum time interval (in seconds) to collect data.

Capture up to N events: Maximum number of events to capture.

Where to capture: Select one of the options shown below. (Note that option 2. Before the Routes is disabled.) If not specified, this will default to 1. Before pre‑processing Pipeline.

Preview capture options

Discovery

In Discovery mode, a collection job will return only the list of objects/files to be collected, but none of the data. This mode is typically used to ensure that the Filter expression and time range are correct before a Full Run job collects unintended data.

Send to Routes

In Discovery mode, this slider enables you to send discovery results to LogStream Routes. Defaults to No.

This setting overrides the Collector configuration's Result Routing > Send to Routes setting.

Full Run

In Full Run mode, the collection job is fully executed by Worker Nodes, and will return all data matching the Run configuration.

Time Range

Set an Absolute or Relative time range for data collection.

The Relative option is the default, and is particularly useful for configuring scheduled jobs.

Absolute

Select the Absolute button to set fixed collection boundaries in your local time. Next, use the Earliest and Latest controls to set the start date/time and end date/time.

Relative

Select the Relative button to set collection boundaries relative to the current time. Next, use the Earliest and Latest controls to set start and end times like these:

  • Earliest example values: -1h, -42m, -42m@h
  • Latest example values: now, -20m, +42m@h

Relative Time Syntax

For Relative times, the Earliest and Latest controls accept the following syntax:

[+|-]<time_integer><time_unit>@<snap-to_time_unit>

To break down this syntax:

| Syntax Element | Values Supported |
|---|---|
| Offset | Specify: - for times in the past, + for times in the future, or omit with now. |
| <time_integer> | Specify any integer, or omit with now. |
| <time_unit> | Specify the now constant, or one of the following abbreviations: s[econds], m[inutes], h[ours], d[ays], w[eeks], mon[ths], q[uarters], y[ears]. |
| @<snap-to_time_unit> | Optionally, you can append the @ modifier, followed by any of the above <time_unit>s, to round down to the nearest instance of that unit. (See the next section for details.) |

LogStream validates relative time values using these rules:

  • Earliest must not be later than Latest.
  • Values without units get interpreted as seconds. (E.g., -1 = -1s.)
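
To make the syntax concrete, here is a minimal parsing sketch. It is not LogStream's parser: the names `parse_relative_time` and `RelativeTime` are hypothetical, only the shortest unit abbreviations are accepted, and a bare snap such as `@d` is assumed to mean "now, then snap".

```python
import re
from typing import NamedTuple, Optional

# "mon" is listed before "m" so that months are not read as minutes.
_REL_TIME = re.compile(
    r"(?:now|(?P<sign>[+-])(?P<amount>\d+)(?P<unit>mon|s|m|h|d|w|q|y)?)?"
    r"(?:@(?P<snap>mon|s|m|h|d|w[1-7]?|q|y))?"
)


class RelativeTime(NamedTuple):
    amount: int          # signed offset; 0 for "now" or a bare snap
    unit: str            # unit of the offset; defaults to seconds
    snap: Optional[str]  # snap-to unit, if an @ modifier was given


def parse_relative_time(expr: str) -> RelativeTime:
    """Hypothetical helper: split a relative time expression into its parts."""
    text = expr.strip()
    match = _REL_TIME.fullmatch(text)
    if not text or match is None:
        raise ValueError(f"not a relative time expression: {expr!r}")
    if match["amount"] is None:
        return RelativeTime(0, "s", match["snap"])
    amount = int(match["amount"])
    if match["sign"] == "-":
        amount = -amount
    # Values without units are interpreted as seconds.
    return RelativeTime(amount, match["unit"] or "s", match["snap"])


print(parse_relative_time("-42m@h"))
print(parse_relative_time("-1"))
```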

Snap-to-Time Syntax

The @ snap modifier always rounds down (backwards) from any specified time. This is true even in relative time expressions with + (future) offsets. For example:

  • @d snaps back to the beginning of today, 12:00 AM (midnight).

  • +128m@h looks forward 128 minutes, then snaps back to the nearest round hour. (If you specified this in the Latest field, and ran the Collector at 4:20 PM, collection would end at 6:00 PM. The expression would look forward to 6:28 PM, but snap back to 6:00 PM.)

Other options:

  • @w or @w7 to snap back to the beginning of the week – defined here as the preceding Sunday.
  • To snap back to other days of a week, use @w1 (Monday) through @w6 (Saturday).
  • @mon to snap back to the 1st of a month. (A bare m is the abbreviation for minutes.)
  • @q to snap back to the beginning of the most recent quarter – Jan. 1, Apr. 1, Jul. 1, or Oct. 1.
  • @y to snap back to Jan. 1.
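
The rounding-down behavior can be sketched as follows. This is an illustration under assumptions, not LogStream's implementation: `snap_back` is a hypothetical name, and the sketch uses `mon` for months, following the <time_unit> abbreviations listed earlier.

```python
from datetime import datetime, timedelta


def snap_back(moment: datetime, unit: str) -> datetime:
    """Hypothetical helper: round a time down to the start of the given unit."""
    if unit == "s":
        return moment.replace(microsecond=0)
    if unit == "m":
        return moment.replace(second=0, microsecond=0)
    if unit == "h":
        return moment.replace(minute=0, second=0, microsecond=0)
    midnight = moment.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit == "d":
        return midnight
    if unit.startswith("w"):
        # w and w7 mean Sunday; w1..w6 mean Monday..Saturday.
        target = int(unit[1:] or 7) % 7          # 0 = Sunday
        current = (midnight.weekday() + 1) % 7   # Python counts Monday as 0
        return midnight - timedelta(days=(current - target) % 7)
    if unit == "mon":
        return midnight.replace(day=1)
    if unit == "q":
        first_month = 3 * ((midnight.month - 1) // 3) + 1
        return midnight.replace(month=first_month, day=1)
    if unit == "y":
        return midnight.replace(month=1, day=1)
    raise ValueError(f"unsupported snap unit: {unit!r}")


# The +128m@h example: run at 4:20 PM, look forward 128 minutes, snap to the hour.
run_time = datetime(2020, 4, 20, 16, 20)
print(snap_back(run_time + timedelta(minutes=128), "h"))  # 2020-04-20 18:00:00
```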

Filter

This is a JavaScript filter expression that is evaluated against token values in the provided collector path (see below), and against the events being collected. The Filter value defaults to true, which matches all data, but this value can be customized almost arbitrarily.

For example, if a Filesystem or S3 collector is run with this Filter:

host=='myHost' && (source.endsWith('.log') || source.endsWith('.txt'))

...then only files/objects with .log or .txt extensions will be fetched. And, from those, only those events with host field myHost will be collected.
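
Grouping matters in expressions like this one: in JavaScript, && binds more tightly than ||, so the two extension checks need parentheses for the host condition to apply to both. The sketch below is a Python analog (and/or have the same relative precedence), with hypothetical function names, showing how the two groupings differ for an event from another host.

```python
def grouped(host: str, source: str) -> bool:
    # Host check applies to both extensions.
    return host == "myHost" and (source.endswith(".log") or source.endswith(".txt"))


def ungrouped(host: str, source: str) -> bool:
    # Parsed as: (host check and .log) or .txt, so any .txt source passes.
    return host == "myHost" and source.endswith(".log") or source.endswith(".txt")


print(grouped("otherHost", "app.txt"))    # False
print(ungrouped("otherHost", "app.txt"))  # True
```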

At the Filter field's right edge are a Copy button, an Expand button to open a validation modal, and a History button. For more extensive options, see Tokens for Filtering below.

Advanced Settings

Log Level: Level at which to set task logging. More-verbose levels are useful for troubleshooting jobs and tasks, but use them sparingly.

Lower task bundle size: Limits the bundle size for small tasks. E.g., bundle five 200KB files into one 1MB task bundle. Defaults to 1MB.

Upper task bundle size: Limits the bundle size for files above the Lower task bundle size. E.g., bundle five 2MB files into one 10MB task bundle. Files greater than this size will be assigned to individual tasks. Defaults to 10MB.

Reschedule tasks: Whether to automatically reschedule tasks that failed with non-fatal errors. Defaults to Yes; does not apply to fatal errors.

Max task reschedule: Maximum number of times a task can be rescheduled. Defaults to 1.

Job timeout: Maximum time this job will be allowed to run. Units are seconds, if not specified. Sample values: 30, 45s, or 15m. Minimum granularity is 10 seconds, so a 45s value would round up to a 50-second timeout. Defaults to 0, meaning unlimited time (no timeout).
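
The unit default and the 10-second granularity can be illustrated with a short sketch. The helper name is hypothetical, and only the s and m suffixes from the sample values are handled; other suffixes may exist but are not assumed here.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60}


def effective_timeout_seconds(value: str) -> int:
    """Hypothetical helper: convert a Job timeout value to whole seconds."""
    match = re.fullmatch(r"(\d+)([sm]?)", value.strip())
    if match is None:
        raise ValueError(f"unsupported timeout value: {value!r}")
    # A missing unit means seconds.
    seconds = int(match[1]) * _UNIT_SECONDS[match[2] or "s"]
    # Round up to the next multiple of 10 seconds; 0 stays 0 (no timeout).
    return (seconds + 9) // 10 * 10


for sample in ("30", "45s", "15m", "0"):
    print(sample, "->", effective_timeout_seconds(sample))
```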

Tokens for Filtering

Let's look at the options for path-based (basic) and time-based token filtering.

Basic Tokens

In collectors with paths, such as Filesystem or S3, LogStream supports path filtering via token notation. Basic tokens' syntax follows that of JS template literals: ${<token_name>} – where token_name is the field (name) of interest.

For example, if the path was set to /var/log/${hostname}/${sourcetype}/, you could use a Filter such as hostname=='myHost' && sourcetype=='mySourcetype' to collect data only from the /var/log/myHost/mySourcetype/ subdirectory.
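
As a rough illustration of how token values could be pulled out of a concrete path and then tested by a Filter, consider the sketch below. It is not LogStream's implementation: the helper names are hypothetical, and it assumes that each token matches exactly one path segment.

```python
import re
from typing import Dict, Optional


def template_to_regex(template: str) -> "re.Pattern[str]":
    """Hypothetical helper: turn ${token} placeholders into named regex groups."""
    # re.split with one capture group alternates: literal, token, literal, ...
    parts = re.split(r"\$\{(\w+)\}", template)
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)
        else:
            pattern += f"(?P<{part}>[^/]+)"
    return re.compile(pattern)


def extract_tokens(template: str, path: str) -> Optional[Dict[str, str]]:
    match = template_to_regex(template).match(path)
    return match.groupdict() if match else None


tokens = extract_tokens(
    "/var/log/${hostname}/${sourcetype}/",
    "/var/log/myHost/mySourcetype/app.log",
)
print(tokens)
# Python analog of the Filter hostname=='myHost' && sourcetype=='mySourcetype'
print(tokens["hostname"] == "myHost" and tokens["sourcetype"] == "mySourcetype")
```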

Time-based Tokens

In paths with time partitions, LogStream supports further filtering via time-based tokens. These tokens work directly with the Earliest and Latest boundaries: when a job runs against a path with time partitions, it traverses only a minimal superset of the directories required to satisfy the time range, and then filters the collected events by their _time values.

About Partitions and Tokens

LogStream processes time-based tokens as follows:

  • For each path, time partitions must be notated in descending order. So Year/Month/Day order is supported, but Day/Month/Year is not.
  • Paths may contain more than one partition. E.g., /my/path/2020-04/20/.
  • In a given path, each time component can be used only once. So /my/path/${_time:%Y}/${_time:%m}/${_time:%d}/... is a valid expression format, but /my/path/${_time:%Y}/${_time:%m}/${host}/${_time:%Y}/... (with a repeated Y) is not supported.
  • For each path, all extracted dates/times are considered in UTC.
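
To see what a "minimal superset" of directories can look like, the sketch below lists the daily partitions that overlap a time range, in UTC. It is an illustration only: `day_partitions` and its `layout` parameter are hypothetical, the layout is assumed to be partitioned by day, and both inputs must be timezone-aware datetimes.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def day_partitions(
    earliest: datetime,
    latest: datetime,
    layout: str = "/my/path/%Y/%m/%d/",
) -> List[str]:
    """Hypothetical helper: list daily directories overlapping [earliest, latest]."""
    if earliest > latest:
        raise ValueError("Earliest must not be later than Latest")
    # Partition dates are interpreted in UTC.
    day = earliest.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    end = latest.astimezone(timezone.utc)
    paths = []
    while day <= end:
        paths.append(day.strftime(layout))
        day += timedelta(days=1)
    return paths


print(day_partitions(
    datetime(2020, 4, 19, 22, 0, tzinfo=timezone.utc),
    datetime(2020, 4, 20, 3, 0, tzinfo=timezone.utc),
))
```

Each listed directory may still contain events outside the requested range, which is why event-level _time filtering follows the traversal.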

The following strptime format components are allowed:

  • 'Yy', for years
  • 'mBbj', for months
  • 'dj', for days
  • 'HI', for hours
  • 'M', for minutes
  • 'S', for seconds

Token Syntax

Time-based token syntax follows that of a slightly modified JS template literal: ${_time: <some_strptime_format_component>}. Examples:

| Filter | Matches |
|---|---|
| /my/path/${_time:%Y}/${_time:%m}/${_time:%d}/... | /my/path/2020/04/20/... |
| /my/path/${_time:year=%Y}/${_time:month=%m}/${_time:date=%d}/... | /my/path/year=2020/month=05/date=20/... |
| /my/path/${_time:%Y-%m-%d}/... | /my/path/2020-05-20/... |

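
One way to check a time-based path template is to expand it for a known date and compare the result with a real directory name. The sketch below does this with Python's strftime. The function name is hypothetical, the input must be a timezone-aware datetime, and the sketch assumes that any literal text inside a token (such as year=) is copied through unchanged.

```python
import re
from datetime import datetime, timezone

_TIME_TOKEN = re.compile(r"\$\{_time:([^}]*)\}")


def expand_time_tokens(template: str, moment: datetime) -> str:
    """Hypothetical helper: fill ${_time:<format>} tokens for one point in time."""
    utc = moment.astimezone(timezone.utc)  # partition dates are treated as UTC
    return _TIME_TOKEN.sub(lambda m: utc.strftime(m.group(1).strip()), template)


when = datetime(2020, 5, 20, tzinfo=timezone.utc)
print(expand_time_tokens("/my/path/${_time:year=%Y}/${_time:month=%m}/${_time:date=%d}/", when))
print(expand_time_tokens("/my/path/${_time:%Y-%m-%d}/", when))
```
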
Last updated by: Dritan Bitincka