
dedup

The dedup operator filters out duplicate events.

You need to specify the field that you want to inspect for duplicates (FieldName). When that field has the same value in multiple events, those events are considered duplicates.

You can also specify multiple fields. In that case, events are considered duplicates only if all of the specified fields are duplicated. For example, if you specify fields named Phone and Email, events are considered duplicates only if they have an identical Phone value and an identical Email value.

By default, the dedup operator keeps only the first event in each set of duplicates and drops the rest. You can change this behavior by specifying the number of duplicates to keep (NumberOfDuplicatesToKeep).

When looking for duplicates, the dedup operator compares only events whose timestamps fall within 30 seconds of each other. You can specify a different time window (TimeWindow).
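The window-and-limit behavior described above can be sketched in Python. This is a hypothetical model of the semantics for illustration only, not the actual query engine; the event dictionaries, the `_time` field, and the function name are assumptions, and the defaults mirror the description above (keep 1 duplicate, 30-second window):

```python
def dedup(events, fields, num_duplicates=1, time_window=30.0):
    """Model of dedup semantics (illustrative sketch, not the real engine).

    events: iterable of dicts, each with a numeric '_time' (seconds) plus fields.
    fields: field names whose combined values define a duplicate.
    Keeps at most num_duplicates events per key within each time window.
    """
    kept = []
    state = {}  # key -> (window_start_time, events_kept_in_this_window)
    for event in sorted(events, key=lambda e: e["_time"]):
        # Events are duplicates only if ALL specified fields match.
        key = tuple(event.get(f) for f in fields)
        start, count = state.get(key, (None, 0))
        if start is None or event["_time"] - start > time_window:
            # Outside the window: the limit is released, so this event passes
            # through and starts a new window for its key.
            state[key] = (event["_time"], 1)
            kept.append(event)
        elif count < num_duplicates:
            # Still inside the window and under the limit: keep it.
            state[key] = (start, count + 1)
            kept.append(event)
        # Otherwise: a duplicate beyond the limit, so it is filtered out.
    return kept
```

For example, with the defaults, three events sharing the same Name at t=0, t=10, and t=50 yield two results: t=10 is dropped as a duplicate of t=0, while t=50 falls outside the 30-second window and passes through.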

Syntax

Scope | dedup [time_window=TimeWindow] [num_duplicates=NumberOfDuplicatesToKeep] by FieldName [, ...]

Arguments

  • TimeWindow (int): The time interval, in seconds, over which this operator enforces the NumberOfDuplicatesToKeep limit on the specified FieldName(s). Once execution continues beyond this interval, dedup releases the limit, allowing the next duplicate event to pass through to the results. Default: 30 seconds.
  • NumberOfDuplicatesToKeep (int or expression): The number of duplicate events to keep. Default: 1.
  • FieldName: The name of the field that you want to inspect for duplicates. Allowed formats: fieldName or ["field name"]. Separate multiple fields with a comma: fieldName1, ["field name 2"], ....

Results

Filters out events identified as duplicates.

Examples

Filter out events that have the same value in the Name field.

dataset=myDataset
| dedup by Name

Filter out events that have the same values in the corresponding Name, Home address, and Work address fields.

dataset=myDataset
| dedup by Name, ["Home address"], ["Work address"]

Filter out events that have duplicate Name values and whose timestamps fall within 60 seconds of each other. If there are more than 5 such events in a window, keep only the first 5.

dataset=myDataset
| dedup time_window=60 num_duplicates=5 by Name

Generate dummy test events, assign each a random number between 0 and 9, and keep only the first event for each distinct randomNumber value.

dataset=$vt_dummy event<1000
| extend randomNumber=rand(10)
| dedup by randomNumber