
extract

The extract operator extracts information from a field (which defaults to _raw), using either a predefined Parser or various Type options (such as delimited, regex, kvp, json, csv, elff, or clf).

To extract a substring from a source string, using an RE2 regular expression, see the extract scalar string function.

Syntax

... | extract [ SourceOption ] [ type=Type | parser=Parser ]
  • [ type=delimited [ DelimOption ] FieldList ]
  • [ [ type=regex ] [ RegexOption ] RegexLiteral [RegexLiteral[...]] ]
  • [ type=keyvalue [ FieldList ]]
  • [ type=json [ FieldList ]]
  • [ type=csv FieldList ]
  • [ type=elff FieldList ]
  • [ type=clf [ FieldList ]]

Rules

This operator automatically detects the type when you define RegexLiteral or DelimOption. For example, if you provide RegexLiteral, it’s assigned as type=regex. Or, if you specify DelimOption, it’s assigned as type=delimited.

You can define either DelimOption or RegexOption, not both.
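The detection rules above can be read as a small decision procedure. The following Python sketch illustrates that reading only; it is not the operator's implementation. The function name infer_type and its parameters are hypothetical, and letting an explicit type take precedence is an assumption of the sketch.

```python
from typing import Optional


def infer_type(explicit_type: Optional[str] = None,
               has_regex_literal: bool = False,
               has_delim_option: bool = False,
               has_regex_option: bool = False) -> Optional[str]:
    """Return the extraction type implied by the arguments (hypothetical helper)."""
    # DelimOption and RegexOption are mutually exclusive per the rules above.
    if has_delim_option and has_regex_option:
        raise ValueError("define either DelimOption or RegexOption, not both")
    # Assumption of this sketch: an explicit type=... takes precedence.
    if explicit_type is not None:
        return explicit_type
    # A RegexLiteral implies type=regex.
    if has_regex_literal:
        return "regex"
    # A DelimOption implies type=delimited.
    if has_delim_option:
        return "delimited"
    # The rules do not cover any other case, so nothing is inferred here.
    return None
```

For instance, infer_type(has_delim_option=True) yields "delimited", which matches the later example where type=delim could be omitted because delimChar is given.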

You can have multiple FieldLists. These will be merged into one larger field list.

A FieldList is optional for the key value, JSON, and CLF types. If no FieldList is specified:

  • Key value and JSON types extract all fields.

  • CLF type extracts clientip, ident, user, timestamp, request, status, and bytes.
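To make the default CLF field names concrete, here is a minimal Python sketch that splits one Common Log Format line into those seven fields. The regular expression, the sample log line, and the helper name parse_clf are assumptions for illustration; how the operator itself parses CLF is not described here.

```python
import re
from typing import Dict, Optional

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_clf(line: str) -> Optional[Dict[str, str]]:
    """Return the seven default CLF fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    return match.groupdict() if match else None


# Made-up sample line in Common Log Format.
sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
fields = parse_clf(sample)
print(fields)
```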

Arguments

Here are syntax details and options for each of this operator’s arguments.

SourceOption

Syntax – [ SourceIdent=FieldName ].

Defaults to source=_raw.

  • SourceIdent: Define as source, sourceField, or src.

  • FieldName: Name of the field to extract from. Case sensitive.

Type

Syntax – [ Type=TypeName ]. Example: type=csv

  • Type: Define as either type or format.

  • TypeName: See the following table.

TypeName    Define as
Delimiter   delimited, del, or delim
Regex       regex, regexp, or re
Key Value   kvp, keyvalue, or kv
JSON        json or js
CSV         csv
ELF         elff or extended
CLF         clf or common (case insensitive)

FieldList

Syntax – "FieldName" [, "FieldNameList" ].

Case-sensitive. Example: "field1,field2"

Alternative syntax – FieldIdent="FieldNameList".

Example: fields=field1,field2

  • FieldIdent: Define as either field or fields.

  • FieldNameList: Syntax – FieldName [, FieldNameList ]. Case sensitive.

Parser

Syntax – parser="KnownParserName".

Use a predefined Parser. Case-sensitive.

RegexLiteral

Syntax – @'NamedRegularExpression'.

At least one named capture group is required. Example: @'metric1=(?<metric1>\d+)'

To extract key-value pairs and assign them to fields, use the capture group names _NAME_0 and _VALUE_0. Example: @'\"(?<_NAME_0>\w+)\s\/(?<_VALUE_0>.*)\"'

Alternative syntax – RegexIdent=@'NamedRegularExpression'.

Example: regex=@'metric1=(?<metric1>\d+)'

  • RegexIdent: Define as either regex or regexp.
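The Python sketch below illustrates how named capture groups can turn into fields, including the _NAME_0 and _VALUE_0 convention. It is an interpretation of the description above, not the operator's code: extract_fields is a hypothetical helper, the sample strings are made up, and Python's re module spells named groups (?P<name>...) where the operator's examples use (?<name>...).

```python
import re
from typing import Dict


def extract_fields(pattern: str, text: str) -> Dict[str, str]:
    """Apply one named-group regex and return the resulting fields (hypothetical helper)."""
    match = re.search(pattern, text)
    if match is None:
        return {}
    # Keep only the groups that actually took part in the match.
    groups = {k: v for k, v in match.groupdict().items() if v is not None}
    fields: Dict[str, str] = {}
    for name, value in groups.items():
        if name.startswith("_NAME_"):
            # Pair _NAME_<n> with _VALUE_<n>: the captured name becomes the field name.
            suffix = name[len("_NAME_"):]
            paired = groups.get("_VALUE_" + suffix)
            if paired is not None:
                fields[value] = paired
        elif not name.startswith("_VALUE_"):
            # Ordinary named group: the group name is the field name.
            fields[name] = value
    return fields


metric_fields = extract_fields(r"metric1=(?P<metric1>\d+)", "metric1=42 metric2=7")
pair_fields = extract_fields(r'"(?P<_NAME_0>\w+)\s/(?P<_VALUE_0>.*)"', '"GET /index.html"')
print(metric_fields)
print(pair_fields)
```

In the second call the field name is not fixed in the pattern; it comes from whatever the _NAME_0 group captures.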

DelimOption

Syntax – [[ DelimCharOpt ] [ EscapeCharOpt ] [ QuoteCharOpt] [ NullValOpt ]]

Argument       Define as                        Syntax              Default Value  Example
DelimCharOpt   delimchar, delimiter, or delim   DelimCharOpt=char   ;              delimiter=:
EscapeCharOpt  escapechar, escape, or esc       EscapeCharOpt=char  \              esc=\
QuoteCharOpt   quotechar or quote               QuoteCharOpt=char   "              quotechar='
NullValOpt     nullvalue or undefined
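As a rough illustration of how the delimiter, escape, and quote characters interact, this Python sketch splits one delimited string with the standard csv module and pairs the values with a list of field names. The helper name extract_delimited and the sample strings are hypothetical, and the csv module's handling of edge cases may differ from the operator's.

```python
import csv
from typing import Dict, List


def extract_delimited(text: str, field_names: List[str], delim_char: str,
                      escape_char: str = "\\", quote_char: str = '"') -> Dict[str, str]:
    """Split text on delim_char and name the values after field_names (hypothetical helper)."""
    reader = csv.reader([text], delimiter=delim_char,
                        escapechar=escape_char, quotechar=quote_char)
    values = next(reader)
    # zip stops at the shorter sequence, so surplus values or names are dropped.
    return dict(zip(field_names, values))


names = ["field1", "field2", "field3"]
plain = extract_delimited("value1|value2|value3", names, "|")
quoted = extract_delimited('a|"b|c"|d', names, "|")
print(plain)
print(quoted)
```

A delimiter inside a quoted value, as in the second call, stays part of that value instead of starting a new field.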

RegexOption

Syntax – [[ IterationOpt ] [ OverwriteOpt ]]

Argument      Define as                            Syntax                Default Value  Example
IterationOpt  numiterations, iterations, or iter   IterationOpt=integer  100            iterations=11
OverwriteOpt  overwrite or ow                      OverwriteOpt=boolean  false          ow=true

Examples

This example extracts the domain name from an email address:

dataset=$vt_dummy event<10 
| extend email="test@example.com" 
| extract source=email @'@(?<domainName>.*$)'

This example extracts delimited field values:

dataset=$vt_dummy event<10 
| extend myField="value1|value2|value3" 
| extract source=myField type=delim delimChar="|" "field1,field2,field3"

This example extracts and names the three fields from the CSV-formatted source field foobar:

dataset=myDataset
| extract source=foobar type=csv "field1,field2,field3"

This example extracts all key-value pairs from the default source field _raw:

dataset=myDataset
| extract type=keyvalue

This example uses a Parser to extract all fields from mySource:

dataset=myDataset
| extract source="mySource" parser="Apache Common Log Format"

This example extracts delimited values from _raw, using | as the delimiter. The type specification could be omitted in this case, because specifying the delimChar option implies type=delimited:

dataset=myDataset
| extract type=delim delimChar="|" "field1,field2"

This example extracts the field foo from _raw (the default source) using a named regular expression that captures the digits following metric1=. The type specification could be omitted in this case, because providing a RegexLiteral implies type=regex:

dataset=myDataset
| extract type=regex @'metric1=(?<foo>\d+)'