extract
The extract operator extracts information from a field (which defaults to _raw), using either a predefined Parser or various Type options (such as delimited, regex, kvp, json, csv, elff, or clf).
To extract a substring from a source string, using an RE2 regular expression, see the
extractscalar string function.
Syntax
... | extract [ SourceOption ] [ type=Type | parser=Parser ][ type=delimited [ DelimOption ] FieldList ][ [ type=regex ] [ RegexOption ] RegexLiteral [RegexLiteral[...]] ][ type=keyvalue [ FieldList ]][ type=json [ FieldList ]][ type=csv FieldList ][ type=elff FieldList ][ type=clf [ FieldList ]]
Rules
This operator automatically detects the type when you define RegexLiteral or DelimOption. For example, if you provide RegexLiteral, it’s assigned as type=regex. Or, if you specify DelimOption, it’s assigned as type=delimited.
You can define either DelimOption or RegexOption, not both.
You can have multiple FieldLists. These will be merged into one larger field list.
A FieldList is optional for the key value, JSON, and CLF types. If no FieldList is specified:
Key value and JSON types extract all fields.
CLF type extracts
clientip,ident,user,timestamp,request,status, andbytes.
Arguments
Here are syntax details and options for each of this operator’s arguments.
SourceOption
Syntax – [ SourceIdent=FieldName ].
Defaults to source=_raw.
SourceIdent: Define as
source,sourceField, orsrc.FieldName: Name of the field to extract from. Case sensitive.
Type
Syntax – [ Type=TypeName ]. Example: type=csv
Type: Define as either
typeorformat.TypeName: See the following table.
| TypeName | Define as |
|---|---|
| Delimiter | delimited, del, or delim |
| Regex | regex, regexp, or re |
| Key Value | kvp, keyvalue, or kv |
| JSON | json or js |
| CSV | csv |
| ELF | elff or extended |
| CLF | clf or common (case insensitive) |
FieldList
Syntax – "FieldName" [, "FieldNameList" ].
Case-sensitive. Example: "field1,field2"
Alternative syntax – FieldIdent="FieldNameList".
Example: fields=field1,field2
FieldIdent: Define as either
fieldorfields.FieldNameList: Syntax –
FieldName [, FieldNameList ]. Case sensitive.
Parser
Syntax – parser="KnownParserName".
Use a predefined Parser. Case-sensitive.
RegexLiteral
Syntax – @'NamedRegularExpression'.
At least one named capture group is required. Example: @'metric1=(?<metric1>\d+)'
To extract key-value pairs and assign them to fields, use the capture group names _NAME_0 and _VALUE_0. Example: @'\"(?<_NAME_0>\w+)\s\/(?<_VALUE_0>.*)\"'
Alternative syntax – RegexIdent=@'NamedRegularExpression'.
Example: regex=@'metric1=(?<metric1>\d+)'
- RegexIdent: Define as either
regexorregexp.
DelimOption
Syntax – [[ DelimCharOpt ] [ EscapeCharOpt ] [ QuoteCharOpt] [ NullValOpt ]]
| Argument | Define as | Syntax | Default Value | Example |
|---|---|---|---|---|
| DelimCharOpt | delimchar, delimiter, or delim | DelimCharOpt=char | ; | delimiter=: |
| EscapeCharOpt | escapechar, escape, or esc | EscapeCharOpt=char | \ | esc=\ |
| QuoteCharOpt | quotechar or quote | QuoteCharOpt=char | " | quotechar=' |
| NullValOpt | nullvalue or undefined |
RegexOption
Syntax – [[ IterationOpt ] [ OverwriteOpt ]]
| Argument | Define as | Syntax | Default Value | Example |
|---|---|---|---|---|
| IterationOpt | numiterations, iterations, or iter | IterationOpt=integer | 100 | iterations=11 |
| OverwriteOpt | overwrite or ow | OverwriteOpt=boolean | false | ow=true |
Examples
This example extracts the domain name from an email address:
dataset=$vt_dummy event<10
| extend email="test@example.com"
| extract source=email @'@(?<domainName>.*$)'This example extracts delimited field values:
dataset=$vt_dummy event<10
| extend myField="value1|value2|value3"
| extract source=myField type=delim delimChar="|" "field1,field2,field3"This example extracts and names the three fields from the CSV-formatted source field foobar:
dataset=myDataset
| extract source=foobar type=csv "field1,field2,field3"This example extracts all key-value pairs from the default source field _raw:
dataset=myDataset
| extract type=keyvalueThis examplee uses a Parser to extract all fields from mySource:
dataset=myDataset
| extract source="mySource" parser="Apache Common Log Format"This example extracts delimited values (from _raw) using the | as the delimiter. The type specification could be omitted in this case, because the type is implicitly given by specifying the delimChar option:
dataset=myDataset
| extract type=delim delimChar="|" "field1,field2"This example extracts the fields metric1 and foo from _raw (default source) via named regular expressions. The type specification could be omitted in this case, because the type is implicitly given by specifying regular expressions.
dataset=myDataset
| extract type=regex @'metric1=(?<foo>\d+)'