Grok
Introduction
Logstash provides Grok, a powerful log-parsing tool that turns messy, unstructured log lines into well-ordered, structured fields.
Syntax and built-in patterns
Grok is a filter plugin. It ships with a large set of built-in regular expressions that make log parsing easier. Its basic syntax is:
filter {
  grok {
    match => {
      "message" => "this is regular"
    }
  }
}
In Logstash configuration, the "=>" arrow associates a setting with its value; message is a field shipped over by Filebeat.
Inside grok, a pattern reference has the form:
%{PATTERN:field}
Grok ships with many built-in patterns to help you parse log content quickly:
Pattern | Regex | Purpose |
USERNAME | [a-zA-Z0-9._-]+ | matches a username |
USER | %{USERNAME} | matches a user; alias of USERNAME |
INT | (?:[+-]?(?:[0-9]+)) | matches an integer |
BASE10NUM | (?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+))) | matches a base-10 number |
NUMBER | (?:%{BASE10NUM}) | matches a number |
BASE16NUM | (?<![0-9A-Fa-f])(?:[+-]?(?:0x)?(?:[0-9A-Fa-f]+)) | matches a hexadecimal number |
BASE16FLOAT | \b(?<![0-9A-Fa-f.])(?:[+-]?(?:0x)?(?:(?:[0-9A-Fa-f]+(?:\.[0-9A-Fa-f]*)?)|(?:\.[0-9A-Fa-f]+)))\b | matches a hexadecimal float |
POSINT | \b(?:[1-9][0-9]*)\b | matches positive integers with no leading zero; in "01 22 33 04" only 22 and 33 match |
NONNEGINT | \b(?:[0-9]+)\b | matches any non-negative integer, including ones with leading zeros |
WORD | \b\w+\b | matches a word |
NOTSPACE | \S+ | matches non-whitespace characters |
SPACE | \s* | matches whitespace (zero or more characters) |
DATA | .*? | lazy (non-greedy) match of any characters |
GREEDYDATA | .* | greedy match of any characters |
QUOTEDSTRING | (?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>`(?>\\.|[^\\`]+)+`)|``)) | matches a quoted string |
UUID | [A-Fa-f0-9]{8}-(?:[A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12} | matches a UUID |
MAC | (?:%{CISCOMAC}|%{WINDOWSMAC}|%{COMMONMAC}) | matches a MAC address |
CISCOMAC | (?:(?:[A-Fa-f0-9]{4}\.){2}[A-Fa-f0-9]{4}) | matches a Cisco-format MAC address |
WINDOWSMAC | (?:(?:[A-Fa-f0-9]{2}-){5}[A-Fa-f0-9]{2}) | matches a Windows-format MAC address |
COMMONMAC | (?:(?:[A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2}) | matches a colon-separated MAC address |
IPV6 | ((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)? | matches an IPv6 address |
IPV4 | (?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))(?![0-9]) | matches an IPv4 address |
IP | (?:%{IPV6}|%{IPV4}) | matches an IPv6 or IPv4 address |
HOSTNAME | \b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b) | matches a hostname |
HOST | %{HOSTNAME} | alias of HOSTNAME |
IPORHOST | (?:%{HOSTNAME}|%{IP}) | matches an IP address or hostname |
HOSTPORT | %{IPORHOST}:%{POSINT} | matches a host:port pair |
PATH | (?:%{UNIXPATH}|%{WINPATH}) | matches a filesystem path |
UNIXPATH | (?>/(?>[\w_%!$@:.,-]+|\\.)*)+ | matches a Unix path |
TTY | (?:/dev/(pts|tty([pq])?)(\w+)?/?(?:[0-9]+)) | matches a tty device path |
WINPATH | (?>[A-Za-z]+:|\\)(?:\\[^\\?*]*)+ | matches a Windows path |
URIPROTO | [A-Za-z]+(\+[A-Za-z+]+)? | matches a URI scheme |
URIHOST | %{IPORHOST}(?::%{POSINT:port})? | matches a URI host |
URIPATH | (?:/[A-Za-z0-9$.+!*'(){},~:;=@#%_\-]*)+ | matches a URI path |
URIPARAM | \?[A-Za-z0-9$.+!*'|(){},~@#%&/=:;_?\-\[\]]* | matches URI query parameters |
URIPATHPARAM | %{URIPATH}(?:%{URIPARAM})? | matches a URI path plus query parameters |
URI | %{URIPROTO}://(?:%{USER}(?::[^@]*)?@)?(?:%{URIHOST})?(?:%{URIPATHPARAM})? | matches a URI |
MONTH | \b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b | matches a month name |
MONTHNUM | (?:0?[1-9]|1[0-2]) | matches a month number |
MONTHNUM2 | (?:0[1-9]|1[0-2]) | matches a two-digit month number |
MONTHDAY | (?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]) | matches a day of the month |
DAY | (?:Mon(?:day)?|Tue(?:sday)?|Wed(?:nesday)?|Thu(?:rsday)?|Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?) | matches a weekday name |
YEAR | (?>\d\d){1,2} | matches a year |
HOUR | (?:2[0123]|[01]?[0-9]) | matches an hour |
MINUTE | (?:[0-5][0-9]) | matches minutes |
SECOND | (?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?) | matches seconds |
TIME | (?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9]) | matches a time of day |
DATE_US | %{MONTHNUM}[/-]%{MONTHDAY}[/-]%{YEAR} | matches a US date format |
DATE_EU | %{MONTHDAY}[./-]%{MONTHNUM}[./-]%{YEAR} | matches a European date format |
ISO8601_TIMEZONE | (?:Z|[+-]%{HOUR}(?::?%{MINUTE})) | matches an ISO 8601 time zone |
ISO8601_SECOND | (?:%{SECOND}|60) | matches ISO 8601 seconds |
TIMESTAMP_ISO8601 | %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}? | matches an ISO 8601 timestamp |
DATE | %{DATE_US}|%{DATE_EU} | matches a US or European date |
DATESTAMP | %{DATE}[- ]%{TIME} | matches a date plus time |
TZ | (?:[PMCE][SD]T|UTC) | matches a time-zone abbreviation |
DATESTAMP_RFC822 | %{DAY} %{MONTH} %{MONTHDAY} %{YEAR} %{TIME} %{TZ} | matches an RFC 822 timestamp |
DATESTAMP_RFC2822 | %{DAY}, %{MONTHDAY} %{MONTH} %{YEAR} %{TIME} %{ISO8601_TIMEZONE} | matches an RFC 2822 timestamp |
DATESTAMP_OTHER | %{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{TZ} %{YEAR} | matches another common timestamp format |
DATESTAMP_EVENTLOG | %{YEAR}%{MONTHNUM2}%{MONTHDAY}%{HOUR}%{MINUTE}%{SECOND} | matches a Windows event-log timestamp |
SYSLOGTIMESTAMP | %{MONTH} +%{MONTHDAY} %{TIME} | matches a syslog timestamp |
PROG | (?:[\w._/%-]+) | matches a program name |
SYSLOGPROG | %{PROG:program}(?:\[%{POSINT:pid}\])? | matches a syslog program name and PID |
SYSLOGHOST | %{IPORHOST} | matches a syslog host |
SYSLOGFACILITY | <%{NONNEGINT:facility}.%{NONNEGINT:priority}> | matches a syslog facility/priority |
HTTPDATE | %{MONTHDAY}/%{MONTH}/%{YEAR}:%{TIME} %{INT} | matches an HTTP log date |
SYSLOGBASE | %{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}: | matches the standard syslog line prefix |
COMMONAPACHELOG | %{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-) | matches the Apache common log format |
COMBINEDAPACHELOG | %{COMMONAPACHELOG} %{QS:referrer} %{QS:agent} | matches the Apache combined log format |
LOGLEVEL | ([Aa]lert|ALERT|[Tt]race|TRACE|[Dd]ebug|DEBUG|[Nn]otice|NOTICE|[Ii]nfo|INFO|[Ww]arn?(?:ing)?|WARN?(?:ING)?|[Ee]rr?(?:or)?|ERR?(?:OR)?|[Cc]rit?(?:ical)?|CRIT?(?:ICAL)?|[Ff]atal|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?) | matches a log level |
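The built-in patterns above are ordinary regular expressions, so you can try them outside Logstash. A minimal Python sketch using the IPV4 and POSINT definitions copied verbatim from the table (Grok itself runs on the Oniguruma engine, whose syntax largely overlaps with Python's re for these two patterns):

```python
import re

# IPV4 and POSINT regexes copied verbatim from the pattern table above
IPV4 = (r"(?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.]"
        r"(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.]"
        r"(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.]"
        r"(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))(?![0-9])")
POSINT = r"\b(?:[1-9][0-9]*)\b"

# IPV4 pulls the address out of surrounding text
print(re.search(IPV4, "client 192.168.1.10 connected").group())  # 192.168.1.10

# POSINT skips numbers with a leading zero, as described in the table
print(re.findall(POSINT, "01 22 33 04"))  # ['22', '33']
```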
Example
The following shows how to parse the timestamp out of a log line:
2023-2-14 10:50:15 hello world
Using built-in patterns (note that DATE only covers the month-first US and European formats, so a year-first date like this one is matched with TIMESTAMP_ISO8601):
filter {
  grok {
    match => {
      "message" => "%{TIMESTAMP_ISO8601:timestamp}"
    }
  }
}
Output:
{
  timestamp: 2023-2-14 10:50:15
}
Besides the built-in patterns, we can also use custom regular expressions. A custom capture is written as a parenthesized group, with ?<name> naming the extracted field. The following parses the same timestamp with a custom regex (note \d{1,2}, since the month and day in the sample line are not zero-padded):
filter {
  grok {
    match => {
      "message" => "(?<date>\d{4}-\d{1,2}-\d{1,2})\s(?<time>\d{2}:\d{2}:\d{2})"
    }
  }
}
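The same custom capture can be reproduced in plain Python; the only difference is that Python's re spells named groups as (?P<name>...), while Grok's Oniguruma engine accepts (?<name>...):

```python
import re

line = "2023-2-14 10:50:15 hello world"

# same capture as the grok config above, in Python named-group syntax
m = re.search(r"(?P<date>\d{4}-\d{1,2}-\d{1,2})\s(?P<time>\d{2}:\d{2}:\d{2})", line)
print(m.groupdict())  # {'date': '2023-2-14', 'time': '10:50:15'}
```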
Grok also supports pattern files, which work much like macro definitions in C: you write a regex into a file under a patterns directory (the file name is up to you; by default Logstash loads every file in the directory given by patterns_dir) and give it an alias:
my_time \d{4}-\d{1,2}-\d{1,2}
You can then use the alias in a match:
filter {
  grok {
    patterns_dir => ["/ElasticStack/logstash-8.5.2/config/patterns"]
    match => {
      "message" => "%{my_time:date}"
    }
  }
}
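Conceptually, a pattern file is just a name-to-regex table, and each %{name:field} reference is expanded into a named capture group before matching. A simplified, hypothetical re-implementation of that expansion step:

```python
import re

# hypothetical pattern table, as if loaded from a file under patterns_dir
PATTERNS = {"my_time": r"\d{4}-\d{1,2}-\d{1,2}"}

def expand(expr: str) -> str:
    """Rewrite %{name:field} references into Python named capture groups."""
    return re.sub(r"%\{(\w+):(\w+)\}",
                  lambda m: f"(?P<{m.group(2)}>{PATTERNS[m.group(1)]})",
                  expr)

regex = expand("%{my_time:date}")          # -> (?P<date>\d{4}-\d{1,2}-\d{1,2})
m = re.search(regex, "2023-2-14 10:50:15 hello world")
print(m.group("date"))  # 2023-2-14
```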
Mutate
Introduction
Mutate is another filter plugin, used for a second pass of processing on fields; it is typically placed after grok.
Example
Fields extracted by grok are plain strings by default (indexed in Elasticsearch as keyword/text). The mutate filter's convert option lets us change a field's type; its format is:
mutate {
  convert => {
    "Test" => "integer"
  }
}
convert supports the following types:
integer: integer
float: floating point
string: string
boolean: boolean
Beyond type conversion, mutate provides other operations on fields.
Rename a field:
mutate {
  rename => {"name" => "name3"}
}
Strip leading and trailing whitespace from a field:
mutate {
  strip => ["name"]
}
Update a field's value with update:
mutate {
  update => {"name" => "li"}
}
Add fields with add_field:
mutate {
  add_field => {"testField1" => "0"}
  add_field => {"testField2" => "%{name}"} # references the value of the name field
}
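The %{name} in add_field is Logstash's field-reference (sprintf) syntax: the value of another field on the same event is substituted into the string. A toy version of that substitution:

```python
import re

event = {"name": "li"}

def sprintf(template: str) -> str:
    """Substitute %{field} references with values from the event."""
    return re.sub(r"%\{(\w+)\}", lambda m: str(event[m.group(1)]), template)

event["testField1"] = "0"
event["testField2"] = sprintf("%{name}")   # copies the value of the name field
print(event["testField2"])  # li
```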
Remove a field with remove_field:
mutate {
  remove_field => ["name"]
}
Convert case with lowercase & uppercase:
mutate {
  #lowercase => [ "name" ]
  uppercase => [ "name" ]
}
Regex replacement with gsub
gsub only applies to string fields. The following replaces every "o" in the name field with "p":
mutate {
  gsub => ["name","o","p"]
}
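gsub performs a global regular-expression substitution on the field, equivalent to re.sub in Python (the sample field value is made up):

```python
import re

event = {"name": "logstash"}

# equivalent of mutate { gsub => ["name","o","p"] }: replace every "o" with "p"
event["name"] = re.sub("o", "p", event["name"])
print(event["name"])  # lpgstash
```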
Copy a field with copy
Copies an existing field into another field. The destination field is created automatically (and overwritten if it already exists), so there is no need to add it separately:
mutate {
  copy => {"name" => "name2"}
}
if statements
Logstash also supports if statements. For example, you can branch on a field shipped over by Filebeat and apply different processing:
filter {
  if "hello" in [fields][type] {
    grok {
      match => {
        "message" => "xxx"
      }
    }
  }
}
The condition above checks whether "hello" is contained in the type field under fields sent by Filebeat (in tests containment, not equality; use == for an exact match). If it is, the grok parsing runs. Conditionals can also be used in output:
output {
  if "hello" in [fields][type] {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "test"
    }
  }
  stdout { codec => rubydebug }
}
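Note that Logstash's in on a string field behaves like Python's in on strings: it tests containment, not equality, so a value such as "hello-app" would also satisfy the condition above:

```python
# Logstash's `"hello" in [fields][type]` is containment, like Python's `in`
print("hello" in "hello")      # True  (exact value matches)
print("hello" in "hello-app")  # True  (a substring also matches)
print("hello" in "world")      # False
```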