Grok

Preface

Logstash ships with Grok, a very powerful log-parsing tool that can turn messy, unstructured log lines into well-ordered, structured data.

Syntax and Built-in Patterns

Grok is a filter plugin. It comes with a large set of built-in regular-expression patterns that make parsing logs much easier. The basic grok syntax looks like this:

filter {
    grok {
        match => {
            "message" => "this is regular"
        }
    }
}

Logstash uses the "=>" arrow to associate a key with a value; message here is the field content shipped over by Filebeat.

Inside the match string, each capture has the following format, where PATTERN is the name of a grok pattern and field is the name under which the captured value is stored:

%{PATTERN:field}
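For instance (an illustrative sketch only; the sample log line and field names below are invented for this example), matching a line like "55.3.244.1 GET /index.html" with three built-in patterns produces three named fields:

filter {
    grok {
        match => {
            # IP, WORD and URIPATHPARAM are built-in patterns; client, method
            # and request become fields on the event
            "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request}"
        }
    }
}

The event then carries client => 55.3.244.1, method => GET and request => /index.html.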

Grok comes with many built-in patterns to help parse log content quickly. Each entry below lists the pattern name, its regular expression, and what it matches:



USERNAME  [a-zA-Z0-9._-]+
    Matches a username.
USER  %{USERNAME}
    Alias for USERNAME.
INT  (?:[+-]?(?:[0-9]+))
    Matches an integer.
BASE10NUM  (?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))
    Matches a decimal (base-10) number.
NUMBER  (?:%{BASE10NUM})
    Matches a number.
BASE16NUM  (?<![0-9A-Fa-f])(?:[+-]?(?:0x)?(?:[0-9A-Fa-f]+))
    Matches a hexadecimal number.
BASE16FLOAT  \b(?<![0-9A-Fa-f.])(?:[+-]?(?:0x)?(?:(?:[0-9A-Fa-f]+(?:\.[0-9A-Fa-f]*)?)|(?:\.[0-9A-Fa-f]+)))\b
    Matches a hexadecimal floating-point number.
POSINT  \b(?:[1-9][0-9]*)\b
    Matches a positive integer with no leading zero; in "01 22 33 04" only 22 and 33 are matched.
NONNEGINT  \b(?:[0-9]+)\b
    Matches any non-negative integer, including ones with a leading zero.
WORD  \b\w+\b
    Matches a word.
NOTSPACE  \S+
    Matches non-whitespace characters.
SPACE  \s*
    Matches whitespace.
DATA  .*?
    Lazy (non-greedy) match of any characters.
GREEDYDATA  .*
    Greedy match of any characters.
QUOTEDSTRING  (?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>`(?>\\.|[^\\`]+)+`)|``))
    Matches a quoted string.
UUID  [A-Fa-f0-9]{8}-(?:[A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12}
    Matches a UUID.
MAC  (?:%{CISCOMAC}|%{WINDOWSMAC}|%{COMMONMAC})
    Matches a MAC address.
CISCOMAC  (?:(?:[A-Fa-f0-9]{4}\.){2}[A-Fa-f0-9]{4})
    Matches a Cisco-style MAC address.
WINDOWSMAC  (?:(?:[A-Fa-f0-9]{2}-){5}[A-Fa-f0-9]{2})
    Matches a Windows-style MAC address.
COMMONMAC  (?:(?:[A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2})
    Matches a colon-separated MAC address.
IPV6  ((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?
    Matches an IPv6 address.
IPV4  (?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))(?![0-9])
    Matches an IPv4 address.
IP  (?:%{IPV6}|%{IPV4})
    Matches an IPv6 or IPv4 address.
HOSTNAME  \b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
    Matches a hostname.
HOST  %{HOSTNAME}
    Alias for HOSTNAME.
IPORHOST  (?:%{HOSTNAME}|%{IP})
    Matches an IP address or a hostname.
HOSTPORT  %{IPORHOST}:%{POSINT}
    Matches a host:port pair.
PATH  (?:%{UNIXPATH}|%{WINPATH})
    Matches a file path.
UNIXPATH  (?>/(?>[\w_%!$@:.,-]+|\\.)*)+
    Matches a Unix path.
TTY  (?:/dev/(pts|tty([pq])?)(\w+)?/?(?:[0-9]+))
    Matches a tty device path.
WINPATH  (?>[A-Za-z]+:|\\)(?:\\[^\\?*]*)+
    Matches a Windows path.
URIPROTO  [A-Za-z]+(\+[A-Za-z+]+)?
    Matches a URI protocol (scheme).
URIHOST  %{IPORHOST}(?::%{POSINT:port})?
    Matches a URI host, with an optional port.
URIPATH  (?:/[A-Za-z0-9$.+!*'(){},~:;=@#%_\-]*)+
    Matches a URI path.
URIPARAM  \?[A-Za-z0-9$.+!*'|(){},~@#%&/=:;_?\-\[\]]*
    Matches a URI query string.
URIPATHPARAM  %{URIPATH}(?:%{URIPARAM})?
    Matches a URI path with an optional query string.
URI  %{URIPROTO}://(?:%{USER}(?::[^@]*)?@)?(?:%{URIHOST})?(?:%{URIPATHPARAM})?
    Matches a full URI.
MONTH  \b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b
    Matches a month name.
MONTHNUM  (?:0?[1-9]|1[0-2])
    Matches a month number.
MONTHNUM2  (?:0[1-9]|1[0-2])
    Matches a two-digit month number.
MONTHDAY  (?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])
    Matches a day of the month.
DAY  (?:Mon(?:day)?|Tue(?:sday)?|Wed(?:nesday)?|Thu(?:rsday)?|Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?)
    Matches a day of the week.
YEAR  (?>\d\d){1,2}
    Matches a year.
HOUR  (?:2[0123]|[01]?[0-9])
    Matches an hour.
MINUTE  (?:[0-5][0-9])
    Matches a minute.
SECOND  (?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)
    Matches a second.
TIME  (?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9])
    Matches a time of day.
DATE_US  %{MONTHNUM}[/-]%{MONTHDAY}[/-]%{YEAR}
    Matches a US-style date.
DATE_EU  %{MONTHDAY}[./-]%{MONTHNUM}[./-]%{YEAR}
    Matches a European-style date.
ISO8601_TIMEZONE  (?:Z|[+-]%{HOUR}(?::?%{MINUTE}))
    Matches an ISO 8601 timezone.
ISO8601_SECOND  (?:%{SECOND}|60)
    Matches an ISO 8601 second.
TIMESTAMP_ISO8601  %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?
    Matches an ISO 8601 timestamp.
DATE  %{DATE_US}|%{DATE_EU}
    Matches a date in US or European style.
DATESTAMP  %{DATE}[- ]%{TIME}
    Matches a date followed by a time.
TZ  (?:[PMCE][SD]T|UTC)
    Matches a timezone abbreviation.
DATESTAMP_RFC822  %{DAY} %{MONTH} %{MONTHDAY} %{YEAR} %{TIME} %{TZ}
    Matches an RFC 822 timestamp.
DATESTAMP_RFC2822  %{DAY}, %{MONTHDAY} %{MONTH} %{YEAR} %{TIME} %{ISO8601_TIMEZONE}
    Matches an RFC 2822 timestamp.
DATESTAMP_OTHER  %{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{TZ} %{YEAR}
    Matches other timestamp layouts.
DATESTAMP_EVENTLOG  %{YEAR}%{MONTHNUM2}%{MONTHDAY}%{HOUR}%{MINUTE}%{SECOND}
    Matches a Windows event-log timestamp.
SYSLOGTIMESTAMP  %{MONTH} +%{MONTHDAY} %{TIME}
    Matches a syslog timestamp.
PROG  (?:[\w._/%-]+)
    Matches a program name.
SYSLOGPROG  %{PROG:program}(?:\[%{POSINT:pid}\])?
    Matches a syslog program name with an optional PID.
SYSLOGHOST  %{IPORHOST}
    Matches a syslog host.
SYSLOGFACILITY  <%{NONNEGINT:facility}.%{NONNEGINT:priority}>
    Matches a syslog facility/priority.
HTTPDATE  %{MONTHDAY}/%{MONTH}/%{YEAR}:%{TIME} %{INT}
    Matches an HTTP log date.
SYSLOGBASE  %{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}:
    Matches the standard syslog line prefix (timestamp, host, program).
COMMONAPACHELOG  %{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-)
    Matches an Apache common log format line.
COMBINEDAPACHELOG  %{COMMONAPACHELOG} %{QS:referrer} %{QS:agent}
    Matches an Apache combined log format line.
LOGLEVEL  ([Aa]lert|ALERT|[Tt]race|TRACE|[Dd]ebug|DEBUG|[Nn]otice|NOTICE|[Ii]nfo|INFO|[Ww]arn?(?:ing)?|WARN?(?:ING)?|[Ee]rr?(?:or)?|ERR?(?:OR)?|[Cc]rit?(?:ical)?|CRIT?(?:ICAL)?|[Ff]atal|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?)
    Matches a log level.
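Composite patterns such as COMBINEDAPACHELOG show why these building blocks are useful: a single pattern breaks an entire Apache combined-format access-log line into named fields. A minimal sketch, using nothing beyond the patterns listed above:

filter {
    grok {
        match => {
            # expands into clientip, ident, auth, timestamp, verb, request,
            # httpversion, response, bytes, referrer and agent
            "message" => "%{COMBINEDAPACHELOG}"
        }
    }
}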



Example

The following shows how to parse the timestamp out of a log line such as:

2023-2-14 10:50:15 hello world

Using the built-in patterns:

filter {
    grok {
        match => {
            "message" => "%{DATE:date} %{TIME:time}"
        }
    }
}

Output:

{
    "date" => "2023-2-14"
    "time" => "10:50:15"
}

(Strictly speaking, DATE is defined as %{DATE_US}|%{DATE_EU}, neither of which puts a four-digit year first, so a year-first date like this one is more reliably captured with TIMESTAMP_ISO8601 or with a custom pattern, as in the custom-pattern example below.)
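When trying patterns out, it is convenient to run a small throwaway pipeline that reads lines from the console and prints the parsed event. A sketch, assuming it is saved as test.conf and started with bin/logstash -f test.conf:

input {
    stdin { }                       # type a log line and press Enter
}
filter {
    grok {
        match => {
            "message" => "%{DATE:date} %{TIME:time}"
        }
    }
}
output {
    stdout { codec => rubydebug }   # prints every field of the parsed event
}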

Besides the built-in patterns, we can also write custom regular expressions directly. A custom capture is written as a named group, (?<field>regex). The example below parses the same timestamp with custom patterns:

filter {
    grok {
        match => {
            "message" => "(?<date>\d{4}-\d{2}-\d{2})\s(?<time>\d{2}:\d{2}:\d{2})"
        }
    }
}

Grok also supports pattern files, which work a bit like C macro definitions: point the patterns_dir option at a directory, and every file in that directory is loaded. In such a file you write a regular expression and give it an alias:

my_time \d{4}-\d{2}-\d{2}

The alias can then be used in a match just like a built-in pattern:

filter {
    grok {
        patterns_dir => ["/ElasticStack/logstash-8.5.2/config/patterns"]
        match => {
            "message" => "%{my_time:date}"
        }
    }
}
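A custom alias composes with the built-in patterns exactly like any other pattern. A sketch that extends the match above (the msg field name is just an assumption for illustration):

filter {
    grok {
        patterns_dir => ["/ElasticStack/logstash-8.5.2/config/patterns"]
        match => {
            # my_time captures the date, TIME the time of day, GREEDYDATA the rest
            "message" => "%{my_time:date} %{TIME:time} %{GREEDYDATA:msg}"
        }
    }
}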

Mutate

Preface

Mutate is another filter plugin. It performs a second pass of processing on fields and usually sits in the filter chain right after grok.

Example

Fields extracted by grok are plain strings by default (they end up as keyword/text in Elasticsearch). The mutate filter's convert option lets us change a field's type; its format is as follows:

mutate {
    convert => {
        "Test" => "integer"
    }
}

The convert option supports the following target types, among others:

integer: whole numbers

float: floating-point numbers

string: character strings

boolean: true/false values
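A typical use is converting a number that grok captured as a string; a minimal sketch (the bytes field name is an assumption for illustration):

filter {
    grok {
        match => { "message" => "%{IPORHOST:clientip} %{NUMBER:bytes}" }
    }
    mutate {
        # grok always captures strings; store bytes as a number instead
        convert => { "bytes" => "integer" }
    }
}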

Mutate also provides a number of other operations on fields.

Rename a field (rename):

mutate {
        rename => {"name" => "name3"}
}

Strip leading and trailing whitespace from a field (strip):

mutate {
        strip => ["name"]
}

Update a field's value (update):

mutate {
        update => {"name" => "li"}
}

Add fields (add_field):

mutate {
	add_field => {"testField1" => "0"}
	add_field => {"testField2" => "%{name}"} # references the value of the name field
}

Remove a field (remove_field):

mutate {
	remove_field => ["name"]
}

Case conversion (lowercase & uppercase):

mutate {
	#lowercase => [ "name" ]
	uppercase => [ "name" ]
}

Regular-expression replacement (gsub):

gsub only works on string fields. The example below replaces every "o" in the name field with "p":

mutate {
	gsub => ["name","o","p"]
}

Copy a field (copy):

Copies an existing field into another field; if the destination field already exists its value is overwritten, and it does not need to be created beforehand.

mutate {
	copy => {"name" => "name2"}
}
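Several of these operations can of course live in a single mutate block; a small sketch (the field values are made up for illustration):

mutate {
    rename    => { "name" => "username" }    # rename the field
    strip     => ["username"]                # trim surrounding whitespace
    uppercase => ["username"]                # normalize to upper case
    add_field => { "source" => "filebeat" }  # record where the event came from
}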

if statements

Logstash configuration supports if statements, so we can branch on a field shipped by Filebeat and apply different processing:

filter {
    if "hello" in [fields][type] {
        grok {
            match => {
                "message" => "xxx"
            }
        }
    }
}

The snippet above checks whether the fields.type field sent over by Filebeat contains "hello"; if it does, the message is parsed by grok.
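Conditionals also support else if and else branches, so different log types can be parsed with different patterns; a sketch with hypothetical type values:

filter {
    if "nginx" in [fields][type] {
        grok {
            match => { "message" => "%{COMBINEDAPACHELOG}" }
        }
    } else if "app" in [fields][type] {
        grok {
            match => { "message" => "%{TIMESTAMP_ISO8601:time} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
        }
    } else {
        mutate {
            add_field => { "parse_status" => "unmatched" }   # hypothetical marker field
        }
    }
}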

The same conditionals can also be used in the output section:

output {
    if "hello" in [fields][type] {
        elasticsearch {
            hosts => ["localhost:9200"]
            index => "test"
        }
    }

    stdout { codec => rubydebug }
}
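An else branch works in output as well, for example to route everything that does not match into a different index (the other-logs index name is an assumption for illustration):

output {
    if "hello" in [fields][type] {
        elasticsearch {
            hosts => ["localhost:9200"]
            index => "test"
        }
    } else {
        elasticsearch {
            hosts => ["localhost:9200"]
            index => "other-logs"    # hypothetical catch-all index
        }
    }
}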