ELK日志分析平台--Logstash数据采集介绍与配置

  • 1. Logstash简介
  • 2. Logstash组成
  • 3. Logstash 安装与配置
  • 3.1 运行 logstash
  • 3.2 file 输出插件
  • 3.3 elasticsearch 输出插件
  • 3.4 Syslog 输入插件
  • 3.5 多行过滤插件
  • 3.6 grok 过滤插件


1. Logstash简介

Logstash 是用于日志的搜集、分析、过滤的工具,支持大量的数据获取方式。一般工作方式为c/s架构,client 端安装在需要收集日志的主机上,server 端负责将收到的各节点日志进行过滤、修改等操作在一并发往 elasticsearch 上去。

Logstash 是一个开源的服务器端数据处理管道。
拥有200多个插件,能够同时从多个来源采集数据,转换数据,然后将数据发送到最喜欢的 “存储库” 中。(大多都是 Elasticsearch。)

2. Logstash组成

Logstash 管道有两个必需的元素,输入和输出,以及一个可选元素过滤器。

logstash springboot日志收集 logstash日志采集_elasticsearch

输入:

采集各种样式、大小和来源的数据.

logstash springboot日志收集 logstash日志采集_java_02

Logstash 支持各种输入选择 ,同时从众多常用来源捕捉事件。

能够以连续的流式传输方式,轻松地从您的日志、指标、Web 应用、数据存储以及各种 AWS 服务采集数据。

过滤器:
实时解析和转换数据.
数据从源传输到存储库的过程中,Logstash 过滤器能够解析各个事件,识别已命名的字段以构建结构,并将它们转换成通用格式,以便更轻松、更快速地分析和实现商业价值。
利用 Grok 从非结构化数据中派生出结构
从 IP 地址破译出地理坐标
将 PII 数据匿名化,完全排除敏感字段
简化整体处理,不受数据源、格式或架构的影响

输出:

选择您的存储库,导出您的数据.

logstash springboot日志收集 logstash日志采集_elasticsearch_03

尽管 Elasticsearch 是我们的首选输出方向,能够为我们的搜索和分析带来无限可能,但它并非唯一选择。
Logstash 提供众多输出选择,您可以将数据发送到您要指定的地方,并且能够灵活地解锁众多下游用例。

3. Logstash 安装与配置

软件下载https://elasticsearch.cn/download/

软件下载时尽量保证和 ElasticSearch 版本一致;此处需要下java包来支持logstash,为了和 elasticsearch 相匹配,此处最低需要 Java 8.

[root@server4 ~]# rpm -ivh jdk-8u181-linux-x64.rpm 
[root@server4 ~]# which java
/usr/bin/java
[root@server4 ~]# rpm -ivh logstash-7.6.1.rpm

3.1 运行 logstash

在命令行中设置的任何标志都会覆盖 logstash.yml 文件中的相应设置,但文件本身不会更改。对于后续的 Logstash 运行,它保持原样。
执行二进制脚本来运行 logstash,/usr/share/logstash/bin下包括启动 Logstash 和 logstash-plugin 安装插件
官方文档https://www.elastic.co/guide/en/logstash/current/running-logstash-command-line.html

用命令的方式来执行标准输入到标准输出,即从接收从终端收到的再从终端输出

[root@server4 conf.d]# pwd
/etc/logstash/conf.d
[root@server4 conf.d]# ls
[root@server4 conf.d]# /usr/share/logstash/bin/logstash -e 'input { stdin { } } output { stdout {} }'
#当没有文件时,会让指定输入的信息。
WARNING: Could not find logstash.yml which is typically located in $LS_HOME/config or /etc/logstash. You can specify the path using --path.settings. Continuing using the defaults
Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs errors to the console
[INFO ] 2021-12-11 16:11:40.800 [main] writabledirectory - Creating directory {:setting=>"path.queue", :path=>"/usr/share/logstash/data/queue"}
[INFO ] 2021-12-11 16:11:40.837 [main] writabledirectory - Creating directory {:setting=>"path.dead_letter_queue", :path=>"/usr/share/logstash/data/dead_letter_queue"}
[WARN ] 2021-12-11 16:11:41.547 [LogStash::Runner] multilocal - Ignoring the 'pipelines.yml' file because modules or command line options are specified
[INFO ] 2021-12-11 16:11:41.553 [LogStash::Runner] runner - Starting Logstash {"logstash.version"=>"7.6.1"}
[INFO ] 2021-12-11 16:11:41.604 [LogStash::Runner] agent - No persistent UUID file found. Generating new UUID {:uuid=>"0b48306f-7137-49f0-b8f1-a3c6939b459b", :path=>"/usr/share/logstash/data/uuid"}
[INFO ] 2021-12-11 16:11:44.243 [Converge PipelineAction::Create<main>] Reflections - Reflections took 93 ms to scan 1 urls, producing 20 keys and 40 values 
[WARN ] 2021-12-11 16:11:46.354 [[main]-pipeline-manager] LazyDelegatingGauge - A gauge metric of an unknown type (org.jruby.RubyArray) has been create for key: cluster_uuids. This may result in invalid serialization.  It is recommended to log an issue to the responsible developer/development team.
[INFO ] 2021-12-11 16:11:46.365 [[main]-pipeline-manager] javapipeline - Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>250, "pipeline.sources"=>["config string"], :thread=>"#<Thread:0x414aebed run>"}
[INFO ] 2021-12-11 16:11:47.863 [[main]-pipeline-manager] javapipeline - Pipeline started {"pipeline.id"=>"main"}
The stdin plugin is now waiting for input:
[INFO ] 2021-12-11 16:11:48.015 [Agent thread] agent - Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[INFO ] 2021-12-11 16:11:48.419 [Api Webserver] agent - Successfully started Logstash API endpoint {:port=>9600}
westos
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/awesome_print-1.7.0/lib/awesome_print/formatters/base_formatter.rb:31: warning: constant ::Fixnum is deprecated
{
    "@timestamp" => 2021-12-11T08:12:23.940Z,
       "message" => "westos",
      "@version" => "1",
          "host" => "server4"
}
hellp
{
    "@timestamp" => 2021-12-11T08:12:29.051Z,
       "message" => "hellp",
      "@version" => "1",
          "host" => "server4"
}
world
{
    "@timestamp" => 2021-12-11T08:12:34.675Z,
       "message" => "world",
      "@version" => "1",
          "host" => "server4"
}

3.2 file 输出插件

-f -path.config CONFIG_PATH 从特定文件或目录加载 Logstash 配置。如果给定目录,则该目录中的所有文件将按字典顺序连接,然后解析为单个配置文件
用文件的方式来执行标准输入到标准输出,可以将其写入命令行也可以将其写入文件,然后用绝对路径的方式来指定。

[root@server4 conf.d]# pwd
/etc/logstash/conf.d
[root@server4 conf.d]# vim test.conf
[root@server4 conf.d]# cat test.conf 
input {
	stdin {}		##接收终端输入
}
output {
	stdout {}		##显示终端输出
}
[root@server4 conf.d]# /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/test.conf 
WARNING: Could not find logstash.yml which is typically located in $LS_HOME/config or /etc/logstash. You can specify the path using --path.settings. Continuing using the defaults
Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs errors to the console
[WARN ] 2021-12-11 16:15:43.802 [LogStash::Runner] multilocal - Ignoring the 'pipelines.yml' file because modules or command line options are specified
[INFO ] 2021-12-11 16:15:43.844 [LogStash::Runner] runner - Starting Logstash {"logstash.version"=>"7.6.1"}
[INFO ] 2021-12-11 16:15:47.104 [Converge PipelineAction::Create<main>] Reflections - Reflections took 83 ms to scan 1 urls, producing 20 keys and 40 values 
[WARN ] 2021-12-11 16:15:49.358 [[main]-pipeline-manager] LazyDelegatingGauge - A gauge metric of an unknown type (org.jruby.RubyArray) has been create for key: cluster_uuids. This may result in invalid serialization.  It is recommended to log an issue to the responsible developer/development team.
[INFO ] 2021-12-11 16:15:49.368 [[main]-pipeline-manager] javapipeline - Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>250, "pipeline.sources"=>["/etc/logstash/conf.d/test.conf"], :thread=>"#<Thread:0x7281b76f run>"}
[INFO ] 2021-12-11 16:15:51.525 [[main]-pipeline-manager] javapipeline - Pipeline started {"pipeline.id"=>"main"}
The stdin plugin is now waiting for input:
[INFO ] 2021-12-11 16:15:51.661 [Agent thread] agent - Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[INFO ] 2021-12-11 16:15:52.085 [Api Webserver] agent - Successfully started Logstash API endpoint {:port=>9600}
hello
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/awesome_print-1.7.0/lib/awesome_print/formatters/base_formatter.rb:31: warning: constant ::Fixnum is deprecated
{
       "message" => "hello",
    "@timestamp" => 2021-12-11T08:16:13.477Z,
          "host" => "server4",
      "@version" => "1"
}
zxk
{
       "message" => "zxk",
    "@timestamp" => 2021-12-11T08:16:18.370Z,
          "host" => "server4",
      "@version" => "1"
}
[root@server4 conf.d]# vim test.conf 
[root@server4 conf.d]# cat test.conf 
input {
	stdin {}
}
output {
	file {
                path => "/tmp/logstash.txt"	#输出的文件路径
                codec => line { format => "custom format: %{message}"}	
                #定制数据格式,message是输入的格式
                #codec就是用来decode,encode 事件的。所以codec常用在input和output中
   				#codec => line就是输出一行内容
        }
}
[root@server4 conf.d]# /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/test.conf 
zxk
[INFO ] 2021-12-11 16:20:49.885 [[main]>worker0] file - Opening file {:path=>"/tmp/logstash.txt"}
hello
[INFO ] 2021-12-11 16:21:09.860 [[main]>worker1] file - Closing file /tmp/logstash.txt
[WARN ] 2021-12-11 16:21:38.799 [SIGINT handler] runner - SIGINT received. Shutting down.
[INFO ] 2021-12-11 16:21:39.179 [Converge PipelineAction::Stop<main>] javapipeline - Pipeline terminated {"pipeline.id"=>"main"}
[INFO ] 2021-12-11 16:21:39.915 [LogStash::Runner] runner - Logstash shut down.
[root@server4 conf.d]# cat /tmp/logstash.txt
custom format: zxk
custom format: hello

此时是没有输出的直接写入文件,当也可以添加多个模块来做,加入stdout 来在终端显示信息。

3.3 elasticsearch 输出插件

将日志上传至elasticsearch 中,编辑配置文件。

[root@server4 conf.d]# vim test.conf 
[root@server4 conf.d]# cat test.conf 
input {
	stdin {}
}
output {
	stdout {}
        elasticsearch {
	        hosts => "172.25.25.21:9200"	#输出到的ES主机与端口
                index => "logstash-%{+YYYY.MM.dd}"	 
                 #定制索引名称,每天新建一份
	}
}

更改配置文件之后再次运行 logstash:

[root@server4 conf.d]# /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/test.conf 
hello
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/awesome_print-1.7.0/lib/awesome_print/formatters/base_formatter.rb:31: warning: constant ::Fixnum is deprecated
{
          "host" => "server4",
      "@version" => "1",
       "message" => "hello",
    "@timestamp" => 2021-12-11T08:38:54.875Z
}
westos
{
          "host" => "server4",
      "@version" => "1",
       "message" => "westos",
    "@timestamp" => 2021-12-11T08:38:58.282Z
}
zxk
{
          "host" => "server4",
      "@version" => "1",
       "message" => "zxk",
    "@timestamp" => 2021-12-11T08:38:59.443Z
}

在 es上查看,已自动创建索引和分片,采集到的内容和输入一致。

logstash springboot日志收集 logstash日志采集_elasticsearch_04


采集日志的信息时,由于用的是 systemd 的启动方式,默认是用的logstach 用户,在采集时没有权限,所以来给定权限之后,然后再来设定。

[root@server4 conf.d]# ll /var/log/messages
-rw------- 1 root root 14294 Dec 11 16:31 /var/log/messages
[root@server4 conf.d]# ll /var/log/ -d
drwxr-xr-x. 8 root root 4096 Dec 11 16:09 /var/log/
[root@server4 conf.d]# chmod 644 /var/log/messages
[root@server4 conf.d]# ll /var/log/messages
-rw-r--r-- 1 root root 14294 Dec 11 16:31 /var/log/messages
[root@server4 conf.d]# vim test.conf 
[root@server4 conf.d]# cat test.conf 
input {
	file {		#从文件/var/log/messages输入,从头开始输入
		path => "/var/log/messages"
		start_position => "beginning"
	}
}
output {
	stdout {}
        elasticsearch {
	        hosts => "172.25.25.21:9200"
                index => "logstash-%{+YYYY.MM.dd}"
	}
}

修改完成之后再次运行logstash:

[root@server4 conf.d]# /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/test.conf

完成之后在es端从网页中删除 logstash,再次创建时只会将更新的数据写入,(可以使用logger “日志” 命令来产生日志)。logstash 重启会将数据重复写入,当重启之后,logstash 会有一个机制来记录上次读取到的位置,只将新的数据写入;这是因为有一个 .sincedb 文件,它负责记录数据偏移量,已经上传过的数据,不会重复上传;如果想要重新读取,删除该文件,然后默认是从头开始读取。

删除之后再次运行,此时只会将更新的数据写入,

[root@server4 file]# pwd
/usr/share/logstash/data/plugins/inputs/file
[root@server4 file]# l.
.  ..  .sincedb_452905a167cf4509fd08acb964fdb20c
[root@server4 file]# cat .sincedb_452905a167cf4509fd08acb964fdb20c
52394392 0 64768 14294 1639212758.972494 /var/log/messages

logstash 如何区分设备、文件名、文件的不同版本: logstash 会把进度保存到 sincedb 文件中,该文件有 6 个字段,每个字段的含义分别为:inode编号、文件系统的主要设备号、文件系统的次要设备号、文件中的当前字节偏移量、最后一个活动时间戳(浮点数)、此记录匹配的最后一个已知路径。

3.4 Syslog 输入插件

如果要收集多台服务器的日志信息,那么每台都部署 Logstash 就比较麻烦,可以让 Logstash 伪装成日志服务器,直接接受每个节点的远程日志。

再次更改输入规则,让其接收每个节点的远程日志。

[root@server4 conf.d]# ls
test.conf
[root@server4 conf.d]# vim test.conf 
[root@server4 conf.d]# cat test.conf 
input {
	syslog {}
}
output {
	stdout {}
        elasticsearch {
	        hosts => "172.25.25.21:9200"
                index => "syslog-%{+YYYY.MM.dd}"
	}
}
[root@server4 conf.d]# /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/test.conf 

[root@server4 ~]# netstat -antlup | grep :514
tcp6       0      0 :::514                  :::*                    LISTEN      8200/java           
udp        0      0 0.0.0.0:514             0.0.0.0:*                           8200/java

此时开启之后,默认是有一个 514 接口的5,然后此时便可以将其他ES主机的日志发往该主机。

ES 主机 server1 需要编辑 /etc/rsyslog.conf 远程日志配置文件,打开 514 端口,并指定以何种方式发往那个主机。

[root@server1 ~]# vim /etc/rsyslog.conf 
 14 # Provides UDP syslog reception
 15 $ModLoad imudp
 16 $UDPServerRun 514
 90 *.* @@172.25.25.24:514
[root@server1 ~]# systemctl restart rsyslog.service
此时重启之后,在监控的页面会立即收到信息;
{
           "message" => "Unregistered Authentication Agent for unix-process:4421:9009625 (system bus name :1.66, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disconnected from bus)\n",
               "pid" => "2939",
          "facility" => 10,
           "program" => "polkitd",
         "logsource" => "server1",
          "priority" => 85,
              "host" => "172.25.25.21",	#该主机
    "severity_label" => "Notice",
    "facility_label" => "security/authorization",
          "severity" => 5,
          "@version" => "1",
         "timestamp" => "Dec 12 15:10:21",
        "@timestamp" => 2021-12-12T07:10:21.000Z
}

为了测试日志信息的更新状况,此处手动生成日志信息:

[root@server1 ~]# logger hello zxk		#往messages中写入日志。
{
           "message" => "hello zxk\n",
          "facility" => 1,
           "program" => "root",
         "logsource" => "server1",
          "priority" => 13,
              "host" => "172.25.25.21",
    "severity_label" => "Notice",
    "facility_label" => "user-level",
          "severity" => 5,
          "@version" => "1",
         "timestamp" => "Dec 12 15:12:39",
        "@timestamp" => 2021-12-12T07:12:39.000Z
}

3.5 多行过滤插件

在查看日志时,面对杂乱的众多信息,有必要对所需要的信息进行过滤以方便快速定位。

[root@server4 conf.d]# vim multiline.conf
[root@server4 conf.d]# cat multiline.conf 
input {
	stdin {
	codec => multiline {
		pattern => "EOF"	#以什么结尾
		negate => "true"
		what => "previous"		#向上合并
		}
	}
}
output {
	stdout {}
}
[root@server4 conf.d]# /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/multiline.conf

1
w
z
2
4
EOF
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/awesome_print-1.7.0/lib/awesome_print/formatters/base_formatter.rb:31: warning: constant ::Fixnum is deprecated
{
       "message" => "1\nw\nz\n2\n4",
          "tags" => [
        [0] "multiline"
    ],
    "@timestamp" => 2021-12-12T07:18:59.540Z,
      "@version" => "1",
          "host" => "server4"
}

多行日志常用于java的日志输出和云计算的日志输出分析。

[root@server4 conf.d]# vim test.conf 
[root@server4 conf.d]# cat test.conf 
input {
	file {
		path => "/var/log/my-es-2021-12-11-1.log"
		start_position => "beginning"
		codec => multiline {
			pattern => "^\["
			negate => "true"
			what => "previous"
		}
	}
}
output {
	stdout {}
        elasticsearch {
	        hosts => "172.25.25.21:9200"
                index => "syslog-%{+YYYY.MM.dd}"
	}
}
[root@server4 conf.d]# /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/test.conf

此时便会将多行信息进行合并,方便观察。

logstash springboot日志收集 logstash日志采集_运维_05

3.6 grok 过滤插件

分割命令行的信息输出到终端;
安装 httpd,从外部访问一次然后来看;来通过其日志来过滤出访问IP 。
格式可以在/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-patterns-core-4.1.2/patterns中查看

[root@server4 patterns]# pwd
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-patterns-core-4.1.2/patterns
[root@server4 patterns]# ll httpd 
-rw-r--r-- 1 logstash logstash 987 Feb 29  2020 httpd

把 apache 的日志作为 grok 的输入,日志文件需要给读的权限,日志文件的目录 /var/log/httpd 需要给读和执行的权限,读的时候是logstash 的身份

[root@server4 conf.d]# ll /var/log/httpd/access_log 
-rw-r--r-- 1 root root 168 Dec 12 16:00 /var/log/httpd/access_log
[root@server4 conf.d]# ll -d /var/log/httpd/
drwx------ 2 root root 41 Dec 12 15:59 /var/log/httpd/
[root@server4 conf.d]# chmod 755 /var/log/httpd/
[root@server4 conf.d]# ll -d /var/log/httpd/
drwxr-xr-x 2 root root 41 Dec 12 15:59 /var/log/httpd/

编辑配置文件:

[root@server4 conf.d]# cat apache.conf 
input {
	file {
		path => "/var/log/httpd/access_log"
		start_position => "beginning"
	}
}
filter {
	grok {
	match => { "message" => "%{HTTPD_COMBINEDLOG}" }
	}
}
output {
	stdout {}
	elasticsearch {
	        hosts => "172.25.25.21:9200"
                index => "apachelog-%{+YYYY.MM.dd}"
	}
}
[root@server4 conf.d]# /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/apache.conf

{
    "httpversion" => "1.1",
       "@version" => "1",
        "request" => "/",
       "response" => "200",
           "verb" => "GET",
       "clientip" => "172.25.25.250",
          "ident" => "-",
          "bytes" => "12",
     "@timestamp" => 2021-12-12T08:11:34.862Z,
           "host" => "server4",
        "message" => "172.25.25.250 - - [12/Dec/2021:16:00:08 +0800] \"GET / HTTP/1.1\" 200 12 \"-\" \"curl/7.61.1\"",
      "timestamp" => "12/Dec/2021:16:00:08 +0800",
           "path" => "/var/log/httpd/access_log",
          "agent" => "\"curl/7.61.1\"",
       "referrer" => "\"-\"",
           "auth" => "-"
}

完成一次测试便会生成相应的索引:

logstash springboot日志收集 logstash日志采集_elk_06

可以做压测,然后可以统计数量。

logstash springboot日志收集 logstash日志采集_elasticsearch_07