In recent years I have used Rsyslog in many log-processing scenarios in production, and it has performed very well both for UDP-based distributed log aggregation and for collecting log files. "The rocket-fast system for log processing" is no empty boast.
While reading the official Rsyslog documentation I noticed that Rsyslog already ships a file input module (imfile). Since it is already part of the system and ready to use, why bother with filebeat or logstash?
Rsyslog's support for file input sources is already quite mature. Quoting the official documentation directly:
This module provides the ability to convert any standard text file into a syslog message. A standard text file is a file consisting of printable characters with lines being delimited by LF.
The file is read line-by-line and any line read is passed to rsyslog’s rule engine. The rule engine applies filter conditions and selects which actions needs to be carried out. Empty lines are not processed, as they would result in empty syslog records. They are simply ignored.
As new lines are written they are taken from the file and processed. Depending on the selected mode, this happens via inotify or based on a polling interval. Especially in polling mode, file reading doesn’t happen immediately. But there are also slight delays (due to process scheduling and internal processing) in inotify mode.
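The reading mode is selected when the module is loaded. A minimal sketch (the polling interval below is illustrative, not a required value):

```
# inotify is the default on platforms that support it
module(load="imfile" mode="inotify")

# or fall back to polling, checking monitored files every 10 seconds
# module(load="imfile" mode="polling" PollingInterval="10")
```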
The file monitor supports file rotation. To fully work, rsyslogd must run while the file is rotated. Then, any remaining lines from the old file are read and processed and when done with that, the new file is being processed from the beginning. If rsyslogd is stopped during rotation, the new file is read, but any not-yet-reported lines from the previous file can no longer be obtained.
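For rotation to be handled cleanly, the old file should be renamed and a fresh file created (logrotate's default behavior), rather than truncated in place, so imfile can finish reading the renamed file. A hypothetical logrotate stanza for the nginx access log (paths and schedule are examples only):

```
/data/log/nginx/access.log {
    daily
    rotate 7
    compress
    delaycompress
    # create a fresh file instead of using copytruncate,
    # so imfile can drain the remaining lines from the old file
    create 0640 nginx nginx
    postrotate
        # tell nginx to reopen its log file
        kill -USR1 $(cat /var/run/nginx.pid)
    endscript
}
```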
When rsyslogd is stopped while monitoring a text file, it records the last processed location and continues to work from there upon restart. So no data is lost during a restart (except, as noted above, if the file is rotated just in this very moment).
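The "record the last processed location and continue from there" mechanism can be illustrated with a small Python sketch. This is a simplification for illustration only, not rsyslog's actual state-file format; it also skips empty lines, mirroring imfile's behavior described above:

```python
import os

def read_new_lines(path, state_path):
    """Read lines appended since the last call, persisting the byte offset."""
    # load the previously saved offset, if any
    offset = 0
    if os.path.exists(state_path):
        with open(state_path) as f:
            offset = int(f.read() or 0)
    # resume reading from the saved position
    with open(path) as f:
        f.seek(offset)
        lines = f.readlines()
        offset = f.tell()
    # persist the new offset so a restart continues from here
    with open(state_path, "w") as f:
        f.write(str(offset))
    # empty lines are ignored, as imfile does
    return [l.rstrip("\n") for l in lines if l.strip()]
```

Calling the function twice demonstrates that only newly appended lines are returned on the second call, just as rsyslogd resumes after a restart without re-emitting old data.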
Enough preamble; let's configure an example that reads a local log file and ships it to Kafka:
1. Environment: the latest v8 stable release already ships the Rsyslog Kafka plugin as an RPM, so a plain yum install is enough:
yum install rsyslog rsyslog-kafka.x86_64
2. Configuration: save the following to /etc/rsyslog.d/nginx_kafka.conf (create the file if it does not exist; the name is up to you), then restart rsyslog.
# load the file input module (the "im" prefix marks input modules): imfile
module(load="imfile")
# load the Kafka output module (the "om" prefix marks output modules): omkafka
module(load="omkafka")
# nginx log template (pass the message through unchanged)
template(name="nginxAccessTemplate" type="string" string="%msg%\n")
# define the input source and bind it to a ruleset
input(type="imfile" Tag="nginx" File="/data/log/nginx/access.log" Ruleset="nginx-kafka")
# ruleset that forwards the messages to Kafka
ruleset(name="nginx-kafka") {
    # forward the log lines to Kafka
    action(
        type="omkafka"
        template="nginxAccessTemplate"
        confParam=["compression.codec=snappy","queue.buffering.max.messages=100000"]
        partitions.number="3"
        topic="topic_nginx_log"
        broker=["192.168.1.12:19092","192.168.1.13:19092","192.168.1.14:19092"]
        queue.spoolDirectory="/tmp"
        queue.filename="nginx_kafka"
        queue.size="360000"
        queue.maxdiskspace="4G"
        queue.highwatermark="216000"
        queue.discardmark="350000"
        queue.type="LinkedList"
        queue.dequeuebatchsize="4096"
        queue.timeoutenqueue="0"
        queue.maxfilesize="10M"
        queue.saveonshutdown="on"
        queue.workerThreads="4"
    )
}
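If delivery to Kafka fails, it helps to capture the failed messages locally for inspection; recent omkafka versions provide an errorFile action parameter for this. A hypothetical minimal action showing only this parameter (the file path is an example):

```
action(
    type="omkafka"
    template="nginxAccessTemplate"
    topic="topic_nginx_log"
    broker=["192.168.1.12:19092"]
    # messages that could not be delivered to Kafka are recorded here
    errorFile="/var/log/rsyslog_omkafka_errors.log"
)
```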
Note: to verify the configuration, you can run rsyslogd in debug mode with rsyslogd -dn and watch the output, or simply run rsyslogd -N 1 to validate the configuration file.
And that's it; the whole setup really is that simple.