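Project background

As our business has been continuously optimized and restructured, development has moved from traditional environments to Docker containers, and the logs produced by development processes and applications have become far more varied.
Centralized log management, visualization, and analysis has therefore become especially important.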

Deployment architecture

[Figure: EFK deployment architecture diagram]

Setting up the Docker environment

CentOS 7.5 Docker (Part 1): Installation

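Introduction to EFK

## Elasticsearch:
    Official site: https://www.elastic.co
    A distributed search engine that is highly scalable, highly reliable, and easy to manage. It supports full-text search, structured search, and analytics, and can combine all three.
    Elasticsearch is built on Lucene and is one of the most widely used open-source search engines today; Wikipedia, Stack Overflow, GitHub, and others build their search on it.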

## Fluentd (td-agent):
    https://www.fluentd.org
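    A popular log collector in the open-source community, with a rich set of plugins for different data sources and output destinations.
    Fluentd is written in C and Ruby, with the performance-critical components implemented in C, so its overall performance is good.
    Because Docker ships with a built-in fluentd logging driver, Fluentd was chosen as the sender.
    td-agent is an easy-to-install distribution of Fluentd maintained by Treasure Data, and it bundles a number of commonly used plugins by default.
    Plain Fluentd suits experimenting; td-agent suits large-scale production deployments.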

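## Kibana:
    Official site: https://www.elastic.co
    A visualization platform for searching and displaying the data indexed in Elasticsearch. It makes it easy to present and analyze data with charts, tables, and maps.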

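Terminology

Elasticsearch is abbreviated as ES from here on.

Comparing the three major log collectors

A comparison by more experienced practitioners, shared online:

Log clients (Logstash, Fluentd, Logtail): a side-by-side review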

Alibaba Cloud ECS specifications

Host running ES + Kibana
ecs.mn4.small, shared general-purpose instance, 1 vCPU, 4 GB RAM

# cat /etc/centos-release
CentOS Linux release 7.6.1810 (Core) 
# uname -r
4.4.162-1.el7.elrepo.x86_64

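Hosts running td-agent
Every host whose data needs to be collected. Note that td-agent is installed on the host itself, not inside the Docker containers (this depends, of course, on how you design log collection).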

Deploying EFK

Pulling the Docker images

Clicking Download on the official site shows fairly detailed instructions.

The commands used are:

docker pull docker.elastic.co/elasticsearch/elasticsearch:6.5.4
docker tag docker.elastic.co/elasticsearch/elasticsearch:6.5.4 elasticsearch:6.5.4
docker rmi docker.elastic.co/elasticsearch/elasticsearch:6.5.4 

docker pull docker.elastic.co/kibana/kibana:6.5.4
docker tag docker.elastic.co/kibana/kibana:6.5.4 kibana:6.5.4
docker rmi docker.elastic.co/kibana/kibana:6.5.4 

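If these commands are unfamiliar, brush up on Docker basics first.

Kernel parameter tuning on the ES + Kibana host

Persist the setting by adding this line to /etc/sysctl.conf (e.g. with vim /etc/sysctl.conf):
vm.max_map_count=262144

Apply it immediately:
sysctl -w vm.max_map_count=262144

If this is not set, the container fails with the following error: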
[1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
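Before starting the container, it can help to confirm that both the persisted value and the value the running kernel uses meet the minimum from the error above. A minimal sketch in Python, assuming the standard Linux locations /etc/sysctl.conf and /proc/sys/vm/max_map_count; the function names are hypothetical helpers, not part of any tool:

```python
from pathlib import Path

REQUIRED = 262144  # minimum quoted in the Elasticsearch error message above


def parse_sysctl_conf(text: str) -> dict:
    """Parse "key = value" lines, skipping blanks and lines starting with # or ;."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings


def running_value(path: str = "/proc/sys/vm/max_map_count") -> int:
    """Value the kernel is using right now (what `sysctl vm.max_map_count` reports)."""
    return int(Path(path).read_text().strip())


def is_sufficient(value: int, required: int = REQUIRED) -> bool:
    """True if the value meets the Elasticsearch minimum."""
    return value >= required
```

For example, `is_sufficient(running_value())` checks the live kernel, while `parse_sysctl_conf(Path("/etc/sysctl.conf").read_text()).get("vm.max_map_count")` shows whether the setting will survive a reboot.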

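Creating the containers with docker-compose

Supporting configuration files (adjusting them requires some understanding of how ES and Kibana work):

mkdir -p /data/docker/EFK
cd /data/docker/EFK

Elasticsearch

Main configuration file
vim elasticsearch.yml
Contents:
cluster.name: EFK                   # cluster name
node.name: host-elk01               # name of this node within the cluster
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch

(The docker-compose.yml below does not mount this file; cluster.name and node.name are passed to the container as environment variables instead.)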

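JVM options file for ES
vim jvm.options

Contents below. They can be taken from the official distribution package and are reproduced here for reference.
# JVM heap size; keep the two values equal, otherwise ES fails its heap size bootstrap check and will not start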
-Xms1500m
-Xmx1500m

## I left everything below unchanged; if you are not familiar with these flags, keep the official defaults
## GC configuration
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly

# pre-touch memory pages used by the JVM during initialization
-XX:+AlwaysPreTouch

## basic

# explicitly set the stack size
-Xss1m

# set to headless, just in case
-Djava.awt.headless=true

# ensure UTF-8 encoding by default (e.g. filenames)
-Dfile.encoding=UTF-8

# use our provided JNA always versus the system one
-Djna.nosys=true

# turn off a JDK optimization that throws away stack traces for common
# exceptions because stack traces are important for debugging
-XX:-OmitStackTraceInFastThrow

# flags to configure Netty
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0

# log4j 2
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true

-Djava.io.tmpdir=${ES_TMPDIR}

## heap dumps

# generate a heap dump when an allocation from the Java heap fails
# heap dumps are created in the working directory of the JVM
-XX:+HeapDumpOnOutOfMemoryError

# specify an alternative path for heap dumps; ensure the directory exists and
# has sufficient space
-XX:HeapDumpPath=data

# specify an alternative path for JVM fatal error logs
-XX:ErrorFile=logs/hs_err_pid%p.log

## JDK 8 GC logging

8:-XX:+PrintGCDetails
8:-XX:+PrintGCDateStamps
8:-XX:+PrintTenuringDistribution
8:-XX:+PrintGCApplicationStoppedTime
8:-Xloggc:logs/gc.log
8:-XX:+UseGCLogFileRotation
8:-XX:NumberOfGCLogFiles=32
8:-XX:GCLogFileSize=64m

# JDK 9+ GC logging
9-:-Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m
# due to internationalization enhancements in JDK 9 Elasticsearch need to set the provider to COMPAT otherwise
# time/date parsing will break in an incompatible way for some date patterns and locals
9-:-Djava.locale.providers=COMPAT

# temporary workaround for C2 bug with JDK 10 on hardware with AVX-512
10-:-XX:UseAVX=2
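The file above mixes plain flags with version-prefixed ones ("8:" applies only to JDK 8, "9-:" to JDK 9 and later, "9-10:" to a range). A minimal Python sketch that resolves which flags a given JDK actually receives and then checks the heap rule from the top of the file. The helper names are hypothetical, you must supply the JDK major version and physical RAM yourself, and the 50% limit follows Elastic's heap sizing guidance:

```python
import re

# "8:" -> JDK 8 only; "9-:" -> JDK 9 and later; "9-10:" -> JDK 9 through 10
_PREFIX = re.compile(r"^(\d+)(-(\d*))?:(.*)$")
_HEAP = re.compile(r"^-Xm([sx])(\d+)([kKmMgG]?)$")
_UNIT = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def effective_options(text: str, jdk_major: int) -> list:
    """Return the flags from a jvm.options file that apply to one JDK major version."""
    result = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        m = _PREFIX.match(line)
        if m is None:
            result.append(line)  # no prefix: applies to every JDK
            continue
        low = int(m.group(1))
        if m.group(2) is None:
            high = low  # "8:" exact version
        elif m.group(3) == "":
            high = None  # "9-:" open-ended
        else:
            high = int(m.group(3))  # "9-10:" closed range
        if jdk_major >= low and (high is None or jdk_major <= high):
            result.append(m.group(4))
    return result


def heap_problems(flags: list, ram_bytes: int) -> list:
    """Check -Xms/-Xmx: both set, equal, and at most half of physical RAM."""
    heap = {}
    for flag in flags:
        m = _HEAP.match(flag)
        if m:
            kind, number, unit = m.groups()
            heap[kind] = int(number) * _UNIT[unit.lower()]
    if "s" not in heap or "x" not in heap:
        return ["both -Xms and -Xmx should be set"]
    problems = []
    if heap["s"] != heap["x"]:
        problems.append("-Xms and -Xmx differ")
    if heap["x"] > ram_bytes // 2:
        problems.append("heap exceeds 50% of physical RAM")
    return problems
```

On the 4 GB ECS host used here, the 1500m heap passes: `heap_problems(effective_options(text, jdk), 4 * 1024 ** 3)` returns an empty list.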

Kibana

vim kibana.yml

Contents:
xpack.monitoring.ui.container.elasticsearch.enabled: true   # X-Pack monitoring setting for ES running in containers; X-Pack's access control features need a paid license (30-day trial)
server.port: 5601                               # port Kibana listens on
server.host: "0"                                # listen address; "0" means all addresses (0.0.0.0)
#server.basePath: ""
#server.rewriteBasePath: false
#server.maxPayloadBytes: 1048576
server.name: kibana                             # server name
elasticsearch.url: http://elasticsearch:9200    # ES address
#elasticsearch.preserveHost: true
#kibana.index: ".kibana"
#kibana.defaultAppId: "home"
#elasticsearch.username: "user"
#elasticsearch.password: "pass"
#server.ssl.enabled: false
#server.ssl.certificate: /path/to/your/server.crt
#server.ssl.key: /path/to/your/server.key
#elasticsearch.ssl.certificate: /path/to/your/client.crt
#elasticsearch.ssl.key: /path/to/your/client.key
#elasticsearch.ssl.certificateAuthorities: [ "/path/to/your/CA.pem" ]
#elasticsearch.ssl.verificationMode: full
#elasticsearch.pingTimeout: 1500
#elasticsearch.requestTimeout: 30000
#elasticsearch.requestHeadersWhitelist: [ authorization ]
#elasticsearch.customHeaders: {}
#elasticsearch.shardTimeout: 30000
#elasticsearch.startupTimeout: 5000
#elasticsearch.logQueries: false
#pid.file: /var/run/kibana.pid
#logging.dest: stdout
#logging.silent: false
#logging.quiet: false
#logging.verbose: false
#ops.interval: 5000
#i18n.locale: "en"

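Nginx reverse proxy

Because nginx is used as a reverse proxy, a dedicated container runs the nginx server.
The SSL certificate can come from Let's Encrypt: certificates are free and valid for 90 days, and renewing them before expiry keeps them free indefinitely.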
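Since Let's Encrypt certificates expire after 90 days, a small expiry check is useful for monitoring renewals. A minimal Python sketch using only the standard `ssl` and `socket` modules; the 30-day renewal threshold is an assumption, and `demo.com:9001` in the usage line is the placeholder domain and port from the nginx configuration in this article:

```python
import socket
import ssl
import time


def days_until_expiry(not_after: str, now: float = None) -> int:
    """Whole days left before a certificate's notAfter timestamp.

    not_after uses the format returned by SSLSocket.getpeercert(),
    e.g. "Feb 09 00:00:00 2019 GMT".
    """
    if now is None:
        now = time.time()
    return int((ssl.cert_time_to_seconds(not_after) - now) // 86400)


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Open a verified TLS connection and return the server certificate's notAfter field."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def needs_renewal(days_left: int, threshold: int = 30) -> bool:
    """Assumed policy: renew once 30 or fewer days remain."""
    return days_left <= threshold
```

Usage: `needs_renewal(days_until_expiry(fetch_not_after("demo.com", 9001)))`.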

The main configuration file nginx.conf of this container, for reference:

user nginx;
worker_processes auto;
worker_rlimit_nofile 60000;
error_log /var/log/nginx/error.log;
pid /var/run/nginx.pid;

# dynamic module configs contain load_module, which is only valid in the main context
include /usr/share/nginx/modules/*.conf;

events {
    use epoll;
    worker_connections 10240;
}

http {
    server_tokens off;
    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    #### logs
    access_log  /var/log/nginx/access.log  main;

    sendfile            on;
    tcp_nopush          on;
    tcp_nodelay         on;
    types_hash_max_size 2048;

    #### include
    include             /etc/nginx/mime.types;
    include             /etc/nginx/conf.d/*.conf;
    default_type        application/octet-stream;

    # server set
    include /data/nginx_conf/vhosts/*.conf;

    # upstream set
    # include /data/nginx_conf/upstream/*.conf;

    ##### Timeout
    keepalive_timeout     60;
    client_header_timeout 12;
    client_body_timeout   120;
    send_timeout          12;

    ##### post
    client_max_body_size 100M;    # maximum allowed request (upload) size; set to suit your needs

    ##### Buffer
    client_body_buffer_size      128k;
    client_header_buffer_size    4k;
    client_body_in_single_buffer on;
    large_client_header_buffers  4 8k;
    open_file_cache              max=60000 inactive=20s;
    open_file_cache_valid        30s;
    open_file_cache_min_uses     2;

    #### Compression
    gzip on;
    gzip_comp_level 6;
    gzip_min_length 1k;
    gzip_buffers 16 8k;
    gzip_types text/plain text/css text/xml application/xml text/javascript application/javascript application/x-javascript application/x-httpd-php;
    gzip_vary off;
    gzip_disable "MSIE [1-6]\.";

    # proxy set
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;

    proxy_connect_timeout 300;
    proxy_read_timeout 300;
    proxy_send_timeout 300;
    proxy_intercept_errors off;
    proxy_ignore_client_abort on;
}

Virtual host configuration for this container, for reference:
# server set
server {
    listen       9001 default_server ssl;
    listen       [::]:9001 default_server ssl;
    server_name  demo.com;
    index        index.html index.htm;

    # SSL set ("ssl" on the listen lines enables TLS; the old "ssl on;" directive is deprecated since nginx 1.15.0)
    ssl_certificate     "/path/to/your/cert-chain/fullchain.cer";
    ssl_certificate_key "/path/to/your/private-key/demo.com.key";
    ssl_session_cache   shared:SSL:10m;
    ssl_session_timeout 30m;
    ssl_ciphers HIGH:!aNULL:!MD5;
    ssl_prefer_server_ciphers on;
    add_header Strict-Transport-Security "max-age=63072000; includeSubdomains; preload";

    ## kibana
    location / {
        auth_basic "Authorization";               # require a login before Kibana can be reached
        auth_basic_user_file /etc/.htpasswd;      # see the htpasswd usage below
        proxy_pass http://172.18.1.222:5601;      # reverse proxy to the internal Kibana host
    }
}

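Using htpasswd for nginx basic authentication

Note: Kibana has no access control by default. If it is exposed to the internet, you must put nginx basic authentication in front of it and serve it over HTTPS.
If budget allows, you can use X-Pack instead.
The password file used by nginx's auth_basic module is created with htpasswd; depending on the container's base image, install:
centos: httpd-tools
alpine: apache2-utils

Usage (the user database is a plain text file):

htpasswd [ -c ] [ -m ] [ -D ] passwdfile username
htpasswd -b [ -c ] [ -m | -d | -p | -s ] [ -D ] passwdfile username password

-c: create the file; use only when it does not exist yet (an existing file is overwritten)
-m: MD5 encryption (the default)
-d: crypt() encryption
-p: store the password in plain text
-s: SHA encryption
-D: delete the given user
-b: batch mode; take the password from the command line instead of prompting for it
-n: do not update any file; print the result to standard output only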

Interactive:
#htpasswd -c /etc/httpd/conf.d/.htpasswd hunk1
New password: 
Re-type new password: 
Adding password for user hunk1

Non-interactive:
#htpasswd -bs /etc/httpd/conf.d/.htpasswd hunk2 1234567
Adding password for user hunk2

The stored passwords are hashed:
#cat .htpasswd 
hunk1:xLhgTub5K6Css
hunk2:{SHA}IOq+XWSw4hZ5boNPUtYf0LcDMvw=

Print the result only:
#htpasswd -nbs hunk3 1234567
hunk3:{SHA}IOq+XWSw4hZ5boNPUtYf0LcDMvw=

Delete a user:
#htpasswd -D /etc/httpd/conf.d/.htpasswd hunk2
Deleting password for user hunk2
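The `{SHA}` entries above are simply base64(SHA-1(password)) with no salt, which is why hunk2 and hunk3 get identical hashes for the same password. A minimal Python sketch that reproduces `htpasswd -nbs`, useful for understanding the format or scripting user creation where httpd-tools is unavailable; the function name is hypothetical. Because this scheme is unsalted and weak, htpasswd's default MD5 format is the better choice for real deployments:

```python
import base64
import hashlib


def htpasswd_sha_line(user: str, password: str) -> str:
    """Build an htpasswd entry in the {SHA} format, like `htpasswd -nbs`."""
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    encoded = base64.b64encode(digest).decode("ascii")
    return f"{user}:{{SHA}}{encoded}"  # {{SHA}} renders as the literal "{SHA}"
```

`htpasswd_sha_line("hunk3", "1234567")` yields the same line as the `htpasswd -nbs hunk3 1234567` example above.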

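Orchestrating with docker-compose.yml

Create the external custom network and data volume first; they are declared as external in the compose file, so docker-compose needs them to exist before it starts:
docker network create efk
docker volume create elasticsearch

docker-compose.yml
Contents:

version: "2.4"
## declare networks
networks:
  efk:
    external: true

## declare volumes
volumes:
  elasticsearch:
    external: true

### services
services: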
services:
  elasticsearch:
    image: elasticsearch:6.5.4
    container_name: elasticsearch
    environment:
      - cluster.name=EFK
      - node.name=host-elk01
      - bootstrap.memory_lock=true
      - "discovery.zen.ping.unicast.hosts=elasticsearch"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    networks:
      - efk
    ports:
      - "9200:9200"
    volumes:
      - /etc/localtime:/etc/localtime
      - "elasticsearch:/usr/share/elasticsearch/data"
      - "/data/docker/EFK/jvm.options:/usr/share/elasticsearch/config/jvm.options"
    restart: "always"

    logging:
      driver: "json-file"
      options:
         max-size: "200k"
         max-file: "1"

  kibana:
    image: kibana:6.5.4
    container_name: kibana
    networks:
      - efk
    ports:
      - "5601:5601"
    volumes:
      - /etc/localtime:/etc/localtime
      - /data/docker/EFK/kibana.yml:/usr/share/kibana/config/kibana.yml
    restart: "always"

    logging:
      driver: "json-file"
      options:
         max-size: "200k"
         max-file: "1"

### Note: the image for the service below is custom-built on nginx:alpine from the reference configuration files above.
  efk-proxy:
    image: efk-proxy:latest
    container_name: efk-proxy
    networks:
      - efk
    ports:
      - "9001:9001"
    volumes:
      - /etc/localtime:/etc/localtime
      - "/data/docker/EFK/ssl/demo.com:/data/ssl/demo.com:ro"
    command: ["nginx", "-g", "daemon off;"]
    restart: "always"
    depends_on:
      - kibana
    logging:
      driver: "json-file"
      options:
         max-size: "200k"
         max-file: "1"

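Starting ES + Kibana + nginx

docker-compose up -d
Ports 5601, 9001, and 9200 are now published on the host. Restrict 5601 and 9200 to internal access (for example with the ECS security group), otherwise they bypass the nginx authentication.

A quick check: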
# curl 172.18.1.222:9200
{
  "name" : "host-elk01",
  "cluster_name" : "EFK",
  "cluster_uuid" : "CMJ4F-E5TcypIhrReze7mQ",
  "version" : {
    "number" : "6.5.4",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "d2ef93d",
    "build_date" : "2018-12-17T21:17:40.758843Z",
    "build_snapshot" : false,
    "lucene_version" : "7.5.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}
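The same check can be scripted, for example in a deployment pipeline. A minimal Python sketch using only the standard library; the helper names are hypothetical, and the expected cluster name and version are the values deployed in this article:

```python
import json
import urllib.request


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def check_es_root(info: dict, expected_cluster: str, expected_version: str) -> list:
    """Compare the GET / response of Elasticsearch with what was deployed."""
    problems = []
    if info.get("cluster_name") != expected_cluster:
        problems.append(f"cluster_name is {info.get('cluster_name')!r}")
    version = info.get("version", {}).get("number")
    if version != expected_version:
        problems.append(f"version is {version!r}")
    return problems
```

Usage: `check_es_root(fetch_json("http://172.18.1.222:9200/"), "EFK", "6.5.4")` should return an empty list. `fetch_json("http://172.18.1.222:9200/_cluster/health")["status"]` gives the cluster health; on a single node it is commonly "yellow", because replica shards cannot be placed on the same node as their primaries.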

Open a browser and go to the EFK domain (port 9001, as configured in nginx).
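To check the whole chain (TLS, nginx basic auth, and Kibana) without a browser, a script can send the same credentials nginx expects. A minimal Python sketch; it assumes Kibana 6.x's `/api/status` endpoint and its `status.overall.state` JSON field, and the helper names, user, and password are hypothetical:

```python
import base64
import json
import urllib.request


def basic_auth_request(url: str, user: str, password: str) -> urllib.request.Request:
    """Build a GET request carrying the Basic credentials that nginx auth_basic checks."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def kibana_overall_state(status: dict) -> str:
    """Read the overall state from a Kibana 6.x /api/status body (assumed JSON shape)."""
    return status["status"]["overall"]["state"]


def check_kibana(url: str, user: str, password: str, timeout: float = 10.0) -> str:
    """Fetch /api/status through the nginx proxy and return e.g. "green"."""
    request = basic_auth_request(url, user, password)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return kibana_overall_state(json.loads(resp.read().decode("utf-8")))
```

Usage: `check_kibana("https://demo.com:9001/api/status", "hunk1", "your-password")`. Wrong credentials make `urlopen` raise `urllib.error.HTTPError` with code 401, which confirms the authentication layer is active.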


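Deploying Fluentd (td-agent)

Install the client on every machine whose data should be collected.

Check whether it is already installed:
rpm -qa|grep td-agent

Follow the instructions for your OS version:
https://www.fluentd.org
The Installation Guide there walks through the installation.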


List the locally installed gems:

# td-agent-gem list --local

*** LOCAL GEMS ***

addressable (2.5.2)
elasticsearch (6.1.0)
elasticsearch-api (6.1.0)
elasticsearch-transport (6.1.0)
excon (0.62.0)
faraday (0.15.3)
fluent-config-regexp-type (1.0.0)
fluent-logger (0.7.2)
fluent-plugin-elasticsearch (2.11.11)
fluent-plugin-kafka (0.7.9)
fluent-plugin-record-modifier (1.1.0)
fluent-plugin-rewrite-tag-filter (2.1.0)
fluent-plugin-s3 (1.1.6)
fluent-plugin-td (1.0.0)
fluent-plugin-td-monitoring (0.2.4)
fluentd (1.2.6)
Only part of the list is shown for brevity; it includes fluent-plugin-elasticsearch, which is used later.
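On many hosts it is convenient to verify the plugin automatically instead of reading the list by eye. A minimal Python sketch that parses the `td-agent-gem list --local` output format shown above into a name-to-versions map; the function name is hypothetical, and handling of the "(default: x.y.z)" form is an assumption about how RubyGems marks default gems:

```python
import re

# one gem per line, e.g. "fluent-plugin-elasticsearch (2.11.11)"
_GEM_LINE = re.compile(r"^(\S+) \(([^)]*)\)$")


def parse_gem_list(output: str) -> dict:
    """Map gem name -> list of installed versions from `td-agent-gem list --local`."""
    gems = {}
    for raw in output.splitlines():
        m = _GEM_LINE.match(raw.strip())
        if m:
            name, versions = m.groups()
            # several installed versions are printed comma-separated: "(2.0.1, 1.9.0)"
            gems[name] = [v.strip().replace("default: ", "") for v in versions.split(",")]
    return gems
```

Feeding it the captured command output (e.g. via `subprocess.run(["td-agent-gem", "list", "--local"], capture_output=True, text=True).stdout`) and checking `"fluent-plugin-elasticsearch" in gems` confirms the output plugin is available.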

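td-agent configuration file

The location differs between operating systems; on CentOS it is:
/etc/td-agent/td-agent.conf
It ships with some sample configuration, which is left unchanged for now and will be covered in a separate article.

Default configuration file: /etc/td-agent/td-agent.conf
Default log file: /var/log/td-agent/td-agent.log
Check this log file for td-agent's runtime logs and error messages.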
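When the log file grows large, it helps to pull out only the warnings and errors. A minimal Python sketch; it assumes Fluentd's usual line shape "2019-01-04 10:00:00 +0800 [warn]: message", and the function names and sample lines in the test are hypothetical:

```python
import re
from collections import Counter

# assumed line shape: "2019-01-04 10:00:00 +0800 [warn]: message"
_LEVEL = re.compile(r"\[(trace|debug|info|warn|error|fatal)\]:")


def problem_lines(log_text: str, levels=("warn", "error", "fatal")) -> list:
    """Return the log lines whose level is one of `levels`."""
    hits = []
    for line in log_text.splitlines():
        m = _LEVEL.search(line)
        if m and m.group(1) in levels:
            hits.append(line)
    return hits


def level_counts(log_text: str) -> Counter:
    """Count log lines per level, ignoring lines without a level tag."""
    matches = (_LEVEL.search(line) for line in log_text.splitlines())
    return Counter(m.group(1) for m in matches if m)
```

Usage: `problem_lines(Path("/var/log/td-agent/td-agent.log").read_text())` (with `from pathlib import Path`).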

td-agent service commands

Enable at boot:                  systemctl enable td-agent.service
Start:                           systemctl start td-agent
Restart:                         systemctl restart td-agent.service
Reload the configuration:        systemctl reload td-agent.service
Stop:                            systemctl stop td-agent.service
Check whether boot start is on:  systemctl is-enabled td-agent.service
                                 (prints "enabled" if on, "disabled" if off)

After starting, td-agent listens on port 24224 over both TCP and UDP by default.
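A quick way to confirm the forward input is up is to try a TCP connection to port 24224. A minimal Python sketch; the function name is hypothetical, and it only tests TCP, since UDP is connectionless and cannot be verified reliably this way:

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # refused, timed out, or unreachable
        return False
```

Usage: `tcp_port_open("127.0.0.1", 24224)` on the td-agent host should return True once the service is running.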