I. Loki Features

II. Loki Components

III. Loki Architecture Diagram

IV. Deploying Loki

V. Comparison with EFK


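I. Loki Features

1.1 Indexes are built around log labels, rather than full-text indexing as in Elasticsearch
1.2 Multi-tenancy
    Multi-tenancy is implemented through a tenant ID. If multi-tenancy is disabled, the single default tenant is "fake"
1.3 Deployment modes
    1.3.1 Single-process mode
        All components run in one process; suitable for test environments or small production environments
    1.3.2 Scalable microservices mode
        Each component runs separately and can be scaled horizontally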
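
As a sketch of item 1.2: Loki's HTTP API identifies the tenant through the X-Scope-OrgID request header. The helper names (tenant_header, labels_for_tenant) and the LOKI_URL variable below are assumptions made for this example, not part of Loki, and the URL assumes the single-node setup used later in this article.

```shell
# Base URL of the Loki HTTP API (assumption: local single-node install)
LOKI_URL="${LOKI_URL:-http://127.0.0.1:3100}"

# Print the HTTP header that carries the tenant ID. Defaults to "fake",
# the tenant used when multi-tenancy is disabled.
tenant_header() {
  printf 'X-Scope-OrgID: %s\n' "${1:-fake}"
}

# List the label names visible to one tenant (needs a running Loki).
labels_for_tenant() {
  curl -s -H "$(tenant_header "$1")" "$LOKI_URL/loki/api/v1/labels"
}

tenant_header team-a
```

With auth_enabled: false, as in the configuration later in this article, every request is handled as tenant "fake", so the header only matters once multi-tenancy is switched on.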

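II. Loki Components

1. Distributor
    Handles log writes from clients: it receives log data, splits it into batches, and sends them to the Ingesters in parallel
    The Distributor communicates with the Ingesters over gRPC
2. Hashing
    The Distributor uses consistent hashing together with a configurable replication factor to decide which Ingester instances should receive the log data
    The hash is based on the log labels and the tenant ID
    A hash ring stored in Consul implements the consistent hashing. Every Ingester registers itself in Consul with the set of tokens it owns; the Distributor finds the token
    that best matches the hash of the log data and sends the data to the owner of that token
3. Ingester
    Writes log data to the persistent storage backend (S3, OSS)
    The Ingester checks that the log lines it receives are in ascending order; a line that arrives out of order is rejected with an error
    Logs from each unique label set are built into "chunks" in memory and then flushed to the backing storage backend.
    If an Ingester process crashes, chunk data that was built in memory but not yet flushed to storage is lost
4. Querier
    Handles LogQL queries
    It first queries the in-memory data of all Ingesters and then loads data from the backend storage.
5. Chunk Store
    The chunk store is Loki's long-term data store; it supports interactive queries and sustained writes
    It consists of:
    5.1 An index for the chunks
    5.2 A key-value store for the chunk data itself
    Note: the chunk store is not a separate service but a library embedded in the services that need to access Loki data: the Querier and the Ingester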
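
The hashing step described under "2. Hashing" can be illustrated with a toy ring. This is a minimal sketch and not Loki's implementation: it uses cksum as a stand-in hash function, keeps the ring in a sorted text stream instead of Consul, and assumes a replication factor of 1. All function names are made up for the example.

```shell
# Stand-in hash: CRC checksum of the string, printed as a decimal number
hash32() {
  printf '%s' "$1" | cksum | cut -d' ' -f1
}

# Print the ring as sorted "<token> <ingester>" lines.
# Usage: build_ring <tokens per ingester> <ingester>...
build_ring() {
  tokens_per_ingester="$1"
  shift
  for ingester in "$@"; do
    i=0
    while [ "$i" -lt "$tokens_per_ingester" ]; do
      printf '%s %s\n' "$(hash32 "$ingester-token-$i")" "$ingester"
      i=$((i + 1))
    done
  done | sort -n
}

# Read the ring on stdin and print the owner of the first token that is
# >= hash(key); wrap around to the smallest token if there is none.
lookup_owner() {
  key_hash=$(hash32 "$1")
  awk -v h="$key_hash" '
    NR == 1 { first = $2 }
    owner == "" && ($1 + 0) >= (h + 0) { owner = $2 }
    END { print (owner != "" ? owner : first) }
  '
}

ring=$(build_ring 4 ingester-1 ingester-2 ingester-3)
# The key combines the tenant ID and the label set, as described above
printf '%s\n' "$ring" | lookup_owner 'fake/{service_name="var-log-messages"}'
```

The same key always maps to the same ingester as long as the ring is unchanged, and adding an ingester only moves the keys that fall just before its new tokens.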

III. Loki Architecture Diagram

[Figure: Loki architecture diagram]

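Write path:

1.1 The Distributor receives log data, splits it into batches, and sends them to the Ingesters in parallel

1.2 The Ingester receives the batches sent by the Distributor, buffers them in memory, and periodically flushes them to the persistent Chunk Store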
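
What the Ingester does while entries sit in memory can be modelled in a few lines of awk. This is only a sketch of the behaviour described in section II (one stream per unique label set, ascending timestamps required); the input format and the name ingest_sim are assumptions for the example and have nothing to do with Loki's wire format.

```shell
# Input: one entry per line in the form "<label set>|<timestamp>|<log line>".
# Each distinct label set is treated as its own stream.
ingest_sim() {
  awk -F'|' '
    {
      stream = $1
      ts = $2 + 0
      if ((stream in last) && ts < last[stream]) {
        # Older than the newest accepted entry of the same stream
        print "REJECT " stream " ts=" ts " (older than " last[stream] ")"
      } else {
        last[stream] = ts
        print "ACCEPT " stream " ts=" ts
      }
    }
  '
}

printf '%s\n' \
  '{app="a"}|10|first line' \
  '{app="a"}|12|second line' \
  '{app="b"}|5|other stream, so its own ordering' \
  '{app="a"}|11|arrives late' | ingest_sim
```

The first three entries are accepted, because stream {app="b"} is ordered independently of {app="a"}; the last entry is rejected because stream {app="a"} has already seen timestamp 12.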

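Read path:

The Ingester receives the query request from the Querier and uses the chunk index to look up the requested chunks. Data that is not in memory is looked up in the persistent Chunk Store and returned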

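IV. Deploying Loki

This deployment uses single-process mode and reuses the deployment approach published by the instructor "阿果阿郭". It is intended only for study and review. Original article: k8s loki 容器日志解决方案-4. alertmanager 报警及loki rules - 哔哩哔哩

The official documentation lists the following installation methods: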

  • Install using Tanka (recommended)
  • Install through Helm
  • Install through Docker or Docker Compose
  • Install and run locally
  • Install from source

Refer to them as needed.

The deployment steps are as follows:

4.1 Install supervisor

Install supervisor
yum install epel-release -y
yum install supervisor -y

Raise the memory-lock, process, and open-file limits
sed -i '/forking/a LimitNOFILE=65536' /usr/lib/systemd/system/supervisord.service;
sed -i '/forking/a LimitNPROC=65536' /usr/lib/systemd/system/supervisord.service ;
sed -i '/forking/a LimitMEMLOCK=infinity' /usr/lib/systemd/system/supervisord.service ;


Reload systemd so the new limits take effect, then start the service
systemctl daemon-reload

systemctl start supervisord.service

4.2 Install Loki

Upload the loki-linux-amd64.zip archive to the /data/loki directory
Extract the archive
unzip loki-linux-amd64.zip

Verify the version
./loki-linux-amd64 --version


Manage Loki with systemd (an optional alternative to supervisord; use one or the other)
cat <<EOF > /usr/lib/systemd/system/loki.service
[Unit]
Description=loki.service
After=rc-local.service nss-user-lookup.target

[Service]
Type=simple
LimitMEMLOCK=infinity
LimitNPROC=65536
LimitNOFILE=65536
WorkingDirectory=/data/loki
ExecStart=/data/loki/loki-linux-amd64 -log.level=info -target all -config.file=loki-local-config.yaml

[Install]
WantedBy=multi-user.target
EOF


Manage Loki with supervisord (the method used in the rest of this walkthrough)
cat <<EOF> /etc/supervisord.d/loki.ini
[program:loki]
command=/data/loki/loki-linux-amd64 -log.level=info -target all -config.file=loki-local-config.yaml
autorestart=true
autostart=true
stderr_logfile=/tmp/loki_err.log
stdout_logfile=/tmp/loki_out.log
user=root
stopsignal=INT
startsecs=10
startretries=3
directory=/data/loki
EOF


Create the Loki configuration file
cat <<EOF> /data/loki/loki-local-config.yaml
auth_enabled: false            # whether to enable authentication; it applies to multi-tenancy, and this setup uses a single tenant

server:
  http_listen_port: 3100
  grpc_server_max_concurrent_streams: 0

ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 1h      
  max_chunk_age: 1h        
  chunk_target_size: 10485760 
  chunk_retain_period: 30s    
  max_transfer_retries: 0     

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h
# Storage configuration
storage_config:
  boltdb_shipper:
    active_index_directory: /data/loki/boltdb-shipper-active
    cache_location: /data/loki/boltdb-shipper-cache                # cache location
    cache_ttl: 24h
    shared_store: filesystem
  filesystem:
    directory: /data/loki/chunks        # chunk storage directory

compactor:
  working_directory: /data/loki/boltdb-shipper-compactor        # compactor working directory
  shared_store: filesystem

limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  ingestion_rate_mb: 200
  # ingestion_burst_size_mb: 400
  # max_streams_per_user: 0
  # max_chunks_per_query: 20000000
  # max_query_parallelism: 140
  # max_query_series: 5000
  # cardinality_limit: 1000000
  # max_streams_matchers_per_query: 10000

chunk_store_config:
  max_look_back_period: 0s

# Data retention period
table_manager:
  retention_deletes_enabled: true
  retention_period: 24h

ruler:
  storage:
    type: local
    local:
      directory: /data/loki/rules
  rule_path: /data/loki/rules-temp
  alertmanager_url: http://localhost:9093
  ring:
    kvstore:
      store: inmemory
  enable_api: true

EOF


Start Loki
supervisorctl status
supervisorctl update
supervisorctl status
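
Because startsecs=10 only tells supervisor that the process stayed up, it can help to wait until Loki itself reports readiness. The sketch below polls Loki's /ready endpoint; the function name wait_for_loki and its defaults are assumptions for this example.

```shell
# Usage: wait_for_loki [url] [attempts]
# Polls the readiness endpoint once per second until it answers with HTTP 2xx.
wait_for_loki() {
  url="${1:-http://127.0.0.1:3100/ready}"
  attempts="${2:-30}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl fail on HTTP errors such as 503 (not ready yet)
    if curl -sf -o /dev/null --max-time 2 "$url"; then
      echo "Loki is ready"
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep 1
    fi
    i=$((i + 1))
  done
  echo "Loki did not become ready after $attempts attempts" >&2
  return 1
}

# Example (needs the Loki instance started above):
# wait_for_loki && curl 127.0.0.1:3100/loki/api/v1/labels
```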

4.3 Install Promtail

mkdir /data/promtail/{bin,config,logs} -p 
cd /data/promtail/bin
curl -O -L "https://github.com/grafana/loki/releases/download/v2.3.0/promtail-linux-amd64.zip"
unzip "promtail-linux-amd64.zip"
chmod a+x "promtail-linux-amd64" 


Configuration file

cat << EOF > /data/promtail/config/promtail.conf
server:      # server settings for the Promtail service
  http_listen_address: 0.0.0.0
  http_listen_port: 19080
  grpc_listen_port: 0

positions:
  filename: ./logs/loki_positions.yaml
  ignore_invalid_yaml: true

clients:        # address of the Loki service
  - url: http://127.0.0.1:3100/loki/api/v1/push

scrape_configs:
- job_name: service_log
  file_sd_configs:        # logs to scrape, discovered through file-based service discovery
    - files:
      - ./config/*.yaml
      refresh_interval: 1m
EOF


Configure supervisor to manage Promtail

cat << EOF > /etc/supervisord.d/promtail.ini
[program:promtail]
command=/data/promtail/bin/promtail-linux-amd64 -config.expand-env=true -config.file=/data/promtail/config/promtail.conf
autorestart=true
autostart=true
stderr_logfile=/tmp/promtail_err.log
stdout_logfile=/tmp/promtail_out.log
user=root
stopsignal=INT
startsecs=10
startretries=3
directory=/data/promtail/
EOF


Define the log collection targets

cat << EOF > /data/promtail/config/varlogmessage.yaml
- targets:
    - localhost
  labels:
    __path__: /var/log/messages
    env: {{ENV}}
    hostname: {{BINDIP}}
    service_name: var-log-messages
    log_type: var-log-messages
- targets:
    - localhost
  labels:
    __path__: /var/log/secure
    env: {{ENV}}
    hostname: {{BINDIP}}
    service_name: var-log-secure
    log_type: var-log-secure
EOF

Note: the {{ENV}} and {{BINDIP}} placeholders use Jinja2-style syntax; here they are simply filled in with sed
ENV=test
BINDIP=192.168.161.118
sed -i "s/{{ENV}}/$ENV/g" /data/promtail/config/varlogmessage.yaml
sed -i "s/{{BINDIP}}/$BINDIP/g" /data/promtail/config/varlogmessage.yaml
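
A leftover placeholder would end up as a literal label value such as {{ENV}}, so it is worth checking the rendered file before starting Promtail. The following is a small sketch of the same substitution as a filter, plus a check; the function names are assumptions for this example.

```shell
# Fill the {{ENV}} and {{BINDIP}} placeholders: template on stdin, result on stdout.
# Assumes the values contain no "/" characters, which would break the sed expression.
render_targets() {
  sed -e "s/{{ENV}}/$1/g" -e "s/{{BINDIP}}/$2/g"
}

# Succeeds (exit status 0) if any {{NAME}} placeholder is left in the input.
has_placeholders() {
  grep -q '{{[A-Z_]*}}'
}

rendered=$(printf 'env: {{ENV}}\nhostname: {{BINDIP}}\n' | render_targets test 192.168.161.118)
printf '%s\n' "$rendered"
if printf '%s\n' "$rendered" | has_placeholders; then
  echo "unfilled placeholders remain" >&2
fi
```

Against the real file, the check could be run as: has_placeholders < /data/promtail/config/varlogmessage.yaml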

Start Promtail

supervisorctl status
supervisorctl update
supervisorctl status


Verify that Loki is receiving logs
curl 127.0.0.1:3100/loki/api/v1/labels 
curl 127.0.0.1:3100/loki/api/v1/label/service_name/values
curl 127.0.0.1:3100/loki/api/v1/label/filename/values
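
The label endpoints above only show that streams exist. To look at actual log lines, the query_range endpoint accepts a LogQL query. This is a minimal sketch: the helper names selector and query_recent and the LOKI_URL variable are assumptions for this example, and the label values are the ones defined in varlogmessage.yaml.

```shell
LOKI_URL="${LOKI_URL:-http://127.0.0.1:3100}"

# Build a LogQL stream selector from label=value pairs, for example
#   selector service_name=var-log-messages env=test
selector() {
  out=""
  for pair in "$@"; do
    # ${pair%%=*} is the label name, ${pair#*=} is the value
    out="${out:+$out,}${pair%%=*}=\"${pair#*=}\""
  done
  printf '{%s}\n' "$out"
}

# Fetch up to 5 recent lines of the selected stream (needs a running Loki).
query_recent() {
  curl -G -s "$LOKI_URL/loki/api/v1/query_range" \
    --data-urlencode "query=$(selector "$@")" \
    --data-urlencode "limit=5"
}

selector service_name=var-log-messages env=test
# query_recent service_name=var-log-messages env=test
```

curl -G together with --data-urlencode takes care of escaping the braces and quotes in the LogQL expression.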

4.4 Install Grafana

 1.1 Download the Grafana binary package
Download URL: wget https://dl.grafana.com/enterprise/release/grafana-enterprise-8.5.14.linux-amd64.tar.gz
A mirror is recommended for downloads within mainland China:
wget https://mirrors.huaweicloud.com/grafana/8.5.9/grafana-enterprise-8.5.9.linux-amd64.tar.gz
tar xf grafana-enterprise-8.5.9.linux-amd64.tar.gz -C /data
cd /data/
mv grafana-8.5.9/ grafana


Configure supervisor to manage Grafana

cat <<EOF> /etc/supervisord.d/grafana.ini
[program:grafana]
command=/data/grafana/bin/grafana-server web
autorestart=true
autostart=true
stderr_logfile=/tmp/grafana_err.log
stdout_logfile=/tmp/grafana_out.log
user=root
stopsignal=INT
startsecs=10
startretries=3
directory=/data/grafana
EOF


Start Grafana

supervisorctl status
supervisorctl update
supervisorctl status

Add the Loki data source
View the Loki data through Explore
Import the following Grafana Loki dashboard to view the data

 {
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": "-- Grafana --",
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "target": {
          "limit": 100,
          "matchAny": false,
          "tags": [],
          "type": "dashboard"
        },
        "type": "dashboard"
      }
    ]
  },
  "editable": true,
  "gnetId": null,
  "graphTooltip": 0,
  "id": 8,
  "iteration": 1655978337467,
  "links": [],
  "panels": [
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "${ENV}",
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 5,
        "w": 24,
        "x": 0,
        "y": 0
      },
      "hiddenSeries": false,
      "id": 4,
      "legend": {
        "alignAsTable": true,
        "avg": false,
        "current": false,
        "max": false,
        "min": false,
        "rightSide": true,
        "show": true,
        "total": false,
        "values": false
      },
      "lines": true,
      "linewidth": 1,
      "nullPointMode": "null",
      "options": {
        "alertThreshold": true
      },
      "percentage": false,
      "pluginVersion": "8.1.5",
      "pointradius": 2,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "expr": "sum (count_over_time({service_name=~\"$app_name\",filename=~\"$log_type\",hostname=~\"$hostname\"}[2m] )) by (hostname)",
          "hide": true,
          "legendFormat": "",
          "queryType": "randomWalk",
          "refId": "A"
        },
        {
          "expr": "sum (count_over_time({service_name=~\"$app_name\",filename=~\"$log_type\",hostname=~\"$hostname\"}[2m] )) by (hostname,filename)",
          "hide": false,
          "legendFormat": "{{hostname}}/{{filename}}",
          "refId": "B"
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Log volume",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "$$hashKey": "object:319",
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "$$hashKey": "object:320",
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "datasource": "${ENV}",
      "description": "",
      "gridPos": {
        "h": 21,
        "w": 24,
        "x": 0,
        "y": 5
      },
      "id": 2,
      "options": {
        "dedupStrategy": "exact",
        "enableLogDetails": false,
        "prettifyLogMessage": false,
        "showCommonLabels": false,
        "showLabels": false,
        "showTime": true,
        "sortOrder": "Descending",
        "wrapLogMessage": true
      },
      "pluginVersion": "7.4.3",
      "targets": [
        {
          "expr": "{service_name=~\"$app_name\",filename=~\"$log_type\",hostname=~\"$hostname\"} |~ \"(?i)$log_level\"",
          "maxLines": 1000,
          "queryType": "randomWalk",
          "refId": "A"
        }
      ],
      "timeFrom": null,
      "timeShift": null,
      "title": "Logs",
      "transparent": true,
      "type": "logs"
    }
  ],
  "refresh": false,
  "schemaVersion": 30,
  "style": "dark",
  "tags": [],
  "templating": {
    "list": [
      {
        "current": {
          "selected": false,
          "text": "crm-cd",
          "value": "crm-cd"
        },
        "description": null,
        "error": null,
        "hide": 0,
        "includeAll": false,
        "label": "Environment",
        "multi": false,
        "name": "ENV",
        "options": [],
        "query": "loki",
        "queryValue": "",
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "type": "datasource"
      },
      {
        "allValue": null,
        "current": {
          "selected": true,
          "text": "neo-pharma-service",
          "value": "neo-pharma-service"
        },
        "datasource": "${ENV}",
        "definition": "label_values({service_name=~\".+\"},service_name)",
        "description": null,
        "error": null,
        "hide": 0,
        "includeAll": false,
        "label": "Service name",
        "multi": false,
        "name": "app_name",
        "options": [],
        "query": "label_values({service_name=~\".+\"},service_name)",
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "sort": 1,
        "type": "query"
      },
      {
        "allValue": null,
        "current": {
          "selected": false,
          "text": "/logs/gc.log",
          "value": "/logs/gc.log"
        },
        "datasource": "${ENV}",
        "definition": "label_values({service_name=\"$app_name\"}, filename)",
        "description": null,
        "error": null,
        "hide": 0,
        "includeAll": false,
        "label": "Log file",
        "multi": false,
        "name": "log_type",
        "options": [],
        "query": "label_values({service_name=\"$app_name\"}, filename)",
        "refresh": 2,
        "regex": "",
        "skipUrlSync": false,
        "sort": 0,
        "type": "query"
      },
      {
        "allValue": ".*",
        "current": {
          "selected": true,
          "text": "neo-pharma-service-7c87d876d5-js77h",
          "value": "neo-pharma-service-7c87d876d5-js77h"
        },
        "datasource": "${ENV}",
        "definition": "label_values({service_name=\"$app_name\",filename=\"$log_type\"}, hostname)",
        "description": null,
        "error": null,
        "hide": 0,
        "includeAll": false,
        "label": "Hostname",
        "multi": false,
        "name": "hostname",
        "options": [],
        "query": "label_values({service_name=\"$app_name\",filename=\"$log_type\"}, hostname)",
        "refresh": 2,
        "regex": "",
        "skipUrlSync": false,
        "sort": 0,
        "tagValuesQuery": "",
        "tagsQuery": "",
        "type": "query",
        "useTags": false
      },
      {
        "allValue": "(^\\\\S|^\\\\s)",
        "current": {
          "selected": false,
          "text": "All",
          "value": "$__all"
        },
        "description": "Type a keyword directly to filter the logs",
        "error": null,
        "hide": 0,
        "includeAll": true,
        "label": "Keyword filter",
        "multi": false,
        "name": "log_level",
        "options": [
          {
            "selected": true,
            "text": "All",
            "value": "$__all"
          },
          {
            "selected": false,
            "text": "warning",
            "value": "warning"
          },
          {
            "selected": false,
            "text": "unknown",
            "value": "unknown"
          },
          {
            "selected": false,
            "text": "info",
            "value": "info"
          },
          {
            "selected": false,
            "text": "error",
            "value": "error"
          },
          {
            "selected": false,
            "text": "type a keyword to search",
            "value": "type a keyword to search"
          }
        ],
        "query": "warning,unknown,info,error,type a keyword to search",
        "queryValue": "",
        "skipUrlSync": false,
        "type": "custom"
      }
    ]
  },
  "time": {
    "from": "now-1h",
    "to": "now"
  },
  "timepicker": {},
  "timezone": "",
  "title": "Log Center",
  "uid": "NlV_8QD7k",
  "version": 21
}

Result

[Figure: screenshot of the resulting dashboard]

4.5 Install Alertmanager

cd /data
tar xf alertmanager-0.24.0.linux-amd64.tar.gz
mv alertmanager-0.24.0.linux-amd64 alertmanager
Configure supervisor to manage Alertmanager
cat <<EOF> /etc/supervisord.d/alertmanager.ini
[program:alertmanager]
command=/data/alertmanager/alertmanager
autorestart=true
autostart=true
stderr_logfile=/tmp/alertmanager_err.log
stdout_logfile=/tmp/alertmanager_out.log
user=root
stopsignal=INT
startsecs=10
startretries=3
directory=/data/alertmanager
EOF


Create the Alertmanager configuration file

cat <<EOF> /data/alertmanager/alertmanager.yml

global:
  smtp_smarthost: 'smtp.qq.com:465'       # SMTP server address
  smtp_from: '4506259@qq.com'                # sender address
  smtp_auth_username: '4507259@qq.com'       # mailbox user
  smtp_auth_password: 'gbrqbrcace'                   # mailbox password
  smtp_require_tls: false

templates:
- '/usr/local/alertmanager/template/*.tmpl'

route:
  group_by: ["instance"]            # labels to group alerts by
  group_wait: 30s                   # after an alert arrives, wait 30 seconds for further alerts and send them together
  group_interval: 5m                # interval between notifications for the same group
  repeat_interval: 3h               # interval before a notification is repeated
  receiver: mail                    # default receiver; required, and must match a receiver name below

receivers:
- name: 'mail'                      # receiver name
  email_configs:
  - to: '187171160@163.com'      # recipient
    send_resolved: true            # also notify when the alert is resolved
EOF

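Create the alert rules (the directory name fake is the default tenant ID; create the directory first if it does not exist)

mkdir -p /data/loki/rules/fake
cat <<'EOF'> /data/loki/rules/fake/rules.yaml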
groups:
  - name: service OutOfMemoryError
    rules:
      # keyword monitoring
      - alert: loki check words java.lang.OutOfMemoryError
        expr: sum by (env, hostname, log_type, filename) (count_over_time({env=~"\\w+"} |= "java.lang.OutOfMemoryError" [5m]) > 0)
        labels:
          severity: critical
        annotations:
          description: '{{$labels.env}} {{$labels.hostname}} file {{$labels.filename}} has  {{ $value }} error'
          summary: java.lang.OutOfMemoryError
      # Java application log performance alert
      - alert: loki java full gc count check
        expr: sum by (env, hostname, log_type, filename) (count_over_time({env=~"\\w+"} |= "Full GC (Allocation" [5m]) > 5)
        labels:
          severity: warning
        annotations:
          description: '{{$labels.env}} {{$labels.hostname}} {{$labels.filename}} {{ $value }}'
          summary: java full gc count check
      # example of alert matching with a regular expression
      - alert: dbperform slowlog sql slow query
        expr: 'sum by (env, hostname, log_type, filename) (count_over_time({env=~"\\w+"} |~ "time: [1-9]\\d{4,}" [5m]) > 5)'
        labels:
          severity: warning
        annotations:
          description: '{{$labels.env}} {{$labels.hostname}} file {{$labels.filename}} has  {{ $value }} error'
          summary: sql slowlog
EOF
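
The keyword rules above boil down to "count the matching lines and compare with a threshold". The sketch below approximates that locally on a single file, which can help when picking thresholds. It is a rough stand-in, not what the Loki ruler does: it ignores the 5m window and the grouping by labels, and the name would_fire is made up for the example.

```shell
# Usage: would_fire <log file> <keyword> <threshold>
would_fire() {
  file="$1"; keyword="$2"; threshold="$3"
  if [ ! -r "$file" ]; then
    echo "cannot read $file" >&2
    return 2
  fi
  # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true";
  # -F treats the keyword as a fixed string, like the |= filter in LogQL
  count=$(grep -cF -- "$keyword" "$file" || true)
  if [ "$count" -gt "$threshold" ]; then
    echo "FIRING ($count matches > $threshold)"
  else
    echo "ok ($count matches)"
  fi
}

# Examples: the OutOfMemoryError rule uses "> 0", the Full GC rule uses "> 5"
# would_fire /var/log/messages 'java.lang.OutOfMemoryError' 0
# would_fire /var/log/messages 'Full GC (Allocation' 5
```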

Test the alert

echo 'The String object java.lang.OutOfMemoryError is used to represent and manipulate a sequence of characters.' >> /var/log/messages

 

[Figure: screenshot of the test result]

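V. Comparison with EFK

EFK:
1.1 Data in Elasticsearch is stored on disk as unstructured JSON objects. Both the keys of each object and the contents of each key are indexed.
     The data can then be queried with a query defined as a JSON object (the Query DSL) or with the Lucene query language.
1.2 EFK uses Fluentd as the log collector
Loki:
1.1 In single-process mode log data is stored on local disk; in scalable microservices mode it is stored in cloud storage. Logs are tagged with labels and only the labels are indexed, so the index is smaller and the cost is lower
1.2 Loki uses Promtail as the log collector. Promtail discovers log files stored on disk, associates them with labels, and forwards them to Loki
    Promtail can run as a sidecar to collect a Pod's logs, read logs from specified files, and follow system logs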
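
The "only labels are indexed" point can be made concrete with a toy model: a small index maps label sets to chunk files, and the log text itself is only scanned at query time. Everything below (file layout, the name logql_like) is an assumption made for this illustration and does not reflect Loki's storage format.

```shell
workdir=$(mktemp -d)

# Two "chunks" holding raw log lines
printf 'kernel: boot ok\nkernel: disk error\n' > "$workdir/chunk-1"
printf 'sshd: login ok\nsshd: auth error\n' > "$workdir/chunk-2"

# The index knows only label sets and where their chunks live
cat > "$workdir/index" <<EOF
{service_name="var-log-messages"} $workdir/chunk-1
{service_name="var-log-secure"} $workdir/chunk-2
EOF

# Usage: logql_like <index file> <label matcher> <text filter>
# Step 1 narrows the search through the label index; step 2 scans only the
# selected chunks for the text, the way a line filter works.
logql_like() {
  grep -F -- "$2" "$1" | while read -r _labels chunk; do
    grep -F -- "$3" "$chunk" || true
  done
}

logql_like "$workdir/index" 'service_name="var-log-secure"' 'error'
# rm -rf "$workdir"   # clean up when done
```

The query prints "sshd: auth error" and never opens chunk-1. A full-text index as in Elasticsearch would instead have indexed every word of every line at write time.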

References:

k8s loki 容器日志解决方案-4. alertmanager 报警及loki rules - 哔哩哔哩

Getting started | Grafana Loki documentation