I. Loki Features
II. Loki Components
III. Loki Architecture Diagram
IV. Deploying Loki
V. Comparison with EFK
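I. Loki Features
1.1 Indexes are built from log labels only, instead of full-text indexing as in Elasticsearch
1.2 Multi-tenancy
Multi-tenancy is implemented through a tenant ID. If multi-tenancy is disabled, a single default tenant named fake is used.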
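When multi-tenancy is enabled, clients name their tenant in the X-Scope-OrgID HTTP header. The sketch below only assembles and prints a curl command rather than running it. The helper name loki_labels_cmd, the tenant team-a, and the address 127.0.0.1:3100 are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) a curl command that lists labels for a tenant.
# With auth_enabled: false the header is not required and the tenant ID is "fake".
loki_labels_cmd() {
  local tenant="${1:-fake}"
  local base_url="${2:-http://127.0.0.1:3100}"   # assumed Loki address
  printf "curl -H 'X-Scope-OrgID: %s' %s/loki/api/v1/labels\n" "$tenant" "$base_url"
}

loki_labels_cmd team-a
```

Called without arguments, the helper falls back to the fake tenant, which mirrors the single-tenant setup used later in this post.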
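1.3 Deployment modes
1.3.1 Single-process mode
All components run in one process; suitable for test environments or small production environments.
1.3.2 Scalable microservices mode
Each component runs separately and can be scaled horizontally.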
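II. Loki Components
1. Distributor
Handles log writes from clients: it receives log data, splits it into batches, and sends them to the Ingesters in parallel.
The Distributor communicates with the Ingesters over gRPC.
2. Hashing
The Distributor uses consistent hashing together with a configurable replication factor to decide which Ingester instances should receive a given piece of log data.
The hash is computed from the log labels and the tenant ID.
A hash ring stored in Consul implements the consistent hashing. Every Ingester registers itself in the ring with the set of tokens it owns; the Distributor finds the token that most closely matches the hash of the log data and sends the data to that token's owner.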
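The token lookup described above can be illustrated with a toy ring. This is a sketch under stated assumptions, not Loki's implementation: it hashes with cksum, gives every Ingester three tokens, ignores the replication factor, and the names hash32, build_ring, and lookup are made up for the example.

```shell
#!/usr/bin/env bash
# Toy consistent-hash ring (illustrative only; Loki uses its own hash and ring code).

# 32-bit hash of a string, using the POSIX cksum utility.
hash32() { printf '%s' "$1" | cksum | cut -d' ' -f1; }

# Print "token owner" lines, sorted by token; each ingester registers 3 tokens.
build_ring() {
  local ing i
  for ing in "$@"; do
    for i in 1 2 3; do
      printf '%s %s\n' "$(hash32 "$ing-$i")" "$ing"
    done
  done | sort -n
}

# lookup <key> <ring>: owner of the first token >= hash(key), wrapping around.
lookup() {
  local h owner
  h=$(hash32 "$1")
  owner=$(printf '%s\n' "$2" | awk -v h="$h" '$1 >= h { print $2; exit }')
  [ -n "$owner" ] || owner=$(printf '%s\n' "$2" | head -n 1 | cut -d' ' -f2)
  printf '%s\n' "$owner"
}

ring=$(build_ring ingester-a ingester-b ingester-c)
# The key combines the tenant ID and the label set, as the hash input does above.
lookup 'fake/{service_name="var-log-messages"}' "$ring"
```

The same key always maps to the same owner, and adding or removing one Ingester only moves the keys that fall next to its tokens.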
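3. Ingester
Writes log data to the persistent storage backend (for example S3 or OSS).
The Ingester checks that log lines arrive in ascending timestamp order; a line received out of order is rejected with an error.
Logs from each unique set of labels are built into "chunks" in memory and then flushed to the backing store.
If the Ingester process crashes, chunk data still in memory that has not been flushed to storage is lost.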
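The ordering rule can be mimicked with a few lines of awk. This is only a sketch of the idea, assuming a made-up input format of "<stream> <timestamp> <message>", where the stream name stands in for a unique label set. The function name check_order is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of per-stream ordering: remember the newest timestamp seen for each
# stream and reject any line that is older than it.
check_order() {
  awk '{
    ts = $2 + 0
    if (($1 in last) && ts < last[$1]) { print "rejected: " $0; next }
    last[$1] = ts
    print "accepted: " $0
  }'
}

printf '%s\n' 'app 100 first' 'app 105 second' 'app 103 late' 'db 50 other' | check_order
```

The third line is rejected because stream app has already seen timestamp 105. The db line is accepted even though 50 is smaller, since ordering is tracked per stream.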
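4. Querier
Handles queries written in LogQL.
It first queries the in-memory data of all Ingesters, then loads data from the backend store.
5. Chunk Store
The chunk store is Loki's long-term data store; it supports interactive queries and sustained writes.
It consists of:
1.1 An index for the chunks
1.2 A key-value store for the chunk data itself
Note: the chunk store is not a separate service; it is a library embedded in the services that need to access Loki data, namely the Querier and the Ingester.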
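III. Loki Architecture Diagram
Write path:
1.1 The Distributor receives log data, splits it into batches, and sends them to the Ingesters in parallel.
1.2 The Ingester receives the batches from the Distributor, buffers them in memory, and periodically flushes them to the persistent Chunk Store.
Read path:
The Ingesters receive the query request from the Querier and return matching data held in memory; for data that is not in memory, the chunks are located through the chunk index and loaded from the persistent Chunk Store.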
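IV. Deploying Loki
This deployment uses single-process mode and follows the approach published by the instructor "阿果阿郭". It is intended for study and review only. Original article: k8s loki 容器日志解决方案-4. alertmanager 报警及loki rules - 哔哩哔哩
The official documentation lists these installation methods: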
- Install using Tanka (recommended)
- Install through Helm
- Install through Docker or Docker Compose
- Install and run locally
- Install from source
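Refer to the official documentation for whichever method you need.
The deployment steps are as follows:
4.1 Install Supervisor
Install the packages
yum install epel-release -y
yum install supervisor -y
Raise the locked-memory, process, and open-file limits
sed -i '/forking/a LimitNOFILE=65536' /usr/lib/systemd/system/supervisord.service
sed -i '/forking/a LimitNPROC=65536' /usr/lib/systemd/system/supervisord.service
sed -i '/forking/a LimitMEMLOCK=infinity' /usr/lib/systemd/system/supervisord.service
Reload systemd so the edited unit file takes effect, then start the service
systemctl daemon-reload
systemctl start supervisord.service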
4.2 Install Loki
Upload the loki-linux-amd64.zip archive to the /data/loki directory
Extract the archive
unzip loki-linux-amd64.zip
Check the version
./loki-linux-amd64 --version
Manage Loki with systemd
cat <<EOF > /usr/lib/systemd/system/loki.service
[Unit]
Description=loki.service
After=rc-local.service nss-user-lookup.target

[Service]
Type=simple
LimitMEMLOCK=infinity
LimitNPROC=65536
LimitNOFILE=65536
WorkingDirectory=/data/loki
ExecStart=/data/loki/loki-linux-amd64 -log.level=info -target all -config.file=loki-local-config.yaml

[Install]
WantedBy=multi-user.target
EOF
Manage Loki with Supervisor (the rest of this post uses Supervisor; enable only one of the two managers, since both would start Loki on the same port)
cat <<EOF> /etc/supervisord.d/loki.ini
[program:loki]
command=/data/loki/loki-linux-amd64 -log.level=info -target all -config.file=loki-local-config.yaml
autorestart=true
autostart=true
stderr_logfile=/tmp/loki_err.log
stdout_logfile=/tmp/loki_out.log
user=root
stopsignal=INT
startsecs=10
startretries=3
directory=/data/loki
EOF
Write the Loki configuration file
cat <<EOF> /data/loki/loki-local-config.yaml
auth_enabled: false  # whether authentication is enabled; it applies to multi-tenancy, and this setup is single-tenant

server:
  http_listen_port: 3100
  grpc_server_max_concurrent_streams: 0

ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 1h
  max_chunk_age: 1h
  chunk_target_size: 10485760
  chunk_retain_period: 30s
  max_transfer_retries: 0

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

# Storage settings
storage_config:
  boltdb_shipper:
    active_index_directory: /data/loki/boltdb-shipper-active
    cache_location: /data/loki/boltdb-shipper-cache  # index cache location
    cache_ttl: 24h
    shared_store: filesystem
  filesystem:
    directory: /data/loki/chunks  # chunk location

compactor:
  working_directory: /data/loki/boltdb-shipper-compactor  # compaction working directory
  shared_store: filesystem

limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  ingestion_rate_mb: 200
  # ingestion_burst_size_mb: 400
  # max_streams_per_user: 0
  # max_chunks_per_query: 20000000
  # max_query_parallelism: 140
  # max_query_series: 5000
  # cardinality_limit: 1000000
  # max_streams_matchers_per_query: 10000

chunk_store_config:
  max_look_back_period: 0s

# Data retention
table_manager:
  retention_deletes_enabled: true
  retention_period: 24h

ruler:
  storage:
    type: local
    local:
      directory: /data/loki/rules
  rule_path: /data/loki/rules-temp
  alertmanager_url: http://localhost:9093
  ring:
    kvstore:
      store: inmemory
  enable_api: true
EOF
Start Loki
supervisorctl status
supervisorctl update
supervisorctl status
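Loki can take a few seconds to become ready after supervisorctl update. A small retry loop avoids querying it too early. The helper name wait_until is hypothetical, and the commented usage assumes Loki's /ready endpoint on the port configured above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a probe command up to <tries> times, sleeping <delay>
# seconds between attempts, and succeed as soon as the probe succeeds.
wait_until() {
  local tries="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$tries" ]; do
    if "$@" >/dev/null 2>&1; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not ready after $tries attempt(s)" >&2
  return 1
}

# Intended use against the instance above (requires Loki to be running):
#   wait_until 30 2 curl -fsS http://127.0.0.1:3100/ready
```

Taking the probe as arguments keeps the loop reusable for Promtail, Grafana, or Alertmanager with a different URL.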
4.3 Install Promtail
mkdir /data/promtail/{bin,config,logs} -p
cd /data/promtail/bin
curl -O -L "https://github.com/grafana/loki/releases/download/v2.3.0/promtail-linux-amd64.zip"
unzip "promtail-linux-amd64.zip"
chmod a+x "promtail-linux-amd64"
Write the configuration file
cat << EOF > /data/promtail/config/promtail.conf
server:  # settings for Promtail's own server
  http_listen_address: 0.0.0.0
  http_listen_port: 19080
  grpc_listen_port: 0

positions:
  filename: ./logs/loki_positions.yaml
  ignore_invalid_yaml: true

clients:  # address of the Loki service
  - url: http://127.0.0.1:3100/loki/api/v1/push

scrape_configs:
  - job_name: service_log
    file_sd_configs:  # logs to scrape, discovered from target files
      - files:
          - ./config/*.yaml
        refresh_interval: 1m
EOF
Configure Supervisor to manage Promtail
cat << EOF > /etc/supervisord.d/promtail.ini
[program:promtail]
command=/data/promtail/bin/promtail-linux-amd64 -config.expand-env=true -config.file=/data/promtail/config/promtail.conf
autorestart=true
autostart=true
stderr_logfile=/tmp/promtail_err.log
stdout_logfile=/tmp/promtail_out.log
user=root
stopsignal=INT
startsecs=10
startretries=3
directory=/data/promtail/
EOF
Define the log collection targets
cat << EOF > /data/promtail/config/varlogmessage.yaml
- targets:
    - localhost
  labels:
    __path__: /var/log/messages
    env: {{ENV}}
    hostname: {{BINDIP}}
    service_name: var-log-messages
    log_type: var-log-messages
- targets:
    - localhost
  labels:
    __path__: /var/log/secure
    env: {{ENV}}
    hostname: {{BINDIP}}
    service_name: var-log-secure
    log_type: var-log-secure
EOF
Note: the {{ENV}} and {{BINDIP}} placeholders use Jinja2-style syntax; the sed commands below replace them with real values.
ENV=test
BINDIP=192.168.161.118
sed -i "s/{{ENV}}/$ENV/g" /data/promtail/config/varlogmessage.yaml
sed -i "s/{{BINDIP}}/$BINDIP/g" /data/promtail/config/varlogmessage.yaml
Start Promtail
supervisorctl status
supervisorctl update
supervisorctl status
Verify that Loki is receiving logs
curl 127.0.0.1:3100/loki/api/v1/labels
curl 127.0.0.1:3100/loki/api/v1/label/service_name/values
curl 127.0.0.1:3100/loki/api/v1/label/filename/values
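If the label queries come back empty, pushing one line by hand helps separate Loki problems from Promtail problems. The sketch below builds a request body for the /loki/api/v1/push endpoint. The helper name build_push_payload and the label value manual-test are assumptions, and the helper does no JSON escaping.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a push-API body carrying one log line.
# Arguments: <timestamp in nanoseconds> <service_name label value> <log line>
# No JSON escaping is done, so the line must not contain quotes or backslashes.
build_push_payload() {
  printf '{"streams":[{"stream":{"service_name":"%s"},"values":[["%s","%s"]]}]}\n' "$2" "$1" "$3"
}

# Fixed timestamp, only to show the shape of the payload.
build_push_payload 1655978337000000000 manual-test 'hello loki'

# Intended use (requires Loki to be running; %N needs GNU date):
#   build_push_payload "$(date +%s%N)" manual-test 'hello loki' |
#     curl -fsS -H 'Content-Type: application/json' --data-binary @- \
#       http://127.0.0.1:3100/loki/api/v1/push
```

Use a current timestamp for a real push: with reject_old_samples enabled as configured above, samples older than reject_old_samples_max_age are refused.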
4.4 Install Grafana
1.1 Download the Grafana binary package
Official download:
wget https://dl.grafana.com/enterprise/release/grafana-enterprise-8.5.14.linux-amd64.tar.gz
Recommended mirror for users in mainland China:
wget https://mirrors.huaweicloud.com/grafana/8.5.9/grafana-enterprise-8.5.9.linux-amd64.tar.gz
tar xf grafana-enterprise-8.5.9.linux-amd64.tar.gz -C /data
cd /data/
mv grafana-8.5.9/ grafana
Configure Supervisor to manage Grafana
cat <<EOF> /etc/supervisord.d/grafana.ini
[program:grafana]
command=/data/grafana/bin/grafana-server web
autorestart=true
autostart=true
stderr_logfile=/tmp/grafana_err.log
stdout_logfile=/tmp/grafana_out.log
user=root
stopsignal=INT
startsecs=10
startretries=3
directory=/data/grafana
EOF
Start Grafana
supervisorctl status
supervisorctl update
supervisorctl status
Add Loki as a data source
View the Loki data in Explore
Import the following Grafana Loki dashboard to view the data
{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": "-- Grafana --",
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"target": {
"limit": 100,
"matchAny": false,
"tags": [],
"type": "dashboard"
},
"type": "dashboard"
}
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"id": 8,
"iteration": 1655978337467,
"links": [],
"panels": [
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "${ENV}",
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 5,
"w": 24,
"x": 0,
"y": 0
},
"hiddenSeries": false,
"id": 4,
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"nullPointMode": "null",
"options": {
"alertThreshold": true
},
"percentage": false,
"pluginVersion": "8.1.5",
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum (count_over_time({service_name=~\"$app_name\",filename=~\"$log_type\",hostname=~\"$hostname\"}[2m] )) by (hostname)",
"hide": true,
"legendFormat": "",
"queryType": "randomWalk",
"refId": "A"
},
{
"expr": "sum (count_over_time({service_name=~\"$app_name\",filename=~\"$log_type\",hostname=~\"$hostname\"}[2m] )) by (hostname,filename)",
"hide": false,
"legendFormat": "{{hostname}}/{{filename}}",
"refId": "B"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "日志量统计",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"$$hashKey": "object:319",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"$$hashKey": "object:320",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"datasource": "${ENV}",
"description": "",
"gridPos": {
"h": 21,
"w": 24,
"x": 0,
"y": 5
},
"id": 2,
"options": {
"dedupStrategy": "exact",
"enableLogDetails": false,
"prettifyLogMessage": false,
"showCommonLabels": false,
"showLabels": false,
"showTime": true,
"sortOrder": "Descending",
"wrapLogMessage": true
},
"pluginVersion": "7.4.3",
"targets": [
{
"expr": "{service_name=~\"$app_name\",filename=~\"$log_type\",hostname=~\"$hostname\"} |~ \"(?i)$log_level\"",
"maxLines": 1000,
"queryType": "randomWalk",
"refId": "A"
}
],
"timeFrom": null,
"timeShift": null,
"title": "日志",
"transparent": true,
"type": "logs"
}
],
"refresh": false,
"schemaVersion": 30,
"style": "dark",
"tags": [],
"templating": {
"list": [
{
"current": {
"selected": false,
"text": "crm-cd",
"value": "crm-cd"
},
"description": null,
"error": null,
"hide": 0,
"includeAll": false,
"label": "选择环境",
"multi": false,
"name": "ENV",
"options": [],
"query": "loki",
"queryValue": "",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"type": "datasource"
},
{
"allValue": null,
"current": {
"selected": true,
"text": "neo-pharma-service",
"value": "neo-pharma-service"
},
"datasource": "${ENV}",
"definition": "label_values({service_name=~\".+\"},service_name)",
"description": null,
"error": null,
"hide": 0,
"includeAll": false,
"label": "服务名",
"multi": false,
"name": "app_name",
"options": [],
"query": "label_values({service_name=~\".+\"},service_name)",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"type": "query"
},
{
"allValue": null,
"current": {
"selected": false,
"text": "/logs/gc.log",
"value": "/logs/gc.log"
},
"datasource": "${ENV}",
"definition": "label_values({service_name=\"$app_name\"}, filename)",
"description": null,
"error": null,
"hide": 0,
"includeAll": false,
"label": "日志名",
"multi": false,
"name": "log_type",
"options": [],
"query": "label_values({service_name=\"$app_name\"}, filename)",
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 0,
"type": "query"
},
{
"allValue": ".*",
"current": {
"selected": true,
"text": "neo-pharma-service-7c87d876d5-js77h",
"value": "neo-pharma-service-7c87d876d5-js77h"
},
"datasource": "${ENV}",
"definition": "label_values({service_name=\"$app_name\",filename=\"$log_type\"}, hostname)",
"description": null,
"error": null,
"hide": 0,
"includeAll": false,
"label": "主机名",
"multi": false,
"name": "hostname",
"options": [],
"query": "label_values({service_name=\"$app_name\",filename=\"$log_type\"}, hostname)",
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 0,
"tagValuesQuery": "",
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": "(^\\\\S|^\\\\s)",
"current": {
"selected": false,
"text": "All",
"value": "$__all"
},
"description": "可以直接输入搜索的关键字进行过滤",
"error": null,
"hide": 0,
"includeAll": true,
"label": "关键字过滤",
"multi": false,
"name": "log_level",
"options": [
{
"selected": true,
"text": "All",
"value": "$__all"
},
{
"selected": false,
"text": "warning",
"value": "warning"
},
{
"selected": false,
"text": "unknown",
"value": "unknown"
},
{
"selected": false,
"text": "info",
"value": "info"
},
{
"selected": false,
"text": "error",
"value": "error"
},
{
"selected": false,
"text": "直接输入关键字搜索",
"value": "直接输入关键字搜索"
}
],
"query": "warning,unknown,info,error,直接输入关键字搜索",
"queryValue": "",
"skipUrlSync": false,
"type": "custom"
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {},
"timezone": "",
"title": "日志中心",
"uid": "NlV_8QD7k",
"version": 21
}
Dashboard screenshot
4.5 Install Alertmanager
cd /data
tar xf alertmanager-0.24.0.linux-amd64.tar.gz
mv alertmanager-0.24.0.linux-amd64 alertmanager
Configure Supervisor to manage Alertmanager
cat <<EOF> /etc/supervisord.d/alertmanager.ini
[program:alertmanager]
command=/data/alertmanager/alertmanager
autorestart=true
autostart=true
stderr_logfile=/tmp/alertmanager_err.log
stdout_logfile=/tmp/alertmanager_out.log
user=root
stopsignal=INT
startsecs=10
startretries=3
directory=/data/alertmanager
EOF
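Write the Alertmanager configuration file
cat <<EOF> /data/alertmanager/alertmanager.yml
global:
  smtp_smarthost: 'smtp.qq.com:465'      # SMTP server address
  smtp_from: '4506259@qq.com'            # sender address
  smtp_auth_username: '4507259@qq.com'   # mailbox user
  smtp_auth_password: 'gbrqbrcace'       # mailbox password
  smtp_require_tls: false

templates:
  - '/usr/local/alertmanager/template/*.tmpl'

route:
  group_by: ["instance"]   # labels to group alerts by
  group_wait: 30s          # after the first alert, wait 30 seconds for more alerts and send them together
  group_interval: 5m       # interval between notifications for the same group
  repeat_interval: 3h      # interval before a still-firing alert is sent again
  receiver: mail           # default receiver; required, and must match a receiver name below

receivers:
  - name: 'mail'           # receiver name
    email_configs:
      - to: '187171160@163.com'   # recipient
        send_resolved: true       # also notify when the alert resolves
EOF
Configure the alert rules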
# Rules live in a directory named after the tenant; with auth disabled the tenant is "fake"
mkdir -p /data/loki/rules/fake
cat <<'EOF'> /data/loki/rules/fake/rules.yaml
groups:
  - name: service OutOfMemoryError
    rules:
      # Keyword monitoring
      - alert: loki check words java.lang.OutOfMemoryError
        expr: sum by (env, hostname, log_type, filename) (count_over_time({env=~"\\w+"} |= "java.lang.OutOfMemoryError" [5m]) > 0)
        labels:
          severity: critical
        annotations:
          description: '{{$labels.env}} {{$labels.hostname}} file {{$labels.filename}} has {{ $value }} error'
          summary: java.lang.OutOfMemoryError
      # Java application log performance alert
      - alert: loki java full gc count check
        expr: sum by (env, hostname, log_type, filename) (count_over_time({env=~"\\w+"} |= "Full GC (Allocation" [5m]) > 5)
        labels:
          severity: warning
        annotations:
          description: '{{$labels.env}} {{$labels.hostname}} {{$labels.filename}} {{ $value }}'
          summary: java full gc count check
      # Example of alerting on a regular-expression match
      - alert: dbperform slowlog sql slow query
        expr: 'sum by (env, hostname, log_type, filename) (count_over_time({env=~"\\w+"} |~ "time: [1-9]\\d{4,}" [5m]) > 5)'
        labels:
          severity: warning
        annotations:
          description: '{{$labels.env}} {{$labels.hostname}} file {{$labels.filename}} has {{ $value }} error'
          summary: sql slowlog
EOF
Test the alert
echo 'The String object java.lang.OutOfMemoryError is used to represent and manipulate a sequence of characters.' >> /var/log/messages
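The slow-query rule matches lines where "time: " is followed by a number of at least five digits with no leading zero, that is, a value of 10000 or more. The pattern can be tried locally before it goes into a rule. This sketch uses grep -E with \d written as [0-9]; the function name matches_slow_query and the sample lines are made up for illustration.

```shell
#!/usr/bin/env bash
# Same pattern as the slow-query rule, adapted for grep -E ([0-9] instead of \d).
matches_slow_query() {
  printf '%s\n' "$1" | grep -Eq 'time: [1-9][0-9]{4,}'
}

for line in 'query done, time: 9999' 'query done, time: 10000' 'query done, time: 250000'; do
  if matches_slow_query "$line"; then
    echo "match:    $line"
  else
    echo "no match: $line"
  fi
done
```

Only the second and third sample lines match, because 9999 has just four digits.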
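V. Comparison with EFK
EFK:
1.1 Elasticsearch stores data on disk as unstructured JSON objects. Both the keys of each object and the contents of each key are indexed. Data can then be queried with a JSON-based query language (the Query DSL) or with the Lucene query language.
1.2 EFK uses Fluentd as the log collector.
Loki:
1.1 In single-process mode Loki stores log data on local disk; in scalable microservices mode it stores data in cloud object storage. Logs are tagged with labels and only the labels are indexed, so the index is smaller and cheaper to operate.
1.2 Loki uses Promtail as the log collector. Promtail discovers log files stored on disk, attaches labels to them, and forwards them to Loki. It can run as a sidecar in a Pod to collect that Pod's logs, read logs from specified files, and tail system logs.
References:
k8s loki 容器日志解决方案-4. alertmanager 报警及loki rules - 哔哩哔哩
Getting started | Grafana Loki documentation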