Loki Log Monitoring: Deployment and Configuration

Reference: blog posts by semaik

I. Introduction to Loki

  • Compared with EFK/ELK, Loki does not index the raw log content; it indexes only the log labels, and the logs themselves are stored compressed, typically on the local filesystem. This makes it much cheaper to operate and an order of magnitude more efficient.
  • Because Loki's storage is filesystem-based, searching works on the content of the log lines themselves. Queries are written in LogQL: you first filter by labels in the search window, then search within the matching streams.
  • The stack has two parts plus a UI: Loki is the log engine, Promtail is the log collection agent, and Grafana is used for display.
  • A sufficiently recent Grafana version is required; with older versions, Explore fails to return any data.
  • The first part of this guide deploys everything on a single server. To collect logs from other nodes, simply deploy a Promtail agent on each node to be collected (see the extension section at the end of this document).

Promtail: the agent, responsible for collecting logs and shipping them to Loki

Loki: the logging engine, responsible for storing logs and processing queries

Grafana: the UI
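Once the stack is up, LogQL queries can also be issued directly against Loki's HTTP API, not only through Grafana. A minimal sketch, assuming Loki is reachable on localhost:3100 as configured below (the `|| true` just keeps it non-fatal on a host where the stack is not running yet):

```shell
# LogQL: select streams by label, then filter lines by content
QUERY='{job="varlogs"} |= "error"'

# /loki/api/v1/query_range is Loki's range-query endpoint;
# -G with --data-urlencode URL-encodes the LogQL expression
curl -G -s "http://localhost:3100/loki/api/v1/query_range" \
  --data-urlencode "query=${QUERY}" \
  --data-urlencode "limit=20" || true
```

The same expression typed into Grafana Explore returns the matching log lines.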

Prometheus integration is deferred to the optional section below.

Topology diagram:

image-20231103115049313

II. Docker Compose Deployment

1. Obtaining the configuration files
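The official default files can be fetched straight from the Loki GitHub repository. A sketch, assuming the repository file layout of the v2.4.0 tag (paths differ across releases; the `|| true` guards keep the commands non-fatal when offline):

```shell
mkdir -p /opt/loki && cd /opt/loki

# raw.githubusercontent.com path layout as of the v2.4.0 tag
BASE="https://raw.githubusercontent.com/grafana/loki/v2.4.0"

# default Loki server config
curl -fsSLO "${BASE}/cmd/loki/loki-local-config.yaml" || true
# default Promtail config
curl -fsSLO "${BASE}/clients/cmd/promtail/promtail-local-config.yaml" || true
```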

2. The docker-compose.yml configuration file

version: "3"

services:
  loki:
    container_name: loki
    image: grafana/loki:2.4.0
    restart: always
    ports:
      - 3100:3100
    volumes:
      - ./loki-local-config.yaml:/etc/loki/loki-local-config.yaml
      - /etc/localtime:/etc/localtime
    command: -config.file=/etc/loki/loki-local-config.yaml
    networks:
      - loki

  promtail:
    container_name: promtail
    image: grafana/promtail:2.4.0
    restart: always
    depends_on:
      - loki
    volumes:
      - /var/log:/var/log
      - ./promtail-local-config.yaml:/etc/promtail/promtail-local-config.yaml
      - /etc/localtime:/etc/localtime
    command: -config.file=/etc/promtail/promtail-local-config.yaml
    networks:
      - loki

  grafana:
    container_name: grafana
    image: grafana/grafana:8.5.0
    restart: always
    depends_on:
      - loki
      - promtail
    ports:
      - "3200:3000"
    volumes:
      # - grafana-storage:/var/lib/grafana (optional)
      # - ./grafana.ini:/etc/grafana/grafana.ini (optional)
      - /etc/localtime:/etc/localtime
    networks:
      - loki

networks:
  loki:
    driver: bridge

NOTE: when using Grafana alerting, an image renderer is optional.
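With the three configuration files in place, the stack can be started and smoke-tested. A sketch (the `|| true` guards keep the commands non-fatal on a host where Docker or the stack is not up yet):

```shell
# start loki, promtail, and grafana in the background
docker-compose up -d || true

# Loki answers "ready" on /ready once startup has finished
READY_URL="http://localhost:3100/ready"
curl -s "$READY_URL" || true

# Grafana is published on host port 3200 in the compose file above
curl -s -o /dev/null -w "%{http_code}\n" "http://localhost:3200/login" || true
```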

3. The Loki configuration file loki-local-config.yaml

  • The defaults in the officially downloaded file work as-is; the blocks marked "added" below are optional additions.
auth_enabled: false

server:
  http_listen_port: 3100

# Added -------------------------------
ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 5m
  chunk_retain_period: 30s
# -------------------------------------

common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2023-11-04
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h  # time range covered by each index table

# Added -------------------------------
limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h

chunk_store_config:
  max_look_back_period: 672h  # maximum queryable history (28 days); must be a multiple of the schema_config index period, otherwise Loki errors out

table_manager:
  retention_deletes_enabled: true
  retention_period: 672h  # table retention period (28 days)
# -------------------------------------

ruler:
  alertmanager_url: http://localhost:9093
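Once Loki is running, the additions above can be double-checked: Loki serves its fully-resolved runtime configuration over HTTP on /config, and the retention constraint is simple arithmetic. A sketch:

```shell
# the resolved runtime config, including the table_manager additions
curl -s "http://localhost:3100/config" | grep -A 2 "table_manager" || true

# sanity check: max_look_back_period (672h) must be a whole multiple
# of the schema_config index period (24h)
echo $(( 672 % 24 ))   # 0 means the constraint holds
```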

4. The Promtail configuration file promtail-local-config.yaml, with notes (this one targets the local host)

server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  # on the same host, http://loki:3100/loki/api/v1/push works directly
  - url: http://loki:3100/loki/api/v1/push
  # on other nodes, point at the Loki node's IP instead, e.g.
  # - url: http://10.0.0.10:3100/loki/api/v1/push

scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost  # collect from the Promtail host itself
        labels:
          # labels are indexed for querying; every "key: value" except "__path__" is user-defined
          job: varlogs
          # the log path to collect
          __path__: /var/log/*log

  # apt logs; after bringing the stack up, install a package with apt to verify collection works
  - job_name: test
    static_configs:
      - targets:
          - localhost
        labels:
          job: "test"
          __path__: /var/log/apt/*.log

  # examples for other services; adapt from the templates below
  - job_name: nginx
    static_configs:
      - targets:
          - localhost
        labels:
          job: "70.60-nginx"
          __path__: /var/log/nginx/*log

  - job_name: db
    static_configs:
      - targets:
          - localhost
        labels:
          job: "70.60-mysql"
          __path__: /var/log/mysqld.log
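Whether Promtail actually picked up the files can be checked through its own web server (port 9080 above) and its positions file. A sketch (`|| true` keeps it non-fatal when the stack is down):

```shell
# discovered scrape targets and their labels
TARGETS_URL="http://localhost:9080/targets"
curl -s "$TARGETS_URL" || true

# the positions file records the current read offset of every tailed file
docker exec promtail cat /tmp/positions.yaml || true
```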

To integrate Prometheus, just add the service below to docker-compose.yml (optional):

prometheus:
  image: prom/prometheus:latest
  restart: always
  container_name: prometheus
  hostname: prometheus
  environment:
    TZ: Asia/Shanghai
  ports:
    - 9090:9090
  user: root
  volumes:
    - ./prometheus.yml:/etc/prometheus/prometheus.yml
    - ./data/monitor/prometheus:/prometheus/data:rw
    #- /data/monitor/prometheus:/prometheus/data:rw
  command:
    - "--config.file=/etc/prometheus/prometheus.yml"
    - "--storage.tsdb.path=/prometheus"
    - "--storage.tsdb.retention.time=15d"
    #- "--web.console.libraries=/usr/share/prometheus/console_libraries"
    #- "--web.console.templates=/usr/share/prometheus/consoles"
    #- "--enable-feature=remote-write-receiver"
    #- "--query.lookback-delta=2m"
  networks:
    - loki

The prometheus.yml configuration:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

# alerting (usually not needed)
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

# alert rules
rule_files:
  - rules/*.yml

# scrape targets
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['prometheus:9090']

  # scrape Loki's own metrics endpoint
  - job_name: 'loki'
    static_configs:
      - targets: ['loki:3100']

Prometheus integration in action

Visit http://10.0.0.10:3100/metrics to view Loki's monitoring data.
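The same data can be pulled from the command line. A sketch (`up{job="loki"}` is the standard Prometheus liveness series for the scrape job defined above; `|| true` keeps the commands non-fatal when the stack is down):

```shell
# Loki's Prometheus-format metrics endpoint
METRICS_URL="http://10.0.0.10:3100/metrics"
curl -s "$METRICS_URL" | grep "^loki_build_info" || true

# the same target as seen by Prometheus (host port 9090)
curl -G -s "http://10.0.0.10:9090/api/v1/query" \
  --data-urlencode 'query=up{job="loki"}' || true
```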

image-20231115195351315

Prometheus view:

image-20231115203535680

image-20231103151419666

III. Extensions (SSL, and deploying regular nodes for a multi-node setup)

①. Replacing Grafana's SSL certificate

Symptom: ERR_SSL_PROTOCOL_ERROR in the browser

# copy the Grafana config file out of the container
docker cp grafana:/etc/grafana/grafana.ini /opt/loki

Edit the grafana.ini configuration file, then restart the container to serve with your own SSL certificate.
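The relevant settings live in the [server] block of grafana.ini. A sketch with hypothetical certificate paths (substitute your own files and mount them into the container):

```ini
[server]
protocol = https
http_port = 3000
# hypothetical paths; mount your real certificate and key here
cert_file = /etc/grafana/certs/example.crt
cert_key = /etc/grafana/certs/example.key
```

Re-enable the commented-out ./grafana.ini volume in the compose file so the edited file is mounted back in, then restart the grafana container.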

image-20231103150610020

②. Multi-node deployment

Plan:

IP          Role
10.0.0.10   master (management node; runs loki, the Promtail agent, and grafana; see the sections above)
10.0.0.11   regular node (runs only the Promtail agent)

Regular-node Promtail agent docker-compose.yml:
version: "3"

services:
  promtail:
    container_name: promtail
    image: grafana/promtail:2.4.0
    restart: always
    volumes:
      - /var/log:/var/log
      - ./promtail-local-config.yaml:/etc/promtail/promtail-local-config.yaml
      - /etc/localtime:/etc/localtime
    command: -config.file=/etc/promtail/promtail-local-config.yaml

promtail-local-config.yaml, pointing at the management node's Loki:
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  # this points at the Loki deployment node; replace with your Loki node's IP (here 10.0.0.10)
  - url: http://10.0.0.10:3100/loki/api/v1/push

scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost  # collect from the Promtail host itself
        labels:
          # labels are indexed for querying; every "key: value" except "__path__" is user-defined
          job: "10.0.0.11_varlogs"
          # the log path to collect
          __path__: /var/log/*log

  # yum logs; after bringing the agent up, install a package with yum to verify collection works
  - job_name: test
    static_configs:
      - targets:
          - localhost
        labels:
          job: "10.0.0.11_test"
          __path__: /var/log/yum.log

  # examples for other services; adapt from the templates below
  - job_name: nginx
    static_configs:
      - targets:
          - localhost
        labels:
          job: "70.60-nginx"
          __path__: /var/log/nginx/*log

  - job_name: db
    static_configs:
      - targets:
          - localhost
        labels:
          job: "70.60-mysql"
          __path__: /var/log/mysqld.log
To test the dashboard, install lrzsz on the regular node (yum install -y lrzsz); entries from the monitored log file /var/log/yum.log then show up in the panel under the label 10.0.0.11_test.

Grafana dashboard 13639 in action:

image-20231105144004801

After installing lrzsz:

image-20231105144342634

Regular node deployment complete.
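Whether the new node's streams actually reached Loki can also be verified from the management node via the label-values API. A sketch (`|| true` keeps it non-fatal if run while Loki is down):

```shell
# every value of the "job" label known to Loki; the new node's
# "10.0.0.11_varlogs" and "10.0.0.11_test" should appear in the list
LABELS_URL="http://10.0.0.10:3100/loki/api/v1/label/job/values"
curl -s "$LABELS_URL" || true
```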

IV. Adjusting the nginx log format when monitoring nginx (this test did not succeed; skip if needed)

  • Note: nginx here is a from-source build

  • Reference: "Ingesting nginx access logs into Grafana Loki, a lightweight log visualization platform", Tencent Cloud developer community (tencent.com)

  • Grafana official site

  • Edit /usr/local/nginx/conf/nginx.conf:

http {
    include       mime.types;
    default_type  application/octet-stream;

    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';

    log_format json_analytics escape=json '{'
        '"msec": "$msec", '                                         # request unixtime in seconds with a milliseconds resolution
        '"connection": "$connection", '                             # connection serial number
        '"connection_requests": "$connection_requests", '           # number of requests made in connection
        '"pid": "$pid", '                                           # process pid
        '"request_id": "$request_id", '                             # the unique request id
        '"request_length": "$request_length", '                     # request length (including headers and body)
        '"remote_addr": "$remote_addr", '                           # client IP
        '"remote_user": "$remote_user", '                           # client HTTP username
        '"remote_port": "$remote_port", '                           # client port
        '"time_local": "$time_local", '
        '"time_iso8601": "$time_iso8601", '                         # local time in the ISO 8601 standard format
        '"request": "$request", '                                   # full path without arguments of the request
        '"request_uri": "$request_uri", '                           # full path and arguments of the request
        '"args": "$args", '                                         # args
        '"status": "$status", '                                     # response status code
        '"body_bytes_sent": "$body_bytes_sent", '                   # the number of body bytes sent to a client, excluding headers
        '"bytes_sent": "$bytes_sent", '                             # the number of bytes sent to a client
        '"http_referer": "$http_referer", '                         # HTTP referer
        '"http_user_agent": "$http_user_agent", '                   # user agent
        '"http_x_forwarded_for": "$http_x_forwarded_for", '         # http_x_forwarded_for
        '"http_host": "$http_host", '                               # the request Host: header
        '"server_name": "$server_name", '                           # the name of the vhost serving the request
        '"request_time": "$request_time", '                         # request processing time in seconds with msec resolution
        '"upstream": "$upstream_addr", '                            # upstream backend server for proxied requests
        '"upstream_connect_time": "$upstream_connect_time", '       # upstream handshake time incl. TLS
        '"upstream_header_time": "$upstream_header_time", '         # time spent receiving upstream headers
        '"upstream_response_time": "$upstream_response_time", '     # time spent receiving upstream body
        '"upstream_response_length": "$upstream_response_length", ' # upstream response length
        '"upstream_cache_status": "$upstream_cache_status", '       # cache HIT/MISS where applicable
        '"ssl_protocol": "$ssl_protocol", '                         # TLS protocol
        '"ssl_cipher": "$ssl_cipher", '                             # TLS cipher
        '"scheme": "$scheme", '                                     # http or https
        '"request_method": "$request_method", '                     # request method
        '"server_protocol": "$server_protocol", '                   # request protocol, like HTTP/1.1 or HTTP/2.0
        '"pipe": "$pipe", '                                         # "p" if request was pipelined, "." otherwise
        '"gzip_ratio": "$gzip_ratio", '
        '"http_cf_ray": "$http_cf_ray"'
    '}';

    # NOTE: the two directives below resolve to the same file (logs/access.log
    # is relative to the /usr/local/nginx prefix), so the JSON and "main" lines
    # get interleaved; this may be why the test failed. Keep one format per file.
    access_log /usr/local/nginx/logs/access.log json_analytics;
    access_log logs/access.log main;

    # ... server blocks ...
}

Screenshots:

    image-20231115211652591

    image-20231115211758942

Restart nginx and check how the log format changes:

# validate the configuration, then restart nginx
/usr/local/nginx/sbin/nginx -t
systemctl restart nginx

NOTE: this test did not succeed.

V. The deployment node combined with Grafana dashboard 13639

NOTE: after adding the data source, wait a while until new log lines are produced; only then will Explore queries return anything.

image-20231103161005759

For example, after running apt install -y lrzsz, log lines appear immediately under the job label test.


image-20231105141015737

This matches the host's actual latest logs exactly.


image-20231103160921219

End of tutorial.