Fixing slow log collection in Loki

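We received feedback that under heavy log volume (system QPS of about 12,000/s, producing roughly 80,000 to 100,000 log lines per second), the time at which logs show up in Loki lags far behind the time the service actually emitted them, by as much as 40 to 60 minutes.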

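Investigation

Checking the loki-gateway entry point

Start with the loki-gateway service. It is the traffic entry point for Loki's simple scalable deployment mode and is really just an nginx. Its access log shows the following: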

10.192.0.7 - - [09/Jun/2025:10:06:38 +0000]  204 "POST /loki/api/v1/push HTTP/1.1" 0 "-" "promtail/2.5.0" "-"
10.0.2.129 - - [09/Jun/2025:10:06:38 +0000]  429 "POST /loki/api/v1/push HTTP/1.1" 227 "-" "promtail/2.5.0" "-"
10.192.0.3 - - [09/Jun/2025:10:06:38 +0000]  204 "POST /loki/api/v1/push HTTP/1.1" 0 "-" "promtail/2.5.0" "-"
10.0.2.129 - - [09/Jun/2025:10:06:38 +0000]  204 "POST /loki/api/v1/push HTTP/1.1" 0 "-" "promtail/2.5.0" "-"
10.192.0.3 - - [09/Jun/2025:10:06:38 +0000]  204 "POST /loki/api/v1/push HTTP/1.1" 0 "-" "promtail/2.5.0" "-"
10.192.0.7 - - [09/Jun/2025:10:06:39 +0000]  204 "POST /loki/api/v1/push HTTP/1.1" 0 "-" "promtail/2.5.0" "-"
10.192.0.7 - - [09/Jun/2025:10:06:39 +0000]  204 "POST /loki/api/v1/push HTTP/1.1" 0 "-" "promtail/2.5.0" "-"
10.192.0.8 - - [09/Jun/2025:10:06:39 +0000]  429 "POST /loki/api/v1/push HTTP/1.1" 227 "-" "promtail/2.5.0" "-"
10.192.0.3 - - [09/Jun/2025:10:06:39 +0000]  204 "POST /loki/api/v1/push HTTP/1.1" 0 "-" "promtail/2.5.0" "-"
10.0.2.129 - - [09/Jun/2025:10:06:39 +0000]  429 "POST /loki/api/v1/push HTTP/1.1" 227 "-" "promtail/2.5.0" "-"
10.192.0.3 - - [09/Jun/2025:10:06:39 +0000]  204 "POST /loki/api/v1/push HTTP/1.1" 0 "-" "promtail/2.5.0" "-"
10.192.0.7 - - [09/Jun/2025:10:06:39 +0000]  429 "POST /loki/api/v1/push HTTP/1.1" 227 "-" "promtail/2.5.0" "-"
10.192.0.4 - - [09/Jun/2025:10:06:39 +0000]  204 "POST /loki/api/v1/push HTTP/1.1" 0 "-" "promtail/2.5.0" "-"
10.192.0.7 - - [09/Jun/2025:10:06:39 +0000]  429 "POST /loki/api/v1/push HTTP/1.1" 227 "-" "promtail/2.5.0" "-"
10.192.0.3 - - [09/Jun/2025:10:06:40 +0000]  204 "POST /loki/api/v1/push HTTP/1.1" 0 "-" "promtail/2.5.0" "-"
10.0.2.129 - - [09/Jun/2025:10:06:40 +0000]  204 "POST /loki/api/v1/push HTTP/1.1" 0 "-" "promtail/2.5.0" "-"
10.192.0.7 - - [09/Jun/2025:10:06:40 +0000]  204 "POST /loki/api/v1/push HTTP/1.1" 0 "-" "promtail/2.5.0" "-"
10.192.0.8 - - [09/Jun/2025:10:06:40 +0000]  204 "POST /loki/api/v1/push HTTP/1.1" 0 "-" "promtail/2.5.0" "-"
10.192.0.3 - - [09/Jun/2025:10:06:40 +0000]  429 "POST /loki/api/v1/push HTTP/1.1" 227 "-" "promtail/2.5.0" "-"
10.192.0.7 - - [09/Jun/2025:10:06:40 +0000]  429 "POST /loki/api/v1/push HTTP/1.1" 227 "-" "promtail/2.5.0" "-"
10.192.0.7 - - [09/Jun/2025:10:06:40 +0000]  204 "POST /loki/api/v1/push HTTP/1.1" 0 "-" "promtail/2.5.0" "-"
10.192.0.7 - - [09/Jun/2025:10:06:40 +0000]  204 "POST /loki/api/v1/push HTTP/1.1" 0 "-" "promtail/2.5.0" "-"

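As the log shows, 7 of these 22 requests were answered with 429.
Note: HTTP status 429 means "Too Many Requests", i.e. the client is being rate limited. It does not mean the request body is too large (that would be 413).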
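
Counting status codes by hand is error-prone, so here is a minimal sketch of how the tally could be automated. It assumes the gateway log format shown above, where the status code follows the bracketed timestamp; the names `count_statuses` and `rejected_ratio` are made up for this illustration, and only three of the log lines are embedded as sample input.

```python
import re
from collections import Counter

# Three lines copied from the access log above; in practice, read the whole log.
SAMPLE = '''\
10.192.0.7 - - [09/Jun/2025:10:06:38 +0000]  204 "POST /loki/api/v1/push HTTP/1.1" 0 "-" "promtail/2.5.0" "-"
10.0.2.129 - - [09/Jun/2025:10:06:38 +0000]  429 "POST /loki/api/v1/push HTTP/1.1" 227 "-" "promtail/2.5.0" "-"
10.192.0.3 - - [09/Jun/2025:10:06:38 +0000]  204 "POST /loki/api/v1/push HTTP/1.1" 0 "-" "promtail/2.5.0" "-"
'''

# Assumption: the status code sits between the "]" of the timestamp and the quoted request.
STATUS_RE = re.compile(r'\]\s+(\d{3})\s+"')


def count_statuses(text):
    """Return a Counter mapping status code (as a string) to number of requests."""
    counts = Counter()
    for line in text.splitlines():
        match = STATUS_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def rejected_ratio(counts):
    """Fraction of requests answered with 429 (0.0 if there are no requests)."""
    total = sum(counts.values())
    return counts["429"] / total if total else 0.0


counts = count_statuses(SAMPLE)
print(dict(counts), f"{rejected_ratio(counts):.0%} rejected")
```

Feeding it the full access log instead of `SAMPLE` gives the share of pushes that the gateway rejected over any time window.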

Checking the promtail logs

Because we collect service logs with a promtail sidecar, the promtail logs can be viewed with the following command:

kubectl logs <pod-name> -c promtail-log-sidecar -n <namespace>
# Output:
level=warn ts=2025-06-09T10:09:24.993836565Z caller=client.go:349 component=client host=loki-gateway.logs:80 msg="error sending batch, will retry" status=429 error="server returned HTTP status 429 Too Many Requests (429): Ingestion rate limit exceeded for user fake (limit: 1398101 bytes/sec) while attempting to ingest '4568' lines totaling '1048551' bytes, reduce log volume or contact your Loki administrator to see if the limit can be increased"

level=warn ts=2025-06-09T10:09:25.652053072Z caller=client.go:349 component=client host=loki-gateway.logs:80 msg="error sending batch, will retry" status=429 error="server returned HTTP status 429 Too Many Requests (429): Ingestion rate limit exceeded for user fake (limit: 1398101 bytes/sec) while attempting to ingest '4568' lines totaling '1048551' bytes, reduce log volume or contact your Loki administrator to see if the limit can be increased"
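
The numbers that matter in these warnings are the limit, the line count and the batch size. The sketch below pulls them out with a regular expression so they can be compared across many log lines. The pattern is written against the exact message text shown above and is an assumption about its stability across promtail versions; `parse_rate_limit_error` is a hypothetical helper name. The tenant `fake` is the ID Loki uses when `auth_enabled` is false.

```python
import re

# Error text copied from the promtail warning above.
MSG = (
    "server returned HTTP status 429 Too Many Requests (429): "
    "Ingestion rate limit exceeded for user fake (limit: 1398101 bytes/sec) "
    "while attempting to ingest '4568' lines totaling '1048551' bytes"
)

# Assumption: the message wording matches the sample above exactly.
PATTERN = re.compile(
    r"for user (?P<tenant>\S+) \(limit: (?P<limit>\d+) bytes/sec\) "
    r"while attempting to ingest '(?P<lines>\d+)' lines "
    r"totaling '(?P<bytes>\d+)' bytes"
)


def parse_rate_limit_error(msg):
    """Return tenant, limit, lines and bytes from the message, or None if it does not match."""
    match = PATTERN.search(msg)
    if match is None:
        return None
    return {
        "tenant": match.group("tenant"),
        "limit": int(match.group("limit")),
        "lines": int(match.group("lines")),
        "bytes": int(match.group("bytes")),
    }


print(parse_rate_limit_error(MSG))
```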

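Pinpointing the problem
The error says that the tenant's ingestion rate limit was exceeded: the limit is 1398101 bytes/s (about 1.33 MB/s), while the rejected batch was 1048551 bytes (about 1 MB). A single batch is therefore below the limit, but the limit applies to the total number of bytes accepted per second, so as soon as several such batches arrive within the same second the excess is rejected with 429. promtail then retries the rejected batches, and under sustained load the resulting backlog would explain why logs reach Loki so late.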
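
The limit in the error message can be reproduced with a little arithmetic. This is a minimal sketch, assuming the previous tenant-wide rate of 4 MB/s and the three loki-writer instances mentioned in the configuration comments of this post, and assuming the tenant rate is split evenly between the instances; `per_instance_limit` is a name invented for the example.

```python
MIB = 1024 * 1024


def per_instance_limit(tenant_rate_mb, instances):
    """Bytes per second each instance may accept when the tenant rate is split evenly."""
    return tenant_rate_mb * MIB // instances


limit = per_instance_limit(4, 3)
print(limit)  # 1398101, the limit reported in the promtail error

batch_bytes = 1048551  # size of the rejected batch in the promtail error
# How many batches of this size fit into one second of one instance's budget.
print(round(limit / batch_bytes, 2))  # 1.33
```

In other words, each instance could accept little more than one full batch per second, which is far less than the sidecars were trying to push.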

Solution

auth_enabled: false
common:
  path_prefix: /var/loki
  replication_factor: 3
  storage:
    s3:
      access_key_id: woo-minio
      bucketnames: loki-chunks
      endpoint: minio-release.storage.svc.cluster.local:9000
      insecure: true
      s3: null
      s3forcepathstyle: true
      secret_access_key: <secret>
limits_config:
  enforce_metric_name: false
  max_cache_freshness_per_query: 10m
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  split_queries_by_interval: 15m

  # The settings below raise the ingestion rate for each tenant to 100 MB/s.
  # The previous value was 4 MB/s, shared by the three loki-writer instances,
  # so each instance only got about 1.3 MB/s.
  ingestion_rate_mb: 100
  ingestion_burst_size_mb: 150
  per_stream_rate_limit: "100MB"
  per_stream_rate_limit_burst: "300MB"
memberlist:
  join_members:
  - loki-memberlist
ruler:
  storage:
    s3:
      bucketnames: loki-ruler
schema_config:
  configs:
  - from: "2025-01-11"
    index:
      period: 24h
      prefix: loki_index_
    object_store: s3
    schema: v12
    store: boltdb-shipper
server:
  grpc_listen_port: 9095
  http_listen_port: 3100
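
To sanity-check the new value of `ingestion_rate_mb`, the required throughput can be estimated from figures already in this post. The sketch assumes that the rejected batch in the promtail error (4568 lines, 1048551 bytes) is representative of the average line size, which is a rough assumption based on a single sample, and uses the reported 80,000 to 100,000 log lines per second. `required_rate_mib` is a hypothetical helper.

```python
MIB = 1024 * 1024


def required_rate_mib(lines_per_sec, avg_line_bytes):
    """Estimated ingestion rate in MiB/s for a given line rate and average line size."""
    return lines_per_sec * avg_line_bytes / MIB


# Average line size taken from the rejected batch in the promtail error (an assumption).
avg_line_bytes = 1048551 / 4568

for lines_per_sec in (80_000, 100_000):
    need = required_rate_mib(lines_per_sec, avg_line_bytes)
    print(f"{lines_per_sec} lines/s -> about {need:.1f} MiB/s, "
          f"fits in 4: {need <= 4}, fits in 100: {need <= 100}")
```

Under these assumptions the workload needs roughly 17 to 22 MiB/s, which is well above the old 4 MB/s and leaves generous headroom under the new 100 MB/s.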

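promtail's resource configuration needs to be adjusted to match:

limits:
  cpu: 1000m     # give promtail a full CPU core
  memory: 150Mi  # memory is less of a concern; it does not need much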