2025-10-24 18:18:40 +08:00

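Three questions about deploying to the overseas machine 43.159.145.241:
I. flymoon-jenniefy.jar
1. flymoon-jenniefy.jar needs Redis at runtime, so does Redis also have to be installed on 43.159.145.241?
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:6379
2. flymoon-jenniefy.jar needs MySQL at runtime. Does it connect to the Tencent Cloud production database? If so: in the code repository, the sit branch of the flymoon-jenniefy project points both Redis and MySQL at the sit host itself, so the deployment to the overseas machine 43.159.145.241 should use the prod branch.
3. Is the API on port 8070 the one to use?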
location ^~ /prod-api {
proxy_pass http://127.0.0.1:8070;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
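II. Python deployment: copy last night's sit deployment exactly.
III. jenniefy_web UI: the domain needs to be confirmed (is the URL https://www.jennie.deal or something else?), as do the domain certificate files. Whether access from mainland China can be restricted depends on whether the DNS hosting platform supports such a restriction.
Nightingale (滴滴夜莺) monitoring, alerting with ES as the data source: my ES stores log files collected from other machines. Whenever a log line collected into ES contains error, I want an alert sent to a Feishu bot as a Feishu card that shows the content of the specific error line, while keeping the card short enough that it does not flood the chat. How should this be done?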
{{ if .IsRecovered }}
{{- if ne .Cate "host"}}
**告警集群:** {{.Cluster}}{{end}}
**级别状态:** S{{.Severity}} Recovered
**告警名称:** {{.RuleName}}
**恢复时间:** {{timeformat .LastEvalTime}}
**告警描述:** **服务已恢复**
{{- else }}
{{- if ne .Cate "host"}}
**告警集群:** {{.Cluster}}{{end}}
**级别状态:** S{{.Severity}} Triggered
**告警名称:** {{.RuleName}}
**触发时间:** {{timeformat .TriggerTime}}
**发送时间:** {{timestamp}}
**触发时值:** {{.TriggerValue}}
{{if .RuleNote }}**告警描述:** **{{.RuleNote}}**{{end}}
{{- end -}}
{{$domain := "http://请联系管理员修改通知模板将域名替换为实际的域名" }}
[事件详情]({{$domain}}/alert-his-events/{{.Id}})|[屏蔽1小时]({{$domain}}/alert-mutes/add?busiGroup={{.GroupId}}&cate={{.Cate}}&datasource_ids={{.DatasourceId}}&prod={{.RuleProd}}{{range $key, $value := .TagsMap}}&tags={{$key}}%3D{{$value}}{{end}})|[查看曲线]({{$domain}}/metric/explorer?data_source_id={{.DatasourceId}}&data_source_name=prometheus&mode=graph&prom_ql={{.PromQl|escape}})
{{ if .IsRecovered }}
{{- if ne .Cate "host"}}
**告警集群:** {{.Cluster}}{{end}}
**级别状态:** S{{.Severity}} Recovered
**告警名称:** {{.RuleName}}
**恢复时间:** {{timeformat .LastEvalTime}}
**告警描述:** **服务已恢复**
{{- else }}
{{- if ne .Cate "host"}}
**告警集群:** {{.Cluster}}{{end}}
**级别状态:** S{{.Severity}} Triggered
**告警名称:** {{.RuleName}}
**触发时间:** {{timeformat .TriggerTime}}
**发送时间:** {{timestamp}}
**触发时值:** {{.TriggerValue}}
{{if .RuleNote }}**告警描述:** **{{.RuleNote}}**{{end}}
**错误日志摘要:** {{.Message | slice 0 100 | default "无错误信息"}}
{{- end -}}
{{$domain := "http://192.168.1.7:17000" }}
[日志详情]({{$domain}}/alert-his-events/{{.Id}})
http://192.168.60.21/app/discover#/?_g=(time:(from:'<start_time>',to:'<end_time>'))&_a=(columns:!(_source),index:'<index_name>',query:(query_string:(query:'message:error')))
(http://192.168.60.21/app/discover#/?_g=(time:(from:'{{timeformat .TriggerTime}}',to:'{{timestamp}}'))&_a=(columns:!(_source),index:'{{ $indexname}}',query:(query_string:(query:'message:error'))))
[查看 Kibana 日志 (message:error)]({{ $kibanaHost }}/app/discover#/?_g=(time:(from:'{{ $startTime }}',to:'{{ $endTime }}'))&_a=(columns:!(_source),index:'{{ $indexname }}',query:(query_string:(query:'message:error'))))
{{- $startTime := .TriggerTime | mul 1000 }}
{{- $endTime := .LastEvalTime | mul 1000 }}
{{- $kibanaHost := "http://192.168.60.21:5601" }}
{{ if .IsRecovered }}
{{- if ne .Cate "host"}}
**告警集群:** {{.Cluster}}{{end}}
**级别状态:** S{{.Severity}} Recovered
**告警名称:** {{.RuleName}}
**恢复时间:** {{timeformat .LastEvalTime}}
**告警描述:** **服务已恢复**
{{- else }}
{{- if ne .Cate "host"}}
**告警集群:** {{.Cluster}}{{end}}
**级别状态:** {{.Severity}}级
**告警名称:** {{.RuleName}}
**触发时间:** {{timeformat .TriggerTime}}
**发送时间:** {{timestamp}}
**触发时值:** {{.TriggerValue}}
{{if .RuleNote }}**告警描述:** **{{.RuleNote}}**{{end}}
{{- end -}}
{{$domain := "http://192.168.1.7:17000" }}
[事件详情]({{$domain}}/alert-his-events/{{.Id}})
[查看 Kibana 日志 (message:error)]({{ $kibanaHost }}/app/discover#/?_g=(time:(from:'{{ $startTime }}',to:'{{ $endTime }}'))&_a=(columns:!(_source),index:'{{ $indexName }}',query:(query_string:(query:'message:error'))))
({{ $kibanaHost }}/app/discover#/?_g=(time:(from:'{{ $startTime }}',to:'{{ $endTime }}'))&_a=(columns:!(_source),index:'{{ $indexName }}',query:(query_string:(query:'message:error'))))
{{- end -}}
{{- $startTime := .TriggerTime | mul 1000 }}
{{- $endTime := .LastEvalTime | mul 1000 }}
{{- $kibanaHost := "http://192.168.60.21:5601" }}
{{- $indexName := .TagsMap.indexname }}
{{ if .IsRecovered }}
{{- if ne .Cate "host"}}
**告警集群:** {{.Cluster}}{{end}}
**级别状态:** S{{.Severity}} Recovered
**告警名称:** {{.RuleName}}
**恢复时间:** {{timeformat .LastEvalTime}}
**告警描述:** **服务已恢复**
{{- else }}
{{- if ne .Cate "host"}}
**告警集群:** {{.Cluster}}{{end}}
**级别状态:** {{.Severity}}级
**告警名称:** {{.RuleName}}
**触发时间:** {{timeformat .TriggerTime}}
**发送时间:** {{timestamp}}
**触发时值:** {{.TriggerValue}}
{{if .RuleNote }}**告警描述:** **{{.RuleNote}}**{{end}}
{{- end -}}
[详情]({{$kibanaHost}}/alert-his-events/{{.Id}})
http://192.168.60.21:5601/app/discover#/?_g=(time:(from:'1736939280000',to:'1736939525000'))&_a=(columns:!(),dataSource:(dataViewId:f101ce47-ebde-4f42-bbd1-dee42c68148e,type:dataView),filters:!(),interval:auto,query:(language:lucene,query:(query_string:(query:'message:info'))),sort:!(!('@timestamp',desc)))
tags
agent.*
as*
client.*
cloud.*
container.*
destination.*
dns.*
ecs.*
error.*
event.*
file.*
geo.*
group.*
hash.*
host.*
http.*
log.origin*
log.syslog*
network.*
observer*
organization.*
os.*
package.*
process.*
server.*
service.*
source.*
threat.*
trace.id*
transaction.id*
url.*
user.*
user_agent.*
cloud.*
host.*
kubernetes.*
process.owner.*
jolokia.*
aws*
bucket.*
object.*
fields.*
Help me process some fields: convert them to wildcards and deduplicate, for example:
"os.name",
"os.platform",
"os.version",
"package.architecture",
"package.checksum",
"package.description",
becomes:
os.*
package.*
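The field conversion requested above can be sketched in Python. This is a minimal sketch that assumes every dotted name collapses to its top-level prefix; the function name `collapse_fields` is hypothetical, and entries that need a deeper prefix (such as `process.owner.*` or `log.origin*` in the list further up) would have to be handled by hand.

```python
def collapse_fields(fields):
    """Map each dotted field name to '<top-level>.*' and drop duplicates.

    First-seen order is preserved. Names without a dot are kept unchanged.
    """
    seen = set()
    result = []
    for raw in fields:
        # Remove surrounding whitespace, quotes and trailing commas.
        name = raw.strip().strip('",')
        if not name:
            continue
        if "." in name:
            pattern = name.split(".", 1)[0] + ".*"
        else:
            pattern = name
        if pattern not in seen:
            seen.add(pattern)
            result.append(pattern)
    return result


sample = [
    '"os.name",',
    '"os.platform",',
    '"os.version",',
    '"package.architecture",',
    '"package.checksum",',
    '"package.description",',
]
for pattern in collapse_fields(sample):
    print(pattern)
```

Running it on the six sample names prints `os.*` and `package.*`, matching the expected result above.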
docker run -d -p 3030:3030 -p 3333:3333 \
-v `pwd`/config/elastalert.yaml:/opt/elastalert/config.yaml \
-v `pwd`/config/config.json:/opt/elastalert-server/config/config.json \
-v `pwd`/rules:/opt/elastalert/rules \
-v `pwd`/rule_templates:/opt/elastalert/rule_templates \
--network 1panel-network \
--name elastalert praecoapp/elastalert-server:latest
docker run -d -p 8080:8080 --network 1panel-network --name praeco \
-e ELASTICSEARCH_HOST=http://192.168.60.21:9200 \
-e ELASTICSEARCH_USERNAME=admin \
-e ELASTICSEARCH_PASSWORD=123456 \
-e ELASTALERT_HOST=http://192.168.60.21:3030 \
johnsusek/praeco
echo "slack_webhook_url: 'https://open.feishu.cn/open-apis/bot/v2/hook/8bd6a15d-90f0-4f4f-a1b1-bd105f31ea06'" | sudo tee -a rules/BaseRule.config >/dev/null
export PRAECO_ELASTICSEARCH=192.168.1.7
{
"msg_type": "interactive",
"card": {
"header": {
"title": {
"content": "[ INFINI Platform Alerting ]",
"tag": "plain_text"
},
"template":"{{if eq .priority "critical"}}red{{else if eq .priority "high"}}orange{{else if eq .priority "medium"}}yellow{{else if eq .priority "low"}}grey{{else}}blue{{end}}"
},
"elements": [
{
"tag": "markdown",
"content": "🔥 告警事件 [#{{.event_id}}]({{$.env.INFINI_CONSOLE_ENDPOINT}}/#/alerting/message/{{.event_id}}) 正在进行中\n **{{.title}}**\n 优先级: {{.priority}}\n 事件ID: {{.event_id}}\n 目标: {{.resource_name}}-{{.objects}}\n 触发时间: {{.trigger_at | datetime}}"
},
{
"tag": "hr"
},
{
"tag": "markdown",
"content": "**具体错误行内容**: {{ if. hits.hits.0._source.message }}{{.hits.hits.0._source.message }}{{ else }}{{.message | str_replace \"\\n\" \"\\\\n\" }}{{ end }}\n **触发 error 的时间**: {{.trigger_at | datetime }}"
}
]
}
}
# Original ----------------------------
{{- $startTime := .TriggerTime | mul 1000 }}
{{- $endTime := .LastEvalTime | mul 1000 }}
{{- $indexName := .TagsMap.indexname }}
{{- $query := .TagsMap.query }}
{{ if .IsRecovered }}
{{- if ne .Cate "host"}}
{{end}}
**恢复时间:** {{timeformat .LastEvalTime}}
**告警描述:** **服务已恢复**
{{- else }}
{{- if ne .Cate "host"}}
{{end}}
**触发时间:** {{timeformat .TriggerTime}}
**发送时间:** {{timestamp}}
**触发时值:** {{.TriggerValue}}
{{if .RuleNote }}**告警服务:** **{{.RuleNote}}**{{end}}
{{- end -}}
{{$domain := "http://192.168.60.21:5601/" }}
[日志详情]({{$domain}}{{$event.RunbookUrl}})
# Original ----------------------------
{{ if $event.IsRecovered }}
{{- if ne $event.Cate "host"}}
{{end}}
**恢复时间:** {{timeformat $event.LastEvalTime}}
**告警描述:** **服务已恢复**
{{- else }}
{{- if ne $event.Cate "host"}}
{{end}}
**触发时间:** {{timeformat $event.TriggerTime}}
**发送时间:** {{timestamp}}
**触发时值:** {{$event.TriggerValue}}
{{if $event.RuleNote }}**对应服务:** **{{$event.RuleNote}}**{{end}}
{{- end -}}
[日志详情]({{$event.Bshboardurl}})
{{- $startTime := .TriggerTime | mul 1000 }}
{{- $endTime := .LastEvalTime | mul 1000 }}
{{- $indexName := .TagsMap.indexname }}
{{- $fieldName := .TagsMap.fieldname }}
{{- $query := .TagsMap.query }}
{{ if .IsRecovered }}
{{- if ne .Cate "host"}}
{{end}}
**恢复时间:** {{timeformat .LastEvalTime}}
**告警描述:** **服务已恢复**
{{- else }}
{{- if ne .Cate "host"}}
{{end}}
**触发时间:** {{timeformat .TriggerTime}}
**发送时间:** {{timestamp}}
**触发时值:** {{.TriggerValue}}
{{if .RuleNote }}**告警服务:** **{{.RuleNote}}**{{end}}
{{- end -}}
{{- $domain := "http://192.168.60.21:5601" }}
{{- $kibanaQuery := printf "%s:error AND @timestamp:[%d TO %d]" $fieldName $startTime $endTime }}
{{- $fromTime := timeformat .TriggerTime "2006-01-02T15:04:05.000Z" }}
{{- $toTime := timeformat .LastEvalTime "2006-01-02T15:04:05.000Z" }}
{{- $kibanaLink := printf "%s/app/discover#/?_a=(index:'%s',query:(language:kuery,query:'%s'))&_g=(time:(from:'%s',to:'%s'))" $domain $indexName $kibanaQuery $fromTime $toTime }}
[日志详情]({{$kibanaLink}})
[事件详情]({{$domain}}/alert-his-events/{{.Id}})|[屏蔽1小时]({{$domain}}/alert-mutes/add?busiGroup={{.GroupId}}&cate={{.Cate}}&datasource_ids={{.DatasourceId}}&prod={{.RuleProd}}{{range $key, $value := .TagsMap}}&tags={{$key}}%3D{{$value}}{{end}})|[查看曲线]({{$domain}}/metric/explorer?data_source_id={{.DatasourceId}}&data_source_name=prometheus&mode=graph&prom_ql={{.PromQl|escape}})
http://192.168.1.7:17000/alert-his-events/
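To check the Kibana link outside the notification template, the same URL shape as the `printf` call above can be assembled in Python. This is a sketch under assumptions: the function name `kibana_discover_link` is hypothetical, the times are passed as epoch seconds and rendered as ISO-8601 UTC strings, and the query is inserted as-is (a query containing single quotes or characters that need URL encoding would require extra escaping).

```python
from datetime import datetime, timezone


def kibana_discover_link(base_url, index_name, query, start_epoch, end_epoch):
    """Build a Kibana Discover URL for an index and a time range (epoch seconds)."""

    def iso(ts):
        # Kibana accepts absolute times as ISO-8601 strings in UTC.
        return datetime.fromtimestamp(ts, tz=timezone.utc).strftime(
            "%Y-%m-%dT%H:%M:%S.000Z"
        )

    # Strip a trailing slash so the result never contains "//app".
    base = base_url.rstrip("/")
    return (
        f"{base}/app/discover#/?_a=(index:'{index_name}',"
        f"query:(language:kuery,query:'{query}'))"
        f"&_g=(time:(from:'{iso(start_epoch)}',to:'{iso(end_epoch)}'))"
    )


link = kibana_discover_link(
    "http://192.168.60.21:5601/",
    "pord01-flymoonlog-pord01-flymoon-task-2025.*",
    "parsed_sys_info.log_level:error",
    1736939280,
    1736939525,
)
print(link)
```

Stripping the trailing slash from the base URL matters here because the path already starts with `/app`.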
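Apply for a Zoho personal mailbox: https://www.zoho.com.cn/mail/
Registration requires a phone number to receive a verification code.
Gmail: https://workspace.google.com/business/signup/newbusiness?xsell=google_accounts&back=https://accounts.google.com/SignUp?ec=asw-gmail-hero-create2&biz=true&continue=https://mail.google.com/mail/&flowEntry=SignUp&flowName=GlifWebSignIn&service=mail&theme=glif&ec=asw-gmail-hero-create2&source=gafb-gmail-hero-zh-CN&hl=zh-CN&ga_region=japac&ga_country=zh-CN&ga_lang=zh-CN
After registering, bind a phone number and a recovery email yourself so that you can sign in again if you lose access to the account.
Enabling two-step verification requires adding a phone number to receive verification codes.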
mylidamin@gmail.com aAggyxmm
docker run -d --name my-nginx -p 9200:9200 -v /data/my-nginx/conf:/etc/nginx/conf.d nginx
curl -u admin:123456 http://106.53.194.199:9200/
docker run -d \
--name elastalert2 \
--network=1panel-network \
-v /data/elastalert2/config/config.yaml:/opt/elastalert/config.yaml \
-v /data/elastalert2/rules:/opt/elastalert/rules \
-v /data/elastalert2/data:/opt/elastalert/data \
-e "ES_USERNAME=admin" \
-e "ES_PASSWORD=123456" \
jertel/elastalert2:latest
docker run -d \
--name elastalert2 \
--network=1panel-network \
-v /data/elastalert2/config/config.yaml:/opt/elastalert/config.yaml \
-v /data/elastalert2/rules:/opt/elastalert/rules \
-v /data/elastalert2/data:/opt/elastalert/data \
-v /data/elastalert2/config/smtp_auth.yaml:/opt/elastalert/smtp_auth.yaml \
-e "ES_USERNAME=admin" \
-e "ES_PASSWORD=123456" \
jertel/elastalert2:latest --verbose
docker run -d \
--name elastalert2-1 \
--network=1panel-network \
-v /data/elastalert2/config/config.yaml:/opt/elastalert/config.yaml \
-v /data/elastalert2/rules:/opt/elastalert/rules \
-v /data/elastalert2/data:/opt/elastalert/data \
-v /data/elastalert2/elastalert_modules:/opt/elastalert/elastalert_modules \
-e "ES_USERNAME=admin" \
-e "ES_PASSWORD=123456" \
jertel/elastalert2:latest --verbose
docker run -d \
--name elastalert2 \
--network=1panel-network \
-v /opt/elastalert2/config/config.yaml:/opt/elastalert2/config.yaml \
-v /opt/elastalert2/rules:/opt/elastalert2/rules \
-e "TZ=Asia/Shanghai" \
-e "ES_USERNAME=admin" \
-e "ES_PASSWORD=123456" \
jertel/elastalert2:latest
docker run -d --name elastalert --restart=always \
-v /data/feishu-alert/config.yaml:/opt/elastalert/config.yaml \
-v /data/feishu-alert/rules:/opt/elastalert/rules \
-v /etc/localtime:/etc/localtime \
dengchuanfu/feishualert:v0.1 --verbose
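I currently use Filebeat to collect log files into ES, and I need you to recommend an alerting tool. It must be free; do not suggest Kibana's built-in alerting, which requires a paid subscription.
Alert rules should be configurable through a web UI. Collected log lines at error level should be alerted to a Feishu bot group. Several Filebeat instances are collecting, and each project's logs produce a corresponding data stream in ES. Walk me through the configuration step by step and ask me for whatever information you need, such as the project names, the data stream names, the field that marks an error line, the match condition, and the Feishu bot URL. The setup should be tuned, for example with message aggregation.
The log lines currently produced carry stack traces that can be quite long; how should the alert display be optimized? I want the message sent to the Feishu bot to be a Feishu card whose title is the project name and whose body shows: 1. the trigger time; 2. the number of error lines at trigger time; 3. the content of the specific error lines (overly long stack traces need handling here); 4. a link that jumps straight to Kibana.
I have tried the following without success:
1. INFINI Console: from what I found it seems able to do this, but I do not know how to configure it.
2. Nightingale (滴滴夜莺): it can trigger alerts but cannot show the content of the ES lines directly in the message.
3. ElastAlert2: deployment is troublesome and I have never deployed it successfully.
Here is some of my project information:
1. Project names and their corresponding data streams: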
pord-flymoon-task pord01-flymoonlog-pord01-flymoon-task-2025.*
pord-flymoon-sse pord01-flymoonlog-pord01-flymoon-sse-2025.*
pord-flymoon-partner pord01-flymoonlog-pord01-flymoon-partner-2025.*
pord-flymoon-admin pord01-flymoonlog-pord01-flymoon-admin-2025.*
pord-flymoon_crawlspider pord01-flymoonlog-pord01-flymoon_crawlspider-2025.*
pord-fly-moon-email_v2 pord01-flymoonlog-pord01-fly-moon-email_v2-2025.*
out-pord-fly-moon-email_v2 out-241-flymoonlog-pord-fly-moon-email_v2-2025.*
2. The field that marks an error line when filtering in Kibana:
The pord-fly-moon-email_v2 and out-pord-fly-moon-email_v2 projects use message:error;
all other projects use parsed_sys_info.log_level:error
3. Bot URL: https://open.feishu.cn/open-apis/bot/v2/hook/7525a700-f27e-4b3d-88eb-7ceaff9e44e4
4. Kibana base URL:
http://192.168.60.21:5601/
Account admin, password 123456
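The card layout requested above (project name as title; trigger time, error count, shortened error content and a Kibana link in the body) can be sketched as a Feishu interactive-card payload. This is a minimal sketch, not a finished integration: the names `truncate_error`, `build_card` and `send_card`, the limits of 5 lines and 300 characters, and the sample values are all assumptions; the payload shape follows the `msg_type: interactive` card JSON shown earlier in these notes.

```python
import json
import urllib.request


def truncate_error(text, max_lines=5, max_chars=300):
    """Keep at most max_lines lines and max_chars characters of an error message."""
    lines = text.splitlines()
    short = "\n".join(lines[:max_lines])
    truncated = len(lines) > max_lines
    if len(short) > max_chars:
        short = short[:max_chars]
        truncated = True
    return (short + "\n...") if truncated else short


def build_card(project, trigger_time, error_count, error_text, kibana_url):
    """Build the JSON body for a Feishu bot webhook as an interactive card."""
    content = (
        f"**Trigger time:** {trigger_time}\n"
        f"**Error lines:** {error_count}\n"
        f"**Error content:**\n{truncate_error(error_text)}\n"
        f"[Open in Kibana]({kibana_url})"
    )
    return {
        "msg_type": "interactive",
        "card": {
            "header": {
                "title": {"tag": "plain_text", "content": project},
                "template": "red",
            },
            "elements": [{"tag": "markdown", "content": content}],
        },
    }


def send_card(webhook_url, card):
    """POST the card to the bot webhook and return the response body (not called here)."""
    data = json.dumps(card).encode("utf-8")
    req = urllib.request.Request(
        webhook_url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8")


# Hypothetical sample values, for illustration only.
stack = "\n".join(f"    at frame {i}" for i in range(40))
card = build_card(
    "pord-flymoon-task",
    "2025-01-15 11:08:00",
    3,
    "ERROR something failed\n" + stack,
    "http://192.168.60.21:5601/app/discover",
)
print(json.dumps(card, ensure_ascii=False, indent=2))
```

Truncating by both line count and character count keeps a long stack trace from flooding the chat, while the Kibana link still leads to the full log line.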
xpack.encryptedSavedObjects.encryptionKey: "414d80b26291f9a0017f0a3ff591f22a969bec72f48f66de579ab0dadd1131c4"
mkdir -p /data/feishu-alert/rules
touch /data/feishu-alert/config.yaml
vim /data/feishu-alert/rules/alert_error.yaml
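rules_folder: /opt/elastalert/rules  # directory that holds the alert rule files
run_every:
  minutes: 1  # how often ElastAlert queries Elasticsearch; the unit can range from weeks down to seconds
buffer_time:
  minutes: 1  # ElastAlert buffers results for the most recent period in case some log sources are not real-time
es_host: 192.168.1.7  # Elasticsearch host
es_port: 9200  # Elasticsearch port
writeback_index: elastalert_status  # index on es_host used for metadata storage; it can be an unmapped index, but running elastalert-create-index to set a mapping is recommended
alert_time_limit:
  days: 2  # if an alert fails for some reason, ElastAlert retries until this period has elapsed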
# the rule name must be unique
name: alert_error
# required; see the ElastAlert documentation for the available rule types
type: frequency
# index to query; wildcards are supported
index: easyspeed-cloud-logs-*
use_strftime_index: true
# number of events that triggers the alert
num_events: 1
# works together with num_events: alert when 1 event occurs within 1 minute
timeframe:
  minutes: 1
# minimum time between two alerts for the same rule; alerts raised within this period are dropped. Default is one minute.
realert:
  minutes: 1
# Elasticsearch query used to match the alert rule; supports AND, OR, etc.
# Here, per business needs, it matches level ERROR and type production, excluding messages that contain the keywords "开始执行批量" or "更新轨迹维护"
# A simple match can use the query "level: ERROR" directly, which alerts whenever level is ERROR.
filter:
- query:
    query_string:
      query: "NOT message: (开始执行批量 OR 更新轨迹维护) AND level: ERROR AND type: production"
# only the fields needed https://elastalert.readthedocs.io/en/latest/ruletypes.html#include
include: ["spanId", "level", "@timestamp", "_index", "source", "traceId","host","message"]
# Feishu alerter
alert:
- "elastalert_modules.feishu_alert.FeishuAlert"
# Feishu bot API URL
feishualert_url: "https://open.feishu.cn/open-apis/bot/v2/hook/"
# Feishu bot id
feishualert_botid:
  "xxx-xxx-xxx"
# alert title
feishualert_title:
  "业务日志ERROR异常"
# no alerts between 00:00 and 08:00
feishualert_skip:
  start: "00:00:00"
  end: "08:00:00"
# alert body; {} placeholders are filled from the matches
feishualert_body:
  "
  【告警主题】: {feishualert_title}\n
  【告警时间】: {feishualert_time}\n
  【告警环境】: 【production】\n
  【告警模块】: {source}\n
  【业务索引】: {_index}\n
  【时间戳】: {@timestamp}\n
  【日志级别】: {level}\n
  【spanId】: {spanId}\n
  【traceId】: {traceId}\n
  【host】: {host}\n
  【message】: {message}
  "
index: pord01-flymoonlog-pord01-flymoon-task-2025.*
use_strftime_index: true
num_events: 1
timeframe:
  minutes: 1
realert:
  minutes: 1
filter:
- query:
    query_string:
      query: "parsed_sys_info.log_level:error"
include: ["spanId", "level", "@timestamp", "_index", "source", "traceId","host","message"]
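The fragment above covers a single project. A small Python sketch can generate one rule per project from the project list and field mapping given earlier in these notes. The rule name pattern `alert_error_<project>` and the helper `build_rule` are assumptions; the rule keys are the ones already used in the rule file above, and the output is printed as JSON, which a YAML parser also accepts.

```python
import json

# Project name -> data stream pattern, copied from the project list in these notes.
PROJECTS = {
    "pord-flymoon-task": "pord01-flymoonlog-pord01-flymoon-task-2025.*",
    "pord-flymoon-sse": "pord01-flymoonlog-pord01-flymoon-sse-2025.*",
    "pord-flymoon-partner": "pord01-flymoonlog-pord01-flymoon-partner-2025.*",
    "pord-flymoon-admin": "pord01-flymoonlog-pord01-flymoon-admin-2025.*",
    "pord-flymoon_crawlspider": "pord01-flymoonlog-pord01-flymoon_crawlspider-2025.*",
    "pord-fly-moon-email_v2": "pord01-flymoonlog-pord01-fly-moon-email_v2-2025.*",
    "out-pord-fly-moon-email_v2": "out-241-flymoonlog-pord-fly-moon-email_v2-2025.*",
}

# These two projects are filtered with message:error; the rest use the parsed level field.
MESSAGE_FIELD_PROJECTS = {"pord-fly-moon-email_v2", "out-pord-fly-moon-email_v2"}


def build_rule(project, index):
    """Return one ElastAlert frequency rule as a plain dict."""
    if project in MESSAGE_FIELD_PROJECTS:
        query = "message:error"
    else:
        query = "parsed_sys_info.log_level:error"
    return {
        "name": f"alert_error_{project}",  # hypothetical naming scheme
        "type": "frequency",
        "index": index,
        "num_events": 1,
        "timeframe": {"minutes": 1},
        "realert": {"minutes": 1},
        "filter": [{"query": {"query_string": {"query": query}}}],
    }


rules = [build_rule(project, index) for project, index in PROJECTS.items()]
for rule in rules:
    print(json.dumps(rule, ensure_ascii=False))
```

Each printed dict would still need the alerter settings (the `alert:` and `feishualert_*` keys) appended before being saved as a rule file.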
(venv) -bash-4.2# ps axjf|grep check_account.py
15294 8347 8346 15294 pts/5 8346 S+ 0 0:00 \_ grep --color=auto check_account.py
1 7961 7750 7750 ? -1 S 0 0:00 bash -c source /data/webapps/prod_check_tiktok_account/venv/bin/activate && nohup python /data/webapps/prod_check_tiktok_account/check_account.py > /data/webapps/prod_check_tiktok_account/output.log 2>&1 & echo $! > /data/webapps/prod_check_tiktok_account/check_account.pid // 保存 PID disown // 让?程彻底脱离 echo "? 生产环境?程已启动PID: $(cat /data/webapps/prod_check_tiktok_account/check_account.pid)"
7961 7964 7750 7750 ? -1 Sl 0 0:02 \_ python /data/webapps/prod_check_tiktok_account/check_account.py
sit_fly_moon_web|sit_jenniefy_web|sit_scalelink_frontend|test_fly_moon_web|test_deeplink_merchant|test_deeplink_merchant_foreign|test_deeplink_merchant_saas|test_jenniefy_web|test_scalelink_frontend|oversea_test_jenniefy_web
deeplink_merchant|deeplink_merchant_saas|fly_moon_web|scalelink_frontend|oversea_jenniefy_web
{{ if $event.IsRecovered }}
{{- if ne $event.Cate "host"}}
{{end}}
**恢复时间:** {{timeformat $event.LastEvalTime}}
**告警描述:** **已恢复**
**当前进程内存占用:**{{xxxx}}
**当前系统剩余内存:** {{xxxx}}
{{- else }}
{{- if ne $event.Cate "host"}}
{{end}}
**触发时间:** {{timeformat $event.TriggerTime}}
**当前进程内存占用:**{{xxxx}}
**当前系统剩余内存:** {{xxxx}}
{{if $event.RuleNote }}**详细信息:** **{{$event.RuleNote}}**{{end}}
{{- end -}}
{{$domain := "http://192.168.60.21:5601" }}
[近1小时日志详情]({{$domain}}{{$labels.url}})
process_resident_memory_bytes{instance="prod01-server", job="prod01-server-process-exporter"}
process_resident_memory_bytes{job='process-exporter',instance=~'$host',name=~'$process_name'}" | first | value | humanize1024
process_resident_memory_bytes{job='prod01-server-process-exporter',instance=~'prod01-server',name=~'prod-flymoon_email_prod'}" | first | value | humanize1024
pipeline {
agent any
environment {
REMOTE_HOST = "43.130.56.138" // remote server {params.REMOTE_HOST}
REMOTE_PROJECT_PATH = "/data/webapps/lessie_sourcing_agents" // remote Python project path
VENV_DIR = "/data/webapps/lessie_sourcing_agents/venv" // remote virtualenv directory
CONDA_PATH = "/root/miniconda3/bin/conda" // change to the actual Conda install path
}
stages {
stage('Checkout 代码') {
steps {
git branch: "${params.Code_branch}", credentialsId: 'fly_gitlab_auth', url: 'http://172.24.16.20/python/lessie-sourcing-agents.git'
}
}
stage('进程下线') {
steps {
echo("下线")
sh "ssh ${REMOTE_HOST} 'sh /data/sh/kill_lessie_sourcing_agents.sh'"
}
}
stage('工程同步') {
steps {
sh """
ssh ${REMOTE_HOST} 'mkdir -p ${REMOTE_PROJECT_PATH}'
rsync -avz --exclude 'venv' ${WORKSPACE}/ ${REMOTE_HOST}:${REMOTE_PROJECT_PATH}/
"""
}
}
stage('安装依赖') {
steps {
sh """
ssh ${REMOTE_HOST} '
cd ${REMOTE_PROJECT_PATH} &&
source ~/.bashrc &&
conda activate search &&
source ${VENV_DIR}/bin/activate &&
pip install --upgrade pip &&
pip install -r requirements.txt
'
"""
}
}
stage('工程启动') {
steps {
echo("启动")
sh """
ssh ${REMOTE_HOST} '
conda activate search
source ${VENV_DIR}/bin/activate
TIMESTAMP=\$(date +"%Y%m%d_%H%M%S")
nohup python /data/webapps/lessie_sourcing_agents/server.py > /data/webapps/lessie_sourcing_agents/logs/lessie_sourcing_agents_\${TIMESTAMP}.log 2>&1 &
'
"""
}
}
}
post {
success {
echo '✅ 部署成功!'
}
failure {
echo '❌ 部署失败,请检查日志!'
}
}
}
{{/* Common header: always shows the alert type, the associated machine and the rule information */}}
**告警类型:** {{if $event.IsRecovered}}内存告警-恢复通知{{else}}内存告警-触发通知{{end}}
**关联机器:** {{$event.Attributes.host | default "未知主机"}}
**告警规则:** {{$event.RuleName}}
{{if $event.RuleId}}**规则ID:** {{$event.RuleId}}{{end}}
{{/* Recovery case: shows the recovery time and the memory size at recovery */}}
{{ if $event.IsRecovered }}
{{- if ne $event.Cate "host"}}{{end}}
**恢复时间:** {{timeformat $event.LastEvalTime "2006-01-02 15:04:05"}}
**恢复时内存使用:** {{$event.TriggerValue | humanizeSize}}
**告警描述:** 机器内存使用率已降至阈值以下,告警恢复
{{if $event.RuleNote }}**规则说明:** {{$event.RuleNote}}{{end}}
{{/* Trigger case: shows the trigger time, the send time and the memory size at trigger */}}
{{- else }}
{{- if ne $event.Cate "host"}}{{end}}
**触发时间:** {{timeformat $event.TriggerTime "2006-01-02 15:04:05"}}
**发送时间:** {{timeformat $timestamp "2006-01-02 15:04:05"}}
**触发时内存使用:** {{$event.TriggerValue | humanizeSize}}
**告警描述:** 机器内存使用率超过设定阈值,触发告警
{{if $event.RuleNote }}**规则说明:** {{$event.RuleNote}}{{end}}
{{if $event.Threshold}}**告警阈值:** {{$event.Threshold | humanizeSize}}{{end}}
{{- end -}}
Analyze this memory change:
(base) [root@prod-lessie-server02 ~]#free -h
total used free shared buff/cache available
Mem: 15Gi 15Gi 133Mi 1.0Mi 237Mi 98Mi
Swap: 8.0Gi 8.0Gi 0.0Ki
(base) [root@prod-lessie-server02 ~]#ps aux|grep 7001
root 158009 0.0 0.0 35908 10420 ? S Oct21 0:12 /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
root 158011 0.8 90.3 22549000 14317120 ? Sl Oct21 24:39 /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
root 158012 0.3 0.7 4111768 117868 ? Sl Oct21 9:26 /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
root 158013 0.4 1.5 5852256 243708 ? Sl Oct21 12:08 /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
root 158014 0.5 1.6 6056408 260252 ? Sl Oct21 13:51 /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
root 158160 0.0 0.2 2404712 39116 ? Sl Oct21 0:06 /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
root 158186 0.0 0.7 2404684 119708 ? Sl Oct21 0:06 /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
root 158218 0.0 0.1 2404716 31388 ? Sl Oct21 0:06 /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
root 158253 0.0 0.2 2406028 35400 ? Sl Oct21 0:06 /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
root 1028724 0.0 0.0 6408 384 pts/0 D+ 18:37 0:00 grep --color=auto 7001
(base) [root@prod-lessie-server02 ~]#ps aux|grep 7001
root 158009 0.0 0.0 35908 11316 ? S Oct21 0:12 /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
root 158012 0.3 0.7 4111768 119916 ? Sl Oct21 9:27 /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
root 158013 0.4 1.5 5852256 246652 ? Sl Oct21 12:11 /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
root 158014 0.5 1.6 6056408 261532 ? Sl Oct21 13:53 /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
root 158160 0.0 0.2 2404712 39244 ? Sl Oct21 0:06 /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
root 158186 0.0 0.7 2404684 119324 ? Sl Oct21 0:06 /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
root 158218 0.0 0.2 2404716 31772 ? Sl Oct21 0:06 /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
root 158253 0.0 0.2 2406028 35400 ? Sl Oct21 0:06 /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
root 1028770 0.0 0.0 6408 2176 pts/0 S+ 18:39 0:00 grep --color=auto 7001
(base) [root@prod-lessie-server02 ~]#free -h
total used free shared buff/cache available
Mem: 15Gi 2.0Gi 12Gi 1.0Mi 961Mi 13Gi
Swap: 8.0Gi 3.9Gi 4.1Gi
(base) [root@prod-lessie-server02 ~]#
(base) [root@prod-lessie-server02 ~]#ps auxf|grep 7001
root 1029136 0.0 0.0 6408 2304 pts/0 S+ 18:39 0:00 | | \_ grep --color=auto 7001
root 158009 0.0 0.0 35908 14004 ? S Oct21 0:12 /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
root 158012 0.3 0.7 4111768 121836 ? Sl Oct21 9:27 \_ /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
root 158160 0.0 0.2 2404712 39244 ? Sl Oct21 0:06 | \_ /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
root 158013 0.4 1.5 5852256 248700 ? Sl Oct21 12:11 \_ /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
root 158218 0.0 0.2 2404716 31772 ? Sl Oct21 0:06 | \_ /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
root 158014 0.5 1.6 6056408 264348 ? Sl Oct21 13:53 \_ /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
root 158253 0.0 0.2 2406028 35400 ? Sl Oct21 0:06 | \_ /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
root 1028807 50.5 3.4 3214636 553780 ? Sl 18:39 0:08 \_ /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
root 1029008 0.0 2.9 2408940 466648 ? Sl 18:39 0:00 \_ /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
root 158186 0.0 0.7 2404652 120604 ? Sl Oct21 0:06 /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
(base) [root@prod-lessie-server02 ~]#
(base) [root@prod-lessie-server02 ~]#free -h
total used free shared buff/cache available
Mem: 15Gi 2.1Gi 12Gi 1.0Mi 1.0Gi 13Gi
Swap: 8.0Gi 3.9Gi 4.1Gi
Looking at the worker processes, what caused memory usage to go from very high back to normal?
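One way to approach the question is to diff the two `ps aux` snapshots. In the first snapshot PID 158011 holds an RSS of 14317120 KiB (90.3% of memory); it is absent from the second snapshot, after which `free -h` reports 2.0Gi used, and the `ps auxf` tree shows new workers 1028807 and 1029008 started at 18:39. The sketch below reproduces that comparison on a few of the lines above, with the command column shortened to `gunicorn` for readability; the helper names `parse_ps` and `vanished` are hypothetical. What ended PID 158011 (for example the kernel OOM killer, or gunicorn replacing a worker because of `--max-requests` or `--timeout`) cannot be determined from this output alone.

```python
def parse_ps(text):
    """Return {pid: rss_kib} parsed from `ps aux` output lines."""
    rss = {}
    for line in text.strip().splitlines():
        # Columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        cols = line.split(None, 10)
        if len(cols) < 11 or not cols[1].isdigit():
            continue
        rss[int(cols[1])] = int(cols[5])
    return rss


def vanished(before, after):
    """Return the processes present in `before` but missing from `after`."""
    return {pid: rss for pid, rss in before.items() if pid not in after}


# Values copied from the snapshots above; command column shortened.
BEFORE = """\
root 158011 0.8 90.3 22549000 14317120 ? Sl Oct21 24:39 gunicorn
root 158012 0.3 0.7 4111768 117868 ? Sl Oct21 9:26 gunicorn
root 158013 0.4 1.5 5852256 243708 ? Sl Oct21 12:08 gunicorn
"""
AFTER = """\
root 158012 0.3 0.7 4111768 119916 ? Sl Oct21 9:27 gunicorn
root 158013 0.4 1.5 5852256 246652 ? Sl Oct21 12:11 gunicorn
"""

gone = vanished(parse_ps(BEFORE), parse_ps(AFTER))
for pid, rss_kib in gone.items():
    print(f"PID {pid} is gone; it held about {rss_kib / 1024 / 1024:.2f} GiB RSS")
```

Checking `dmesg` or the gunicorn log around 18:39 would be the next step to tell an OOM kill apart from a worker recycle.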