782 lines
35 KiB
YAML
Three open questions about deploying to the overseas machine 43.159.145.241:

I. flymoon-jenniefy.jar

1. flymoon-jenniefy.jar needs Redis at runtime. Does Redis also have to be installed on 43.159.145.241?
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:6379

2. flymoon-jenniefy.jar needs MySQL at runtime. Does it connect to the Tencent Cloud production database? If it does: on the sit branch of the flymoon-jenniefy project, both Redis and MySQL point at the sit machine itself, so should the deployment to 43.159.145.241 use the prod branch instead?

3. How is port 8070 used?
location ^~ /prod-api {
    proxy_pass http://127.0.0.1:8070;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}

II. Python deployment: is it an exact copy of last night's sit deployment?

III. jenniefy_web UI: the domain still has to be decided (https://www.jennie.deal or something else), along with the domain's certificate files. Whether access from within China can be blocked depends on whether the DNS provider supports that restriction.
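Didi Nightingale (夜莺) is alerting with Elasticsearch as its data source. My Elasticsearch stores log files collected from other machines. I want an alert sent to a Feishu bot whenever a collected log line contains an error. The notification should be a Feishu card that shows the content of the actual error line, but the card must stay short enough not to flood the chat. How do I do this?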
|
||
|
||
|
||
|
||
|
||
{{ if .IsRecovered }}
|
||
{{- if ne .Cate "host"}}
|
||
**告警集群:** {{.Cluster}}{{end}}
|
||
**级别状态:** S{{.Severity}} Recovered
|
||
**告警名称:** {{.RuleName}}
|
||
**恢复时间:** {{timeformat .LastEvalTime}}
|
||
**告警描述:** **服务已恢复**
|
||
{{- else }}
|
||
{{- if ne .Cate "host"}}
|
||
**告警集群:** {{.Cluster}}{{end}}
|
||
**级别状态:** S{{.Severity}} Triggered
|
||
**告警名称:** {{.RuleName}}
|
||
**触发时间:** {{timeformat .TriggerTime}}
|
||
**发送时间:** {{timestamp}}
|
||
**触发时值:** {{.TriggerValue}}
|
||
{{if .RuleNote }}**告警描述:** **{{.RuleNote}}**{{end}}
|
||
{{- end -}}
|
||
{{$domain := "http://请联系管理员修改通知模板将域名替换为实际的域名" }}
|
||
[事件详情]({{$domain}}/alert-his-events/{{.Id}})|[屏蔽1小时]({{$domain}}/alert-mutes/add?busiGroup={{.GroupId}}&cate={{.Cate}}&datasource_ids={{.DatasourceId}}&prod={{.RuleProd}}{{range $key, $value := .TagsMap}}&tags={{$key}}%3D{{$value}}{{end}})|[查看曲线]({{$domain}}/metric/explorer?data_source_id={{.DatasourceId}}&data_source_name=prometheus&mode=graph&prom_ql={{.PromQl|escape}})
|
||
|
||
|
||
|
||
|
||
|
||
{{ if .IsRecovered }}
|
||
{{- if ne .Cate "host"}}
|
||
**告警集群:** {{.Cluster}}{{end}}
|
||
**级别状态:** S{{.Severity}} Recovered
|
||
**告警名称:** {{.RuleName}}
|
||
**恢复时间:** {{timeformat .LastEvalTime}}
|
||
**告警描述:** **服务已恢复**
|
||
{{- else }}
|
||
{{- if ne .Cate "host"}}
|
||
**告警集群:** {{.Cluster}}{{end}}
|
||
**级别状态:** S{{.Severity}} Triggered
|
||
**告警名称:** {{.RuleName}}
|
||
**触发时间:** {{timeformat .TriggerTime}}
|
||
**发送时间:** {{timestamp}}
|
||
**触发时值:** {{.TriggerValue}}
|
||
{{if .RuleNote }}**告警描述:** **{{.RuleNote}}**{{end}}
|
||
**错误日志摘要:** {{.Message | slice 0 100 | default "无错误信息"}}
|
||
{{- end -}}
|
||
{{$domain := "http://192.168.1.7:17000" }}
|
||
[日志详情]({{$domain}}/alert-his-events/{{.Id}})
|
||
|
||
|
||
|
||
http://192.168.60.21/app/discover#/?_g=(time:(from:'<start_time>',to:'<end_time>'))&_a=(columns:!(_source),index:'<index_name>',query:(query_string:(query:'message:error')))
|
||
|
||
|
||
(http://192.168.60.21/app/discover#/?_g=(time:(from:'{{timeformat .TriggerTime}}',to:'{{timestamp}}'))&_a=(columns:!(_source),index:'{{ $indexname}}',query:(query_string:(query:'message:error'))))
|
||
|
||
[查看 Kibana 日志 (message:error)]({{ $kibanaHost }}/app/discover#/?_g=(time:(from:'{{ $startTime }}',to:'{{ $endTime }}'))&_a=(columns:!(_source),index:'{{ $indexname }}',query:(query_string:(query:'message:error'))))
|
||
|
||
|
||
{{- $startTime := .TriggerTime | mul 1000 }}
|
||
{{- $endTime := .LastEvalTime | mul 1000 }}
|
||
{{- $kibanaHost := "http://192.168.60.21:5601" }}
|
||
{{ if .IsRecovered }}
|
||
{{- if ne .Cate "host"}}
|
||
**告警集群:** {{.Cluster}}{{end}}
|
||
**级别状态:** S{{.Severity}} Recovered
|
||
**告警名称:** {{.RuleName}}
|
||
**恢复时间:** {{timeformat .LastEvalTime}}
|
||
**告警描述:** **服务已恢复**
|
||
{{- else }}
|
||
{{- if ne .Cate "host"}}
|
||
**告警集群:** {{.Cluster}}{{end}}
|
||
**级别状态:** {{.Severity}}级
|
||
**告警名称:** {{.RuleName}}
|
||
**触发时间:** {{timeformat .TriggerTime}}
|
||
**发送时间:** {{timestamp}}
|
||
**触发时值:** {{.TriggerValue}}
|
||
{{if .RuleNote }}**告警描述:** **{{.RuleNote}}**{{end}}
|
||
{{- end -}}
|
||
{{$domain := "http://192.168.1.7:17000" }}
|
||
[事件详情]({{$domain}}/alert-his-events/{{.Id}})
|
||
|
||
[查看 Kibana 日志 (message:error)]({{ $kibanaHost }}/app/discover#/?_g=(time:(from:'{{ $startTime }}',to:'{{ $endTime }}'))&_a=(columns:!(_source),index:'{{ $indexName }}',query:(query_string:(query:'message:error'))))
|
||
|
||
|
||
({{ $kibanaHost }}/app/discover#/?_g=(time:(from:'{{ $startTime }}',to:'{{ $endTime }}'))&_a=(columns:!(_source),index:'{{ $indexName }}',query:(query_string:(query:'message:error'))))
|
||
|
||
|
||
|
||
|
||
{{- end -}}
|
||
{{- $startTime := .TriggerTime | mul 1000 }}
|
||
{{- $endTime := .LastEvalTime | mul 1000 }}
|
||
{{- $kibanaHost := "http://192.168.60.21:5601" }}
|
||
{{- $indexName := .TagsMap.indexname }}
|
||
|
||
{{ if .IsRecovered }}
|
||
{{- if ne .Cate "host"}}
|
||
**告警集群:** {{.Cluster}}{{end}}
|
||
**级别状态:** S{{.Severity}} Recovered
|
||
**告警名称:** {{.RuleName}}
|
||
**恢复时间:** {{timeformat .LastEvalTime}}
|
||
**告警描述:** **服务已恢复**
|
||
{{- else }}
|
||
{{- if ne .Cate "host"}}
|
||
**告警集群:** {{.Cluster}}{{end}}
|
||
**级别状态:** {{.Severity}}级
|
||
**告警名称:** {{.RuleName}}
|
||
**触发时间:** {{timeformat .TriggerTime}}
|
||
**发送时间:** {{timestamp}}
|
||
**触发时值:** {{.TriggerValue}}
|
||
{{if .RuleNote }}**告警描述:** **{{.RuleNote}}**{{end}}
|
||
{{- end -}}
|
||
[详情](http://192.168.1.7:17000/alert-his-events/{{.Id}})
|
||
|
||
|
||
|
||
|
||
http://192.168.60.21:5601/app/discover#/?_g=(time:(from:'1736939280000',to:'1736939525000'))&_a=(columns:!(),dataSource:(dataViewId:f101ce47-ebde-4f42-bbd1-dee42c68148e,type:dataView),filters:!(),interval:auto,query:(language:lucene,query:(query_string:(query:'message:info'))),sort:!(!('@timestamp',desc)))
|
||
|
||
|
||
|
||
|
||
|
||
tags
|
||
agent.*
|
||
|
||
as*
|
||
client.*
|
||
cloud.*
|
||
container.*
|
||
destination.*
|
||
dns.*
|
||
ecs.*
|
||
error.*
|
||
event.*
|
||
file.*
|
||
geo.*
|
||
group.*
|
||
hash.*
|
||
host.*
|
||
http.*
|
||
log.origin*
|
||
log.syslog*
|
||
network.*
|
||
observer*
|
||
|
||
organization.*
|
||
os.*
|
||
package.*
|
||
process.*
|
||
server.*
|
||
service.*
|
||
source.*
|
||
threat.*
|
||
trace.id*
|
||
transaction.id*
|
||
url.*
|
||
user.*
|
||
user_agent.*
|
||
cloud.*
|
||
host.*
|
||
kubernetes.*
|
||
process.owner.*
|
||
jolokia.*
|
||
aws*
|
||
bucket.*
|
||
object.*
|
||
fields.*
|
||
|
||
|
||
|
||
|
||
Help me process some fields: convert them to wildcards and deduplicate. For example:
"os.name",
"os.platform",
"os.version",
"package.architecture",
"package.checksum",
"package.description",

becomes:
os.*
package.*
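The collapse described above can be sketched in Python. This is only an illustration: the function name `to_wildcards` is hypothetical, and it assumes every dotted field should be reduced to its first path segment followed by `.*`, while names without a dot are kept unchanged.

```python
def to_wildcards(fields):
    """Collapse dotted field names to 'prefix.*' patterns, dropping duplicates.

    Order of first appearance is preserved. Names without a dot are kept
    as-is (an assumption; the notes only show dotted examples).
    """
    seen = set()
    patterns = []
    for name in fields:
        # Tolerate items pasted straight from a JSON-like list: '"os.name",'
        name = name.strip().strip('",')
        if not name:
            continue
        prefix, dot, _rest = name.partition(".")
        pattern = f"{prefix}.*" if dot else name
        if pattern not in seen:
            seen.add(pattern)
            patterns.append(pattern)
    return patterns


if __name__ == "__main__":
    sample = [
        '"os.name",', '"os.platform",', '"os.version",',
        '"package.architecture",', '"package.checksum",', '"package.description",',
    ]
    print("\n".join(to_wildcards(sample)))
```

Note that this also flattens more specific patterns (for example `log.origin*` would become `log.*`), so entries that must stay narrow should be excluded before calling it.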
|
||
|
||
|
||
docker run -d -p 3030:3030 -p 3333:3333 \
|
||
-v `pwd`/config/elastalert.yaml:/opt/elastalert/config.yaml \
|
||
-v `pwd`/config/config.json:/opt/elastalert-server/config/config.json \
|
||
-v `pwd`/rules:/opt/elastalert/rules \
|
||
-v `pwd`/rule_templates:/opt/elastalert/rule_templates \
|
||
--network 1panel-network \
|
||
--name elastalert praecoapp/elastalert-server:latest
|
||
|
||
|
||
docker run -d -p 8080:8080 --network 1panel-network --name praeco \
|
||
-e ELASTICSEARCH_HOST=http://192.168.60.21:9200 \
|
||
-e ELASTICSEARCH_USERNAME=admin \
|
||
-e ELASTICSEARCH_PASSWORD=123456 \
|
||
-e ELASTALERT_HOST=http://192.168.60.21:3030 \
|
||
johnsusek/praeco
|
||
|
||
echo "slack_webhook_url: 'https://open.feishu.cn/open-apis/bot/v2/hook/8bd6a15d-90f0-4f4f-a1b1-bd105f31ea06'" | sudo tee -a rules/BaseRule.config >/dev/null
|
||
export PRAECO_ELASTICSEARCH=192.168.1.7
|
||
|
||
|
||
|
||
{
|
||
"msg_type": "interactive",
|
||
"card": {
|
||
"header": {
|
||
"title": {
|
||
"content": "[ INFINI Platform Alerting ]",
|
||
"tag": "plain_text"
|
||
},
|
||
"template":"{{if eq .priority "critical"}}red{{else if eq .priority "high"}}orange{{else if eq .priority "medium"}}yellow{{else if eq .priority "low"}}grey{{else}}blue{{end}}"
|
||
},
|
||
"elements": [
|
||
{
|
||
"tag": "markdown",
|
||
"content": "🔥 告警事件 [#{{.event_id}}]({{$.env.INFINI_CONSOLE_ENDPOINT}}/#/alerting/message/{{.event_id}}) 正在进行中\n **{{.title}}**\n 优先级: {{.priority}}\n 事件ID: {{.event_id}}\n 目标: {{.resource_name}}-{{.objects}}\n 触发时间: {{.trigger_at | datetime}}"
|
||
},
|
||
{
|
||
"tag": "hr"
|
||
},
|
||
{
|
||
"tag": "markdown",
|
||
"content": "**具体错误行内容**: {{ if. hits.hits.0._source.message }}{{.hits.hits.0._source.message }}{{ else }}{{.message | str_replace \"\\n\" \"\\\\n\" }}{{ end }}\n **触发 error 的时间**: {{.trigger_at | datetime }}"
|
||
}
|
||
]
|
||
}
|
||
}
|
||
|
||
|
||
|
||
# Original ----------------------------
|
||
{{- $startTime := .TriggerTime | mul 1000 }}
|
||
{{- $endTime := .LastEvalTime | mul 1000 }}
|
||
{{- $indexName := .TagsMap.indexname }}
|
||
{{- $query := .TagsMap.query }}
|
||
|
||
{{ if .IsRecovered }}
|
||
{{- if ne .Cate "host"}}
|
||
{{end}}
|
||
**恢复时间:** {{timeformat .LastEvalTime}}
|
||
**告警描述:** **服务已恢复**
|
||
{{- else }}
|
||
{{- if ne .Cate "host"}}
|
||
{{end}}
|
||
**触发时间:** {{timeformat .TriggerTime}}
|
||
**发送时间:** {{timestamp}}
|
||
**触发时值:** {{.TriggerValue}}
|
||
{{if .RuleNote }}**告警服务:** **{{.RuleNote}}**{{end}}
|
||
{{- end -}}
|
||
{{$domain := "http://192.168.60.21:5601/" }}
|
||
[日志详情]({{$domain}}{{$event.RunbookUrl}})
|
||
# Original ----------------------------
|
||
|
||
|
||
|
||
{{ if $event.IsRecovered }}
|
||
{{- if ne $event.Cate "host"}}
|
||
{{end}}
|
||
**恢复时间:** {{timeformat $event.LastEvalTime}}
|
||
**告警描述:** **服务已恢复**
|
||
{{- else }}
|
||
{{- if ne $event.Cate "host"}}
|
||
{{end}}
|
||
**触发时间:** {{timeformat $event.TriggerTime}}
|
||
**发送时间:** {{timestamp}}
|
||
**触发时值:** {{$event.TriggerValue}}
|
||
{{if $event.RuleNote }}**对应服务:** **{{$event.RuleNote}}**{{end}}
|
||
{{- end -}}
|
||
[日志详情]({{$event.Bshboardurl}})
|
||
|
||
|
||
|
||
|
||
|
||
|
||
{{- $startTime := .TriggerTime | mul 1000 }}
|
||
{{- $endTime := .LastEvalTime | mul 1000 }}
|
||
{{- $indexName := .TagsMap.indexname }}
|
||
{{- $fieldName := .TagsMap.fieldname }}
|
||
{{- $query := .TagsMap.query }}
|
||
|
||
{{ if .IsRecovered }}
|
||
{{- if ne .Cate "host"}}
|
||
{{end}}
|
||
**恢复时间:** {{timeformat .LastEvalTime}}
|
||
**告警描述:** **服务已恢复**
|
||
{{- else }}
|
||
{{- if ne .Cate "host"}}
|
||
{{end}}
|
||
**触发时间:** {{timeformat .TriggerTime}}
|
||
**发送时间:** {{timestamp}}
|
||
**触发时值:** {{.TriggerValue}}
|
||
{{if .RuleNote }}**告警服务:** **{{.RuleNote}}**{{end}}
|
||
{{- end -}}
|
||
|
||
{{- $domain := "http://192.168.60.21:5601" }}
|
||
{{- $kibanaQuery := printf "%s:error AND @timestamp:[%d TO %d]" $fieldName $startTime $endTime }}
|
||
{{- $fromTime := timeformat .TriggerTime "2006-01-02T15:04:05.000Z" }}
|
||
{{- $toTime := timeformat .LastEvalTime "2006-01-02T15:04:05.000Z" }}
|
||
{{- $kibanaLink := printf "%s/app/discover#/?_a=(index:'%s',query:(language:kuery,query:'%s'))&_g=(time:(from:'%s',to:'%s'))" $domain $indexName $kibanaQuery $fromTime $toTime }}
|
||
|
||
[日志详情]({{$kibanaLink}})
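To check the link format outside the alert template, the same Discover URL can be assembled in Python. This is a sketch under assumptions: the `_a`/`_g` layout is copied from the template above, the helper names are hypothetical, and whether a given Kibana version accepts an index pattern name in `index` (rather than a `dataViewId`) has not been verified here.

```python
from datetime import datetime, timezone
from urllib.parse import quote

# Characters used literally by the URL state syntax; left unescaped.
_SAFE = "()':,!*@"


def _iso_utc(epoch_seconds):
    """Format epoch seconds as an ISO 8601 UTC timestamp (milliseconds fixed to 000)."""
    dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.000Z")


def kibana_discover_url(base, index, query, start_epoch, end_epoch):
    """Build a Discover link for `query` on `index` between two epoch-second times.

    Assumes `query` contains no single quote, which would end the quoted value.
    """
    app_state = f"(index:'{index}',query:(language:kuery,query:'{query}'))"
    global_state = (
        f"(time:(from:'{_iso_utc(start_epoch)}',to:'{_iso_utc(end_epoch)}'))"
    )
    return (
        f"{base.rstrip('/')}/app/discover#/"
        f"?_a={quote(app_state, safe=_SAFE)}&_g={quote(global_state, safe=_SAFE)}"
    )


if __name__ == "__main__":
    print(kibana_discover_url(
        "http://192.168.60.21:5601/",
        "pord01-flymoonlog-pord01-flymoon-task-2025.*",
        "parsed_sys_info.log_level:error",
        1736939280,
        1736939525,
    ))
```

Stripping the trailing slash from the base URL avoids the `//app/discover` double slash that appears when the base already ends in `/`.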
|
||
|
||
|
||
|
||
[事件详情]({{$domain}}/alert-his-events/{{.Id}})|[屏蔽1小时]({{$domain}}/alert-mutes/add?busiGroup={{.GroupId}}&cate={{.Cate}}&datasource_ids={{.DatasourceId}}&prod={{.RuleProd}}{{range $key, $value := .TagsMap}}&tags={{$key}}%3D{{$value}}{{end}})|[查看曲线]({{$domain}}/metric/explorer?data_source_id={{.DatasourceId}}&data_source_name=prometheus&mode=graph&prom_ql={{.PromQl|escape}})
|
||
|
||
|
||
http://192.168.1.7:17000/alert-his-events/
|
||
|
||
|
||
|
||
Apply for a personal Zoho mailbox: https://www.zoho.com.cn/mail/
Registration requires a mobile number to receive a verification code.

Gmail: https://workspace.google.com/business/signup/newbusiness?xsell=google_accounts&back=https://accounts.google.com/SignUp?ec=asw-gmail-hero-create2&biz=true&continue=https://mail.google.com/mail/&flowEntry=SignUp&flowName=GlifWebSignIn&service=mail&theme=glif&ec=asw-gmail-hero-create2&source=gafb-gmail-hero-zh-CN&hl=zh-CN&ga_region=japac&ga_country=zh-CN&ga_lang=zh-CN
After registering, bind a mobile number and a recovery email address yourself so that the account can be recovered if access is lost.
Enabling two-step verification requires adding a mobile number that receives the verification codes.
|
||
|
||
|
||
mylidamin@gmail.com aAggyxmm
|
||
|
||
|
||
docker run -d --name my-nginx -p 9200:9200 -v /data/my-nginx/conf:/etc/nginx/conf.d nginx
|
||
|
||
|
||
curl -u admin:123456 http://106.53.194.199:9200/
|
||
|
||
|
||
|
||
docker run -d \
|
||
--name elastalert2 \
|
||
--network=1panel-network \
|
||
-v /data/elastalert2/config/config.yaml:/opt/elastalert/config.yaml \
|
||
-v /data/elastalert2/rules:/opt/elastalert/rules \
|
||
-v /data/elastalert2/data:/opt/elastalert/data \
|
||
-e "ES_USERNAME=admin" \
|
||
-e "ES_PASSWORD=123456" \
|
||
jertel/elastalert2:latest
|
||
|
||
|
||
|
||
|
||
docker run -d \
|
||
--name elastalert2 \
|
||
--network=1panel-network \
|
||
-v /data/elastalert2/config/config.yaml:/opt/elastalert/config.yaml \
|
||
-v /data/elastalert2/rules:/opt/elastalert/rules \
|
||
-v /data/elastalert2/data:/opt/elastalert/data \
|
||
-v /data/elastalert2/config/smtp_auth.yaml:/opt/elastalert/smtp_auth.yaml \
|
||
-e "ES_USERNAME=admin" \
|
||
-e "ES_PASSWORD=123456" \
|
||
jertel/elastalert2:latest --verbose
|
||
|
||
|
||
|
||
docker run -d \
|
||
--name elastalert2-1 \
|
||
--network=1panel-network \
|
||
-v /data/elastalert2/config/config.yaml:/opt/elastalert/config.yaml \
|
||
-v /data/elastalert2/rules:/opt/elastalert/rules \
|
||
-v /data/elastalert2/data:/opt/elastalert/data \
|
||
-v /data/elastalert2/elastalert_modules:/opt/elastalert/elastalert_modules \
|
||
-e "ES_USERNAME=admin" \
|
||
-e "ES_PASSWORD=123456" \
|
||
jertel/elastalert2:latest --verbose
|
||
|
||
docker run -d \
|
||
--name elastalert2 \
|
||
--network=1panel-network \
|
||
-v /opt/elastalert2/config/config.yaml:/opt/elastalert2/config.yaml \
|
||
-v /opt/elastalert2/rules:/opt/elastalert2/rules \
|
||
-e "TZ=Asia/Shanghai" \
|
||
-e "ES_USERNAME=admin" \
|
||
-e "ES_PASSWORD=123456" \
|
||
jertel/elastalert2:latest
|
||
|
||
docker run -d --name elastalert --restart=always \
|
||
-v /data/feishu-alert/config.yaml:/opt/elastalert/config.yaml \
|
||
-v /data/feishu-alert/rules:/opt/elastalert/rules \
|
||
-v /etc/localtime:/etc/localtime \
|
||
dengchuanfu/feishualert:v0.1 --verbose
|
||
|
||
|
||
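I currently use Filebeat to ship log files into Elasticsearch and need you to recommend an alerting tool. It must be free; Kibana's built-in alerting is out because it requires a paid subscription.
Alert rules should be configurable through a web UI. Log lines at level error should be sent to a Feishu bot group. Several Filebeat instances are collecting, and each project's logs produce their own data stream in Elasticsearch. Walk me through the configuration step by step, and ask me for any information you need, such as project names, data stream names, the field that identifies an error line, the match condition, and the Feishu bot URL. The setup should be tuned, for example with alert aggregation.
The stack traces attached to the log lines can be quite long, so how should the alert display be optimized? I want the message sent to the Feishu bot to be a Feishu card whose title is the project name and whose body shows: 1. the trigger time, 2. the number of error lines at trigger time, 3. the content of the error lines (long stack traces need handling here), 4. a link that jumps straight to Kibana.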
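The card layout requested above can be sketched as a plain Python payload builder. The JSON shape (`msg_type: interactive`, `header`, markdown `elements`) follows the card example elsewhere in these notes. The function names, the English labels, and the truncation limits of 5 lines and 300 characters are assumptions chosen for illustration. Nothing is sent over the network here.

```python
import json


def truncate(text, max_lines=5, max_chars=300):
    """Keep at most `max_lines` lines and `max_chars` characters of a stack trace."""
    lines = text.strip().splitlines()
    kept = "\n".join(lines[:max_lines])
    clipped = len(lines) > max_lines
    if len(kept) > max_chars:
        kept = kept[:max_chars]
        clipped = True
    return kept + ("\n..." if clipped else "")


def build_card(project, trigger_time, error_count, message, kibana_url):
    """Build a Feishu interactive-card payload: title = project, body = 4 fields."""
    body = (
        f"**Trigger time:** {trigger_time}\n"
        f"**Error lines:** {error_count}\n"
        f"**Error content:**\n{truncate(message)}\n"
        f"[Open in Kibana]({kibana_url})"
    )
    return {
        "msg_type": "interactive",
        "card": {
            "header": {
                "title": {"tag": "plain_text", "content": project},
                "template": "red",
            },
            "elements": [{"tag": "markdown", "content": body}],
        },
    }


if __name__ == "__main__":
    trace = "\n".join(f"at com.example.Frame{i}" for i in range(40))  # made-up trace
    card = build_card("pord-flymoon-task", "2025-01-15 19:08:00", 3,
                      "ERROR something failed\n" + trace,
                      "http://192.168.60.21:5601/app/discover")
    print(json.dumps(card, ensure_ascii=False, indent=2))
```

Capping both lines and characters keeps a single very long line from defeating the line limit, and the Kibana link carries the full detail that the card omits.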
|
||
|
||
I have tried the following without success:
1. INFINI Console: from what I found it should be able to do this, but I do not know how to configure it.
2. Didi Nightingale: it can trigger alerts, but it cannot show the content of the Elasticsearch log line directly in the message.
3. ElastAlert2: deployment is cumbersome and I never got it deployed.

Here is some information about my projects.
1. Project names and their data streams
pord-flymoon-task pord01-flymoonlog-pord01-flymoon-task-2025.*
pord-flymoon-sse pord01-flymoonlog-pord01-flymoon-sse-2025.*
pord-flymoon-partner pord01-flymoonlog-pord01-flymoon-partner-2025.*
pord-flymoon-admin pord01-flymoonlog-pord01-flymoon-admin-2025.*
pord-flymoon_crawlspider pord01-flymoonlog-pord01-flymoon_crawlspider-2025.*
pord-fly-moon-email_v2 pord01-flymoonlog-pord01-fly-moon-email_v2-2025.*
out-pord-fly-moon-email_v2 out-241-flymoonlog-pord-fly-moon-email_v2-2025.*
2. The field that marks an error line when filtering in Kibana:
pord-fly-moon-email_v2 and out-pord-fly-moon-email_v2 use message:error;
all other projects use parsed_sys_info.log_level:error
3. Bot URL: https://open.feishu.cn/open-apis/bot/v2/hook/7525a700-f27e-4b3d-88eb-7ceaff9e44e4
4. Kibana base URL
http://192.168.60.21:5601/
with account admin, password 123456
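Since the seven projects differ only in index pattern and error query, the per-project rules could be generated rather than written by hand. The sketch below builds one rule dictionary per project, using the rule keys that appear in the ElastAlert rule file in these notes (`type: frequency`, `num_events`, `timeframe`, `realert`, `filter`). The `-error` name suffix and the helper names are hypothetical, and the alerter settings are left out.

```python
import json

# Project name -> data stream pattern, copied from the list above.
PROJECTS = {
    "pord-flymoon-task": "pord01-flymoonlog-pord01-flymoon-task-2025.*",
    "pord-flymoon-sse": "pord01-flymoonlog-pord01-flymoon-sse-2025.*",
    "pord-flymoon-partner": "pord01-flymoonlog-pord01-flymoon-partner-2025.*",
    "pord-flymoon-admin": "pord01-flymoonlog-pord01-flymoon-admin-2025.*",
    "pord-flymoon_crawlspider": "pord01-flymoonlog-pord01-flymoon_crawlspider-2025.*",
    "pord-fly-moon-email_v2": "pord01-flymoonlog-pord01-fly-moon-email_v2-2025.*",
    "out-pord-fly-moon-email_v2": "out-241-flymoonlog-pord-fly-moon-email_v2-2025.*",
}

# The two email projects are filtered on the message field instead of log_level.
MESSAGE_FIELD_PROJECTS = {"pord-fly-moon-email_v2", "out-pord-fly-moon-email_v2"}


def error_query(project):
    """Return the query string that selects error lines for a project."""
    if project in MESSAGE_FIELD_PROJECTS:
        return "message:error"
    return "parsed_sys_info.log_level:error"


def build_rule(project, index):
    """Build one frequency rule as a plain dict (alerter settings omitted)."""
    return {
        "name": f"{project}-error",  # hypothetical naming scheme
        "type": "frequency",
        "index": index,
        "num_events": 1,
        "timeframe": {"minutes": 1},
        "realert": {"minutes": 1},
        "filter": [{"query": {"query_string": {"query": error_query(project)}}}],
    }


RULES = [build_rule(project, index) for project, index in PROJECTS.items()]

if __name__ == "__main__":
    for rule in RULES:
        print(json.dumps(rule, ensure_ascii=False))
```

Each dictionary would still need to be written out as its own rule file, with the alert section added, before ElastAlert could load it.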
|
||
|
||
|
||
xpack.encryptedSavedObjects.encryptionKey: "414d80b26291f9a0017f0a3ff591f22a969bec72f48f66de579ab0dadd1131c4"
|
||
|
||
|
||
|
||
mkdir -p /data/feishu-alert && touch /data/feishu-alert/config.yaml
|
||
|
||
mkdir /data/feishu-alert/rules
|
||
|
||
vim /data/feishu-alert/rules/alert_error.yaml
|
||
|
||
|
||
|
||
|
||
|
||
rules_folder: /opt/elastalert/rules  # directory that holds the alert rule files

run_every:
  minutes: 1  # how often ElastAlert queries Elasticsearch; the unit can range from weeks down to seconds

buffer_time:
  minutes: 1  # ElastAlert buffers results from the most recent period in case some log sources are not real-time

es_host: 192.168.1.7  # Elasticsearch host

es_port: 9200  # Elasticsearch port

writeback_index: elastalert_status  # index on es_host used for metadata storage; it may be an unmapped index, but setting up a mapping is recommended

alert_time_limit:
  days: 2  # if an alert fails for some reason, ElastAlert retries until this period has elapsed
|
||
|
||
|
||
|
||
|
||
# the rule name must be unique
name: alert_error

# required; see the ElastAlert documentation for the available rule types
type: frequency

# index to query; wildcards are supported
index: easyspeed-cloud-logs-*

use_strftime_index: true

# number of matching events that triggers an alert
num_events: 1

# works together with num_events: alert when 1 event occurs within 1 minute
timeframe:
  minutes: 1

# minimum time between two alerts of the same rule; alerts raised within this window are dropped. Defaults to one minute.
realert:
  minutes: 1

# Elasticsearch query used to match the rule; AND, OR and so on are supported.
# Here, per business requirements: level is ERROR and type is production, excluding messages that contain the keywords 开始执行批量 or 更新轨迹维护.
# For a simple match, query: "level: ERROR" alerts whenever level is ERROR.
filter:
- query:
    query_string:
      query: "NOT message: (开始执行批量 OR 更新轨迹维护) AND level: ERROR AND type: production"

# only the fields that are needed https://elastalert.readthedocs.io/en/latest/ruletypes.html#include
include: ["spanId", "level", "@timestamp", "_index", "source", "traceId","host","message"]

# Feishu alerter
alert:
- "elastalert_modules.feishu_alert.FeishuAlert"

# Feishu bot API URL
feishualert_url: "https://open.feishu.cn/open-apis/bot/v2/hook/"

# Feishu bot id
feishualert_botid:
  "xxx-xxx-xxx"

# alert title
feishualert_title:
  "业务日志ERROR异常"

# no alerts between 00:00 and 08:00
feishualert_skip:
  start: "00:00:00"
  end: "08:00:00"

# alert body; {} placeholders are filled from the matched document
feishualert_body:
  "
  【告警主题】: {feishualert_title}\n
  【告警时间】: {feishualert_time}\n
  【告警环境】: 【production】\n
  【告警模块】: {source}\n
  【业务索引】: {_index}\n
  【时间戳】: {@timestamp}\n
  【日志级别】: {level}\n
  【spanId】: {spanId}\n
  【traceId】: {traceId}\n
  【host】: {host}\n
  【message】: {message}
  "
|
||
|
||
|
||
|
||
|
||
index: pord01-flymoonlog-pord01-flymoon-task-2025.*
use_strftime_index: true
num_events: 1
timeframe:
  minutes: 1
realert:
  minutes: 1
filter:
- query:
    query_string:
      query: "parsed_sys_info.log_level:error"
include: ["spanId", "level", "@timestamp", "_index", "source", "traceId","host","message"]
|
||
|
||
|
||
|
||
|
||
|
||
(venv) -bash-4.2# ps axjf|grep check_account.py
|
||
15294 8347 8346 15294 pts/5 8346 S+ 0 0:00 \_ grep --color=auto check_account.py
|
||
1 7961 7750 7750 ? -1 S 0 0:00 bash -c source /data/webapps/prod_check_tiktok_account/venv/bin/activate && nohup python /data/webapps/prod_check_tiktok_account/check_account.py > /data/webapps/prod_check_tiktok_account/output.log 2>&1 & echo $! > /data/webapps/prod_check_tiktok_account/check_account.pid // 保存 PID disown // 让?程彻底脱离 echo "? 生产环境?程已启动,PID: $(cat /data/webapps/prod_check_tiktok_account/check_account.pid)"
|
||
7961 7964 7750 7750 ? -1 Sl 0 0:02 \_ python /data/webapps/prod_check_tiktok_account/check_account.py
|
||
|
||
|
||
sit_fly_moon_web|sit_jenniefy_web|sit_scalelink_frontend|test_fly_moon_web|test_deeplink_merchant|test_deeplink_merchant_foreign|test_deeplink_merchant_saas|test_jenniefy_web|test_scalelink_frontend|oversea_test_jenniefy_web
|
||
|
||
deeplink_merchant|deeplink_merchant_saas|fly_moon_web|scalelink_frontend|oversea_jenniefy_web
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
{{ if $event.IsRecovered }}
|
||
{{- if ne $event.Cate "host"}}
|
||
{{end}}
|
||
**恢复时间:** {{timeformat $event.LastEvalTime}}
|
||
**告警描述:** **已恢复**
|
||
**当前进程内存占用:**{{xxxx}}
|
||
**当前系统剩余内存:** {{xxxx}}
|
||
{{- else }}
|
||
{{- if ne $event.Cate "host"}}
|
||
{{end}}
|
||
**触发时间:** {{timeformat $event.TriggerTime}}
|
||
**当前进程内存占用:**{{xxxx}}
|
||
**当前系统剩余内存:** {{xxxx}}
|
||
{{if $event.RuleNote }}**详细信息:** **{{$event.RuleNote}}**{{end}}
|
||
{{- end -}}
|
||
|
||
{{$domain := "http://192.168.60.21:5601" }}
|
||
[近1小时日志详情]({{$domain}}{{$labels.url}})
|
||
|
||
|
||
|
||
|
||
process_resident_memory_bytes{instance="prod01-server", job="prod01-server-process-exporter"}
|
||
|
||
process_resident_memory_bytes{job='process-exporter',instance=~'$host',name=~'$process_name'}" | first | value | humanize1024
|
||
|
||
process_resident_memory_bytes{job='prod01-server-process-exporter',instance=~'prod01-server',name=~'prod-flymoon_email_prod'}" | first | value | humanize1024
|
||
|
||
|
||
|
||
|
||
pipeline {
|
||
agent any
|
||
|
||
environment {
|
||
REMOTE_HOST = "43.130.56.138" // remote server {params.REMOTE_HOST}
REMOTE_PROJECT_PATH = "/data/webapps/lessie_sourcing_agents" // remote Python project path
VENV_DIR = "/data/webapps/lessie_sourcing_agents/venv" // remote virtualenv directory
CONDA_PATH = "/root/miniconda3/bin/conda" // change to the actual Conda install path
|
||
}
|
||
|
||
stages {
|
||
stage('Checkout 代码') {
|
||
steps {
|
||
git branch: "${params.Code_branch}", credentialsId: 'fly_gitlab_auth', url: 'http://172.24.16.20/python/lessie-sourcing-agents.git'
|
||
}
|
||
}
|
||
|
||
stage('进程下线') {
|
||
steps {
|
||
echo("下线")
|
||
sh "ssh ${REMOTE_HOST} 'sh /data/sh/kill_lessie_sourcing_agents.sh'"
|
||
}
|
||
}
|
||
|
||
stage('工程同步') {
|
||
steps {
|
||
sh """
|
||
ssh ${REMOTE_HOST} 'mkdir -p ${REMOTE_PROJECT_PATH}'
|
||
rsync -avz --exclude 'venv' ${WORKSPACE}/ ${REMOTE_HOST}:${REMOTE_PROJECT_PATH}/
|
||
"""
|
||
}
|
||
}
|
||
|
||
|
||
stage('安装依赖') {
|
||
steps {
|
||
sh """
|
||
ssh ${REMOTE_HOST} '
|
||
cd ${REMOTE_PROJECT_PATH} &&
|
||
source ~/.bashrc &&
|
||
conda activate search &&
|
||
source ${VENV_DIR}/bin/activate &&
|
||
pip install --upgrade pip &&
|
||
pip install -r requirements.txt
|
||
'
|
||
"""
|
||
}
|
||
}
|
||
|
||
stage('工程启动') {
|
||
steps {
|
||
echo("启动")
|
||
sh """
|
||
ssh ${REMOTE_HOST} '
|
||
source ~/.bashrc && conda activate search
|
||
source ${VENV_DIR}/bin/activate
|
||
TIMESTAMP=\$(date +"%Y%m%d_%H%M%S")
|
||
nohup python /data/webapps/lessie_sourcing_agents/server.py > /data/webapps/lessie_sourcing_agents/logs/lessie_sourcing_agents_\${TIMESTAMP}.log 2>&1 &
|
||
'
|
||
"""
|
||
}
|
||
}
|
||
}
|
||
|
||
post {
|
||
success {
|
||
echo '✅ 部署成功!'
|
||
}
|
||
failure {
|
||
echo '❌ 部署失败,请检查日志!'
|
||
}
|
||
}
|
||
}
|
||
|
||
|
||
|
||
|
||
|
||
|
||
{{/* 通用头部信息:固定展示告警类型、关联机器与规则信息 */}}
|
||
**告警类型:** {{if $event.IsRecovered}}内存告警-恢复通知{{else}}内存告警-触发通知{{end}}
|
||
**关联机器:** {{$event.Attributes.host | default "未知主机"}}
|
||
**告警规则:** {{$event.RuleName}}
|
||
{{if $event.RuleId}}**规则ID:** {{$event.RuleId}}{{end}}
|
||
|
||
{{/* 恢复场景逻辑:展示恢复时间与恢复时内存大小 */}}
|
||
{{ if $event.IsRecovered }}
|
||
{{- if ne $event.Cate "host"}}{{end}}
|
||
**恢复时间:** {{timeformat $event.LastEvalTime "2006-01-02 15:04:05"}}
|
||
**恢复时内存使用:** {{$event.TriggerValue | humanizeSize}}
|
||
**告警描述:** 机器内存使用率已降至阈值以下,告警恢复
|
||
{{if $event.RuleNote }}**规则说明:** {{$event.RuleNote}}{{end}}
|
||
|
||
{{/* 触发场景逻辑:展示触发时间、发送时间与触发时内存大小 */}}
|
||
{{- else }}
|
||
{{- if ne $event.Cate "host"}}{{end}}
|
||
**触发时间:** {{timeformat $event.TriggerTime "2006-01-02 15:04:05"}}
|
||
**发送时间:** {{timeformat $timestamp "2006-01-02 15:04:05"}}
|
||
**触发时内存使用:** {{$event.TriggerValue | humanizeSize}}
|
||
**告警描述:** 机器内存使用率超过设定阈值,触发告警
|
||
{{if $event.RuleNote }}**规则说明:** {{$event.RuleNote}}{{end}}
|
||
{{if $event.Threshold}}**告警阈值:** {{$event.Threshold | humanizeSize}}{{end}}
|
||
{{- end -}}
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
Analyze this memory change: (base) [root@prod-lessie-server02 ~]#free -h
|
||
total used free shared buff/cache available
|
||
Mem: 15Gi 15Gi 133Mi 1.0Mi 237Mi 98Mi
|
||
Swap: 8.0Gi 8.0Gi 0.0Ki
|
||
(base) [root@prod-lessie-server02 ~]#ps aux|grep 7001
|
||
root 158009 0.0 0.0 35908 10420 ? S Oct21 0:12 /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
|
||
root 158011 0.8 90.3 22549000 14317120 ? Sl Oct21 24:39 /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
|
||
root 158012 0.3 0.7 4111768 117868 ? Sl Oct21 9:26 /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
|
||
root 158013 0.4 1.5 5852256 243708 ? Sl Oct21 12:08 /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
|
||
root 158014 0.5 1.6 6056408 260252 ? Sl Oct21 13:51 /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
|
||
root 158160 0.0 0.2 2404712 39116 ? Sl Oct21 0:06 /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
|
||
root 158186 0.0 0.7 2404684 119708 ? Sl Oct21 0:06 /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
|
||
root 158218 0.0 0.1 2404716 31388 ? Sl Oct21 0:06 /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
|
||
root 158253 0.0 0.2 2406028 35400 ? Sl Oct21 0:06 /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
|
||
root 1028724 0.0 0.0 6408 384 pts/0 D+ 18:37 0:00 grep --color=auto 7001
|
||
(base) [root@prod-lessie-server02 ~]#ps aux|grep 7001
|
||
root 158009 0.0 0.0 35908 11316 ? S Oct21 0:12 /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
|
||
root 158012 0.3 0.7 4111768 119916 ? Sl Oct21 9:27 /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
|
||
root 158013 0.4 1.5 5852256 246652 ? Sl Oct21 12:11 /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
|
||
root 158014 0.5 1.6 6056408 261532 ? Sl Oct21 13:53 /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
|
||
root 158160 0.0 0.2 2404712 39244 ? Sl Oct21 0:06 /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
|
||
root 158186 0.0 0.7 2404684 119324 ? Sl Oct21 0:06 /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
|
||
root 158218 0.0 0.2 2404716 31772 ? Sl Oct21 0:06 /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
|
||
root 158253 0.0 0.2 2406028 35400 ? Sl Oct21 0:06 /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
|
||
root 1028770 0.0 0.0 6408 2176 pts/0 S+ 18:39 0:00 grep --color=auto 7001
|
||
(base) [root@prod-lessie-server02 ~]#free -h
|
||
total used free shared buff/cache available
|
||
Mem: 15Gi 2.0Gi 12Gi 1.0Mi 961Mi 13Gi
|
||
Swap: 8.0Gi 3.9Gi 4.1Gi
|
||
(base) [root@prod-lessie-server02 ~]#
|
||
(base) [root@prod-lessie-server02 ~]#ps auxf|grep 7001
|
||
root 1029136 0.0 0.0 6408 2304 pts/0 S+ 18:39 0:00 | | \_ grep --color=auto 7001
|
||
root 158009 0.0 0.0 35908 14004 ? S Oct21 0:12 /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
|
||
root 158012 0.3 0.7 4111768 121836 ? Sl Oct21 9:27 \_ /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
|
||
root 158160 0.0 0.2 2404712 39244 ? Sl Oct21 0:06 | \_ /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
|
||
root 158013 0.4 1.5 5852256 248700 ? Sl Oct21 12:11 \_ /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
|
||
root 158218 0.0 0.2 2404716 31772 ? Sl Oct21 0:06 | \_ /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
|
||
root 158014 0.5 1.6 6056408 264348 ? Sl Oct21 13:53 \_ /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
|
||
root 158253 0.0 0.2 2406028 35400 ? Sl Oct21 0:06 | \_ /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
|
||
root 1028807 50.5 3.4 3214636 553780 ? Sl 18:39 0:08 \_ /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
|
||
root 1029008 0.0 2.9 2408940 466648 ? Sl 18:39 0:00 \_ /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
|
||
root 158186 0.0 0.7 2404652 120604 ? Sl Oct21 0:06 /data/webapps/prod_lessie_sourcing_agents/venv/bin/python /data/webapps/prod_lessie_sourcing_agents/venv/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7001 --timeout 300 dialogue.app:app --max-requests 500 --max-requests-jitter 50
|
||
(base) [root@prod-lessie-server02 ~]#
|
||
(base) [root@prod-lessie-server02 ~]#free -h
|
||
total used free shared buff/cache available
|
||
Mem: 15Gi 2.1Gi 12Gi 1.0Mi 1.0Gi 13Gi
|
||
Swap: 8.0Gi 3.9Gi 4.1Gi
Looking at the worker processes, what caused memory to go from heavily used back to normal?
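One way to approach the question is to diff the `ps aux` snapshots by PID. In the listings above, PID 158011 (RSS 14317120 KiB, 90.3 %MEM) is present in the first snapshot and absent from the later ones, while new PIDs 1028807 and 1029008 appear. That is consistent with that one worker exiting and being replaced, although the output does not show why it exited. The sketch below automates the comparison. The function names are hypothetical and the sample lines shorten the command column for brevity.

```python
def parse_ps_aux(text):
    """Return {pid: rss_kib} for each process line of `ps aux` output."""
    procs = {}
    for line in text.splitlines():
        # USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        parts = line.split(None, 10)
        if len(parts) < 11 or not parts[1].isdigit():
            continue  # skips the header line and anything malformed
        procs[int(parts[1])] = int(parts[5])
    return procs


def vanished(before, after):
    """PIDs present in `before` but gone in `after`, largest RSS first."""
    gone = {pid: rss for pid, rss in before.items() if pid not in after}
    return sorted(gone.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Values taken from the snapshots above; command column shortened.
    first = """\
root 158011 0.8 90.3 22549000 14317120 ? Sl Oct21 24:39 python gunicorn
root 158012 0.3 0.7 4111768 117868 ? Sl Oct21 9:26 python gunicorn
"""
    second = """\
root 158012 0.3 0.7 4111768 119916 ? Sl Oct21 9:27 python gunicorn
root 1028807 50.5 3.4 3214636 553780 ? Sl 18:39 0:08 python gunicorn
"""
    for pid, rss_kib in vanished(parse_ps_aux(first), parse_ps_aux(second)):
        print(f"PID {pid} gone, freed about {rss_kib / 1024 / 1024:.1f} GiB RSS")
```

Run against the full listings, the short-lived `grep` processes would also show up as vanished, so filtering on the command column may be needed. Kernel logs (for example an OOM-killer entry) or the gunicorn log would be required to tell a kill from a normal `--max-requests` recycle.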