Initial commit
728
1.yml
Normal file
@@ -0,0 +1,728 @@
|
||||
Three open questions about deploying to the overseas machine 43.159.145.241:

I. flymoon-jenniefy.jar:

1. flymoon-jenniefy.jar needs Redis at runtime, so does Redis also have to be installed on 43.159.145.241?
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:6379

2. flymoon-jenniefy.jar needs MySQL at runtime. Should it connect to the Tencent Cloud production database? If so: on the sit branch of the flymoon-jenniefy repository, both Redis and MySQL point at the sit host itself, so should the deployment on 43.159.145.241 use the prod branch instead?

3. How is port 8070 used?
location ^~ /prod-api {
    proxy_pass http://127.0.0.1:8070;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}

II. Python deployment: should it simply copy last night's sit deployment?

III. jenniefy_web UI: the domain name needs to be confirmed (is the site https://www.jennie.deal or something else?), along with the certificate files for that domain. Whether access from mainland China can be blocked depends on whether the DNS hosting platform supports such a restriction.
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
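Nightingale (Didi's open-source monitoring system) is alerting with Elasticsearch as its data source. My ES stores log files collected from other machines. Whenever a collected log line contains an error, I want an alert sent to a Feishu bot as a Feishu card that shows the content of the offending error line, while keeping the card short enough that it does not flood the chat. How can I do this?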
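One way to keep the card short is to trim each log message before it goes into the card. The following is a minimal sketch, not something Nightingale provides: the function name `summarize_log_line` and the limits `max_lines` and `max_chars` are assumptions chosen for illustration.

```python
def summarize_log_line(message: str, max_lines: int = 5, max_chars: int = 300) -> str:
    """Return a shortened copy of a log message for display in a chat card.

    Keeps at most `max_lines` lines (the error line plus the top of the
    stack trace) and at most `max_chars` characters, and marks the cut.
    """
    lines = message.strip().splitlines()
    summary = "\n".join(lines[:max_lines])
    truncated = len(lines) > max_lines
    if len(summary) > max_chars:
        summary = summary[:max_chars].rstrip()
        truncated = True
    if truncated:
        summary += "\n... (truncated)"
    return summary


if __name__ == "__main__":
    # Hypothetical log line with a long stack trace.
    trace = "ERROR boom\n" + "\n".join(f"  at frame{i}" for i in range(50))
    print(summarize_log_line(trace, max_lines=3))
```

Limiting both the line count and the character count covers the two common cases: a long stack trace and a single very long line.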
|
||||
|
||||
|
||||
|
||||
|
||||
{{ if .IsRecovered }}
|
||||
{{- if ne .Cate "host"}}
|
||||
**告警集群:** {{.Cluster}}{{end}}
|
||||
**级别状态:** S{{.Severity}} Recovered
|
||||
**告警名称:** {{.RuleName}}
|
||||
**恢复时间:** {{timeformat .LastEvalTime}}
|
||||
**告警描述:** **服务已恢复**
|
||||
{{- else }}
|
||||
{{- if ne .Cate "host"}}
|
||||
**告警集群:** {{.Cluster}}{{end}}
|
||||
**级别状态:** S{{.Severity}} Triggered
|
||||
**告警名称:** {{.RuleName}}
|
||||
**触发时间:** {{timeformat .TriggerTime}}
|
||||
**发送时间:** {{timestamp}}
|
||||
**触发时值:** {{.TriggerValue}}
|
||||
{{if .RuleNote }}**告警描述:** **{{.RuleNote}}**{{end}}
|
||||
{{- end -}}
|
||||
{{$domain := "http://请联系管理员修改通知模板将域名替换为实际的域名" }}
|
||||
[事件详情]({{$domain}}/alert-his-events/{{.Id}})|[屏蔽1小时]({{$domain}}/alert-mutes/add?busiGroup={{.GroupId}}&cate={{.Cate}}&datasource_ids={{.DatasourceId}}&prod={{.RuleProd}}{{range $key, $value := .TagsMap}}&tags={{$key}}%3D{{$value}}{{end}})|[查看曲线]({{$domain}}/metric/explorer?data_source_id={{.DatasourceId}}&data_source_name=prometheus&mode=graph&prom_ql={{.PromQl|escape}})
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
{{ if .IsRecovered }}
|
||||
{{- if ne .Cate "host"}}
|
||||
**告警集群:** {{.Cluster}}{{end}}
|
||||
**级别状态:** S{{.Severity}} Recovered
|
||||
**告警名称:** {{.RuleName}}
|
||||
**恢复时间:** {{timeformat .LastEvalTime}}
|
||||
**告警描述:** **服务已恢复**
|
||||
{{- else }}
|
||||
{{- if ne .Cate "host"}}
|
||||
**告警集群:** {{.Cluster}}{{end}}
|
||||
**级别状态:** S{{.Severity}} Triggered
|
||||
**告警名称:** {{.RuleName}}
|
||||
**触发时间:** {{timeformat .TriggerTime}}
|
||||
**发送时间:** {{timestamp}}
|
||||
**触发时值:** {{.TriggerValue}}
|
||||
{{if .RuleNote }}**告警描述:** **{{.RuleNote}}**{{end}}
|
||||
**错误日志摘要:** {{.Message | slice 0 100 | default "无错误信息"}}
|
||||
{{- end -}}
|
||||
{{$domain := "http://192.168.1.7:17000" }}
|
||||
[日志详情]({{$domain}}/alert-his-events/{{.Id}})
|
||||
|
||||
|
||||
|
||||
http://192.168.60.21/app/discover#/?_g=(time:(from:'<start_time>',to:'<end_time>'))&_a=(columns:!(_source),index:'<index_name>',query:(query_string:(query:'message:error')))
|
||||
|
||||
|
||||
(http://192.168.60.21/app/discover#/?_g=(time:(from:'{{timeformat .TriggerTime}}',to:'{{timestamp}}'))&_a=(columns:!(_source),index:'{{ $indexname}}',query:(query_string:(query:'message:error'))))
|
||||
|
||||
[查看 Kibana 日志 (message:error)]({{ $kibanaHost }}/app/discover#/?_g=(time:(from:'{{ $startTime }}',to:'{{ $endTime }}'))&_a=(columns:!(_source),index:'{{ $indexname }}',query:(query_string:(query:'message:error'))))
|
||||
|
||||
|
||||
{{- $startTime := .TriggerTime | mul 1000 }}
|
||||
{{- $endTime := .LastEvalTime | mul 1000 }}
|
||||
{{- $kibanaHost := "http://192.168.60.21:5601" }}
|
||||
{{ if .IsRecovered }}
|
||||
{{- if ne .Cate "host"}}
|
||||
**告警集群:** {{.Cluster}}{{end}}
|
||||
**级别状态:** S{{.Severity}} Recovered
|
||||
**告警名称:** {{.RuleName}}
|
||||
**恢复时间:** {{timeformat .LastEvalTime}}
|
||||
**告警描述:** **服务已恢复**
|
||||
{{- else }}
|
||||
{{- if ne .Cate "host"}}
|
||||
**告警集群:** {{.Cluster}}{{end}}
|
||||
**级别状态:** {{.Severity}}级
|
||||
**告警名称:** {{.RuleName}}
|
||||
**触发时间:** {{timeformat .TriggerTime}}
|
||||
**发送时间:** {{timestamp}}
|
||||
**触发时值:** {{.TriggerValue}}
|
||||
{{if .RuleNote }}**告警描述:** **{{.RuleNote}}**{{end}}
|
||||
{{- end -}}
|
||||
{{$domain := "http://192.168.1.7:17000" }}
|
||||
[事件详情]({{$domain}}/alert-his-events/{{.Id}})
|
||||
|
||||
[查看 Kibana 日志 (message:error)]({{ $kibanaHost }}/app/discover#/?_g=(time:(from:'{{ $startTime }}',to:'{{ $endTime }}'))&_a=(columns:!(_source),index:'{{ $indexName }}',query:(query_string:(query:'message:error'))))
|
||||
|
||||
|
||||
({{ $kibanaHost }}/app/discover#/?_g=(time:(from:'{{ $startTime }}',to:'{{ $endTime }}'))&_a=(columns:!(_source),index:'{{ $indexName }}',query:(query_string:(query:'message:error'))))
|
||||
|
||||
|
||||
|
||||
|
||||
{{- end -}}
|
||||
{{- $startTime := .TriggerTime | mul 1000 }}
|
||||
{{- $endTime := .LastEvalTime | mul 1000 }}
|
||||
{{- $kibanaHost := "http://192.168.60.21:5601" }}
|
||||
{{- $indexName := .TagsMap.indexname }}
|
||||
|
||||
{{ if .IsRecovered }}
|
||||
{{- if ne .Cate "host"}}
|
||||
**告警集群:** {{.Cluster}}{{end}}
|
||||
**级别状态:** S{{.Severity}} Recovered
|
||||
**告警名称:** {{.RuleName}}
|
||||
**恢复时间:** {{timeformat .LastEvalTime}}
|
||||
**告警描述:** **服务已恢复**
|
||||
{{- else }}
|
||||
{{- if ne .Cate "host"}}
|
||||
**告警集群:** {{.Cluster}}{{end}}
|
||||
**级别状态:** {{.Severity}}级
|
||||
**告警名称:** {{.RuleName}}
|
||||
**触发时间:** {{timeformat .TriggerTime}}
|
||||
**发送时间:** {{timestamp}}
|
||||
**触发时值:** {{.TriggerValue}}
|
||||
{{if .RuleNote }}**告警描述:** **{{.RuleNote}}**{{end}}
|
||||
{{- end -}}
|
||||
[详情]({{$kibanaHost}}/alert-his-events/{{.Id}})
|
||||
|
||||
|
||||
|
||||
|
||||
http://192.168.60.21:5601/app/discover#/?_g=(time:(from:'1736939280000',to:'1736939525000'))&_a=(columns:!(),dataSource:(dataViewId:f101ce47-ebde-4f42-bbd1-dee42c68148e,type:dataView),filters:!(),interval:auto,query:(language:lucene,query:(query_string:(query:'message:info'))),sort:!(!('@timestamp',desc)))
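The Discover links above all follow the same pattern: a `_g` block with the time range and an `_a` block with the index and query. As a rough sketch of how such a link could be assembled outside the template, the helper below builds one from epoch seconds. The function names are hypothetical, the `index:`/`language:kuery` layout is an assumption that depends on the Kibana version (newer versions use `dataSource:(dataViewId:...)` as in the URL above), and ISO timestamps are used rather than epoch milliseconds.

```python
from datetime import datetime, timezone
from urllib.parse import quote


def iso_utc(epoch_seconds: int) -> str:
    """Format epoch seconds as an ISO 8601 UTC timestamp with milliseconds."""
    dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.000Z")


def rison_quote(text: str) -> str:
    """Quote a string for the Rison-style URL state: '!' escapes '!' and quotes."""
    escaped = text.replace("!", "!!").replace("'", "!'")
    return "'" + quote(escaped, safe=":*!'") + "'"


def kibana_discover_url(base: str, index: str, query: str, start: int, end: int) -> str:
    """Build a Discover link for `query` on `index` between two epoch-second times."""
    base = base.rstrip("/")  # avoid a double slash before /app/discover
    g = f"(time:(from:'{iso_utc(start)}',to:'{iso_utc(end)}'))"
    a = f"(index:{rison_quote(index)},query:(language:kuery,query:{rison_quote(query)}))"
    return f"{base}/app/discover#/?_g={g}&_a={a}"


if __name__ == "__main__":
    print(kibana_discover_url(
        "http://192.168.60.21:5601/", "my-index", "message:error",
        1736939280, 1736939525,
    ))
```

Stripping the trailing slash from the base address matters because a link starting with `http://host:5601//app/discover` may not resolve.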
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
tags
agent.*
as*
client.*
cloud.*
container.*
destination.*
dns.*
ecs.*
error.*
event.*
file.*
geo.*
group.*
hash.*
host.*
http.*
log.origin*
log.syslog*
network.*
observer*
organization.*
os.*
package.*
process.*
server.*
service.*
source.*
threat.*
trace.id*
transaction.id*
url.*
user.*
user_agent.*
kubernetes.*
process.owner.*
jolokia.*
aws*
bucket.*
object.*
fields.*
|
||||
|
||||
|
||||
|
||||
|
||||
Help me process some fields: convert them to wildcards and remove duplicates. For example:

"os.name",
"os.platform",
"os.version",
"package.architecture",
"package.checksum",
"package.description",

becomes:

os.*
package.*
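The conversion described above can be sketched in a few lines. This is an illustrative helper (the name `to_wildcards` is made up here), and it assumes that collapsing every dotted field to its top-level prefix is what is wanted; two-level patterns such as `log.origin*` would need a different rule.

```python
def to_wildcards(fields):
    """Collapse dotted field names to `<top-level>.*`, keeping first-seen order."""
    seen = set()
    patterns = []
    for name in fields:
        # Fields without a dot (for example "tags") are kept unchanged.
        pattern = name.split(".", 1)[0] + ".*" if "." in name else name
        if pattern not in seen:
            seen.add(pattern)
            patterns.append(pattern)
    return patterns


if __name__ == "__main__":
    fields = [
        "os.name", "os.platform", "os.version",
        "package.architecture", "package.checksum", "package.description",
    ]
    for pattern in to_wildcards(fields):
        print(pattern)
```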
|
||||
|
||||
|
||||
docker run -d -p 3030:3030 -p 3333:3333 \
  -v `pwd`/config/elastalert.yaml:/opt/elastalert/config.yaml \
  -v `pwd`/config/config.json:/opt/elastalert-server/config/config.json \
  -v `pwd`/rules:/opt/elastalert/rules \
  -v `pwd`/rule_templates:/opt/elastalert/rule_templates \
  --network 1panel-network \
  --name elastalert praecoapp/elastalert-server:latest

docker run -d -p 8080:8080 --network 1panel-network --name praeco \
  -e ELASTICSEARCH_HOST=http://192.168.60.21:9200 \
  -e ELASTICSEARCH_USERNAME=admin \
  -e ELASTICSEARCH_PASSWORD=123456 \
  -e ELASTALERT_HOST=http://192.168.60.21:3030 \
  johnsusek/praeco
|
||||
|
||||
echo "slack_webhook_url: 'https://open.feishu.cn/open-apis/bot/v2/hook/8bd6a15d-90f0-4f4f-a1b1-bd105f31ea06'" | sudo tee -a rules/BaseRule.config >/dev/null
|
||||
export PRAECO_ELASTICSEARCH=192.168.1.7
|
||||
|
||||
|
||||
|
||||
{
|
||||
"msg_type": "interactive",
|
||||
"card": {
|
||||
"header": {
|
||||
"title": {
|
||||
"content": "[ INFINI Platform Alerting ]",
|
||||
"tag": "plain_text"
|
||||
},
|
||||
"template":"{{if eq .priority "critical"}}red{{else if eq .priority "high"}}orange{{else if eq .priority "medium"}}yellow{{else if eq .priority "low"}}grey{{else}}blue{{end}}"
|
||||
},
|
||||
"elements": [
|
||||
{
|
||||
"tag": "markdown",
|
||||
"content": "🔥 告警事件 [#{{.event_id}}]({{$.env.INFINI_CONSOLE_ENDPOINT}}/#/alerting/message/{{.event_id}}) 正在进行中\n **{{.title}}**\n 优先级: {{.priority}}\n 事件ID: {{.event_id}}\n 目标: {{.resource_name}}-{{.objects}}\n 触发时间: {{.trigger_at | datetime}}"
|
||||
},
|
||||
{
|
||||
"tag": "hr"
|
||||
},
|
||||
{
"tag": "markdown",
"content": "**具体错误行内容**: {{ if .hits.hits.0._source.message }}{{.hits.hits.0._source.message }}{{ else }}{{.message | str_replace \"\\n\" \"\\\\n\" }}{{ end }}\n **触发 error 的时间**: {{.trigger_at | datetime }}"
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
|
||||
# 原来的----------------------------
|
||||
{{- $startTime := .TriggerTime | mul 1000 }}
|
||||
{{- $endTime := .LastEvalTime | mul 1000 }}
|
||||
{{- $indexName := .TagsMap.indexname }}
|
||||
{{- $query := .TagsMap.query }}
|
||||
|
||||
{{ if .IsRecovered }}
|
||||
{{- if ne .Cate "host"}}
|
||||
{{end}}
|
||||
**恢复时间:** {{timeformat .LastEvalTime}}
|
||||
**告警描述:** **服务已恢复**
|
||||
{{- else }}
|
||||
{{- if ne .Cate "host"}}
|
||||
{{end}}
|
||||
**触发时间:** {{timeformat .TriggerTime}}
|
||||
**发送时间:** {{timestamp}}
|
||||
**触发时值:** {{.TriggerValue}}
|
||||
{{if .RuleNote }}**告警服务:** **{{.RuleNote}}**{{end}}
|
||||
{{- end -}}
|
||||
{{$domain := "http://192.168.60.21:5601/" }}
|
||||
[日志详情]({{$domain}}{{$event.RunbookUrl}})
|
||||
# 原来的----------------------------
|
||||
|
||||
|
||||
|
||||
{{ if $event.IsRecovered }}
|
||||
{{- if ne $event.Cate "host"}}
|
||||
{{end}}
|
||||
**恢复时间:** {{timeformat $event.LastEvalTime}}
|
||||
**告警描述:** **服务已恢复**
|
||||
{{- else }}
|
||||
{{- if ne $event.Cate "host"}}
|
||||
{{end}}
|
||||
**触发时间:** {{timeformat $event.TriggerTime}}
|
||||
**发送时间:** {{timestamp}}
|
||||
**触发时值:** {{$event.TriggerValue}}
|
||||
{{if $event.RuleNote }}**对应服务:** **{{$event.RuleNote}}**{{end}}
|
||||
{{- end -}}
|
||||
[日志详情]({{$event.Bshboardurl}})
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
{{- $startTime := .TriggerTime | mul 1000 }}
|
||||
{{- $endTime := .LastEvalTime | mul 1000 }}
|
||||
{{- $indexName := .TagsMap.indexname }}
|
||||
{{- $fieldName := .TagsMap.fieldname }}
|
||||
{{- $query := .TagsMap.query }}
|
||||
|
||||
{{ if .IsRecovered }}
|
||||
{{- if ne .Cate "host"}}
|
||||
{{end}}
|
||||
**恢复时间:** {{timeformat .LastEvalTime}}
|
||||
**告警描述:** **服务已恢复**
|
||||
{{- else }}
|
||||
{{- if ne .Cate "host"}}
|
||||
{{end}}
|
||||
**触发时间:** {{timeformat .TriggerTime}}
|
||||
**发送时间:** {{timestamp}}
|
||||
**触发时值:** {{.TriggerValue}}
|
||||
{{if .RuleNote }}**告警服务:** **{{.RuleNote}}**{{end}}
|
||||
{{- end -}}
|
||||
|
||||
{{- $domain := "http://192.168.60.21:5601" }}
|
||||
{{- $kibanaQuery := printf "%s:error AND @timestamp:[%d TO %d]" $fieldName $startTime $endTime }}
|
||||
{{- $fromTime := timeformat .TriggerTime "2006-01-02T15:04:05.000Z" }}
|
||||
{{- $toTime := timeformat .LastEvalTime "2006-01-02T15:04:05.000Z" }}
|
||||
{{- $kibanaLink := printf "%s/app/discover#/?_a=(index:'%s',query:(language:kuery,query:'%s'))&_g=(time:(from:'%s',to:'%s'))" $domain $indexName $kibanaQuery $fromTime $toTime }}
|
||||
|
||||
[日志详情]({{$kibanaLink}})
|
||||
|
||||
|
||||
|
||||
[事件详情]({{$domain}}/alert-his-events/{{.Id}})|[屏蔽1小时]({{$domain}}/alert-mutes/add?busiGroup={{.GroupId}}&cate={{.Cate}}&datasource_ids={{.DatasourceId}}&prod={{.RuleProd}}{{range $key, $value := .TagsMap}}&tags={{$key}}%3D{{$value}}{{end}})|[查看曲线]({{$domain}}/metric/explorer?data_source_id={{.DatasourceId}}&data_source_name=prometheus&mode=graph&prom_ql={{.PromQl|escape}})
|
||||
|
||||
|
||||
http://192.168.1.7:17000/alert-his-events/
|
||||
|
||||
|
||||
|
||||
Apply for a personal Zoho mailbox: https://www.zoho.com.cn/mail/
A mobile phone number is required during sign-up to receive the verification code.

Google mailbox: https://workspace.google.com/business/signup/newbusiness?xsell=google_accounts&back=https://accounts.google.com/SignUp?ec=asw-gmail-hero-create2&biz=true&continue=https://mail.google.com/mail/&flowEntry=SignUp&flowName=GlifWebSignIn&service=mail&theme=glif&ec=asw-gmail-hero-create2&source=gafb-gmail-hero-zh-CN&hl=zh-CN&ga_region=japac&ga_country=zh-CN&ga_lang=zh-CN
After registering, bind a phone number and a recovery email yourself so that you can sign in again if you lose access to the account.
Enabling two-step verification requires adding a phone number that receives the verification codes.
|
||||
|
||||
|
||||
mylidamin@gmail.com aAggyxmm
|
||||
|
||||
|
||||
docker run -d --name my-nginx -p 9200:9200 -v /data/my-nginx/conf:/etc/nginx/conf.d nginx
|
||||
|
||||
|
||||
curl -u admin:123456 http://106.53.194.199:9200/
|
||||
|
||||
|
||||
|
||||
docker run -d \
  --name elastalert2 \
  --network=1panel-network \
  -v /data/elastalert2/config/config.yaml:/opt/elastalert/config.yaml \
  -v /data/elastalert2/rules:/opt/elastalert/rules \
  -v /data/elastalert2/data:/opt/elastalert/data \
  -e "ES_USERNAME=admin" \
  -e "ES_PASSWORD=123456" \
  jertel/elastalert2:latest

docker run -d \
  --name elastalert2 \
  --network=1panel-network \
  -v /data/elastalert2/config/config.yaml:/opt/elastalert/config.yaml \
  -v /data/elastalert2/rules:/opt/elastalert/rules \
  -v /data/elastalert2/data:/opt/elastalert/data \
  -v /data/elastalert2/config/smtp_auth.yaml:/opt/elastalert/smtp_auth.yaml \
  -e "ES_USERNAME=admin" \
  -e "ES_PASSWORD=123456" \
  jertel/elastalert2:latest --verbose

docker run -d \
  --name elastalert2-1 \
  --network=1panel-network \
  -v /data/elastalert2/config/config.yaml:/opt/elastalert/config.yaml \
  -v /data/elastalert2/rules:/opt/elastalert/rules \
  -v /data/elastalert2/data:/opt/elastalert/data \
  -v /data/elastalert2/elastalert_modules:/opt/elastalert/elastalert_modules \
  -e "ES_USERNAME=admin" \
  -e "ES_PASSWORD=123456" \
  jertel/elastalert2:latest --verbose

# the container side of each mount must be under /opt/elastalert, where the image reads its config and rules
docker run -d \
  --name elastalert2 \
  --network=1panel-network \
  -v /opt/elastalert2/config/config.yaml:/opt/elastalert/config.yaml \
  -v /opt/elastalert2/rules:/opt/elastalert/rules \
  -e "TZ=Asia/Shanghai" \
  -e "ES_USERNAME=admin" \
  -e "ES_PASSWORD=123456" \
  jertel/elastalert2:latest

docker run -d --name elastalert --restart=always \
  -v /data/feishu-alert/config.yaml:/opt/elastalert/config.yaml \
  -v /data/feishu-alert/rules:/opt/elastalert/rules \
  -v /etc/localtime:/etc/localtime \
  dengchuanfu/feishualert:v0.1 --verbose
|
||||
|
||||
|
||||
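I currently use Filebeat to ship log files into ES, and I need you to recommend an alerting tool. It must be free; I do not want Kibana's built-in alerting, because that requires a paid subscription.

Alert rules should be configurable through a web UI, and collected log lines whose level is error should be sent to a Feishu bot group. Several Filebeat instances are collecting, and each project's logs produce their own data stream in ES. Please walk me through the configuration step by step and ask me for whatever you need, such as the project names, the data stream names, the field that marks an error line, the match conditions, and the Feishu bot URL. The setup should also be tuned, for example by aggregating messages.

The log lines produced today carry stack traces that can be quite long, so how should the alert display be optimized? I want the message sent to the Feishu bot to be a Feishu card whose title is the project name and which shows: 1. the trigger time, 2. the number of error lines at trigger time, 3. the content of the error lines (long stack traces need handling here), 4. a link that jumps straight to Kibana.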
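To make the requested card layout concrete, here is a hedged sketch of the webhook payload. The `msg_type`/`card`/`header`/`elements` structure follows the interactive-card JSON that appears elsewhere in these notes; the function names, the field labels, and the fixed `red` header colour are assumptions, and the error sample is expected to have been shortened beforehand.

```python
import json
import urllib.request


def build_error_card(project, trigger_time, error_count, sample, kibana_url):
    """Build a Feishu interactive-card payload with the four requested fields."""
    body = (
        f"**Trigger time:** {trigger_time}\n"
        f"**Error lines:** {error_count}\n"
        f"**Sample:**\n{sample}\n"
        f"[Open in Kibana]({kibana_url})"
    )
    return {
        "msg_type": "interactive",
        "card": {
            "header": {
                "title": {"tag": "plain_text", "content": project},
                "template": "red",
            },
            "elements": [{"tag": "markdown", "content": body}],
        },
    }


def send_card(webhook_url, card):
    """POST the card to a bot webhook and return the raw response text."""
    data = json.dumps(card).encode("utf-8")
    request = urllib.request.Request(
        webhook_url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    card = build_error_card(
        "pord-flymoon-task", "2025-01-15 19:08:00", 3,
        "ERROR boom", "http://192.168.60.21:5601/app/discover",
    )
    print(json.dumps(card, ensure_ascii=False, indent=2))
```

Building the payload separately from sending it makes it possible to inspect the JSON before pointing it at a real bot.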
|
||||
|
||||
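I have tried the following without success:

1. INFINI Console: from what I found it seems able to do this, but I do not know how to configure it.
2. Nightingale (Didi): it can trigger alerts, but it cannot show the content of the ES log line directly in the message.
3. ElastAlert2: deployment is troublesome and I have never managed to deploy it.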
|
||||
|
||||
Here is some information about my projects.

1. Project names and their data streams:

pord-flymoon-task              pord01-flymoonlog-pord01-flymoon-task-2025.*
pord-flymoon-sse               pord01-flymoonlog-pord01-flymoon-sse-2025.*
pord-flymoon-partner           pord01-flymoonlog-pord01-flymoon-partner-2025.*
pord-flymoon-admin             pord01-flymoonlog-pord01-flymoon-admin-2025.*
pord-flymoon_crawlspider       pord01-flymoonlog-pord01-flymoon_crawlspider-2025.*
pord-fly-moon-email_v2         pord01-flymoonlog-pord01-fly-moon-email_v2-2025.*
out-pord-fly-moon-email_v2     out-241-flymoonlog-pord-fly-moon-email_v2-2025.*

2. The field that marks an error line when filtering in Kibana:
the pord-fly-moon-email_v2 and out-pord-fly-moon-email_v2 projects use message:error;
all other projects use parsed_sys_info.log_level:error.

3. Bot URL: https://open.feishu.cn/open-apis/bot/v2/hook/7525a700-f27e-4b3d-88eb-7ceaff9e44e4

4. Kibana base URL:
http://192.168.60.21:5601/
Login: account admin, password 123456.
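Because the projects differ only in index pattern and error query, the per-project rules can be generated from one table. The sketch below produces plain dictionaries using the ElastAlert-style keys that appear in the rule file in these notes (`name`, `type`, `index`, `num_events`, `timeframe`, `realert`, `filter`); the helper names and the `<project>-error` naming scheme are assumptions, and writing the result out as YAML is left to whichever YAML library is available.

```python
PROJECTS = {
    "pord-flymoon-task": "pord01-flymoonlog-pord01-flymoon-task-2025.*",
    "pord-flymoon-sse": "pord01-flymoonlog-pord01-flymoon-sse-2025.*",
    "pord-flymoon-partner": "pord01-flymoonlog-pord01-flymoon-partner-2025.*",
    "pord-flymoon-admin": "pord01-flymoonlog-pord01-flymoon-admin-2025.*",
    "pord-flymoon_crawlspider": "pord01-flymoonlog-pord01-flymoon_crawlspider-2025.*",
    "pord-fly-moon-email_v2": "pord01-flymoonlog-pord01-fly-moon-email_v2-2025.*",
    "out-pord-fly-moon-email_v2": "out-241-flymoonlog-pord-fly-moon-email_v2-2025.*",
}

# Projects whose error lines are matched on the raw message field.
MESSAGE_FIELD_PROJECTS = {"pord-fly-moon-email_v2", "out-pord-fly-moon-email_v2"}


def error_query(project):
    """Return the query string that selects error lines for a project."""
    if project in MESSAGE_FIELD_PROJECTS:
        return "message:error"
    return "parsed_sys_info.log_level:error"


def build_rule(project, index):
    """Build one frequency rule as a dictionary (hypothetical naming scheme)."""
    return {
        "name": f"{project}-error",
        "type": "frequency",
        "index": index,
        "num_events": 1,
        "timeframe": {"minutes": 1},
        "realert": {"minutes": 1},
        "filter": [{"query": {"query_string": {"query": error_query(project)}}}],
    }


if __name__ == "__main__":
    for project, index in PROJECTS.items():
        print(build_rule(project, index))
```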
|
||||
|
||||
|
||||
xpack.encryptedSavedObjects.encryptionKey: "414d80b26291f9a0017f0a3ff591f22a969bec72f48f66de579ab0dadd1131c4"
|
||||
|
||||
|
||||
|
||||
mkdir -p /data/feishu-alert/rules

vim /data/feishu-alert/config.yaml

vim /data/feishu-alert/rules/alert_error.yaml
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
rules_folder: /opt/elastalert/rules   # directory that holds the rule files

run_every:
  minutes: 1   # how often ElastAlert queries Elasticsearch; the unit can range from weeks down to seconds

buffer_time:
  minutes: 1   # ElastAlert buffers results from the most recent period in case some log sources are not real-time

es_host: 192.168.1.7   # Elasticsearch host

es_port: 9200   # Elasticsearch port

writeback_index: elastalert_status   # index on es_host used to store metadata; it may be unmapped, but running elastalert-create-index to set a mapping is recommended

alert_time_limit:
  days: 2   # if an alert fails for some reason, ElastAlert retries until this period has elapsed
|
||||
|
||||
|
||||
|
||||
|
||||
# the rule name must be unique
name: alert_error

# required; see the ElastAlert documentation for the available rule types
type: frequency

# index to query; wildcards are supported
index: easyspeed-cloud-logs-*

use_strftime_index: true

# number of events that triggers the alert
num_events: 1

# works together with num_events: alert when 1 event occurs within 1 minute
timeframe:
  minutes: 1

# minimum time between two alerts for the same rule; alerts raised within this window are dropped. Default: one minute.
realert:
  minutes: 1

# Elasticsearch query that selects the matching documents; AND, OR and so on are supported
# for my business case this matches level ERROR and type production, and excludes messages containing the keywords "开始执行批量" or "更新轨迹维护"
# for a simple match you can write query: "level: ERROR" directly, which alerts whenever level is ERROR
filter:
- query:
    query_string:
      query: "NOT message: (开始执行批量 OR 更新轨迹维护) AND level: ERROR AND type: production"

# only include these fields: https://elastalert.readthedocs.io/en/latest/ruletypes.html#include
include: ["spanId", "level", "@timestamp", "_index", "source", "traceId","host","message"]

# Feishu alerter
alert:
- "elastalert_modules.feishu_alert.FeishuAlert"

# Feishu bot webhook base URL
feishualert_url: "https://open.feishu.cn/open-apis/bot/v2/hook/"

# Feishu bot id
feishualert_botid:
  "xxx-xxx-xxx"

# alert title
feishualert_title:
  "业务日志ERROR异常"

# no alerts between 00:00 and 08:00
feishualert_skip:
  start: "00:00:00"
  end: "08:00:00"

# alert body; {} placeholders are filled from the matched document
feishualert_body:
  "
  【告警主题】: {feishualert_title}\n
  【告警时间】: {feishualert_time}\n
  【告警环境】: 【production】\n
  【告警模块】: {source}\n
  【业务索引】: {_index}\n
  【时间戳】: {@timestamp}\n
  【日志级别】: {level}\n
  【spanId】: {spanId}\n
  【traceId】: {traceId}\n
  【host】: {host}\n
  【message】: {message}
  "
|
||||
|
||||
|
||||
|
||||
|
||||
index: pord01-flymoonlog-pord01-flymoon-task-2025.*
use_strftime_index: true
num_events: 1
timeframe:
  minutes: 1
realert:
  minutes: 1
filter:
- query:
    query_string:
      query: "parsed_sys_info.log_level:error"
include: ["spanId", "level", "@timestamp", "_index", "source", "traceId","host","message"]
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
(venv) -bash-4.2# ps axjf|grep check_account.py
|
||||
15294 8347 8346 15294 pts/5 8346 S+ 0 0:00 \_ grep --color=auto check_account.py
|
||||
1 7961 7750 7750 ? -1 S 0 0:00 bash -c source /data/webapps/prod_check_tiktok_account/venv/bin/activate && nohup python /data/webapps/prod_check_tiktok_account/check_account.py > /data/webapps/prod_check_tiktok_account/output.log 2>&1 & echo $! > /data/webapps/prod_check_tiktok_account/check_account.pid // 保存 PID disown // 让?程彻底脱离 echo "? 生产环境?程已启动,PID: $(cat /data/webapps/prod_check_tiktok_account/check_account.pid)"
|
||||
7961 7964 7750 7750 ? -1 Sl 0 0:02 \_ python /data/webapps/prod_check_tiktok_account/check_account.py
|
||||
|
||||
|
||||
sit_fly_moon_web|sit_jenniefy_web|sit_scalelink_frontend|test_fly_moon_web|test_deeplink_merchant|test_deeplink_merchant_foreign|test_deeplink_merchant_saas|test_jenniefy_web|test_scalelink_frontend|oversea_test_jenniefy_web
|
||||
|
||||
deeplink_merchant|deeplink_merchant_saas|fly_moon_web|scalelink_frontend|oversea_jenniefy_web
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
{{ if $event.IsRecovered }}
|
||||
{{- if ne $event.Cate "host"}}
|
||||
{{end}}
|
||||
**恢复时间:** {{timeformat $event.LastEvalTime}}
|
||||
**告警描述:** **已恢复**
|
||||
**当前进程内存占用:**{{xxxx}}
|
||||
**当前系统剩余内存:** {{xxxx}}
|
||||
{{- else }}
|
||||
{{- if ne $event.Cate "host"}}
|
||||
{{end}}
|
||||
**触发时间:** {{timeformat $event.TriggerTime}}
|
||||
**当前进程内存占用:**{{xxxx}}
|
||||
**当前系统剩余内存:** {{xxxx}}
|
||||
{{if $event.RuleNote }}**详细信息:** **{{$event.RuleNote}}**{{end}}
|
||||
{{- end -}}
|
||||
|
||||
{{$domain := "http://192.168.60.21:5601" }}
|
||||
[近1小时日志详情]({{$domain}}{{$labels.url}})
|
||||
|
||||
|
||||
|
||||
|
||||
process_resident_memory_bytes{instance="prod01-server", job="prod01-server-process-exporter"}
|
||||
|
||||
process_resident_memory_bytes{job='process-exporter',instance=~'$host',name=~'$process_name'}" | first | value | humanize1024
|
||||
|
||||
process_resident_memory_bytes{job='prod01-server-process-exporter',instance=~'prod01-server',name=~'prod-flymoon_email_prod'}" | first | value | humanize1024
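The expressions above pipe a `process_resident_memory_bytes` sample through `humanize1024` so that the card shows a readable size instead of a raw byte count. For reference, the following is an approximate Python sketch of that kind of base-1024 formatting; it is an illustration only, and the exact unit spelling and rounding of the real template function may differ.

```python
def humanize1024(value: float) -> str:
    """Format a byte count with base-1024 units, using 4 significant digits."""
    units = ["Ki", "Mi", "Gi", "Ti", "Pi", "Ei"]
    prefix = ""
    for unit in units:
        if abs(value) < 1024:
            break
        prefix = unit
        value /= 1024
    return "%.4g%s" % (value, prefix)


if __name__ == "__main__":
    # Hypothetical resident memory of 512 MiB.
    print(humanize1024(512 * 1024 * 1024))
```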
|
||||
|
||||
|
||||
|
||||
|
||||
pipeline {
    agent any

    environment {
        REMOTE_HOST = "43.130.56.138" // remote server {params.REMOTE_HOST}
        REMOTE_PROJECT_PATH = "/data/webapps/lessie_sourcing_agents" // remote Python project path
        VENV_DIR = "/data/webapps/lessie_sourcing_agents/venv" // remote virtualenv directory
        CONDA_PATH = "/root/miniconda3/bin/conda" // change to the actual Conda install path
    }

    stages {
        stage('Checkout 代码') {
            steps {
                git branch: "${params.Code_branch}", credentialsId: 'fly_gitlab_auth', url: 'http://172.24.16.20/python/lessie-sourcing-agents.git'
            }
        }

        // take the running process offline
        stage('进程下线') {
            steps {
                echo("下线")
                sh "ssh ${REMOTE_HOST} 'sh /data/sh/kill_lessie_sourcing_agents.sh'"
            }
        }

        // sync the workspace to the remote host, excluding the virtualenv
        stage('工程同步') {
            steps {
                sh """
                ssh ${REMOTE_HOST} 'mkdir -p ${REMOTE_PROJECT_PATH}'
                rsync -avz --exclude 'venv' ${WORKSPACE}/ ${REMOTE_HOST}:${REMOTE_PROJECT_PATH}/
                """
            }
        }

        // install dependencies
        stage('安装依赖') {
            steps {
                sh """
                ssh ${REMOTE_HOST} '
                    cd ${REMOTE_PROJECT_PATH} &&
                    source ~/.bashrc &&
                    conda activate search &&
                    source ${VENV_DIR}/bin/activate &&
                    pip install --upgrade pip &&
                    pip install -r requirements.txt
                '
                """
            }
        }

        // start the service
        stage('工程启动') {
            steps {
                echo("启动")
                sh """
                ssh ${REMOTE_HOST} '
                    # load the conda shell setup first, as in the dependency stage, so that "conda activate" works over ssh
                    source ~/.bashrc
                    conda activate search
                    source ${VENV_DIR}/bin/activate
                    TIMESTAMP=\$(date +"%Y%m%d_%H%M%S")
                    nohup python /data/webapps/lessie_sourcing_agents/server.py > /data/webapps/lessie_sourcing_agents/logs/lessie_sourcing_agents_\${TIMESTAMP}.log 2>&1 &
                '
                """
            }
        }
    }

    post {
        success {
            echo '✅ 部署成功!'
        }
        failure {
            echo '❌ 部署失败,请检查日志!'
        }
    }
}
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
{{/* 通用头部信息:固定展示告警类型、关联机器与规则信息 */}}
|
||||
**告警类型:** {{if $event.IsRecovered}}内存告警-恢复通知{{else}}内存告警-触发通知{{end}}
|
||||
**关联机器:** {{$event.Attributes.host | default "未知主机"}}
|
||||
**告警规则:** {{$event.RuleName}}
|
||||
{{if $event.RuleId}}**规则ID:** {{$event.RuleId}}{{end}}
|
||||
|
||||
{{/* 恢复场景逻辑:展示恢复时间与恢复时内存大小 */}}
|
||||
{{ if $event.IsRecovered }}
|
||||
{{- if ne $event.Cate "host"}}{{end}}
|
||||
**恢复时间:** {{timeformat $event.LastEvalTime "2006-01-02 15:04:05"}}
|
||||
**恢复时内存使用:** {{$event.TriggerValue | humanizeSize}}
|
||||
**告警描述:** 机器内存使用率已降至阈值以下,告警恢复
|
||||
{{if $event.RuleNote }}**规则说明:** {{$event.RuleNote}}{{end}}
|
||||
|
||||
{{/* 触发场景逻辑:展示触发时间、发送时间与触发时内存大小 */}}
|
||||
{{- else }}
|
||||
{{- if ne $event.Cate "host"}}{{end}}
|
||||
**触发时间:** {{timeformat $event.TriggerTime "2006-01-02 15:04:05"}}
|
||||
**发送时间:** {{timestamp}}
|
||||
**触发时内存使用:** {{$event.TriggerValue | humanizeSize}}
|
||||
**告警描述:** 机器内存使用率超过设定阈值,触发告警
|
||||
{{if $event.RuleNote }}**规则说明:** {{$event.RuleNote}}{{end}}
|
||||
{{if $event.Threshold}}**告警阈值:** {{$event.Threshold | humanizeSize}}{{end}}
|
||||
{{- end -}}
|
||||