728 lines
25 KiB
YAML
|
|
Three questions about deploying to the overseas machine 43.159.145.241:
|
|||
|
|
|
|||
|
|
I. flymoon-jenniefy.jar:
|
|||
|
|
1. flymoon-jenniefy.jar needs Redis at runtime, so does Redis also have to be installed on 43.159.145.241?
|
|||
|
|
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:6379
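The `Connection refused` on `localhost/127.0.0.1:6379` means nothing accepted the connection on the Redis port of that machine. A minimal sketch for checking reachability before starting the jar, assuming a plain TCP probe is enough; the helper name `port_open` is hypothetical:

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and unreachable hosts.
        return False
```

Calling `port_open("127.0.0.1", 6379)` on 43.159.145.241 would show whether a local Redis is listening; the same probe can be pointed at a remote Redis host instead.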
|
|||
|
|
|
|||
|
|
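2. flymoon-jenniefy.jar needs MySQL at runtime. Should it connect to the Tencent Cloud production database? If so: on the sit branch of the flymoon-jenniefy repository, both Redis and MySQL point at the sit machine itself, so should the deployment on the overseas machine 43.159.145.241 use the prod branch instead?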
|
|||
|
|
|
|||
|
|
3. How is the API on port 8070 used?
|
|||
|
|
location ^~ /prod-api {
    proxy_pass http://127.0.0.1:8070;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
|
|||
|
|
|
|||
|
|
|
|||
|
|
II. Python deployment: should it copy last night's sit deployment exactly?
|
|||
|
|
|
|||
|
|
|
|||
|
|
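III. jenniefy_web UI: the domain still needs to be decided (https://www.jennie.deal or something else), along with the certificate files for that domain. Whether access from mainland China can be blocked depends on whether the DNS hosting platform supports such a restriction.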
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
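Nightingale (Didi's monitoring system) alerts with Elasticsearch as the data source. My Elasticsearch stores log files collected from other machines. I want an alert sent to a Feishu bot as a Feishu card whenever a collected log line contains error, with the card showing the actual content of the error line while staying short enough not to flood the chat. How should I do this?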
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
{{ if .IsRecovered }}
{{- if ne .Cate "host"}}
**Cluster:** {{.Cluster}}{{end}}
**Severity/status:** S{{.Severity}} Recovered
**Rule name:** {{.RuleName}}
**Recovered at:** {{timeformat .LastEvalTime}}
**Description:** **Service recovered**
{{- else }}
{{- if ne .Cate "host"}}
**Cluster:** {{.Cluster}}{{end}}
**Severity/status:** S{{.Severity}} Triggered
**Rule name:** {{.RuleName}}
**Triggered at:** {{timeformat .TriggerTime}}
**Sent at:** {{timestamp}}
**Trigger value:** {{.TriggerValue}}
{{if .RuleNote }}**Description:** **{{.RuleNote}}**{{end}}
{{- end -}}
{{$domain := "http://ask-the-administrator-to-edit-the-notification-template-and-set-the-real-domain" }}
[Event details]({{$domain}}/alert-his-events/{{.Id}})|[Mute for 1 hour]({{$domain}}/alert-mutes/add?busiGroup={{.GroupId}}&cate={{.Cate}}&datasource_ids={{.DatasourceId}}&prod={{.RuleProd}}{{range $key, $value := .TagsMap}}&tags={{$key}}%3D{{$value}}{{end}})|[View graph]({{$domain}}/metric/explorer?data_source_id={{.DatasourceId}}&data_source_name=prometheus&mode=graph&prom_ql={{.PromQl|escape}})
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
{{ if .IsRecovered }}
{{- if ne .Cate "host"}}
**Cluster:** {{.Cluster}}{{end}}
**Severity/status:** S{{.Severity}} Recovered
**Rule name:** {{.RuleName}}
**Recovered at:** {{timeformat .LastEvalTime}}
**Description:** **Service recovered**
{{- else }}
{{- if ne .Cate "host"}}
**Cluster:** {{.Cluster}}{{end}}
**Severity/status:** S{{.Severity}} Triggered
**Rule name:** {{.RuleName}}
**Triggered at:** {{timeformat .TriggerTime}}
**Sent at:** {{timestamp}}
**Trigger value:** {{.TriggerValue}}
{{if .RuleNote }}**Description:** **{{.RuleNote}}**{{end}}
**Error log summary:** {{.Message | slice 0 100 | default "No error message"}}
{{- end -}}
{{$domain := "http://192.168.1.7:17000" }}
[Log details]({{$domain}}/alert-his-events/{{.Id}})
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
http://192.168.60.21/app/discover#/?_g=(time:(from:'<start_time>',to:'<end_time>'))&_a=(columns:!(_source),index:'<index_name>',query:(query_string:(query:'message:error')))
|
|||
|
|
|
|||
|
|
|
|||
|
|
(http://192.168.60.21/app/discover#/?_g=(time:(from:'{{timeformat .TriggerTime}}',to:'{{timestamp}}'))&_a=(columns:!(_source),index:'{{ $indexname}}',query:(query_string:(query:'message:error'))))
|
|||
|
|
|
|||
|
|
[View Kibana logs (message:error)]({{ $kibanaHost }}/app/discover#/?_g=(time:(from:'{{ $startTime }}',to:'{{ $endTime }}'))&_a=(columns:!(_source),index:'{{ $indexname }}',query:(query_string:(query:'message:error'))))
|
|||
|
|
|
|||
|
|
|
|||
|
|
{{- $startTime := .TriggerTime | mul 1000 }}
{{- $endTime := .LastEvalTime | mul 1000 }}
{{- $kibanaHost := "http://192.168.60.21:5601" }}
{{ if .IsRecovered }}
{{- if ne .Cate "host"}}
**Cluster:** {{.Cluster}}{{end}}
**Severity/status:** S{{.Severity}} Recovered
**Rule name:** {{.RuleName}}
**Recovered at:** {{timeformat .LastEvalTime}}
**Description:** **Service recovered**
{{- else }}
{{- if ne .Cate "host"}}
**Cluster:** {{.Cluster}}{{end}}
**Severity/status:** level {{.Severity}}
**Rule name:** {{.RuleName}}
**Triggered at:** {{timeformat .TriggerTime}}
**Sent at:** {{timestamp}}
**Trigger value:** {{.TriggerValue}}
{{if .RuleNote }}**Description:** **{{.RuleNote}}**{{end}}
{{- end -}}
{{$domain := "http://192.168.1.7:17000" }}
[Event details]({{$domain}}/alert-his-events/{{.Id}})

[View Kibana logs (message:error)]({{ $kibanaHost }}/app/discover#/?_g=(time:(from:'{{ $startTime }}',to:'{{ $endTime }}'))&_a=(columns:!(_source),index:'{{ $indexName }}',query:(query_string:(query:'message:error'))))
|
|||
|
|
|
|||
|
|
|
|||
|
|
({{ $kibanaHost }}/app/discover#/?_g=(time:(from:'{{ $startTime }}',to:'{{ $endTime }}'))&_a=(columns:!(_source),index:'{{ $indexName }}',query:(query_string:(query:'message:error'))))
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
{{- end -}}
|
|||
|
|
{{- $startTime := .TriggerTime | mul 1000 }}
{{- $endTime := .LastEvalTime | mul 1000 }}
{{- $kibanaHost := "http://192.168.60.21:5601" }}
{{- $indexName := .TagsMap.indexname }}

{{ if .IsRecovered }}
{{- if ne .Cate "host"}}
**Cluster:** {{.Cluster}}{{end}}
**Severity/status:** S{{.Severity}} Recovered
**Rule name:** {{.RuleName}}
**Recovered at:** {{timeformat .LastEvalTime}}
**Description:** **Service recovered**
{{- else }}
{{- if ne .Cate "host"}}
**Cluster:** {{.Cluster}}{{end}}
**Severity/status:** level {{.Severity}}
**Rule name:** {{.RuleName}}
**Triggered at:** {{timeformat .TriggerTime}}
**Sent at:** {{timestamp}}
**Trigger value:** {{.TriggerValue}}
{{if .RuleNote }}**Description:** **{{.RuleNote}}**{{end}}
{{- end -}}
[Details]({{$kibanaHost}}/alert-his-events/{{.Id}})
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
http://192.168.60.21:5601/app/discover#/?_g=(time:(from:'1736939280000',to:'1736939525000'))&_a=(columns:!(),dataSource:(dataViewId:f101ce47-ebde-4f42-bbd1-dee42c68148e,type:dataView),filters:!(),interval:auto,query:(language:lucene,query:(query_string:(query:'message:info'))),sort:!(!('@timestamp',desc)))
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
tags
|
|||
|
|
agent.*
|
|||
|
|
|
|||
|
|
as*
|
|||
|
|
client.*
|
|||
|
|
cloud.*
|
|||
|
|
container.*
|
|||
|
|
destination.*
|
|||
|
|
dns.*
|
|||
|
|
ecs.*
|
|||
|
|
error.*
|
|||
|
|
event.*
|
|||
|
|
file.*
|
|||
|
|
geo.*
|
|||
|
|
group.*
|
|||
|
|
hash.*
|
|||
|
|
host.*
|
|||
|
|
http.*
|
|||
|
|
log.origin*
|
|||
|
|
log.syslog*
|
|||
|
|
network.*
|
|||
|
|
observer*
|
|||
|
|
|
|||
|
|
organization.*
|
|||
|
|
os.*
|
|||
|
|
package.*
|
|||
|
|
process.*
|
|||
|
|
server.*
|
|||
|
|
service.*
|
|||
|
|
source.*
|
|||
|
|
threat.*
|
|||
|
|
trace.id*
|
|||
|
|
transaction.id*
|
|||
|
|
url.*
|
|||
|
|
user.*
|
|||
|
|
user_agent.*
|
|||
|
|
cloud.*
|
|||
|
|
host.*
|
|||
|
|
kubernetes.*
|
|||
|
|
process.owner.*
|
|||
|
|
jolokia.*
|
|||
|
|
aws*
|
|||
|
|
bucket.*
|
|||
|
|
object.*
|
|||
|
|
fields.*
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Help me process some fields: convert them to wildcards and de-duplicate, for example:
|
|||
|
|
"os.name",
|
|||
|
|
"os.platform",
|
|||
|
|
"os.version",
|
|||
|
|
"package.architecture",
|
|||
|
|
"package.checksum",
|
|||
|
|
"package.description",
|
|||
|
|
|
|||
|
|
becomes:
|
|||
|
|
os.*
|
|||
|
|
package.*
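The conversion above can be sketched in Python. This is a minimal sketch that assumes every dotted name collapses to its top-level prefix plus `.*`, so it would not reproduce deeper patterns such as `process.owner.*` or `log.origin*`; the function name `to_wildcards` is hypothetical.

```python
def to_wildcards(fields):
    """Collapse dotted field names to '<prefix>.*', de-duplicated in first-seen order."""
    seen = set()
    result = []
    for raw in fields:
        # Drop surrounding whitespace, quotes, and trailing commas from pasted lines.
        name = raw.strip().strip('",')
        if not name:
            continue
        head, dot, _ = name.partition(".")
        pattern = f"{head}.*" if dot else name
        if pattern not in seen:
            seen.add(pattern)
            result.append(pattern)
    return result
```

Feeding it the six quoted lines from the example yields the two patterns `os.*` and `package.*`.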
|
|||
|
|
|
|||
|
|
|
|||
|
|
docker run -d -p 3030:3030 -p 3333:3333 \
    -v `pwd`/config/elastalert.yaml:/opt/elastalert/config.yaml \
    -v `pwd`/config/config.json:/opt/elastalert-server/config/config.json \
    -v `pwd`/rules:/opt/elastalert/rules \
    -v `pwd`/rule_templates:/opt/elastalert/rule_templates \
    --network 1panel-network \
    --name elastalert praecoapp/elastalert-server:latest
|
|||
|
|
|
|||
|
|
|
|||
|
|
docker run -d -p 8080:8080 --network 1panel-network --name praeco \
    -e ELASTICSEARCH_HOST=http://192.168.60.21:9200 \
    -e ELASTICSEARCH_USERNAME=admin \
    -e ELASTICSEARCH_PASSWORD=123456 \
    -e ELASTALERT_HOST=http://192.168.60.21:3030 \
    johnsusek/praeco
|
|||
|
|
|
|||
|
|
echo "slack_webhook_url: 'https://open.feishu.cn/open-apis/bot/v2/hook/8bd6a15d-90f0-4f4f-a1b1-bd105f31ea06'" | sudo tee -a rules/BaseRule.config >/dev/null
|
|||
|
|
export PRAECO_ELASTICSEARCH=192.168.1.7
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
{
  "msg_type": "interactive",
  "card": {
    "header": {
      "title": {
        "content": "[ INFINI Platform Alerting ]",
        "tag": "plain_text"
      },
      "template":"{{if eq .priority "critical"}}red{{else if eq .priority "high"}}orange{{else if eq .priority "medium"}}yellow{{else if eq .priority "low"}}grey{{else}}blue{{end}}"
    },
    "elements": [
      {
        "tag": "markdown",
        "content": "🔥 Alert event [#{{.event_id}}]({{$.env.INFINI_CONSOLE_ENDPOINT}}/#/alerting/message/{{.event_id}}) is in progress\n **{{.title}}**\n Priority: {{.priority}}\n Event ID: {{.event_id}}\n Target: {{.resource_name}}-{{.objects}}\n Triggered at: {{.trigger_at | datetime}}"
      },
      {
        "tag": "hr"
      },
      {
        "tag": "markdown",
        "content": "**Error line content**: {{ if .hits.hits.0._source.message }}{{.hits.hits.0._source.message }}{{ else }}{{.message | str_replace \"\\n\" \"\\\\n\" }}{{ end }}\n **Error triggered at**: {{.trigger_at | datetime }}"
      }
    ]
  }
}
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
# Original ----------------------------
{{- $startTime := .TriggerTime | mul 1000 }}
{{- $endTime := .LastEvalTime | mul 1000 }}
{{- $indexName := .TagsMap.indexname }}
{{- $query := .TagsMap.query }}

{{ if .IsRecovered }}
{{- if ne .Cate "host"}}
{{end}}
**Recovered at:** {{timeformat .LastEvalTime}}
**Description:** **Service recovered**
{{- else }}
{{- if ne .Cate "host"}}
{{end}}
**Triggered at:** {{timeformat .TriggerTime}}
**Sent at:** {{timestamp}}
**Trigger value:** {{.TriggerValue}}
{{if .RuleNote }}**Service:** **{{.RuleNote}}**{{end}}
{{- end -}}
{{$domain := "http://192.168.60.21:5601/" }}
[Log details]({{$domain}}{{$event.RunbookUrl}})
# Original ----------------------------
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
{{ if $event.IsRecovered }}
{{- if ne $event.Cate "host"}}
{{end}}
**Recovered at:** {{timeformat $event.LastEvalTime}}
**Description:** **Service recovered**
{{- else }}
{{- if ne $event.Cate "host"}}
{{end}}
**Triggered at:** {{timeformat $event.TriggerTime}}
**Sent at:** {{timestamp}}
**Trigger value:** {{$event.TriggerValue}}
{{if $event.RuleNote }}**Service:** **{{$event.RuleNote}}**{{end}}
{{- end -}}
[Log details]({{$event.Bshboardurl}})
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
{{- $startTime := .TriggerTime | mul 1000 }}
{{- $endTime := .LastEvalTime | mul 1000 }}
{{- $indexName := .TagsMap.indexname }}
{{- $fieldName := .TagsMap.fieldname }}
{{- $query := .TagsMap.query }}

{{ if .IsRecovered }}
{{- if ne .Cate "host"}}
{{end}}
**Recovered at:** {{timeformat .LastEvalTime}}
**Description:** **Service recovered**
{{- else }}
{{- if ne .Cate "host"}}
{{end}}
**Triggered at:** {{timeformat .TriggerTime}}
**Sent at:** {{timestamp}}
**Trigger value:** {{.TriggerValue}}
{{if .RuleNote }}**Service:** **{{.RuleNote}}**{{end}}
{{- end -}}

{{- $domain := "http://192.168.60.21:5601" }}
{{- $kibanaQuery := printf "%s:error AND @timestamp:[%d TO %d]" $fieldName $startTime $endTime }}
{{- $fromTime := timeformat .TriggerTime "2006-01-02T15:04:05.000Z" }}
{{- $toTime := timeformat .LastEvalTime "2006-01-02T15:04:05.000Z" }}
{{- $kibanaLink := printf "%s/app/discover#/?_a=(index:'%s',query:(language:lucene,query:'%s'))&_g=(time:(from:'%s',to:'%s'))" $domain $indexName $kibanaQuery $fromTime $toTime }}

[Log details]({{$kibanaLink}})
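The link construction in the template can be checked offline with a small Python sketch. It assumes the `_g`/`_a` layout used in these notes, ISO 8601 UTC timestamps for the time range, and that `index:` accepts the value passed in (the Discover URL captured elsewhere in these notes uses a `dataViewId` instead, so this may need adjusting). The names `iso_utc` and `discover_link` are hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import quote

# Characters of Kibana's URL state syntax that are left unescaped for readability.
SAFE = "():',!"


def iso_utc(epoch_seconds: int) -> str:
    """Format epoch seconds as an ISO 8601 UTC timestamp with a zero millisecond field."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%S.000Z")


def discover_link(base_url, index, query, start, end):
    """Build a Discover link for `query` between two epoch-second timestamps."""
    base = base_url.rstrip("/")  # avoids "//app" when the base URL ends with "/"
    g_state = f"(time:(from:'{iso_utc(start)}',to:'{iso_utc(end)}'))"
    a_state = f"(index:'{index}',query:(language:lucene,query:'{query}'))"
    return (
        f"{base}/app/discover#/"
        f"?_g={quote(g_state, safe=SAFE)}&_a={quote(a_state, safe=SAFE)}"
    )
```

Stripping the trailing slash from the base URL matters because the path is appended with a leading `/`; spaces in the query are percent-encoded by `quote`.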
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
[Event details]({{$domain}}/alert-his-events/{{.Id}})|[Mute for 1 hour]({{$domain}}/alert-mutes/add?busiGroup={{.GroupId}}&cate={{.Cate}}&datasource_ids={{.DatasourceId}}&prod={{.RuleProd}}{{range $key, $value := .TagsMap}}&tags={{$key}}%3D{{$value}}{{end}})|[View graph]({{$domain}}/metric/explorer?data_source_id={{.DatasourceId}}&data_source_name=prometheus&mode=graph&prom_ql={{.PromQl|escape}})
|
|||
|
|
|
|||
|
|
|
|||
|
|
http://192.168.1.7:17000/alert-his-events/
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Sign up for a personal Zoho mailbox: https://www.zoho.com.cn/mail/
|
|||
|
|
A phone number is required during sign-up to receive the verification code.
|
|||
|
|
|
|||
|
|
Gmail: https://workspace.google.com/business/signup/newbusiness?xsell=google_accounts&back=https://accounts.google.com/SignUp?ec=asw-gmail-hero-create2&biz=true&continue=https://mail.google.com/mail/&flowEntry=SignUp&flowName=GlifWebSignIn&service=mail&theme=glif&ec=asw-gmail-hero-create2&source=gafb-gmail-hero-zh-CN&hl=zh-CN&ga_region=japac&ga_country=zh-CN&ga_lang=zh-CN
|
|||
|
|
After sign-up, bind a phone number and a recovery email yourself so that you can sign in again if you lose access to the account.
|
|||
|
|
Enabling two-step verification requires adding a phone number that receives the verification codes.
|
|||
|
|
|
|||
|
|
|
|||
|
|
mylidamin@gmail.com aAggyxmm
|
|||
|
|
|
|||
|
|
|
|||
|
|
docker run -d --name my-nginx -p 9200:9200 -v /data/my-nginx/conf:/etc/nginx/conf.d nginx
|
|||
|
|
|
|||
|
|
|
|||
|
|
curl -u admin:123456 http://106.53.194.199:9200/
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
docker run -d \
    --name elastalert2 \
    --network=1panel-network \
    -v /data/elastalert2/config/config.yaml:/opt/elastalert/config.yaml \
    -v /data/elastalert2/rules:/opt/elastalert/rules \
    -v /data/elastalert2/data:/opt/elastalert/data \
    -e "ES_USERNAME=admin" \
    -e "ES_PASSWORD=123456" \
    jertel/elastalert2:latest
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
docker run -d \
    --name elastalert2 \
    --network=1panel-network \
    -v /data/elastalert2/config/config.yaml:/opt/elastalert/config.yaml \
    -v /data/elastalert2/rules:/opt/elastalert/rules \
    -v /data/elastalert2/data:/opt/elastalert/data \
    -v /data/elastalert2/config/smtp_auth.yaml:/opt/elastalert/smtp_auth.yaml \
    -e "ES_USERNAME=admin" \
    -e "ES_PASSWORD=123456" \
    jertel/elastalert2:latest --verbose
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
docker run -d \
    --name elastalert2-1 \
    --network=1panel-network \
    -v /data/elastalert2/config/config.yaml:/opt/elastalert/config.yaml \
    -v /data/elastalert2/rules:/opt/elastalert/rules \
    -v /data/elastalert2/data:/opt/elastalert/data \
    -v /data/elastalert2/elastalert_modules:/opt/elastalert/elastalert_modules \
    -e "ES_USERNAME=admin" \
    -e "ES_PASSWORD=123456" \
    jertel/elastalert2:latest --verbose
|
|||
|
|
|
|||
|
|
docker run -d \
    --name elastalert2 \
    --network=1panel-network \
    -v /opt/elastalert2/config/config.yaml:/opt/elastalert2/config.yaml \
    -v /opt/elastalert2/rules:/opt/elastalert2/rules \
    -e "TZ=Asia/Shanghai" \
    -e "ES_USERNAME=admin" \
    -e "ES_PASSWORD=123456" \
    jertel/elastalert2:latest
|
|||
|
|
|
|||
|
|
docker run -d --name elastalert --restart=always \
    -v /data/feishu-alert/config.yaml:/opt/elastalert/config.yaml \
    -v /data/feishu-alert/rules:/opt/elastalert/rules \
    -v /etc/localtime:/etc/localtime \
    dengchuanfu/feishualert:v0.1 --verbose
|
|||
|
|
|
|||
|
|
|
|||
|
|
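I currently use Filebeat to collect log files into Elasticsearch. Please recommend an alerting tool for me. It must be free; I do not want Kibana's built-in alerting, which requires a paid subscription.

The alert rules should be configurable through a web UI, and log lines at error level should be sent to a Feishu bot group. Several Filebeat instances are collecting logs, and each project's logs produce a corresponding data stream in Elasticsearch. Please walk me through the configuration step by step and ask me for whatever you need, such as project names, data stream names, the field that marks an error line, the match condition, and the Feishu bot URL. It should also be tuned well, for example with message aggregation.

The stack traces attached to the log lines can be quite long, so how should the alert display be optimized? I want the message sent to the Feishu bot to be a Feishu card whose title is the project name and whose body shows: 1. the trigger time, 2. the number of error lines at trigger time, 3. the content of the error lines (overly long stack traces need handling here), 4. a link that jumps straight to Kibana.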
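One way to keep the card short is to cap the error text by both line count and character count before it goes into the card. The sketch below is only illustrative: the card layout mirrors the `msg_type: interactive` JSON used elsewhere in these notes, while the helper names, the limits (5 lines, 300 characters), and the English field labels are assumptions.

```python
def truncate(text: str, max_lines: int = 5, max_chars: int = 300) -> str:
    """Keep at most max_lines lines and max_chars characters of a log message."""
    lines = text.splitlines()
    normalized = "\n".join(lines)
    shortened = "\n".join(lines[:max_lines])[:max_chars]
    if shortened != normalized:
        shortened += "\n... (truncated)"
    return shortened


def build_card(project, trigger_time, error_count, message, kibana_url):
    """Build a Feishu interactive-card payload as a plain dict."""
    content = (
        f"**Triggered at:** {trigger_time}\n"
        f"**Error lines:** {error_count}\n"
        f"**Error content:**\n{truncate(message)}\n"
        f"[Open in Kibana]({kibana_url})"
    )
    return {
        "msg_type": "interactive",
        "card": {
            "header": {
                "title": {"tag": "plain_text", "content": project},
                "template": "red",
            },
            "elements": [{"tag": "markdown", "content": content}],
        },
    }
```

The resulting dict can be serialized to JSON and posted to the bot webhook; the full stack trace stays reachable through the Kibana link rather than in the card.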
|
|||
|
|
|
|||
|
|
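I have tried the following, without success:
1. INFINI Console: from what I found it seems able to do this, but I do not know how to configure it.
2. Nightingale: it can trigger alerts, but it cannot show the content of the Elasticsearch log line directly in the message.
3. ElastAlert2: deployment is troublesome and I have never managed to deploy it.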
|
|||
|
|
|
|||
|
|
Here is some information about my projects:
1. Project names and their corresponding data streams
pord-flymoon-task            pord01-flymoonlog-pord01-flymoon-task-2025.*
pord-flymoon-sse             pord01-flymoonlog-pord01-flymoon-sse-2025.*
pord-flymoon-partner         pord01-flymoonlog-pord01-flymoon-partner-2025.*
pord-flymoon-admin           pord01-flymoonlog-pord01-flymoon-admin-2025.*
pord-flymoon_crawlspider     pord01-flymoonlog-pord01-flymoon_crawlspider-2025.*
pord-fly-moon-email_v2       pord01-flymoonlog-pord01-fly-moon-email_v2-2025.*
out-pord-fly-moon-email_v2   out-241-flymoonlog-pord-fly-moon-email_v2-2025.*
2. The field that marks an error line, as used when filtering in Kibana:
The projects pord-fly-moon-email_v2 and out-pord-fly-moon-email_v2 use message:error
All other projects use parsed_sys_info.log_level:error
3. Bot URL: https://open.feishu.cn/open-apis/bot/v2/hook/7525a700-f27e-4b3d-88eb-7ceaff9e44e4
4. Kibana base URL
http://192.168.60.21:5601/
with account admin and password 123456
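The project list above maps naturally onto one alert rule per project. A minimal sketch, assuming ElastAlert-style rule keys (`name`, `type`, `index`, `num_events`, `timeframe`, `filter`) as they appear in the rule file elsewhere in these notes; the helper names and the `-error` rule-name suffix are hypothetical.

```python
# Project name -> data stream pattern, copied from the list above.
PROJECTS = {
    "pord-flymoon-task": "pord01-flymoonlog-pord01-flymoon-task-2025.*",
    "pord-flymoon-sse": "pord01-flymoonlog-pord01-flymoon-sse-2025.*",
    "pord-flymoon-partner": "pord01-flymoonlog-pord01-flymoon-partner-2025.*",
    "pord-flymoon-admin": "pord01-flymoonlog-pord01-flymoon-admin-2025.*",
    "pord-flymoon_crawlspider": "pord01-flymoonlog-pord01-flymoon_crawlspider-2025.*",
    "pord-fly-moon-email_v2": "pord01-flymoonlog-pord01-fly-moon-email_v2-2025.*",
    "out-pord-fly-moon-email_v2": "out-241-flymoonlog-pord-fly-moon-email_v2-2025.*",
}

# Projects whose error lines are matched on the raw message field.
MESSAGE_QUERY_PROJECTS = {"pord-fly-moon-email_v2", "out-pord-fly-moon-email_v2"}


def error_query(project: str) -> str:
    """Return the query string that selects error lines for a project."""
    if project in MESSAGE_QUERY_PROJECTS:
        return "message:error"
    return "parsed_sys_info.log_level:error"


def rule_for(project: str) -> dict:
    """Build one frequency rule (as a dict) for a project."""
    return {
        "name": f"{project}-error",
        "type": "frequency",
        "index": PROJECTS[project],
        "num_events": 1,
        "timeframe": {"minutes": 1},
        "filter": [{"query": {"query_string": {"query": error_query(project)}}}],
    }
```

Dumping each dict to its own YAML file would give one rule per data stream, so the card title can carry the project name.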
|
|||
|
|
|
|||
|
|
|
|||
|
|
xpack.encryptedSavedObjects.encryptionKey: "414d80b26291f9a0017f0a3ff591f22a969bec72f48f66de579ab0dadd1131c4"
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
mkdir -p /data/feishu-alert && touch /data/feishu-alert/config.yaml
|
|||
|
|
|
|||
|
|
mkdir -p /data/feishu-alert/rules
|
|||
|
|
|
|||
|
|
vim /data/feishu-alert/rules/alert_error.yaml
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
rules_folder: /opt/elastalert/rules  # Directory that holds the alert rule files

run_every:
  minutes: 1  # How often ElastAlert queries Elasticsearch; the unit can range from weeks down to seconds

buffer_time:
  minutes: 1  # ElastAlert buffers results from the most recent period in case some log sources are not real time

es_host: 192.168.1.7  # Elasticsearch host

es_port: 9200  # Elasticsearch port

writeback_index: elastalert_status  # Index on es_host used for metadata storage. It can be an unmapped index, but setting up a mapping is recommended.

alert_time_limit:
  days: 2  # If an alert fails for some reason, ElastAlert retries until this period has elapsed
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
# The rule name must be unique
name: alert_error

# Required; see the ElastAlert documentation for the available rule types
type: frequency

# Index to query; wildcards are supported
index: easyspeed-cloud-logs-*

use_strftime_index: true

# Number of matching events that triggers an alert
num_events: 1

# Used together with num_events: alert when 1 event occurs within 1 minute
timeframe:
  minutes: 1

# Minimum time between two alerts for the same rule. Alerts that occur within this window are dropped. The default is one minute.
realert:
  minutes: 1

# Filter that selects the matching documents; an Elasticsearch query string, which supports AND, OR, and so on
# Here, per business needs: level is ERROR, type is production, and message contains neither "开始执行批量" nor "更新轨迹维护"
# For a simple match, query: "level: ERROR" alerts whenever level is ERROR
filter:
- query:
    query_string:
      query: "NOT message: (开始执行批量 OR 更新轨迹维护) AND level: ERROR AND type: production"
# Only the fields that are needed https://elastalert.readthedocs.io/en/latest/ruletypes.html#include
include: ["spanId", "level", "@timestamp", "_index", "source", "traceId","host","message"]

# Feishu alerter
alert:
- "elastalert_modules.feishu_alert.FeishuAlert"

# Feishu bot webhook base URL
feishualert_url: "https://open.feishu.cn/open-apis/bot/v2/hook/"

# Feishu bot id
feishualert_botid:
  "xxx-xxx-xxx"

# Alert title
feishualert_title:
  "Business log ERROR alert"

# No alerts between 00:00 and 08:00
feishualert_skip:
  start: "00:00:00"
  end: "08:00:00"

# Alert body; {} placeholders are filled from the matches
feishualert_body:
  "
  【Alert subject】: {feishualert_title}\n
  【Alert time】: {feishualert_time}\n
  【Environment】: 【production】\n
  【Module】: {source}\n
  【Index】: {_index}\n
  【Timestamp】: {@timestamp}\n
  【Log level】: {level}\n
  【spanId】: {spanId}\n
  【traceId】: {traceId}\n
  【host】: {host}\n
  【message】: {message}
  "
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
index: pord01-flymoonlog-pord01-flymoon-task-2025.*
use_strftime_index: true
num_events: 1
timeframe:
  minutes: 1
realert:
  minutes: 1
filter:
- query:
    query_string:
      query: "parsed_sys_info.log_level:error"
include: ["spanId", "level", "@timestamp", "_index", "source", "traceId","host","message"]
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
(venv) -bash-4.2# ps axjf|grep check_account.py
|
|||
|
|
15294 8347 8346 15294 pts/5 8346 S+ 0 0:00 \_ grep --color=auto check_account.py
|
|||
|
|
1 7961 7750 7750 ? -1 S 0 0:00 bash -c source /data/webapps/prod_check_tiktok_account/venv/bin/activate && nohup python /data/webapps/prod_check_tiktok_account/check_account.py > /data/webapps/prod_check_tiktok_account/output.log 2>&1 & echo $! > /data/webapps/prod_check_tiktok_account/check_account.pid // 保存 PID disown // 让?程彻底脱离 echo "? 生产环境?程已启动,PID: $(cat /data/webapps/prod_check_tiktok_account/check_account.pid)"
|
|||
|
|
7961 7964 7750 7750 ? -1 Sl 0 0:02 \_ python /data/webapps/prod_check_tiktok_account/check_account.py
|
|||
|
|
|
|||
|
|
|
|||
|
|
sit_fly_moon_web|sit_jenniefy_web|sit_scalelink_frontend|test_fly_moon_web|test_deeplink_merchant|test_deeplink_merchant_foreign|test_deeplink_merchant_saas|test_jenniefy_web|test_scalelink_frontend|oversea_test_jenniefy_web
|
|||
|
|
|
|||
|
|
deeplink_merchant|deeplink_merchant_saas|fly_moon_web|scalelink_frontend|oversea_jenniefy_web
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
{{ if $event.IsRecovered }}
{{- if ne $event.Cate "host"}}
{{end}}
**Recovered at:** {{timeformat $event.LastEvalTime}}
**Description:** **Recovered**
**Current process memory usage:** {{xxxx}}
**Current free system memory:** {{xxxx}}
{{- else }}
{{- if ne $event.Cate "host"}}
{{end}}
**Triggered at:** {{timeformat $event.TriggerTime}}
**Current process memory usage:** {{xxxx}}
**Current free system memory:** {{xxxx}}
{{if $event.RuleNote }}**Details:** **{{$event.RuleNote}}**{{end}}
{{- end -}}

{{$domain := "http://192.168.60.21:5601" }}
[Log details for the last hour]({{$domain}}{{$labels.url}})
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
process_resident_memory_bytes{instance="prod01-server", job="prod01-server-process-exporter"}
|
|||
|
|
|
|||
|
|
process_resident_memory_bytes{job='process-exporter',instance=~'$host',name=~'$process_name'}" | first | value | humanize1024
|
|||
|
|
|
|||
|
|
process_resident_memory_bytes{job='prod01-server-process-exporter',instance=~'prod01-server',name=~'prod-flymoon_email_prod'}" | first | value | humanize1024
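The `humanize1024` step at the end of the expression turns a raw byte count into a value with a binary prefix. The Python sketch below only approximates that idea for checking expected magnitudes; the exact rounding and suffixes produced by the real template function are an assumption here, and `humanize_1024` is a hypothetical name.

```python
def humanize_1024(value: float) -> str:
    """Render a number with binary prefixes (Ki, Mi, Gi, ...), four significant digits."""
    units = ["", "Ki", "Mi", "Gi", "Ti", "Pi"]
    size = float(value)
    for unit in units:
        # Stop at the first unit where the value fits, or at the largest unit.
        if abs(size) < 1024 or unit == units[-1]:
            return f"{size:.4g}{unit}"
        size /= 1024
```

For instance, a resident memory reading of 1610612736 bytes would render as `1.5Gi` with this sketch.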
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
pipeline {
    agent any

    environment {
        REMOTE_HOST = "43.130.56.138" // Remote server {params.REMOTE_HOST}
        REMOTE_PROJECT_PATH = "/data/webapps/lessie_sourcing_agents" // Remote Python project path
        VENV_DIR = "/data/webapps/lessie_sourcing_agents/venv" // Remote virtual environment directory
        CONDA_PATH = "/root/miniconda3/bin/conda" // Change to the actual Conda install path
    }

    stages {
        stage('Checkout code') {
            steps {
                git branch: "${params.Code_branch}", credentialsId: 'fly_gitlab_auth', url: 'http://172.24.16.20/python/lessie-sourcing-agents.git'
            }
        }

        stage('Stop process') {
            steps {
                echo("Stopping")
                sh "ssh ${REMOTE_HOST} 'sh /data/sh/kill_lessie_sourcing_agents.sh'"
            }
        }

        stage('Sync project') {
            steps {
                sh """
                ssh ${REMOTE_HOST} 'mkdir -p ${REMOTE_PROJECT_PATH}'
                rsync -avz --exclude 'venv' ${WORKSPACE}/ ${REMOTE_HOST}:${REMOTE_PROJECT_PATH}/
                """
            }
        }

        stage('Install dependencies') {
            steps {
                sh """
                ssh ${REMOTE_HOST} '
                cd ${REMOTE_PROJECT_PATH} &&
                source ~/.bashrc &&
                conda activate search &&
                source ${VENV_DIR}/bin/activate &&
                pip install --upgrade pip &&
                pip install -r requirements.txt
                '
                """
            }
        }

        stage('Start project') {
            steps {
                echo("Starting")
                sh """
                ssh ${REMOTE_HOST} '
                source ~/.bashrc &&
                conda activate search &&
                source ${VENV_DIR}/bin/activate &&
                TIMESTAMP=\$(date +"%Y%m%d_%H%M%S")
                nohup python /data/webapps/lessie_sourcing_agents/server.py > /data/webapps/lessie_sourcing_agents/logs/lessie_sourcing_agents_\${TIMESTAMP}.log 2>&1 &
                '
                """
            }
        }
    }

    post {
        success {
            echo '✅ Deployment succeeded!'
        }
        failure {
            echo '❌ Deployment failed, check the logs!'
        }
    }
}
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
{{/* Common header: always shows the alert type, the related machine, and the rule information */}}
**Alert type:** {{if $event.IsRecovered}}Memory alert - recovery notice{{else}}Memory alert - trigger notice{{end}}
**Machine:** {{$event.Attributes.host | default "unknown host"}}
**Alert rule:** {{$event.RuleName}}
{{if $event.RuleId}}**Rule ID:** {{$event.RuleId}}{{end}}

{{/* Recovery case: show the recovery time and the memory usage at recovery */}}
{{ if $event.IsRecovered }}
{{- if ne $event.Cate "host"}}{{end}}
**Recovered at:** {{timeformat $event.LastEvalTime "2006-01-02 15:04:05"}}
**Memory usage at recovery:** {{$event.TriggerValue | humanizeSize}}
**Description:** Machine memory usage has dropped below the threshold; the alert has recovered
{{if $event.RuleNote }}**Rule note:** {{$event.RuleNote}}{{end}}

{{/* Trigger case: show the trigger time, the send time, and the memory usage at trigger */}}
{{- else }}
{{- if ne $event.Cate "host"}}{{end}}
**Triggered at:** {{timeformat $event.TriggerTime "2006-01-02 15:04:05"}}
**Sent at:** {{timeformat $timestamp "2006-01-02 15:04:05"}}
**Memory usage at trigger:** {{$event.TriggerValue | humanizeSize}}
**Description:** Machine memory usage exceeded the configured threshold; alert triggered
{{if $event.RuleNote }}**Rule note:** {{$event.RuleNote}}{{end}}
{{if $event.Threshold}}**Alert threshold:** {{$event.Threshold | humanizeSize}}{{end}}
{{- end -}}
|