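I. Troubleshooting a ServiceMonitor that cannot find its monitoring targets
… This approach is not specific to controller-manager and scheduler; any ServiceMonitor can be debugged with the same steps. …
Watchdog alert
This alert firing is expected and means things are working normally. It can be silenced and is not covered here.
1. Problem: there is no monitoring for controller-manager and scheduler.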
2. Check whether the ServiceMonitors were created
Confirm that both ServiceMonitors exist:
[root@k8s-master01 ~]# kubectl get servicemonitor -n monitoring kube-controller-manager kube-scheduler
NAME AGE
kube-controller-manager 14d
kube-scheduler 14d
[root@k8s-master01 ~]# kubectl get servicemonitor -n monitoring
NAME AGE
......
kube-controller-manager 15d
kube-scheduler 15d
......
# Created successfully
Both ServiceMonitors exist, so the objects themselves are not the problem.
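Since the same check applies to any ServiceMonitor, it can be wrapped in a small helper. The following is only a minimal sketch: check_servicemonitors is a hypothetical function name (not part of kubectl), and it assumes kubectl is already configured for the cluster.

```shell
# Sketch: report whether each named ServiceMonitor exists in a namespace.
# check_servicemonitors is a hypothetical helper, not a kubectl feature.
# Usage: check_servicemonitors <namespace> <name> [<name> ...]
check_servicemonitors() {
  local ns name rc
  ns="$1"
  shift
  rc=0
  for name in "$@"; do
    # kubectl exits non-zero when the object does not exist.
    if kubectl get servicemonitor -n "$ns" "$name" >/dev/null 2>&1; then
      echo "found: $name"
    else
      echo "missing: $name"
      rc=1
    fi
  done
  # Non-zero return if at least one ServiceMonitor was missing.
  return "$rc"
}
```

For the case in this article it would be called as: check_servicemonitors monitoring kube-controller-manager kube-scheduler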
3. Check whether the ServiceMonitor labels are configured correctly
This ServiceMonitor selects Services in the kube-system namespace that carry the label app.kubernetes.io/name=kube-controller-manager.
# kube-controller-manager
[root@k8s-master01 ~]# kubectl get servicemonitor -n monitoring kube-controller-manager -o yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
......
    port: https-metrics
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  jobLabel: app.kubernetes.io/name
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-controller-manager # label used for matching
---
# kube-scheduler
[root@k8s-master01 prometheus]# kubectl get servicemonitors.monitoring.coreos.com -n monitoring kube-scheduler -oyaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
.....
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-scheduler # label used for matching
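Rather than reading the selector out of the YAML by eye, it can be extracted and turned into a Service query. The sketch below is one possible way to do that, not a definitive tool: services_for_servicemonitor is a hypothetical helper name, and it assumes the ServiceMonitor uses selector.matchLabels (not matchExpressions) and lists its target namespaces under namespaceSelector.matchNames, as both objects above do.

```shell
# Sketch: list the Services that a ServiceMonitor's label selector matches.
# services_for_servicemonitor is a hypothetical helper, not a kubectl feature.
# Assumes selector.matchLabels and namespaceSelector.matchNames are both set.
# Usage: services_for_servicemonitor <servicemonitor-namespace> <servicemonitor-name>
services_for_servicemonitor() {
  local sm_ns sm_name selector namespaces ns
  sm_ns="$1"
  sm_name="$2"

  # Render matchLabels as "k1=v1,k2=v2," and then drop the trailing comma.
  selector="$(kubectl get servicemonitor -n "$sm_ns" "$sm_name" \
    -o go-template='{{range $k, $v := .spec.selector.matchLabels}}{{$k}}={{$v}},{{end}}')"
  selector="${selector%,}"

  # Space-separated list of namespaces the ServiceMonitor looks in.
  namespaces="$(kubectl get servicemonitor -n "$sm_ns" "$sm_name" \
    -o jsonpath='{.spec.namespaceSelector.matchNames[*]}')"

  for ns in $namespaces; do
    echo "namespace=$ns selector=$selector"
    # An empty result here means the ServiceMonitor has nothing to scrape.
    kubectl get svc -n "$ns" -l "$selector" --show-labels
  done
}
```

It would be called as: services_for_servicemonitor monitoring kube-controller-manager. If the ServiceMonitor has no matchLabels, the selector is empty and every Service in the namespace is listed, so the output should be read with that in mind.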
Next, look for a Service that carries this label. No Service has it, which is why the ServiceMonitor cannot find any targets to monitor.
[root@k8s-master01 ~]# kubectl get svc -n kube-system -l app.kubernetes.io/name=kube-controller-manager
No resources found in kube-system namespace.
# Listing every Service in kube-system also shows none for controller-manager
[root@k8s-master01 ~]# kubectl get svc -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
etcd-prom ClusterIP 10.96.191.9 <none> 2379/TCP 7d2h
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 201d
kubelet ClusterIP None <none> 10250/TCP,10255/TC