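1. Overview
This project uses the image docker.io/prom/prometheus:v2.38.0. In an offline environment, import it into containerd or a private image registry before deploying.
The Prometheus configuration is stored in configmap/conf-prometheus, and the alerting rules are stored in configmap/conf-prometheus-rules.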
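2. Import the RBAC rules
Prometheus needs to read resource information from the Kubernetes API for service discovery. Apply the following configuration to create the RBAC rules.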
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources:
      - nodes
      - nodes/metrics
      - services
      - endpoints
      - pods
    verbs: ["get", "list", "watch"]
  - apiGroups:
      - extensions
      - networking.k8s.io
    resources:
      - ingresses
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics", "/metrics/cadvisor"]
    verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitor-app
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: monitor-app
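To make the effect of the ClusterRole concrete, the following is a minimal sketch of how its resource rules are matched. It is a deliberately simplified model, not the Kubernetes authorizer: it ignores wildcards, resourceNames, and the nonResourceURLs rule, and the helper name `is_allowed` is hypothetical. It only illustrates that RBAC is additive, so anything not listed is denied.

```python
# Resource rules transcribed from the ClusterRole above (nonResourceURLs omitted).
RULES = [
    {
        "apiGroups": [""],
        "resources": ["nodes", "nodes/metrics", "services", "endpoints", "pods"],
        "verbs": ["get", "list", "watch"],
    },
    {
        "apiGroups": ["extensions", "networking.k8s.io"],
        "resources": ["ingresses"],
        "verbs": ["get", "list", "watch"],
    },
]


def is_allowed(api_group: str, resource: str, verb: str) -> bool:
    """Simplified check: a request passes if one rule lists all three parts."""
    return any(
        api_group in rule["apiGroups"]
        and resource in rule["resources"]
        and verb in rule["verbs"]
        for rule in RULES
    )


print(is_allowed("", "pods", "list"))                      # True
print(is_allowed("networking.k8s.io", "ingresses", "get"))  # True
print(is_allowed("", "secrets", "get"))                    # False
print(is_allowed("", "pods", "delete"))                    # False
```

The read-only verbs are enough for pod discovery, while secrets and write operations stay out of reach of the prometheus ServiceAccount.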
3. Create the persistent volume
Use the following configuration to create the persistent volume claim:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-prometheus
  namespace: monitor-app
  labels:
    app: prometheus
spec:
  storageClassName: 'sc-nfs-share'
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
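The 50Gi request can be sanity-checked with the capacity rule of thumb from the Prometheus storage documentation: retention time × ingested samples per second × bytes per sample, at roughly 1 to 2 bytes per sample. The sketch below assumes the default 15-day retention (this guide sets no retention flag) and a hypothetical load of 100,000 active series; neither figure is measured from a real cluster, so substitute your own numbers.

```python
SECONDS_PER_DAY = 86_400


def required_bytes(active_series: int, scrape_interval_s: float,
                   retention_days: float = 15,
                   bytes_per_sample: float = 2.0) -> float:
    """Estimate TSDB disk usage: retention * samples/second * bytes/sample."""
    samples_per_second = active_series / scrape_interval_s
    return retention_days * SECONDS_PER_DAY * samples_per_second * bytes_per_sample


# Hypothetical load: 100,000 series scraped every 10s (the interval used in this guide).
estimate = required_bytes(active_series=100_000, scrape_interval_s=10)
pvc_bytes = 50 * 2**30  # the 50Gi requested by pvc-prometheus

print(f"estimated usage: {estimate / 2**30:.1f} GiB")  # about 24.1 GiB
print(f"fits in the PVC: {estimate < pvc_bytes}")
```

Under these assumptions the claim has about twice the estimated space, which leaves headroom for the write-ahead log and compaction.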
4. Create the configuration
Use the following configuration to create the files Prometheus needs: the Prometheus configuration itself and the alerting rules.
apiVersion: v1
kind: ConfigMap
metadata:
  name: conf-prometheus
  namespace: monitor-app
  labels:
    app: prometheus
data:
  prometheus.yml: |
    global:
      scrape_interval: 10s
      evaluation_interval: 30s
      scrape_timeout: 10s
    alerting:
      alertmanagers:
        - static_configs:
            - targets:
                - svc-alert-manager.monitor-app.svc.cluster.local:9093
    rule_files:
      - "/etc/prometheus/rules/*.yaml"
    scrape_configs:
      - job_name: "prometheus"
        static_configs:
          - targets: [ "localhost:9090" ]
      - job_name: 'cluster-node-monitor'
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: kubernetes_pod_name
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: conf-prometheus-rules
  namespace: monitor-app
  labels:
    app: prometheus
data:
  sample.yaml: |
    groups:
      - name: "sample-config"
        rules:
          - alert: "PostgreSQL offline alert"
            for: "0m"
            annotations:
              summary: "The PostgreSQL availability check on the server raised an alert."
              description: "POSTGRESQL component - PostgreSQL availability alert"
            expr: "pg_up == 0"
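The address relabel rule in the cluster-node-monitor job is the least obvious part of this configuration. The following sketch reproduces it in Python under two facts about Prometheus relabeling: source label values are joined with `;` by default, and the regex must match the whole joined string. The function name `rewrite_address` and the pod IP are hypothetical, used only for illustration.

```python
import re

# Regex copied from the __address__ relabel rule in prometheus.yml.
ADDRESS_RULE = re.compile(r"([^:]+)(?::\d+)?;(\d+)")
SEPARATOR = ";"  # Prometheus' default separator between source_labels values


def rewrite_address(address: str, port_annotation: str) -> str:
    """Return __address__ after the `replace` rule has been applied."""
    joined = SEPARATOR.join([address, port_annotation])
    # Relabel regexes are anchored at both ends, hence fullmatch.
    match = ADDRESS_RULE.fullmatch(joined)
    if match is None:
        # No match: a `replace` action leaves the target label unchanged.
        return address
    # Equivalent of the `$1:$2` replacement: host plus annotated port.
    return match.expand(r"\1:\2")


print(rewrite_address("10.244.1.7:8080", "9100"))  # 10.244.1.7:9100
print(rewrite_address("10.244.1.7", "9100"))       # 10.244.1.7:9100
print(rewrite_address("10.244.1.7:8080", ""))      # 10.244.1.7:8080
```

The last call models a pod without a prometheus.io/port annotation: the missing label is an empty string, the regex does not match, and the discovered address is kept as is.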
5. Import the startup configuration
Use the following configuration to create the StatefulSet that runs Prometheus.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: sts-prometheus
  namespace: monitor-app
  labels:
    app: prometheus
spec:
  serviceName: svc-prometheus
  selector:
    matchLabels:
      app: prometheus
  replicas: 1
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      containers:
        - name: prometheus
          image: docker.io/prom/prometheus:v2.38.0
          volumeMounts:
            - name: prometheus-data
              mountPath: /prometheus
            - name: prometheus-conf
              mountPath: /etc/prometheus/
            - name: prometheus-rules
              mountPath: /etc/prometheus/rules/
      volumes:
        - name: prometheus-data
          persistentVolumeClaim:
            claimName: pvc-prometheus
        - name: prometheus-conf
          configMap:
            name: conf-prometheus
        - name: prometheus-rules
          configMap:
            name: conf-prometheus-rules
6. Create the Service
Use the following configuration to create the corresponding Service:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: prometheus
  name: svc-prometheus
  namespace: monitor-app
spec:
  ports:
    - port: 9090
      name: prometheus
  selector:
    app: prometheus
7. Create the Ingress
Once the Service has been created, use the following configuration to expose it through an Ingress.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-prometheus
  namespace: monitor-app
  labels:
    app: prometheus
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/proxy-body-size: "0"
spec:
  ingressClassName: nginx-private
  tls:
    - hosts:
        - prometheus.internal.d7z.net
      secretName: tls-pri-d7z
  rules:
    - host: prometheus.internal.d7z.net
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: svc-prometheus
                port:
                  name: prometheus
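8. Set up cluster node monitoring
Once Prometheus is deployed, apply the following configuration to enable monitoring of the Kubernetes cluster nodes themselves.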
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: daemon-cluster-node-exporter
  namespace: kube-system
  labels:
    app: cluster-node-exporter
spec:
  selector:
    matchLabels:
      app: cluster-node-exporter
  template:
    metadata:
      labels:
        app: cluster-node-exporter
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '9100'
        prometheus.io/path: 'metrics'
    spec:
      containers:
        - name: node-exporter
          image: docker.io/prom/node-exporter:v1.3.1
          ports:
            - name: metrics
              containerPort: 9100
          args:
            - "--path.procfs=/host/proc"
            - "--path.sysfs=/host/sys"
            - "--path.rootfs=/host"
          volumeMounts:
            - name: dev
              mountPath: /host/dev
            - name: proc
              mountPath: /host/proc
            - name: sys
              mountPath: /host/sys
            - name: rootfs
              mountPath: /host
      volumes:
        - name: dev
          hostPath:
            path: /dev
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys
        - name: rootfs
          hostPath:
            path: /
      hostPID: true
      hostNetwork: true
      hostIPC: true
      tolerations:
        - operator: "Exists"
9. Test
After all the configuration has been applied, check the result with the following command:
kubectl get pods,svc,ingress,configmaps -n monitor-app -l app=prometheus
Once everything is ready, open https://prometheus.internal.d7z.net to view the web UI.
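Beyond the web UI, scrape health can be checked through the Prometheus HTTP API, whose `/api/v1/targets` endpoint lists every active target with its `health` state. The sketch below is one possible way to summarize that response per job. The helper names `fetch_targets` and `summarize` are hypothetical, and the sample payload is hand-written in the shape of a real response (with documentation-range IP addresses) so the block runs without a cluster.

```python
import json
import urllib.request
from collections import Counter


def fetch_targets(base_url: str) -> dict:
    """Download the target list from a running Prometheus (not called below)."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=10) as resp:
        return json.load(resp)


def summarize(payload: dict) -> dict:
    """Count active targets per (job, health) pair."""
    counts = Counter()
    for target in payload["data"]["activeTargets"]:
        counts[(target["labels"]["job"], target["health"])] += 1
    return dict(counts)


# Hand-written sample in the shape of an /api/v1/targets response.
SAMPLE = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "prometheus", "instance": "localhost:9090"},
             "health": "up"},
            {"labels": {"job": "cluster-node-monitor", "instance": "192.0.2.10:9100"},
             "health": "up"},
            {"labels": {"job": "cluster-node-monitor", "instance": "192.0.2.11:9100"},
             "health": "down"},
        ],
        "droppedTargets": [],
    },
}

for (job, health), count in sorted(summarize(SAMPLE).items()):
    print(f"{job}: {count} target(s) {health}")
```

Against the deployment in this guide, the call would be `summarize(fetch_targets("https://prometheus.internal.d7z.net"))`, assuming the client trusts the certificate served by the Ingress. Any target reported as `down` for cluster-node-monitor points at a node where the exporter is not reachable on port 9100.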