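1. Overview

This project uses the image docker.io/prom/prometheus:v2.38.0. In an offline environment, import it into containerd or a private image registry before deploying.

The Prometheus configuration lives in configmap/conf-prometheus, and the alerting rules live in configmap/conf-prometheus-rules.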
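
Every namespaced manifest for Prometheus itself in this guide targets the monitor-app namespace, which the guide itself never creates. If the namespace does not exist yet in your cluster, a minimal sketch of the manifest to apply first would be:

```yaml
# Assumption: monitor-app has not been created by another deployment.
apiVersion: v1
kind: Namespace
metadata:
  name: monitor-app
```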

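2. Import the RBAC rules

Prometheus discovers its scrape targets through the Kubernetes API, so it needs read access to nodes, services, endpoints, pods and ingresses. Apply the following manifest to create the RBAC rules.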

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources:
      - nodes
      - nodes/metrics
      - services
      - endpoints
      - pods
    verbs: ["get", "list", "watch"]
  - apiGroups:
      - extensions
      - networking.k8s.io
    resources:
      - ingresses
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics", "/metrics/cadvisor"]
    verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitor-app
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: monitor-app

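3. Create the persistent volume claim

Use the following manifest to create the PersistentVolumeClaim that holds the Prometheus data: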

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-prometheus
  namespace: monitor-app
  labels:
    app: prometheus
spec:
  storageClassName: 'sc-nfs-share'
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi

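4. Create the configuration

Use the following manifest to create the configuration files Prometheus needs. It contains two ConfigMaps: one for Prometheus itself and one for the alerting rules.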

apiVersion: v1
kind: ConfigMap
metadata:
  name: conf-prometheus
  namespace: monitor-app
  labels:
    app: prometheus
data:
  prometheus.yml: |
    global:
      scrape_interval: 10s
      evaluation_interval: 30s
      scrape_timeout: 10s
    alerting:
      alertmanagers:
        - static_configs:
            - targets:
                - svc-alert-manager.monitor-app.svc.cluster.local:9093
    rule_files:
      - "/etc/prometheus/rules/*.yaml"
    scrape_configs:
      - job_name: "prometheus"
        static_configs:
          - targets: [ "localhost:9090" ]
      - job_name: 'cluster-node-monitor'
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: kubernetes_pod_name

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: conf-prometheus-rules
  namespace: monitor-app
  labels:
    app: prometheus
data:
  sample.yaml: |
    groups:
      - name: "sample-config"
        rules:
          - alert: "PostgreSQLOffline"
            for: "0m"
            annotations:
              summary: "PostgreSQL availability alert raised for the server."
              description: "PostgreSQL component: PostgreSQL availability alert"
            expr: "pg_up == 0"
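
Further rule files can be added as extra keys in conf-prometheus-rules, since rule_files loads everything matching /etc/prometheus/rules/*.yaml. As a minimal sketch, the key below alerts when a target of the cluster-node-monitor job stops answering scrapes. The file name, group name, 5m hold time and severity label are illustrative assumptions, not part of the original setup; `up` is the metric Prometheus records for every scrape target.

```yaml
# Hypothetical extra key to place under `data:` in conf-prometheus-rules.
instance-down.yaml: |
  groups:
    - name: "instance-health"
      rules:
        - alert: "InstanceDown"
          # `up` is 0 when the last scrape of a target failed.
          expr: 'up{job="cluster-node-monitor"} == 0'
          # Assumed hold time, to avoid firing on a single failed scrape.
          for: "5m"
          labels:
            severity: "critical"
          annotations:
            summary: "Scrape target {{ $labels.instance }} is down."
```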

5. Deploy the StatefulSet

Use the following manifest to create the StatefulSet that runs Prometheus.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: sts-prometheus
  namespace: monitor-app
  labels:
    app: prometheus
spec:
  serviceName: svc-prometheus
  selector:
    matchLabels:
      app: prometheus
  replicas: 1
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      containers:
        - name: prometheus
          image: docker.io/prom/prometheus:v2.38.0
          volumeMounts:
            - name: prometheus-data
              mountPath: /prometheus
            - name: prometheus-conf
              mountPath: /etc/prometheus/
            - name: prometheus-rules
              mountPath: /etc/prometheus/rules/
      volumes:
        - name: prometheus-data
          persistentVolumeClaim:
            claimName: pvc-prometheus
        - name: prometheus-conf
          configMap:
            name: conf-prometheus
        - name: prometheus-rules
          configMap:
            name: conf-prometheus-rules
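
The container spec above declares no ports or health checks. Prometheus serves /-/healthy and /-/ready on its web port (9090 by default), so probes could be added to the container roughly as follows. This is a sketch only: the port name `web` and the timing values are assumptions, and the existing volumeMounts stay unchanged.

```yaml
# Fragment of spec.template.spec in sts-prometheus; merge into the existing container entry.
containers:
  - name: prometheus
    image: docker.io/prom/prometheus:v2.38.0
    ports:
      - name: web            # assumed port name
        containerPort: 9090
    readinessProbe:
      httpGet:
        path: /-/ready       # ready to serve queries
        port: web
      periodSeconds: 10      # assumed interval
    livenessProbe:
      httpGet:
        path: /-/healthy     # process is up
        port: web
      initialDelaySeconds: 30  # assumed startup allowance
      periodSeconds: 10
```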

6. Create the Service

Use the following manifest to create the Service for Prometheus:

apiVersion: v1
kind: Service
metadata:
  labels:
    app: prometheus
  name: svc-prometheus
  namespace: monitor-app
spec:
  ports:
    - port: 9090
      name: prometheus
  selector:
    app: prometheus

7. Create the Ingress

Once the Service exists, use the following manifest to expose it through an Ingress.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-prometheus
  namespace: monitor-app
  labels:
    app: prometheus
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/proxy-body-size: "0"
spec:
  ingressClassName: nginx-private
  tls:
    - hosts:
        - prometheus.internal.d7z.net
      secretName: tls-pri-d7z
  rules:
    - host: prometheus.internal.d7z.net
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: svc-prometheus
                port:
                  name: prometheus

8. Set up cluster node monitoring

After Prometheus is deployed, apply the following manifest to start monitoring the Kubernetes cluster nodes themselves.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: daemon-cluster-node-exporter
  namespace: kube-system
  labels:
    app: cluster-node-exporter
spec:
  selector:
    matchLabels:
      app: cluster-node-exporter
  template:
    metadata:
      labels:
        app: cluster-node-exporter
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '9100'
        prometheus.io/path: '/metrics'
    spec:
      containers:
        - name: node-exporter
          image: docker.io/prom/node-exporter:v1.3.1
          ports:
            - name: metrics
              containerPort: 9100
          args:
            - "--path.procfs=/host/proc"
            - "--path.sysfs=/host/sys"
            - "--path.rootfs=/host"
          volumeMounts:
            - name: dev
              mountPath: /host/dev
            - name: proc
              mountPath: /host/proc
            - name: sys
              mountPath: /host/sys
            - name: rootfs
              mountPath: /host
      volumes:
        - name: dev
          hostPath:
            path: /dev
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys
        - name: rootfs
          hostPath:
            path: /
      hostPID: true
      hostNetwork: true
      hostIPC: true
      tolerations:
        - operator: "Exists"

9. Verify

Once all manifests are applied, check the result with the following command:

kubectl get pods,svc,ingress,configmaps -n monitor-app -l app=prometheus

When everything is ready, open https://prometheus.internal.d7z.net to view the web UI.