Prometheus

Introduction

Prometheus is an open-source monitoring solution for metrics and alerting.

Chart

Version 5.0.13 of the public Prometheus-operator Helm chart is used to deploy Prometheus on the cluster. The StakaterKubeHelmMonitoring repository is used for deployment.

Image Issue

None. Image specifications are given below:

repository: quay.io/prometheus/prometheus
tag: v2.7.2

Cherry Pickable

No, because it is deployed with the Prometheus-Operator Helm chart. It can, however, be deployed using its own Helm chart.

Single Sign On

Applicable but not supported.

Installation

It is deployed by the pipeline of the StakaterKubeHelmMonitoring repository.

Dependencies

It requires the Helm operator to be running in the cluster.

Chart Information

It is part of the prometheus-operator chart.

repository: https://kubernetes-charts.storage.googleapis.com
name: prometheus-operator
version: 5.0.13
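
Because deployment depends on a Helm operator, the chart coordinates above would typically be referenced from a HelmRelease resource. The following is a minimal sketch, assuming the Flux Helm operator and its `flux.weave.works/v1beta1` API. The resource name, release name, and namespace are hypothetical and are not taken from the StakaterKubeHelmMonitoring repository.

```yaml
# Sketch only: metadata and releaseName are hypothetical placeholders.
apiVersion: flux.weave.works/v1beta1
kind: HelmRelease
metadata:
  name: prometheus-operator        # hypothetical resource name
  namespace: monitoring            # hypothetical namespace
spec:
  releaseName: prometheus-operator # hypothetical release name
  chart:
    # Chart coordinates as listed in this section
    repository: https://kubernetes-charts.storage.googleapis.com
    name: prometheus-operator
    version: 5.0.13
  values: {}                       # the hard-coded values would be placed here
```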

Hard-coded values

Hard coded values for Prometheus are given below:

The following configures additional Prometheus rules:

additionalPrometheusRules:
- name: fluentd-rules
  additionalLabels:
    kind: infra
  groups:
  - name: Fluentd
    rules:
    - alert: IncreasedFluentdRetryWait
      expr: max_over_time(fluentd_output_status_retry_wait[1m]) > 1000
      for: 20s
      labels:
        severity: critical    
        kind: infra            
      annotations:
        description: 'Fluentd Output Status Retry Wait has increased from 1000 in 1 minute'
        summary: Retry Wait Increased
    - alert: IncreasedFluentdRetryCount
      expr: rate(fluentd_output_status_retry_count[1m]) > 0.5
      for: 20s
      labels:
        severity: critical     
        kind: infra           
      annotations:
        description: 'Rate of Fluentd Output Retry Count has increased from 0.5 in 1m'
        summary: Retry Wait Increased
    - alert: IncreasedFluentdOutputBufferLength
      expr: max_over_time(fluentd_output_status_buffer_queue_length[1m]) > 500
      for: 10s
      labels:
        severity: critical
        kind: infra
      annotations:
        description: 'Fluentd Output Status Buffer Queue length has increased from 500.'
        summary: Fluentd Buffer Queue length Increased
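
One way to sanity-check rules like these is a `promtool test rules` unit test. The sketch below assumes that the `groups:` section above has been saved on its own, as a top-level key, in a plain rule file named `fluentd-rules.yaml` (a hypothetical file name), and that the input series carries a hypothetical `pod` label. It checks that a queue length held at 600 causes IncreasedFluentdOutputBufferLength to fire.

```yaml
# Sketch only: the file name and the pod label are assumptions.
rule_files:
  - fluentd-rules.yaml             # hypothetical file holding the `groups:` section

evaluation_interval: 10s

tests:
  - interval: 10s                  # spacing between input samples
    input_series:
      # Queue length held constant at 600, above the 500 threshold
      - series: 'fluentd_output_status_buffer_queue_length{pod="fluentd-0"}'
        values: '600+0x12'
    alert_rule_test:
      # After 1 minute the 10s `for` duration has elapsed
      - eval_time: 1m
        alertname: IncreasedFluentdOutputBufferLength
        exp_alerts:
          - exp_labels:
              severity: critical
              kind: infra
              pod: fluentd-0
            exp_annotations:
              description: 'Fluentd Output Status Buffer Queue length has increased from 500.'
              summary: Fluentd Buffer Queue length Increased
```

Assuming the test is saved as `fluentd-rules-test.yaml` (also a hypothetical name), it could be run with `promtool test rules fluentd-rules-test.yaml`.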

The following configures additional service monitors:

prometheus:
  # Adding additional service monitors
  additionalServiceMonitors:
  - name: monitoring-fluentd
    jobLabel: k8s-app
    selector:
      matchLabels:
        app.kubernetes.io/name: fluentd-elasticsearch
    namespaceSelector:
      matchNames:
        - logging
    endpoints:
    - port: monitor-agent
      scheme: http
      interval: 20s
      path: /metrics

  - name: external-ingress
    jobLabel: k8s-app
    selector:
      matchLabels:
        k8s-app: external-ingress             
    namespaceSelector:
      matchNames:
        - global
    endpoints:
    - port: metrics
      interval: 30s
    
  - name: internal-ingress
    jobLabel: k8s-app
    selector:
      matchLabels:
        k8s-app: internal-ingress
    namespaceSelector:
      matchNames:
        - global 
    endpoints:
    - port: metrics
      interval: 30s
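
A ServiceMonitor only produces scrape targets when a Service matches its label selector, namespace selector, and named port. As an illustration, the sketch below shows a Service that the `monitoring-fluentd` monitor would select. The Service name, the pod selector, and the port number 24231 are assumptions and are not taken from the source.

```yaml
# Sketch only: name, pod selector and port number are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: fluentd-metrics                             # hypothetical name
  namespace: logging                                # listed in namespaceSelector.matchNames
  labels:
    app.kubernetes.io/name: fluentd-elasticsearch   # matched by selector.matchLabels
spec:
  selector:
    app.kubernetes.io/name: fluentd-elasticsearch   # assumed pod label
  ports:
  - name: monitor-agent                             # must equal endpoints[].port
    port: 24231                                     # assumed metrics port
    targetPort: 24231
```

Because this Service carries no `k8s-app` label, the `jobLabel: k8s-app` setting would have nothing to read, and the job name would be expected to fall back to the Service name.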

The following configures the service and its annotations:

prometheus:
  service:
    labels:
      expose: "true"
    annotations:
      config.xposer.stakater.com/Domain: stakater.com
      config.xposer.stakater.com/IngressNameTemplate: '{{.Service}}-{{.Namespace}}'
      config.xposer.stakater.com/IngressURLPath: /
      config.xposer.stakater.com/IngressURLTemplate: 'prometheus.{{.Namespace}}.{{.Domain}}'
      xposer.stakater.com/annotations: |-
        kubernetes.io/ingress.class: external-ingress
        ingress.kubernetes.io/rewrite-target: /
        ingress.kubernetes.io/force-ssl-redirect: true
        forecastle.stakater.com/expose: true
        forecastle.stakater.com/icon: https://raw.githubusercontent.com/stakater/ForecastleIcons/master/prometheus.png
        forecastle.stakater.com/appName: Prometheus
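
To make the templates above concrete, the sketch below shows an Ingress of the shape Xposer might generate from these annotations. It assumes a Service named `prometheus-svc` in a namespace named `monitoring`, listening on port 9090. All three are hypothetical, as are the Ingress API version and the exact output of Xposer.

```yaml
# Sketch only: Service name, namespace, port and apiVersion are assumptions.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: prometheus-svc-monitoring          # '{{.Service}}-{{.Namespace}}'
  namespace: monitoring
  annotations:
    # Copied from the xposer.stakater.com/annotations block
    kubernetes.io/ingress.class: external-ingress
    ingress.kubernetes.io/rewrite-target: /
    ingress.kubernetes.io/force-ssl-redirect: "true"
    forecastle.stakater.com/expose: "true"
    forecastle.stakater.com/icon: https://raw.githubusercontent.com/stakater/ForecastleIcons/master/prometheus.png
    forecastle.stakater.com/appName: Prometheus
spec:
  rules:
  - host: prometheus.monitoring.stakater.com   # 'prometheus.{{.Namespace}}.{{.Domain}}'
    http:
      paths:
      - path: /                                # IngressURLPath
        backend:
          serviceName: prometheus-svc
          servicePort: 9090
```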