Building a Production-Grade Observability Stack on Kubernetes with Prometheus, Grafana, and Loki

Observability is no longer optional for production Kubernetes environments. As microservices architectures grow in complexity, the ability to understand system behavior through metrics, logs, and traces becomes critical for maintaining reliability and reducing mean time to resolution (MTTR).

This article walks through deploying a complete observability stack on Kubernetes using Prometheus for metrics, Grafana for visualization, and Loki for log aggregation. We’ll cover high-availability configurations, persistent storage, alerting, and best practices for production deployments.

Prerequisites

Before starting, ensure you have the following (a quick verification is shown after the list):

  • Kubernetes cluster (1.25+) with at least 3 worker nodes
  • kubectl configured with cluster admin access
  • Helm 3.x installed
  • Storage class configured for persistent volumes
  • Minimum 8GB RAM and 4 vCPUs per node for production workloads
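
A quick sanity check of the tooling and storage class:

kubectl version
helm version
kubectl get nodes
kubectl get storageclass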

Step 1: Create Dedicated Namespace

Isolate observability components in a dedicated namespace. The privileged Pod Security Standard label is needed because node-exporter and Promtail use host networking, host PID, and hostPath volume mounts:

kubectl create namespace observability

kubectl label namespace observability \
  monitoring=enabled \
  pod-security.kubernetes.io/enforce=privileged
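
Verify the namespace and its labels:

kubectl get namespace observability --show-labels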

Step 2: Deploy Prometheus with High Availability

We’ll use the kube-prometheus-stack Helm chart, which includes Prometheus Operator, Alertmanager, and common exporters.

Add Helm Repository

helm repo add prometheus-community \
  https://prometheus-community.github.io/helm-charts
helm repo update

Create Values File

# prometheus-values.yaml
prometheus:
  prometheusSpec:
    replicas: 2
    retention: 30d
    retentionSize: 40GB
    
    resources:
      requests:
        cpu: 500m
        memory: 2Gi
      limits:
        cpu: 2000m
        memory: 8Gi
    
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: gp3
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
    
    podAntiAffinity: hard
    
    additionalScrapeConfigs:
    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__

alertmanager:
  alertmanagerSpec:
    replicas: 3
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: gp3
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi
    
    podAntiAffinity: hard

  config:
    global:
      resolve_timeout: 5m
      slack_api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
    
    route:
      group_by: ['alertname', 'namespace', 'severity']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 4h
      receiver: 'slack-notifications'
      routes:
      - match:
          severity: critical
        receiver: 'slack-critical'
        repeat_interval: 1h
      - match:
          severity: warning
        receiver: 'slack-notifications'
    
    receivers:
    - name: 'slack-notifications'
      slack_configs:
      - channel: '#alerts'
        send_resolved: true
        title: '{{ .Status | toUpper }}: {{ .CommonLabels.alertname }}'
        text: >-
          {{ range .Alerts }}
          *Namespace:* {{ .Labels.namespace }}
          *Pod:* {{ .Labels.pod }}
          *Description:* {{ .Annotations.description }}
          {{ end }}
    
    - name: 'slack-critical'
      slack_configs:
      - channel: '#alerts-critical'
        send_resolved: true

nodeExporter:
  enabled: true

kubeStateMetrics:
  enabled: true

grafana:
  enabled: true
  replicas: 2
  
  persistence:
    enabled: true
    storageClassName: gp3
    size: 10Gi
  
  adminPassword: "CHANGE_ME_SECURE_PASSWORD"  # prefer admin.existingSecret over an inline password in production
  
  datasources:
    datasources.yaml:
      apiVersion: 1
      datasources:
      - name: Prometheus
        type: prometheus
        url: http://prometheus-kube-prometheus-prometheus:9090
        access: proxy
        isDefault: true
      - name: Loki
        type: loki
        url: http://loki-gateway.observability.svc.cluster.local
        access: proxy
  
  dashboardProviders:
    dashboardproviders.yaml:
      apiVersion: 1
      providers:
      - name: 'default'
        orgId: 1
        folder: ''
        type: file
        disableDeletion: false
        editable: true
        options:
          path: /var/lib/grafana/dashboards/default
  
  dashboards:
    default:
      kubernetes-cluster:
        gnetId: 7249
        revision: 1
        datasource: Prometheus
      node-exporter:
        gnetId: 1860
        revision: 31
        datasource: Prometheus
      kubernetes-pods:
        gnetId: 6417
        revision: 1
        datasource: Prometheus

  ingress:
    enabled: true
    ingressClassName: nginx
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-prod
    hosts:
      - grafana.example.com
    tls:
      - secretName: grafana-tls
        hosts:
          - grafana.example.com
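
The additionalScrapeConfigs job above only scrapes pods that opt in through annotations. A workload enables scraping by annotating its pod template; a minimal, hypothetical example (the image name and port are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: annotated-example
  namespace: production
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: "/metrics"
    prometheus.io/port: "9090"
spec:
  containers:
  - name: app
    image: example/app:latest  # hypothetical image exposing /metrics on 9090
    ports:
    - containerPort: 9090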

Install Prometheus Stack

helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace observability \
  --values prometheus-values.yaml \
  --version 55.5.0

Verify Deployment

kubectl get pods -n observability -l app.kubernetes.io/name=prometheus

kubectl get pods -n observability -l app.kubernetes.io/name=alertmanager
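
You can also port-forward the Prometheus service and browse http://localhost:9090/targets to confirm scrape targets are up (the service name below follows this chart and release name; adjust it if you used a different release name):

kubectl port-forward -n observability svc/prometheus-kube-prometheus-prometheus 9090:9090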

Step 3: Deploy Loki for Log Aggregation

Loki provides cost-effective log aggregation by indexing only metadata (labels) rather than full log content.

Create Loki Values File

# loki-values.yaml
loki:
  auth_enabled: false
  
  commonConfig:
    replication_factor: 3
    path_prefix: /var/loki
  
  storage:
    type: s3
    bucketNames:
      chunks: loki-chunks-bucket
      ruler: loki-ruler-bucket
      admin: loki-admin-bucket
    s3:
      endpoint: s3.us-east-1.amazonaws.com
      region: us-east-1
      secretAccessKey: ${AWS_SECRET_ACCESS_KEY}  # expanded at runtime (requires Loki's -config.expand-env flag); prefer IAM roles (IRSA) where possible
      accessKeyId: ${AWS_ACCESS_KEY_ID}
      s3ForcePathStyle: false
      insecure: false
  
  schemaConfig:
    configs:
    - from: 2024-01-01
      store: tsdb
      object_store: s3
      schema: v13
      index:
        prefix: loki_index_
        period: 24h
  
  limits_config:
    retention_period: 744h  # 31 days
    ingestion_rate_mb: 10
    ingestion_burst_size_mb: 20
    max_streams_per_user: 10000
    max_line_size: 256kb
  
  compactor:
    working_directory: /var/loki/compactor
    delete_request_store: s3  # Loki 3.x replaces the compactor's shared_store with delete_request_store
    compaction_interval: 10m
    retention_enabled: true
    retention_delete_delay: 2h

deploymentMode: Distributed

ingester:
  replicas: 3
  persistence:
    enabled: true
    size: 10Gi
    storageClass: gp3
  
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: 2000m
      memory: 4Gi

distributor:
  replicas: 3
  resources:
    requests:
      cpu: 250m
      memory: 512Mi
    limits:
      cpu: 1000m
      memory: 1Gi

querier:
  replicas: 3
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: 2000m
      memory: 4Gi

queryFrontend:
  replicas: 2
  resources:
    requests:
      cpu: 250m
      memory: 512Mi
    limits:
      cpu: 1000m
      memory: 1Gi

queryScheduler:
  replicas: 2

compactor:
  replicas: 1
  persistence:
    enabled: true
    size: 10Gi
    storageClass: gp3

gateway:
  replicas: 2
  ingress:
    enabled: true  # auth_enabled is false above, so protect this endpoint (basic auth, IP allow-listing, or keep it cluster-internal)
    ingressClassName: nginx
    hosts:
      - host: loki.example.com
        paths:
          - path: /
            pathType: Prefix
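
The S3 buckets referenced above must exist before the install; with the AWS CLI, for example:

aws s3 mb s3://loki-chunks-bucket --region us-east-1
aws s3 mb s3://loki-ruler-bucket --region us-east-1
aws s3 mb s3://loki-admin-bucket --region us-east-1

For production, prefer IAM roles for service accounts (IRSA) over static access keys.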

Install Loki

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

Note that the deploymentMode: Distributed layout and the per-component values above require version 6.x of the grafana/loki chart; 5.x releases only support the single-binary and simple-scalable modes. Pin to a current 6.x release, for example:

helm install loki grafana/loki \
  --namespace observability \
  --values loki-values.yaml \
  --version 6.6.2
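
Verify that all Loki components are running (the label selector assumes the chart's standard labels):

kubectl get pods -n observability -l app.kubernetes.io/name=loki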

Step 4: Deploy Promtail for Log Collection

Promtail runs as a DaemonSet to collect logs from all nodes and forward them to Loki.

# promtail-values.yaml
config:
  clients:
    - url: http://loki-gateway.observability.svc.cluster.local/loki/api/v1/push
      tenant_id: default
  
  snippets:
    pipelineStages:
    - cri: {}
    - multiline:
        firstline: '^\d{4}-\d{2}-\d{2}'
        max_wait_time: 3s
    - json:
        expressions:
          level: level
          msg: msg
          timestamp: timestamp
    - labels:
        level:
    - timestamp:
        source: timestamp
        format: RFC3339

    scrapeConfigs: |
      - job_name: kubernetes-pods
        pipeline_stages:
          {{- toYaml .Values.config.snippets.pipelineStages | nindent 8 }}
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels:
              - __meta_kubernetes_pod_controller_name
            regex: ([0-9a-z-.]+?)(-[0-9a-f]{8,10})?
            action: replace
            target_label: __tmp_controller_name
          - source_labels:
              - __meta_kubernetes_pod_label_app_kubernetes_io_name
              - __meta_kubernetes_pod_label_app
              - __tmp_controller_name
              - __meta_kubernetes_pod_name
            regex: ^;*([^;]+)(;.*)?$
            action: replace
            target_label: app
          - source_labels:
              - __meta_kubernetes_pod_label_app_kubernetes_io_instance
              - __meta_kubernetes_pod_label_instance
            regex: ^;*([^;]+)(;.*)?$
            action: replace
            target_label: instance
          - source_labels:
              - __meta_kubernetes_pod_label_app_kubernetes_io_component
              - __meta_kubernetes_pod_label_component
            regex: ^;*([^;]+)(;.*)?$
            action: replace
            target_label: component
          - action: replace
            source_labels:
              - __meta_kubernetes_pod_node_name
            target_label: node_name
          - action: replace
            source_labels:
              - __meta_kubernetes_namespace
            target_label: namespace
          - action: replace
            replacement: $1
            separator: /
            source_labels:
              - namespace
              - app
            target_label: job
          - action: replace
            source_labels:
              - __meta_kubernetes_pod_name
            target_label: pod
          - action: replace
            source_labels:
              - __meta_kubernetes_pod_container_name
            target_label: container
          - action: replace
            replacement: /var/log/pods/*$1/*.log
            separator: /
            source_labels:
              - __meta_kubernetes_pod_uid
              - __meta_kubernetes_pod_container_name
            target_label: __path__
          - action: replace
            regex: true/(.*)
            replacement: /var/log/pods/*$1/*.log
            separator: /
            source_labels:
              - __meta_kubernetes_pod_annotationpresent_kubernetes_io_config_hash
              - __meta_kubernetes_pod_annotation_kubernetes_io_config_hash
              - __meta_kubernetes_pod_container_name
            target_label: __path__

daemonset:
  enabled: true

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi

tolerations:
  - key: node-role.kubernetes.io/master
    operator: Exists
    effect: NoSchedule
  - key: node-role.kubernetes.io/control-plane
    operator: Exists
    effect: NoSchedule

Install Promtail

helm install promtail grafana/promtail \
  --namespace observability \
  --values promtail-values.yaml \
  --version 6.15.3
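
Confirm the DaemonSet has a pod on every node (the resource name assumes the release name used above):

kubectl get daemonset -n observability promtail

Then, in Grafana's Explore view with the Loki datasource selected, a simple LogQL selector such as {namespace="observability"} should start returning log lines within a minute or two.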

Step 5: Configure Custom Alerts

Create PrometheusRule resources for critical alerts:

# custom-alerts.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: custom-application-alerts
  namespace: observability
  labels:
    release: prometheus
spec:
  groups:
  - name: application.rules
    rules:
    - alert: HighErrorRate
      expr: |
        (
          sum(rate(http_requests_total{status=~"5.."}[5m])) by (namespace, service)
          /
          sum(rate(http_requests_total[5m])) by (namespace, service)
        ) > 0.05
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "High error rate detected"
        description: "Service {{ $labels.service }} in namespace {{ $labels.namespace }} has error rate of {{ $value | humanizePercentage }}"
    
    - alert: HighLatency
      expr: |
        histogram_quantile(0.95, 
          sum(rate(http_request_duration_seconds_bucket[5m])) by (le, namespace, service)
        ) > 1
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "High latency detected"
        description: "Service {{ $labels.service }} p95 latency is {{ $value | humanizeDuration }}"
    
    - alert: PodCrashLooping
      expr: |
        increase(kube_pod_container_status_restarts_total[1h]) > 5
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "Pod crash looping"
        description: "Pod {{ $labels.namespace }}/{{ $labels.pod }} has restarted {{ $value }} times in the last hour"
    
    - alert: PersistentVolumeUsageHigh
      expr: |
        (
          kubelet_volume_stats_used_bytes
          /
          kubelet_volume_stats_capacity_bytes
        ) > 0.85
      for: 15m
      labels:
        severity: warning
      annotations:
        summary: "PV usage high"
        description: "PersistentVolume {{ $labels.persistentvolumeclaim }} is {{ $value | humanizePercentage }} full"

  - name: infrastructure.rules
    rules:
    - alert: NodeMemoryPressure
      expr: |
        (
          node_memory_MemAvailable_bytes
          /
          node_memory_MemTotal_bytes
        ) < 0.1
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Node memory pressure"
        description: "Node {{ $labels.instance }} has only {{ $value | humanizePercentage }} memory available"
    
    - alert: NodeDiskPressure
      expr: |
        (
          node_filesystem_avail_bytes{mountpoint="/"}
          /
          node_filesystem_size_bytes{mountpoint="/"}
        ) < 0.1
      for: 10m
      labels:
        severity: critical
      annotations:
        summary: "Node disk pressure"
        description: "Node {{ $labels.instance }} has only {{ $value | humanizePercentage }} disk space available"
    
    - alert: NodeCPUHigh
      expr: |
        100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "High CPU usage"
        description: "Node {{ $labels.instance }} CPU usage is {{ $value | humanize }}%"

Apply the alerts:

kubectl apply -f custom-alerts.yaml
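
Confirm the rule object exists; the Prometheus Operator reloads matching rules automatically, and they appear under Status > Rules in the Prometheus UI shortly afterwards:

kubectl get prometheusrules -n observability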

Step 6: Create Custom Grafana Dashboard

Create a ConfigMap with a custom dashboard for application metrics:

# application-dashboard.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: application-dashboard
  namespace: observability
  labels:
    grafana_dashboard: "1"
data:
  application-overview.json: |
    {
      "annotations": {
        "list": []
      },
      "editable": true,
      "fiscalYearStartMonth": 0,
      "graphTooltip": 0,
      "id": null,
      "links": [],
      "liveNow": false,
      "panels": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "prometheus"
          },
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "palette-classic"
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {"color": "green", "value": null},
                  {"color": "yellow", "value": 0.01},
                  {"color": "red", "value": 0.05}
                ]
              },
              "unit": "percentunit"
            }
          },
          "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
          "id": 1,
          "options": {
            "colorMode": "value",
            "graphMode": "area",
            "justifyMode": "auto",
            "orientation": "auto",
            "reduceOptions": {
              "calcs": ["lastNotNull"],
              "fields": "",
              "values": false
            },
            "textMode": "auto"
          },
          "targets": [
            {
              "expr": "sum(rate(http_requests_total{status=~\"5..\"}[5m])) / sum(rate(http_requests_total[5m]))",
              "refId": "A"
            }
          ],
          "title": "Error Rate",
          "type": "stat"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "prometheus"
          },
          "fieldConfig": {
            "defaults": {
              "color": {"mode": "palette-classic"},
              "unit": "reqps"
            }
          },
          "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0},
          "id": 2,
          "targets": [
            {
              "expr": "sum(rate(http_requests_total[5m])) by (service)",
              "legendFormat": "{{service}}",
              "refId": "A"
            }
          ],
          "title": "Requests per Second",
          "type": "timeseries"
        }
      ],
      "schemaVersion": 38,
      "style": "dark",
      "tags": ["application", "custom"],
      "templating": {"list": []},
      "time": {"from": "now-1h", "to": "now"},
      "title": "Application Overview",
      "uid": "app-overview"
    }
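
Apply the ConfigMap; the Grafana dashboard sidecar bundled with kube-prometheus-stack watches for the grafana_dashboard label and loads the dashboard automatically:

kubectl apply -f application-dashboard.yaml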

Step 7: ServiceMonitor for Application Metrics

Enable Prometheus to scrape your application metrics:

# application-servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: application-metrics
  namespace: observability
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      monitoring: enabled
  namespaceSelector:
    matchNames:
      - production
      - staging
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics
    scheme: http

Add labels to your application service:

apiVersion: v1
kind: Service
metadata:
  name: api-service
  namespace: production
  labels:
    monitoring: enabled
spec:
  ports:
  - name: http
    port: 8080
  - name: metrics
    port: 9090
  selector:
    app: api-service
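
Apply the ServiceMonitor (the Service itself normally ships with the application's own manifests), then check that the new endpoints appear under Status > Targets in the Prometheus UI:

kubectl apply -f application-servicemonitor.yaml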

Production Best Practices

Resource Planning

Component      | Min Replicas | CPU Request | Memory Request | Storage
Prometheus     | 2            | 500m        | 2Gi            | 50Gi
Alertmanager   | 3            | 100m        | 256Mi          | 10Gi
Grafana        | 2            | 250m        | 512Mi          | 10Gi
Loki Ingester  | 3            | 500m        | 1Gi            | 10Gi
Loki Querier   | 3            | 500m        | 1Gi            | -
Promtail       | DaemonSet    | 100m        | 128Mi          | -

Retention Policies

# Prometheus: Balance storage cost with query needs
retention: 30d
retentionSize: 40GB

# Loki: Configure compactor for automatic cleanup
limits_config:
  retention_period: 744h  # 31 days
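
To see how close Prometheus is to the retentionSize cap, query its own TSDB metrics; for example, this query reports the total on-disk block size in bytes:

prometheus_tsdb_storage_blocks_bytes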

Security Hardening

# Network Policy for Prometheus
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: prometheus-network-policy
  namespace: observability
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: prometheus
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          monitoring: enabled
    ports:
    - protocol: TCP
      port: 9090
  egress:
  - to:
    - namespaceSelector: {}
    ports:
    - protocol: TCP
      port: 9090
    - protocol: TCP
      port: 443
  # Allow DNS lookups; also open the ports your scrape targets actually expose
  # (for example kubelet 10250, node-exporter 9100, kube-state-metrics 8080).
  - to:
    - namespaceSelector: {}
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
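
Apply the policy (the filename is assumed here) and spot-check that scrape targets stay up afterwards; if targets go down, the egress rules are too narrow for your exporters:

kubectl apply -f prometheus-network-policy.yaml
kubectl get networkpolicy -n observability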
