Skip to main content

Kubernetes

Collect metrics from your Kubernetes cluster — node resource usage, pod CPU/memory, deployment health, persistent volume status, and more.

Pattern: kube-state-metrics + kubelet cAdvisor → Prometheus scrape → xScaler remote_write


Dashboards

Overview

Kubernetes Global Overview dashboard

Kubernetes Global Overview dashboard 2

Pods

Kubernetes Pods dashboard

Kubernetes Pods dashboard 2

Nodes

Kubernetes Nodes dashboard


Prerequisites

  • Kubernetes cluster (any version ≥ 1.21)
  • kubectl access
  • Prometheus or Grafana Alloy deployed in the cluster
  • xScaler tenant credentials

Step 1 — Deploy kube-state-metrics

kube-state-metrics exposes Kubernetes object state as Prometheus metrics.

kubectl apply -f https://github.com/kubernetes/kube-state-metrics/releases/latest/download/kube-state-metrics.yaml

Verify:

kubectl get pods -n kube-system -l app.kubernetes.io/name=kube-state-metrics

Step 2 — Scrape and forward to xScaler

Option A — Prometheus (in-cluster)

Add these scrape jobs to your Prometheus config or prometheus-operator ServiceMonitor:

scrape_configs:
# kube-state-metrics — Kubernetes object state
- job_name: kube-state-metrics
static_configs:
- targets: ['kube-state-metrics.kube-system.svc.cluster.local:8080']

# kubelet cAdvisor — container resource usage
- job_name: kubelet
scheme: https
tls_config:
insecure_skip_verify: true
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
replacement: kubernetes.default.svc:443
- source_labels: [__meta_kubernetes_node_name]
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor

remote_write:
- url: https://euw1-01.m.xscalerlabs.com/api/v1/push
authorization:
credentials: <token>
headers:
X-Scope-OrgID: <tenant-id>

Option B — Grafana Alloy (in-cluster)

// Discover all Kubernetes pods with prometheus.io/scrape annotation
discovery.kubernetes "pods" {
role = "pod"
}

discovery.relabel "annotated_pods" {
targets = discovery.kubernetes.pods.targets
rule {
source_labels = ["__meta_kubernetes_pod_annotation_prometheus_io_scrape"]
action = "keep"
regex = "true"
}
}

prometheus.scrape "kubernetes" {
targets = discovery.relabel.annotated_pods.output
forward_to = [prometheus.remote_write.xscaler.receiver]
}

prometheus.remote_write "xscaler" {
endpoint {
url = "https://euw1-01.m.xscalerlabs.com/api/v1/push"
authorization {
type = "Bearer"
credentials = env("XSCALER_TOKEN")
}
headers = { "X-Scope-OrgID" = env("XSCALER_TENANT_ID") }
}
}

Option C — OpenTelemetry Collector

receivers:
k8s_cluster:
collection_interval: 15s
node_conditions_to_report: [Ready, MemoryPressure, DiskPressure]
allocatable_types_to_report: [cpu, memory, storage]
kubeletstats:
collection_interval: 15s
auth_type: serviceAccount
endpoint: https://${K8S_NODE_NAME}:10250
insecure_skip_verify: true
metric_groups: [node, pod, container, volume]

processors:
memory_limiter:
check_interval: 1s
limit_mib: 512
batch:
timeout: 10s
k8sattributes:
passthrough: false

exporters:
otlphttp/xscaler:
endpoint: https://euw1-01.m.xscalerlabs.com
headers:
Authorization: "Bearer <token>"
X-Scope-OrgID: "<tenant-id>"

service:
pipelines:
metrics:
receivers: [k8s_cluster, kubeletstats]
processors: [memory_limiter, k8sattributes, batch]
exporters: [otlphttp/xscaler]

Logs

Collect all container stdout/stderr across the cluster, enriched with pod, namespace, and node labels. Add the following to your Alloy config (run on each node or deploy as a DaemonSet):

discovery.kubernetes "kubernetes_pods" {
role = "pod"
}

discovery.relabel "kubernetes_logs" {
targets = discovery.kubernetes.kubernetes_pods.targets
rule {
source_labels = ["__meta_kubernetes_namespace"]
target_label = "namespace"
}
rule {
source_labels = ["__meta_kubernetes_pod_name"]
target_label = "pod"
}
rule {
source_labels = ["__meta_kubernetes_pod_container_name"]
target_label = "container"
}
rule {
replacement = "integrations/kubernetes"
target_label = "job"
}
rule {
source_labels = ["__meta_kubernetes_pod_uid", "__meta_kubernetes_pod_container_name"]
separator = "/"
target_label = "__path__"
replacement = "/var/log/pods/*$1/*.log"
}
}

loki.source.file "kubernetes_logs" {
targets = discovery.relabel.kubernetes_logs.output
forward_to = [loki.write.xscaler.receiver]
}

loki.write "xscaler" {
endpoint {
url = "https://euw1-01.l.xscalerlabs.com/api/v1/logs/push"

http_client_config {
authorization {
type = "Bearer"
credentials = env("XSCALER_TOKEN")
}
}

headers = { "X-Scope-OrgID" = env("XSCALER_TENANT_ID") }
}
}

Key metrics

MetricDescription
kube_pod_status_phasePod phase (Running, Pending, Failed)
kube_deployment_status_replicas_availableAvailable replicas per deployment
kube_deployment_status_replicas_unavailableUnavailable replicas
kube_node_status_conditionNode conditions (Ready, MemoryPressure)
kube_pod_container_resource_requestsRequested CPU/memory per container
kube_pod_container_resource_limitsCPU/memory limits per container
container_cpu_usage_seconds_totalContainer CPU usage (cAdvisor)
container_memory_working_set_bytesContainer memory usage (cAdvisor)
kube_persistentvolumeclaim_status_phasePVC status (Bound, Pending)
kube_horizontalpodautoscaler_status_current_replicasHPA current replica count

Useful PromQL queries

# Pods not in Running state
count by (namespace, pod) (kube_pod_status_phase{phase!="Running"})

# Container CPU usage % of limit
sum(rate(container_cpu_usage_seconds_total[5m])) by (pod, namespace)
/ sum(kube_pod_container_resource_limits{resource="cpu"}) by (pod, namespace)
* 100

# Container memory usage % of limit
sum(container_memory_working_set_bytes) by (pod, namespace)
/ sum(kube_pod_container_resource_limits{resource="memory"}) by (pod, namespace)
* 100

# Deployments with unavailable replicas
kube_deployment_status_replicas_unavailable > 0

# Node not Ready
kube_node_status_condition{condition="Ready", status="true"} == 0