Skip to main content

PostgreSQL

Collect query performance, connection pool, replication lag, table bloat, and lock metrics from PostgreSQL using postgres_exporter.

Pattern: postgres_exporter → Prometheus scrape → xScaler remote_write


Dashboard

Dashboard


Prerequisites

  • PostgreSQL 11 or later
  • A dedicated monitoring user with read-only access to pg_stat_* views
  • Prometheus or Grafana Alloy
  • xScaler tenant credentials

Step 1 — Create a monitoring user

-- Run as superuser
CREATE USER xscaler_monitor WITH PASSWORD 'strongpassword';
GRANT pg_monitor TO xscaler_monitor;

Step 2 — Run postgres_exporter

docker run --rm -d \
--name postgres-exporter \
-p 9187:9187 \
-e DATA_SOURCE_NAME="postgresql://xscaler_monitor:strongpassword@localhost:5432/postgres?sslmode=disable" \
prometheuscommunity/postgres-exporter

Verify:

curl -s http://localhost:9187/metrics | grep pg_up
# pg_up 1

Step 3 — Scrape and forward to xScaler

Prometheus

scrape_configs:
- job_name: postgresql
static_configs:
- targets: ['localhost:9187']
labels:
instance: primary

remote_write:
- url: https://euw1-01.m.xscalerlabs.com/api/v1/push
authorization:
credentials: <token>
headers:
X-Scope-OrgID: <tenant-id>

Grafana Alloy

prometheus.scrape "postgresql" {
targets = [{"__address__" = "localhost:9187"}]
forward_to = [prometheus.remote_write.xscaler.receiver]
}

prometheus.remote_write "xscaler" {
endpoint {
url = "https://euw1-01.m.xscalerlabs.com/api/v1/push"
authorization {
type = "Bearer"
credentials = "<token>"
}
headers = { "X-Scope-OrgID" = "<tenant-id>" }
}
}

OpenTelemetry Collector

receivers:
postgresql:
endpoint: localhost:5432
transport: tcp
username: xscaler_monitor
password: strongpassword
databases: [postgres]
collection_interval: 15s
tls:
insecure: true

processors:
memory_limiter:
check_interval: 1s
limit_mib: 256
batch:
timeout: 10s

exporters:
otlphttp/xscaler:
endpoint: https://euw1-01.m.xscalerlabs.com
headers:
Authorization: "Bearer <token>"
X-Scope-OrgID: "<tenant-id>"

service:
pipelines:
metrics:
receivers: [postgresql]
processors: [memory_limiter, batch]
exporters: [otlphttp/xscaler]

Logs

Collect query logs, slow queries, connection errors, and autovacuum activity. Add the following to your Alloy config:

local.file_match "postgresql_logs" {
path_targets = [{
__address__ = "localhost",
__path__ = "/var/log/postgresql/postgresql-*.log",
instance = constants.hostname,
job = "integrations/postgres",
}]
}

loki.source.file "postgresql_logs" {
targets = local.file_match.postgresql_logs.targets
forward_to = [loki.write.xscaler.receiver]
}

loki.write "xscaler" {
endpoint {
url = "https://euw1-01.l.xscalerlabs.com/api/v1/logs/push"

http_client_config {
authorization {
type = "Bearer"
credentials = env("XSCALER_TOKEN")
}
}

headers = { "X-Scope-OrgID" = env("XSCALER_TENANT_ID") }
}
}

Key metrics

MetricDescription
pg_upExporter reachability (1 = up)
pg_stat_database_numbackendsActive connections per database
pg_stat_database_xact_commitCommitted transactions/sec
pg_stat_database_xact_rollbackRolled-back transactions/sec
pg_stat_database_blks_hitBuffer cache hits
pg_stat_database_blks_readDisk reads (cache misses)
pg_stat_bgwriter_checkpoint_write_timeCheckpoint write duration (ms)
pg_replication_lagReplication lag in bytes (primary)
pg_locks_countActive locks by mode
pg_stat_user_tables_n_dead_tupDead tuples (bloat indicator)

Useful PromQL queries

# Active connections % of max
sum(pg_stat_database_numbackends)
/ scalar(pg_settings_max_connections) * 100

# Cache hit ratio (should be > 99%)
sum(rate(pg_stat_database_blks_hit[5m]))
/ (sum(rate(pg_stat_database_blks_hit[5m])) + sum(rate(pg_stat_database_blks_read[5m])))
* 100

# Transaction commit rate/sec
rate(pg_stat_database_xact_commit[5m])

# Replication lag (bytes)
pg_replication_lag

# Tables with high dead tuple count
topk(10, pg_stat_user_tables_n_dead_tup)

Troubleshooting

pg_up 0

  • Check DATA_SOURCE_NAME — verify host, port, user, and password
  • Ensure pg_monitor role is granted: \du xscaler_monitor in psql

High connection count

  • Check pg_stat_database_numbackends vs pg_settings_max_connections
  • Consider connection pooling (PgBouncer)