TensorFlow Serving

Monitor TensorFlow Serving — request counts, latency percentiles, model load status, and runtime performance — using TF Serving's built-in Prometheus metrics endpoint.

Pattern: TF Serving /monitoring/prometheus/metrics → Prometheus scrape → xScaler remote_write

Prerequisites

TensorFlow Serving 2.x
xScaler tenant credentials (token + tenant ID)

Enable Metrics

Create a monitoring config file monitoring_config.txt:

prometheus_config: <
  enable: true,
  path: "/monitoring/prometheus/metrics"
>

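Start TensorFlow Serving with the monitoring config. The metrics endpoint is served over the REST API port (8501 here), so --rest_api_port must be set: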

tensorflow_model_server \
  --port=8500 \
  --rest_api_port=8501 \
  --model_name=my_model \
  --model_base_path=/models/my_model \
  --monitoring_config_file=monitoring_config.txt
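Before wiring up a scraper, it can help to confirm that the endpoint actually returns metrics. The following is a minimal sketch, not part of TF Serving itself: the helper names `parse_samples` and `fetch_samples` are hypothetical, the URL assumes the server is running locally with the port and path configured above, and the parser assumes sample lines carry no trailing timestamp.

```python
import urllib.request

# Assumed URL: matches --rest_api_port=8501 and the path in monitoring_config.txt.
METRICS_URL = "http://localhost:8501/monitoring/prometheus/metrics"


def parse_samples(text):
    """Parse Prometheus text-format output into (series, value) pairs.

    Comment lines (# HELP / # TYPE) and blank lines are skipped.
    Assumes each sample line ends with its value (no timestamp).
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the last space so label values containing spaces survive.
        series, _, value = line.rpartition(" ")
        samples.append((series, float(value)))
    return samples


def fetch_samples(url=METRICS_URL, timeout=5):
    """Fetch the metrics endpoint and return the parsed samples."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_samples(resp.read().decode("utf-8"))
```

Calling `fetch_samples()` against a running server should return a non-empty list once at least one request has been served; an HTTP error or an empty list suggests that the monitoring config was not picked up or that the REST port is not reachable.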

Option A — Prometheus

scrape_configs:
  - job_name: tensorflow_serving
    static_configs:
      - targets: ['localhost:8501']
    metrics_path: /monitoring/prometheus/metrics

remote_write:
  - url: https://euw1-01.m.xscalerlabs.com/api/v1/push
    authorization:
      credentials: <token>
    headers:
      X-Scope-OrgID: <tenant-id>

Option B — Grafana Alloy

prometheus.scrape "tensorflow" {
  targets      = [{"__address__" = "localhost:8501"}]
  metrics_path = "/monitoring/prometheus/metrics"
  forward_to   = [prometheus.remote_write.xscaler.receiver]
}

prometheus.remote_write "xscaler" {
  endpoint {
    url = "https://euw1-01.m.xscalerlabs.com/api/v1/push"
    authorization {
      type        = "Bearer"
      credentials = "<token>"
    }
    headers = { "X-Scope-OrgID" = "<tenant-id>" }
  }
}

Option C — OpenTelemetry Collector

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: tensorflow_serving
          static_configs:
            - targets: ['localhost:8501']
          metrics_path: /monitoring/prometheus/metrics

processors:
  batch:
    timeout: 10s

exporters:
  otlphttp/xscaler:
    endpoint: https://euw1-01.m.xscalerlabs.com
    headers:
      Authorization: "Bearer <token>"
      X-Scope-OrgID: "<tenant-id>"
    compression: gzip

service:
  pipelines:
    metrics:
      receivers:  [prometheus]
      processors: [batch]
      exporters:  [otlphttp/xscaler]

Logs

Collect training and inference logs: pipe TensorFlow output to a log file and tail that file with Alloy. Add the following to your Alloy config, adjusting __path__ to match your application's log file location:

local.file_match "tensorflow_logs" {
  path_targets = [{
    __address__ = "localhost",
    __path__    = "/var/log/tensorflow/app.log",
    instance    = constants.hostname,
    job         = "integrations/tensorflow",
  }]
}

loki.source.file "tensorflow_logs" {
  targets    = local.file_match.tensorflow_logs.targets
  forward_to = [loki.write.xscaler.receiver]
}

loki.write "xscaler" {
  endpoint {
    url = "https://euw1-01.l.xscalerlabs.com/api/v1/push"

    authorization {
      type        = "Bearer"
      credentials = env("XSCALER_TOKEN")
    }

    headers = { "X-Scope-OrgID" = env("XSCALER_TENANT_ID") }
  }
}

Key metrics

Metric	Description
`:tensorflow:serving:request_count`	Total prediction requests
`:tensorflow:serving:request_latency`	Request latency histogram
`:tensorflow:serving:runtime_latency`	Model runtime latency
`tensorflow_core_graph_run_count`	TF graph execution count
`tensorflow_serving_model_handle_count`	Loaded model handles
`:tensorflow_serving_inference_count`	Total inference requests
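The latency metrics above are exposed as histograms, so percentiles are estimated from cumulative bucket counts rather than read directly. The sketch below illustrates the idea with linear interpolation inside the matching bucket, similar in spirit to what a Prometheus-style quantile query does. The function name `histogram_quantile` and the bucket values in the usage note are illustrative assumptions, not output from a real server.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count) pairs sorted by
    upper_bound, ending with math.inf (the "+Inf" bucket).
    """
    if not buckets or buckets[-1][1] == 0:
        return math.nan  # no observations, nothing to estimate
    rank = q * buckets[-1][1]  # target position among all observations
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            if count == prev_count:
                return bound  # empty bucket, avoid dividing by zero
            # Interpolate linearly between the bucket's lower and upper bound.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound
```

For example, with hypothetical buckets `[(10, 50), (100, 90), (1000, 100), (math.inf, 100)]`, `histogram_quantile(0.95, buckets)` returns `550.0`: the 95th observation falls halfway through the 100 to 1000 bucket. The estimate is only as precise as the bucket boundaries, so wide buckets give coarse percentiles.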
