Prometheus Metrics Collection for ChatGPT Apps

Production ChatGPT applications require robust observability to maintain performance, detect anomalies, and ensure optimal user experiences. Prometheus has become the de facto standard for metrics collection in cloud-native environments, offering a powerful pull-based model, flexible query language, and seamless integration with modern architectures.

Unlike traditional push-based monitoring systems, Prometheus scrapes metrics from instrumented applications at regular intervals, storing them in a time-series database optimized for querying and alerting. This approach provides several advantages for ChatGPT apps: reduced coupling between services, centralized configuration management, and automatic service discovery in dynamic environments like Kubernetes.

For MCP (Model Context Protocol) servers powering ChatGPT applications, Prometheus enables tracking of critical metrics like tool invocation rates, widget rendering performance, authentication latency, and business KPIs such as user engagement and conversion rates. Combined with alert rules and visualization dashboards, Prometheus transforms raw metrics into actionable insights that prevent incidents before they impact users.

This guide demonstrates production-ready Prometheus instrumentation for ChatGPT applications, covering metrics types, custom collectors, alert rules, and service discovery configurations that scale from development to enterprise deployments.

Understanding Prometheus Metrics Types

Prometheus defines four fundamental metric types, each suited for different measurement scenarios. Selecting the appropriate type ensures accurate data collection and enables meaningful queries for alerting and visualization.

Counter metrics monotonically increase over time and reset only on application restart. Use counters for tracking cumulative values like total MCP tool invocations, HTTP requests, or authentication attempts. Counters work perfectly for rate calculations using PromQL's rate() function, revealing trends like requests per second or error frequency. For example, mcp_tool_invocations_total tracks how many times each tool executes, enabling queries like rate(mcp_tool_invocations_total[5m]) to show invocation velocity.

Gauge metrics represent point-in-time measurements that can increase or decrease. Active WebSocket connections, memory usage, queue depth, and concurrent users are ideal gauge candidates. Unlike counters, gauges reflect current state rather than cumulative totals. The metric mcp_active_connections might show 247 concurrent WebSocket sessions, dropping to 198 as users disconnect. Gauges enable threshold alerts like "notify when active connections exceed 1000."

Histogram metrics sample observations and bucket them into configurable ranges, automatically providing count, sum, and bucket metrics. Histograms excel at measuring request durations, response sizes, and other distributions where percentiles matter. A histogram like mcp_tool_duration_seconds with buckets [0.1, 0.5, 1.0, 2.5, 5.0] enables calculating P95 latency: histogram_quantile(0.95, rate(mcp_tool_duration_seconds_bucket[5m])). The result is the latency below which 95% of tool invocations complete.

Summary metrics resemble histograms but calculate quantiles client-side during collection. While summaries provide exact percentiles without bucket configuration, they consume more resources and prevent server-side aggregation across instances. Histograms generally prove more flexible for distributed systems. Use summaries when you need precise quantiles for a single instance and can't predict appropriate bucket boundaries.
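
To make the four types concrete, here is a minimal prom-client sketch (prom-client is the Node.js client library used throughout this guide; the payload-size summary is an illustrative metric, not one used later):

// Minimal sketch: the four Prometheus metric types via prom-client
import { Counter, Gauge, Histogram, Summary } from 'prom-client';

// Counter: cumulative, only ever increases (resets on restart)
const invocations = new Counter({
  name: 'mcp_tool_invocations_total',
  help: 'Total MCP tool invocations',
  labelNames: ['tool_name']
});
invocations.inc({ tool_name: 'search' });

// Gauge: point-in-time value that rises and falls
const connections = new Gauge({
  name: 'mcp_active_connections',
  help: 'Current WebSocket connections'
});
connections.inc();
connections.dec();

// Histogram: observations bucketed for server-side percentile queries
const duration = new Histogram({
  name: 'mcp_tool_duration_seconds',
  help: 'Tool execution duration in seconds',
  buckets: [0.1, 0.5, 1.0, 2.5, 5.0]
});
duration.observe(0.42);

// Summary: client-side quantiles, no bucket configuration needed
const payloadSize = new Summary({
  name: 'mcp_payload_size_bytes',
  help: 'Response payload size in bytes',
  percentiles: [0.5, 0.9, 0.99]
});
payloadSize.observe(2048);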

Understanding these types ensures your ChatGPT application exports metrics that accurately represent system behavior and support meaningful operational queries.

Implementing Custom Metrics for MCP Servers

ChatGPT applications built on MCP servers require specialized metrics beyond generic HTTP instrumentation. Custom metrics track business logic, protocol-specific operations, and application-level performance indicators that standard libraries cannot capture.

Here's a comprehensive TypeScript implementation of custom Prometheus metrics for an MCP server:

// src/metrics/prometheus.ts
import { Registry, Counter, Histogram, Gauge, collectDefaultMetrics } from 'prom-client';

/**
 * Centralized Prometheus metrics registry for MCP server
 * Tracks tool invocations, widget operations, authentication, and business KPIs
 */
export class MCPMetricsCollector {
  private registry: Registry;

  // Tool invocation metrics
  public toolInvocationsTotal: Counter<string>;
  public toolDuration: Histogram<string>;
  public toolErrors: Counter<string>;

  // Widget metrics
  public widgetRenders: Counter<string>;
  public widgetRenderDuration: Histogram<string>;
  public widgetStateUpdates: Counter<string>;

  // Connection metrics
  public activeConnections: Gauge<string>;
  public connectionDuration: Histogram<string>;
  public authenticationAttempts: Counter<string>;
  public authenticationFailures: Counter<string>;

  // Business metrics
  public userSignups: Counter<string>;
  public subscriptionUpgrades: Counter<string>;
  public revenueTotal: Gauge<string>;

  // Resource metrics
  public databaseQueryDuration: Histogram<string>;
  public cacheHitRatio: Gauge<string>;
  public queueDepth: Gauge<string>;

  constructor(prefix: string = 'mcp') {
    this.registry = new Registry();

    // Enable default system metrics (CPU, memory, event loop, etc.)
    collectDefaultMetrics({
      register: this.registry,
      prefix: `${prefix}_`,
      gcDurationBuckets: [0.001, 0.01, 0.1, 1, 2, 5]
    });

    // Tool invocation counter
    this.toolInvocationsTotal = new Counter({
      name: `${prefix}_tool_invocations_total`,
      help: 'Total number of MCP tool invocations',
      labelNames: ['tool_name', 'status', 'user_tier'],
      registers: [this.registry]
    });

    // Tool duration histogram (optimized buckets for typical API latency)
    this.toolDuration = new Histogram({
      name: `${prefix}_tool_duration_seconds`,
      help: 'Duration of MCP tool executions in seconds',
      labelNames: ['tool_name', 'status'],
      buckets: [0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10],
      registers: [this.registry]
    });

    // Tool error counter
    this.toolErrors = new Counter({
      name: `${prefix}_tool_errors_total`,
      help: 'Total number of MCP tool execution errors',
      labelNames: ['tool_name', 'error_type'],
      registers: [this.registry]
    });

    // Widget rendering metrics
    this.widgetRenders = new Counter({
      name: `${prefix}_widget_renders_total`,
      help: 'Total number of widget renders',
      labelNames: ['widget_type', 'display_mode'],
      registers: [this.registry]
    });

    this.widgetRenderDuration = new Histogram({
      name: `${prefix}_widget_render_duration_seconds`,
      help: 'Widget rendering duration in seconds',
      labelNames: ['widget_type', 'display_mode'],
      buckets: [0.01, 0.05, 0.1, 0.25, 0.5, 1],
      registers: [this.registry]
    });

    this.widgetStateUpdates = new Counter({
      name: `${prefix}_widget_state_updates_total`,
      help: 'Total number of widget state updates',
      labelNames: ['widget_type', 'update_source'],
      registers: [this.registry]
    });

    // Connection tracking
    this.activeConnections = new Gauge({
      name: `${prefix}_active_connections`,
      help: 'Current number of active WebSocket connections',
      labelNames: ['protocol', 'authenticated'],
      registers: [this.registry]
    });

    this.connectionDuration = new Histogram({
      name: `${prefix}_connection_duration_seconds`,
      help: 'WebSocket connection duration in seconds',
      labelNames: ['protocol', 'disconnect_reason'],
      buckets: [10, 30, 60, 300, 600, 1800, 3600],
      registers: [this.registry]
    });

    // Authentication metrics
    this.authenticationAttempts = new Counter({
      name: `${prefix}_authentication_attempts_total`,
      help: 'Total authentication attempts',
      labelNames: ['method', 'status'],
      registers: [this.registry]
    });

    this.authenticationFailures = new Counter({
      name: `${prefix}_authentication_failures_total`,
      help: 'Total authentication failures',
      labelNames: ['method', 'reason'],
      registers: [this.registry]
    });

    // Business KPIs
    this.userSignups = new Counter({
      name: `${prefix}_user_signups_total`,
      help: 'Total user signups',
      labelNames: ['source', 'tier'],
      registers: [this.registry]
    });

    this.subscriptionUpgrades = new Counter({
      name: `${prefix}_subscription_upgrades_total`,
      help: 'Total subscription tier upgrades',
      labelNames: ['from_tier', 'to_tier'],
      registers: [this.registry]
    });

    this.revenueTotal = new Gauge({
      name: `${prefix}_revenue_total_usd`,
      help: 'Total revenue in USD',
      labelNames: ['tier'],
      registers: [this.registry]
    });

    // Resource utilization metrics
    this.databaseQueryDuration = new Histogram({
      name: `${prefix}_database_query_duration_seconds`,
      help: 'Database query duration in seconds',
      labelNames: ['operation', 'collection'],
      buckets: [0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5],
      registers: [this.registry]
    });

    this.cacheHitRatio = new Gauge({
      name: `${prefix}_cache_hit_ratio`,
      help: 'Cache hit ratio (0-1)',
      labelNames: ['cache_type'],
      registers: [this.registry]
    });

    this.queueDepth = new Gauge({
      name: `${prefix}_queue_depth`,
      help: 'Current queue depth for async operations',
      labelNames: ['queue_name'],
      registers: [this.registry]
    });
  }

  /**
   * Get metrics in Prometheus text exposition format
   */
  public async getMetrics(): Promise<string> {
    return this.registry.metrics();
  }

  /**
   * Get registry for custom metric registration
   */
  public getRegistry(): Registry {
    return this.registry;
  }

  /**
   * Track tool invocation with automatic timing
   */
  public async trackToolInvocation<T>(
    toolName: string,
    userTier: string,
    handler: () => Promise<T>
  ): Promise<T> {
    const endTimer = this.toolDuration.startTimer({ tool_name: toolName });

    try {
      const result = await handler();
      this.toolInvocationsTotal.inc({
        tool_name: toolName,
        status: 'success',
        user_tier: userTier
      });
      endTimer({ status: 'success' });
      return result;
    } catch (error) {
      this.toolInvocationsTotal.inc({
        tool_name: toolName,
        status: 'error',
        user_tier: userTier
      });
      this.toolErrors.inc({
        tool_name: toolName,
        error_type: error instanceof Error ? error.name : 'UnknownError'
      });
      endTimer({ status: 'error' });
      throw error;
    }
  }

  /**
   * Track widget rendering with automatic timing
   */
  public async trackWidgetRender<T>(
    widgetType: string,
    displayMode: string,
    renderer: () => Promise<T>
  ): Promise<T> {
    const endTimer = this.widgetRenderDuration.startTimer({
      widget_type: widgetType,
      display_mode: displayMode
    });

    try {
      const result = await renderer();
      this.widgetRenders.inc({
        widget_type: widgetType,
        display_mode: displayMode
      });
      endTimer();
      return result;
    } catch (error) {
      endTimer();
      throw error;
    }
  }

  /**
   * Track connection lifecycle
   */
  public trackConnection(protocol: string, authenticated: boolean): (disconnectReason?: string) => void {
    this.activeConnections.inc({
      protocol,
      authenticated: authenticated.toString()
    });

    const startTime = Date.now();

    return (disconnectReason: string = 'normal') => {
      this.activeConnections.dec({
        protocol,
        authenticated: authenticated.toString()
      });

      const durationSeconds = (Date.now() - startTime) / 1000;
      this.connectionDuration.observe(
        { protocol, disconnect_reason: disconnectReason },
        durationSeconds
      );
    };
  }
}

// Singleton instance
export const metrics = new MCPMetricsCollector('mcp');

This metrics collector provides type-safe, production-ready instrumentation covering all aspects of MCP server operations. The singleton pattern ensures consistent metrics across your application modules.
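
Wiring the collector into request handling is then a one-liner per call site. The sketch below shows the intended usage pattern; searchDocuments, isAuthenticated, and wss are placeholders standing in for your own server code:

// src/tools/search-tool.ts (illustrative usage; placeholder declarations below)
import { metrics } from '../metrics/prometheus';

declare function searchDocuments(query: string): Promise<unknown>;
declare function isAuthenticated(req: unknown): boolean;
declare const wss: { on(event: 'connection', cb: (socket: any, req: any) => void): void };

// Tool handler wrapped with automatic counting and timing
export async function handleSearchTool(query: string, userTier: string) {
  return metrics.trackToolInvocation('search', userTier, () => searchDocuments(query));
}

// WebSocket lifecycle: gauge incremented on connect, decremented (with
// duration observed) when the returned callback fires on close
wss.on('connection', (socket, req) => {
  const endConnection = metrics.trackConnection('websocket', isAuthenticated(req));
  socket.on('close', () => endConnection('client_closed'));
});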

Express Middleware for HTTP Metrics

Integrate Prometheus metrics into your Express.js MCP server with custom middleware that tracks HTTP performance, errors, and throughput:

// src/middleware/prometheus-middleware.ts
import { Request, Response, NextFunction } from 'express';
import { metrics } from '../metrics/prometheus';
import { Counter, Histogram } from 'prom-client';

/**
 * Prometheus metrics middleware for Express applications
 * Tracks HTTP request metrics with automatic labeling
 */
export class PrometheusMiddleware {
  private httpRequestsTotal: Counter<string>;
  private httpRequestDuration: Histogram<string>;
  private httpRequestSize: Histogram<string>;
  private httpResponseSize: Histogram<string>;

  constructor() {
    const registry = metrics.getRegistry();

    this.httpRequestsTotal = new Counter({
      name: 'mcp_http_requests_total',
      help: 'Total number of HTTP requests',
      labelNames: ['method', 'route', 'status_code'],
      registers: [registry]
    });

    this.httpRequestDuration = new Histogram({
      name: 'mcp_http_request_duration_seconds',
      help: 'HTTP request duration in seconds',
      labelNames: ['method', 'route', 'status_code'],
      buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5],
      registers: [registry]
    });

    this.httpRequestSize = new Histogram({
      name: 'mcp_http_request_size_bytes',
      help: 'HTTP request size in bytes',
      labelNames: ['method', 'route'],
      buckets: [100, 1000, 5000, 10000, 50000, 100000, 500000],
      registers: [registry]
    });

    this.httpResponseSize = new Histogram({
      name: 'mcp_http_response_size_bytes',
      help: 'HTTP response size in bytes',
      labelNames: ['method', 'route', 'status_code'],
      buckets: [100, 1000, 5000, 10000, 50000, 100000, 500000],
      registers: [registry]
    });
  }

  /**
   * Express middleware function
   */
  public middleware() {
    return (req: Request, res: Response, next: NextFunction): void => {
      const startTime = Date.now();

      // Track request size
      const requestSize = parseInt(req.get('content-length') || '0', 10);
      if (requestSize > 0) {
        this.httpRequestSize.observe(
          { method: req.method, route: this.normalizeRoute(req.route?.path || req.path) },
          requestSize
        );
      }

      // Intercept response finish to capture metrics
      const originalEnd = res.end;
      const self = this;

      res.end = function(this: Response, ...args: any[]): Response {
        const duration = (Date.now() - startTime) / 1000;
        const route = self.normalizeRoute(req.route?.path || req.path);
        const statusCode = res.statusCode.toString();

        // Track request count
        self.httpRequestsTotal.inc({
          method: req.method,
          route,
          status_code: statusCode
        });

        // Track request duration
        self.httpRequestDuration.observe(
          { method: req.method, route, status_code: statusCode },
          duration
        );

        // Track response size
        const responseSize = parseInt(res.get('content-length') || '0', 10);
        if (responseSize > 0) {
          self.httpResponseSize.observe(
            { method: req.method, route, status_code: statusCode },
            responseSize
          );
        }

        return originalEnd.apply(this, args);
      };

      next();
    };
  }

  /**
   * Normalize route paths to prevent high cardinality labels
   * Converts /api/users/123/apps/456 -> /api/users/:id/apps/:id
   */
  private normalizeRoute(path: string): string {
    if (!path) return 'unknown';

    // Use Express route if available (already normalized)
    if (path.includes(':')) return path;

    // Basic normalization for dynamic paths
    return path
      .replace(/\/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi, '/:uuid')
      .replace(/\/\d+/g, '/:id')
      .replace(/\/[a-f0-9]{24}/g, '/:objectid');
  }

  /**
   * Endpoint to expose metrics
   */
  public async metricsEndpoint(req: Request, res: Response): Promise<void> {
    try {
      res.set('Content-Type', 'text/plain; version=0.0.4; charset=utf-8');
      res.send(await metrics.getMetrics());
    } catch (error) {
      res.status(500).send('Error collecting metrics');
    }
  }
}

export const prometheusMiddleware = new PrometheusMiddleware();

Integrate this middleware into your Express application:

// src/index.ts
import express from 'express';
import { prometheusMiddleware } from './middleware/prometheus-middleware';
import { mcpRouter } from './routes/mcp'; // wherever your MCP routes are defined

const app = express();

// Apply Prometheus middleware globally
app.use(prometheusMiddleware.middleware());

// Expose metrics endpoint (typically on /metrics)
app.get('/metrics', (req, res) => prometheusMiddleware.metricsEndpoint(req, res));

// Your MCP server routes
app.use('/mcp', mcpRouter);

app.listen(3000, () => console.log('MCP server running on port 3000'));

This configuration automatically tracks all HTTP traffic with minimal performance overhead, providing the foundational metrics Prometheus will scrape.
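
Before pointing Prometheus at the endpoint, a quick smoke check confirms the exposition output (assuming Node 18+ for the global fetch and the server above running on localhost:3000):

// scripts/check-metrics.ts — verify the /metrics endpoint is scrapeable
async function checkMetrics(): Promise<void> {
  const res = await fetch('http://localhost:3000/metrics');
  const body = await res.text();

  // Exposition format is plain text; seeing a known metric name confirms wiring
  if (!res.ok || !body.includes('mcp_http_requests_total')) {
    throw new Error(`Metrics endpoint unhealthy: HTTP ${res.status}`);
  }
  console.log(`Metrics endpoint OK: ${body.split('\n').length} lines exported`);
}

checkMetrics().catch((err) => {
  console.error(err);
  process.exit(1);
});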

Configuring Alert Rules for ChatGPT Applications

Prometheus alert rules transform metrics into actionable notifications when thresholds are breached or anomalies occur. Well-designed alerts prevent incidents by catching issues early while minimizing the false positives that cause alert fatigue.

Here's a comprehensive alerting configuration for MCP servers:

# prometheus/alerts/mcp-alerts.yml
groups:
  - name: mcp_availability
    interval: 30s
    rules:
      # Critical: MCP server down
      - alert: MCPServerDown
        expr: up{job="mcp-server"} == 0
        for: 1m
        labels:
          severity: critical
          component: mcp-server
        annotations:
          summary: "MCP server {{ $labels.instance }} is down"
          description: "MCP server at {{ $labels.instance }} has been unreachable for 1 minute. Immediate action required."
          runbook: "https://docs.makeaihq.com/runbooks/mcp-server-down"

      # High error rate on tool invocations
      - alert: HighToolErrorRate
        expr: |
          (
            sum by (tool_name) (rate(mcp_tool_invocations_total{status="error"}[5m]))
            /
            sum by (tool_name) (rate(mcp_tool_invocations_total[5m]))
          ) > 0.05
        for: 3m
        labels:
          severity: warning
          component: tool-execution
        annotations:
          summary: "High error rate on tool {{ $labels.tool_name }}"
          description: "Tool {{ $labels.tool_name }} error rate is {{ $value | humanizePercentage }} over the last 5 minutes (threshold: 5%)."
          runbook: "https://docs.makeaihq.com/runbooks/high-tool-error-rate"

      # Tool execution latency exceeds SLA
      - alert: ToolExecutionLatencyHigh
        expr: |
          histogram_quantile(0.95,
            sum by (tool_name, le) (rate(mcp_tool_duration_seconds_bucket[5m]))
          ) > 2.5
        for: 5m
        labels:
          severity: warning
          component: tool-execution
        annotations:
          summary: "Tool {{ $labels.tool_name }} P95 latency exceeds SLA"
          description: "Tool {{ $labels.tool_name }} P95 latency is {{ $value }}s (SLA: 2.5s)."
          runbook: "https://docs.makeaihq.com/runbooks/tool-latency-high"

  - name: mcp_authentication
    interval: 30s
    rules:
      # High authentication failure rate
      - alert: HighAuthenticationFailureRate
        expr: |
          (
            sum by (method) (rate(mcp_authentication_failures_total[5m]))
            /
            sum by (method) (rate(mcp_authentication_attempts_total[5m]))
          ) > 0.20
        for: 5m
        labels:
          severity: warning
          component: authentication
        annotations:
          summary: "High authentication failure rate"
          description: "Authentication failure rate is {{ $value | humanizePercentage }} (threshold: 20%). Possible credential stuffing attack."
          runbook: "https://docs.makeaihq.com/runbooks/auth-failure-spike"

      # Sudden spike in authentication attempts (potential DDoS/brute force)
      - alert: AuthenticationAttemptSpike
        expr: |
          sum(rate(mcp_authentication_attempts_total[1m])) > 100
        for: 2m
        labels:
          severity: critical
          component: authentication
        annotations:
          summary: "Spike in authentication attempts detected"
          description: "Authentication attempts spiked to {{ $value }}/sec. Possible brute force attack."
          runbook: "https://docs.makeaihq.com/runbooks/auth-spike"

  - name: mcp_capacity
    interval: 1m
    rules:
      # Active connections approaching limit
      - alert: ActiveConnectionsHigh
        expr: sum(mcp_active_connections) > 900
        for: 5m
        labels:
          severity: warning
          component: capacity
        annotations:
          summary: "Active connections approaching limit"
          description: "Active connections ({{ $value }}) approaching limit of 1000. Consider scaling."
          runbook: "https://docs.makeaihq.com/runbooks/scale-connections"

      # Database query latency degradation
      - alert: DatabaseQueryLatencyHigh
        expr: |
          histogram_quantile(0.95,
            sum by (collection, le) (rate(mcp_database_query_duration_seconds_bucket[5m]))
          ) > 1.0
        for: 5m
        labels:
          severity: warning
          component: database
        annotations:
          summary: "Database query latency degraded"
          description: "P95 database query latency is {{ $value }}s for {{ $labels.collection }} (threshold: 1s)."
          runbook: "https://docs.makeaihq.com/runbooks/db-latency-high"

      # Memory usage high (assumes machine_memory_bytes is scraped with a matching instance label)
      - alert: MemoryUsageHigh
        expr: |
          (
            process_resident_memory_bytes{job="mcp-server"}
            / on(instance)
            machine_memory_bytes
          ) > 0.85
        for: 5m
        labels:
          severity: warning
          component: resources
        annotations:
          summary: "Memory usage high on {{ $labels.instance }}"
          description: "Memory usage is {{ $value | humanizePercentage }} (threshold: 85%)."
          runbook: "https://docs.makeaihq.com/runbooks/memory-usage-high"

  - name: mcp_business_metrics
    interval: 5m
    rules:
      # Signup rate drop (hour() evaluates in UTC)
      - alert: SignupRateDrop
        expr: |
          sum(rate(mcp_user_signups_total[1h])) < 0.5
          and on()
          (hour() >= 9 and hour() <= 17)
        for: 30m
        labels:
          severity: warning
          component: business
        annotations:
          summary: "User signup rate dropped significantly"
          description: "Signup rate dropped to {{ $value }}/sec during business hours (expected: >0.5/sec)."
          runbook: "https://docs.makeaihq.com/runbooks/signup-rate-drop"

      # Widget rendering failures
      - alert: WidgetRenderingIssues
        expr: |
          sum(increase(mcp_widget_render_duration_seconds_count[5m])) == 0
          and on()
          sum(increase(mcp_tool_invocations_total[5m])) > 10
        for: 5m
        labels:
          severity: critical
          component: widgets
        annotations:
          summary: "Widget rendering completely failed"
          description: "No widgets rendered in 5 minutes despite tool invocations. Critical rendering issue."
          runbook: "https://docs.makeaihq.com/runbooks/widget-render-failure"

# Recording rules for performance optimization
  - name: mcp_recording_rules
    interval: 30s
    rules:
      # Pre-aggregate error rate by tool
      - record: job:mcp_tool_error_rate:5m
        expr: |
          sum by (tool_name) (rate(mcp_tool_invocations_total{status="error"}[5m]))
          /
          sum by (tool_name) (rate(mcp_tool_invocations_total[5m]))

      # Pre-aggregate P95 latency by tool
      - record: job:mcp_tool_p95_latency:5m
        expr: |
          histogram_quantile(0.95,
            sum by (tool_name, le) (rate(mcp_tool_duration_seconds_bucket[5m]))
          )

      # Pre-aggregate request rate by status code
      - record: job:mcp_http_request_rate:1m
        expr: |
          sum by (status_code) (rate(mcp_http_requests_total[1m]))

      # Pre-aggregate authentication success rate
      - record: job:mcp_auth_success_rate:5m
        expr: |
          1 - (
            sum by (method) (rate(mcp_authentication_failures_total[5m]))
            /
            sum by (method) (rate(mcp_authentication_attempts_total[5m]))
          )

Deploy alert rules to Prometheus:

# prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 30s
  external_labels:
    cluster: 'production'
    environment: 'prod'

# Load alert rules
rule_files:
  - '/etc/prometheus/alerts/*.yml'

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

# Scrape configurations
scrape_configs:
  - job_name: 'mcp-server'
    static_configs:
      - targets:
          - 'mcp-server-1:3000'
          - 'mcp-server-2:3000'
          - 'mcp-server-3:3000'
    metrics_path: /metrics
    scrape_interval: 15s
    scrape_timeout: 10s

Recording rules pre-aggregate expensive queries, reducing dashboard load times and alert evaluation overhead. Alert rules fire only after a condition has held for the rule's for: duration, preventing flapping from transient spikes.
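
Recorded series are queryable like any other metric. For example, an internal status endpoint could read the pre-aggregated error rate through Prometheus's HTTP API (GET /api/v1/query); the Prometheus URL below is an assumption for your environment:

// Read the job:mcp_tool_error_rate:5m recording rule via the Prometheus HTTP API
const PROMETHEUS_URL = 'http://prometheus:9090'; // adjust for your environment

async function getToolErrorRates(): Promise<Record<string, number>> {
  const query = encodeURIComponent('job:mcp_tool_error_rate:5m');
  const res = await fetch(`${PROMETHEUS_URL}/api/v1/query?query=${query}`);
  const data: any = await res.json();

  // Instant-query results arrive as [{ metric: {...labels}, value: [timestamp, "value"] }]
  const rates: Record<string, number> = {};
  for (const result of data.data.result) {
    rates[result.metric.tool_name ?? 'unknown'] = parseFloat(result.value[1]);
  }
  return rates;
}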

Service Discovery and Kubernetes Integration

Prometheus excels at monitoring dynamic environments through service discovery mechanisms that automatically detect new instances, update targets, and remove terminated services without manual configuration changes.

For Kubernetes deployments, use ServiceMonitor custom resources (requires Prometheus Operator):

# k8s/prometheus/servicemonitor.yml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: mcp-server
  namespace: monitoring
  labels:
    app: mcp-server
    prometheus: kube-prometheus
spec:
  # Select services to monitor
  selector:
    matchLabels:
      app: mcp-server

  # Namespace to discover services
  namespaceSelector:
    matchNames:
      - production
      - staging

  # Endpoint configuration
  endpoints:
    - port: metrics
      path: /metrics
      interval: 15s
      scrapeTimeout: 10s

      # Relabeling to add custom labels
      relabelings:
        - sourceLabels: [__meta_kubernetes_pod_name]
          targetLabel: pod
        - sourceLabels: [__meta_kubernetes_pod_node_name]
          targetLabel: node
        - sourceLabels: [__meta_kubernetes_namespace]
          targetLabel: namespace
        - sourceLabels: [__meta_kubernetes_pod_label_version]
          targetLabel: version

      # Metric relabeling (drop unnecessary metrics)
      metricRelabelings:
        - sourceLabels: [__name__]
          regex: 'go_gc_.*'
          action: drop

Corresponding Kubernetes Service definition:

# k8s/mcp-server-service.yml
apiVersion: v1
kind: Service
metadata:
  name: mcp-server
  namespace: production
  labels:
    app: mcp-server
spec:
  selector:
    app: mcp-server
  ports:
    - name: http
      port: 3000
      targetPort: 3000
    - name: metrics
      port: 9090
      targetPort: 3000
  type: ClusterIP

For non-Kubernetes environments, configure static targets or file-based service discovery:

# prometheus/prometheus.yml - File-based service discovery
scrape_configs:
  - job_name: 'mcp-server'
    file_sd_configs:
      - files:
          - '/etc/prometheus/targets/mcp-*.json'
        refresh_interval: 30s

# /etc/prometheus/targets/mcp-production.json
[
  {
    "targets": [
      "10.0.1.10:3000",
      "10.0.1.11:3000",
      "10.0.1.12:3000"
    ],
    "labels": {
      "environment": "production",
      "region": "us-east-1",
      "version": "v2.3.1"
    }
  }
]

Service discovery eliminates manual target management, ensuring Prometheus always monitors your complete fleet as it scales up or down dynamically.
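
Because file-SD targets are plain JSON, deployment tooling can regenerate them automatically. A minimal sketch, assuming a listMcpInstances lookup against your own service registry:

// scripts/generate-targets.ts — emit a Prometheus file-SD target list
import { writeFileSync } from 'fs';

interface TargetGroup {
  targets: string[];
  labels: Record<string, string>;
}

// Placeholder: resolve instances from your service registry or cloud API
declare function listMcpInstances(env: string): { host: string; port: number }[];

function writeTargetFile(environment: string, path: string): void {
  const groups: TargetGroup[] = [{
    targets: listMcpInstances(environment).map((i) => `${i.host}:${i.port}`),
    labels: { environment, region: 'us-east-1' }
  }];

  // Prometheus re-reads the file on each refresh_interval; keep writes atomic in production
  writeFileSync(path, JSON.stringify(groups, null, 2));
}

writeTargetFile('production', '/etc/prometheus/targets/mcp-production.json');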

Sample PromQL Queries for ChatGPT Apps

Understanding PromQL (Prometheus Query Language) enables effective dashboard creation and ad-hoc investigation. Here are production-ready queries for MCP server monitoring:

# Tool invocation rate by tool name (requests per second)
sum by (tool_name) (rate(mcp_tool_invocations_total[5m]))

# Total invocations in last hour by status
sum by (status) (increase(mcp_tool_invocations_total[1h]))

# P95 latency by tool (top 5 slowest tools)
topk(5,
  histogram_quantile(0.95,
    sum by (tool_name, le) (rate(mcp_tool_duration_seconds_bucket[5m]))
  )
)

# Error rate percentage by tool
(
  sum by (tool_name) (rate(mcp_tool_invocations_total{status="error"}[5m]))
  /
  sum by (tool_name) (rate(mcp_tool_invocations_total[5m]))
) * 100

# Active connections by protocol
sum by (protocol) (mcp_active_connections)

# HTTP request rate by status code
sum by (status_code) (rate(mcp_http_requests_total[1m]))

# Average request duration by route
sum by (route) (rate(mcp_http_request_duration_seconds_sum[5m]))
/
sum by (route) (rate(mcp_http_request_duration_seconds_count[5m]))

# Authentication success rate (percentage)
(
  1 - (
    sum(rate(mcp_authentication_failures_total[5m]))
    /
    sum(rate(mcp_authentication_attempts_total[5m]))
  )
) * 100

# Widget rendering throughput by type
sum by (widget_type) (rate(mcp_widget_renders_total[5m]))

# Cache hit ratio by cache type
mcp_cache_hit_ratio * 100

# Database query P99 latency by collection
histogram_quantile(0.99,
  sum by (collection, le) (
    rate(mcp_database_query_duration_seconds_bucket[5m])
  )
)

# Memory usage percentage (assumes machine_memory_bytes shares the instance label)
(process_resident_memory_bytes / on(instance) machine_memory_bytes) * 100

# CPU usage rate
rate(process_cpu_seconds_total[1m]) * 100

# Signup conversion rate (signups per 100 visitors)
(
  sum(rate(mcp_user_signups_total[1h]))
  /
  sum(rate(mcp_http_requests_total{route="/"}[1h]))
) * 100

# Revenue by subscription tier
sum by (tier) (mcp_revenue_total_usd)

These queries form the foundation for Grafana monitoring dashboards that visualize your ChatGPT application's health and performance in real-time.

Build Production ChatGPT Apps with MakeAIHQ

Implementing comprehensive Prometheus monitoring is crucial for production ChatGPT applications, but it's just one component of a robust observability strategy. Combining metrics collection with distributed tracing via OpenTelemetry integration, structured logging, and proactive alerting strategies creates a complete observability platform.

MakeAIHQ streamlines ChatGPT app development by providing production-ready templates with built-in observability best practices. Our platform generates MCP servers with pre-configured Prometheus instrumentation, alert rules, and Grafana dashboards, eliminating weeks of infrastructure setup. Focus on building unique ChatGPT experiences while we handle the operational complexity.

Ready to launch your production ChatGPT application with enterprise-grade monitoring? Start building with MakeAIHQ's no-code platform and deploy to the ChatGPT App Store in 48 hours with confidence that your observability stack scales from prototype to millions of users.

For comprehensive guidance on building, monitoring, and scaling ChatGPT applications, explore our complete guide to building ChatGPT applications.

