Prometheus Metrics Collection for ChatGPT Apps
Production ChatGPT applications require robust observability to maintain performance, detect anomalies, and ensure optimal user experiences. Prometheus has become the de facto standard for metrics collection in cloud-native environments, offering a powerful pull-based model, flexible query language, and seamless integration with modern architectures.
Unlike traditional push-based monitoring systems, Prometheus scrapes metrics from instrumented applications at regular intervals, storing them in a time-series database optimized for querying and alerting. This approach provides several advantages for ChatGPT apps: reduced coupling between services, centralized configuration management, and automatic service discovery in dynamic environments like Kubernetes.
For MCP (Model Context Protocol) servers powering ChatGPT applications, Prometheus enables tracking of critical metrics like tool invocation rates, widget rendering performance, authentication latency, and business KPIs such as user engagement and conversion rates. Combined with alert rules and visualization dashboards, Prometheus transforms raw metrics into actionable insights that prevent incidents before they impact users.
This guide demonstrates production-ready Prometheus instrumentation for ChatGPT applications, covering metrics types, custom collectors, alert rules, and service discovery configurations that scale from development to enterprise deployments.
Understanding Prometheus Metrics Types
Prometheus defines four fundamental metric types, each suited for different measurement scenarios. Selecting the appropriate type ensures accurate data collection and enables meaningful queries for alerting and visualization.
Counter metrics monotonically increase over time and reset only on application restart. Use counters for tracking cumulative values like total MCP tool invocations, HTTP requests, or authentication attempts. Counters work perfectly for rate calculations using PromQL's rate() function, revealing trends like requests per second or error frequency. For example, mcp_tool_invocations_total tracks how many times each tool executes, enabling queries like rate(mcp_tool_invocations_total[5m]) to show invocation velocity.
Gauge metrics represent point-in-time measurements that can increase or decrease. Active WebSocket connections, memory usage, queue depth, and concurrent users are ideal gauge candidates. Unlike counters, gauges reflect current state rather than cumulative totals. The metric mcp_active_connections might show 247 concurrent WebSocket sessions, dropping to 198 as users disconnect. Gauges enable threshold alerts like "notify when active connections exceed 1000."
Histogram metrics sample observations and bucket them into configurable ranges, automatically exposing count, sum, and per-bucket series. Histograms excel at measuring request durations, response sizes, and other distributions where percentiles matter. A histogram like mcp_tool_duration_seconds with buckets [0.1, 0.5, 1.0, 2.5, 5.0] enables calculating P95 latency: histogram_quantile(0.95, rate(mcp_tool_duration_seconds_bucket[5m])). The result is the latency below which 95% of tool invocations complete.
Summary metrics resemble histograms but calculate quantiles client-side during collection. While summaries provide exact percentiles without bucket configuration, they consume more resources and prevent server-side aggregation across instances. Histograms generally prove more flexible for distributed systems. Use summaries when you need precise quantiles for a single instance and can't predict appropriate bucket boundaries.
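To make these types concrete, here is a minimal, standalone sketch using the prom-client library (the same library used in the full implementation below); the metric names and recorded values are illustrative only:
// metric-types-demo.ts -- illustrative declarations of the four metric types
import { Counter, Gauge, Histogram, Summary, Registry } from 'prom-client';
const registry = new Registry();
// Counter: cumulative, only increases; query with rate() for per-second trends
const invocations = new Counter({
  name: 'demo_tool_invocations_total',
  help: 'Total tool invocations',
  registers: [registry]
});
invocations.inc();
// Gauge: point-in-time value that can rise and fall
const connections = new Gauge({
  name: 'demo_active_connections',
  help: 'Active WebSocket connections',
  registers: [registry]
});
connections.set(247);
connections.dec();
// Histogram: observations sorted into buckets, queryable with histogram_quantile()
const duration = new Histogram({
  name: 'demo_tool_duration_seconds',
  help: 'Tool execution duration in seconds',
  buckets: [0.1, 0.5, 1.0, 2.5, 5.0],
  registers: [registry]
});
duration.observe(0.42);
// Summary: client-side quantiles; exact, but not aggregatable across instances
const payloadSize = new Summary({
  name: 'demo_payload_size_bytes',
  help: 'Payload size in bytes',
  percentiles: [0.5, 0.95, 0.99],
  registers: [registry]
});
payloadSize.observe(1024);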
Understanding these types ensures your ChatGPT application exports metrics that accurately represent system behavior and support meaningful operational queries.
Implementing Custom Metrics for MCP Servers
ChatGPT applications built on MCP servers require specialized metrics beyond generic HTTP instrumentation. Custom metrics track business logic, protocol-specific operations, and application-level performance indicators that standard libraries cannot capture.
Here's a comprehensive TypeScript implementation of custom Prometheus metrics for an MCP server:
// src/metrics/prometheus.ts
import { Registry, Counter, Histogram, Gauge, collectDefaultMetrics } from 'prom-client';
/**
* Centralized Prometheus metrics registry for MCP server
* Tracks tool invocations, widget operations, authentication, and business KPIs
*/
export class MCPMetricsCollector {
private registry: Registry;
// Tool invocation metrics
public toolInvocationsTotal: Counter<string>;
public toolDuration: Histogram<string>;
public toolErrors: Counter<string>;
// Widget metrics
public widgetRenders: Counter<string>;
public widgetRenderDuration: Histogram<string>;
public widgetStateUpdates: Counter<string>;
// Connection metrics
public activeConnections: Gauge<string>;
public connectionDuration: Histogram<string>;
public authenticationAttempts: Counter<string>;
public authenticationFailures: Counter<string>;
// Business metrics
public userSignups: Counter<string>;
public subscriptionUpgrades: Counter<string>;
public revenueTotal: Gauge<string>;
// Resource metrics
public databaseQueryDuration: Histogram<string>;
public cacheHitRatio: Gauge<string>;
public queueDepth: Gauge<string>;
constructor(prefix: string = 'mcp') {
this.registry = new Registry();
// Enable default system metrics (CPU, memory, event loop, etc.)
collectDefaultMetrics({
register: this.registry,
prefix: `${prefix}_`,
gcDurationBuckets: [0.001, 0.01, 0.1, 1, 2, 5]
});
// Tool invocation counter
this.toolInvocationsTotal = new Counter({
name: `${prefix}_tool_invocations_total`,
help: 'Total number of MCP tool invocations',
labelNames: ['tool_name', 'status', 'user_tier'],
registers: [this.registry]
});
// Tool duration histogram (optimized buckets for typical API latency)
this.toolDuration = new Histogram({
name: `${prefix}_tool_duration_seconds`,
help: 'Duration of MCP tool executions in seconds',
labelNames: ['tool_name', 'status'],
buckets: [0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10],
registers: [this.registry]
});
// Tool error counter
this.toolErrors = new Counter({
name: `${prefix}_tool_errors_total`,
help: 'Total number of MCP tool execution errors',
labelNames: ['tool_name', 'error_type'],
registers: [this.registry]
});
// Widget rendering metrics
this.widgetRenders = new Counter({
name: `${prefix}_widget_renders_total`,
help: 'Total number of widget renders',
labelNames: ['widget_type', 'display_mode'],
registers: [this.registry]
});
this.widgetRenderDuration = new Histogram({
name: `${prefix}_widget_render_duration_seconds`,
help: 'Widget rendering duration in seconds',
labelNames: ['widget_type', 'display_mode'],
buckets: [0.01, 0.05, 0.1, 0.25, 0.5, 1],
registers: [this.registry]
});
this.widgetStateUpdates = new Counter({
name: `${prefix}_widget_state_updates_total`,
help: 'Total number of widget state updates',
labelNames: ['widget_type', 'update_source'],
registers: [this.registry]
});
// Connection tracking
this.activeConnections = new Gauge({
name: `${prefix}_active_connections`,
help: 'Current number of active WebSocket connections',
labelNames: ['protocol', 'authenticated'],
registers: [this.registry]
});
this.connectionDuration = new Histogram({
name: `${prefix}_connection_duration_seconds`,
help: 'WebSocket connection duration in seconds',
labelNames: ['protocol', 'disconnect_reason'],
buckets: [10, 30, 60, 300, 600, 1800, 3600],
registers: [this.registry]
});
// Authentication metrics
this.authenticationAttempts = new Counter({
name: `${prefix}_authentication_attempts_total`,
help: 'Total authentication attempts',
labelNames: ['method', 'status'],
registers: [this.registry]
});
this.authenticationFailures = new Counter({
name: `${prefix}_authentication_failures_total`,
help: 'Total authentication failures',
labelNames: ['method', 'reason'],
registers: [this.registry]
});
// Business KPIs
this.userSignups = new Counter({
name: `${prefix}_user_signups_total`,
help: 'Total user signups',
labelNames: ['source', 'tier'],
registers: [this.registry]
});
this.subscriptionUpgrades = new Counter({
name: `${prefix}_subscription_upgrades_total`,
help: 'Total subscription tier upgrades',
labelNames: ['from_tier', 'to_tier'],
registers: [this.registry]
});
this.revenueTotal = new Gauge({
name: `${prefix}_revenue_total_usd`,
help: 'Total revenue in USD',
labelNames: ['tier'],
registers: [this.registry]
});
// Resource utilization metrics
this.databaseQueryDuration = new Histogram({
name: `${prefix}_database_query_duration_seconds`,
help: 'Database query duration in seconds',
labelNames: ['operation', 'collection'],
buckets: [0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5],
registers: [this.registry]
});
this.cacheHitRatio = new Gauge({
name: `${prefix}_cache_hit_ratio`,
help: 'Cache hit ratio (0-1)',
labelNames: ['cache_type'],
registers: [this.registry]
});
this.queueDepth = new Gauge({
name: `${prefix}_queue_depth`,
help: 'Current queue depth for async operations',
labelNames: ['queue_name'],
registers: [this.registry]
});
}
/**
* Get metrics in Prometheus text exposition format
*/
public async getMetrics(): Promise<string> {
return this.registry.metrics();
}
/**
* Get registry for custom metric registration
*/
public getRegistry(): Registry {
return this.registry;
}
/**
* Track tool invocation with automatic timing
*/
public async trackToolInvocation<T>(
toolName: string,
userTier: string,
handler: () => Promise<T>
): Promise<T> {
const endTimer = this.toolDuration.startTimer({ tool_name: toolName });
try {
const result = await handler();
this.toolInvocationsTotal.inc({
tool_name: toolName,
status: 'success',
user_tier: userTier
});
endTimer({ status: 'success' });
return result;
} catch (error) {
this.toolInvocationsTotal.inc({
tool_name: toolName,
status: 'error',
user_tier: userTier
});
this.toolErrors.inc({
tool_name: toolName,
error_type: error instanceof Error ? error.name : 'UnknownError'
});
endTimer({ status: 'error' });
throw error;
}
}
/**
* Track widget rendering with automatic timing
*/
public async trackWidgetRender<T>(
widgetType: string,
displayMode: string,
renderer: () => Promise<T>
): Promise<T> {
const endTimer = this.widgetRenderDuration.startTimer({
widget_type: widgetType,
display_mode: displayMode
});
try {
const result = await renderer();
this.widgetRenders.inc({
widget_type: widgetType,
display_mode: displayMode
});
endTimer();
return result;
} catch (error) {
endTimer();
throw error;
}
}
/**
* Track connection lifecycle
*/
public trackConnection(protocol: string, authenticated: boolean): (disconnectReason?: string) => void {
this.activeConnections.inc({
protocol,
authenticated: authenticated.toString()
});
const startTime = Date.now();
return (disconnectReason: string = 'normal') => {
this.activeConnections.dec({
protocol,
authenticated: authenticated.toString()
});
const durationSeconds = (Date.now() - startTime) / 1000;
this.connectionDuration.observe(
{ protocol, disconnect_reason: disconnectReason },
durationSeconds
);
};
}
}
// Singleton instance
export const metrics = new MCPMetricsCollector('mcp');
This metrics collector provides type-safe, production-ready instrumentation covering all aspects of MCP server operations. The singleton pattern ensures consistent metrics across your application modules.
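As a usage sketch, a tool handler can delegate timing and error accounting to trackToolInvocation; the tool name, handler signature, and payload below are placeholders rather than part of the MCP SDK:
// src/tools/echo-tool.ts -- illustrative usage; tool name and payload are placeholders
import { metrics } from '../metrics/prometheus';
export async function handleEchoTool(input: string, userTier: string): Promise<{ text: string }> {
  // trackToolInvocation times the handler, increments the success/error
  // counters, and records the error type if the handler throws
  return metrics.trackToolInvocation('echo', userTier, async () => {
    return { text: input };
  });
}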
Express Middleware for HTTP Metrics
Integrate Prometheus metrics into your Express.js MCP server with custom middleware that tracks HTTP performance, errors, and throughput:
// src/middleware/prometheus-middleware.ts
import { Request, Response, NextFunction } from 'express';
import { metrics } from '../metrics/prometheus';
import { Counter, Histogram } from 'prom-client';
/**
* Prometheus metrics middleware for Express applications
* Tracks HTTP request metrics with automatic labeling
*/
export class PrometheusMiddleware {
private httpRequestsTotal: Counter<string>;
private httpRequestDuration: Histogram<string>;
private httpRequestSize: Histogram<string>;
private httpResponseSize: Histogram<string>;
constructor() {
const registry = metrics.getRegistry();
this.httpRequestsTotal = new Counter({
name: 'mcp_http_requests_total',
help: 'Total number of HTTP requests',
labelNames: ['method', 'route', 'status_code'],
registers: [registry]
});
this.httpRequestDuration = new Histogram({
name: 'mcp_http_request_duration_seconds',
help: 'HTTP request duration in seconds',
labelNames: ['method', 'route', 'status_code'],
buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5],
registers: [registry]
});
this.httpRequestSize = new Histogram({
name: 'mcp_http_request_size_bytes',
help: 'HTTP request size in bytes',
labelNames: ['method', 'route'],
buckets: [100, 1000, 5000, 10000, 50000, 100000, 500000],
registers: [registry]
});
this.httpResponseSize = new Histogram({
name: 'mcp_http_response_size_bytes',
help: 'HTTP response size in bytes',
labelNames: ['method', 'route', 'status_code'],
buckets: [100, 1000, 5000, 10000, 50000, 100000, 500000],
registers: [registry]
});
}
/**
* Express middleware function
*/
public middleware() {
return (req: Request, res: Response, next: NextFunction): void => {
const startTime = Date.now();
// Track request size
const requestSize = parseInt(req.get('content-length') || '0', 10);
if (requestSize > 0) {
this.httpRequestSize.observe(
{ method: req.method, route: this.normalizeRoute(req.route?.path || req.path) },
requestSize
);
}
// Intercept response finish to capture metrics
const originalEnd = res.end;
const self = this;
res.end = function(this: Response, ...args: any[]): Response {
const duration = (Date.now() - startTime) / 1000;
const route = self.normalizeRoute(req.route?.path || req.path);
const statusCode = res.statusCode.toString();
// Track request count
self.httpRequestsTotal.inc({
method: req.method,
route,
status_code: statusCode
});
// Track request duration
self.httpRequestDuration.observe(
{ method: req.method, route, status_code: statusCode },
duration
);
// Track response size
const responseSize = parseInt(res.get('content-length') || '0', 10);
if (responseSize > 0) {
self.httpResponseSize.observe(
{ method: req.method, route, status_code: statusCode },
responseSize
);
}
return originalEnd.apply(this, args);
};
next();
};
}
/**
* Normalize route paths to prevent high cardinality labels
* Converts /api/users/123/apps/456 -> /api/users/:id/apps/:id
*/
private normalizeRoute(path: string): string {
if (!path) return 'unknown';
// Use Express route if available (already normalized)
if (path.includes(':')) return path;
// Basic normalization for dynamic paths
return path
.replace(/\/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi, '/:uuid')
.replace(/\/\d+/g, '/:id')
.replace(/\/[a-f0-9]{24}/g, '/:objectid');
}
/**
* Endpoint to expose metrics
*/
public async metricsEndpoint(req: Request, res: Response): Promise<void> {
try {
res.set('Content-Type', 'text/plain; version=0.0.4; charset=utf-8');
res.send(await metrics.getMetrics());
} catch (error) {
res.status(500).send('Error collecting metrics');
}
}
}
export const prometheusMiddleware = new PrometheusMiddleware();
Integrate this middleware into your Express application:
// src/index.ts
import express from 'express';
import { prometheusMiddleware } from './middleware/prometheus-middleware';
const app = express();
// Apply Prometheus middleware globally
app.use(prometheusMiddleware.middleware());
// Expose metrics endpoint (typically on /metrics)
app.get('/metrics', (req, res) => prometheusMiddleware.metricsEndpoint(req, res));
// Your MCP server routes
app.use('/mcp', mcpRouter);
app.listen(3000, () => console.log('MCP server running on port 3000'));
This configuration automatically tracks all HTTP traffic with minimal performance overhead, providing the foundational metrics for Prometheus to scrape.
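Once the middleware is wired up, a GET request to /metrics returns the Prometheus text exposition format. Output along these lines (label values and sample counts are illustrative) is what the scraper ingests:
# HELP mcp_http_requests_total Total number of HTTP requests
# TYPE mcp_http_requests_total counter
mcp_http_requests_total{method="POST",route="/mcp",status_code="200"} 1423
# HELP mcp_tool_duration_seconds Duration of MCP tool executions in seconds
# TYPE mcp_tool_duration_seconds histogram
mcp_tool_duration_seconds_bucket{tool_name="echo",status="success",le="0.25"} 87
mcp_tool_duration_seconds_bucket{tool_name="echo",status="success",le="+Inf"} 112
mcp_tool_duration_seconds_sum{tool_name="echo",status="success"} 41.7
mcp_tool_duration_seconds_count{tool_name="echo",status="success"} 112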
Configuring Alert Rules for ChatGPT Applications
Prometheus alert rules transform metrics into actionable notifications when thresholds are breached or anomalies occur. Well-designed alerts prevent incidents by catching issues early while minimizing the false positives that cause alert fatigue.
Here's a comprehensive alerting configuration for MCP servers:
# prometheus/alerts/mcp-alerts.yml
groups:
- name: mcp_availability
interval: 30s
rules:
# Critical: MCP server down
- alert: MCPServerDown
expr: up{job="mcp-server"} == 0
for: 1m
labels:
severity: critical
component: mcp-server
annotations:
summary: "MCP server {{ $labels.instance }} is down"
description: "MCP server at {{ $labels.instance }} has been unreachable for 1 minute. Immediate action required."
runbook: "https://docs.makeaihq.com/runbooks/mcp-server-down"
# High error rate on tool invocations
- alert: HighToolErrorRate
expr: |
  (
    sum by (tool_name) (rate(mcp_tool_invocations_total{status="error"}[5m]))
    /
    sum by (tool_name) (rate(mcp_tool_invocations_total[5m]))
  ) > 0.05
for: 3m
labels:
severity: warning
component: tool-execution
annotations:
summary: "High error rate on tool {{ $labels.tool_name }}"
description: "Tool {{ $labels.tool_name }} error rate is {{ $value | humanizePercentage }} over the last 5 minutes (threshold: 5%)."
runbook: "https://docs.makeaihq.com/runbooks/high-tool-error-rate"
# Tool execution latency exceeds SLA
- alert: ToolExecutionLatencyHigh
expr: |
histogram_quantile(0.95,
rate(mcp_tool_duration_seconds_bucket[5m])
) > 2.5
for: 5m
labels:
severity: warning
component: tool-execution
annotations:
summary: "Tool {{ $labels.tool_name }} P95 latency exceeds SLA"
description: "Tool {{ $labels.tool_name }} P95 latency is {{ $value }}s (SLA: 2.5s)."
runbook: "https://docs.makeaihq.com/runbooks/tool-latency-high"
- name: mcp_authentication
interval: 30s
rules:
# High authentication failure rate
- alert: HighAuthenticationFailureRate
expr: |
  (
    sum by (method) (rate(mcp_authentication_failures_total[5m]))
    /
    sum by (method) (rate(mcp_authentication_attempts_total[5m]))
  ) > 0.20
for: 5m
labels:
severity: warning
component: authentication
annotations:
summary: "High authentication failure rate"
description: "Authentication failure rate is {{ $value | humanizePercentage }} (threshold: 20%). Possible credential stuffing attack."
runbook: "https://docs.makeaihq.com/runbooks/auth-failure-spike"
# Sudden spike in authentication attempts (potential DDoS/brute force)
- alert: AuthenticationAttemptSpike
expr: |
  sum(rate(mcp_authentication_attempts_total[1m])) > 100
for: 2m
labels:
severity: critical
component: authentication
annotations:
summary: "Spike in authentication attempts detected"
description: "Authentication attempts spiked to {{ $value }}/sec. Possible brute force attack."
runbook: "https://docs.makeaihq.com/runbooks/auth-spike"
- name: mcp_capacity
interval: 1m
rules:
# Active connections approaching limit
- alert: ActiveConnectionsHigh
expr: sum(mcp_active_connections) > 900
for: 5m
labels:
severity: warning
component: capacity
annotations:
summary: "Active connections approaching limit"
description: "Active connections ({{ $value }}) approaching limit of 1000. Consider scaling."
runbook: "https://docs.makeaihq.com/runbooks/scale-connections"
# Database query latency degradation
- alert: DatabaseQueryLatencyHigh
expr: |
histogram_quantile(0.95,
rate(mcp_database_query_duration_seconds_bucket[5m])
) > 1.0
for: 5m
labels:
severity: warning
component: database
annotations:
summary: "Database query latency degraded"
description: "P95 database query latency is {{ $value }}s for {{ $labels.collection }} (threshold: 1s)."
runbook: "https://docs.makeaihq.com/runbooks/db-latency-high"
# Memory usage high
- alert: MemoryUsageHigh
expr: |
(
process_resident_memory_bytes{job="mcp-server"}
/
machine_memory_bytes
) > 0.85
for: 5m
labels:
severity: warning
component: resources
annotations:
summary: "Memory usage high on {{ $labels.instance }}"
description: "Memory usage is {{ $value | humanizePercentage }} (threshold: 85%)."
runbook: "https://docs.makeaihq.com/runbooks/memory-usage-high"
- name: mcp_business_metrics
interval: 5m
rules:
# Signup rate drop
- alert: SignupRateDrop
expr: |
  sum(rate(mcp_user_signups_total[1h])) < 0.5
  and on()
  (hour() >= 9 and hour() <= 17)  # hour() evaluates in UTC
for: 30m
labels:
severity: warning
component: business
annotations:
summary: "User signup rate dropped significantly"
description: "Signup rate dropped to {{ $value }}/sec during business hours (expected: >0.5/sec)."
runbook: "https://docs.makeaihq.com/runbooks/signup-rate-drop"
# Widget rendering failures
- alert: WidgetRenderingIssues
expr: |
  (sum(increase(mcp_widget_renders_total[5m])) or vector(0)) == 0
  and on()
  sum(increase(mcp_tool_invocations_total[5m])) > 10
for: 5m
labels:
severity: critical
component: widgets
annotations:
summary: "Widget rendering completely failed"
description: "No widgets rendered in 5 minutes despite tool invocations. Critical rendering issue."
runbook: "https://docs.makeaihq.com/runbooks/widget-render-failure"
# Recording rules for performance optimization
- name: mcp_recording_rules
interval: 30s
rules:
# Pre-aggregate error rate by tool
- record: job:mcp_tool_error_rate:5m
expr: |
  sum by (tool_name) (rate(mcp_tool_invocations_total{status="error"}[5m]))
  /
  sum by (tool_name) (rate(mcp_tool_invocations_total[5m]))
# Pre-aggregate P95 latency by tool
- record: job:mcp_tool_p95_latency:5m
expr: |
histogram_quantile(0.95,
rate(mcp_tool_duration_seconds_bucket[5m])
)
# Pre-aggregate request rate by status code
- record: job:mcp_http_request_rate:1m
expr: |
rate(mcp_http_requests_total[1m])
# Pre-aggregate authentication success rate
- record: job:mcp_auth_success_rate:5m
expr: |
  1 - (
    sum by (method) (rate(mcp_authentication_failures_total[5m]))
    /
    sum by (method) (rate(mcp_authentication_attempts_total[5m]))
  )
Deploy alert rules to Prometheus:
# prometheus/prometheus.yml
global:
scrape_interval: 15s
evaluation_interval: 30s
external_labels:
cluster: 'production'
environment: 'prod'
# Load alert rules
rule_files:
- '/etc/prometheus/alerts/*.yml'
# Alertmanager configuration
alerting:
alertmanagers:
- static_configs:
- targets:
- alertmanager:9093
# Scrape configurations
scrape_configs:
- job_name: 'mcp-server'
static_configs:
- targets:
- 'mcp-server-1:3000'
- 'mcp-server-2:3000'
- 'mcp-server-3:3000'
metrics_path: /metrics
scrape_interval: 15s
scrape_timeout: 10s
Recording rules pre-aggregate expensive queries, reducing dashboard load times and alert evaluation overhead. Alert rules fire only after their condition has held for the full for: duration, which prevents flapping on transient spikes.
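The alerting block in prometheus.yml points at an Alertmanager instance at alertmanager:9093; a minimal routing configuration for it might look like the following sketch (receiver names, the Slack channel, and credentials are placeholders to replace):
# alertmanager/alertmanager.yml -- illustrative routing; receivers are placeholders
route:
  receiver: default
  group_by: ['alertname', 'component']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - matchers:
        - severity = "critical"
      receiver: pagerduty-oncall
receivers:
  - name: default
    slack_configs:
      - channel: '#mcp-alerts'
        api_url: 'https://hooks.slack.com/services/REPLACE_ME'
  - name: pagerduty-oncall
    pagerduty_configs:
      - routing_key: 'REPLACE_ME'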
Service Discovery and Kubernetes Integration
Prometheus excels at monitoring dynamic environments through service discovery mechanisms that automatically detect new instances, update targets, and remove terminated services without manual configuration changes.
For Kubernetes deployments, use ServiceMonitor custom resources (requires Prometheus Operator):
# k8s/prometheus/servicemonitor.yml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: mcp-server
namespace: monitoring
labels:
app: mcp-server
prometheus: kube-prometheus
spec:
# Select services to monitor
selector:
matchLabels:
app: mcp-server
# Namespace to discover services
namespaceSelector:
matchNames:
- production
- staging
# Endpoint configuration
endpoints:
- port: metrics
path: /metrics
interval: 15s
scrapeTimeout: 10s
# Relabeling to add custom labels
relabelings:
- sourceLabels: [__meta_kubernetes_pod_name]
targetLabel: pod
- sourceLabels: [__meta_kubernetes_pod_node_name]
targetLabel: node
- sourceLabels: [__meta_kubernetes_namespace]
targetLabel: namespace
- sourceLabels: [__meta_kubernetes_pod_label_version]
targetLabel: version
# Metric relabeling (drop unnecessary metrics)
metricRelabelings:
- sourceLabels: [__name__]
regex: 'go_gc_.*'
action: drop
Corresponding Kubernetes Service definition:
# k8s/mcp-server-service.yml
apiVersion: v1
kind: Service
metadata:
name: mcp-server
namespace: production
labels:
app: mcp-server
spec:
selector:
app: mcp-server
ports:
- name: http
port: 3000
targetPort: 3000
- name: metrics
port: 9090
targetPort: 3000
type: ClusterIP
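If the Prometheus Operator isn't available in your cluster, annotation-based pod discovery achieves a similar effect; this sketch assumes your pods carry a prometheus.io/scrape: "true" annotation:
# prometheus/prometheus.yml - Pod discovery without the Prometheus Operator (illustrative)
scrape_configs:
  - job_name: 'mcp-server-pods'
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names: ['production', 'staging']
    relabel_configs:
      # Scrape only pods that opt in via the prometheus.io/scrape annotation
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: 'true'
      # Carry pod and namespace through as labels for queries and alerts
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace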
For non-Kubernetes environments, configure static targets or file-based service discovery:
# prometheus/prometheus.yml - File-based service discovery
scrape_configs:
- job_name: 'mcp-server'
file_sd_configs:
- files:
- '/etc/prometheus/targets/mcp-*.json'
refresh_interval: 30s
# /etc/prometheus/targets/mcp-production.json
[
{
"targets": [
"10.0.1.10:3000",
"10.0.1.11:3000",
"10.0.1.12:3000"
],
"labels": {
"environment": "production",
"region": "us-east-1",
"version": "v2.3.1"
}
}
]
Service discovery eliminates manual target management, ensuring Prometheus always monitors your complete fleet as it scales up or down dynamically.
Sample PromQL Queries for ChatGPT Apps
Understanding PromQL (Prometheus Query Language) enables effective dashboard creation and ad-hoc investigation. Here are production-ready queries for MCP server monitoring:
# Tool invocation rate by tool name (requests per second)
rate(mcp_tool_invocations_total[5m])
# Total invocations in last hour by status
sum by (status) (increase(mcp_tool_invocations_total[1h]))
# P95 latency by tool (top 5 slowest tools)
topk(5,
  histogram_quantile(0.95,
    sum by (tool_name, le) (rate(mcp_tool_duration_seconds_bucket[5m]))
  )
)
# Error rate percentage by tool
(
  sum by (tool_name) (rate(mcp_tool_invocations_total{status="error"}[5m]))
  /
  sum by (tool_name) (rate(mcp_tool_invocations_total[5m]))
) * 100
# Active connections by protocol
sum by (protocol) (mcp_active_connections)
# HTTP request rate by status code
sum by (status_code) (rate(mcp_http_requests_total[1m]))
# Average request duration by route
avg by (route) (rate(mcp_http_request_duration_seconds_sum[5m]) / rate(mcp_http_request_duration_seconds_count[5m]))
# Authentication success rate (percentage)
(
  1 - (
    sum by (method) (rate(mcp_authentication_failures_total[5m]))
    /
    sum by (method) (rate(mcp_authentication_attempts_total[5m]))
  )
) * 100
# Widget rendering throughput by type
sum by (widget_type) (rate(mcp_widget_renders_total[5m]))
# Cache hit ratio by cache type
mcp_cache_hit_ratio * 100
# Database query P99 latency by collection
histogram_quantile(0.99,
sum by (collection, le) (
rate(mcp_database_query_duration_seconds_bucket[5m])
)
)
# Memory usage percentage
(process_resident_memory_bytes / machine_memory_bytes) * 100
# CPU usage rate
rate(process_cpu_seconds_total[1m]) * 100
# Signup conversion rate (signups per 100 visitors)
(
rate(mcp_user_signups_total[1h])
/
rate(mcp_http_requests_total{route="/"}[1h])
) * 100
# Revenue by subscription tier
sum by (tier) (mcp_revenue_total_usd)
These queries form the foundation for Grafana monitoring dashboards that visualize your ChatGPT application's health and performance in real-time.
Build Production ChatGPT Apps with MakeAIHQ
Implementing comprehensive Prometheus monitoring is crucial for production ChatGPT applications, but it's just one component of a robust observability strategy. Combining metrics collection with distributed tracing via OpenTelemetry integration, structured logging, and proactive alerting strategies creates a complete observability platform.
MakeAIHQ streamlines ChatGPT app development by providing production-ready templates with built-in observability best practices. Our platform generates MCP servers with pre-configured Prometheus instrumentation, alert rules, and Grafana dashboards, eliminating weeks of infrastructure setup. Focus on building unique ChatGPT experiences while we handle the operational complexity.
Ready to launch your production ChatGPT application with enterprise-grade monitoring? Start building with MakeAIHQ's no-code platform and deploy to the ChatGPT App Store in 48 hours with confidence that your observability stack scales from prototype to millions of users.
For comprehensive guidance on building, monitoring, and scaling ChatGPT applications, explore our complete guide to building ChatGPT applications.
Related Resources:
- OpenTelemetry Integration for ChatGPT Apps
- Grafana Monitoring Dashboards for ChatGPT Apps
- Alerting Strategies for ChatGPT Applications
- MCP Server Production Deployment Guide