Distributed Tracing with Jaeger for ChatGPT Apps
Building ChatGPT apps with MCP servers, widgets, and external APIs creates complex distributed systems where a single user interaction can trigger dozens of service calls. When latency issues or errors occur, traditional logging falls short—you need distributed tracing to understand the full request flow across all components.
Jaeger is an open-source, end-to-end distributed tracing system originally developed by Uber. It helps you monitor and troubleshoot transactions in complex microservices environments by tracking requests as they flow through your ChatGPT app infrastructure. With Jaeger, you can visualize the entire request lifecycle—from when ChatGPT invokes your MCP server tool, through database queries, external API calls, and widget rendering—all in a single timeline view.
This comprehensive guide demonstrates how to implement production-ready Jaeger tracing for ChatGPT applications. You'll learn how to instrument MCP servers with OpenTelemetry, trace tool invocations and widget renders, configure sampling strategies for high-traffic applications, and deploy Jaeger at scale using Kubernetes and Elasticsearch. By the end, you'll have complete visibility into your ChatGPT app's performance characteristics and the ability to diagnose issues that span multiple services.
Whether you're debugging slow tool invocations, tracking down intermittent errors, or optimizing your MCP server's performance profile, Jaeger distributed tracing provides the observability foundation you need. For a complete overview of ChatGPT app architecture, see our Complete Guide to Building ChatGPT Applications.
Understanding Jaeger Architecture for ChatGPT Apps
Jaeger's architecture consists of several key components working together to collect, store, and visualize traces:
Jaeger Client Libraries instrument your MCP server code using OpenTelemetry SDKs. These libraries create spans (individual operations) and traces (collections of spans representing a complete request flow). When ChatGPT invokes a tool, the client library automatically creates a root span and propagates trace context to downstream services.
Jaeger Agent is a network daemon that listens for spans sent over UDP, batches them, and forwards them to the collector. Running agents as node-level DaemonSets or sidecars in Kubernetes keeps span submission local, so your MCP server never pays a network round trip to the collector on the request path.
Jaeger Collector receives traces from agents, validates them, runs processing pipelines (sampling, enrichment), and writes them to storage. Collectors can scale horizontally to handle high ingestion rates from production ChatGPT apps serving millions of requests.
Storage Backend persists trace data for querying. Development environments typically use in-memory storage, while production deployments use Elasticsearch, Cassandra, or other scalable databases to store billions of spans with configurable retention policies.
Jaeger Query Service provides APIs and a React-based UI for searching, filtering, and visualizing traces. You can search by trace ID, service name, operation name, tags, or time range to find specific requests through your ChatGPT app.
For ChatGPT applications, the typical trace flow looks like this:
- ChatGPT sends request → Your MCP server receives it (root span created)
- MCP server calls tool handler → Database query executed (child span)
- Tool handler calls external API → HTTP request tracked (child span)
- Widget template rendered → Template processing measured (child span)
- Response returned to ChatGPT → Trace completed and sent to Jaeger
This end-to-end visibility is critical for understanding where time is spent in complex ChatGPT app workflows. Learn more about performance optimization in our MCP Server Performance Optimization guide.
Setting Up Jaeger Infrastructure
The fastest way to get started with Jaeger is using the all-in-one Docker image, which bundles all components into a single container. This is perfect for development and testing:
# docker-compose.yml - Jaeger All-in-One Development Setup
version: '3.8'
services:
jaeger:
image: jaegertracing/all-in-one:1.52
container_name: jaeger-aio
restart: unless-stopped
environment:
# Collector configuration
COLLECTOR_ZIPKIN_HOST_PORT: ':9411'
COLLECTOR_OTLP_ENABLED: 'true'
# Storage configuration (in-memory)
SPAN_STORAGE_TYPE: 'memory'
# Sampling configuration
SAMPLING_STRATEGIES_FILE: '/etc/jaeger/sampling.json'
# Query configuration
QUERY_BASE_PATH: '/jaeger'
# Metrics configuration
METRICS_BACKEND: 'prometheus'
METRICS_HTTP_ROUTE: '/metrics'
ports:
# Jaeger UI
- '16686:16686'
# Collector endpoints
- '14268:14268' # HTTP collector
- '14250:14250' # gRPC collector
- '4317:4317' # OTLP gRPC
- '4318:4318' # OTLP HTTP
# Agent endpoints
- '6831:6831/udp' # Thrift compact
- '6832:6832/udp' # Thrift binary
- '5778:5778' # Serve configs
# Zipkin compatibility
- '9411:9411' # Zipkin HTTP
# Health check
- '14269:14269' # Admin port
volumes:
- ./jaeger-config:/etc/jaeger:ro
- jaeger-data:/tmp
healthcheck:
test: ['CMD', 'wget', '--spider', '-q', 'http://localhost:14269/']
interval: 10s
timeout: 5s
retries: 5
start_period: 10s
networks:
- chatgpt-app-network
# Your MCP server
mcp-server:
build: ./mcp-server
container_name: chatgpt-mcp-server
restart: unless-stopped
environment:
# Jaeger configuration
JAEGER_AGENT_HOST: 'jaeger'
JAEGER_AGENT_PORT: '6831'
JAEGER_SAMPLER_TYPE: 'probabilistic'
JAEGER_SAMPLER_PARAM: '0.1'
JAEGER_SERVICE_NAME: 'chatgpt-mcp-server'
# OTLP configuration (alternative to Jaeger agent)
OTEL_EXPORTER_OTLP_ENDPOINT: 'http://jaeger:4318'
OTEL_EXPORTER_OTLP_PROTOCOL: 'http/protobuf'
OTEL_SERVICE_NAME: 'chatgpt-mcp-server'
depends_on:
jaeger:
condition: service_healthy
networks:
- chatgpt-app-network
ports:
- '3000:3000'
volumes:
jaeger-data:
driver: local
networks:
chatgpt-app-network:
driver: bridge
Create a sampling configuration file (mounted into the container at /etc/jaeger/sampling.json via the jaeger-config volume above) to control trace collection rates:
{
"service_strategies": [
{
"service": "chatgpt-mcp-server",
"type": "probabilistic",
"param": 0.1,
"operation_strategies": [
{
"operation": "tool:search_knowledge_base",
"type": "probabilistic",
"param": 1.0
},
{
"operation": "tool:generate_report",
"type": "probabilistic",
"param": 0.5
},
{
"operation": "widget:render_chart",
"type": "probabilistic",
"param": 0.2
}
]
}
],
"default_strategy": {
"type": "probabilistic",
"param": 0.001
}
}
Sampling strategies explained:
- Probabilistic: Sample a percentage of traces (0.1 = 10%, 1.0 = 100%)
- Rate Limiting: Sample up to N traces per second
- Remote: Fetch sampling decisions from Jaeger backend dynamically
- Const: Always sample (1) or never sample (0)
For ChatGPT apps, use operation-level sampling to trace critical tools at 100% while sampling less important operations at lower rates. This balances observability with storage costs.
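The JSON file above drives Jaeger's remote sampling endpoint (served on port 5778, mapped in the compose file). You can also make the sampling decision in-process with a head-based sampler. Below is a minimal sketch, assuming the @opentelemetry/sdk-node setup introduced later in this guide; the JAEGER_SAMPLER_PARAM variable comes from the compose file above, and everything else is illustrative:

// Hypothetical head-based sampler for the NodeSDK configured later in this guide
import { NodeSDK } from '@opentelemetry/sdk-node';
import { ParentBasedSampler, TraceIdRatioBasedSampler } from '@opentelemetry/sdk-trace-base';

const sdk = new NodeSDK({
  sampler: new ParentBasedSampler({
    // Sample ~10% of new root traces; child spans follow the parent's decision
    root: new TraceIdRatioBasedSampler(parseFloat(process.env.JAEGER_SAMPLER_PARAM || '0.1')),
  }),
  // ...exporter and instrumentations as shown in the instrumentation section
});

Head sampling like this is cheap but uniform; the per-operation rates in the JSON file are applied by Jaeger's remote sampling endpoint for clients configured to poll it.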
Start Jaeger and verify it's running:
# Start all services
docker-compose up -d
# Check Jaeger health
curl http://localhost:14269/
# Open Jaeger UI
open http://localhost:16686
The Jaeger UI provides search, trace visualization, dependency graphs, and service performance analytics. For production deployments, see the Production Deployment section below.
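Before instrumenting the full MCP server, you can confirm that spans actually reach Jaeger with a short standalone script. This is a minimal sketch assuming the OpenTelemetry 1.x JavaScript SDK and the OTLP HTTP port (4318) mapped in the compose file above; after running it, a jaeger-smoke-test service should appear in the UI's service dropdown:

// verify-tracing.ts - Hypothetical smoke test (not part of the MCP server)
import { BasicTracerProvider, BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { Resource } from '@opentelemetry/resources';

const provider = new BasicTracerProvider({
  // Service name shown in the Jaeger UI
  resource: new Resource({ 'service.name': 'jaeger-smoke-test' }),
});
provider.addSpanProcessor(
  new BatchSpanProcessor(
    new OTLPTraceExporter({ url: 'http://localhost:4318/v1/traces' })
  )
);

const span = provider.getTracer('smoke-test').startSpan('jaeger-connectivity-check');
span.setAttribute('check.purpose', 'verify OTLP ingestion');
span.end();

// Flush the batch processor before the process exits
provider.shutdown().then(() => console.log('Test span exported'));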
Instrumenting Your MCP Server with OpenTelemetry
Jaeger uses OpenTelemetry as its instrumentation standard. OpenTelemetry provides vendor-neutral APIs and SDKs for generating, collecting, and exporting telemetry data. Here's how to instrument a TypeScript MCP server:
// src/instrumentation.ts - OpenTelemetry Tracer Configuration
import { NodeSDK } from '@opentelemetry/sdk-node';
import { Resource } from '@opentelemetry/resources';
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';
import { JaegerExporter } from '@opentelemetry/exporter-jaeger';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { HttpInstrumentation } from '@opentelemetry/instrumentation-http';
import { ExpressInstrumentation } from '@opentelemetry/instrumentation-express';
import { PgInstrumentation } from '@opentelemetry/instrumentation-pg';
import { RedisInstrumentation } from '@opentelemetry/instrumentation-redis-4';
import { trace, context, SpanStatusCode, SpanKind } from '@opentelemetry/api';
/**
* Initialize OpenTelemetry instrumentation for Jaeger tracing
*
* This configures auto-instrumentation for HTTP, Express, PostgreSQL, and Redis
* and sets up trace export to Jaeger via OTLP or Jaeger agent.
*/
export function initializeTracing() {
const serviceName = process.env.OTEL_SERVICE_NAME || 'chatgpt-mcp-server';
const serviceVersion = process.env.SERVICE_VERSION || '1.0.0';
const deploymentEnvironment = process.env.DEPLOYMENT_ENV || 'development';
// Create resource with service metadata
const resource = new Resource({
[SemanticResourceAttributes.SERVICE_NAME]: serviceName,
[SemanticResourceAttributes.SERVICE_VERSION]: serviceVersion,
[SemanticResourceAttributes.DEPLOYMENT_ENVIRONMENT]: deploymentEnvironment,
[SemanticResourceAttributes.SERVICE_NAMESPACE]: 'chatgpt-apps',
[SemanticResourceAttributes.SERVICE_INSTANCE_ID]: process.env.HOSTNAME || 'local',
});
// Configure trace exporter (OTLP or Jaeger)
let traceExporter;
  if (process.env.OTEL_EXPORTER_OTLP_ENDPOINT) {
    // Use OTLP exporter (recommended for Jaeger 1.35+). When `url` is set
    // explicitly, the HTTP exporter expects the full traces path.
    traceExporter = new OTLPTraceExporter({
      url: `${process.env.OTEL_EXPORTER_OTLP_ENDPOINT}/v1/traces`,
      // Add auth headers here if your collector requires them
    });
  } else {
    // Use legacy Jaeger exporter (agent over UDP)
    traceExporter = new JaegerExporter({
      host: process.env.JAEGER_AGENT_HOST || 'localhost',
      port: parseInt(process.env.JAEGER_AGENT_PORT || '6831', 10),
    });
  }
// Initialize SDK with auto-instrumentation
const sdk = new NodeSDK({
resource,
spanProcessor: new BatchSpanProcessor(traceExporter, {
maxQueueSize: 2048,
maxExportBatchSize: 512,
scheduledDelayMillis: 5000,
exportTimeoutMillis: 30000,
}),
instrumentations: [
// HTTP/HTTPS instrumentation
new HttpInstrumentation({
ignoreIncomingRequestHook: (req) => {
// Don't trace health checks
return req.url === '/health' || req.url === '/metrics';
},
requestHook: (span, request) => {
span.setAttribute('http.user_agent', request.headers['user-agent'] || 'unknown');
},
}),
// Express instrumentation
new ExpressInstrumentation({
requestHook: (span, info) => {
span.setAttribute('express.type', info.layerType);
},
}),
// PostgreSQL instrumentation
new PgInstrumentation({
enhancedDatabaseReporting: true,
}),
// Redis instrumentation
new RedisInstrumentation(),
],
});
// Start SDK
sdk.start();
// Graceful shutdown
process.on('SIGTERM', () => {
sdk.shutdown()
.then(() => console.log('Tracing terminated'))
.catch((error) => console.error('Error shutting down tracing', error))
.finally(() => process.exit(0));
});
return sdk;
}
// Initialize at application startup
initializeTracing();
// Export tracer for manual instrumentation
export const tracer = trace.getTracer(
process.env.OTEL_SERVICE_NAME || 'chatgpt-mcp-server',
process.env.SERVICE_VERSION || '1.0.0'
);
Install the required dependencies:
npm install --save \
@opentelemetry/sdk-node \
@opentelemetry/api \
@opentelemetry/resources \
@opentelemetry/semantic-conventions \
@opentelemetry/exporter-jaeger \
@opentelemetry/exporter-trace-otlp-http \
@opentelemetry/sdk-trace-base \
@opentelemetry/instrumentation-http \
@opentelemetry/instrumentation-express \
@opentelemetry/instrumentation-pg \
@opentelemetry/instrumentation-redis-4
Import the instrumentation module before any other imports in your application entry point:
// src/index.ts
import './instrumentation'; // MUST be first import
import express from 'express';
import { MCPServer } from './mcp-server';
const app = express();
const mcpServer = new MCPServer();
// Your application code...
The auto-instrumentation libraries will automatically create spans for HTTP requests, database queries, and cache operations. For custom operations like MCP tool invocations, you'll need manual instrumentation (see next section).
For more details on OpenTelemetry integration patterns, see our OpenTelemetry Integration guide.
Tracing MCP Server Tool Invocations
Auto-instrumentation covers infrastructure-level operations, but you need manual spans to trace MCP-specific operations like tool invocations and widget rendering. Here's a production-ready middleware for tracing MCP tools:
// src/middleware/tracing.ts - MCP Tool Tracing Middleware
import { tracer } from '../instrumentation';
import { context, propagation, SpanStatusCode, SpanKind } from '@opentelemetry/api';
import type { MCPRequest, MCPResponse, MCPTool } from '../types/mcp';
/**
* Create a traced wrapper around an MCP tool handler
*
* This middleware automatically creates spans for tool invocations,
* captures input/output metadata, and propagates trace context.
*/
export function traceMCPTool<TInput, TOutput>(
tool: MCPTool<TInput, TOutput>
) {
return async (request: MCPRequest<TInput>): Promise<MCPResponse<TOutput>> => {
    // Use the active context; HTTP auto-instrumentation has already extracted
    // any incoming trace context (traceparent header) into it
const activeContext = context.active();
return tracer.startActiveSpan(
`tool:${tool.name}`,
{
kind: SpanKind.SERVER,
attributes: {
// MCP-specific attributes
'mcp.tool.name': tool.name,
'mcp.tool.description': tool.description,
'mcp.tool.version': tool.version || '1.0.0',
// Request metadata
'mcp.request.id': request.id,
'mcp.request.timestamp': new Date().toISOString(),
// Input metadata (sanitized - no PII/secrets)
'mcp.tool.input.keys': Object.keys(request.params).join(','),
'mcp.tool.input.size': JSON.stringify(request.params).length,
// User context (if available)
'user.id': request.user?.id || 'anonymous',
'user.locale': request.user?.locale || 'unknown',
},
},
activeContext,
async (span) => {
try {
// Add custom span events
span.addEvent('tool:invocation:started', {
'tool.name': tool.name,
'input.validation': 'pending',
});
// Execute tool handler
const startTime = Date.now();
const result = await tool.handler(request.params, request.user);
const duration = Date.now() - startTime;
// Record successful execution
span.addEvent('tool:invocation:completed', {
'execution.duration_ms': duration,
'output.size': JSON.stringify(result).length,
});
// Add output metadata
span.setAttributes({
'mcp.tool.execution.duration_ms': duration,
'mcp.tool.execution.status': 'success',
'mcp.tool.output.type': typeof result,
'mcp.tool.output.size': JSON.stringify(result).length,
});
// Mark span as successful
span.setStatus({ code: SpanStatusCode.OK });
// Return response with trace context
return {
id: request.id,
result,
metadata: {
trace_id: span.spanContext().traceId,
span_id: span.spanContext().spanId,
execution_time_ms: duration,
},
};
} catch (error) {
// Record error in span
span.recordException(error as Error);
span.setStatus({
code: SpanStatusCode.ERROR,
message: (error as Error).message,
});
span.addEvent('tool:invocation:failed', {
'error.type': (error as Error).name,
'error.message': (error as Error).message,
'error.stack': (error as Error).stack || '',
});
// Re-throw error after recording
throw error;
} finally {
// End span (automatically recorded to Jaeger)
span.end();
}
}
);
};
}
/**
* Trace widget rendering operations
*/
export function traceWidgetRender(
widgetName: string,
renderFn: () => Promise<string>
) {
return tracer.startActiveSpan(
`widget:render:${widgetName}`,
{
kind: SpanKind.INTERNAL,
attributes: {
'widget.name': widgetName,
'widget.type': 'html+skybridge',
},
},
async (span) => {
try {
const startTime = Date.now();
const html = await renderFn();
const duration = Date.now() - startTime;
span.setAttributes({
'widget.render.duration_ms': duration,
'widget.output.size': html.length,
'widget.output.lines': html.split('\n').length,
});
span.setStatus({ code: SpanStatusCode.OK });
return html;
} catch (error) {
span.recordException(error as Error);
span.setStatus({ code: SpanStatusCode.ERROR });
throw error;
} finally {
span.end();
}
}
);
}
/**
* Trace external API calls with propagation
*/
export async function traceExternalAPI<T>(
  serviceName: string,
  operation: string,
  apiFn: (traceHeaders: Record<string, string>) => Promise<T>
): Promise<T> {
return tracer.startActiveSpan(
`external:${serviceName}:${operation}`,
{
kind: SpanKind.CLIENT,
attributes: {
'peer.service': serviceName,
'external.operation': operation,
},
},
async (span) => {
      try {
        // Inject the current trace context (W3C traceparent) into a headers
        // object so the downstream service can continue the same trace
        const traceHeaders: Record<string, string> = {};
        propagation.inject(context.active(), traceHeaders);
        const result = await apiFn(traceHeaders);
span.setStatus({ code: SpanStatusCode.OK });
return result;
} catch (error) {
span.recordException(error as Error);
span.setStatus({ code: SpanStatusCode.ERROR });
throw error;
} finally {
span.end();
}
}
);
}
Use the tracing middleware in your MCP tools:
// src/tools/search.ts - Example Traced MCP Tool
import { traceMCPTool, traceExternalAPI, traceWidgetRender } from '../middleware/tracing';
import type { MCPTool } from '../types/mcp';
interface SearchInput {
query: string;
filters?: Record<string, any>;
limit?: number;
}
interface SearchOutput {
results: Array<{
id: string;
title: string;
score: number;
}>;
total: number;
}
const searchTool: MCPTool<SearchInput, SearchOutput> = {
name: 'search_knowledge_base',
description: 'Search knowledge base with semantic search',
version: '1.0.0',
handler: async (input, user) => {
// External API call is automatically traced
const results = await traceExternalAPI(
'elasticsearch',
'semantic_search',
      async (traceHeaders) => {
        const response = await fetch('http://elasticsearch:9200/_search', {
          method: 'POST',
          // Forward the injected trace context alongside the JSON content type
          headers: { 'Content-Type': 'application/json', ...traceHeaders },
          body: JSON.stringify({
            query: { match: { content: input.query } },
            size: input.limit || 10,
          }),
        });
return response.json();
}
);
    // Widget rendering is traced (in a full implementation the rendered HTML
    // would be attached to the tool response for ChatGPT to display)
const widgetHTML = await traceWidgetRender(
'search_results',
async () => {
return `<div class="search-results">...</div>`;
}
);
return {
results: results.hits.hits.map(hit => ({
id: hit._id,
title: hit._source.title,
score: hit._score,
})),
total: results.hits.total.value,
};
},
};
// Wrap with tracing middleware
export const tracedSearchTool = traceMCPTool(searchTool);
This creates a hierarchical trace:
tool:search_knowledge_base (150ms)
├─ external:elasticsearch:semantic_search (120ms)
│ └─ http:POST (115ms)
└─ widget:render:search_results (25ms)
Each span captures timing, attributes, and errors for complete request observability.
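Because traceMCPTool returns the trace_id in its response metadata, you can also surface the current trace in application logs and link straight to it in the Jaeger UI. A minimal sketch, assuming the default UI URL from the compose file; the helper name and JAEGER_UI_URL variable are illustrative, not part of the middleware above:

// src/utils/trace-link.ts - Hypothetical helper for log/trace correlation
import { trace } from '@opentelemetry/api';

const JAEGER_UI_URL = process.env.JAEGER_UI_URL || 'http://localhost:16686';

/** Return the active trace ID plus a deep link into the Jaeger UI, if a span is active. */
export function currentTraceLink(): { traceId: string; url: string } | null {
  const activeSpan = trace.getActiveSpan();
  if (!activeSpan) return null;
  const traceId = activeSpan.spanContext().traceId;
  return { traceId, url: `${JAEGER_UI_URL}/trace/${traceId}` };
}

// Usage inside a tool handler:
// const link = currentTraceLink();
// if (link) console.error(`search_knowledge_base failed, see ${link.url}`);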
Analyzing Traces in the Jaeger UI
Once you have traces flowing into Jaeger, the UI provides powerful analysis capabilities:
Finding Traces
Search for traces using multiple criteria:
# Search by service
Service: chatgpt-mcp-server
Operation: tool:search_knowledge_base
Lookback: Last 1 hour
# Search by tags (exact key=value matches)
Tags: error=true
Tags: user.id=user_12345
# Search by duration
Min Duration: 1s
# Search by trace ID
Trace ID: 3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f
Trace Timeline Visualization
Each trace shows a waterfall timeline of all spans:
tool:search_knowledge_base [=================] 150ms
external:elasticsearch:semantic_search [============ ] 120ms
http:POST /elasticsearch:9200/_search [=========== ] 115ms
pg:query SELECT * FROM cache [== ] 30ms
widget:render:search_results [==== ] 25ms
This immediately reveals:
- Bottlenecks: The Elasticsearch query takes 80% of total time
- Parallelization opportunities: Widget rendering could happen concurrently (see the sketch after this list)
- Unexpected delays: 30ms cache query within HTTP request
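The parallelization opportunity above is straightforward to act on. A minimal sketch of the search tool handler using the tracing helpers from the previous section, assuming the widget markup does not depend on the search response:

// Hypothetical refactor: run the external call and the widget render concurrently
const [results, widgetHTML] = await Promise.all([
  traceExternalAPI('elasticsearch', 'semantic_search', async (traceHeaders) => {
    const response = await fetch('http://elasticsearch:9200/_search', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', ...traceHeaders },
      body: JSON.stringify({ query: { match: { content: input.query } } }),
    });
    return response.json();
  }),
  traceWidgetRender('search_results', async () => {
    return `<div class="search-results">...</div>`;
  }),
]);

In the waterfall above, this would pull the widget span alongside the Elasticsearch span instead of after it, trimming roughly 25ms from the critical path.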
Dependency Graph
The System Architecture view shows service dependencies:
ChatGPT
  ↓
chatgpt-mcp-server (150ms avg)
  ├─→ elasticsearch (120ms avg)
  └─→ postgresql (30ms avg)
This helps you understand:
- Service call patterns
- Latency contribution by service
- Critical path dependencies
Error Analysis
Filter traces with errors and examine failure patterns:
// Traces with errors show red highlights
tool:search_knowledge_base [ERROR]
external:elasticsearch:semantic_search [ERROR]
http:POST [500 Internal Server Error]
Error: Connection timeout after 30s
Jaeger captures the full error context including stack traces and request parameters.
Performance Metrics
The Monitor view (Service Performance Monitoring, when enabled) shows latency percentiles and error rates per service and operation:
P50: 120ms
P95: 350ms
P99: 890ms
Error rate: 0.3%
Use this to set SLOs and track performance regressions. For example, if P95 latency for tool:search_knowledge_base increases from 350ms to 600ms after a deployment, you can compare traces before/after to identify the cause.
Query traces programmatically via the Jaeger API:
#!/bin/bash
# query-traces.sh - Query Jaeger API for Trace Analysis
JAEGER_URL="http://localhost:16686"
SERVICE="chatgpt-mcp-server"
OPERATION="tool:search_knowledge_base"
START_TIME=$(date -u -d '1 hour ago' +%s%6N)
END_TIME=$(date -u +%s%6N)
# Find traces by service and operation
TRACES=$(curl -s "${JAEGER_URL}/api/traces?service=${SERVICE}&operation=${OPERATION}&start=${START_TIME}&end=${END_TIME}&limit=100")
# Extract trace IDs
TRACE_IDS=$(echo "$TRACES" | jq -r '.data[].traceID')
# Get full trace details
for TRACE_ID in $TRACE_IDS; do
echo "Analyzing trace: $TRACE_ID"
curl -s "${JAEGER_URL}/api/traces/${TRACE_ID}" | jq '.data[0].spans[] | {
operationName: .operationName,
duration: .duration,
tags: .tags
}'
done
# Calculate average duration
echo "$TRACES" | jq '[.data[].spans[] | select(.operationName == "tool:search_knowledge_base") | .duration] | add / length'
For more advanced analysis, integrate Jaeger with Prometheus and Grafana to create custom dashboards. See our Kubernetes Deployment guide for production monitoring setups.
Production Deployment with Kubernetes and Elasticsearch
The all-in-one Jaeger image is great for development, but production deployments require distributed architecture with dedicated collector, query, and storage components:
# jaeger-production.yaml - Production Jaeger on Kubernetes
apiVersion: v1
kind: Namespace
metadata:
name: observability
---
# Elasticsearch for trace storage
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: elasticsearch
namespace: observability
spec:
serviceName: elasticsearch
replicas: 3
selector:
matchLabels:
app: elasticsearch
template:
metadata:
labels:
app: elasticsearch
spec:
containers:
- name: elasticsearch
image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
env:
- name: cluster.name
value: jaeger-cluster
- name: discovery.seed_hosts
value: elasticsearch-0.elasticsearch,elasticsearch-1.elasticsearch,elasticsearch-2.elasticsearch
- name: cluster.initial_master_nodes
value: elasticsearch-0,elasticsearch-1,elasticsearch-2
- name: ES_JAVA_OPTS
value: "-Xms2g -Xmx2g"
ports:
- containerPort: 9200
name: http
- containerPort: 9300
name: transport
volumeMounts:
- name: data
mountPath: /usr/share/elasticsearch/data
resources:
requests:
memory: 4Gi
cpu: 1000m
limits:
memory: 4Gi
cpu: 2000m
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 100Gi
---
# Jaeger Collector (ingestion)
apiVersion: apps/v1
kind: Deployment
metadata:
name: jaeger-collector
namespace: observability
spec:
replicas: 3
selector:
matchLabels:
app: jaeger-collector
template:
metadata:
labels:
app: jaeger-collector
spec:
containers:
- name: jaeger-collector
image: jaegertracing/jaeger-collector:1.52
env:
- name: SPAN_STORAGE_TYPE
value: elasticsearch
- name: ES_SERVER_URLS
value: http://elasticsearch:9200
- name: ES_NUM_SHARDS
value: "3"
- name: ES_NUM_REPLICAS
value: "1"
- name: COLLECTOR_QUEUE_SIZE
value: "10000"
- name: COLLECTOR_NUM_WORKERS
value: "50"
ports:
- containerPort: 14250
name: grpc
- containerPort: 14268
name: http
- containerPort: 4317
name: otlp-grpc
- containerPort: 4318
name: otlp-http
resources:
requests:
memory: 2Gi
cpu: 1000m
limits:
memory: 4Gi
cpu: 2000m
livenessProbe:
httpGet:
path: /
port: 14269
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /
port: 14269
initialDelaySeconds: 10
periodSeconds: 5
---
# Jaeger Query (UI and API)
apiVersion: apps/v1
kind: Deployment
metadata:
name: jaeger-query
namespace: observability
spec:
replicas: 2
selector:
matchLabels:
app: jaeger-query
template:
metadata:
labels:
app: jaeger-query
spec:
containers:
- name: jaeger-query
image: jaegertracing/jaeger-query:1.52
env:
- name: SPAN_STORAGE_TYPE
value: elasticsearch
- name: ES_SERVER_URLS
value: http://elasticsearch:9200
- name: QUERY_BASE_PATH
value: /jaeger
ports:
- containerPort: 16686
name: ui
- containerPort: 16687
name: grpc
resources:
requests:
memory: 512Mi
cpu: 500m
limits:
memory: 1Gi
cpu: 1000m
---
# Jaeger Agent (deployed as a DaemonSet, one agent per node)
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: jaeger-agent
namespace: observability
spec:
selector:
matchLabels:
app: jaeger-agent
template:
metadata:
labels:
app: jaeger-agent
spec:
hostNetwork: true
containers:
- name: jaeger-agent
image: jaegertracing/jaeger-agent:1.52
env:
- name: REPORTER_GRPC_HOST_PORT
value: jaeger-collector:14250
ports:
- containerPort: 6831
protocol: UDP
name: thrift-compact
- containerPort: 6832
protocol: UDP
name: thrift-binary
- containerPort: 5778
name: config
resources:
requests:
memory: 128Mi
cpu: 100m
limits:
memory: 256Mi
cpu: 200m
---
# Services
apiVersion: v1
kind: Service
metadata:
name: elasticsearch
namespace: observability
spec:
clusterIP: None
selector:
app: elasticsearch
ports:
- port: 9200
name: http
- port: 9300
name: transport
---
apiVersion: v1
kind: Service
metadata:
name: jaeger-collector
namespace: observability
spec:
selector:
app: jaeger-collector
ports:
- port: 14250
name: grpc
- port: 14268
name: http
- port: 4317
name: otlp-grpc
- port: 4318
name: otlp-http
---
apiVersion: v1
kind: Service
metadata:
name: jaeger-query
namespace: observability
spec:
type: LoadBalancer
selector:
app: jaeger-query
ports:
- port: 80
targetPort: 16686
name: ui
- port: 16687
name: grpc
Deploy to Kubernetes:
# Apply Jaeger production deployment
kubectl apply -f jaeger-production.yaml
# Wait for Elasticsearch to be ready
kubectl wait --for=condition=ready pod -l app=elasticsearch -n observability --timeout=300s
# Check Jaeger collector status
kubectl get pods -n observability -l app=jaeger-collector
# Port-forward Jaeger UI (or use LoadBalancer IP)
kubectl port-forward -n observability svc/jaeger-query 16686:80
# Open UI
open http://localhost:16686
Production configuration best practices:
- Elasticsearch tuning: Use 3+ node cluster with replication for high availability
- Collector scaling: Autoscale based on ingestion rate (CPU/memory metrics)
- Agent deployment: Run agents as a DaemonSet (one per node) for low-latency local collection
- Retention policies: Configure index lifecycle management to delete old traces (7-30 days)
- Sampling strategies: Use adaptive sampling to balance cost and coverage
- Security: Enable authentication, TLS, and network policies; note that Elasticsearch 8.x enables security by default, so supply credentials and certificates to the collector and query services rather than relying on the plain-HTTP example above
For complete Kubernetes deployment guides including monitoring and alerting, see our Kubernetes Deployment for ChatGPT Apps article.
Conclusion: Complete Observability for ChatGPT Applications
Distributed tracing with Jaeger transforms how you understand and optimize ChatGPT applications. By instrumenting your MCP servers with OpenTelemetry, you gain end-to-end visibility into every tool invocation, widget render, database query, and external API call. This observability foundation enables you to:
- Debug faster: Find the exact service and operation causing errors or latency
- Optimize smarter: Identify bottlenecks and parallelization opportunities with data
- Scale confidently: Understand dependency chains and failure modes before production issues arise
- Meet SLOs: Track P95/P99 latency and error rates across all ChatGPT app operations
The production-ready examples in this guide give you everything you need to implement Jaeger tracing: Docker Compose for development, TypeScript instrumentation for MCP servers, middleware for tool tracing, and Kubernetes deployment for production scale. Start with the all-in-one setup, instrument your critical tools, and analyze traces in the Jaeger UI to build high-performance ChatGPT applications your users will love.
For more advanced observability patterns, explore our related guides on OpenTelemetry Integration and MCP Server Performance Optimization.
Ready to Build High-Performance ChatGPT Apps?
Implementing distributed tracing is just one piece of building production-ready ChatGPT applications. MakeAIHQ provides a complete no-code platform for creating, deploying, and monitoring ChatGPT apps—with built-in observability, performance optimization, and one-click deployment to the ChatGPT App Store.
No infrastructure setup. No complex instrumentation. Just describe your app and deploy in 48 hours.
Start Building Your ChatGPT App →
Related Guides:
- Complete Guide to Building ChatGPT Applications
- OpenTelemetry Integration for ChatGPT Apps
- MCP Server Performance Optimization
- Kubernetes Deployment for ChatGPT Apps