Zero-Downtime Deployments for ChatGPT Apps: Complete Production Guide

Deploying ChatGPT applications to production requires careful orchestration to keep the service fully available during updates. A single deployment failure can disconnect thousands of active conversations, violate SLAs, and damage user trust. This comprehensive guide provides production-ready implementations of zero-downtime deployment strategies specifically optimized for ChatGPT apps.

Modern ChatGPT applications face unique deployment challenges: long-lived WebSocket connections for streaming responses, stateful conversation contexts, database schema migrations that must remain backward-compatible with old application versions, and strict latency requirements from OpenAI's runtime. Traditional "stop-deploy-start" approaches are unacceptable for production systems serving real-time AI conversations.

Zero-downtime deployments eliminate service interruptions through sophisticated traffic management, health verification, and progressive rollout strategies. This guide covers three primary deployment patterns—rolling updates, blue-green deployments, and canary releases—with complete code examples for Kubernetes orchestration, comprehensive health checks, graceful connection draining, database migration coordination, and deployment monitoring. Whether you're running a single MCP server or a distributed ChatGPT application platform, these patterns ensure seamless updates without disrupting active users.

By implementing proper readiness probes, liveness checks, and graceful shutdown handlers, your ChatGPT apps can update multiple times per day while maintaining five-nines availability (99.999% uptime). Learn how to coordinate database migrations with application deployments, implement feature flags for phased rollouts, and monitor deployment health in real-time using Prometheus metrics.

Understanding Deployment Strategies

Rolling Updates: Progressive Pod Replacement

Rolling updates gradually replace old application pods with new versions, maintaining minimum replica counts throughout the process. Kubernetes manages the rollout automatically, ensuring new pods pass health checks before terminating old ones.

Advantages for ChatGPT Apps:

  • No infrastructure duplication required (cost-effective)
  • Automatic rollback on health check failures
  • Gradual traffic shift allows early error detection
  • Maintains conversation continuity during updates

Best Use Cases:

  • MCP server updates with backward-compatible protocol changes
  • Minor ChatGPT widget refinements
  • Database schema migrations with dual-write strategies
  • Incremental feature rollouts behind feature flags

Blue-Green Deployments: Complete Environment Swap

Blue-green deployments maintain two identical production environments ("blue" and "green"). Traffic routes to one environment while the other receives updates. After validation, traffic switches instantly to the updated environment.

Advantages for ChatGPT Apps:

  • Instant rollback capability (switch back to previous environment)
  • Full validation before production traffic exposure
  • Database migration testing against the idle environment before cutover
  • Perfect for major OpenAI Apps SDK version upgrades

Best Use Cases:

  • Major MCP protocol version changes
  • ChatGPT app architecture refactors
  • High-risk database schema migrations
  • Compliance-sensitive deployments requiring full validation
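
One lightweight way to perform that validation step is to probe the idle environment's comprehensive /health endpoint (implemented later in this guide) before flipping traffic. The sketch below is a minimal pre-switch check, not a full cutover controller; the green-environment URL and version variable are placeholders:

// scripts/validate-green.ts -- hypothetical pre-switch validation sketch
import axios from 'axios';

const GREEN_URL = process.env.GREEN_URL ?? 'https://green.internal.example.com'; // placeholder

async function validateGreenEnvironment(): Promise<boolean> {
  try {
    // Hit the comprehensive /health endpoint on the idle (green) environment
    const res = await axios.get(`${GREEN_URL}/health`, { timeout: 5000 });

    const healthy = res.status === 200 && res.data.status === 'healthy';
    console.log(`Green environment ${healthy ? 'passed' : 'failed'} validation`, res.data);
    return healthy;
  } catch (error) {
    console.error('Green environment unreachable:', (error as Error).message);
    return false;
  }
}

validateGreenEnvironment().then(ok => process.exit(ok ? 0 : 1));

Run this as a gate in your deployment pipeline: only flip the router or Service selector to green when the script exits zero.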

Canary Releases: Gradual Traffic Shifting

Canary releases route a small percentage of production traffic to the new version, monitoring error rates and performance metrics before gradually increasing traffic exposure.

Advantages for ChatGPT Apps:

  • Minimal blast radius for deployment issues
  • Real production traffic validation
  • Data-driven rollout decisions based on metrics
  • Ideal for A/B testing widget designs

Best Use Cases:

  • New AI model integrations
  • Experimental ChatGPT widget features
  • Performance optimization validation
  • Third-party API integration changes
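
When you are not running a service mesh or weighted ingress, the "small percentage" split can be approximated at the application layer with a deterministic hash of a stable identifier, the same idea the feature flag manager later in this guide formalizes. A minimal sketch, where the percentage and handler names are illustrative assumptions:

// Hypothetical application-level canary router; weighted ingress or a service
// mesh is the more common place to do this in production.
const CANARY_PERCENTAGE = 5; // route ~5% of users to the new version

function bucketForUser(userId: string): number {
  // Deterministic hash into [0, 99] so a user always lands in the same bucket
  let hash = 0;
  for (let i = 0; i < userId.length; i++) {
    hash = ((hash << 5) - hash + userId.charCodeAt(i)) | 0;
  }
  return Math.abs(hash) % 100;
}

export function useCanary(userId: string): boolean {
  return bucketForUser(userId) < CANARY_PERCENTAGE;
}

// Usage: if (useCanary(userId)) { return handleRequestV2(req, res); }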

Kubernetes Rolling Updates Implementation

Production Deployment Configuration

This comprehensive Kubernetes deployment configuration implements zero-downtime rolling updates with sophisticated health checks and resource management:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: chatgpt-mcp-server
  namespace: production
  labels:
    app: chatgpt-mcp
    version: v2.1.0
    tier: backend
spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2          # Create 2 extra pods during rollout
      maxUnavailable: 1    # Allow max 1 pod unavailable
  selector:
    matchLabels:
      app: chatgpt-mcp
  template:
    metadata:
      labels:
        app: chatgpt-mcp
        version: v2.1.0
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9090"
        prometheus.io/path: "/metrics"
    spec:
      terminationGracePeriodSeconds: 60  # Wait 60s for graceful shutdown

      # Pre-stop hook: drain connections before termination
      containers:
      - name: mcp-server
        image: gcr.io/your-project/chatgpt-mcp:v2.1.0
        imagePullPolicy: Always

        ports:
        - containerPort: 3000
          name: http
          protocol: TCP
        - containerPort: 9090
          name: metrics
          protocol: TCP

        env:
        - name: NODE_ENV
          value: "production"
        - name: MCP_VERSION
          value: "2024-11-05"
        - name: GRACEFUL_SHUTDOWN_TIMEOUT
          value: "55000"  # 55s (less than terminationGracePeriodSeconds)

        # Resource limits prevent noisy neighbor issues
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "1Gi"
            cpu: "1000m"

        # Readiness probe: verify pod ready for traffic
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 3000
            scheme: HTTP
          initialDelaySeconds: 10   # Wait 10s after start
          periodSeconds: 5          # Check every 5s
          timeoutSeconds: 3         # Fail if no response in 3s
          successThreshold: 2       # Require 2 consecutive successes
          failureThreshold: 3       # Remove from service after 3 failures

        # Liveness probe: detect crashed/hung pods
        livenessProbe:
          httpGet:
            path: /health/live
            port: 3000
            scheme: HTTP
          initialDelaySeconds: 30   # Grace period for startup
          periodSeconds: 10         # Check every 10s
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 3       # Restart after 3 consecutive failures

        # Startup probe: handle slow initialization
        startupProbe:
          httpGet:
            path: /health/startup
            port: 3000
            scheme: HTTP
          initialDelaySeconds: 0
          periodSeconds: 5
          timeoutSeconds: 3
          successThreshold: 1
          failureThreshold: 30      # Allow 150s total startup time

        # Graceful shutdown lifecycle hook.
        # Kubernetes runs preStop first and only sends SIGTERM after it
        # returns; both phases count against terminationGracePeriodSeconds.
        # Signalling PID 1 here starts the in-app drain immediately (assumes
        # the Node server runs as PID 1), and the sleep keeps the hook open so
        # the kubelet's later SIGTERM arrives as a duplicate the handler ignores.
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/sh
              - -c
              - |
                # Start the application's graceful shutdown early
                kill -TERM 1
                # Hold the hook open while connections drain
                sleep 30
---
apiVersion: v1
kind: Service
metadata:
  name: chatgpt-mcp-service
  namespace: production
spec:
  type: ClusterIP
  sessionAffinity: ClientIP  # Route same client to same pod
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 3600   # 1 hour session stickiness
  selector:
    app: chatgpt-mcp
  ports:
  - name: http
    port: 80
    targetPort: 3000
    protocol: TCP
  - name: metrics
    port: 9090
    targetPort: 9090
    protocol: TCP

Readiness Probe Implementation

The readiness probe determines when a pod can receive production traffic. This comprehensive implementation validates all critical dependencies:

// src/health/readiness.ts
import express from 'express';
import { MCPProtocol } from '../mcp/protocol.js';
import { FirestoreClient } from '../database/firestore.js';
import { RedisClient } from '../cache/redis.js';

interface HealthStatus {
  ready: boolean;
  checks: {
    [key: string]: {
      status: 'pass' | 'fail' | 'warn';
      time: number;
      message?: string;
    };
  };
  timestamp: string;
}

export class ReadinessChecker {
  private mcpProtocol: MCPProtocol;
  private firestoreClient: FirestoreClient;
  private redisClient: RedisClient;
  private isShuttingDown = false;

  constructor(
    mcpProtocol: MCPProtocol,
    firestoreClient: FirestoreClient,
    redisClient: RedisClient
  ) {
    this.mcpProtocol = mcpProtocol;
    this.firestoreClient = firestoreClient;
    this.redisClient = redisClient;
  }

  /**
   * Express handler for readiness probe endpoint
   */
  async handler(
    req: express.Request,
    res: express.Response
  ): Promise<void> {
    const status = await this.check();

    const httpStatus = status.ready ? 200 : 503;
    res.status(httpStatus).json(status);
  }

  /**
   * Comprehensive readiness check
   */
  async check(): Promise<HealthStatus> {
    const startTime = Date.now();

    // If shutting down, immediately return not ready
    if (this.isShuttingDown) {
      return {
        ready: false,
        checks: {
          shutdown: {
            status: 'fail',
            time: 0,
            message: 'Pod is draining connections for shutdown'
          }
        },
        timestamp: new Date().toISOString()
      };
    }

    const checks = await Promise.all([
      this.checkMCPProtocol(),
      this.checkFirestore(),
      this.checkRedis(),
      this.checkMemory(),
      this.checkActiveConnections()
    ]);

    const checkResults = Object.fromEntries(checks);

    // 'warn' means degraded but still serviceable (e.g. Redis); only a hard
    // 'fail' should take the pod out of the load balancer rotation.
    const ready = Object.values(checkResults)
      .every(c => c.status !== 'fail');

    return {
      ready,
      checks: checkResults,
      timestamp: new Date().toISOString()
    };
  }

  private async checkMCPProtocol(): Promise<[string, any]> {
    const start = Date.now();
    try {
      const isInitialized = this.mcpProtocol.isInitialized();

      return ['mcp_protocol', {
        status: isInitialized ? 'pass' : 'fail',
        time: Date.now() - start,
        message: isInitialized ? 'MCP protocol initialized' : 'MCP protocol not ready'
      }];
    } catch (error) {
      return ['mcp_protocol', {
        status: 'fail',
        time: Date.now() - start,
        message: `MCP error: ${error.message}`
      }];
    }
  }

  private async checkFirestore(): Promise<[string, any]> {
    const start = Date.now();
    try {
      // Lightweight query to verify Firestore connectivity
      await this.firestoreClient.collection('_health')
        .limit(1)
        .get();

      return ['firestore', {
        status: 'pass',
        time: Date.now() - start
      }];
    } catch (error) {
      return ['firestore', {
        status: 'fail',
        time: Date.now() - start,
        message: `Firestore unavailable: ${error.message}`
      }];
    }
  }

  private async checkRedis(): Promise<[string, any]> {
    const start = Date.now();
    try {
      await this.redisClient.ping();

      return ['redis', {
        status: 'pass',
        time: Date.now() - start
      }];
    } catch (error) {
      return ['redis', {
        status: 'warn',  // Redis failures degraded, not fatal
        time: Date.now() - start,
        message: `Redis degraded: ${error.message}`
      }];
    }
  }

  private async checkMemory(): Promise<[string, any]> {
    const start = Date.now();
    const memUsage = process.memoryUsage();
    const heapUsedPercent = (memUsage.heapUsed / memUsage.heapTotal) * 100;

    return ['memory', {
      status: heapUsedPercent < 90 ? 'pass' : 'fail',
      time: Date.now() - start,
      message: `Heap usage: ${heapUsedPercent.toFixed(1)}%`
    }];
  }

  private async checkActiveConnections(): Promise<[string, any]> {
    const start = Date.now();
    const activeCount = this.mcpProtocol.getActiveConnectionCount();
    const maxConnections = 1000;

    return ['active_connections', {
      status: activeCount < maxConnections ? 'pass' : 'fail',
      time: Date.now() - start,
      message: `${activeCount}/${maxConnections} connections`
    }];
  }

  /**
   * Mark pod as shutting down (removes from load balancer)
   */
  markShuttingDown(): void {
    this.isShuttingDown = true;
  }
}

Liveness Probe Implementation

The liveness probe detects crashed or deadlocked pods. Unlike readiness, liveness failures trigger pod restarts:

// src/health/liveness.ts
import express from 'express';
import { EventEmitter } from 'events';
import { monitorEventLoopDelay, IntervalHistogram } from 'perf_hooks';

interface LivenessStatus {
  alive: boolean;
  uptime: number;
  checks: {
    [key: string]: {
      status: 'pass' | 'fail';
      message?: string;
    };
  };
  timestamp: string;
}

export class LivenessChecker extends EventEmitter {
  private startTime: number;
  private lastSuccessfulRequest: number;
  private requestTimeoutMs = 30000; // 30 seconds
  private loopDelay: IntervalHistogram;

  constructor() {
    super();
    this.startTime = Date.now();
    this.lastSuccessfulRequest = Date.now();

    // Continuously sample event-loop delay so liveness can detect a blocked loop
    this.loopDelay = monitorEventLoopDelay({ resolution: 20 });
    this.loopDelay.enable();
  }

  /**
   * Express handler for liveness probe endpoint
   */
  async handler(
    req: express.Request,
    res: express.Response
  ): Promise<void> {
    const status = await this.check();

    const httpStatus = status.alive ? 200 : 503;
    res.status(httpStatus).json(status);
  }

  /**
   * Liveness check: detect crashed/deadlocked state
   */
  async check(): Promise<LivenessStatus> {
    const uptime = Date.now() - this.startTime;

    const checks = {
      process: this.checkProcess(),
      event_loop: this.checkEventLoop(),
      requests: this.checkRecentRequests()
    };

    const allPassed = Object.values(checks)
      .every(c => c.status === 'pass');

    return {
      alive: allPassed,
      uptime,
      checks,
      timestamp: new Date().toISOString()
    };
  }

  private checkProcess(): { status: 'pass' | 'fail'; message?: string } {
    try {
      // Verify process still responsive
      const memUsage = process.memoryUsage();

      return {
        status: 'pass',
        message: `RSS: ${(memUsage.rss / 1024 / 1024).toFixed(0)}MB`
      };
    } catch (error) {
      return {
        status: 'fail',
        message: `Process check failed: ${error.message}`
      };
    }
  }

  private checkEventLoop(): { status: 'pass' | 'fail'; message?: string } {
    // A synchronous loop inside this handler cannot observe a blocked event
    // loop (the handler would simply not run while blocked), so read the
    // continuously sampled histogram instead. mean is reported in nanoseconds.
    const lagMs = this.loopDelay.mean / 1e6;
    this.loopDelay.reset();

    return {
      status: lagMs < 100 ? 'pass' : 'fail',
      message: `Event loop lag: ${lagMs.toFixed(1)}ms`
    };
  }

  private checkRecentRequests(): { status: 'pass' | 'fail'; message?: string } {
    // Assumes steady request traffic; for low-traffic services, raise
    // requestTimeoutMs (or drop this check) so healthy but idle pods are
    // not restarted.
    const timeSinceLastRequest = Date.now() - this.lastSuccessfulRequest;

    return {
      status: timeSinceLastRequest < this.requestTimeoutMs ? 'pass' : 'fail',
      message: `Last request: ${(timeSinceLastRequest / 1000).toFixed(0)}s ago`
    };
  }

  /**
   * Call this on every successful request
   */
  recordSuccessfulRequest(): void {
    this.lastSuccessfulRequest = Date.now();
  }
}
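
The recordSuccessfulRequest() hook only helps if it is actually wired into the request path. A minimal Express middleware sketch follows; the bootstrap file and variable names are assumptions, and you may want to exclude the /health routes from recording so idle pods are still detected:

// Bootstrap sketch: feed the liveness checker from real traffic
import express from 'express';
import { LivenessChecker } from './health/liveness.js';

const app = express(); // assumption: created here for illustration
const livenessChecker = new LivenessChecker();

app.use((req, res, next) => {
  // Record on response finish so hung handlers do not refresh the timestamp
  res.on('finish', () => {
    if (res.statusCode < 500) {
      livenessChecker.recordSuccessfulRequest();
    }
  });
  next();
});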

Comprehensive Health Check Endpoint

This production-grade health check endpoint provides detailed diagnostics for monitoring systems and load balancers:

// src/routes/health.ts
import express from 'express';
import { ReadinessChecker } from '../health/readiness.js';
import { LivenessChecker } from '../health/liveness.js';
import { DependencyHealthChecker } from '../health/dependencies.js';

export function createHealthRouter(
  readinessChecker: ReadinessChecker,
  livenessChecker: LivenessChecker,
  dependencyChecker: DependencyHealthChecker
): express.Router {
  const router = express.Router();

  /**
   * Kubernetes readiness probe
   * Returns 200 when pod ready for traffic
   */
  router.get('/health/ready', async (req, res) => {
    await readinessChecker.handler(req, res);
  });

  /**
   * Kubernetes liveness probe
   * Returns 200 when pod alive (not deadlocked)
   */
  router.get('/health/live', async (req, res) => {
    await livenessChecker.handler(req, res);
  });

  /**
   * Kubernetes startup probe
   * Returns 200 when initialization complete
   */
  router.get('/health/startup', async (req, res) => {
    // Simple check: if we can respond, startup succeeded
    res.status(200).json({
      status: 'started',
      timestamp: new Date().toISOString()
    });
  });

  /**
   * Comprehensive health check with all dependencies
   * Used by monitoring systems (not Kubernetes probes)
   */
  router.get('/health', async (req, res) => {
    const [readiness, liveness, dependencies] = await Promise.all([
      readinessChecker.check(),
      livenessChecker.check(),
      dependencyChecker.checkAll()
    ]);

    const overallHealthy = readiness.ready && liveness.alive;

    res.status(overallHealthy ? 200 : 503).json({
      status: overallHealthy ? 'healthy' : 'unhealthy',
      readiness,
      liveness,
      dependencies,
      version: process.env.APP_VERSION || 'unknown',
      timestamp: new Date().toISOString()
    });
  });

  return router;
}

Dependency Health Checker

This module validates external service connectivity and latency. The per-dependency results can also be wrapped in a simple circuit breaker (sketched after the class) so that a dependency that keeps failing is skipped for a cool-down window:

// src/health/dependencies.ts
import { FirestoreClient } from '../database/firestore.js';
import { RedisClient } from '../cache/redis.js';
import axios from 'axios';

interface DependencyStatus {
  [key: string]: {
    status: 'healthy' | 'degraded' | 'unavailable';
    latency: number;
    message?: string;
    lastChecked: string;
  };
}

export class DependencyHealthChecker {
  private firestoreClient: FirestoreClient;
  private redisClient: RedisClient;
  private openaiApiKey: string;

  constructor(
    firestoreClient: FirestoreClient,
    redisClient: RedisClient,
    openaiApiKey: string
  ) {
    this.firestoreClient = firestoreClient;
    this.redisClient = redisClient;
    this.openaiApiKey = openaiApiKey;
  }

  async checkAll(): Promise<DependencyStatus> {
    const checks = await Promise.allSettled([
      this.checkFirestore(),
      this.checkRedis(),
      this.checkOpenAI()
    ]);

    return {
      firestore: checks[0].status === 'fulfilled'
        ? checks[0].value
        : this.createFailedCheck('Firestore check threw exception'),
      redis: checks[1].status === 'fulfilled'
        ? checks[1].value
        : this.createFailedCheck('Redis check threw exception'),
      openai: checks[2].status === 'fulfilled'
        ? checks[2].value
        : this.createFailedCheck('OpenAI check threw exception')
    };
  }

  private async checkFirestore() {
    const start = Date.now();
    try {
      await this.firestoreClient
        .collection('_health')
        .limit(1)
        .get();

      const latency = Date.now() - start;

      return {
        status: latency < 500 ? 'healthy' : 'degraded',
        latency,
        lastChecked: new Date().toISOString()
      };
    } catch (error) {
      return {
        status: 'unavailable',
        latency: Date.now() - start,
        message: error.message,
        lastChecked: new Date().toISOString()
      };
    }
  }

  private async checkRedis() {
    const start = Date.now();
    try {
      await this.redisClient.ping();

      const latency = Date.now() - start;

      return {
        status: latency < 100 ? 'healthy' : 'degraded',
        latency,
        lastChecked: new Date().toISOString()
      };
    } catch (error) {
      return {
        status: 'unavailable',
        latency: Date.now() - start,
        message: error.message,
        lastChecked: new Date().toISOString()
      };
    }
  }

  private async checkOpenAI() {
    const start = Date.now();
    try {
      // Lightweight API call to verify OpenAI connectivity
      const response = await axios.get('https://api.openai.com/v1/models', {
        headers: {
          'Authorization': `Bearer ${this.openaiApiKey}`
        },
        timeout: 5000
      });

      const latency = Date.now() - start;

      return {
        status: response.status === 200 ? 'healthy' : 'degraded',
        latency,
        lastChecked: new Date().toISOString()
      };
    } catch (error) {
      return {
        status: 'unavailable',
        latency: Date.now() - start,
        message: error.message,
        lastChecked: new Date().toISOString()
      };
    }
  }

  private createFailedCheck(message: string) {
    return {
      status: 'unavailable' as const,
      latency: 0,
      message,
      lastChecked: new Date().toISOString()
    };
  }
}
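
The checks above re-probe every dependency on each request; the circuit breaker mentioned earlier can wrap them so a dependency that keeps failing is skipped for a cool-down window instead of burning its full timeout every time. This is a minimal sketch of that pattern, with the file name, thresholds, and naming as illustrative assumptions:

// src/health/circuit-breaker.ts (hypothetical file) -- minimal breaker sketch
type AsyncCheck<T> = () => Promise<T>;

export class SimpleCircuitBreaker<T> {
  private consecutiveFailures = 0;
  private openUntil = 0;

  constructor(
    private check: AsyncCheck<T>,
    private fallback: T,              // value returned while the circuit is open
    private failureThreshold = 3,     // consecutive failures before opening
    private cooldownMs = 30_000       // how long to skip the real check
  ) {}

  async run(): Promise<T> {
    // While open, skip the expensive dependency probe entirely
    if (Date.now() < this.openUntil) {
      return this.fallback;
    }

    try {
      const result = await this.check();
      this.consecutiveFailures = 0;   // any success closes the circuit
      return result;
    } catch (error) {
      this.consecutiveFailures += 1;
      if (this.consecutiveFailures >= this.failureThreshold) {
        this.openUntil = Date.now() + this.cooldownMs;
      }
      throw error;
    }
  }
}

Wrapping the OpenAI check in such a breaker, for example, avoids spending the full 5-second timeout on every /health request during a sustained outage.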

Graceful Shutdown Handler

This implementation drains connections cleanly before pod termination:

// src/graceful-shutdown.ts
import { Server } from 'http';
import { MCPProtocol } from './mcp/protocol.js';
import { ReadinessChecker } from './health/readiness.js';

export class GracefulShutdownHandler {
  private server: Server;
  private mcpProtocol: MCPProtocol;
  private readinessChecker: ReadinessChecker;
  private shutdownTimeout: number;
  private isShuttingDown = false;

  constructor(
    server: Server,
    mcpProtocol: MCPProtocol,
    readinessChecker: ReadinessChecker,
    shutdownTimeout = 55000  // 55 seconds
  ) {
    this.server = server;
    this.mcpProtocol = mcpProtocol;
    this.readinessChecker = readinessChecker;
    this.shutdownTimeout = shutdownTimeout;

    this.registerSignalHandlers();
  }

  private registerSignalHandlers(): void {
    // Kubernetes sends SIGTERM for graceful shutdown
    process.on('SIGTERM', () => this.shutdown('SIGTERM'));
    process.on('SIGINT', () => this.shutdown('SIGINT'));
  }

  private async shutdown(signal: string): Promise<void> {
    if (this.isShuttingDown) {
      console.log('Shutdown already in progress, ignoring signal');
      return;
    }

    this.isShuttingDown = true;
    console.log(`Received ${signal}, starting graceful shutdown...`);

    // Step 1: Mark readiness probe as failed (removes from load balancer)
    this.readinessChecker.markShuttingDown();
    console.log('✓ Removed from load balancer (readiness = false)');

    // Step 2: Wait for load balancer to propagate (typically 5-10s)
    await this.sleep(10000);
    console.log('✓ Load balancer propagation complete');

    // Step 3: Stop accepting new connections
    this.server.close(() => {
      console.log('✓ HTTP server closed (no new connections)');
    });

    // Step 4: Drain existing MCP connections
    const drainStart = Date.now();
    const drainTimeout = this.shutdownTimeout - 15000; // Reserve 15s for final cleanup

    console.log(`Draining ${this.mcpProtocol.getActiveConnectionCount()} active connections...`);

    const drainPromise = this.mcpProtocol.drainConnections(drainTimeout);

    // Reject after the drain window, keeping the timer handle so it can be
    // cleared once draining finishes (avoids a stray unhandled rejection).
    let timeoutHandle: NodeJS.Timeout | undefined;
    const timeoutPromise = new Promise<never>((_, reject) => {
      timeoutHandle = setTimeout(
        () => reject(new Error('Drain timeout exceeded')),
        drainTimeout
      );
    });

    try {
      await Promise.race([drainPromise, timeoutPromise]);
      console.log(`✓ All connections drained (${Date.now() - drainStart}ms)`);
    } catch (error) {
      const remaining = this.mcpProtocol.getActiveConnectionCount();
      console.warn(`⚠ Drain timeout: ${remaining} connections remain, force closing`);
      await this.mcpProtocol.forceCloseConnections();
    } finally {
      if (timeoutHandle) clearTimeout(timeoutHandle);
    }

    // Step 5: Final cleanup
    console.log('Performing final cleanup...');
    await this.cleanup();

    console.log('✓ Graceful shutdown complete');
    process.exit(0);
  }

  private async cleanup(): Promise<void> {
    // Close database connections, flush logs, etc.
    await Promise.all([
      this.mcpProtocol.close(),
      // Add other cleanup tasks here
    ]);
  }

  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}
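
Bringing the pieces together in the server entry point might look like the sketch below; the MCPProtocol, Firestore, and Redis construction is declared as placeholders since it depends on your own setup:

// src/server.ts (sketch)
import express from 'express';
import { createHealthRouter } from './routes/health.js';
import { ReadinessChecker } from './health/readiness.js';
import { LivenessChecker } from './health/liveness.js';
import { DependencyHealthChecker } from './health/dependencies.js';
import { GracefulShutdownHandler } from './graceful-shutdown.js';
import { MCPProtocol } from './mcp/protocol.js';
import { FirestoreClient } from './database/firestore.js';
import { RedisClient } from './cache/redis.js';

declare const mcpProtocol: MCPProtocol;          // placeholder: your MCP setup
declare const firestoreClient: FirestoreClient;  // placeholder: your DB client
declare const redisClient: RedisClient;          // placeholder: your cache client

const app = express();

const readinessChecker = new ReadinessChecker(mcpProtocol, firestoreClient, redisClient);
const livenessChecker = new LivenessChecker();
const dependencyChecker = new DependencyHealthChecker(
  firestoreClient,
  redisClient,
  process.env.OPENAI_API_KEY || ''
);

app.use(createHealthRouter(readinessChecker, livenessChecker, dependencyChecker));

const server = app.listen(3000, () => console.log('MCP server listening on :3000'));

// Registers SIGTERM/SIGINT handlers for the drain sequence described above
new GracefulShutdownHandler(server, mcpProtocol, readinessChecker);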

Database Migration Coordination

Zero-downtime deployments require careful database migration coordination to maintain backward compatibility during rolling updates.

Migration Lock Manager

This distributed lock prevents simultaneous migrations from multiple pods:

// src/database/migration-lock.ts
import { FirestoreClient } from './firestore.js';

interface MigrationLock {
  migrationId: string;
  podName: string;
  acquiredAt: Date;
  expiresAt: Date;
}

export class MigrationLockManager {
  private firestore: FirestoreClient;
  private lockCollection = '_migration_locks';
  private lockTimeout = 300000; // 5 minutes

  constructor(firestore: FirestoreClient) {
    this.firestore = firestore;
  }

  /**
   * Acquire distributed lock for migration
   * Returns true if lock acquired, false if another pod holds lock
   */
  async acquireLock(migrationId: string): Promise<boolean> {
    const podName = process.env.HOSTNAME || 'unknown-pod';
    const lockDoc = this.firestore
      .collection(this.lockCollection)
      .doc(migrationId);

    try {
      // Atomic transaction to prevent race conditions
      await this.firestore.runTransaction(async (transaction) => {
        const doc = await transaction.get(lockDoc);

        if (doc.exists) {
          const lock = doc.data() as MigrationLock;
          const now = new Date();

          // Check if existing lock expired
          if (new Date(lock.expiresAt) > now) {
            throw new Error(`Migration locked by ${lock.podName}`);
          }

          // Lock expired, we can take it
          console.log(`Taking over expired lock from ${lock.podName}`);
        }

        // Acquire/renew lock
        const expiresAt = new Date(Date.now() + this.lockTimeout);
        transaction.set(lockDoc, {
          migrationId,
          podName,
          acquiredAt: new Date(),
          expiresAt
        });
      });

      console.log(`✓ Acquired migration lock: ${migrationId}`);
      return true;

    } catch (error) {
      console.log(`Failed to acquire migration lock: ${error.message}`);
      return false;
    }
  }

  /**
   * Release migration lock
   */
  async releaseLock(migrationId: string): Promise<void> {
    const lockDoc = this.firestore
      .collection(this.lockCollection)
      .doc(migrationId);

    await lockDoc.delete();
    console.log(`✓ Released migration lock: ${migrationId}`);
  }

  /**
   * Wait for migration lock with exponential backoff
   */
  async waitForLock(
    migrationId: string,
    maxWaitTime = 300000  // 5 minutes
  ): Promise<boolean> {
    const startTime = Date.now();
    let backoff = 1000; // Start with 1 second

    while (Date.now() - startTime < maxWaitTime) {
      const acquired = await this.acquireLock(migrationId);
      if (acquired) return true;

      // Exponential backoff (max 30s)
      await this.sleep(Math.min(backoff, 30000));
      backoff *= 2;
    }

    return false;
  }

  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}
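
In practice each pod attempts the lock at startup, one pod performs the migration, and the rest wait for the lock to clear. A usage sketch, where runMigration and the migration ID are placeholders:

// Sketch: guard a startup migration with the distributed lock
declare function runMigration(migrationId: string): Promise<void>; // placeholder runner

async function migrateOnStartup(lockManager: MigrationLockManager): Promise<void> {
  const migrationId = '2026-01-15-001-add-conversation-metadata'; // illustrative ID

  let acquired = await lockManager.acquireLock(migrationId);
  if (!acquired) {
    // Another pod is migrating; wait for it to finish or for its lock to expire
    acquired = await lockManager.waitForLock(migrationId);
    if (!acquired) {
      throw new Error(`Timed out waiting for migration lock: ${migrationId}`);
    }
  }

  try {
    // Assumes runMigration is idempotent or consults an applied-migrations
    // record, since a waiting pod may acquire the lock after the work is done
    await runMigration(migrationId);
  } finally {
    await lockManager.releaseLock(migrationId);
  }
}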

Backward-Compatible Schema Migration

This SQL migration adds new columns while maintaining compatibility with old application versions:

-- Migration: add_conversation_metadata.sql
-- Version: 2026-01-15-001
-- Strategy: Expand-Contract pattern for zero-downtime

-- Phase 1: EXPAND (safe with old app version)
-- Add new nullable columns (old app ignores them)

ALTER TABLE conversations
ADD COLUMN metadata_json TEXT DEFAULT '{}',
ADD COLUMN model_version VARCHAR(50) DEFAULT 'gpt-4o',
ADD COLUMN token_count INTEGER DEFAULT 0,
ADD COLUMN created_at_v2 TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
ADD INDEX idx_model_version (model_version),
ADD INDEX idx_created_at_v2 (created_at_v2);

-- Phase 2: BACKFILL (run after old pods terminated)
-- Populate new columns from existing data
-- Run this as a separate job, NOT during deployment

-- UPDATE conversations
-- SET
--   metadata_json = COALESCE(old_metadata_column, '{}'),
--   token_count = COALESCE(old_token_column, 0),
--   created_at_v2 = COALESCE(created_at, CURRENT_TIMESTAMP)
-- WHERE metadata_json = '{}';

-- Phase 3: CONTRACT (after new app version deployed)
-- Remove old columns in future migration
-- ONLY after 100% of pods running new version

-- ALTER TABLE conversations
-- DROP COLUMN old_metadata_column,
-- DROP COLUMN old_token_column;

-- Verification query (run after migration)
SELECT
  COUNT(*) as total_conversations,
  COUNT(metadata_json) as with_metadata,
  AVG(token_count) as avg_tokens
FROM conversations;
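
While both versions coexist, the new application version should dual-write the old and new columns so rows stay readable by either version. A hedged sketch of that write path, reusing the column names from the migration above; the SQL client interface is an assumption to adapt to your driver:

// Sketch: dual-write during the expand phase
interface SqlClient {
  query(sql: string, params: unknown[]): Promise<void>;
}

async function saveConversationMetadata(
  db: SqlClient,
  conversationId: string,
  metadata: Record<string, unknown>,
  tokenCount: number
): Promise<void> {
  await db.query(
    `UPDATE conversations
        SET old_metadata_column = ?,  -- still written for pods on the old version
            metadata_json       = ?,  -- new column read by the new version
            token_count         = ?
      WHERE id = ?`,
    [JSON.stringify(metadata), JSON.stringify(metadata), tokenCount, conversationId]
  );
}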

Feature Flag Implementation

Feature flags decouple deployments from feature releases, enabling gradual rollouts:

// src/feature-flags/manager.ts
import { FirestoreClient } from '../database/firestore.js';

interface FeatureFlag {
  name: string;
  enabled: boolean;
  rolloutPercentage: number;  // 0-100
  enabledForUsers?: string[];  // Specific user IDs
  disabledForUsers?: string[];
  metadata?: Record<string, any>;
}

export class FeatureFlagManager {
  private firestore: FirestoreClient;
  private cache = new Map<string, FeatureFlag>();
  private cacheTTL = 60000; // 1 minute
  private lastCacheUpdate = 0;

  constructor(firestore: FirestoreClient) {
    this.firestore = firestore;
  }

  /**
   * Check if feature enabled for specific user
   */
  async isEnabled(featureName: string, userId: string): Promise<boolean> {
    const flag = await this.getFlag(featureName);
    if (!flag) return false;

    // Global disable
    if (!flag.enabled) return false;

    // Explicit user disable
    if (flag.disabledForUsers?.includes(userId)) return false;

    // Explicit user enable
    if (flag.enabledForUsers?.includes(userId)) return true;

    // Percentage-based rollout (deterministic hash)
    const userHash = this.hashUserId(userId);
    return userHash < flag.rolloutPercentage;
  }

  private async getFlag(featureName: string): Promise<FeatureFlag | null> {
    // Check cache
    if (Date.now() - this.lastCacheUpdate < this.cacheTTL) {
      return this.cache.get(featureName) || null;
    }

    // Refresh cache
    await this.refreshCache();
    return this.cache.get(featureName) || null;
  }

  private async refreshCache(): Promise<void> {
    const snapshot = await this.firestore
      .collection('feature_flags')
      .get();

    this.cache.clear();
    snapshot.forEach(doc => {
      const flag = doc.data() as FeatureFlag;
      this.cache.set(flag.name, flag);
    });

    this.lastCacheUpdate = Date.now();
  }

  /**
   * Hash user ID to a bucket in the range 0-99
   * Ensures consistent assignment across requests
   */
  private hashUserId(userId: string): number {
    let hash = 0;
    for (let i = 0; i < userId.length; i++) {
      hash = ((hash << 5) - hash) + userId.charCodeAt(i);
      hash = hash & hash; // Convert to 32-bit integer
    }
    return Math.abs(hash % 100);
  }
}

// Usage example
const featureFlags = new FeatureFlagManager(firestoreClient);

// In your MCP server endpoint
app.post('/mcp/tools/call', async (req, res) => {
  const { userId, toolName } = req.body;

  // Check if new tool version enabled for this user
  const useNewVersion = await featureFlags.isEnabled(
    'new_tool_implementation',
    userId
  );

  if (useNewVersion) {
    return handleToolCallV2(req, res);
  } else {
    return handleToolCallV1(req, res);
  }
});

Monitoring Deployments with Prometheus

Real-time deployment monitoring detects issues before they impact users.

Deployment Metrics Instrumentation

// src/metrics/deployment.ts
import promClient from 'prom-client';

export class DeploymentMetrics {
  private static deploymentInfo = new promClient.Gauge({
    name: 'app_deployment_info',
    help: 'Deployment metadata (version, timestamp)',
    labelNames: ['version', 'pod_name', 'deployment_time']
  });

  private static requestDuration = new promClient.Histogram({
    name: 'http_request_duration_seconds',
    help: 'HTTP request duration in seconds',
    labelNames: ['method', 'route', 'status_code', 'app_version'],
    buckets: [0.01, 0.05, 0.1, 0.5, 1, 2, 5, 10]
  });

  private static activeConnections = new promClient.Gauge({
    name: 'mcp_active_connections',
    help: 'Number of active MCP connections',
    labelNames: ['app_version']
  });

  private static errorRate = new promClient.Counter({
    name: 'http_errors_total',
    help: 'Total HTTP errors',
    labelNames: ['method', 'route', 'status_code', 'app_version']
  });

  static recordDeployment(version: string): void {
    const podName = process.env.HOSTNAME || 'unknown';
    const deploymentTime = new Date().toISOString();

    this.deploymentInfo.set(
      { version, pod_name: podName, deployment_time: deploymentTime },
      1
    );
  }

  static recordRequest(
    method: string,
    route: string,
    statusCode: number,
    durationSeconds: number
  ): void {
    const version = process.env.APP_VERSION || 'unknown';

    this.requestDuration.observe(
      { method, route, status_code: statusCode.toString(), app_version: version },
      durationSeconds
    );

    if (statusCode >= 500) {
      this.errorRate.inc({
        method,
        route,
        status_code: statusCode.toString(),
        app_version: version
      });
    }
  }

  static recordActiveConnections(count: number): void {
    const version = process.env.APP_VERSION || 'unknown';
    this.activeConnections.set({ app_version: version }, count);
  }

  static getRegistry(): promClient.Registry {
    return promClient.register;
  }
}
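
The pod annotations in the Deployment manifest point Prometheus at port 9090 and the /metrics path, but that endpoint still has to be served. The sketch below shows one way to wire DeploymentMetrics into an Express app; the separate metrics listener and the route normalization are assumptions:

// Sketch: expose Prometheus metrics on port 9090 and instrument requests
import express from 'express';
import { DeploymentMetrics } from './metrics/deployment.js';

// Request-timing middleware for the main application
export function metricsMiddleware(): express.RequestHandler {
  return (req, res, next) => {
    const start = process.hrtime.bigint();
    res.on('finish', () => {
      const seconds = Number(process.hrtime.bigint() - start) / 1e9;
      // req.route may be undefined for unmatched paths; fall back to the raw path
      const route = req.route?.path ?? req.path;
      DeploymentMetrics.recordRequest(req.method, route, res.statusCode, seconds);
    });
    next();
  };
}

// Separate listener so Kubernetes scrapes metrics on the annotated port (9090)
const metricsApp = express();
metricsApp.get('/metrics', async (_req, res) => {
  const registry = DeploymentMetrics.getRegistry();
  res.set('Content-Type', registry.contentType);
  res.send(await registry.metrics());
});
metricsApp.listen(9090);

Mount metricsMiddleware() before your MCP routes so every request is timed with the running app_version label, which is what lets dashboards compare old and new pods during a rollout.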

Learn more about deployment best practices in our ChatGPT App DevOps Guide and explore Blue-Green Deployment Strategies. For database migration patterns, see Database Schema Migrations for ChatGPT Apps.

Production Deployment Checklist

Before deploying ChatGPT apps to production, verify these critical requirements:

Pre-Deployment:

  • All health check endpoints return 200 OK
  • Database migrations tested in staging environment
  • Feature flags configured for gradual rollout
  • Prometheus metrics instrumentation validated
  • Load testing completed (minimum 2x expected traffic)
  • Rollback plan documented and tested

During Deployment:

  • Monitor Kubernetes pod rollout status (kubectl rollout status)
  • Watch Prometheus dashboards for error rate spikes
  • Verify new pods pass readiness probes before old pods terminate
  • Check application logs for startup errors
  • Validate active connection counts remain stable

Post-Deployment:

  • Smoke test critical MCP tools on production (a minimal sketch follows this list)
  • Verify database migration completed successfully
  • Confirm zero increase in error rates
  • Check average response latency unchanged
  • Run full regression test suite
  • Monitor for 24 hours before declaring deployment successful
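
To make the smoke-test item above concrete, a minimal script might look like the sketch below; the base URL, expected-version variable, and the commented MCP call are placeholders to adapt to your deployment:

// scripts/smoke-test.ts (sketch) -- run against production after rollout completes
import axios from 'axios';

const BASE_URL = process.env.SMOKE_TEST_URL ?? 'https://chatgpt-mcp.example.com'; // placeholder

async function smokeTest(): Promise<void> {
  // 1. Comprehensive health endpoint must report healthy
  const health = await axios.get(`${BASE_URL}/health`, { timeout: 5000 });
  if (health.data.status !== 'healthy') {
    throw new Error(`Health check failed: ${JSON.stringify(health.data)}`);
  }

  // 2. Deployed version must match what was just rolled out
  const expected = process.env.EXPECTED_VERSION;
  if (expected && health.data.version !== expected) {
    throw new Error(`Version mismatch: expected ${expected}, got ${health.data.version}`);
  }

  // 3. Exercise one critical MCP tool end-to-end (endpoint and payload are
  //    placeholders for however your MCP transport exposes tools/call)
  // await axios.post(`${BASE_URL}/mcp`, { ... });

  console.log('✓ Smoke test passed');
}

smokeTest().catch(err => {
  console.error('✗ Smoke test failed:', err.message);
  process.exit(1);
});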

Explore our Enterprise ChatGPT App Deployment solutions for managed deployment pipelines with automated rollback and compliance validation. For comprehensive monitoring setup, see Production Monitoring for ChatGPT Apps.

Conclusion: Achieving Five-Nines Reliability

Zero-downtime deployments transform ChatGPT applications from fragile prototypes to enterprise-grade production systems. By implementing rolling updates with comprehensive health checks, graceful connection draining, and backward-compatible database migrations, your MCP servers can update continuously while maintaining 99.999% uptime.

The key to successful zero-downtime deployments lies in three principles: verify before promoting (readiness probes validate pods before traffic exposure), fail fast and rollback (automated health checks detect issues within seconds), and coordinate across layers (application code, database schema, and infrastructure must align during transitions).

Start with rolling updates for low-risk deployments, graduate to blue-green deployments for high-stakes releases, and implement canary releases for data-driven rollout decisions. With proper monitoring, feature flags, and graceful shutdown handlers, you'll deploy ChatGPT apps with confidence—multiple times per day, without disrupting a single active conversation.

Ready to deploy ChatGPT apps with enterprise-grade reliability? Start building with MakeAIHQ and get production-ready Kubernetes configurations, health check templates, and deployment automation included. From your first MCP server to scaling to millions of ChatGPT users, our platform ensures zero-downtime deployments at every stage.