MCP Server Deployment: Docker, Kubernetes & Cloud Run Production Setup

Deploying Model Context Protocol (MCP) servers to production requires careful consideration of containerization, orchestration, scaling, and reliability. While local development with npx @modelcontextprotocol/inspector works great for testing, production workloads demand robust infrastructure, health monitoring, graceful shutdowns, and zero-downtime updates.

In this comprehensive guide, we'll walk through production-grade deployment patterns for MCP servers using Docker, Kubernetes, and Google Cloud Run. Whether you're deploying a simple fitness studio booking assistant or a complex multi-tenant ChatGPT app, these patterns ensure your MCP server can handle real-world traffic, scale automatically, and recover from failures gracefully.

We'll cover multi-stage Docker builds that reduce image sizes by 70%, Kubernetes deployments with horizontal pod autoscaling, Cloud Run configurations that scale to zero when idle, health check implementations that prevent traffic to unhealthy containers, and blue-green deployment strategies that eliminate downtime during updates.

By the end of this guide, you'll have production-ready deployment configurations you can adapt for your own MCP servers, complete with security best practices, monitoring hooks, and automated rollback mechanisms. Let's get started.

Docker Containerization: Building Production-Ready Images

Docker containerization is the foundation of modern MCP deployment. A well-designed Dockerfile creates lightweight, secure, and reproducible container images that run consistently across development, staging, and production environments.

Multi-Stage Builds for Minimal Image Size

Multi-stage builds separate the build environment from the runtime environment, dramatically reducing final image size. This approach installs build tools (TypeScript compiler, webpack, etc.) in a temporary build stage, then copies only the compiled artifacts to the final runtime stage.

Here's a production-ready Dockerfile for a TypeScript-based MCP server:

# ========================================
# STAGE 1: Build Stage
# ========================================
FROM node:20-alpine AS builder

# Install build dependencies
RUN apk add --no-cache python3 make g++ git

# Set working directory
WORKDIR /build

# Copy package files
COPY package*.json ./
COPY tsconfig.json ./

# Install ALL dependencies (including devDependencies)
RUN npm ci

# Copy source code
COPY src/ ./src/

# Build TypeScript to JavaScript
RUN npm run build

# Prune dev dependencies (--omit=dev replaces the deprecated --production flag)
RUN npm prune --omit=dev

# ========================================
# STAGE 2: Runtime Stage
# ========================================
FROM node:20-alpine

# Install dumb-init for proper signal handling
RUN apk add --no-cache dumb-init

# Create non-root user
RUN addgroup -g 1001 mcpserver && \
    adduser -D -u 1001 -G mcpserver mcpserver

# Set working directory
WORKDIR /app

# Copy built artifacts from builder stage
COPY --from=builder --chown=mcpserver:mcpserver /build/dist ./dist
COPY --from=builder --chown=mcpserver:mcpserver /build/node_modules ./node_modules
COPY --from=builder --chown=mcpserver:mcpserver /build/package*.json ./

# Switch to non-root user
USER mcpserver

# Expose MCP server port
EXPOSE 3000

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
  CMD node -e "require('http').get('http://localhost:3000/health', (r) => process.exit(r.statusCode === 200 ? 0 : 1)).on('error', () => process.exit(1))"

# Use dumb-init to handle signals properly
ENTRYPOINT ["dumb-init", "--"]

# Start MCP server
CMD ["node", "dist/index.js"]

Key optimizations:

  • Multi-stage build: Reduces final image from 450MB to 120MB (73% reduction)
  • Alpine Linux: Minimal base image (roughly 5MB for the base vs over 100MB for Debian)
  • Non-root user: Security best practice (prevents privilege escalation)
  • dumb-init: Proper signal handling for graceful shutdowns
  • Built-in health check: Docker automatically monitors container health

Security Scanning and Hardening

Production images should be scanned for vulnerabilities before deployment. Integrate Trivy or Snyk into your CI/CD pipeline:

# Build image
docker build -t mcp-server:latest .

# Scan for vulnerabilities
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
  aquasec/trivy:latest image --severity HIGH,CRITICAL mcp-server:latest

# If scan passes, tag for production
docker tag mcp-server:latest gcr.io/your-project/mcp-server:v1.2.3
docker push gcr.io/your-project/mcp-server:v1.2.3

Additional hardening measures, combined in the docker run sketch after this list:

  • Read-only filesystem: Add --read-only flag to prevent runtime modifications
  • No new privileges: Add --security-opt=no-new-privileges:true
  • Drop capabilities: Add --cap-drop=ALL to remove unnecessary Linux capabilities
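
Here is a minimal docker run sketch combining these flags (the tmpfs mount is an assumption for apps that write temp files):

# Run the hardened container: immutable root filesystem, no privilege
# escalation, and all Linux capabilities dropped
docker run -d \
  --name mcp-server \
  --read-only \
  --tmpfs /tmp \
  --security-opt=no-new-privileges:true \
  --cap-drop=ALL \
  -p 3000:3000 \
  mcp-server:latest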

For more Docker best practices, see the official Docker security documentation.

Kubernetes Deployment: Orchestrating MCP Servers at Scale

Kubernetes provides robust orchestration for MCP servers, enabling horizontal scaling, self-healing, and zero-downtime updates. A production Kubernetes deployment includes Deployment, Service, Ingress, HorizontalPodAutoscaler, and ConfigMap resources.

Complete Kubernetes Deployment Manifest

# ========================================
# ConfigMap: Environment Configuration
# ========================================
apiVersion: v1
kind: ConfigMap
metadata:
  name: mcp-server-config
  namespace: production
data:
  NODE_ENV: "production"
  LOG_LEVEL: "info"
  MCP_PORT: "3000"
  REDIS_HOST: "redis-master.production.svc.cluster.local"
  REDIS_PORT: "6379"

---
# ========================================
# Secret: Sensitive Credentials
# ========================================
apiVersion: v1
kind: Secret
metadata:
  name: mcp-server-secrets
  namespace: production
type: Opaque
data:
  # Base64 encoded values (use: echo -n "value" | base64)
  OPENAI_API_KEY: "c2stcHJvai14eHh4eHh4eHh4eHh4eHh4"
  DATABASE_URL: "cG9zdGdyZXNxbDovL3VzZXI6cGFzc0BkYi5leGFtcGxlLmNvbS9kYg=="

---
# ========================================
# Deployment: MCP Server Pods
# ========================================
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-server
  namespace: production
  labels:
    app: mcp-server
    version: v1.2.3
spec:
  replicas: 3
  revisionHistoryLimit: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: mcp-server
  template:
    metadata:
      labels:
        app: mcp-server
        version: v1.2.3
    spec:
      serviceAccountName: mcp-server
      securityContext:
        runAsNonRoot: true
        runAsUser: 1001
        fsGroup: 1001

      containers:
      - name: mcp-server
        image: gcr.io/your-project/mcp-server:v1.2.3
        imagePullPolicy: IfNotPresent

        ports:
        - name: http
          containerPort: 3000
          protocol: TCP

        env:
        - name: NODE_ENV
          valueFrom:
            configMapKeyRef:
              name: mcp-server-config
              key: NODE_ENV
        - name: LOG_LEVEL
          valueFrom:
            configMapKeyRef:
              name: mcp-server-config
              key: LOG_LEVEL
        - name: OPENAI_API_KEY
          valueFrom:
            secretKeyRef:
              name: mcp-server-secrets
              key: OPENAI_API_KEY

        resources:
          requests:
            memory: "256Mi"
            cpu: "200m"
          limits:
            memory: "512Mi"
            cpu: "500m"

        livenessProbe:
          httpGet:
            path: /health/live
            port: http
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3

        readinessProbe:
          httpGet:
            path: /health/ready
            port: http
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 2

        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 15"]

---
# ========================================
# Service: Internal Load Balancer
# ========================================
apiVersion: v1
kind: Service
metadata:
  name: mcp-server
  namespace: production
  labels:
    app: mcp-server
spec:
  type: ClusterIP
  selector:
    app: mcp-server
  ports:
  - name: http
    port: 80
    targetPort: http
    protocol: TCP
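  # MCP connections over SSE / streamable HTTP are long-lived and stateful,
  # which is why clients are pinned to the same pod below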
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800

---
# ========================================
# HorizontalPodAutoscaler: Auto-Scaling
# ========================================
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mcp-server-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mcp-server
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Pods
        value: 1
        periodSeconds: 120

---
# ========================================
# Ingress: External HTTPS Access
# ========================================
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: mcp-server-ingress
  namespace: production
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/limit-rps: "100"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - api.yourapp.com
    secretName: mcp-server-tls
  rules:
  - host: api.yourapp.com
    http:
      paths:
      - path: /mcp
        pathType: Prefix
        backend:
          service:
            name: mcp-server
            port:
              name: http

Key features:

  • Rolling updates: Zero-downtime deployments with maxUnavailable: 0
  • Auto-scaling: Automatically scales from 3 to 20 pods based on CPU/memory
  • Health checks: Separate liveness (container alive) and readiness (ready for traffic) probes
  • Resource limits: Prevents resource starvation and ensures fair scheduling
  • Security: Non-root user, read-only filesystem, secret management
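
With the manifests saved to a single file, applying and verifying the rollout looks like this (the file name mcp-server.yaml is an assumption):

# Apply all resources and watch the rolling update complete
kubectl apply -f mcp-server.yaml
kubectl rollout status deployment/mcp-server -n production --timeout=300s

# Roll back to the previous ReplicaSet if the new version misbehaves
kubectl rollout undo deployment/mcp-server -n production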

For complete Kubernetes deployment guides, see the official Kubernetes documentation.

Cloud Run Deployment: Serverless Simplicity

Google Cloud Run offers a fully managed serverless platform for MCP servers. It automatically scales to zero when idle (saving costs) and scales up to handle traffic spikes, without managing infrastructure.

Cloud Run Configuration

# cloud-run-service.yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: mcp-server
  namespace: your-project  # Cloud Run uses the GCP project ID as the namespace
  annotations:
    run.googleapis.com/ingress: "all"
    run.googleapis.com/launch-stage: "GA"
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "1"
        autoscaling.knative.dev/maxScale: "100"
        run.googleapis.com/cpu-throttling: "false"
        run.googleapis.com/startup-cpu-boost: "true"
    spec:
      containerConcurrency: 80
      timeoutSeconds: 300
      serviceAccountName: mcp-server@your-project.iam.gserviceaccount.com

      containers:
      - name: mcp-server
        image: gcr.io/your-project/mcp-server:v1.2.3
        ports:
        - name: http1
          containerPort: 3000

        env:
        - name: NODE_ENV
          value: "production"
        - name: LOG_LEVEL
          value: "info"
        - name: OPENAI_API_KEY
          valueFrom:
            secretKeyRef:
              name: openai-credentials
              key: api-key

        resources:
          limits:
            memory: "512Mi"
            cpu: "1000m"

        startupProbe:
          httpGet:
            path: /health/startup
            port: 3000
          initialDelaySeconds: 0
          periodSeconds: 1
          failureThreshold: 30

        livenessProbe:
          httpGet:
            path: /health/live
            port: 3000
          initialDelaySeconds: 30
          periodSeconds: 10
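
This file can be applied declaratively instead of via flags; a sketch, assuming the file is saved as cloud-run-service.yaml:

# Apply the Knative-style service definition
gcloud run services replace cloud-run-service.yaml \
  --region=us-central1 \
  --project=your-project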

Deployment Script

#!/bin/bash
# deploy-cloud-run.sh

set -euo pipefail

PROJECT_ID="your-project"
REGION="us-central1"
SERVICE_NAME="mcp-server"
TAG="${1:-latest}"  # pass an immutable tag (e.g. v1.2.3) in production
IMAGE="gcr.io/${PROJECT_ID}/${SERVICE_NAME}:${TAG}"

echo "๐Ÿ”จ Building Docker image..."
docker build -t "${IMAGE}" .

echo "๐Ÿ“ค Pushing to Google Container Registry..."
docker push "${IMAGE}"

echo "๐Ÿš€ Deploying to Cloud Run..."
gcloud run deploy "${SERVICE_NAME}" \
  --image="${IMAGE}" \
  --platform=managed \
  --region="${REGION}" \
  --project="${PROJECT_ID}" \
  --allow-unauthenticated \
  --min-instances=1 \
  --max-instances=100 \
  --memory=512Mi \
  --cpu=1 \
  --timeout=300s \
  --concurrency=80 \
  --set-env-vars="NODE_ENV=production,LOG_LEVEL=info" \
  --set-secrets="OPENAI_API_KEY=openai-credentials:latest" \
  --service-account="mcp-server@${PROJECT_ID}.iam.gserviceaccount.com"

echo "โœ… Deployment complete!"

# Get the service URL
SERVICE_URL=$(gcloud run services describe "${SERVICE_NAME}" \
  --platform=managed \
  --region="${REGION}" \
  --project="${PROJECT_ID}" \
  --format="value(status.url)")

echo "๐ŸŒ Service URL: ${SERVICE_URL}"

# Test health endpoint
echo "๐Ÿฉบ Testing health endpoint..."
curl -f "${SERVICE_URL}/health" || echo "โŒ Health check failed"

Cloud Run advantages:

  • Auto-scaling to zero: Set minScale/min-instances to 0 for no idle costs (the examples above pin one warm instance to avoid cold starts)
  • Fully managed: No infrastructure to maintain
  • Pay-per-use: Charged only for CPU time during requests
  • Built-in HTTPS: Automatic SSL certificate provisioning

For more Cloud Run best practices, see the Cloud Run documentation.

Health Checks: Ensuring Container Reliability

Health checks are critical for production deployments. They enable Kubernetes/Cloud Run to automatically restart unhealthy containers and route traffic only to healthy instances.

Health Check Endpoints Implementation

// src/health.ts
import express, { Request, Response } from 'express';
import type { RedisClientType } from 'redis';

export interface HealthStatus {
  status: 'healthy' | 'unhealthy' | 'starting';
  timestamp: string;
  uptime: number;
  checks: {
    redis?: boolean;
    database?: boolean;
    openai?: boolean;
  };
}

export class HealthChecker {
  private app: express.Application;
  private startTime: number;
  private isReady: boolean = false;
  private isShuttingDown: boolean = false;
  private redisClient: RedisClientType;

  constructor(app: express.Application, redisClient: RedisClientType) {
    this.app = app;
    this.redisClient = redisClient;
    this.startTime = Date.now();
    this.registerRoutes();
  }

  private registerRoutes(): void {
    // Startup probe: Is the application still starting?
    this.app.get('/health/startup', this.startupProbe.bind(this));

    // Liveness probe: Is the application alive?
    this.app.get('/health/live', this.livenessProbe.bind(this));

    // Readiness probe: Is the application ready to serve traffic?
    this.app.get('/health/ready', this.readinessProbe.bind(this));

    // Combined health endpoint
    this.app.get('/health', this.healthCheck.bind(this));
  }

  // Startup probe: Used during initialization
  private async startupProbe(req: Request, res: Response): Promise<void> {
    const uptime = Date.now() - this.startTime;

    // Allow 30 seconds for startup
    if (uptime < 30000 && !this.isReady) {
      res.status(503).json({
        status: 'starting',
        uptime,
        message: 'Application is still starting'
      });
      return;
    }

    this.isReady = true;
    res.status(200).json({
      status: 'healthy',
      uptime,
      message: 'Application started successfully'
    });
  }

  // Liveness probe: Is the container alive?
  private async livenessProbe(req: Request, res: Response): Promise<void> {
    const uptime = Date.now() - this.startTime;

    // Basic liveness check (process is running)
    res.status(200).json({
      status: 'healthy',
      uptime,
      timestamp: new Date().toISOString()
    });
  }

  // Readiness probe: Can the container serve traffic?
  private async readinessProbe(req: Request, res: Response): Promise<void> {
    // Fail fast during shutdown so the load balancer drains traffic away
    if (this.isShuttingDown) {
      res.status(503).json({
        status: 'unhealthy',
        timestamp: new Date().toISOString(),
        message: 'Server is shutting down'
      });
      return;
    }

    const checks: HealthStatus['checks'] = {};
    let isHealthy = true;

    // Check Redis connection
    try {
      await this.redisClient.ping();
      checks.redis = true;
    } catch (error) {
      checks.redis = false;
      isHealthy = false;
    }

    // Check database connection (placeholder: with the query commented out
    // this check always passes; wire in your real client)
    try {
      // await this.databaseClient.query('SELECT 1');
      checks.database = true;
    } catch (error) {
      checks.database = false;
      isHealthy = false;
    }

    const status: HealthStatus = {
      status: isHealthy ? 'healthy' : 'unhealthy',
      timestamp: new Date().toISOString(),
      uptime: Date.now() - this.startTime,
      checks
    };

    res.status(isHealthy ? 200 : 503).json(status);
  }

  // Combined health check endpoint
  private async healthCheck(req: Request, res: Response): Promise<void> {
    await this.readinessProbe(req, res);
  }

  public setReady(ready: boolean): void {
    this.isReady = ready;
  }

  // Called by the graceful shutdown handler to fail the readiness probe
  public markShuttingDown(): void {
    this.isShuttingDown = true;
  }
}
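
Wiring the checker into the server entry point might look like the sketch below (the file name, Redis URL environment variable, and startup flow are assumptions):

// src/index.ts (sketch)
import express from 'express';
import http from 'http';
import { createClient } from 'redis';
import { HealthChecker } from './health';

async function main(): Promise<void> {
  const app = express();
  const server = http.createServer(app);

  // Connect dependencies before accepting traffic
  const redisClient = createClient({ url: process.env.REDIS_URL });
  await redisClient.connect();

  const health = new HealthChecker(app, redisClient);

  server.listen(3000, () => {
    health.setReady(true); // startup probe passes from here on
    console.log('MCP server listening on port 3000');
  });
}

main().catch((error) => {
  console.error('Fatal startup error:', error);
  process.exit(1);
});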

Probe best practices:

  • Startup probe: Used during initialization (prevents premature liveness checks)
  • Liveness probe: Lightweight check (just verify process is responsive)
  • Readiness probe: Comprehensive check (verify all dependencies are healthy)
  • Fast response: Health checks should respond in <1 second (see the timeout sketch after this list)
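
The checks above await each dependency with no bound on latency; one way to keep the probe inside its budget is to race each check against a timeout. A minimal sketch (the checkWithTimeout helper and the 800ms budget are assumptions, not part of the code above):

// Race a dependency check against a timeout so a slow dependency marks
// the check as failed instead of stalling the whole probe response
async function checkWithTimeout(
  check: () => Promise<unknown>,
  timeoutMs = 800
): Promise<boolean> {
  const timeout = new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error('health check timed out')), timeoutMs)
  );
  try {
    await Promise.race([check(), timeout]);
    return true;
  } catch {
    return false;
  }
}

// Usage inside readinessProbe:
//   checks.redis = await checkWithTimeout(() => this.redisClient.ping());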

Graceful Shutdown: Zero Request Loss

Graceful shutdown ensures in-flight requests complete before the container terminates. This is critical during rolling updates to prevent request failures.

Graceful Shutdown Handler

// src/graceful-shutdown.ts
import { Server } from 'http';

export class GracefulShutdownHandler {
  private server: Server;
  private isShuttingDown: boolean = false;
  private shutdownTimeout: number;
  private markNotReady: () => void;

  // markNotReady lets the handler fail the readiness probe without touching
  // Express routes (re-registering a route would not override the original)
  constructor(
    server: Server,
    markNotReady: () => void,
    shutdownTimeout: number = 30000
  ) {
    this.server = server;
    this.markNotReady = markNotReady;
    this.shutdownTimeout = shutdownTimeout;
    this.registerSignalHandlers();
  }

  private registerSignalHandlers(): void {
    // Handle SIGTERM (Kubernetes sends this during pod termination)
    process.on('SIGTERM', () => this.shutdown('SIGTERM'));

    // Handle SIGINT (Ctrl+C in terminal)
    process.on('SIGINT', () => this.shutdown('SIGINT'));

    // Handle uncaught exceptions
    process.on('uncaughtException', (error) => {
      console.error('Uncaught Exception:', error);
      this.shutdown('UNCAUGHT_EXCEPTION');
    });

    // Handle unhandled promise rejections
    process.on('unhandledRejection', (reason, promise) => {
      console.error('Unhandled Rejection at:', promise, 'reason:', reason);
      this.shutdown('UNHANDLED_REJECTION');
    });
  }

  private async shutdown(signal: string): Promise<void> {
    if (this.isShuttingDown) {
      console.log('Shutdown already in progress, ignoring signal:', signal);
      return;
    }

    this.isShuttingDown = true;
    console.log(`Received ${signal}, starting graceful shutdown...`);

    // Step 1: Stop accepting new requests by failing the readiness probe.
    // Note: registering a second '/health/ready' route here would not work,
    // because Express matches routes in registration order and the original
    // handler would keep responding. Flip the shared readiness flag instead.
    this.markNotReady();

    console.log('Stopped accepting new requests (readiness probe now returns 503)');

    // Step 2: Wait for Kubernetes to remove pod from service endpoints
    // (typically takes 5-10 seconds)
    await this.sleep(10000);

    // Step 3: Close HTTP server (wait for in-flight requests to complete).
    // On timeout we force-exit: rejecting would leave the process hanging,
    // because nothing awaits the promise returned by shutdown()
    await new Promise<void>((resolve) => {
      const timeout = setTimeout(() => {
        console.error('Shutdown timeout exceeded, forcing exit');
        process.exit(1);
      }, this.shutdownTimeout);

      this.server.close((error) => {
        clearTimeout(timeout);
        if (error) {
          console.error('Error during server shutdown:', error);
        } else {
          console.log('All connections closed successfully');
        }
        resolve();
      });
    });

    // Step 4: Close database connections, Redis clients, etc.
    await this.cleanupResources();

    console.log('Graceful shutdown complete');
    process.exit(0);
  }

  private async cleanupResources(): Promise<void> {
    console.log('Cleaning up resources...');

    // Close Redis connection
    try {
      // await redisClient.quit();
      console.log('Redis connection closed');
    } catch (error) {
      console.error('Error closing Redis:', error);
    }

    // Close database connection pool
    try {
      // await databasePool.end();
      console.log('Database pool closed');
    } catch (error) {
      console.error('Error closing database:', error);
    }
  }

  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}

// Usage in main application
import express from 'express';
import http from 'http';
import { createClient } from 'redis';
import { HealthChecker } from './health';

const app = express();
const server = http.createServer(app);

// The health checker owns the readiness flag (connect the client at startup)
const healthChecker = new HealthChecker(app, createClient({ url: process.env.REDIS_URL }));

// Initialize graceful shutdown handler; it fails the readiness probe on SIGTERM
new GracefulShutdownHandler(server, () => healthChecker.markShuttingDown());

server.listen(3000, () => {
  console.log('MCP Server listening on port 3000');
});

Shutdown sequence:

  1. Receive SIGTERM (Kubernetes sends it after the preStop hook completes, then SIGKILL once the 30-second default grace period expires)
  2. Fail readiness probe (stops receiving new traffic)
  3. Wait 10 seconds (allow load balancer to update routing)
  4. Close HTTP server (wait for in-flight requests to complete)
  5. Close database/Redis connections
  6. Exit gracefully (exit code 0)
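
One caveat: the default Kubernetes termination grace period is 30 seconds, while the sequence above (15s preStop sleep, 10s drain, up to 30s for in-flight requests) can take longer. Extend the budget in the Deployment's pod template so the kubelet does not SIGKILL mid-drain:

# In the pod template (value is a sketch; keep it larger than
# preStop sleep + drain wait + server close timeout)
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60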

Zero-Downtime Updates: Blue-Green Deployment

Blue-green deployment runs two identical production environments (blue = current, green = new). Traffic switches to green after validation, enabling instant rollback if issues arise.

Blue-Green Deployment Script

#!/bin/bash
# blue-green-deploy.sh

set -euo pipefail

NAMESPACE="production"
NEW_VERSION="${1:-latest}"
DEPLOYMENT_NAME="mcp-server"
SERVICE_NAME="mcp-server"

echo "๐Ÿš€ Starting blue-green deployment for version: ${NEW_VERSION}"

# Step 1: Deploy green environment
echo "๐Ÿ“ฆ Deploying green environment..."
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${DEPLOYMENT_NAME}-green
  namespace: ${NAMESPACE}
  labels:
    app: mcp-server
    environment: green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: mcp-server
      environment: green
  template:
    metadata:
      labels:
        app: mcp-server
        environment: green
        version: ${NEW_VERSION}
    spec:
      containers:
      - name: mcp-server
        image: gcr.io/your-project/mcp-server:${NEW_VERSION}
        ports:
        - containerPort: 3000
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 3000
          initialDelaySeconds: 10
          periodSeconds: 5
EOF

# Step 2: Wait for green pods to be ready
echo "โณ Waiting for green pods to be ready..."
kubectl wait --for=condition=available --timeout=300s \
  deployment/${DEPLOYMENT_NAME}-green -n ${NAMESPACE}

# Step 3: Run smoke tests on green environment
echo "๐Ÿงช Running smoke tests on green environment..."
GREEN_POD=$(kubectl get pods -n ${NAMESPACE} -l environment=green -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n ${NAMESPACE} ${GREEN_POD} -- curl -f http://localhost:3000/health || {
  echo "โŒ Smoke tests failed, rolling back..."
  kubectl delete deployment ${DEPLOYMENT_NAME}-green -n ${NAMESPACE}
  exit 1
}

# Step 4: Switch traffic to green
# (assumes the Service selector already includes an "environment" key;
# otherwise green pods match the selector as soon as they become ready)
echo "Switching traffic to green environment..."
kubectl patch service ${SERVICE_NAME} -n ${NAMESPACE} -p '{"spec":{"selector":{"environment":"green"}}}'

echo "โณ Waiting 30 seconds for traffic to stabilize..."
sleep 30

# Step 5: Monitor error rates
echo "๐Ÿ“Š Monitoring error rates..."
ERROR_RATE=$(kubectl logs -n ${NAMESPACE} -l environment=green --tail=100 | grep -c "ERROR" || echo "0")
if [ "${ERROR_RATE}" -gt 10 ]; then
  echo "โŒ High error rate detected (${ERROR_RATE} errors), rolling back..."
  kubectl patch service ${SERVICE_NAME} -n ${NAMESPACE} -p '{"spec":{"selector":{"environment":"blue"}}}'
  kubectl delete deployment ${DEPLOYMENT_NAME}-green -n ${NAMESPACE}
  exit 1
fi

# Step 6: Delete old blue environment
echo "Deleting old blue environment..."
kubectl delete deployment ${DEPLOYMENT_NAME}-blue -n ${NAMESPACE} --ignore-not-found=true

# Step 7: Keep green as the active environment
# Note: a Deployment's name and selector are immutable, so green cannot be
# patched back into "blue". Alternate colors instead: the next release
# deploys to ${DEPLOYMENT_NAME}-blue and switches the Service selector back.
echo "Green is now the active environment"

echo "โœ… Blue-green deployment complete!"
echo "๐Ÿ“Š Current deployment status:"
kubectl get deployments -n ${NAMESPACE} -l app=mcp-server

Blue-green advantages:

  • Instant rollback: Switch back to blue if green fails
  • Zero downtime: New version fully tested before receiving traffic
  • Risk mitigation: Production validation before full cutover
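
The instant rollback is simply the selector patch in reverse, and it only works while the blue Deployment still exists (i.e., before Step 6 deletes it):

# Route traffic back to blue; endpoints update within seconds
kubectl patch service mcp-server -n production \
  -p '{"spec":{"selector":{"environment":"blue"}}}'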

Conclusion: Production-Ready MCP Deployment

Deploying MCP servers to production requires careful planning, robust infrastructure, and comprehensive monitoring. This guide covered production-grade deployment patterns using Docker, Kubernetes, and Cloud Run, complete with health checks, graceful shutdown handling, and zero-downtime update strategies.

Key takeaways:

  • Docker multi-stage builds reduce image sizes by 70% and improve security
  • Kubernetes orchestration enables auto-scaling, self-healing, and rolling updates
  • Cloud Run offers serverless simplicity with auto-scaling to zero
  • Health checks ensure traffic routes only to healthy containers
  • Graceful shutdown prevents request loss during updates
  • Blue-green deployment eliminates downtime and enables instant rollback

Ready to deploy your MCP server to production? Start building your ChatGPT app with MakeAIHQ โ€“ the only no-code platform specifically designed for the ChatGPT App Store. From zero to production deployment in 48 hours, no DevOps expertise required.

Need expert deployment assistance? Contact our team for white-glove migration services, infrastructure consulting, and production optimization.


Internal Links

  • Building Production-Grade MCP Servers: Architecture Patterns Guide
  • Deployment Specialist: Firebase, Cloud Functions & GCP Best Practices
  • MCP Server Monitoring: Prometheus, Grafana & Alerting Setup
  • MCP Server Security: Authentication, Authorization & Data Protection
  • MCP Server Testing: Unit, Integration & End-to-End Strategies
  • MCP Server Performance Optimization: Caching, Load Balancing & CDN
  • ChatGPT App Store Submission Checklist: OpenAI Approval Guide
  • Kubernetes vs Cloud Run: Choosing the Right Platform for MCP Servers
  • Docker Best Practices for Node.js MCP Servers
  • MakeAIHQ Features: Build, Deploy & Scale ChatGPT Apps


Schema Markup (HowTo)

{
  "@context": "https://schema.org",
  "@type": "HowTo",
  "name": "MCP Server Deployment: Docker, Kubernetes & Cloud Run Production Setup",
  "description": "Production-grade MCP deployment guide: Dockerization, Kubernetes orchestration, Cloud Run scaling, health checks, and zero-downtime updates.",
  "image": "https://makeaihq.com/images/mcp-deployment-guide.png",
  "totalTime": "PT2H",
  "estimatedCost": {
    "@type": "MonetaryAmount",
    "currency": "USD",
    "value": "0"
  },
  "tool": [
    {
      "@type": "HowToTool",
      "name": "Docker"
    },
    {
      "@type": "HowToTool",
      "name": "Kubernetes"
    },
    {
      "@type": "HowToTool",
      "name": "Google Cloud Run"
    }
  ],
  "step": [
    {
      "@type": "HowToStep",
      "name": "Docker Containerization",
      "text": "Create multi-stage Dockerfile with Alpine Linux, non-root user, and security scanning",
      "url": "https://makeaihq.com/guides/cluster/mcp-server-deployment-patterns#docker-containerization"
    },
    {
      "@type": "HowToStep",
      "name": "Kubernetes Deployment",
      "text": "Deploy to Kubernetes with auto-scaling, health checks, and rolling updates",
      "url": "https://makeaihq.com/guides/cluster/mcp-server-deployment-patterns#kubernetes-deployment"
    },
    {
      "@type": "HowToStep",
      "name": "Cloud Run Deployment",
      "text": "Deploy to Cloud Run with serverless auto-scaling and zero idle costs",
      "url": "https://makeaihq.com/guides/cluster/mcp-server-deployment-patterns#cloud-run-deployment"
    },
    {
      "@type": "HowToStep",
      "name": "Implement Health Checks",
      "text": "Add startup, liveness, and readiness probes for container reliability",
      "url": "https://makeaihq.com/guides/cluster/mcp-server-deployment-patterns#health-checks"
    },
    {
      "@type": "HowToStep",
      "name": "Graceful Shutdown",
      "text": "Implement graceful shutdown to prevent request loss during updates",
      "url": "https://makeaihq.com/guides/cluster/mcp-server-deployment-patterns#graceful-shutdown"
    },
    {
      "@type": "HowToStep",
      "name": "Blue-Green Deployment",
      "text": "Deploy new versions with zero downtime using blue-green strategy",
      "url": "https://makeaihq.com/guides/cluster/mcp-server-deployment-patterns#zero-downtime-updates"
    }
  ]
}