API Gateway Patterns for ChatGPT Apps: Production Architecture Guide
Building a production-grade ChatGPT application requires more than just connecting to OpenAI's APIs. As your app scales to hundreds or thousands of users, you need robust infrastructure to handle authentication, rate limiting, caching, request transformation, and observability. This is where API gateways become essential.
An API gateway acts as a reverse proxy that sits between your ChatGPT application and backend services, providing a unified entry point for all API traffic. Leading solutions like Kong, AWS API Gateway, and Azure API Management offer enterprise-grade features that solve common challenges in ChatGPT app development.
In this comprehensive guide, you'll learn how to implement production-ready API gateway patterns specifically designed for ChatGPT applications. We'll cover architecture fundamentals, hands-on implementations with Kong and AWS API Gateway, advanced patterns like circuit breakers and response caching, and monitoring strategies using OpenTelemetry.
Whether you're building a customer service chatbot serving 10,000 requests per day or an enterprise knowledge base handling millions of interactions, these patterns will help you build scalable, secure, and maintainable ChatGPT applications. For a broader understanding of ChatGPT app architecture, see our Complete Guide to Building ChatGPT Applications.
Why API Gateways Matter for ChatGPT Apps
ChatGPT applications face unique infrastructure challenges that API gateways are designed to solve:
Rate Limiting: OpenAI enforces strict rate limits on API calls (e.g., 10,000 tokens per minute for GPT-4). Without proper gateway-level rate limiting, your application could exhaust quotas during traffic spikes, causing service disruptions. API gateways implement intelligent rate limiting algorithms (token bucket, leaky bucket, sliding window) to smooth traffic and prevent quota violations.
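To make the token bucket idea concrete, here is a minimal TypeScript sketch of a per-user limiter; the class name, capacity, and refill rate are illustrative defaults, not part of any gateway's API:
// token-bucket.ts — minimal per-user token bucket sketch
class TokenBucket {
  private tokens: number;
  private lastRefill: number;
  constructor(private capacity: number, private refillPerSecond: number) {
    this.tokens = capacity;
    this.lastRefill = Date.now();
  }
  // Returns true if the request may proceed, false if it should be rejected (HTTP 429)
  tryConsume(cost = 1): boolean {
    const now = Date.now();
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    // Refill tokens based on elapsed time, capped at bucket capacity
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefill = now;
    if (this.tokens >= cost) {
      this.tokens -= cost;
      return true;
    }
    return false;
  }
}
// One bucket per user: bursts up to 20 requests, sustained 100 requests/minute
const buckets = new Map<string, TokenBucket>();
export function allowRequest(userId: string): boolean {
  if (!buckets.has(userId)) {
    buckets.set(userId, new TokenBucket(20, 100 / 60));
  }
  return buckets.get(userId)!.tryConsume();
}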
Authentication Aggregation: Modern ChatGPT apps often integrate multiple services—OpenAI APIs, vector databases like Pinecone, analytics platforms, and internal microservices. Managing authentication across these services becomes complex. API gateways centralize authentication, validating tokens once at the gateway layer before routing requests to backend services. This reduces latency and improves security by minimizing credential exposure.
Request/Response Transformation: ChatGPT applications frequently need to transform data formats between frontend clients and backend services. For example, converting user input from a mobile app's JSON format to the OpenAI Chat Completions API format, or transforming streaming responses into server-sent events for real-time UI updates. Gateways handle these transformations declaratively, keeping business logic clean.
Caching: ChatGPT API calls are expensive—both in latency (200-1000ms) and cost (around $0.03 per 1K input tokens for GPT-4). Intelligent caching of common queries can reduce costs by 40-60% while dramatically improving response times. API gateways provide built-in caching layers with TTL management, cache invalidation, and distributed cache support via Redis.
Circuit Breaking & Failover: When OpenAI experiences outages or degraded performance, your ChatGPT app needs graceful degradation strategies. Circuit breakers detect failures and automatically route traffic to fallback services (e.g., cached responses, alternative LLM providers like Anthropic Claude). This improves reliability and user experience during incidents.
For ChatGPT apps specifically targeting the OpenAI App Store, proper gateway architecture ensures you meet OpenAI's security and performance requirements, including OAuth 2.1 PKCE authentication and sub-200ms API response times.
Gateway Architecture Fundamentals
Understanding the reverse proxy pattern is essential to implementing API gateways effectively. Here's how traffic flows through a gateway in a ChatGPT application:
Reverse Proxy Pattern:
- Client sends request: POST https://api.yourapp.com/chat/completions
- Gateway intercepts the request at Layer 7 (HTTP)
- Authentication plugin validates the JWT token
- Rate limiting plugin checks quota (e.g., 100 requests/minute per user)
- Request transformation plugin formats the payload for the OpenAI API
- Gateway forwards to the backend: POST https://api.openai.com/v1/chat/completions
- Backend responds with a streaming ChatGPT response
- Gateway applies response caching (if applicable)
- Gateway returns the response to the client
This pattern provides a single point of control for cross-cutting concerns that would otherwise be duplicated across microservices.
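The same flow can be sketched as an ordered Express middleware chain. This is a conceptual sketch only—the route path, stub JWT check, and listening port are assumptions, not a specific product's API:
// gateway-flow.ts — conceptual sketch of the flow above in Express
import express from 'express';

const app = express();
app.use(express.json());

// 1-2. Gateway receives the request and authenticates it (stub bearer-token check)
app.use((req, res, next) => {
  const auth = req.headers.authorization ?? '';
  if (!auth.startsWith('Bearer ')) {
    return res.status(401).json({ error: 'Missing or invalid token' });
  }
  next(); // 3. a rate-limiting check would sit here as well
});

// 4-6. Transform the payload, forward to OpenAI, return (and optionally cache) the response
app.post('/chat/completions', async (req, res) => {
  const upstream = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ model: 'gpt-4', messages: req.body.messages })
  });
  res.status(upstream.status).json(await upstream.json());
});

app.listen(8080);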
Authentication Aggregation consolidates multiple authentication mechanisms into a unified gateway layer:
Client Request (OAuth 2.0 token)
↓
Gateway validates token (JWT verification)
↓
Gateway enriches request with service credentials:
- OpenAI API key (from Vault)
- Pinecone API key (for vector search)
- Internal service tokens
↓
Backend services receive authenticated requests
This architecture eliminates the need for each backend service to implement OAuth validation independently, reducing attack surface and simplifying credential rotation.
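A minimal sketch of the enrichment step, assuming the client's token has already been validated earlier in the chain and that backend secrets were loaded into the gateway process at startup (e.g., from Vault); the property name serviceCredentials is an illustration, not a standard:
// credential-enrichment.ts — gateway-side credential enrichment sketch
import type { Request, Response, NextFunction } from 'express';

const serviceCredentials = {
  openaiApiKey: process.env.OPENAI_API_KEY ?? '',
  pineconeApiKey: process.env.PINECONE_API_KEY ?? ''
};

export function enrichCredentials(req: Request, _res: Response, next: NextFunction): void {
  // Never forward the end user's OAuth token to backend services
  delete req.headers.authorization;
  // Expose per-service credentials to whichever proxy handler runs next
  (req as Request & { serviceCredentials?: typeof serviceCredentials }).serviceCredentials =
    serviceCredentials;
  next();
}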
Request/Response Transformation handles data format conversions declaratively. For example, transforming a mobile app's request format to OpenAI's Chat Completions API:
// Client request (simplified mobile format)
{
"message": "What's the weather in SF?",
"userId": "user_123"
}
// Gateway transforms to OpenAI format
{
"model": "gpt-4",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What's the weather in SF?"}
],
"user": "user_123",
"temperature": 0.7
}
This keeps mobile clients lightweight while maintaining compatibility with OpenAI's API specifications.
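A gateway-side version of this transformation might look like the following sketch; the mobile payload shape and the defaults mirror the example above and are assumptions rather than a fixed contract:
// transform-mobile-request.ts — mobile payload to OpenAI Chat Completions format
interface MobileChatRequest {
  message: string;
  userId: string;
}

interface OpenAIChatRequest {
  model: string;
  messages: { role: 'system' | 'user' | 'assistant'; content: string }[];
  user: string;
  temperature: number;
}

export function toOpenAIRequest(req: MobileChatRequest): OpenAIChatRequest {
  return {
    model: 'gpt-4',
    messages: [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: req.message }
    ],
    user: req.userId,
    temperature: 0.7
  };
}

// toOpenAIRequest({ message: "What's the weather in SF?", userId: 'user_123' })
// produces the OpenAI-formatted payload shown above.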
For more on architectural patterns in ChatGPT applications, see Microservices Architecture for ChatGPT Apps.
Kong Gateway Setup for ChatGPT Apps
Kong is an open-source API gateway built on NGINX and OpenResty (Lua), offering exceptional performance (50,000+ requests/second) and a rich plugin ecosystem. Here's a production-ready Kong setup for ChatGPT applications using declarative configuration.
Kong Declarative Configuration
# kong.yml - Declarative configuration for ChatGPT app gateway
_format_version: "3.0"
_transform: true
services:
- name: openai-chat-service
url: https://api.openai.com/v1/chat/completions
protocol: https
port: 443
connect_timeout: 5000
write_timeout: 60000
read_timeout: 60000
retries: 3
routes:
- name: chat-completions-route
paths:
- /v1/chat/completions
methods:
- POST
strip_path: false
preserve_host: false
plugins:
# Rate limiting: 100 requests/minute per consumer
- name: rate-limiting
config:
minute: 100
policy: redis
redis_host: redis.yourapp.com
redis_port: 6379
redis_timeout: 2000
fault_tolerant: true
hide_client_headers: false
# JWT authentication
- name: jwt
config:
uri_param_names:
- jwt
cookie_names:
- jwt
key_claim_name: kid
secret_is_base64: false
claims_to_verify:
- exp
maximum_expiration: 3600
# Request transformer: Add OpenAI API key
- name: request-transformer
config:
add:
headers:
- Authorization:Bearer ${OPENAI_API_KEY}
body:
- model:gpt-4
- temperature:0.7
remove:
headers:
- X-Internal-User-ID
# Response caching: Cache identical requests for 5 minutes
- name: proxy-cache
config:
strategy: memory
content_type:
- application/json
cache_ttl: 300
cache_control: false
memory:
dictionary_name: kong_cache
# CORS support for web clients
- name: cors
config:
origins:
- https://yourapp.com
- https://app.yourapp.com
methods:
- GET
- POST
- OPTIONS
headers:
- Accept
- Authorization
- Content-Type
exposed_headers:
- X-RateLimit-Limit
- X-RateLimit-Remaining
credentials: true
max_age: 3600
- name: pinecone-vector-service
url: https://your-index.pinecone.io
protocol: https
port: 443
routes:
- name: vector-search-route
paths:
- /v1/vector/search
methods:
- POST
plugins:
- name: rate-limiting
config:
minute: 500
policy: redis
redis_host: redis.yourapp.com
- name: request-transformer
config:
add:
headers:
- Api-Key:${PINECONE_API_KEY}
consumers:
- username: mobile-app
custom_id: app_mobile_v1
jwt_secrets:
- key: mobile-app-key
algorithm: HS256
secret: your-jwt-secret-here
- username: web-app
custom_id: app_web_v1
jwt_secrets:
- key: web-app-key
algorithm: HS256
secret: your-jwt-secret-here
plugins:
# Global request ID for tracing
- name: correlation-id
config:
header_name: X-Request-ID
generator: uuid
echo_downstream: true
# Global response compression
- name: response-transformer
config:
add:
headers:
- X-Gateway-Version:1.0.0
- X-Powered-By:Kong
This declarative configuration defines two services (OpenAI Chat Completions and Pinecone vector search) with comprehensive plugin configurations. The rate-limiting plugin uses Redis for distributed rate limiting across multiple Kong instances, critical for horizontal scaling.
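Once this configuration is loaded, clients call the gateway route instead of OpenAI directly. A minimal client sketch—the gateway hostname and how the JWT is issued are specific to your deployment and assumed here:
// call-gateway.ts — client call through the Kong route defined above
export async function chatViaGateway(jwt: string, userMessage: string) {
  const response = await fetch('https://api.yourapp.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${jwt}`, // validated by the jwt plugin
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      messages: [{ role: 'user', content: userMessage }]
      // model and temperature are injected by the request-transformer plugin
    })
  });

  // Rate-limit headers are exposed to browsers via the CORS plugin configuration
  console.log('Remaining:', response.headers.get('X-RateLimit-Remaining'));

  if (!response.ok) {
    throw new Error(`Gateway error: ${response.status}`);
  }
  return response.json();
}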
Custom Rate Limiting Plugin
For advanced rate limiting scenarios (e.g., token-based limits for OpenAI APIs), create a custom Lua plugin:
-- kong/plugins/token-rate-limit/handler.lua
-- Custom rate limiting based on OpenAI token consumption
local kong = kong
local redis = require "resty.redis"
local cjson = require "cjson"
local TokenRateLimitHandler = {
VERSION = "1.0.0",
PRIORITY = 901, -- Execute before other rate limiting plugins
}
function TokenRateLimitHandler:access(conf)
local consumer = kong.client.get_consumer()
if not consumer then
return kong.response.exit(401, {
message = "Authentication required"
})
end
local identifier = consumer.id
local redis_client = redis:new()
redis_client:set_timeout(conf.redis_timeout)
local ok, err = redis_client:connect(conf.redis_host, conf.redis_port)
if not ok then
kong.log.err("Failed to connect to Redis: ", err)
if conf.fault_tolerant then
return -- Allow request through if Redis is down
end
return kong.response.exit(500, {
message = "Rate limiting service unavailable"
})
end
-- Check current token usage
local cache_key = "token_limit:" .. identifier .. ":" .. os.date("%Y%m%d%H%M")
local current_usage = redis_client:get(cache_key)
if current_usage == ngx.null then
current_usage = 0
else
current_usage = tonumber(current_usage)
end
-- Estimate tokens from request (rough approximation)
local request_body = kong.request.get_raw_body()
local estimated_tokens = 0
if request_body then
local body_json = cjson.decode(request_body)
if body_json.messages then
for _, message in ipairs(body_json.messages) do
-- Rough estimate: 1 token ≈ 4 characters
estimated_tokens = estimated_tokens + math.ceil(#message.content / 4)
end
end
-- Add max_tokens if specified
if body_json.max_tokens then
estimated_tokens = estimated_tokens + body_json.max_tokens
else
estimated_tokens = estimated_tokens + 500 -- Default assumption
end
end
-- Check if limit would be exceeded
if current_usage + estimated_tokens > conf.tokens_per_minute then
kong.response.set_header("X-RateLimit-Limit", conf.tokens_per_minute)
kong.response.set_header("X-RateLimit-Remaining", 0)
kong.response.set_header("X-RateLimit-Reset", os.time() + 60)
return kong.response.exit(429, {
message = "Token rate limit exceeded",
limit = conf.tokens_per_minute,
current_usage = current_usage,
estimated_request_tokens = estimated_tokens,
reset_at = os.time() + 60
})
end
-- Atomically increment the usage counter (avoids read-modify-write races)
local new_usage, incr_err = redis_client:incrby(cache_key, estimated_tokens)
if not new_usage then
kong.log.err("Failed to record token usage: ", incr_err)
new_usage = current_usage + estimated_tokens
end
redis_client:expire(cache_key, 60) -- TTL: 1 minute
-- Set rate limit headers
kong.response.set_header("X-RateLimit-Limit", conf.tokens_per_minute)
kong.response.set_header("X-RateLimit-Remaining", conf.tokens_per_minute - new_usage)
kong.response.set_header("X-RateLimit-Reset", os.time() + 60)
redis_client:set_keepalive(10000, 100)
end
return TokenRateLimitHandler
-- kong/plugins/token-rate-limit/schema.lua
local typedefs = require "kong.db.schema.typedefs"
return {
name = "token-rate-limit",
fields = {
{ config = {
type = "record",
fields = {
{ tokens_per_minute = {
type = "number",
default = 10000,
required = true,
gt = 0,
}},
{ redis_host = typedefs.host({ required = true }) },
{ redis_port = typedefs.port({ required = true, default = 6379 }) },
{ redis_timeout = { type = "number", default = 2000 } },
{ fault_tolerant = { type = "boolean", default = true } },
},
}},
},
}
This custom plugin implements token-based rate limiting, crucial for ChatGPT apps that need to respect OpenAI's token-per-minute quotas. The plugin estimates token usage from request payloads and tracks consumption in Redis with per-minute granularity.
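The same chars/4 heuristic can run on the client or in an edge function before a request ever reaches the gateway. This sketch mirrors the plugin's estimate; it is an approximation, not an exact tokenizer:
// estimate-tokens.ts — rough token estimate matching the Lua plugin's heuristic
interface ChatMessage {
  role: string;
  content: string;
}

export function estimateTokens(messages: ChatMessage[], maxTokens?: number): number {
  // ~1 token per 4 characters of message content
  const promptTokens = messages.reduce(
    (total, m) => total + Math.ceil(m.content.length / 4),
    0
  );
  // Reserve the requested completion budget (500 if unspecified, as in the plugin)
  return promptTokens + (maxTokens ?? 500);
}

// estimateTokens([{ role: 'user', content: 'What is the weather in SF?' }], 256) ≈ 263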
For more on rate limiting strategies, see Rate Limiting Patterns for ChatGPT Applications.
JWT Authentication Plugin
Configure JWT authentication to validate tokens issued by your identity provider:
-- kong/plugins/jwt-validator/handler.lua
-- Custom JWT validation with OpenID Connect support
local jwt_decoder = require "kong.plugins.jwt.jwt_parser"
local http = require "resty.http"
local cjson = require "cjson"
local JWTValidatorHandler = {
VERSION = "1.0.0",
PRIORITY = 1005, -- Execute early in plugin chain
}
-- Cache for JWKS (JSON Web Key Set)
local jwks_cache = {}
local jwks_cache_ttl = 3600 -- 1 hour
function JWTValidatorHandler:fetch_jwks(conf)
local cache_key = conf.jwks_uri
local cached_jwks = jwks_cache[cache_key]
if cached_jwks and cached_jwks.expires_at > ngx.time() then
return cached_jwks.keys
end
local httpc = http.new()
httpc:set_timeout(conf.http_timeout)
local res, err = httpc:request_uri(conf.jwks_uri, {
method = "GET",
headers = {
["Accept"] = "application/json",
},
})
if not res or res.status ~= 200 then
kong.log.err("Failed to fetch JWKS: ", err or res.status)
return nil, "JWKS fetch failed"
end
local jwks = cjson.decode(res.body)
jwks_cache[cache_key] = {
keys = jwks.keys,
expires_at = ngx.time() + jwks_cache_ttl,
}
return jwks.keys
end
function JWTValidatorHandler:access(conf)
local authorization = kong.request.get_header("Authorization")
if not authorization then
return kong.response.exit(401, {
message = "Missing Authorization header"
})
end
local token = authorization:match("Bearer%s+(.+)")
if not token then
return kong.response.exit(401, {
message = "Invalid Authorization header format"
})
end
-- Decode JWT without verification first to get key ID
local jwt, err = jwt_decoder:new(token)
if err then
return kong.response.exit(401, {
message = "Invalid JWT format",
error = err
})
end
local header = jwt.header
local kid = header.kid
if not kid then
return kong.response.exit(401, {
message = "Missing key ID in JWT header"
})
end
-- Fetch JWKS to get public key
local jwks, jwks_err = self:fetch_jwks(conf)
if jwks_err then
return kong.response.exit(500, {
message = "Failed to validate JWT",
error = jwks_err
})
end
-- Find matching key
local public_key = nil
for _, key in ipairs(jwks) do
if key.kid == kid then
public_key = key
break
end
end
if not public_key then
return kong.response.exit(401, {
message = "Public key not found for key ID",
kid = kid
})
end
-- Verify JWT signature
-- Note: jwt_parser:verify_signature expects a PEM-encoded public key, so convert
-- the matching JWK entry to PEM (e.g., with lua-resty-openssl) before this call.
local verified = jwt:verify_signature(public_key)
if not verified then
return kong.response.exit(401, {
message = "JWT signature verification failed"
})
end
-- Validate claims
local claims = jwt.claims
local now = ngx.time()
if claims.exp and claims.exp < now then
return kong.response.exit(401, {
message = "JWT has expired",
expired_at = claims.exp,
current_time = now
})
end
if claims.nbf and claims.nbf > now then
return kong.response.exit(401, {
message = "JWT not yet valid",
valid_from = claims.nbf,
current_time = now
})
end
if conf.audience_required and not claims.aud then
return kong.response.exit(401, {
message = "Missing audience claim"
})
end
if conf.audience_required and claims.aud ~= conf.expected_audience then
return kong.response.exit(401, {
message = "Invalid audience",
expected = conf.expected_audience,
received = claims.aud
})
end
-- Set consumer based on subject claim
if claims.sub then
kong.service.request.set_header("X-Consumer-ID", claims.sub)
kong.service.request.set_header("X-Consumer-Email", claims.email or "")
end
-- Store validated claims for downstream plugins
kong.ctx.shared.jwt_claims = claims
end
return JWTValidatorHandler
This production-grade JWT validator implements OpenID Connect support with JWKS (JSON Web Key Set) fetching, signature verification, and comprehensive claim validation. It's essential for ChatGPT apps using OAuth 2.1 PKCE authentication as required by the OpenAI App Store.
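For Node-based gateways, the same JWKS-backed validation can be done with the jose library; this is a minimal sketch assuming your identity provider exposes a standard JWKS endpoint and that jose is an added dependency:
// verify-jwt.ts — JWKS-backed JWT verification with jose
import { createRemoteJWKSet, jwtVerify } from 'jose';

// createRemoteJWKSet caches keys and re-fetches when it sees an unknown kid
const JWKS = createRemoteJWKSet(
  new URL('https://your-auth-provider.com/.well-known/jwks.json')
);

export async function verifyToken(token: string) {
  const { payload } = await jwtVerify(token, JWKS, {
    issuer: 'https://your-auth-provider.com',
    audience: 'https://api.yourapp.com'
  });
  // exp/nbf are checked by jwtVerify; payload.sub identifies the consumer
  return payload;
}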
For complete authentication implementation guides, see OAuth 2.1 PKCE for ChatGPT Apps.
AWS API Gateway Implementation
AWS API Gateway provides a fully managed service for creating, deploying, and securing APIs at scale. Here's a production Terraform configuration for a ChatGPT application gateway:
Terraform Configuration
# terraform/api-gateway.tf
# AWS API Gateway for ChatGPT application
terraform {
required_version = ">= 1.6"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
}
# REST API Gateway
resource "aws_api_gateway_rest_api" "chatgpt_api" {
name = "chatgpt-app-gateway"
description = "API Gateway for ChatGPT application with rate limiting and authentication"
endpoint_configuration {
types = ["REGIONAL"]
}
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Principal = "*"
Action = "execute-api:Invoke"
Resource = "*"
Condition = {
IpAddress = {
"aws:SourceIp" = [
"0.0.0.0/0" # Restrict to your IP ranges in production
]
}
}
}
]
})
}
# Lambda authorizer function
resource "aws_lambda_function" "jwt_authorizer" {
filename = "lambda/jwt-authorizer.zip"
function_name = "chatgpt-jwt-authorizer"
role = aws_iam_role.lambda_authorizer_role.arn
handler = "index.handler"
runtime = "nodejs20.x"
timeout = 10
memory_size = 256
environment {
variables = {
JWKS_URI = "https://your-auth-provider.com/.well-known/jwks.json"
AUDIENCE = "https://api.yourapp.com"
ISSUER = "https://your-auth-provider.com"
}
}
tags = {
Environment = "production"
Service = "chatgpt-gateway"
}
}
# API Gateway authorizer
resource "aws_api_gateway_authorizer" "jwt_authorizer" {
name = "jwt-authorizer"
rest_api_id = aws_api_gateway_rest_api.chatgpt_api.id
type = "TOKEN"
authorizer_uri = aws_lambda_function.jwt_authorizer.invoke_arn
authorizer_credentials = aws_iam_role.api_gateway_authorizer_role.arn
identity_source = "method.request.header.Authorization"
authorizer_result_ttl_in_seconds = 300 # Cache for 5 minutes
}
# /v1 resource
resource "aws_api_gateway_resource" "v1" {
rest_api_id = aws_api_gateway_rest_api.chatgpt_api.id
parent_id = aws_api_gateway_rest_api.chatgpt_api.root_resource_id
path_part = "v1"
}
# /v1/chat resource
resource "aws_api_gateway_resource" "chat" {
rest_api_id = aws_api_gateway_rest_api.chatgpt_api.id
parent_id = aws_api_gateway_resource.v1.id
path_part = "chat"
}
# /v1/chat/completions resource
resource "aws_api_gateway_resource" "completions" {
rest_api_id = aws_api_gateway_rest_api.chatgpt_api.id
parent_id = aws_api_gateway_resource.chat.id
path_part = "completions"
}
# POST method with request validation
resource "aws_api_gateway_method" "completions_post" {
rest_api_id = aws_api_gateway_rest_api.chatgpt_api.id
resource_id = aws_api_gateway_resource.completions.id
http_method = "POST"
authorization = "CUSTOM"
authorizer_id = aws_api_gateway_authorizer.jwt_authorizer.id
request_validator_id = aws_api_gateway_request_validator.chatgpt_validator.id
request_models = {
"application/json" = aws_api_gateway_model.chat_completion_request.name
}
request_parameters = {
"method.request.header.Authorization" = true
}
}
# Request validator
resource "aws_api_gateway_request_validator" "chatgpt_validator" {
name = "chatgpt-request-validator"
rest_api_id = aws_api_gateway_rest_api.chatgpt_api.id
validate_request_body = true
validate_request_parameters = true
}
# Request model schema
resource "aws_api_gateway_model" "chat_completion_request" {
rest_api_id = aws_api_gateway_rest_api.chatgpt_api.id
name = "ChatCompletionRequest"
description = "Schema for chat completion requests"
content_type = "application/json"
schema = jsonencode({
"$schema" = "http://json-schema.org/draft-04/schema#"
type = "object"
required = ["messages"]
properties = {
messages = {
type = "array"
minItems = 1
items = {
type = "object"
required = ["role", "content"]
properties = {
role = {
type = "string"
enum = ["system", "user", "assistant"]
}
content = {
type = "string"
minLength = 1
}
}
}
}
model = {
type = "string"
default = "gpt-4"
}
temperature = {
type = "number"
minimum = 0
maximum = 2
}
max_tokens = {
type = "integer"
minimum = 1
maximum = 4096
}
}
})
}
# HTTP integration with OpenAI
resource "aws_api_gateway_integration" "openai_integration" {
rest_api_id = aws_api_gateway_rest_api.chatgpt_api.id
resource_id = aws_api_gateway_resource.completions.id
http_method = aws_api_gateway_method.completions_post.http_method
type = "HTTP"
integration_http_method = "POST"
uri = "https://api.openai.com/v1/chat/completions"
request_templates = {
"application/json" = <<EOF
#set($inputRoot = $input.path('$'))
{
"model": "$inputRoot.model",
"messages": $inputRoot.messages,
"temperature": $inputRoot.temperature,
"max_tokens": $inputRoot.max_tokens,
"user": "$context.authorizer.principalId"
}
EOF
}
request_parameters = {
"integration.request.header.Authorization" = "'Bearer ${var.openai_api_key}'"
"integration.request.header.Content-Type" = "'application/json'"
}
timeout_milliseconds = 29000 # Max for API Gateway
}
# Usage plan for rate limiting
resource "aws_api_gateway_usage_plan" "chatgpt_usage_plan" {
name = "chatgpt-usage-plan"
description = "Rate limiting for ChatGPT API"
api_stages {
api_id = aws_api_gateway_rest_api.chatgpt_api.id
stage = aws_api_gateway_stage.production.stage_name
}
quota_settings {
limit = 100000 # 100K requests per month
period = "MONTH"
}
throttle_settings {
burst_limit = 200 # Allow bursts up to 200 requests
rate_limit = 100 # 100 requests per second sustained
}
}
# API Gateway stage
resource "aws_api_gateway_stage" "production" {
deployment_id = aws_api_gateway_deployment.production.id
rest_api_id = aws_api_gateway_rest_api.chatgpt_api.id
stage_name = "production"
cache_cluster_enabled = true
cache_cluster_size = "0.5" # 0.5 GB cache
xray_tracing_enabled = true
access_log_settings {
destination_arn = aws_cloudwatch_log_group.api_gateway_logs.arn
format = jsonencode({
requestId = "$context.requestId"
ip = "$context.identity.sourceIp"
caller = "$context.identity.caller"
user = "$context.identity.user"
requestTime = "$context.requestTime"
httpMethod = "$context.httpMethod"
resourcePath = "$context.resourcePath"
status = "$context.status"
protocol = "$context.protocol"
responseLength = "$context.responseLength"
})
}
}
# CloudWatch Logs
resource "aws_cloudwatch_log_group" "api_gateway_logs" {
name = "/aws/api-gateway/chatgpt-app"
retention_in_days = 30
}
# IAM roles (simplified - expand for production)
resource "aws_iam_role" "lambda_authorizer_role" {
name = "lambda-authorizer-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Principal = {
Service = "lambda.amazonaws.com"
}
Action = "sts:AssumeRole"
}
]
})
}
resource "aws_iam_role" "api_gateway_authorizer_role" {
name = "api-gateway-authorizer-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Principal = {
Service = "apigateway.amazonaws.com"
}
Action = "sts:AssumeRole"
}
]
})
}
# Outputs
# Note: the aws_api_gateway_deployment.production and aws_api_gateway_api_key.chatgpt_api_key
# resources referenced in this file are omitted from the snippet for brevity; define them
# alongside this configuration before applying.
output "api_gateway_url" {
value = "${aws_api_gateway_stage.production.invoke_url}/v1/chat/completions"
}
output "api_key_id" {
value = aws_api_gateway_api_key.chatgpt_api_key.id
}
This Terraform configuration provisions a complete AWS API Gateway with JWT authentication, request validation, rate limiting via usage plans, and CloudWatch logging integration.
Lambda Authorizer Implementation
// lambda/jwt-authorizer/index.js
// Lambda authorizer for JWT validation with JWKS support
const jwt = require('jsonwebtoken');
const jwksClient = require('jwks-rsa');
// JWKS client with caching
const client = jwksClient({
cache: true,
cacheMaxAge: 3600000, // 1 hour
rateLimit: true,
jwksRequestsPerMinute: 10,
jwksUri: process.env.JWKS_URI
});
/**
* Get signing key from JWKS
*/
function getKey(header, callback) {
client.getSigningKey(header.kid, (err, key) => {
if (err) {
console.error('Failed to get signing key:', err);
return callback(err);
}
const signingKey = key.publicKey || key.rsaPublicKey;
callback(null, signingKey);
});
}
/**
* Generate IAM policy document
*/
function generatePolicy(principalId, effect, resource, context = {}) {
const authResponse = {
principalId: principalId
};
if (effect && resource) {
authResponse.policyDocument = {
Version: '2012-10-17',
Statement: [
{
Action: 'execute-api:Invoke',
Effect: effect,
Resource: resource
}
]
};
}
// Add context for downstream Lambda functions
authResponse.context = context;
return authResponse;
}
/**
* Lambda handler
*/
exports.handler = async (event, context) => {
console.log('Authorization request:', JSON.stringify(event, null, 2));
// Extract token from Authorization header
const token = event.authorizationToken?.replace(/^Bearer\s+/, '');
if (!token) {
console.error('No token provided');
throw new Error('Unauthorized');
}
try {
// Decode token without verification to get header
const decoded = jwt.decode(token, { complete: true });
if (!decoded || !decoded.header) {
console.error('Invalid token format');
throw new Error('Unauthorized');
}
// Verify token signature and claims
const verified = await new Promise((resolve, reject) => {
jwt.verify(
token,
(header, callback) => getKey(header, callback),
{
audience: process.env.AUDIENCE,
issuer: process.env.ISSUER,
algorithms: ['RS256']
},
(err, decoded) => {
if (err) {
console.error('Token verification failed:', err);
return reject(err);
}
resolve(decoded);
}
);
});
console.log('Token verified:', verified);
// Generate allow policy
const policy = generatePolicy(
verified.sub,
'Allow',
event.methodArn,
{
userId: verified.sub,
email: verified.email || '',
scope: verified.scope || '',
tier: verified.tier || 'free'
}
);
return policy;
} catch (error) {
console.error('Authorization error:', error);
// Generate deny policy
return generatePolicy(
'unknown',
'Deny',
event.methodArn
);
}
};
This Lambda authorizer validates JWT tokens using JWKS (JSON Web Key Set) from your identity provider, verifies signature and claims, and generates IAM policies to allow/deny API Gateway invocations. The verified claims (user ID, email, tier) are passed as context to downstream integrations.
Request Validator
// lambda/request-validator/index.ts
// Advanced request validation with business logic
import { APIGatewayProxyEvent, APIGatewayProxyResult } from 'aws-lambda';
import Ajv, { JSONSchemaType } from 'ajv';
import addFormats from 'ajv-formats';
interface ChatMessage {
role: 'system' | 'user' | 'assistant';
content: string;
name?: string;
}
interface ChatCompletionRequest {
messages: ChatMessage[];
model?: string;
temperature?: number;
max_tokens?: number;
top_p?: number;
frequency_penalty?: number;
presence_penalty?: number;
user?: string;
}
// JSON Schema for request validation
const chatCompletionSchema: JSONSchemaType<ChatCompletionRequest> = {
type: 'object',
properties: {
messages: {
type: 'array',
minItems: 1,
maxItems: 50, // Prevent abuse
items: {
type: 'object',
properties: {
role: {
type: 'string',
enum: ['system', 'user', 'assistant']
},
content: {
type: 'string',
minLength: 1,
maxLength: 10000 // Prevent token abuse
},
name: {
type: 'string',
nullable: true
}
},
required: ['role', 'content']
}
},
model: {
type: 'string',
enum: ['gpt-4', 'gpt-4-turbo', 'gpt-3.5-turbo'],
nullable: true
},
temperature: {
type: 'number',
minimum: 0,
maximum: 2,
nullable: true
},
max_tokens: {
type: 'integer',
minimum: 1,
maximum: 4096,
nullable: true
},
top_p: {
type: 'number',
minimum: 0,
maximum: 1,
nullable: true
},
frequency_penalty: {
type: 'number',
minimum: -2,
maximum: 2,
nullable: true
},
presence_penalty: {
type: 'number',
minimum: -2,
maximum: 2,
nullable: true
},
user: {
type: 'string',
nullable: true
}
},
required: ['messages'],
additionalProperties: false
};
const ajv = new Ajv({ allErrors: true });
addFormats(ajv);
const validate = ajv.compile(chatCompletionSchema);
export const handler = async (
event: APIGatewayProxyEvent
): Promise<APIGatewayProxyResult> => {
console.log('Request validation event:', JSON.stringify(event, null, 2));
try {
// Parse request body
if (!event.body) {
return {
statusCode: 400,
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
error: 'Missing request body'
})
};
}
const request: ChatCompletionRequest = JSON.parse(event.body);
// Schema validation
const valid = validate(request);
if (!valid) {
return {
statusCode: 400,
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
error: 'Request validation failed',
details: validate.errors
})
};
}
// Business logic validation
const userTier = event.requestContext.authorizer?.tier || 'free';
const model = request.model || 'gpt-3.5-turbo';
// Tier-based model restrictions
if (userTier === 'free' && model === 'gpt-4') {
return {
statusCode: 403,
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
error: 'GPT-4 requires Professional or Business tier',
current_tier: userTier,
upgrade_url: 'https://yourapp.com/pricing'
})
};
}
// Estimate token usage
const estimatedTokens = request.messages.reduce((total, msg) => {
return total + Math.ceil(msg.content.length / 4);
}, 0) + (request.max_tokens || 500);
// Tier-based token limits
const tokenLimits: Record<string, number> = {
free: 1000,
starter: 10000,
professional: 50000,
business: 200000
};
if (estimatedTokens > tokenLimits[userTier]) {
return {
statusCode: 429,
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
error: 'Estimated token usage exceeds tier limit',
estimated_tokens: estimatedTokens,
tier_limit: tokenLimits[userTier],
current_tier: userTier
})
};
}
// Request is valid - pass through
return {
statusCode: 200,
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
validated: true,
estimated_tokens: estimatedTokens
})
};
} catch (error) {
console.error('Validation error:', error);
return {
statusCode: 500,
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
error: 'Internal validation error'
})
};
}
};
This request validator implements comprehensive validation including JSON schema validation, tier-based model restrictions, and token usage estimation. It prevents abuse by enforcing business rules before requests reach OpenAI.
For more on AWS Lambda patterns in ChatGPT apps, see Serverless Architecture for ChatGPT Applications.
Advanced Gateway Patterns
Beyond basic routing and authentication, production ChatGPT applications require sophisticated patterns for resilience and performance.
Circuit Breaker Middleware
// middleware/circuit-breaker.ts
// Circuit breaker pattern for resilient API gateway
interface CircuitBreakerConfig {
failureThreshold: number; // Number of failures before opening circuit
successThreshold: number; // Successes needed to close circuit
timeout: number; // Timeout before attempting retry (ms)
monitoringPeriod: number; // Time window for failure tracking (ms)
}
enum CircuitState {
CLOSED = 'CLOSED', // Normal operation
OPEN = 'OPEN', // Failures detected, blocking requests
HALF_OPEN = 'HALF_OPEN' // Testing if service recovered
}
class CircuitBreaker {
private state: CircuitState = CircuitState.CLOSED;
private failureCount: number = 0;
private successCount: number = 0;
private nextAttempt: number = Date.now();
private failures: number[] = [];
constructor(private config: CircuitBreakerConfig) {}
async execute<T>(
operation: () => Promise<T>,
fallback?: () => Promise<T>
): Promise<T> {
// Check if circuit is open
if (this.state === CircuitState.OPEN) {
if (Date.now() < this.nextAttempt) {
console.log('Circuit is OPEN, using fallback');
if (fallback) {
return fallback();
}
throw new Error('Service unavailable (circuit breaker open)');
}
// Transition to half-open to test service
this.state = CircuitState.HALF_OPEN;
console.log('Circuit transitioning to HALF_OPEN');
}
try {
const result = await operation();
this.onSuccess();
return result;
} catch (error) {
this.onFailure();
if (fallback) {
console.log('Operation failed, using fallback');
return fallback();
}
throw error;
}
}
private onSuccess(): void {
this.failureCount = 0;
if (this.state === CircuitState.HALF_OPEN) {
this.successCount++;
if (this.successCount >= this.config.successThreshold) {
console.log('Circuit closing after successful recovery');
this.state = CircuitState.CLOSED;
this.successCount = 0;
}
}
}
private onFailure(): void {
const now = Date.now();
this.failures.push(now);
// Remove old failures outside monitoring period
this.failures = this.failures.filter(
timestamp => now - timestamp < this.config.monitoringPeriod
);
this.failureCount = this.failures.length;
if (this.failureCount >= this.config.failureThreshold) {
console.log(`Circuit opening after ${this.failureCount} failures`);
this.state = CircuitState.OPEN;
this.nextAttempt = now + this.config.timeout;
this.successCount = 0;
}
}
getState(): CircuitState {
return this.state;
}
getStats() {
return {
state: this.state,
failureCount: this.failureCount,
successCount: this.successCount,
nextAttempt: this.nextAttempt
};
}
}
// Usage in API gateway middleware
export const createCircuitBreakerMiddleware = (
config: CircuitBreakerConfig
) => {
const breaker = new CircuitBreaker(config);
return async (req: any, res: any, next: any) => {
try {
await breaker.execute(
async () => {
// Proxy request to backend.
// Caveat: in Express, next() resolves before the downstream handler finishes, so
// failures there are not counted by the breaker; in practice, wrap the actual
// upstream call instead (see the usage sketch below).
return next();
},
async () => {
// Fallback: Return cached response or error
res.status(503).json({
error: 'Service temporarily unavailable',
circuit_state: breaker.getState(),
retry_after: Math.ceil(
(breaker.getStats().nextAttempt - Date.now()) / 1000
)
});
}
);
} catch (error) {
res.status(500).json({
error: 'Request failed',
circuit_state: breaker.getState()
});
}
};
};
Circuit breakers prevent cascading failures when OpenAI or other backend services experience outages. The circuit "opens" after a threshold of failures, immediately returning cached responses or error messages instead of overwhelming the failing service.
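In practice the breaker should wrap the actual upstream call rather than Express's next(), which resolves before the proxy completes. A usage sketch, assuming the CircuitBreaker class above is exported and that the cached fallback value is supplied by your own cache lookup:
// Using CircuitBreaker around the real OpenAI call, with a cached-response fallback
const openaiBreaker = new CircuitBreaker({
  failureThreshold: 5,
  successThreshold: 2,
  timeout: 30_000,          // stay open for 30s before probing again
  monitoringPeriod: 60_000  // count failures over a 1-minute window
});

export async function chatWithFallback(body: unknown, cachedResponse?: unknown) {
  return openaiBreaker.execute(
    async () => {
      const res = await fetch('https://api.openai.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
          'Content-Type': 'application/json'
        },
        body: JSON.stringify(body)
      });
      if (!res.ok) throw new Error(`OpenAI error ${res.status}`); // counts as a failure
      return res.json();
    },
    async () => {
      // Fallback while the circuit is open: serve a cached answer if one exists
      if (cachedResponse) return cachedResponse;
      throw new Error('Service unavailable and no cached response');
    }
  );
}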
Response Caching with Redis
// middleware/response-cache.ts
// Intelligent response caching for ChatGPT API calls
import { createClient, RedisClientType } from 'redis';
import crypto from 'crypto';
interface CacheConfig {
ttl: number; // Time to live (seconds)
keyPrefix: string; // Redis key prefix
varyOn: string[]; // Request properties to vary cache on
excludePaths?: string[]; // Paths to exclude from caching
}
export class ResponseCache {
private redis: RedisClientType;
constructor(private config: CacheConfig) {
this.redis = createClient({
url: process.env.REDIS_URL || 'redis://localhost:6379'
});
this.redis.on('error', (err) => {
console.error('Redis connection error:', err);
});
this.redis.connect();
}
/**
* Generate cache key from request
*/
private generateCacheKey(req: any): string {
const varyData: any = {};
for (const key of this.config.varyOn) {
if (key === 'body') {
varyData.body = req.body;
} else if (key === 'user') {
varyData.user = req.user?.id || 'anonymous';
} else if (key.startsWith('header.')) {
const headerName = key.substring(7);
varyData[key] = req.headers[headerName];
}
}
const hash = crypto
.createHash('sha256')
.update(JSON.stringify(varyData))
.digest('hex');
return `${this.config.keyPrefix}:${hash}`;
}
/**
* Check if path should be cached
*/
private shouldCache(path: string): boolean {
if (this.config.excludePaths) {
return !this.config.excludePaths.some(excluded =>
path.startsWith(excluded)
);
}
return true;
}
/**
* Cache middleware
*/
middleware() {
return async (req: any, res: any, next: any) => {
// Only cache GET and POST requests
if (!['GET', 'POST'].includes(req.method)) {
return next();
}
// Check if path should be cached
if (!this.shouldCache(req.path)) {
return next();
}
const cacheKey = this.generateCacheKey(req);
try {
// Check cache
const cached = await this.redis.get(cacheKey);
if (cached) {
console.log(`Cache HIT: ${cacheKey}`);
const response = JSON.parse(cached);
res.setHeader('X-Cache', 'HIT');
res.setHeader('X-Cache-Key', cacheKey);
return res.status(200).json(response);
}
console.log(`Cache MISS: ${cacheKey}`);
res.setHeader('X-Cache', 'MISS');
// Intercept response to cache it
const originalJson = res.json.bind(res);
res.json = async (body: any) => {
// Only cache successful responses
if (res.statusCode === 200) {
try {
await this.redis.setEx(
cacheKey,
this.config.ttl,
JSON.stringify(body)
);
console.log(`Cached response: ${cacheKey} (TTL: ${this.config.ttl}s)`);
} catch (error) {
console.error('Cache write error:', error);
}
}
return originalJson(body);
};
next();
} catch (error) {
console.error('Cache read error:', error);
// Fail open: Continue without cache
next();
}
};
}
/**
* Invalidate cache entries by pattern
*/
async invalidate(pattern: string): Promise<number> {
try {
const keys = await this.redis.keys(`${this.config.keyPrefix}:${pattern}*`); // note: KEYS blocks Redis; prefer SCAN at scale
if (keys.length === 0) {
return 0;
}
await this.redis.del(keys);
console.log(`Invalidated ${keys.length} cache entries`);
return keys.length;
} catch (error) {
console.error('Cache invalidation error:', error);
return 0;
}
}
/**
* Get cache statistics
*/
async getStats(): Promise<any> {
try {
const info = await this.redis.info('stats');
const keyspace = await this.redis.info('keyspace');
return {
info,
keyspace,
prefix: this.config.keyPrefix
};
} catch (error) {
console.error('Failed to get cache stats:', error);
return null;
}
}
}
// Usage example
const cache = new ResponseCache({
ttl: 300, // 5 minutes
keyPrefix: 'chatgpt-api',
varyOn: ['body', 'user'],
excludePaths: ['/v1/admin']
});
export default cache;
This caching middleware dramatically reduces OpenAI API costs (40-60% savings) and latency for common queries. It generates cache keys based on request body and user ID, stores responses in Redis with TTL, and provides cache invalidation for content updates.
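When backing content changes (for example, a knowledge-base update), stale entries can be evicted through the invalidate helper. A short usage sketch—the Express app and admin route are illustrative:
// Wiring the cache middleware and flushing entries after a content update
import express from 'express';
import cache from './middleware/response-cache';

const app = express();
app.use(express.json());
app.use(cache.middleware()); // serve cache hits before requests reach the proxy handler

// Illustrative admin endpoint: flush all cached chat responses under the configured prefix
app.post('/v1/admin/cache/flush', async (_req, res) => {
  const removed = await cache.invalidate(''); // empty pattern matches every key under the prefix
  res.json({ invalidated: removed });
});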
Request Transformation Pipeline
// middleware/request-transformer.ts
// Advanced request transformation for ChatGPT gateway
interface TransformRule {
match: (req: any) => boolean;
transform: (req: any) => any;
}
export class RequestTransformer {
private rules: TransformRule[] = [];
/**
* Add transformation rule
*/
addRule(rule: TransformRule): void {
this.rules.push(rule);
}
/**
* Transform request through pipeline
*/
async transform(req: any): Promise<any> {
let transformed = { ...req };
for (const rule of this.rules) {
if (rule.match(transformed)) {
transformed = await rule.transform(transformed);
}
}
return transformed;
}
/**
* Express middleware
*/
middleware() {
return async (req: any, res: any, next: any) => {
try {
req.body = await this.transform(req.body);
next();
} catch (error) {
console.error('Request transformation error:', error);
res.status(400).json({
error: 'Invalid request format',
details: error.message
});
}
};
}
}
// Create transformer with common ChatGPT rules
const transformer = new RequestTransformer();
// Rule 1: Add system message if missing
transformer.addRule({
match: (req) => req.messages && !req.messages.some((m: any) => m.role === 'system'),
transform: (req) => ({
...req,
messages: [
{ role: 'system', content: 'You are a helpful assistant.' },
...req.messages
]
})
});
// Rule 2: Set default model based on user tier
transformer.addRule({
match: (req) => !req.model,
transform: (req) => {
const userTier = req.user?.tier || 'free';
const modelMap: Record<string, string> = {
free: 'gpt-3.5-turbo',
starter: 'gpt-3.5-turbo',
professional: 'gpt-4-turbo',
business: 'gpt-4-turbo'
};
return {
...req,
model: modelMap[userTier]
};
}
});
// Rule 3: Inject user ID for usage tracking
transformer.addRule({
match: (req) => !req.user,
transform: (req) => ({
...req,
user: req.userId || 'anonymous'
})
});
// Rule 4: Apply temperature constraints
transformer.addRule({
match: (req) => req.temperature !== undefined,
transform: (req) => ({
...req,
temperature: Math.max(0, Math.min(2, req.temperature))
})
});
export default transformer;
Request transformation pipelines standardize incoming requests, apply business rules (tier-based model selection), and enrich payloads with metadata. This keeps client code simple while enforcing consistent API usage.
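Putting the pieces together, one plausible middleware ordering in an Express-based gateway looks like the sketch below; the file paths refer to the modules defined in this section and the proxy handler is a placeholder:
// gateway.ts — illustrative wiring of the transformation and caching middleware
import express from 'express';
import cache from './middleware/response-cache';
import transformer from './middleware/request-transformer';

const app = express();
app.use(express.json());

app.post(
  '/v1/chat/completions',
  transformer.middleware(), // normalize the payload first so cache keys reflect the final body
  cache.middleware(),       // then serve repeat queries straight from Redis
  async (req, res) => {
    // Proxy handler: forward req.body upstream, ideally wrapped in the circuit
    // breaker from the previous section so outages trigger the fallback path.
    res.status(502).json({ error: 'proxy handler omitted in this sketch' });
  }
);

app.listen(8080);
Transforming before caching means equivalent requests from different clients normalize to the same cache key, which keeps the hit rate up.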
For additional gateway patterns, see Microservices Architecture for ChatGPT Apps.
Monitoring & Observability with OpenTelemetry
Production API gateways require comprehensive observability to detect issues, optimize performance, and track costs.
// telemetry/opentelemetry.ts
// OpenTelemetry instrumentation for API gateway
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-http';
import { PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';
import { Resource } from '@opentelemetry/resources';
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';
import { trace, context, SpanStatusCode } from '@opentelemetry/api';
// Initialize OpenTelemetry SDK
const sdk = new NodeSDK({
resource: new Resource({
[SemanticResourceAttributes.SERVICE_NAME]: 'chatgpt-api-gateway',
[SemanticResourceAttributes.SERVICE_VERSION]: '1.0.0',
[SemanticResourceAttributes.DEPLOYMENT_ENVIRONMENT]: process.env.NODE_ENV || 'development'
}),
traceExporter: new OTLPTraceExporter({
url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://localhost:4318/v1/traces'
}),
metricReader: new PeriodicExportingMetricReader({
exporter: new OTLPMetricExporter({
url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://localhost:4318/v1/metrics'
}),
exportIntervalMillis: 60000 // 1 minute
}),
instrumentations: [
getNodeAutoInstrumentations({
'@opentelemetry/instrumentation-http': {
ignoreIncomingPaths: ['/health', '/metrics']
},
'@opentelemetry/instrumentation-express': {
enabled: true
}
})
]
});
sdk.start();
// Graceful shutdown
process.on('SIGTERM', () => {
sdk.shutdown()
.then(() => console.log('OpenTelemetry SDK shut down'))
.catch((error) => console.error('Error shutting down OpenTelemetry SDK', error))
.finally(() => process.exit(0));
});
/**
* Custom span creation for ChatGPT API calls
*/
export const traceChatGPTRequest = async (
operation: string,
attributes: any,
fn: () => Promise<any>
): Promise<any> => {
const tracer = trace.getTracer('chatgpt-gateway');
return tracer.startActiveSpan(operation, async (span) => {
try {
// Set span attributes
span.setAttributes({
'chatgpt.model': attributes.model || 'unknown',
'chatgpt.user_id': attributes.userId || 'anonymous',
'chatgpt.message_count': attributes.messageCount || 0,
'chatgpt.estimated_tokens': attributes.estimatedTokens || 0
});
const result = await fn();
// Record success
span.setStatus({ code: SpanStatusCode.OK });
// Add result attributes
if (result.usage) {
span.setAttributes({
'chatgpt.tokens.prompt': result.usage.prompt_tokens,
'chatgpt.tokens.completion': result.usage.completion_tokens,
'chatgpt.tokens.total': result.usage.total_tokens
});
}
return result;
} catch (error: any) {
// Record error
span.setStatus({
code: SpanStatusCode.ERROR,
message: error.message
});
span.recordException(error);
throw error;
} finally {
span.end();
}
});
};
/**
* Express middleware for request tracing
*/
export const tracingMiddleware = () => {
return (req: any, res: any, next: any) => {
const tracer = trace.getTracer('chatgpt-gateway');
const span = tracer.startSpan(`${req.method} ${req.path}`, {
attributes: {
'http.method': req.method,
'http.url': req.url,
'http.target': req.path,
'http.user_agent': req.headers['user-agent'],
'user.id': req.user?.id || 'anonymous'
}
});
// Store span in request context
req.span = span;
// Intercept response to record status
const originalEnd = res.end.bind(res);
res.end = (...args: any[]) => {
span.setAttribute('http.status_code', res.statusCode);
if (res.statusCode >= 400) {
span.setStatus({
code: SpanStatusCode.ERROR,
message: `HTTP ${res.statusCode}`
});
} else {
span.setStatus({ code: SpanStatusCode.OK });
}
span.end();
return originalEnd(...args);
};
next();
};
};
export default sdk;
This OpenTelemetry implementation provides distributed tracing across your API gateway, backend services, and OpenAI API calls. You'll see exact latency breakdowns, error rates, and token usage in tools like Jaeger, Grafana, or Datadog.
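Wrapping the upstream call with traceChatGPTRequest captures per-request spans and token counts. A usage sketch—the fetch call and attribute values are illustrative:
// Tracing an OpenAI call through the helper defined above
import { traceChatGPTRequest } from './telemetry/opentelemetry';

export async function tracedChatCompletion(body: any, userId: string) {
  return traceChatGPTRequest(
    'openai.chat.completion',
    {
      model: body.model,
      userId,
      messageCount: body.messages?.length ?? 0,
      estimatedTokens: 0 // plug in your token estimator here if available
    },
    async () => {
      const res = await fetch('https://api.openai.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
          'Content-Type': 'application/json'
        },
        body: JSON.stringify(body)
      });
      return res.json(); // usage.total_tokens is recorded on the span when present
    }
  );
}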
Key Metrics to Monitor:
- Request Rate: Requests per second (track spikes that might trigger rate limits)
- Latency Percentiles: p50, p95, p99 response times (target <500ms for p95)
- Error Rate: 4xx and 5xx errors (alert on >5% error rate)
- Token Usage: Total tokens consumed per hour (cost monitoring)
- Cache Hit Rate: Percentage of cached responses (target >40%)
- Circuit Breaker State: OPEN/CLOSED/HALF_OPEN counts (detect backend failures)
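Several of these metrics are not emitted automatically and are worth defining as custom instruments. A minimal sketch using the OpenTelemetry metrics API—the instrument names are suggestions, not a standard:
// telemetry/custom-metrics.ts — custom instruments for the metrics listed above
import { metrics } from '@opentelemetry/api';

const meter = metrics.getMeter('chatgpt-gateway');

export const tokenCounter = meter.createCounter('chatgpt.tokens.total', {
  description: 'Total OpenAI tokens consumed',
  unit: 'tokens'
});

export const cacheHitCounter = meter.createCounter('gateway.cache.hits', {
  description: 'Responses served from the Redis cache'
});

export const requestLatency = meter.createHistogram('gateway.request.duration', {
  description: 'End-to-end gateway request latency',
  unit: 'ms'
});

// Example: record usage after a completed chat request
export function recordChatMetrics(totalTokens: number, latencyMs: number, model: string): void {
  tokenCounter.add(totalTokens, { model });
  requestLatency.record(latencyMs, { model });
}
These instruments are exported through the PeriodicExportingMetricReader configured in the SDK above.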
For complete monitoring strategies, see Observability Patterns for ChatGPT Applications.
Production Deployment Checklist
Before deploying your API gateway to production, validate these critical requirements:
Security:
- JWT validation with JWKS rotation support
- HTTPS/TLS 1.3 enforcement
- API key rotation mechanism
- Rate limiting per user/tier
- DDoS protection (CloudFlare, AWS Shield)
- Security headers (HSTS, CSP, X-Frame-Options)
Performance:
- Response caching with Redis
- Circuit breakers for OpenAI API
- Request timeouts (30s max)
- Connection pooling
- Horizontal scaling (3+ gateway instances)
Observability:
- OpenTelemetry instrumentation
- Centralized logging (CloudWatch, Datadog)
- Alerting on error rates >5%
- Dashboard for key metrics
Cost Optimization:
- Cache hit rate >40%
- Token usage monitoring
- Tier-based model routing
Compliance:
- OAuth 2.1 PKCE implementation
- Access token verification
- Audit logging for sensitive operations
For comprehensive deployment guides, see ChatGPT Applications Guide.
Conclusion: Building Production-Grade ChatGPT Infrastructure
API gateways are the foundation of scalable, secure, and cost-effective ChatGPT applications. By implementing the patterns covered in this guide—Kong declarative configurations, AWS API Gateway with Lambda authorizers, circuit breakers, response caching, and OpenTelemetry observability—you'll build infrastructure that handles production traffic reliably.
Key Takeaways:
- Kong Gateway excels for high-throughput scenarios (50,000+ requests/second) with a rich Lua plugin ecosystem and declarative configuration
- AWS API Gateway provides fully managed infrastructure with built-in rate limiting, request validation, and seamless AWS service integration
- Circuit breakers prevent cascading failures when OpenAI experiences outages, keeping your application responsive during incidents
- Response caching can cut OpenAI API costs by an estimated 40-60% while dropping latency from hundreds of milliseconds to tens of milliseconds for cached requests
- OpenTelemetry provides end-to-end observability with distributed tracing, metrics, and logging across your entire stack
For teams building ChatGPT applications targeting the OpenAI App Store, proper API gateway architecture is non-negotiable. It ensures you meet OpenAI's OAuth 2.1 PKCE security requirements, maintain sub-200ms response times, and provide enterprise-grade reliability.
Whether you choose Kong for maximum control and performance, or AWS API Gateway for managed simplicity, the patterns in this guide give you production-ready starting points. Combine them with comprehensive monitoring, intelligent caching, and resilience patterns to build ChatGPT applications that scale from 100 to 100 million users.
Ready to build your ChatGPT application with professional API gateway architecture? Start your free trial at MakeAIHQ.com and deploy production-ready ChatGPT apps in 48 hours—no coding required. Our platform automatically generates API gateways, authentication flows, and infrastructure based on best practices from this guide.
For more ChatGPT app architecture patterns, explore:
- Complete Guide to Building ChatGPT Applications (pillar article)
- Rate Limiting Patterns for ChatGPT Apps
- Microservices Architecture for ChatGPT Apps
- OAuth 2.1 PKCE Authentication for ChatGPT Apps
Building the future of conversational AI, one gateway at a time.
References:
- Kong Gateway Documentation - Official Kong Gateway documentation with plugin guides
- AWS API Gateway Best Practices - AWS best practices for production API Gateway deployments
- OpenTelemetry Specification - Official OpenTelemetry specification for distributed tracing and metrics