ELK Stack Log Aggregation for ChatGPT Apps

Managing logs across distributed ChatGPT applications becomes exponentially complex as your deployment scales. When you're running multiple MCP servers, handling thousands of tool calls per minute, and debugging real-time conversation flows, traditional log files scattered across containers quickly become unmanageable. You need centralized log aggregation that provides real-time search, pattern recognition, and visual analytics.

The ELK Stack (Elasticsearch, Logstash, Kibana) has become the industry-standard solution for log aggregation and analysis at scale. This powerful combination enables you to collect logs from all your ChatGPT app components—MCP servers, widget runtime, authentication services, and backend APIs—into a centralized, searchable index with real-time dashboards.

In this comprehensive guide, you'll learn how to deploy a production-ready ELK Stack for ChatGPT application log aggregation. We'll cover the complete architecture, Docker Compose setup, Logstash pipeline configuration, Kibana dashboard creation, and production deployment strategies with security best practices.

For the complete ChatGPT development workflow, see our Complete Guide to Building ChatGPT Applications. If you want to skip the infrastructure complexity and focus on building your app, MakeAIHQ provides managed logging and monitoring out of the box.

ELK Stack Architecture for ChatGPT Apps

The ELK Stack (Elasticsearch, Logstash, Kibana), extended here with Filebeat and often called the Elastic Stack, consists of four core components working together to create a complete log aggregation pipeline. Understanding each component's role is essential for designing a reliable logging infrastructure.

Elasticsearch: The Search Engine

Elasticsearch is a distributed, RESTful search and analytics engine built on Apache Lucene. It stores your log data in indices (similar to databases) and provides near-real-time search capabilities across billions of log entries.

For ChatGPT applications, Elasticsearch indexes contain structured log documents with fields like:

  • timestamp: When the event occurred
  • log_level: DEBUG, INFO, WARN, ERROR, CRITICAL
  • service_name: Which MCP server or component generated the log
  • tool_name: Which ChatGPT tool was invoked
  • user_id: Which user triggered the event (when authenticated)
  • message: The actual log message
  • stack_trace: Error stack traces for debugging
  • response_time: Performance metrics for tool calls

Elasticsearch automatically creates inverted indices for full-text search, allowing you to find logs like "all ERROR logs from the restaurant-booking MCP server in the last 24 hours where response_time > 5000ms" in milliseconds.
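
That query maps directly onto the Elasticsearch query DSL. A minimal sketch, assuming the field names above, the chatgpt-logs-* daily index naming used later in this guide, and the elastic superuser credentials from the Docker setup below:

# Find ERROR logs from restaurant-booking in the last 24h with response_time > 5000ms
curl -s -u elastic:${ELASTIC_PASSWORD} \
  -X GET "http://localhost:9200/chatgpt-logs-*/_search" \
  -H "Content-Type: application/json" \
  -d '{
    "query": {
      "bool": {
        "filter": [
          {"term":  {"log_level": "ERROR"}},
          {"term":  {"service_name": "restaurant-booking"}},
          {"range": {"@timestamp": {"gte": "now-24h"}}},
          {"range": {"response_time": {"gt": 5000}}}
        ]
      }
    },
    "sort": [{"@timestamp": "desc"}],
    "size": 50
  }'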

Logstash: The Data Pipeline

Logstash is a server-side data processing pipeline that ingests logs from multiple sources, transforms and enriches them, and sends them to Elasticsearch. It operates in three stages:

  1. Input plugins: Collect logs from files, HTTP endpoints, message queues, databases
  2. Filter plugins: Parse, transform, enrich log data (grok patterns, JSON parsing, GeoIP lookup)
  3. Output plugins: Send processed logs to Elasticsearch, S3, monitoring systems

For ChatGPT apps, Logstash pipelines typically:

  • Parse JSON-formatted logs from containerized MCP servers
  • Extract structured fields from unstructured log messages using grok patterns
  • Add metadata like environment (production, staging), region, deployment version
  • Calculate derived metrics (request duration, token usage, error rates)
  • Route logs to different Elasticsearch indices based on log level or service

Kibana: The Visualization Layer

Kibana is the web-based UI for visualizing Elasticsearch data. It provides:

  • Discover: Full-text search interface for exploring logs
  • Visualizations: Charts, graphs, maps, tables for log analytics
  • Dashboards: Pre-built collections of visualizations for monitoring
  • Canvas: Pixel-perfect infographic-style reports

For ChatGPT applications, Kibana dashboards typically show:

  • Real-time request volume by tool name
  • Error rate trends over time
  • P50/P95/P99 latency percentiles
  • Top 10 slowest tools
  • Geographic distribution of users (from IP addresses)
  • Alert thresholds (e.g., error rate > 5%)

Filebeat: The Lightweight Shipper

Filebeat is a lightweight agent that ships log files from your application servers to Logstash or Elasticsearch. Unlike Logstash (which is resource-intensive), Filebeat is designed to run on every server with minimal overhead.

For Docker-based ChatGPT deployments, Filebeat:

  • Mounts the Docker socket to collect container logs
  • Tails log files in real-time
  • Adds metadata like container name, labels, environment variables
  • Handles backpressure when Logstash is overloaded
  • Guarantees at-least-once delivery with persistent state

The typical data flow is: ChatGPT App → Filebeat → Logstash → Elasticsearch → Kibana.

Learn more about logging best practices in our guide: MCP Server Logging Best Practices for ChatGPT.

Production Docker Compose Setup

Deploying the ELK Stack in Docker provides consistency across development, staging, and production environments. This Docker Compose configuration creates a production-ready cluster with proper networking, volumes, and security.

# docker-compose.elk.yml
version: '3.8'

services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.11.3
    container_name: chatgpt-elasticsearch
    environment:
      # Cluster configuration
      - cluster.name=chatgpt-logs-cluster
      - node.name=chatgpt-es-node-01
      - discovery.type=single-node

      # Memory configuration (CRITICAL for production)
      - ES_JAVA_OPTS=-Xms4g -Xmx4g
      - bootstrap.memory_lock=true

      # Security configuration
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=false
      - ELASTIC_PASSWORD=${ELASTIC_PASSWORD}

      # Performance tuning
      - indices.memory.index_buffer_size=30%
      - thread_pool.write.queue_size=1000
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - elasticsearch-data:/usr/share/elasticsearch/data
      - ./elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml:ro
    ports:
      - "9200:9200"
      - "9300:9300"
    networks:
      - elk
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost:9200/_cluster/health || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 5
    restart: unless-stopped

  logstash:
    image: docker.elastic.co/logstash/logstash:8.11.3
    container_name: chatgpt-logstash
    environment:
      - ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
      - XPACK_MONITORING_ENABLED=true
      - XPACK_MONITORING_ELASTICSEARCH_HOSTS=http://elasticsearch:9200
      - XPACK_MONITORING_ELASTICSEARCH_USERNAME=elastic
      - XPACK_MONITORING_ELASTICSEARCH_PASSWORD=${ELASTIC_PASSWORD}
      - LS_JAVA_OPTS=-Xmx2g -Xms2g
    volumes:
      - ./logstash/config/logstash.yml:/usr/share/logstash/config/logstash.yml:ro
      - ./logstash/pipeline:/usr/share/logstash/pipeline:ro
      - ./logstash/patterns:/usr/share/logstash/patterns:ro
    ports:
      - "5044:5044"  # Beats input
      - "9600:9600"  # Logstash monitoring API
    networks:
      - elk
    depends_on:
      elasticsearch:
        condition: service_healthy
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost:9600/_node/stats || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 5
    restart: unless-stopped

  kibana:
    image: docker.elastic.co/kibana/kibana:8.11.3
    container_name: chatgpt-kibana
    environment:
      - SERVERNAME=chatgpt-kibana
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
      - ELASTICSEARCH_USERNAME=kibana_system
      - ELASTICSEARCH_PASSWORD=${KIBANA_PASSWORD}
      - XPACK_SECURITY_ENABLED=true
      - XPACK_ENCRYPTEDSAVEDOBJECTS_ENCRYPTIONKEY=${KIBANA_ENCRYPTION_KEY}
    volumes:
      - ./kibana/config/kibana.yml:/usr/share/kibana/config/kibana.yml:ro
      - kibana-data:/usr/share/kibana/data
    ports:
      - "5601:5601"
    networks:
      - elk
    depends_on:
      elasticsearch:
        condition: service_healthy
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost:5601/api/status || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 5
    restart: unless-stopped

  filebeat:
    image: docker.elastic.co/beats/filebeat:8.11.3
    container_name: chatgpt-filebeat
    user: root
    environment:
      - ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
    volumes:
      - ./filebeat/filebeat.yml:/usr/share/filebeat/filebeat.yml:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - filebeat-data:/usr/share/filebeat/data
    networks:
      - elk
    depends_on:
      logstash:
        condition: service_healthy
    command: filebeat -e -strict.perms=false
    restart: unless-stopped

volumes:
  elasticsearch-data:
    driver: local
  kibana-data:
    driver: local
  filebeat-data:
    driver: local

networks:
  elk:
    driver: bridge

Critical production considerations:

  1. Memory allocation: Elasticsearch requires -Xms and -Xmx to be equal (prevents heap resizing). Allocate 50% of available RAM (max 32GB due to compressed pointers).

  2. Volume persistence: Named volumes ensure data survives container restarts. For production, use block storage (AWS EBS, GCP Persistent Disk).

  3. Health checks: Ensure services start in the correct order (Elasticsearch → Logstash → Kibana → Filebeat).

  4. Security: Use environment variables for passwords. Generate strong keys with openssl rand -hex 32.
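
A minimal .env file for the Compose file above might look like the sketch below. The variable names match the configuration; the values are placeholders you generate yourself, and ENVIRONMENT, AWS_REGION, and DEPLOYMENT_VERSION are consumed by the Logstash and Filebeat configurations later in this guide (they need to be passed into those containers' environments).

# .env (docker compose loads this file from the project directory)
ELASTIC_PASSWORD=<output of: openssl rand -hex 32>
KIBANA_PASSWORD=<output of: openssl rand -hex 32>
KIBANA_ENCRYPTION_KEY=<output of: openssl rand -hex 32>
ENVIRONMENT=production
AWS_REGION=us-east-1
DEPLOYMENT_VERSION=1.0.0

Note that KIBANA_PASSWORD must match the password of the built-in kibana_system user in Elasticsearch; one way to set it is docker exec -it chatgpt-elasticsearch bin/elasticsearch-reset-password -u kibana_system -i once the cluster is healthy.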

Logstash Pipeline Configuration

Logstash pipelines define how logs flow from inputs through filters to outputs. This production-ready pipeline handles ChatGPT application logs with JSON parsing, field extraction, and enrichment.

# logstash/pipeline/chatgpt-app.conf

input {
  # Beats input (receives logs from Filebeat)
  beats {
    port => 5044
    codec => json
  }

  # HTTP input (for direct log shipping from apps)
  http {
    port => 8080
    codec => json
    additional_codecs => {
      "application/json" => "json"
    }
  }
}

filter {
  # Parse JSON logs from MCP servers
  if [message] =~ /^\{.*\}$/ {
    json {
      source => "message"
      target => "parsed"
    }

    # Promote parsed fields to top level
    if [parsed] {
      mutate {
        rename => {
          "[parsed][level]" => "log_level"
          "[parsed][timestamp]" => "log_timestamp"
          "[parsed][service]" => "service_name"
          "[parsed][tool]" => "tool_name"
          "[parsed][user_id]" => "user_id"
          "[parsed][duration_ms]" => "response_time"
          "[parsed][error]" => "error_message"
          "[parsed][stack]" => "stack_trace"
        }
      }
    }
  }

  # Parse unstructured logs with grok patterns
  if ![log_level] {
    grok {
      match => {
        "message" => "%{TIMESTAMP_ISO8601:log_timestamp} %{LOGLEVEL:log_level} \[%{DATA:service_name}\] %{GREEDYDATA:log_message}"
      }
      patterns_dir => ["/usr/share/logstash/patterns"]
    }
  }

  # Convert timestamps to @timestamp field
  if [log_timestamp] {
    date {
      match => ["log_timestamp", "ISO8601", "yyyy-MM-dd'T'HH:mm:ss.SSSZ"]
      target => "@timestamp"
      remove_field => ["log_timestamp"]
    }
  }

  # Normalize log levels
  mutate {
    uppercase => ["log_level"]
  }

  # Add environment metadata
  mutate {
    add_field => {
      "environment" => "${ENVIRONMENT:production}"
      "region" => "${AWS_REGION:us-east-1}"
      "deployment_version" => "${DEPLOYMENT_VERSION:unknown}"
    }
  }

  # Parse user agent strings
  if [http_user_agent] {
    useragent {
      source => "http_user_agent"
      target => "user_agent"
    }
  }

  # GeoIP lookup for client IPs
  if [client_ip] {
    geoip {
      source => "client_ip"
      target => "geoip"
      fields => ["city_name", "country_name", "location"]
    }
  }

  # Calculate derived metrics
  if [response_time] {
    ruby {
      code => "
        response_time = event.get('response_time').to_f
        event.set('response_time_category',
          case response_time
          when 0..100 then 'fast'
          when 101..500 then 'normal'
          when 501..2000 then 'slow'
          else 'very_slow'
          end
        )
      "
    }
  }

  # Tag errors for alerting
  if [log_level] == "ERROR" or [log_level] == "CRITICAL" {
    mutate {
      add_tag => ["error_log"]
    }
  }

  # Fingerprint for deduplication (used as the document_id in the output below)
  fingerprint {
    source => ["message", "@timestamp", "service_name"]
    target => "[@metadata][fingerprint]"
    method => "SHA256"
    concatenate_sources => true
  }

  # Remove unnecessary fields
  mutate {
    remove_field => ["host", "agent", "ecs", "input", "parsed"]
  }
}

output {
  # Primary output: Elasticsearch
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    user => "elastic"
    password => "${ELASTIC_PASSWORD}"

    # Dynamic index routing by date
    index => "chatgpt-logs-%{+YYYY.MM.dd}"

    # Document ID (prevents duplicates)
    document_id => "%{[@metadata][fingerprint]}"

    # ILM policy (Index Lifecycle Management)
    ilm_enabled => true
    ilm_rollover_alias => "chatgpt-logs"
    ilm_pattern => "{now/d}-000001"
    ilm_policy => "chatgpt-logs-policy"
  }

  # Error output: Separate index for ERROR/CRITICAL logs
  if "error_log" in [tags] {
    elasticsearch {
      hosts => ["http://elasticsearch:9200"]
      user => "elastic"
      password => "${ELASTIC_PASSWORD}"
      index => "chatgpt-errors-%{+YYYY.MM.dd}"
    }
  }

  # Debugging output (only in non-production)
  if "${ENVIRONMENT:production}" != "production" {
    stdout {
      codec => rubydebug
    }
  }
}

Pipeline highlights:

  • Dual input: Accepts logs from Filebeat (port 5044) and direct HTTP (port 8080); see the example after this list
  • JSON parsing: Extracts structured fields from JSON logs
  • Grok patterns: Parses unstructured logs when JSON isn't available
  • Enrichment: Adds GeoIP, user agent parsing, environment metadata
  • Dynamic indexing: Creates daily indices (chatgpt-logs-2026.12.25); with ILM enabled, writes go through the chatgpt-logs rollover alias
  • Error routing: Sends ERROR/CRITICAL logs to a separate index for faster alerting

For advanced log analysis techniques, see our guide: Log Analysis with Kibana for ChatGPT.

Custom Grok Patterns for ChatGPT Logs

Grok patterns enable you to parse unstructured log messages into structured fields. These custom patterns handle common ChatGPT application log formats.

# logstash/patterns/chatgpt-patterns.txt

# MCP Server log pattern
# Example: 2026-12-25T14:32:18.456Z INFO [restaurant-booking] Tool call: create_reservation user=user_123 duration=234ms
MCP_LOG %{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} \[%{DATA:service}\] Tool call: %{DATA:tool} user=%{DATA:user_id} duration=%{NUMBER:duration_ms}ms

# Widget runtime pattern
# Example: [2026-12-25 14:32:18] WARN Widget timeout: MapWidget component=InteractiveMap timeout=5000ms
WIDGET_LOG \[%{TIMESTAMP_ISO8601:timestamp}\] %{LOGLEVEL:level} Widget %{DATA:event_type}: %{DATA:widget_name} component=%{DATA:component_name} timeout=%{NUMBER:timeout_ms}ms

# Authentication log pattern
# Example: 2026-12-25T14:32:18Z INFO [auth-service] OAuth token verified: user_id=user_123 scope=read_profile,write_apps ip=203.0.113.42
AUTH_LOG %{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} \[auth-service\] %{DATA:auth_event}: user_id=%{DATA:user_id} scope=%{DATA:scopes} ip=%{IP:client_ip}

# Error with stack trace pattern
# Example: 2026-12-25T14:32:18Z ERROR [mcp-server] UnhandledPromiseRejection: Connection timeout
ERROR_LOG %{TIMESTAMP_ISO8601:timestamp} ERROR \[%{DATA:service}\] %{DATA:error_type}: %{GREEDYDATA:error_message}

# Performance metric pattern
# Example: METRIC tool_call_duration_ms=234 service=restaurant-booking tool=create_reservation percentile=p95
METRIC_LOG METRIC %{DATA:metric_name}=%{NUMBER:metric_value} service=%{DATA:service} tool=%{DATA:tool} percentile=%{DATA:percentile}

Usage in pipeline:

filter {
  grok {
    match => {
      "message" => [
        "%{MCP_LOG}",
        "%{WIDGET_LOG}",
        "%{AUTH_LOG}",
        "%{ERROR_LOG}",
        "%{METRIC_LOG}"
      ]
    }
    patterns_dir => ["/usr/share/logstash/patterns"]
  }
}
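
Applied to the MCP server sample line above, the MCP_LOG pattern yields structured fields roughly like the following. Grok captures are strings by default; append a type such as %{NUMBER:duration_ms:int} if you want the duration indexed as a number.

{
  "timestamp": "2026-12-25T14:32:18.456Z",
  "level": "INFO",
  "service": "restaurant-booking",
  "tool": "create_reservation",
  "user_id": "user_123",
  "duration_ms": "234"
}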

Grok debugger tool: Use Kibana's Dev Tools → Grok Debugger to test patterns against real log samples.

Filebeat Configuration for Docker Containers

Filebeat ships logs from Docker containers to Logstash with minimal resource overhead. This configuration collects logs from all ChatGPT app containers with metadata enrichment.

# filebeat/filebeat.yml

filebeat.inputs:
  # Docker container log input
  - type: container
    enabled: true
    paths:
      - '/var/lib/docker/containers/*/*.log'

    # Decode JSON logs from containers
    json.keys_under_root: true
    json.overwrite_keys: true
    json.add_error_key: true

    # Add Docker metadata
    processors:
      - add_docker_metadata:
          host: "unix:///var/run/docker.sock"
          match_fields: ["container.id"]
          labels.dedot: true

      # Add container labels as fields
      - decode_json_fields:
          fields: ["message"]
          process_array: false
          max_depth: 3
          target: ""
          overwrite_keys: true

      # Add custom fields
      - add_fields:
          target: ''
          fields:
            environment: ${ENVIRONMENT:production}
            region: ${AWS_REGION:us-east-1}

    # Filter containers by label: drop events from containers without logging=enabled
    # (the container input has no standalone "condition" option, so use a processor)
      - drop_event:
          when:
            not:
              equals:
                container.labels.logging: "enabled"

  # File input (for non-containerized logs)
  - type: log
    enabled: true
    paths:
      - /var/log/chatgpt-apps/*.log
    fields:
      log_source: file_system
    multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
    multiline.negate: true
    multiline.match: after

# Filebeat modules (optional)
filebeat.modules:
  - module: system
    syslog:
      enabled: true
    auth:
      enabled: true

# Output to Logstash
output.logstash:
  hosts: ["logstash:5044"]

  # Load balancing across multiple Logstash instances
  loadbalance: true

  # Enable compression
  compression_level: 3

  # Bulk settings
  bulk_max_size: 2048
  worker: 2

  # Backpressure handling
  slow_start: true

# Logging configuration
logging.level: info
logging.to_files: true
logging.files:
  path: /var/log/filebeat
  name: filebeat.log
  keepfiles: 7
  permissions: 0644

# Performance tuning
queue.mem:
  events: 4096
  flush.min_events: 512
  flush.timeout: 1s

# Monitoring
monitoring.enabled: true
monitoring.elasticsearch:
  hosts: ["http://elasticsearch:9200"]
  username: "elastic"
  password: "${ELASTIC_PASSWORD}"

Key features:

  • Container log collection: Automatically tails and ships logs from every Docker container on the host
  • Metadata enrichment: Adds container name, labels, image, environment variables
  • JSON decoding: Parses JSON logs before sending to Logstash
  • Label filtering: Only ships logs from containers with logging=enabled label
  • Multiline handling: Combines stack traces into single log events
  • Backpressure: Slows down when Logstash is overloaded

Add logging label to ChatGPT app containers:

# In your ChatGPT app docker-compose.yml
services:
  mcp-server:
    labels:
      - "logging=enabled"

Kibana Dashboard Configuration

Kibana dashboards visualize log data for real-time monitoring and debugging. This dashboard configuration tracks ChatGPT application health, performance, and error rates.

{
  "title": "ChatGPT Application Monitoring Dashboard",
  "description": "Real-time monitoring for ChatGPT apps: request volume, latency, errors, tool usage",
  "panels": [
    {
      "id": "request_volume_timeline",
      "type": "line",
      "title": "Request Volume (Requests/min)",
      "gridData": {"x": 0, "y": 0, "w": 12, "h": 4},
      "visState": {
        "type": "line",
        "params": {
          "type": "line",
          "grid": {"categoryLines": false},
          "categoryAxes": [{"id": "CategoryAxis-1", "type": "category", "position": "bottom", "show": true}],
          "valueAxes": [{"id": "ValueAxis-1", "name": "Requests", "type": "value", "position": "left", "show": true}],
          "seriesParams": [{"show": true, "type": "line", "mode": "normal", "data": {"label": "Requests", "id": "1"}}]
        },
        "aggs": [
          {"id": "1", "enabled": true, "type": "count", "schema": "metric"},
          {"id": "2", "enabled": true, "type": "date_histogram", "schema": "segment", "params": {"field": "@timestamp", "interval": "1m", "min_doc_count": 0}}
        ]
      }
    },
    {
      "id": "error_rate_gauge",
      "type": "gauge",
      "title": "Error Rate (%)",
      "gridData": {"x": 12, "y": 0, "w": 6, "h": 4},
      "visState": {
        "type": "gauge",
        "params": {
          "gauge": {
            "gaugeType": "Arc",
            "percentageMode": true,
            "colorSchema": "Green to Red",
            "gaugeStyle": "Full",
            "backStyle": "Full",
            "orientation": "vertical",
            "verticalSplit": false,
            "labels": {"show": true, "color": "black"},
            "scale": {"show": true, "labels": false, "color": "#333"},
            "type": "meter",
            "style": {"bgFill": "#000", "fontSize": 60}
          }
        },
        "aggs": [
          {"id": "1", "enabled": true, "type": "count", "schema": "metric", "params": {"customLabel": "Error Rate"}},
          {"id": "2", "enabled": true, "type": "filters", "schema": "group", "params": {"filters": [{"input": {"query": "log_level:ERROR OR log_level:CRITICAL"}, "label": "Errors"}]}}
        ]
      }
    },
    {
      "id": "response_time_percentiles",
      "type": "area",
      "title": "Response Time Percentiles (ms)",
      "gridData": {"x": 18, "y": 0, "w": 6, "h": 4},
      "visState": {
        "type": "area",
        "aggs": [
          {"id": "1", "enabled": true, "type": "percentiles", "schema": "metric", "params": {"field": "response_time", "percents": [50, 95, 99]}},
          {"id": "2", "enabled": true, "type": "date_histogram", "schema": "segment", "params": {"field": "@timestamp", "interval": "1m"}}
        ]
      }
    },
    {
      "id": "top_tools_table",
      "type": "table",
      "title": "Top 10 Tools by Request Count",
      "gridData": {"x": 0, "y": 4, "w": 12, "h": 4},
      "visState": {
        "type": "table",
        "params": {
          "perPage": 10,
          "showPartialRows": false,
          "showMetricsAtAllLevels": false,
          "sort": {"columnIndex": null, "direction": null},
          "showTotal": true,
          "totalFunc": "sum"
        },
        "aggs": [
          {"id": "1", "enabled": true, "type": "count", "schema": "metric"},
          {"id": "2", "enabled": true, "type": "terms", "schema": "bucket", "params": {"field": "tool_name.keyword", "size": 10, "order": "desc", "orderBy": "1"}}
        ]
      }
    },
    {
      "id": "geographic_distribution_map",
      "type": "map",
      "title": "User Geographic Distribution",
      "gridData": {"x": 12, "y": 4, "w": 12, "h": 4},
      "visState": {
        "type": "map",
        "params": {
          "mapType": "Coordinate Map",
          "isDesaturated": false,
          "mapZoom": 2,
          "mapCenter": [0, 0]
        },
        "aggs": [
          {"id": "1", "enabled": true, "type": "count", "schema": "metric"},
          {"id": "2", "enabled": true, "type": "geohash_grid", "schema": "segment", "params": {"field": "geoip.location", "autoPrecision": true, "precision": 3}}
        ]
      }
    },
    {
      "id": "error_logs_table",
      "type": "table",
      "title": "Recent Error Logs",
      "gridData": {"x": 0, "y": 8, "w": 24, "h": 4},
      "visState": {
        "type": "table",
        "params": {
          "perPage": 20,
          "showPartialRows": false,
          "showMetricsAtAllLevels": false
        },
        "aggs": [
          {"id": "1", "enabled": true, "type": "top_hits", "schema": "metric", "params": {"field": "_source", "size": 20, "sortField": "@timestamp", "sortOrder": "desc"}},
          {"id": "2", "enabled": true, "type": "filters", "schema": "bucket", "params": {"filters": [{"input": {"query": "log_level:ERROR OR log_level:CRITICAL"}, "label": ""}]}}
        ]
      }
    }
  ],
  "timeRestore": true,
  "timeFrom": "now-1h",
  "timeTo": "now",
  "refreshInterval": {
    "pause": false,
    "value": 30000
  }
}

Dashboard features:

  • Request volume timeline: Tracks requests per minute with 1-minute granularity
  • Error rate gauge: Real-time error percentage with color-coded thresholds (green < 1%, yellow 1-5%, red > 5%)
  • Response time percentiles: P50, P95, P99 latency visualization
  • Top tools table: Shows which tools are most frequently called
  • Geographic map: User distribution based on GeoIP lookup
  • Error logs table: Live feed of ERROR/CRITICAL logs with full details

Import dashboard:

# Kibana 8.x imports dashboards through the saved objects API, which expects an
# NDJSON export. Export or convert the dashboard to chatgpt-dashboard.ndjson, then:
curl -X POST "http://localhost:5601/api/saved_objects/_import?overwrite=true" \
  -H "kbn-xsrf: true" \
  -u elastic:${ELASTIC_PASSWORD} \
  --form file=@chatgpt-dashboard.ndjson

Elasticsearch Index Template and ILM Policy

Index templates define field mappings and settings for new indices. ILM (Index Lifecycle Management) policies automate index lifecycle: rollover, retention, deletion.

{
  "index_patterns": ["chatgpt-logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "index.codec": "best_compression",
      "refresh_interval": "5s",
      "index.lifecycle.name": "chatgpt-logs-policy",
      "index.lifecycle.rollover_alias": "chatgpt-logs"
    },
    "mappings": {
      "properties": {
        "@timestamp": {"type": "date"},
        "log_level": {"type": "keyword"},
        "service_name": {"type": "keyword"},
        "tool_name": {"type": "keyword"},
        "user_id": {"type": "keyword"},
        "response_time": {"type": "long"},
        "response_time_category": {"type": "keyword"},
        "error_message": {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
        "stack_trace": {"type": "text"},
        "message": {"type": "text"},
        "environment": {"type": "keyword"},
        "region": {"type": "keyword"},
        "deployment_version": {"type": "keyword"},
        "client_ip": {"type": "ip"},
        "geoip": {
          "properties": {
            "city_name": {"type": "keyword"},
            "country_name": {"type": "keyword"},
            "location": {"type": "geo_point"}
          }
        }
      }
    }
  }
}

ILM Policy:

{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50GB",
            "max_age": "1d"
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "warm": {
        "min_age": "3d",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          },
          "forcemerge": {
            "max_num_segments": 1
          },
          "set_priority": {
            "priority": 50
          }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

Lifecycle phases:

  1. Hot phase: Active indices receiving writes. Rollover after 1 day or 50GB per shard.
  2. Warm phase: Older indices (3+ days). Shrink to 1 shard, force merge segments for compression.
  3. Delete phase: Indices older than 30 days are automatically deleted.

Apply template and policy:

# Create index template
curl -s -u elastic:${ELASTIC_PASSWORD} -X PUT "http://localhost:9200/_index_template/chatgpt-logs-template" \
  -H "Content-Type: application/json" \
  -d @index-template.json

# Create ILM policy
curl -s -u elastic:${ELASTIC_PASSWORD} -X PUT "http://localhost:9200/_ilm/policy/chatgpt-logs-policy" \
  -H "Content-Type: application/json" \
  -d @ilm-policy.json
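
Once logs are flowing, you can confirm that new indices picked up the template and policy:

# Check which lifecycle phase each log index is in
curl -s -u elastic:${ELASTIC_PASSWORD} "http://localhost:9200/chatgpt-logs-*/_ilm/explain?pretty"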

For index optimization strategies, see: Elasticsearch Optimization for ChatGPT.

Production Deployment and Scaling

Deploying the ELK Stack to production requires careful planning for high availability, security, backup, and monitoring.

High Availability Architecture

For production ChatGPT applications handling millions of logs per day:

  • Elasticsearch cluster: Minimum 3 master-eligible nodes (quorum = 2). Separate data nodes for horizontal scaling; discovery settings are sketched after this list.
  • Logstash: Deploy 2+ instances behind a load balancer (Filebeat automatically load balances).
  • Kibana: Run 2+ instances behind a load balancer with session affinity.
  • Filebeat: Deploy as a DaemonSet (Kubernetes) or on every Docker host.
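
To move from the single-node Compose setup to a three-node cluster, replace discovery.type=single-node with seed hosts and initial master nodes. A sketch of the per-node environment (hostnames and node names are illustrative):

# Per-node environment for a 3-node cluster (shown for node 01)
- node.name=chatgpt-es-node-01
- discovery.seed_hosts=es-host-01,es-host-02,es-host-03
- cluster.initial_master_nodes=chatgpt-es-node-01,chatgpt-es-node-02,chatgpt-es-node-03

cluster.initial_master_nodes is only used for the very first cluster bootstrap and should be removed once the cluster has formed.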

Example AWS deployment:

  • Elasticsearch: 3× c5.2xlarge instances (8 vCPU, 16GB RAM) across 3 availability zones
  • Logstash: 2× c5.xlarge instances (4 vCPU, 8GB RAM) behind Application Load Balancer
  • Kibana: 2× t3.medium instances (2 vCPU, 4GB RAM) behind ALB with sticky sessions

Security Hardening

  1. Enable X-Pack Security: Authentication, role-based access control (RBAC), field-level security
  2. TLS/SSL encryption: Encrypt all communication (Elasticsearch cluster, Logstash → ES, Kibana → ES)
  3. API key authentication: Use API keys instead of passwords for application log shipping (creation example below)
  4. Network segmentation: Place Elasticsearch/Logstash in private subnets, expose only Kibana via ALB
  5. Audit logging: Enable audit logs for all authentication, authorization, and data access events

# elasticsearch.yml security configuration
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.http.ssl.enabled: true
xpack.security.audit.enabled: true
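
For item 3, a key scoped to writing log indices can be created through the security API. A sketch; the key name, role name, and privileges shown are illustrative choices:

# Create an API key that can only write to the log indices
curl -s -u elastic:${ELASTIC_PASSWORD} \
  -X POST "http://localhost:9200/_security/api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "chatgpt-log-shipper",
    "role_descriptors": {
      "log_writer": {
        "cluster": ["monitor"],
        "indices": [
          {"names": ["chatgpt-logs-*", "chatgpt-errors-*"], "privileges": ["create_doc", "create_index", "auto_configure"]}
        ]
      }
    }
  }'

The response includes an id and api_key; Beats and Logstash Elasticsearch outputs accept them as api_key: "id:api_key" in place of username and password.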

Backup and Disaster Recovery

Snapshot repository (S3 example):

# Register S3 snapshot repository
curl -X PUT "http://localhost:9200/_snapshot/chatgpt-logs-backup" -H "Content-Type: application/json" -d '{
  "type": "s3",
  "settings": {
    "bucket": "chatgpt-elasticsearch-backups",
    "region": "us-east-1",
    "base_path": "snapshots",
    "compress": true
  }
}'

# Create snapshot (automated via cron or Elasticsearch snapshot policy)
curl -X PUT "http://localhost:9200/_snapshot/chatgpt-logs-backup/snapshot-$(date +%Y%m%d-%H%M%S)?wait_for_completion=false"

Snapshot policy (automated daily backups, 30-day retention):

{
  "policy": {
    "schedule": "0 2 * * *",
    "name": "<chatgpt-logs-{now/d}>",
    "repository": "chatgpt-logs-backup",
    "config": {
      "indices": ["chatgpt-logs-*"],
      "ignore_unavailable": false,
      "include_global_state": false
    },
    "retention": {
      "expire_after": "30d",
      "min_count": 5,
      "max_count": 50
    }
  }
}
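
The policy is registered through the snapshot lifecycle management (SLM) API; the policy ID and file name below are illustrative:

# Register the SLM policy (the snapshot repository above must exist first)
curl -s -u elastic:${ELASTIC_PASSWORD} -X PUT "http://localhost:9200/_slm/policy/chatgpt-logs-daily" \
  -H "Content-Type: application/json" \
  -d @slm-policy.json

# Trigger an immediate run to verify the policy and repository work
curl -s -u elastic:${ELASTIC_PASSWORD} -X POST "http://localhost:9200/_slm/policy/chatgpt-logs-daily/_execute"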

Monitoring and Alerting

Use Elasticsearch built-in monitoring (X-Pack):

# Enable monitoring in elasticsearch.yml
xpack.monitoring.collection.enabled: true

# In Kibana: Stack Monitoring shows cluster health, node stats, index stats

Watcher alerts for critical errors:

{
  "trigger": {
    "schedule": {"interval": "5m"}
  },
  "input": {
    "search": {
      "request": {
        "indices": ["chatgpt-logs-*"],
        "body": {
          "query": {
            "bool": {
              "must": [
                {"range": {"@timestamp": {"gte": "now-5m"}}},
                {"terms": {"log_level": ["ERROR", "CRITICAL"]}}
              ]
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": {"ctx.payload.hits.total": {"gte": 100}}
  },
  "actions": {
    "send_email": {
      "email": {
        "to": "ops@example.com",
        "subject": "ChatGPT App: High Error Rate Alert",
        "body": "Detected {{ctx.payload.hits.total}} errors in the last 5 minutes"
      }
    }
  }
}
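
Register the watch through the Watcher API (Watcher requires a non-Basic license; on the free Basic tier, Kibana alerting rules are the usual alternative). The watch ID and file name below are illustrative, and the email action assumes an SMTP account is configured under xpack.notification.email in elasticsearch.yml:

# Register the error-rate watch (save the JSON above as error-watch.json)
curl -s -u elastic:${ELASTIC_PASSWORD} -X PUT "http://localhost:9200/_watcher/watch/chatgpt-error-spike" \
  -H "Content-Type: application/json" \
  -d @error-watch.json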

Conclusion

The ELK Stack provides a production-ready log aggregation platform for ChatGPT applications, enabling centralized search, real-time analytics, and proactive monitoring across distributed MCP servers and widgets. With the Docker Compose setup, Logstash pipelines, and Kibana dashboards provided in this guide, you now have a complete logging infrastructure that scales from prototype to production.

Key takeaways:

  • Elasticsearch provides fast, scalable log storage with near-real-time search
  • Logstash transforms and enriches logs with filters, grok patterns, and metadata
  • Kibana visualizes log data with customizable dashboards and alerts
  • Filebeat ships logs from Docker containers with minimal overhead
  • Production deployment requires high availability, security hardening, backup strategies, and monitoring

For complete ChatGPT application development workflows including logging integration, see our Complete Guide to Building ChatGPT Applications.

Skip the Infrastructure Complexity

Building and maintaining the ELK Stack requires significant DevOps expertise and ongoing management. If you'd rather focus on building your ChatGPT application instead of managing logging infrastructure, MakeAIHQ provides:

  • Managed log aggregation with centralized search and real-time dashboards
  • Pre-built monitoring for MCP servers, tool calls, and error tracking
  • Automatic alerting for performance degradation and error spikes
  • No infrastructure to manage – we handle Elasticsearch, Logstash, Kibana scaling and upgrades

Start your free trial and deploy production ChatGPT apps with enterprise logging in minutes, not weeks.

