Incident Response Planning for ChatGPT Apps
In the high-stakes world of ChatGPT applications handling sensitive user data and critical business operations, a security incident isn't a question of "if" but "when." The difference between a minor disruption and a catastrophic breach often comes down to one factor: preparation. Organizations with mature incident response (IR) capabilities contain breaches 54 days faster than those without, reducing average breach costs from $4.45M to $3.05M according to IBM's Cost of a Data Breach Report.
ChatGPT apps face unique incident response challenges: AI model poisoning attacks, prompt injection exploits, training data exfiltration, and multi-tenant isolation failures. Traditional IR frameworks designed for web applications miss these AI-specific threat vectors. This creates blind spots where attackers can operate undetected for weeks while compromising conversation history, extracting proprietary model configurations, or manipulating chatbot responses to spread misinformation.
This guide implements the NIST Incident Response Framework (SP 800-61 Rev. 2) tailored specifically for ChatGPT applications. We'll cover the complete incident lifecycle: Preparation → Detection & Analysis → Containment, Eradication & Recovery → Post-Incident Activity. You'll learn how to build automated detection systems that catch prompt injection attempts in milliseconds, containment playbooks that isolate compromised tenant environments without disrupting other users, and recovery orchestrators that restore service while preserving forensic evidence.
By the end, you'll have production-ready runbooks, incident detection engines with real-time alerting, and automated response workflows that reduce P0 incident MTTD (Mean Time To Detect) from hours to minutes. Whether you're preparing for SOC 2 Type II certification or responding to your first data breach, this framework ensures you're ready for the worst-case scenario.
Understanding Incident Severity Levels
Before building response capabilities, establish clear severity classifications that determine response urgency and resource allocation:
P0 - Critical (15-minute response SLA)
- Complete service outage affecting all users
- Active data breach with confirmed exfiltration
- AI model poisoning with malicious outputs
- Authentication bypass or privilege escalation exploits
P1 - High (1-hour response SLA)
- Partial outage affecting >25% of users
- Suspected data breach (detection without confirmation)
- Successful prompt injection attacks
- Unauthorized access to admin systems
P2 - Medium (4-hour response SLA)
- Degraded performance affecting <25% users
- Security control failures (logging, encryption)
- Anomalous access patterns suggesting reconnaissance
- Compliance violations (data retention, audit logs)
P3 - Low (24-hour response SLA)
- Minor service disruptions
- Security misconfigurations without exploitation
- Policy violations (password requirements, MFA)
P4 - Informational (72-hour response SLA)
- Security recommendations
- Vulnerability disclosures (unconfirmed)
- Threat intelligence updates
These classifications drive escalation paths, notification requirements, and resource allocation. A P0 incident triggers immediate page-outs to on-call engineers, security team, executive leadership, and external IR firms. P3 incidents may only notify the security team during business hours.
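To keep these classifications from living only in a document, many teams encode them as a policy table that alert-routing code can consult. The sketch below is illustrative only; the SEVERITY_POLICY table and shouldPage helper are hypothetical names, and the values simply mirror the SLAs and escalation paths above.
// Hypothetical severity policy table that alert-routing code can consult.
type Severity = 'P0' | 'P1' | 'P2' | 'P3' | 'P4';
interface SeverityPolicy {
  slaMinutes: number;    // time to first response
  pageOnCall: boolean;   // page immediately vs. notify during business hours
  escalation: string[];  // roles to notify, in order
}
const SEVERITY_POLICY: Record<Severity, SeverityPolicy> = {
  P0: { slaMinutes: 15, pageOnCall: true, escalation: ['IC', 'Security Director', 'CTO', 'CEO'] },
  P1: { slaMinutes: 60, pageOnCall: true, escalation: ['IC', 'Security Team', 'Engineering Manager'] },
  P2: { slaMinutes: 240, pageOnCall: false, escalation: ['Security Team', 'Engineering Manager'] },
  P3: { slaMinutes: 1440, pageOnCall: false, escalation: ['Security Team'] },
  P4: { slaMinutes: 4320, pageOnCall: false, escalation: ['Security Team'] },
};
// Example: decide whether an incoming alert should page the on-call engineer right now.
export function shouldPage(severity: Severity): boolean {
  return SEVERITY_POLICY[severity].pageOnCall;
}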
Phase 1: Preparation - Building Response Capabilities
Preparation is the most critical phase of incident response. Organizations that invest in preparation reduce incident costs by 58% compared to reactive approaches. This phase focuses on building three core capabilities: team structure, runbook development, and technical infrastructure.
Incident Response Team Structure
Your IR team should include these roles:
Incident Commander (IC) - Single decision-maker with authority to declare incidents, authorize containment actions, and allocate resources. Typically a senior engineering manager or security director.
Technical Lead - Deep technical expert who analyzes logs, performs forensics, and implements containment measures. Usually a principal engineer or security architect.
Communications Lead - Manages internal/external communications, user notifications, and regulatory reporting. Often from legal or compliance teams.
Scribe - Documents timeline, decisions, and actions in real-time. Creates post-mortem reports and preserves evidence chain of custody.
Subject Matter Experts (SMEs) - On-call rotations for specific domains: AI/ML security, infrastructure, application security, compliance.
Incident Response Runbook
A runbook is your crisis playbook - step-by-step procedures for common incident scenarios. Here's a production-ready runbook template:
# Incident Response Runbook - ChatGPT App Security
# Version: 2.0 | Last Updated: 2026-12-25
# Owner: Security Team | On-Call: security-oncall@company.com
metadata:
severity_levels:
P0:
sla_minutes: 15
escalation: ["IC", "Security Director", "CTO", "CEO"]
notification: ["PagerDuty", "Slack #incidents", "Email executives"]
P1:
sla_minutes: 60
escalation: ["IC", "Security Team", "Engineering Manager"]
notification: ["PagerDuty", "Slack #incidents"]
P2:
sla_minutes: 240
escalation: ["Security Team", "Engineering Manager"]
notification: ["Slack #security", "Email security@"]
P3:
sla_minutes: 1440
escalation: ["Security Team"]
notification: ["Slack #security"]
playbooks:
data_breach_confirmed:
severity: P0
description: "Active data exfiltration detected or confirmed unauthorized data access"
detection_signals:
- Unusual outbound data transfers (>100GB to external IPs)
- Database export commands in audit logs
- Firestore batch reads exceeding normal baseline by 500%
- Alert: "Anomalous data access pattern detected"
response_steps:
1_initial_triage:
duration: "5 minutes"
actions:
- Verify alert is not false positive (check monitoring dashboards)
- Identify affected systems and data scope
- Page Incident Commander via PagerDuty
- Create incident war room (Zoom + Slack channel)
2_immediate_containment:
duration: "10 minutes"
actions:
- Revoke compromised API keys/tokens
- Block source IP addresses at firewall
- Enable aggressive rate limiting (10 req/min per user)
- Isolate affected tenant databases
- Disable compromised user accounts
commands:
- "gcloud compute firewall-rules create block-attacker-ip --action=DENY --source-ranges=<ATTACKER_IP>"
- "firebase auth:export --format=JSON | jq '.users[] | select(.uid==\"<COMPROMISED_UID>\")' | firebase auth:disable"
- "node scripts/revoke-api-keys.js --tenant=<TENANT_ID>"
3_evidence_preservation:
duration: "20 minutes"
actions:
- Snapshot affected Firestore collections
- Export Cloud Function logs (last 30 days)
- Capture network traffic logs from VPC Flow Logs
- Preserve authentication audit trails
commands:
- "gcloud firestore export gs://<BUCKET>/forensics/$(date +%s)"
- "gcloud logging read 'resource.type=cloud_function' --limit=50000 --format=json > logs.json"
- "gsutil -m cp -r gs://<PROJECT>.appspot.com/<TENANT>/* gs://<FORENSICS_BUCKET>/"
4_eradication:
duration: "60 minutes"
actions:
- Rotate all credentials (service accounts, API keys, JWT secrets)
- Patch exploited vulnerability
- Remove malicious code/backdoors
- Reset affected user passwords (force re-authentication)
5_recovery:
duration: "120 minutes"
actions:
- Restore from last known good backup (if data corruption)
- Re-enable services with enhanced monitoring
- Gradual traffic ramp-up (10% → 50% → 100%)
- Verify system integrity (checksums, security scans)
6_notification:
duration: "24 hours"
regulatory_requirements:
GDPR: "72 hours to supervisory authority"
CCPA: "Without unreasonable delay"
HIPAA: "60 days to affected individuals"
actions:
- Draft user notification (reviewed by legal)
- Report to data protection authority (if EU users affected)
- Notify insurance carrier
- Prepare public statement (if media inquiries)
prompt_injection_attack:
severity: P1
description: "Successful prompt injection bypassing security controls"
detection_signals:
- Alert: "Prompt injection pattern detected"
- Unusual system prompts in conversation logs
- Chatbot responses containing training data
- SQL queries or code execution in AI outputs
response_steps:
1_validate_exploit:
duration: "10 minutes"
actions:
- Review flagged conversation in logs
- Attempt to reproduce exploit in sandbox
- Identify injection technique (jailbreak, instruction override)
2_containment:
duration: "15 minutes"
actions:
- Deploy emergency prompt filter update
- Enable strict content filtering (block suspicious patterns)
- Rate-limit affected user/IP
- Log all subsequent requests for forensic analysis
3_remediation:
duration: "60 minutes"
actions:
- Update system prompt with stronger boundaries
- Add input validation rules to block injection patterns
- Implement output sanitization
- Test fixes against known injection techniques
communications:
templates:
user_notification:
subject: "Security Incident Notification - Action Required"
body: |
Dear [User],
We are writing to inform you of a security incident that may have affected your account.
**What Happened:** [Brief description without technical jargon]
**Data Affected:** [Specific data types: email, conversation history, etc.]
**What We're Doing:** [Containment and remediation actions]
**What You Should Do:**
- Reset your password immediately
- Enable two-factor authentication
- Review account activity for unauthorized access
**Timeline:**
- Incident Detected: [Date/Time]
- Containment Completed: [Date/Time]
- Systems Restored: [Date/Time]
We sincerely apologize for this incident and are committed to preventing future occurrences.
For questions, contact: security@company.com
Sincerely,
[Company] Security Team
escalation_matrix:
P0:
immediate: ["Incident Commander", "Security Director", "CTO"]
within_30min: ["CEO", "Legal Counsel", "External IR Firm"]
within_2hrs: ["Board of Directors", "Cyber Insurance"]
P1:
immediate: ["Incident Commander", "Security Team"]
within_1hr: ["Engineering VP", "Legal Counsel"]
P2:
immediate: ["Security Team Lead"]
within_4hrs: ["Engineering Manager"]
tools:
monitoring:
- name: "Google Cloud Monitoring"
alerts: ["Firestore anomalous reads", "Cloud Functions errors >5%", "Authentication failures"]
- name: "Sentry"
alerts: ["Application errors", "Performance degradation"]
forensics:
- name: "Cloud Logging"
retention: "30 days"
exports: "gs://forensics-bucket"
- name: "VPC Flow Logs"
enabled: true
sampling: 1.0
communication:
- name: "PagerDuty"
escalation_policy: "Security Incidents"
- name: "Slack"
channels: ["#incidents", "#security", "#engineering"]
This runbook covers the two most common ChatGPT app incidents: data breaches and prompt injection attacks. Customize the IP addresses, bucket names, and contact details for your organization.
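Because an out-of-date runbook is nearly as dangerous as no runbook, it is worth linting it in CI so SLAs and escalation lists cannot silently drift. Below is a minimal validation sketch, assuming the YAML above is checked into the repo (the runbooks/security.yaml path is a placeholder) and that the js-yaml package is installed.
// CI sanity check for the incident response runbook (illustrative sketch).
import * as fs from 'fs';
import * as yaml from 'js-yaml';
interface SeverityEntry {
  sla_minutes: number;
  escalation: string[];
  notification: string[];
}
// Placeholder path - point this at wherever the runbook lives in your repository.
const runbook = yaml.load(fs.readFileSync('runbooks/security.yaml', 'utf8')) as any;
const severities: Record<string, SeverityEntry> =
  runbook?.metadata?.severity_levels ?? runbook?.severity_levels ?? {};
for (const [level, entry] of Object.entries(severities)) {
  if (!entry.sla_minutes || entry.sla_minutes <= 0) {
    throw new Error(`${level}: sla_minutes must be a positive number`);
  }
  if (!entry.escalation?.length || !entry.notification?.length) {
    throw new Error(`${level}: escalation and notification lists must not be empty`);
  }
}
console.log(`Runbook OK: ${Object.keys(severities).length} severity levels validated`);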
For comprehensive incident response planning, see our ChatGPT App Security Best Practices guide.
Phase 2: Detection and Analysis - Finding Incidents Fast
The average time to detect a breach is 207 days (IBM). For ChatGPT apps, this is catastrophic - attackers can exfiltrate months of conversation history, fine-tune models on stolen data, or maintain persistent backdoors. Automated detection systems reduce MTTD from months to minutes.
Building an Incident Detection Engine
Modern detection relies on behavioral analytics rather than signature-based rules. Instead of looking for known attack patterns (easily evaded), we establish baselines of normal behavior and alert on statistical anomalies.
// Incident Detection Engine for ChatGPT Apps
// Monitors Firestore activity, API usage, and authentication patterns
// Triggers alerts for anomalous behavior using statistical analysis
import { Firestore } from '@google-cloud/firestore';
import { PubSub } from '@google-cloud/pubsub';
import * as admin from 'firebase-admin';
// Initialize the Admin SDK once per process so admin.firestore() in the Cloud Function below works
if (!admin.apps.length) admin.initializeApp();
interface DetectionRule {
name: string;
severity: 'P0' | 'P1' | 'P2' | 'P3';
condition: (context: DetectionContext) => Promise<boolean>;
threshold: number;
windowMinutes: number;
}
interface DetectionContext {
userId: string;
tenantId: string;
metrics: ActivityMetrics;
baseline: BaselineMetrics;
}
interface ActivityMetrics {
firestoreReads: number;
firestoreWrites: number;
apiCalls: number;
authFailures: number;
promptInjections: number;
dataExported: number; // bytes
suspiciousPatterns: string[];
}
interface BaselineMetrics {
avgReads: number;
avgWrites: number;
avgApiCalls: number;
stdDevReads: number;
stdDevWrites: number;
}
class IncidentDetectionEngine {
private db: Firestore;
private pubsub: PubSub;
private rules: DetectionRule[];
constructor() {
this.db = new Firestore();
this.pubsub = new PubSub();
this.rules = this.initializeRules();
}
private initializeRules(): DetectionRule[] {
return [
{
name: 'Anomalous Firestore Reads',
severity: 'P0',
threshold: 5.0, // 5 standard deviations above baseline
windowMinutes: 10,
condition: async (ctx) => {
const zScore = (ctx.metrics.firestoreReads - ctx.baseline.avgReads) /
ctx.baseline.stdDevReads;
return zScore > 5.0 && ctx.metrics.firestoreReads > 1000;
}
},
{
name: 'Mass Data Export',
severity: 'P0',
threshold: 100 * 1024 * 1024 * 1024, // 100 GB
windowMinutes: 60,
condition: async (ctx) => {
return ctx.metrics.dataExported > 100 * 1024 * 1024 * 1024;
}
},
{
name: 'Repeated Authentication Failures',
severity: 'P1',
threshold: 20,
windowMinutes: 5,
condition: async (ctx) => {
return ctx.metrics.authFailures > 20;
}
},
{
name: 'Prompt Injection Attempts',
severity: 'P1',
threshold: 10,
windowMinutes: 10,
condition: async (ctx) => {
return ctx.metrics.promptInjections > 10;
}
},
{
name: 'Privilege Escalation',
severity: 'P0',
threshold: 1,
windowMinutes: 1,
condition: async (ctx) => {
return ctx.metrics.suspiciousPatterns.includes('PRIVILEGE_ESCALATION');
}
}
];
}
async monitorActivity(userId: string, tenantId: string): Promise<void> {
const metrics = await this.collectMetrics(userId, tenantId, 10);
const baseline = await this.getBaseline(userId, tenantId);
const context: DetectionContext = {
userId,
tenantId,
metrics,
baseline
};
for (const rule of this.rules) {
const triggered = await rule.condition(context);
if (triggered) {
await this.createIncident(rule, context);
}
}
}
private async collectMetrics(
userId: string,
tenantId: string,
windowMinutes: number
): Promise<ActivityMetrics> {
const cutoff = new Date(Date.now() - windowMinutes * 60 * 1000);
// Firestore read/write activity
const activityRef = this.db.collection('activity_logs');
const snapshot = await activityRef
.where('userId', '==', userId)
.where('timestamp', '>', cutoff)
.get();
let firestoreReads = 0;
let firestoreWrites = 0;
let apiCalls = 0;
let authFailures = 0;
let promptInjections = 0;
let dataExported = 0;
const suspiciousPatterns: string[] = [];
snapshot.forEach(doc => {
const data = doc.data();
if (data.action === 'firestore_read') firestoreReads++;
if (data.action === 'firestore_write') firestoreWrites++;
if (data.action === 'api_call') apiCalls++;
if (data.action === 'auth_failure') authFailures++;
if (data.action === 'prompt_injection') promptInjections++;
if (data.action === 'export') dataExported += data.bytes || 0;
// Pattern detection
if (data.action === 'privilege_change' && data.newRole === 'admin') {
suspiciousPatterns.push('PRIVILEGE_ESCALATION');
}
if (data.sqlQuery && /DROP|DELETE|UPDATE/i.test(data.sqlQuery)) {
suspiciousPatterns.push('SQL_INJECTION');
}
});
return {
firestoreReads,
firestoreWrites,
apiCalls,
authFailures,
promptInjections,
dataExported,
suspiciousPatterns
};
}
private async getBaseline(userId: string, tenantId: string): Promise<BaselineMetrics> {
// Calculate 30-day baseline from historical data
const thirtyDaysAgo = new Date(Date.now() - 30 * 24 * 60 * 60 * 1000);
const baselineRef = this.db.collection('user_baselines').doc(userId);
const baselineDoc = await baselineRef.get();
if (!baselineDoc.exists) {
// First-time user - use tenant-wide baseline
return this.getTenantBaseline(tenantId);
}
return baselineDoc.data() as BaselineMetrics;
}
private async getTenantBaseline(tenantId: string): Promise<BaselineMetrics> {
const tenantBaselineRef = this.db.collection('tenant_baselines').doc(tenantId);
const doc = await tenantBaselineRef.get();
if (!doc.exists) {
// Default conservative baseline
return {
avgReads: 100,
avgWrites: 50,
avgApiCalls: 200,
stdDevReads: 20,
stdDevWrites: 10
};
}
return doc.data() as BaselineMetrics;
}
private async createIncident(rule: DetectionRule, context: DetectionContext): Promise<void> {
const incidentId = `INC-${Date.now()}`;
const incident = {
id: incidentId,
severity: rule.severity,
status: 'open',
rule: rule.name,
userId: context.userId,
tenantId: context.tenantId,
metrics: context.metrics,
baseline: context.baseline,
detectedAt: admin.firestore.FieldValue.serverTimestamp(),
assignedTo: null
};
// Store incident in Firestore
await this.db.collection('incidents').doc(incidentId).set(incident);
// Publish alert to Pub/Sub for processing
const topic = this.pubsub.topic('security-incidents');
await topic.publishMessage({
json: incident,
attributes: {
severity: rule.severity,
type: 'security_incident'
}
});
// Send immediate notifications for P0/P1
if (rule.severity === 'P0' || rule.severity === 'P1') {
await this.sendPagerDutyAlert(incident);
await this.sendSlackAlert(incident);
}
console.log(`[INCIDENT CREATED] ${incidentId} - ${rule.name} (${rule.severity})`);
}
private async sendPagerDutyAlert(incident: any): Promise<void> {
// Integration with PagerDuty Events API
const payload = {
routing_key: process.env.PAGERDUTY_INTEGRATION_KEY,
event_action: 'trigger',
payload: {
summary: `${incident.severity}: ${incident.rule}`,
severity: incident.severity === 'P0' ? 'critical' : 'error',
source: 'ChatGPT App Security',
custom_details: {
incident_id: incident.id,
user_id: incident.userId,
tenant_id: incident.tenantId,
metrics: incident.metrics
}
}
};
// HTTP POST to PagerDuty (implementation details omitted)
console.log('[PAGERDUTY] Alert sent:', payload);
}
private async sendSlackAlert(incident: any): Promise<void> {
const message = {
channel: '#incidents',
text: `🚨 *${incident.severity} Security Incident Detected*`,
blocks: [
{
type: 'section',
text: {
type: 'mrkdwn',
text: `*Incident ID:* ${incident.id}\n*Rule:* ${incident.rule}\n*Severity:* ${incident.severity}`
}
},
{
type: 'section',
fields: [
{ type: 'mrkdwn', text: `*User ID:*\n${incident.userId}` },
{ type: 'mrkdwn', text: `*Tenant ID:*\n${incident.tenantId}` }
]
}
]
};
// HTTP POST to Slack Webhook (implementation details omitted)
console.log('[SLACK] Alert sent to #incidents:', message);
}
}
// Cloud Function: real-time activity monitor - deploy behind a scheduler (e.g. Cloud Scheduler or firebase-functions pubsub.schedule('every 1 minutes')) so it runs each minute
export const monitorSecurityEvents = async () => {
const engine = new IncidentDetectionEngine();
// Monitor all active users every minute
const usersSnapshot = await admin.firestore()
.collection('users')
.where('status', '==', 'active')
.get();
const promises = usersSnapshot.docs.map(doc => {
const { uid, tenantId } = doc.data();
return engine.monitorActivity(uid, tenantId);
});
await Promise.all(promises);
};
This detection engine runs as a Cloud Function on a 1-minute schedule, analyzing activity patterns for all active users. When anomalies exceed thresholds (5 standard deviations above baseline), it automatically creates incidents and triggers notifications.
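The engine reads per-user baselines from the user_baselines collection, but something has to compute them. One approach is a nightly job that derives daily read/write statistics from the same activity_logs schema; the sketch below follows the BaselineMetrics shape used above, while the refreshUserBaseline name and the 30-day window are assumptions.
// Nightly baseline refresh (illustrative sketch) - computes per-user daily read/write
// averages and standard deviations from activity_logs, matching BaselineMetrics above.
import { Firestore } from '@google-cloud/firestore';
const db = new Firestore();
function meanAndStdDev(values: number[]): { mean: number; stdDev: number } {
  if (values.length === 0) return { mean: 0, stdDev: 1 };
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  const variance = values.reduce((a, b) => a + (b - mean) ** 2, 0) / values.length;
  return { mean, stdDev: Math.sqrt(variance) || 1 };
}
export async function refreshUserBaseline(userId: string): Promise<void> {
  const since = new Date(Date.now() - 30 * 24 * 60 * 60 * 1000);
  const snapshot = await db.collection('activity_logs')
    .where('userId', '==', userId)
    .where('timestamp', '>', since)
    .get();
  // Group read/write counts by calendar day so the baseline reflects daily activity.
  const readsPerDay: Record<string, number> = {};
  const writesPerDay: Record<string, number> = {};
  snapshot.forEach(doc => {
    const data = doc.data();
    const day = data.timestamp.toDate().toISOString().slice(0, 10);
    if (data.action === 'firestore_read') readsPerDay[day] = (readsPerDay[day] || 0) + 1;
    if (data.action === 'firestore_write') writesPerDay[day] = (writesPerDay[day] || 0) + 1;
  });
  const reads = meanAndStdDev(Object.values(readsPerDay));
  const writes = meanAndStdDev(Object.values(writesPerDay));
  await db.collection('user_baselines').doc(userId).set({
    avgReads: reads.mean,
    avgWrites: writes.mean,
    avgApiCalls: 0, // extend with API-call counts as needed
    stdDevReads: reads.stdDev,
    stdDevWrites: writes.stdDev,
  });
}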
For comprehensive security monitoring, integrate with Security Auditing and Logging systems.
Phase 3: Containment - Stopping the Bleeding
Containment has two goals: stop the attack from spreading (short-term) and maintain business operations (long-term). For ChatGPT apps, this often means isolating compromised tenant environments while keeping other tenants operational.
Automated Containment Playbook
// Containment Automation for ChatGPT Apps
// Executes containment actions based on incident severity and type
import * as admin from 'firebase-admin';
import { Firestore } from '@google-cloud/firestore';
// Initialize the Admin SDK once per process so admin.auth() is available
if (!admin.apps.length) admin.initializeApp();
interface ContainmentAction {
type: 'REVOKE_TOKENS' | 'BLOCK_IP' | 'ISOLATE_TENANT' | 'DISABLE_USER' | 'RATE_LIMIT';
target: string;
severity: 'P0' | 'P1' | 'P2';
reversible: boolean;
impactRadius: 'USER' | 'TENANT' | 'GLOBAL';
}
class ContainmentOrchestrator {
private db: Firestore;
private auth: admin.auth.Auth;
constructor() {
this.db = new Firestore();
this.auth = admin.auth();
}
async executeContainment(incidentId: string): Promise<void> {
const incident = await this.getIncident(incidentId);
const actions = this.determineActions(incident);
console.log(`[CONTAINMENT] Executing ${actions.length} actions for ${incidentId}`);
for (const action of actions) {
try {
await this.executeAction(action, incident);
await this.logAction(incidentId, action, 'SUCCESS');
} catch (error) {
await this.logAction(incidentId, action, 'FAILED', error.message);
console.error(`[CONTAINMENT ERROR] ${action.type} failed:`, error);
}
}
}
private async getIncident(incidentId: string): Promise<any> {
const doc = await this.db.collection('incidents').doc(incidentId).get();
if (!doc.exists) throw new Error(`Incident ${incidentId} not found`);
return { id: incidentId, ...doc.data() };
}
private determineActions(incident: any): ContainmentAction[] {
const actions: ContainmentAction[] = [];
// Data breach response
if (incident.rule.includes('Firestore Reads') || incident.rule.includes('Data Export')) {
actions.push({
type: 'DISABLE_USER',
target: incident.userId,
severity: incident.severity,
reversible: true,
impactRadius: 'USER'
});
actions.push({
type: 'REVOKE_TOKENS',
target: incident.userId,
severity: incident.severity,
reversible: false,
impactRadius: 'USER'
});
}
// Brute force authentication
if (incident.rule.includes('Authentication Failures')) {
actions.push({
type: 'BLOCK_IP',
target: incident.metrics.sourceIp,
severity: incident.severity,
reversible: true,
impactRadius: 'GLOBAL'
});
actions.push({
type: 'RATE_LIMIT',
target: incident.userId,
severity: incident.severity,
reversible: true,
impactRadius: 'USER'
});
}
// Tenant-wide compromise
if (incident.severity === 'P0' && incident.metrics.affectedUsers > 10) {
actions.push({
type: 'ISOLATE_TENANT',
target: incident.tenantId,
severity: 'P0',
reversible: true,
impactRadius: 'TENANT'
});
}
return actions;
}
private async executeAction(action: ContainmentAction, incident: any): Promise<void> {
switch (action.type) {
case 'DISABLE_USER':
await this.auth.updateUser(action.target, { disabled: true });
console.log(`[CONTAINMENT] Disabled user: ${action.target}`);
break;
case 'REVOKE_TOKENS':
await this.auth.revokeRefreshTokens(action.target);
console.log(`[CONTAINMENT] Revoked tokens for user: ${action.target}`);
break;
case 'BLOCK_IP':
await this.addFirewallRule(action.target);
console.log(`[CONTAINMENT] Blocked IP: ${action.target}`);
break;
case 'RATE_LIMIT':
await this.db.collection('rate_limits').doc(action.target).set({
limit: 10,
windowSeconds: 60,
enabled: true,
reason: `Incident ${incident.id}`
});
console.log(`[CONTAINMENT] Applied rate limit: ${action.target}`);
break;
case 'ISOLATE_TENANT':
await this.db.collection('tenants').doc(action.target).update({
isolated: true,
isolatedAt: admin.firestore.FieldValue.serverTimestamp(),
isolationReason: incident.id
});
console.log(`[CONTAINMENT] Isolated tenant: ${action.target}`);
break;
}
}
private async addFirewallRule(ipAddress: string): Promise<void> {
// Add IP to a blocked list in Firestore; a separate sync job or request middleware must translate these entries into Cloud Armor policies or application-level blocks
await this.db.collection('blocked_ips').add({
ip: ipAddress,
blockedAt: admin.firestore.FieldValue.serverTimestamp(),
reason: 'Automated containment',
expiresAt: new Date(Date.now() + 24 * 60 * 60 * 1000) // 24 hours
});
}
private async logAction(
incidentId: string,
action: ContainmentAction,
status: 'SUCCESS' | 'FAILED',
error?: string
): Promise<void> {
await this.db.collection('incidents').doc(incidentId).collection('actions').add({
type: action.type,
target: action.target,
status,
error: error || null,
timestamp: admin.firestore.FieldValue.serverTimestamp()
});
}
}
This orchestrator automatically executes containment based on incident type: disables compromised users, blocks malicious IPs, and isolates affected tenants. All actions are logged for audit trails and post-mortem analysis.
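Detection and containment also need to be wired together. A common pattern is a Pub/Sub-triggered function subscribed to the security-incidents topic that the detection engine publishes to, auto-containing only P0/P1 incidents and leaving lower severities for human triage. A sketch, assuming the firebase-functions v1 API:
// Pub/Sub subscriber that auto-contains high-severity incidents (illustrative sketch).
import * as functions from 'firebase-functions';
// ContainmentOrchestrator is the class defined above.
export const autoContain = functions.pubsub
  .topic('security-incidents')
  .onPublish(async (message) => {
    const incident = message.json; // payload published by IncidentDetectionEngine
    if (incident.severity !== 'P0' && incident.severity !== 'P1') {
      return; // lower severities go to human triage via Slack/PagerDuty instead
    }
    const orchestrator = new ContainmentOrchestrator();
    await orchestrator.executeContainment(incident.id);
  });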
For broader security context, see Penetration Testing for ChatGPT Apps.
Phase 4: Eradication and Recovery - Removing Threats and Restoring Service
Containment stops the attack; eradication removes the root cause. Recovery restores normal operations while maintaining enhanced monitoring.
Recovery Orchestration System
// Recovery Orchestrator for ChatGPT Apps
// Manages phased recovery with verification checkpoints
import * as crypto from 'crypto';
import * as admin from 'firebase-admin';
import { Firestore } from '@google-cloud/firestore';
if (!admin.apps.length) admin.initializeApp();
interface RecoveryPhase {
name: string;
order: number;
actions: RecoveryAction[];
verificationSteps: VerificationStep[];
}
interface RecoveryAction {
description: string;
command: () => Promise<void>;
rollback: () => Promise<void>;
}
interface VerificationStep {
name: string;
check: () => Promise<boolean>;
requiredForNextPhase: boolean;
}
class RecoveryOrchestrator {
private db: Firestore;
constructor() {
this.db = new Firestore();
}
async executeRecovery(incidentId: string): Promise<void> {
const phases: RecoveryPhase[] = [
{
name: 'Credential Rotation',
order: 1,
actions: [
{
description: 'Rotate JWT secret',
command: async () => this.rotateJWTSecret(),
rollback: async () => this.restoreJWTSecret()
},
{
description: 'Rotate API keys',
command: async () => this.rotateAPIKeys(),
rollback: async () => this.restoreAPIKeys()
},
{
description: 'Rotate service account keys',
command: async () => this.rotateServiceAccountKeys(),
rollback: async () => this.restoreServiceAccountKeys()
}
],
verificationSteps: [
{
name: 'Verify authentication works with new credentials',
check: async () => this.testAuthentication(),
requiredForNextPhase: true
}
]
},
{
name: 'System Restoration',
order: 2,
actions: [
{
description: 'Re-enable isolated tenants',
command: async () => this.enableTenants(),
rollback: async () => this.disableTenants()
},
{
description: 'Re-enable disabled users',
command: async () => this.enableUsers(),
rollback: async () => this.disableUsers()
},
{
description: 'Restore rate limits to normal',
command: async () => this.restoreRateLimits(),
rollback: async () => this.enforceStrictRateLimits()
}
],
verificationSteps: [
{
name: 'Verify user logins successful',
check: async () => this.testUserLogin(),
requiredForNextPhase: true
},
{
name: 'Verify API responses normal',
check: async () => this.testAPIHealth(),
requiredForNextPhase: true
}
]
},
{
name: 'Traffic Ramp-Up',
order: 3,
actions: [
{
description: 'Enable 10% traffic',
command: async () => this.setTrafficPercentage(10),
rollback: async () => this.setTrafficPercentage(0)
},
{
description: 'Enable 50% traffic',
command: async () => this.setTrafficPercentage(50),
rollback: async () => this.setTrafficPercentage(10)
},
{
description: 'Enable 100% traffic',
command: async () => this.setTrafficPercentage(100),
rollback: async () => this.setTrafficPercentage(50)
}
],
verificationSteps: [
{
name: 'Monitor error rates <1%',
check: async () => this.checkErrorRates(),
requiredForNextPhase: true
},
{
name: 'Monitor latency p95 <500ms',
check: async () => this.checkLatency(),
requiredForNextPhase: true
}
]
}
];
for (const phase of phases.sort((a, b) => a.order - b.order)) {
console.log(`[RECOVERY] Starting phase: ${phase.name}`);
for (const action of phase.actions) {
console.log(`[RECOVERY] Executing: ${action.description}`);
try {
await action.command();
} catch (error) {
console.error(`[RECOVERY ERROR] ${action.description} failed:`, error);
await action.rollback();
throw new Error(`Recovery failed at: ${action.description}`);
}
}
// Verify phase completed successfully
for (const verification of phase.verificationSteps) {
console.log(`[RECOVERY] Verifying: ${verification.name}`);
const passed = await verification.check();
if (!passed && verification.requiredForNextPhase) {
throw new Error(`Verification failed: ${verification.name}`);
}
}
await this.logPhaseCompletion(incidentId, phase.name);
}
console.log('[RECOVERY] Complete - all systems restored');
}
private async rotateJWTSecret(): Promise<void> {
// Generate new JWT secret and update environment variables
const newSecret = crypto.randomBytes(64).toString('hex');
await this.db.collection('config').doc('jwt').set({
secret: newSecret,
rotatedAt: admin.firestore.FieldValue.serverTimestamp()
});
}
private async rotateAPIKeys(): Promise<void> {
// Invalidate old API keys and generate new ones
const apiKeysSnapshot = await this.db.collection('api_keys').get();
const updates = apiKeysSnapshot.docs.map(doc =>
doc.ref.update({
revoked: true,
revokedAt: admin.firestore.FieldValue.serverTimestamp()
})
);
await Promise.all(updates);
}
private async rotateServiceAccountKeys(): Promise<void> {
// Rotate Firebase service account keys (requires GCP API)
console.log('[RECOVERY] Service account rotation requires manual intervention');
}
private async testAuthentication(): Promise<boolean> {
// Test authentication flow with new credentials
try {
const testUser = await admin.auth().getUserByEmail('test@example.com');
const customToken = await admin.auth().createCustomToken(testUser.uid);
return !!customToken;
} catch {
return false;
}
}
private async enableTenants(): Promise<void> {
await this.db.collection('tenants')
.where('isolated', '==', true)
.get()
.then(snapshot => {
const updates = snapshot.docs.map(doc =>
doc.ref.update({ isolated: false })
);
return Promise.all(updates);
});
}
private async enableUsers(): Promise<void> {
const users = await admin.auth().listUsers();
const updates = users.users
.filter(u => u.disabled)
.map(u => admin.auth().updateUser(u.uid, { disabled: false }));
await Promise.all(updates);
}
private async restoreRateLimits(): Promise<void> {
await this.db.collection('rate_limits')
.where('enabled', '==', true)
.get()
.then(snapshot => {
const deletes = snapshot.docs.map(doc => doc.ref.delete());
return Promise.all(deletes);
});
}
private async setTrafficPercentage(percent: number): Promise<void> {
await this.db.collection('config').doc('traffic').set({
percentage: percent,
updatedAt: admin.firestore.FieldValue.serverTimestamp()
});
}
private async checkErrorRates(): Promise<boolean> {
// Query monitoring for error rates
const errorRate = 0.5; // Mock value - integrate with actual monitoring
return errorRate < 1.0;
}
private async checkLatency(): Promise<boolean> {
// Query monitoring for p95 latency
const p95Latency = 350; // Mock value - integrate with actual monitoring
return p95Latency < 500;
}
// Rollback and verification helpers referenced in the phases above - minimal stubs; replace with real implementations
private async restoreJWTSecret(): Promise<void> { /* restore the previous secret from backup config */ }
private async restoreAPIKeys(): Promise<void> { /* un-revoke keys rotated during recovery */ }
private async restoreServiceAccountKeys(): Promise<void> { /* manual step via the GCP console */ }
private async disableTenants(): Promise<void> { /* re-isolate tenants if restoration fails */ }
private async disableUsers(): Promise<void> { /* re-disable users if restoration fails */ }
private async enforceStrictRateLimits(): Promise<void> { /* reapply emergency rate limits */ }
private async testUserLogin(): Promise<boolean> { return this.testAuthentication(); }
private async testAPIHealth(): Promise<boolean> { return true; /* replace with a health-endpoint check */ }
private async logPhaseCompletion(incidentId: string, phaseName: string): Promise<void> {
await this.db.collection('incidents').doc(incidentId).collection('recovery').add({
phase: phaseName,
completedAt: admin.firestore.FieldValue.serverTimestamp()
});
}
}
Recovery follows a phased approach with verification checkpoints. If any phase fails, automated rollback prevents further damage.
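Note that setTrafficPercentage only writes a number to the config/traffic document; the request path has to honor it for the ramp-up to mean anything. A minimal sketch, assuming a deterministic hash of the user ID so the same cohort of users stays admitted as the percentage grows (the admitRequest helper is hypothetical):
// Request gate honoring the config/traffic percentage written during recovery (sketch).
import * as crypto from 'crypto';
import { Firestore } from '@google-cloud/firestore';
const db = new Firestore();
export async function admitRequest(userId: string): Promise<boolean> {
  const doc = await db.collection('config').doc('traffic').get();
  const percentage = doc.exists ? (doc.data()!.percentage as number) : 100;
  // Hash the user ID into a stable bucket (0-99) so ramp-up admits a consistent cohort.
  const bucket = crypto.createHash('sha256').update(userId).digest()[0] % 100;
  return bucket < percentage;
}
// Usage in a request handler: shed load early while traffic is still ramping up, e.g.
// if (!(await admitRequest(userId))) respond with HTTP 503 "Service recovering"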
For vulnerability remediation guidance, see Vulnerability Management for ChatGPT Apps.
Phase 5: Post-Incident Activities - Learning from Failures
Post-mortem reports turn incidents into organizational learning. Without systematic analysis, teams repeat mistakes.
Automated Post-Mortem Generator
// Post-Mortem Report Generator
// Analyzes incident timeline and generates structured reports
import { Firestore } from '@google-cloud/firestore';
interface PostMortemReport {
incidentId: string;
summary: string;
timeline: TimelineEvent[];
rootCause: RootCauseAnalysis;
impact: ImpactAssessment;
actionItems: ActionItem[];
}
interface TimelineEvent {
timestamp: Date;
event: string;
actor: string; // "AUTOMATED" or engineer name
}
interface RootCauseAnalysis {
technique: 'Five Whys' | 'Fishbone' | 'Fault Tree';
findings: string[];
rootCause: string;
}
interface ImpactAssessment {
usersAffected: number;
dataExposed: string;
downtimeMinutes: number;
financialImpact: number;
}
interface ActionItem {
description: string;
owner: string;
priority: 'P0' | 'P1' | 'P2';
dueDate: Date;
status: 'TODO' | 'IN_PROGRESS' | 'DONE';
}
class PostMortemGenerator {
private db: Firestore;
constructor() {
this.db = new Firestore();
}
async generateReport(incidentId: string): Promise<PostMortemReport> {
const incident = await this.getIncidentDetails(incidentId);
const timeline = await this.buildTimeline(incidentId);
const rootCause = await this.analyzeRootCause(incident, timeline);
const impact = await this.assessImpact(incident);
const actionItems = this.generateActionItems(incident, rootCause);
const report: PostMortemReport = {
incidentId,
summary: this.generateSummary(incident),
timeline,
rootCause,
impact,
actionItems
};
// Save report to Firestore
await this.db.collection('post_mortems').doc(incidentId).set(report);
// Generate human-readable document
const markdown = this.formatAsMarkdown(report);
console.log(markdown);
return report;
}
private async getIncidentDetails(incidentId: string): Promise<any> {
const doc = await this.db.collection('incidents').doc(incidentId).get();
return { id: incidentId, ...doc.data() };
}
private async buildTimeline(incidentId: string): Promise<TimelineEvent[]> {
const actionsSnapshot = await this.db
.collection('incidents')
.doc(incidentId)
.collection('actions')
.orderBy('timestamp', 'asc')
.get();
return actionsSnapshot.docs.map(doc => {
const data = doc.data();
return {
timestamp: data.timestamp.toDate(),
event: `${data.type}: ${data.target}`,
actor: data.actor || 'AUTOMATED'
};
});
}
private async analyzeRootCause(incident: any, timeline: TimelineEvent[]): Promise<RootCauseAnalysis> {
// Five Whys analysis - the chain below is an illustrative example; responders fill it in during the post-mortem review
const findings = [
`Why did ${incident.rule} trigger? ${incident.metrics.firestoreReads} reads exceeded baseline by 500%`,
'Why did reads spike? Attacker executed batch export query',
'Why was batch export allowed? Missing rate limiting on export endpoint',
'Why was rate limiting missing? Export feature shipped without security review',
'Why no security review? Release process lacks security gate for API changes'
];
return {
technique: 'Five Whys',
findings,
rootCause: 'Missing security review gate in release process allowed vulnerable export endpoint to reach production'
};
}
private async assessImpact(incident: any): Promise<ImpactAssessment> {
return {
usersAffected: incident.metrics.affectedUsers || 0,
dataExposed: 'Email addresses, conversation history (2,341 records)',
downtimeMinutes: incident.recoveryDurationMinutes || 0,
financialImpact: this.calculateFinancialImpact(incident)
};
}
private calculateFinancialImpact(incident: any): number {
// Simplified cost model
const downtimeCost = (incident.recoveryDurationMinutes / 60) * 5000; // $5K/hour
const breachCost = incident.metrics.affectedUsers * 150; // $150/user (IBM average)
const regulatoryCost = incident.severity === 'P0' ? 50000 : 0; // GDPR fine risk
return downtimeCost + breachCost + regulatoryCost;
}
private generateActionItems(incident: any, rootCause: RootCauseAnalysis): ActionItem[] {
return [
{
description: 'Add security review gate to CI/CD pipeline',
owner: 'Security Team',
priority: 'P0',
dueDate: new Date(Date.now() + 7 * 24 * 60 * 60 * 1000), // 7 days
status: 'TODO'
},
{
description: 'Implement rate limiting on all export endpoints',
owner: 'Backend Team',
priority: 'P0',
dueDate: new Date(Date.now() + 3 * 24 * 60 * 60 * 1000), // 3 days
status: 'TODO'
},
{
description: 'Add export activity monitoring to detection rules',
owner: 'Security Team',
priority: 'P1',
dueDate: new Date(Date.now() + 14 * 24 * 60 * 60 * 1000), // 14 days
status: 'TODO'
}
];
}
private generateSummary(incident: any): string {
return `On ${incident.detectedAt.toDate().toISOString()}, automated detection systems identified ${incident.rule} (${incident.severity}). The incident was contained within ${incident.containmentDurationMinutes} minutes and fully recovered after ${incident.recoveryDurationMinutes} minutes. Root cause analysis identified gaps in security review processes.`;
}
private formatAsMarkdown(report: PostMortemReport): string {
return `# Post-Mortem Report: ${report.incidentId}
## Summary
${report.summary}
## Impact Assessment
- **Users Affected:** ${report.impact.usersAffected}
- **Data Exposed:** ${report.impact.dataExposed}
- **Downtime:** ${report.impact.downtimeMinutes} minutes
- **Financial Impact:** $${report.impact.financialImpact.toLocaleString()}
## Timeline
${report.timeline.map(e => `- **${e.timestamp.toISOString()}**: ${e.event} (${e.actor})`).join('\n')}
## Root Cause Analysis (${report.rootCause.technique})
${report.rootCause.findings.map((f, i) => `${i + 1}. ${f}`).join('\n')}
**Root Cause:** ${report.rootCause.rootCause}
## Action Items
${report.actionItems.map(a => `- [ ] **[${a.priority}]** ${a.description} (@${a.owner}, due ${a.dueDate.toLocaleDateString()})`).join('\n')}
---
*Generated automatically by Post-Mortem Generator*
`;
}
}
This generator analyzes incidents using Five Whys methodology, calculates financial impact, and produces actionable remediation plans.
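The generator is typically invoked when an incident is moved to a closed state; the wrapper below is a hypothetical usage example (onIncidentClosed is not part of the generator itself) showing how open action items can be surfaced for follow-up.
// Hypothetical wrapper: generate a post-mortem when an incident closes and list open action items.
async function onIncidentClosed(incidentId: string): Promise<void> {
  const generator = new PostMortemGenerator();
  const report = await generator.generateReport(incidentId);
  const open = report.actionItems.filter(item => item.status !== 'DONE');
  console.log(`${report.incidentId}: ${open.length} open action items`);
  for (const item of open) {
    console.log(`- [${item.priority}] ${item.description} (owner: ${item.owner}, due ${item.dueDate.toLocaleDateString()})`);
  }
}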
For compliance requirements, see GDPR Data Breach Notification and SOC 2 Certification for ChatGPT Apps.
Advanced Incident Response Capabilities
Communication Template Engine
// Automated communication templates for incident notifications
interface CommunicationTemplate {
audience: 'USERS' | 'REGULATORS' | 'EXECUTIVES' | 'MEDIA';
subject: string;
body: string;
variables: string[];
}
class CommunicationEngine {
private templates: Map<string, CommunicationTemplate>;
constructor() {
this.templates = new Map<string, CommunicationTemplate>([
['USER_BREACH_NOTIFICATION', {
audience: 'USERS',
subject: 'Important Security Update - Action Required',
body: `Dear {{userName}},
We are writing to inform you of a security incident that affected your account on {{incidentDate}}.
**What Happened:**
{{incidentDescription}}
**Data Affected:**
{{dataTypes}}
**What We've Done:**
- Immediately contained the incident on {{containmentDate}}
- Conducted thorough investigation
- Implemented additional security measures
- Notified relevant authorities
**What You Should Do:**
1. Reset your password immediately using this link: {{resetLink}}
2. Enable two-factor authentication in your account settings
3. Review recent account activity for unauthorized access
4. Monitor your accounts for suspicious activity
**Timeline:**
- Incident Detected: {{detectionDate}}
- Containment: {{containmentDate}}
- Investigation Complete: {{investigationDate}}
- User Notification: {{notificationDate}}
We sincerely apologize for this incident. Your security is our top priority, and we are committed to preventing future occurrences.
For questions or concerns, please contact our security team at {{supportEmail}} or call {{supportPhone}}.
Sincerely,
{{companyName}} Security Team`,
variables: ['userName', 'incidentDate', 'incidentDescription', 'dataTypes',
'containmentDate', 'resetLink', 'detectionDate', 'investigationDate',
'notificationDate', 'supportEmail', 'supportPhone', 'companyName']
}],
['REGULATOR_NOTIFICATION', {
audience: 'REGULATORS',
subject: 'Data Breach Notification - {{companyName}}',
body: `To: {{regulatorName}}
From: {{companyName}} Data Protection Officer
Date: {{notificationDate}}
Re: Personal Data Breach Notification (GDPR Article 33)
**1. Nature of the Breach:**
{{breachDescription}}
**2. Categories and Approximate Number of Data Subjects:**
- Data subjects affected: {{dataSubjectsCount}}
- Personal data records affected: {{recordsCount}}
**3. Categories of Personal Data Concerned:**
{{dataCategories}}
**4. Contact Point:**
{{dpoName}}, Data Protection Officer
Email: {{dpoEmail}}
Phone: {{dpoPhone}}
**5. Likely Consequences:**
{{consequences}}
**6. Measures Taken:**
{{mitigationMeasures}}
**7. Cross-Border Implications:**
{{crossBorderImpact}}
We remain available to provide additional information as needed.
Sincerely,
{{dpoName}}
Data Protection Officer`,
variables: ['regulatorName', 'companyName', 'notificationDate', 'breachDescription',
'dataSubjectsCount', 'recordsCount', 'dataCategories', 'dpoName',
'dpoEmail', 'dpoPhone', 'consequences', 'mitigationMeasures',
'crossBorderImpact']
}]
]);
}
async renderTemplate(
templateName: string,
variables: Record<string, string>
): Promise<string> {
const template = this.templates.get(templateName);
if (!template) throw new Error(`Template ${templateName} not found`);
let rendered = template.body;
for (const [key, value] of Object.entries(variables)) {
rendered = rendered.replace(new RegExp(`{{${key}}}`, 'g'), value);
}
return rendered;
}
}
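renderTemplate does not check that every placeholder was supplied, so it is worth scanning the rendered text for leftover {{...}} markers before anything is sent. A usage sketch with placeholder values:
// Usage sketch: render a breach notification and refuse to send if placeholders remain.
async function renderBreachNotice(): Promise<string> {
  const engine = new CommunicationEngine();
  const body = await engine.renderTemplate('USER_BREACH_NOTIFICATION', {
    userName: 'Jane Doe',                 // placeholder values throughout
    incidentDate: '2025-01-15',
    incidentDescription: 'Unauthorized access to conversation history',
    dataTypes: 'Email address, conversation history',
    containmentDate: '2025-01-15 14:32 UTC',
    resetLink: 'https://example.com/reset',
    detectionDate: '2025-01-15 14:05 UTC',
    investigationDate: '2025-01-17',
    notificationDate: '2025-01-18',
    supportEmail: 'security@example.com',
    supportPhone: '+1-555-0100',
    companyName: 'Example Inc.',
  });
  if (/\{\{\w+\}\}/.test(body)) {
    throw new Error('Notification still contains unresolved placeholders - not sending');
  }
  return body;
}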
Incident Timeline Tracker
// Real-time incident timeline tracking for post-mortem analysis
import * as admin from 'firebase-admin';
import { Firestore } from '@google-cloud/firestore';
// TimelineEvent is the interface defined alongside the post-mortem generator above
class IncidentTimelineTracker {
private db: Firestore;
constructor() {
this.db = new Firestore();
}
async logEvent(
incidentId: string,
event: string,
actor: string,
metadata?: any
): Promise<void> {
await this.db
.collection('incidents')
.doc(incidentId)
.collection('timeline')
.add({
event,
actor,
metadata: metadata || {},
timestamp: admin.firestore.FieldValue.serverTimestamp()
});
}
async getTimeline(incidentId: string): Promise<TimelineEvent[]> {
const snapshot = await this.db
.collection('incidents')
.doc(incidentId)
.collection('timeline')
.orderBy('timestamp', 'asc')
.get();
return snapshot.docs.map(doc => ({
timestamp: doc.data().timestamp.toDate(),
event: doc.data().event,
actor: doc.data().actor
}));
}
async calculateMTTD(incidentId: string): Promise<number> {
const timeline = await this.getTimeline(incidentId);
const detectionEvent = timeline.find(e => e.event.includes('DETECTED'));
const occurrenceEvent = timeline[0]; // First event is typically the attack
if (!detectionEvent || !occurrenceEvent) return -1;
return (detectionEvent.timestamp.getTime() - occurrenceEvent.timestamp.getTime()) / 1000 / 60; // minutes
}
async calculateMTTR(incidentId: string): Promise<number> {
const timeline = await this.getTimeline(incidentId);
const detectionEvent = timeline.find(e => e.event.includes('DETECTED'));
const recoveryEvent = timeline.find(e => e.event.includes('RECOVERED'));
if (!detectionEvent || !recoveryEvent) return -1;
return (recoveryEvent.timestamp.getTime() - detectionEvent.timestamp.getTime()) / 1000 / 60; // minutes
}
}
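The tracker's metrics depend on the event names it finds, so responders and automation need to log consistently named milestones. A usage sketch, assuming DETECTED/RECOVERED as the milestone keywords the MTTD/MTTR helpers look for (the other event strings are conventions, not requirements of the class):
// Usage sketch: record milestones during a response and report MTTD/MTTR afterwards.
async function recordIncidentLifecycle(incidentId: string): Promise<void> {
  const tracker = new IncidentTimelineTracker();
  await tracker.logEvent(incidentId, 'ATTACK_SUSPECTED: anomalous export traffic', 'AUTOMATED');
  await tracker.logEvent(incidentId, 'DETECTED: Mass Data Export rule triggered', 'AUTOMATED');
  await tracker.logEvent(incidentId, 'CONTAINED: tenant isolated, tokens revoked', 'alice@company.com');
  await tracker.logEvent(incidentId, 'RECOVERED: traffic restored to 100%', 'bob@company.com');
  const mttd = await tracker.calculateMTTD(incidentId);
  const mttr = await tracker.calculateMTTR(incidentId);
  console.log(`MTTD: ${mttd.toFixed(1)} min, MTTR: ${mttr.toFixed(1)} min`);
}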
Conclusion: From Reactive to Proactive Security
Effective incident response transforms security from a cost center into a competitive advantage. Organizations with mature IR capabilities experience 58% lower breach costs, contain breaches roughly 54 days faster (per the IBM figures cited in the introduction), and earn measurably higher customer trust.
This guide has equipped you with production-ready tools: automated detection engines that catch anomalies in minutes, containment orchestrators that isolate threats without business disruption, recovery systems that restore service with verification checkpoints, and post-mortem generators that turn failures into organizational learning.
Key takeaways:
- Preparation is everything - Runbooks, team structures, and automation built before incidents occur reduce response time by 10x
- Automation scales expertise - Detection engines and containment orchestrators execute expert-level responses faster than manual processes
- Incidents are learning opportunities - Systematic post-mortems with Five Whys analysis prevent repeat failures
- Compliance is continuous - GDPR's 72-hour notification window requires always-ready response capabilities (a deadline helper is sketched after this list)
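For that GDPR point, the 72-hour clock starts when the organization becomes aware of the breach, so surfacing the deadline the moment an incident is opened keeps the obligation visible. A minimal sketch (the notificationDeadline helper is hypothetical):
// Compute the GDPR Article 33 notification deadline from the detection time (sketch).
export function notificationDeadline(detectedAt: Date): { deadline: Date; hoursRemaining: number } {
  const deadline = new Date(detectedAt.getTime() + 72 * 60 * 60 * 1000);
  const hoursRemaining = Math.max(0, (deadline.getTime() - Date.now()) / (60 * 60 * 1000));
  return { deadline, hoursRemaining };
}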
Next steps:
- Implement the incident detection engine and configure baseline thresholds for your application
- Customize the incident response runbook with your team contacts, escalation paths, and tooling references (keep live credentials in a secrets manager, never in the runbook itself)
- Deploy the containment orchestrator and test with simulated incidents (tabletop exercises; see the drill sketch after this list)
- Schedule quarterly incident response drills to validate runbooks and train team members
- Integrate with existing monitoring (Google Cloud Monitoring, Sentry) and alerting (PagerDuty, Slack) systems
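For the simulated-incident item above, a drill script that injects synthetic activity and verifies the detection pipeline end to end usefully complements discussion-based tabletop exercises. A sketch, reusing the activity_logs and incidents collections from this guide; the drill-user-001 ID is a placeholder and should never collide with a real tenant.
// Drill script sketch: inject synthetic anomalous activity, run detection, verify an incident opens.
import { Firestore } from '@google-cloud/firestore';
// IncidentDetectionEngine is the class from the detection section above.
const db = new Firestore();
export async function runDetectionDrill(): Promise<void> {
  const drillUser = 'drill-user-001';   // placeholder - keep drill identities out of real tenants
  const drillTenant = 'drill-tenant';
  // 1. Inject a burst of reads far above any plausible baseline.
  //    Firestore batches cap at 500 writes, so the burst is written in chunks.
  const totalEvents = 1200;
  for (let start = 0; start < totalEvents; start += 500) {
    const batch = db.batch();
    for (let i = start; i < Math.min(start + 500, totalEvents); i++) {
      batch.set(db.collection('activity_logs').doc(), {
        userId: drillUser,
        tenantId: drillTenant,
        action: 'firestore_read',
        timestamp: new Date(),
      });
    }
    await batch.commit();
  }
  // 2. Run the detection engine against the drill user.
  const engine = new IncidentDetectionEngine();
  await engine.monitorActivity(drillUser, drillTenant);
  // 3. Verify an incident was opened.
  const incidents = await db.collection('incidents')
    .where('userId', '==', drillUser)
    .where('status', '==', 'open')
    .get();
  console.log(incidents.empty ? 'DRILL FAILED: no incident created' : 'DRILL PASSED: incident opened');
}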
For comprehensive security implementation, explore our related guides on Data Encryption for ChatGPT Apps, Security Auditing and Logging, and ChatGPT App Security Best Practices.
Ready to build bulletproof incident response for your ChatGPT app? Start your free trial and deploy production-ready security in 48 hours with MakeAIHQ's automated compliance tools.
Built with the crisis management precision that Harold Finch would demand.