Content Moderation Integration for ChatGPT Apps

Content moderation is not optional—it's a critical requirement for responsible ChatGPT application deployment. Whether you're building a customer service bot, educational assistant, or healthcare application, implementing robust content moderation protects your users, your brand, and your legal standing.

The stakes are high. A single unmoderated conversation can expose minors to inappropriate content, violate HIPAA regulations, or create liability under GDPR. Reactive moderation—responding to violations after they occur—leaves your application vulnerable to harm and regulatory penalties. Proactive moderation, implemented correctly, prevents these scenarios before users encounter them.

This guide provides production-ready implementations for content moderation in ChatGPT applications. You'll learn how to integrate OpenAI's Moderation API, build custom filters for industry-specific requirements, validate user inputs, filter AI-generated responses, and maintain comprehensive compliance logs. Each code example is battle-tested and ready for deployment.

Content moderation operates at multiple layers: input validation (preventing harmful prompts), response filtering (catching unsafe AI outputs), and audit logging (maintaining compliance records). The most effective moderation systems combine automated detection with human review workflows, creating defense-in-depth that adapts to evolving threats.

For ChatGPT applications specifically, moderation must account for multi-turn conversations, context windows, and the stochastic nature of language models. A prompt that appears benign in isolation may become problematic when combined with conversation history. Your moderation system must understand this nuance while maintaining sub-100ms response times to preserve user experience.
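
To make the multi-turn point concrete, here is a minimal sketch of context-aware moderation using the official openai Node SDK (the same client the implementations below build on). The helper name and the four-turn window are illustrative choices, not part of any standard API:

import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Moderate the new message together with recent turns, since a message that is
// benign in isolation may be harmful when combined with conversation history.
async function moderateWithContext(
  newMessage: string,
  history: string[],
  maxTurns: number = 4
): Promise<boolean> {
  const contextual = [...history.slice(-maxTurns), newMessage].join('\n');
  const response = await openai.moderations.create({ input: contextual });
  return response.results[0].flagged;
}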

OpenAI Moderation API Integration

The OpenAI Moderation API is your first line of defense, providing real-time classification across categories including hate, hate/threatening, self-harm, sexual, sexual/minors, violence, and violence/graphic (newer moderation models add further categories such as harassment). The API returns a confidence score for each category, allowing you to set custom thresholds based on your application's risk tolerance.

The Moderation API offers strong practical characteristics: low latency (typically well under 100ms), high availability, and no additional cost beyond your existing OpenAI usage. Unlike custom machine learning models that require training data and maintenance, the Moderation API is pre-trained and continuously updated by OpenAI.

Implementation requires two critical decisions: threshold configuration and violation handling. Conservative thresholds (flagging scores above 0.5) minimize false negatives but may reject legitimate content. Permissive thresholds (0.8+) reduce false positives but risk allowing harmful content. Most production applications use category-specific thresholds: strict for sexual/minors (0.3), moderate for hate (0.6), and permissive for violence in gaming contexts (0.8).

Here's a production-ready TypeScript implementation that integrates the Moderation API with proper error handling, caching, and logging:

import OpenAI from 'openai';

interface ModerationResult {
  flagged: boolean;
  categories: {
    [key: string]: boolean;
  };
  categoryScores: {
    [key: string]: number;
  };
  violationDetails?: string;
}

interface ModerationConfig {
  thresholds: {
    hate: number;
    'hate/threatening': number;
    'self-harm': number;
    sexual: number;
    'sexual/minors': number;
    violence: number;
    'violence/graphic': number;
  };
  cacheEnabled: boolean;
  cacheTTL: number; // seconds
}

class ContentModerator {
  private openai: OpenAI;
  private config: ModerationConfig;
  private cache: Map<string, { result: ModerationResult; timestamp: number }>;

  constructor(apiKey: string, config?: Partial<ModerationConfig>) {
    this.openai = new OpenAI({ apiKey });

    // Default configuration with conservative thresholds
    this.config = {
      thresholds: {
        hate: 0.6,
        'hate/threatening': 0.5,
        'self-harm': 0.4,
        sexual: 0.7,
        'sexual/minors': 0.3, // Strictest threshold
        violence: 0.7,
        'violence/graphic': 0.6,
      },
      cacheEnabled: true,
      cacheTTL: 3600, // 1 hour
      ...config,
    };

    this.cache = new Map();
  }

  async moderateContent(content: string): Promise<ModerationResult> {
    // Check cache first
    if (this.config.cacheEnabled) {
      const cached = this.getCached(content);
      if (cached) return cached;
    }

    try {
      const response = await this.openai.moderations.create({
        input: content,
      });

      const result = response.results[0];
      const moderationResult: ModerationResult = {
        flagged: false,
        categories: result.categories,
        categoryScores: result.category_scores,
      };

      // Apply custom thresholds
      const violations: string[] = [];

      for (const [category, score] of Object.entries(result.category_scores)) {
        const threshold = this.config.thresholds[category as keyof typeof this.config.thresholds];

        // Newer moderation models return categories beyond those configured above;
        // skip any category without an explicit threshold.
        if (threshold === undefined) continue;

        if (score > threshold) {
          moderationResult.flagged = true;
          violations.push(`${category} (score: ${score.toFixed(3)})`);
        }
      }

      if (moderationResult.flagged) {
        moderationResult.violationDetails = violations.join(', ');
      }

      // Cache result
      if (this.config.cacheEnabled) {
        this.setCached(content, moderationResult);
      }

      return moderationResult;

    } catch (error) {
      console.error('Moderation API error:', error);

      // Fail closed: flag content as potentially unsafe
      return {
        flagged: true,
        categories: {},
        categoryScores: {},
        violationDetails: 'Moderation service unavailable (fail-safe mode)',
      };
    }
  }

  private getCached(content: string): ModerationResult | null {
    const cacheKey = this.hashContent(content);
    const cached = this.cache.get(cacheKey);

    if (!cached) return null;

    const isExpired = Date.now() - cached.timestamp > this.config.cacheTTL * 1000;
    if (isExpired) {
      this.cache.delete(cacheKey);
      return null;
    }

    return cached.result;
  }

  private setCached(content: string, result: ModerationResult): void {
    const cacheKey = this.hashContent(content);
    this.cache.set(cacheKey, {
      result,
      timestamp: Date.now(),
    });
  }

  private hashContent(content: string): string {
    // Simple non-cryptographic hash for cache keys (prefer crypto.createHash('sha256') in production)
    let hash = 0;
    for (let i = 0; i < content.length; i++) {
      const char = content.charCodeAt(i);
      hash = ((hash << 5) - hash) + char;
      hash = hash & hash; // Convert to 32-bit integer
    }
    return hash.toString(36);
  }
}

export { ContentModerator, ModerationResult, ModerationConfig };

The moderation cache reduces API calls for repeated content (common in multi-turn conversations where users reference previous messages). The fail-closed approach ensures that moderation service outages default to safety rather than permissiveness.

For applications processing high volumes, implement rate limiting and batch moderation. The Moderation API accepts an array of inputs in a single request, reducing round trips for bulk operations. Monitor your moderation metrics: false positive rate, false negative rate (requires manual review), and average processing time.
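
A minimal batching sketch follows; the array form of input is supported by the Moderation API, while the helper name and client parameter are illustrative:

import OpenAI from 'openai';

// Moderate several pending messages in a single request; results come back in input order.
async function moderateBatch(client: OpenAI, messages: string[]): Promise<boolean[]> {
  const response = await client.moderations.create({ input: messages });
  return response.results.map(r => r.flagged);
}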

Custom Filter Implementation

OpenAI's Moderation API provides broad coverage, but industry-specific applications require custom filters. Healthcare applications must detect Protected Health Information (PHI). Legal applications must prevent disclosure of privileged communications. Financial services must flag insider trading discussions. Educational platforms must enforce age-appropriate content policies beyond sexual/minors detection.

Custom filters complement the Moderation API with domain knowledge. While the API identifies general hate speech, a custom filter recognizes industry-specific slurs. While the API detects sexual content, a custom filter enforces your organization's acceptable use policy. The combination creates comprehensive coverage.

Effective custom filters balance three approaches: keyword blacklists (fast, prone to false positives), regex patterns (flexible, requires expertise), and context-aware detection (accurate, computationally expensive). Production systems typically use all three in a cascading pipeline: keyword screening eliminates obvious violations, regex catches pattern-based violations, and context-aware analysis handles nuanced cases.

Here's a production-ready custom filter engine with support for multiple filter types and context-aware detection:

interface FilterRule {
  id: string;
  type: 'keyword' | 'regex' | 'context';
  pattern: string | RegExp;
  category: string;
  severity: 'low' | 'medium' | 'high' | 'critical';
  contextRequired?: boolean;
}

interface FilterResult {
  flagged: boolean;
  matchedRules: Array<{
    ruleId: string;
    category: string;
    severity: string;
    matchedText: string;
    context?: string;
  }>;
  score: number; // 0-100, aggregated severity
}

class CustomFilterEngine {
  private rules: FilterRule[];
  private contextWindow: number = 500; // characters

  constructor(rules: FilterRule[]) {
    this.rules = rules;
  }

  filterContent(content: string, conversationHistory?: string[]): FilterResult {
    const result: FilterResult = {
      flagged: false,
      matchedRules: [],
      score: 0,
    };

    // Build context from conversation history
    const context = conversationHistory
      ? conversationHistory.join(' ').slice(-this.contextWindow)
      : '';

    // Apply keyword filters first (fastest)
    const keywordMatches = this.applyKeywordFilters(content);
    result.matchedRules.push(...keywordMatches);

    // Apply regex filters
    const regexMatches = this.applyRegexFilters(content);
    result.matchedRules.push(...regexMatches);

    // Apply context-aware filters if context available
    if (context) {
      const contextMatches = this.applyContextFilters(content, context);
      result.matchedRules.push(...contextMatches);
    }

    // Calculate aggregate score
    result.score = this.calculateSeverityScore(result.matchedRules);
    result.flagged = result.score > 50 || result.matchedRules.some(m => m.severity === 'critical');

    return result;
  }

  private applyKeywordFilters(content: string): FilterResult['matchedRules'] {
    const matches: FilterResult['matchedRules'] = [];
    const normalizedContent = content.toLowerCase();

    const keywordRules = this.rules.filter(r => r.type === 'keyword');

    for (const rule of keywordRules) {
      const keyword = (rule.pattern as string).toLowerCase();

      if (normalizedContent.includes(keyword)) {
        matches.push({
          ruleId: rule.id,
          category: rule.category,
          severity: rule.severity,
          matchedText: keyword,
        });
      }
    }

    return matches;
  }

  private applyRegexFilters(content: string): FilterResult['matchedRules'] {
    const matches: FilterResult['matchedRules'] = [];

    const regexRules = this.rules.filter(r => r.type === 'regex');

    for (const rule of regexRules) {
      const regex = rule.pattern instanceof RegExp
        ? rule.pattern
        : new RegExp(rule.pattern as string, 'gi');

      const regexMatches = content.match(regex);

      if (regexMatches) {
        matches.push({
          ruleId: rule.id,
          category: rule.category,
          severity: rule.severity,
          matchedText: regexMatches[0],
        });
      }
    }

    return matches;
  }

  private applyContextFilters(content: string, context: string): FilterResult['matchedRules'] {
    const matches: FilterResult['matchedRules'] = [];

    const contextRules = this.rules.filter(r => r.type === 'context' && r.contextRequired);

    for (const rule of contextRules) {
      const pattern = rule.pattern as string;
      const contentMatch = content.toLowerCase().includes(pattern.toLowerCase());
      const contextMatch = context.toLowerCase().includes(pattern.toLowerCase());

      if (contentMatch && contextMatch) {
        matches.push({
          ruleId: rule.id,
          category: rule.category,
          severity: rule.severity,
          matchedText: pattern,
          context: context.slice(0, 100) + '...', // Truncate for logging
        });
      }
    }

    return matches;
  }

  private calculateSeverityScore(matches: FilterResult['matchedRules']): number {
    const severityWeights = {
      low: 10,
      medium: 30,
      high: 60,
      critical: 100,
    };

    const totalScore = matches.reduce((sum, match) => {
      return sum + severityWeights[match.severity as keyof typeof severityWeights];
    }, 0);

    // Cap at 100
    return Math.min(totalScore, 100);
  }

  addRule(rule: FilterRule): void {
    this.rules.push(rule);
  }

  removeRule(ruleId: string): void {
    this.rules = this.rules.filter(r => r.id !== ruleId);
  }

  updateRule(ruleId: string, updates: Partial<FilterRule>): void {
    const ruleIndex = this.rules.findIndex(r => r.id === ruleId);
    if (ruleIndex !== -1) {
      this.rules[ruleIndex] = { ...this.rules[ruleIndex], ...updates };
    }
  }
}

// Example: HIPAA-compliant filter rules
const hipaaFilterRules: FilterRule[] = [
  {
    id: 'ssn-pattern',
    type: 'regex',
    pattern: /\b\d{3}-\d{2}-\d{4}\b/,
    category: 'phi',
    severity: 'critical',
  },
  {
    id: 'medical-record-number',
    type: 'regex',
    pattern: /\b(?:MRN|medical record|patient id)[\s:]+\w+/gi,
    category: 'phi',
    severity: 'critical',
  },
  {
    id: 'diagnosis-disclosure',
    type: 'context', // only flagged when the phrase appears in both the message and recent history
    pattern: 'diagnosed with',
    category: 'phi',
    severity: 'high',
    contextRequired: true,
  },
];

export { CustomFilterEngine, FilterRule, FilterResult, hipaaFilterRules };

This engine supports dynamic rule management, allowing you to add industry-specific filters without redeployment. The severity scoring system aggregates multiple low-severity matches into actionable signals (with the weights above, two "medium" matches push the score past the block threshold of 50).

For healthcare applications, extend the HIPAA rules with medication names, procedure codes, and facility identifiers. For legal applications, add case numbers, client identifiers, and privileged communication markers. For financial services, include account numbers, transaction details, and material non-public information patterns.
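
As an illustration of those extensions, the rule sets below sketch legal and financial variants. The IDs, patterns, and categories are examples to adapt, not vetted compliance rules:

const legalFilterRules: FilterRule[] = [
  {
    id: 'case-number',
    type: 'regex',
    pattern: /\b\d{1,2}:\d{2}-cv-\d{3,5}\b/gi, // e.g. federal docket numbers like 1:21-cv-01234
    category: 'privileged',
    severity: 'high',
  },
  {
    id: 'privilege-marker',
    type: 'keyword',
    pattern: 'attorney-client privileged',
    category: 'privileged',
    severity: 'critical',
  },
];

const financialFilterRules: FilterRule[] = [
  {
    id: 'account-number',
    type: 'regex',
    pattern: /\b\d{9,17}\b/g, // naive account-number pattern; expect false positives
    category: 'pii',
    severity: 'high',
  },
  {
    id: 'mnpi-marker',
    type: 'keyword',
    pattern: 'material non-public',
    category: 'mnpi',
    severity: 'critical',
  },
];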

User Input Validation

Input validation is your first opportunity to prevent harmful interactions. Pre-submission validation provides immediate feedback, reducing frustration and educating users about acceptable content. Effective validation combines real-time client-side checks (instant feedback) with authoritative server-side enforcement (security boundary).

Client-side validation should be lightweight: character count limits, prohibited pattern detection, and basic profanity filtering. Server-side validation performs comprehensive moderation: OpenAI Moderation API, custom filters, and context analysis. Never rely solely on client-side validation—it's trivially bypassed by malicious users.

Progressive disclosure improves user experience when violations occur. Instead of generic "Content blocked" messages, explain the specific violation: "Your message contains medical information. Please remove personally identifiable details." Provide examples of acceptable alternatives. For repeat violations, escalate messaging and consider temporary restrictions.

Here's a production-ready input validator with client-side checks and server-side enforcement:

interface ValidationResult {
  valid: boolean;
  errors: Array<{
    field: string;
    code: string;
    message: string;
    suggestion?: string;
  }>;
  warnings: Array<{
    field: string;
    code: string;
    message: string;
  }>;
}

class InputValidator {
  private moderator: ContentModerator;
  private customFilter: CustomFilterEngine;
  private maxLength: number = 4000; // Conservative per-message cap in characters (model context limits are measured in tokens)
  private minLength: number = 1;

  constructor(moderator: ContentModerator, customFilter: CustomFilterEngine) {
    this.moderator = moderator;
    this.customFilter = customFilter;
  }

  async validateInput(
    userInput: string,
    conversationHistory?: string[]
  ): Promise<ValidationResult> {
    const result: ValidationResult = {
      valid: true,
      errors: [],
      warnings: [],
    };

    // Client-side validations (fast)
    this.validateLength(userInput, result);
    this.validateEncoding(userInput, result);
    this.validateRepetition(userInput, result);

    // Server-side moderation (authoritative)
    const moderationResult = await this.moderator.moderateContent(userInput);

    if (moderationResult.flagged) {
      result.valid = false;
      result.errors.push({
        field: 'content',
        code: 'CONTENT_VIOLATION',
        message: `Content flagged for policy violation: ${moderationResult.violationDetails}`,
        suggestion: 'Please rephrase your message to comply with our content policy.',
      });
    }

    // Custom filter check
    const filterResult = this.customFilter.filterContent(userInput, conversationHistory);

    if (filterResult.flagged) {
      result.valid = false;

      filterResult.matchedRules.forEach(match => {
        result.errors.push({
          field: 'content',
          code: `FILTER_${match.category.toUpperCase()}`,
          message: `Content contains prohibited ${match.category}: "${match.matchedText}"`,
          suggestion: this.getSuggestionForCategory(match.category),
        });
      });
    }

    return result;
  }

  private validateLength(input: string, result: ValidationResult): void {
    if (input.length < this.minLength) {
      result.valid = false;
      result.errors.push({
        field: 'content',
        code: 'TOO_SHORT',
        message: `Input must be at least ${this.minLength} character(s)`,
      });
    }

    if (input.length > this.maxLength) {
      result.valid = false;
      result.errors.push({
        field: 'content',
        code: 'TOO_LONG',
        message: `Input exceeds maximum length of ${this.maxLength} characters`,
        suggestion: 'Please shorten your message or split it into multiple parts.',
      });
    }
  }

  private validateEncoding(input: string, result: ValidationResult): void {
    // Detect suspicious character patterns (base64, hex dumps, etc.)
    const base64Pattern = /^[A-Za-z0-9+/]{50,}={0,2}$/;
    const hexPattern = /^(0x)?[0-9A-Fa-f]{100,}$/;

    if (base64Pattern.test(input.trim()) || hexPattern.test(input.trim())) {
      result.warnings.push({
        field: 'content',
        code: 'SUSPICIOUS_ENCODING',
        message: 'Input appears to be encoded data. Please use plain text.',
      });
    }
  }

  private validateRepetition(input: string, result: ValidationResult): void {
    // Detect excessive character/word repetition (spam indicator)
    const charRepetition = /(.)\1{20,}/;
    const wordRepetition = /\b(\w+)\s+\1(\s+\1){5,}/i;

    if (charRepetition.test(input)) {
      result.valid = false;
      result.errors.push({
        field: 'content',
        code: 'EXCESSIVE_REPETITION',
        message: 'Input contains excessive character repetition',
        suggestion: 'Please provide meaningful content without repeated characters.',
      });
    }

    if (wordRepetition.test(input)) {
      result.warnings.push({
        field: 'content',
        code: 'REPEATED_WORDS',
        message: 'Input contains repeated words',
      });
    }
  }

  private getSuggestionForCategory(category: string): string {
    const suggestions: { [key: string]: string } = {
      phi: 'Please remove any personally identifiable health information such as names, medical record numbers, or diagnoses.',
      pii: 'Please remove personally identifiable information such as social security numbers, addresses, or phone numbers.',
      profanity: 'Please rephrase your message without profanity.',
      threat: 'Please rephrase your message without threatening language.',
    };

    return suggestions[category] || 'Please rephrase your message to comply with our policies.';
  }
}

export { InputValidator, ValidationResult };

This validator provides three-tier feedback: blocking errors (prevent submission), warnings (allow submission with notification), and suggestions (guide users toward compliance). The encoding and repetition checks defend against prompt injection attacks and spam.

Implement rate limiting alongside input validation. Users who trigger multiple violations within a short timeframe may be attempting to probe your moderation system. Exponential backoff (1 minute, 5 minutes, 15 minutes) discourages adversarial behavior while allowing legitimate users to recover from mistakes.
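
A minimal sketch of that backoff policy, tracked per user, is shown below. The class name, in-memory storage, and durations are illustrative; production systems would typically persist this state in Redis or a database:

class ViolationTracker {
  private violations = new Map<string, { count: number; blockedUntil: number }>();
  private backoffMs = [60_000, 300_000, 900_000]; // 1 min, 5 min, 15 min

  recordViolation(userId: string): void {
    const entry = this.violations.get(userId) ?? { count: 0, blockedUntil: 0 };
    entry.count += 1;
    const step = Math.min(entry.count - 1, this.backoffMs.length - 1);
    entry.blockedUntil = Date.now() + this.backoffMs[step];
    this.violations.set(userId, entry);
  }

  isBlocked(userId: string): boolean {
    const entry = this.violations.get(userId);
    return !!entry && Date.now() < entry.blockedUntil;
  }
}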

Response Filtering

AI-generated responses require moderation just as user inputs do. Language models occasionally produce outputs that violate content policies despite safe prompts. Response filtering catches these edge cases before users see them, maintaining consistent safety standards.

Response filtering operates differently from input validation. While user inputs are adversarial (users may intentionally violate policies), model outputs are stochastic (violations are unintentional artifacts of the generation process). Your response filter should err toward allowing borderline content from the model while blocking clear violations.

Implement fallback responses for filtered content. Instead of showing an error, provide a generic response that maintains conversation flow: "I don't have enough information to answer that safely. Could you rephrase your question?" This prevents user confusion and reduces the incentive to probe for unsafe outputs.

Here's a production-ready response filter with fallback handling and detailed logging:

interface ResponseFilterResult {
  allowed: boolean;
  filteredResponse?: string;
  originalResponse: string;
  violationDetails?: string;
  fallbackUsed: boolean;
}

class ResponseFilter {
  private moderator: ContentModerator;
  private customFilter: CustomFilterEngine;
  private fallbackResponses: { [category: string]: string[] };

  constructor(
    moderator: ContentModerator,
    customFilter: CustomFilterEngine
  ) {
    this.moderator = moderator;
    this.customFilter = customFilter;

    this.fallbackResponses = {
      default: [
        "I apologize, but I need more context to provide a helpful response. Could you rephrase your question?",
        "I don't have enough information to answer that safely. Can you provide more details?",
        "Let me clarify your question to ensure I provide accurate information.",
      ],
      phi: [
        "I cannot discuss specific medical information. Please consult with a licensed healthcare provider.",
        "For medical advice, please contact your healthcare provider directly.",
      ],
      legal: [
        "I cannot provide legal advice. Please consult with a licensed attorney.",
        "For legal matters, please contact a qualified legal professional.",
      ],
    };
  }

  async filterResponse(
    response: string,
    userPrompt: string,
    conversationHistory?: string[]
  ): Promise<ResponseFilterResult> {
    const result: ResponseFilterResult = {
      allowed: true,
      originalResponse: response,
      fallbackUsed: false,
    };

    // Apply OpenAI Moderation API
    const moderationResult = await this.moderator.moderateContent(response);

    if (moderationResult.flagged) {
      result.allowed = false;
      result.violationDetails = moderationResult.violationDetails;
      result.filteredResponse = this.getFallbackResponse('default');
      result.fallbackUsed = true;

      // Log for manual review
      this.logViolation({
        type: 'response_moderation',
        response,
        userPrompt,
        violation: moderationResult.violationDetails,
        timestamp: new Date().toISOString(),
      });
    }

    // Apply custom filters
    const filterResult = this.customFilter.filterContent(response, conversationHistory);

    if (filterResult.flagged) {
      result.allowed = false;

      const primaryCategory = filterResult.matchedRules[0]?.category || 'default';
      result.filteredResponse = this.getFallbackResponse(primaryCategory);
      result.fallbackUsed = true;

      result.violationDetails = filterResult.matchedRules
        .map(m => `${m.category}: ${m.matchedText}`)
        .join(', ');

      // Log for manual review
      this.logViolation({
        type: 'response_custom_filter',
        response,
        userPrompt,
        violation: result.violationDetails,
        matchedRules: filterResult.matchedRules,
        timestamp: new Date().toISOString(),
      });
    }

    return result;
  }

  private getFallbackResponse(category: string): string {
    const responses = this.fallbackResponses[category] || this.fallbackResponses.default;

    // Randomize to avoid repetitive fallback messages
    const randomIndex = Math.floor(Math.random() * responses.length);
    return responses[randomIndex];
  }

  private logViolation(violation: any): void {
    // In production, send to logging service (Datadog, CloudWatch, etc.)
    console.error('[RESPONSE_FILTER_VIOLATION]', JSON.stringify(violation, null, 2));

    // Increment metrics for monitoring
    this.incrementMetric('response_filter.violations', {
      type: violation.type,
    });
  }

  private incrementMetric(metricName: string, tags: { [key: string]: string }): void {
    // Placeholder for metrics service integration
    // Example: statsd.increment(metricName, tags);
  }

  addFallbackResponse(category: string, response: string): void {
    if (!this.fallbackResponses[category]) {
      this.fallbackResponses[category] = [];
    }

    this.fallbackResponses[category].push(response);
  }
}

export { ResponseFilter, ResponseFilterResult };

This response filter maintains user experience through contextual fallback responses while logging all violations for manual review. The randomized fallback selection prevents users from identifying filtered responses through repetitive messaging.

Monitor your response filter metrics: violation rate by category, fallback usage frequency, and false positive reports. A high violation rate (>5%) suggests your system prompt needs refinement. A high false positive rate indicates your thresholds are too strict.

For applications with strict compliance requirements, implement human-in-the-loop review. Flag filtered responses for manual review within 24 hours. If review determines the filter was incorrect, add the case to your false positive training set and adjust thresholds accordingly.
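
A lightweight sketch of such a review queue entry is shown below. The shape and helper are hypothetical, and the persistence layer (database, ticketing system) is up to you:

interface ReviewItem {
  id: string;
  contentHash: string; // store a hash, never the raw filtered content
  violationDetails: string;
  flaggedAt: string;
  status: 'pending' | 'upheld' | 'false_positive';
}

function enqueueForReview(queue: ReviewItem[], contentHash: string, details: string): void {
  queue.push({
    id: `rev_${Date.now()}`,
    contentHash,
    violationDetails: details,
    flaggedAt: new Date().toISOString(),
    status: 'pending',
  });
}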

Compliance and Reporting

Comprehensive audit logging transforms content moderation from a defensive measure into a compliance asset. Regulators, legal teams, and trust & safety investigators require detailed records of moderation decisions. Your logging system must capture sufficient detail for investigation while protecting user privacy.

Compliance logging must follow the data retention policies of your jurisdiction and industry. GDPR requires the ability to delete user data on request (right to erasure). HIPAA requires retention of related documentation for six years. Financial services regulations commonly require seven years. Your logging architecture must support all of these obligations simultaneously.

Effective audit logs capture: timestamp, user identifier (anonymized if required), content hash (not plaintext), moderation decision, confidence scores, reviewing system (API vs custom filter), and resolution (blocked, allowed, manual review). Never log raw user content in compliance systems—use content hashes for correlation without exposing sensitive data.

Here's a production-ready compliance logger with privacy-preserving hashing and structured reporting:

import crypto from 'crypto';

interface ModerationEvent {
  eventId: string;
  timestamp: string;
  userId: string; // Anonymized if required
  contentHash: string; // SHA-256 of content
  contentType: 'user_input' | 'ai_response';
  moderationDecision: 'allowed' | 'blocked' | 'manual_review';
  moderationSources: Array<{
    source: 'openai_api' | 'custom_filter';
    flagged: boolean;
    categories: string[];
    scores: { [category: string]: number };
  }>;
  action: 'displayed' | 'blocked' | 'fallback_shown';
  metadata?: {
    conversationId?: string;
    sessionId?: string;
    ipAddress?: string; // Hash if required by GDPR
    userAgent?: string;
  };
}

class ComplianceLogger {
  private retentionPeriodDays: number;
  private anonymizeUsers: boolean;
  private events: ModerationEvent[] = []; // In production, write to database

  constructor(config: {
    retentionPeriodDays: number;
    anonymizeUsers: boolean;
  }) {
    this.retentionPeriodDays = config.retentionPeriodDays;
    this.anonymizeUsers = config.anonymizeUsers;
  }

  logModerationEvent(event: Omit<ModerationEvent, 'eventId' | 'timestamp'>): void {
    const fullEvent: ModerationEvent = {
      eventId: this.generateEventId(),
      timestamp: new Date().toISOString(),
      ...event,
      userId: this.anonymizeUsers ? this.hashUserId(event.userId) : event.userId,
    };

    // In production, write to durable storage
    this.events.push(fullEvent);

    // Also write to real-time monitoring
    this.sendToMonitoring(fullEvent);
  }

  async generateComplianceReport(
    startDate: Date,
    endDate: Date,
    userId?: string
  ): Promise<ComplianceReport> {
    const events = this.events.filter(e => {
      const eventDate = new Date(e.timestamp);
      const inRange = eventDate >= startDate && eventDate <= endDate;
      // Compare against the stored form of the ID (hashed only when anonymization is enabled)
      const matchesUser =
        !userId || e.userId === (this.anonymizeUsers ? this.hashUserId(userId) : userId);

      return inRange && matchesUser;
    });

    const report: ComplianceReport = {
      periodStart: startDate.toISOString(),
      periodEnd: endDate.toISOString(),
      totalEvents: events.length,
      eventsByDecision: this.groupByDecision(events),
      eventsBySource: this.groupBySource(events),
      topCategories: this.getTopCategories(events),
      violationTrends: this.calculateTrends(events),
    };

    return report;
  }

  // Public so callers can hash content when building a ModerationEvent
  hashContent(content: string): string {
    return crypto.createHash('sha256').update(content).digest('hex');
  }

  private hashUserId(userId: string): string {
    // Use HMAC for user ID hashing to enable lookup while protecting privacy
    const secret = process.env.USER_ID_HASH_SECRET || 'change-in-production';
    return crypto.createHmac('sha256', secret).update(userId).digest('hex');
  }

  private generateEventId(): string {
    return `mod_${Date.now()}_${crypto.randomBytes(8).toString('hex')}`;
  }

  private sendToMonitoring(event: ModerationEvent): void {
    // In production, send to monitoring service
    console.log('[MODERATION_EVENT]', {
      decision: event.moderationDecision,
      contentType: event.contentType,
      sources: event.moderationSources.length,
    });
  }

  private groupByDecision(events: ModerationEvent[]): {
    [decision: string]: number;
  } {
    return events.reduce((acc, event) => {
      acc[event.moderationDecision] = (acc[event.moderationDecision] || 0) + 1;
      return acc;
    }, {} as { [decision: string]: number });
  }

  private groupBySource(events: ModerationEvent[]): {
    [source: string]: number;
  } {
    const sourceCount: { [source: string]: number } = {};

    events.forEach(event => {
      event.moderationSources.forEach(source => {
        if (source.flagged) {
          sourceCount[source.source] = (sourceCount[source.source] || 0) + 1;
        }
      });
    });

    return sourceCount;
  }

  private getTopCategories(events: ModerationEvent[]): Array<{
    category: string;
    count: number;
  }> {
    const categoryCount: { [category: string]: number } = {};

    events.forEach(event => {
      event.moderationSources.forEach(source => {
        source.categories.forEach(category => {
          categoryCount[category] = (categoryCount[category] || 0) + 1;
        });
      });
    });

    return Object.entries(categoryCount)
      .map(([category, count]) => ({ category, count }))
      .sort((a, b) => b.count - a.count)
      .slice(0, 10);
  }

  private calculateTrends(events: ModerationEvent[]): {
    [date: string]: number;
  } {
    const dailyCounts: { [date: string]: number } = {};

    events.forEach(event => {
      const date = event.timestamp.split('T')[0]; // YYYY-MM-DD
      dailyCounts[date] = (dailyCounts[date] || 0) + 1;
    });

    return dailyCounts;
  }
}

interface ComplianceReport {
  periodStart: string;
  periodEnd: string;
  totalEvents: number;
  eventsByDecision: { [decision: string]: number };
  eventsBySource: { [source: string]: number };
  topCategories: Array<{ category: string; count: number }>;
  violationTrends: { [date: string]: number };
}

export { ComplianceLogger, ModerationEvent, ComplianceReport };

This logger uses HMAC for user ID hashing, enabling correlation across events while protecting user privacy. The compliance report generator supports regulatory audits with pre-aggregated statistics that don't require accessing raw logs.

Implement automated data retention enforcement. Schedule daily jobs that purge events older than your retention period. For GDPR compliance, implement user data deletion endpoints that remove all moderation events associated with a user ID (using the HMAC hash for lookup).
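
A minimal sketch of both jobs over the ModerationEvent shape defined above follows. In production these would run as queries against your database rather than over an in-memory array, and the function names are illustrative:

// Drop events older than the retention window (run daily).
function purgeExpiredEvents(events: ModerationEvent[], retentionDays: number): ModerationEvent[] {
  const cutoff = Date.now() - retentionDays * 24 * 60 * 60 * 1000;
  return events.filter(e => new Date(e.timestamp).getTime() >= cutoff);
}

// Erase all events for a user; `hashedUserId` is the same HMAC the logger stores.
function deleteUserEvents(events: ModerationEvent[], hashedUserId: string): ModerationEvent[] {
  return events.filter(e => e.userId !== hashedUserId);
}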

Create monitoring dashboards that track moderation KPIs: daily violation rate, category distribution, false positive rate (requires manual review integration), and average moderation latency. Set alerts for anomalies: sudden spikes in violations suggest adversarial attacks, sudden drops suggest filter failures.
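
As a small illustration, the check below compares blocked-event volume in a ComplianceReport against a baseline; the 3x multiplier is an arbitrary example threshold, not a recommendation:

// Returns true when the average daily block count exceeds three times the baseline.
function checkViolationSpike(report: ComplianceReport, baselineDailyBlocks: number): boolean {
  const blocked = report.eventsByDecision['blocked'] ?? 0;
  const days = Math.max(Object.keys(report.violationTrends).length, 1);
  return blocked / days > baselineDailyBlocks * 3;
}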

Incident Response and Appeals

Even perfect moderation systems generate false positives. Users will appeal blocked content, claiming legitimate use cases. Your incident response process must balance user satisfaction with safety standards.

Effective appeals processes have three components: structured submission (users explain why content should be allowed), expert review (trained moderators evaluate context), and transparent communication (users receive detailed explanations of decisions). Most organizations target resolution within 24-48 hours for standard cases and within an hour for time-sensitive business use.

Track appeal outcomes to improve your filters. If 30% of appeals for a specific rule are upheld, the rule is too strict. If 0% are upheld, users may not understand the policy. Target 5-10% appeal approval rate as a signal of appropriate threshold calibration.
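
A small sketch of per-rule appeal tracking to support that calibration is shown below; AppealRecord and the helper are hypothetical shapes, not part of the filter engine above:

interface AppealRecord {
  ruleId: string;
  granted: boolean; // true when the appeal succeeded and the content was allowed
}

// Fraction of appeals against a given rule that were granted (0 to 1).
function appealApprovalRate(appeals: AppealRecord[], ruleId: string): number {
  const forRule = appeals.filter(a => a.ruleId === ruleId);
  if (forRule.length === 0) return 0;
  return forRule.filter(a => a.granted).length / forRule.length;
}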

Conclusion

Content moderation is not a feature—it's a fundamental requirement for responsible AI deployment. The implementation strategies in this guide provide production-ready components for comprehensive moderation: OpenAI's Moderation API for broad threat coverage, custom filters for industry-specific compliance, input validation for user education, response filtering for model safety, and compliance logging for regulatory requirements.

Effective moderation systems are never "done." They require continuous refinement based on new threats, user feedback, and regulatory changes. Allocate 20% of your moderation development time to ongoing improvement: threshold tuning, rule updates, and false positive remediation.

Ready to build ChatGPT applications with enterprise-grade content moderation? MakeAIHQ provides production-ready moderation templates, compliance-ready logging, and industry-specific filter libraries for healthcare, legal, financial services, and education. Our AI Conversational Editor generates moderation-compliant ChatGPT apps in minutes—no coding required.

Start your free trial and deploy safe, compliant ChatGPT applications today. Your users, your legal team, and your regulators will thank you.


About MakeAIHQ: We're building the "Shopify of ChatGPT Apps" - the only no-code platform that transforms business ideas into ChatGPT App Store submissions in 48 hours. Trusted by healthcare providers, legal professionals, and enterprise teams for compliant, production-ready ChatGPT applications.