ChatGPT App Testing & QA: Unit Tests, Integration Tests & MCP Inspector

Testing is the difference between a ChatGPT app that works perfectly in development and one that crashes in production. With OpenAI's increasingly strict approval standards, comprehensive QA isn't optional—it's mandatory for first-submission approval.

This guide covers every testing methodology you need to ship production-ready ChatGPT apps: from unit testing individual tools to end-to-end testing complete conversation flows using MCP Inspector.

Part 1: ChatGPT App Testing Fundamentals

What Makes ChatGPT App Testing Different?

ChatGPT apps face three testing challenges that traditional web apps don't:

1. Non-Deterministic AI Behavior. The model may produce different tool calls for the same user input depending on context, conversation history, and model variance. Your testing strategy must account for this unpredictability (see the assertion sketch after this list).

2. Stateful Multi-Turn Conversations. ChatGPT maintains conversation history across multiple user messages. A tool call in message 5 might depend on context from messages 1-4, so testing must validate entire conversation flows, not just individual API calls.

3. OpenAI Approval Requirements. Unlike traditional apps, ChatGPT apps go through manual human review by OpenAI. Your QA process must verify compliance with 12+ technical requirements before submission.
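
Because tool selection varies from run to run, assertions about model behavior should be tolerant rather than exact. A minimal sketch (Jest, using the hypothetical conversation harness introduced in Part 4):

it('should call a booking-related tool for a booking request', async () => {
  const response = await conversation.send('Book me a yoga class tomorrow morning');

  // Assert that the relevant tool was called, without pinning exact order or count
  expect(response.toolCalls.map(call => call.name)).toEqual(
    expect.arrayContaining(['searchClasses'])
  );

  // Match intent in the reply rather than exact wording, which varies between runs
  expect(response.content).toMatch(/yoga|class/i);
});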

Why MCP Inspector Is Essential

The Model Context Protocol (MCP) Inspector is your primary testing tool. It simulates how ChatGPT calls your MCP server, letting you validate tool definitions, test tool handlers, and catch errors before OpenAI reviews your app. MCP Inspector is the official developer tool maintained by the Model Context Protocol project for inspecting and testing MCP servers before they go to production.

MCP Inspector replaces manual testing. Instead of manually triggering tools in ChatGPT, you can systematically test:

  • Tool registration and metadata
  • Handler error cases
  • Response format validation
  • Performance benchmarks (sub-4k token responses)
  • Widget rendering
  • OAuth token validation
  • Error recovery paths

Without MCP Inspector, you're guessing whether your app actually works. This testing gap is why 60% of ChatGPT app submissions are rejected on the first attempt.

MCP Inspector works alongside the OpenAI Apps SDK, letting you exercise the same requirements OpenAI reviewers check during app approval. You catch compliance issues in minutes instead of waiting days for a manual rejection.


Part 2: Unit Testing MCP Server Tools

Unit testing focuses on individual tool handlers in isolation—testing a single tool with various inputs to verify correct behavior.

Tool Handler Testing Strategy

Each tool in your MCP server should have unit tests covering:

1. Happy Path (Expected Behavior)

// Example: searchClasses tool for fitness studio app
describe('searchClasses tool', () => {
  it('should return available classes when querying by date and time', async () => {
    const handler = createToolHandler('searchClasses');

    const result = await handler({
      date: '2025-12-26',
      time: '10:00',
      classType: 'yoga'
    });

    expect(result).toHaveProperty('structuredContent');
    expect(result.structuredContent).toHaveProperty('classes');
    expect(result.structuredContent.classes.length).toBeGreaterThan(0);
    expect(result.structuredContent.classes[0]).toHaveProperty('id');
    expect(result.structuredContent.classes[0]).toHaveProperty('name');
    expect(result.structuredContent.classes[0]).toHaveProperty('availableSpots');
  });
});

2. Edge Cases (Boundary Conditions)

it('should handle queries with no available classes', async () => {
  const handler = createToolHandler('searchClasses');

  const result = await handler({
    date: '2025-01-01', // New Year, all classes full
    time: '15:00'
  });

  expect(result).toHaveProperty('structuredContent');
  expect(result.structuredContent.classes).toEqual([]);
  expect(result.content).toContain('No classes available');
});

it('should reject invalid date format', async () => {
  const handler = createToolHandler('searchClasses');

  // Await the assertion so a non-rejecting handler fails the test
  await expect(handler({
    date: 'invalid-date',
    time: '10:00'
  })).rejects.toThrow('Invalid date format');
});

3. Error Handling (Network Failures, Third-Party API Issues)

it('should gracefully handle Mindbody API timeout', async () => {
  // Mock Mindbody API to simulate timeout
  jest.spyOn(global, 'fetch').mockRejectedValueOnce(new Error('timeout'));

  const handler = createToolHandler('searchClasses');
  const result = await handler({
    date: '2025-12-26',
    time: '10:00'
  });

  // Reaching this point means the handler resolved instead of throwing
  expect(result).toHaveProperty('content');
  expect(result.content).toContain('unable to fetch available classes');
});

4. Performance Benchmarks

it('should return response in under 2 seconds', async () => {
  const handler = createToolHandler('searchClasses');

  const startTime = performance.now();
  await handler({
    date: '2025-12-26',
    time: '10:00'
  });
  const endTime = performance.now();

  expect(endTime - startTime).toBeLessThan(2000);
});

it('should keep structured content under 4000 tokens', async () => {
  const handler = createToolHandler('searchClasses');

  const result = await handler({
    date: '2025-12-26',
    time: '10:00'
  });

  // Rough token estimate (1 token ≈ 4 characters)
  const contentTokenCount = JSON.stringify(result.structuredContent).length / 4;
  expect(contentTokenCount).toBeLessThan(4000);
});

Tool Parameter Validation

Every tool must validate input parameters before processing. Unit tests should verify validation works:

it('should validate required parameters', async () => {
  const handler = createToolHandler('bookClass');

  // Missing required 'classId'
  await expect(handler({ userId: '123' }))
    .rejects.toThrow('Missing required parameter: classId');
});

it('should validate parameter types', async () => {
  const handler = createToolHandler('bookClass');

  // classId should be number, not string
  await expect(handler({
    classId: 'not-a-number',
    userId: '123'
  })).rejects.toThrow('Parameter "classId" must be a number');
});
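
These tests assume the handler validates its input before doing any work. A minimal hand-rolled sketch of that validation (the error messages match the tests above; a schema library such as zod works equally well):

// Hypothetical validator called at the top of the bookClass handler
function validateBookClassInput(input) {
  if (input.classId === undefined) {
    throw new Error('Missing required parameter: classId');
  }
  if (input.userId === undefined) {
    throw new Error('Missing required parameter: userId');
  }
  if (typeof input.classId !== 'number') {
    throw new Error('Parameter "classId" must be a number');
  }
  return input;
}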

Test Coverage Targets

For production ChatGPT apps targeting OpenAI approval:

  • Minimum 80% code coverage for all tool handlers
  • 100% coverage for authentication/authorization logic
  • 100% coverage for error paths (no unhandled exceptions)

Use a code coverage tool to track:

npm test -- --coverage

# Expected output:
# Statements   : 85.3% ( 427/501 )
# Branches     : 82.1% ( 312/380 )
# Functions    : 87.4% ( 139/159 )
# Lines        : 85.9% ( 430/501 )
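
To keep these targets from silently regressing, enforce them in the test runner itself. A minimal Jest sketch (jest.config.js; the src/ and src/auth/ paths are assumptions about your layout):

// jest.config.js
module.exports = {
  collectCoverage: true,
  collectCoverageFrom: ['src/**/*.js'],
  coverageThreshold: {
    global: {
      statements: 80,
      branches: 80,
      functions: 80,
      lines: 80
    },
    // Stricter bar for auth/authorization code
    './src/auth/': {
      statements: 100,
      branches: 100,
      functions: 100,
      lines: 100
    }
  }
};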

Part 3: Integration Testing with MCP Inspector

Integration tests verify that your MCP server works correctly as a complete system—all tools together, with real dependencies (databases, third-party APIs).

Setting Up MCP Inspector

MCP Inspector is the official testing tool for MCP servers. Install and configure it:

npm install -D @modelcontextprotocol/inspector

# Add to package.json scripts:
# "test:mcp": "mcp-inspector http://localhost:3000/mcp"

Running MCP Inspector

# Start your MCP server
npm run dev:server

# In another terminal, start MCP Inspector
npm run test:mcp

# MCP Inspector opens at http://localhost:5000
# You'll see an interactive interface to test your tools

MCP Inspector Testing Workflow

Step 1: Tool Discovery. MCP Inspector automatically discovers all tools registered with your server. Verify each tool appears:

Available Tools:
✅ searchClasses
✅ bookClass
✅ cancelBooking
✅ getMembershipStatus

Step 2: Interactive Tool Testing. Use the MCP Inspector UI to manually test each tool:

  1. Click on tool (e.g., "searchClasses")
  2. Enter input parameters in JSON format:
{
  "date": "2025-12-26",
  "time": "10:00",
  "classType": "yoga"
}
  3. Inspect the full response:
{
  "structuredContent": {
    "classes": [
      { "id": 1, "name": "Morning Vinyasa", "spots": 5 }
    ]
  },
  "content": "Found 1 yoga class available",
  "_meta": { "executionTime": 245 }
}

Automated Integration Tests with MCP Inspector

// integration.test.js
// Illustrative client wrapper; substitute the MCP client SDK your project actually uses
const mcpClient = require('@modelcontextprotocol/client-node');

describe('MCP Server Integration Tests', () => {
  let client;

  beforeAll(async () => {
    client = new mcpClient({
      url: 'http://localhost:3000/mcp'
    });
    await client.connect();
  });

  describe('searchClasses tool', () => {
    it('should return proper MCP response structure', async () => {
      const response = await client.callTool('searchClasses', {
        date: '2025-12-26',
        time: '10:00'
      });

      // Verify MCP response structure
      expect(response).toHaveProperty('structuredContent');
      expect(response).toHaveProperty('content');
      expect(response).toHaveProperty('_meta');

      // Verify structuredContent is valid
      expect(response.structuredContent).toBeInstanceOf(Object);
      expect(typeof response.content).toBe('string'); // string primitives are not String instances
      expect(response._meta).toHaveProperty('executionTime');
    });

    it('should format response as valid HTML when requested', async () => {
      const response = await client.callTool('searchClasses', {
        date: '2025-12-26',
        time: '10:00',
        format: 'html'
      });

      // Response should be valid HTML that can render in ChatGPT
      expect(response.content).toMatch(/^<div|^<section|^<article/);
      expect(response.content).toContain('</');
    });
  });

  describe('Cross-Tool Workflows', () => {
    it('should handle booking flow: search → book → confirm', async () => {
      // Step 1: Search for classes
      const searchResponse = await client.callTool('searchClasses', {
        date: '2025-12-26',
        time: '10:00'
      });

      expect(searchResponse.structuredContent.classes.length).toBeGreaterThan(0);
      const classId = searchResponse.structuredContent.classes[0].id;

      // Step 2: Book the class
      const bookResponse = await client.callTool('bookClass', {
        classId: classId,
        userId: 'test-user-123'
      });

      expect(bookResponse.content).toContain('successfully booked');
      expect(bookResponse.structuredContent).toHaveProperty('confirmation');

      // Step 3: Verify booking by checking membership status
      const statusResponse = await client.callTool('getMembershipStatus', {
        userId: 'test-user-123'
      });

      expect(statusResponse.structuredContent.upcomingClasses).toContainEqual(
        expect.objectContaining({ id: classId })
      );
    });
  });

  afterAll(async () => {
    await client.disconnect();
  });
});

MCP Inspector Validation Checklist

Before submitting to OpenAI, validate these items in MCP Inspector:

## MCP Inspector Pre-Submission Checklist

- [ ] All tools appear in tool discovery list
- [ ] Each tool has accurate name and description
- [ ] Tool parameters match documentation
- [ ] Happy path returns valid MCP response structure
- [ ] Edge cases handled gracefully
- [ ] Error messages are user-friendly (not stack traces)
- [ ] Response tokens stay under 4000 token limit
- [ ] Response time under 2 seconds for all tools
- [ ] Widget/HTML responses render properly in ChatGPT UI
- [ ] No console errors in MCP Inspector logs
- [ ] Authenticated tools properly validate OAuth tokens

Part 4: End-to-End Testing

End-to-end (E2E) testing simulates real users interacting with your ChatGPT app through ChatGPT's interface. This is closest to how OpenAI will test your app during review.

E2E Testing Strategy

E2E tests verify complete user journeys:

Fitness Studio Booking E2E Test:

describe('Fitness Studio ChatGPT App - E2E', () => {
  it('member should book a yoga class through natural conversation', async () => {
    const conversation = new ChatGPTConversation({
      appId: 'fitness-studio-app'
    });

    // User asks to book a class
    const message1 = 'Can you help me book a yoga class tomorrow morning?';
    const response1 = await conversation.send(message1);

    // ChatGPT should call searchClasses tool
    expect(response1.toolCalls).toContainEqual(
      expect.objectContaining({
        name: 'searchClasses',
        params: expect.objectContaining({ classType: 'yoga' })
      })
    );

    // Response should show available classes
    expect(response1.content).toMatch(/available yoga classes/i);
    expect(response1.structuredContent.classes.length).toBeGreaterThan(0);

    // User selects a class
    const message2 = 'I want to book the 10am Vinyasa Flow class';
    const response2 = await conversation.send(message2);

    // ChatGPT should call bookClass tool
    expect(response2.toolCalls).toContainEqual(
      expect.objectContaining({ name: 'bookClass' })
    );

    // Verify booking confirmation
    expect(response2.content).toContain('successfully booked');
  });

  it('should handle booking conflicts gracefully', async () => {
    const conversation = new ChatGPTConversation({
      appId: 'fitness-studio-app',
      userId: 'user-with-conflict'
    });

    const message = 'Book me for the 10am yoga class on Friday';
    const response = await conversation.send(message);

    // Should handle the conflict without crashing (the await above would reject otherwise)
    expect(response).toBeDefined();
    expect(response.content).toMatch(/conflict|already booked|unavailable/i);
  });
});

Real Conversation Testing with ngrok

For true E2E testing, deploy your MCP server with ngrok and test in actual ChatGPT:

# Install ngrok
npm install -g ngrok

# Start your MCP server locally
npm run dev:server

# In another terminal, expose to internet with ngrok
ngrok http 3000

# This generates a public HTTPS URL (on the free tier it changes each time the tunnel restarts):
# https://xxxx-xx-xxx-xxx-xx.ngrok.io

# Add to ChatGPT developer mode:
# 1. Open ChatGPT
# 2. Settings → Developer → Edit connectors
# 3. Paste ngrok URL: https://xxxx-xx-xxx-xxx-xx.ngrok.io/mcp
# 4. Test conversation in ChatGPT UI

Conversation Flow Testing Framework

Create a framework to test common conversation patterns:

// conversationFlows.test.js
const testFlows = {
  'happy_path_booking': {
    steps: [
      {
        userMessage: 'Book me a yoga class tomorrow at 10am',
        expectedToolCall: 'searchClasses',
        expectedResponse: /classes available|no classes/
      },
      {
        userMessage: 'I want the first option',
        expectedToolCall: 'bookClass',
        expectedResponse: /successfully booked|already booked/
      }
    ]
  },

  'error_recovery': {
    steps: [
      {
        userMessage: 'Book me for a class at an invalid time: 25:00',
        expectedError: true,
        expectedResponse: /invalid time|between 5am and 10pm/
      },
      {
        userMessage: 'How about 9am instead?',
        expectedToolCall: 'searchClasses',
        expectedResponse: /classes available/
      }
    ]
  }
};

describe('Conversation Flows', () => {
  Object.entries(testFlows).forEach(([flowName, flow]) => {
    it(`should handle ${flowName} correctly`, async () => {
      const conversation = new ChatGPTConversation();

      for (const step of flow.steps) {
        const response = await conversation.send(step.userMessage);

        if (step.expectedToolCall) {
          expect(response.toolCalls[0].name).toBe(step.expectedToolCall);
        }

        expect(response.content).toMatch(step.expectedResponse);
      }
    });
  });
});
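
The ChatGPTConversation class used throughout this part is a test harness, not an OpenAI API. One way to stub it for CI, under the assumption that each scripted step names the tool call to replay against the MCP client from Part 3:

// Hypothetical test double: replays scripted tool calls against the real MCP server
class ChatGPTConversation {
  constructor({ script = [] } = {}) {
    this.script = script; // one entry per user message: { toolName, params, reply }
    this.turn = 0;
  }

  async send(userMessage) {
    const step = this.script[this.turn++] || {};
    const toolCalls = [];
    let structuredContent = null;

    if (step.toolName) {
      // client is the MCP client instance created in the integration tests
      const result = await client.callTool(step.toolName, step.params);
      toolCalls.push({ name: step.toolName, params: step.params });
      structuredContent = result.structuredContent;
    }

    return {
      toolCalls,
      structuredContent,
      content: step.reply || `Handled: ${userMessage}`
    };
  }
}

A stub like this keeps conversation-flow assertions runnable in CI; real model behavior still needs spot-checking in ChatGPT itself via the ngrok setup above.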

Part 5: Performance Testing & Optimization

OpenAI has strict performance requirements. ChatGPT can't wait for slow API responses while users watch.

Response Time Benchmarks

Critical Performance Targets:

  • Tool execution: Under 2 seconds
  • Response formatting: Under 500ms
  • Widget rendering: Under 1 second
  • Total tool call to display: Under 3 seconds

describe('Performance Benchmarks', () => {
  it('searchClasses should complete in under 2 seconds', async () => {
    const iterations = 10;
    const times = [];

    for (let i = 0; i < iterations; i++) {
      const start = performance.now();
      await client.callTool('searchClasses', {
        date: '2025-12-26',
        time: '10:00'
      });
      const end = performance.now();
      times.push(end - start);
    }

    const avgTime = times.reduce((a, b) => a + b) / times.length;
    const maxTime = Math.max(...times);

    console.log(`Average: ${avgTime.toFixed(0)}ms, Max: ${maxTime.toFixed(0)}ms`);

    expect(avgTime).toBeLessThan(1500); // 1.5s average
    expect(maxTime).toBeLessThan(2000); // 2s max
  });
});

Load Testing (Spike Tolerance)

ChatGPT may send multiple simultaneous requests. Test your server's behavior under load:

describe('Load Testing', () => {
  it('should handle 10 concurrent requests', async () => {
    const promises = [];

    for (let i = 0; i < 10; i++) {
      promises.push(
        client.callTool('searchClasses', {
          date: '2025-12-26',
          time: '10:00'
        })
      );
    }

    const results = await Promise.all(promises);

    expect(results).toHaveLength(10);
    results.forEach(result => {
      expect(result).toHaveProperty('structuredContent');
    });
  });
});

Token Limit Validation

Every response must stay under 4000 tokens:

function estimateTokens(text) {
  // Rough estimate: 1 token ≈ 4 characters
  return Math.ceil(text.length / 4);
}

describe('Token Limits', () => {
  it('responses should stay under 4000 tokens', async () => {
    const response = await client.callTool('searchClasses', {
      date: '2025-12-26',
      time: '10:00'
    });

    const contentTokens = estimateTokens(JSON.stringify(response.structuredContent));
    expect(contentTokens).toBeLessThan(4000);
  });
});

Part 6: OpenAI Approval QA Checklist

Before submitting your ChatGPT app to OpenAI, run through this complete QA checklist:

Critical Compliance Checks

## OpenAI Approval QA Checklist

### Functionality
- [ ] All tools work correctly in MCP Inspector
- [ ] Happy path conversations complete end-to-end
- [ ] Error handling prevents crashes
- [ ] No unhandled exceptions in logs
- [ ] Performance under 2s per tool call

### MCP Protocol Compliance
- [ ] Tool metadata includes name, description, parameters
- [ ] Responses include structuredContent, content, _meta
- [ ] Widget responses use mimeType: "text/html+skybridge"
- [ ] No custom fonts (system fonts only)
- [ ] Max 2 primary CTAs per card

### OpenAI UX/UI Standards
- [ ] Inline widgets don't exceed 4000 tokens
- [ ] No nested scrolling in cards
- [ ] No more than 3 levels of navigation
- [ ] Contrast ratios meet WCAG AA standards
- [ ] Mobile responsive
- [ ] Alt text for all images

### Security & Auth
- [ ] OAuth 2.1 with PKCE properly implemented
- [ ] Access tokens verified on every request
- [ ] No API keys exposed in frontend
- [ ] HTTPS enforced
- [ ] CORS headers correct

### Data & Privacy
- [ ] Privacy policy linked
- [ ] GDPR compliant (if EU users)
- [ ] PII not logged
- [ ] Data retention policy clear
- [ ] User consent obtained for data collection

### Testing Coverage
- [ ] 80%+ code coverage
- [ ] All error paths tested
- [ ] MCP Inspector validates all tools
- [ ] E2E tests for main workflows
- [ ] Performance benchmarks met

Pre-Submission Testing Workflow

Week 1: Unit Testing

  • Write and run unit tests for all tools
  • Achieve 80%+ code coverage
  • Fix all test failures

Week 2: Integration Testing

  • Test with MCP Inspector
  • Test multi-step workflows
  • Load test under concurrent requests

Week 3: E2E Testing

  • Test in actual ChatGPT (via ngrok)
  • Run through all user journeys
  • Test edge cases and error scenarios

Week 4: OpenAI Compliance

  • Run through QA checklist
  • Fix any compliance issues
  • Create test report document

Week 5: Final Review

  • Have peer review test your app
  • Fix any discovered issues
  • Generate final test report

Part 7: Common Testing Pitfalls & Solutions

Pitfall 1: Only Testing Happy Path

Problem: Tests only validate the ideal scenario, missing edge cases that crash in production.

Solution: Test error cases systematically:

// Don't do this:
it('should search for classes', async () => {
  const result = await searchClasses({ date: '2025-12-26', time: '10:00' });
  expect(result.classes.length).toBeGreaterThan(0);
});

// Do this instead:
it('should search for classes when available', async () => { ... });
it('should return empty when no classes available', async () => { ... });
it('should handle invalid date format', async () => { ... });
it('should handle API timeout', async () => { ... });
it('should handle authentication error', async () => { ... });

Pitfall 2: Ignoring Token Limits

Problem: Response works in testing but exceeds 4000 tokens in production, breaking ChatGPT rendering.

Solution: Add token validation to every test:

const tokens = estimateTokens(JSON.stringify(response));
expect(tokens).toBeLessThan(4000);
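
When a response does exceed the budget, trim it server-side instead of hoping the model copes. A minimal sketch that drops trailing results until the payload fits (estimateTokens is the helper from Part 5; the classes field is an example):

// Hypothetical trimming step applied before returning structuredContent
function fitToTokenBudget(structuredContent, maxTokens = 4000) {
  const trimmed = { ...structuredContent, classes: [...(structuredContent.classes || [])] };

  while (
    estimateTokens(JSON.stringify(trimmed)) > maxTokens &&
    trimmed.classes.length > 0
  ) {
    trimmed.classes.pop();    // drop the least relevant result first
    trimmed.truncated = true; // let the widget show a "see more" hint
  }

  return trimmed;
}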

Pitfall 3: Not Testing Structured Content Rendering

Problem: Data structure is valid JSON but doesn't render properly in ChatGPT UI.

Solution: Test actual widget rendering:

const rendered = renderWidget(response.structuredContent);
expect(rendered).not.toContain('[object Object]');
expect(rendered.querySelectorAll('button').length).toBeLessThanOrEqual(2);

Pitfall 4: Assuming ChatGPT Behavior

Problem: Assuming ChatGPT will always call your tools in expected order, breaking when model behavior changes.

Solution: Design tools to be order-independent:

// Wrong: assumes getPreviousBooking was called first and reads its result from shared state
function cancelBooking() { /* uses lastFetchedBookingId from a previous call */ }

// Right: Booking ID is explicit parameter
function cancelBooking(bookingId) { ... }

Part 8: CI/CD Integration for Testing

Automate testing in your deployment pipeline:

# .github/workflows/test.yml
name: ChatGPT App CI/CD

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - uses: actions/setup-node@v3
        with:
          node-version: '20'

      - run: npm install

      - run: npm run test -- --coverage
        env:
          CI: true

      # test:mcp assumes the MCP server is reachable; start it in an earlier step or have the script boot it
      - run: npm run test:mcp

      - run: npm run build

      - name: Generate Test Report
        if: always()
        run: |
          echo "## Test Results" >> $GITHUB_STEP_SUMMARY
          echo "✅ Unit Tests: Passed" >> $GITHUB_STEP_SUMMARY
          echo "✅ Integration Tests: Passed" >> $GITHUB_STEP_SUMMARY
          echo "✅ Coverage: 85%" >> $GITHUB_STEP_SUMMARY

Part 9: Authentication & OAuth Testing

ChatGPT apps that require user authentication must implement OAuth 2.1 with PKCE. Testing OAuth flows is critical for security and user experience.

OAuth Token Validation Testing

const jwt = require('jsonwebtoken'); // used to mint test tokens

describe('OAuth Token Validation', () => {
  it('should reject expired access tokens', async () => {
    const expiredToken = jwt.sign(
      { sub: 'user-123' },
      'secret',
      { expiresIn: '1s' }
    );

    // Wait for token to expire
    await new Promise(resolve => setTimeout(resolve, 1100));

    await expect(client.callTool('bookClass', {
      classId: 1,
      accessToken: expiredToken
    })).rejects.toThrow('Token expired');
  });

  it('should verify token issuer and audience', async () => {
    const malformedToken = jwt.sign(
      {
        sub: 'user-123',
        iss: 'wrong-issuer',
        aud: 'wrong-audience'
      },
      'secret'
    );

    await expect(client.callTool('bookClass', {
      classId: 1,
      accessToken: malformedToken
    })).rejects.toThrow('Invalid token issuer or audience');
  });

  it('should validate token signature', async () => {
    const tamperedToken = jwt.sign(
      { sub: 'user-456' }, // Different user
      'wrong-secret'
    );

    await expect(client.callTool('bookClass', {
      classId: 1,
      accessToken: tamperedToken
    })).rejects.toThrow('Invalid token signature');
  });
});
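
These tests only pass if the server actually checks expiry, issuer, audience, and signature on every call. A minimal verification sketch using jsonwebtoken (the secret, issuer, and audience values are placeholders):

const jwt = require('jsonwebtoken');

// Hypothetical check run before any authenticated tool handler executes
function verifyAccessToken(accessToken) {
  try {
    return jwt.verify(accessToken, process.env.JWT_SECRET, {
      issuer: 'https://auth.example.com', // placeholder issuer
      audience: 'fitness-studio-mcp'      // placeholder audience
    });
  } catch (err) {
    if (err.name === 'TokenExpiredError') throw new Error('Token expired');
    if (err.message === 'invalid signature') throw new Error('Invalid token signature');
    throw new Error('Invalid token issuer or audience');
  }
}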

OAuth Flow E2E Testing

describe('OAuth 2.1 PKCE Flow', () => {
  it('should complete authorization code flow with PKCE', async () => {
    // Step 1: Generate code verifier
    const codeVerifier = generateRandomString(128);
    const codeChallenge = base64UrlEncode(
      await crypto.subtle.digest('SHA-256', new TextEncoder().encode(codeVerifier))
    );

    // Step 2: Simulate user clicking "authorize" on OAuth provider
    const authCode = await simulateOAuthAuthorization({
      codeChallenge: codeChallenge,
      clientId: 'test-client-id'
    });

    expect(authCode).toBeDefined();

    // Step 3: Exchange code for token
    const tokenResponse = await exchangeOAuthCode({
      code: authCode,
      codeVerifier: codeVerifier,
      clientId: 'test-client-id'
    });

    expect(tokenResponse).toHaveProperty('access_token');
    expect(tokenResponse.access_token).toBeDefined();

    // Step 4: Use token to call protected tool
    const response = await client.callTool('bookClass', {
      classId: 1,
      accessToken: tokenResponse.access_token
    });

    expect(response.content).toContain('successfully booked');
  });

  it('should fail OAuth flow if code verifier invalid', async () => {
    const codeVerifier = generateRandomString(128);
    const codeChallenge = base64UrlEncode(
      await crypto.subtle.digest('SHA-256', new TextEncoder().encode(codeVerifier))
    );

    const authCode = await simulateOAuthAuthorization({
      codeChallenge: codeChallenge
    });

    // Try to exchange with wrong code verifier
    await expect(exchangeOAuthCode({
      code: authCode,
      codeVerifier: generateRandomString(128), // Different verifier
      clientId: 'test-client-id'
    })).rejects.toThrow('Invalid code verifier');
  });
});
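
The generateRandomString and base64UrlEncode helpers above are left undefined. In a Node test environment (Node 16+), the built-in crypto module covers both; the SHA-256 helper below is an equivalent of the crypto.subtle + base64UrlEncode combination used in the tests:

const crypto = require('crypto');

// Code verifier: high-entropy random string, 43-128 characters (RFC 7636)
function generateRandomString(length = 128) {
  return crypto.randomBytes(length).toString('base64url').slice(0, length);
}

// Code challenge: base64url-encoded SHA-256 digest of the verifier
function createCodeChallenge(codeVerifier) {
  return crypto.createHash('sha256').update(codeVerifier).digest('base64url');
}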

Testing Unauthenticated vs Authenticated Endpoints

describe('Authentication Requirements', () => {
  it('should allow searchClasses without authentication', async () => {
    const response = await client.callTool('searchClasses', {
      date: '2025-12-26',
      time: '10:00'
    });

    expect(response).toHaveProperty('structuredContent');
  });

  it('should require authentication for bookClass', async () => {
    await expect(client.callTool('bookClass', {
      classId: 1
      // Missing accessToken
    })).rejects.toThrow('Authentication required');
  });

  it('should restrict bookings to authenticated user', async () => {
    const token1 = await getAuthToken('user-1');
    const token2 = await getAuthToken('user-2');

    // User 1 books a class
    await client.callTool('bookClass', {
      classId: 1,
      accessToken: token1
    });

    // User 2 tries to retrieve User 1's booking
    const bookings = await client.callTool('getUserBookings', {
      accessToken: token2
    });

    expect(bookings.structuredContent).not.toContainEqual(
      expect.objectContaining({ id: 1 })
    );
  });
});

Scopes and Permissions Testing

If your OAuth implementation uses scopes:

describe('OAuth Scopes', () => {
  it('should allow read-only operations with read scope', async () => {
    const readOnlyToken = await getAuthToken('user-123', ['read']);

    // Read operations should work
    const response = await client.callTool('searchClasses', {
      accessToken: readOnlyToken
    });
    expect(response).toHaveProperty('structuredContent');
  });

  it('should reject write operations without write scope', async () => {
    const readOnlyToken = await getAuthToken('user-123', ['read']);

    // Write operations should fail
    await expect(client.callTool('bookClass', {
      classId: 1,
      accessToken: readOnlyToken
    })).rejects.toThrow('Insufficient permissions');
  });

  it('should allow all operations with admin scope', async () => {
    const adminToken = await getAuthToken('admin-123', ['admin']);

    // Both read and write should work
    const search = await client.callTool('searchClasses', {
      accessToken: adminToken
    });
    expect(search).toHaveProperty('structuredContent');

    const book = await client.callTool('bookClass', {
      classId: 1,
      accessToken: adminToken
    });
    expect(book.content).toContain('successfully booked');
  });
});

Part 10: Database & Persistence Testing

If your MCP server uses a database (Firebase, PostgreSQL, MongoDB), you need database-specific tests.

Testing Database Transactions

describe('Database Transactions', () => {
  it('should prevent double-booking of same class', async () => {
    const userId1 = 'user-1';
    const userId2 = 'user-2';
    const classId = 1;

    // Simulate concurrent booking attempts
    const [result1, result2] = await Promise.all([
      client.callTool('bookClass', {
        classId: classId,
        userId: userId1
      }),
      client.callTool('bookClass', {
        classId: classId,
        userId: userId2
      })
    ]);

    // One should succeed, one should fail
    const successCount = [result1, result2].filter(
      r => r.content.includes('successfully booked')
    ).length;

    expect(successCount).toBe(1); // Only one booking succeeded
  });

  it('should rollback partial bookings on error', async () => {
    const result = await client.callTool('bookClassAndChargeCard', {
      classId: 1,
      userId: 'user-with-invalid-card',
      cardToken: 'invalid-token'
    });

    // Payment should fail
    expect(result.content).toContain('payment failed');

    // Class should NOT be booked
    const bookings = await client.callTool('getUserBookings', {
      userId: 'user-with-invalid-card'
    });

    expect(bookings.structuredContent.bookings).not.toContainEqual(
      expect.objectContaining({ classId: 1 })
    );
  });
});
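
The double-booking test above only passes if the booking write is atomic. A minimal sketch of one way to achieve that with PostgreSQL row locking via node-postgres (table and column names are illustrative):

// Hypothetical atomic booking: lock the class row, check capacity, then insert
async function bookClassAtomically(pool, classId, userId) {
  const db = await pool.connect();
  try {
    await db.query('BEGIN');

    const { rows } = await db.query(
      'SELECT capacity, enrolled FROM classes WHERE id = $1 FOR UPDATE',
      [classId]
    );
    if (rows.length === 0 || rows[0].enrolled >= rows[0].capacity) {
      await db.query('ROLLBACK');
      return { success: false, reason: 'class full or not found' };
    }

    await db.query('UPDATE classes SET enrolled = enrolled + 1 WHERE id = $1', [classId]);
    await db.query('INSERT INTO bookings (class_id, user_id) VALUES ($1, $2)', [classId, userId]);

    await db.query('COMMIT');
    return { success: true };
  } catch (err) {
    await db.query('ROLLBACK');
    throw err;
  } finally {
    db.release();
  }
}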

Database Consistency Testing

describe('Database Consistency', () => {
  it('should maintain referential integrity', async () => {
    // Book a class
    await client.callTool('bookClass', {
      classId: 1,
      userId: 'user-123'
    });

    // Delete the class
    await deleteClass(1);

    // User's booking should be cleaned up or show as deleted
    const bookings = await client.callTool('getUserBookings', {
      userId: 'user-123'
    });

    const classExists = bookings.structuredContent.bookings.some(
      b => b.classId === 1
    );

    expect(classExists).toBe(false);
  });

  it('should maintain data consistency under load', async () => {
    const promises = [];

    // 100 concurrent bookings
    for (let i = 0; i < 100; i++) {
      promises.push(
        client.callTool('bookClass', {
          classId: 1,
          userId: `user-${i}`
        })
      );
    }

    await Promise.all(promises);

    // Verify total bookings = 100 (or less if class capacity limited)
    const totalBookings = await getTotalBookingsForClass(1);
    expect(totalBookings).toBeLessThanOrEqual(100);
    expect(totalBookings).toBeGreaterThan(0);
  });
});

Part 11: Third-Party API Integration Testing

Most ChatGPT apps integrate with third-party APIs (Mindbody, Stripe, OpenTable, etc.). Testing these integrations is crucial.

API Integration Testing with Mocks

describe('Mindbody API Integration', () => {
  beforeEach(() => {
    // Mock Mindbody API responses
    jest.spyOn(mindbodyClient, 'getClasses').mockResolvedValue([
      {
        id: 1,
        name: 'Vinyasa Flow',
        instructorId: 5,
        startTime: '2025-12-26T10:00:00Z',
        capacity: 20,
        enrolled: 18
      }
    ]);
  });

  it('should map Mindbody API response to ChatGPT widget format', async () => {
    const response = await client.callTool('searchClasses', {
      date: '2025-12-26',
      time: '10:00'
    });

    expect(response.structuredContent.classes[0]).toEqual(
      expect.objectContaining({
        id: 1,
        name: 'Vinyasa Flow',
        instructor: expect.stringContaining('instructor'),
        availableSpots: 2 // 20 capacity - 18 enrolled
      })
    );
  });

  it('should handle Mindbody API errors gracefully', async () => {
    mindbodyClient.getClasses.mockRejectedValueOnce(
      new Error('API rate limit exceeded')
    );

    const response = await client.callTool('searchClasses', {
      date: '2025-12-26',
      time: '10:00'
    });

    expect(response.content).toContain('temporarily unavailable');
    expect(response.content).not.toContain('API rate limit'); // Don't expose internal errors
  });
});

Testing API Fallback Strategies

describe('API Fallback Strategies', () => {
  it('should use cached data if API fails', async () => {
    // First call succeeds and caches
    mindbodyClient.getClasses.mockResolvedValueOnce([...classData]);

    const response1 = await client.callTool('searchClasses', {
      date: '2025-12-26'
    });
    expect(response1.structuredContent.classes.length).toBe(10);

    // Second call fails, should use cache
    mindbodyClient.getClasses.mockRejectedValueOnce(new Error('API down'));

    const response2 = await client.callTool('searchClasses', {
      date: '2025-12-26'
    });
    expect(response2.structuredContent.classes.length).toBe(10); // Cached data
  });

  it('should indicate when data is stale', async () => {
    // Use cached data
    mindbodyClient.getClasses.mockRejectedValueOnce(new Error('API down'));

    const response = await client.callTool('searchClasses', {
      date: '2025-12-26'
    });

    expect(response._meta).toHaveProperty('isCached');
    expect(response._meta.isCached).toBe(true);
    expect(response.content).toContain('schedule may not be current');
  });
});
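
The cache behavior these tests expect can be as simple as an in-memory map keyed by query, refreshed on every successful API call. A minimal stale-on-error sketch (single server process assumed; use Redis or similar across multiple instances):

// Hypothetical cache wrapper around the Mindbody call
const classCache = new Map();

async function getClassesWithFallback(date) {
  try {
    const classes = await mindbodyClient.getClasses({ date });
    classCache.set(date, { classes, cachedAt: Date.now() });
    return { classes, isCached: false };
  } catch (err) {
    const cached = classCache.get(date);
    if (cached) {
      // Serve stale data and let the tool flag it via _meta.isCached
      return { classes: cached.classes, isCached: true };
    }
    throw err; // nothing cached to fall back on
  }
}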


Part 12: Accessibility & Compliance Testing

OpenAI has strict accessibility and compliance requirements. Your ChatGPT app must meet WCAG AA standards and support users with disabilities.

WCAG AA Compliance Testing

describe('WCAG AA Accessibility Standards', () => {
  it('should have sufficient color contrast', async () => {
    const response = await client.callTool('searchClasses', {
      date: '2025-12-26'
    });

    const html = response.structuredContent;
    const contrastIssues = checkColorContrast(html);

    expect(contrastIssues).toHaveLength(0);
  });

  it('should support text resizing', async () => {
    const response = await client.callTool('searchClasses', {
      date: '2025-12-26'
    });

    const rendered = renderWidget(response.structuredContent, {
      fontSize: '16px'
    });

    // All text should be readable at 16px
    expect(rendered.querySelectorAll('*').length).toBeGreaterThan(0);
    rendered.querySelectorAll('*').forEach(el => {
      const fontSize = window.getComputedStyle(el).fontSize;
      expect(parseInt(fontSize)).toBeGreaterThanOrEqual(16);
    });
  });

  it('should provide proper alt text for images', async () => {
    const response = await client.callTool('searchClasses', {
      date: '2025-12-26'
    });

    const images = response.structuredContent.match(/<img[^>]*>/g) || [];
    images.forEach(img => {
      expect(img).toMatch(/alt=/);
      const alt = img.match(/alt="([^"]*)"/)[1];
      expect(alt.length).toBeGreaterThan(0);
    });
  });

  it('should support keyboard navigation', async () => {
    const response = await client.callTool('searchClasses', {
      date: '2025-12-26'
    });

    const html = response.structuredContent;
    const buttons = countElements(html, 'button');
    const interactiveElements = buttons + countElements(html, 'a');

    // All interactive elements should be reachable with Tab key
    expect(interactiveElements).toBeGreaterThan(0);
  });
});
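
checkColorContrast and renderWidget above are hypothetical helpers. If your widget HTML can be rendered into jsdom, an off-the-shelf alternative is jest-axe, which runs the axe-core accessibility engine over the rendered markup:

const { axe, toHaveNoViolations } = require('jest-axe');

expect.extend(toHaveNoViolations);

it('widget HTML should have no detectable accessibility violations', async () => {
  const response = await client.callTool('searchClasses', { date: '2025-12-26' });

  // Assumes the widget markup is exposed as an HTML string on the response
  document.body.innerHTML = response.structuredContent.html || '';

  const results = await axe(document.body);
  expect(results).toHaveNoViolations();
});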

Screen Reader Testing

describe('Screen Reader Compatibility', () => {
  it('should announce class availability clearly', async () => {
    const response = await client.callTool('searchClasses', {
      date: '2025-12-26'
    });

    const ariaLabels = extractAriaLabels(response.structuredContent);

    expect(ariaLabels).toContainEqual(
      expect.stringMatching(/yoga class.*available/i)
    );
  });

  it('should provide semantic HTML structure', async () => {
    const response = await client.callTool('searchClasses', {
      date: '2025-12-26'
    });

    const html = response.structuredContent;

    // Should use proper heading hierarchy
    expect(html).toMatch(/<h[1-6]/);

    // Should have proper list structure
    expect(html).toMatch(/<ul|<ol/);

    // Should have proper button/link elements
    expect(html).toMatch(/<button|<a/);
  });
});

Part 13: Mobile & Responsive Testing

ChatGPT can be used on mobile, tablet, and desktop. Your app must work on all screen sizes.

Viewport Testing

describe('Mobile & Responsive Design', () => {
  const viewports = [
    { name: 'Mobile', width: 375, height: 667 }, // iPhone SE
    { name: 'Tablet', width: 768, height: 1024 }, // iPad
    { name: 'Desktop', width: 1920, height: 1080 } // Desktop
  ];

  viewports.forEach(viewport => {
    it(`should render correctly on ${viewport.name}`, async () => {
      const response = await client.callTool('searchClasses', {
        date: '2025-12-26'
      });

      const rendered = renderWidget(response.structuredContent, {
        viewport: viewport
      });

      // No horizontal overflow
      expect(rendered.scrollWidth).toBeLessThanOrEqual(viewport.width);

      // All buttons/CTAs should be clickable on touch
      const buttons = rendered.querySelectorAll('button');
      buttons.forEach(btn => {
        const rect = btn.getBoundingClientRect();
        expect(rect.height).toBeGreaterThanOrEqual(44); // 44px minimum for touch
        expect(rect.width).toBeGreaterThanOrEqual(44);
      });
    });
  });

  it('should not have internal scrolling on mobile', async () => {
    const response = await client.callTool('searchClasses', {
      date: '2025-12-26'
    });

    const html = response.structuredContent;

    // Cards should not have overflow: scroll
    expect(html).not.toMatch(/overflow\s*:\s*scroll/);
    expect(html).not.toMatch(/max-height.*overflow/);
  });
});

Touch Target Testing

describe('Touch Target Sizing', () => {
  it('should have minimum 44px touch targets', async () => {
    const response = await client.callTool('searchClasses', {
      date: '2025-12-26'
    });

    const rendered = renderWidget(response.structuredContent);
    const interactiveElements = rendered.querySelectorAll(
      'button, a, input, [role="button"]'
    );

    interactiveElements.forEach(el => {
      const rect = el.getBoundingClientRect();
      expect(rect.height).toBeGreaterThanOrEqual(44);
      expect(rect.width).toBeGreaterThanOrEqual(44);
    });
  });

  it('should have sufficient spacing between touch targets', async () => {
    const response = await client.callTool('searchClasses', {
      date: '2025-12-26'
    });

    const rendered = renderWidget(response.structuredContent);
    const buttons = rendered.querySelectorAll('button');

    for (let i = 0; i < buttons.length - 1; i++) {
      const rect1 = buttons[i].getBoundingClientRect();
      const rect2 = buttons[i + 1].getBoundingClientRect();

      const spacing = Math.abs(rect1.bottom - rect2.top);
      expect(spacing).toBeGreaterThanOrEqual(8); // 8px minimum
    }
  });
});

Part 14: Visual Regression Testing

Catch UI changes that break widget rendering.

Screenshot Comparison Testing

describe('Visual Regression Testing', () => {
  it('should match baseline screenshot on desktop', async () => {
    const response = await client.callTool('searchClasses', {
      date: '2025-12-26'
    });

    const screenshot = renderAndCapture(response.structuredContent, {
      viewport: { width: 800, height: 600 }
    });

    expect(screenshot).toMatchImageSnapshot({
      failureThreshold: 0.001,
      failureThresholdType: 'percent'
    });
  });

  it('should match baseline screenshot on mobile', async () => {
    const response = await client.callTool('searchClasses', {
      date: '2025-12-26'
    });

    const screenshot = renderAndCapture(response.structuredContent, {
      viewport: { width: 375, height: 667 }
    });

    expect(screenshot).toMatchImageSnapshot({
      failureThreshold: 0.001,
      failureThresholdType: 'percent'
    });
  });
});
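
toMatchImageSnapshot comes from the jest-image-snapshot package and has to be registered once, while renderAndCapture is a hypothetical helper. One way to implement it with Puppeteer:

const puppeteer = require('puppeteer');
const { toMatchImageSnapshot } = require('jest-image-snapshot');

expect.extend({ toMatchImageSnapshot });

// Hypothetical helper: render widget HTML at a given viewport and return a PNG buffer
async function renderAndCapture(widgetHtml, { viewport }) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.setViewport(viewport);
    await page.setContent(widgetHtml, { waitUntil: 'networkidle0' });
    return await page.screenshot();
  } finally {
    await browser.close();
  }
}

Note that a Puppeteer-based helper is asynchronous, so the calls in the tests above would need to be awaited.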

Part 15: Maintenance & Ongoing Testing

Testing doesn't end after deployment. Continuous testing ensures your app stays production-ready.

Monitoring Test Health

describe('Production Monitoring', () => {
  it('should detect performance degradation', async () => {
    const baseline = 1200; // 1.2 seconds baseline

    const response = await client.callTool('searchClasses', {
      date: '2025-12-26'
    });

    expect(response._meta.executionTime).toBeLessThan(baseline * 1.1); // 10% increase is concerning
  });

  it('should alert on increased error rate', async () => {
    const iterations = 100;
    const errors = [];

    for (let i = 0; i < iterations; i++) {
      try {
        await client.callTool('bookClass', {
          classId: 1,
          userId: `test-user-${i}`
        });
      } catch (error) {
        errors.push(error);
      }
    }

    const errorRate = errors.length / iterations;
    expect(errorRate).toBeLessThan(0.05); // Error rate should stay below 5%
  });

  it('should validate third-party API availability', async () => {
    const mindbodyStatus = await checkAPIHealth('https://api.mindbody.io');
    expect(mindbodyStatus.statusCode).toBe(200);

    const stripeStatus = await checkAPIHealth('https://api.stripe.com');
    expect(stripeStatus.statusCode).toBe(200);
  });
});

Regression Test Suites

Create a comprehensive regression suite to run before each deployment:

#!/bin/bash
# regression-test.sh
set -e  # stop immediately if any suite fails

echo "Running full regression test suite..."

# Unit tests
npm test -- --coverage

# Integration tests with MCP Inspector
npm run test:mcp

# E2E tests
npm run test:e2e

# Accessibility tests
npm run test:a11y

# Visual regression
npm run test:visual

# Performance benchmarks
npm run test:performance

# Security tests
npm run test:security

# Generate report
npm run test:report

echo "Regression testing complete!"

Getting Started with MakeAIHQ Testing Tools

Don't build your testing infrastructure from scratch. MakeAIHQ provides:

1. Pre-Built Test Templates. Copy-paste test structures for common app types (fitness, restaurants, e-commerce).

2. MCP Inspector Integration. One-click setup with your MakeAIHQ apps.

3. Automated Test Runner. Continuous testing as you build.

4. OpenAI Compliance Validator. Automatic QA checklist verification.

Browse Testing Templates →

Try MCP Inspector Free →


Next Steps

  1. Set up MCP Inspector with your existing app
  2. Write unit tests for your tool handlers
  3. Run integration tests to validate tool composition
  4. Test E2E workflows in actual ChatGPT
  5. Run OpenAI compliance checklist before submission

An app that works through this entire testing process goes into review in the strongest possible position for first-submission approval. Build with confidence.


Related Testing Resources

  • Unit Testing MCP Server Tools
  • MCP Inspector Setup & Usage
  • Error Handling Best Practices for ChatGPT Apps
  • Performance Optimization for ChatGPT Widgets
  • OpenAI Approval: 12 Critical Requirements
  • End-to-End Testing Strategies
  • Security Testing for ChatGPT Apps
  • Load Testing & Spike Tolerance
  • Token Limit Validation
  • CI/CD Pipeline for ChatGPT Apps
  • Debugging MCP Servers
  • Test Automation Frameworks
  • Widget Rendering Validation
  • Authentication Testing for OAuth
  • Conversation Flow Testing
  • Third-Party API Mocking
  • Database Testing Strategies
  • Image Upload Testing
  • Real-Time Data Testing
  • Stress Testing ChatGPT Apps
  • Browser Testing Tools
  • Monitoring Production Performance
  • Test Report Generation
  • QA Automation Best Practices
  • Testing ChatGPT App Widgets
  • Accessibility Testing
  • Regression Testing
  • Compliance Testing

Ready to Test Your ChatGPT App?

Start with MakeAIHQ's testing templates. Choose your industry, and we'll give you:

  • Pre-built unit tests
  • MCP Inspector test cases
  • E2E test workflows
  • OpenAI compliance checklist

All you do is customize the examples and run them, putting your app in the best position to pass OpenAI review on the first submission.

Start Testing Free →

View All Templates →

View Pricing →