ChatGPT App Testing & QA: Unit Tests, Integration Tests & MCP Inspector
Testing is the difference between a ChatGPT app that works perfectly in development and one that crashes in production. With OpenAI's increasingly strict approval standards, comprehensive QA isn't optional—it's mandatory for first-submission approval.
This guide covers every testing methodology you need to ship production-ready ChatGPT apps: from unit testing individual tools to end-to-end testing complete conversation flows using MCP Inspector.
Part 1: ChatGPT App Testing Fundamentals
What Makes ChatGPT App Testing Different?
ChatGPT apps face three unique testing challenges traditional web apps don't:
1. Non-Deterministic AI Behavior OpenAI's language model produces different tool calls for the same user input depending on context, conversation history, and model variance. Your testing strategy must account for this unpredictability.
2. Stateful Multi-Turn Conversations ChatGPT maintains conversation history across multiple user messages. A tool call in message 5 might depend on context from messages 1-4. Testing must validate entire conversation flows, not just individual API calls.
3. OpenAI Approval Requirements Unlike traditional apps, ChatGPT apps face manual human review by OpenAI. Your QA process must verify compliance with 12+ technical requirements before submission.
Why MCP Inspector Is Essential
The Model Context Protocol (MCP) Inspector is your primary testing tool. It simulates how ChatGPT calls your MCP server, letting you validate tool definitions, test tool handlers, and catch errors before OpenAI reviews your app. According to the official Model Context Protocol specification, MCP Inspector provides the reference implementation for testing MCP servers before production deployment.
MCP Inspector replaces manual testing. Instead of manually triggering tools in ChatGPT, you can systematically test:
- Tool registration and metadata
- Handler error cases
- Response format validation
- Performance benchmarks (sub-4k token responses)
- Widget rendering
- OAuth token validation
- Error recovery paths
Without MCP Inspector, you're guessing whether your app actually works. This testing gap is a major reason so many ChatGPT app submissions are rejected on the first attempt.
MCP Inspector works alongside the OpenAI Apps SDK, letting you exercise the same behaviors OpenAI reviewers check during app approval. You catch compliance issues in minutes instead of waiting days for a manual rejection.
Part 2: Unit Testing MCP Server Tools
Unit testing focuses on individual tool handlers in isolation—testing a single tool with various inputs to verify correct behavior.
Tool Handler Testing Strategy
Each tool in your MCP server should have unit tests covering:
1. Happy Path (Expected Behavior)
// Example: searchClasses tool for fitness studio app
describe('searchClasses tool', () => {
it('should return available classes when querying by date and time', async () => {
const handler = createToolHandler('searchClasses');
const result = await handler({
date: '2025-12-26',
time: '10:00',
classType: 'yoga'
});
expect(result).toHaveProperty('structuredContent');
expect(result.structuredContent).toHaveProperty('classes');
expect(result.structuredContent.classes.length).toBeGreaterThan(0);
expect(result.structuredContent.classes[0]).toHaveProperty('id');
expect(result.structuredContent.classes[0]).toHaveProperty('name');
expect(result.structuredContent.classes[0]).toHaveProperty('availableSpots');
});
});
2. Edge Cases (Boundary Conditions)
it('should handle queries with no available classes', async () => {
const handler = createToolHandler('searchClasses');
const result = await handler({
date: '2025-01-01', // New Year, all classes full
time: '15:00'
});
expect(result).toHaveProperty('structuredContent');
expect(result.structuredContent.classes).toEqual([]);
expect(result.content).toContain('No classes available');
});
it('should reject invalid date format', async () => {
const handler = createToolHandler('searchClasses');
await expect(handler({
date: 'invalid-date',
time: '10:00'
})).rejects.toThrow('Invalid date format');
});
3. Error Handling (Network Failures, Third-Party API Issues)
it('should gracefully handle Mindbody API timeout', async () => {
// Mock Mindbody API to simulate timeout
jest.spyOn(global, 'fetch').mockRejectedValueOnce(new Error('timeout'));
const handler = createToolHandler('searchClasses');
const result = await handler({
date: '2025-12-26',
time: '10:00'
});
// If we got here, the handler resolved instead of throwing
expect(result).toHaveProperty('content');
expect(result.content).toContain('unable to fetch available classes');
});
4. Performance Benchmarks
it('should return response in under 2 seconds', async () => {
const handler = createToolHandler('searchClasses');
const startTime = performance.now();
await handler({
date: '2025-12-26',
time: '10:00'
});
const endTime = performance.now();
expect(endTime - startTime).toBeLessThan(2000);
});
it('should keep structured content under 4000 tokens', async () => {
const handler = createToolHandler('searchClasses');
const result = await handler({
date: '2025-12-26',
time: '10:00'
});
// Rough token estimate (1 token ≈ 4 characters)
const contentTokenCount = JSON.stringify(result.structuredContent).length / 4;
expect(contentTokenCount).toBeLessThan(4000);
});
Tool Parameter Validation
Every tool must validate input parameters before processing. Unit tests should verify validation works:
it('should validate required parameters', async () => {
const handler = createToolHandler('bookClass');
// Missing required 'classId'
await expect(handler({ userId: '123' }))
.rejects.toThrow('Missing required parameter: classId');
});
it('should validate parameter types', async () => {
const handler = createToolHandler('bookClass');
// classId should be a number, not a string
await expect(handler({
classId: 'not-a-number',
userId: '123'
})).rejects.toThrow('Parameter "classId" must be a number');
});
Test Coverage Targets
For production ChatGPT apps targeting OpenAI approval:
- Minimum 80% code coverage for all tool handlers
- 100% coverage for authentication/authorization logic
- 100% coverage for error paths (no unhandled exceptions)
Use a code coverage tool to track:
npm test -- --coverage
# Expected output:
# Statements : 85.3% ( 427/501 )
# Branches : 82.1% ( 312/380 )
# Functions : 87.4% ( 139/159 )
# Lines : 85.9% ( 430/501 )
Part 3: Integration Testing with MCP Inspector
Integration tests verify that your MCP server works correctly as a complete system—all tools together, with real dependencies (databases, third-party APIs).
Setting Up MCP Inspector
MCP Inspector is the official testing tool for MCP servers. Install and configure it:
npm install -D @modelcontextprotocol/inspector
# Or run it ad hoc without installing it:
# npx @modelcontextprotocol/inspector
# Add to package.json scripts, then connect to your server's URL
# (e.g. http://localhost:3000/mcp) from the Inspector UI:
# "test:mcp": "mcp-inspector"
Running MCP Inspector
# Start your MCP server
npm run dev:server
# In another terminal, start MCP Inspector
npm run test:mcp
# MCP Inspector opens a local web UI (the URL and port are printed to the terminal)
# You'll see an interactive interface to test your tools
MCP Inspector Testing Workflow
Step 1: Tool Discovery MCP Inspector automatically discovers all tools registered with your server. Verify each tool appears:
Available Tools:
✅ searchClasses
✅ bookClass
✅ cancelBooking
✅ getMembershipStatus
Step 2: Interactive Tool Testing Use the MCP Inspector UI to manually test each tool:
- Click on tool (e.g., "searchClasses")
- Enter input parameters in JSON format:
{
"date": "2025-12-26",
"time": "10:00",
"classType": "yoga"
}
- Inspect full response:
{
"structuredContent": {
"classes": [
{ "id": 1, "name": "Morning Vinyasa", "spots": 5 }
]
},
"content": "Found 1 yoga class available",
"_meta": { "executionTime": 245 }
}
Automated Integration Tests with MCP Inspector
// integration.test.js
// Illustrative client API; in a real project, adapt this to the client
// in the official MCP TypeScript SDK (@modelcontextprotocol/sdk).
const mcpClient = require('@modelcontextprotocol/client-node');
describe('MCP Server Integration Tests', () => {
let client;
beforeAll(async () => {
client = new mcpClient({
url: 'http://localhost:3000/mcp'
});
await client.connect();
});
describe('searchClasses tool', () => {
it('should return proper MCP response structure', async () => {
const response = await client.callTool('searchClasses', {
date: '2025-12-26',
time: '10:00'
});
// Verify MCP response structure
expect(response).toHaveProperty('structuredContent');
expect(response).toHaveProperty('content');
expect(response).toHaveProperty('_meta');
// Verify structuredContent is valid
expect(response.structuredContent).toBeInstanceOf(Object);
expect(typeof response.content).toBe('string');
expect(response._meta).toHaveProperty('executionTime');
});
it('should format response as valid HTML when requested', async () => {
const response = await client.callTool('searchClasses', {
date: '2025-12-26',
time: '10:00',
format: 'html'
});
// Response should be valid HTML that can render in ChatGPT
expect(response.content).toMatch(/^<div|^<section|^<article/);
expect(response.content).toContain('</');
});
});
describe('Cross-Tool Workflows', () => {
it('should handle booking flow: search → book → confirm', async () => {
// Step 1: Search for classes
const searchResponse = await client.callTool('searchClasses', {
date: '2025-12-26',
time: '10:00'
});
expect(searchResponse.structuredContent.classes.length).toBeGreaterThan(0);
const classId = searchResponse.structuredContent.classes[0].id;
// Step 2: Book the class
const bookResponse = await client.callTool('bookClass', {
classId: classId,
userId: 'test-user-123'
});
expect(bookResponse.content).toContain('successfully booked');
expect(bookResponse.structuredContent).toHaveProperty('confirmation');
// Step 3: Verify booking by checking membership status
const statusResponse = await client.callTool('getMembershipStatus', {
userId: 'test-user-123'
});
expect(statusResponse.structuredContent.upcomingClasses).toContainEqual(
expect.objectContaining({ id: classId })
);
});
});
afterAll(async () => {
await client.disconnect();
});
});
MCP Inspector Validation Checklist
Before submitting to OpenAI, validate these items in MCP Inspector:
## MCP Inspector Pre-Submission Checklist
- [ ] All tools appear in tool discovery list
- [ ] Each tool has accurate name and description
- [ ] Tool parameters match documentation
- [ ] Happy path returns valid MCP response structure
- [ ] Edge cases handled gracefully
- [ ] Error messages are user-friendly (not stack traces)
- [ ] Response tokens stay under 4000 token limit
- [ ] Response time under 2 seconds for all tools
- [ ] Widget/HTML responses render properly in ChatGPT UI
- [ ] No console errors in MCP Inspector logs
- [ ] Authenticated tools properly validate OAuth tokens
Part 4: End-to-End Testing
End-to-end (E2E) testing simulates real users interacting with your ChatGPT app through ChatGPT's interface. This is closest to how OpenAI will test your app during review.
E2E Testing Strategy
E2E tests verify complete user journeys:
Fitness Studio Booking E2E Test:
describe('Fitness Studio ChatGPT App - E2E', () => {
it('member should book a yoga class through natural conversation', async () => {
const conversation = new ChatGPTConversation({
appId: 'fitness-studio-app'
});
// User asks to book a class
const message1 = 'Can you help me book a yoga class tomorrow morning?';
const response1 = await conversation.send(message1);
// ChatGPT should call searchClasses tool
expect(response1.toolCalls).toContainEqual(
expect.objectContaining({
name: 'searchClasses',
params: expect.objectContaining({ classType: 'yoga' })
})
);
// Response should show available classes
expect(response1.content).toMatch(/available yoga classes/i);
expect(response1.structuredContent.classes.length).toBeGreaterThan(0);
// User selects a class
const message2 = 'I want to book the 10am Vinyasa Flow class';
const response2 = await conversation.send(message2);
// ChatGPT should call bookClass tool
expect(response2.toolCalls).toContainEqual(
expect.objectContaining({ name: 'bookClass' })
);
// Verify booking confirmation
expect(response2.content).toContain('successfully booked');
});
it('should handle booking conflicts gracefully', async () => {
const conversation = new ChatGPTConversation({
appId: 'fitness-studio-app',
userId: 'user-with-conflict'
});
const message = 'Book me for the 10am yoga class on Friday';
const response = await conversation.send(message);
// Should handle the conflict without crashing: send() resolves rather than rejecting
expect(response.content).toMatch(/conflict|already booked|unavailable/i);
});
});
Real Conversation Testing with ngrok
For true E2E testing, deploy your MCP server with ngrok and test in actual ChatGPT:
# Install ngrok
npm install -g ngrok
# Start your MCP server locally
npm run dev:server
# In another terminal, expose to internet with ngrok
ngrok http 3000
# This generates a public HTTPS URL (free-tier URLs are temporary
# and change each time you restart ngrok):
# https://xxxx-xx-xxx-xxx-xx.ngrok.io
# Add to ChatGPT developer mode:
# 1. Open ChatGPT
# 2. Settings → Developer → Edit connectors
# 3. Paste ngrok URL: https://xxxx-xx-xxx-xxx-xx.ngrok.io/mcp
# 4. Test conversation in ChatGPT UI
Conversation Flow Testing Framework
Create a framework to test common conversation patterns:
// conversationFlows.test.js
const testFlows = {
'happy_path_booking': {
steps: [
{
userMessage: 'Book me a yoga class tomorrow at 10am',
expectedToolCall: 'searchClasses',
expectedResponse: /classes available|no classes/
},
{
userMessage: 'I want the first option',
expectedToolCall: 'bookClass',
expectedResponse: /successfully booked|already booked/
}
]
},
'error_recovery': {
steps: [
{
userMessage: 'Book me for a class at an invalid time: 25:00',
expectedError: true,
expectedResponse: /invalid time|between 5am and 10pm/
},
{
userMessage: 'How about 9am instead?',
expectedToolCall: 'searchClasses',
expectedResponse: /classes available/
}
]
}
};
describe('Conversation Flows', () => {
Object.entries(testFlows).forEach(([flowName, flow]) => {
it(`should handle ${flowName} correctly`, async () => {
const conversation = new ChatGPTConversation();
for (const step of flow.steps) {
const response = await conversation.send(step.userMessage);
if (step.expectedToolCall) {
expect(response.toolCalls[0].name).toBe(step.expectedToolCall);
}
expect(response.content).toMatch(step.expectedResponse);
}
});
});
});
Part 5: Performance Testing & Optimization
OpenAI has strict performance requirements. ChatGPT can't wait for slow API responses while users watch.
Response Time Benchmarks
Critical Performance Targets:
- Tool execution: Under 2 seconds
- Response formatting: Under 500ms
- Widget rendering: Under 1 second
- Total tool call to display: Under 3 seconds
describe('Performance Benchmarks', () => {
it('searchClasses should complete in under 2 seconds', async () => {
const iterations = 10;
const times = [];
for (let i = 0; i < iterations; i++) {
const start = performance.now();
await client.callTool('searchClasses', {
date: '2025-12-26',
time: '10:00'
});
const end = performance.now();
times.push(end - start);
}
const avgTime = times.reduce((a, b) => a + b) / times.length;
const maxTime = Math.max(...times);
console.log(`Average: ${avgTime.toFixed(0)}ms, Max: ${maxTime.toFixed(0)}ms`);
expect(avgTime).toBeLessThan(1500); // 1.5s average
expect(maxTime).toBeLessThan(2000); // 2s max
});
});
Load Testing (Spike Tolerance)
ChatGPT may send multiple simultaneous requests. Test your server's behavior under load:
describe('Load Testing', () => {
it('should handle 10 concurrent requests', async () => {
const promises = [];
for (let i = 0; i < 10; i++) {
promises.push(
client.callTool('searchClasses', {
date: '2025-12-26',
time: '10:00'
})
);
}
const results = await Promise.all(promises);
expect(results).toHaveLength(10);
results.forEach(result => {
expect(result).toHaveProperty('structuredContent');
});
});
});
Token Limit Validation
Every response must stay under 4000 tokens:
function estimateTokens(text) {
// Rough estimate: 1 token ≈ 4 characters
return Math.ceil(text.length / 4);
}
describe('Token Limits', () => {
it('responses should stay under 4000 tokens', async () => {
const response = await client.callTool('searchClasses', {
date: '2025-12-26',
time: '10:00'
});
const contentTokens = estimateTokens(JSON.stringify(response.structuredContent));
expect(contentTokens).toBeLessThan(4000);
});
});
Part 6: OpenAI Approval QA Checklist
Before submitting your ChatGPT app to OpenAI, run through this complete QA checklist:
Critical Compliance Checks
## OpenAI Approval QA Checklist
### Functionality
- [ ] All tools work correctly in MCP Inspector
- [ ] Happy path conversations complete end-to-end
- [ ] Error handling prevents crashes
- [ ] No unhandled exceptions in logs
- [ ] Performance under 2s per tool call
### MCP Protocol Compliance
- [ ] Tool metadata includes name, description, parameters
- [ ] Responses include structuredContent, content, _meta
- [ ] Widget responses use mimeType: "text/html+skybridge"
- [ ] No custom fonts (system fonts only)
- [ ] Max 2 primary CTAs per card
### OpenAI UX/UI Standards
- [ ] Inline widgets don't exceed 4000 tokens
- [ ] No nested scrolling in cards
- [ ] No more than 3 levels of navigation
- [ ] Contrast ratios meet WCAG AA standards
- [ ] Mobile responsive
- [ ] Alt text for all images
### Security & Auth
- [ ] OAuth 2.1 with PKCE properly implemented
- [ ] Access tokens verified on every request
- [ ] No API keys exposed in frontend
- [ ] HTTPS enforced
- [ ] CORS headers correct
### Data & Privacy
- [ ] Privacy policy linked
- [ ] GDPR compliant (if EU users)
- [ ] PII not logged
- [ ] Data retention policy clear
- [ ] User consent obtained for data collection
### Testing Coverage
- [ ] 80%+ code coverage
- [ ] All error paths tested
- [ ] MCP Inspector validates all tools
- [ ] E2E tests for main workflows
- [ ] Performance benchmarks met
Pre-Submission Testing Workflow
Week 1: Unit Testing
- Write and run unit tests for all tools
- Achieve 80%+ code coverage
- Fix all test failures
Week 2: Integration Testing
- Test with MCP Inspector
- Test multi-step workflows
- Load test under concurrent requests
Week 3: E2E Testing
- Test in actual ChatGPT (via ngrok)
- Run through all user journeys
- Test edge cases and error scenarios
Week 4: OpenAI Compliance
- Run through QA checklist
- Fix any compliance issues
- Create test report document
Week 5: Final Review
- Have peer review test your app
- Fix any discovered issues
- Generate final test report
Part 7: Common Testing Pitfalls & Solutions
Pitfall 1: Only Testing Happy Path
Problem: Tests only validate the ideal scenario, missing edge cases that crash in production.
Solution: Test error cases systematically:
// Don't do this:
it('should search for classes', async () => {
const result = await searchClasses({ date: '2025-12-26', time: '10:00' });
expect(result.classes.length).toBeGreaterThan(0);
});
// Do this instead:
it('should search for classes when available', async () => { ... });
it('should return empty when no classes available', async () => { ... });
it('should handle invalid date format', async () => { ... });
it('should handle API timeout', async () => { ... });
it('should handle authentication error', async () => { ... });
Pitfall 2: Ignoring Token Limits
Problem: Response works in testing but exceeds 4000 tokens in production, breaking ChatGPT rendering.
Solution: Add token validation to every test:
const tokens = estimateTokens(JSON.stringify(response));
expect(tokens).toBeLessThan(4000);
Pitfall 3: Not Testing Structured Content Rendering
Problem: Data structure is valid JSON but doesn't render properly in ChatGPT UI.
Solution: Test actual widget rendering:
const rendered = renderWidget(response.structuredContent);
expect(rendered).not.toContain('[object Object]');
expect(rendered.querySelectorAll('button').length).toBeLessThanOrEqual(2);
Pitfall 4: Assuming ChatGPT Behavior
Problem: Assuming ChatGPT will always call your tools in expected order, breaking when model behavior changes.
Solution: Design tools to be order-independent:
// Wrong: relies on hidden session state set by a previous tool call
function cancelBooking() { /* reads bookingId from server-side session */ }
// Right: booking ID is an explicit parameter, so call order doesn't matter
function cancelBooking(bookingId) { ... }
Part 8: CI/CD Integration for Testing
Automate testing in your deployment pipeline:
# .github/workflows/test.yml
name: ChatGPT App CI/CD
on: [push, pull_request]
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/setup-node@v3
with:
node-version: '20'
- run: npm install
- run: npm run test -- --coverage
env:
CI: true
- run: npm run test:mcp
- run: npm run build
- name: Generate Test Report
if: always()
run: |
echo "## Test Results" >> $GITHUB_STEP_SUMMARY
echo "Job status: ${{ job.status }}" >> $GITHUB_STEP_SUMMARY
echo "See the coverage step output for detailed numbers" >> $GITHUB_STEP_SUMMARY
Part 9: Authentication & OAuth Testing
ChatGPT apps that require user authentication must implement OAuth 2.1 with PKCE. Testing OAuth flows is critical for security and user experience.
OAuth Token Validation Testing
describe('OAuth Token Validation', () => {
it('should reject expired access tokens', async () => {
const expiredToken = jwt.sign(
{ sub: 'user-123' },
'secret',
{ expiresIn: '1s' }
);
// Wait for token to expire
await new Promise(resolve => setTimeout(resolve, 1100));
await expect(client.callTool('bookClass', {
classId: 1,
accessToken: expiredToken
})).rejects.toThrow('Token expired');
});
it('should verify token issuer and audience', async () => {
const malformedToken = jwt.sign(
{
sub: 'user-123',
iss: 'wrong-issuer',
aud: 'wrong-audience'
},
'secret'
);
await expect(client.callTool('bookClass', {
classId: 1,
accessToken: malformedToken
})).rejects.toThrow('Invalid token issuer or audience');
});
it('should validate token signature', async () => {
const tamperedToken = jwt.sign(
{ sub: 'user-456' }, // Different user
'wrong-secret'
);
await expect(client.callTool('bookClass', {
classId: 1,
accessToken: tamperedToken
})).rejects.toThrow('Invalid token signature');
});
});
OAuth Flow E2E Testing
describe('OAuth 2.1 PKCE Flow', () => {
it('should complete authorization code flow with PKCE', async () => {
// Step 1: Generate code verifier
const codeVerifier = generateRandomString(128);
const codeChallenge = base64UrlEncode(
await crypto.subtle.digest('SHA-256', new TextEncoder().encode(codeVerifier))
);
// Step 2: Simulate user clicking "authorize" on OAuth provider
const authCode = await simulateOAuthAuthorization({
codeChallenge: codeChallenge,
clientId: 'test-client-id'
});
expect(authCode).toBeDefined();
// Step 3: Exchange code for token
const tokenResponse = await exchangeOAuthCode({
code: authCode,
codeVerifier: codeVerifier,
clientId: 'test-client-id'
});
expect(tokenResponse).toHaveProperty('access_token');
expect(tokenResponse.access_token).toBeDefined();
// Step 4: Use token to call protected tool
const response = await client.callTool('bookClass', {
classId: 1,
accessToken: tokenResponse.access_token
});
expect(response.content).toContain('successfully booked');
});
it('should fail OAuth flow if code verifier invalid', async () => {
const codeVerifier = generateRandomString(128);
const codeChallenge = base64UrlEncode(
await crypto.subtle.digest('SHA-256', new TextEncoder().encode(codeVerifier))
);
const authCode = await simulateOAuthAuthorization({
codeChallenge: codeChallenge
});
// Try to exchange with wrong code verifier
await expect(exchangeOAuthCode({
code: authCode,
codeVerifier: generateRandomString(128), // Different verifier
clientId: 'test-client-id'
})).rejects.toThrow('Invalid code verifier');
});
});
Testing Unauthenticated vs Authenticated Endpoints
describe('Authentication Requirements', () => {
it('should allow searchClasses without authentication', async () => {
const response = await client.callTool('searchClasses', {
date: '2025-12-26',
time: '10:00'
});
expect(response).toHaveProperty('structuredContent');
});
it('should require authentication for bookClass', async () => {
await expect(client.callTool('bookClass', {
classId: 1
// Missing accessToken
})).rejects.toThrow('Authentication required');
});
it('should restrict bookings to authenticated user', async () => {
const token1 = await getAuthToken('user-1');
const token2 = await getAuthToken('user-2');
// User 1 books a class
await client.callTool('bookClass', {
classId: 1,
accessToken: token1
});
// User 2 tries to retrieve User 1's booking
const bookings = await client.callTool('getUserBookings', {
accessToken: token2
});
expect(bookings.structuredContent).not.toContainEqual(
expect.objectContaining({ id: 1 })
);
});
});
Scopes and Permissions Testing
If your OAuth implementation uses scopes:
describe('OAuth Scopes', () => {
it('should allow read-only operations with read scope', async () => {
const readOnlyToken = await getAuthToken('user-123', ['read']);
// Read operations should work
const response = await client.callTool('searchClasses', {
accessToken: readOnlyToken
});
expect(response).toHaveProperty('structuredContent');
});
it('should reject write operations without write scope', async () => {
const readOnlyToken = await getAuthToken('user-123', ['read']);
// Write operations should fail
await expect(client.callTool('bookClass', {
classId: 1,
accessToken: readOnlyToken
})).rejects.toThrow('Insufficient permissions');
});
it('should allow all operations with admin scope', async () => {
const adminToken = await getAuthToken('admin-123', ['admin']);
// Both read and write should work
const search = await client.callTool('searchClasses', {
accessToken: adminToken
});
expect(search).toHaveProperty('structuredContent');
const book = await client.callTool('bookClass', {
classId: 1,
accessToken: adminToken
});
expect(book.content).toContain('successfully booked');
});
});
Part 10: Database & Persistence Testing
If your MCP server uses a database (Firebase, PostgreSQL, MongoDB), you need database-specific tests.
Testing Database Transactions
describe('Database Transactions', () => {
it('should prevent double-booking of same class', async () => {
const userId1 = 'user-1';
const userId2 = 'user-2';
const classId = 1;
// Simulate concurrent booking attempts
const [result1, result2] = await Promise.all([
client.callTool('bookClass', {
classId: classId,
userId: userId1
}),
client.callTool('bookClass', {
classId: classId,
userId: userId2
})
]);
// One should succeed, one should fail
const successCount = [result1, result2].filter(
r => r.content.includes('successfully booked')
).length;
expect(successCount).toBe(1); // Only one booking succeeded
});
it('should rollback partial bookings on error', async () => {
const result = await client.callTool('bookClassAndChargeCard', {
classId: 1,
userId: 'user-with-invalid-card',
cardToken: 'invalid-token'
});
// Payment should fail
expect(result.content).toContain('payment failed');
// Class should NOT be booked
const bookings = await client.callTool('getUserBookings', {
userId: 'user-with-invalid-card'
});
expect(bookings.structuredContent.bookings).not.toContainEqual(
expect.objectContaining({ classId: 1 })
);
});
});
Database Consistency Testing
describe('Database Consistency', () => {
it('should maintain referential integrity', async () => {
// Book a class
await client.callTool('bookClass', {
classId: 1,
userId: 'user-123'
});
// Delete the class
await deleteClass(1);
// User's booking should be cleaned up or show as deleted
const bookings = await client.callTool('getUserBookings', {
userId: 'user-123'
});
const classExists = bookings.structuredContent.bookings.some(
b => b.classId === 1
);
expect(classExists).toBe(false);
});
it('should maintain data consistency under load', async () => {
const promises = [];
// 100 concurrent bookings
for (let i = 0; i < 100; i++) {
promises.push(
client.callTool('bookClass', {
classId: 1,
userId: `user-${i}`
})
);
}
await Promise.all(promises);
// Verify total bookings = 100 (or less if class capacity limited)
const totalBookings = await getTotalBookingsForClass(1);
expect(totalBookings).toBeLessThanOrEqual(100);
expect(totalBookings).toBeGreaterThan(0);
});
});
Part 11: Third-Party API Integration Testing
Most ChatGPT apps integrate with third-party APIs (Mindbody, Stripe, OpenTable, etc.). Testing these integrations is crucial.
API Integration Testing with Mocks
describe('Mindbody API Integration', () => {
beforeEach(() => {
// Mock Mindbody API responses
jest.spyOn(mindbodyClient, 'getClasses').mockResolvedValue([
{
id: 1,
name: 'Vinyasa Flow',
instructorId: 5,
startTime: '2025-12-26T10:00:00Z',
capacity: 20,
enrolled: 18
}
]);
});
it('should map Mindbody API response to ChatGPT widget format', async () => {
const response = await client.callTool('searchClasses', {
date: '2025-12-26',
time: '10:00'
});
expect(response.structuredContent.classes[0]).toEqual(
expect.objectContaining({
id: 1,
name: 'Vinyasa Flow',
instructor: expect.stringContaining('instructor'),
availableSpots: 2 // 20 capacity - 18 enrolled
})
);
});
it('should handle Mindbody API errors gracefully', async () => {
mindbodyClient.getClasses.mockRejectedValueOnce(
new Error('API rate limit exceeded')
);
const response = await client.callTool('searchClasses', {
date: '2025-12-26',
time: '10:00'
});
expect(response.content).toContain('temporarily unavailable');
expect(response.content).not.toContain('API rate limit'); // Don't expose internal errors
});
});
Testing API Fallback Strategies
describe('API Fallback Strategies', () => {
it('should use cached data if API fails', async () => {
// First call succeeds and caches
mindbodyClient.getClasses.mockResolvedValueOnce([...classData]);
const response1 = await client.callTool('searchClasses', {
date: '2025-12-26'
});
expect(response1.structuredContent.classes.length).toBe(10);
// Second call fails, should use cache
mindbodyClient.getClasses.mockRejectedValueOnce(new Error('API down'));
const response2 = await client.callTool('searchClasses', {
date: '2025-12-26'
});
expect(response2.structuredContent.classes.length).toBe(10); // Cached data
});
it('should indicate when data is stale', async () => {
// Use cached data
mindbodyClient.getClasses.mockRejectedValueOnce(new Error('API down'));
const response = await client.callTool('searchClasses', {
date: '2025-12-26'
});
expect(response._meta).toHaveProperty('isCached');
expect(response._meta.isCached).toBe(true);
expect(response.content).toContain('schedule may not be current');
});
});
Part 12: Accessibility & Compliance Testing
OpenAI has strict accessibility and compliance requirements. Your ChatGPT app must meet WCAG AA standards and support users with disabilities.
WCAG AA Compliance Testing
describe('WCAG AA Accessibility Standards', () => {
it('should have sufficient color contrast', async () => {
const response = await client.callTool('searchClasses', {
date: '2025-12-26'
});
const html = response.content; // HTML string (request format: 'html' as in Part 3)
const contrastIssues = checkColorContrast(html);
expect(contrastIssues).toHaveLength(0);
});
it('should support text resizing', async () => {
const response = await client.callTool('searchClasses', {
date: '2025-12-26'
});
const rendered = renderWidget(response.structuredContent, {
fontSize: '16px'
});
// All text should be readable at 16px
expect(rendered.querySelectorAll('*').length).toBeGreaterThan(0);
rendered.querySelectorAll('*').forEach(el => {
const fontSize = window.getComputedStyle(el).fontSize;
expect(parseInt(fontSize)).toBeGreaterThanOrEqual(16);
});
});
it('should provide proper alt text for images', async () => {
const response = await client.callTool('searchClasses', {
date: '2025-12-26'
});
const images = response.content.match(/<img[^>]*>/g) || [];
images.forEach(img => {
expect(img).toMatch(/alt=/);
const alt = img.match(/alt="([^"]*)"/)[1];
expect(alt.length).toBeGreaterThan(0);
});
});
it('should support keyboard navigation', async () => {
const response = await client.callTool('searchClasses', {
date: '2025-12-26'
});
const html = response.structuredContent;
const buttons = countElements(html, 'button');
const interactiveElements = buttons + countElements(html, 'a');
// Native buttons and links are keyboard-focusable by default;
// the widget must expose at least one
expect(interactiveElements).toBeGreaterThan(0);
});
});
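The tests above rely on helpers such as `checkColorContrast` and `countElements` that are not part of Jest or any standard library; you have to supply them. A minimal sketch of the underlying pieces (the contrast formula follows the WCAG 2.x definition; the element counter is a crude regex-based illustration):

```javascript
// Relative luminance of a #rrggbb color, per the WCAG 2.x definition
function luminance(hex) {
  const [r, g, b] = [1, 3, 5]
    .map(i => parseInt(hex.slice(i, i + 2), 16) / 255)
    .map(c => (c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4)));
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio between two colors; WCAG AA requires >= 4.5 for normal text
function contrastRatio(fgHex, bgHex) {
  const [hi, lo] = [luminance(fgHex), luminance(bgHex)].sort((a, b) => b - a);
  return (hi + 0.05) / (lo + 0.05);
}

// Count occurrences of a tag in an HTML string (crude, regex-based)
function countElements(html, tag) {
  return (html.match(new RegExp(`<${tag}\\b`, 'gi')) || []).length;
}
```

A full `checkColorContrast(html)` would walk each text node, resolve its effective foreground/background colors, and collect every pair whose `contrastRatio` falls below 4.5; the ratio calculation above is the standard core of that check.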
Screen Reader Testing
describe('Screen Reader Compatibility', () => {
it('should announce class availability clearly', async () => {
const response = await client.callTool('searchClasses', {
date: '2025-12-26'
});
const ariaLabels = extractAriaLabels(response.structuredContent);
expect(ariaLabels).toContainEqual(
expect.stringMatching(/yoga class.*available/i)
);
});
it('should provide semantic HTML structure', async () => {
const response = await client.callTool('searchClasses', {
date: '2025-12-26'
});
const html = response.structuredContent;
// Should use proper heading hierarchy
expect(html).toMatch(/<h[1-6]/);
// Should have proper list structure
expect(html).toMatch(/<ul|<ol/);
// Should have proper button/link elements
expect(html).toMatch(/<button|<a/);
});
});
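`extractAriaLabels` is likewise a helper you supply yourself. A regex-based sketch that pulls every `aria-label` value out of an HTML string:

```javascript
// Collect all aria-label attribute values from an HTML string
function extractAriaLabels(html) {
  const labels = [];
  const re = /aria-label="([^"]*)"/gi;
  let match;
  while ((match = re.exec(html)) !== null) {
    labels.push(match[1]);
  }
  return labels;
}
```

For production use, a real HTML parser is more robust than a regex, but this is enough to drive the screen-reader assertions above.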
Part 13: Mobile & Responsive Testing
ChatGPT can be used on mobile, tablet, and desktop. Your app must work on all screen sizes.
Viewport Testing
describe('Mobile & Responsive Design', () => {
const viewports = [
{ name: 'Mobile', width: 375, height: 667 }, // iPhone SE
{ name: 'Tablet', width: 768, height: 1024 }, // iPad
{ name: 'Desktop', width: 1920, height: 1080 } // Desktop
];
viewports.forEach(viewport => {
it(`should render correctly on ${viewport.name}`, async () => {
const response = await client.callTool('searchClasses', {
date: '2025-12-26'
});
const rendered = renderWidget(response.structuredContent, {
viewport: viewport
});
// No horizontal overflow
expect(rendered.scrollWidth).toBeLessThanOrEqual(viewport.width);
// All buttons/CTAs should be clickable on touch
const buttons = rendered.querySelectorAll('button');
buttons.forEach(btn => {
const rect = btn.getBoundingClientRect();
expect(rect.height).toBeGreaterThanOrEqual(44); // 44px minimum for touch
expect(rect.width).toBeGreaterThanOrEqual(44);
});
});
});
it('should not have internal scrolling on mobile', async () => {
const response = await client.callTool('searchClasses', {
date: '2025-12-26'
});
const html = response.structuredContent;
// Cards should not have overflow: scroll
expect(html).not.toMatch(/overflow\s*:\s*scroll/);
expect(html).not.toMatch(/max-height.*overflow/);
});
});
Touch Target Testing
describe('Touch Target Sizing', () => {
it('should have minimum 44px touch targets', async () => {
const response = await client.callTool('searchClasses', {
date: '2025-12-26'
});
const rendered = renderWidget(response.structuredContent);
const interactiveElements = rendered.querySelectorAll(
'button, a, input, [role="button"]'
);
interactiveElements.forEach(el => {
const rect = el.getBoundingClientRect();
expect(rect.height).toBeGreaterThanOrEqual(44);
expect(rect.width).toBeGreaterThanOrEqual(44);
});
});
it('should have sufficient spacing between touch targets', async () => {
const response = await client.callTool('searchClasses', {
date: '2025-12-26'
});
const rendered = renderWidget(response.structuredContent);
const buttons = rendered.querySelectorAll('button');
for (let i = 0; i < buttons.length - 1; i++) {
// Assumes vertically stacked buttons; adapt the check for horizontal layouts
const rect1 = buttons[i].getBoundingClientRect();
const rect2 = buttons[i + 1].getBoundingClientRect();
const spacing = Math.abs(rect2.top - rect1.bottom);
expect(spacing).toBeGreaterThanOrEqual(8); // 8px minimum between adjacent targets
}
});
});
Part 14: Visual Regression Testing
Catch UI changes that break widget rendering.
Screenshot Comparison Testing
describe('Visual Regression Testing', () => {
it('should match baseline screenshot on desktop', async () => {
const response = await client.callTool('searchClasses', {
date: '2025-12-26'
});
const screenshot = renderAndCapture(response.structuredContent, {
viewport: { width: 800, height: 600 }
});
expect(screenshot).toMatchImageSnapshot({
failureThreshold: 0.001, // tolerate 0.1% of pixels differing
failureThresholdType: 'percent'
});
});
it('should match baseline screenshot on mobile', async () => {
const response = await client.callTool('searchClasses', {
date: '2025-12-26'
});
const screenshot = renderAndCapture(response.structuredContent, {
viewport: { width: 375, height: 667 }
});
expect(screenshot).toMatchImageSnapshot({
failureThreshold: 0.001, // tolerate 0.1% of pixels differing
failureThresholdType: 'percent'
});
});
});
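Note that `toMatchImageSnapshot` is not a built-in Jest matcher. If you use the `jest-image-snapshot` package, register the matcher once in a setup file wired up through the `setupFilesAfterEnv` option of your Jest config:

```javascript
// jest.setup.js — register the custom matcher once
// (assumes the jest-image-snapshot package is installed)
const { toMatchImageSnapshot } = require('jest-image-snapshot');
expect.extend({ toMatchImageSnapshot });
```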
Part 15: Maintenance & Ongoing Testing
Testing doesn't end after deployment. Continuous testing ensures your app stays production-ready.
Monitoring Test Health
describe('Production Monitoring', () => {
it('should detect performance degradation', async () => {
const baseline = 1200; // 1.2 seconds baseline
const response = await client.callTool('searchClasses', {
date: '2025-12-26'
});
expect(response._meta.executionTime).toBeLessThan(baseline * 1.1); // 10% increase is concerning
});
it('should alert on increased error rate', async () => {
const iterations = 100;
const errors = [];
for (let i = 0; i < iterations; i++) {
try {
await client.callTool('bookClass', {
classId: 1,
userId: `test-user-${i}`
});
} catch (error) {
errors.push(error);
}
}
const errorRate = errors.length / iterations;
expect(errorRate).toBeLessThan(0.05); // Error rate should stay below 5%
});
it('should validate third-party API availability', async () => {
const mindbodyStatus = await checkAPIHealth('https://api.mindbody.io');
expect(mindbodyStatus.statusCode).toBe(200);
const stripeStatus = await checkAPIHealth('https://api.stripe.com');
expect(stripeStatus.statusCode).toBe(200);
});
});
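`checkAPIHealth` is another helper the monitoring tests assume. A sketch using the built-in `fetch` (Node 18+), with the fetch implementation injectable so tests can stub out the network call:

```javascript
// Hypothetical health-check helper assumed by the monitoring tests.
// fetchImpl is injectable so tests can substitute a stub for real network I/O.
async function checkAPIHealth(url, { fetchImpl = fetch, timeoutMs = 5000 } = {}) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(url, { signal: controller.signal });
    return { statusCode: res.status, ok: res.ok };
  } catch (err) {
    // Treat network errors and timeouts as an unhealthy (status 0) result
    return { statusCode: 0, ok: false, error: String(err) };
  } finally {
    clearTimeout(timer);
  }
}
```

In CI, prefer hitting a dedicated status or health endpoint rather than the API root, and keep the timeout short so a hung upstream does not stall your test run.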
Regression Test Suites
Create a comprehensive regression suite to run before each deployment:
#!/bin/bash
# regression-test.sh
set -e # stop on the first failing suite
echo "Running full regression test suite..."
# Unit tests
npm test -- --coverage
# Integration tests with MCP Inspector
npm run test:mcp
# E2E tests
npm run test:e2e
# Accessibility tests
npm run test:a11y
# Visual regression
npm run test:visual
# Performance benchmarks
npm run test:performance
# Security tests
npm run test:security
# Generate report
npm run test:report
echo "Regression testing complete!"
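The script assumes npm scripts such as `test:mcp` and `test:a11y` already exist; the exact commands depend on your tooling. A hypothetical `package.json` scripts block might look like:

```json
{
  "scripts": {
    "test": "jest",
    "test:mcp": "node scripts/run-mcp-inspector-tests.js",
    "test:e2e": "playwright test",
    "test:a11y": "jest --testPathPattern=a11y",
    "test:visual": "jest --testPathPattern=visual",
    "test:performance": "jest --testPathPattern=performance",
    "test:security": "jest --testPathPattern=security",
    "test:report": "jest --coverage"
  }
}
```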
Getting Started with MakeAIHQ Testing Tools
Don't build your testing infrastructure from scratch. MakeAIHQ provides:
1. Pre-Built Test Templates: copy-paste test structures for common app types (fitness, restaurants, e-commerce)
2. MCP Inspector Integration: one-click setup with your MakeAIHQ apps
3. Automated Test Runner: continuous testing as you build
4. OpenAI Compliance Validator: automatic QA checklist verification
Browse Testing Templates →
Try MCP Inspector Free →
Next Steps
- Set up MCP Inspector with your existing app
- Write unit tests for your tool handlers
- Run integration tests to validate tool composition
- Test E2E workflows in actual ChatGPT
- Run OpenAI compliance checklist before submission
Apps that complete this testing process are far more likely to pass OpenAI review on the first submission. Build with confidence.
Related Testing Resources
- Unit Testing MCP Server Tools
- MCP Inspector Setup & Usage
- Error Handling Best Practices for ChatGPT Apps
- Performance Optimization for ChatGPT Widgets
- OpenAI Approval: 12 Critical Requirements
- End-to-End Testing Strategies
- Security Testing for ChatGPT Apps
- Load Testing & Spike Tolerance
- Token Limit Validation
- CI/CD Pipeline for ChatGPT Apps
- Debugging MCP Servers
- Test Automation Frameworks
- Widget Rendering Validation
- Authentication Testing for OAuth
- Conversation Flow Testing
- Third-Party API Mocking
- Database Testing Strategies
- Image Upload Testing
- Real-Time Data Testing
- Stress Testing ChatGPT Apps
- Browser Testing Tools
- Monitoring Production Performance
- Test Report Generation
- QA Automation Best Practices
- Testing ChatGPT App Widgets
- Accessibility Testing
- Regression Testing
- Compliance Testing
Ready to Test Your ChatGPT App?
Start with MakeAIHQ's testing templates. Choose your industry, and we'll give you:
- Pre-built unit tests
- MCP Inspector test cases
- E2E test workflows
- OpenAI compliance checklist
All you do is customize the examples and run them, giving your app the strongest possible chance of passing OpenAI review on the first submission.
Start Testing Free →
View All Templates →
View Pricing →