Feature Flags for ChatGPT Apps: Progressive Delivery Guide
Feature flags (also called feature toggles) are essential for deploying ChatGPT apps safely and testing new capabilities with controlled user segments. Unlike traditional web applications, ChatGPT apps require specialized flag implementations that respect OpenAI's widget runtime, handle real-time conversations, and integrate with MCP server architectures. This comprehensive guide demonstrates how to implement production-grade feature flag systems that enable progressive delivery, A/B testing, percentage rollouts, and emergency kill switches for ChatGPT applications.
Feature flags decouple deployment from release, allowing you to ship code to production while controlling feature visibility through configuration. For ChatGPT apps submitted to the OpenAI App Store, this means you can deploy updates without triggering re-review, gradually roll out experimental tools to beta users, run A/B tests on widget layouts, and instantly disable problematic features without redeployment. Whether you're building a fitness booking assistant, restaurant reservation system, or real estate search tool, feature flags provide the safety net and flexibility required for confident continuous delivery.
Feature Flag Architecture for ChatGPT Apps
Building a robust feature flag system for ChatGPT apps requires understanding the unique constraints of the MCP protocol and OpenAI's widget runtime. Your architecture must handle flag evaluation in both server-side MCP tool handlers and client-side widget rendering contexts, synchronize flag states across real-time conversations, and minimize latency impact on ChatGPT's conversational flow.
Flag Types and Use Cases
ChatGPT apps benefit from multiple flag types, each serving distinct purposes. Release flags control access to new MCP tools or widget components (e.g., enabling a new "cancel_booking" tool for 10% of users). Experiment flags drive A/B tests on widget layouts, tool prompts, or conversation flows (e.g., testing two different confirmation dialog designs). Operational flags act as kill switches for problematic features (e.g., disabling a flaky payment integration). Permission flags restrict premium features to paid users (e.g., hiding advanced analytics tools for free tier customers).
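One lightweight way to keep these categories visible in code is a typed flag catalog. The sketch below is illustrative only; the FlagType union, flag keys, and owner names are assumptions rather than part of any SDK:
// feature-flags/flag-catalog.ts (illustrative sketch)
export type FlagType = 'release' | 'experiment' | 'operational' | 'permission';

export interface CatalogEntry {
  key: string;         // e.g. 'release_cancel_booking_tool'
  type: FlagType;
  owner: string;       // team responsible for rollout and cleanup
  description: string;
}

export const FLAG_CATALOG: CatalogEntry[] = [
  { key: 'release_cancel_booking_tool', type: 'release', owner: 'bookings', description: 'New cancel_booking MCP tool' },
  { key: 'experiment_confirmation_dialog', type: 'experiment', owner: 'growth', description: 'A/B test of confirmation dialog designs' },
  { key: 'ops_disable_payments', type: 'operational', owner: 'platform', description: 'Kill switch for the payment integration' },
  { key: 'permission_advanced_analytics', type: 'permission', owner: 'analytics', description: 'Premium-only analytics tools' },
];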
Evaluation Strategies
Feature flags can be evaluated at different layers of your ChatGPT app architecture. Server-side evaluation occurs in MCP tool handlers before returning responses, ideal for controlling tool availability or modifying tool behavior based on user segments. Client-side evaluation happens in widget code using window.openai.getWidgetState(), perfect for toggling UI components or adjusting widget layouts. Hybrid evaluation combines both approaches, evaluating access flags server-side (security-sensitive) and presentation flags client-side (performance-optimized).
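As a rough illustration of the hybrid pattern, the sketch below gates a tool server-side with an access flag and forwards presentation flags through widgetState for the widget to evaluate. The evaluateFlag function is a stand-in for whichever flag client you use, and the flag keys are hypothetical:
// Hybrid evaluation sketch: access flags server-side, presentation flags client-side.
declare function evaluateFlag(key: string, userId: string, defaultValue: boolean): Promise<boolean>;

async function handleSearchTool(userId: string) {
  // Security-sensitive decision stays on the server: is the tool available at all?
  const toolEnabled = await evaluateFlag('release_advanced_search', userId, false);
  if (!toolEnabled) {
    return { content: 'Advanced search is not available yet.' };
  }
  // Presentation-only flags are forwarded so the widget can decide how to render.
  const presentationFlags = {
    'show-map-view': await evaluateFlag('show-map-view', userId, false),
    'compact-results': await evaluateFlag('compact-results', userId, true),
  };
  return {
    structuredContent: { type: 'inline', content: { /* widget HTML */ } },
    _meta: { widgetState: { featureFlags: presentationFlags, userId } },
  };
}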
Storage and Synchronization
Production feature flag systems require fast, reliable storage with real-time updates. Redis provides sub-millisecond flag lookups with pub/sub for instant flag changes. Firestore offers real-time listeners and built-in security rules for user-specific flags. LaunchDarkly delivers enterprise-grade flag management with targeting rules, analytics, and gradual rollouts. Environment variables work for simple static flags but lack runtime updates. Choose storage based on your scale: Redis for high-throughput apps (10K+ requests/second), Firestore for moderate loads with complex targeting, LaunchDarkly for teams prioritizing ease of use over infrastructure control.
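For the environment-variable option, a minimal sketch might parse a comma-separated list once at startup; the FEATURE_FLAGS variable name and format here are assumptions, not a standard:
// Static flags read once at startup, e.g. FEATURE_FLAGS="enable-cancel-booking,new-analytics-widget"
const staticFlags = new Set(
  (process.env.FEATURE_FLAGS ?? '')
    .split(',')
    .map(flag => flag.trim())
    .filter(Boolean)
);

export function isStaticallyEnabled(flagKey: string): boolean {
  return staticFlags.has(flagKey);
}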
LaunchDarkly Integration for ChatGPT Apps
LaunchDarkly provides the most mature feature flag platform with SDKs for Node.js (MCP servers) and JavaScript (widgets). Integration requires initializing the SDK in both contexts, defining consistent user contexts, and handling flag evaluation with appropriate fallbacks.
// mcp-server/src/feature-flags/launchdarkly-client.ts
import * as LaunchDarkly from '@launchdarkly/node-server-sdk';
import type { LDClient, LDContext, LDFlagValue } from '@launchdarkly/node-server-sdk';
export interface FeatureFlagContext {
userId: string;
email?: string;
tier?: 'free' | 'starter' | 'professional' | 'business';
organizationId?: string;
custom?: Record<string, string | number | boolean>;
}
export class LaunchDarklyClient {
private client: LDClient | null = null;
private initPromise: Promise<void>;
constructor(private sdkKey: string) {
this.initPromise = this.initialize();
}
private async initialize(): Promise<void> {
try {
this.client = LaunchDarkly.init(this.sdkKey, {
// Optimize for high-throughput ChatGPT apps
stream: true,
sendEvents: true,
capacity: 10000,
flushInterval: 5,
// Connection timeout in seconds for the initial SDK handshake
timeout: 5,
});
await this.client.waitForInitialization({ timeout: 10 });
console.log('LaunchDarkly initialized successfully');
} catch (error) {
console.error('Failed to initialize LaunchDarkly:', error);
// Continue with defaults (fail-open for non-critical flags)
}
}
private buildContext(context: FeatureFlagContext): LDContext {
return {
kind: 'user',
key: context.userId,
email: context.email,
tier: context.tier,
organizationId: context.organizationId,
...context.custom, // spread custom attributes so targeting rules can reference them directly
};
}
async getFlag<T extends LDFlagValue>(
flagKey: string,
context: FeatureFlagContext,
defaultValue: T
): Promise<T> {
await this.initPromise;
if (!this.client) {
console.warn(`LaunchDarkly not initialized, returning default for ${flagKey}`);
return defaultValue;
}
try {
const ldContext = this.buildContext(context);
const value = await this.client.variation(flagKey, ldContext, defaultValue);
return value as T;
} catch (error) {
console.error(`Flag evaluation failed for ${flagKey}:`, error);
return defaultValue;
}
}
async getAllFlags(context: FeatureFlagContext): Promise<Record<string, LDFlagValue>> {
await this.initPromise;
if (!this.client) {
return {};
}
try {
const ldContext = this.buildContext(context);
const state = await this.client.allFlagsState(ldContext);
return state.allValues();
} catch (error) {
console.error('Failed to get all flags:', error);
return {};
}
}
async close(): Promise<void> {
if (this.client) {
await this.client.flush();
await this.client.close();
}
}
}
// Usage in MCP tool handler
import { LaunchDarklyClient, FeatureFlagContext } from './feature-flags/launchdarkly-client';
const ldClient = new LaunchDarklyClient(process.env.LAUNCHDARKLY_SDK_KEY!);
async function handleBookingTool(userId: string, params: any) {
const context: FeatureFlagContext = {
userId,
email: params.userEmail,
tier: params.subscriptionTier,
};
// Check if cancel feature is enabled for this user
const canCancel = await ldClient.getFlag('enable-cancel-booking', context, false);
if (!canCancel && params.action === 'cancel') {
return {
content: 'Cancellation feature is not available for your account.',
_meta: { error: 'FEATURE_DISABLED' }
};
}
// Proceed with booking logic...
}
Targeting Rules and Gradual Rollouts
LaunchDarkly's dashboard allows sophisticated targeting without code changes. Create a percentage rollout by setting the flag to return true for 10% of users (hashed consistently by userId). Define custom rules like "enable for tier = professional OR organizationId = 'beta-testers'". Use scheduled flag changes to automatically enable features at launch time. Monitor flag evaluation events to track which users see which variants.
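These rules can only target attributes your server actually sends, so the context you build must carry them. A brief sketch reusing the FeatureFlagContext and ldClient defined above (the attribute values are hypothetical):
// Attributes referenced by dashboard targeting rules must be present in the evaluation context.
const betaContext: FeatureFlagContext = {
  userId: 'user_456',
  email: 'dana@example.com',
  tier: 'professional',           // matched by a "tier = professional" rule
  organizationId: 'beta-testers', // matched by an "organizationId = 'beta-testers'" rule
};
const showNewTool = await ldClient.getFlag('release_cancel_booking_tool', betaContext, false);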
Widget-Side Integration
For client-side feature flags in ChatGPT widgets, use LaunchDarkly's JavaScript SDK with careful bundle size management (the SDK adds ~40KB gzipped). Alternatively, fetch all flags server-side and pass them via widgetState:
// In MCP tool handler (server-side)
const allFlags = await ldClient.getAllFlags(context);
return {
structuredContent: {
type: 'inline',
content: { /* widget HTML */ }
},
_meta: {
widgetState: {
featureFlags: allFlags, // Pass flags to widget
userId: context.userId
}
}
};
// In widget (client-side)
const state = window.openai.getWidgetState();
const flags = state.featureFlags || {};
if (flags['show-premium-analytics']) {
renderPremiumAnalytics();
}
Custom Feature Flag Service with Redis
For teams preferring infrastructure control over managed services, a custom feature flag system built on Redis provides millisecond-latency flag evaluation with real-time updates via pub/sub. This architecture suits high-scale ChatGPT apps processing thousands of conversations simultaneously.
// mcp-server/src/feature-flags/redis-flag-manager.ts
import Redis from 'ioredis';
import type { RedisOptions } from 'ioredis';
export interface FlagDefinition {
key: string;
enabled: boolean;
targeting?: {
userIds?: string[];
tiers?: string[];
percentage?: number;
customRules?: Array<{
attribute: string;
operator: 'equals' | 'contains' | 'greaterThan' | 'lessThan';
value: any;
}>;
};
variants?: {
control: any;
treatment: any;
};
createdAt: Date;
updatedAt: Date;
description?: string;
}
export interface FlagEvaluationContext {
userId: string;
tier?: string;
attributes?: Record<string, any>;
}
export class RedisFlagManager {
private client: Redis;
private subscriber: Redis;
private flagCache: Map<string, FlagDefinition> = new Map();
constructor(redisOptions: RedisOptions) {
this.client = new Redis(redisOptions);
this.subscriber = new Redis(redisOptions);
this.setupSubscriber();
// Initial load is fire-and-forget; evaluate() returns defaults until the cache warms up
void this.loadAllFlags();
}
private setupSubscriber(): void {
this.subscriber.subscribe('feature-flags:updates');
this.subscriber.on('message', (channel, message) => {
if (channel === 'feature-flags:updates') {
const { key, definition } = JSON.parse(message);
if (definition) {
this.flagCache.set(key, definition);
console.log(`Flag updated: ${key}`);
} else {
this.flagCache.delete(key);
console.log(`Flag deleted: ${key}`);
}
}
});
}
private async loadAllFlags(): Promise<void> {
const keys = await this.client.keys('flag:*');
for (const key of keys) {
const flagKey = key.replace('flag:', '');
const data = await this.client.get(key);
if (data) {
const definition: FlagDefinition = JSON.parse(data);
this.flagCache.set(flagKey, definition);
}
}
console.log(`Loaded ${this.flagCache.size} feature flags`);
}
async setFlag(definition: Omit<FlagDefinition, 'createdAt' | 'updatedAt'>): Promise<void> {
const now = new Date();
const fullDefinition: FlagDefinition = {
...definition,
createdAt: now,
updatedAt: now,
};
const key = `flag:${definition.key}`;
await this.client.set(key, JSON.stringify(fullDefinition));
// Notify all instances
await this.client.publish('feature-flags:updates', JSON.stringify({
key: definition.key,
definition: fullDefinition,
}));
}
async deleteFlag(flagKey: string): Promise<void> {
await this.client.del(`flag:${flagKey}`);
await this.client.publish('feature-flags:updates', JSON.stringify({
key: flagKey,
definition: null,
}));
}
async evaluate(
flagKey: string,
context: FlagEvaluationContext,
defaultValue: any = false
): Promise<any> {
const definition = this.flagCache.get(flagKey);
if (!definition) {
return defaultValue;
}
if (!definition.enabled) {
return definition.variants?.control ?? defaultValue;
}
// No targeting rules = enabled for everyone
if (!definition.targeting) {
return definition.variants?.treatment ?? true;
}
const { targeting } = definition;
// User ID targeting
if (targeting.userIds && targeting.userIds.includes(context.userId)) {
return definition.variants?.treatment ?? true;
}
// Tier targeting
if (targeting.tiers && context.tier && targeting.tiers.includes(context.tier)) {
return definition.variants?.treatment ?? true;
}
// Custom rules
if (targeting.customRules && context.attributes) {
const rulesMatch = targeting.customRules.every(rule => {
const value = context.attributes![rule.attribute];
switch (rule.operator) {
case 'equals':
return value === rule.value;
case 'contains':
return String(value).includes(String(rule.value));
case 'greaterThan':
return Number(value) > Number(rule.value);
case 'lessThan':
return Number(value) < Number(rule.value);
default:
return false;
}
});
if (rulesMatch) {
return definition.variants?.treatment ?? true;
}
}
// Percentage rollout (consistent hashing)
if (targeting.percentage !== undefined) {
const hash = this.hashUserId(context.userId, flagKey);
const userPercentage = (hash % 100);
if (userPercentage < targeting.percentage) {
return definition.variants?.treatment ?? true;
}
}
return definition.variants?.control ?? defaultValue;
}
private hashUserId(userId: string, salt: string): number {
let hash = 0;
const combined = `${userId}:${salt}`;
for (let i = 0; i < combined.length; i++) {
const char = combined.charCodeAt(i);
hash = ((hash << 5) - hash) + char;
hash = hash & hash; // Convert to 32-bit integer
}
return Math.abs(hash);
}
async getAllFlags(): Promise<FlagDefinition[]> {
return Array.from(this.flagCache.values());
}
async close(): Promise<void> {
await this.client.quit();
await this.subscriber.quit();
}
}
// Usage example
const flagManager = new RedisFlagManager({
host: process.env.REDIS_HOST,
port: Number(process.env.REDIS_PORT),
password: process.env.REDIS_PASSWORD,
});
// Create a percentage rollout flag
await flagManager.setFlag({
key: 'new-analytics-widget',
enabled: true,
description: 'New analytics widget with Chart.js visualizations',
targeting: {
percentage: 25, // 25% of users
},
variants: {
control: 'legacy',
treatment: 'new',
},
});
// Evaluate in tool handler
const widgetVersion = await flagManager.evaluate(
'new-analytics-widget',
{ userId: 'user_123', tier: 'professional' },
'legacy'
);
React Feature Flag Hook for Widgets
ChatGPT widgets benefit from a React hook that abstracts flag evaluation, handles loading states, and provides type-safe flag access. This hook integrates with window.openai.getWidgetState() to receive server-evaluated flags while supporting client-side overrides for testing.
// widget/src/hooks/useFeatureFlag.ts
import { useState, useEffect, useMemo } from 'react';
export interface FeatureFlagConfig {
flags?: Record<string, any>;
overrides?: Record<string, any>; // For local testing
}
export function useFeatureFlags(config?: FeatureFlagConfig) {
const [flags, setFlags] = useState<Record<string, any>>(config?.flags || {});
const [loading, setLoading] = useState(true);
useEffect(() => {
// Fetch flags from widget state
const state = window.openai?.getWidgetState();
if (state?.featureFlags) {
setFlags(state.featureFlags);
}
setLoading(false);
// Listen for state updates
const handleStateChange = () => {
const updatedState = window.openai?.getWidgetState();
if (updatedState?.featureFlags) {
setFlags(updatedState.featureFlags);
}
};
window.addEventListener('openai:widgetStateChanged', handleStateChange);
return () => {
window.removeEventListener('openai:widgetStateChanged', handleStateChange);
};
}, []);
const effectiveFlags = useMemo(() => ({
...flags,
...(config?.overrides || {}),
}), [flags, config?.overrides]);
return {
flags: effectiveFlags,
loading,
isEnabled: (flagKey: string, defaultValue: boolean = false): boolean => {
return Boolean(effectiveFlags[flagKey] ?? defaultValue);
},
getVariant: <T,>(flagKey: string, defaultValue: T): T => {
return effectiveFlags[flagKey] ?? defaultValue;
},
};
}
// Widget component using the hook
import React from 'react';
import { useFeatureFlags } from './hooks/useFeatureFlag';
interface Booking {
id: string;
date: string;
service: string;
}
export function BookingWidget({ bookings }: { bookings: Booking[] }) {
const { isEnabled, getVariant, loading } = useFeatureFlags();
const handleCancel = (bookingId: string) => {
// Cancellation handler omitted for brevity (e.g., call a cancel tool on the MCP server)
console.log(`Cancel booking ${bookingId}`);
};
if (loading) {
return <div>Loading...</div>;
}
const canCancelBookings = isEnabled('enable-cancel-booking', false);
const analyticsVersion = getVariant('analytics-widget-version', 'legacy');
const maxBookingsPerDay = getVariant('max-bookings-per-day', 10);
return (
<div className="booking-widget">
<h2>Your Bookings</h2>
{analyticsVersion === 'new' && (
<div className="analytics-dashboard">
{/* New Chart.js analytics */}
</div>
)}
{analyticsVersion === 'legacy' && (
<div className="simple-stats">
{/* Simple statistics table */}
</div>
)}
<ul>
{bookings.map(booking => (
<li key={booking.id}>
{booking.date} - {booking.service}
{canCancelBookings && (
<button onClick={() => handleCancel(booking.id)}>
Cancel
</button>
)}
</li>
))}
</ul>
{bookings.length >= maxBookingsPerDay && (
<p className="warning">
You've reached your daily booking limit ({maxBookingsPerDay}).
</p>
)}
</div>
);
}
Targeting Engine for Advanced Segmentation
Production feature flag systems require sophisticated targeting beyond simple percentage rollouts. A dedicated targeting engine evaluates complex rules combining user attributes, behavioral data, and time-based conditions.
// mcp-server/src/feature-flags/targeting-engine.ts
export interface UserContext {
userId: string;
email?: string;
tier?: string;
organizationId?: string;
createdAt?: Date;
lastActiveAt?: Date;
totalBookings?: number;
lifetimeValue?: number;
location?: {
country?: string;
region?: string;
city?: string;
};
device?: {
type?: 'mobile' | 'desktop' | 'tablet';
os?: string;
};
experiments?: string[]; // Active experiment IDs
}
export type TargetingOperator =
| 'equals'
| 'notEquals'
| 'contains'
| 'notContains'
| 'greaterThan'
| 'lessThan'
| 'greaterThanOrEqual'
| 'lessThanOrEqual'
| 'in'
| 'notIn'
| 'matches' // Regex
| 'before' // Date comparison
| 'after';
export interface TargetingRule {
attribute: string; // e.g., "tier", "totalBookings", "location.country"
operator: TargetingOperator;
value: any;
}
export interface TargetingSegment {
name: string;
rules: TargetingRule[];
logic: 'AND' | 'OR'; // How to combine rules
}
export class TargetingEngine {
evaluateSegment(segment: TargetingSegment, context: UserContext): boolean {
const results = segment.rules.map(rule => this.evaluateRule(rule, context));
return segment.logic === 'AND'
? results.every(r => r)
: results.some(r => r);
}
private evaluateRule(rule: TargetingRule, context: UserContext): boolean {
const actualValue = this.getNestedValue(context, rule.attribute);
const expectedValue = rule.value;
switch (rule.operator) {
case 'equals':
return actualValue === expectedValue;
case 'notEquals':
return actualValue !== expectedValue;
case 'contains':
return String(actualValue).includes(String(expectedValue));
case 'notContains':
return !String(actualValue).includes(String(expectedValue));
case 'greaterThan':
return Number(actualValue) > Number(expectedValue);
case 'lessThan':
return Number(actualValue) < Number(expectedValue);
case 'greaterThanOrEqual':
return Number(actualValue) >= Number(expectedValue);
case 'lessThanOrEqual':
return Number(actualValue) <= Number(expectedValue);
case 'in':
return Array.isArray(expectedValue) && expectedValue.includes(actualValue);
case 'notIn':
return Array.isArray(expectedValue) && !expectedValue.includes(actualValue);
case 'matches':
try {
const regex = new RegExp(expectedValue);
return regex.test(String(actualValue));
} catch {
return false;
}
case 'before':
return new Date(actualValue) < new Date(expectedValue);
case 'after':
return new Date(actualValue) > new Date(expectedValue);
default:
console.warn(`Unknown operator: ${rule.operator}`);
return false;
}
}
private getNestedValue(obj: any, path: string): any {
return path.split('.').reduce((current, key) => current?.[key], obj);
}
// Combine multiple segments with AND/OR
evaluateSegments(
segments: TargetingSegment[],
context: UserContext,
logic: 'AND' | 'OR' = 'OR'
): boolean {
const results = segments.map(segment => this.evaluateSegment(segment, context));
return logic === 'AND'
? results.every(r => r)
: results.some(r => r);
}
}
// Usage: Create sophisticated targeting rules
const engine = new TargetingEngine();
const premiumUsersSegment: TargetingSegment = {
name: 'Premium Users',
logic: 'AND',
rules: [
{ attribute: 'tier', operator: 'in', value: ['professional', 'business'] },
{ attribute: 'totalBookings', operator: 'greaterThan', value: 10 },
],
};
const betaTestersSegment: TargetingSegment = {
name: 'Beta Testers',
logic: 'OR',
rules: [
{ attribute: 'email', operator: 'contains', value: '@makeaihq.com' },
{ attribute: 'organizationId', operator: 'in', value: ['org_beta_1', 'org_beta_2'] },
],
};
const userContext: UserContext = {
userId: 'user_123',
email: 'john@acmefitness.com',
tier: 'professional',
totalBookings: 25,
location: { country: 'US', region: 'CA' },
};
// Check if the user matches the premium segment
const isPremium = engine.evaluateSegment(premiumUsersSegment, userContext); // true
// Check if user matches ANY segment
const shouldSeeFeature = engine.evaluateSegments(
[premiumUsersSegment, betaTestersSegment],
userContext,
'OR'
); // true
A/B Testing and Experiment Management
Feature flags enable rigorous A/B testing for ChatGPT apps, allowing you to measure the impact of widget design changes, tool prompt variations, or conversation flow modifications on key metrics like booking completion rate, user engagement, or revenue per conversation.
// mcp-server/src/feature-flags/experiment-manager.ts
import Redis from 'ioredis';
import { RedisFlagManager, FlagEvaluationContext } from './redis-flag-manager';
export interface ExperimentConfig {
key: string;
name: string;
hypothesis: string;
variants: Array<{
id: string;
name: string;
weight: number; // Percentage (0-100)
value: any;
}>;
targeting?: {
segments?: string[];
percentage?: number;
};
metrics: Array<{
id: string;
name: string;
type: 'conversion' | 'numeric' | 'duration';
}>;
startDate: Date;
endDate?: Date;
status: 'draft' | 'running' | 'paused' | 'completed';
}
export interface ExperimentAssignment {
userId: string;
experimentKey: string;
variantId: string;
assignedAt: Date;
}
export class ExperimentManager {
constructor(
private flagManager: RedisFlagManager,
private redis: Redis
) {}
async createExperiment(config: ExperimentConfig): Promise<void> {
// Validate weights sum to 100
const totalWeight = config.variants.reduce((sum, v) => sum + v.weight, 0);
if (Math.abs(totalWeight - 100) > 0.01) {
throw new Error('Variant weights must sum to 100');
}
// Store experiment config
await this.redis.set(
`experiment:${config.key}`,
JSON.stringify(config)
);
// Create feature flag for experiment
await this.flagManager.setFlag({
key: config.key,
enabled: config.status === 'running',
description: config.name,
targeting: config.targeting,
});
}
async assignVariant(
experimentKey: string,
context: FlagEvaluationContext
): Promise<string | null> {
// Check for existing assignment (consistency)
const existingAssignment = await this.redis.get(
`assignment:${context.userId}:${experimentKey}`
);
if (existingAssignment) {
return existingAssignment;
}
// Load experiment config
const configData = await this.redis.get(`experiment:${experimentKey}`);
if (!configData) {
return null;
}
const config: ExperimentConfig = JSON.parse(configData);
// Check if experiment is active
if (config.status !== 'running') {
return null;
}
// Check date range (dates were serialized to ISO strings when stored in Redis)
const now = new Date();
if (now < new Date(config.startDate) || (config.endDate && now > new Date(config.endDate))) {
return null;
}
// Evaluate targeting (if user is eligible)
const isEligible = await this.flagManager.evaluate(
experimentKey,
context,
false
);
if (!isEligible) {
return null;
}
// Assign variant based on weighted distribution
const variantId = this.selectVariant(config.variants, context.userId, experimentKey);
// Store the assignment so the user consistently sees the same variant
await this.redis.set(
`assignment:${context.userId}:${experimentKey}`,
variantId,
'EX',
60 * 60 * 24 * 90 // 90-day expiry
);
// Track assignment event
await this.trackEvent({
userId: context.userId,
experimentKey,
variantId,
eventType: 'assigned',
timestamp: now,
});
return variantId;
}
private selectVariant(
variants: ExperimentConfig['variants'],
userId: string,
salt: string
): string {
// Consistent hash-based assignment
const hash = this.hashUserId(userId, salt);
const percentage = hash % 100;
let cumulative = 0;
for (const variant of variants) {
cumulative += variant.weight;
if (percentage < cumulative) {
return variant.id;
}
}
// Fallback to first variant (should never reach here if weights sum to 100)
return variants[0].id;
}
private hashUserId(userId: string, salt: string): number {
let hash = 0;
const combined = `${userId}:${salt}`;
for (let i = 0; i < combined.length; i++) {
const char = combined.charCodeAt(i);
hash = ((hash << 5) - hash) + char;
hash = hash & hash;
}
return Math.abs(hash);
}
async trackEvent(event: {
userId: string;
experimentKey: string;
variantId: string;
eventType: string;
value?: number;
timestamp: Date;
}): Promise<void> {
// Store event for analytics
await this.redis.rpush(
`events:${event.experimentKey}:${event.variantId}`,
JSON.stringify(event)
);
// Increment metric counters
const key = `metrics:${event.experimentKey}:${event.variantId}:${event.eventType}`;
await this.redis.incr(key);
if (event.value !== undefined) {
await this.redis.incrbyfloat(`${key}:sum`, event.value);
}
}
async getExperimentResults(experimentKey: string): Promise<{
variants: Array<{
id: string;
assignments: number;
metrics: Record<string, { count: number; sum?: number }>;
}>;
}> {
const configData = await this.redis.get(`experiment:${experimentKey}`);
if (!configData) {
throw new Error('Experiment not found');
}
const config: ExperimentConfig = JSON.parse(configData);
const results = await Promise.all(
config.variants.map(async (variant) => {
const assignments = Number(
await this.redis.get(`metrics:${experimentKey}:${variant.id}:assigned`)
) || 0;
const metrics: Record<string, { count: number; sum?: number }> = {};
for (const metric of config.metrics) {
const key = `metrics:${experimentKey}:${variant.id}:${metric.id}`;
const count = await this.redis.get(key);
const sum = await this.redis.get(`${key}:sum`);
metrics[metric.id] = {
count: Number(count) || 0,
sum: sum ? Number(sum) : undefined,
};
}
return {
id: variant.id,
assignments,
metrics,
};
})
);
return { variants: results };
}
}
// Usage: Run A/B test on widget layout
const redis = new Redis({
host: process.env.REDIS_HOST,
port: Number(process.env.REDIS_PORT),
password: process.env.REDIS_PASSWORD,
});
const experimentManager = new ExperimentManager(flagManager, redis);
await experimentManager.createExperiment({
key: 'booking-widget-layout',
name: 'Booking Widget Layout Test',
hypothesis: 'Single-column layout increases booking completion by 15%',
variants: [
{ id: 'control', name: 'Two-Column Layout', weight: 50, value: 'two-column' },
{ id: 'treatment', name: 'Single-Column Layout', weight: 50, value: 'single-column' },
],
targeting: {
percentage: 100, // All eligible users
},
metrics: [
{ id: 'booking_completed', name: 'Booking Completed', type: 'conversion' },
{ id: 'time_to_complete', name: 'Time to Complete', type: 'duration' },
],
startDate: new Date('2026-01-20'),
status: 'running',
});
// In tool handler
const variantId = await experimentManager.assignVariant('booking-widget-layout', {
userId: 'user_123',
tier: 'professional',
});
// Track conversion (only when the user was actually assigned to a variant)
if (variantId) {
await experimentManager.trackEvent({
userId: 'user_123',
experimentKey: 'booking-widget-layout',
variantId,
eventType: 'booking_completed',
timestamp: new Date(),
});
}
Best Practices for Feature Flag Management
Production feature flag systems require disciplined lifecycle management to prevent technical debt. Follow these best practices to maintain a clean, performant flag architecture.
Flag Lifecycle and Cleanup
Feature flags have finite lifespans. Release flags should be removed once features reach 100% rollout (typically 2-4 weeks post-launch). Experiment flags must be deleted after experiment conclusion and winning variant deployment. Operational flags (kill switches) can persist indefinitely but require regular testing. Set calendar reminders to review flags monthly, archive unused flags to separate storage, and track flag age with creation timestamps.
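A hedged sketch of a monthly audit pass over the RedisFlagManager shown earlier; the 30-day threshold is an arbitrary example and should be tuned per flag type:
// List flags older than a threshold so stale release and experiment flags get reviewed or removed.
const MAX_FLAG_AGE_DAYS = 30; // example threshold only

async function auditStaleFlags(manager: RedisFlagManager): Promise<void> {
  const flags = await manager.getAllFlags();
  const now = Date.now();
  for (const flag of flags) {
    const ageDays = (now - new Date(flag.createdAt).getTime()) / (1000 * 60 * 60 * 24);
    if (ageDays > MAX_FLAG_AGE_DAYS) {
      console.warn(`Stale flag "${flag.key}" (${Math.round(ageDays)} days old): ${flag.description ?? 'no description'}`);
    }
  }
}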
Documentation and Naming
Standardized naming conventions prevent confusion across teams. Use prefixes to indicate flag type: release_, experiment_, ops_, permission_. Include context in names: release_cancel_booking_tool not new_feature_123. Maintain a flag registry documenting purpose, owner, target removal date, and related tickets. Update documentation when modifying targeting rules or variants.
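A minimal sketch of enforcing that convention at flag-creation time; the regex simply encodes the prefixes listed above and is not a standard:
// Reject flag keys that don't follow the <type>_<snake_case_name> convention.
const FLAG_KEY_PATTERN = /^(release|experiment|ops|permission)_[a-z0-9_]+$/;

export function assertValidFlagKey(key: string): void {
  if (!FLAG_KEY_PATTERN.test(key)) {
    throw new Error(
      `Invalid flag key "${key}": expected a release_/experiment_/ops_/permission_ prefix and a snake_case name`
    );
  }
}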
Performance Considerations
Feature flag evaluation adds latency to every request. Minimize impact with local caching (cache flags for 30-60 seconds per user), batch evaluation (fetch all flags in one call), and async updates (use pub/sub for instant changes without polling). For ChatGPT apps, server-side evaluation (1-2ms overhead) is preferable to client-side SDK loading (40KB+ bundle size). Monitor P95 latency to ensure flags don't degrade conversational experience.
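A hedged sketch of the per-user caching idea, wrapping whatever evaluation call you already have; the 30-second TTL follows the guidance above and the function names are placeholders:
// Cache flag evaluations per user for a short TTL to avoid repeated lookups within one conversation.
const CACHE_TTL_MS = 30_000;
const evaluationCache = new Map<string, { value: unknown; expiresAt: number }>();

export async function cachedEvaluate<T>(
  userId: string,
  flagKey: string,
  evaluate: () => Promise<T>
): Promise<T> {
  const cacheKey = `${userId}:${flagKey}`;
  const hit = evaluationCache.get(cacheKey);
  if (hit && hit.expiresAt > Date.now()) {
    return hit.value as T;
  }
  const value = await evaluate();
  evaluationCache.set(cacheKey, { value, expiresAt: Date.now() + CACHE_TTL_MS });
  return value;
}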
Testing and Validation
Test feature flags like any critical infrastructure. Write unit tests validating targeting logic with known user contexts. Create integration tests verifying flag changes propagate to all service instances. Use canary deployments to roll out flag changes gradually (update 10% of servers, monitor errors, then full rollout). Implement override mechanisms for local development and QA environments (FEATURE_FLAGS_OVERRIDE=enable-all npm run dev).
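A minimal sketch of the override mechanism mentioned above, assuming FEATURE_FLAGS_OVERRIDE is either 'enable-all' or a comma-separated allow-list; these semantics are an assumption, not an established convention:
// Returns true when an override applies, or undefined to fall through to normal evaluation.
export function applyLocalOverride(flagKey: string): boolean | undefined {
  const override = process.env.FEATURE_FLAGS_OVERRIDE;
  if (!override) return undefined;
  if (override === 'enable-all') return true;
  const enabledKeys = override.split(',').map(key => key.trim());
  return enabledKeys.includes(flagKey) ? true : undefined;
}

// In the evaluation path:
// const overridden = applyLocalOverride(flagKey);
// if (overridden !== undefined) return overridden;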
Monitoring and Alerting
Track flag evaluation metrics in production: evaluation latency (P50/P95/P99), error rate, cache hit rate, and active flag count. Alert when flags cause elevated error rates (e.g., "ops_disable_payments" enabled for >5 minutes) or when flag evaluation latency exceeds 10ms. Use distributed tracing to correlate flag states with user sessions and debug issues.
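A hedged sketch of timing each evaluation so latency percentiles can be exported to whatever metrics backend you use; recordLatency is a placeholder hook, not a specific library API:
// Wrap any flag evaluation to record its latency; recordLatency is a placeholder for your metrics client.
export async function timedEvaluate<T>(
  flagKey: string,
  evaluate: () => Promise<T>,
  recordLatency: (flagKey: string, milliseconds: number) => void
): Promise<T> {
  const start = performance.now();
  try {
    return await evaluate();
  } finally {
    recordLatency(flagKey, performance.now() - start);
  }
}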
Explore related topics: Real-Time Sync for ChatGPT Apps, Multi-Tenant Architecture Guide, ChatGPT App Testing Strategies, Monitoring ChatGPT Apps in Production, and A/B Testing for Conversational UX.
Conclusion
Feature flags transform ChatGPT app development from risky big-bang releases to safe, data-driven progressive delivery. By implementing robust flag evaluation systems with LaunchDarkly, custom Redis-backed managers, or hybrid approaches, you gain precise control over feature visibility, can run rigorous A/B tests on widget designs, and maintain emergency kill switches for production incidents. The targeting engines, experiment frameworks, and React hooks demonstrated in this guide provide production-ready foundations for sophisticated release management.
Ready to deploy ChatGPT apps with confidence? MakeAIHQ.com provides built-in feature flag support, A/B testing infrastructure, and progressive rollout tools integrated with our no-code ChatGPT app builder. Create your first flagged deployment in minutes—start your free trial today and ship features without fear.
External Resources:
- LaunchDarkly Documentation - Enterprise feature flag platform with SDKs for Node.js and JavaScript
- Feature Flag Best Practices - Martin Fowler's comprehensive guide to feature toggle patterns
- OpenFeature Specification - Vendor-agnostic feature flag standard for unified APIs