Cohort Analysis & Retention: User Retention & Churn Prediction

Understanding why users stay engaged with your ChatGPT app—or why they leave—is critical for long-term success. While session analytics show you what's happening right now, cohort analysis reveals patterns over time: which user groups exhibit the strongest retention, when churn occurs, and which behaviors predict long-term engagement.

In this guide, we'll explore how to implement production-ready cohort analysis for ChatGPT apps, including retention tracking, churn prediction models, power user identification, and onboarding optimization strategies. By the end, you'll have the tools to measure your app's Day 30 retention and to find the changes most likely to improve it.

What You'll Learn:

  • Build acquisition and behavioral cohorts for ChatGPT users
  • Calculate Day 1/7/30 retention with time-based analysis
  • Predict churn using engagement indicators and ML models
  • Identify power users through usage pattern scoring
  • Optimize onboarding experiences to reduce early churn
  • Monitor cohort performance with real-time dashboards

Whether you're running a fitness coaching app, restaurant booking assistant, or real estate search tool, cohort analysis transforms raw usage data into actionable retention strategies.

Understanding Cohort Analysis for ChatGPT Apps

Cohort analysis groups users who share common characteristics or experiences within a defined time period, then tracks their behavior over time. For ChatGPT apps, this reveals:

  • Retention patterns: Do users who join on weekends have higher retention than weekday signups?
  • Feature impact: Does using your "appointment booking" tool in the first session improve 7-day retention?
  • Seasonal effects: Are November cohorts more engaged than June cohorts?
  • Onboarding effectiveness: What percentage of users complete your onboarding flow, and how does that correlate with retention?

Acquisition Cohorts

Acquisition cohorts group users by their signup date. This is the most common cohort type and reveals time-based trends:

// Acquisition Cohort Definition
interface AcquisitionCohort {
  cohortId: string;          // "2026-W01" (week-based) or "2026-01" (month-based)
  startDate: Date;           // First day of cohort period
  endDate: Date;             // Last day of cohort period
  userCount: number;         // Total users who signed up
  retentionDay1: number;     // % active on Day 1
  retentionDay7: number;     // % active on Day 7
  retentionDay30: number;    // % active on Day 30
  avgSessionsPerUser: number;
  churnRate: number;         // % who haven't returned in 14+ days
}

Example: Your January 2026 cohort has 500 users. After 30 days:

  • Day 1 Retention: 75% (375 users returned the next day)
  • Day 7 Retention: 45% (225 users still active after a week)
  • Day 30 Retention: 28% (140 users remain engaged after a month)

If your February cohort shows 35% Day 30 retention, that is a gain of 7 percentage points over January's 28%, or a 25% relative improvement. A product change or a shift in marketing campaigns is a likely cause worth investigating.
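Relative and absolute improvement are easy to confuse, so the arithmetic behind that comparison is worth spelling out. The sketch below is only an illustration; `compareRetention` is a hypothetical helper, not part of any library.

```typescript
// Compare two cohorts' retention rates (both given as percentages).
// compareRetention is a hypothetical helper used only for illustration.
function compareRetention(baseline: number, current: number) {
  if (baseline <= 0) {
    throw new Error('baseline retention must be greater than 0');
  }
  return {
    // Absolute change, in percentage points
    absolutePoints: current - baseline,
    // Relative change, as a percentage of the baseline
    relativePercent: ((current - baseline) / baseline) * 100,
  };
}

// January (28%) vs. February (35%) Day 30 retention
const change = compareRetention(28, 35);
console.log(change); // absolutePoints: 7, relativePercent: 25
```

Reporting both figures avoids ambiguity when cohorts are compared in dashboards or status updates.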

Behavioral Cohorts

Behavioral cohorts group users by actions they took, not when they signed up. This reveals which behaviors correlate with retention:

// Behavioral Cohort Examples
type BehaviorCohort =
  | { type: 'feature_used'; feature: string }           // Used "book appointment" tool
  | { type: 'engagement_level'; tier: 'low' | 'medium' | 'high' }
  | { type: 'onboarding_completed'; completed: boolean }
  | { type: 'referral_source'; source: string }         // Came from Instagram vs. Google
  | { type: 'conversation_depth'; turns: number };      // Had 10+ turn conversations

interface BehavioralCohortMetrics {
  cohortName: string;
  userCount: number;
  avgLifetimeValue: number;    // Average sessions over 90 days
  retentionCurve: number[];    // Retention at Days 1, 7, 14, 30, 60, 90
  churnProbability: number;    // ML-predicted churn likelihood
  powerUserPercentage: number; // % who become power users (20+ sessions)
}

Example: Users who complete your onboarding wizard have 62% Day 30 retention, compared to 18% for those who skip it. This insight can motivate making onboarding mandatory, ideally after an A/B test confirms that onboarding itself, and not user motivation, drives the difference.
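A comparison like this one can be computed with very little code once each user carries a behavioral flag and a retention flag. The sketch below assumes a simplified `SketchUser` shape with a precomputed `activeOnDay30` field; both are made up for illustration and are not part of the schema used elsewhere in this guide.

```typescript
// Hypothetical minimal user shape for this sketch
interface SketchUser {
  userId: string;
  onboardingCompleted: boolean;
  activeOnDay30: boolean; // assumed to be precomputed from session events
}

// Day 30 retention (%) for users split by whether they completed onboarding
function retentionByOnboarding(users: SketchUser[]) {
  const rate = (group: SketchUser[]): number =>
    group.length === 0
      ? 0
      : (group.filter(u => u.activeOnDay30).length / group.length) * 100;

  return {
    completed: rate(users.filter(u => u.onboardingCompleted)),
    skipped: rate(users.filter(u => !u.onboardingCompleted)),
  };
}

// Made-up sample data
const sampleUsers: SketchUser[] = [
  { userId: 'a', onboardingCompleted: true, activeOnDay30: true },
  { userId: 'b', onboardingCompleted: true, activeOnDay30: false },
  { userId: 'c', onboardingCompleted: false, activeOnDay30: false },
  { userId: 'd', onboardingCompleted: false, activeOnDay30: false },
];
console.log(retentionByOnboarding(sampleUsers)); // completed: 50, skipped: 0
```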

Building a Production-Ready Cohort System

Here's a complete TypeScript implementation for cohort tracking in ChatGPT apps:

// cohort-builder.ts - Cohort Analysis Engine (120 lines)
import { Firestore, Timestamp } from 'firebase-admin/firestore';

interface UserEvent {
  userId: string;
  timestamp: Timestamp;
  eventType: 'session_start' | 'tool_call' | 'conversation_turn' | 'feature_used';
  metadata?: Record<string, any>;
}

interface CohortUser {
  userId: string;
  signupDate: Date;
  lastActiveDate: Date;
  totalSessions: number;
  totalToolCalls: number;
  onboardingCompleted: boolean;
  referralSource?: string;
}

class CohortBuilder {
  constructor(private db: Firestore) {}

  /**
   * Create weekly acquisition cohorts
   */
  async buildAcquisitionCohorts(
    startDate: Date,
    endDate: Date,
    granularity: 'week' | 'month' = 'week'
  ): Promise<AcquisitionCohort[]> {
    const cohorts: AcquisitionCohort[] = [];
    const users = await this.getUsersInDateRange(startDate, endDate);

    // Group users by cohort period
    const cohortMap = new Map<string, CohortUser[]>();

    for (const user of users) {
      const cohortId = this.getCohortId(user.signupDate, granularity);
      if (!cohortMap.has(cohortId)) {
        cohortMap.set(cohortId, []);
      }
      cohortMap.get(cohortId)!.push(user);
    }

    // Calculate metrics for each cohort
    for (const [cohortId, cohortUsers] of cohortMap) {
      const cohort = await this.calculateCohortMetrics(cohortId, cohortUsers);
      cohorts.push(cohort);
    }

    return cohorts.sort((a, b) => a.startDate.getTime() - b.startDate.getTime());
  }

  /**
   * Calculate retention metrics for a cohort
   */
  private async calculateCohortMetrics(
    cohortId: string,
    users: CohortUser[]
  ): Promise<AcquisitionCohort> {
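    // parseCohortId (not shown) maps a cohort ID back to the first day of its period
    const startDate = this.parseCohortId(cohortId);
    const endDate = new Date(startDate);
    endDate.setDate(endDate.getDate() + 6); // Last day of a weekly cohort; monthly cohorts need their own end date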

    const userIds = users.map(u => u.userId);

    // Fetch activity data for retention calculation
    const activities = await this.db.collection('user_events')
      .where('userId', 'in', userIds)
      .where('eventType', '==', 'session_start')
      .get();

    const activityMap = new Map<string, Date[]>();
    activities.forEach(doc => {
      const event = doc.data() as UserEvent;
      if (!activityMap.has(event.userId)) {
        activityMap.set(event.userId, []);
      }
      activityMap.get(event.userId)!.push(event.timestamp.toDate());
    });

    // Calculate Day 1, 7, 30 retention
    const retention = this.calculateRetentionRates(users, activityMap);
    const avgSessions = this.calculateAverageSessions(activityMap);
    const churnRate = this.calculateChurnRate(users, activityMap);

    return {
      cohortId,
      startDate,
      endDate,
      userCount: users.length,
      retentionDay1: retention.day1,
      retentionDay7: retention.day7,
      retentionDay30: retention.day30,
      avgSessionsPerUser: avgSessions,
      churnRate
    };
  }

  /**
   * Calculate retention rates for specific time windows
   */
  private calculateRetentionRates(
    users: CohortUser[],
    activityMap: Map<string, Date[]>
  ): { day1: number; day7: number; day30: number } {
    let day1Active = 0, day7Active = 0, day30Active = 0;

    for (const user of users) {
      const activities = activityMap.get(user.userId) || [];
      const signupDate = user.signupDate;

      // Day 1: Active within 24-48 hours of signup
      const day1Window = activities.filter(date => {
        const hoursSinceSignup = (date.getTime() - signupDate.getTime()) / (1000 * 60 * 60);
        return hoursSinceSignup >= 24 && hoursSinceSignup < 48;
      });
      if (day1Window.length > 0) day1Active++;

      // Day 7: Active within 7-8 days of signup
      const day7Window = activities.filter(date => {
        const daysSinceSignup = (date.getTime() - signupDate.getTime()) / (1000 * 60 * 60 * 24);
        return daysSinceSignup >= 7 && daysSinceSignup < 8;
      });
      if (day7Window.length > 0) day7Active++;

      // Day 30: Active within 30-31 days of signup
      const day30Window = activities.filter(date => {
        const daysSinceSignup = (date.getTime() - signupDate.getTime()) / (1000 * 60 * 60 * 24);
        return daysSinceSignup >= 30 && daysSinceSignup < 31;
      });
      if (day30Window.length > 0) day30Active++;
    }

    return {
      day1: (day1Active / users.length) * 100,
      day7: (day7Active / users.length) * 100,
      day30: (day30Active / users.length) * 100
    };
  }

  private getCohortId(date: Date, granularity: 'week' | 'month'): string {
    if (granularity === 'month') {
      return `${date.getFullYear()}-${String(date.getMonth() + 1).padStart(2, '0')}`;
    }
    // ISO week number, paired with the ISO week-numbering year. Using the
    // calendar year here would mislabel dates near January 1 (for example,
    // December 30, 2024 belongs to ISO week 2025-W01).
    const { year, week } = this.getIsoWeek(date);
    return `${year}-W${String(week).padStart(2, '0')}`;
  }

  private getIsoWeek(date: Date): { year: number; week: number } {
    const d = new Date(Date.UTC(date.getFullYear(), date.getMonth(), date.getDate()));
    const dayNum = d.getUTCDay() || 7;          // Monday = 1 ... Sunday = 7
    d.setUTCDate(d.getUTCDate() + 4 - dayNum);  // Move to the Thursday of this ISO week
    const yearStart = new Date(Date.UTC(d.getUTCFullYear(), 0, 1));
    const week = Math.ceil((((d.getTime() - yearStart.getTime()) / 86400000) + 1) / 7);
    return { year: d.getUTCFullYear(), week };
  }

  private async getUsersInDateRange(start: Date, end: Date): Promise<CohortUser[]> {
    const snapshot = await this.db.collection('users')
      .where('createdAt', '>=', Timestamp.fromDate(start))
      .where('createdAt', '<=', Timestamp.fromDate(end))
      .get();

    return snapshot.docs.map(doc => {
      const data = doc.data();
      return {
        userId: doc.id,
        signupDate: data.createdAt.toDate(),
        lastActiveDate: data.lastActiveDate?.toDate() || data.createdAt.toDate(),
        totalSessions: data.totalSessions || 0,
        totalToolCalls: data.totalToolCalls || 0,
        onboardingCompleted: data.onboardingCompleted || false,
        referralSource: data.referralSource
      };
    });
  }

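  private calculateAverageSessions(activityMap: Map<string, Date[]>): number {
    // Note: only users with at least one session appear in activityMap,
    // so this is the average among active users, not the whole cohort.
    if (activityMap.size === 0) return 0;
    const totalSessions = Array.from(activityMap.values())
      .reduce((sum, sessions) => sum + sessions.length, 0);
    return totalSessions / activityMap.size;
  }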

  private calculateChurnRate(
    users: CohortUser[],
    activityMap: Map<string, Date[]>
  ): number {
    const now = new Date();
    const churnThresholdDays = 14;
    let churnedUsers = 0;

    for (const user of users) {
      const activities = activityMap.get(user.userId) || [];
      if (activities.length === 0) {
        churnedUsers++;
        continue;
      }

      const lastActivity = new Date(Math.max(...activities.map(d => d.getTime())));
      const daysSinceLastActivity = (now.getTime() - lastActivity.getTime()) / (1000 * 60 * 60 * 24);

      if (daysSinceLastActivity > churnThresholdDays) {
        churnedUsers++;
      }
    }

    return (churnedUsers / users.length) * 100;
  }
}

export { CohortBuilder, AcquisitionCohort, CohortUser };
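One caveat about the `where('userId', 'in', userIds)` query above: Firestore caps the number of values an `in` filter accepts (30 per query at the time of writing, but check the current limits), so a cohort of 500 users has to be queried in batches. The sketch below shows one way to do that. `chunk` and `fetchInBatches` are hypothetical helpers, and the `fetchBatch` callback stands in for the actual Firestore call.

```typescript
// Split an array into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error('size must be positive');
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Run one query per batch and merge the results into a single array.
async function fetchInBatches<T, R>(
  ids: T[],
  batchSize: number,
  fetchBatch: (batch: T[]) => Promise<R[]>
): Promise<R[]> {
  const results = await Promise.all(chunk(ids, batchSize).map(fetchBatch));
  return ([] as R[]).concat(...results);
}

// Possible usage inside calculateCohortMetrics (sketch, not tested against Firestore):
// const docs = await fetchInBatches(userIds, 30, batch =>
//   this.db.collection('user_events')
//     .where('userId', 'in', batch)
//     .where('eventType', '==', 'session_start')
//     .get()
//     .then(snapshot => snapshot.docs)
// );
```

Running the batches in parallel keeps latency low, at the cost of one read operation per batch; for very large cohorts a precomputed aggregate may be cheaper.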

Retention Analysis: Tracking Long-Term Engagement

Retention curves visualize how user engagement decays over time. Here's how to calculate and monitor retention:

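// retention-calculator.ts - Retention Curve Analysis (130 lines)
import { Firestore, Timestamp } from 'firebase-admin/firestore';
import { CohortUser } from './cohort-builder'; // Assumes both files sit in the same directory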
interface RetentionCurve {
  cohortId: string;
  dataPoints: RetentionDataPoint[];
  halfLifeDays: number;        // Days until 50% retention
  plateauRetention: number;    // Long-term stable retention %
  churnVelocity: number;       // Rate of retention decline
}

interface RetentionDataPoint {
  daysSinceSignup: number;
  activeUsers: number;
  retentionRate: number;       // % of original cohort still active
  weekOverWeekChange: number;  // % change from previous week
}

class RetentionCalculator {
  constructor(private db: Firestore) {}

  /**
   * Generate retention curve for a cohort over 90 days
   */
  async calculateRetentionCurve(
    cohortId: string,
    maxDays: number = 90
  ): Promise<RetentionCurve> {
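    const cohortUsers = await this.getCohortUsers(cohortId);
    if (cohortUsers.length === 0) {
      throw new Error(`Cohort ${cohortId} has no users`);
    }
    const userIds = cohortUsers.map(u => u.userId);
    // Simplification: every user is measured from the first user's signup date.
    // Users in a weekly or monthly cohort actually sign up on different days,
    // so a per-user day offset would be more accurate.
    const signupDate = cohortUsers[0].signupDate;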

    const dataPoints: RetentionDataPoint[] = [];
    const checkDays = [1, 3, 7, 14, 21, 30, 45, 60, 75, 90].filter(d => d <= maxDays);

    for (const day of checkDays) {
      const activeUsers = await this.getActiveUsersOnDay(userIds, signupDate, day);
      const retentionRate = (activeUsers / cohortUsers.length) * 100;

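      // Change relative to the previous checkpoint. Looking up "day - 7" only
      // matches Days 14 and 21 with these checkpoints, which would leave every
      // later point at 0 and make the plateau check pass trivially.
      const previousDataPoint = dataPoints[dataPoints.length - 1];
      const weekOverWeekChange = previousDataPoint && previousDataPoint.retentionRate > 0
        ? ((retentionRate - previousDataPoint.retentionRate) / previousDataPoint.retentionRate) * 100
        : 0;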

      dataPoints.push({
        daysSinceSignup: day,
        activeUsers,
        retentionRate,
        weekOverWeekChange
      });
    }

    const halfLifeDays = this.calculateHalfLife(dataPoints);
    const plateauRetention = this.calculatePlateauRetention(dataPoints);
    const churnVelocity = this.calculateChurnVelocity(dataPoints);

    return {
      cohortId,
      dataPoints,
      halfLifeDays,
      plateauRetention,
      churnVelocity
    };
  }

  /**
   * Get active users on a specific day relative to signup
   */
  private async getActiveUsersOnDay(
    userIds: string[],
    signupDate: Date,
    dayOffset: number
  ): Promise<number> {
    const targetDate = new Date(signupDate);
    targetDate.setDate(targetDate.getDate() + dayOffset);

    const startOfDay = new Date(targetDate);
    startOfDay.setHours(0, 0, 0, 0);

    const endOfDay = new Date(targetDate);
    endOfDay.setHours(23, 59, 59, 999);

    // Query activity on that specific day
    const snapshot = await this.db.collection('user_events')
      .where('userId', 'in', userIds)
      .where('timestamp', '>=', Timestamp.fromDate(startOfDay))
      .where('timestamp', '<=', Timestamp.fromDate(endOfDay))
      .where('eventType', '==', 'session_start')
      .get();

    const activeUserIds = new Set(snapshot.docs.map(doc => doc.data().userId));
    return activeUserIds.size;
  }

  /**
   * Calculate half-life: days until 50% retention
   */
  private calculateHalfLife(dataPoints: RetentionDataPoint[]): number {
    // Find first data point where retention drops below 50%
    const halfLifePoint = dataPoints.find(dp => dp.retentionRate < 50);
    if (!halfLifePoint) return dataPoints[dataPoints.length - 1].daysSinceSignup;

    // Interpolate between data points for more accuracy
    const previousPoint = dataPoints[dataPoints.indexOf(halfLifePoint) - 1];
    if (!previousPoint) return halfLifePoint.daysSinceSignup;

    const slope = (halfLifePoint.retentionRate - previousPoint.retentionRate) /
                  (halfLifePoint.daysSinceSignup - previousPoint.daysSinceSignup);
    const daysToHalfLife = (50 - previousPoint.retentionRate) / slope;

    return previousPoint.daysSinceSignup + daysToHalfLife;
  }

  /**
   * Calculate plateau retention: stable long-term retention rate
   */
  private calculatePlateauRetention(dataPoints: RetentionDataPoint[]): number {
    // Plateau is when week-over-week change stabilizes (< 5% change)
    const stablePoints = dataPoints.filter(
      dp => Math.abs(dp.weekOverWeekChange) < 5 && dp.daysSinceSignup > 30
    );

    if (stablePoints.length === 0) {
      return dataPoints[dataPoints.length - 1].retentionRate;
    }

    const avgRetention = stablePoints.reduce((sum, dp) => sum + dp.retentionRate, 0) / stablePoints.length;
    return avgRetention;
  }

  /**
   * Calculate churn velocity: rate of retention decline
   */
  private calculateChurnVelocity(dataPoints: RetentionDataPoint[]): number {
    if (dataPoints.length < 2) return 0;

    // Linear regression on retention over time
    const n = dataPoints.length;
    const sumX = dataPoints.reduce((sum, dp) => sum + dp.daysSinceSignup, 0);
    const sumY = dataPoints.reduce((sum, dp) => sum + dp.retentionRate, 0);
    const sumXY = dataPoints.reduce((sum, dp) => sum + (dp.daysSinceSignup * dp.retentionRate), 0);
    const sumX2 = dataPoints.reduce((sum, dp) => sum + (dp.daysSinceSignup ** 2), 0);

    const slope = (n * sumXY - sumX * sumY) / (n * sumX2 - sumX ** 2);
    return Math.abs(slope); // Churn velocity is absolute rate of decline
  }

  private async getCohortUsers(cohortId: string): Promise<CohortUser[]> {
    // Simplified for example: assumes each user document stores a precomputed
    // cohortId field. In production, derive the date range from the cohort ID
    // and use CohortBuilder.getUsersInDateRange.
    const snapshot = await this.db.collection('users')
      .where('cohortId', '==', cohortId)
      .get();

    return snapshot.docs.map(doc => {
      const data = doc.data();
      return {
        userId: doc.id,
        signupDate: data.createdAt.toDate(),
        lastActiveDate: data.lastActiveDate?.toDate() || data.createdAt.toDate(),
        totalSessions: data.totalSessions || 0,
        totalToolCalls: data.totalToolCalls || 0,
        onboardingCompleted: data.onboardingCompleted || false,
        referralSource: data.referralSource
      };
    });
  }
}

export { RetentionCalculator, RetentionCurve, RetentionDataPoint };

Retention Curve Insights:

  • Half-life: If your half-life is 14 days, you lose 50% of users within 2 weeks—focus on improving Week 1 engagement.
  • Plateau retention: A 25% plateau means 1 in 4 users become long-term engaged—optimize onboarding to increase this.
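  • Churn velocity: This is the slope of a straight-line fit, so a velocity of 2 means retention falls by about 2 percentage points of the original cohort per day on average. Because real curves flatten over time, the early decline is usually steeper than this figure suggests.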
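To make the half-life figure concrete, here is the same linear interpolation applied to the January cohort numbers from earlier in this guide (75%, 45%, and 28% at Days 1, 7, and 30). The standalone `halfLife` function and the `CurvePoint` type are illustrative assumptions that mirror `calculateHalfLife` rather than replace it.

```typescript
interface CurvePoint {
  day: number;
  retention: number; // % of the original cohort still active
}

// Linearly interpolate the day on which retention crosses 50%.
// Returns null if the curve never drops below 50%.
function halfLife(points: CurvePoint[]): number | null {
  const idx = points.findIndex(p => p.retention < 50);
  if (idx === -1) return null;
  if (idx === 0) return points[0].day;

  const prev = points[idx - 1];
  const next = points[idx];
  // Percentage points lost per day between the two checkpoints
  const slope = (next.retention - prev.retention) / (next.day - prev.day);
  return prev.day + (50 - prev.retention) / slope;
}

const januaryCurve: CurvePoint[] = [
  { day: 1, retention: 75 },
  { day: 7, retention: 45 },
  { day: 30, retention: 28 },
];

console.log(halfLife(januaryCurve)); // 6
```

Between Day 1 and Day 7 the cohort loses 5 percentage points per day, so it crosses 50% five days after Day 1. With sparse checkpoints the estimate is only as good as the straight-line assumption between them.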

Churn Prediction: Identifying At-Risk Users

Predicting churn before it happens allows you to intervene with re-engagement campaigns. Here's a Python-based ML churn predictor:

# churn_predictor.py - ML-Based Churn Prediction (110 lines)
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, roc_auc_score
from datetime import datetime, timedelta
import firebase_admin
from firebase_admin import firestore

class ChurnPredictor:
    def __init__(self):
        self.db = firestore.client()
        self.model = RandomForestClassifier(
            n_estimators=100,
            max_depth=10,
            random_state=42
        )
        self.feature_columns = [
            'days_since_signup',
            'total_sessions',
            'sessions_last_7days',
            'sessions_last_14days',
            'avg_session_duration',
            'total_tool_calls',
            'tools_last_7days',
            'unique_tools_used',
            'onboarding_completed',
            'days_since_last_activity',
            'avg_days_between_sessions',
            'conversation_depth_avg',
            'weekend_session_ratio',
            'mobile_session_ratio'
        ]

    def prepare_training_data(self, lookback_days=90):
        """
        Prepare training dataset with churn labels
        """
        users_ref = self.db.collection('users')
        users = list(users_ref.stream())

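        training_data = []
        now = datetime.utcnow()  # Naive UTC, matching the tz-stripped Firestore timestamps
        churn_threshold = timedelta(days=14)  # 14 days inactive = churned
        # Features are computed as of this cutoff and the label from what happens
        # after it. Computing both at `now` would leak the label into features
        # such as days_since_last_activity.
        cutoff = now - churn_threshold

        for user_doc in users:
            user = user_doc.to_dict()
            user_id = user_doc.id

            # Skip users who signed up too recently
            signup_date = user['createdAt'].replace(tzinfo=None)
            if (now - signup_date).days < 30:
                continue

            # Fetch user events and split them at the cutoff
            events = self.get_user_events(user_id)
            past = [e for e in events if e['timestamp'].replace(tzinfo=None) <= cutoff]
            later_sessions = [
                e for e in events
                if e['eventType'] == 'session_start'
                and e['timestamp'].replace(tzinfo=None) > cutoff
            ]

            # Calculate features from pre-cutoff data only
            last_seen = max((e['timestamp'] for e in past), default=user['createdAt'])
            snapshot = {**user, 'lastActiveDate': last_seen}
            features = self.extract_features(snapshot, past, cutoff)

            # Label: churned if no session in the 14 days after the cutoff
            is_churned = len(later_sessions) == 0

            training_data.append({
                **features,
                'churned': 1 if is_churned else 0,
                'user_id': user_id
            })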

        return pd.DataFrame(training_data)

    def extract_features(self, user, events, reference_date):
        """
        Extract ML features from user data and events
        """
        signup_date = user['createdAt'].replace(tzinfo=None)
        last_activity = user.get('lastActiveDate', signup_date).replace(tzinfo=None)

        # Time-based features
        days_since_signup = (reference_date - signup_date).days
        days_since_last_activity = (reference_date - last_activity).days

        # Session features
        session_events = [e for e in events if e['eventType'] == 'session_start']
        total_sessions = len(session_events)

        sessions_last_7days = len([
            e for e in session_events
            if (reference_date - e['timestamp'].replace(tzinfo=None)).days <= 7
        ])

        sessions_last_14days = len([
            e for e in session_events
            if (reference_date - e['timestamp'].replace(tzinfo=None)).days <= 14
        ])

        # Tool usage features
        tool_events = [e for e in events if e['eventType'] == 'tool_call']
        total_tool_calls = len(tool_events)

        tools_last_7days = len([
            e for e in tool_events
            if (reference_date - e['timestamp'].replace(tzinfo=None)).days <= 7
        ])

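        # Ignore tool calls without a toolName so that None is not counted as a tool
        tool_names = {(e.get('metadata') or {}).get('toolName') for e in tool_events}
        unique_tools = len(tool_names - {None})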

        # Engagement depth
        conversation_events = [e for e in events if e['eventType'] == 'conversation_turn']
        avg_conversation_depth = (
            len(conversation_events) / max(total_sessions, 1)
        )

        # Session frequency
        if len(session_events) > 1:
            session_dates = sorted([e['timestamp'].replace(tzinfo=None) for e in session_events])
            intervals = [(session_dates[i+1] - session_dates[i]).days
                        for i in range(len(session_dates)-1)]
            avg_days_between = sum(intervals) / len(intervals) if intervals else 0
        else:
            avg_days_between = 0

        # Session timing patterns
        weekend_sessions = len([
            e for e in session_events
            if e['timestamp'].replace(tzinfo=None).weekday() >= 5  # Sat/Sun
        ])
        weekend_ratio = weekend_sessions / max(total_sessions, 1)

        # Placeholder for additional features
        avg_session_duration = 15.0  # Would calculate from session end events
        mobile_session_ratio = 0.5   # Would extract from user agent

        return {
            'days_since_signup': days_since_signup,
            'total_sessions': total_sessions,
            'sessions_last_7days': sessions_last_7days,
            'sessions_last_14days': sessions_last_14days,
            'avg_session_duration': avg_session_duration,
            'total_tool_calls': total_tool_calls,
            'tools_last_7days': tools_last_7days,
            'unique_tools_used': unique_tools,
            'onboarding_completed': 1 if user.get('onboardingCompleted') else 0,
            'days_since_last_activity': days_since_last_activity,
            'avg_days_between_sessions': avg_days_between,
            'conversation_depth_avg': avg_conversation_depth,
            'weekend_session_ratio': weekend_ratio,
            'mobile_session_ratio': mobile_session_ratio
        }

    def train_model(self, df):
        """
        Train random forest churn prediction model
        """
        X = df[self.feature_columns]
        y = df['churned']

        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, random_state=42, stratify=y
        )

        self.model.fit(X_train, y_train)

        # Evaluate
        y_pred = self.model.predict(X_test)
        y_prob = self.model.predict_proba(X_test)[:, 1]

        print("Churn Prediction Model Performance:")
        print(classification_report(y_test, y_pred))
        print(f"ROC-AUC Score: {roc_auc_score(y_test, y_prob):.3f}")

        # Feature importance
        feature_importance = pd.DataFrame({
            'feature': self.feature_columns,
            'importance': self.model.feature_importances_
        }).sort_values('importance', ascending=False)

        print("\nTop 5 Churn Predictors:")
        print(feature_importance.head())

        return self.model

    def predict_churn_risk(self, user_id):
        """
        Predict churn probability for a single user
        """
        user_doc = self.db.collection('users').document(user_id).get()
        if not user_doc.exists:
            return None

        user = user_doc.to_dict()
        events = self.get_user_events(user_id)
        # Use naive UTC so the reference date matches the tz-stripped Firestore timestamps
        features = self.extract_features(user, events, datetime.utcnow())

        X = pd.DataFrame([features])[self.feature_columns]
        churn_probability = self.model.predict_proba(X)[0, 1]

        return {
            'user_id': user_id,
            'churn_probability': float(churn_probability),
            'risk_level': self.classify_risk(churn_probability),
            'top_risk_factors': self.identify_risk_factors(features)
        }

    def classify_risk(self, probability):
        if probability < 0.3:
            return 'low'
        elif probability < 0.6:
            return 'medium'
        else:
            return 'high'

    def identify_risk_factors(self, features):
        # Compare to healthy user baselines
        risk_factors = []

        if features['days_since_last_activity'] > 7:
            risk_factors.append('inactive_7days')
        if features['sessions_last_7days'] == 0:
            risk_factors.append('no_recent_sessions')
        if features['onboarding_completed'] == 0:
            risk_factors.append('onboarding_incomplete')
        if features['avg_days_between_sessions'] > 10:
            risk_factors.append('low_frequency')

        return risk_factors

    def get_user_events(self, user_id):
        events_ref = self.db.collection('user_events').where('userId', '==', user_id)
        return [e.to_dict() for e in events_ref.stream()]

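# Usage
firebase_admin.initialize_app()  # Required before firestore.client(); uses Application Default Credentials
predictor = ChurnPredictor()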
df = predictor.prepare_training_data()
model = predictor.train_model(df)

# Predict for specific user
risk_assessment = predictor.predict_churn_risk('user_abc123')
print(risk_assessment)
# Example output (illustrative values): {'user_id': 'user_abc123', 'churn_probability': 0.73, 'risk_level': 'high',
#          'top_risk_factors': ['inactive_7days', 'no_recent_sessions']}
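The 0.3 and 0.6 cutoffs in `classify_risk` are judgment calls. One way to sanity-check a cutoff is to measure precision and recall at that threshold on held-out users. The sketch below is written in TypeScript to match the rest of this guide's tooling; the `Scored` shape and the sample values are made up for illustration.

```typescript
interface Scored {
  churnProbability: number; // model output, 0 to 1
  churned: boolean;         // what actually happened
}

// Precision and recall when every user at or above `threshold` is flagged
function precisionRecall(rows: Scored[], threshold: number) {
  let tp = 0, fp = 0, fn = 0;
  for (const row of rows) {
    const flagged = row.churnProbability >= threshold;
    if (flagged && row.churned) tp++;
    else if (flagged && !row.churned) fp++;
    else if (!flagged && row.churned) fn++;
  }
  return {
    precision: tp + fp === 0 ? 0 : tp / (tp + fp), // share of flagged users who churned
    recall: tp + fn === 0 ? 0 : tp / (tp + fn),    // share of churners who were flagged
  };
}

// Made-up held-out sample
const heldOut: Scored[] = [
  { churnProbability: 0.9, churned: true },
  { churnProbability: 0.7, churned: false },
  { churnProbability: 0.65, churned: true },
  { churnProbability: 0.4, churned: true },
  { churnProbability: 0.1, churned: false },
];

console.log(precisionRecall(heldOut, 0.6)); // precision and recall are both 2/3 here
```

A lower threshold catches more churners (higher recall) but sends re-engagement messages to more users who would have stayed anyway (lower precision), so the right cutoff depends on what an intervention costs.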

Power User Identification

Power users typically account for a disproportionate share of engagement. Here's how to identify and nurture them:

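// power-user-scorer.ts - Power User Identification (100 lines)
import { Firestore } from 'firebase-admin/firestore';
// UserEvent is the interface defined in cohort-builder.ts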
interface PowerUserScore {
  userId: string;
  score: number;              // 0-100 composite score
  tier: 'casual' | 'regular' | 'power' | 'champion';
  metrics: {
    sessionFrequency: number;    // Points from sessions per week (0-30)
    toolDiversity: number;       // Points from unique tools used (0-20)
    conversationDepth: number;   // Points from avg turns per conversation (0-20)
    tenureDays: number;          // Days since signup
    referrals: number;           // Users referred
  };
  engagementTrend: 'increasing' | 'stable' | 'declining';
}

class PowerUserScorer {
  constructor(private db: Firestore) {}

  async calculatePowerUserScore(userId: string): Promise<PowerUserScore> {
    const user = await this.getUserData(userId);
    const events = await this.getUserEvents(userId);

    const metrics = {
      sessionFrequency: this.calculateSessionFrequency(events),
      toolDiversity: this.calculateToolDiversity(events),
      conversationDepth: this.calculateConversationDepth(events),
      tenureDays: this.calculateTenure(user.signupDate),
      referrals: user.referralCount || 0
    };

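    // Composite score. Each helper already returns points scaled to its own
    // maximum, so the components are summed directly. Multiplying them by
    // 0.3 / 0.2 / 0.2 again would cap the total at 47 and make the
    // 'power' and 'champion' tiers unreachable.
    const score = Math.min(100,
      metrics.sessionFrequency +                      // Max 30 pts
      metrics.toolDiversity +                         // Max 20 pts
      metrics.conversationDepth +                     // Max 20 pts
      (Math.min(metrics.tenureDays / 90, 1) * 20) +   // Max 20 pts for tenure
      Math.min(metrics.referrals * 5, 10)             // Max 10 pts for referrals
    );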

    const tier = this.classifyUserTier(score);
    const engagementTrend = this.calculateEngagementTrend(events);

    return {
      userId,
      score: Math.round(score),
      tier,
      metrics,
      engagementTrend
    };
  }

  private calculateSessionFrequency(events: UserEvent[]): number {
    const sessions = events.filter(e => e.eventType === 'session_start');
    const last30Days = sessions.filter(e => {
      const daysSince = (Date.now() - e.timestamp.toMillis()) / (1000 * 60 * 60 * 24);
      return daysSince <= 30;
    });

    const sessionsPerWeek = (last30Days.length / 30) * 7;
    return Math.min(sessionsPerWeek * 10, 30); // Max 30 points (3+ sessions/week)
  }

  private calculateToolDiversity(events: UserEvent[]): number {
    const toolCalls = events.filter(e => e.eventType === 'tool_call');
    const uniqueTools = new Set(
      toolCalls.map(e => e.metadata?.toolName).filter(Boolean)
    );

    return Math.min(uniqueTools.size * 4, 20); // Max 20 points (5+ tools)
  }

  private calculateConversationDepth(events: UserEvent[]): number {
    const conversations = new Map<string, number>();

    events.filter(e => e.eventType === 'conversation_turn').forEach(e => {
      const sessionId = e.metadata?.sessionId || 'default';
      conversations.set(sessionId, (conversations.get(sessionId) || 0) + 1);
    });

    if (conversations.size === 0) return 0;

    const avgDepth = Array.from(conversations.values())
      .reduce((sum, depth) => sum + depth, 0) / conversations.size;

    return Math.min(avgDepth * 2, 20); // Max 20 points (10+ turns/conversation)
  }

  private calculateTenure(signupDate: Date): number {
    return (Date.now() - signupDate.getTime()) / (1000 * 60 * 60 * 24);
  }

  private classifyUserTier(score: number): PowerUserScore['tier'] {
    if (score >= 80) return 'champion';
    if (score >= 60) return 'power';
    if (score >= 40) return 'regular';
    return 'casual';
  }

  private calculateEngagementTrend(events: UserEvent[]): PowerUserScore['engagementTrend'] {
    const sessions = events.filter(e => e.eventType === 'session_start');
    const now = Date.now();

    const recentSessions = sessions.filter(e => {
      const daysSince = (now - e.timestamp.toMillis()) / (1000 * 60 * 60 * 24);
      return daysSince <= 14;
    }).length;

    const olderSessions = sessions.filter(e => {
      const daysSince = (now - e.timestamp.toMillis()) / (1000 * 60 * 60 * 24);
      return daysSince > 14 && daysSince <= 28;
    }).length;

    if (recentSessions > olderSessions * 1.2) return 'increasing';
    if (recentSessions < olderSessions * 0.8) return 'declining';
    return 'stable';
  }

  private async getUserData(userId: string) {
    const doc = await this.db.collection('users').doc(userId).get();
    const data = doc.data()!;
    return {
      signupDate: data.createdAt.toDate(),
      referralCount: data.referralCount || 0
    };
  }

  private async getUserEvents(userId: string): Promise<UserEvent[]> {
    const snapshot = await this.db.collection('user_events')
      .where('userId', '==', userId)
      .get();

    return snapshot.docs.map(doc => doc.data() as UserEvent);
  }
}

export { PowerUserScorer, PowerUserScore };

Power User Strategy:

  • Champions (80-100): Invite to beta features, request testimonials, offer affiliate program
  • Power Users (60-79): Send advanced tips, exclusive content, early access
  • Regular Users (40-59): Educational content, feature discovery nudges
  • Casual Users (0-39): Re-engagement campaigns, onboarding reminders
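Before building campaigns around these tiers, it helps to know how many users fall into each one. The sketch below assumes you already have an array of composite scores, for example collected from `calculatePowerUserScore`. `tierFor` repeats the cutoffs from `classifyUserTier`, and `tierDistribution` is a hypothetical helper.

```typescript
type Tier = 'casual' | 'regular' | 'power' | 'champion';

// Same cutoffs as classifyUserTier
function tierFor(score: number): Tier {
  if (score >= 80) return 'champion';
  if (score >= 60) return 'power';
  if (score >= 40) return 'regular';
  return 'casual';
}

// Count how many scores land in each tier
function tierDistribution(scores: number[]): Record<Tier, number> {
  const counts: Record<Tier, number> = { casual: 0, regular: 0, power: 0, champion: 0 };
  for (const score of scores) {
    counts[tierFor(score)]++;
  }
  return counts;
}

// Made-up scores
console.log(tierDistribution([85, 62, 45, 10, 39]));
// champion: 1, power: 1, regular: 1, casual: 2
```

If almost everyone lands in one tier, the scoring weights or cutoffs probably need recalibrating for your app before the tiers are useful for targeting.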

Onboarding Optimization: Reducing Early Churn

The first 7 days are critical. Here's how to track and optimize onboarding:

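// onboarding-tracker.ts - Onboarding Funnel Optimization
// Assumes the Firebase Admin SDK and that CohortUser is exported from ./analytics;
// adjust both imports to match your project.
import { Firestore } from 'firebase-admin/firestore';
import type { CohortUser } from './analytics';
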
interface OnboardingFunnel {
  step: string;
  usersEntered: number;
  usersCompleted: number;
  completionRate: number;
  avgTimeToComplete: number;  // Minutes
  dropoffRate: number;
}

class OnboardingTracker {
  private readonly ONBOARDING_STEPS = [
    'account_created',
    'profile_completed',
    'first_conversation',
    'first_tool_used',
    'onboarding_wizard_completed'
  ];

  constructor(private db: Firestore) {}

  async analyzeOnboardingFunnel(cohortId: string): Promise<OnboardingFunnel[]> {
    const cohortUsers = await this.getCohortUsers(cohortId);
    const funnel: OnboardingFunnel[] = [];

    for (let i = 0; i < this.ONBOARDING_STEPS.length; i++) {
      const step = this.ONBOARDING_STEPS[i];
      const nextStep = this.ONBOARDING_STEPS[i + 1];

      const usersEntered = await this.getUsersWhoReachedStep(cohortUsers, step);
      const usersCompleted = nextStep
        ? await this.getUsersWhoReachedStep(cohortUsers, nextStep)
        : usersEntered; // Last step

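      // Guard against steps nobody reached so the rates are never NaN
      const completionRate = usersEntered > 0 ? (usersCompleted / usersEntered) * 100 : 0;
      const dropoffRate = usersEntered > 0 ? 100 - completionRate : 0;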
      const avgTimeToComplete = await this.getAvgTimeToComplete(cohortUsers, step);

      funnel.push({
        step,
        usersEntered,
        usersCompleted,
        completionRate,
        avgTimeToComplete,
        dropoffRate
      });
    }

    return funnel;
  }

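  private async getUsersWhoReachedStep(
    cohortUsers: CohortUser[],
    step: string
  ): Promise<number> {
    const userIds = cohortUsers.map(u => u.userId);
    const reached = new Set<string>();

    // Firestore caps the number of values in an `in` filter (10 is safe across
    // SDK versions) and rejects empty arrays, so query in batches.
    for (let i = 0; i < userIds.length; i += 10) {
      const snapshot = await this.db.collection('user_events')
        .where('userId', 'in', userIds.slice(i, i + 10))
        .where('eventType', '==', step)
        .get();

      snapshot.docs.forEach(d => reached.add(d.data().userId));
    }

    return reached.size;
  }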

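  // Average minutes between signup and the first time users log this step
  private async getAvgTimeToComplete(
    cohortUsers: CohortUser[],
    step: string
  ): Promise<number> {
    const userIds = cohortUsers.map(u => u.userId);
    const signupTimes = new Map<string, number>(
      cohortUsers.map((u): [string, number] => [u.userId, u.signupDate.getTime()])
    );
    const completionTimes: number[] = [];

    // Batch the `in` filter: Firestore caps its size and rejects empty arrays
    for (let i = 0; i < userIds.length; i += 10) {
      const snapshot = await this.db.collection('user_events')
        .where('userId', 'in', userIds.slice(i, i + 10))
        .where('eventType', '==', step)
        .get();

      for (const doc of snapshot.docs) {
        const { userId, timestamp } = doc.data();
        const signupTime = signupTimes.get(userId);
        if (signupTime === undefined) continue;
        completionTimes.push((timestamp.toMillis() - signupTime) / (1000 * 60)); // Minutes
      }
    }

    if (completionTimes.length === 0) return 0; // Nobody reached this step
    return completionTimes.reduce((sum, t) => sum + t, 0) / completionTimes.length;
  }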

  private async getCohortUsers(cohortId: string): Promise<CohortUser[]> {
    const snapshot = await this.db.collection('users')
      .where('cohortId', '==', cohortId)
      .get();

    return snapshot.docs.map(doc => ({
      userId: doc.id,
      signupDate: doc.data().createdAt.toDate(),
      lastActiveDate: doc.data().lastActiveDate?.toDate(),
      totalSessions: doc.data().totalSessions || 0,
      totalToolCalls: doc.data().totalToolCalls || 0,
      onboardingCompleted: doc.data().onboardingCompleted || false
    }));
  }
}

export { OnboardingTracker, OnboardingFunnel };
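
Once the funnel is computed, the most useful single number is usually the step with the steepest drop-off. The following is a minimal sketch of that lookup, assuming a funnel shaped like the `OnboardingFunnel` rows above; `FunnelStep`, `makeStep`, `findBiggestDropoff`, and the sample counts are hypothetical and only illustrate the idea.

```typescript
// Local subset of the OnboardingFunnel shape so the sketch is self-contained
interface FunnelStep {
  step: string;
  usersEntered: number;
  usersCompleted: number;
  dropoffRate: number; // Percent of entrants who did not reach the next step
}

// Build a row from raw counts, guarding against empty steps
function makeStep(step: string, usersEntered: number, usersCompleted: number): FunnelStep {
  const dropoffRate = usersEntered > 0 ? 100 - (usersCompleted / usersEntered) * 100 : 0;
  return { step, usersEntered, usersCompleted, dropoffRate };
}

// Return the step that loses the largest share of users, or null if none qualify
function findBiggestDropoff(funnel: FunnelStep[]): FunnelStep | null {
  // The final step has no successor, so it is excluded; so are steps nobody reached
  const candidates = funnel.slice(0, -1).filter(s => s.usersEntered > 0);
  if (candidates.length === 0) return null;
  return candidates.reduce((worst, s) => (s.dropoffRate > worst.dropoffRate ? s : worst));
}

// Made-up counts for illustration only
const sampleFunnel: FunnelStep[] = [
  makeStep('account_created', 1000, 720),
  makeStep('profile_completed', 720, 540),
  makeStep('first_conversation', 540, 270),
  makeStep('first_tool_used', 270, 216),
  makeStep('onboarding_wizard_completed', 216, 216)
];

const worstStep = findBiggestDropoff(sampleFunnel);
console.log(worstStep ? `${worstStep.step}: ${worstStep.dropoffRate.toFixed(1)}% drop-off` : 'No data');
```

In this made-up data the largest loss sits between the first conversation and the first tool call, which would point the next experiment at tool discovery rather than at the signup form.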

Building a Cohort Dashboard

Here's a React component for visualizing cohort metrics:

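// CohortDashboard.tsx - Real-Time Cohort Visualization
import React, { useEffect, useState } from 'react';
import {
  Chart as ChartJS,
  CategoryScale,
  LinearScale,
  PointElement,
  LineElement,
  BarElement,
  Filler,
  Tooltip,
  Legend
} from 'chart.js';
import { Line, Bar } from 'react-chartjs-2';
import { CohortBuilder, RetentionCalculator } from './analytics';
// Assumption: the cohort types are exported from the same module as the classes
import type { AcquisitionCohort, RetentionCurve } from './analytics';

// Chart.js v3+ is tree-shakeable, so react-chartjs-2 v4+ needs the scales,
// elements, and plugins used below registered explicitly (Filler enables `fill: true`)
ChartJS.register(
  CategoryScale, LinearScale, PointElement, LineElement, BarElement, Filler, Tooltip, Legend
);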

interface CohortDashboardProps {
  firestore: Firestore;
}

const CohortDashboard: React.FC<CohortDashboardProps> = ({ firestore }) => {
  const [cohorts, setCohorts] = useState<AcquisitionCohort[]>([]);
  const [selectedCohort, setSelectedCohort] = useState<string | null>(null);
  const [retentionCurve, setRetentionCurve] = useState<RetentionCurve | null>(null);
  const [loading, setLoading] = useState(true);

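  useEffect(() => {
    loadCohortData();
  }, []);

  // Reload the retention curve whenever a different cohort is selected,
  // so clicking a table row updates the chart and not just the heading
  useEffect(() => {
    if (!selectedCohort) return;
    let cancelled = false;

    new RetentionCalculator(firestore)
      .calculateRetentionCurve(selectedCohort)
      .then(curve => {
        if (!cancelled) setRetentionCurve(curve);
      })
      .catch(err => console.error('Failed to load retention curve', err));

    return () => {
      cancelled = true; // Ignore results from a stale selection
    };
  }, [selectedCohort, firestore]);

  const loadCohortData = async () => {
    try {
      const builder = new CohortBuilder(firestore);

      const startDate = new Date();
      startDate.setDate(startDate.getDate() - 90); // Last 90 days
      const endDate = new Date();

      const cohortData = await builder.buildAcquisitionCohorts(startDate, endDate, 'week');
      setCohorts(cohortData);

      if (cohortData.length > 0) {
        // Selecting the latest cohort triggers the retention-curve effect above
        setSelectedCohort(cohortData[cohortData.length - 1].cohortId);
      }
    } catch (err) {
      console.error('Failed to load cohort data', err);
    } finally {
      setLoading(false); // Never leave the dashboard stuck on the loading state
    }
  };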

  const retentionChartData = {
    labels: retentionCurve?.dataPoints.map(dp => `Day ${dp.daysSinceSignup}`) || [],
    datasets: [
      {
        label: 'Retention Rate (%)',
        data: retentionCurve?.dataPoints.map(dp => dp.retentionRate) || [],
        borderColor: '#D4AF37',
        backgroundColor: 'rgba(212, 175, 55, 0.1)',
        tension: 0.4,
        fill: true
      }
    ]
  };

  const cohortComparisonData = {
    labels: cohorts.map(c => c.cohortId),
    datasets: [
      {
        label: 'Day 1 Retention',
        data: cohorts.map(c => c.retentionDay1),
        backgroundColor: '#D4AF37'
      },
      {
        label: 'Day 7 Retention',
        data: cohorts.map(c => c.retentionDay7),
        backgroundColor: '#B8956A'
      },
      {
        label: 'Day 30 Retention',
        data: cohorts.map(c => c.retentionDay30),
        backgroundColor: '#8B7355'
      }
    ]
  };

  if (loading) {
    return <div>Loading cohort data...</div>;
  }

  return (
    <div className="cohort-dashboard">
      <h1>Cohort Analysis Dashboard</h1>

      <section className="retention-curve">
        <h2>Retention Curve - {selectedCohort}</h2>
        <div className="metrics-summary">
          <div className="metric">
            <span className="label">Half-Life:</span>
            <span className="value">{retentionCurve?.halfLifeDays.toFixed(1)} days</span>
          </div>
          <div className="metric">
            <span className="label">Plateau Retention:</span>
            <span className="value">{retentionCurve?.plateauRetention.toFixed(1)}%</span>
          </div>
          <div className="metric">
            <span className="label">Churn Velocity:</span>
            <span className="value">{retentionCurve?.churnVelocity.toFixed(2)}%/day</span>
          </div>
        </div>
        <Line data={retentionChartData} options={{ responsive: true }} />
      </section>

      <section className="cohort-comparison">
        <h2>Weekly Cohort Comparison</h2>
        <Bar data={cohortComparisonData} options={{ responsive: true }} />
      </section>

      <section className="cohort-table">
        <h2>Cohort Details</h2>
        <table>
          <thead>
            <tr>
              <th>Cohort</th>
              <th>Users</th>
              <th>Day 1</th>
              <th>Day 7</th>
              <th>Day 30</th>
              <th>Avg Sessions</th>
              <th>Churn Rate</th>
            </tr>
          </thead>
          <tbody>
            {cohorts.map(cohort => (
              <tr
                key={cohort.cohortId}
                onClick={() => setSelectedCohort(cohort.cohortId)}
                className={selectedCohort === cohort.cohortId ? 'selected' : ''}
              >
                <td>{cohort.cohortId}</td>
                <td>{cohort.userCount}</td>
                <td>{cohort.retentionDay1.toFixed(1)}%</td>
                <td>{cohort.retentionDay7.toFixed(1)}%</td>
                <td>{cohort.retentionDay30.toFixed(1)}%</td>
                <td>{cohort.avgSessionsPerUser.toFixed(1)}</td>
                <td>{cohort.churnRate.toFixed(1)}%</td>
              </tr>
            ))}
          </tbody>
        </table>
      </section>
    </div>
  );
};

export default CohortDashboard;

Real-Time Cohort Monitoring

Automate cohort alerts for proactive retention management:

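// cohort-monitor.ts - Automated Cohort Alerts
// Assumes the Firebase Admin SDK and that CohortBuilder is exported from ./analytics;
// adjust both imports to match your project.
import { Firestore, Timestamp } from 'firebase-admin/firestore';
import { CohortBuilder } from './analytics';
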
interface CohortAlert {
  cohortId: string;
  alertType: 'retention_drop' | 'churn_spike' | 'onboarding_failure' | 'power_user_decline';
  severity: 'low' | 'medium' | 'high' | 'critical';
  message: string;
  metrics: Record<string, number>;
  recommendedActions: string[];
}

class CohortMonitor {
  constructor(private db: Firestore) {}

  async monitorCohorts(): Promise<CohortAlert[]> {
    const alerts: CohortAlert[] = [];
    const builder = new CohortBuilder(this.db);

    const startDate = new Date();
    startDate.setDate(startDate.getDate() - 30);
    const endDate = new Date();

    const cohorts = await builder.buildAcquisitionCohorts(startDate, endDate, 'week');

    for (const cohort of cohorts) {
      // Check retention drop
      if (cohort.retentionDay7 < 30) {
        alerts.push({
          cohortId: cohort.cohortId,
          alertType: 'retention_drop',
          severity: cohort.retentionDay7 < 20 ? 'critical' : 'high',
          message: `Day 7 retention dropped to ${cohort.retentionDay7.toFixed(1)}%`,
          metrics: {
            retentionDay7: cohort.retentionDay7,
            target: 40
          },
          recommendedActions: [
            'Review onboarding flow for friction points',
            'Send re-engagement emails to inactive users',
            'Analyze feature usage patterns for engaged vs. churned users'
          ]
        });
      }

      // Check churn spike
      if (cohort.churnRate > 50) {
        alerts.push({
          cohortId: cohort.cohortId,
          alertType: 'churn_spike',
          severity: 'high',
          message: `Churn rate spiked to ${cohort.churnRate.toFixed(1)}%`,
          metrics: {
            churnRate: cohort.churnRate,
            baseline: 35
          },
          recommendedActions: [
            'Conduct user surveys to identify pain points',
            'Review recent product changes for negative impact',
            'Launch win-back campaign targeting churned users'
          ]
        });
      }

      // Check onboarding completion
      const onboardingRate = await this.getOnboardingCompletionRate(cohort.cohortId);
      if (onboardingRate < 50) {
        alerts.push({
          cohortId: cohort.cohortId,
          alertType: 'onboarding_failure',
          severity: 'medium',
          message: `Only ${onboardingRate.toFixed(1)}% completed onboarding`,
          metrics: {
            onboardingRate,
            target: 70
          },
          recommendedActions: [
            'Simplify onboarding wizard (reduce steps)',
            'Add progress indicators and motivational messaging',
            'Implement onboarding abandonment recovery emails'
          ]
        });
      }
    }

    // Most severe alerts first
    const severityOrder = { critical: 0, high: 1, medium: 2, low: 3 };
    return alerts.sort((a, b) => severityOrder[a.severity] - severityOrder[b.severity]);
  }

  private async getOnboardingCompletionRate(cohortId: string): Promise<number> {
    const usersSnapshot = await this.db.collection('users')
      .where('cohortId', '==', cohortId)
      .get();

    const totalUsers = usersSnapshot.size;
    const completedUsers = usersSnapshot.docs.filter(
      doc => doc.data().onboardingCompleted === true
    ).length;

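    // An empty cohort would divide by zero; report 100 so no alert fires for it
    return totalUsers > 0 ? (completedUsers / totalUsers) * 100 : 100;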
  }

  async sendAlertNotifications(alerts: CohortAlert[]) {
    // Send critical and high-severity alerts to Slack/email
    const urgentAlerts = alerts.filter(a => a.severity === 'critical' || a.severity === 'high');

    for (const alert of urgentAlerts) {
      await this.sendSlackNotification(alert);
      await this.logAlertToFirestore(alert);
    }
  }

  private async sendSlackNotification(alert: CohortAlert) {
    // Placeholder - implement Slack webhook integration
    console.log(`[ALERT] ${alert.severity.toUpperCase()}: ${alert.message}`);
  }

  private async logAlertToFirestore(alert: CohortAlert) {
    await this.db.collection('cohort_alerts').add({
      ...alert,
      createdAt: Timestamp.now()
    });
  }
}

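// Run monitoring every 24 hours. The first check fires one interval after startup.
export async function scheduleCohortMonitoring(firestore: Firestore) {
  const monitor = new CohortMonitor(firestore);

  setInterval(async () => {
    try {
      const alerts = await monitor.monitorCohorts();
      if (alerts.length > 0) {
        await monitor.sendAlertNotifications(alerts);
        console.log(`Generated ${alerts.length} cohort alerts`);
      }
    } catch (err) {
      // Without this, a failed run would surface as an unhandled promise rejection
      console.error('Cohort monitoring run failed', err);
    }
  }, 24 * 60 * 60 * 1000); // 24 hours
}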

export { CohortMonitor, CohortAlert };
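
The `sendSlackNotification` placeholder above can be filled in with a Slack incoming webhook, which accepts a JSON body with a `text` field. The sketch below shows one possible shape, not the definitive integration: `SlackAlert` is a local subset of `CohortAlert`, `buildSlackPayload` and `postToSlack` are hypothetical helper names, and the webhook URL is something you would supply from your own Slack workspace. It relies on the global `fetch` available in Node 18+ and browsers.

```typescript
// Local subset of CohortAlert so the sketch is self-contained
interface SlackAlert {
  cohortId: string;
  severity: 'low' | 'medium' | 'high' | 'critical';
  message: string;
  recommendedActions: string[];
}

// Format an alert as the { text } body that Slack incoming webhooks accept
function buildSlackPayload(alert: SlackAlert): { text: string } {
  const actions = alert.recommendedActions.map(a => `• ${a}`).join('\n');
  return {
    text:
      `[${alert.severity.toUpperCase()}] Cohort ${alert.cohortId}: ${alert.message}\n` +
      `Recommended actions:\n${actions}`
  };
}

// POST the payload to a webhook URL supplied by the caller
async function postToSlack(webhookUrl: string, alert: SlackAlert): Promise<void> {
  const response = await fetch(webhookUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildSlackPayload(alert))
  });

  if (!response.ok) {
    throw new Error(`Slack webhook returned HTTP ${response.status}`);
  }
}

// Example payload for a made-up alert (no network call is made here)
const exampleAlert: SlackAlert = {
  cohortId: '2025-W01',
  severity: 'critical',
  message: 'Day 7 retention dropped to 18.4%',
  recommendedActions: ['Review onboarding flow for friction points']
};
console.log(buildSlackPayload(exampleAlert).text);
```

Separating the payload builder from the network call keeps the formatting testable without a live webhook.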

Conclusion: Turn Insights into Action

Cohort analysis transforms raw ChatGPT usage data into a strategic retention roadmap. By implementing the systems above, you can:

  1. Track retention curves to identify when users churn (Day 7 vs. Day 30)
  2. Predict churn before it happens using ML models trained on engagement patterns
  3. Identify power users and nurture them into champions and advocates
  4. Optimize onboarding to reduce early drop-off and improve long-term retention
  5. Monitor cohorts in real-time with automated alerts for retention drops

Next Steps:

  • Deploy the CohortBuilder to generate weekly cohorts automatically
  • Train the ChurnPredictor model on your historical data
  • Set up the CohortMonitor with Slack alerts for critical retention issues
  • A/B test onboarding changes and measure impact on Day 7 retention
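
For the A/B testing step, one common way to judge whether a Day 7 retention difference is real is a two-proportion z-test. The sketch below assumes you already have retained and total user counts for each variant; `VariantResult`, `twoProportionZ`, and `isSignificant` are hypothetical names, the sample counts are made up, and the 1.96 cutoff corresponds to the conventional 5% two-sided significance level.

```typescript
interface VariantResult {
  users: number;        // Users assigned to the variant
  retainedDay7: number; // Users still active on Day 7
}

// z-statistic for the difference in Day 7 retention (variant minus control)
function twoProportionZ(control: VariantResult, variant: VariantResult): number {
  if (control.users <= 0 || variant.users <= 0) {
    throw new Error('Both variants need at least one user');
  }

  const p1 = control.retainedDay7 / control.users;
  const p2 = variant.retainedDay7 / variant.users;

  // Pooled retention rate under the null hypothesis of no difference
  const pooled =
    (control.retainedDay7 + variant.retainedDay7) / (control.users + variant.users);
  const standardError = Math.sqrt(
    pooled * (1 - pooled) * (1 / control.users + 1 / variant.users)
  );

  return standardError === 0 ? 0 : (p2 - p1) / standardError;
}

// |z| > 1.96 corresponds to p < 0.05 for a two-sided test
function isSignificant(z: number): boolean {
  return Math.abs(z) > 1.96;
}

// Made-up counts: 30% vs. 35% Day 7 retention with 1,000 users per arm
const zScore = twoProportionZ(
  { users: 1000, retainedDay7: 300 },
  { users: 1000, retainedDay7: 350 }
);
console.log(`z = ${zScore.toFixed(2)}, significant: ${isSignificant(zScore)}`);
```

Small cohorts produce noisy retention rates, so it is worth deciding the sample size before the test starts rather than stopping as soon as the difference looks favorable.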

Ready to build retention-focused ChatGPT apps? Start your free trial on MakeAIHQ and get access to built-in cohort analytics, churn prediction dashboards, and automated retention monitoring—no coding required.


Internal Links

  • Analytics Dashboard Design - Visualize cohort data with production-ready dashboards
  • User Property Tracking - Enrich cohort analysis with custom user attributes
  • Retention Analysis Strategies - Advanced techniques for improving long-term engagement
  • Funnel Analysis & Conversion - Track user journeys and identify drop-off points
  • Churn Reduction Tactics - Proven strategies to win back at-risk users
  • Session Tracking Implementation - Capture accurate session data for cohort analysis
  • Event-Driven Analytics - Build real-time cohort updates with event streams
  • ChatGPT App Analytics Guide - Complete analytics implementation guide
  • MakeAIHQ Analytics Features - Explore built-in cohort analysis tools
  • ROI Calculator - Calculate the business impact of improved retention
