Kubernetes Operators for ChatGPT Apps: Custom Resource Management
Managing ChatGPT applications at scale requires sophisticated orchestration beyond basic Kubernetes deployments. Kubernetes operators provide the automation framework needed to deploy, configure, and maintain ChatGPT apps with the same operational expertise that human operators bring to complex systems.
This comprehensive guide explores building production-ready Kubernetes operators specifically designed for ChatGPT applications. You'll learn how to create Custom Resource Definitions (CRDs), implement reconciliation controllers, handle lifecycle management, and deploy operators that can manage hundreds of ChatGPT apps across multiple clusters.
By mastering the operator pattern, you'll transform ChatGPT app deployment from ad-hoc imperative commands into declarative, self-healing infrastructure that scales predictably. Whether you're running a single ChatGPT app or managing a fleet of AI-powered services, operators provide the foundation for reliable, automated operations.
Understanding the Operator Pattern for ChatGPT Apps
The Kubernetes operator pattern extends Kubernetes functionality to manage complex, stateful applications through custom controllers that encode operational knowledge as code. For ChatGPT apps, operators automate tasks like MCP server deployment, widget configuration, OAuth credential rotation, and traffic management.
Core Components of ChatGPT Operators
Operators consist of three fundamental components working together:
Custom Resource Definitions (CRDs) define new Kubernetes API types representing ChatGPT apps. A ChatGPTApp CRD might include specifications for MCP endpoints, widget templates, authentication providers, and deployment strategies. CRDs extend the Kubernetes API surface, allowing you to manage ChatGPT apps using familiar kubectl commands.
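Because CRDs are ordinary API types, managing a ChatGPT app looks like managing any built-in resource. A hypothetical session against a cluster with the ChatGPTApp CRD installed:

```shell
# Create or update an app from a declarative manifest
kubectl apply -f fitness-assistant.yaml

# List all ChatGPT apps in the current namespace
kubectl get chatgptapps

# Inspect status, conditions, and events for one app
kubectl describe chatgptapp fitness-assistant

# Tear the app down; the operator cleans up owned resources
kubectl delete chatgptapp fitness-assistant
```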
Controllers implement the reconciliation logic that drives operators. They watch ChatGPT app resources, compare desired state (defined in CRDs) with actual cluster state, and execute actions to achieve convergence. Controllers run continuously, ensuring ChatGPT apps remain configured correctly even when infrastructure changes.
Reconciliation loops form the operational heart of controllers. Rather than polling on a fixed schedule, controllers watch the Kubernetes API for changes to ChatGPT app resources (with a periodic resync as a safety net), detect configuration drift, and apply corrective actions. This control loop provides self-healing capabilities—if a ChatGPT app's MCP server crashes, the operator automatically recreates it based on the CRD specification.
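The loop itself is simple enough to sketch in plain Go—no Kubernetes dependencies, just the desired-versus-actual comparison at the heart of every controller (the AppState type and action strings here are illustrative, not part of any real API):

```go
package main

import "fmt"

// AppState captures the single piece of state this toy example tracks.
type AppState struct {
	Replicas int
}

// reconcile compares desired state against actual state and returns the
// corrective actions needed to converge. Real controllers do this against
// the Kubernetes API via watches; this sketch only shows the loop's shape.
func reconcile(desired, actual AppState) []string {
	var actions []string
	if actual.Replicas < desired.Replicas {
		actions = append(actions, fmt.Sprintf("scale up by %d", desired.Replicas-actual.Replicas))
	} else if actual.Replicas > desired.Replicas {
		actions = append(actions, fmt.Sprintf("scale down by %d", actual.Replicas-desired.Replicas))
	}
	return actions // no actions means desired and actual already converge
}

func main() {
	actions := reconcile(AppState{Replicas: 3}, AppState{Replicas: 1})
	fmt.Println(actions) // prints [scale up by 2]
}
```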
Why Operators Excel for ChatGPT App Management
Traditional Kubernetes resources like Deployments and Services lack the domain-specific logic needed for ChatGPT apps. Operators bridge this gap:
- MCP Server Lifecycle: Operators manage MCP server deployments, health checks, version upgrades, and traffic routing based on ChatGPT-specific requirements
- Configuration Management: Automatically generate widget templates, OAuth configurations, and API schemas from high-level ChatGPT app specifications
- Credential Rotation: Implement secure OAuth token refresh, API key rotation, and certificate renewal without manual intervention
- Multi-Tenancy: Isolate ChatGPT apps across namespaces with tenant-specific resource quotas, network policies, and RBAC rules
- Observability: Create standardized monitoring dashboards, alerts, and log aggregation for all ChatGPT apps managed by the operator
The operator pattern transforms ChatGPT app infrastructure from imperative scripts into declarative, version-controlled manifests that capture organizational knowledge about operating AI applications at scale.
Operator Capabilities vs. Traditional Deployments
| Capability | Traditional K8s | ChatGPT Operator |
|---|---|---|
| Deploy MCP Server | Manual YAML | Declarative CRD |
| Widget Updates | kubectl apply | Automated rollout |
| OAuth Setup | External scripts | Built-in controller |
| Health Monitoring | Basic probes | Domain-specific checks |
| Scaling Logic | Generic HPA | ChatGPT-aware autoscaling |
| Disaster Recovery | Manual restore | Automated reconciliation |
Operators encapsulate the expertise of experienced ChatGPT platform engineers, making that knowledge executable and repeatable across your entire infrastructure. Learn more about ChatGPT app architecture patterns and MCP server deployment strategies.
Building Operators with the Operator SDK
The Operator SDK provides scaffolding tools and frameworks that dramatically simplify operator development. Instead of writing thousands of lines of boilerplate Kubernetes client code, the SDK generates project structure, RBAC configurations, and CRD manifests, letting you focus on ChatGPT-specific business logic.
Operator SDK Project Structure
Initialize a new ChatGPT operator project:
# Install Operator SDK (v1.33+)
curl -LO https://github.com/operator-framework/operator-sdk/releases/download/v1.33.0/operator-sdk_linux_amd64
chmod +x operator-sdk_linux_amd64
sudo mv operator-sdk_linux_amd64 /usr/local/bin/operator-sdk
# Create new Go-based operator
operator-sdk init --domain=makeaihq.com --repo=github.com/makeaihq/chatgpt-operator
operator-sdk create api --group apps --version v1alpha1 --kind ChatGPTApp --resource --controller
# Generate CRD manifests
make manifests
This creates a project structure:
chatgpt-operator/
├── api/v1alpha1/
│ ├── chatgptapp_types.go # CRD schema
│ └── zz_generated.deepcopy.go # Auto-generated
├── config/
│ ├── crd/ # CRD manifests
│ ├── rbac/ # RBAC rules
│ └── manager/ # Operator deployment
├── controllers/
│ └── chatgptapp_controller.go # Reconciliation logic
└── main.go # Entry point
The SDK supports three operator types:
- Go Operators: Full flexibility, custom reconciliation logic, best for complex ChatGPT app management
- Ansible Operators: Declarative playbook-based automation, simpler for configuration-heavy workflows
- Helm Operators: Wrap existing Helm charts with operator lifecycle management
For ChatGPT apps requiring sophisticated MCP server orchestration, OAuth flows, and widget rendering, Go operators provide the necessary control and performance.
Defining the ChatGPTApp CRD
Edit api/v1alpha1/chatgptapp_types.go to define your ChatGPT app schema:
package v1alpha1
import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// ChatGPTAppSpec defines the desired state of ChatGPTApp
type ChatGPTAppSpec struct {
// DisplayName is the human-readable name shown in ChatGPT Store
// +kubebuilder:validation:MinLength=3
// +kubebuilder:validation:MaxLength=50
DisplayName string `json:"displayName"`
// Description provides context for the ChatGPT app
// +kubebuilder:validation:MaxLength=300
Description string `json:"description"`
// MCPServer defines the Model Context Protocol server configuration
MCPServer MCPServerConfig `json:"mcpServer"`
// Widgets defines UI components rendered in ChatGPT
// +optional
Widgets []WidgetConfig `json:"widgets,omitempty"`
// OAuth configures authentication for the ChatGPT app
// +optional
OAuth *OAuthConfig `json:"oauth,omitempty"`
// Replicas specifies the number of MCP server instances
// +kubebuilder:validation:Minimum=1
// +kubebuilder:validation:Maximum=10
// +kubebuilder:default=2
Replicas int32 `json:"replicas"`
// Resources defines CPU/memory requests and limits
// +optional
Resources *ResourceRequirements `json:"resources,omitempty"`
}
// MCPServerConfig defines MCP server deployment parameters
type MCPServerConfig struct {
// Image is the container image for the MCP server
// +kubebuilder:validation:Pattern=`^[a-z0-9\-\.\/]+:[a-z0-9\-\.]+$`
Image string `json:"image"`
// Port is the HTTP port for MCP protocol communication
// +kubebuilder:validation:Minimum=1024
// +kubebuilder:validation:Maximum=65535
// +kubebuilder:default=8080
Port int32 `json:"port"`
// Tools defines the MCP tools exposed to ChatGPT
Tools []MCPTool `json:"tools"`
// HealthCheckPath is the endpoint for liveness/readiness probes
// +kubebuilder:default="/health"
HealthCheckPath string `json:"healthCheckPath,omitempty"`
}
// MCPTool represents a single tool in the MCP server
type MCPTool struct {
// Name is the tool identifier
Name string `json:"name"`
// Description explains the tool's purpose
Description string `json:"description"`
// InputSchema is the JSON schema for tool parameters.
// Note: controller-gen cannot generate DeepCopy code for
// map[string]interface{}; apiextensionsv1.JSON (from
// k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1) is used instead.
InputSchema *apiextensionsv1.JSON `json:"inputSchema"`
}
// WidgetConfig defines a UI widget rendered in ChatGPT
type WidgetConfig struct {
// Name identifies the widget
Name string `json:"name"`
// Template is the HTML template with Skybridge API
Template string `json:"template"`
// MaxTokens limits the widget content size
// +kubebuilder:validation:Maximum=4000
// +kubebuilder:default=2000
MaxTokens int32 `json:"maxTokens,omitempty"`
}
// OAuthConfig defines OAuth 2.1 authentication
type OAuthConfig struct {
// ClientID is the OAuth client identifier
ClientID string `json:"clientId"`
// ClientSecretRef references a Kubernetes Secret
ClientSecretRef SecretReference `json:"clientSecretRef"`
// AuthorizationURL is the OAuth authorization endpoint
AuthorizationURL string `json:"authorizationUrl"`
// TokenURL is the OAuth token exchange endpoint
TokenURL string `json:"tokenUrl"`
// Scopes defines the requested OAuth scopes
Scopes []string `json:"scopes"`
}
// SecretReference points to a Kubernetes Secret
type SecretReference struct {
// Name is the Secret name
Name string `json:"name"`
// Key is the Secret data key
Key string `json:"key"`
}
// ResourceRequirements defines compute resources
type ResourceRequirements struct {
// Requests defines minimum resources
Requests ResourceList `json:"requests"`
// Limits defines maximum resources
Limits ResourceList `json:"limits"`
}
// ResourceList specifies CPU and memory quantities
type ResourceList struct {
// CPU in cores (e.g., "500m" = 0.5 cores)
CPU string `json:"cpu"`
// Memory in bytes (e.g., "512Mi")
Memory string `json:"memory"`
}
// ChatGPTAppStatus defines the observed state of ChatGPTApp
type ChatGPTAppStatus struct {
// Phase represents the current deployment phase
// +kubebuilder:validation:Enum=Pending;Deploying;Ready;Failed
Phase string `json:"phase"`
// Conditions represent the latest observations
Conditions []metav1.Condition `json:"conditions,omitempty"`
// ObservedGeneration is the last processed generation
ObservedGeneration int64 `json:"observedGeneration,omitempty"`
// Endpoint is the public MCP server URL
Endpoint string `json:"endpoint,omitempty"`
// ReadyReplicas counts healthy MCP server instances
ReadyReplicas int32 `json:"readyReplicas"`
// LastUpdated timestamp of the last reconciliation
LastUpdated metav1.Time `json:"lastUpdated,omitempty"`
}
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:printcolumn:name="Phase",type=string,JSONPath=`.status.phase`
// +kubebuilder:printcolumn:name="Ready",type=integer,JSONPath=`.status.readyReplicas`
// +kubebuilder:printcolumn:name="Endpoint",type=string,JSONPath=`.status.endpoint`
// +kubebuilder:printcolumn:name="Age",type=date,JSONPath=`.metadata.creationTimestamp`
// ChatGPTApp is the Schema for the chatgptapps API
type ChatGPTApp struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`
Spec ChatGPTAppSpec `json:"spec,omitempty"`
Status ChatGPTAppStatus `json:"status,omitempty"`
}
// +kubebuilder:object:root=true
// ChatGPTAppList contains a list of ChatGPTApp
type ChatGPTAppList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
Items []ChatGPTApp `json:"items"`
}
func init() {
SchemeBuilder.Register(&ChatGPTApp{}, &ChatGPTAppList{})
}
After defining the CRD schema, regenerate manifests:
make manifests
make generate
This CRD schema enables declarative ChatGPT app management with type-safe validation, default values, and automatic status tracking. Explore ChatGPT app configuration patterns and MCP protocol implementation.
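For reference, a ChatGPTApp manifest conforming to this schema might look like the following (the name, image, and OAuth URLs are illustrative):

```yaml
apiVersion: apps.makeaihq.com/v1alpha1
kind: ChatGPTApp
metadata:
  name: fitness-assistant
  namespace: tenant-a
spec:
  displayName: "Fitness Studio Assistant"
  description: "AI-powered class scheduling"
  replicas: 2
  mcpServer:
    image: makeaihq/mcp-fitness:1.0.0
    port: 8080
    tools:
      - name: schedule_class
        description: "Schedule a fitness class"
        inputSchema:
          type: object
          properties:
            className:
              type: string
  oauth:
    clientId: fitness-app-client
    clientSecretRef:
      name: fitness-oauth
      key: client-secret
    authorizationUrl: https://auth.example.com/authorize
    tokenUrl: https://auth.example.com/token
    scopes:
      - profile
```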
Implementing the Controller Reconciliation Loop
The controller reconciliation loop is where operator logic executes. Controllers watch ChatGPT app resources, detect changes, and perform actions to align actual cluster state with desired CRD specifications.
Core Reconcile Function
Edit controllers/chatgptapp_controller.go:
package controllers
import (
"context"
"fmt"
"time"
appsv1 "k8s.io/api/apps/v1"
corev1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/api/errors"
"k8s.io/apimachinery/pkg/api/resource"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/apimachinery/pkg/types"
"k8s.io/apimachinery/pkg/util/intstr"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
"sigs.k8s.io/controller-runtime/pkg/log"
appsv1alpha1 "github.com/makeaihq/chatgpt-operator/api/v1alpha1"
)
const (
chatGPTAppFinalizer = "apps.makeaihq.com/finalizer"
reconcileInterval = 30 * time.Second
)
// ChatGPTAppReconciler reconciles a ChatGPTApp object
type ChatGPTAppReconciler struct {
client.Client
Scheme *runtime.Scheme
}
// +kubebuilder:rbac:groups=apps.makeaihq.com,resources=chatgptapps,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=apps.makeaihq.com,resources=chatgptapps/status,verbs=get;update;patch
// +kubebuilder:rbac:groups=apps.makeaihq.com,resources=chatgptapps/finalizers,verbs=update
// +kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=core,resources=services,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=core,resources=secrets,verbs=get;list;watch
// Reconcile processes ChatGPTApp resources
func (r *ChatGPTAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
logger := log.FromContext(ctx)
logger.Info("Reconciling ChatGPTApp", "name", req.Name, "namespace", req.Namespace)
// Fetch ChatGPTApp instance
app := &appsv1alpha1.ChatGPTApp{}
if err := r.Get(ctx, req.NamespacedName, app); err != nil {
if errors.IsNotFound(err) {
logger.Info("ChatGPTApp resource not found, ignoring")
return ctrl.Result{}, nil
}
logger.Error(err, "Failed to get ChatGPTApp")
return ctrl.Result{}, err
}
// Handle deletion with finalizers
if app.ObjectMeta.DeletionTimestamp != nil {
if controllerutil.ContainsFinalizer(app, chatGPTAppFinalizer) {
if err := r.finalizeChatGPTApp(ctx, app); err != nil {
return ctrl.Result{}, err
}
controllerutil.RemoveFinalizer(app, chatGPTAppFinalizer)
if err := r.Update(ctx, app); err != nil {
return ctrl.Result{}, err
}
}
return ctrl.Result{}, nil
}
// Add finalizer if not present
if !controllerutil.ContainsFinalizer(app, chatGPTAppFinalizer) {
controllerutil.AddFinalizer(app, chatGPTAppFinalizer)
if err := r.Update(ctx, app); err != nil {
return ctrl.Result{}, err
}
}
// Update status to Deploying
if app.Status.Phase != "Deploying" && app.Status.Phase != "Ready" {
app.Status.Phase = "Deploying"
if err := r.Status().Update(ctx, app); err != nil {
logger.Error(err, "Failed to update status to Deploying")
return ctrl.Result{}, err
}
}
// Reconcile Deployment
deployment := r.buildDeployment(app)
if err := controllerutil.SetControllerReference(app, deployment, r.Scheme); err != nil {
logger.Error(err, "Failed to set controller reference on Deployment")
return ctrl.Result{}, err
}
existingDeployment := &appsv1.Deployment{}
err := r.Get(ctx, types.NamespacedName{Name: deployment.Name, Namespace: deployment.Namespace}, existingDeployment)
if err != nil && errors.IsNotFound(err) {
logger.Info("Creating Deployment", "name", deployment.Name)
if err := r.Create(ctx, deployment); err != nil {
logger.Error(err, "Failed to create Deployment")
return r.updateStatusFailed(ctx, app, "DeploymentCreateFailed", err.Error())
}
} else if err != nil {
logger.Error(err, "Failed to get Deployment")
return ctrl.Result{}, err
} else {
// Update existing Deployment if spec changed
if !deploymentSpecEqual(existingDeployment, deployment) {
logger.Info("Updating Deployment", "name", deployment.Name)
existingDeployment.Spec = deployment.Spec
if err := r.Update(ctx, existingDeployment); err != nil {
logger.Error(err, "Failed to update Deployment")
return r.updateStatusFailed(ctx, app, "DeploymentUpdateFailed", err.Error())
}
}
}
// Reconcile Service
service := r.buildService(app)
if err := controllerutil.SetControllerReference(app, service, r.Scheme); err != nil {
logger.Error(err, "Failed to set controller reference on Service")
return ctrl.Result{}, err
}
existingService := &corev1.Service{}
err = r.Get(ctx, types.NamespacedName{Name: service.Name, Namespace: service.Namespace}, existingService)
if err != nil && errors.IsNotFound(err) {
logger.Info("Creating Service", "name", service.Name)
if err := r.Create(ctx, service); err != nil {
logger.Error(err, "Failed to create Service")
return r.updateStatusFailed(ctx, app, "ServiceCreateFailed", err.Error())
}
} else if err != nil {
logger.Error(err, "Failed to get Service")
return ctrl.Result{}, err
}
// Update status based on Deployment readiness
if err := r.updateStatus(ctx, app, existingDeployment); err != nil {
logger.Error(err, "Failed to update status")
return ctrl.Result{}, err
}
logger.Info("Reconciliation complete", "phase", app.Status.Phase)
return ctrl.Result{RequeueAfter: reconcileInterval}, nil
}
// buildDeployment creates a Deployment for the MCP server
func (r *ChatGPTAppReconciler) buildDeployment(app *appsv1alpha1.ChatGPTApp) *appsv1.Deployment {
labels := map[string]string{
"app": app.Name,
"app.kubernetes.io/name": app.Name,
"app.kubernetes.io/component": "mcp-server",
"app.kubernetes.io/managed-by": "chatgpt-operator",
}
replicas := app.Spec.Replicas
deployment := &appsv1.Deployment{
ObjectMeta: metav1.ObjectMeta{
Name: app.Name + "-mcp",
Namespace: app.Namespace,
Labels: labels,
},
Spec: appsv1.DeploymentSpec{
Replicas: &replicas,
Selector: &metav1.LabelSelector{
MatchLabels: labels,
},
Template: corev1.PodTemplateSpec{
ObjectMeta: metav1.ObjectMeta{
Labels: labels,
},
Spec: corev1.PodSpec{
Containers: []corev1.Container{
{
Name: "mcp-server",
Image: app.Spec.MCPServer.Image,
Ports: []corev1.ContainerPort{
{
Name: "http",
ContainerPort: app.Spec.MCPServer.Port,
Protocol: corev1.ProtocolTCP,
},
},
Env: r.buildEnvVars(app),
LivenessProbe: &corev1.Probe{
ProbeHandler: corev1.ProbeHandler{
HTTPGet: &corev1.HTTPGetAction{
Path: app.Spec.MCPServer.HealthCheckPath,
Port: intstr.FromInt(int(app.Spec.MCPServer.Port)),
},
},
InitialDelaySeconds: 10,
PeriodSeconds: 10,
TimeoutSeconds: 5,
FailureThreshold: 3,
},
ReadinessProbe: &corev1.Probe{
ProbeHandler: corev1.ProbeHandler{
HTTPGet: &corev1.HTTPGetAction{
Path: app.Spec.MCPServer.HealthCheckPath,
Port: intstr.FromInt(int(app.Spec.MCPServer.Port)),
},
},
InitialDelaySeconds: 5,
PeriodSeconds: 5,
TimeoutSeconds: 3,
FailureThreshold: 2,
},
},
},
},
},
},
}
// Add resource requests/limits if specified
if app.Spec.Resources != nil {
deployment.Spec.Template.Spec.Containers[0].Resources = corev1.ResourceRequirements{
Requests: corev1.ResourceList{
corev1.ResourceCPU: resource.MustParse(app.Spec.Resources.Requests.CPU),
corev1.ResourceMemory: resource.MustParse(app.Spec.Resources.Requests.Memory),
},
Limits: corev1.ResourceList{
corev1.ResourceCPU: resource.MustParse(app.Spec.Resources.Limits.CPU),
corev1.ResourceMemory: resource.MustParse(app.Spec.Resources.Limits.Memory),
},
}
}
return deployment
}
// buildService creates a Service for the MCP server
func (r *ChatGPTAppReconciler) buildService(app *appsv1alpha1.ChatGPTApp) *corev1.Service {
labels := map[string]string{
"app": app.Name,
"app.kubernetes.io/name": app.Name,
"app.kubernetes.io/component": "mcp-server",
}
return &corev1.Service{
ObjectMeta: metav1.ObjectMeta{
Name: app.Name + "-mcp",
Namespace: app.Namespace,
Labels: labels,
},
Spec: corev1.ServiceSpec{
Selector: labels,
Ports: []corev1.ServicePort{
{
Name: "http",
Protocol: corev1.ProtocolTCP,
Port: 80,
TargetPort: intstr.FromInt(int(app.Spec.MCPServer.Port)),
},
},
Type: corev1.ServiceTypeClusterIP,
},
}
}
// buildEnvVars constructs environment variables for the MCP server
func (r *ChatGPTAppReconciler) buildEnvVars(app *appsv1alpha1.ChatGPTApp) []corev1.EnvVar {
envVars := []corev1.EnvVar{
{Name: "CHATGPT_APP_NAME", Value: app.Spec.DisplayName},
{Name: "CHATGPT_APP_DESCRIPTION", Value: app.Spec.Description},
{Name: "MCP_PORT", Value: fmt.Sprintf("%d", app.Spec.MCPServer.Port)},
}
// Add OAuth configuration if present
if app.Spec.OAuth != nil {
envVars = append(envVars,
corev1.EnvVar{Name: "OAUTH_CLIENT_ID", Value: app.Spec.OAuth.ClientID},
corev1.EnvVar{Name: "OAUTH_AUTH_URL", Value: app.Spec.OAuth.AuthorizationURL},
corev1.EnvVar{Name: "OAUTH_TOKEN_URL", Value: app.Spec.OAuth.TokenURL},
corev1.EnvVar{
Name: "OAUTH_CLIENT_SECRET",
ValueFrom: &corev1.EnvVarSource{
SecretKeyRef: &corev1.SecretKeySelector{
LocalObjectReference: corev1.LocalObjectReference{
Name: app.Spec.OAuth.ClientSecretRef.Name,
},
Key: app.Spec.OAuth.ClientSecretRef.Key,
},
},
},
)
}
return envVars
}
// finalizeChatGPTApp performs cleanup before deletion
func (r *ChatGPTAppReconciler) finalizeChatGPTApp(ctx context.Context, app *appsv1alpha1.ChatGPTApp) error {
logger := log.FromContext(ctx)
logger.Info("Finalizing ChatGPTApp", "name", app.Name)
// TODO: Deregister from ChatGPT Store API
// TODO: Revoke OAuth credentials
// TODO: Clean up external resources
return nil
}
// updateStatus updates the ChatGPTApp status based on Deployment state
func (r *ChatGPTAppReconciler) updateStatus(ctx context.Context, app *appsv1alpha1.ChatGPTApp, deployment *appsv1.Deployment) error {
readyReplicas := deployment.Status.ReadyReplicas
desiredReplicas := app.Spec.Replicas
app.Status.ReadyReplicas = readyReplicas
app.Status.ObservedGeneration = app.Generation
app.Status.LastUpdated = metav1.Now()
app.Status.Endpoint = fmt.Sprintf("http://%s-mcp.%s.svc.cluster.local", app.Name, app.Namespace)
if readyReplicas == desiredReplicas {
app.Status.Phase = "Ready"
r.setCondition(app, "Ready", metav1.ConditionTrue, "AllReplicasReady", "All MCP server replicas are healthy")
} else {
app.Status.Phase = "Deploying"
r.setCondition(app, "Ready", metav1.ConditionFalse, "ReplicasNotReady", fmt.Sprintf("%d/%d replicas ready", readyReplicas, desiredReplicas))
}
return r.Status().Update(ctx, app)
}
// updateStatusFailed updates status to Failed
func (r *ChatGPTAppReconciler) updateStatusFailed(ctx context.Context, app *appsv1alpha1.ChatGPTApp, reason, message string) (ctrl.Result, error) {
app.Status.Phase = "Failed"
app.Status.LastUpdated = metav1.Now()
r.setCondition(app, "Ready", metav1.ConditionFalse, reason, message)
if err := r.Status().Update(ctx, app); err != nil {
return ctrl.Result{}, err
}
return ctrl.Result{RequeueAfter: reconcileInterval}, nil
}
// setCondition updates or adds a status condition
func (r *ChatGPTAppReconciler) setCondition(app *appsv1alpha1.ChatGPTApp, conditionType string, status metav1.ConditionStatus, reason, message string) {
condition := metav1.Condition{
Type: conditionType,
Status: status,
Reason: reason,
Message: message,
LastTransitionTime: metav1.Now(),
ObservedGeneration: app.Generation,
}
// Find existing condition
for i, existing := range app.Status.Conditions {
if existing.Type == conditionType {
if existing.Status != status {
app.Status.Conditions[i] = condition
}
return
}
}
// Add new condition
app.Status.Conditions = append(app.Status.Conditions, condition)
}
// deploymentSpecEqual compares only the fields this operator manages
// (replica count and container image). This shallow check is deliberate:
// comparing full specs triggers spurious updates because the API server
// fills in defaults. Production operators typically use
// equality.Semantic.DeepEqual on managed fields or server-side apply.
func deploymentSpecEqual(a, b *appsv1.Deployment) bool {
	return a.Spec.Replicas != nil && b.Spec.Replicas != nil &&
		*a.Spec.Replicas == *b.Spec.Replicas &&
		a.Spec.Template.Spec.Containers[0].Image == b.Spec.Template.Spec.Containers[0].Image
}
// SetupWithManager sets up the controller with the Manager
func (r *ChatGPTAppReconciler) SetupWithManager(mgr ctrl.Manager) error {
return ctrl.NewControllerManagedBy(mgr).
For(&appsv1alpha1.ChatGPTApp{}).
Owns(&appsv1.Deployment{}).
Owns(&corev1.Service{}).
Complete(r)
}
This reconciliation loop creates and maintains Deployments and Services for ChatGPT apps, with automatic health checks, resource management, and status updates. See Kubernetes deployment patterns and container orchestration strategies.
Advanced Operator Patterns
Production ChatGPT operators require sophisticated patterns beyond basic reconciliation: admission webhooks for validation, leader election for high availability, and RBAC for secure multi-tenancy.
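Leader election comes almost for free with the SDK scaffold: run multiple manager replicas and enable the election flag that the generated main.go already wires up. An excerpt assuming the default kubebuilder layout (exact paths and flag names may differ in your project):

```yaml
# config/manager/manager.yaml (excerpt)
spec:
  replicas: 2           # two managers; only the elected leader reconciles
  template:
    spec:
      containers:
        - name: manager
          args:
            - --leader-elect
            - --health-probe-bind-address=:8081
```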
Admission Webhook Implementation
Webhooks intercept API requests before resources are persisted, enabling validation and mutation logic:
// api/v1alpha1/chatgptapp_webhook.go
package v1alpha1
import (
"fmt"
"regexp"
"k8s.io/apimachinery/pkg/runtime"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/webhook"
"sigs.k8s.io/controller-runtime/pkg/webhook/admission"
)
var (
imageRegex = regexp.MustCompile(`^[a-z0-9\-\.\/]+:[a-z0-9\-\.]+$`)
urlRegex = regexp.MustCompile(`^https?://[^\s]+$`)
)
// SetupWebhookWithManager registers the webhook with the manager
func (r *ChatGPTApp) SetupWebhookWithManager(mgr ctrl.Manager) error {
return ctrl.NewWebhookManagedBy(mgr).
For(r).
Complete()
}
// +kubebuilder:webhook:path=/mutate-apps-makeaihq-com-v1alpha1-chatgptapp,mutating=true,failurePolicy=fail,sideEffects=None,groups=apps.makeaihq.com,resources=chatgptapps,verbs=create;update,versions=v1alpha1,name=mchatgptapp.kb.io,admissionReviewVersions=v1
var _ webhook.Defaulter = &ChatGPTApp{}
// Default implements webhook.Defaulter for setting default values
func (r *ChatGPTApp) Default() {
// Set default replicas if not specified
if r.Spec.Replicas == 0 {
r.Spec.Replicas = 2
}
// Set default health check path
if r.Spec.MCPServer.HealthCheckPath == "" {
r.Spec.MCPServer.HealthCheckPath = "/health"
}
// Set default port
if r.Spec.MCPServer.Port == 0 {
r.Spec.MCPServer.Port = 8080
}
// Set default resources if not specified
if r.Spec.Resources == nil {
r.Spec.Resources = &ResourceRequirements{
Requests: ResourceList{
CPU: "100m",
Memory: "128Mi",
},
Limits: ResourceList{
CPU: "500m",
Memory: "512Mi",
},
}
}
}
// +kubebuilder:webhook:path=/validate-apps-makeaihq-com-v1alpha1-chatgptapp,mutating=false,failurePolicy=fail,sideEffects=None,groups=apps.makeaihq.com,resources=chatgptapps,verbs=create;update,versions=v1alpha1,name=vchatgptapp.kb.io,admissionReviewVersions=v1
var _ webhook.Validator = &ChatGPTApp{}
// ValidateCreate implements webhook.Validator for creation
func (r *ChatGPTApp) ValidateCreate() (admission.Warnings, error) {
return r.validateChatGPTApp()
}
// ValidateUpdate implements webhook.Validator for updates
func (r *ChatGPTApp) ValidateUpdate(old runtime.Object) (admission.Warnings, error) {
return r.validateChatGPTApp()
}
// ValidateDelete implements webhook.Validator for deletion
func (r *ChatGPTApp) ValidateDelete() (admission.Warnings, error) {
return nil, nil
}
// validateChatGPTApp performs comprehensive validation
func (r *ChatGPTApp) validateChatGPTApp() (admission.Warnings, error) {
var warnings admission.Warnings
// Validate display name
if len(r.Spec.DisplayName) < 3 {
return nil, fmt.Errorf("displayName must be at least 3 characters")
}
if len(r.Spec.DisplayName) > 50 {
return nil, fmt.Errorf("displayName must be at most 50 characters")
}
// Validate MCP server image format
if !imageRegex.MatchString(r.Spec.MCPServer.Image) {
return nil, fmt.Errorf("mcpServer.image must match format: registry/image:tag")
}
// Validate port range
if r.Spec.MCPServer.Port < 1024 || r.Spec.MCPServer.Port > 65535 {
return nil, fmt.Errorf("mcpServer.port must be between 1024 and 65535")
}
// Validate MCP tools
if len(r.Spec.MCPServer.Tools) == 0 {
return nil, fmt.Errorf("mcpServer.tools must contain at least one tool")
}
for _, tool := range r.Spec.MCPServer.Tools {
if tool.Name == "" {
return nil, fmt.Errorf("all tools must have a name")
}
if tool.Description == "" {
warnings = append(warnings, fmt.Sprintf("Tool '%s' has no description", tool.Name))
}
if tool.InputSchema == nil {
return nil, fmt.Errorf("tool '%s' must have an inputSchema", tool.Name)
}
}
// Validate OAuth configuration if present
if r.Spec.OAuth != nil {
if r.Spec.OAuth.ClientID == "" {
return nil, fmt.Errorf("oauth.clientId is required when OAuth is configured")
}
if !urlRegex.MatchString(r.Spec.OAuth.AuthorizationURL) {
return nil, fmt.Errorf("oauth.authorizationUrl must be a valid HTTP(S) URL")
}
if !urlRegex.MatchString(r.Spec.OAuth.TokenURL) {
return nil, fmt.Errorf("oauth.tokenUrl must be a valid HTTP(S) URL")
}
if r.Spec.OAuth.ClientSecretRef.Name == "" || r.Spec.OAuth.ClientSecretRef.Key == "" {
return nil, fmt.Errorf("oauth.clientSecretRef must specify both name and key")
}
}
// Validate widgets
for _, widget := range r.Spec.Widgets {
if widget.Name == "" {
return nil, fmt.Errorf("all widgets must have a name")
}
if widget.Template == "" {
return nil, fmt.Errorf("widget '%s' must have a template", widget.Name)
}
if widget.MaxTokens > 4000 {
return nil, fmt.Errorf("widget '%s' maxTokens exceeds OpenAI limit of 4000", widget.Name)
}
}
// Validate resource requirements
if r.Spec.Resources != nil {
if err := validateResourceQuantity(r.Spec.Resources.Requests.CPU, "requests.cpu"); err != nil {
return nil, err
}
if err := validateResourceQuantity(r.Spec.Resources.Requests.Memory, "requests.memory"); err != nil {
return nil, err
}
if err := validateResourceQuantity(r.Spec.Resources.Limits.CPU, "limits.cpu"); err != nil {
return nil, err
}
if err := validateResourceQuantity(r.Spec.Resources.Limits.Memory, "limits.memory"); err != nil {
return nil, err
}
}
return warnings, nil
}
// validateResourceQuantity performs a simplified syntax check on
// Kubernetes resource quantities. For full coverage of every legal
// suffix, resource.ParseQuantity from k8s.io/apimachinery/pkg/api/resource
// is the more robust choice.
func validateResourceQuantity(quantity, field string) error {
	validPattern := regexp.MustCompile(`^[0-9]+(\.[0-9]+)?(m|Ki|Mi|Gi|Ti)?$`)
	if !validPattern.MatchString(quantity) {
		return fmt.Errorf("%s must be a valid Kubernetes quantity (e.g., '500m', '1Gi')", field)
	}
	return nil
}
}
Webhooks ensure only valid ChatGPT apps enter the cluster, preventing configuration errors before they cause runtime failures.
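The quantity check is easy to exercise outside the cluster. A standalone sketch of the same style of check (the suffix list here is illustrative; the webhook's own regex is authoritative):

```go
package main

import (
	"fmt"
	"regexp"
)

// quantityPattern accepts an integer or decimal, optionally suffixed
// with a common Kubernetes unit (milli-cores or binary byte units).
var quantityPattern = regexp.MustCompile(`^[0-9]+(\.[0-9]+)?(m|Ki|Mi|Gi|Ti)?$`)

// validQuantity reports whether a value would pass this simplified check.
func validQuantity(q string) bool {
	return quantityPattern.MatchString(q)
}

func main() {
	for _, q := range []string{"500m", "1Gi", "0.5", "512MB", "two"} {
		fmt.Printf("%-6s valid=%v\n", q, validQuantity(q))
	}
}
```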
RBAC Configuration for Multi-Tenancy
Secure multi-tenant ChatGPT operators require granular RBAC:
# config/rbac/role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: chatgpt-operator-manager-role
rules:
  # ChatGPTApp CRD permissions
  - apiGroups:
      - apps.makeaihq.com
    resources:
      - chatgptapps
    verbs:
      - create
      - delete
      - get
      - list
      - patch
      - update
      - watch
  - apiGroups:
      - apps.makeaihq.com
    resources:
      - chatgptapps/status
    verbs:
      - get
      - patch
      - update
  - apiGroups:
      - apps.makeaihq.com
    resources:
      - chatgptapps/finalizers
    verbs:
      - update
  # Deployment management
  - apiGroups:
      - apps
    resources:
      - deployments
    verbs:
      - create
      - delete
      - get
      - list
      - patch
      - update
      - watch
  # Service management
  - apiGroups:
      - ""
    resources:
      - services
    verbs:
      - create
      - delete
      - get
      - list
      - patch
      - update
      - watch
  # Secret read access (for OAuth credentials)
  - apiGroups:
      - ""
    resources:
      - secrets
    verbs:
      - get
      - list
      - watch
  # ConfigMap management (for widget templates)
  - apiGroups:
      - ""
    resources:
      - configmaps
    verbs:
      - create
      - delete
      - get
      - list
      - patch
      - update
      - watch
  # Event creation for audit trail
  - apiGroups:
      - ""
    resources:
      - events
    verbs:
      - create
      - patch
  # Ingress management (for public endpoints)
  - apiGroups:
      - networking.k8s.io
    resources:
      - ingresses
    verbs:
      - create
      - delete
      - get
      - list
      - patch
      - update
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: chatgpt-operator-metrics-reader
rules:
  - nonResourceURLs:
      - /metrics
    verbs:
      - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: chatgpt-operator-proxy-role
rules:
  - apiGroups:
      - authentication.k8s.io
    resources:
      - tokenreviews
    verbs:
      - create
  - apiGroups:
      - authorization.k8s.io
    resources:
      - subjectaccessreviews
    verbs:
      - create
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: chatgpt-operator-manager-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: chatgpt-operator-manager-role
subjects:
  - kind: ServiceAccount
    name: chatgpt-operator-controller-manager
    namespace: chatgpt-operator-system
---
# Namespace-scoped role for tenant isolation
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: chatgpt-app-viewer
  namespace: tenant-namespace
rules:
  - apiGroups:
      - apps.makeaihq.com
    resources:
      - chatgptapps
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - apps.makeaihq.com
    resources:
      - chatgptapps/status
    verbs:
      - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: chatgpt-app-viewer-binding
  namespace: tenant-namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: chatgpt-app-viewer
subjects:
  - kind: ServiceAccount
    name: tenant-user
    namespace: tenant-namespace
This RBAC configuration restricts tenant service accounts to read-only access within their own namespaces, while granting the operator cluster-wide access to the specific resources it manages. Learn more about Kubernetes security patterns.
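To confirm the isolation behaves as intended on a live cluster, you can impersonate the tenant service account with kubectl auth can-i (the account and namespace names below match the manifests above):

```shell
# The tenant can read ChatGPT apps in its own namespace (prints "yes")
kubectl auth can-i get chatgptapps.apps.makeaihq.com \
  --as=system:serviceaccount:tenant-namespace:tenant-user \
  -n tenant-namespace

# ...but cannot modify them (prints "no"), since the Role grants only get/list/watch
kubectl auth can-i delete chatgptapps.apps.makeaihq.com \
  --as=system:serviceaccount:tenant-namespace:tenant-user \
  -n tenant-namespace
```

Running the same checks against another tenant's namespace should also print "no", proving cross-namespace isolation.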
Production Operator Deployment
Deploying operators to production requires Operator Lifecycle Manager (OLM) integration, versioning strategies, and comprehensive monitoring.
OLM Bundle Configuration
Create an OLM bundle for operator distribution:
# bundle/manifests/chatgpt-operator.clusterserviceversion.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  name: chatgpt-operator.v1.0.0
  namespace: placeholder
  annotations:
    alm-examples: |-
      [
        {
          "apiVersion": "apps.makeaihq.com/v1alpha1",
          "kind": "ChatGPTApp",
          "metadata": {
            "name": "fitness-assistant"
          },
          "spec": {
            "displayName": "Fitness Studio Assistant",
            "description": "AI-powered class scheduling and nutrition advice",
            "replicas": 2,
            "mcpServer": {
              "image": "makeaihq/mcp-fitness:1.0.0",
              "port": 8080,
              "tools": [
                {
                  "name": "schedule_class",
                  "description": "Schedule a fitness class",
                  "inputSchema": {
                    "type": "object",
                    "properties": {
                      "className": {"type": "string"},
                      "date": {"type": "string", "format": "date-time"}
                    }
                  }
                }
              ]
            }
          }
        }
      ]
    capabilities: Deep Insights
    categories: AI/ML,Developer Tools
    containerImage: makeaihq/chatgpt-operator:1.0.0
    description: Kubernetes operator for managing ChatGPT applications
    repository: https://github.com/makeaihq/chatgpt-operator
spec:
  displayName: ChatGPT Operator
  description: |
    The ChatGPT Operator automates deployment and lifecycle management of ChatGPT applications on Kubernetes.

    Features:
    - Declarative ChatGPT app configuration via CRDs
    - Automated MCP server deployment and scaling
    - OAuth credential management
    - Widget template rendering
    - Multi-tenant isolation
    - Health monitoring and auto-healing
  version: 1.0.0
  maturity: stable
  maintainers:
    - name: MakeAIHQ Engineering
      email: engineering@makeaihq.com
  provider:
    name: MakeAIHQ
    url: https://makeaihq.com
  keywords:
    - chatgpt
    - openai
    - ai
    - mcp
    - operator
  links:
    - name: Documentation
      url: https://docs.makeaihq.com/operator
    - name: GitHub
      url: https://github.com/makeaihq/chatgpt-operator
  icon:
    - base64data: iVBORw0KGgoAAAANSUhEUg... # Base64-encoded icon
      mediatype: image/png
  minKubeVersion: 1.24.0
  installModes:
    - type: OwnNamespace
      supported: true
    - type: SingleNamespace
      supported: true
    - type: MultiNamespace
      supported: true
    - type: AllNamespaces
      supported: true
  customresourcedefinitions:
    owned:
      - name: chatgptapps.apps.makeaihq.com
        version: v1alpha1
        kind: ChatGPTApp
        displayName: ChatGPT App
        description: Represents a ChatGPT application with MCP server and widgets
        statusDescriptors:
          - path: phase
            displayName: Phase
            description: Current deployment phase
            x-descriptors:
              - urn:alm:descriptor:io.kubernetes.phase
          - path: readyReplicas
            displayName: Ready Replicas
            description: Number of healthy MCP server instances
            x-descriptors:
              - urn:alm:descriptor:com.tectonic.ui:podCount
          - path: endpoint
            displayName: Endpoint
            description: MCP server endpoint URL
            x-descriptors:
              - urn:alm:descriptor:org.w3:link
  install:
    strategy: deployment
    spec:
      clusterPermissions:
        - serviceAccountName: chatgpt-operator-controller-manager
          rules:
            - apiGroups: ["apps.makeaihq.com"]
              resources: ["chatgptapps", "chatgptapps/status", "chatgptapps/finalizers"]
              verbs: ["*"]
            - apiGroups: ["apps"]
              resources: ["deployments"]
              verbs: ["*"]
            - apiGroups: [""]
              resources: ["services", "secrets", "configmaps", "events"]
              verbs: ["*"]
      deployments:
        - name: chatgpt-operator-controller-manager
          spec:
            replicas: 1
            selector:
              matchLabels:
                control-plane: controller-manager
            template:
              metadata:
                labels:
                  control-plane: controller-manager
              spec:
                serviceAccountName: chatgpt-operator-controller-manager
                containers:
                  - name: manager
                    image: makeaihq/chatgpt-operator:1.0.0
                    command:
                      - /manager
                    args:
                      - --leader-elect
                      - --health-probe-bind-address=:8081
                      - --metrics-bind-address=:8080
                    env:
                      - name: ENABLE_WEBHOOKS
                        value: "true"
                    resources:
                      limits:
                        cpu: 500m
                        memory: 512Mi
                      requests:
                        cpu: 100m
                        memory: 128Mi
                    livenessProbe:
                      httpGet:
                        path: /healthz
                        port: 8081
                      initialDelaySeconds: 15
                      periodSeconds: 20
                    readinessProbe:
                      httpGet:
                        path: /readyz
                        port: 8081
                      initialDelaySeconds: 5
                      periodSeconds: 10
Deploy with OLM:
# Build and push bundle
make bundle IMG=makeaihq/chatgpt-operator:1.0.0
make bundle-build bundle-push BUNDLE_IMG=makeaihq/chatgpt-operator-bundle:1.0.0
# Install with OLM
operator-sdk run bundle makeaihq/chatgpt-operator-bundle:1.0.0
Monitoring and Observability
Integrate Prometheus metrics:
// controllers/metrics.go
package controllers

import (
    "github.com/prometheus/client_golang/prometheus"
    "sigs.k8s.io/controller-runtime/pkg/metrics"
)

var (
    chatGPTAppsTotal = prometheus.NewGaugeVec(
        prometheus.GaugeOpts{
            Name: "chatgpt_apps_total",
            Help: "Total number of ChatGPT apps by phase",
        },
        []string{"phase", "namespace"},
    )

    reconcileCount = prometheus.NewCounterVec(
        prometheus.CounterOpts{
            Name: "chatgpt_app_reconcile_total",
            Help: "Total number of reconciliations",
        },
        []string{"status"},
    )

    reconcileDuration = prometheus.NewHistogramVec(
        prometheus.HistogramOpts{
            Name:    "chatgpt_app_reconcile_duration_seconds",
            Help:    "Duration of reconciliation in seconds",
            Buckets: prometheus.DefBuckets,
        },
        []string{"status"},
    )
)

// init registers the custom metrics with the controller-runtime registry,
// which exposes them on the manager's /metrics endpoint.
func init() {
    metrics.Registry.MustRegister(chatGPTAppsTotal, reconcileCount, reconcileDuration)
}
Create a Grafana dashboard that visualizes these metrics:
{
  "dashboard": {
    "title": "ChatGPT Operator Metrics",
    "panels": [
      {
        "title": "ChatGPT Apps by Phase",
        "targets": [
          {
            "expr": "chatgpt_apps_total",
            "legendFormat": "{{phase}} ({{namespace}})"
          }
        ],
        "type": "graph"
      },
      {
        "title": "Reconciliation Rate",
        "targets": [
          {
            "expr": "rate(chatgpt_app_reconcile_total[5m])",
            "legendFormat": "{{status}}"
          }
        ],
        "type": "graph"
      }
    ]
  }
}
Production operators require comprehensive monitoring to detect reconciliation failures, resource exhaustion, and performance degradation. Explore Kubernetes monitoring strategies and operator best practices.
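Detection alone is not enough — failures should page someone. As one starting point, a PrometheusRule that fires on a sustained reconcile failure rate might look like the following (the rule names and the 10% threshold are illustrative choices, not project defaults):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: chatgpt-operator-alerts
  namespace: chatgpt-operator-system
spec:
  groups:
    - name: chatgpt-operator
      rules:
        - alert: ChatGPTAppReconcileErrors
          # Fires when more than 10% of reconciliations fail over 10 minutes,
          # using the status label from chatgpt_app_reconcile_total above
          expr: |
            rate(chatgpt_app_reconcile_total{status="error"}[10m])
              / rate(chatgpt_app_reconcile_total[10m]) > 0.1
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: ChatGPT operator reconciliations are failing
```

This assumes the Prometheus Operator is installed so that PrometheusRule resources are picked up automatically.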
Conclusion: Self-Healing ChatGPT Infrastructure
Kubernetes operators transform ChatGPT app deployment from manual, error-prone processes into declarative, self-healing infrastructure that scales automatically. By encoding operational expertise as code, operators eliminate toil, reduce outages, and enable platform teams to manage hundreds of ChatGPT applications with minimal manual intervention.
The operator pattern provides:
- Declarative Configuration: Define ChatGPT apps as CRDs, version control them with Git, and apply changes with kubectl
- Automated Lifecycle Management: Controllers handle deployment, scaling, upgrades, and cleanup without human operators
- Self-Healing Capabilities: Reconciliation loops detect and correct configuration drift, automatically recovering from failures
- Multi-Tenancy and Security: RBAC and namespace isolation protect ChatGPT apps across organizational boundaries
- Production-Ready Operations: OLM integration, monitoring, and webhooks ensure operators meet enterprise reliability standards
Ready to automate your ChatGPT app infrastructure? Build your first ChatGPT app with MakeAIHQ—our no-code platform includes operator-managed deployments, auto-scaling MCP servers, and built-in OAuth configuration. From zero to production ChatGPT app in 48 hours, with Kubernetes operators handling the complexity.
Start automating your AI infrastructure today with operator-driven ChatGPT app management.
Related Resources:
- Kubernetes Deployment Patterns for ChatGPT Apps
- MCP Server Deployment Strategies
- ChatGPT App Architecture Patterns
- Container Orchestration for AI Applications
- Kubernetes Security Best Practices
- Kubernetes Monitoring Strategies
- ChatGPT App Configuration Management
- Complete MCP Protocol Guide
External Resources: