Migration & Upgrade Guide

This guide covers upgrading the Keycloak operator and comparing this operator with the official Keycloak operator.

Table of Contents

  1. Upgrading the Operator
  2. Upgrading Keycloak Version
  3. Comparison with Official Keycloak Operator
  4. Backup & Rollback

Upgrading the Operator

Pre-Upgrade Checklist

  • Backup current state - Export all Keycloak resources
  • Review release notes - Check for breaking changes
  • Test in non-production - Upgrade staging environment first
  • Check database backups - Ensure recent backup exists
  • Document current versions - Record operator and Keycloak versions

Step 1: Backup Current State

# Backup all Keycloak resources
kubectl get keycloak,keycloakrealm,keycloakclient --all-namespaces -o yaml \
  > keycloak-resources-backup-$(date +%Y%m%d).yaml

# Backup operator configuration
helm get values keycloak-operator -n keycloak-operator-system \
  > operator-values-backup-$(date +%Y%m%d).yaml

# Backup CRDs
kubectl get crd -o name | grep "vriesdemichael.github.io" \
  | xargs kubectl get -o yaml \
  > crds-backup-$(date +%Y%m%d).yaml

Step 2: Check Current Version

# Get current operator version
helm list -n keycloak-operator-system

# Get operator image version
kubectl get deployment keycloak-operator -n keycloak-operator-system \
  -o jsonpath='{.spec.template.spec.containers[0].image}'

Step 3: Review Release Notes

Check the Releases Page for:

  • Breaking changes
  • New features
  • Bug fixes
  • Migration requirements

Step 4: Upgrade Operator (Helm)

# Check available versions (OCI)
helm show chart oci://ghcr.io/vriesdemichael/charts/keycloak-operator

# Upgrade operator
helm upgrade keycloak-operator oci://ghcr.io/vriesdemichael/charts/keycloak-operator \
  --namespace keycloak-operator-system \
  --values operator-values-backup-$(date +%Y%m%d).yaml \
  --version <version> \
  --wait

Important: The --wait flag ensures the upgrade completes before returning.

Step 5: Verify Upgrade

# Check operator pods are running new version
kubectl get pods -n keycloak-operator-system

# Check operator logs for startup
kubectl logs -n keycloak-operator-system -l app=keycloak-operator --tail=50

# Verify CRDs updated
kubectl get crd keycloaks.vriesdemichael.github.io -o yaml | grep -A5 version

# Check all resources still healthy
kubectl get keycloak,keycloakrealm,keycloakclient --all-namespaces

All resources should remain in the Ready phase.

Step 6: Test Reconciliation

# Trigger reconciliation on a test realm
kubectl annotate keycloakrealm <test-realm> -n <test-namespace> \
  reconcile=$(date +%s) --overwrite

# Watch logs
kubectl logs -n keycloak-operator-system -l app=keycloak-operator -f

# Verify realm still Ready
kubectl get keycloakrealm <test-realm> -n <test-namespace>

Rollback Procedure

If upgrade fails:

# Rollback Helm release
helm rollback keycloak-operator -n keycloak-operator-system

# Verify operator rolled back
kubectl get pods -n keycloak-operator-system

# Check resources still healthy
kubectl get keycloak,keycloakrealm,keycloakclient --all-namespaces

Important: CRD changes cannot be automatically rolled back. You may need to manually restore CRDs from backup:

kubectl apply -f crds-backup-<date>.yaml

Upgrading Keycloak Version

Supported Keycloak Versions

  • Minimum: Keycloak 25.0.0 (management port 9000 requirement)
  • Recommended: Keycloak 26.0.0+
  • Maximum: Latest Keycloak release

Automated Pre-Upgrade Backups

Automatic for CNPG and Managed tiers

When upgrading Keycloak to a new major or minor version, the operator automatically creates a backup before applying the change. Patch-level upgrades (e.g., 26.0.1 → 26.0.2) skip this step.

The backup behavior depends on your database tier:

  • CNPG: Creates a CNPG Backup CR and waits for completion before proceeding.
  • Managed: Creates a VolumeSnapshot of the database PVC.
  • External: Cannot back up automatically. The operator logs a warning and proceeds. Users with external databases must handle backups independently before upgrading. Flat-field (legacy) configs are treated identically (ADR-091).

See Backup & Restore: Automated Pre-Upgrade Backups for full configuration details.
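The major/minor-versus-patch decision described above can be sketched as a small helper (a hypothetical illustration, not the operator's actual code):

```shell
# Decide whether an upgrade crosses a major or minor boundary (backup
# required) or is patch-only (backup skipped). Sketch only; the
# operator's real comparison logic may differ.
needs_backup() {
  old_major="${1%%.*}"; old_rest="${1#*.}"; old_minor="${old_rest%%.*}"
  new_major="${2%%.*}"; new_rest="${2#*.}"; new_minor="${new_rest%%.*}"
  if [ "$old_major" != "$new_major" ] || [ "$old_minor" != "$new_minor" ]; then
    echo "yes"   # major or minor changed: pre-upgrade backup runs
  else
    echo "no"    # patch-level only: backup is skipped
  fi
}

needs_backup "26.0.1" "26.0.2"   # no  (patch only)
needs_backup "26.0.2" "26.1.0"   # yes (minor bump)
```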

Pre-Upgrade Checklist

  • Check Keycloak release notes - Review breaking changes
  • Verify backup configuration - Ensure upgradePolicy settings match your requirements
  • Test in non-production - Verify compatibility
  • Schedule maintenance window - Plan for brief downtime (rolling update) or zero-downtime (blue-green, Phase 3)
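The upgradePolicy settings mentioned in the checklist might look like this on a Keycloak resource (field names and the apiVersion here are illustrative assumptions — verify against the CRD reference before use):

```yaml
# Illustrative sketch only; check the CRD reference for the real schema.
apiVersion: vriesdemichael.github.io/v1
kind: Keycloak
metadata:
  name: keycloak
spec:
  upgradePolicy:
    preUpgradeBackup: true   # back up before major/minor upgrades
```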

Upgrade Strategy

Blue-Green Deployment (Future — Phase 3):

  1. Deploy the new Keycloak version alongside the old version
  2. Migrate the database schema via a Liquibase job
  3. Switch traffic to the new version
  4. Keep the old version around for quick rollback
  5. Remove the old version after verification

Rolling Update (Current):

  1. Update the Keycloak resource with the new image tag
  2. The operator triggers a pre-upgrade backup (automatic for CNPG/Managed)
  3. The operator performs a rolling update
  4. Expect brief downtime during pod restarts

Rolling Update Procedure

# Check current Keycloak version
kubectl get keycloak <name> -n <namespace> \
  -o jsonpath='{.spec.image.tag}'

# Update to new version
kubectl patch keycloak <name> -n <namespace> --type=merge -p '
spec:
  image:
    tag: "26.0.0"
'

# Watch rollout
kubectl rollout status statefulset/<keycloak-name> -n <namespace>

# Verify all pods running new version
kubectl get pods -n <namespace> -l app=keycloak \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}{end}'

Verify Upgrade

# Check Keycloak status
kubectl get keycloak <name> -n <namespace>
# Should show PHASE=Ready

# Check all realms still working
kubectl get keycloakrealm --all-namespaces

# Test OAuth2 flow
# (Use test client to verify authentication)

# Check database schema version
kubectl exec -it -n <namespace> <keycloak-pod> -- \
  psql -h <db-host> -U keycloak -d keycloak \
  -c "SELECT * FROM databasechangelog ORDER BY orderexecuted DESC LIMIT 5;"

Rollback to Previous Version

# Revert to previous image tag
kubectl patch keycloak <name> -n <namespace> --type=merge -p '
spec:
  image:
    tag: "25.0.6"
'

# Watch rollout
kubectl rollout status statefulset/<keycloak-name> -n <namespace>

# Verify rollback
kubectl get pods -n <namespace> -l app=keycloak

Note: Keycloak database migrations are forward-only. Rolling back may require database restore if schema was upgraded.


Cache Isolation During Upgrades

When running multiple Keycloak pods, all members must form a single Infinispan/JGroups cluster to share distributed caches (user sessions, action tokens, login flows in progress). If pods from two different major versions try to cluster together during a rolling upgrade, they may encounter serialization incompatibilities in the JGroups protocol, causing subtle split-brain issues.

The cacheIsolation feature solves this by restricting which pods can join the same JGroups cluster using a Kubernetes label selector on the headless discovery service.

If you tag your Keycloak image with a proper semver version (e.g., quay.io/keycloak/keycloak:26.4.1), enable automatic revision-based isolation:

# keycloak-values.yaml
keycloak:
  image: quay.io/keycloak/keycloak
  version: "26.4.1"
  cacheIsolation:
    autoRevision: true

The operator derives the cluster identity from the major version only (v26), so patch and minor upgrades (e.g., 26.4.1 → 26.5.0) remain in the same cluster. A major upgrade (e.g., 26.x → 27.x) automatically creates a new isolated cluster.
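The derivation can be sketched as follows (a hypothetical helper mirroring the documented `<name>-v<major>` convention, not the operator's actual code):

```shell
# Derive the cache-cluster identity from an instance name and image tag.
# Non-semver tags yield an empty identity (isolation disabled).
cache_cluster() {
  name="$1"; tag="$2"
  case "$tag" in
    [0-9]*.[0-9]*.[0-9]*) echo "${name}-v${tag%%.*}" ;;   # semver: use major
    *) echo "" ;;                                          # :latest, :nightly, digests
  esac
}

cache_cluster keycloak 26.4.1   # keycloak-v26
cache_cluster keycloak latest   # (empty: fall back to clusterName)
```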

Pod label set by the operator:

vriesdemichael.github.io/cache-cluster: <name>-v26

The discovery service selector is updated to match the current major version on every reconcile loop, so stale selectors after upgrades are corrected automatically.
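Conceptually, the reconciled headless discovery Service ends up looking like this (a sketch; the Service name and the extra labels are illustrative assumptions):

```yaml
# Sketch of the headless discovery Service after reconciliation.
# Only the cache-cluster label is taken from the documentation above.
apiVersion: v1
kind: Service
metadata:
  name: keycloak-discovery
spec:
  clusterIP: None            # headless: used for JGroups discovery
  selector:
    app: keycloak
    vriesdemichael.github.io/cache-cluster: keycloak-v26
```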

Alternative: clusterName for non-semver or custom names

If you use non-semver tags (e.g., :nightly, :latest, or custom CI builds), use an explicit cluster name:

keycloak:
  cacheIsolation:
    clusterName: my-keycloak-prod

:latest and autoRevision

If autoRevision: true is set but the image tag is non-semver (:latest, :nightly, SHA digest), the operator cannot determine a major version. It will log a warning and disable cache isolation for that instance. You must use clusterName or a semver-tagged image in this case.

What is isolated — and what survives an upgrade

| Data | Survives upgrade? | Why |
|------|-------------------|-----|
| User sessions | ✅ Yes | Stored in the database |
| Realm & client config | ✅ Yes | Stored in the database |
| Offline tokens | ✅ Yes | Stored in the database |
| In-progress login flows (action tokens) | ⚠️ Lost during upgrade window | Stored in Infinispan only |
| Active SSO sessions (not yet flushed) | ⚠️ May be lost | Flushed periodically to DB |

Users mid-flow during the rolling upgrade window (typically seconds to a few minutes) may need to restart the flow. This is the same behaviour as any rolling pod restart.

Priority of cacheIsolation options

If multiple fields are set, the operator applies this resolution order (highest priority first):

  1. clusterName — explicit static name, always wins
  2. autoRevision — derives <name>-v<major> from image tag
  3. autoSuffix — appends the full image tag as-is
  4. No isolation — pods join Keycloak's default cluster
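The resolution order can be expressed as a small sketch (hypothetical helper; names mirror the fields above but this is not the operator's actual code):

```shell
# Resolve the effective cache-cluster identity from the cacheIsolation
# fields, highest priority first, per the documented precedence.
resolve_cluster() {
  name="$1" tag="$2" cluster_name="$3" auto_revision="$4" auto_suffix="$5"
  if [ -n "$cluster_name" ]; then
    echo "$cluster_name"                 # 1. explicit clusterName always wins
  elif [ "$auto_revision" = "true" ]; then
    echo "${name}-v${tag%%.*}"           # 2. derive <name>-v<major> from tag
  elif [ "$auto_suffix" = "true" ]; then
    echo "${name}-${tag}"                # 3. append the full image tag as-is
  else
    echo ""                              # 4. no isolation: default cluster
  fi
}

resolve_cluster kc 26.4.1 my-prod true false   # my-prod (clusterName wins)
resolve_cluster kc 26.4.1 "" true false        # kc-v26
```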

Comparison with Official Keycloak Operator

Overview

| Aspect | This Operator | Official Keycloak Operator |
|--------|---------------|----------------------------|
| Primary Focus | GitOps-native, multi-tenant | General Keycloak deployment |
| Language | Python (Kopf) | Go (Operator SDK) |
| CRDs | Keycloak, KeycloakRealm, KeycloakClient | Keycloak, KeycloakRealmImport |
| Authorization | Namespace grant lists + RBAC | RBAC + direct access |
| Multi-tenancy | First-class support | Limited |
| GitOps Compatibility | Excellent | Good |
| Secret Management | Kubernetes-native | Kubernetes + Keycloak |
| Database | CloudNativePG (CNPG) primary | External PostgreSQL |

When to Use This Operator

Choose this operator if:

  • Multi-tenant environment (10+ teams)
  • GitOps-first workflow (ArgoCD, Flux)
  • Strong namespace isolation required
  • Declarative authorization via grant lists
  • CloudNativePG database management preferred

When to Use Official Operator

Choose the official operator if:

  • Single-tenant environment
  • Need Keycloak's built-in security model
  • Organization policy requires official/upstream operators
  • Integration with Red Hat/RHSSO required
  • Prefer Go-based operators
  • Need features not yet in this operator

Feature Comparison

Realm Management

| Feature | This Operator | Official Operator |
|---------|---------------|-------------------|
| Declarative realm config | ✅ KeycloakRealm CRD | ✅ KeycloakRealmImport |
| Live realm updates | ✅ Automatic reconciliation | ⚠️ Import-based |
| Drift detection | ✅ Built-in | ❌ Not supported |
| Multi-namespace realms | ✅ Fully supported | ⚠️ Limited |
| Realm deletion | ✅ Automatic | ⚠️ Manual |

Client Management

| Feature | This Operator | Official Operator |
|---------|---------------|-------------------|
| Declarative client config | ✅ KeycloakClient CRD | ⚠️ Via RealmImport |
| Client secret management | ✅ Automatic Kubernetes secret | ⚠️ Via RealmImport |
| Protocol mappers | ✅ CRD support | ✅ Via RealmImport |
| Service accounts | ✅ CRD support | ✅ Via RealmImport |
| Cross-namespace clients | ✅ Fully supported | ❌ Not supported |

Security Model

| Feature | This Operator | Official Operator |
|---------|---------------|-------------------|
| Authorization method | Namespace Grant + RBAC | Keycloak admin credentials |
| Client secret rotation | ✅ Automatic | ❌ Manual |
| Multi-tenant isolation | ✅ Namespace Grant Lists | ⚠️ RBAC-based |
| Audit trail | ✅ K8s API + ConfigMap | ⚠️ Keycloak logs |
| Secret distribution | ✅ GitOps-friendly | ⚠️ Manual |

Operations

| Feature | This Operator | Official Operator |
|---------|---------------|-------------------|
| Database management | ✅ CNPG integration | ⚠️ External required |
| Backup/restore | ✅ Via CNPG | ⚠️ Manual |
| High availability | ✅ Multi-replica support | ✅ Multi-replica support |
| Monitoring | ✅ Prometheus metrics | ✅ Prometheus metrics |
| Rate limiting | ✅ Built-in API rate limiting | ❌ Not supported |

Migration from Official Operator

Not Automated - Migration requires manual steps:

  1. Export data from existing Keycloak:

    # Export realms from existing Keycloak
    kubectl exec -it <keycloak-pod> -- \
      /opt/keycloak/bin/kc.sh export --dir /tmp/export
    

  2. Deploy this operator alongside (different namespace)

  3. Create new Keycloak instance with this operator

  4. Import realm exports:

       • Create KeycloakRealm CRDs based on the exports
       • Create KeycloakClient CRDs for each client

  5. Switch application traffic to the new Keycloak

  6. Decommission the old operator after verification

Note: Direct migration is complex. We recommend running both operators in parallel during the transition.
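A KeycloakRealm manifest derived from an export might look like the following (field names and apiVersion are illustrative assumptions — consult this operator's CRD reference for the actual schema):

```yaml
# Illustrative only; verify fields against the operator's CRD reference.
apiVersion: vriesdemichael.github.io/v1
kind: KeycloakRealm
metadata:
  name: my-realm
  namespace: my-team
spec:
  realmName: my-realm        # taken from the exported realm JSON
  displayName: "My Realm"
```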


Backup & Rollback

Pre-Upgrade Backup

Always backup before major changes:

# Full backup script
#!/bin/bash
BACKUP_DIR="keycloak-backup-$(date +%Y%m%d-%H%M%S)"
mkdir -p ${BACKUP_DIR}

# Backup resources
kubectl get keycloak,keycloakrealm,keycloakclient --all-namespaces -o yaml \
  > ${BACKUP_DIR}/resources.yaml

# Backup operator config
helm get values keycloak-operator -n keycloak-operator-system \
  > ${BACKUP_DIR}/operator-values.yaml

# Backup CRDs
kubectl get crd -o name | grep "vriesdemichael.github.io" \
  | xargs kubectl get -o yaml > ${BACKUP_DIR}/crds.yaml

# Backup database (if using CNPG)
kubectl cnpg backup keycloak-db -n keycloak-db

echo "Backup complete: ${BACKUP_DIR}"

Database Backup (CloudNativePG)

# Trigger manual backup
kubectl cnpg backup keycloak-db -n keycloak-db

# List backups
kubectl get backup -n keycloak-db

# Verify backup succeeded
kubectl describe backup <backup-name> -n keycloak-db

Restore from Backup

Restore Kubernetes Resources:

# Restore all resources
kubectl apply -f keycloak-backup-<date>/resources.yaml

# Verify resources restored
kubectl get keycloak,keycloakrealm,keycloakclient --all-namespaces

Restore Database (see Backup & Restore Guide):

# Restore from specific backup
kubectl cnpg restore keycloak-db \
  --backup <backup-name> \
  --namespace keycloak-db

Rollback Operator

# Rollback to previous Helm release
helm rollback keycloak-operator -n keycloak-operator-system

# Or rollback to specific revision
helm history keycloak-operator -n keycloak-operator-system
helm rollback keycloak-operator <revision> -n keycloak-operator-system

# Verify rollback
kubectl get pods -n keycloak-operator-system

Emergency Procedures

Operator Completely Broken:

# Uninstall operator (resources remain)
helm uninstall keycloak-operator -n keycloak-operator-system

# Resources continue working (Keycloak still serves traffic)
# Reinstall operator when ready:
helm install keycloak-operator ./charts/keycloak-operator \
  --namespace keycloak-operator-system \
  --values operator-values-backup.yaml

Keycloak Database Corrupted:

# Restore from backup (requires downtime)
kubectl delete cluster keycloak-db -n keycloak-db
kubectl cnpg restore keycloak-db \
  --backup <backup-name> \
  --namespace keycloak-db

# Wait for database to come back
kubectl wait --for=condition=Ready cluster/keycloak-db \
  -n keycloak-db --timeout=10m

# Restart Keycloak pods
kubectl rollout restart statefulset/<keycloak-name> -n <namespace>

Best Practices

Upgrade Strategy

  1. Test First - Always test upgrades in non-production
  2. Backup Always - Never upgrade without recent backup
  3. Read Release Notes - Check for breaking changes
  4. Rolling Updates - Use rolling updates for zero downtime
  5. Verify Thoroughly - Test all critical flows after upgrade
  6. Monitor - Watch metrics and logs during upgrade
  7. Have Rollback Plan - Know how to rollback before starting

Maintenance Windows

Schedule upgrades during low-traffic periods:

# Check current traffic
kubectl exec -n keycloak-operator-system deployment/keycloak-operator -- \
  curl -s localhost:8081/metrics | grep keycloak_operator_reconciliation_total

# Notify users of maintenance window
# Perform upgrade
# Verify and re-enable traffic

Documentation

Document your upgrade:

  • Pre-upgrade state (versions, configurations)
  • Steps taken
  • Issues encountered
  • Resolution steps
  • Post-upgrade verification
  • Rollback procedure used (if any)