Migration & Upgrade Guide¶
This guide covers upgrading the Keycloak operator and comparing this operator with the official Keycloak operator.
Table of Contents¶
- Upgrading the Operator
- Upgrading Keycloak Version
- Comparison with Official Keycloak Operator
- Backup & Rollback
Upgrading the Operator¶
Pre-Upgrade Checklist¶
- Backup current state - Export all Keycloak resources
- Review release notes - Check for breaking changes
- Test in non-production - Upgrade staging environment first
- Check database backups - Ensure recent backup exists
- Document current versions - Record operator and Keycloak versions
Step 1: Backup Current State¶
# Backup all Keycloak resources
kubectl get keycloak,keycloakrealm,keycloakclient --all-namespaces -o yaml \
> keycloak-resources-backup-$(date +%Y%m%d).yaml
# Backup operator configuration
helm get values keycloak-operator -n keycloak-operator-system \
> operator-values-backup-$(date +%Y%m%d).yaml
# Backup CRDs
kubectl get crd -o name | grep "vriesdemichael.github.io" \
  | xargs kubectl get -o yaml \
  > crds-backup-$(date +%Y%m%d).yaml
Step 2: Check Current Version¶
# Get current operator version
helm list -n keycloak-operator-system
# Get operator image version
kubectl get deployment keycloak-operator -n keycloak-operator-system \
-o jsonpath='{.spec.template.spec.containers[0].image}'
Step 3: Review Release Notes¶
Check the Releases Page for:
- Breaking changes
- New features
- Bug fixes
- Migration requirements
Step 4: Upgrade Operator (Helm)¶
# Check available versions (OCI)
helm show chart oci://ghcr.io/vriesdemichael/charts/keycloak-operator
# Upgrade operator
helm upgrade keycloak-operator oci://ghcr.io/vriesdemichael/charts/keycloak-operator \
--namespace keycloak-operator-system \
--values operator-values-backup-$(date +%Y%m%d).yaml \
--version <version> \
--wait
Important: The --wait flag ensures the upgrade completes before returning.
Step 5: Verify Upgrade¶
# Check operator pods are running new version
kubectl get pods -n keycloak-operator-system
# Check operator logs for startup
kubectl logs -n keycloak-operator-system -l app=keycloak-operator --tail=50
# Verify CRDs updated
kubectl get crd keycloaks.vriesdemichael.github.io -o yaml | grep -A5 version
# Check all resources still healthy
kubectl get keycloak,keycloakrealm,keycloakclient --all-namespaces
All resources should remain in Ready phase.
Step 6: Test Reconciliation¶
# Trigger reconciliation on a test realm
kubectl annotate keycloakrealm <test-realm> -n <test-namespace> \
reconcile=$(date +%s) --overwrite
# Watch logs
kubectl logs -n keycloak-operator-system -l app=keycloak-operator -f
# Verify realm still Ready
kubectl get keycloakrealm <test-realm> -n <test-namespace>
Rollback Procedure¶
If upgrade fails:
# Rollback Helm release
helm rollback keycloak-operator -n keycloak-operator-system
# Verify operator rolled back
kubectl get pods -n keycloak-operator-system
# Check resources still healthy
kubectl get keycloak,keycloakrealm,keycloakclient --all-namespaces
Important: CRD changes cannot be automatically rolled back. You may need to manually restore CRDs from the backup taken in Step 1, e.g. `kubectl apply -f crds-backup-<date>.yaml`.
Upgrading Keycloak Version¶
Supported Keycloak Versions¶
- Minimum: Keycloak 25.0.0 (management port 9000 requirement)
- Recommended: Keycloak 26.0.0+
- Maximum: Latest Keycloak release
Automated Pre-Upgrade Backups¶
Automatic for CNPG and Managed tiers
When upgrading Keycloak to a new major or minor version, the operator automatically creates a backup before applying the change. Patch-level upgrades (e.g., 26.0.1 → 26.0.2) skip this step.
The backup behavior depends on your database tier:
- CNPG: Creates a CNPG `Backup` CR and waits for completion before proceeding.
- Managed: Creates a `VolumeSnapshot` of the database PVC.
- External: Cannot back up automatically. The operator logs a warning and proceeds. Users with external databases must handle backups independently before upgrading. Flat-field (legacy) configs are treated identically (ADR-091).
See Backup & Restore: Automated Pre-Upgrade Backups for full configuration details.
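For the CNPG tier, the pre-upgrade backup is a standard CloudNativePG `Backup` resource. A minimal sketch of what gets created (the name is illustrative, not what the operator actually generates):

```yaml
# Illustrative CNPG Backup resource for the keycloak-db cluster
apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: keycloak-db-pre-upgrade   # illustrative name
  namespace: keycloak-db
spec:
  cluster:
    name: keycloak-db             # the CNPG Cluster to back up
```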
Pre-Upgrade Checklist¶
- Check Keycloak release notes - Review breaking changes
- Verify backup configuration - Ensure upgradePolicy settings match your requirements
- Test in non-production - Verify compatibility
- Schedule maintenance window - Plan for brief downtime (rolling update) or zero-downtime (blue-green, Phase 3)
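As a sketch only: the `upgradePolicy` name comes from the checklist above, but the subfield below is an assumption, not the documented schema (see the Backup & Restore guide for the real fields):

```yaml
# Sketch - subfield name is an assumption; consult the Backup & Restore guide
spec:
  upgradePolicy:
    preUpgradeBackup: true   # hypothetical: require a backup before upgrades
```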
Upgrade Strategy¶
Blue-Green Deployment (Future — Phase 3):
1. Deploy new Keycloak version alongside old version
2. Migrate database schema via Liquibase job
3. Switch traffic to new version
4. Keep old version for quick rollback
5. Remove old version after verification

Rolling Update (Current):
1. Update Keycloak resource with new image tag
2. Operator triggers pre-upgrade backup (automatic for CNPG/Managed)
3. Operator performs rolling update
4. Brief downtime during pod restarts
Rolling Update Procedure¶
# Check current Keycloak version
kubectl get keycloak <name> -n <namespace> \
-o jsonpath='{.spec.image.tag}'
# Update to new version
kubectl patch keycloak <name> -n <namespace> --type=merge -p '
spec:
  image:
    tag: "26.0.0"
'
# Watch rollout
kubectl rollout status statefulset/<keycloak-name> -n <namespace>
# Verify all pods running new version
kubectl get pods -n <namespace> -l app=keycloak \
-o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}{end}'
Verify Upgrade¶
# Check Keycloak status
kubectl get keycloak <name> -n <namespace>
# Should show PHASE=Ready
# Check all realms still working
kubectl get keycloakrealm --all-namespaces
# Test OAuth2 flow
# (Use test client to verify authentication)
# Check database schema version
kubectl exec -it -n <namespace> <keycloak-pod> -- \
psql -h <db-host> -U keycloak -d keycloak \
-c "SELECT * FROM databasechangelog ORDER BY orderexecuted DESC LIMIT 5;"
Rollback to Previous Version¶
# Revert to previous image tag
kubectl patch keycloak <name> -n <namespace> --type=merge -p '
spec:
  image:
    tag: "25.0.6"
'
# Watch rollout
kubectl rollout status statefulset/<keycloak-name> -n <namespace>
# Verify rollback
kubectl get pods -n <namespace> -l app=keycloak
Note: Keycloak database migrations are forward-only. Rolling back may require database restore if schema was upgraded.
Cache Isolation During Upgrades¶
When running multiple Keycloak pods, all members must form a single Infinispan/JGroups cluster to share distributed caches (user sessions, action tokens, login flows in progress). If pods from two different major versions try to cluster together during a rolling upgrade, they may encounter serialization incompatibilities in the JGroups protocol, causing subtle split-brain issues.
The cacheIsolation feature solves this by restricting which pods can join the same JGroups cluster using a Kubernetes label selector on the headless discovery service.
Recommended: autoRevision: true for semver images¶
If you tag your Keycloak image with a proper semver version (e.g., quay.io/keycloak/keycloak:26.4.1), enable automatic revision-based isolation:
# keycloak-values.yaml
keycloak:
  image: quay.io/keycloak/keycloak
  version: "26.4.1"
  cacheIsolation:
    autoRevision: true
The operator derives the cluster identity from the major version only (v26), so patch and minor upgrades (e.g., 26.4.1 → 26.5.0) remain in the same cluster. A major upgrade (e.g., 26.x → 27.x) automatically creates a new isolated cluster.
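A minimal shell sketch of that derivation (the operator does this internally; "keycloak" stands in for the instance name in the `<name>-v<major>` scheme):

```shell
# Derive the cluster identity from a semver image tag
tag="26.4.1"
major="${tag%%.*}"                 # keep everything before the first dot
cluster_id="keycloak-v${major}"
echo "${cluster_id}"               # -> keycloak-v26
```

Because only the major component is used, `26.4.1` and `26.5.0` yield the same identity, while `27.0.0` yields a new one.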
The operator also labels each pod with this cluster identity, and the discovery service selector is updated to match the current major version on every reconcile loop, so stale selectors after upgrades are corrected automatically.
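As an illustration of the mechanism (the label key below is invented for the example; the operator's actual key may differ), the headless discovery Service ends up selecting only pods from the current major version:

```yaml
# Illustrative only - the isolation label key is an assumption
apiVersion: v1
kind: Service
metadata:
  name: keycloak-discovery
spec:
  clusterIP: None                  # headless: used for JGroups/DNS discovery
  selector:
    app: keycloak
    cache-cluster: keycloak-v26    # pods labeled for the current major version
```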
Alternative: clusterName for non-semver or custom names¶
If you use non-semver tags (e.g., :nightly, :latest, or custom CI builds), use an explicit cluster name:
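Mirroring the values layout from the autoRevision example (registry and cluster name below are placeholders):

```yaml
# keycloak-values.yaml
keycloak:
  image: registry.example.com/internal/keycloak   # placeholder image
  version: "nightly"
  cacheIsolation:
    clusterName: sso-prod   # any stable string; pods sharing it form one cluster
```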
:latest and autoRevision
If autoRevision: true is set but the image tag is non-semver (:latest, :nightly, SHA digest), the operator cannot determine a major version. It will log a warning and disable cache isolation for that instance. You must use clusterName or a semver-tagged image in this case.
What is isolated — and what survives an upgrade¶
| Data | Survives upgrade? | Why |
|---|---|---|
| User sessions | ✅ Yes | Stored in the database |
| Realm & client config | ✅ Yes | Stored in the database |
| Offline tokens | ✅ Yes | Stored in the database |
| In-progress login flows (action tokens) | ⚠️ Lost during upgrade window | Stored in Infinispan only |
| Active SSO sessions (not yet flushed) | ⚠️ May be lost | Flushed periodically to DB |
Users mid-flow during the rolling upgrade window (typically seconds to a few minutes) may need to restart the flow. This is the same behaviour as any rolling pod restart.
Priority of cacheIsolation options¶
If multiple fields are set, the operator applies this resolution order (highest priority first):
1. clusterName — explicit static name, always wins
2. autoRevision — derives <name>-v<major> from image tag
3. autoSuffix — appends the full image tag as-is
4. No isolation — pods join Keycloak's default cluster
Comparison with Official Keycloak Operator¶
Overview¶
| Aspect | This Operator | Official Keycloak Operator |
|---|---|---|
| Primary Focus | GitOps-native, multi-tenant | General Keycloak deployment |
| Language | Python (Kopf) | Go (Operator SDK) |
| CRDs | Keycloak, KeycloakRealm, KeycloakClient | Keycloak, KeycloakRealmImport |
| Authorization | Namespace grant lists + RBAC | RBAC + direct access |
| Multi-tenancy | First-class support | Limited |
| GitOps Compatibility | Excellent | Good |
| Secret Management | Kubernetes-native | Kubernetes + Keycloak |
| Database | CloudNativePG (CNPG) primary | External PostgreSQL |
When to Use This Operator¶
✅ Choose this operator if:
- Multi-tenant environment (10+ teams)
- GitOps-first workflow (ArgoCD, Flux)
- Strong namespace isolation required
- Declarative authorization via grant lists
- CloudNativePG database management preferred
When to Use Official Operator¶
✅ Choose official operator if:
- Single-tenant environment
- Need Keycloak's built-in security model
- Organization policy requires official/upstream operators
- Integration with Red Hat/RHSSO required
- Prefer Go-based operators
- Need features not yet in this operator
Feature Comparison¶
Realm Management¶
| Feature | This Operator | Official Operator |
|---|---|---|
| Declarative realm config | ✅ KeycloakRealm CRD | ✅ KeycloakRealmImport |
| Live realm updates | ✅ Automatic reconciliation | ⚠️ Import-based |
| Drift detection | ✅ Built-in | ❌ Not supported |
| Multi-namespace realms | ✅ Fully supported | ⚠️ Limited |
| Realm deletion | ✅ Automatic | ⚠️ Manual |
Client Management¶
| Feature | This Operator | Official Operator |
|---|---|---|
| Declarative client config | ✅ KeycloakClient CRD | ⚠️ Via RealmImport |
| Client secret management | ✅ Automatic Kubernetes secret | ⚠️ Via RealmImport |
| Protocol mappers | ✅ CRD support | ✅ Via RealmImport |
| Service accounts | ✅ CRD support | ✅ Via RealmImport |
| Cross-namespace clients | ✅ Fully supported | ❌ Not supported |
Security Model¶
| Feature | This Operator | Official Operator |
|---|---|---|
| Authorization method | Namespace Grant + RBAC | Keycloak admin credentials |
| Client secret rotation | ✅ Automatic | ❌ Manual |
| Multi-tenant isolation | ✅ Namespace Grant Lists | ⚠️ RBAC-based |
| Audit trail | ✅ K8s API + ConfigMap | ⚠️ Keycloak logs |
| Secret distribution | ✅ GitOps-friendly | ⚠️ Manual |
Operations¶
| Feature | This Operator | Official Operator |
|---|---|---|
| Database management | ✅ CNPG integration | ⚠️ External required |
| Backup/restore | ✅ Via CNPG | ⚠️ Manual |
| High availability | ✅ Multi-replica support | ✅ Multi-replica support |
| Monitoring | ✅ Prometheus metrics | ✅ Prometheus metrics |
| Rate limiting | ✅ Built-in API rate limiting | ❌ Not supported |
Migration from Official Operator¶
Not Automated - Migration requires manual steps:
1. Export data from existing Keycloak
2. Deploy this operator alongside (different namespace)
3. Create new Keycloak instance with this operator
4. Import realm exports - create KeycloakRealm CRDs based on exports
5. Create KeycloakClient CRDs for each client
6. Switch application traffic to new Keycloak
7. Decommission old operator after verification

Note: Direct migration is complex. Recommend running both operators in parallel during transition.
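When recreating realms as CRDs, a minimal KeycloakRealm might look roughly like this; the API version suffix and spec field are assumptions inferred from the CRD group seen earlier, not the documented schema:

```yaml
apiVersion: vriesdemichael.github.io/v1   # version suffix is an assumption
kind: KeycloakRealm
metadata:
  name: my-realm
  namespace: team-a
spec:
  realmName: my-realm   # hypothetical field name
```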
Backup & Rollback¶
Pre-Upgrade Backup¶
Always backup before major changes:
#!/bin/bash
# Full backup script
BACKUP_DIR="keycloak-backup-$(date +%Y%m%d-%H%M%S)"
mkdir -p ${BACKUP_DIR}
# Backup resources
kubectl get keycloak,keycloakrealm,keycloakclient --all-namespaces -o yaml \
> ${BACKUP_DIR}/resources.yaml
# Backup operator config
helm get values keycloak-operator -n keycloak-operator-system \
> ${BACKUP_DIR}/operator-values.yaml
# Backup CRDs
kubectl get crd -o name | grep "vriesdemichael.github.io" \
  | xargs kubectl get -o yaml \
  > ${BACKUP_DIR}/crds.yaml
# Backup database (if using CNPG)
kubectl cnpg backup keycloak-db -n keycloak-db
echo "Backup complete: ${BACKUP_DIR}"
Database Backup (CloudNativePG)¶
# Trigger manual backup
kubectl cnpg backup keycloak-db -n keycloak-db
# List backups
kubectl get backup -n keycloak-db
# Verify backup succeeded
kubectl describe backup <backup-name> -n keycloak-db
Restore from Backup¶
Restore Kubernetes Resources:
# Restore all resources
kubectl apply -f keycloak-backup-<date>/resources.yaml
# Verify resources restored
kubectl get keycloak,keycloakrealm,keycloakclient --all-namespaces
Restore Database (see Backup & Restore Guide):
# CNPG has no restore subcommand: recovery is declarative.
# Create a new Cluster whose spec.bootstrap.recovery.backup.name
# points at <backup-name>, then apply it:
kubectl apply -f cluster-restore.yaml   # Cluster manifest with bootstrap.recovery
Rollback Operator¶
# Rollback to previous Helm release
helm rollback keycloak-operator -n keycloak-operator-system
# Or rollback to specific revision
helm history keycloak-operator -n keycloak-operator-system
helm rollback keycloak-operator <revision> -n keycloak-operator-system
# Verify rollback
kubectl get pods -n keycloak-operator-system
Emergency Procedures¶
Operator Completely Broken:
# Uninstall operator (resources remain)
helm uninstall keycloak-operator -n keycloak-operator-system
# Resources continue working (Keycloak still serves traffic)
# Reinstall operator when ready:
helm install keycloak-operator ./charts/keycloak-operator \
--namespace keycloak-operator-system \
--values operator-values-backup.yaml
Keycloak Database Corrupted:
# Restore from backup (requires downtime)
kubectl delete cluster keycloak-db -n keycloak-db
# Recreate the Cluster from backup: apply a Cluster manifest whose
# spec.bootstrap.recovery.backup.name points at <backup-name>
# (CNPG has no restore subcommand; see the Backup & Restore Guide)
kubectl apply -f cluster-restore.yaml
# Wait for database to come back
kubectl wait --for=condition=Ready cluster/keycloak-db \
-n keycloak-db --timeout=10m
# Restart Keycloak pods
kubectl rollout restart statefulset/<keycloak-name> -n <namespace>
Best Practices¶
Upgrade Strategy¶
- Test First - Always test upgrades in non-production
- Backup Always - Never upgrade without recent backup
- Read Release Notes - Check for breaking changes
- Rolling Updates - Use rolling updates for zero downtime
- Verify Thoroughly - Test all critical flows after upgrade
- Monitor - Watch metrics and logs during upgrade
- Have Rollback Plan - Know how to rollback before starting
Maintenance Windows¶
Schedule upgrades during low-traffic periods:
# Check current traffic
kubectl exec -n keycloak-operator-system deployment/keycloak-operator -- \
curl -s localhost:8081/metrics | grep keycloak_operator_reconciliation_total
# Notify users of maintenance window
# Perform upgrade
# Verify and re-enable traffic
Documentation¶
Document your upgrade:
- Pre-upgrade state (versions, configurations)
- Steps taken
- Issues encountered
- Resolution steps
- Post-upgrade verification
- Rollback procedure used (if any)