# Backup & Restore Guide
Backup and restore procedures for Keycloak and PostgreSQL database using CloudNativePG, including automated pre-upgrade backups.
## Automated Pre-Upgrade Backups
The operator automatically orchestrates backups before Keycloak major or minor version upgrades. The backup strategy depends on your database tier:
| Database Tier | Backup Action | Blocks Upgrade? |
|---|---|---|
| CNPG (Tier 1) | Creates a CNPG `Backup` CR, waits for completion | Yes, until backup succeeds |
| Managed (Tier 2) | Creates a VolumeSnapshot of the database PVC | Yes, until snapshot is ready |
| External (Tier 3) | Logs a warning; operator cannot back up automatically | No — warn-and-proceed |

> **Flat-field (legacy) config:** Flat-field database configs (top-level `host`, `database`, etc. without a `cnpg`, `managed`, or `external` sub-object) are treated as External tier: warn-and-proceed, no automated backup (ADR-091).
Patch-level version changes (e.g., 26.0.1 to 26.0.2) skip the backup step entirely.
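The major/minor-versus-patch decision can be sketched as a small shell helper (illustrative only, not the operator's actual implementation):

```shell
# Sketch: a pre-upgrade backup is needed only when major.minor changes;
# patch-level bumps (26.0.1 -> 26.0.2) return non-zero (no backup needed).
needs_preupgrade_backup() {
  local from="$1" to="$2"
  # Strip the patch component: 26.0.1 -> 26.0
  local from_mm="${from%.*}" to_mm="${to%.*}"
  [ "${from_mm}" != "${to_mm}" ]
}
```

For example, `needs_preupgrade_backup 26.0.2 26.1.0` succeeds (backup required), while `needs_preupgrade_backup 26.0.1 26.0.2` fails (patch-only change, skip the backup).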
## Configuration

Configure the upgrade policy in your Helm values:

```yaml
keycloak:
  upgradePolicy:
    # Max seconds to wait for CNPG/VolumeSnapshot backup completion
    backupTimeout: 600  # default: 600 (10 minutes), range: 60-3600
```
Or directly in the Keycloak CR:
```yaml
apiVersion: vriesdemichael.github.io/v1
kind: Keycloak
metadata:
  name: keycloak
spec:
  upgradePolicy:
    backupTimeout: 600
```
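The documented range can be checked client-side before applying the manifest; a hypothetical helper (the operator performs its own validation, this merely illustrates the bounds):

```shell
# Sketch: reject a backupTimeout value outside the documented 60-3600 range.
validate_backup_timeout() {
  local t="$1"
  [[ "$t" =~ ^[0-9]+$ ]] || { echo "backupTimeout must be an integer (seconds)" >&2; return 1; }
  if (( t < 60 || t > 3600 )); then
    echo "backupTimeout ${t}s is outside the allowed range 60-3600" >&2
    return 1
  fi
}
```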
## Behavior During Backup

When a version upgrade triggers a pre-upgrade backup, the operator blocks the upgrade by raising a `TemporaryError`, which causes Kopf to retry reconciliation until the backup completes. The Keycloak CR status phase does not change to a special backup phase — it remains in its current phase (typically `Ready` or `Provisioning`), and the operator logs indicate that the backup is in progress.
For external/legacy tiers, the operator logs a warning and proceeds immediately. Users with external databases are responsible for their own backup procedures.
## Monitoring Backup Progress

```bash
# Check Keycloak CR status phase
kubectl get keycloak <name> -n <namespace> -o jsonpath='{.status.phase}'

# Check operator logs for backup progress
kubectl logs deploy/<operator-deployment> -n <operator-namespace> | grep -i backup

# For CNPG tier: check the Backup CR
kubectl get backup -n <cnpg-namespace> -l vriesdemichael.github.io/keycloak-managed-by=keycloak-operator

# For Managed tier: check the VolumeSnapshot
kubectl get volumesnapshot -n <namespace> -l vriesdemichael.github.io/keycloak-managed-by=keycloak-operator
```
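The status checks above can be wrapped in a simple wait loop; a sketch, where the phase-reporting command and the `completed` phase string are illustrative assumptions (substitute the appropriate `kubectl ... -o jsonpath` query and phase value for your tier):

```shell
# Sketch: poll a phase-reporting command until it prints "completed" or a
# timeout elapses. Returns 0 on success, 1 on timeout.
wait_for_phase() {
  local get_phase_cmd="$1" timeout="${2:-600}" interval="${3:-5}" elapsed=0
  while (( elapsed < timeout )); do
    if [ "$(${get_phase_cmd})" = "completed" ]; then
      return 0
    fi
    sleep "${interval}"
    (( elapsed += interval ))
  done
  return 1
}
```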
## What Gets Backed Up
| Component | Content | Backup Method |
|---|---|---|
| Database | Users, realms, clients, sessions | CloudNativePG barman |
| Kubernetes Resources | CRDs, manifests | kubectl export |
| Token Metadata | Token rotation state | ConfigMap backup |
| Secrets | Credentials (⚠️ encrypt) | kubectl export |
**Not backed up:** Operator code and container images (pull these from your image registry).
## Quick Backup

### One-Command Backup
```bash
#!/bin/bash
set -euo pipefail

BACKUP_DIR="keycloak-backup-$(date +%Y%m%d-%H%M%S)"
mkdir -p "${BACKUP_DIR}"

# Backup Kubernetes resources
kubectl get keycloak,keycloakrealm,keycloakclient --all-namespaces -o yaml \
  > "${BACKUP_DIR}/resources.yaml"

# Backup token metadata
kubectl get configmap keycloak-operator-token-metadata \
  -n keycloak-operator-system -o yaml \
  > "${BACKUP_DIR}/token-metadata.yaml"

# Trigger database backup (requires the kubectl-cnpg plugin)
kubectl cnpg backup keycloak-db -n keycloak-db

echo "Backup complete: ${BACKUP_DIR}"
```
## Database Backup

### Configure Automatic Backups
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: keycloak-db
  namespace: keycloak-db
spec:
  instances: 3
  backup:
    barmanObjectStore:
      destinationPath: s3://my-backup-bucket/keycloak-db
      s3Credentials:
        accessKeyId:
          name: backup-s3-credentials
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: backup-s3-credentials
          key: ACCESS_SECRET_KEY
      wal:
        compression: gzip
      data:
        compression: gzip
    retentionPolicy: "30d"
```
### Scheduled Backups

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: keycloak-db-daily
  namespace: keycloak-db
spec:
  schedule: "0 0 2 * * *"  # Daily at 2 AM (CNPG cron expressions include a leading seconds field)
  backupOwnerReference: self
  cluster:
    name: keycloak-db
```
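The schedule above can be tuned with a couple of additional CNPG `ScheduledBackup` spec fields; a sketch (values illustrative):

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: keycloak-db-daily
  namespace: keycloak-db
spec:
  schedule: "0 0 2 * * *"  # CNPG cron format includes a leading seconds field
  immediate: true          # take one backup as soon as this resource is created
  suspend: false           # flip to true to pause the schedule without deleting it
  backupOwnerReference: self
  cluster:
    name: keycloak-db
```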
### Manual Backup

```bash
# Trigger backup (requires the kubectl-cnpg plugin)
kubectl cnpg backup keycloak-db -n keycloak-db

# List backups
kubectl get backup -n keycloak-db

# Check backup status
kubectl describe backup <backup-name> -n keycloak-db
```
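If you prefer declarative manifests over the `kubectl cnpg` plugin, the same on-demand backup can be expressed as a CNPG `Backup` resource (name illustrative):

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: keycloak-db-manual
  namespace: keycloak-db
spec:
  cluster:
    name: keycloak-db
```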
## Kubernetes Resources Backup

### Backup Script
```bash
#!/bin/bash
set -euo pipefail

BACKUP_DIR="k8s-backup-$(date +%Y%m%d)"
mkdir -p "${BACKUP_DIR}"

# Backup all Keycloak resources
kubectl get keycloak,keycloakrealm,keycloakclient --all-namespaces -o yaml \
  > "${BACKUP_DIR}/keycloak-resources.yaml"

# Backup operator configuration
helm get values keycloak-operator -n keycloak-operator-system \
  > "${BACKUP_DIR}/operator-values.yaml"

# Backup token metadata
kubectl get configmap keycloak-operator-token-metadata \
  -n keycloak-operator-system -o yaml \
  > "${BACKUP_DIR}/token-metadata.yaml"

# Backup CRDs (select the operator's CRDs by name, then export them as YAML;
# grepping raw YAML would not produce a valid manifest)
kubectl get crd -o name | grep "vriesdemichael.github.io" \
  | xargs kubectl get -o yaml > "${BACKUP_DIR}/crds.yaml"

# Backup secrets (⚠️ ENCRYPT THIS FILE)
kubectl get secret --all-namespaces -l vriesdemichael.github.io/managed-by=keycloak-operator \
  -o yaml > "${BACKUP_DIR}/secrets.yaml"

echo "Backup saved to: ${BACKUP_DIR}"
echo "⚠️ IMPORTANT: Encrypt secrets.yaml before storing!"
```
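Before encrypting and archiving, it can help to sanity-check the output directory; a small sketch (the file list mirrors the script above):

```shell
# Sketch: verify a backup directory contains non-empty copies of the expected
# files before archiving it. Returns non-zero if anything is missing or empty.
verify_backup_dir() {
  local dir="$1" missing=0 f
  for f in keycloak-resources.yaml operator-values.yaml token-metadata.yaml crds.yaml; do
    if [ ! -s "${dir}/${f}" ]; then
      echo "missing or empty: ${dir}/${f}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```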
### Encrypt Secrets

```bash
# Using GPG
gpg --symmetric --cipher-algo AES256 "${BACKUP_DIR}/secrets.yaml"

# Using age
age -p "${BACKUP_DIR}/secrets.yaml" > "${BACKUP_DIR}/secrets.yaml.age"

# Remove plaintext
rm "${BACKUP_DIR}/secrets.yaml"
```
## Database Restore

### Full Cluster Restore
```bash
# 1. Delete existing cluster (⚠️ DOWNTIME)
kubectl delete cluster keycloak-db -n keycloak-db

# 2. Wait for PVCs to be deleted
kubectl get pvc -n keycloak-db

# 3. Create restore manifest
cat <<EOF | kubectl apply -f -
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: keycloak-db
  namespace: keycloak-db
spec:
  instances: 3
  bootstrap:
    recovery:
      source: keycloak-db-backup
  externalClusters:
    - name: keycloak-db-backup
      barmanObjectStore:
        destinationPath: s3://my-backup-bucket/keycloak-db
        s3Credentials:
          accessKeyId:
            name: backup-s3-credentials
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: backup-s3-credentials
            key: ACCESS_SECRET_KEY
EOF

# 4. Wait for restore
kubectl wait --for=condition=Ready cluster/keycloak-db \
  -n keycloak-db --timeout=10m

# 5. Verify data
kubectl exec -it -n keycloak-db keycloak-db-1 -- \
  psql -U keycloak -d keycloak -c "SELECT COUNT(*) FROM public.realm;"
```
### Point-in-Time Restore

```yaml
bootstrap:
  recovery:
    source: keycloak-db-backup
    recoveryTarget:
      targetTime: "2025-01-15 10:00:00+00"  # UTC timestamp
```
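`targetTime` is not the only option: CNPG's `recoveryTarget` also accepts, among others, a named restore point, an LSN, or a transaction ID. A sketch with illustrative values (use exactly one target type per restore):

```yaml
bootstrap:
  recovery:
    source: keycloak-db-backup
    recoveryTarget:
      targetName: "before-migration"  # a restore point created with pg_create_restore_point()
      # targetLSN: "0/3000000"        # a specific WAL location
      # targetXID: "1234"             # a transaction ID
```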
### Restore to New Cluster

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: keycloak-db-restored  # Different name
  namespace: keycloak-db
spec:
  instances: 3
  bootstrap:
    recovery:
      source: keycloak-db-backup
  externalClusters:
    - name: keycloak-db-backup
      barmanObjectStore:
        destinationPath: s3://my-backup-bucket/keycloak-db
        s3Credentials:
          accessKeyId:
            name: backup-s3-credentials
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: backup-s3-credentials
            key: ACCESS_SECRET_KEY
```
## Kubernetes Resources Restore

### Restore All Resources
```bash
# 1. Restore CRDs first
kubectl apply -f k8s-backup-20250115/crds.yaml

# 2. Restore secrets (decrypt first)
gpg --decrypt k8s-backup-20250115/secrets.yaml.gpg | kubectl apply -f -

# 3. Restore token metadata
kubectl apply -f k8s-backup-20250115/token-metadata.yaml

# 4. Restore Keycloak resources
kubectl apply -f k8s-backup-20250115/keycloak-resources.yaml

# 5. Verify
kubectl get keycloak,keycloakrealm,keycloakclient --all-namespaces
```
### Selective Restore

The backup file written by `kubectl get ... -o yaml` is a `kind: List`, so individual items can be filtered out with `yq` (v4) before applying:

```bash
# Restore a single realm
yq '.items |= map(select(.kind == "KeycloakRealm" and .metadata.name == "my-realm"))' \
  k8s-backup-20250115/keycloak-resources.yaml | kubectl apply -f -

# Restore a single namespace
yq '.items |= map(select(.metadata.namespace == "my-app"))' \
  k8s-backup-20250115/keycloak-resources.yaml | kubectl apply -f -
```
## Disaster Recovery Procedures

### Scenario 1: Database Corruption

**Symptoms:** Data integrity errors, query failures.

**Recovery:**
```bash
# 1. Scale down Keycloak (prevent new writes)
kubectl scale keycloak keycloak -n keycloak-system --replicas=0

# 2. Restore database from backup (see above)

# 3. Verify database integrity
kubectl exec -it -n keycloak-db keycloak-db-1 -- \
  psql -U keycloak -d keycloak -c "
    SELECT tablename, pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename))
    FROM pg_tables WHERE schemaname = 'public' LIMIT 5;
  "

# 4. Scale up Keycloak
kubectl scale keycloak keycloak -n keycloak-system --replicas=3

# 5. Test authentication
```
**RTO:** 15-30 minutes. **RPO:** Time since last backup.
### Scenario 2: Accidental Resource Deletion

**Symptoms:** Realm/client deleted from Kubernetes and Keycloak.

**Recovery:**
```bash
# 1. Find resource in backup
grep -A50 "name: my-realm" k8s-backup-20250115/keycloak-resources.yaml

# 2. Restore resource
kubectl apply -f - <<EOF
# (paste resource YAML)
EOF

# 3. Verify reconciliation
kubectl describe keycloakrealm my-realm -n my-app
```
**RTO:** 5-10 minutes. **RPO:** Last backup time.
### Scenario 3: Complete Cluster Loss

**Symptoms:** Entire Kubernetes cluster destroyed.

**Recovery:**
```bash
# 1. Deploy new Kubernetes cluster

# 2. Install operators
helm install cnpg cnpg/cloudnative-pg -n cnpg-system --create-namespace
helm install keycloak-operator ./charts/keycloak-operator -n keycloak-operator-system --create-namespace

# 3. Restore database
# (Use the Full Cluster Restore procedure above)

# 4. Restore Kubernetes resources
# (Use the Kubernetes Resources Restore procedure above)

# 5. Verify end-to-end
```
**RTO:** 2-4 hours. **RPO:** Last backup time.
## Backup Verification

### Test Restore Monthly
```bash
#!/bin/bash
# Monthly backup test script
set -euo pipefail

# 1. Create test namespace
kubectl create namespace backup-test

# 2. Restore database to test cluster
cat <<EOF | kubectl apply -f -
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: keycloak-db-test
  namespace: backup-test
spec:
  instances: 1
  bootstrap:
    recovery:
      source: keycloak-db-backup
  externalClusters:
    - name: keycloak-db-backup
      barmanObjectStore:
        destinationPath: s3://my-backup-bucket/keycloak-db
        s3Credentials:
          accessKeyId:
            name: backup-s3-credentials
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: backup-s3-credentials
            key: ACCESS_SECRET_KEY
EOF

# 3. Wait for restore
kubectl wait --for=condition=Ready cluster/keycloak-db-test -n backup-test --timeout=10m

# 4. Verify data
kubectl exec -it -n backup-test keycloak-db-test-1 -- \
  psql -U keycloak -d keycloak -c "
    SELECT COUNT(*) FROM public.realm;
    SELECT COUNT(*) FROM public.user_entity;
    SELECT COUNT(*) FROM public.client;
  "

# 5. Cleanup
kubectl delete namespace backup-test

echo "Backup test complete ✓"
```
## Backup Monitoring
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: backup-alerts
  namespace: keycloak-db
spec:
  groups:
    - name: backups
      rules:
        - alert: BackupFailed
          expr: increase(cnpg_backup_failures_total[1h]) > 0
          labels:
            severity: critical
          annotations:
            summary: "Backup failed for {{ $labels.cluster }}"
        - alert: BackupOld
          expr: time() - cnpg_backup_last_success_timestamp > 86400
          labels:
            severity: warning
          annotations:
            summary: "No successful backup in 24h for {{ $labels.cluster }}"
```
## Best Practices

### 1. Backup Frequency
| Environment | Database | Kubernetes Resources | Retention |
|---|---|---|---|
| Production | Hourly | Daily | 30 days |
| Staging | Daily | Weekly | 14 days |
| Development | Daily | Weekly | 7 days |
### 2. Storage Strategy
- Primary: S3/GCS/Azure Blob (encrypted)
- Secondary: Different region/provider
- Tertiary: Offline/tape (compliance)
### 3. Encryption

Always encrypt backups containing:

- Kubernetes secrets
- Database dumps
- Token metadata
### 4. Testing
- Monthly restore tests (automated)
- Quarterly disaster recovery drills
- Document restore procedures
- Train team on restore process
### 5. Retention

Retention periods per environment are listed in the backup frequency table above; enforce them via the CNPG `retentionPolicy` and your object store's lifecycle rules.
## Troubleshooting

### Backup Fails with S3 Error
```bash
# Test S3 access (note: this ad-hoc pod has no AWS credentials unless you pass them in)
kubectl run aws-cli --rm -it --image=amazon/aws-cli -- \
  s3 ls s3://my-backup-bucket/ --region us-east-1

# Verify credentials
kubectl get secret backup-s3-credentials -n keycloak-db -o yaml
```
### Restore Hangs

```bash
# Check cluster events
kubectl describe cluster keycloak-db -n keycloak-db

# Check pod logs
kubectl logs -n keycloak-db keycloak-db-1

# Verify the backup exists in object storage
kubectl run aws-cli --rm -it --image=amazon/aws-cli -- \
  s3 ls s3://my-backup-bucket/keycloak-db/base/
```
### Data Mismatch After Restore

```bash
# Check backup timestamp
kubectl describe backup <backup-name> -n keycloak-db

# Verify you restored the correct backup
# Consider point-in-time recovery if needed
```