ADR-045: Active-standby HA using Kopf peering

Category: architecture
Provenance: human

Decision

Support high availability (HA) through active-standby replication (replicas > 1) using Kopf's built-in peering mechanism. Exactly one replica is active (the leader) and processes events; the others stay on standby and take over if the active replica fails. The purpose is availability, not scaling workload capacity.
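To make the intended behavior concrete, the following is a simplified Python model (not Kopf's code) of how peering decides which replica runs. As we read Kopf's peering documentation, a replica pauses while any other live peer has a higher priority, and all replicas pause if they share the same priority, which is what happens when every replica keeps the default priority of 0. Active-standby therefore only works if each replica gets a distinct priority, for example by setting `settings.peering.priority` in an `@kopf.on.startup()` handler. `Peer`, `is_active`, and `active_replicas` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Peer:
    """A live replica as seen in the shared peering record (hypothetical model)."""
    name: str
    priority: int


def is_active(me: Peer, peers: List[Peer]) -> bool:
    """True if `me` should process events under the simplified pausing rule."""
    others = [p for p in peers if p.name != me.name]
    if any(p.priority > me.priority for p in others):
        return False  # a higher-priority peer is alive: stay on standby
    if any(p.priority == me.priority for p in others):
        return False  # equal priorities: every tied replica pauses
    return True


def active_replicas(peers: List[Peer]) -> List[str]:
    """Names of the replicas that would be processing events."""
    return [p.name for p in peers if is_active(p, peers)]


if __name__ == "__main__":
    distinct = [Peer("pod-a", 120), Peer("pod-b", 7), Peer("pod-c", 3051)]
    print(active_replicas(distinct))  # ['pod-c']
    tied = [Peer("pod-a", 0), Peer("pod-b", 0)]
    print(active_replicas(tied))      # []: nobody is active
```

The tied case is the important one for this decision: simply raising `replicas` without distinct priorities would, under this rule, leave no replica reconciling.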

Rationale

Kopf provides active-standby HA through peering out of the box, so the implementation effort is minimal. One active replica processes all events while the standbys wait to take over, which avoids reconciliation downtime during upgrades and failures. Because the operator is stateless, failover is fast and safe: a standby resumes processing without any in-memory state to hand over. HA is justified in production environments where even brief reconciliation pauses are unacceptable. Standby replicas are not wasted; they provide fault tolerance.

An important distinction: this HA mechanism does NOT improve throughput or capacity. The Keycloak API, not the operator, is almost certainly the bottleneck.
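A rough model of the failover path follows. This hedged sketch assumes each replica periodically refreshes a heartbeat in the shared peering record and that a peer whose entry is older than a keep-alive lifetime is treated as gone. The 60-second value is an assumption meant to mirror Kopf's `settings.peering.lifetime` and should be checked against the deployed Kopf version. After an unclean crash, the standby only takes over once the crashed leader's entry expires, so reconciliation pauses for up to roughly one lifetime. A clean shutdown, such as a rolling upgrade, may remove the entry sooner; the model ignores that case. `Replica`, `leader`, and `first_takeover_time` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional

LIFETIME_S = 60.0  # assumed keep-alive lifetime; verify settings.peering.lifetime


@dataclass
class Replica:
    name: str
    priority: int
    last_heartbeat: float  # time (s) of the last refresh of its peering entry


def leader(replicas: List[Replica], now: float,
           lifetime: float = LIFETIME_S) -> Optional[str]:
    """Name of the replica allowed to process events at `now`, or None."""
    # Entries older than the lifetime are treated as dead peers.
    alive = [r for r in replicas if now - r.last_heartbeat < lifetime]
    if not alive:
        return None
    top = max(r.priority for r in alive)
    winners = [r for r in alive if r.priority == top]
    # A tie at the top priority means every tied replica pauses.
    return winners[0].name if len(winners) == 1 else None


def first_takeover_time(crash_at: float, heartbeat_every: float = 10.0,
                        lifetime: float = LIFETIME_S,
                        horizon: float = 600.0) -> Optional[float]:
    """Simulate pod-a (active) crashing at `crash_at`; return when pod-b takes over."""
    pod_a = Replica("pod-a", priority=200, last_heartbeat=0.0)
    pod_b = Replica("pod-b", priority=100, last_heartbeat=0.0)
    t = 0.0
    while t <= horizon:
        if t <= crash_at:
            pod_a.last_heartbeat = t  # pod-a refreshes only while it is running
        pod_b.last_heartbeat = t      # the standby keeps refreshing
        if t > crash_at and leader([pod_a, pod_b], t, lifetime) == "pod-b":
            return t
        t += heartbeat_every
    return None


if __name__ == "__main__":
    print(first_takeover_time(crash_at=30.0))  # 90.0: the stale entry blocks pod-b for 60 s
```

Note that adding a third replica to this simulation would not change anything except who is next in line: there is still exactly one leader, which is consistent with HA improving availability but not capacity.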

Agent Instructions

The operator supports active-standby HA via Kopf peering. Increasing the replica count improves availability but does NOT increase reconciliation capacity.

Rejected Alternatives

No HA (single replica only)

Acceptable for dev/test, but production environments benefit from staying available during upgrades and failures.

Active-active with work distribution

Very complex to implement, with a risk of conflicting updates and split-brain. Kopf's active-standby peering is simpler and sufficient for availability needs.