ADR-045: Active-standby HA using Kopf peering

Category: architecture
Provenance: human

Decision

Support high availability through active-standby replication (replicas > 1) using Kopf's built-in peering mechanism. One replica is active (the leader) and the others are on standby; a standby replica takes over if the active one fails. This provides availability, not additional workload capacity.
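As a rough illustration of what enabling peering could look like, the sketch below sets Kopf's peering settings on a settings object. The attribute names (`settings.peering.name`, `settings.peering.mandatory`, `settings.peering.priority`) are Kopf's own; the helper `configure_peering`, the peering name `keycloak-operator`, and the `SimpleNamespace` stand-in are assumptions made for this example and do not come from this ADR.

```python
import random
from types import SimpleNamespace


def configure_peering(settings, peering_name="keycloak-operator"):
    """Apply peering settings to a Kopf-style settings object.

    `peering_name` is a hypothetical name; it must match a peering object
    (ClusterKopfPeering or KopfPeering) that exists in the cluster.
    """
    # Which peering object the replicas use to see each other.
    settings.peering.name = peering_name
    # Fail at startup if the peering object is missing, rather than
    # silently running without coordination.
    settings.peering.mandatory = True
    # Replicas need distinct priorities so that exactly one is active;
    # a random priority per pod is one simple way to get that.
    settings.peering.priority = random.randint(0, 32767)
    return settings


# Stand-in for kopf.OperatorSettings so the sketch runs without a cluster.
demo = configure_peering(SimpleNamespace(peering=SimpleNamespace()))
print(demo.peering.name, demo.peering.mandatory)
```

In a real operator, a helper like this would be called from a startup handler registered with `@kopf.on.startup()`, which receives `settings: kopf.OperatorSettings` as a keyword argument.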

Rationale

Kopf provides active-standby HA out of the box through peering, so the implementation effort is minimal. One active replica processes all events while the standbys wait, ready to take over, which prevents downtime during upgrades or failures. Because the operator is stateless, failover is fast and safe. HA is justified in production environments where even brief reconciliation pauses are unacceptable. Standby replicas are not wasted: they provide fault tolerance. One important distinction: this HA mechanism does NOT improve throughput or capacity. The Keycloak API is almost certainly the bottleneck, not the operator.
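The active-standby behaviour described above can be illustrated with a small toy model. This is a simplified sketch, not Kopf's implementation: the `Peer` class, the `active_peer` function, the replica names, and the 60-second lifetime are all assumptions chosen for the example. It models each replica as publishing a heartbeat with a priority, with the highest-priority live replica being the only active one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Peer:
    name: str
    priority: int
    last_seen: float        # time of the last heartbeat, in seconds
    lifetime: float = 60.0  # how long a heartbeat stays valid (illustrative)

    def is_alive(self, now: float) -> bool:
        return now - self.last_seen < self.lifetime


def active_peer(peers: List[Peer], now: float) -> Optional[str]:
    """Return the name of the single active replica, or None."""
    alive = [p for p in peers if p.is_alive(now)]
    if not alive:
        return None
    top = max(p.priority for p in alive)
    leaders = [p for p in alive if p.priority == top]
    # Model assumption: replicas with equal top priority all stay paused,
    # so a tie means nobody is active.
    return leaders[0].name if len(leaders) == 1 else None


peers = [
    Peer("replica-a", priority=200, last_seen=0.0),
    Peer("replica-b", priority=100, last_seen=0.0),
]
print(active_peer(peers, now=10.0))   # replica-a

# replica-a stops heartbeating; replica-b keeps going and takes over.
peers[1].last_seen = 90.0
print(active_peer(peers, now=100.0))  # replica-b
```

However many replicas are listed, the function returns at most one name, which is why adding replicas improves availability but not reconciliation throughput.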

Agent Instructions

The operator supports active-standby HA via Kopf peering. Increasing the replica count improves availability but does NOT increase reconciliation capacity.

Rejected Alternatives

No HA (single replica only)

Acceptable for dev/test, but production environments benefit from availability during upgrades and failures.

Active-active with work distribution

Active-active would be very complex to implement and carries a risk of conflicts and split-brain. Kopf's active-standby mode is simpler and sufficient for availability needs.