Custom Kubernetes HPA Controller

Role

DevOps Architect

Timeline

2023

Team

3 Engineers

Tech Stack

Python, Kubernetes API, Prometheus, Kafka

Overview

Metric-driven scaling for bursty workloads

A custom Kubernetes Horizontal Pod Autoscaler (HPA) controller that scales applications on Kafka queue depth, measured as consumer group lag. It addresses the reaction delay inherent in standard CPU-based scaling for event-driven architectures.

The Problem

Reactive scaling lags behind traffic bursts

Standard HPA failed during sudden traffic spikes because CPU usage is a lagging indicator. By the time pods scaled up, message lag had already accumulated, causing SLA breaches for real-time data processing pipelines.

The Solution

Proactive scaling based on queue depth

We built a custom controller with Kopf, a Python framework for Kubernetes operators, that watches Kafka consumer group lag. It calculates the number of pods required to drain the lag within a target time window and updates the Deployment's replica count directly through the Kubernetes API.
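The replica calculation described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name, the `per_pod_rate` and `incoming_rate` parameters, and the `min_replicas`/`max_replicas` bounds are all assumptions introduced for the example.

```python
import math


def desired_replicas(total_lag, per_pod_rate, drain_window_s,
                     incoming_rate=0.0, min_replicas=1, max_replicas=50):
    """Estimate the pods needed to drain `total_lag` messages in `drain_window_s` seconds.

    All parameters are hypothetical:
      per_pod_rate   messages/second one pod can consume
      incoming_rate  messages/second still arriving while the backlog drains
    """
    if per_pod_rate <= 0 or drain_window_s <= 0:
        raise ValueError("per_pod_rate and drain_window_s must be positive")

    # Throughput needed to clear the backlog in time while keeping up with new traffic.
    required_rate = total_lag / drain_window_s + incoming_rate
    needed = math.ceil(required_rate / per_pod_rate)

    # Clamp to assumed bounds so a metric glitch cannot scale to zero or to infinity.
    return max(min_replicas, min(max_replicas, needed))
```

For example, a backlog of 12,000 messages, a per-pod rate of 50 messages per second, and a 60-second window give `desired_replicas(12000, 50, 60) == 4`. In a setup like the one described, the result would then be written to the Deployment, for instance through the official Python client's `AppsV1Api.patch_namespaced_deployment_scale`.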

The Flow

Control Loop Logic

The controller continuously polls Prometheus for Kafka consumer group lag. When lag exceeds the threshold, it calculates the desired replica count and patches the Deployment resource. A cool-down period between scaling actions prevents flapping, that is, rapid back-and-forth changes in replica count.
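One way the threshold check and cool-down could fit together is sketched below. The class name, field names, and the choice to skip scaling entirely while lag is under the threshold are assumptions made for illustration. Scale-down behavior is not described here, so the sketch leaves it out.

```python
import time
from dataclasses import dataclass


@dataclass
class ScaleDecider:
    """Decides whether a scaling action should be applied (hypothetical helper)."""
    lag_threshold: int            # lag above which scaling is considered
    cooldown_s: float             # minimum seconds between two scaling actions
    last_scaled_at: float = float("-inf")

    def decide(self, lag, current_replicas, desired, now=None):
        """Return the replica count to apply, or None to leave the Deployment alone."""
        now = time.monotonic() if now is None else now

        if lag <= self.lag_threshold:
            return None           # lag is within tolerance
        if desired == current_replicas:
            return None           # nothing to change
        if now - self.last_scaled_at < self.cooldown_s:
            return None           # still cooling down; avoids flapping

        self.last_scaled_at = now
        return desired
```

Each pass of the control loop would call `decide(...)` with the latest lag reading and patch the Deployment only when the result is not `None`. Passing `now` explicitly keeps the logic easy to test without sleeping.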

graph LR
    Kafka[(Kafka Cluster)] --> Collector[Metric Collector]
    Collector --> Prom[Prometheus]
    Prom --> CustomHPA[Custom HPA Controller]
    CustomHPA --> K8sAPI[K8s API Server]
    K8sAPI --> Pods[App Pods]

Controller Architecture
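The Prometheus-to-controller hop in the diagram might look something like the sketch below, which builds an instant query against the Prometheus HTTP API (`/api/v1/query`) and sums the returned samples. The metric name `kafka_consumergroup_lag` and its `consumergroup` label are assumptions based on the commonly used kafka_exporter; the project's actual collector and metric names are not stated. The helper names are hypothetical as well.

```python
from urllib.parse import urlencode


def build_query_url(prom_base_url, consumer_group):
    """Build an instant-query URL for the total lag of one consumer group."""
    # Assumed metric and label names (kafka_exporter style).
    promql = f'sum(kafka_consumergroup_lag{{consumergroup="{consumer_group}"}})'
    return f"{prom_base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_total_lag(payload):
    """Extract total lag from a decoded Prometheus instant-query JSON response."""
    if payload.get("status") != "success":
        raise ValueError(f"Prometheus query failed: {payload.get('error', 'unknown error')}")

    # Each vector sample is {"metric": {...}, "value": [timestamp, "<number as string>"]}.
    samples = payload["data"]["result"]
    return int(sum(float(sample["value"][1]) for sample in samples))
```

The URL would be fetched with any HTTP client and the decoded JSON body passed to `parse_total_lag`. An empty result, for example when the consumer group has no samples yet, is treated as zero lag.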

Reflection

The power of custom operators

Building a native K8s operator simplified the operational model significantly compared to running external scripts. It taught us that extending Kubernetes is often a cleaner solution than working around its limitations.
