[kt cloud TECH UP] The Auto-Scaling Trap: Why HPA Won’t Save Your Ticketing Server
2/6/2026
Welcome to Part 3 of the kt cloud TECH UP mentoring retrospective series. Moving beyond backend code, today we discuss the operational challenges we faced regarding auto-scaling, specifically the 'Auto-Scaling Trap'.
1. The Junior’s "Perfect" Infrastructure Plan
While architecting the marathon ticketing platform, our infrastructure lead and backend team proudly finalized a "textbook-perfect" response strategy built entirely on Kubernetes (K8s).
The Master Plan:
- During quiet off-peak hours, we keep only the bare minimum number of Pods (containers) running to minimize cloud billing.
- At the precise moment ticketing opens (e.g., exactly 10:00 AM), massive traffic floods in. When CPU utilization breaches the 70% threshold...
- The HPA (Horizontal Pod Autoscaler) magically detects this and automatically commands K8s to fiercely scale out our Pod count from 2, to 4, to 10 (see the manifest sketch just after this list)! We will bask in the glorious benefits of Cloud-Native technology!
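On paper, this plan maps directly onto a standard HPA manifest. The sketch below is illustrative only: resource names such as ticketing-api are placeholders, not our actual configuration.

```yaml
# Hypothetical HPA for the "textbook" plan: stay tiny off-peak, let CPU-based
# scaling grow the Deployment from 2 Pods up to 10 when ticketing opens.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ticketing-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ticketing-api          # placeholder Deployment name
  minReplicas: 2                 # minimum fleet during quiet hours (cost saving)
  maxReplicas: 10                # ceiling for the flash-sale peak
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # scale out once average CPU crosses 70%
```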
It was a flawless theoretical design. However, the kt cloud Infrastructure Mentor shattered our dreams with a heavy dose of empirical reality.
2. The Mentor's Diagnosis: "Never Underestimate a Flash Crowd"
Upon hearing our confident scaling strategy, the mentor sighed lightly and revealed the gruesome reality of high-stakes live operations.
Mentor (kt cloud Infrastructure Team): "Major events like concert ticketing or Black Friday sales do not involve 'gradually increasing traffic'. It is a Flash Crowd phenomenon—traffic violently spikes vertically from 0 to 15,000 requests in literally one second.
Consider the lifecycle: K8s takes time to detect the CPU load spike. The HPA calculates the required replica count. K8s schedules new Pods. Docker images are pulled. Finally, the heavyweight Spring Boot application boots and reports 'Ready'. This entire cold-start sequence takes anywhere from tens of seconds to a few minutes.
During the tens of seconds it takes for reinforcements to arrive, your tiny fleet of original, minimal Pods takes the full brute force of 15,000 requests to the face. They will all crash almost instantly with OutOfMemory (OOM) errors. Auto-scaling will eventually spin up new servers... long after your system has already died."
This explanation made it clear that we had leaned too heavily on automation, overlooking the physical reality of cold-start latency and the warm-up time it demands.
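To make the cold-start sequence concrete: Kubernetes routes traffic to a new Pod only after its readiness probe passes, and a JVM-based Spring Boot service can take tens of seconds to reach that point. The snippet below is a hedged sketch, assuming Spring Boot Actuator's Kubernetes probes are enabled; the image name, port, and timings are placeholders. Note also that the HPA's 70% target is measured against the container's CPU request.

```yaml
# Illustrative Deployment fragment: a freshly scheduled Pod pulls the image,
# boots the JVM and Spring context, and only then starts passing this probe.
# Until that moment it receives no traffic and contributes nothing to capacity.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ticketing-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ticketing-api
  template:
    metadata:
      labels:
        app: ticketing-api
    spec:
      containers:
        - name: app
          image: registry.example.com/ticketing-api:latest  # placeholder image
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness  # Spring Boot Actuator readiness group
              port: 8080
            initialDelaySeconds: 20             # rough JVM + Spring startup window
            periodSeconds: 5
          resources:
            requests:
              cpu: "500m"   # the HPA's 70% utilization is relative to this request
```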
3. The Enterprise Standard: Pre-Provisioning (Infra Warm-up)
The mentor then taught us the true operational secret utilized by infrastructure teams handling nationwide events (e.g., massive holiday train reservations). The answer is 'Pre-Provisioning' (Warm-up).
- Before the Event: Never blindly trust auto-scaling. Hours before the event (or even the night before), engineers manually, or via scheduled scripts (see the sketch after this list), force the K8s cluster to scale out to the maximum expected peak capacity (Max Replicas).
- During the Event: An overwhelming legion of fully booted, pre-warmed servers calmly absorbs the chaotic flash crowd. During this phase, auto-scaling is relegated to a 'last-resort safety net', in case traffic somehow exceeds historical maximums.
- After the Event: Once the traffic tide finally recedes, the infra team executes a Scale-In, returning the cluster to minimal capacity to conserve budget.
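As one concrete (and purely illustrative) way to script the warm-up: raise the HPA's minReplicas to the expected peak before the event, so the cluster scales out in advance and cannot scale back in mid-event, then lower it again afterwards. The CronJob below sketches that idea; it assumes a ServiceAccount named hpa-scaler with RBAC permission to patch the HPA and uses the public bitnami/kubectl image. Names and schedule are placeholders.

```yaml
# Hypothetical scheduled warm-up: pin the HPA's floor to peak capacity
# a couple of hours before ticketing opens. A mirror job (or a manual
# patch) after the event sets minReplicas back down for the scale-in step.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: ticketing-pre-provision
spec:
  schedule: "0 8 * * *"            # e.g. 08:00 on event day, two hours before opening
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: hpa-scaler   # assumed to have patch rights on the HPA
          restartPolicy: OnFailure
          containers:
            - name: warm-up
              image: bitnami/kubectl:latest
              command:
                - kubectl
                - patch
                - hpa/ticketing-api-hpa
                - --type=merge
                - -p
                - '{"spec": {"minReplicas": 10}}'  # force the fleet to peak size now
```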
For spike-driven transactional services like ticketing, auto-scaling is not a bulletproof shield. It is merely a seatbelt.
4. Knowing Cloud vs. Operating Cloud
We were taught in textbooks that auto-scaling is "the magic that seamlessly grows with your user base." Yet, in live operations, we learned that the latency penalty hidden behind that automation can leave tens of thousands of users staring at a "502 Bad Gateway" error screen.
Mastering cloud-native technology does not mean blindly applying a K8s YAML configuration file. It means understanding the shape of your service's traffic (gradual incline vs. vertical flash crowd) and deliberately mitigating the latency limitations of the tools you deploy.
Our next technical dilemma revolves around a fierce debate between the Frontend and Backend regarding authentication architecture: "Between Next.js and Spring Boot, who truly bears the responsibility of Auth?"