Cache Problems & Real-World Solutions

Caching issues can turn a snappy application into a sluggish mess (or a total outage) in seconds. Here are some real-world-inspired scenarios and how high-scale companies handle them.


1. Thundering Herd (The “Midnight Expiry”)

This happens when you set a fixed TTL (Time To Live) for a large batch of data.

  • Case Study: A major E-commerce platform during a holiday sale.
    • The Scenario: They cached 100,000 product descriptions at exactly 12:00 AM with a 24-hour expiry. At 12:00 AM the next day, all 100,000 cache keys expired simultaneously.
    • The Result: The next wave of user requests found “Cache Misses” for everything. The database was suddenly hit with 50,000+ concurrent queries to “re-warm” the cache, causing a database CPU spike to 100% and crashing the site.
    • The Fix: Jitter. By adding a random offset to the TTL (e.g., 24 hours + rand(0, 300) seconds), the expirations are staggered over several minutes rather than hitting all at once.
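
Here is a minimal sketch of jittered TTLs, assuming a Redis-style client; the cache_with_jitter helper and the key names are illustrative, not the platform's actual code:

```python
import json
import random

BASE_TTL = 24 * 60 * 60   # 24-hour base expiry, in seconds
MAX_JITTER = 300          # spread expirations over an extra 0-5 minutes

def cache_with_jitter(cache, key, value):
    """Store a value with a randomized TTL so keys written together
    do not all expire at the same instant."""
    ttl = BASE_TTL + random.randint(0, MAX_JITTER)
    cache.set(key, json.dumps(value), ex=ttl)

# Usage (hypothetical): warming 100,000 product descriptions at midnight
# now produces expirations staggered across a 5-minute window.
# for product in products:
#     cache_with_jitter(redis_client, f"product:{product.id}", product.description)
```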

2. Cache Penetration (The “Ghost Key” Attack)

This occurs when requests are made for data that exists in neither the cache nor the database.

  • Case Study: A Social Media Startup under a malicious bot attack.
    • The Scenario: An attacker used a script to request profiles with non-existent IDs (e.g., example.com/user/-9999 or random UUIDs).
    • The Result: The application checked the cache (Miss) and then queried the database (Null). Since the result was null, nothing was cached. The attacker sent millions of these requests, bypassing the cache entirely and overwhelming the database disk I/O.
    • The Fix: Bloom Filters. They implemented a Bloom filter, a space-efficient probabilistic data structure populated with every valid ID, at the application level. If the Bloom filter says “No,” the app rejects the request immediately without even touching the database.
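
Here is a minimal sketch of that gate, using a small hand-rolled Bloom filter rather than the startup's actual implementation; the get_profile helper and the cache/db calls are illustrative assumptions:

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: may return false positives, never false negatives."""
    def __init__(self, size_bits=1_000_000, num_hashes=5):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

valid_ids = BloomFilter()
# Populate once from the source of truth (hypothetical loader):
# for user_id in db.all_user_ids():
#     valid_ids.add(user_id)

def get_profile(user_id, cache, db):
    if not valid_ids.might_contain(user_id):
        return None                                  # definitely not a real user: skip cache and DB
    return cache.get(user_id) or db.fetch_user(user_id)   # normal cache-aside path
```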

3. Cache Breakdown (The “Hot Key” Collapse)

Unlike Thundering Herd, this involves a single extremely popular key (a “Hot Key”).

  • Case Study: A News Outlet during a breaking world event.
    • The Scenario: A viral news article was being requested 10,000 times per second. The cache key for that article expired.
    • The Result: In the few milliseconds it took for the first “miss” to fetch the data from the DB and write it back to the cache, 5,000 other concurrent requests also saw a “miss” and rushed the database for the exact same row. This is often called “Cache Stampede.”
    • The Fix: Mutex Locks. The first request to see a “miss” acquires a lock. Other requests for that same key are told to wait or “sleep” for a few milliseconds until the first request updates the cache.
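
Here is a minimal sketch of the lock-and-wait pattern using Redis SET with NX and an expiry; the fetch_article helper, key names, and timings are illustrative assumptions:

```python
import json
import time

LOCK_TTL = 5   # seconds; the lock auto-expires so a crashed rebuilder cannot block forever

def get_article(redis, db, article_id):
    key = f"article:{article_id}"
    cached = redis.get(key)
    if cached is not None:
        return json.loads(cached)

    # Cache miss: only one request may rebuild; the rest briefly wait and re-check.
    lock_key = f"lock:{key}"
    if redis.set(lock_key, "1", nx=True, ex=LOCK_TTL):
        try:
            article = db.fetch_article(article_id)        # single DB hit for the hot key
            redis.set(key, json.dumps(article), ex=60)
            return article
        finally:
            redis.delete(lock_key)
    else:
        time.sleep(0.05)                                  # back off a few milliseconds
        return get_article(redis, db, article_id)         # retry; the cache is likely warm now
```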

4. Cache Crash (The “Total Blackout”)

This is the nightmare scenario where the entire caching layer (e.g., Redis cluster) goes offline.

  • Case Study: Facebook (2010 Outage).
    • The Scenario: An automated system attempted to fix a configuration error but instead triggered a feedback loop that took down their caching cluster.
    • The Result: With the cache down, the massive volume of traffic shifted directly to the databases. The databases, not scaled for that kind of load, buckled instantly.
    • The Fix: Circuit Breakers and Multi-Level Caching. Modern architectures use a “Circuit Breaker” pattern. If the cache is down, the system might serve “stale” data from a local backup, or simply return an error or a limited version of the site, to keep the databases from collapsing entirely.
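
Here is a minimal sketch of a circuit breaker wrapped around a cache client; the thresholds, method names, and fallback behavior are illustrative assumptions, not Facebook's actual design:

```python
import time

class CacheCircuitBreaker:
    """Stops calling a failing cache for a cool-down period, so lookups fail fast
    instead of piling up behind cache timeouts."""
    def __init__(self, cache, failure_threshold=5, reset_timeout=30):
        self.cache = cache
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None   # None means the circuit is closed (cache in use)

    def get(self, key):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_timeout:
                return None                   # circuit open: skip the cache entirely
            self.opened_at = None             # cool-down over: try the cache again
            self.failures = 0
        try:
            return self.cache.get(key)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.time()  # trip the breaker
            return None

# Callers treat None as a miss and fall back to a rate-limited DB read, a stale
# local copy, or a degraded page, rather than stampeding the database.
```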

Comparison Summary

Problem            Target               Primary Solution
Thundering Herd    Many keys            TTL Jitter
Penetration        Non-existent keys    Bloom Filter / Cache Nulls
Breakdown          One “Hot” key        Mutex Locks (SetNX)
Crash              Entire System        Circuit Breakers / Redundancy
