HTTP Caching and Cache Poisoning

Understanding HTTP Caching

HTTP caching is a performance optimization that stores copies of responses so that later requests can be answered without repeating work at the origin, which reduces server load, bandwidth usage, and latency. Implemented correctly, caching significantly improves web application performance; implemented incorrectly, it can introduce security vulnerabilities.

How HTTP Caching Works

[Figure: HTTP caching flow diagram]

Types of HTTP Caches

  • Browser Caches: Store resources locally on the user's device.
  • Proxy Caches: Intermediate caches that serve multiple users.
  • Gateway Caches: Server-side caches like reverse proxies and CDNs.
  • Application Caches: Custom caching implemented within web applications.

HTTP Caching Headers

Header        | Description                                                | Example
------------- | ---------------------------------------------------------- | ------------------------------------------------
Cache-Control | Directives that control caching in requests and responses  | Cache-Control: max-age=3600, public
Expires       | Date and time after which the response is considered stale | Expires: Sat, 21 Oct 2023 07:28:00 GMT
ETag          | Opaque identifier for a specific version of a resource     | ETag: "33a64df551425fcc55e4d42a148795d9f25f89d4"
Last-Modified | Date and time the resource was last modified               | Last-Modified: Sat, 21 Oct 2023 07:28:00 GMT
Vary          | Request headers that become part of the cache key          | Vary: Accept-Encoding, User-Agent
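
To make the interaction between Cache-Control and Expires concrete, here is a minimal Python sketch of how a cache might decide how long a response stays fresh. The function names `parse_cache_control` and `freshness_lifetime` are hypothetical, and the logic is deliberately simplified: it treats `no-store` and `no-cache` alike and ignores directives such as `s-maxage` that a real cache would also honor.

```python
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_cache_control(value: str) -> dict[str, Optional[str]]:
    """Split a Cache-Control value into a {directive: argument} mapping."""
    directives: dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if arg else None
    return directives


def freshness_lifetime(headers: dict[str, str]) -> int:
    """Seconds a stored response may be reused without revalidation (simplified)."""
    cc = parse_cache_control(headers.get("Cache-Control", ""))
    # Simplification: both directives mean "never reuse without asking the origin".
    if "no-store" in cc or "no-cache" in cc:
        return 0
    max_age = cc.get("max-age")
    if max_age is not None:
        # max-age takes precedence over Expires when both are present.
        return max(0, int(max_age))
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return max(0, int((expires - date).total_seconds()))
    return 0
```

With the example header from the table, `freshness_lifetime({"Cache-Control": "max-age=3600, public"})` yields one hour of freshness.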

Cache Keys and Cache Operation

Web caches identify cached responses using a cache key, typically composed of:

  • The request URL (usually the primary component)
  • The HTTP method (GET, POST, etc.)
  • Headers specified in the Vary header

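When a request matches a cache key and the stored response is still fresh, the cache returns that response without contacting the origin server. Because every request that maps to the same key receives the same stored response, a single harmful response that enters the cache is served to many users. This is what makes cache poisoning attacks possible.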
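
The key composition listed above can be sketched in Python as follows. This is an illustration under assumptions rather than how any particular cache does it: the function name `cache_key` and the tuple layout are hypothetical, and real caches differ in how they normalize URLs and header values.

```python
from urllib.parse import urlsplit


def cache_key(method: str, url: str, request_headers: dict[str, str],
              vary: str = "") -> tuple:
    """Build a cache key from the method, the URL, and the headers named in Vary."""
    parts = urlsplit(url)
    # Header names are case-insensitive, so normalize them before lookup.
    lowered = {name.lower(): value for name, value in request_headers.items()}
    vary_names = sorted(h.strip().lower() for h in vary.split(",") if h.strip())
    # Only headers listed in Vary contribute; every other header is unkeyed.
    varied = tuple((name, lowered.get(name, "")) for name in vary_names)
    return (method.upper(), parts.scheme, parts.netloc.lower(),
            parts.path, parts.query, varied)
```

Two requests for the same URL that differ only in a header not listed in Vary produce the same key under this sketch, so the cache would treat them as interchangeable.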

Cache Key Visualization

Typical Cache Key Components

  • URL Path
  • Query Parameters
  • HTTP Method
  • Headers in Vary

Often Excluded Components

  • Custom HTTP Headers
  • Cookies
  • Request Body
  • Client IP Address

Cache Poisoning: The Basics

Cache poisoning occurs when an attacker manipulates a web cache into storing malicious content and serving it to other users. The fundamental issue is that the cache key omits some inputs that influence the response (unkeyed inputs), so two requests that the cache treats as identical can actually produce different responses.
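
The mismatch can be reproduced with a toy simulation. Everything in this Python sketch is hypothetical: the `origin` function stands in for an application that reflects the `X-Forwarded-Host` request header into its output, `NaiveCache` is a deliberately simplistic cache keyed on the path alone, and `example.com` and `attacker.example` are placeholder hosts.

```python
from typing import Callable


def origin(path: str, headers: dict[str, str]) -> str:
    """Hypothetical origin that reflects X-Forwarded-Host into a script URL."""
    host = headers.get("X-Forwarded-Host", "example.com")
    return f'<script src="https://{host}/static/app.js"></script>'


class NaiveCache:
    """Toy cache keyed on the path only; every request header is unkeyed."""

    def __init__(self, backend: Callable[[str, dict[str, str]], str]) -> None:
        self.backend = backend
        self.store: dict[str, str] = {}

    def get(self, path: str, headers: dict[str, str]) -> str:
        if path not in self.store:
            # Cache miss: the origin sees the headers and its output is stored.
            self.store[path] = self.backend(path, headers)
        # Cache hit: the stored response is returned and headers are ignored.
        return self.store[path]


cache = NaiveCache(origin)
# The attacker's request arrives first and populates the cache entry.
poisoned = cache.get("/home", {"X-Forwarded-Host": "attacker.example"})
# A later visitor sends no special header but receives the stored response.
victim = cache.get("/home", {})
```

In this sketch the second request carries no attacker-controlled header, yet it receives the response generated for the attacker. Adding the header to the cache key, or stripping it before the request reaches the origin, would remove the mismatch.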

Key Takeaways

Performance Benefits

  • Reduced server load and bandwidth usage
  • Improved page load times and user experience
  • Decreased latency for repeat visitors
  • Better scalability for high-traffic websites

Security Risks

  • Cache poisoning via unkeyed inputs
  • Information disclosure through cache deception
  • Cross-site scripting (XSS) via cached responses
  • Privacy violations from improper caching

Quick Reference

No Caching

Cache-Control: no-store

Public Caching

Cache-Control: public, max-age=3600

Private Caching

Cache-Control: private, max-age=3600

Validation Required

Cache-Control: no-cache
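
As an illustration of the "Validation Required" case, the following Python sketch shows how an origin might answer a revalidation request using an ETag. The helpers `make_etag` and `respond` are hypothetical, deriving the ETag from a SHA-256 hash of the body is only one possible choice, and the comparison is simplified: a real server would also handle lists of ETags, weak validators, and `*` in If-None-Match.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the body (assumption: a content hash is used)."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def respond(body: bytes, request_headers: dict[str, str]) -> tuple[int, dict[str, str], bytes]:
    """Return (status, headers, body), answering 304 when the client's copy is current."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "no-cache"}
    # Simplification: exact string match against a single If-None-Match value.
    if request_headers.get("If-None-Match") == etag:
        return 304, headers, b""   # stored copy is still valid; send no body
    return 200, headers, body      # first request or changed content
```

A cache holding a `no-cache` response would send the stored ETag back in If-None-Match on each reuse and serve its copy only after receiving the 304.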