This document summarizes a presentation about caching at different levels of a web application stack. It discusses caching client-side assets with HTTP headers, full-page caching with tools like Varnish, fragment caching of dynamic content, object caching with memcached, and database query caching. It also covers challenges like cache invalidation, fragmentation, and complexity that caching can introduce. The presentation emphasizes measuring performance to determine if caching is worthwhile.
1. Caching Up and Down the Stack
Long Island/Queens Django Meetup 5/20/14
2. Hi, I’m Dan Kuebrich
● Software engineer, python fan
● Web performance geek
● Founder of Tracelytics, now part of AppNeta
● Once (and future?) Queens resident
5. What is “caching”?
● Caching is avoiding doing expensive work
o by doing cheaper work
● Common examples?
o On repeat visits, your browser doesn’t download
images that haven’t changed
o Your CPU caches instructions and data so it doesn’t
have to go to RAM… or to disk!
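A minimal Python sketch of the same idea, assuming a hypothetical `slow_square` function that stands in for any expensive piece of work. `functools.lru_cache` is the standard-library memoizer; the sleep is only there to simulate cost.

```python
import functools
import time

call_count = 0  # counts how often the expensive body actually runs


@functools.lru_cache(maxsize=128)
def slow_square(n):
    """Hypothetical stand-in for expensive work (a query, a render, ...)."""
    global call_count
    call_count += 1
    time.sleep(0.01)  # simulate the expensive part
    return n * n


first = slow_square(12)   # miss: does the expensive work
second = slow_square(12)  # hit: answered from the in-process cache
```

The second call does the cheaper work (a dictionary lookup) instead of the expensive work.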
9. “Latency Numbers Every Programmer Should Know”
Systems Performance: Enterprise and the Cloud by Brendan Gregg
http://books.google.com/books?id=xQdvAQAAQBAJ&pg=PA20&lpg=PA20&source=bl&ots=hlTgyxdrnR&sig=CCjddHrY1H6muMVW9BFcbdO7DDo&hl=en&sa=X&ei=dS7oUquhOYr9oAT9oYGoDw&ved=0CCkQ6AEwAA#v=onepage&q&f=false
10. A whole mess of caching:
● Browser cache
● CDN
● Proxy / optimizer
● Application-based
o Full-page
o Fragment
o Object cache
● Database
o Query cache
o Denormalization
(The list runs from closer to the user at the top to closer to the data at the bottom.)
14. Client-side assets
● Use HTTP caches!
o Browser
o CDN
o Intermediate proxies
● Set policy with cache headers
o Cache-Control / Expires
o ETag / Last-Modified
15. HTTP Cache-Control and Expires
● Stop the browser from even asking for it
● Expires
o Pick a date in the future, good until then
● Cache-Control
o More flexible
o Introduced in HTTP 1.1
o Use this one
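A rough sketch of how the two headers relate, using only the Python standard library. The helper name `caching_headers` and the `public` directive are assumptions made for illustration; a real application would normally set these through its web framework or server configuration.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def caching_headers(max_age_seconds, now=None):
    """Return Cache-Control and Expires headers for a cacheable response."""
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(seconds=max_age_seconds)
    return {
        # HTTP/1.1 caches use max-age, a lifetime relative to the response time
        "Cache-Control": "public, max-age=%d" % max_age_seconds,
        # Expires carries the same policy as an absolute date
        "Expires": format_datetime(expires, usegmt=True),
    }


# Fixed timestamp so the result is reproducible
headers = caching_headers(
    3600, now=datetime(2014, 5, 20, 23, 12, 16, tzinfo=timezone.utc)
)
```

When both headers are present, HTTP/1.1 caches prefer `max-age` over `Expires`.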
16. HTTP Cache-Control and Expires
dan@JLTM21:~$ curl -I https://login.tv.appneta.com/cache/tl-layouts_base_unauth-compiled-162c2ceecd9a7ff1e65ab460c2b99852a49f5a43.css
HTTP/1.1 200 OK
Accept-Ranges: bytes
Cache-Control: max-age=315360000
Content-length: 5955
Content-Type: text/css
Date: Tue, 20 May 2014 23:12:16 GMT
Expires: Thu, 31 Dec 2037 23:55:55 GMT
Last-Modified: Fri, 16 May 2014 20:51:19 GMT
Server: nginx
Connection: keep-alive
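As a quick worked check of the `max-age` value in the response above, this Python snippet converts it to days and years. The year figure ignores leap days.

```python
from datetime import timedelta

max_age = 315360000  # seconds, from the Cache-Control header above
lifetime = timedelta(seconds=max_age)

days = lifetime.days  # whole days covered by max-age
years = days / 365    # approximate years, ignoring leap days
```

That works out to roughly ten years. A lifetime that long is only practical when the URL changes whenever the content does; the long hexadecimal string in the filename above appears to serve that purpose, though the slide does not say so explicitly.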
17. HTTP Cache Control in Django
https://docs.djangoproject.com/en/dev/topics/cache/
21. ETag vs Last-Modified
● Last-Modified is date-based
● ETag is content-based
● Most webservers generate both
● Some webservers (Apache) generate ETags
that depend on local state, e.g. the file’s inode in older default configurations
o If you have a load-balanced pool of servers working
here, they might not be using the same ETags!
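A minimal sketch of a content-based ETag and the conditional request it enables. The function names `make_etag` and `respond` are hypothetical, and SHA-1 is just one reasonable choice of hash; the point is that hashing the body gives the same tag on every server in a pool.

```python
import hashlib


def make_etag(body):
    """Content-based validator: same bytes give the same ETag on any server."""
    return '"%s"' % hashlib.sha1(body).hexdigest()


def respond(body, if_none_match=None):
    """Return (status, headers, payload) for a hypothetical conditional GET."""
    etag = make_etag(body)
    if if_none_match == etag:
        # The client's copy is still current, so no body is sent
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body


css = b"body { color: #333; }"
status1, headers1, _ = respond(css)
# Revalidation with unchanged content
status2, _, payload2 = respond(css, if_none_match=headers1["ETag"])
# Revalidation after the content changed
status3, _, _ = respond(css + b" ", if_none_match=headers1["ETag"])
```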
23. CDNs
● Put content closer to your end-users
o and offload HTTP requests from
your servers
● Best for static assets
● Same cache control policies apply
35. Object caching
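# Assumes mc is a memcached client and session/User come from SQLAlchemy,
# all set up elsewhere.
def get_item_by_id(key):
    # Check in cache
    val = mc.get(key)
    # If it exists, return it ("is not None" so falsy cached values count as hits)
    if val is not None:
        return val
    # If not, get the val from the database and store it in the cache
    val = session.query(User).filter_by(id=key).first()
    mc.set(key, val)
    return val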
36. Object caching
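# @decorator is assumed to come from the third-party "decorator" package,
# which passes the wrapped function as the first argument.
from decorator import decorator

@decorator
def cache(expensive_func, key):
    # Check in cache
    val = mc.get(key)
    # If it exists, return it
    if val is not None:
        return val
    # If not, compute the val and store it in the cache
    val = expensive_func(key)
    mc.set(key, val)
    return val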
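For trying the pattern without a memcached server, here is a self-contained variant. `FakeCache` is a hypothetical in-memory stand-in exposing only `get` and `set`, and `cached` and `load_user` are illustrative names, not part of the talk.

```python
import functools


class FakeCache:
    """Hypothetical in-memory stand-in for a memcached client (get/set only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)  # None on a miss

    def set(self, key, val):
        self._data[key] = val


mc = FakeCache()


def cached(prefix):
    """Cache-aside decorator: look in the cache first, compute on a miss."""
    def wrap(expensive_func):
        @functools.wraps(expensive_func)
        def inner(key):
            cache_key = "%s:%s" % (prefix, key)  # namespace keys per function
            val = mc.get(cache_key)
            if val is not None:
                return val
            val = expensive_func(key)
            mc.set(cache_key, val)
            return val
        return inner
    return wrap


calls = []  # records each real (uncached) lookup


@cached("user")
def load_user(user_id):
    calls.append(user_id)
    return {"id": user_id, "name": "user-%d" % user_id}


a = load_user(7)  # miss: runs load_user
b = load_user(7)  # hit: served from FakeCache
```

The key prefix keeps two cached functions from colliding when they are called with the same argument.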
44. Denormalization
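Normalized (the join runs on every read):
mysql> select table1.x, table2.y from table1 join table2 on table1.z = table2.q where table1.z > 100;
Denormalized (y is copied onto table1, so no join is needed):
mysql> select table1.x, table1.y from table1 where table1.z > 100;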
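The two queries above can be tried end to end with Python's built-in `sqlite3` module. This is a sketch under assumptions: the column types and sample rows are invented for illustration, and SQLite stands in for MySQL.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE table1 (x INTEGER, z INTEGER);
    CREATE TABLE table2 (y TEXT, q INTEGER);
    INSERT INTO table1 VALUES (1, 50), (2, 150), (3, 200);
    INSERT INTO table2 VALUES ('a', 50), ('b', 150), ('c', 200);
""")

# Normalized read: the join is paid on every query
joined = db.execute(
    "SELECT table1.x, table2.y FROM table1 "
    "JOIN table2 ON table1.z = table2.q "
    "WHERE table1.z > 100 ORDER BY table1.x"
).fetchall()

# Denormalize: copy y onto table1 once, at write time
db.execute("ALTER TABLE table1 ADD COLUMN y TEXT")
db.execute(
    "UPDATE table1 SET y = "
    "(SELECT table2.y FROM table2 WHERE table2.q = table1.z)"
)

# Denormalized read: same rows, no join
flat = db.execute(
    "SELECT x, y FROM table1 WHERE z > 100 ORDER BY x"
).fetchall()
```

The cost moves to writes: every change to `table2.y` now also has to update the copy on `table1`, which is an invalidation problem in its own right.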
46. Caching: what can go wrong?
● Invalidation
● Fragmentation
● Stampedes
● Complexity
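Of the four, a stampede is perhaps the easiest to show in a few lines: many requests miss the same key at once and all recompute it. One common mitigation, sketched here with hypothetical names and a plain dict in place of a real cache, is to let a single caller recompute under a lock while the others wait and then re-check.

```python
import threading
import time

cache = {}               # stand-in for a shared cache
lock = threading.Lock()  # one global lock; per-key locks would be finer-grained
compute_calls = 0


def expensive():
    """Hypothetical slow render; only ever called while holding the lock."""
    global compute_calls
    compute_calls += 1
    time.sleep(0.05)
    return "rendered page"


def get_page(key):
    val = cache.get(key)
    if val is not None:
        return val
    with lock:
        # Re-check: another thread may have filled the cache while we waited
        val = cache.get(key)
        if val is None:
            val = expensive()
            cache[key] = val
    return val


threads = [threading.Thread(target=get_page, args=("home",)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Ten concurrent requests for a cold key trigger one computation instead of ten.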
48. Invalidation on page-scale
● Browser cache
● CDN
● Proxy / optimizer
● Application-based
o Full-page
o Fragment
o Object cache
● Database
o Query cache
o Denormalization
More savings, generally more invalidation...
Smaller savings, generally less invalidation
49. Fragmentation
● What if I have a lot of different things to
cache?
o More misses
o Potential cache eviction
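A small simulation of that effect, assuming a least-recently-used eviction policy and a deliberately worst-case access pattern (cycling through every key in order). The class and function names are illustrative only.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny LRU cache that tracks hits, misses, and evictions."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()
        self.hits = self.misses = self.evictions = 0

    def get_or_load(self, key):
        if key in self.data:
            self.hits += 1
            self.data.move_to_end(key)  # mark as most recently used
            return self.data[key]
        self.misses += 1
        val = "value for %s" % key      # pretend this was expensive
        self.data[key] = val
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used
            self.evictions += 1
        return val


def hit_rate(distinct_keys, capacity=100, rounds=5):
    """Cycle through the keys several times and report the hit rate."""
    c = LRUCache(capacity)
    for _ in range(rounds):
        for k in range(distinct_keys):
            c.get_or_load(k)
    return c.hits / (c.hits + c.misses)


small = hit_rate(50)   # working set fits in the cache
large = hit_rate(200)  # working set is twice the cache size
```

With 50 keys only the first pass misses. With 200 keys cycling through a 100-entry LRU cache, every entry is evicted before it is requested again, so nothing ever hits.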
54. Complexity
● How much caching do I need, and where?
● What is the invalidation process
o on data update? on release?
● What happens if the caches fall over?
● How do I debug it?
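One possible answer to the invalidation question, sketched with hypothetical names and a dict standing in for the cache: delete entries explicitly on data update, and prefix keys with a release identifier so a deploy starts from an empty namespace.

```python
cache = {}  # hypothetical stand-in for a shared cache


def make_key(release, name):
    """Prefix every key with the release so a deploy gets a cold namespace."""
    return "%s:%s" % (release, name)


def get_or_compute(release, name, compute):
    key = make_key(release, name)
    if key not in cache:
        cache[key] = compute()
    return cache[key]


def invalidate(release, name):
    """On data update: drop the entry so the next read recomputes it."""
    cache.pop(make_key(release, name), None)


v1 = get_or_compute("r41", "sidebar", lambda: "old markup")
v1_again = get_or_compute("r41", "sidebar", lambda: "never called")  # hit
v2 = get_or_compute("r42", "sidebar", lambda: "new markup")  # new release
invalidate("r42", "sidebar")                                 # data changed
v2_fresh = get_or_compute("r42", "sidebar", lambda: "updated markup")
```

The release prefix trades memory for simplicity: old entries are never read again and are left for the cache to evict.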
55. Takeaways
● The ‘how’ of caching:
o What are you caching?
o Where are you caching it?
o How bad is a cache miss?
o How and when are you invalidating?
56. Takeaways
● The ‘why’ of caching:
o Did it actually get faster?
o Is speed worth extra complexity?
o Don’t guess – measure!
o Always use real-world conditions.
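The shape of such a measurement, as a toy Python sketch. Everything here is synthetic (the sleep stands in for the expensive path, and the names are hypothetical), so it shows only how to compare the two paths, not what the numbers will be in production.

```python
import time


def measure(func, repeats=5):
    """Return the best wall-clock time of several runs, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


def uncached():
    time.sleep(0.02)  # hypothetical expensive path
    return "result"


_store = {}


def cached():
    if "k" not in _store:
        _store["k"] = uncached()
    return _store["k"]


cached()  # warm the cache so the timing reflects hits, not the first miss
t_uncached = measure(uncached)
t_cached = measure(cached)
speedup = t_uncached / max(t_cached, 1e-9)  # guard against a zero reading
```

A microbenchmark like this overstates the benefit, since it has a 100% hit rate and no network hop to the cache, which is why the slide asks for real-world conditions.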
58. Thanks!
● Interested in measuring your Django app’s
performance?
o Free trial of TraceView:
www.appneta.com/products/traceview
● See you at Velocity NYC this fall?
● Twitter: @appneta / @dankosaur
59. Resources
● Django documentation on caching: https://docs.djangoproject.com/en/dev/topics/cache/
● Varnish caching, via Disqus: http://blog.disqus.com/post/62187806135/scaling-django-to-8-billion-page-views
● Django cache option comparisons: http://codysoyland.com/2010/jan/17/evaluating-django-caching-options/
● More Django-specific tips: http://www.slideshare.net/csky/where-django-caching-bust-at-the-seams
● Guide to cache-related HTTP headers: http://www.mobify.com/blog/beginners-guide-to-http-cache-headers/
● Google PageSpeed: https://developers.google.com/speed/pagespeed/module