Mark Korthaus from SysEleven talks about the challenges of running, tuning and scaling a Magnolia platform. He explores caching and high-availability best practices and provides deeper insight into the infrastructure side of things.
3. About SysEleven
Born out of macnews.de (1997 - 2010†)
Founded 2007 in Berlin, 2 datacenters, 50 people (40+ engineers), ISO 27001,
Akamai Partner, SuperCache Developer, specialized in Application Management,
fully virtualized datacenter, a lot of big iron
„If there are two or more ways to do something, and one of those ways can result in a
catastrophe, then someone will do it.“
4. Why yet another Hosting Company?
• Way too expensive offers from enterprise hosters
• Nobody offered application expertise for scaling applications
• Strict budget, but application knowledge
• … pressure makes diamonds
5. But some definitions first…
Scalability
… is horizontal: it means capacity, i.e. serving more users at the same time, and is
reached via clustering (session sharing, shared storage, load balancing etc.).
Performance
… is vertical: it is about zippiness and means optimizing for time to first byte and
general load time. It needs bigger iron, caching (much cheaper, just do it) and
application optimization (a must).
9. Typical eCommerce Server Setup
[Diagram, components as labeled:]
• Loadbalancing: F5 Loadbalancer
• Cache Frontend / Caching API: SuperCache (NginX & Couchbase), Cache API
• Application Server: Application Cluster with Shared Storage
• DBM: MySQL Master (DB writes)
• DBS: MySQL Slaves (DB reads), fed by MySQL replication
• Search: FACT Finder Cluster
• Recommendation Engine: Prudsys Server
• Log Processing: Backend & Syslog
10. How to improve Performance
• „Normally“ bigger iron will help, but it doesn't really help with bad application code
• SSDs (SAS vs. PCI cards), higher CPU frequency per Thread, more memory
• General approach: Optimize for „time to first byte“
• Database Tuning (slow query, read from slave, write to master)
• Profile your application: NewRelic, JProfiler, hprof, AppDynamics, DynaTrace
• Caching, Caching and Caching
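The "time to first byte" target above can be measured with a few lines of Python. This is only a minimal sketch: the function name `time_to_first_byte` is made up for illustration, and it times from sending the request to receiving the first body byte, leaving out connection setup (other definitions of TTFB include DNS lookup and the TCP/TLS handshake).

```python
import http.client
import time
from urllib.parse import urlsplit


def time_to_first_byte(url: str, timeout: float = 10.0) -> float:
    """Return seconds between sending a GET request and receiving the first body byte."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    try:
        conn.connect()  # connect first so that setup time is not part of the measurement
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        start = time.perf_counter()
        conn.request("GET", path)
        response = conn.getresponse()  # returns once status line and headers are read
        response.read(1)               # wait for the first byte of the body
        return time.perf_counter() - start
    finally:
        conn.close()
```

Calling it several times against the same URL, once through the cache and once directly against the application server, gives a rough before/after comparison.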
11. Scalability & Clustering
• JackRabbit on different nodes for author and app server (you already knew that)
• Big Iron is good, too
(Hardware Loadbalancers, Web Application Firewalls, DDoS-Protection incl.)
• High Performance Shared Storage
(go with NetApp or Isilon - don't even think of working with Linux NFS servers!)
• Database clustering
Master/Master & Master/Slave, maybe even sharding?
• Elasticity, automatic deployment (Puppet, Chef, Go), DevOps in general
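The master/slave setup from the database bullet (write to master, read from slave) can be sketched as a small router. Everything here is hypothetical: the class name `ReadWriteRouter`, the node names, and the simple rule that only statements starting with `SELECT` count as reads.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the master and spread reads over the slaves (round robin)."""

    def __init__(self, master, slaves):
        self.master = master
        # cycle() yields the slaves in turn forever; None means "no slaves configured"
        self._slaves = itertools.cycle(slaves) if slaves else None

    def target_for(self, sql: str) -> str:
        words = sql.split()
        verb = words[0].upper() if words else ""
        # Only plain SELECTs go to a slave; everything else is treated as a write.
        if verb == "SELECT" and self._slaves is not None:
            return next(self._slaves)
        return self.master


router = ReadWriteRouter("dbm", ["dbs1", "dbs2"])
print(router.target_for("UPDATE cart SET qty = 2"))  # master
print(router.target_for("SELECT * FROM products"))   # one of the slaves
```

A real setup also has to deal with replication lag: a read that must see a write made a moment ago should still go to the master.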
12. Caching: Ehcache, Varnish & SuperCache
• Good ol’ Ehcache is active by default (so far, so good)
• Passive Caching (Varnish)
serves full pages (or snippets via ESI), no advanced flushing out of the box,
needs a lifetime per URL, no memory backend shared across instances, no warm
restart, slow start is painful
• Active Caching (SuperCache)
separates memory backend, infinite Cache Lifetime, active flushing via mapping
service, API based, full flexibility
• Best of all: a client for Magnolia is in development
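The idea behind active caching (no expiry, items flushed on purpose through a mapping) can be illustrated with an in-memory toy. This is not the SuperCache API: `TaggedCache`, its method names, and the tag strings are invented for the sketch.

```python
class TaggedCache:
    """Toy cache with infinite lifetime; items leave only when flushed by key or tag."""

    def __init__(self):
        self._items = {}        # cache key -> cached body
        self._keys_by_tag = {}  # tag -> set of cache keys carrying that tag
        self._tags_by_key = {}  # cache key -> set of tags

    def put(self, key, body, tags=()):
        self.flush_key(key)  # drop stale tag mappings of a previous version
        tag_set = set(tags)
        self._items[key] = body
        self._tags_by_key[key] = tag_set
        for tag in tag_set:
            self._keys_by_tag.setdefault(tag, set()).add(key)

    def get(self, key):
        return self._items.get(key)  # None means cache miss

    def flush_key(self, key):
        self._items.pop(key, None)
        for tag in self._tags_by_key.pop(key, set()):
            keys = self._keys_by_tag.get(tag)
            if keys is not None:
                keys.discard(key)

    def flush_tag(self, tag):
        """Flush every item carrying the tag; return how many were removed."""
        keys = list(self._keys_by_tag.pop(tag, set()))
        for key in keys:
            self.flush_key(key)
        return len(keys)


cache = TaggedCache()
cache.put("/products/1", "<html>product 1</html>", tags=["product:1", "category:shoes"])
cache.put("/category/shoes", "<html>shoes</html>", tags=["category:shoes"])
cache.flush_tag("category:shoes")  # a content update flushes both pages at once
```

With a passive cache, the same update would stay invisible until each URL's lifetime ran out.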
13. SuperCache in detail
[Diagram, components as labeled:]
• http-Request arrives at the Hardware Loadbalancer
• Proxy incl. API (NginX + Lua)
• Cache Memory Backend (Couchbase Server)
• Magnolia Application Servers
• Magnolia Authoring Server
• Arrow labels: Hit, Miss, Tag
14. SuperCache - What's in it for you
• RESTful API (not tied to a single application), mapping & tags per cache item
• infinite Cache Lifetime, no cold cache, even after startup
• memory & disk backend, multi level scalability
• DDoS proof, very little CPU resources are used (2 Threads)
• ESI as a way to combine content of different applications
• HTTP Layer 7 Manipulation
15. Cache hit rates
Measure your hit rate!
• Customer A: HIT 50 %, MISS 50 %
• Customer B: HIT 13 %, MISS 87 %
• Customer C: HIT 71 %, MISS 29 %
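Measuring the hit rate comes down to counting hits and misses. A minimal sketch, assuming each request has been logged with a cache status of "HIT" or "MISS" (the status strings and function names are assumptions, not taken from a specific product):

```python
from collections import Counter


def hit_rate(hits: int, misses: int) -> float:
    """Share of requests answered from the cache, between 0.0 and 1.0."""
    total = hits + misses
    return hits / total if total else 0.0


def hit_rate_from_statuses(statuses) -> float:
    """Compute the hit rate from an iterable of per-request cache statuses."""
    counts = Counter(status.strip().upper() for status in statuses)
    return hit_rate(counts["HIT"], counts["MISS"])


print(f"{hit_rate(71, 29):.0%}")
print(f"{hit_rate_from_statuses(['HIT', 'MISS', 'hit', 'HIT']):.0%}")
```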
16. How to ruin your cache hit rate
• Put sessions in the URL
• Do a lot of A/B testing (and implement it badly)
• Make a lot of content updates (≥ thousands of updates per day)
• Have no pages with content, just a search-only site
(…yes, it's a trend: „Don't use a CMS, just do a search!“)
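Session IDs in the URL hurt because every visitor then requests a different URL for the same page. One possible countermeasure is to normalize the URL before using it as a cache key. The sketch below is an assumption-laden illustration: the parameter names in `SESSION_PARAMS` and the function name `cache_key` are made up, and it only applies to pages whose content does not depend on the session.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed names of session parameters; adjust to what the application really uses.
SESSION_PARAMS = {"jsessionid", "sid", "phpsessid"}
# Java servlet containers may append ";jsessionid=..." to the path.
_PATH_SESSION = re.compile(r";jsessionid=[^/?#;]*", re.IGNORECASE)


def cache_key(url: str) -> str:
    """Strip session identifiers and sort query parameters to get a stable cache key."""
    parts = urlsplit(url)
    path = _PATH_SESSION.sub("", parts.path)
    query = sorted(
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in SESSION_PARAMS
    )
    # The fragment is dropped; it is never sent to the server anyway.
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, urlencode(query), ""))


print(cache_key("http://shop.example/cart;jsessionid=ABC123?b=2&a=1&sid=xyz"))
```

Two visitors with different session IDs now map to the same key, so the second request can be a hit.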
17. Time to first byte
[Bar chart: time to first byte for Application, some other cache :-) and SuperCache; axis from 0.00 to 2.00 sec.]
18. If you're not failing 90% of the time,
then you're probably not working on
sufficiently challenging problems.