Rackspace monitors tens of thousands of servers using several open source tools, including Apache Cassandra, ZooKeeper, and Scribe. They also developed their own tools, such as Virgo and Dreadnot, to deploy agents and configure clusters across datacenters. Regular testing, deployment automation with Dreadnot and Chef, and documentation help support the large-scale monitoring system.
4. • Many tens of thousands of servers
• Several solutions in place
– Merged segments
– Wouldn’t scale to all
– Tedious (Manpower)
– Expensive
• Wanted more
19. A Tale of Two Clusters
Used Differently
Control Cluster
Data Cluster
20. Data Model
Relational Mismatch
Our Approach
One row per account
Prefixed Ids for col names
Concatenate for parent/child
21. Data Model
Foo objects get ids like ‘foXXX’
Bar objects get ids like ‘baXXX’
Assume 1:* relationship from foo:bar
Bar 456 that is attached to Foo 123 gets
column name ‘fo123:ba456’
Select ‘fo123:ba’..‘fo123:ba\xef\xbf\xbf’
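The naming scheme on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (column_name, slice_bounds, slice_columns) are hypothetical, and a sorted in-memory list of byte strings stands in for the sorted columns of one account's row; it is not the Cassandra client code the team used. The bytes \xef\xbf\xbf are the UTF-8 encoding of U+FFFF, which sorts after any ASCII id character, so they work as an upper bound for the slice.

```python
from bisect import bisect_left, bisect_right

# Hypothetical helpers; the slides describe only the naming scheme itself.
SLICE_END = b"\xef\xbf\xbf"  # UTF-8 for U+FFFF, sorts after any ASCII id


def column_name(parent_id, child_id):
    """Concatenate parent and child ids into one column name."""
    return "{}:{}".format(parent_id, child_id).encode("utf-8")


def slice_bounds(parent_id, child_prefix):
    """Return (start, end) column names covering all children of one type."""
    start = "{}:{}".format(parent_id, child_prefix).encode("utf-8")
    return start, start + SLICE_END


def slice_columns(sorted_names, start, end):
    """Inclusive range scan over sorted column names, like a column slice."""
    lo = bisect_left(sorted_names, start)
    hi = bisect_right(sorted_names, end)
    return sorted_names[lo:hi]


# One row per account: the row's column names, kept in sorted order.
row = sorted([
    b"fo123",
    b"fo124",
    column_name("fo123", "ba456"),
    column_name("fo123", "ba789"),
    column_name("fo124", "ba001"),
])

# All Bar objects attached to Foo 123.
start, end = slice_bounds("fo123", "ba")
bars_of_fo123 = slice_columns(row, start, end)
```

Because the columns are sorted by name, every child of fo123 sits in one contiguous range, so the one-to-many lookup is a single range read rather than a join.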
23. Open Source Contribution
Dreadnot
Inspired by Deployinator
Multi-region deployment tool
https://github.com/racker/dreadnot
Blog Post - http://bit.ly/xks0qT
29. Image Credits
Scale http://www.flickr.com/photos/puuikibeach/4765115333
Pyramids http://www.flickr.com/photos/gracewong/93631410
Hercules http://www.flickr.com/photos/istolethetv/2203377554/
Dandelion http://www.flickr.com/photos/8047705@N02/5572197407/
Ear http://www.flickr.com/photos/perpetualplum/3974880498
Hourglass http://www.flickr.com/photos/22244945@N00/3278869535
Eyeball http://www.flickr.com/photos/miran/6567911705
Few Gears http://www.flickr.com/photos/arthurjohnpicton/5364226117
Many Gears http://www.flickr.com/photos/mwichary/2294174641
Tools http://www.flickr.com/photos/zzpza/3269784239/
Crane http://www.flickr.com/photos/katatoniq/2075966238/
Ants http://www.flickr.com/photos/dendroica/6170146527
Choice http://www.flickr.com/photos/-bast-/349497988
Old Car http://www.flickr.com/photos/dok1/353601845/
Monster Truck http://www.flickr.com/photos/beadmobile/3279378483/
Clusters http://www.flickr.com/photos/nanagyei/6318995952/
Model http://www.flickr.com/photos/inl/5097547405/
Agent http://www.flickr.com/photos/erix/191965832/
Chef http://www.flickr.com/photos/londonmatt/417683733/
Arrows http://www.flickr.com/photos/generated/2084287794/
Dashboard http://www.flickr.com/photos/80502454@N00/4172458435/
Legos http://www.flickr.com/photos/great8/6820722517/
Xray http://www.flickr.com/photos/karen_roe/4417259305/
Document http://www.flickr.com/photos/tusnelda/6140792529/
Flowers http://www.flickr.com/photos/petercastleton/5905455717/
Editor's Notes
Problems = Specifically what we’re trying to solve. Hopes/Dreams = Things that would be really nice. How We Did It = Processes & tools.
As business has changed/segments were merged, others were added. Different teams. None of the existing solutions will scale to meet the needs of everyone. Not harnessing the data.
Different needs. Internal – hook into ticketing and support. External – monitor anything. One API for everybody.
Obvious really. Proactive.
Trending. Capacity planning. Auto scaling.
Tooling. API Driven – Self-service. Awesome reporting.
Open source. Distributed systems. Embedded. DevOps.
Prefer open source
Polyglot system
Control: metadata + state; high replication (5 DC); wide rows; easy dump and load. Data: archival; constant writes; 3 DC replication.
CRUD operations. Data description. Simple parent-child – simplified relations. Prefixed IDs: an object called ‘foo’ gets an id like ‘foXXXXXXXX’; an object called ‘bar’ gets an id like ‘baXXXXXXXX’. For a 1:* relationship from foo:bar, concatenate column names: foXXXXXXXX:baXXXXXXXX. Selecting all the bars that belong to fo12345678 becomes a slice from fo12345678:ba to fo12345678:ba\xef\xbf\xbf. One-to-many lookups from objects are slices.
Stress VISIBILITY!
Deployed 120 times in January. System changes treated the same way as code changes.