My talk at PuppetConf 2012 on PuppetDB, a Clojure-based centralized storage daemon for Puppet.
Video of this talk: http://www.youtube.com/watch?v=xw83cRofkpM&list=PLV86BgbREluVFB73Wwqp_tCbw5Z9TMLX1&index=4&feature=plpp_video
7. [Word-cloud slide: "Puppet generates lots of data". Legible labels: persistent data, long-term data, ephemeral data, local data, free-form data, human-readable data, resource data, SSL certificate data. The remaining labels are truncated.]
28. “There's a war out there,
old friend. A world war.
And it's not about who's
got the most bullets.
It’s about who controls
the information.
What we see and hear,
how we work, what we
think... it's all about the
information!”
-- Sneakers (1992)
29. every resource
every parameter
every relationship
every class
every fact
for every node
128. Resource dedupe
Compute unique hashes for resources
We quickly hash all the resources in a catalog
and use bulk operations to compare them against
the hashes already stored.
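The hash-and-compare idea on this slide could be sketched roughly as follows. This is an illustration only, not PuppetDB's actual (Clojure) implementation: the resource layout, the function names `resource_hash` and `new_resources`, and the choice of SHA-1 over canonical JSON are all assumptions made for the example, and a plain set difference stands in for the bulk database comparison.

```python
import hashlib
import json


def resource_hash(resource: dict) -> str:
    """Hash a resource via its canonical JSON form (sorted keys),
    so two equal resources always produce the same digest."""
    canonical = json.dumps(resource, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def new_resources(catalog_resources, stored_hashes):
    """Return {hash: resource} for resources whose hash is not stored yet.

    The single set difference plays the role of one bulk comparison
    against the hashes already in storage (hypothetical stand-in).
    """
    by_hash = {resource_hash(r): r for r in catalog_resources}
    missing = by_hash.keys() - stored_hashes
    return {h: by_hash[h] for h in missing}
```

Only the resources returned by `new_resources` would need to be written; everything else is already stored under its hash.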
129. Resource dedupe
Significant speed improvement!
Internal to Puppet Labs, we see ~83% resource
duplication; this number is consistent with what
we’ve seen in most customer environments.
133. Catalog dedupe
Compute unique hashes for catalogs
Puppet Labs sees ~88% catalog duplication; the rest
of the planet sees even bigger numbers
Big savings!
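One way to picture catalog-level dedupe: hash the whole catalog in an order-insensitive way, and only store the body when that hash has not been seen before. The sketch below assumes a catalog is a dict with `resources` and `edges` lists; that shape, the function names, and the in-memory dicts standing in for the database are hypothetical, not taken from PuppetDB.

```python
import hashlib
import json


def catalog_hash(catalog: dict) -> str:
    """Hash a catalog so that resource and edge ordering do not matter."""
    resources = sorted(json.dumps(r, sort_keys=True) for r in catalog["resources"])
    edges = sorted(json.dumps(e, sort_keys=True) for e in catalog["edges"])
    payload = json.dumps({"resources": resources, "edges": edges})
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def store_catalog(node: str, catalog: dict, seen: dict, node_to_catalog: dict) -> bool:
    """Associate a node with its catalog; return True only if the body was new."""
    digest = catalog_hash(catalog)
    is_new = digest not in seen
    if is_new:
        seen[digest] = catalog  # the expensive write happens only here
    node_to_catalog[node] = digest  # cheap pointer update on every run
    return is_new
```

With duplication rates like the ones quoted on the slide, most submissions would take the cheap path and only update the node's pointer.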
137. Parallel
We can pat our heads and rub our tummies at
the same time
Database operations don’t block MQ operations
don’t block HTTP operations don’t block hash
computation operations don’t block metric
calculations don’t block...
Dozens of threads, zero locks
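The general pattern of stages that do not block each other can be sketched with threads that hand work off through queues instead of sharing mutable state. This is a simplified illustration and not PuppetDB's design: the stage names and `run_pipeline` are made up, and Python's `queue.Queue` synchronizes internally, so the example only shows the absence of explicit locks in application code.

```python
import hashlib
import queue
import threading


def hash_stage(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Compute a digest for each incoming command and pass it downstream."""
    while True:
        item = inbox.get()
        if item is None:  # sentinel: propagate shutdown and stop
            outbox.put(None)
            return
        outbox.put((item, hashlib.sha1(item.encode("utf-8")).hexdigest()))


def store_stage(inbox: queue.Queue, results: list) -> None:
    """Pretend to persist results; only this thread touches `results`."""
    while True:
        item = inbox.get()
        if item is None:
            return
        results.append(item)


def run_pipeline(commands):
    to_hash, to_store, results = queue.Queue(), queue.Queue(), []
    workers = [
        threading.Thread(target=hash_stage, args=(to_hash, to_store)),
        threading.Thread(target=store_stage, args=(to_store, results)),
    ]
    for worker in workers:
        worker.start()
    for command in commands:
        to_hash.put(command)  # the producer never waits on storage
    to_hash.put(None)
    for worker in workers:
        worker.join()
    return results
```

Each stage owns its own data and communicates only by message, which is why no stage needs to take a lock on another stage's state.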
146. Many production
deployments
Small shops with a dozen hosts,
large shops with thousands of
hosts, intercontinental
deployments...
over a billion resources served!
Not just in terms of volume, but also many different kinds of data:
persistent, ephemeral, free form, machine readable. That’s a lot of stuff to sift through! So where to start?
This is a resource, pictured as what you’d type.
This is the same resource, only post-compilation. Way more useful stuff in here!
So most of the time, you’ll have more than one resource at play on a node. But you don’t want them just applied randomly; order is important!
You want the things at the top to happen first. Internally, Puppet represents this as a directed graph we call...
A collection of resources and their relationships is a catalog.
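To make the "directed graph" idea concrete, here is a small sketch that orders a few resources by their dependencies using a topological sort. The three resources and the `apply_order` helper are invented for illustration, and this is not how Puppet itself stores or walks a catalog; it needs Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter


def apply_order(dependencies: dict) -> list:
    """Return resources in an order where every dependency comes first.

    `dependencies` maps each resource to the set of resources it requires.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical mini-catalog: the service needs the config file,
# and the config file needs the package.
relationships = {
    "File[/etc/ntp.conf]": {"Package[ntp]"},
    "Service[ntpd]": {"File[/etc/ntp.conf]"},
}

print(apply_order(relationships))
# ['Package[ntp]', 'File[/etc/ntp.conf]', 'Service[ntpd]']
```

A cycle in the relationships would make `static_order()` raise `graphlib.CycleError`, which mirrors why dependency cycles in a catalog are an error.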
This is the catalog for a fresh install of Puppet Enterprise. Not too different from the one I showed earlier, just with more resources and relationships. But this is actually a bit of a lie, because the _entire_ catalog...
Hard to see, but we can zoom in on a tiny area.
So, same stuff you’re used to: users, groups, files, etc. Trying to make sense of this manually is insane, but that’s where Puppet comes in! Infrastructure is messy, but Puppet untangles that web.
But that’s only one half of the story.
And if that doesn’t cut it, you can always make your own.
So, catalogs and facts are great, but why is having access to that stuff important?
Information is powerful! As operators, our decisions are only as good as the information upon which they’re based. Tools are no different; they’re only as smart as the input data.
So what can you do when you’ve got all this juicy data sitting around?
And it’s not just for things like key distribution and monitoring,
...this is a generic pattern that can be applied to all manner of situations: anything where you need information from one node to configure another.
Utilizing your data in this way catapults you into the world of higher-order Puppet. By which, I mean...
Delightfully meta, don’t you think? [advance] Doubles down on your automation, making it even more powerful.
...but it’s not just a question of scalability...
...and therein lies the problem.
A lot of people ask me what the DB stands for. [advance]
I’ll talk more about how we make that happen in a bit.
We’ve tested this on our own code, running Puppet Labs.
...and it’s all built using...
So let’s talk about how it works, at a high level.
So earlier, I mentioned that we need PuppetDB to be reliable, queryable, and fast.
In fact, let’s try an experiment...
For example...
[explain what’s going on] Wow, that’s a big wall of text, isn’t it? [advance] But hopefully it illustrates how much instrumentation we’ve put in.
The query language is documented and pretty versatile. It’s a superset of the resource-collection features (like complex boolean operators)... and this is why people have written libraries for use with Puppet.
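To give a feel for what "complex boolean operators" means, here is a toy evaluator for prefix-notation queries of the kind PuppetDB accepts, such as `["and", ["=", "type", "File"], ["=", ["parameter", "ensure"], "present"]]`. It is loosely modeled on that style and runs against in-memory dicts; the `matches` function, the resource layout, and the small operator set are assumptions for this sketch, so check the PuppetDB query documentation for the real grammar.

```python
def matches(query: list, resource: dict) -> bool:
    """Evaluate a prefix-notation query against one resource dict."""
    op, *args = query
    if op == "and":
        return all(matches(sub, resource) for sub in args)
    if op == "or":
        return any(matches(sub, resource) for sub in args)
    if op == "not":
        return not matches(args[0], resource)
    if op == "=":
        field, value = args
        if isinstance(field, list):  # e.g. ["parameter", "ensure"]
            _, name = field
            return resource.get("parameters", {}).get(name) == value
        return resource.get(field) == value
    raise ValueError(f"unsupported operator: {op!r}")


def run_query(query: list, resources: list) -> list:
    """Return the resources that satisfy the query, in their original order."""
    return [r for r in resources if matches(query, r)]
```

Because queries are plain nested lists, they serialize directly to JSON, which makes them easy for libraries in any language to build and send.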