Rackspace Open Sources Atom Nuke, The Fast Atom Framework
Filed in Product & Development by Chad Lung | September 11, 2012 3:30 pm
What if you had a tremendous mountain of data, broken up and stored across thousands of servers, and your
client wanted some specific portion of that data? You could assemble the whole mountain and send the whole
thing to your client, leaving the client to pick out what’s needed. But there are reasons you split it up in the
first place: it’s too big to store in one place or to transfer without interruption. You also have reasons to
control access to the data, including security and privacy, so moving the whole mountain might not be a good idea.
What if you could create something as complex as this, with data in multiple
formats from multiple origins stored across multiple servers but aggregated for
multiple consumers, who could then repackage it for consumers of their own?
If you couldn’t give your client a copy of all your data, you could ask the client to describe the specific data
that’s needed and then assemble those items the client needs. However, if you had many clients, each with
their own mountains of data, would you have to create a direct path from every consumer to every fragment
of data they need?
What you need is to easily create a bridge, integrating any number of data origins with any number of data
consumers. Enter Atom Nuke.
[1]
With Atom Nuke[2], no matter where your data originates and who consumes
the data, it could be this simple to think about.
Atom Nuke Simplifies Integration
We created Atom Nuke[2] to give ourselves two kinds of power related to the high volumes of data produced
by our Atom feeds:
fission, making it easy to divide data in new ways
fusion, making it easy to combine data in new ways
[3]
A six-way integration requires eighteen paths, connecting three data origins
with three data consumers so each has direct and equal access. Adding one
new origin or consumer requires adding many new paths.
Atom Nuke is an open-source collection of utilities built on a simple, fast Atom implementation that aims for
a footprint of minimal dependency. The Atom implementation has its own model and utilizes a SAX parser
and a StAX writer.
SAX[4] (Simple API for XML) makes it simple to read existing data
StAX[5] (Streaming API for XML) makes it simple to stream data to and from applications
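To give a feel for the event-driven reading style that SAX provides, here is a minimal sketch using Python's standard xml.sax module. It is not Atom Nuke's own parser: the handler class TitleCollector, the function entry_titles and the sample feed are all assumptions made up for this illustration.

```python
from xml.sax import make_parser
from xml.sax.handler import ContentHandler, feature_namespaces

ATOM_NS = "http://www.w3.org/2005/Atom"

# Hypothetical sample feed, used only for this sketch.
SAMPLE_FEED = b"""<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry><title>First event</title></entry>
  <entry><title>Second event</title></entry>
</feed>"""


class TitleCollector(ContentHandler):
    """Hypothetical handler: collects the <title> of every Atom <entry>."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_entry = False
        self._buffer = None  # text chunks gathered while inside an entry title

    def startElementNS(self, name, qname, attrs):
        if name == (ATOM_NS, "entry"):
            self._in_entry = True
        elif name == (ATOM_NS, "title") and self._in_entry:
            self._buffer = []

    def characters(self, content):
        # SAX may deliver text in several chunks, so accumulate them.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElementNS(self, name, qname):
        if name == (ATOM_NS, "title") and self._buffer is not None:
            self.titles.append("".join(self._buffer))
            self._buffer = None
        elif name == (ATOM_NS, "entry"):
            self._in_entry = False


def entry_titles(feed_bytes):
    """Stream the feed through the handler and return the entry titles."""
    handler = TitleCollector()
    parser = make_parser()
    parser.setFeature(feature_namespaces, True)  # enables the *NS callbacks
    parser.setContentHandler(handler)
    parser.feed(feed_bytes)
    parser.close()
    return handler.titles
```

Because the parser pushes events to the handler as it reads, the whole feed never has to be held in memory as a tree, which is the property that makes this style attractive for large feeds.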
With Atom Nuke providing a bridge, a six-way integration requires six paths,
one from each of the three origins and three clients, with each path terminating
at Atom Nuke. Adding one new origin or consumer requires adding one new
path.
We designed our Nuke implementation for immutability, maximum simplicity and memory efficiency. Nuke
also contains a polling event framework that can poll multiple sources. Each source may be registered with a
configured polling interval that governs how often the source is polled during normal operation. That source
may have any number of Atom listeners added to its dispatch list. These listeners will begin receiving events
on the next scheduled poll.
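The polling model described above can be sketched roughly as follows. This is an illustration only, not Atom Nuke's API: the class PollingScheduler, its methods, the source names and the simulated millisecond clock are all hypothetical, and the sketch assumes that a source's first poll happens one interval after registration.

```python
import heapq
import itertools


class PollingScheduler:
    """Hypothetical scheduler: polls registered sources at fixed intervals."""

    def __init__(self):
        self._queue = []               # heap of (next_due_ms, seq, source name)
        self._seq = itertools.count()  # tie-breaker for polls due at the same time
        self._sources = {}             # name -> (poll_fn, interval_ms, listeners)

    def register(self, name, poll_fn, interval_ms):
        """Register a source with its own polling interval."""
        self._sources[name] = (poll_fn, interval_ms, [])
        heapq.heappush(self._queue, (interval_ms, next(self._seq), name))

    def add_listener(self, name, listener):
        """Add a listener; it starts receiving events on the next scheduled poll."""
        self._sources[name][2].append(listener)

    def run_until(self, end_ms):
        """Advance a simulated clock, running every poll that is due by end_ms."""
        while self._queue and self._queue[0][0] <= end_ms:
            due, _, name = heapq.heappop(self._queue)
            poll_fn, interval_ms, listeners = self._sources[name]
            for event in poll_fn(due):
                for listener in list(listeners):
                    listener(name, event)
            heapq.heappush(self._queue, (due + interval_ms, next(self._seq), name))


# Usage: two made-up sources with different intervals, one listener.
scheduler = PollingScheduler()
scheduler.register("feed-a", lambda now: [f"a@{now}"], interval_ms=100)
scheduler.register("feed-b", lambda now: [f"b@{now}"], interval_ms=250)

received = []
scheduler.add_listener("feed-a", lambda source, event: received.append(event))
scheduler.run_until(250)  # feed-a is polled at 100 and 200, feed-b at 250
```

A simulated clock keeps the sketch deterministic; a real framework would presumably run on wall-clock time with worker threads.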
Atom as a Building Block
Atom is a self-discoverable and generic syndication protocol. The Internet Engineering Task Force (IETF)
describes Atom in several ratified Requests for Comments (RFCs):
the Atom RFC[6]
the Atom Paging and Archiving RFC[7]
the Atom Publishing Protocol RFC[8]
The unique properties of the Atom specification have made it popular as a protocol for generic event
distribution, syndication and aggregation. Using Atom as a common interchange format, event publishers add
their domain-specific events to an Atom publication endpoint. Downstream, subscribers are notified of events
they’ve pre-identified as relevant, controlling what they consume from potentially-vast collections of
published data.
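As a rough sketch of what a publisher might hand to a publication endpoint, the following builds a single Atom entry with Python's standard library. The function build_entry, the identifier, the category term and the payload are invented for illustration; only the element names (id, title, updated, category, content) come from the Atom format itself.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace


def build_entry(entry_id, title, category_term, payload, updated=None):
    """Hypothetical helper: wrap a domain-specific event in an Atom entry."""
    updated = updated or datetime.now(timezone.utc)
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated.isoformat()
    # The category lets subscribers pre-identify the events they care about.
    ET.SubElement(entry, f"{{{ATOM_NS}}}category", term=category_term)
    content = ET.SubElement(entry, f"{{{ATOM_NS}}}content", type="text")
    content.text = payload
    return ET.tostring(entry, encoding="unicode")


# Usage with made-up values.
entry_xml = build_entry(
    "urn:example:event:1",
    "Server created",
    "compute",
    "instance=42",
    updated=datetime(2012, 9, 11, 15, 30, tzinfo=timezone.utc),
)
```

Sending the entry to an endpoint is left out on purpose, since the details depend on the server being used.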
Atom Nuke Within Rackspace
Within Rackspace, the Cloud Integration team builds tools for all our software development teams to use. We
need to provide high-quality tools but we also need them to be easy to use and work smoothly together so that
we can encourage adoption throughout Rackspace.
Using Atom Nuke, we collect data from the Atom feeds supplied by Atom Hopper[9], another of our open-
source tools. We then take that Atom data and feed it into several systems, including those that perform
analytics on OpenStack[10] deployments throughout our data centers. The analytics engine uses Nuke to
collect the entire Atom feed data so it can be marshalled into a Hadoop[11] cluster. By combining our Atom
Nuke and Atom Hopper tools, we’ve enabled complete portability of data: we can combine Atom events with
data from other sources such as RabbitMQ[12] messages and Flume[13] logs without requiring consumers of
that data to deal with the complexities of interacting with those dissimilar sources.
Nuke Makes Working with Atom Easy
Atom Nuke excels as an Atom feed crawler, since you can poll multiple feeds from multiple endpoints as
well as define the polling intervals down to milliseconds. In addition, you can select events in response to
specific triggers, such as when a specific Atom entry contains a subscribed category. However, Nuke is much
more than a feed crawler: it can also create its own Atom feeds if needed.
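Selecting entries by a subscribed category might look something like the sketch below. It uses Python's standard library rather than Nuke itself, and the function entries_in_category and the sample feed are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical feed with made-up entries and category terms.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Server created</title>
    <category term="compute"/>
  </entry>
  <entry>
    <title>Invoice issued</title>
    <category term="billing"/>
  </entry>
  <entry>
    <title>Server resized</title>
    <category term="compute"/>
    <category term="audit"/>
  </entry>
</feed>"""


def entries_in_category(feed_xml, subscribed_term):
    """Return the titles of entries that carry the subscribed category term."""
    root = ET.fromstring(feed_xml)
    matches = []
    for entry in root.iter(ATOM + "entry"):
        # An entry may carry several categories, so collect all of its terms.
        terms = {c.get("term") for c in entry.findall(ATOM + "category")}
        if subscribed_term in terms:
            matches.append(entry.findtext(ATOM + "title"))
    return matches
```

Calling entries_in_category(FEED, "compute") returns the titles of the two compute entries and skips the billing one.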
We built Atom Nuke with Java[14] but we recently extended support to Python[15]. Nuke is licensed under
the Apache 2 license[16] and was created by John Hopper[17], a software engineer on the Rackspace Cloud
Integration team. We’ve created some tutorials to get developers started with Nuke[18].
Building with Boxes, Not Bricks
Writing about a different kind of atom in a world that was just beginning to understand atomic structure and
atomic energy, H.G. Wells (1866-1946) imagined a future in which using the power stored within atoms
transformed many aspects of human life:
“I feel that we are but beginning the list. And we know now that the atom, that once we thought
hard and impenetrable, and indivisible and final and–lifeless–lifeless, is really a reservoir of
immense energy. That is the most wonderful thing about all this work. A little while ago we
thought of the atoms as we thought of bricks, as solid building material, as substantial matter, as
unit masses of lifeless stuff, and behold! these bricks are boxes, treasure boxes, boxes full of the
intensest force.”
—H.G. Wells, The World Set Free, 1914
We’re now at a similar point with the technology of our time. We have explored enabling technologies, such
as Atom, and have begun fully using and building upon their capabilities, putting them to work in new ways
to make new things possible. As we begin building with Atom Nuke, we’re using Atom not as a brick, but as
a treasure box, containing amazing possibilities for fission and fusion, dividing and combining data to make
new applications possible. By making Atom Nuke and some of our other projects such as Atom Hopper[9]
available as open source, we hope we are also creating treasure boxes filled with ideas and possibilities.
To learn more about Atom Nuke, visit our project site[19] and check out the source code on GitHub[20].
Endnotes:
1. [Image]: http://ddf912383141a8d7bbe4-e053e711fc85de3290f121ef0f0e3a1f.r87.cf1.rackcdn.com/atom-nuke-inall-outall.png
2. Atom Nuke: http://atomnuke.org/