The Internet-of-things:
Architecting for the
deluge of data
Bryan Cantrill
CTO
bryan@joyent.com
@bcantrill
Big Data circa 1994: Pre-Internet
Source: BusinessWeek, September 5, 1994
Aside: Internet circa 1994
Source: BusinessWeek, October 10, 1994
Big Data circa 2004: Internet exhaust
• Through the 1990s, Moore’s Law + Kryder’s Law grew
faster than transaction rates, and what was
“overwhelming” in 1994 was manageable by 2004
• But large internet concerns (Google, Facebook, Yahoo!)
encountered a new class of problem: analyzing massive
amounts of data emitted as a byproduct of activity
• Data scaled with activity, not transactions — changing
both data sizes and economics
• Data sizes were too large for extant data warehousing
solutions — and were embarrassingly parallel besides
Big Data circa 2004: MapReduce
• MapReduce, pioneered by Google and later emulated
by Hadoop, pointed to a new paradigm where compute
tasks are broken into map and reduce phases
• Serves to explicitly divide the work that can be
parallelized from that which must be run sequentially
• Map phases are farmed out to a storage layer that
attempts to co-locate them with the data being mapped
• Made for commodity scale-out systems; relatively cheap
storage allowed for sloppy but effective solutions (e.g.
storing data in triplicate)
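The division between parallelizable map work and a sequential reduce can be sketched on one machine with ordinary shell tools. This is only an illustration of the shape of the idea, not Google's or Hadoop's implementation: the function names (`map_shard`, `reduce_max`, `mapreduce_max`), the shard files, and the one-numeric-reading-per-line format are all assumptions.

```shell
#!/bin/sh
# Map: find the largest reading in one shard. Shards are independent,
# so every invocation can run in parallel with the others.
map_shard() {
    sort -n "$1" | tail -n 1
}

# Reduce: combine the per-shard maxima. This step needs all map output,
# so it runs once, sequentially.
reduce_max() {
    sort -n | tail -n 1
}

# Start one background map task per shard, wait for all, then reduce.
mapreduce_max() {
    tmp=$(mktemp -d) || return 1
    i=0
    for shard in "$@"; do
        i=$((i + 1))
        map_shard "$shard" > "$tmp/part.$i" &
    done
    wait
    cat "$tmp"/part.* | reduce_max
    rm -rf "$tmp"
}
```

Calling `mapreduce_max shard1 shard2` prints the largest reading found across both files. A real system would also place each map task on the machine that already holds its shard, which this local sketch does not attempt.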
Big Data circa 2014
• Hadoop has become the de facto big data processing
engine — and HDFS the de facto storage substrate
• But HDFS is designed around availability during/for
computation; it is not designed to be authoritative
• HDFS is used primarily for data that is redundant,
transient, replaceable or otherwise fungible
• Authoritative storage remains either enterprise storage
(on premises) or object storage (in the cloud)
• For analysis of non-fungible data, pattern is to ingest
data into a Hadoop cluster from authoritative storage
• But a new set of problems is poised to emerge...
Big Data circa 2014: Internet-of-things
• IDC forecasts that the “digital universe” will grow from
130 exabytes in 2005 to 40,000 exabytes in 2020 —
with as much as a third having “analytic value”
• This doesn’t even factor in the (long forecasted) rise of
the internet-of-things/industrial internet...
• Machine-generated data at the edge will effect a step
function in data sizes and processing methodologies
• No one really knows how much data will be generated
by IoT, but the numbers are insane (e.g., an HD camera
generates 20 GB/hour; a Ford Energi engine generates
25 GB/hour; a GE jet engine generates 1 TB/flight)
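To put the figures above in perspective, the arithmetic can be sketched in a few lines of shell. The function names are hypothetical, and two assumptions are baked in: the source runs around the clock, and 1 TB is taken as 1000 GB.

```shell
#!/bin/sh
# Decimal terabytes produced per year by a source emitting gb_per_hour
# continuously (assumption: 24 hours a day, 365 days a year).
tb_per_year() {
    awk -v r="$1" 'BEGIN { printf "%.1f\n", r * 24 * 365 / 1000 }'
}

# Compound annual growth rate, in percent, from start to end over years.
cagr_percent() {
    awk -v s="$1" -v e="$2" -v y="$3" \
        'BEGIN { printf "%.1f\n", (exp(log(e / s) / y) - 1) * 100 }'
}

tb_per_year 20             # one HD camera at 20 GB/hour
cagr_percent 130 40000 15  # IDC forecast, 2005 to 2020
```

Under those assumptions a single camera yields roughly 175 TB per year, and the IDC forecast corresponds to growth of about 46% per year sustained for fifteen years.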
How to cope with IoT-generated data?
• IoT presents so much more data that we will
increasingly need data science to make sense of it
• To assure data, we need to retain as much raw data as
possible, storing it once and authoritatively
• Storing data authoritatively has ramifications for the
storage substrate
• To allow for science, we need to place an emphasis on
hypothesis exploration: it must be quick to iterate from
hypothesis to experiment to result to new hypothesis
• Emphasizing hypothesis exploration has ramifications
for the compute abstractions and data movement
The coming ramifications of IoT
• It will no longer be acceptable to discard data: all data
will need to be retained to explore future hypotheses
• It will no longer be acceptable to store three copies: 3X
on storage costs is too acute when data is massive
• It will no longer be acceptable to move data for analysis:
in some cases, not even over the internet!
• It will no longer be acceptable to dictate the abstraction:
we must accommodate anything that can process data
• These shifts are as significant as the shift from
traditional data warehousing to scale-out MapReduce!
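The cost of storing three copies can be made concrete with a small calculation. The sketch below compares triplicate storage with a parity-protected layout; the 9 data + 2 parity geometry is purely a hypothetical example chosen for illustration, and the function names are assumptions.

```shell
#!/bin/sh
# Raw capacity needed to hold `data` when every byte is stored `copies` times.
raw_replicated() {
    awk -v d="$1" -v c="$2" 'BEGIN { printf "%.2f\n", d * c }'
}

# Raw capacity needed when data is striped over k data disks
# protected by p parity disks.
raw_parity() {
    awk -v d="$1" -v k="$2" -v p="$3" 'BEGIN { printf "%.2f\n", d * (k + p) / k }'
}

raw_replicated 1 3   # 1 PB stored in triplicate
raw_parity 1 9 2     # 1 PB on a hypothetical 9 data + 2 parity layout
```

For 1 PB of data this prints 3.00 and 1.22: the triplicate layout needs nearly two and a half times the raw capacity of the parity layout.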
IoT: Authoritative storage?
• “Filesystems” that are really just user-level programs
layered on local filesystems lack device-level visibility,
sacrificing reliability and performance
• Even in-kernel, we have seen how corrosive an abstraction
divide can be, in the historic split between logical
volume management and the filesystem:
  • The volume manager understands multiple disks, but
  nothing of the higher level semantics of the filesystem
  • The filesystem understands the higher semantics of the
  data, but has no physical device understanding
• This divide became entrenched over the 1990s, and had
devastating ramifications for reliability and performance
The ZFS revolution
• Starting in 2001, Sun began a revolutionary new
software effort: to unify storage and eliminate the divide
• In this model, filesystems would lose their one-to-one
association with devices: many filesystems would be
multiplexed on many devices
• By starting with a clean sheet of paper, ZFS opened up
vistas of innovation — and by its architecture was able
to solve many otherwise intractable problems
• Sun shipped ZFS in 2005, and used it as the foundation
of its enterprise storage products starting in 2008
• ZFS was open sourced in 2005; it remains the only open
source enterprise-grade filesystem
ZFS advantages
• Copy-on-write design allows on-disk consistency to be
always assured (eliminating file system check)
• Copy-on-write design allows constant-time snapshots in
unlimited quantity — and writable clones!
• Filesystem architecture allows filesystems to be created
instantly and expanded — or shrunk! — on-the-fly
• Integrated volume management allows for intelligent
device behavior with respect to disk failure and recovery
• Adaptive replacement cache (ARC) allows for optimal
use of DRAM — especially on high DRAM systems
• Support for dedicated log and cache devices allows for
optimal use of flash-based SSDs
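Several of these advantages map directly onto one-line administrative commands. The sketch below walks through them with the standard `zfs` and `zpool` utilities; the pool name, dataset names, quota value and device names are hypothetical, and the `run` wrapper (an assumption of this sketch) only prints each command unless `DRY_RUN=0` is set, so it can be read on a machine without ZFS.

```shell
#!/bin/sh
# Print each command instead of executing it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: zfs_tour POOL LOG_DEVICE CACHE_DEVICE   (all names hypothetical)
zfs_tour() {
    pool=$1
    run zfs create "$pool/iot"                    # new filesystem in the pool
    run zfs set quota=10G "$pool/iot"             # resize by changing a property
    run zfs snapshot "$pool/iot@before"           # snapshot of the filesystem
    run zfs clone "$pool/iot@before" "$pool/exp"  # writable clone of the snapshot
    run zpool add "$pool" log "$2"                # dedicated log device
    run zpool add "$pool" cache "$3"              # dedicated cache device
}
```

For example, `zfs_tour tank c1t0d0 c1t1d0` prints the six commands in order. Treating a quota change as the way to grow or shrink a filesystem is this sketch's interpretation of the bullet above, not something the slide spells out.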
ZFS at Joyent
• Joyent was the earliest ZFS adopter: becoming (in
2005) the first production user of ZFS outside of Sun
• ZFS is one of the four foundational technologies of
Joyent’s SmartOS, our illumos derivative
• The other three foundational technologies in SmartOS are
DTrace, Zones and KVM
• Search “fork yeah illumos” for the (uncensored) history of
OpenSolaris, illumos, SmartOS and derivatives
• Joyent has extended ZFS to provide better support for
multi-tenant operation with I/O throttling
ZFS as the basis for IoT?
• ZFS offers commodity hardware economics with
enterprise-grade reliability — and obviates the need for
cross-machine mirroring for durability
• But ZFS is not itself a scale-out distributed system, and
is ill suited to become one
• Conclusion: ZFS is a good building block for the data
explosion from IoT, but not the whole puzzle
IoT: Compute abstraction?
• To facilitate hypothesis exploration, we need to carefully
consider the abstraction for computation
• How is data exploration programmatically expressed?
• How can this be made to be multi-tenant?
• The key enabling technology for multi-tenancy is
virtualization — but where in the stack to virtualize?
• The historical answer — since the 1960s — has been to
virtualize at the level of the hardware:
  • A virtual machine is presented upon which each
  tenant runs an operating system of their choosing
  • There are as many operating systems as tenants
• The historical motivation for hardware virtualization
remains its advantage today: it can run entire legacy
stacks unmodified
• However, hardware virtualization exacts a heavy toll:
operating systems are not designed to share resources
like DRAM, CPU, I/O devices or the network
• Hardware virtualization limits tenancy and inhibits
performance!
Hardware-level virtualization?
• Virtualizing at the application platform layer addresses
the tenancy challenges of hardware virtualization…
• ...but at the cost of dictating abstraction to the developer
• With IoT, this is especially problematic: we can expect
much more analog data and much deeper numerical
analysis — and dependencies on native libraries and/or
domain-specific languages
• Virtualizing at the application platform layer poses many
other challenges:
• Security, resource containment, language specificity,
environment-specific engineering costs
Platform-level virtualization?
• Containers virtualize the OS and hit the sweet spot:
  • Single OS (single kernel) allows for efficient use of hardware
  resources, and therefore allows load factors to be high
  • Disjoint instances are securely compartmentalized by the
  operating system
  • Gives customers what appears to be a virtual machine
  (albeit a very fast one) on which to run higher-level software
  • Gives customers PaaS when the abstractions work for them,
  IaaS when they need more generality
• OS-level virtualization allows for high levels of tenancy
without dictating abstraction or sacrificing efficiency
• Zones is a bullet-proof implementation of OS-level
virtualization — and is the core abstraction in Joyent’s
SmartOS
Joyent’s solution: OS containers
Idea: ZFS + Containers?
• Building a sophisticated distributed system on top of
ZFS and zones, we have built Manta, an internet-facing
object storage system offering in situ compute
• That is, the description of compute can be brought to
where objects reside instead of having to backhaul
objects to transient compute
• The abstractions made available for computation are
anything that can run on the OS...
• ...and as a reminder, the OS — Unix — was built around
the notion of ad hoc unstructured data processing, and
allows for remarkably terse expressions of computation
Manta: ZFS + Containers!
Aside: Unix
• When Unix appeared in the early 1970s, it was not just a
new system, but a new way of thinking about systems
• Instead of a sealed monolith, the operating system was
a collection of small, easily understood programs
• First Edition Unix (1971) contained many programs that
we still use today (ls, rm, cat, mv)
• Its very name conveyed this minimalist aesthetic: Unix is
a homophone of “eunuchs” — a castrated Multics
We were a bit oppressed by the big system mentality. Ken
wanted to do something simple. — Dennis Ritchie
Unix: Let there be light
• In 1969, Doug McIlroy had the idea of connecting
different components:
At the same time that Thompson and Ritchie were sketching
out a file system, I was sketching out how to do data
processing on the blackboard by connecting together
cascades of processes
• This was the primordial pipe, but it took three years to
persuade Thompson to adopt it:
And one day I came up with a syntax for the shell that went
along with the piping, and Ken said, “I’m going to do it!”
Unix: ...and there was light
And the next morning we had this
orgy of one-liners. — Doug McIlroy
The Unix philosophy
• The pipe — coupled with the small-system aesthetic —
gave rise to the Unix philosophy, as articulated by Doug
McIlroy:
  • Write programs that do one thing and do it well
  • Write programs to work together
  • Write programs that handle text streams, because
  that is a universal interface
• Four decades later, this philosophy remains the single
most important revolution in software systems thinking!
• In 1986, Jon Bentley posed the challenge that became
the Epic Rap Battle of computer science history:
Read a file of text, determine the n most frequently used
words, and print out a sorted list of those words along with
their frequencies.
• Don Knuth’s solution: an elaborate program in WEB, a
Pascal-like literate programming system of his own
invention, using a purpose-built algorithm
• Doug McIlroy’s solution shows the power of the Unix
philosophy:
tr -cs A-Za-z '\n' | tr A-Z a-z | \
sort | uniq -c | sort -rn | sed ${1}q
Doug McIlroy v. Don Knuth: FIGHT!
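McIlroy's pipeline reads text on standard input and takes n as its first argument. A minimal runnable sketch follows; the `topwords` function name, the `LC_ALL=C` settings and the final `awk` stage (which only strips the padding that `uniq -c` adds) are additions for this example and not part of the original six commands.

```shell
#!/bin/sh
# Print the n most frequent words on standard input as "count word" lines.
topwords() {
    LC_ALL=C tr -cs A-Za-z '\n' |   # one word per line
        LC_ALL=C tr A-Z a-z |       # fold to lower case
        LC_ALL=C sort |             # bring identical words together
        uniq -c |                   # count each run of identical words
        sort -rn |                  # most frequent first
        sed "${1}q" |               # keep the first n lines
        awk '{ print $1, $2 }'      # normalize uniq's padding
}

printf 'The cat and the dog and the bird\n' | topwords 2
```

On the sample sentence this prints `3 the` followed by `2 and`. Each stage does one small job and hands a text stream to the next, which is the whole point of the example.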
Big Data: History repeats itself?
• The original Google MapReduce paper (Dean and Ghemawat,
OSDI ’04) poses a problem disturbingly similar to
Bentley’s challenge nearly two decades prior:
Count of URL Access Frequency: The map function processes
logs of web page requests and outputs ⟨URL, 1⟩. The
reduce function adds together all values for the same URL
and emits a ⟨URL, total count⟩ pair
• But the solutions do not adhere to the Unix philosophy...
• ...nor do they make use of the substantial Unix
foundation for data processing
• e.g., Appendix A of the OSDI ’04 paper has a 71-line
word count in C++ — with nary a wc in sight
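For comparison, the URL access frequency problem quoted above can be sketched with two small `awk` programs. The function names are hypothetical, and the input is assumed to be in Common Log Format, where the requested URL is the seventh whitespace-separated field.

```shell
#!/bin/sh
# Map: emit "URL 1" for every request line.
# Assumption: Common Log Format, so the URL is field 7.
map_urls() {
    awk '{ print $7, 1 }'
}

# Reduce: add together all values for the same URL and emit "URL total".
reduce_counts() {
    awk '{ total[$1] += $2 } END { for (u in total) print u, total[u] }' |
        LC_ALL=C sort
}
```

A log file can then be processed with `map_urls < access.log | reduce_counts`, where `access.log` is a placeholder name. The trailing `sort` is there only to make the output order deterministic.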
• Manta allows for an arbitrarily scalable variant of
McIlroy’s solution to Bentley’s challenge:
mfind -t o /bcantrill/public/v7/usr/man | \
mjob create -o -m "tr -cs A-Za-z '\n' | \
tr A-Z a-z | sort | uniq -c" -r \
"awk '{ x[\$2] += \$1 }
END { for (w in x) { print x[w] \" \" w } }' | \
sort -rn | sed ${1}q"
• This description is not only terse, it is high performing: data
is left at rest — with the “map” phase doing heavy
reduction of the data stream
• As such, Manta — like Unix — is not merely syntactic
sugar; it converges compute and data in a new way
Manta: Unix for Big Data — and IoT
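The structure of the job above can be tried without a Manta account by treating local files as stand-ins for objects. This is a local emulation under that assumption, not how Manta schedules work; the function names are hypothetical. It shows why the reduce phase needs the `awk` step: the same word arrives with a partial count from every object.

```shell
#!/bin/sh
# Map phase: runs once per object (here, once per local file) and emits
# partial "count word" pairs for that object alone.
map_words() {
    LC_ALL=C tr -cs A-Za-z '\n' < "$1" | LC_ALL=C tr A-Z a-z |
        LC_ALL=C sort | uniq -c
}

# Reduce phase: sum the partial counts per word, then rank and truncate.
reduce_words() {
    awk '{ x[$2] += $1 } END { for (w in x) print x[w] " " w }' |
        sort -rn | sed "${1}q"
}

# Local stand-in for the job: map every file, feed all output to one reducer.
# Usage: local_job N FILE...
local_job() {
    n=$1
    shift
    for f in "$@"; do map_words "$f"; done | reduce_words "$n"
}
```

Running `local_job 10 *.txt` prints the ten most frequent words across the files. The map output is far smaller than the input, which is the "heavy reduction of the data stream" described above.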
• Eventual consistency represents the wrong CAP
tradeoffs for most; we prefer consistency over
availability for writes (but still availability for reads)
• Many more details:
http://dtrace.org/blogs/dap/2013/07/03/fault-tolerance-in-manta/
• Celebrity endorsement:
Manta: CAP tradeoffs
• Hierarchical storage is an excellent idea (ht: Multics);
Manta implements proper directories, delimited with a
forward slash
• Manta implements a snapshot/link hybrid dubbed a
snaplink; can be used to effect versioning
• Manta has full support for CORS headers
• Manta uses SSH-based HTTP auth for client-side
tooling (IETF draft-cavage-http-signatures-00)
• Manta SDKs exist for node.js, R, Go, Java, Ruby,
Python — and of course, compute jobs may be in any of
these (plus Perl, Clojure, Lisp, Erlang, Forth, Prolog,
Fortran, Haskell, Lua, Mono, COBOL, etc.)
• “npm install manta” for command line interface
Manta: Other design principles
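As a rough sketch of how directories and snaplinks look from the command line, the following uses the `mmkdir`, `mput`, `mln` and `mls` tools from the node-manta package. The account name, directory and file names are hypothetical, the exact options should be checked against the Manta documentation, and the `run` wrapper (an assumption of this sketch) only prints each command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Print each command instead of executing it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: manta_tour ACCOUNT LOCAL_FILE   (both hypothetical)
manta_tour() {
    base="/$1/stor/sensors"
    run mmkdir -p "$base"                                 # slash-delimited directory
    run mput -f "$2" "$base/readings.csv"                 # upload a local file as an object
    run mln "$base/readings.csv" "$base/readings.v1.csv"  # snaplink, kept as a version
    run mls "$base"                                       # list the directory
}
```

For example, `manta_tour alice readings.csv` prints the four commands. Because a snaplink behaves like a snapshot of the object, a later `mput` to `readings.csv` would leave `readings.v1.csv` pointing at the earlier contents.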
• We believe compute/data convergence to be a
constraint imposed by IoT: stores of record must support
computation as a first-class, in situ operation
• We believe that some (and perhaps many) IoT
workloads will require computing at the edge — internet
transit may be prohibitive for certain applications
• We believe that Unix is a natural way of expressing this
computation — and that OS containers are the right way
to support this securely
• We believe that ZFS is the only sane storage
underpinning for such a system
• Manta will surely not be the only system to represent the
confluence of these — but it is the first
Manta and IoT
• Product page:
http://joyent.com/products/manta
• node.js module:
https://github.com/joyent/node-manta
• Manta documentation:
http://apidocs.joyent.com/manta/
• IRC, e-mail, Twitter, etc.:
#manta on freenode, manta@joyent.com, @mcavage,
@dapsays, @yunongx, @joyent
Manta: More information

Mais conteúdo relacionado

Mais procurados

The Peril and Promise of Early Adoption: Arriving 10 Years Early to Containers
The Peril and Promise of Early Adoption: Arriving 10 Years Early to ContainersThe Peril and Promise of Early Adoption: Arriving 10 Years Early to Containers
The Peril and Promise of Early Adoption: Arriving 10 Years Early to Containersbcantrill
 
Dynamic Languages in Production: Progress and Open Challenges
Dynamic Languages in Production: Progress and Open ChallengesDynamic Languages in Production: Progress and Open Challenges
Dynamic Languages in Production: Progress and Open Challengesbcantrill
 
Oral tradition in software engineering: Passing the craft across generations
Oral tradition in software engineering: Passing the craft across generationsOral tradition in software engineering: Passing the craft across generations
Oral tradition in software engineering: Passing the craft across generationsbcantrill
 
Platform as reflection of values: Joyent, node.js, and beyond
Platform as reflection of values: Joyent, node.js, and beyondPlatform as reflection of values: Joyent, node.js, and beyond
Platform as reflection of values: Joyent, node.js, and beyondbcantrill
 
Debugging microservices in production
Debugging microservices in productionDebugging microservices in production
Debugging microservices in productionbcantrill
 
Bringing the Unix Philosophy to Big Data
Bringing the Unix Philosophy to Big DataBringing the Unix Philosophy to Big Data
Bringing the Unix Philosophy to Big Databcantrill
 
The Container Revolution: Reflections after the first decade
The Container Revolution: Reflections after the first decadeThe Container Revolution: Reflections after the first decade
The Container Revolution: Reflections after the first decadebcantrill
 
Zebras all the way down: The engineering challenges of the data path
Zebras all the way down: The engineering challenges of the data pathZebras all the way down: The engineering challenges of the data path
Zebras all the way down: The engineering challenges of the data pathbcantrill
 
Docker's Killer Feature: The Remote API
Docker's Killer Feature: The Remote APIDocker's Killer Feature: The Remote API
Docker's Killer Feature: The Remote APIbcantrill
 
Debugging under fire: Keeping your head when systems have lost their mind
Debugging under fire: Keeping your head when systems have lost their mindDebugging under fire: Keeping your head when systems have lost their mind
Debugging under fire: Keeping your head when systems have lost their mindbcantrill
 
server to cloud: converting a legacy platform to an open source paas
server to cloud:  converting a legacy platform to an open source paasserver to cloud:  converting a legacy platform to an open source paas
server to cloud: converting a legacy platform to an open source paasTodd Fritz
 
node.js in production: Reflections on three years of riding the unicorn
node.js in production: Reflections on three years of riding the unicornnode.js in production: Reflections on three years of riding the unicorn
node.js in production: Reflections on three years of riding the unicornbcantrill
 
Manta: a new internet-facing object storage facility that features compute by...
Manta: a new internet-facing object storage facility that features compute by...Manta: a new internet-facing object storage facility that features compute by...
Manta: a new internet-facing object storage facility that features compute by...Hakka Labs
 
The State of Cloud 2016: The whirlwind of creative destruction
The State of Cloud 2016: The whirlwind of creative destructionThe State of Cloud 2016: The whirlwind of creative destruction
The State of Cloud 2016: The whirlwind of creative destructionbcantrill
 
Debugging (Docker) containers in production
Debugging (Docker) containers in productionDebugging (Docker) containers in production
Debugging (Docker) containers in productionbcantrill
 
Crash Course in Open Source Cloud Computing
Crash Course in Open Source Cloud ComputingCrash Course in Open Source Cloud Computing
Crash Course in Open Source Cloud ComputingMark Hinkle
 
BayLISA meetup: 8/16/12
BayLISA meetup: 8/16/12BayLISA meetup: 8/16/12
BayLISA meetup: 8/16/12bcantrill
 
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander DibboOpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander DibboOpenNebula Project
 
OSCON 2014 - Crash Course in Open Source Cloud Computing
OSCON 2014 -  Crash Course in Open Source Cloud ComputingOSCON 2014 -  Crash Course in Open Source Cloud Computing
OSCON 2014 - Crash Course in Open Source Cloud ComputingMark Hinkle
 
LinuxFest Northwest: Crash Course in Open Source Cloud Computing
LinuxFest Northwest: Crash Course in Open Source Cloud Computing LinuxFest Northwest: Crash Course in Open Source Cloud Computing
LinuxFest Northwest: Crash Course in Open Source Cloud Computing Mark Hinkle
 

Mais procurados (20)

The Peril and Promise of Early Adoption: Arriving 10 Years Early to Containers
The Peril and Promise of Early Adoption: Arriving 10 Years Early to ContainersThe Peril and Promise of Early Adoption: Arriving 10 Years Early to Containers
The Peril and Promise of Early Adoption: Arriving 10 Years Early to Containers
 
Dynamic Languages in Production: Progress and Open Challenges
Dynamic Languages in Production: Progress and Open ChallengesDynamic Languages in Production: Progress and Open Challenges
Dynamic Languages in Production: Progress and Open Challenges
 
Oral tradition in software engineering: Passing the craft across generations
Oral tradition in software engineering: Passing the craft across generationsOral tradition in software engineering: Passing the craft across generations
Oral tradition in software engineering: Passing the craft across generations
 
Platform as reflection of values: Joyent, node.js, and beyond
Platform as reflection of values: Joyent, node.js, and beyondPlatform as reflection of values: Joyent, node.js, and beyond
Platform as reflection of values: Joyent, node.js, and beyond
 
Debugging microservices in production
Debugging microservices in productionDebugging microservices in production
Debugging microservices in production
 
Bringing the Unix Philosophy to Big Data
Bringing the Unix Philosophy to Big DataBringing the Unix Philosophy to Big Data
Bringing the Unix Philosophy to Big Data
 
The Container Revolution: Reflections after the first decade
The Container Revolution: Reflections after the first decadeThe Container Revolution: Reflections after the first decade
The Container Revolution: Reflections after the first decade
 
Zebras all the way down: The engineering challenges of the data path
Zebras all the way down: The engineering challenges of the data pathZebras all the way down: The engineering challenges of the data path
Zebras all the way down: The engineering challenges of the data path
 
Docker's Killer Feature: The Remote API
Docker's Killer Feature: The Remote APIDocker's Killer Feature: The Remote API
Docker's Killer Feature: The Remote API
 
Debugging under fire: Keeping your head when systems have lost their mind
Debugging under fire: Keeping your head when systems have lost their mindDebugging under fire: Keeping your head when systems have lost their mind
Debugging under fire: Keeping your head when systems have lost their mind
 
server to cloud: converting a legacy platform to an open source paas
server to cloud:  converting a legacy platform to an open source paasserver to cloud:  converting a legacy platform to an open source paas
server to cloud: converting a legacy platform to an open source paas
 
node.js in production: Reflections on three years of riding the unicorn
node.js in production: Reflections on three years of riding the unicornnode.js in production: Reflections on three years of riding the unicorn
node.js in production: Reflections on three years of riding the unicorn
 
Manta: a new internet-facing object storage facility that features compute by...
Manta: a new internet-facing object storage facility that features compute by...Manta: a new internet-facing object storage facility that features compute by...
Manta: a new internet-facing object storage facility that features compute by...
 
The State of Cloud 2016: The whirlwind of creative destruction
The State of Cloud 2016: The whirlwind of creative destructionThe State of Cloud 2016: The whirlwind of creative destruction
The State of Cloud 2016: The whirlwind of creative destruction
 
Debugging (Docker) containers in production
Debugging (Docker) containers in productionDebugging (Docker) containers in production
Debugging (Docker) containers in production
 
Crash Course in Open Source Cloud Computing
Crash Course in Open Source Cloud ComputingCrash Course in Open Source Cloud Computing
Crash Course in Open Source Cloud Computing
 
BayLISA meetup: 8/16/12
BayLISA meetup: 8/16/12BayLISA meetup: 8/16/12
BayLISA meetup: 8/16/12
 
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander DibboOpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
 
OSCON 2014 - Crash Course in Open Source Cloud Computing
OSCON 2014 -  Crash Course in Open Source Cloud ComputingOSCON 2014 -  Crash Course in Open Source Cloud Computing
OSCON 2014 - Crash Course in Open Source Cloud Computing
 
LinuxFest Northwest: Crash Course in Open Source Cloud Computing
LinuxFest Northwest: Crash Course in Open Source Cloud Computing LinuxFest Northwest: Crash Course in Open Source Cloud Computing
LinuxFest Northwest: Crash Course in Open Source Cloud Computing
 

Semelhante a The Internet-of-things: Architecting for the deluge of data

ITI015En-The evolution of databases (I)
ITI015En-The evolution of databases (I)ITI015En-The evolution of databases (I)
ITI015En-The evolution of databases (I)Huibert Aalbers
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoopMohit Tare
 
What ya gonna do?
What ya gonna do?What ya gonna do?
What ya gonna do?CQD
 
Journey to the Programmable Data Center
Journey to the Programmable Data CenterJourney to the Programmable Data Center
Journey to the Programmable Data CenterToby Weiss
 
Data Lake and the rise of the microservices
Data Lake and the rise of the microservicesData Lake and the rise of the microservices
Data Lake and the rise of the microservicesBigstep
 
Latest trendsincloud computing
Latest trendsincloud computingLatest trendsincloud computing
Latest trendsincloud computingLiliana Ignat
 
Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-HadoopNagarjuna D.N
 
Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications OpenEBS
 
Lecture 3.31 3.32.pptx
Lecture 3.31  3.32.pptxLecture 3.31  3.32.pptx
Lecture 3.31 3.32.pptxRATISHKUMAR32
 
How Open Source is Transforming the Internet. Again.
How Open Source is Transforming the Internet. Again.How Open Source is Transforming the Internet. Again.
How Open Source is Transforming the Internet. Again.Steve Hoffman
 
The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data
The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data
The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data DataCentred
 
Sameer Mitter | Introduction to Cloud computing
Sameer Mitter | Introduction to Cloud computingSameer Mitter | Introduction to Cloud computing
Sameer Mitter | Introduction to Cloud computingSameer Mitter
 
Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...David Wallom
 
The elephantintheroom bigdataanalyticsinthecloud
The elephantintheroom bigdataanalyticsinthecloudThe elephantintheroom bigdataanalyticsinthecloud
The elephantintheroom bigdataanalyticsinthecloudKhazret Sapenov
 
CloudComputingJun28.ppt
CloudComputingJun28.pptCloudComputingJun28.ppt
CloudComputingJun28.pptVipin Singhal
 

Semelhante a The Internet-of-things: Architecting for the deluge of data (20)

ITI015En-The evolution of databases (I)
ITI015En-The evolution of databases (I)ITI015En-The evolution of databases (I)
ITI015En-The evolution of databases (I)
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
What ya gonna do?
What ya gonna do?What ya gonna do?
What ya gonna do?
 
Chapter 5(2).pdf
Chapter 5(2).pdfChapter 5(2).pdf
Chapter 5(2).pdf
 
Cloud computingjun28
Cloud computingjun28Cloud computingjun28
Cloud computingjun28
 
Cloud computingjun28
Cloud computingjun28Cloud computingjun28
Cloud computingjun28
 
Journey to the Programmable Data Center
Journey to the Programmable Data CenterJourney to the Programmable Data Center
Journey to the Programmable Data Center
 
Data Lake and the rise of the microservices
Data Lake and the rise of the microservicesData Lake and the rise of the microservices
Data Lake and the rise of the microservices
 
Latest trendsincloud computing
Latest trendsincloud computingLatest trendsincloud computing
Latest trendsincloud computing
 
cloudcomputing.pptx
cloudcomputing.pptxcloudcomputing.pptx
cloudcomputing.pptx
 
Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-Hadoop
 
Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications
 
Lecture 3.31 3.32.pptx
Lecture 3.31  3.32.pptxLecture 3.31  3.32.pptx
Lecture 3.31 3.32.pptx
 
How Open Source is Transforming the Internet. Again.
How Open Source is Transforming the Internet. Again.How Open Source is Transforming the Internet. Again.
How Open Source is Transforming the Internet. Again.
 
The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data
The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data
The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data
 
Sameer Mitter | Introduction to Cloud computing
Sameer Mitter | Introduction to Cloud computingSameer Mitter | Introduction to Cloud computing
Sameer Mitter | Introduction to Cloud computing
 
cloud.ppt
cloud.pptcloud.ppt
cloud.ppt
 
Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...
 
The elephantintheroom bigdataanalyticsinthecloud
The elephantintheroom bigdataanalyticsinthecloudThe elephantintheroom bigdataanalyticsinthecloud
The elephantintheroom bigdataanalyticsinthecloud
 
CloudComputingJun28.ppt
CloudComputingJun28.pptCloudComputingJun28.ppt
CloudComputingJun28.ppt
 

The Internet-of-things: Architecting for the deluge of data

  • 1. The Internet-of-things: Architecting for the deluge of data • Bryan Cantrill, CTO (bryan@joyent.com, @bcantrill)
  • 2. Big Data circa 1994: Pre-Internet Source: BusinessWeek, September 5, 1994
  • 3. Aside: Internet circa 1994 Source: BusinessWeek, October 10, 1994
  • 4. Big Data circa 2004: Internet exhaust • Through the 1990s, Moore’s Law + Kryder’s Law grew faster than transaction rates, and what was “overwhelming” in 1994 was manageable by 2004 • But large internet concerns (Google, Facebook, Yahoo!) encountered a new class of problem: analyzing massive amounts of data emitted as a byproduct of activity • Data scaled with activity, not transactions — changing both data sizes and economics • Data sizes were too large for extant data warehousing solutions — and were embarrassingly parallel besides
  • 5. Big Data circa 2004: MapReduce • MapReduce, pioneered by Google and later emulated by Hadoop, pointed to a new paradigm where compute tasks are broken into map and reduce phases • Serves to explicitly divide the work that can be parallelized from that which must be run sequentially • Map phases are farmed out to a storage layer that attempts to co-locate them with the data being mapped • Made for commodity scale-out systems; relatively cheap storage allowed for sloppy but effective solutions (e.g. storing data in triplicate)
  • 6. Big Data circa 2014 • Hadoop has become the de facto big data processing engine — and HDFS the de facto storage substrate • But HDFS is designed around availability during/for computation; it is not designed to be authoritative • HDFS is used primarily for data that is redundant, transient, replaceable or otherwise fungible • Authoritative storage remains either enterprise storage (on premises) or object storage (in the cloud) • For analysis of non-fungible data, pattern is to ingest data into a Hadoop cluster from authoritative storage • But a new set of problems is poised to emerge...
  • 7. Big Data circa 2014: Internet-of-things • IDC forecasts that the “digital universe” will grow from 130 exabytes in 2005 to 40,000 exabytes in 2020, with as much as a third having “analytic value” • This doesn’t even factor in the (long forecasted) rise of the internet-of-things/industrial internet... • Machine-generated data at the edge will effect a step function in data sizes and processing methodologies • No one really knows how much data will be generated by IoT, but the numbers are insane (e.g., an HD camera generates 20 GB/hour; a Ford Energi engine generates 25 GB/hour; a GE jet engine generates 1 TB/flight)
  • 8. How to cope with IoT-generated data? • IoT presents so much more data that we will increasingly need data science to make sense of it • To assure data, we need to retain as much raw data as possible, storing it once and authoritatively • Storing data authoritatively has ramifications for the storage substrate • To allow for science, we need to place an emphasis on hypothesis exploration: it must be quick to iterate from hypothesis to experiment to result to new hypothesis • Emphasizing hypothesis exploration has ramifications for the compute abstractions and data movement
  • 9. The coming ramifications of IoT • It will no longer be acceptable to discard data: all data will need to be retained to explore future hypotheses • It will no longer be acceptable to store three copies: 3X on storage costs is too acute when data is massive • It will no longer be acceptable to move data for analysis: in some cases, not even over the internet! • It will no longer be acceptable to dictate the abstraction: we must accommodate anything that can process data • These shifts are as significant as the shift from traditional data warehousing to scale-out MapReduce!
  • 10. IoT: Authoritative storage? • “Filesystems” that are really just user-level programs layered on local filesystems lack device-level visibility, sacrificing reliability and performance • Even in-kernel, we have seen the corrosiveness of an abstraction divide in the historic divide between logical volume management and the filesystem: • The volume manager understands multiple disks, but nothing of the higher level semantics of the filesystem • The filesystem understands the higher semantics of the data, but has no physical device understanding • This divide became entrenched over the 1990s, and had devastating ramifications for reliability and performance
  • 11. The ZFS revolution • Starting in 2001, Sun began a revolutionary new software effort: to unify storage and eliminate the divide • In this model, filesystems would lose their one-to-one association with devices: many filesystems would be multiplexed on many devices • By starting with a clean sheet of paper, ZFS opened up vistas of innovation — and by its architecture was able to solve many otherwise intractable problems • Sun shipped ZFS in 2005, and used it as the foundation of its enterprise storage products starting in 2008 • ZFS was open sourced in 2005; it remains the only open source enterprise-grade filesystem
  • 12. ZFS advantages • Copy-on-write design allows on-disk consistency to be always assured (eliminating file system check) • Copy-on-write design allows constant-time snapshots in unlimited quantity — and writable clones! • Filesystem architecture allows filesystems to be created instantly and expanded — or shrunk! — on-the-fly • Integrated volume management allows for intelligent device behavior with respect to disk failure and recovery • Adaptive replacement cache (ARC) allows for optimal use of DRAM — especially on high DRAM systems • Support for dedicated log and cache devices allows for optimal use of flash-based SSDs
  • 13. ZFS at Joyent • Joyent was the earliest ZFS adopter: becoming (in 2005) the first production user of ZFS outside of Sun • ZFS is one of the four foundational technologies of Joyent’s SmartOS, our illumos derivative • The other three foundational technologies in SmartOS are DTrace, Zones and KVM • Search “fork yeah illumos” for the (uncensored) history of OpenSolaris, illumos, SmartOS and derivatives • Joyent has extended ZFS to provide better support for multi-tenant operation with I/O throttling
  • 14. ZFS as the basis for IoT? • ZFS offers commodity hardware economics with enterprise-grade reliability — and obviates the need for cross-machine mirroring for durability • But ZFS is not itself a scale-out distributed system, and is ill suited to become one • Conclusion: ZFS is a good building block for the data explosion from IoT, but not the whole puzzle
  • 15. IoT: Compute abstraction? • To facilitate hypothesis exploration, we need to carefully consider the abstraction for computation • How is data exploration programmatically expressed? • How can this be made to be multi-tenant? • The key enabling technology for multi-tenancy is virtualization — but where in the stack to virtualize?
  • 16. Hardware-level virtualization? • The historical answer, since the 1960s, has been to virtualize at the level of the hardware: • A virtual machine is presented upon which each tenant runs an operating system of their choosing • There are as many operating systems as tenants • The historical motivation for hardware virtualization remains its advantage today: it can run entire legacy stacks unmodified • However, hardware virtualization exacts a heavy toll: operating systems are not designed to share resources like DRAM, CPU, I/O devices or the network • Hardware virtualization limits tenancy and inhibits performance!
  • 17. Platform-level virtualization? • Virtualizing at the application platform layer addresses the tenancy challenges of hardware virtualization… • ...but at the cost of dictating abstraction to the developer • With IoT, this is especially problematic: we can expect much more analog data and much deeper numerical analysis, along with dependencies on native libraries and/or domain-specific languages • Virtualizing at the application platform layer poses many other challenges: • Security, resource containment, language specificity, environment-specific engineering costs
  • 18. Joyent’s solution: OS containers • Containers virtualize the OS and hit the sweet spot: • Single OS (single kernel) allows for efficient use of hardware resources, and therefore allows load factors to be high • Disjoint instances are securely compartmentalized by the operating system • Gives customers what appears to be a virtual machine (albeit a very fast one) on which to run higher-level software • Gives customers PaaS when the abstractions work for them, IaaS when they need more generality • OS-level virtualization allows for high levels of tenancy without dictating abstraction or sacrificing efficiency • Zones is a bullet-proof implementation of OS-level virtualization, and is the core abstraction in Joyent’s SmartOS
  • 19. Idea: ZFS + Containers?
  • 20. Manta: ZFS + Containers! • Building a sophisticated distributed system on top of ZFS and zones, we have built Manta, an internet-facing object storage system offering in situ compute • That is, the description of compute can be brought to where objects reside instead of having to backhaul objects to transient compute • The abstractions made available for computation are anything that can run on the OS... • ...and as a reminder, the OS (Unix) was built around the notion of ad hoc unstructured data processing, and allows for remarkably terse expressions of computation
  • 21. Aside: Unix • When Unix appeared in the early 1970s, it was not just a new system, but a new way of thinking about systems • Instead of a sealed monolith, the operating system was a collection of small, easily understood programs • First Edition Unix (1971) contained many programs that we still use today (ls, rm, cat, mv) • Its very name conveyed this minimalist aesthetic: Unix is a homophone of “eunuchs”, a castrated Multics • “We were a bit oppressed by the big system mentality. Ken wanted to do something simple.” (Dennis Ritchie)
  • 22. Unix: Let there be light • In 1969, Doug McIlroy had the idea of connecting different components: “At the same time that Thompson and Ritchie were sketching out a file system, I was sketching out how to do data processing on the blackboard by connecting together cascades of processes” • This was the primordial pipe, but it took three years to persuade Thompson to adopt it: “And one day I came up with a syntax for the shell that went along with the piping, and Ken said, ‘I’m going to do it!’”
  • 23. Unix: ...and there was light • “And the next morning we had this orgy of one-liners.” (Doug McIlroy)
  • 24. The Unix philosophy • The pipe — coupled with the small-system aesthetic — gave rise to the Unix philosophy, as articulated by Doug McIlroy: • Write programs that do one thing and do it well • Write programs to work together • Write programs that handle text streams, because that is a universal interface • Four decades later, this philosophy remains the single most important revolution in software systems thinking!
  • 25. Doug McIlroy v. Don Knuth: FIGHT! • In 1986, Jon Bentley posed the challenge that became the Epic Rap Battle of computer science history: Read a file of text, determine the n most frequently used words, and print out a sorted list of those words along with their frequencies. • Don Knuth’s solution: an elaborate program in WEB, a Pascal-like literate programming system of his own invention, using a purpose-built algorithm • Doug McIlroy’s solution shows the power of the Unix philosophy: tr -cs A-Za-z '\n' | tr A-Z a-z | sort | uniq -c | sort -rn | sed ${1}q
  • 26. Big Data: History repeats itself? • The original Google MapReduce paper (Dean and Ghemawat, OSDI ’04) poses a problem disturbingly similar to Bentley’s challenge nearly two decades prior: Count of URL Access Frequency: The map function processes logs of web page requests and outputs ⟨URL, 1⟩. The reduce function adds together all values for the same URL and emits a ⟨URL, total count⟩ pair • But the solutions do not adhere to the Unix philosophy... • ...nor do they make use of the substantial Unix foundation for data processing • e.g., Appendix A of the OSDI ’04 paper has a 71 line word count in C++, with nary a wc in sight
  • 27. Manta: Unix for Big Data (and IoT) • Manta allows for an arbitrarily scalable variant of McIlroy’s solution to Bentley’s challenge: mfind -t o /bcantrill/public/v7/usr/man | mjob create -o -m "tr -cs A-Za-z '\n' | tr A-Z a-z | sort | uniq -c" -r "awk '{ x[\$2] += \$1 } END { for (w in x) { print x[w] \" \" w } }' | sort -rn | sed ${1}q" • This description is not only terse, it is high performing: data is left at rest, with the “map” phase doing heavy reduction of the data stream • As such, Manta, like Unix, is not merely syntactic sugar; it converges compute and data in a new way
  • 28. Manta: CAP tradeoffs • Eventual consistency represents the wrong CAP tradeoffs for most; we prefer consistency over availability for writes (but still availability for reads) • Many more details: http://dtrace.org/blogs/dap/2013/07/03/fault-tolerance-in-manta/ • Celebrity endorsement:
  • 29. Manta: Other design principles • Hierarchical storage is an excellent idea (ht: Multics); Manta implements proper directories, delimited with a forward slash • Manta implements a snapshot/link hybrid dubbed a snaplink; can be used to effect versioning • Manta has full support for CORS headers • Manta uses SSH-based HTTP auth for client-side tooling (IETF draft-cavage-http-signatures-00) • Manta SDKs exist for node.js, R, Go, Java, Ruby, Python, and of course compute jobs may be in any of these (plus Perl, Clojure, Lisp, Erlang, Forth, Prolog, Fortran, Haskell, Lua, Mono, COBOL, etc.) • “npm install manta” for command line interface
  • 30. Manta and IoT • We believe compute/data convergence to be a constraint imposed by IoT: stores of record must support computation as a first-class, in situ operation • We believe that some (and perhaps many) IoT workloads will require computing at the edge; internet transit may be prohibitive for certain applications • We believe that Unix is a natural way of expressing this computation, and that OS containers are the right way to support this securely • We believe that ZFS is the only sane storage underpinning for such a system • Manta will surely not be the only system to represent the confluence of these, but it is the first
  • 31. Manta: More information • Product page: http://joyent.com/products/manta • node.js module: https://github.com/joyent/node-manta • Manta documentation: http://apidocs.joyent.com/manta/ • IRC, e-mail, Twitter, etc.: #manta on freenode, manta@joyent.com, @mcavage, @dapsays, @yunongx, @joyent
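
To make the figures on slides 7 and 9 concrete, here is a rough back-of-the-envelope sketch in shell arithmetic. It assumes a source that emits data around the clock at a fixed hourly rate, which is a simplification; the function names `gb_per_year` and `raw_gb` are illustrative and do not come from the deck.

```shell
#!/bin/sh
# gb_per_year RATE: yearly volume in GB for a source that emits RATE GB/hour
# continuously (an assumption; real devices are rarely on all year).
gb_per_year() {
    echo $(( $1 * 24 * 365 ))
}

# raw_gb LOGICAL COPIES: raw capacity in GB needed to hold LOGICAL GB
# when every byte is stored COPIES times (e.g. 3 for triplicate storage).
raw_gb() {
    echo $(( $1 * $2 ))
}
```

Under these assumptions, `gb_per_year 20` puts one HD camera at 175200 GB (about 175 TB) per year, and `raw_gb 175200 3` shows that storing it in triplicate takes 525600 GB of raw capacity for that single device.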
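
McIlroy's one-liner from slide 25 can be wrapped in a shell function so that each stage is easier to read. This is only a commented restatement of the same pipeline; the function name `wordfreq` is chosen here for illustration.

```shell
#!/bin/sh
# wordfreq N: read text on stdin, print the N most frequent words,
# each preceded by its count, most frequent first.
wordfreq() {
    tr -cs 'A-Za-z' '\n' |   # turn every run of non-letters into one newline
    tr 'A-Z' 'a-z' |         # fold upper case to lower case
    sort |                   # bring identical words next to each other
    uniq -c |                # collapse each run into "count word"
    sort -rn |               # order by count, descending
    sed "${1}q"              # print the first N lines, then quit
}
```

For example, `wordfreq 10 < some-file.txt` prints the ten most frequent words in a file. Input that starts with a non-letter yields one empty "word", a quirk the original pipeline shares.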
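
The URL access count quoted on slide 26 can also be sketched with standard Unix tools. The sketch below assumes, purely for illustration, that each log line carries the URL in its first whitespace-separated field; the function names `url_map` and `url_reduce` are hypothetical.

```shell
#!/bin/sh
# url_map: for every request line on stdin, emit "<URL> 1".
# Assumes the URL is the first whitespace-separated field of the line.
url_map() {
    awk '{ print $1, 1 }'
}

# url_reduce: add together all values seen for the same URL and
# emit "<URL> <total count>" (output order is unspecified).
url_reduce() {
    awk '{ total[$1] += $2 } END { for (u in total) print u, total[u] }'
}
```

A run such as `url_map < access.log | url_reduce | sort` then produces one line per URL with its total.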
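
The map/reduce split in the Manta job on slide 27 can be tried on a single machine without Manta. The following sketch runs the map pipeline once per local file and feeds the combined output to the reduce pipeline. It does not use `mfind` or `mjob`, and the function names `map_phase`, `reduce_phase` and `local_job` are hypothetical stand-ins, not part of Manta.

```shell
#!/bin/sh
# map_phase FILE: per-object word counts, one "count word" line per word.
map_phase() {
    tr -cs 'A-Za-z' '\n' < "$1" | tr 'A-Z' 'a-z' | sort | uniq -c
}

# reduce_phase N: sum the per-object counts for each word and keep
# the N most frequent words.
reduce_phase() {
    awk '{ x[$2] += $1 } END { for (w in x) print x[w] " " w }' |
    sort -rn |
    sed "${1}q"
}

# local_job N FILE...: run the map phase over each file in turn, then
# reduce the concatenated map output. A real job would run the map
# phases in parallel, next to where each object is stored.
local_job() {
    n=$1
    shift
    for f in "$@"; do
        map_phase "$f"
    done | reduce_phase "$n"
}
```

Because each map phase already collapses its file to one line per distinct word, the reduce phase only has to merge small per-file summaries, which matches the slide's point about the map phase doing heavy reduction of the data stream.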