SlideShare uma empresa Scribd logo
1 de 33
My other computer is  a datacentre Steve Loughran Julio Guijarro December 2010
Our other computer is a datacentre
His  other computer is a datacentre ,[object Object],[object Object],[object Object],code.google.com
His  other computer is a datacentre ,[object Object],[object Object],[object Object]
Their  other computer is a datacentre Owen and Arun at Yahoo!
That sorts a Terabyte in 82s
This datacentre Yahoo! 8000 nodes, 32K cores, 16 Petabytes
His  other computer is a datacentre Dhruba at Facebook!
Problem: Big Data Storing and processing PB of data
Cost-effective storage of Petabytes ,[object Object],[object Object],[object Object],[object Object],[object Object]
Big Data vs HPC Big Data : Petabytes ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],HPC: petaflops ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
There are no free electrons
Hardware
Power Concerns ,[object Object],[object Object],[object Object],[object Object]
Network Fabric ,[object Object],[object Object],[object Object],[object Object],Bandwidth between racks is a bottleneck
Where? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Oregon and Washington States
Trend: containerized clusters
 
High Availability ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Web Front End ,[object Object],[object Object],[object Object],[object Object],[object Object]
Work engine ,[object Object],[object Object],[object Object],[object Object],[object Object]
MapReduce: Hadoop
Highbury Vaults Bluetooth Dataset ,[object Object],[object Object],[object Object],[object Object],{lost,"00:0F:B3:92:05:D3","2008-04-17T22:11:15",1124313075} {found,"00:0F:B3:92:05:D3","2008-04-17T22:11:29",1124313089} {lost,"00:0F:B3:92:05:D3","2008-04-17T22:24:45",1124313885} {found,"00:0F:B3:92:05:D3","2008-04-17T22:25:00",1124313900} {found,"00:60:57:70:25:0F","2008-04-17T22:29:00",1124314140}
MapReduce to Day of Week map_found_event_by_day_of_week( {event, found, Device, _, Timestamp}, Reducer) -> DayOfWeek = timestamp_to_day(Timestamp), Reducer ! {DayOfWeek, Device}. size(Key, Entries, A) -> L = length(Entries), [ {Key, L} | A]. mr_day_of_week(Source) -> mr(Source,  fun traffic:map_found_event_by_day_of_week/2, fun traffic:size/3,  []).
Results traffic:mr_day_of_week(big).  [{3,3702}, {6,3076}, {2,3747}, {5,3845}, {1,3044}, {4,3850}, {7,2274}] Monday 3044 Tuesday 3747 Wednesday 3702 Thursday 3850 Friday 3845 Saturday 3076 Sunday 2274
Hadoop running MapReduce
Filesystem ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Logging & Health Layer ,[object Object],[object Object],[object Object],Monitor and mine the infrastructure  -with the infrastructure
What will your datacentre do?
Management  Layer ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Infrastructure Layer ,[object Object],[object Object],[object Object],[object Object],[object Object]
Testing ,[object Object],[object Object],[object Object],[object Object]

Mais conteúdo relacionado

Mais procurados

Big data-at-detik
Big data-at-detikBig data-at-detik
Big data-at-detik
k4ndar
 
Hadoop: Distributed data processing
Hadoop: Distributed data processingHadoop: Distributed data processing
Hadoop: Distributed data processing
royans
 

Mais procurados (20)

Scala: the unpredicted lingua franca for data science
Scala: the unpredicted lingua franca  for data scienceScala: the unpredicted lingua franca  for data science
Scala: the unpredicted lingua franca for data science
 
Big data-at-detik
Big data-at-detikBig data-at-detik
Big data-at-detik
 
Empowering Transformational Science
Empowering Transformational ScienceEmpowering Transformational Science
Empowering Transformational Science
 
Apache Arrow and Python: The latest
Apache Arrow and Python: The latestApache Arrow and Python: The latest
Apache Arrow and Python: The latest
 
Redis
RedisRedis
Redis
 
Enabling Python to be a Better Big Data Citizen
Enabling Python to be a Better Big Data CitizenEnabling Python to be a Better Big Data Citizen
Enabling Python to be a Better Big Data Citizen
 
Large Infrastructure Monitoring At CERN by Matthias Braeger at Big Data Spain...
Large Infrastructure Monitoring At CERN by Matthias Braeger at Big Data Spain...Large Infrastructure Monitoring At CERN by Matthias Braeger at Big Data Spain...
Large Infrastructure Monitoring At CERN by Matthias Braeger at Big Data Spain...
 
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
 
Python for Big Data Analytics
Python for Big Data AnalyticsPython for Big Data Analytics
Python for Big Data Analytics
 
Fast and Scalable Python
Fast and Scalable PythonFast and Scalable Python
Fast and Scalable Python
 
Bigdata and Hadoop Bootcamp
Bigdata and Hadoop BootcampBigdata and Hadoop Bootcamp
Bigdata and Hadoop Bootcamp
 
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
 
Clouds, Grids and Data
Clouds, Grids and DataClouds, Grids and Data
Clouds, Grids and Data
 
Hadoop: Distributed data processing
Hadoop: Distributed data processingHadoop: Distributed data processing
Hadoop: Distributed data processing
 
Microsoft on Big Data
Microsoft on Big DataMicrosoft on Big Data
Microsoft on Big Data
 
PUC Masterclass Big Data
PUC Masterclass Big DataPUC Masterclass Big Data
PUC Masterclass Big Data
 
Intro to Python Data Analysis in Wakari
Intro to Python Data Analysis in WakariIntro to Python Data Analysis in Wakari
Intro to Python Data Analysis in Wakari
 
Hadoop info
Hadoop infoHadoop info
Hadoop info
 
First NL-HUG: Large-scale data processing at SARA with Apache Hadoop
First NL-HUG: Large-scale data processing at SARA with Apache HadoopFirst NL-HUG: Large-scale data processing at SARA with Apache Hadoop
First NL-HUG: Large-scale data processing at SARA with Apache Hadoop
 
Lecture6 introduction to data streams
Lecture6 introduction to data streamsLecture6 introduction to data streams
Lecture6 introduction to data streams
 

Destaque

Taming Deployment With Smart Frog
Taming Deployment With Smart FrogTaming Deployment With Smart Frog
Taming Deployment With Smart Frog
Steve Loughran
 
2013 11-19-hoya-status
2013 11-19-hoya-status2013 11-19-hoya-status
2013 11-19-hoya-status
Steve Loughran
 
Battle At Goliad
Battle At GoliadBattle At Goliad
Battle At Goliad
compd
 
A New Approach To Organization
A New Approach To OrganizationA New Approach To Organization
A New Approach To Organization
compd
 

Destaque (14)

Taming Deployment With Smart Frog
Taming Deployment With Smart FrogTaming Deployment With Smart Frog
Taming Deployment With Smart Frog
 
Extended essay overview
Extended essay overviewExtended essay overview
Extended essay overview
 
2013 11-19-hoya-status
2013 11-19-hoya-status2013 11-19-hoya-status
2013 11-19-hoya-status
 
Farms, Fabrics and Clouds
Farms, Fabrics and CloudsFarms, Fabrics and Clouds
Farms, Fabrics and Clouds
 
Economic Scheduling of Hadoop Jobs
Economic Scheduling of Hadoop JobsEconomic Scheduling of Hadoop Jobs
Economic Scheduling of Hadoop Jobs
 
H is for_hadoop
H is for_hadoopH is for_hadoop
H is for_hadoop
 
HDP-1 introduction for HUG France
HDP-1 introduction for HUG FranceHDP-1 introduction for HUG France
HDP-1 introduction for HUG France
 
Battle At Goliad
Battle At GoliadBattle At Goliad
Battle At Goliad
 
Overview of slider project
Overview of slider projectOverview of slider project
Overview of slider project
 
Graphs
GraphsGraphs
Graphs
 
A New Approach To Organization
A New Approach To OrganizationA New Approach To Organization
A New Approach To Organization
 
Scholarly articles
Scholarly articlesScholarly articles
Scholarly articles
 
Echolocation
EcholocationEcholocation
Echolocation
 
Did you really want that data?
Did you really want that data?Did you really want that data?
Did you really want that data?
 

Semelhante a My other computer_is_a_datacentre

Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Cloudera, Inc.
 
Distributed Systems: scalability and high availability
Distributed Systems: scalability and high availabilityDistributed Systems: scalability and high availability
Distributed Systems: scalability and high availability
Renato Lucindo
 
Google Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 DayGoogle Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 Day
programmermag
 
Petascale Analytics - The World of Big Data Requires Big Analytics
Petascale Analytics - The World of Big Data Requires Big AnalyticsPetascale Analytics - The World of Big Data Requires Big Analytics
Petascale Analytics - The World of Big Data Requires Big Analytics
Heiko Joerg Schick
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010
nzhang
 

Semelhante a My other computer_is_a_datacentre (20)

My other computer is a datacentre
My other computer is a datacentreMy other computer is a datacentre
My other computer is a datacentre
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedIn
 
Pilot Hadoop Towards 2500 Nodes and Cluster Redundancy
Pilot Hadoop Towards 2500 Nodes and Cluster RedundancyPilot Hadoop Towards 2500 Nodes and Cluster Redundancy
Pilot Hadoop Towards 2500 Nodes and Cluster Redundancy
 
Inroduction to Big Data
Inroduction to Big DataInroduction to Big Data
Inroduction to Big Data
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
 
Big Data - Need of Converged Data Platform
Big Data - Need of Converged Data PlatformBig Data - Need of Converged Data Platform
Big Data - Need of Converged Data Platform
 
Distributed Systems: scalability and high availability
Distributed Systems: scalability and high availabilityDistributed Systems: scalability and high availability
Distributed Systems: scalability and high availability
 
Nov 2010 HUG: Fuzzy Table - B.A.H
Nov 2010 HUG: Fuzzy Table - B.A.HNov 2010 HUG: Fuzzy Table - B.A.H
Nov 2010 HUG: Fuzzy Table - B.A.H
 
Google Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 DayGoogle Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 Day
 
Next-generation sequencing: Data mangement
Next-generation sequencing: Data mangementNext-generation sequencing: Data mangement
Next-generation sequencing: Data mangement
 
Bigdata
BigdataBigdata
Bigdata
 
Petascale Analytics - The World of Big Data Requires Big Analytics
Petascale Analytics - The World of Big Data Requires Big AnalyticsPetascale Analytics - The World of Big Data Requires Big Analytics
Petascale Analytics - The World of Big Data Requires Big Analytics
 
Bigdata processing with Spark
Bigdata processing with SparkBigdata processing with Spark
Bigdata processing with Spark
 
Big Data and Hadoop - An Introduction
Big Data and Hadoop - An IntroductionBig Data and Hadoop - An Introduction
Big Data and Hadoop - An Introduction
 
Storage for next-generation sequencing
Storage for next-generation sequencingStorage for next-generation sequencing
Storage for next-generation sequencing
 
Dataiku - hadoop ecosystem - @Epitech Paris - janvier 2014
Dataiku  - hadoop ecosystem - @Epitech Paris - janvier 2014Dataiku  - hadoop ecosystem - @Epitech Paris - janvier 2014
Dataiku - hadoop ecosystem - @Epitech Paris - janvier 2014
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010
 
My other computer is a datacentre - 2012 edition
My other computer is a datacentre - 2012 editionMy other computer is a datacentre - 2012 edition
My other computer is a datacentre - 2012 edition
 

Mais de Steve Loughran

Mais de Steve Loughran (20)

Hadoop Vectored IO
Hadoop Vectored IOHadoop Vectored IO
Hadoop Vectored IO
 
The age of rename() is over
The age of rename() is overThe age of rename() is over
The age of rename() is over
 
What does Rename Do: (detailed version)
What does Rename Do: (detailed version)What does Rename Do: (detailed version)
What does Rename Do: (detailed version)
 
Put is the new rename: San Jose Summit Edition
Put is the new rename: San Jose Summit EditionPut is the new rename: San Jose Summit Edition
Put is the new rename: San Jose Summit Edition
 
@Dissidentbot: dissent will be automated!
@Dissidentbot: dissent will be automated!@Dissidentbot: dissent will be automated!
@Dissidentbot: dissent will be automated!
 
PUT is the new rename()
PUT is the new rename()PUT is the new rename()
PUT is the new rename()
 
Extreme Programming Deployed
Extreme Programming DeployedExtreme Programming Deployed
Extreme Programming Deployed
 
Testing
TestingTesting
Testing
 
I hate mocking
I hate mockingI hate mocking
I hate mocking
 
What does rename() do?
What does rename() do?What does rename() do?
What does rename() do?
 
Dancing Elephants: Working with Object Storage in Apache Spark and Hive
Dancing Elephants: Working with Object Storage in Apache Spark and HiveDancing Elephants: Working with Object Storage in Apache Spark and Hive
Dancing Elephants: Working with Object Storage in Apache Spark and Hive
 
Apache Spark and Object Stores —for London Spark User Group
Apache Spark and Object Stores —for London Spark User GroupApache Spark and Object Stores —for London Spark User Group
Apache Spark and Object Stores —for London Spark User Group
 
Spark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object storesSpark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object stores
 
Hadoop, Hive, Spark and Object Stores
Hadoop, Hive, Spark and Object StoresHadoop, Hive, Spark and Object Stores
Hadoop, Hive, Spark and Object Stores
 
Apache Spark and Object Stores
Apache Spark and Object StoresApache Spark and Object Stores
Apache Spark and Object Stores
 
Household INFOSEC in a Post-Sony Era
Household INFOSEC in a Post-Sony EraHousehold INFOSEC in a Post-Sony Era
Household INFOSEC in a Post-Sony Era
 
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 editionHadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
 
Hadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the GateHadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the Gate
 
Slider: Applications on YARN
Slider: Applications on YARNSlider: Applications on YARN
Slider: Applications on YARN
 
YARN Services
YARN ServicesYARN Services
YARN Services
 

Último

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Último (20)

WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 

My other computer_is_a_datacentre

Notas do Editor

  1. This is the view from a motel in Hood River, Oregon. Behind the camera, about 15 + miles is Google's Dalles facility. Further up river, are the MS and amazon datacentres. Beyond that, the Hanford reservation where U238 extraction took place in the Manhattan Project. Why? Hydroelectric power.