An Overview of Cloud Computing: My Other Computer is a Data Center
Robert Grossman, Open Cloud Consortium
January 7, 2010
Part 1. What is a Cloud?
What is a Cloud? Software as a Service (SaaS)
What Else is a Cloud? Platform as a Service (PaaS)
Is Anything Else a Cloud? Infrastructure as a Service (IaaS)
Are There Other Types of Clouds? Large Data Cloud Services, e.g. ad targeting
What is Virtualization?
Idea Dates Back to the 1960s
Virtualization was first widely deployed with IBM VM/370.
[Diagram: applications running on guest operating systems (CMS, CMS, MVS) on top of IBM VM/370 on an IBM mainframe.]
Native (full) virtualization. Example: VMware ESX.
What Do You Optimize? Goal: Minimize latency and control heat. Goal: Maximize data (with matching compute) and control cost.
Scale Is New
Elastic, Usage-Based Pricing Is New: 1 computer in a rack for 120 hours costs the same as 120 computers in three racks for 1 hour.
Clouds can be used to manage surges in computing needs.
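The arithmetic behind the elastic pricing claim can be sketched in a few lines. This is only an illustration: it assumes pure usage-based pricing (cost proportional to machine-hours), and the hourly rate is a made-up placeholder, not a price quoted in the talk.

```python
# Assumption: pure usage-based pricing, i.e. cost = machines * hours * rate.
# HOURLY_RATE is a hypothetical illustrative value, not a real provider price.
HOURLY_RATE = 0.10  # dollars per machine-hour (hypothetical)


def usage_cost(machines: int, hours: int, rate: float = HOURLY_RATE) -> float:
    """Return the cost of running `machines` computers for `hours` hours."""
    return machines * hours * rate


one_for_long = usage_cost(machines=1, hours=120)
many_for_short = usage_cost(machines=120, hours=1)

# Same machine-hours, same bill; only the wall-clock time differs.
print(one_for_long, many_for_short)
```

Under this assumption the bill depends only on total machine-hours, which is why a surge can be absorbed by renting many machines briefly rather than owning them.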
What Resource is Managed?
Supercomputer Center Model: scarce processors wait for data. Manage cycles: wait for an opening in the queue, scatter the data to the processors, and gather the results.
Data Center Model: persistent data waits for queries. Manage data: persistent data waits for queries, computation is done locally, and results are returned.
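The contrast between the two models can be sketched as a toy count of how much crosses the network. Everything here is an assumption made for illustration: the node names, the records, and the simplification that "cost" is just the number of items moved.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical cluster: node name -> records stored on that node.
partitions: Dict[str, List[int]] = {
    "node-a": [3, 8, 1],
    "node-b": [7, 2],
    "node-c": [5, 9, 4, 6],
}


def scatter_gather(
    data: Dict[str, List[int]], compute: Callable[[List[int]], int]
) -> Tuple[int, int]:
    """Supercomputer-style: ship every record to the processors, then gather."""
    records_moved = sum(len(records) for records in data.values())
    all_records = [r for records in data.values() for r in records]
    return compute(all_records), records_moved


def compute_locally(
    data: Dict[str, List[int]],
    compute: Callable[[List[int]], int],
    combine: Callable[[List[int]], int],
) -> Tuple[int, int]:
    """Data-center-style: compute where the data lives, return small results."""
    local_results = [compute(records) for records in data.values()]
    items_moved = len(local_results)  # one partial result per node
    return combine(local_results), items_moved


total_a, moved_a = scatter_gather(partitions, sum)
total_b, moved_b = compute_locally(partitions, sum, sum)
print(total_a, moved_a, total_b, moved_b)
```

Both functions return the same total, but the second moves one partial result per node instead of every record.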
Part 2. Data Centers as the Unit of Computing
Cloud computing is at the top of the Gartner hype cycle.
“Cloud computing has become the center of investment and innovation.” Nicholas Carr, 2009 IDC Directions
[Figure: timeline with the points 1609 (30x), 1670 (250x), 1976 (10x-100x), and 2004 (10x-100x), and the labels experimental science, simulation science, and data science.]
Requirements for Clouds
Transition Taking Place
A handful of players are building multiple data centers a year and improving with each one. This includes Google, Microsoft, Yahoo, …
A data center today costs $200 M – $400+ M.
The Berkeley RAD report points out an analogy with the semiconductor industry: companies stopped building their own fabs and started leasing fabs from others as fabs approached $1B.
Which is the Operating System?
[Diagram: on a workstation, a hypervisor hosts VM 1 … VM 5; in a data center, a data center operating system hosts VM 1 … VM 50,000.]
How Do You Program A Data Center?
Some Programming Models for Data Centers
Operations over data center of disks: MapReduce (“string-based”); User-Defined Functions (UDFs) over data center; SQL and Quasi-SQL over data center; data analysis / statistics over data center.
Operations over data center of memory: Grep over distributed memory; UDFs over distributed memory; SQL and Quasi-SQL over distributed memory; data analysis / statistics over distributed memory.
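As a concrete instance of the first model in the list, here is a minimal single-process sketch of the MapReduce pattern (map, shuffle, reduce) applied to word counting. It illustrates the programming model only; it is not Hadoop's or Google's API, and the function names are chosen for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["my other computer", "is a data center", "my data"])
print(counts)
```

In a real data center the map and reduce calls run on many nodes and the shuffle moves data between them; the structure of the user's code stays the same.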
Part 3. Open Cloud Consortium
U.S. 501(c)(3) not-for-profit corporation.
Supports the development of standards and interoperability frameworks.
Supports reference implementations for cloud computing.
Manages testbeds: Open Cloud Testbed, Intercloud Testbed, Open Science Data Cloud.
Develops benchmarks.
www.opencloudconsortium.org
OCC Members
Companies: Aerospace, Booz Allen Hamilton, Cisco, InfoBlox, Open Data Group, Raytheon, Yahoo
Universities: CalIT2, Johns Hopkins, Northwestern, University of Illinois at Chicago, University of Chicago
Government agencies: NASA
Organizations: Sector Project
Open Cloud Testbed
Phase 2: 9 racks, 250+ nodes, 1000+ cores, 10+ Gb/s.
[Diagram labels: C-Wave, CENIC, Dragon, MREN.]
Sector/Sphere
Thrift
KVM VMs
Eucalyptus VMs
Intercloud Testbed
Cloud Compute Services
Data & Storage as a Service
Large Data Cloud Interoperability Framework
SNIA Cloud Data Management Interface (CDMI)
Dynamic infrastructure service linking IaaS and DaaS (working with Infrastructure 2.0 Working Group)
Virtual Data Centers (VDC), Virtual Networks (VN), Virtual Machines (VM), Physical Resources
Dynamic infrastructure service naming and linking entities in the IaaS layers (working with Infrastructure 2.0 Working Group)
Open Cloud Computing Interface (OCCI)
Open Virtualization Format (OVF)
Open Science Data Cloud
[Diagram labels: sky cloud, biocloud.]
Planning to work with 5 international partners (all connected with 10 Gbps networks).
MalStone (OCC-Developed Benchmark)
Sector/Sphere 1.20 and Hadoop 0.18.3 with no replication, on Phase 1 of the Open Cloud Testbed in a single rack. The data was spread over 20 nodes, with 500 million 100-byte records per node.
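The data volume implied by those figures follows from simple multiplication. The sketch below uses only the three numbers stated on the slide and reports sizes in decimal units (1 GB = 10^9 bytes), which is an assumption about the intended units.

```python
# Figures taken from the slide.
NODES = 20
RECORDS_PER_NODE = 500_000_000
BYTES_PER_RECORD = 100

bytes_per_node = RECORDS_PER_NODE * BYTES_PER_RECORD
total_records = NODES * RECORDS_PER_NODE
total_bytes = NODES * bytes_per_node

# Decimal units are assumed here: 1 GB = 10**9 bytes, 1 TB = 10**12 bytes.
print(f"{bytes_per_node / 10**9:.0f} GB per node")
print(f"{total_records / 10**9:.0f} billion records in total")
print(f"{total_bytes / 10**12:.0f} TB in total")
```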
Some Lessons Learned (So Far)
Python over Hadoop Distributed File System is surprisingly powerful.
Tuning Hadoop can be a large (unacknowledged) cost.
Performance of a cloud computation can be significantly impacted by just 1 or 2 nodes that are a bit slower.
Wide area clouds can be practical in some cases.
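The lesson about slow nodes can be illustrated with a toy model. It assumes a synchronous stage in which every node works on its own partition and the stage ends when the last node finishes, with no rebalancing or speculative re-execution. The timings are hypothetical.

```python
from typing import List


def stage_time(node_times: List[float]) -> float:
    """A synchronous parallel stage ends only when its slowest node ends."""
    return max(node_times)


def mean_time(node_times: List[float]) -> float:
    return sum(node_times) / len(node_times)


# Hypothetical timings in seconds for a 20-node stage.
uniform = [100.0] * 20
with_straggler = [100.0] * 19 + [130.0]  # one node is 30% slower

slowdown = stage_time(with_straggler) / stage_time(uniform)
print(f"mean node time: {mean_time(with_straggler):.1f} s")
print(f"stage slowdown: {slowdown:.2f}x")
```

The average node time barely moves, yet the whole stage takes as long as the slowest node, which is why one or two slightly slower machines can dominate the result.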
Part 4. Sector
http://sector.sourceforge.net
Sector Overview
Sector is fast: as measured by MalStone & Terasort.
Sector is easy to program: supports UDFs, MapReduce & Python over streams.
Sector does not require extensive tuning.
Sector is secure: a HIPAA compliant Sector cloud is being set up.
Sector is reliable: Sector v1.24 supports multiple master node servers.
Google’s Large Data Cloud (Google’s Stack)
Applications
Compute Services: Google’s MapReduce
Data Services: Google’s BigTable
Storage Services: Google File System (GFS)
Hadoop’s Large Data Cloud (Hadoop’s Stack)
Applications
Compute Services: Hadoop’s MapReduce
Data Services
Storage Services: Hadoop Distributed File System (HDFS)
Sector’s Large Data Cloud (Sector’s Stack)
Applications
Compute Services: Sphere’s UDFs
Data Services
Storage Services: Sector’s Distributed File System (SDFS)
Routing & Transport Services: UDP-based Data Transport Protocol (UDT)

My Other Computer is a Data Center (2010 v21)

  • 41. Generalization: Apply User-Defined Functions (UDFs) to Files in Storage Cloud. [Diagram: the map/shuffle and reduce stages are each shown as a UDF.]
  • 42. Hadoop vs Sector. Source: Gu and Grossman, Sector and Sphere, Phil. Trans. Royal Society A, 2009.
  • 43. Terasort: Sector vs Hadoop Performance. Sector/Sphere 1.24a and Hadoop 0.20.1 with no replication, on Phase 2 of the Open Cloud Testbed with co-located racks.
  • 44. Sector Applications: distributing the 15 TB Sloan Digital Sky Survey to astronomers around the world (joint with JHU, 2005); managing and analyzing high throughput sequence data (Cistrack, University of Chicago, 2007); detecting emergent behavior in distributed network data (Angle, won SC 07 Analytics Challenge); image processing for high throughput sequencing; wide area clouds (won SC 09 BWC with 100 Gbps wide area computation); new ensemble-based algorithms for trees; graph processing.
  • 45. [Diagram labels: Cistrack Web Portal & Widgets; Cistrack Elastic Cloud Services; Cistrack Database; Analysis Pipelines & Re-analysis Services; Cistrack Large Data Cloud Services; Ingestion Services.]
  • 46. Thank you. For more information, please see blog.rgrossman.com