SlideShare uma empresa Scribd logo
1 de 25
Sector: An Open Source Cloud for Data Intensive Computing Robert Grossman University of Illinois at ChicagoOpen Data Group October 20, 2009
Part 1.  Sector 2 http://sector.sourceforge.net
Sector Overview Sector is fastest open source large data cloud As measured by MalStone & Terasort Sector is easy to program Supports UDFs, MapReduce & Python over streams Sector is secure A HIPAA compliant Sector cloud is being set up Sector is reliable Sector v1.24 has a backup master node server 3
About Sector  YunhongGu from the Laboratory for Advanced Computing at the University of Illinois at Chicago is the Lead Developer of Sector. Sector is open source (BSD License) and available from sector.sourceforge.net The current version is 1.24a 4
Target Configurations Sector is designed to run on racks of commodity computers Typical rack configuration today (Oct, 2009) Rack of 32 quad-core 1U computers Each computer has 4 x 1TB disks Each computer has 1 Gbps connection to a top of a rack switch Sometimes these are called Raywulf clusters 5
Google’s Large Data Cloud Compute Services Data Services Storage Services 6 Applications Google’s MapReduce Google’s BigTable Google File System (GFS) Google’s Stack
Hadoop’s Large Data Cloud Compute Services Storage Services 7 Applications Hadoop’sMapReduce Data Services Hadoop Distributed File System (HDFS) Hadoop’s Stack
Sector’s Large Data Cloud 8 Applications Compute Services Sphere’s UDFs Data Services Sector’s Distributed File System (SDFS) Storage Services UDP-based Data Transport Protocol (UDT) Routing & Transport Services Sector’s Stack
Comparing Sector and Hadoop 9
Terasort - Sector vsHadoop Performance Sector/Sphere 1.24a, Hadoop 0.20.1 with no replication on Phase 2 of Open Cloud Testbed with co-located racks.
MalStone (OCC-Developed Benchmark) Sector/Sphere 1.20, Hadoop 0.18.3 with no replication on Phase 1 of Open Cloud Testbed in a single rack.  Data consisted of 20 nodes with 500 million 100-byte records / node.
How Do You Program A Data Center? 12
Idea 1 – Support UDF’s Over Data Center Think of MapReduce as Map acting on (text) records With fixed Shuffle and Sort Followed by Reducing acting on (text) records We generalize this framework as follows: Support a sequence of User Defined Functions (UDF) acting on segments (=chunks) of files. MapReduce is one special case consisting of a user defined Map, a system-defined shuffle and sort, and a user defined reduce In both cases, framework takes care of assigning nodes to process data, restarting failed processes, etc. 13
Applying UDF using Sector/Sphere 14 1. Split data Application Sphere Client Input  stream SPE SPE SPE 2. Locate & schedule Sphere Processing Engine (SPE) 3. Collect results Output stream
Sector Programming Model Sector dataset consists of one or more physical files Sphere applies User Defined Functions over streams of data consisting of data segments Data segments can be data records, collections of data records, or files Example of UDFs: Map function, Reduce function, Split function for CART, etc. Outputs of UDFs can be returned to originating node, written to local node,  or shuffled to another node. 15
How Do Move Data in a Cloud & Between Clouds? 16 Option 1: Use TCP and close your eyes. Option 2:    ?????
Idea 2: Sector is Built on Top of UDT 17 ,[object Object]
UDT can take advantage of wide area high performance 10 Gbps network
Sector is a wide area distributed file system built over UDT.
Sector is layered over the native file system (vs being a block-based file system).,[object Object]
(x) UDT Scalable TCP HighSpeed TCP AIMD (TCP NewReno) x Alternatives to TCP – Decreasing Increases AIMD Protocols increase of packet sending rate x decrease factor
UDT Makes Wide Area Clouds Possible Using UDT, Sector can take advantage of wide area high performance networks (10+ Gbps) 20 10 Gbps per application
What About Security? 21
Idea 3: Add Security From the Start Security Server Security server maintains information about users and slaves. User access control: password and client IP address. File level access control. Messages are encrypted over SSL. Certificate is used for authentication. Sector is HIPAA capable. Master Client SSL SSL AAA data Slaves
For More Information About Sector YunhongGu and Robert L Grossman, Sector and Sphere: Towards Simplified Storage and Processing of Large Scale Distributed Data, Philosophical Transactions of the Royal Society A, Volume 367, Number 1897, pages 2429--2445, 2009 http://arxiv.org/abs/0809.1181 http://rsta.royalsocietypublishing.org/content/367/1897/2429 23

Mais conteúdo relacionado

Mais procurados (20)

DBMS Unit IV and V Material
DBMS Unit IV and V MaterialDBMS Unit IV and V Material
DBMS Unit IV and V Material
 
assignment3
assignment3assignment3
assignment3
 
Hadoop Cluster Analysis and Assessment
Hadoop Cluster Analysis and AssessmentHadoop Cluster Analysis and Assessment
Hadoop Cluster Analysis and Assessment
 
Upper layer protocol
Upper layer protocolUpper layer protocol
Upper layer protocol
 
Ds1 int (1)
Ds1 int (1)Ds1 int (1)
Ds1 int (1)
 
Memory allocation (4)
Memory allocation (4)Memory allocation (4)
Memory allocation (4)
 
Ch20
Ch20Ch20
Ch20
 
Map Reduce basics
Map Reduce basicsMap Reduce basics
Map Reduce basics
 
2.introduction to hdfs
2.introduction to hdfs2.introduction to hdfs
2.introduction to hdfs
 
Allocation and free space management
Allocation and free space managementAllocation and free space management
Allocation and free space management
 
Network topology for ha
Network topology for haNetwork topology for ha
Network topology for ha
 
RAID Levels
RAID LevelsRAID Levels
RAID Levels
 
Google BigTable
Google BigTableGoogle BigTable
Google BigTable
 
Hadoop training in bangalore-kellytechnologies
Hadoop training in bangalore-kellytechnologiesHadoop training in bangalore-kellytechnologies
Hadoop training in bangalore-kellytechnologies
 
Tape Access Optimization With TReqS
Tape Access Optimization With TReqSTape Access Optimization With TReqS
Tape Access Optimization With TReqS
 
Managing Big Data (Chapter 2, SC 11 Tutorial)
Managing Big Data (Chapter 2, SC 11 Tutorial)Managing Big Data (Chapter 2, SC 11 Tutorial)
Managing Big Data (Chapter 2, SC 11 Tutorial)
 
The spatiotemporal RDF store Strabon
The spatiotemporal RDF store StrabonThe spatiotemporal RDF store Strabon
The spatiotemporal RDF store Strabon
 
Webinar: 3 Steps to Controlling the Secondary Storage Deluge
Webinar: 3 Steps to Controlling the Secondary Storage DelugeWebinar: 3 Steps to Controlling the Secondary Storage Deluge
Webinar: 3 Steps to Controlling the Secondary Storage Deluge
 
Hadoop distributed file system
Hadoop distributed file systemHadoop distributed file system
Hadoop distributed file system
 
報告
報告報告
報告
 

Semelhante a Sector - Presentation at Cloud Computing & Its Applications 2009

My Other Computer is a Data Center: The Sector Perspective on Big Data
My Other Computer is a Data Center: The Sector Perspective on Big DataMy Other Computer is a Data Center: The Sector Perspective on Big Data
My Other Computer is a Data Center: The Sector Perspective on Big DataRobert Grossman
 
sector-sphere
sector-spheresector-sphere
sector-spherexlight
 
My Other Computer is a Data Center (2010 v21)
My Other Computer is a Data Center (2010 v21)My Other Computer is a Data Center (2010 v21)
My Other Computer is a Data Center (2010 v21)Robert Grossman
 
Open Cloud Consortium: An Update (04-23-10, v9)
Open Cloud Consortium: An Update (04-23-10, v9)Open Cloud Consortium: An Update (04-23-10, v9)
Open Cloud Consortium: An Update (04-23-10, v9)Robert Grossman
 
TeraGrid Communication and Computation
TeraGrid Communication and ComputationTeraGrid Communication and Computation
TeraGrid Communication and ComputationTal Lavian Ph.D.
 
Large Scale On-Demand Image Processing For Disaster Relief
Large Scale On-Demand Image Processing For Disaster ReliefLarge Scale On-Demand Image Processing For Disaster Relief
Large Scale On-Demand Image Processing For Disaster ReliefRobert Grossman
 
BWC Supercomputing 2008 Presentation
BWC Supercomputing 2008 PresentationBWC Supercomputing 2008 Presentation
BWC Supercomputing 2008 Presentationlilyco
 
ZCloud Consensus on Hardware for Distributed Systems
ZCloud Consensus on Hardware for Distributed SystemsZCloud Consensus on Hardware for Distributed Systems
ZCloud Consensus on Hardware for Distributed SystemsGokhan Boranalp
 
Cloud Camp Milan 2K9 Telecom Italia: Where P2P?
Cloud Camp Milan 2K9 Telecom Italia: Where P2P?Cloud Camp Milan 2K9 Telecom Italia: Where P2P?
Cloud Camp Milan 2K9 Telecom Italia: Where P2P?Gabriele Bozzi
 
CloudCamp Milan 2009: Telecom Italia
CloudCamp Milan 2009: Telecom ItaliaCloudCamp Milan 2009: Telecom Italia
CloudCamp Milan 2009: Telecom ItaliaGabriele Bozzi
 
SECURING DATA TRANSFER IN THE CLOUD THROUGH INTRODUCING IDENTIFICATION PACKET...
SECURING DATA TRANSFER IN THE CLOUD THROUGH INTRODUCING IDENTIFICATION PACKET...SECURING DATA TRANSFER IN THE CLOUD THROUGH INTRODUCING IDENTIFICATION PACKET...
SECURING DATA TRANSFER IN THE CLOUD THROUGH INTRODUCING IDENTIFICATION PACKET...IJNSA Journal
 
Cluster Computing
Cluster ComputingCluster Computing
Cluster ComputingNIKHIL NAIR
 
Sector Cloudcom Tutorial
Sector Cloudcom TutorialSector Cloudcom Tutorial
Sector Cloudcom Tutoriallilyco
 
IRJET- Collaborative Network Security in Data Center for Cloud Computing
IRJET-  	  Collaborative Network Security in Data Center for Cloud ComputingIRJET-  	  Collaborative Network Security in Data Center for Cloud Computing
IRJET- Collaborative Network Security in Data Center for Cloud ComputingIRJET Journal
 
grid mining
grid mininggrid mining
grid miningARNOLD
 
seed block algorithm
seed block algorithmseed block algorithm
seed block algorithmDipak Badhe
 
Parallel_and_Cluster_Computing.ppt
Parallel_and_Cluster_Computing.pptParallel_and_Cluster_Computing.ppt
Parallel_and_Cluster_Computing.pptMohmdUmer
 

Semelhante a Sector - Presentation at Cloud Computing & Its Applications 2009 (20)

My Other Computer is a Data Center: The Sector Perspective on Big Data
My Other Computer is a Data Center: The Sector Perspective on Big DataMy Other Computer is a Data Center: The Sector Perspective on Big Data
My Other Computer is a Data Center: The Sector Perspective on Big Data
 
sector-sphere
sector-spheresector-sphere
sector-sphere
 
My Other Computer is a Data Center (2010 v21)
My Other Computer is a Data Center (2010 v21)My Other Computer is a Data Center (2010 v21)
My Other Computer is a Data Center (2010 v21)
 
Grid computing & its applications
Grid computing & its applicationsGrid computing & its applications
Grid computing & its applications
 
Open Cloud Consortium: An Update (04-23-10, v9)
Open Cloud Consortium: An Update (04-23-10, v9)Open Cloud Consortium: An Update (04-23-10, v9)
Open Cloud Consortium: An Update (04-23-10, v9)
 
TeraGrid Communication and Computation
TeraGrid Communication and ComputationTeraGrid Communication and Computation
TeraGrid Communication and Computation
 
Large Scale On-Demand Image Processing For Disaster Relief
Large Scale On-Demand Image Processing For Disaster ReliefLarge Scale On-Demand Image Processing For Disaster Relief
Large Scale On-Demand Image Processing For Disaster Relief
 
BWC Supercomputing 2008 Presentation
BWC Supercomputing 2008 PresentationBWC Supercomputing 2008 Presentation
BWC Supercomputing 2008 Presentation
 
ZCloud Consensus on Hardware for Distributed Systems
ZCloud Consensus on Hardware for Distributed SystemsZCloud Consensus on Hardware for Distributed Systems
ZCloud Consensus on Hardware for Distributed Systems
 
Cloud Camp Milan 2K9 Telecom Italia: Where P2P?
Cloud Camp Milan 2K9 Telecom Italia: Where P2P?Cloud Camp Milan 2K9 Telecom Italia: Where P2P?
Cloud Camp Milan 2K9 Telecom Italia: Where P2P?
 
CloudCamp Milan 2009: Telecom Italia
CloudCamp Milan 2009: Telecom ItaliaCloudCamp Milan 2009: Telecom Italia
CloudCamp Milan 2009: Telecom Italia
 
SECURING DATA TRANSFER IN THE CLOUD THROUGH INTRODUCING IDENTIFICATION PACKET...
SECURING DATA TRANSFER IN THE CLOUD THROUGH INTRODUCING IDENTIFICATION PACKET...SECURING DATA TRANSFER IN THE CLOUD THROUGH INTRODUCING IDENTIFICATION PACKET...
SECURING DATA TRANSFER IN THE CLOUD THROUGH INTRODUCING IDENTIFICATION PACKET...
 
Cluster Computing
Cluster ComputingCluster Computing
Cluster Computing
 
Sector Cloudcom Tutorial
Sector Cloudcom TutorialSector Cloudcom Tutorial
Sector Cloudcom Tutorial
 
IRJET- Collaborative Network Security in Data Center for Cloud Computing
IRJET-  	  Collaborative Network Security in Data Center for Cloud ComputingIRJET-  	  Collaborative Network Security in Data Center for Cloud Computing
IRJET- Collaborative Network Security in Data Center for Cloud Computing
 
Grid Computing
Grid ComputingGrid Computing
Grid Computing
 
sdnppt.pdf
sdnppt.pdfsdnppt.pdf
sdnppt.pdf
 
grid mining
grid mininggrid mining
grid mining
 
seed block algorithm
seed block algorithmseed block algorithm
seed block algorithm
 
Parallel_and_Cluster_Computing.ppt
Parallel_and_Cluster_Computing.pptParallel_and_Cluster_Computing.ppt
Parallel_and_Cluster_Computing.ppt
 

Mais de Robert Grossman

Some Frameworks for Improving Analytic Operations at Your Company
Some Frameworks for Improving Analytic Operations at Your CompanySome Frameworks for Improving Analytic Operations at Your Company
Some Frameworks for Improving Analytic Operations at Your CompanyRobert Grossman
 
Some Proposed Principles for Interoperating Cloud Based Data Platforms
Some Proposed Principles for Interoperating Cloud Based Data PlatformsSome Proposed Principles for Interoperating Cloud Based Data Platforms
Some Proposed Principles for Interoperating Cloud Based Data PlatformsRobert Grossman
 
A Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataA Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataRobert Grossman
 
Crossing the Analytics Chasm and Getting the Models You Developed Deployed
Crossing the Analytics Chasm and Getting the Models You Developed DeployedCrossing the Analytics Chasm and Getting the Models You Developed Deployed
Crossing the Analytics Chasm and Getting the Models You Developed DeployedRobert Grossman
 
A Data Biosphere for Biomedical Research
A Data Biosphere for Biomedical ResearchA Data Biosphere for Biomedical Research
A Data Biosphere for Biomedical ResearchRobert Grossman
 
What is Data Commons and How Can Your Organization Build One?
What is Data Commons and How Can Your Organization Build One?What is Data Commons and How Can Your Organization Build One?
What is Data Commons and How Can Your Organization Build One?Robert Grossman
 
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...Robert Grossman
 
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...Robert Grossman
 
AnalyticOps - Chicago PAW 2016
AnalyticOps - Chicago PAW 2016AnalyticOps - Chicago PAW 2016
AnalyticOps - Chicago PAW 2016Robert Grossman
 
Keynote on 2015 Yale Day of Data
Keynote on 2015 Yale Day of Data Keynote on 2015 Yale Day of Data
Keynote on 2015 Yale Day of Data Robert Grossman
 
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...Robert Grossman
 
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...Robert Grossman
 
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)Clouds and Commons for the Data Intensive Science Community (June 8, 2015)
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)Robert Grossman
 
Architectures for Data Commons (XLDB 15 Lightning Talk)
Architectures for Data Commons (XLDB 15 Lightning Talk)Architectures for Data Commons (XLDB 15 Lightning Talk)
Architectures for Data Commons (XLDB 15 Lightning Talk)Robert Grossman
 
Practical Methods for Identifying Anomalies That Matter in Large Datasets
Practical Methods for Identifying Anomalies That Matter in Large DatasetsPractical Methods for Identifying Anomalies That Matter in Large Datasets
Practical Methods for Identifying Anomalies That Matter in Large DatasetsRobert Grossman
 
What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? Robert Grossman
 
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014Robert Grossman
 
Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)Robert Grossman
 
What Are Science Clouds?
What Are Science Clouds?What Are Science Clouds?
What Are Science Clouds?Robert Grossman
 
Adversarial Analytics - 2013 Strata & Hadoop World Talk
Adversarial Analytics - 2013 Strata & Hadoop World TalkAdversarial Analytics - 2013 Strata & Hadoop World Talk
Adversarial Analytics - 2013 Strata & Hadoop World TalkRobert Grossman
 

Mais de Robert Grossman (20)

Some Frameworks for Improving Analytic Operations at Your Company
Some Frameworks for Improving Analytic Operations at Your CompanySome Frameworks for Improving Analytic Operations at Your Company
Some Frameworks for Improving Analytic Operations at Your Company
 
Some Proposed Principles for Interoperating Cloud Based Data Platforms
Some Proposed Principles for Interoperating Cloud Based Data PlatformsSome Proposed Principles for Interoperating Cloud Based Data Platforms
Some Proposed Principles for Interoperating Cloud Based Data Platforms
 
A Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataA Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate Data
 
Crossing the Analytics Chasm and Getting the Models You Developed Deployed
Crossing the Analytics Chasm and Getting the Models You Developed DeployedCrossing the Analytics Chasm and Getting the Models You Developed Deployed
Crossing the Analytics Chasm and Getting the Models You Developed Deployed
 
A Data Biosphere for Biomedical Research
A Data Biosphere for Biomedical ResearchA Data Biosphere for Biomedical Research
A Data Biosphere for Biomedical Research
 
What is Data Commons and How Can Your Organization Build One?
What is Data Commons and How Can Your Organization Build One?What is Data Commons and How Can Your Organization Build One?
What is Data Commons and How Can Your Organization Build One?
 
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
 
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
 
AnalyticOps - Chicago PAW 2016
AnalyticOps - Chicago PAW 2016AnalyticOps - Chicago PAW 2016
AnalyticOps - Chicago PAW 2016
 
Keynote on 2015 Yale Day of Data
Keynote on 2015 Yale Day of Data Keynote on 2015 Yale Day of Data
Keynote on 2015 Yale Day of Data
 
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
 
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
 
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)Clouds and Commons for the Data Intensive Science Community (June 8, 2015)
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)
 
Architectures for Data Commons (XLDB 15 Lightning Talk)
Architectures for Data Commons (XLDB 15 Lightning Talk)Architectures for Data Commons (XLDB 15 Lightning Talk)
Architectures for Data Commons (XLDB 15 Lightning Talk)
 
Practical Methods for Identifying Anomalies That Matter in Large Datasets
Practical Methods for Identifying Anomalies That Matter in Large DatasetsPractical Methods for Identifying Anomalies That Matter in Large Datasets
Practical Methods for Identifying Anomalies That Matter in Large Datasets
 
What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care?
 
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
 
Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)
 
What Are Science Clouds?
What Are Science Clouds?What Are Science Clouds?
What Are Science Clouds?
 
Adversarial Analytics - 2013 Strata & Hadoop World Talk
Adversarial Analytics - 2013 Strata & Hadoop World TalkAdversarial Analytics - 2013 Strata & Hadoop World Talk
Adversarial Analytics - 2013 Strata & Hadoop World Talk
 

Último

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 

Último (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 

Sector - Presentation at Cloud Computing & Its Applications 2009

  • 1. Sector: An Open Source Cloud for Data Intensive Computing Robert Grossman University of Illinois at ChicagoOpen Data Group October 20, 2009
  • 2. Part 1. Sector 2 http://sector.sourceforge.net
  • 3. Sector Overview Sector is fastest open source large data cloud As measured by MalStone & Terasort Sector is easy to program Supports UDFs, MapReduce & Python over streams Sector is secure A HIPAA compliant Sector cloud is being set up Sector is reliable Sector v1.24 has a backup master node server 3
  • 4. About Sector YunhongGu from the Laboratory for Advanced Computing at the University of Illinois at Chicago is the Lead Developer of Sector. Sector is open source (BSD License) and available from sector.sourceforge.net The current version is 1.24a 4
  • 5. Target Configurations Sector is designed to run on racks of commodity computers Typical rack configuration today (Oct, 2009) Rack of 32 quad-core 1U computers Each computer has 4 x 1TB disks Each computer has 1 Gbps connection to a top of a rack switch Sometimes these are called Raywulf clusters 5
  • 6. Google’s Large Data Cloud Compute Services Data Services Storage Services 6 Applications Google’s MapReduce Google’s BigTable Google File System (GFS) Google’s Stack
  • 7. Hadoop’s Large Data Cloud Compute Services Storage Services 7 Applications Hadoop’sMapReduce Data Services Hadoop Distributed File System (HDFS) Hadoop’s Stack
  • 8. Sector’s Large Data Cloud 8 Applications Compute Services Sphere’s UDFs Data Services Sector’s Distributed File System (SDFS) Storage Services UDP-based Data Transport Protocol (UDT) Routing & Transport Services Sector’s Stack
  • 10. Terasort - Sector vsHadoop Performance Sector/Sphere 1.24a, Hadoop 0.20.1 with no replication on Phase 2 of Open Cloud Testbed with co-located racks.
  • 11. MalStone (OCC-Developed Benchmark) Sector/Sphere 1.20, Hadoop 0.18.3 with no replication on Phase 1 of Open Cloud Testbed in a single rack. Data consisted of 20 nodes with 500 million 100-byte records / node.
  • 12. How Do You Program A Data Center? 12
  • 13. Idea 1 – Support UDF’s Over Data Center Think of MapReduce as Map acting on (text) records With fixed Shuffle and Sort Followed by Reducing acting on (text) records We generalize this framework as follows: Support a sequence of User Defined Functions (UDF) acting on segments (=chunks) of files. MapReduce is one special case consisting of a user defined Map, a system-defined shuffle and sort, and a user defined reduce In both cases, framework takes care of assigning nodes to process data, restarting failed processes, etc. 13
  • 14. Applying UDF using Sector/Sphere 14 1. Split data Application Sphere Client Input stream SPE SPE SPE 2. Locate & schedule Sphere Processing Engine (SPE) 3. Collect results Output stream
  • 15. Sector Programming Model Sector dataset consists of one or more physical files Sphere applies User Defined Functions over streams of data consisting of data segments Data segments can be data records, collections of data records, or files Example of UDFs: Map function, Reduce function, Split function for CART, etc. Outputs of UDFs can be returned to originating node, written to local node, or shuffled to another node. 15
  • 16. How Do Move Data in a Cloud & Between Clouds? 16 Option 1: Use TCP and close your eyes. Option 2: ?????
  • 17.
  • 18. UDT can take advantage of wide area high performance 10 Gbps network
  • 19. Sector is a wide area distributed file system built over UDT.
  • 20.
  • 21. (x) UDT Scalable TCP HighSpeed TCP AIMD (TCP NewReno) x Alternatives to TCP – Decreasing Increases AIMD Protocols increase of packet sending rate x decrease factor
  • 22. UDT Makes Wide Area Clouds Possible Using UDT, Sector can take advantage of wide area high performance networks (10+ Gbps) 20 10 Gbps per application
  • 24. Idea 3: Add Security From the Start Security Server Security server maintains information about users and slaves. User access control: password and client IP address. File level access control. Messages are encrypted over SSL. Certificate is used for authentication. Sector is HIPAA capable. Master Client SSL SSL AAA data Slaves
  • 25. For More Information About Sector YunhongGu and Robert L Grossman, Sector and Sphere: Towards Simplified Storage and Processing of Large Scale Distributed Data, Philosophical Transactions of the Royal Society A, Volume 367, Number 1897, pages 2429--2445, 2009 http://arxiv.org/abs/0809.1181 http://rsta.royalsocietypublishing.org/content/367/1897/2429 23
  • 26. For Related Information Related information can be found at: blog.rgrossman.com www.rgrossman.com 24