Distributed Data Storage and Parallel Processing Engine
Sector & Sphere
Yunhong Gu, Univ. of Illinois at Chicago
What is Sector/Sphere?
Overview
Motivation
Supercomputer model: expensive, with a data I/O bottleneck.
Sector/Sphere model: inexpensive, with parallel data I/O and data locality.
Motivation
Parallel/distributed programming with MPI, etc.: flexible and powerful, but too complicated.
Sector/Sphere model (cloud model): the cluster appears as a single unit to the developer, with a simplified programming interface. It is limited to certain data-parallel applications.
Motivation
Systems for single data centers: require additional effort to locate and move data.
Sector/Sphere model: supports wide-area data collection and distribution.
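Sector Distributed File System
[Architecture diagram]
Security Server: user accounts, data protection, system security.
Masters: metadata, scheduling, service provider.
Slaves: storage and processing.
Clients: system access tools, application programming interfaces.
Links: SSL between the security server and the masters, and between the masters and the clients; data is transferred over UDT, with optional encryption.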
Sector Distributed File System
Sector: Performance
UDT: UDP-based Data Transfer
Sector: Fault Tolerance
Sector: Security
Sector: Tools and API
Sphere: Simplified Data Processing
Sphere: Simplified Data Processing
Serial loop:
    for each file F in (SDSS datasets)
        for each image I in F
            findBrownDwarf(I, …);
Sphere equivalent:
    SphereStream sdss;
    sdss.init("sdss files");
    SphereProcess myproc;
    myproc->run(sdss, "findBrownDwarf", …);
    myproc->read(result);
User-defined function:
    findBrownDwarf(char* image, int isize, char* result, int rsize);
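The processing model on this slide (one user-defined function applied independently to every data segment) can be illustrated with a small single-process sketch. It does not use the Sphere API: `Udf`, `run_udf`, and `segment_size` are hypothetical names chosen for illustration, and the loop runs serially on one machine, whereas Sphere would distribute the segments over the cluster.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical UDF type: consumes one data segment and produces one result.
using Udf = std::function<std::string(const std::string&)>;

// Apply `udf` to every segment independently and collect the results in order.
// Because no segment depends on another, the loop body could run in parallel.
std::vector<std::string> run_udf(const std::vector<std::string>& segments,
                                 const Udf& udf) {
    assert(static_cast<bool>(udf));  // a callable UDF must be supplied
    std::vector<std::string> results;
    results.reserve(segments.size());
    for (const auto& segment : segments) {
        results.push_back(udf(segment));
    }
    return results;
}

// Stand-in for findBrownDwarf: simply reports the size of the segment.
std::string segment_size(const std::string& image) {
    return std::to_string(image.size());
}
```

Calling `run_udf(images, segment_size)` plays the role of the two nested loops in the serial version; the caller never iterates over files or images itself.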
Sphere: Data Movement
Sphere/UDF vs. MapReduce
Sphere/UDF vs. MapReduce
Why Doesn't Sector Split Files?
Load Balance
Fault Tolerance
Open Cloud Testbed
Open Cloud Testbed
The TeraSort Benchmark
TeraSort
Record: 100 bytes (10-byte key, 90-byte value).
Stage 1: Hash each record into one of 1024 buckets (Bucket-0 to Bucket-1023) based on the first 10 bits of the key.
Stage 2: Sort each bucket on the local node.
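The two stages on this slide can be sketched in a few lines of C++. This is a single-process illustration under stated assumptions, not the Sphere implementation: `bucket_of` and `two_stage_sort` are hypothetical names, records are held in memory as strings whose first 10 bytes are the key, and "the first 10 bits" is read as all 8 bits of the first key byte plus the top 2 bits of the second.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

constexpr std::size_t kKeyBytes = 10;   // key length stated on the slide
constexpr std::size_t kBuckets = 1024;  // 2^10 buckets

// Stage 1 helper: bucket index taken from the first 10 bits of the key.
std::size_t bucket_of(const std::string& record) {
    assert(record.size() >= kKeyBytes);
    const auto b0 = static_cast<unsigned char>(record[0]);
    const auto b1 = static_cast<unsigned char>(record[1]);
    return (static_cast<std::size_t>(b0) << 2) | (b1 >> 6);
}

std::vector<std::string> two_stage_sort(const std::vector<std::string>& records) {
    // Stage 1: scatter records into buckets by key prefix.
    std::vector<std::vector<std::string>> buckets(kBuckets);
    for (const auto& r : records) {
        buckets[bucket_of(r)].push_back(r);
    }
    // Stage 2: sort each bucket by key, then concatenate buckets in index order.
    std::vector<std::string> out;
    out.reserve(records.size());
    for (auto& b : buckets) {
        std::sort(b.begin(), b.end(),
                  [](const std::string& x, const std::string& y) {
                      return x.compare(0, kKeyBytes, y, 0, kKeyBytes) < 0;
                  });
        out.insert(out.end(), b.begin(), b.end());
    }
    return out;
}
```

Because the bucket index is a prefix of the key, every key in bucket i orders before every key in bucket i+1, so sorting buckets independently and concatenating them gives a globally sorted result. In a distributed setting each bucket would live on its own node and stage 2 would need no further data exchange.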
Performance Results: TeraSort
Run time in seconds, Sector v1.16 vs. Hadoop 0.17

Sites                            Data Size   Sphere   Hadoop (3 replicas)   Hadoop (1 replica)
UIC                              300GB       1265     2889                  2252
UIC + StarLight                  600GB       1361     2896                  2617
UIC + StarLight + Calit2         900GB       1430     4341                  3069
UIC + StarLight + Calit2 + JHU   1.2TB       1526     6675                  3702
Performance Results: TeraSort
The MalStone Benchmark
http://code.google.com/p/malgen/
MalStone
Text record: Event ID | Timestamp | Site ID | Compromise Flag | Entity ID
Example: 00000000005000000043852268954353585368|2008-11-08 17:56:52.422640|3857268954353628599|1|000000497829
Transform: Key = Site ID; Value = Time, Flag.
Stage 1: Process each record and hash it into buckets (site-000X to site-999X) according to the first 3 bytes (000-999) of the site ID.
Stage 2: Compute the infection rate for each merchant.
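A simplified, single-process sketch of these two stages follows. It is an illustration under assumptions rather than the benchmark's reference code: `Event`, `parse_event`, `site_bucket`, and `flagged_fraction` are hypothetical names, the site ID is assumed to begin with three decimal digits, and the "infection rate" is approximated as the fraction of a site's records whose compromise flag is 1 (the benchmark's actual statistic may differ).

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct Event {
    std::string site_id;
    bool compromised;
};

// Parse "Event ID|Timestamp|Site ID|Compromise Flag|Entity ID".
Event parse_event(const std::string& line) {
    std::vector<std::string> fields;
    std::istringstream in(line);
    std::string field;
    while (std::getline(in, field, '|')) fields.push_back(field);
    assert(fields.size() == 5);
    return Event{fields[2], fields[3] == "1"};
}

// Stage 1 helper: bucket 000-999 from the first three characters of the site ID.
int site_bucket(const Event& e) {
    assert(e.site_id.size() >= 3);
    return std::stoi(e.site_id.substr(0, 3));
}

// Returns, per site ID, the fraction of its records that are flagged.
std::map<std::string, double> flagged_fraction(const std::vector<std::string>& lines) {
    // Stage 1: parse each record and scatter it into its bucket.
    std::map<int, std::vector<Event>> buckets;
    for (const auto& line : lines) {
        Event e = parse_event(line);
        buckets[site_bucket(e)].push_back(e);
    }
    // Stage 2: within each bucket, count flagged and total records per site.
    std::map<std::string, double> fraction;
    for (const auto& bucket : buckets) {
        std::map<std::string, std::pair<long, long>> counts;  // site -> (flagged, total)
        for (const auto& e : bucket.second) {
            auto& c = counts[e.site_id];
            if (e.compromised) ++c.first;
            ++c.second;
        }
        for (const auto& kv : counts) {
            fraction[kv.first] = static_cast<double>(kv.second.first) / kv.second.second;
        }
    }
    return fraction;
}
```

All records for a given site land in the same bucket, so stage 2 can process each bucket without looking at any other, which is what makes the computation data-parallel.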
Performance Results: MalStone
Processing 10 billion records on 20 OCT nodes (local).

                          MalStone-A   MalStone-B
Hadoop                    454m 13s     840m 50s
Hadoop Streaming/Python   87m 29s      142m 32s
Sector/Sphere             33m 40s      43m 44s

* Courtesy of Collin Bennet and Jonathan Seidman of Open Data Group.
System Monitoring (Testbed)
System Monitoring (Sector/Sphere)
For More Information

 

sector-sphere

  • 1. Distributed Data Storage and Parallel Processing Engine: Sector & Sphere. Yunhong Gu, Univ. of Illinois at Chicago
  • 2. What is Sector/Sphere?
  • 3. Overview
  • 4. Motivation. Super-computer model: expensive, data I/O bottleneck. Sector/Sphere model: inexpensive, parallel data I/O, data locality.
  • 5. Motivation. Parallel/distributed programming with MPI, etc.: flexible and powerful, but too complicated. Sector/Sphere model (cloud model): the cluster appears as a single unit to the developer, with a simplified programming interface; limited to certain data-parallel applications.
  • 6. Motivation. Systems for single data centers: require additional effort to locate and move data. Sector/Sphere model: supports wide-area data collection and distribution.
  • 7. Sector Distributed File System (architecture diagram)
       Security server: user accounts, data protection, system security.
       Masters: metadata, scheduling, service provider.
       Clients: system access tools, application programming interfaces.
       Slaves: storage and processing.
       Connection labels: SSL, SSL; data over UDT, encryption optional.
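  • 8. Sector Distributed File System
  • 9. Sector: Performance
  • 10. UDT: UDP-based Data Transfer
  • 11. Sector: Fault Tolerance
  • 12. Sector: Security
  • 13. Sector: Tools and API
  • 14. Sphere: Simplified Data Processing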
  • 15. Sphere: Simplified Data Processing

         for each file F in (SDSS datasets)
           for each image I in F
             findBrownDwarf(I, …);

         SphereStream sdss;
         sdss.init("sdss files");
         SphereProcess myproc;
         myproc->run(sdss, "findBrownDwarf", …);
         myproc->read(result);

         findBrownDwarf(char* image, int isize, char* result, int rsize);
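The processing model on slide 15 (one user-defined function applied to every image of every file) can be emulated locally as a minimal sketch. This is not the Sphere API: `Udf`, `findBright`, and `runOnAll` are hypothetical names, the `int` return code is an assumption, and the threshold test merely stands in for the brown-dwarf detection, which the slides do not show. Only the four-argument signature is taken from the slide.

```cpp
#include <string>
#include <vector>

// Signature taken from the slide; the int return code is an assumption.
using Udf = int (*)(char* image, int isize, char* result, int rsize);

// Hypothetical stand-in for findBrownDwarf: writes "1" if any byte of the
// image exceeds a threshold, otherwise "0".
int findBright(char* image, int isize, char* result, int rsize) {
    if (rsize < 2) return -1;  // result buffer too small
    bool hit = false;
    for (int i = 0; i < isize; ++i) {
        if (static_cast<unsigned char>(image[i]) > 200) { hit = true; break; }
    }
    result[0] = hit ? '1' : '0';
    result[1] = '\0';
    return 0;
}

// Local, sequential emulation of "for each file F, for each image I in F".
// Sphere would distribute these calls across slave nodes instead.
std::vector<std::string> runOnAll(std::vector<std::vector<std::string>>& files,
                                  Udf udf) {
    std::vector<std::string> results;
    char buf[16];
    for (auto& file : files) {
        for (auto& image : file) {
            int rc = udf(&image[0], static_cast<int>(image.size()),
                         buf, static_cast<int>(sizeof buf));
            if (rc == 0) results.emplace_back(buf);
        }
    }
    return results;
}
```

Because the function only sees one image buffer and one result buffer, each call is independent of the others, which is what lets a framework run the calls in parallel next to the data.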
  • 25. TeraSort
       Record: 100 bytes (10-byte key, 90-byte value).
       Stage 1: hash each record on the first 10 bits of its key into one of 1024 buckets (Bucket-0, Bucket-1, …, Bucket-1023).
       Stage 2: sort each bucket on its local node.
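The two TeraSort stages can be sketched in a single process as follows. This is an illustration under stated assumptions, not the Sphere implementation: `Record`, `bucketOf`, and `twoStageSort` are hypothetical names, and the buckets that would live on separate nodes are plain in-memory vectors here.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

// A 100-byte record: 10-byte key followed by a 90-byte value.
struct Record {
    std::array<unsigned char, 10> key;
    std::array<unsigned char, 90> value;
};

// Stage 1 helper: the first 10 bits of the key select one of 1024 buckets
// (all 8 bits of key[0] plus the top 2 bits of key[1]).
inline std::uint32_t bucketOf(const Record& r) {
    return (static_cast<std::uint32_t>(r.key[0]) << 2) |
           static_cast<std::uint32_t>(r.key[1] >> 6);
}

// Runs both stages locally. In a cluster, each bucket would be written to
// and sorted on its own node (an assumption of this sketch).
std::vector<Record> twoStageSort(const std::vector<Record>& input) {
    std::vector<std::vector<Record>> buckets(1024);
    for (const Record& r : input) {          // Stage 1: scatter into buckets
        buckets[bucketOf(r)].push_back(r);
    }

    std::vector<Record> output;
    output.reserve(input.size());
    for (auto& b : buckets) {                // Stage 2: sort each bucket
        std::sort(b.begin(), b.end(),
                  [](const Record& a, const Record& c) { return a.key < c.key; });
        output.insert(output.end(), b.begin(), b.end());
    }
    return output;
}
```

Since the bucket index is the most significant 10 bits of the key, concatenating the sorted buckets in index order yields a globally sorted sequence without a final merge.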
  • 26. Performance Results: TeraSort
       Run time in seconds, Sector v1.16 vs Hadoop 0.17.

       Sites                            Data size  Sphere  Hadoop (3 replicas)  Hadoop (1 replica)
       UIC                              300GB      1265    2889                 2252
       UIC + StarLight                  600GB      1361    2896                 2617
       UIC + StarLight + Calit2         900GB      1430    4341                 3069
       UIC + StarLight + Calit2 + JHU   1.2TB      1526    6675                 3702
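The run times on slide 26 can be turned into ratios with a little arithmetic. The sketch below assumes the flattened slide text reads as one row per site combination with columns Sphere, Hadoop (3 replicas), and Hadoop (1 replica); `Row`, `kTeraSort`, and `speedup` are hypothetical names introduced for illustration.

```cpp
#include <array>

// One row of the TeraSort results (run time in seconds).
// Column assignment is an assumption read from the flattened slide text.
struct Row {
    const char* sites;
    int sphere;
    int hadoop3;  // Hadoop, 3 replicas
    int hadoop1;  // Hadoop, 1 replica
};

const std::array<Row, 4> kTeraSort = {{
    {"UIC",                            1265, 2889, 2252},
    {"UIC + StarLight",                1361, 2896, 2617},
    {"UIC + StarLight + Calit2",       1430, 4341, 3069},
    {"UIC + StarLight + Calit2 + JHU", 1526, 6675, 3702},
}};

// How many times longer the other system ran than Sphere.
inline double speedup(int otherSeconds, int sphereSeconds) {
    return static_cast<double>(otherSeconds) / sphereSeconds;
}
```

For example, `speedup(kTeraSort[0].hadoop3, kTeraSort[0].sphere)` compares the two systems on the single-site run, and the same call on `kTeraSort[3]` compares them on the four-site run.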
  • 29. MalStone
       Text record: Event ID | Timestamp | Site ID | Compromise Flag | Entity ID
       Example: 00000000005000000043852268954353585368|2008-11-08 17:56:52.422640|3857268954353628599|1|000000497829
       Transform: each text record becomes a key (Site ID) and a value (Time, Flag); a 3-byte site ID prefix (000-999) selects the bucket.
       Stage 1: process each record and hash it into buckets (site-000X, site-001X, …, site-999X) according to site ID.
       Stage 2: compute the infection rate for each merchant.
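The MalStone record handling can be sketched as below. Several points are assumptions of this sketch rather than statements from the slides: the bucket key is taken to be the first three characters of the site ID, the "infection rate" is simplified to the fraction of a site's records whose compromise flag is 1, and `Event`, `parseEvent`, `bucketOf`, `Tally`, and `flaggedRatioPerSite` are hypothetical names. Both stages run in one process here.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// One parsed record: Event ID | Timestamp | Site ID | Compromise Flag | Entity ID
struct Event {
    std::string eventId, timestamp, siteId, entityId;
    bool compromised = false;
};

// Splits a pipe-delimited text record; returns false unless it has 5 fields.
bool parseEvent(const std::string& line, Event& out) {
    std::vector<std::string> f;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, '|')) f.push_back(field);
    if (f.size() != 5) return false;
    out.eventId = f[0];
    out.timestamp = f[1];
    out.siteId = f[2];
    out.compromised = (f[3] == "1");
    out.entityId = f[4];
    return true;
}

// Stage 1 helper: bucket key is the first three characters of the site ID
// (assumed reading of "3-byte" and "site-000X" on the slide).
std::string bucketOf(const Event& e) { return e.siteId.substr(0, 3); }

struct Tally { long total = 0; long flagged = 0; };

// Stage 2 (simplified): per-site fraction of records with the flag set.
std::map<std::string, double> flaggedRatioPerSite(
        const std::vector<std::string>& lines) {
    // bucket -> site -> tally; each bucket would sit on one node in a cluster.
    std::map<std::string, std::map<std::string, Tally>> buckets;
    for (const auto& line : lines) {
        Event e;
        if (!parseEvent(line, e)) continue;  // skip malformed records
        Tally& t = buckets[bucketOf(e)][e.siteId];
        ++t.total;
        if (e.compromised) ++t.flagged;
    }
    std::map<std::string, double> ratio;
    for (const auto& b : buckets) {
        for (const auto& s : b.second) {
            ratio[s.first] = static_cast<double>(s.second.flagged) / s.second.total;
        }
    }
    return ratio;
}
```

Grouping by site ID prefix in Stage 1 means all records for one site land in the same bucket, so Stage 2 can compute each site's ratio without consulting any other bucket.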
  • 30. Performance Results: MalStone
       Processing 10 billion records on 20 OCT nodes (local).

       System                    MalStone-A  MalStone-B
       Hadoop                    454m 13s    840m 50s
       Hadoop Streaming/Python   87m 29s     142m 32s
       Sector/Sphere             33m 40s     43m 44s

       * Courtesy of Collin Bennet and Jonathan Seidman of Open Data Group.