GraphLab under the hood


               Zuhair Khayyat




12/10/12                             1
GraphLab overview: GraphLab 1.0
●   GraphLab: A New Framework For Parallel
    Machine Learning
           –   high-level abstractions for machine learning
                 problems
           –   Shared-memory multiprocessor
           –   Assume no fault tolerance needed
           –   Concurrent-access processing models with
                sequential-consistency guarantees

GraphLab overview: GraphLab 1.0
●   How does GraphLab 1.0 work?
           –   Represent the user's data by a directed graph
           –   Each block of data is attached to a vertex
                or a directed edge
           –   Shared data table
           –   User functions:
                   ●   Update: modify the vertex and edge state,
                        read-only access to the shared table
                   ●   Fold: sequential aggregation to a key entry in
                        the shared table, may modify vertex data
                   ●   Merge: combine partial Fold results so that Fold can run in parallel
                   ●   Apply: Finalize the key entry in the shared table
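
The Fold, Merge and Apply functions listed above can be sketched as a small aggregation that computes the mean of all vertex values. This is only an illustration under assumptions: the function names, the `sync` helper, the chunking and the `mean_value` key are hypothetical and are not the GraphLab API, and this Fold does not modify vertex data.

```python
from functools import reduce


def fold(acc, vertex_value):
    """Fold: sequentially add one vertex value to a running accumulator."""
    return acc + vertex_value


def merge(acc_a, acc_b):
    """Merge: combine two partial accumulators so folds can run in parallel."""
    return acc_a + acc_b


def apply(acc, vertex_count):
    """Apply: finalize the accumulated value before it is stored under a key."""
    return acc / vertex_count


def sync(vertex_values, chunks=2):
    """Fold each chunk on its own, merge the partial results, then apply."""
    size = max(1, -(-len(vertex_values) // chunks))  # ceiling division
    parts = [vertex_values[i:i + size] for i in range(0, len(vertex_values), size)]
    partials = [reduce(fold, part, 0.0) for part in parts]
    return apply(reduce(merge, partials, 0.0), len(vertex_values))


# Hypothetical shared data table with one key entry.
shared_table = {"mean_value": sync([1.0, 2.0, 3.0, 4.0])}
```

In a real engine each chunk would be folded by a different thread; here the chunks are folded one after another to keep the sketch short.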
GraphLab overview: Distributed GraphLab 1.0
   ●   Distributed GraphLab: A Framework for
       Machine Learning and Data Mining in the
       Cloud
             –   Fault tolerance using a snapshot algorithm
             –   Improved distributed parallel processing
             –   Two-stage partitioning:
                     ●   Atoms generated by ParMETIS
                     ●   Ghosts generated by the intersection of the
                          atoms
             –   Finalize() function for vertex synchronization
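
One way to picture the ghosts of the two-stage partitioning is the sketch below: given a vertex-to-atom assignment, the ghosts of an atom are the vertices owned by another atom that share an edge with it. The function name, the tiny graph and the assignment are assumptions made for illustration; in the slides the atoms come from ParMETIS, which is not called here.

```python
from collections import defaultdict


def ghosts_per_atom(edges, atom_of):
    """Return {atom: vertices owned by another atom but adjacent to this one}."""
    ghosts = defaultdict(set)
    for src, dst in edges:
        a, b = atom_of[src], atom_of[dst]
        if a != b:
            ghosts[a].add(dst)  # atom a needs a local copy of dst
            ghosts[b].add(src)  # atom b needs a local copy of src
    return dict(ghosts)


# Hypothetical path graph a-b-c-d split into two atoms.
edges = [("a", "b"), ("b", "c"), ("c", "d")]
atom_of = {"a": 0, "b": 0, "c": 1, "d": 1}
ghosts = ghosts_per_atom(edges, atom_of)
```

Only the edge (b, c) crosses the atom boundary, so each atom keeps a ghost copy of the other side's endpoint.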
GraphLab overview: Distributed GraphLab 1.0
[Figure: Worker 1 and Worker 2, with ghosts between them]

PowerGraph: Introduction

   ●   GraphLab 2.1
   ●   Problems of highly skewed power-law graphs:
           –   Workload imbalance, which degrades performance
           –   Limited scalability
           –   Hard to partition if the graph is too large
           –   Storage: the adjacency data of a high-degree vertex
                may not fit on one machine
           –   Non-parallel computation inside a single vertex
PowerGraph: New Abstraction
●   Original Functions:
           –   Update
           –   Finalize
           –   Fold
           –   Merge
           –   Apply: The synchronization apply
●   Introduce GAS model:
           –   Gather: in, out or all neighbors
           –   Apply: The GAS model apply
           –   Scatter
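
The three GAS phases can be sketched with a small synchronous PageRank loop. This is a single-process illustration under assumptions: `run_gas`, its parameters and the dictionary-based graph are hypothetical and are not the PowerGraph API. Gather reads the in-neighbors, Apply writes the new vertex value, and Scatter activates the out-neighbors of vertices whose value changed.

```python
def run_gas(in_nbrs, out_nbrs, damping=0.85, tol=1e-6, max_steps=100):
    """Synchronous gather/apply/scatter PageRank over an active vertex set."""
    rank = {v: 1.0 for v in in_nbrs}
    active = set(rank)
    for _ in range(max_steps):
        if not active:
            break
        updates, next_active = {}, set()
        for v in active:
            # Gather: sum one partial value per in-edge.
            acc = sum(rank[u] / len(out_nbrs[u]) for u in in_nbrs[v])
            # Apply: compute the new vertex value from the gathered sum.
            updates[v] = (1.0 - damping) + damping * acc
        for v, new_value in updates.items():
            changed = abs(new_value - rank[v]) > tol
            rank[v] = new_value
            if changed:
                # Scatter: wake up the out-neighbors of a changed vertex.
                next_active.update(out_nbrs[v])
        active = next_active
    return rank


# Hypothetical graph: a->b, a->c, b->c, c->a.
in_nbrs = {"a": ["c"], "b": ["a"], "c": ["a", "b"]}
out_nbrs = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = run_gas(in_nbrs, out_nbrs)
```

Because the gather step is a plain sum, the per-edge partial values could be computed on different workers and combined in any order, which is what allows a high-degree vertex to be split across machines.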
PowerGraph: Gather
[Figure: gather phase on Worker 1 and Worker 2]

PowerGraph: Apply
[Figure: apply phase on Worker 1 and Worker 2]

PowerGraph: Scatter
[Figure: scatter phase on Worker 1 and Worker 2]

PowerGraph: Vertex Cut
[Figure: example graph with vertices A to I]
Edge list:
       A B    A H
       A G    B C
       B H    C D
       C H    C I
       D E    D I
       E F    E I
       F H    F G

PowerGraph: Vertex Cut
[Figure: the 14 edges of the example graph split across workers by a vertex cut, with some vertices replicated]

PowerGraph: Vertex Cut (Greedy)
[Figure: the 14 edges of the example graph split across workers by a greedy vertex cut, with some vertices replicated]
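
A greedy vertex cut can be sketched as follows, using the edge list from the Vertex Cut slides. The placement rules are a simplified reading of the greedy heuristic, and the number of machines, the tie-breaking by load and all function names are assumptions made for illustration rather than the PowerGraph implementation.

```python
from collections import Counter, defaultdict

# Edge list of the example graph from the Vertex Cut slides.
EDGES = [("A", "B"), ("A", "H"), ("A", "G"), ("B", "C"), ("B", "H"),
         ("C", "D"), ("C", "H"), ("C", "I"), ("D", "E"), ("D", "I"),
         ("E", "F"), ("E", "I"), ("F", "H"), ("F", "G")]


def greedy_vertex_cut(edges, num_machines):
    """Place each edge on one machine, replicating vertices where needed."""
    replicas = defaultdict(set)   # vertex -> machines holding a copy
    load = [0] * num_machines     # number of edges per machine
    remaining = Counter(v for edge in edges for v in edge)  # unplaced edges
    placement = {}
    for u, v in edges:
        shared = replicas[u] & replicas[v]
        if shared:
            candidates = shared                      # both endpoints already meet
        elif replicas[u] and replicas[v]:
            busier = u if remaining[u] >= remaining[v] else v
            candidates = replicas[busier]            # follow the busier vertex
        elif replicas[u] or replicas[v]:
            candidates = replicas[u] | replicas[v]   # follow the placed endpoint
        else:
            candidates = set(range(num_machines))    # neither endpoint placed yet
        machine = min(candidates, key=lambda m: (load[m], m))
        placement[(u, v)] = machine
        load[machine] += 1
        replicas[u].add(machine)
        replicas[v].add(machine)
        remaining[u] -= 1
        remaining[v] -= 1
    return placement, replicas, load


def replication_factor(replicas):
    """Average number of machines that hold a copy of each vertex."""
    return sum(len(machines) for machines in replicas.values()) / len(replicas)


placement, replicas, load = greedy_vertex_cut(EDGES, num_machines=3)
factor = replication_factor(replicas)
```

Every edge lives on exactly one machine, while a vertex is copied to each machine that holds one of its edges; the replication factor measures how much copying a given cut causes.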
PowerGraph: Experiment
[Figures: experiment results]

PowerGraph: Discussion
   ●   Isn't it similar to the Pregel model?
           –   Partially process the vertex if a message exists
   ●   Gather, Apply and Scatter assume commutative
       and associative operations. What if the
       computation is not commutative?
           –   Sum up the message values in a specific order
                to get the same floating-point rounding error.
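
The rounding issue can be shown with a short sketch: floating-point addition is not associative, so summing the same messages in a different arrival order can give a different result, while sorting the messages by a fixed key first gives the same result every time. The message format (source id, value) and the function names are assumptions made for illustration.

```python
def sum_in_arrival_order(messages):
    """Add the message values in whatever order they arrived."""
    total = 0.0
    for _, value in messages:
        total += value
    return total


def sum_in_fixed_order(messages):
    """Sort by source id first so the additions always happen in the same order."""
    total = 0.0
    for _, value in sorted(messages):
        total += value
    return total


# The same three messages arriving in two different orders.
arrival_1 = [(1, 0.1), (2, 0.2), (3, 0.3)]
arrival_2 = [(3, 0.3), (2, 0.2), (1, 0.1)]

unordered_differs = sum_in_arrival_order(arrival_1) != sum_in_arrival_order(arrival_2)
ordered_matches = sum_in_fixed_order(arrival_1) == sum_in_fixed_order(arrival_2)
```

The price of the fixed order is that all messages must be available, or at least sorted, before the sum is taken, so partial sums cannot be combined as freely.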


PowerGraph and Mizan
   ●   In Mizan we use partial replication:
[Figure: workers W0 and W1 in the compute phase and in the communication phase; in the communication phase a replica a' of vertex a appears]

GraphChi: Introduction
   ●   Asynchronous, disk-based version of
       GraphLab
   ●   Uses the Parallel Sliding Windows method
           –   Very small number of non-sequential accesses
                to the disk
   ●   Support for graph updates
           –   Based on Kineograph, a distributed system for
                processing a continuous in-flow of graph
                updates, while simultaneously running
                advanced graph mining algorithms.
GraphChi: Graph Constraints
   ●   Graph does not fit in memory
   ●   A vertex, its edges and values fit in memory

GraphChi: Disk storage
   ●   Compressed Sparse Row (CSR):
           –   Compressed adjacency list with indexes of the
                edges.
           –   Fast access to the out-edges of a vertex.
   ●   Compressed Sparse Column (CSC):
           –   CSR for the transpose graph
           –   Fast access to the in-edges of a vertex
   ●   Shard: stores the edges' data
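
The two layouts can be sketched in a few lines. The function names and the small example graph are assumptions made for illustration, and vertex ids are taken to be integers starting at 0; this is not GraphChi's on-disk format.

```python
def build_csr(num_vertices, edges):
    """Return (row_ptr, col_idx); out-neighbors of v are col_idx[row_ptr[v]:row_ptr[v + 1]]."""
    row_ptr = [0] * (num_vertices + 1)
    for src, _ in edges:
        row_ptr[src + 1] += 1            # count the out-edges of each source
    for v in range(num_vertices):
        row_ptr[v + 1] += row_ptr[v]     # prefix sum turns counts into offsets
    col_idx = [0] * len(edges)
    cursor = row_ptr[:-1].copy()         # next free slot for each source
    for src, dst in edges:
        col_idx[cursor[src]] = dst
        cursor[src] += 1
    return row_ptr, col_idx


def build_csc(num_vertices, edges):
    """CSC is the CSR layout of the transpose graph."""
    return build_csr(num_vertices, [(dst, src) for src, dst in edges])


# Hypothetical graph: 0->1, 0->2, 1->2, 2->0.
edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
row_ptr, col_idx = build_csr(3, edges)
col_ptr, row_idx = build_csc(3, edges)

out_neighbors_of_0 = col_idx[row_ptr[0]:row_ptr[1]]
in_neighbors_of_2 = row_idx[col_ptr[2]:col_ptr[3]]
```

Both lookups are a single slice of a contiguous array, which is why the layouts suit sequential reads.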
GraphChi: Loading the graph
   ●   The vertices of the input graph are split into P disjoint intervals,
       chosen to balance edges; each interval is associated with a shard
   ●   A shard contains the data of the edges of its interval
   ●   The subgraph of an interval is constructed while the interval is read
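
The interval and shard layout can be sketched as below. The sketch assumes that a shard holds the edges whose destination falls in its interval, sorted by source, which is how GraphChi organizes shards; the function name, the edge list and the fixed intervals (1,2), (3,4), (5,6) are illustrative assumptions, and no edge balancing is attempted.

```python
def build_shards(edges, intervals):
    """Put each edge in the shard of the interval that contains its destination."""
    shards = [[] for _ in intervals]
    for src, dst in edges:
        for p, (low, high) in enumerate(intervals):
            if low <= dst <= high:
                shards[p].append((src, dst))
                break
    # Sorting by source keeps the edges of any source interval contiguous.
    return [sorted(shard) for shard in shards]


# Hypothetical edges over vertices 1..6 and three inclusive vertex intervals.
edges = [(1, 2), (1, 3), (2, 4), (3, 1), (4, 5), (5, 6), (6, 2), (3, 6)]
intervals = [(1, 2), (3, 4), (5, 6)]
shards = build_shards(edges, intervals)
```

With this layout the in-edges of an interval sit in one shard, and its out-edges form one contiguous block in each of the other shards.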




GraphChi: Parallel Sliding Windows
   ●   Intervals are processed one at a time; the vertices of an interval are updated in parallel
   ●   P sequential disk accesses are required to process
       each interval
   ●   The length of the intervals varies with the graph's degree distribution
   ●   P * P disk accesses are required for one superstep
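
The access counts above can be checked with a small in-memory sketch. It assumes shards that hold the in-edges of their interval sorted by source; the shard contents, the intervals and the function name are hypothetical, and the "reads" are list slices rather than real disk accesses.

```python
def psw_pass(shards, intervals):
    """Return one (interval, shard, block) entry per block read in a full pass."""
    reads = []
    for p, (low, high) in enumerate(intervals):
        for q, shard in enumerate(shards):
            if q == p:
                # The interval's own shard is loaded whole: all of its in-edges.
                block = list(shard)
            else:
                # Sliding window: the contiguous run of edges whose source
                # lies in the current interval (shards are sorted by source).
                block = [(s, d) for s, d in shard if low <= s <= high]
            reads.append((p, q, block))
    return reads


# Hypothetical shards for the intervals (1,2), (3,4), (5,6).
intervals = [(1, 2), (3, 4), (5, 6)]
shards = [
    [(1, 2), (3, 1), (6, 2)],   # edges with destination in (1,2)
    [(1, 3), (2, 4)],           # edges with destination in (3,4)
    [(3, 6), (4, 5), (5, 6)],   # edges with destination in (5,6)
]
reads = psw_pass(shards, intervals)
reads_per_interval = len(reads) // len(intervals)
```

Each interval touches every shard once, giving P block reads per interval and P * P for the whole pass.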




GraphChi: Example

      Executing interval (1,2):
[Figure: shards of the intervals (1,2), (3,4) and (5,6)]

GraphChi: Example

      Executing interval (3,4):
[Figure: shards of the intervals (1,2), (3,4) and (5,6)]

GraphChi: Evolving Graphs
   ●   An added edge is reflected in the intervals and
       shards when they are read
   ●   A deleted edge is ignored
   ●   Edge additions and deletions are applied after
       the current interval has been processed.




GraphChi: Preprocessing
[Figure]

Thank you

The Blog wants YOU
           thegraphsblog.wordpress.com/