Mining the Social Graph
          mixi.inc
       Shunya Kimura
Introduction

•   Name: Shunya Kimura

    •   twitter: @kimuras

•   Job: Data mining, software engineering

    •   text mining, graph mining, search engine
Agenda

• Introduction
• The past work
• Introduction to GraphDB
• Introduction to Neo4j
• Introduction to analysis sample
Introduction
Motivation for social graph analysis
 A test bed of millions of nodes and hundreds of millions of edges.

 Advances in distributed processing technology make a wider variety of graph algorithms practical.

 It is a challenging problem.
Number of users on mixi
[Figure: number of member IDs on mixi by year, 2007 to 2011; y-axis from 0 to 30,000,000]
What is Social Graph?
Feed Back
Approach for SG analysis


           Feed Back
The past work
• Friend recommendation
• Community recommendation
Relational Databases

from_id    to_id
1          2
1          3
2          3

id   name     age
1    Kimura   18
2    kato     45
3    ito      21

        Dump & Denormalization

Key      value
From:1   2,3
From:2   3
Prof:1   Kimuras,18
Prof:2   Kato,45

•   reimplementation
•   maintenance cost
•   scalability
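The "Dump & Denormalization" step on this slide groups the to_id values of each from_id under a From:<id> key. A minimal sketch of that grouping, assuming the edge rows arrive as plain integer pairs; the class and method names are hypothetical and not from the deck:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only.
final class Denormalize {
    // Turns (from_id, to_id) rows into "From:<id>" -> "to,to,..." entries.
    static Map<String, String> adjacency(int[][] edgeRows) {
        Map<String, List<String>> grouped = new TreeMap<String, List<String>>();
        for (int[] row : edgeRows) {
            String key = "From:" + row[0];
            List<String> targets = grouped.get(key);
            if (targets == null) {
                targets = new ArrayList<String>();
                grouped.put(key, targets);
            }
            targets.add(String.valueOf(row[1]));
        }
        // Join each target list into the comma-separated value shown on the slide.
        Map<String, String> keyValue = new TreeMap<String, String>();
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            keyValue.put(entry.getKey(), String.join(",", entry.getValue()));
        }
        return keyValue;
    }

    public static void main(String[] args) {
        int[][] rows = { {1, 2}, {1, 3}, {2, 3} };
        System.out.println(adjacency(rows)); // prints {From:1=2,3, From:2=3}
    }
}
```

Every change to the relational rows means re-running a dump like this one, which is where the maintenance cost comes from.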
Introduction to GraphDB
What is a graph?
   Vertex (node)
   Edge
   Undirected graph
What is a graph?
   Vertex (node)
   Edge
   Directed graph
What is GraphDB
Vertex (node)
  ID:   1
  NAME: kimura
  PROP: Male
  AGE:  18

Edge
  ID:       3
  LABEL:    Like
  Since:    2011/08/06
  OutGoing: 2

Vertex (node)
  ID:   2
  NAME: ITO
  PROP: Female
  AGE:  21
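The slide's picture is a property graph: vertices and edges both carry an ID and key/value properties, and an edge also has a label and a direction. A minimal in-memory sketch of that model, with hypothetical class names and no relation to how any GraphDB stores data internally:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model for illustration only.
final class PropertyGraphSketch {
    // Shared by vertices and edges: an ID plus free-form properties.
    static class Element {
        final long id;
        final Map<String, Object> props = new LinkedHashMap<String, Object>();
        Element(long id) { this.id = id; }
        Element set(String key, Object value) { props.put(key, value); return this; }
    }

    static final class Vertex extends Element {
        Vertex(long id) { super(id); }
    }

    // A directed edge also records its label and both endpoints.
    static final class Edge extends Element {
        final String label;
        final Vertex from;
        final Vertex to;
        Edge(long id, String label, Vertex from, Vertex to) {
            super(id);
            this.label = label;
            this.from = from;
            this.to = to;
        }
    }

    // Rebuilds the example from the slide: vertex 1 -[Like]-> vertex 2.
    static Edge sample() {
        Vertex kimura = new Vertex(1);
        kimura.set("NAME", "kimura").set("PROP", "Male").set("AGE", 18);
        Vertex ito = new Vertex(2);
        ito.set("NAME", "ITO").set("PROP", "Female").set("AGE", 21);
        Edge like = new Edge(3, "Like", kimura, ito);
        like.set("Since", "2011/08/06");
        return like;
    }

    public static void main(String[] args) {
        Edge e = sample();
        // prints kimura -[Like]-> ITO
        System.out.println(e.from.props.get("NAME") + " -[" + e.label + "]-> " + e.to.props.get("NAME"));
    }
}
```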
The implementations for GraphDB

  http://en.wikipedia.org/wiki/GraphDB
Introduction to Neo4j
GraphDB Neo4j
       •     True ACID transactions
       •     High availability
       •     Scales to billions of nodes and relationships
       •     High speed querying through traversals


               Single instance(GPLv3)   Multiple instance(AGPLv3)
Embedded       EmbeddedGraphDatabase    HighlyAvailableGraphDatabase
Standalone     Neo4j Server             Neo4j Server high availability mode


                                                   http://neo4j.org/
My other favorite features of Neo4j
• RESTful APIs
• Query language (Cypher)
• Full indexing
 – Lucene
• Built-in graph algorithms
 – A*, Dijkstra
 – High-speed traversal
• Gremlin support
 – Like a query language

                         http://www.tinkerpop.com/post/4633229547/tinkerpop-graph-stack
Introduction to a simple Neo4j use case
            Single node        Multi node
Embedded    Analysis system    Analysis system
Server      Analysis system    Analysis system
Introduction to simple embedded Neo4j

• Insert vertices & make relationships
 • Single node & embedded
• Traversal sample
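Insert vertices, make relationship
import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.kernel.EmbeddedGraphDatabase;

// Creates two vertices, "Kimura" and "Kato", joined by a LIKE relationship.
// Package names follow the Neo4j 1.x embedded API.
public final class InputVertex {
    public static void main(final String[] args) {
        GraphDatabaseService graphDb =
                new EmbeddedGraphDatabase("/tmp/neo4j");
        Transaction tx = graphDb.beginTx();
        try {
            Node firstNode = graphDb.createNode();
            firstNode.setProperty("Name", "Kimura");
            Node secondNode = graphDb.createNode();
            secondNode.setProperty("Name", "Kato");
            firstNode.createRelationshipTo(secondNode,
                    DynamicRelationshipType.withName("LIKE"));
            tx.success();   // mark the transaction for commit
        } finally {
            tx.finish();    // commit if success() was called, otherwise roll back
        }
        graphDb.shutdown();
    }
}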
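Batch Insert
    • Not thread-safe, non-transactional
    • But very fast!
import java.util.HashMap;
import java.util.Map;

import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.kernel.impl.batchinsert.BatchInserter;
import org.neo4j.kernel.impl.batchinsert.BatchInserterImpl;

// Package names follow the Neo4j 1.x batch-insert API.
public final class Batch {
    public static void main(final String[] args) {
        BatchInserter inserter = new BatchInserterImpl("/tmp/neo4j",
                BatchInserterImpl.loadProperties("/tmp/neo4j.props"));
        Map<String, Object> prop = new HashMap<String, Object>();
        prop.put("Name", "Kimura");
        prop.put("Age", 21);
        long node1 = inserter.createNode(prop);

        // The same map is reused; both keys are overwritten for the second node.
        prop.put("Name", "Kato");
        prop.put("Age", 21);
        long node2 = inserter.createNode(prop);
        // The relationship has no properties, hence null.
        inserter.createRelationship(node1, node2,
                DynamicRelationshipType.withName("LIKE"), null);
        inserter.shutdown();
    }
}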
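Traversal sample
    • You can specify the traversal criteria
import org.neo4j.graphdb.Direction;
import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.ReturnableEvaluator;
import org.neo4j.graphdb.StopEvaluator;
import org.neo4j.graphdb.TraversalPosition;
import org.neo4j.graphdb.Traverser;
import org.neo4j.graphdb.Traverser.Order;
import org.neo4j.kernel.EmbeddedGraphDatabase;

public final class TraversalSample {
    public static void main(final String[] args) {
        GraphDatabaseService graphDB = new EmbeddedGraphDatabase(args[0]);
        Node node = graphDB.getNodeById(1);
        Traverser friends = node.traverse(
            // Traversal order: DEPTH_FIRST or BREADTH_FIRST
            Order.DEPTH_FIRST,
            // Termination condition: END_OF_GRAPH or DEPTH_ONE
            StopEvaluator.END_OF_GRAPH,
            // Nodes to return: ALL_BUT_START_NODE, ALL, or a custom isReturnableNode()
            ReturnableEvaluator.ALL_BUT_START_NODE,
            // Relationship type to follow
            DynamicRelationshipType.withName("LIKE"),
            // Edge direction to follow: OUTGOING, INCOMING, or BOTH
            Direction.OUTGOING);
        for (Node nodeBuf : friends) {
            TraversalPosition currentPosition = friends.currentPosition();
            System.out.println(nodeBuf.getId() + " at depth " + currentPosition.depth());
        }
        graphDB.shutdown();
    }
}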
Traversal sample
   Order.BREADTH_FIRST
• Breadth-first search
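Breadth-first order visits every vertex at distance 1 from the start before any vertex at distance 2, and so on. A minimal sketch on a plain adjacency array rather than the Neo4j API; the class name and the sample tree are assumptions for illustration:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical stand-alone example; it does not use Neo4j.
final class BreadthFirstSketch {
    // adjacency[v] lists the neighbours of vertex v; returns the visit order from start.
    static int[] order(int[][] adjacency, int start) {
        boolean[] seen = new boolean[adjacency.length];
        int[] visited = new int[adjacency.length];
        int count = 0;
        Deque<Integer> queue = new ArrayDeque<Integer>();
        seen[start] = true;
        queue.add(start);
        while (!queue.isEmpty()) {
            int v = queue.remove();           // oldest discovered vertex first
            visited[count++] = v;
            for (int w : adjacency[v]) {
                if (!seen[w]) {
                    seen[w] = true;
                    queue.add(w);
                }
            }
        }
        return Arrays.copyOf(visited, count);
    }

    public static void main(String[] args) {
        // Vertex 0 has children 1 and 2; vertex 1 has 3 and 4; vertex 2 has 5.
        int[][] tree = { {1, 2}, {3, 4}, {5}, {}, {}, {} };
        System.out.println(Arrays.toString(order(tree, 0))); // prints [0, 1, 2, 3, 4, 5]
    }
}
```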
Traversal sample
     Order.DEPTH_FIRST
• Depth-first search
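Depth-first order follows one branch as far as it goes before backing up to try the next neighbour. A minimal recursive sketch on a plain adjacency array rather than the Neo4j API; the class name and the sample tree are assumptions for illustration:

```java
import java.util.Arrays;

// Hypothetical stand-alone example; it does not use Neo4j.
final class DepthFirstSketch {
    // adjacency[v] lists the neighbours of vertex v; returns the visit order from start.
    static int[] order(int[][] adjacency, int start) {
        boolean[] seen = new boolean[adjacency.length];
        int[] visited = new int[adjacency.length];
        int count = visit(adjacency, start, seen, visited, 0);
        return Arrays.copyOf(visited, count);
    }

    // Records v, then fully explores each unseen neighbour before moving to the next.
    private static int visit(int[][] adjacency, int v, boolean[] seen, int[] visited, int count) {
        seen[v] = true;
        visited[count++] = v;
        for (int w : adjacency[v]) {
            if (!seen[w]) {
                count = visit(adjacency, w, seen, visited, count);
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Vertex 0 has children 1 and 2; vertex 1 has 3 and 4; vertex 2 has 5.
        int[][] tree = { {1, 2}, {3, 4}, {5}, {}, {}, {} };
        System.out.println(Arrays.toString(order(tree, 0))); // prints [0, 1, 3, 4, 2, 5]
    }
}
```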
Neoclipse sample




       http://wiki.neo4j.org/content/Neoclipse
experiment
•   Store mixi's social graph in Neo4j

•   Conditions

    •   Machine: 24-core CPU, 65 GB memory

    •   Neo4j: BatchInsert, community edition, embedded

•   Data

    •   Nodes: 15 million; edges: 600 million


Processing time: 513 min 17 s (about 8.6 h)
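As a rough cross-check of the figures on this slide (derived here, not stated in the deck, and assuming the whole run was spent inserting), the reported time corresponds to roughly twenty thousand edges per second:

```latex
\begin{align*}
t &= 513 \times 60 + 17 = 30\,797\ \text{s} \approx 8.55\ \text{h} \\
\frac{6.0 \times 10^{8}\ \text{edges}}{30\,797\ \text{s}} &\approx 1.9 \times 10^{4}\ \text{edges/s}
\end{align*}
```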
Network Dataset
•   Stanford Large Network Dataset Collection

    •    SNAP has a wide variety of graph data!

        •   Social networks
        •   Communication networks
        •   Citation networks
        •   Collaboration networks
        •   Web graphs
        •   Product co-purchasing networks
        •   Internet peer-to-peer networks
        •   Road networks
        •   Autonomous systems graphs
        •   Signed networks
        •   Wikipedia networks and metadata
        •   Memetracker and Twitter

                            http://snap.stanford.edu/data/index.html
Introduction to Analysis Sample
Architecture

   Service (Social Graph)   Database   Analysis   Visualization
Introduction to Analysis Samples

• Centrality
• Clustering coefficient
Centrality
• Centrality
 • Measures the importance of each node
 • closeness centrality
 • PageRank
 • degree centrality
 • betweenness centrality
 • eigenvector centrality
 • centralization
Degree centrality
•   The simplest measure.

    •   Count the number of edges at each node.

    •   = number of friends

[Figure: example graph with each node labeled by its degree: 1, 1, 1, 2, 2, 2, 5]
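Counting degrees takes one pass over the edge list. A minimal sketch for an undirected friendship graph; the class name and the seven-vertex sample graph are assumptions, chosen only so that its degrees match the labels 1, 1, 1, 2, 2, 2, 5 in the figure:

```java
import java.util.Arrays;

// Hypothetical stand-alone example.
final class DegreeSketch {
    // Each edge {u, v} is an undirected friendship between vertices 0..vertexCount-1.
    static int[] degrees(int vertexCount, int[][] edges) {
        int[] degree = new int[vertexCount];
        for (int[] edge : edges) {
            degree[edge[0]]++;   // the edge counts once for each endpoint
            degree[edge[1]]++;
        }
        return degree;
    }

    public static void main(String[] args) {
        // Vertex 0 is a hub with five friends; vertex 6 is a friend of 1 and 2.
        int[][] edges = { {0, 1}, {0, 2}, {0, 3}, {0, 4}, {0, 5}, {1, 6}, {2, 6} };
        System.out.println(Arrays.toString(degrees(7, edges))); // prints [5, 2, 2, 1, 1, 1, 2]
    }
}
```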
Degree distribution of mixi

 •     Random sample of 1,000 users

 •     Summary of the degree distribution

Min     1st Qu.   Median   Mean    3rd Qu.   Max
1.00    3.00      10.00    25.69   30.00     903.00
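A six-number summary like the one on this slide can be computed from a sorted sample. The sketch below is an assumption about the method, not the author's code: it interpolates linearly between order statistics, which is believed to match the default quantile rule in R, and the class name and sample degrees are hypothetical.

```java
import java.util.Arrays;

// Hypothetical stand-alone example; expects a non-empty sample.
final class SummarySketch {
    // Quantile by linear interpolation between the two nearest order statistics.
    static double quantile(double[] sorted, double p) {
        double h = (sorted.length - 1) * p;
        int lower = (int) Math.floor(h);
        int upper = (int) Math.ceil(h);
        return sorted[lower] + (h - lower) * (sorted[upper] - sorted[lower]);
    }

    // Returns {min, 1st quartile, median, mean, 3rd quartile, max}.
    static double[] summary(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double sum = 0.0;
        for (double v : sorted) {
            sum += v;
        }
        return new double[] {
            sorted[0], quantile(sorted, 0.25), quantile(sorted, 0.5),
            sum / sorted.length, quantile(sorted, 0.75), sorted[sorted.length - 1]
        };
    }

    public static void main(String[] args) {
        double[] degrees = {4, 1, 2, 10, 3};
        // prints [1.0, 2.0, 3.0, 4.0, 4.0, 10.0]
        System.out.println(Arrays.toString(summary(degrees)));
    }
}
```

A mean well above the median, as in the slide's 25.69 against 10.00, points to a few users with very many friends.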
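Clustering coefficient

•   Network density around a given node.

    •   ≈ density of the relationships among its neighbors

•   Examples for a node with three neighbors (edges present among the neighbors / edges possible):

    •   clustering coefficient = 0 / 3 = 0 (min)

    •   clustering coefficient = 1 / 3

    •   clustering coefficient = 2 / 3

    •   clustering coefficient = 3 / 3 = 1 (max)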
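The ratios on this slide are the local clustering coefficient: the number of edges that exist among a node's neighbors divided by the number that could exist, k(k-1)/2 for k neighbors. A minimal sketch on an adjacency matrix; the class and method names are hypothetical, and returning 0 for nodes with fewer than two neighbors is a convention assumed here:

```java
// Hypothetical stand-alone example.
final class ClusteringSketch {
    // Builds a symmetric adjacency matrix from undirected edges {u, v}.
    static boolean[][] graph(int vertexCount, int[][] edges) {
        boolean[][] adjacent = new boolean[vertexCount][vertexCount];
        for (int[] edge : edges) {
            adjacent[edge[0]][edge[1]] = true;
            adjacent[edge[1]][edge[0]] = true;
        }
        return adjacent;
    }

    // Local clustering coefficient of vertex v.
    static double local(boolean[][] adjacent, int v) {
        int n = adjacent.length;
        int neighbours = 0;
        int links = 0;
        for (int a = 0; a < n; a++) {
            if (a == v || !adjacent[v][a]) {
                continue;
            }
            neighbours++;
            // Count each connected pair of neighbours (a, b) once, with a < b.
            for (int b = a + 1; b < n; b++) {
                if (b != v && adjacent[v][b] && adjacent[a][b]) {
                    links++;
                }
            }
        }
        if (neighbours < 2) {
            return 0.0; // assumed convention: undefined cases count as 0
        }
        return 2.0 * links / (neighbours * (neighbours - 1));
    }

    public static void main(String[] args) {
        // Vertex 0 has neighbours 1, 2, 3; the extra edges link neighbours to each other.
        int[][] none = { {0, 1}, {0, 2}, {0, 3} };
        int[][] one = { {0, 1}, {0, 2}, {0, 3}, {1, 2} };
        int[][] all = { {0, 1}, {0, 2}, {0, 3}, {1, 2}, {1, 3}, {2, 3} };
        System.out.println(local(graph(4, none), 0)); // 0 of 3 possible links
        System.out.println(local(graph(4, one), 0));  // 1 of 3 possible links
        System.out.println(local(graph(4, all), 0));  // 3 of 3 possible links
    }
}
```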
Clustering coefficient

 •     Random sample of 1,000 users

 •     Summary of the clustering coefficient

Min     1st Qu.   Median   Mean     3rd Qu.   Max
0.00    0.00      0.1157   0.2071   0.2667    1.000
Sample user with a low clustering coefficient
•   degree 25, clustering coefficient 0.08
Sample user with a middle clustering coefficient
•   degree 14, clustering coefficient 0.17
Sample user with a high clustering coefficient
•   degree 10, clustering coefficient 0.68
Sample user with the maximum clustering coefficient
•   degree 4, clustering coefficient 1
Visualization Sample
•   Visualize my own social graph on mixi

•   Weighting the edges

    •   Amount of communication (color, thickness)

•   Weighting the vertices

    •   Clustering coefficient (color, thickness)

•   Visualization tool: Gephi

                          http://gephi.org/
•   Motivation for social graph mining

•   Overview of GraphDB

•   Introduction to Neo4j

•   Samples of graph analysis with R

•   Introduction to the visualization tool Gephi
Thanks!


Editor's Notes

  45. TC and MySQL are both still in active use, and I love them.