SlideShare uma empresa Scribd logo
1 de 179
Graph distance distribution
                  for social network mining


                      Paolo Boldi, Marco Rosa, Sebastiano Vigna
                            Laboratory for Web Algorithmics
                          Università degli Studi di Milano, Italy

                            Lars Backstrom, Johan Ugander
                                       Facebook


Friday, September 7, 2012
Plan of the talk




Friday, September 7, 2012
Plan of the talk


               • Computing distances in large graphs (using
                   HyperANF)




Friday, September 7, 2012
Plan of the talk


               • Computing distances in large graphs (using
                   HyperANF)

               • Running HyperANF on Facebook (the largest
                   Milgram-like experiment ever performed)




Friday, September 7, 2012
Plan of the talk


               • Computing distances in large graphs (using
                   HyperANF)

               • Running HyperANF on Facebook (the largest
                   Milgram-like experiment ever performed)

               • Other uses of distances (in particular: robustness)



Friday, September 7, 2012
Prelude
                            Milgram’s experiment is 45




Friday, September 7, 2012
Where it all started...




Friday, September 7, 2012
Where it all started...


               • M. Kochen, I. de Sola Pool: Contacts and influences.
                   (Manuscript, early 50s)




Friday, September 7, 2012
Where it all started...


               • M. Kochen, I. de Sola Pool: Contacts and influences.
                   (Manuscript, early 50s)

               • A. Rapoport, W.J. Horvath: A study of a large
                   sociogram. (Behav.Sci. 1961)




Friday, September 7, 2012
Where it all started...


               • M. Kochen, I. de Sola Pool: Contacts and influences.
                   (Manuscript, early 50s)

               • A. Rapoport, W.J. Horvath: A study of a large
                   sociogram. (Behav.Sci. 1961)

               • S. Milgram, An experimental study of the sma$ world
                    problem. (Sociometry, 1969)




Friday, September 7, 2012
Milgram’s experiment




Friday, September 7, 2012
Milgram’s experiment

               • 300 people (starting population) are asked to
                   dispatch a parcel to a single individual (target)




Friday, September 7, 2012
Milgram’s experiment

               • 300 people (starting population) are asked to
                   dispatch a parcel to a single individual (target)

               • The target was a Boston stockbroker




Friday, September 7, 2012
Milgram’s experiment

               • 300 people (starting population) are asked to
                   dispatch a parcel to a single individual (target)

               • The target was a Boston stockbroker
               • The starting population is selected as follows:




Friday, September 7, 2012
Milgram’s experiment

               • 300 people (starting population) are asked to
                   dispatch a parcel to a single individual (target)

               • The target was a Boston stockbroker
               • The starting population is selected as follows:
                     • 100 were random Boston inhabitants (group A)




Friday, September 7, 2012
Milgram’s experiment

               • 300 people (starting population) are asked to
                   dispatch a parcel to a single individual (target)

               • The target was a Boston stockbroker
               • The starting population is selected as follows:
                     • 100 were random Boston inhabitants (group A)
                     • 100 were random Nebraska strockbrokers (group B)




Friday, September 7, 2012
Milgram’s experiment

               • 300 people (starting population) are asked to
                   dispatch a parcel to a single individual (target)

               • The target was a Boston stockbroker
               • The starting population is selected as follows:
                     • 100 were random Boston inhabitants (group A)
                     • 100 were random Nebraska strockbrokers (group B)
                     • 100 were random Nebraska inhabitants (group C)


Friday, September 7, 2012
Milgram’s experiment




Friday, September 7, 2012
Milgram’s experiment


               • Rules of the game:




Friday, September 7, 2012
Milgram’s experiment


               • Rules of the game:
                     • parcels could be directly sent only to someone
                            the sender knows personally




Friday, September 7, 2012
Milgram’s experiment


               • Rules of the game:
                     • parcels could be directly sent only to someone
                            the sender knows personally

                     • 453 intermediaries happened to be involved in
                            the experiments (besides the starting
                            population and the target)




Friday, September 7, 2012
Milgram’s experiment




Friday, September 7, 2012
Milgram’s experiment

               • Questions Milgram wanted to answer:




Friday, September 7, 2012
Milgram’s experiment

               • Questions Milgram wanted to answer:
                     • How many parcels will reach the target?




Friday, September 7, 2012
Milgram’s experiment

               • Questions Milgram wanted to answer:
                     • How many parcels will reach the target?
                     • What is the distribution of the number of hops
                            required to reach the target?




Friday, September 7, 2012
Milgram’s experiment

               • Questions Milgram wanted to answer:
                     • How many parcels will reach the target?
                     • What is the distribution of the number of hops
                            required to reach the target?

                     • Is this distribution different for the three
                            starting subpopulations?



Friday, September 7, 2012
Milgram’s experiment




Friday, September 7, 2012
Milgram’s experiment

               • Answers:




Friday, September 7, 2012
Milgram’s experiment

               • Answers:
                     • How many parcels will reach the target? 29%




Friday, September 7, 2012
Milgram’s experiment

               • Answers:
                     • How many parcels will reach the target? 29%
                     • What is the distribution of the number of hops
                            required to reach the target? Avg. was 5.2




Friday, September 7, 2012
Milgram’s experiment

               • Answers:
                     • How many parcels will reach the target? 29%
                     • What is the distribution of the number of hops
                            required to reach the target? Avg. was 5.2

                     • Is this distribution different for the three starting
                            subpopulations? Yes: avg. for groups A/B/C was
                            4.6/5.4/5.7, respectively




Friday, September 7, 2012
Chain lengths




Friday, September 7, 2012
Milgram’s popularity




Friday, September 7, 2012
Milgram’s popularity
               • Six degrees of separation slipped away from the
                   scientific niche to enter the world of popular
                   immagination:




Friday, September 7, 2012
Milgram’s popularity
               • Six degrees of separation slipped away from the
                   scientific niche to enter the world of popular
                   immagination:

                     • “Six degrees of separation” is a play by John
                            Guare...




Friday, September 7, 2012
Milgram’s popularity
               • Six degrees of separation slipped away from the
                   scientific niche to enter the world of popular
                   immagination:

                     • “Six degrees of separation” is a play by John
                            Guare...

                     • ...a movie by Fred Schepisi...




Friday, September 7, 2012
Milgram’s popularity
               • Six degrees of separation slipped away from the
                   scientific niche to enter the world of popular
                   immagination:

                     • “Six degrees of separation” is a play by John
                            Guare...

                     • ...a movie by Fred Schepisi...
                     • ...a song sung by dolls in their national costume
                            at Disneyland in a heart-warming exhibition
                            celebrating the connectedness of people all

Friday, September 7, 2012
Milgram’s criticisms




Friday, September 7, 2012
Milgram’s criticisms


               • “Could it be a big world after all? (The six-
                   degrees-of-separation myth)” (Judith S. Kleinfeld,
                   2002)




Friday, September 7, 2012
Milgram’s criticisms


               • “Could it be a big world after all? (The six-
                   degrees-of-separation myth)” (Judith S. Kleinfeld,
                   2002)

                     • The vast majority of chains were never
                            completed




Friday, September 7, 2012
Milgram’s criticisms


               • “Could it be a big world after all? (The six-
                   degrees-of-separation myth)” (Judith S. Kleinfeld,
                   2002)

                     • The vast majority of chains were never
                            completed

                     • Extremely difficult to reproduce



Friday, September 7, 2012
Measuring what?




Friday, September 7, 2012
Measuring what?


               • But what did Milgram’s experiment reveal, after
                   all?




Friday, September 7, 2012
Measuring what?


               • But what did Milgram’s experiment reveal, after
                   all?

                     i) That the world is small




Friday, September 7, 2012
Measuring what?


               • But what did Milgram’s experiment reveal, after
                   all?

                     i) That the world is small

                     ii)That people are able to exploit this smallness




Friday, September 7, 2012
HyperANF
              A tool to compute distances in large graphs




Friday, September 7, 2012
Introduction




Friday, September 7, 2012
Introduction


             • You want to study the properties of a huge graph
                 (typically: a social network)




Friday, September 7, 2012
Introduction


             • You want to study the properties of a huge graph
                 (typically: a social network)

             • You want to obtain some information about its global
                 structure (not simply triangle-counting/degree
                 distribution/etc.)




Friday, September 7, 2012
Introduction


             • You want to study the properties of a huge graph
                 (typically: a social network)

             • You want to obtain some information about its global
                 structure (not simply triangle-counting/degree
                 distribution/etc.)

             • A natural candidate: distance distribution


Friday, September 7, 2012
Graph distances and
                               distribution




Friday, September 7, 2012
Graph distances and
                               distribution
            • Given a graph, d(x,y) is the length of the shortest
                 path from x to y (∞ if one cannot go from x to y)




Friday, September 7, 2012
Graph distances and
                               distribution
            • Given a graph, d(x,y) is the length of the shortest
                 path from x to y (∞ if one cannot go from x to y)

            • For undirected graphs, d(x,y)=d(y,x)




Friday, September 7, 2012
Graph distances and
                               distribution
            • Given a graph, d(x,y) is the length of the shortest
                 path from x to y (∞ if one cannot go from x to y)

            • For undirected graphs, d(x,y)=d(y,x)
            • For every t, count the number of pairs (x,y) such
                 that d(x,y)=t




Friday, September 7, 2012
Graph distances and
                               distribution
            • Given a graph, d(x,y) is the length of the shortest
                 path from x to y (∞ if one cannot go from x to y)

            • For undirected graphs, d(x,y)=d(y,x)
            • For every t, count the number of pairs (x,y) such
                 that d(x,y)=t

            • The fraction of pairs at distance t is (the density
                 function of) a distribution


Friday, September 7, 2012
Exact computation




Friday, September 7, 2012
Exact computation
             • How can one compute the distance distribution?




Friday, September 7, 2012
Exact computation
             • How can one compute the distance distribution?
                  • Weighted graphs: Dijkstra (single-source: O(n2)),
                      Floyd-Warshall (all-pairs: O(n3))




Friday, September 7, 2012
Exact computation
             • How can one compute the distance distribution?
                  • Weighted graphs: Dijkstra (single-source: O(n2)),
                      Floyd-Warshall (all-pairs: O(n3))

                  • In the unweighted case:




Friday, September 7, 2012
Exact computation
             • How can one compute the distance distribution?
                  • Weighted graphs: Dijkstra (single-source: O(n2)),
                      Floyd-Warshall (all-pairs: O(n3))

                  • In the unweighted case:
                        • a single BFS solves the single-source version of
                            the problem: O(m)




Friday, September 7, 2012
Exact computation
             • How can one compute the distance distribution?
                  • Weighted graphs: Dijkstra (single-source: O(n2)),
                      Floyd-Warshall (all-pairs: O(n3))

                  • In the unweighted case:
                        • a single BFS solves the single-source version of
                            the problem: O(m)

                        • if we repeat it from every source: O(nm)


Friday, September 7, 2012
Sampling pairs




Friday, September 7, 2012
Sampling pairs


               • Sample at random pairs of nodes (x,y)




Friday, September 7, 2012
Sampling pairs


               • Sample at random pairs of nodes (x,y)
               • Compute d(x,y) with a BFS from x




Friday, September 7, 2012
Sampling pairs


               • Sample at random pairs of nodes (x,y)
               • Compute d(x,y) with a BFS from x
               • (Possibly: reject the pair if d(x,y) is infinite)




Friday, September 7, 2012
Sampling pairs




Friday, September 7, 2012
Sampling pairs


               • For every t, the fraction of sampled pairs that
                   were found at distance t are an estimator of the
                   value of the probability mass function




Friday, September 7, 2012
Sampling pairs


               • For every t, the fraction of sampled pairs that
                   were found at distance t are an estimator of the
                   value of the probability mass function

               • Takes a BFS for every pair O(m)




Friday, September 7, 2012
Sampling sources




Friday, September 7, 2012
Sampling sources



               • Sample at random a source x




Friday, September 7, 2012
Sampling sources



               • Sample at random a source x
               • Compute a full BFS from x




Friday, September 7, 2012
Sampling sources




Friday, September 7, 2012
Sampling sources


               • It is an unbiased estimator only for undirected and
                   connected graphs




Friday, September 7, 2012
Sampling sources


               • It is an unbiased estimator only for undirected and
                   connected graphs

               • Uses anyway BFS...




Friday, September 7, 2012
Sampling sources


               • It is an unbiased estimator only for undirected and
                   connected graphs

               • Uses anyway BFS...
                     • ...not cache friendly




Friday, September 7, 2012
Sampling sources


               • It is an unbiased estimator only for undirected and
                   connected graphs

               • Uses anyway BFS...
                     • ...not cache friendly
                     • ...not compression friendly



Friday, September 7, 2012
Cohen’s sampling




Friday, September 7, 2012
Cohen’s sampling



               • Edith Cohen [JCSS 1997] came out with a very
                   general framework for size estimation: powerful,
                   but doesn’t scale well, it is not easily parallelizable,
                   requires direct access




Friday, September 7, 2012
Alternative: Diffusion




Friday, September 7, 2012
Alternative: Diffusion

               • Basic idea: Palmer et. al, KDD ’02




Friday, September 7, 2012
Alternative: Diffusion

               • Basic idea: Palmer et. al, KDD ’02
               • Let Bt(x) be the ball of radius t about x (the set of
                   nodes at distance ≤t from x)




Friday, September 7, 2012
Alternative: Diffusion

               • Basic idea: Palmer et. al, KDD ’02
               • Let Bt(x) be the ball of radius t about x (the set of
                   nodes at distance ≤t from x)

               • Clearly B0(x)={x}




Friday, September 7, 2012
Alternative: Diffusion

               • Basic idea: Palmer et. al, KDD ’02
               • Let Bt(x) be the ball of radius t about x (the set of
                   nodes at distance ≤t from x)

               • Clearly B0(x)={x}
               • Moreover Bt+1(x)=∪x→yBt(y)∪{x}




Friday, September 7, 2012
Alternative: Diffusion

               • Basic idea: Palmer et. al, KDD ’02
               • Let Bt(x) be the ball of radius t about x (the set of
                   nodes at distance ≤t from x)

               • Clearly B0(x)={x}
               • Moreover Bt+1(x)=∪x→yBt(y)∪{x}
               • So computing Bt+1 starting from Bt one just need a
                   single (sequential) scan of the graph


Friday, September 7, 2012
Easy but costly




Friday, September 7, 2012
Easy but costly

               • Every set requires O(n) bits, hence O(n2) bits
                   overall




Friday, September 7, 2012
Easy but costly

               • Every set requires O(n) bits, hence O(n2) bits
                   overall

               • Too many!




Friday, September 7, 2012
Easy but costly

               • Every set requires O(n) bits, hence O(n2) bits
                   overall

               • Too many!
               • What about using approximated sets?




Friday, September 7, 2012
Easy but costly

               • Every set requires O(n) bits, hence O(n2) bits
                   overall

               • Too many!
               • What about using approximated sets?
               • We need probabilistic counters, with just two
                   primitives: add and size?




Friday, September 7, 2012
Easy but costly

               • Every set requires O(n) bits, hence O(n2) bits
                   overall

               • Too many!
               • What about using approximated sets?
               • We need probabilistic counters, with just two
                   primitives: add and size?

               • Very small!

Friday, September 7, 2012
HyperANF




Friday, September 7, 2012
HyperANF


                  • We used HyperLogLog counters [Flajolet et al.,
                      2007]




Friday, September 7, 2012
HyperANF


                  • We used HyperLogLog counters [Flajolet et al.,
                      2007]

                  • With 40 bits you can count up to 4 billion with a
                      standard deviation of 6%




Friday, September 7, 2012
HyperANF


                  • We used HyperLogLog counters [Flajolet et al.,
                      2007]

                  • With 40 bits you can count up to 4 billion with a
                      standard deviation of 6%

                  • Remember: one set per node!



Friday, September 7, 2012
Observe that




Friday, September 7, 2012
Observe that

               • Every single counter has a guaranteed relative
                   standard deviation (depending only on the number
                   of registers per counter)




Friday, September 7, 2012
Observe that

               • Every single counter has a guaranteed relative
                   standard deviation (depending only on the number
                   of registers per counter)

               • This implies a guarantee on the summation of the
                   counters




Friday, September 7, 2012
Observe that

               • Every single counter has a guaranteed relative
                   standard deviation (depending only on the number
                   of registers per counter)

               • This implies a guarantee on the summation of the
                   counters

               • This gives in turn precision bounds on the
                   estimated distribution with respect to the real one



Friday, September 7, 2012
Other tricks




Friday, September 7, 2012
Other tricks


             • We use broadword programming to compute efficiently
                 unions




Friday, September 7, 2012
Other tricks


             • We use broadword programming to compute efficiently
                 unions

             • Systolic computation for on-demand updates of
                 counters




Friday, September 7, 2012
Other tricks


             • We use broadword programming to compute efficiently
                 unions

             • Systolic computation for on-demand updates of
                 counters

             • Exploited micropara$elization of multicore
                 architectures




Friday, September 7, 2012
Real speed?




Friday, September 7, 2012
Real speed?

            • Small dimension: 1.8min vs. 4.6h on a graph with
                 7.4M nodes




Friday, September 7, 2012
Real speed?

            • Small dimension: 1.8min vs. 4.6h on a graph with
                 7.4M nodes

            • Large dimension: HADI [Kang et al., 2010] is a
                 Hadoop-conscious implementation of ANF. Takes 30
                 minutes on a 200K-node graph (on one of the 50
                 world largest supercomputers). HyperANF does the
                 same in 2.25min on our workstation (20 min on this
                 laptop).



Friday, September 7, 2012
Try it!




Friday, September 7, 2012
Try it!


               • HyperANF is available within the webgraph
                   package




Friday, September 7, 2012
Try it!


               • HyperANF is available within the webgraph
                   package

               • Download it from




Friday, September 7, 2012
Try it!


               • HyperANF is available within the webgraph
                   package

               • Download it from
                            • http://webgraph.law.dsi.unimi.it/




Friday, September 7, 2012
Try it!


               • HyperANF is available within the webgraph
                   package

               • Download it from
                            • http://webgraph.law.dsi.unimi.it/
               • Or google for webgraph



Friday, September 7, 2012
Running it on Facebook!
                      [with Lars Backstrom and Johan Ugander]




Friday, September 7, 2012
Facebook




Friday, September 7, 2012
Facebook


               • Facebook opened up to non-college students on
                   September 26, 2006




Friday, September 7, 2012
Facebook


               • Facebook opened up to non-college students on
                   September 26, 2006

               • So, between 1 Jan 2007 and 1 Jan 2008 the number
                   of users exploded




Friday, September 7, 2012
Experiments (time)

               • We ran our experiments on snapshots of facebook
                            • Jan 1, 2007
                            • Jan 1, 2008 ...
                            • Jan 1, 2011
                            • [current] May, 2011


Friday, September 7, 2012
Experiments (dataset)

               • We considered:
                            • -: the whole facebook
                            • it / se: only Italian / Swedish users
                            • it+se: only Italian & Swedish users
                            • us: only US users
               • Based on users’ current geo-IP location

Friday, September 7, 2012
Active users


               • We only considered active users (users who have
                   done some activity in the 28 days preceding 9 Jun
                   2011)

               • So we are not considering “old” users that are not
                   active any more

               • For - [current] we have about 750M nodes



Friday, September 7, 2012
Distance distribution ($)




Friday, September 7, 2012
Distance distribution ($)
                                                                                fb 2007




                                      0.7
                                      0.6
                                      0.5



                                                            ●
                                      0.4
                            % pairs




                                                                ●
                                      0.3
                                      0.2
                                      0.1




                                                        ●           ●



                                                                        ●
                                      0.0




                                            ●   ●   ●                       ●    ●   ●     ●   ●   ●   ●   ●    ●   ●   ●   ●



                                            0                   5                    10                    15

                                                                                distance




Friday, September 7, 2012
Distance distribution ($)
                                                                                fb 2008




                                      0.7
                                      0.6
                                      0.5




                                                                ●
                                      0.4
                            % pairs

                                      0.3




                                                                    ●
                                      0.2




                                                            ●
                                      0.1




                                                                        ●



                                                        ●                   ●
                                      0.0




                                            ●   ●   ●                            ●   ●     ●   ●   ●   ●   ●    ●   ●   ●   ●



                                            0                   5                    10                    15

                                                                                distance




Friday, September 7, 2012
Distance distribution ($)
                                                                                fb 2009




                                      0.7
                                      0.6
                                      0.5


                                                                ●
                                      0.4
                            % pairs

                                      0.3




                                                                    ●
                                      0.2




                                                            ●
                                      0.1




                                                                        ●
                                      0.0




                                            ●   ●   ●   ●                   ●    ●   ●     ●   ●   ●   ●   ●    ●   ●   ●   ●



                                            0                   5                    10                    15

                                                                                distance




Friday, September 7, 2012
Distance distribution ($)
                                                                                fb 2010




                                      0.7
                                      0.6
                                      0.5
                                      0.4                       ●
                            % pairs

                                      0.3




                                                                    ●
                                      0.2




                                                            ●
                                      0.1




                                                                        ●
                                      0.0




                                            ●   ●   ●   ●                   ●    ●   ●     ●   ●   ●   ●   ●    ●   ●   ●   ●



                                            0                   5                    10                    15

                                                                                distance




Friday, September 7, 2012
Distance distribution ($)
                                                                                fb 2011




                                      0.7
                                      0.6
                                      0.5
                                      0.4                       ●
                            % pairs

                                      0.3




                                                            ●
                                      0.2




                                                                    ●
                                      0.1




                                                        ●
                                      0.0




                                            ●   ●   ●                   ●   ●    ●   ●     ●   ●   ●   ●   ●    ●   ●   ●   ●



                                            0                   5                    10                    15

                                                                                distance




Friday, September 7, 2012
Distance distribution ($)
                                                                            fb current




                                      0.7
                                      0.6
                                                                ●
                                      0.5
                                      0.4
                            % pairs

                                      0.3




                                                            ●
                                      0.2




                                                                    ●
                                      0.1




                                                        ●
                                      0.0




                                            ●   ●   ●                   ●   ●    ●   ●     ●   ●   ●   ●   ●    ●   ●   ●   ●



                                            0                   5                    10                    15

                                                                                distance




Friday, September 7, 2012
Distance distribution (it)




Friday, September 7, 2012
Distance distribution (it)
                                                                                it 2007




                                      0.7
                                      0.6
                                      0.5
                                      0.4
                            % pairs

                                      0.3
                                      0.2
                                      0.1




                                                                    ●   ●   ●
                                                                                 ●
                                                                ●                    ●
                                                                                           ●
                                                                                               ●   ●
                                                            ●                                          ●   ●    ●   ●   ●
                                            ●   ●       ●                                                                   ●
                                      0.0




                                                    ●




                                            0                   5                    10                    15

                                                                                distance




Friday, September 7, 2012
Distance distribution (it)
                                                                                it 2008




                                      0.7
                                      0.6
                                      0.5
                                      0.4
                            % pairs

                                      0.3




                                                                    ●

                                                                ●
                                      0.2




                                                                        ●
                                      0.1




                                                            ●               ●


                                                                                 ●
                                                        ●                            ●
                                                                                           ●
                                      0.0




                                            ●   ●   ●                                          ●   ●   ●   ●    ●   ●   ●   ●



                                            0                   5                    10                    15

                                                                                distance




Friday, September 7, 2012
Distance distribution (it)
                                                                                it 2009




                                      0.7
                                      0.6
                                      0.5




                                                                ●
                                      0.4




                                                            ●
                            % pairs

                                      0.3
                                      0.2
                                      0.1




                                                                    ●



                                                        ●
                                                                        ●
                                      0.0




                                            ●   ●   ●                       ●    ●   ●     ●   ●   ●   ●   ●    ●   ●   ●   ●



                                            0                   5                    10                    15

                                                                                distance




Friday, September 7, 2012
Distance distribution (it)
                                                                                it 2010




                                      0.7
                                                            ●




                                      0.6
                                      0.5
                                      0.4
                            % pairs

                                      0.3
                                      0.2




                                                                ●



                                                        ●
                                      0.1




                                                                    ●
                                      0.0




                                            ●   ●   ●                   ●   ●    ●   ●     ●   ●   ●   ●   ●    ●   ●   ●   ●



                                            0                   5                    10                    15

                                                                                distance




Friday, September 7, 2012
Distance distribution (it)
                                                                                it 2011




                                      0.7
                                                            ●




                                      0.6
                                      0.5
                                      0.4
                            % pairs

                                      0.3




                                                        ●
                                      0.2
                                      0.1




                                                                ●
                                      0.0




                                            ●   ●   ●               ●   ●   ●    ●   ●     ●   ●   ●   ●   ●    ●   ●   ●   ●



                                            0                   5                    10                    15

                                                                                distance




Friday, September 7, 2012
Distance distribution (it)
                                                                            it current




                                      0.7
                                                            ●


                                      0.6
                                      0.5
                                      0.4
                            % pairs

                                      0.3




                                                        ●
                                      0.2




                                                                ●
                                      0.1




                                                                    ●
                                      0.0




                                            ●   ●   ●                   ●   ●    ●   ●     ●   ●   ●   ●   ●    ●   ●   ●   ●



                                            0                   5                    10                    15

                                                                                distance




Friday, September 7, 2012
Distance distribution (se)




Friday, September 7, 2012
Distance distribution (se)
                                                                                se 2007




                                      0.7
                                      0.6
                                      0.5
                                      0.4
                            % pairs

                                      0.3
                                      0.2




                                                                ●   ●


                                                            ●           ●
                                      0.1




                                                                            ●
                                                        ●
                                                                                 ●
                                                    ●                                ●
                                                                                           ●
                                      0.0




                                            ●   ●                                              ●   ●   ●   ●    ●   ●   ●   ●



                                            0                   5                    10                    15

                                                                                distance




Friday, September 7, 2012
Distance distribution (se)
                                                                                se 2008




                                      0.7
                                      0.6

                                                            ●
                                      0.5
                                      0.4
                            % pairs




                                                                ●
                                      0.3
                                      0.2
                                      0.1




                                                        ●


                                                                    ●
                                      0.0




                                            ●   ●   ●                   ●   ●    ●   ●     ●   ●   ●   ●   ●    ●   ●   ●   ●



                                            0                   5                    10                    15

                                                                                distance




Friday, September 7, 2012
Distance distribution (se)
                                                                                se 2009




                                      0.7
                                      0.6
                                                            ●
                                      0.5
                                      0.4
                            % pairs

                                      0.3




                                                                ●
                                      0.2




                                                        ●
                                      0.1




                                                                    ●
                                      0.0




                                            ●   ●   ●                   ●   ●    ●   ●     ●   ●   ●   ●   ●    ●   ●   ●   ●



                                            0                   5                    10                    15

                                                                                distance




Friday, September 7, 2012
Distance distribution (se)
                                                                                se 2010




                                      0.7
                                      0.6
                                                            ●
                                      0.5
                                      0.4
                            % pairs

                                      0.3
                                      0.2




                                                                ●
                                                        ●
                                      0.1




                                                                    ●
                                      0.0




                                                    ●                   ●   ●
                                            ●   ●                                ●   ●     ●   ●   ●   ●   ●    ●   ●   ●   ●



                                            0                   5                    10                    15

                                                                                distance




Friday, September 7, 2012
Distance distribution (se)
                                                                                se 2011




                                      0.7
                                                            ●
                                      0.6
                                      0.5
                                      0.4
                            % pairs

                                      0.3




                                                        ●
                                      0.2




                                                                ●
                                      0.1




                                                                    ●
                                      0.0




                                            ●   ●   ●                   ●   ●    ●   ●     ●   ●   ●   ●   ●    ●   ●   ●   ●



                                            0                   5                    10                    15

                                                                                distance




Friday, September 7, 2012
Distance distribution (se)
                                                                                se 2011




                                      0.7
                                                            ●
                                      0.6
                                      0.5
                                      0.4
                            % pairs

                                      0.3




                                                        ●
                                      0.2




                                                                ●
                                      0.1




                                                                    ●
                                      0.0




                                            ●   ●   ●                   ●   ●    ●   ●     ●   ●   ●   ●   ●    ●   ●   ●   ●



                                            0                   5                    10                    15

                                                                                distance




Friday, September 7, 2012
Distance distribution (se)
                                                                            se current




                                      0.7
                                      0.6
                                      0.5
                                      0.4                   ●
                            % pairs

                                      0.3




                                                        ●
                                      0.2




                                                                ●
                                      0.1




                                                                    ●
                                      0.0




                                            ●   ●   ●                   ●   ●    ●   ●     ●   ●   ●   ●   ●    ●   ●   ●   ●



                                            0                   5                    10                    15

                                                                                distance




Friday, September 7, 2012
Average distance
                                                                      it                                                                                   se

                                                 ●                                                                                     ●


                                                                                                                                                                                                  2008   curr



                                                                                                                    4.0 4.5 5.0 5.5
              avg. distance




                                                                                                    avg. distance
                              4 5 6 7 8 9




                                                        ●

                                                                                                                                              ●
                                                                                                                                                     ●             ●
                                                               ●




                                                                                                                                                                                                  6.58 3.90
                                                                             ●      ●       ●                                                                             ●       ●


                                                2007   2008   2009          2010   2011   current                                     2007   2008   2009          2010   2011   current
                                                                                                                                                                                           it
                                                                     year                                                                                  year




                                                                                                                                                                                                  4.33 3.89
                                                                     itse                                                                                  us

                                                 ●                                                                                            ●      ●
                                                                                                                                                                                           se
                              9




                                                                                                                    4.7
              avg. distance




                                                                                                    avg. distance
                              8




                                                                                                                                                                   ●
                              7




                                                                                                                    4.5
                              6




                                                                                                                                                                                                  4.9 4.16
                                                                                                                                                                          ●


                                                                                                                                                                                          it+se
                              5




                                                        ●      ●                                                                                                                  ●
                                                                             ●
                                                                                                                    4.3




                                                                                    ●       ●                                          ●
                              4




                                                2007   2008   2009          2010   2011   current                                     2007   2008   2009          2010   2011   current

                                                                     year                                                                                  year




                                                        ●      ●
                                                                     fb                                                                                                                    us     4.74 4.32
                              4.6 4.8 5.0 5.2
              avg. distance




                                                                             ●




                                                 ●
                                                                                    ●       ●
                                                                                                                                                                                           $      5.28 4.74
                                                2007   2008   2009          2010   2011   current

                                                                     year




Friday, September 7, 2012
Average distance
                                                                      it                                                                                   se

                                                 ●                                                                                     ●


                                                                                                                                                                                                  2008   curr



                                                                                                                    4.0 4.5 5.0 5.5
              avg. distance




                                                                                                    avg. distance
                              4 5 6 7 8 9




                                                        ●

                                                                                                                                              ●
                                                                                                                                                     ●             ●
                                                               ●




                                                                                                                                                                                                  6.58 3.90
                                                                             ●      ●       ●                                                                             ●       ●


                                                2007   2008   2009          2010   2011   current                                     2007   2008   2009          2010   2011   current
                                                                                                                                                                                           it
                                                                     year                                                                                  year




                                                                                                                                                                                                  4.33 3.89
                                                                     itse                                                                                  us

                                                 ●                                                                                            ●      ●
                                                                                                                                                                                           se
                              9




                                                                                                                    4.7
              avg. distance




                                                                                                    avg. distance
                              8




                                                                                                                                                                   ●
                              7




                                                                                                                    4.5
                              6




                                                                                                                                                                                                  4.9 4.16
                                                                                                                                                                          ●


                                                                                                                                                                                          it+se
                              5




                                                        ●      ●                                                                                                                  ●
                                                                             ●
                                                                                                                    4.3




                                                                                    ●       ●                                          ●
                              4




                                                2007   2008   2009          2010   2011   current                                     2007   2008   2009          2010   2011   current

                                                                     year                                                                                  year




                                                        ●      ●
                                                                     fb                                                                                                                    us     4.74 4.32
                              4.6 4.8 5.0 5.2
              avg. distance




                                                                             ●




                                                 ●
                                                                                    ●       ●
                                                                                                                                                                                           $      5.28 4.74
                                                2007   2008   2009          2010   2011   current

                                                                     year




Friday, September 7, 2012
Average distance
                                                                      it                                                                                   se

                                                 ●                                                                                     ●


                                                                                                                                                                                                  2008   curr



                                                                                                                    4.0 4.5 5.0 5.5
              avg. distance




                                                                                                    avg. distance
                              4 5 6 7 8 9




                                                        ●

                                                                                                                                              ●
                                                                                                                                                     ●             ●
                                                               ●




                                                                                                                                                                                                  6.58 3.90
                                                                             ●      ●       ●                                                                             ●       ●


                                                2007   2008   2009          2010   2011   current                                     2007   2008   2009          2010   2011   current
                                                                                                                                                                                           it
                                                                     year                                                                                  year




                                                                                                                                                                                                  4.33 3.89
                                                                     itse                                                                                  us

                                                 ●                                                                                            ●      ●
                                                                                                                                                                                           se
                              9




                                                                                                                    4.7
              avg. distance




                                                                                                    avg. distance
                              8




                                                                                                                                                                   ●
                              7




                                                                                                                    4.5
                              6




                                                                                                                                                                                                  4.9 4.16
                                                                                                                                                                          ●


                                                                                                                                                                                          it+se
                              5




                                                        ●      ●                                                                                                                  ●
                                                                             ●
                                                                                                                    4.3




                                                                                    ●       ●                                          ●
                              4




                                                2007   2008   2009          2010   2011   current                                     2007   2008   2009          2010   2011   current

                                                                     year                                                                                  year




                                                        ●      ●
                                                                     fb                                                                                                                    us     4.74 4.32
                              4.6 4.8 5.0 5.2
              avg. distance




                                                                             ●




                                                 ●
                                                                                    ●       ●
                                                                                                                                                                                           $      5.28 4.74
                                                2007   2008   2009          2010   2011   current

                                                                     year
                                                                                                                                      $ (current): 92% pairs
                                                                                                                                          are reachable!
Friday, September 7, 2012
Effective diameter (@                                                         90%)
                                                effective diameters

                                        ●                                                       2008   curr
                                   8




                                        ●
                                        ●
                                        ●
                                               ●
                                               ●                  ●
                                                                                          it    9.0 5.2
                                               ●
                                               ●                                    ●
                                                                  ●
                                   6




                                        ●                                           ●
                                               ●


                                                                                                5.9 5.3
                                                                  ●                 ●
                                                                  ●
                                                                                          se
                 effective diam.




                                                                                    ●

                                                                                    ●
                                   4




                                                                                           it
                                                                        *
                                                                        *
                                                                        *
                                                                            it
                                                                            se
                                                                            itse
                                                                            us
                                                                                          +se   6.8 5.8
                                                                        *
                                                                        *   fb
                                   2




                                                                                          us    6.5 5.8
                                   0




                                       2008   2009               2010              2011
                                                                                          $     7.0 6.2
                                                       year




Friday, September 7, 2012
Harmonic diameter
                                                 harmonic diameters

                                                                                               2008   curr
                                  30




                                        ●




                                                                                               23.7 3.4
                                  25




                                                                                         it
                                                                       *   it
                                  20




                                                                       *   se
                                                                       *   itse
                                                                                         se    4.5 4.0
                 harmonic diam.




                                                                       *   us
                                                                       *   fb
                                  15




                                                                                          it
                                                                                         +se   5.8 3.8
                                  10




                                        ●
                                        ●
                                        ●        ●                ●                ●
                                                                                         us    4.6 4.4
                                  5




                                                 ●
                                                 ●                ●
                                        ●                         ●
                                                 ●                                 ●
                                                                  ●                ●
                                                                                   ●
                                                                                   ●
                                  0




                                       2008     2009            2010              2011
                                                                                         $     5.7 4.6
                                                        year




Friday, September 7, 2012
Average degree vs. density ($)

                                   Avg. degree   Density
                            2009      88.7       6.4 * 10-7
                            2010      113.0      3.4 * 10-7
                            2011      169.0      3.0 * 10-7
                            curr      190.4      2.6 * 10-7

Friday, September 7, 2012
Actual diameter

                                     2008     curr
                              it      >29         =25
                              se      >16         =25
                            it+se     >21     =27
                              us      >17     =30
                              #       >16     >58

Friday, September 7, 2012
Actual diameter
                                            Used the fringe/double-sweep
                                                  technique for “=”


                                     2008        curr
                              it      >29         =25
                              se      >16         =25
                            it+se     >21         =27
                              us      >17         =30
                              #       >16         >58

Friday, September 7, 2012
Other applications
                    Spid, network robustness and more...




Friday, September 7, 2012
What are distances good for?




Friday, September 7, 2012
What are distances good for?


             • Network models are usually studied on the base of
                 the local statistics they produce




Friday, September 7, 2012
What are distances good for?


             • Network models are usually studied on the base of
                 the local statistics they produce

             • Not difficult to obtain models that behave correctly
                 locally (i.e., as far as degree distribution, assortativity,
                 clustering coefficients... are concerned)




Friday, September 7, 2012
Global = more informative!




Friday, September 7, 2012
An application




Friday, September 7, 2012
An application


               • An application: use the distance distribution as a
                   graph digest




Friday, September 7, 2012
An application


               • An application: use the distance distribution as a
                   graph digest

               • Typical example: if I modify the graph with a
                   certain criterion, how much does the distance
                   distribution change?




Friday, September 7, 2012
Node elimination




Friday, September 7, 2012
Node elimination


               • Consider a certain ordering of the vertices of a graph




Friday, September 7, 2012
Node elimination


               • Consider a certain ordering of the vertices of a graph
               • Fix a threshold ϑ, delete all vertices (and all
                   incident arcs) in the specified order, until ϑm arcs
                   have been deleted




Friday, September 7, 2012
Node elimination


               • Consider a certain ordering of the vertices of a graph
               • Fix a threshold ϑ, delete all vertices (and all
                   incident arcs) in the specified order, until ϑm arcs
                   have been deleted

               • Compute the “difference” between the graph you
                   obtained and the original one




Friday, September 7, 2012
Experiment

                                          0.12

                                           0.1

                                          0.08
                            probability




                                          0.06

                                          0.04

                                          0.02

                                            0
                                                 1                   10             100
                                                                   length

                                                     0.00   0.05      0.15   0.30
                                                     0.01   0.10      0.20



                       Deleting nodes in order of (syntactic) depth
Friday, September 7, 2012
Experiment (cont.)

                            0.9
                            0.8
                            0.7
                            0.6
                            0.5
                            0.4
                            0.3
                            0.2
                            0.1
                              0
                                  0       0.05       0.1   0.15   0.2   0.25   0.3
                                                            θ

                                       Kullback-Leibler                  L1
                                      δ-average distance                 L2


                        Distribution divergence (various measures)
Friday, September 7, 2012
Removal strategies
                                                    compared
                                                  1

                                                 0.8
                            δ-average distance




                                                 0.6

                                                 0.4

                                                 0.2

                                                  0

                                                       0       0.05   0.1   0.15   0.2        0.25   0.3
                                                                             θ

                                                           random           PR           near-root
                                                            degree          LP


Friday, September 7, 2012
Removal in social networks

                                                  1

                                                 0.8
                            δ-average distance




                                                 0.6

                                                 0.4

                                                 0.2

                                                  0

                                                       0   0.05      0.1   0.15        0.2   0.25   0.3
                                                                            θ

                                                  random          degree          PR           LP




Friday, September 7, 2012
Findings




Friday, September 7, 2012
Findings


               • Depth-order, PR etc. are strongly disruptive on
                   web graphs




Friday, September 7, 2012
Findings


               • Depth-order, PR etc. are strongly disruptive on
                   web graphs

               • Proper social networks are much more robust, still
                   being similar to web graphs under many respects




Friday, September 7, 2012
Another application: Spid




Friday, September 7, 2012
Another application: Spid

             • We propose to use spid (shortest-paths index of
                 dispersion), the ratio between variance and average in
                 the distance distribution




Friday, September 7, 2012
Another application: Spid

             • We propose to use spid (shortest-paths index of
                 dispersion), the ratio between variance and average in
                 the distance distribution

             • When the dispersion index is <1, the distribution is
                 subdispersed; >1, is superdispersed




Friday, September 7, 2012
Another application: Spid

             • We propose to use spid (shortest-paths index of
                 dispersion), the ratio between variance and average in
                 the distance distribution

             • When the dispersion index is <1, the distribution is
                 subdispersed; >1, is superdispersed

             • Web graphs and social networks are different under
                 this viewpoint!



Friday, September 7, 2012
Spid plot
                     4

                  3.5

                     3

                  2.5
        spid




                     2

                  1.5

                     1

                  0.5

                     0
                     10000   100000   1e+06   1e+07   1e+08   1e+09   1e+10
                                               size
Friday, September 7, 2012
Spid conjecture




Friday, September 7, 2012
Spid conjecture

               • We conjecture that spid is able to tell social
                   networks from web graphs




Friday, September 7, 2012
Spid conjecture

               • We conjecture that spid is able to tell social
                   networks from web graphs

               • Averag distance alone would not suffice: it is very
                   changeable and depends on the scale




Friday, September 7, 2012
Spid conjecture

               • We conjecture that spid is able to tell social
                   networks from web graphs

               • Averag distance alone would not suffice: it is very
                   changeable and depends on the scale

               • Spid, instead, seems to have a clear cutpoint at 1




Friday, September 7, 2012
Spid conjecture

               • We conjecture that spid is able to tell social
                   networks from web graphs

               • Averag distance alone would not suffice: it is very
                   changeable and depends on the scale

               • Spid, instead, seems to have a clear cutpoint at 1
               • What is Facebook spid?


Friday, September 7, 2012
Spid conjecture

               • We conjecture that spid is able to tell social
                   networks from web graphs

               • Averag distance alone would not suffice: it is very
                   changeable and depends on the scale

               • Spid, instead, seems to have a clear cutpoint at 1
               • What is Facebook spid?        [Answer: 0.093]



Friday, September 7, 2012
Average distance∝ Effective diameter
                                   18

                                   16

                                   14
                average distance




                                   12

                                   10

                                    8

                                    6

                                    4

                                    2
                                        0   5        10        15        20       25   30
                                                interpolated effective diameter

Friday, September 7, 2012
That’s all, folks!




Friday, September 7, 2012

Mais conteúdo relacionado

Destaque

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by HubspotMarius Sescu
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTExpeed Software
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsPixeldarts
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthThinkNow
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 

Destaque (20)

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

Hyper an fandfb-handout

  • 1. Graph distance distribution for social network mining Paolo Boldi, Marco Rosa, Sebastiano Vigna Laboratory for Web Algorithmics Università degli Studi di Milano, Italy Lars Backstrom, Johan Ugander Facebook Friday, September 7, 2012
  • 2. Plan of the talk Friday, September 7, 2012
  • 3. Plan of the talk • Computing distances in large graphs (using HyperANF) Friday, September 7, 2012
  • 4. Plan of the talk • Computing distances in large graphs (using HyperANF) • Running HyperANF on Facebook (the largest Milgram-like experiment ever performed) Friday, September 7, 2012
  • 5. Plan of the talk • Computing distances in large graphs (using HyperANF) • Running HyperANF on Facebook (the largest Milgram-like experiment ever performed) • Other uses of distances (in particular: robustness) Friday, September 7, 2012
  • 6. Prelude Milgram’s experiment is 45 Friday, September 7, 2012
  • 7. Where it all started... Friday, September 7, 2012
  • 8. Where it all started... • M. Kochen, I. de Sola Pool: Contacts and influences. (Manuscript, early 50s) Friday, September 7, 2012
  • 9. Where it all started... • M. Kochen, I. de Sola Pool: Contacts and influences. (Manuscript, early 50s) • A. Rapoport, W.J. Horvath: A study of a large sociogram. (Behav.Sci. 1961) Friday, September 7, 2012
  • 10. Where it all started... • M. Kochen, I. de Sola Pool: Contacts and influences. (Manuscript, early 50s) • A. Rapoport, W.J. Horvath: A study of a large sociogram. (Behav.Sci. 1961) • S. Milgram, An experimental study of the sma$ world problem. (Sociometry, 1969) Friday, September 7, 2012
  • 12. Milgram’s experiment • 300 people (starting population) are asked to dispatch a parcel to a single individual (target) Friday, September 7, 2012
  • 13. Milgram’s experiment • 300 people (starting population) are asked to dispatch a parcel to a single individual (target) • The target was a Boston stockbroker Friday, September 7, 2012
  • 14. Milgram’s experiment • 300 people (starting population) are asked to dispatch a parcel to a single individual (target) • The target was a Boston stockbroker • The starting population is selected as follows: Friday, September 7, 2012
  • 15. Milgram’s experiment • 300 people (starting population) are asked to dispatch a parcel to a single individual (target) • The target was a Boston stockbroker • The starting population is selected as follows: • 100 were random Boston inhabitants (group A) Friday, September 7, 2012
  • 16. Milgram’s experiment • 300 people (starting population) are asked to dispatch a parcel to a single individual (target) • The target was a Boston stockbroker • The starting population is selected as follows: • 100 were random Boston inhabitants (group A) • 100 were random Nebraska strockbrokers (group B) Friday, September 7, 2012
  • 17. Milgram’s experiment • 300 people (starting population) are asked to dispatch a parcel to a single individual (target) • The target was a Boston stockbroker • The starting population is selected as follows: • 100 were random Boston inhabitants (group A) • 100 were random Nebraska strockbrokers (group B) • 100 were random Nebraska inhabitants (group C) Friday, September 7, 2012
  • 19. Milgram’s experiment • Rules of the game: Friday, September 7, 2012
  • 20. Milgram’s experiment • Rules of the game: • parcels could be directly sent only to someone the sender knows personally Friday, September 7, 2012
  • 21. Milgram’s experiment • Rules of the game: • parcels could be directly sent only to someone the sender knows personally • 453 intermediaries happened to be involved in the experiments (besides the starting population and the target) Friday, September 7, 2012
  • 23. Milgram’s experiment • Questions Milgram wanted to answer: Friday, September 7, 2012
  • 24. Milgram’s experiment • Questions Milgram wanted to answer: • How many parcels will reach the target? Friday, September 7, 2012
  • 25. Milgram’s experiment • Questions Milgram wanted to answer: • How many parcels will reach the target? • What is the distribution of the number of hops required to reach the target? Friday, September 7, 2012
  • 26. Milgram’s experiment • Questions Milgram wanted to answer: • How many parcels will reach the target? • What is the distribution of the number of hops required to reach the target? • Is this distribution different for the three starting subpopulations? Friday, September 7, 2012
  • 28. Milgram’s experiment • Answers: Friday, September 7, 2012
  • 29. Milgram’s experiment • Answers: • How many parcels will reach the target? 29% Friday, September 7, 2012
  • 30. Milgram’s experiment • Answers: • How many parcels will reach the target? 29% • What is the distribution of the number of hops required to reach the target? Avg. was 5.2 Friday, September 7, 2012
  • 31. Milgram’s experiment • Answers: • How many parcels will reach the target? 29% • What is the distribution of the number of hops required to reach the target? Avg. was 5.2 • Is this distribution different for the three starting subpopulations? Yes: avg. for groups A/B/C was 4.6/5.4/5.7, respectively Friday, September 7, 2012
  • 34. Milgram’s popularity • Six degrees of separation slipped away from the scientific niche to enter the world of popular immagination: Friday, September 7, 2012
  • 35. Milgram’s popularity • Six degrees of separation slipped away from the scientific niche to enter the world of popular immagination: • “Six degrees of separation” is a play by John Guare... Friday, September 7, 2012
  • 36. Milgram’s popularity • Six degrees of separation slipped away from the scientific niche to enter the world of popular immagination: • “Six degrees of separation” is a play by John Guare... • ...a movie by Fred Schepisi... Friday, September 7, 2012
  • 37. Milgram’s popularity • Six degrees of separation slipped away from the scientific niche to enter the world of popular immagination: • “Six degrees of separation” is a play by John Guare... • ...a movie by Fred Schepisi... • ...a song sung by dolls in their national costume at Disneyland in a heart-warming exhibition celebrating the connectedness of people all Friday, September 7, 2012
  • 39. Milgram’s criticisms • “Could it be a big world after all? (The six- degrees-of-separation myth)” (Judith S. Kleinfeld, 2002) Friday, September 7, 2012
  • 40. Milgram’s criticisms • “Could it be a big world after all? (The six- degrees-of-separation myth)” (Judith S. Kleinfeld, 2002) • The vast majority of chains were never completed Friday, September 7, 2012
  • 41. Milgram’s criticisms • “Could it be a big world after all? (The six- degrees-of-separation myth)” (Judith S. Kleinfeld, 2002) • The vast majority of chains were never completed • Extremely difficult to reproduce Friday, September 7, 2012
  • 43. Measuring what? • But what did Milgram’s experiment reveal, after all? Friday, September 7, 2012
  • 44. Measuring what? • But what did Milgram’s experiment reveal, after all? i) That the world is small Friday, September 7, 2012
  • 45. Measuring what? • But what did Milgram’s experiment reveal, after all? i) That the world is small ii)That people are able to exploit this smallness Friday, September 7, 2012
  • 46. HyperANF A tool to compute distances in large graphs Friday, September 7, 2012
  • 48. Introduction • You want to study the properties of a huge graph (typically: a social network) Friday, September 7, 2012
  • 49. Introduction • You want to study the properties of a huge graph (typically: a social network) • You want to obtain some information about its global structure (not simply triangle-counting/degree distribution/etc.) Friday, September 7, 2012
  • 50. Introduction • You want to study the properties of a huge graph (typically: a social network) • You want to obtain some information about its global structure (not simply triangle-counting/degree distribution/etc.) • A natural candidate: distance distribution Friday, September 7, 2012
  • 51. Graph distances and distribution Friday, September 7, 2012
  • 52. Graph distances and distribution • Given a graph, d(x,y) is the length of the shortest path from x to y (∞ if one cannot go from x to y) Friday, September 7, 2012
  • 53. Graph distances and distribution • Given a graph, d(x,y) is the length of the shortest path from x to y (∞ if one cannot go from x to y) • For undirected graphs, d(x,y)=d(y,x) Friday, September 7, 2012
  • 54. Graph distances and distribution • Given a graph, d(x,y) is the length of the shortest path from x to y (∞ if one cannot go from x to y) • For undirected graphs, d(x,y)=d(y,x) • For every t, count the number of pairs (x,y) such that d(x,y)=t Friday, September 7, 2012
  • 55. Graph distances and distribution • Given a graph, d(x,y) is the length of the shortest path from x to y (∞ if one cannot go from x to y) • For undirected graphs, d(x,y)=d(y,x) • For every t, count the number of pairs (x,y) such that d(x,y)=t • The fraction of pairs at distance t is (the density function of) a distribution Friday, September 7, 2012
  • 57. Exact computation • How can one compute the distance distribution? Friday, September 7, 2012
  • 58. Exact computation • How can one compute the distance distribution? • Weighted graphs: Dijkstra (single-source: O(n2)), Floyd-Warshall (all-pairs: O(n3)) Friday, September 7, 2012
  • 59. Exact computation • How can one compute the distance distribution? • Weighted graphs: Dijkstra (single-source: O(n2)), Floyd-Warshall (all-pairs: O(n3)) • In the unweighted case: Friday, September 7, 2012
  • 60. Exact computation • How can one compute the distance distribution? • Weighted graphs: Dijkstra (single-source: O(n2)), Floyd-Warshall (all-pairs: O(n3)) • In the unweighted case: • a single BFS solves the single-source version of the problem: O(m) Friday, September 7, 2012
  • 61. Exact computation • How can one compute the distance distribution? • Weighted graphs: Dijkstra (single-source: O(n2)), Floyd-Warshall (all-pairs: O(n3)) • In the unweighted case: • a single BFS solves the single-source version of the problem: O(m) • if we repeat it from every source: O(nm) Friday, September 7, 2012
  • 63. Sampling pairs • Sample at random pairs of nodes (x,y) Friday, September 7, 2012
  • 64. Sampling pairs • Sample at random pairs of nodes (x,y) • Compute d(x,y) with a BFS from x Friday, September 7, 2012
  • 65. Sampling pairs • Sample at random pairs of nodes (x,y) • Compute d(x,y) with a BFS from x • (Possibly: reject the pair if d(x,y) is infinite) Friday, September 7, 2012
  • 67. Sampling pairs • For every t, the fraction of sampled pairs that were found at distance t are an estimator of the value of the probability mass function Friday, September 7, 2012
  • 68. Sampling pairs • For every t, the fraction of sampled pairs that were found at distance t are an estimator of the value of the probability mass function • Takes a BFS for every pair O(m) Friday, September 7, 2012
  • 70. Sampling sources • Sample at random a source x Friday, September 7, 2012
  • 71. Sampling sources • Sample at random a source x • Compute a full BFS from x Friday, September 7, 2012
  • 73. Sampling sources • It is an unbiased estimator only for undirected and connected graphs Friday, September 7, 2012
  • 74. Sampling sources • It is an unbiased estimator only for undirected and connected graphs • Uses anyway BFS... Friday, September 7, 2012
  • 75. Sampling sources • It is an unbiased estimator only for undirected and connected graphs • Uses anyway BFS... • ...not cache friendly Friday, September 7, 2012
  • 76. Sampling sources • It is an unbiased estimator only for undirected and connected graphs • Uses anyway BFS... • ...not cache friendly • ...not compression friendly Friday, September 7, 2012
  • 78. Cohen’s sampling • Edith Cohen [JCSS 1997] came out with a very general framework for size estimation: powerful, but doesn’t scale well, it is not easily parallelizable, requires direct access Friday, September 7, 2012
  • 80. Alternative: Diffusion • Basic idea: Palmer et. al, KDD ’02 Friday, September 7, 2012
  • 81. Alternative: Diffusion • Basic idea: Palmer et. al, KDD ’02 • Let Bt(x) be the ball of radius t about x (the set of nodes at distance ≤t from x) Friday, September 7, 2012
  • 82. Alternative: Diffusion • Basic idea: Palmer et. al, KDD ’02 • Let Bt(x) be the ball of radius t about x (the set of nodes at distance ≤t from x) • Clearly B0(x)={x} Friday, September 7, 2012
  • 83. Alternative: Diffusion • Basic idea: Palmer et. al, KDD ’02 • Let Bt(x) be the ball of radius t about x (the set of nodes at distance ≤t from x) • Clearly B0(x)={x} • Moreover Bt+1(x)=∪x→yBt(y)∪{x} Friday, September 7, 2012
  • 84. Alternative: Diffusion • Basic idea: Palmer et. al, KDD ’02 • Let Bt(x) be the ball of radius t about x (the set of nodes at distance ≤t from x) • Clearly B0(x)={x} • Moreover Bt+1(x)=∪x→yBt(y)∪{x} • So computing Bt+1 starting from Bt one just need a single (sequential) scan of the graph Friday, September 7, 2012
  • 85. Easy but costly Friday, September 7, 2012
  • 86. Easy but costly • Every set requires O(n) bits, hence O(n2) bits overall Friday, September 7, 2012
  • 87. Easy but costly • Every set requires O(n) bits, hence O(n2) bits overall • Too many! Friday, September 7, 2012
  • 88. Easy but costly • Every set requires O(n) bits, hence O(n2) bits overall • Too many! • What about using approximated sets? Friday, September 7, 2012
  • 89. Easy but costly • Every set requires O(n) bits, hence O(n2) bits overall • Too many! • What about using approximated sets? • We need probabilistic counters, with just two primitives: add and size? Friday, September 7, 2012
  • 90. Easy but costly • Every set requires O(n) bits, hence O(n2) bits overall • Too many! • What about using approximated sets? • We need probabilistic counters, with just two primitives: add and size? • Very small! Friday, September 7, 2012
  • 92. HyperANF • We used HyperLogLog counters [Flajolet et al., 2007] Friday, September 7, 2012
  • 93. HyperANF • We used HyperLogLog counters [Flajolet et al., 2007] • With 40 bits you can count up to 4 billion with a standard deviation of 6% Friday, September 7, 2012
  • 94. HyperANF • We used HyperLogLog counters [Flajolet et al., 2007] • With 40 bits you can count up to 4 billion with a standard deviation of 6% • Remember: one set per node! Friday, September 7, 2012
  • 96. Observe that • Every single counter has a guaranteed relative standard deviation (depending only on the number of registers per counter) Friday, September 7, 2012
  • 97. Observe that • Every single counter has a guaranteed relative standard deviation (depending only on the number of registers per counter) • This implies a guarantee on the summation of the counters Friday, September 7, 2012
  • 98. Observe that • Every single counter has a guaranteed relative standard deviation (depending only on the number of registers per counter) • This implies a guarantee on the summation of the counters • This gives in turn precision bounds on the estimated distribution with respect to the real one Friday, September 7, 2012
  • 100. Other tricks • We use broadword programming to compute efficiently unions Friday, September 7, 2012
  • 101. Other tricks • We use broadword programming to compute efficiently unions • Systolic computation for on-demand updates of counters Friday, September 7, 2012
  • 102. Other tricks • We use broadword programming to compute efficiently unions • Systolic computation for on-demand updates of counters • Exploited micropara$elization of multicore architectures Friday, September 7, 2012
  • 104. Real speed? • Small dimension: 1.8min vs. 4.6h on a graph with 7.4M nodes Friday, September 7, 2012
  • 105. Real speed? • Small dimension: 1.8min vs. 4.6h on a graph with 7.4M nodes • Large dimension: HADI [Kang et al., 2010] is a Hadoop-conscious implementation of ANF. Takes 30 minutes on a 200K-node graph (on one of the 50 world largest supercomputers). HyperANF does the same in 2.25min on our workstation (20 min on this laptop). Friday, September 7, 2012
  • 107. Try it! • HyperANF is available within the webgraph package Friday, September 7, 2012
  • 108. Try it! • HyperANF is available within the webgraph package • Download it from Friday, September 7, 2012
  • 109. Try it! • HyperANF is available within the webgraph package • Download it from • http://webgraph.law.dsi.unimi.it/ Friday, September 7, 2012
  • 110. Try it! • HyperANF is available within the webgraph package • Download it from • http://webgraph.law.dsi.unimi.it/ • Or google for webgraph Friday, September 7, 2012
  • 111. Running it on Facebook! [with Lars Backstrom and Johan Ugander] Friday, September 7, 2012
  • 113. Facebook • Facebook opened up to non-college students on September 26, 2006 Friday, September 7, 2012
  • 114. Facebook • Facebook opened up to non-college students on September 26, 2006 • So, between 1 Jan 2007 and 1 Jan 2008 the number of users exploded Friday, September 7, 2012
  • 115. Experiments (time) • We ran our experiments on snapshots of facebook • Jan 1, 2007 • Jan 1, 2008 ... • Jan 1, 2011 • [current] May, 2011 Friday, September 7, 2012
  • 116. Experiments (dataset) • We considered: • -: the whole facebook • it / se: only Italian / Swedish users • it+se: only Italian & Swedish users • us: only US users • Based on users’ current geo-IP location Friday, September 7, 2012
  • 117. Active users • We only considered active users (users who have done some activity in the 28 days preceding 9 Jun 2011) • So we are not considering “old” users that are not active any more • For - [current] we have about 750M nodes Friday, September 7, 2012
  • 118. Distance distribution ($) Friday, September 7, 2012
  • 119. Distance distribution ($) fb 2007 0.7 0.6 0.5 ● 0.4 % pairs ● 0.3 0.2 0.1 ● ● ● 0.0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 5 10 15 distance Friday, September 7, 2012
  • 120. Distance distribution ($) fb 2008 0.7 0.6 0.5 ● 0.4 % pairs 0.3 ● 0.2 ● 0.1 ● ● ● 0.0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 5 10 15 distance Friday, September 7, 2012
  • 121. Distance distribution ($) fb 2009 0.7 0.6 0.5 ● 0.4 % pairs 0.3 ● 0.2 ● 0.1 ● 0.0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 5 10 15 distance Friday, September 7, 2012
  • 122. Distance distribution ($) fb 2010 0.7 0.6 0.5 0.4 ● % pairs 0.3 ● 0.2 ● 0.1 ● 0.0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 5 10 15 distance Friday, September 7, 2012
  • 123. Distance distribution ($) fb 2011 0.7 0.6 0.5 0.4 ● % pairs 0.3 ● 0.2 ● 0.1 ● 0.0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 5 10 15 distance Friday, September 7, 2012
  • 124. Distance distribution ($) fb current 0.7 0.6 ● 0.5 0.4 % pairs 0.3 ● 0.2 ● 0.1 ● 0.0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 5 10 15 distance Friday, September 7, 2012
  • 126. Distance distribution (it) it 2007 0.7 0.6 0.5 0.4 % pairs 0.3 0.2 0.1 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.0 ● 0 5 10 15 distance Friday, September 7, 2012
  • 127. Distance distribution (it) it 2008 0.7 0.6 0.5 0.4 % pairs 0.3 ● ● 0.2 ● 0.1 ● ● ● ● ● ● 0.0 ● ● ● ● ● ● ● ● ● ● ● 0 5 10 15 distance Friday, September 7, 2012
  • 128. Distance distribution (it) it 2009 0.7 0.6 0.5 ● 0.4 ● % pairs 0.3 0.2 0.1 ● ● ● 0.0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 5 10 15 distance Friday, September 7, 2012
  • 129. Distance distribution (it) it 2010 0.7 ● 0.6 0.5 0.4 % pairs 0.3 0.2 ● ● 0.1 ● 0.0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 5 10 15 distance Friday, September 7, 2012
  • 130. Distance distribution (it) it 2011 0.7 ● 0.6 0.5 0.4 % pairs 0.3 ● 0.2 0.1 ● 0.0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 5 10 15 distance Friday, September 7, 2012
  • 131. Distance distribution (it) it current 0.7 ● 0.6 0.5 0.4 % pairs 0.3 ● 0.2 ● 0.1 ● 0.0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 5 10 15 distance Friday, September 7, 2012
  • 133. Distance distribution (se) se 2007 0.7 0.6 0.5 0.4 % pairs 0.3 0.2 ● ● ● ● 0.1 ● ● ● ● ● ● 0.0 ● ● ● ● ● ● ● ● ● ● 0 5 10 15 distance Friday, September 7, 2012
  • 134. Distance distribution (se) se 2008 0.7 0.6 ● 0.5 0.4 % pairs ● 0.3 0.2 0.1 ● ● 0.0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 5 10 15 distance Friday, September 7, 2012
  • 135. Distance distribution (se) se 2009 0.7 0.6 ● 0.5 0.4 % pairs 0.3 ● 0.2 ● 0.1 ● 0.0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 5 10 15 distance Friday, September 7, 2012
  • 136. Distance distribution (se) se 2010 0.7 0.6 ● 0.5 0.4 % pairs 0.3 0.2 ● ● 0.1 ● 0.0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 5 10 15 distance Friday, September 7, 2012
  • 137. Distance distribution (se) se 2011 0.7 ● 0.6 0.5 0.4 % pairs 0.3 ● 0.2 ● 0.1 ● 0.0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 5 10 15 distance Friday, September 7, 2012
  • 138. Distance distribution (se) se 2011 0.7 ● 0.6 0.5 0.4 % pairs 0.3 ● 0.2 ● 0.1 ● 0.0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 5 10 15 distance Friday, September 7, 2012
  • 139. Distance distribution (se) se current 0.7 0.6 0.5 0.4 ● % pairs 0.3 ● 0.2 ● 0.1 ● 0.0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 5 10 15 distance Friday, September 7, 2012
  • 140. Average distance it se ● ● 2008 curr 4.0 4.5 5.0 5.5 avg. distance avg. distance 4 5 6 7 8 9 ● ● ● ● ● 6.58 3.90 ● ● ● ● ● 2007 2008 2009 2010 2011 current 2007 2008 2009 2010 2011 current it year year 4.33 3.89 itse us ● ● ● se 9 4.7 avg. distance avg. distance 8 ● 7 4.5 6 4.9 4.16 ● it+se 5 ● ● ● ● 4.3 ● ● ● 4 2007 2008 2009 2010 2011 current 2007 2008 2009 2010 2011 current year year ● ● fb us 4.74 4.32 4.6 4.8 5.0 5.2 avg. distance ● ● ● ● $ 5.28 4.74 2007 2008 2009 2010 2011 current year Friday, September 7, 2012
  • 141. Average distance it se ● ● 2008 curr 4.0 4.5 5.0 5.5 avg. distance avg. distance 4 5 6 7 8 9 ● ● ● ● ● 6.58 3.90 ● ● ● ● ● 2007 2008 2009 2010 2011 current 2007 2008 2009 2010 2011 current it year year 4.33 3.89 itse us ● ● ● se 9 4.7 avg. distance avg. distance 8 ● 7 4.5 6 4.9 4.16 ● it+se 5 ● ● ● ● 4.3 ● ● ● 4 2007 2008 2009 2010 2011 current 2007 2008 2009 2010 2011 current year year ● ● fb us 4.74 4.32 4.6 4.8 5.0 5.2 avg. distance ● ● ● ● $ 5.28 4.74 2007 2008 2009 2010 2011 current year Friday, September 7, 2012
  • 142. Average distance it se ● ● 2008 curr 4.0 4.5 5.0 5.5 avg. distance avg. distance 4 5 6 7 8 9 ● ● ● ● ● 6.58 3.90 ● ● ● ● ● 2007 2008 2009 2010 2011 current 2007 2008 2009 2010 2011 current it year year 4.33 3.89 itse us ● ● ● se 9 4.7 avg. distance avg. distance 8 ● 7 4.5 6 4.9 4.16 ● it+se 5 ● ● ● ● 4.3 ● ● ● 4 2007 2008 2009 2010 2011 current 2007 2008 2009 2010 2011 current year year ● ● fb us 4.74 4.32 4.6 4.8 5.0 5.2 avg. distance ● ● ● ● $ 5.28 4.74 2007 2008 2009 2010 2011 current year $ (current): 92% pairs are reachable! Friday, September 7, 2012
  • 143. Effective diameter (@ 90%) effective diameters ● 2008 curr 8 ● ● ● ● ● ● it 9.0 5.2 ● ● ● ● 6 ● ● ● 5.9 5.3 ● ● ● se effective diam. ● ● 4 it * * * it se itse us +se 6.8 5.8 * * fb 2 us 6.5 5.8 0 2008 2009 2010 2011 $ 7.0 6.2 year Friday, September 7, 2012
  • 144. Harmonic diameter harmonic diameters 2008 curr 30 ● 23.7 3.4 25 it * it 20 * se * itse se 4.5 4.0 harmonic diam. * us * fb 15 it +se 5.8 3.8 10 ● ● ● ● ● ● us 4.6 4.4 5 ● ● ● ● ● ● ● ● ● ● ● 0 2008 2009 2010 2011 $ 5.7 4.6 year Friday, September 7, 2012
  • 145. Average degree vs. density ($) Avg. degree Density 2009 88.7 6.4 * 10-7 2010 113.0 3.4 * 10-7 2011 169.0 3.0 * 10-7 curr 190.4 2.6 * 10-7 Friday, September 7, 2012
  • 146. Actual diameter 2008 curr it >29 =25 se >16 =25 it+se >21 =27 us >17 =30 # >16 >58 Friday, September 7, 2012
  • 147. Actual diameter Used the fringe/double-sweep technique for “=” 2008 curr it >29 =25 se >16 =25 it+se >21 =27 us >17 =30 # >16 >58 Friday, September 7, 2012
  • 148. Other applications Spid, network robustness and more... Friday, September 7, 2012
  • 149. What are distances good for? Friday, September 7, 2012
  • 150. What are distances good for? • Network models are usually studied on the base of the local statistics they produce Friday, September 7, 2012
  • 151. What are distances good for? • Network models are usually studied on the base of the local statistics they produce • Not difficult to obtain models that behave correctly locally (i.e., as far as degree distribution, assortativity, clustering coefficients... are concerned) Friday, September 7, 2012
  • 152. Global = more informative! Friday, September 7, 2012
  • 154. An application • An application: use the distance distribution as a graph digest Friday, September 7, 2012
  • 155. An application • An application: use the distance distribution as a graph digest • Typical example: if I modify the graph with a certain criterion, how much does the distance distribution change? Friday, September 7, 2012
  • 157. Node elimination • Consider a certain ordering of the vertices of a graph Friday, September 7, 2012
  • 158. Node elimination • Consider a certain ordering of the vertices of a graph • Fix a threshold ϑ, delete all vertices (and all incident arcs) in the specified order, until ϑm arcs have been deleted Friday, September 7, 2012
  • 159. Node elimination • Consider a certain ordering of the vertices of a graph • Fix a threshold ϑ, delete all vertices (and all incident arcs) in the specified order, until ϑm arcs have been deleted • Compute the “difference” between the graph you obtained and the original one Friday, September 7, 2012
  • 160. Experiment 0.12 0.1 0.08 probability 0.06 0.04 0.02 0 1 10 100 length 0.00 0.05 0.15 0.30 0.01 0.10 0.20 Deleting nodes in order of (syntactic) depth Friday, September 7, 2012
  • 161. Experiment (cont.) 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0 0.05 0.1 0.15 0.2 0.25 0.3 θ Kullback-Leibler L1 δ-average distance L2 Distribution divergence (various measures) Friday, September 7, 2012
  • 162. Removal strategies compared 1 0.8 δ-average distance 0.6 0.4 0.2 0 0 0.05 0.1 0.15 0.2 0.25 0.3 θ random PR near-root degree LP Friday, September 7, 2012
  • 163. Removal in social networks 1 0.8 δ-average distance 0.6 0.4 0.2 0 0 0.05 0.1 0.15 0.2 0.25 0.3 θ random degree PR LP Friday, September 7, 2012
  • 165. Findings • Depth-order, PR etc. are strongly disruptive on web graphs Friday, September 7, 2012
  • 166. Findings • Depth-order, PR etc. are strongly disruptive on web graphs • Proper social networks are much more robust, still being similar to web graphs under many respects Friday, September 7, 2012
  • 167. Another application: Spid Friday, September 7, 2012
  • 168. Another application: Spid • We propose to use spid (shortest-paths index of dispersion), the ratio between variance and average in the distance distribution Friday, September 7, 2012
  • 169. Another application: Spid • We propose to use spid (shortest-paths index of dispersion), the ratio between variance and average in the distance distribution • When the dispersion index is <1, the distribution is subdispersed; >1, is superdispersed Friday, September 7, 2012
  • 170. Another application: Spid • We propose to use spid (shortest-paths index of dispersion), the ratio between variance and average in the distance distribution • When the dispersion index is <1, the distribution is subdispersed; >1, is superdispersed • Web graphs and social networks are different under this viewpoint! Friday, September 7, 2012
  • 171. Spid plot 4 3.5 3 2.5 spid 2 1.5 1 0.5 0 10000 100000 1e+06 1e+07 1e+08 1e+09 1e+10 size Friday, September 7, 2012
  • 173. Spid conjecture • We conjecture that spid is able to tell social networks from web graphs Friday, September 7, 2012
  • 174. Spid conjecture • We conjecture that spid is able to tell social networks from web graphs • Averag distance alone would not suffice: it is very changeable and depends on the scale Friday, September 7, 2012
  • 175. Spid conjecture • We conjecture that spid is able to tell social networks from web graphs • Averag distance alone would not suffice: it is very changeable and depends on the scale • Spid, instead, seems to have a clear cutpoint at 1 Friday, September 7, 2012
  • 176. Spid conjecture • We conjecture that spid is able to tell social networks from web graphs • Averag distance alone would not suffice: it is very changeable and depends on the scale • Spid, instead, seems to have a clear cutpoint at 1 • What is Facebook spid? Friday, September 7, 2012
  • 177. Spid conjecture • We conjecture that spid is able to tell social networks from web graphs • Averag distance alone would not suffice: it is very changeable and depends on the scale • Spid, instead, seems to have a clear cutpoint at 1 • What is Facebook spid? [Answer: 0.093] Friday, September 7, 2012
  • 178. Average distance∝ Effective diameter 18 16 14 average distance 12 10 8 6 4 2 0 5 10 15 20 25 30 interpolated effective diameter Friday, September 7, 2012
  • 179. That’s all, folks! Friday, September 7, 2012