4. Anscombe’s Quartet - Statistics
Property                                   Value                Equality
Mean of x in each case                     9                    Exact
Variance of x in each case                 11                   Exact
Mean of y in each case                     7.50                 To 2 decimal places
Variance of y in each case                 4.122 or 4.127       To 3 decimal places
Correlation between x and y in each case   0.816                To 3 decimal places
Linear regression line in each case        y = 3.00 + 0.500x    To 2 and 3 decimal places, respectively
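The summary statistics above can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the data values are the published Anscombe (1973) quartet, while the names `QUARTET`, `summarize`, and `SUMMARIES` are illustrative and not part of the original slides.

```python
from statistics import mean, variance

# Anscombe (1973) data. Sets I-III share one x vector; set IV has its own.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
X_FOUR = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (X_FOUR, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return the statistics listed in the table for one dataset."""
    mx, my = mean(x), mean(y)
    # Sums of squared deviations and cross-products around the means.
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx  # least-squares slope of y on x
    return {
        "mean_x": mx,
        "var_x": variance(x),  # sample variance (n - 1 denominator)
        "mean_y": my,
        "var_y": variance(y),
        "r": sxy / (sxx * syy) ** 0.5,  # Pearson correlation
        "intercept": my - slope * mx,
        "slope": slope,
    }


SUMMARIES = {name: summarize(x, y) for name, (x, y) in QUARTET.items()}

if __name__ == "__main__":
    for name, s in SUMMARIES.items():
        print(
            f"{name:>3}: mean_x={s['mean_x']:.2f} var_x={s['var_x']:.2f} "
            f"mean_y={s['mean_y']:.2f} var_y={s['var_y']:.3f} r={s['r']:.3f} "
            f"y = {s['intercept']:.2f} + {s['slope']:.3f}x"
        )
```

All four datasets print nearly identical rows, yet their scatterplots look nothing alike, which is why the slide argues for looking at the picture.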
6. No catalogue of techniques can convey a
willingness to look for what can be seen, whether
or not anticipated. Yet this is at the heart of
exploratory data analysis. ... the picture-examining
eye is the best finder we have of the wholly
unanticipated.
– Tukey, 1980
35. NodeXL as a Teaching Tool
I. Getting Started with Analyzing Social Media Networks
1. Introduction to Social Media and Social Networks
2. Social media: New Technologies of Collaboration
3. Social Network Analysis
II. NodeXL Tutorial: Learning by Doing
4. Layout, Visual Design & Labeling
5. Calculating & Visualizing Network Metrics
6. Preparing Data & Filtering
7. Clustering & Grouping
III. Social Media Network Analysis Case Studies
8. Email
9. Threaded Networks
10. Twitter
11. Facebook
12. WWW
13. Flickr
14. YouTube
15. Wiki Networks
http://www.elsevier.com/wps/find/bookdescription.cws_home/723354/description
37. NodeXL Results
• Easy to learn, yet powerful and insightful
• Widely used by both students and researchers
• Free and open source software
• World-wide team of collaborators
Malik S, Smith A, Papadatos P, Li J, Dunne C, and Shneiderman B (2013), "TopicFlow: Visualizing topic
alignment of Twitter data over time", In ASONAM '13.
Bonsignore EM, Dunne C, Rotman D, Smith M, Capone T, Hansen DL and Shneiderman B (2009), "First steps
to NetViz Nirvana: Evaluating social network analysis with NodeXL", In CSE '09. pp. 332-339.
DOI:10.1109/CSE.2009.120
Mohammad S, Dunne C and Dorr B (2009), "Generating high-coverage semantic orientation lexicons from
overtly marked words and a thesaurus", In EMNLP '09. pp. 599-608.
Smith M, Shneiderman B, Milic-Frayling N, Rodrigues EM, Barash V, Dunne C, Capone T, Perer A and Gleave E
(2009), "Analyzing (social media) networks with NodeXL", In C&T '09. pp. 255-264.
DOI:10.1145/1556460.1556497
39. Lostpedia articles
Observations
1: There are repeating patterns in
networks (motifs)
2: Motifs often dominate the
visualization
3: Motif members can be
functionally equivalent
55. Motif Simplification Results
• Controlled experiment with 36 users showed that
motif simplification improves user task performance
• Reducing complexity
• Understanding larger or hidden relationships
• Algorithms for detecting fans, connectors, and
cliques
• Publicly available implementation in NodeXL:
nodexl.codeplex.com
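To make the fan and connector motifs concrete, here is a minimal sketch of how they might be detected in an undirected graph. It is not the NodeXL implementation; it assumes a fan is a head node with at least two degree-1 leaves and a connector is a set of at least two span nodes whose neighbor sets are identical and contain at least two anchors. The function names and thresholds are illustrative, and clique detection is omitted.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map, ignoring self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def find_fans(adj, min_leaves=2):
    """Map each head node to its leaves (neighbors whose only neighbor is the head)."""
    fans = {}
    for head, nbrs in adj.items():
        leaves = sorted(n for n in nbrs if len(adj[n]) == 1)
        if len(leaves) >= min_leaves:
            fans[head] = leaves
    return fans


def find_connectors(adj, min_spans=2, min_anchors=2):
    """Map each anchor set to the span nodes that link exactly those anchors."""
    groups = defaultdict(list)
    for node, nbrs in adj.items():
        if len(nbrs) >= min_anchors:
            groups[frozenset(nbrs)].append(node)
    return {
        anchors: sorted(spans)
        for anchors, spans in groups.items()
        if len(spans) >= min_spans
    }


# Hypothetical toy network: a fan around "h" and a 2-connector between "p" and "q".
EDGES = [
    ("h", "l1"), ("h", "l2"), ("h", "l3"), ("h", "p"),
    ("p", "s1"), ("p", "s2"), ("q", "s1"), ("q", "s2"),
    ("q", "t"), ("t", "u"), ("u", "h"),
]
ADJ = build_adjacency(EDGES)
FANS = find_fans(ADJ)
CONNECTORS = find_connectors(ADJ)

if __name__ == "__main__":
    print("fans:", FANS)
    print("connectors:", {tuple(sorted(a)): s for a, s in CONNECTORS.items()})
```

Grouping nodes by identical neighbor sets is what makes the members functionally equivalent, so each group can be replaced by a single glyph. In a symmetric case such as a complete bipartite block, this sketch reports both sides as connectors, and a real implementation would need a rule to pick one.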
Dunne C and Shneiderman B (2013), "Motif simplification: improving network visualization readability with
fan, connector, and clique glyphs", In CHI '13. pp. 3247-3256. DOI:10.1145/2470654.2466444
Shneiderman B and Dunne C (2012), "Interactive network exploration to derive insights:
Filtering, clustering, grouping, and simplification", In Graph Drawing '12. pp. 2-18. DOI:10.1007/978-3-642-36763-2_2
56. 3. Explore groups in the network, including
their size, membership, and relationships
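As a rough illustration of what exploring groups involves, the sketch below computes group sizes and the number of edges within and between groups (the aggregate relationships a meta-layout has to show). The group assignment is assumed to come from some earlier clustering step; `summarize_groups` and the toy data are hypothetical and not part of NodeXL.

```python
from collections import Counter


def summarize_groups(membership, edges):
    """Return (group sizes, edge counts keyed by an unordered pair of groups)."""
    sizes = Counter(membership.values())
    meta_edges = Counter()
    for u, v in edges:
        # Sort the pair so (G1, G2) and (G2, G1) count as the same meta-edge.
        key = tuple(sorted((membership[u], membership[v])))
        meta_edges[key] += 1
    return sizes, meta_edges


# Hypothetical five-node network split into two groups.
MEMBERSHIP = {"a": "G1", "b": "G1", "c": "G1", "d": "G2", "e": "G2"}
EDGES = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("a", "e")]
SIZES, META_EDGES = summarize_groups(MEMBERSHIP, EDGES)

if __name__ == "__main__":
    print("sizes:", dict(SIZES))
    print("meta-edges:", dict(META_EDGES))
```

Keys with the same group twice, such as ("G1", "G1"), count ties inside a group; mixed keys count ties between groups.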
58. Previous Meta-Layouts
• Poorly show ties (Rodrigues et al., 2011)
• Long ties
• Group arrangement
• Aggregate relationships
OR
• Poorly show nodes & groups (Noack, 2003)
• Require much more space
• Harder to see groups
65. Meta-Layout Results
• Three Group-in-a-Box layout algorithms for
dissecting networks
• Improved group and overview visualization
• Empirical evaluation on 309 Twitter networks using
readability metrics
• Publicly available implementation in NodeXL:
nodexl.codeplex.com
Shneiderman B and Dunne C (2012), "Interactive network exploration to derive insights:
Filtering, clustering, grouping, and simplification", In Graph Drawing '12. pp. 2-18. DOI:10.1007/978-3-642-36763-2_2
Chaturvedi S, Ashktorab Z, Dunne C, Zacharia R, and Shneiderman B (2013), "Croissant-Donut and Force-Directed Group-in-a-Box layouts for clustered network visualization", In preparation.
Rodrigues EM, Milic-Frayling N, Smith M, Shneiderman B, and Hansen (2011), "Group-in-a-Box layout for
multi-faceted analysis of communities", In SocialCom '11. pp. 354-361.
66. Available Now in NodeXL!
• Motif Simplification
• Group-in-a-Box Layouts
• Data import spigots
• Excel functions & macros
• Network statistics
• Layout algorithms
• Filtering
• Clustering
• Attribute mapping
• Automate analyses
• Email reporting
• Graph Gallery
• C# libraries
nodexl.codeplex.com
Cody Dunne
IBM Research – Cambridge, MA
cdunne@us.ibm.com
Editor's Notes
Brent Spiner as Data on Star Trek: TNG
Visual bandwidth is enormous
Human perceptual skills are remarkable
Trend, cluster, gap, outlier...
Human image storage is fast and vast
Challenges
Meaningful visual displays of massive data
Color, size, shape, proximity...
Interaction: widgets & window coordination
Image from Wikipedia user Shultz: http://en.wikipedia.org/wiki/File:Anscombe%27s_quartet_3.svg
Tukey, John W. "We Need Both Exploratory and Confirmatory." The American Statistician 34.1 (1980): 23-25.
A NodeXL social media network diagram of relationships among Twitter users mentioning the hashtag "#WIN09", used by attendees of a conference on network science at New York University in September 2009. The size of each user's vertex is proportional to the number of tweets that user has ever made.
Edge for follow, mention, or reply
Two distinct groups – separate disciplines
Sociology (Newman & Girvan, 2004)
Scientometrics (Henry et al., 2007)
Politics (Adamic & Glance, 2005)
Urban Planning (Scott Dempwolf)
Biology (Kelley et al., 2003)
Archaeology (Tom Brughmans)
WWW (Cheswick et al., 2000)
Matrix – unreadable with many nodes
Aggregation
o Attribute grouping or clustering
o Lose topology info
Dunne C, Riche NH, Lee B, Metoyer RA and Robertson GG (2012), "GraphTrail: Analyzing large multivariate and heterogeneous networks while supporting exploration history", In CHI '12.
Gove R, Gramsky N, Kirby R, Sefer E, Sopan A, Dunne C, Shneiderman B and Taieb-Maimon M (2011), "NetVisia: Heat map & matrix visualization of dynamic social network statistics & content", In SocialCom '11. pp. 19-26. DOI:10.1109/PASSAT/SocialCom.2011.216
Blue R, Dunne C, Fuchs A, King K and Schulman A (2008), "Visualizing real-time network resource usage", In VizSec '08. pp. 119-135.
Henry N and Fekete J-D (2006), "MatrixExplorer: A dual-representation system to explore social networks", IEEE Transactions on Visualization and Computer Graphics. Vol. 12, pp. 677-684. DOI:10.1109/TVCG.2006.160
Freire M, Plaisant C, Shneiderman B and Golbeck J (2010), "ManyNets: An interface for multiple network analysis and visualization", In CHI '10. pp. 213-222. DOI:10.1145/1753326.1753358
Wattenberg M (2006), "Visual exploration of multivariate graphs", In CHI '06. pp. 811-819. DOI:10.1145/1124772.1124891
~30 courses on network analysis
Tutorials I taught.
Involved since 2008
As of June 2012, ~20 team, 7 me. Total: 80 ACM, 117 Scopus, 1270 Google Scholar
Dunne C and Shneiderman B (2013), "Motif simplification: improving network visualization readability with fan, connector, and clique glyphs", In CHI '13.
Shneiderman B and Dunne C (2012), "Interactive network exploration to derive insights: Filtering, clustering, grouping, and simplification", In Graph Drawing '12. pp. 2-18. DOI:10.1007/978-3-642-36763-2_2
Bonsignore EM, Dunne C, Rotman D, Smith M, Capone T, Hansen DL and Shneiderman B (2009), "First steps to NetViz Nirvana: Evaluating social network analysis with NodeXL", In SocialCom '09. pp. 332-339. DOI:10.1109/CSE.2009.120
Mohammad S, Dunne C and Dorr B (2009), "Generating high-coverage semantic orientation lexicons from overtly marked words and a thesaurus", In EMNLP '09. pp. 599-608.
Smith M, Shneiderman B, Milic-Frayling N, Rodrigues EM, Barash V, Dunne C, Capone T, Perer A and Gleave E (2009), "Analyzing (social media) networks with NodeXL", In C&T '09. pp. 255-264. DOI:10.1145/1556460.1556497
Dunne C, Chaturvedi S, Ashktorab Z, Zacharia R, and Shneiderman B (2013), "Fitted rectangles and force-directed group-in-a-box layouts for clustered network visualization", In preparation.
Dunne C and Shneiderman B (2009), "Improving graph drawing readability by incorporating readability metrics: A software tool for network analysts". University of Maryland. Human-Computer Interaction Lab Tech Report No. (HCIL-2009-13).
Aggregate topology to reduce stored information – combine functionally equivalent nodes
Hard to tell underlying structure
Difficult to understand summarization process
Can't see attributes
Navlakha S, Rastogi R and Shrivastava N (2008), "Graph summarization with bounded error", In SIGMOD '08. pp. 419-432. DOI:10.1145/1376616.1376661
Arc
Fix left side, move clockwise
Fixed radius
Shape
Size
Retain attribute encodings (head)
Unique color
Attribute color
Shape
Size
Same color
Meta-edge size & color (unbalanced)
Lossless transformations
Direct manipulation
Visual & textual cues
Ben Nelson (NE) in main D
Chuck Hagel (NE) in main R
Ben Nelson (NE) in main D
Chuck Hagel (NE) wildcard: Hard right. Fewer connections with moderates.
Ben Nelson (NE) blue dog: Elected under a moderate platform. Closest to potential moderates like Snowe, Lieberman, Collins
Chuck Hagel (NE) hard right: Against No Child Left Behind, which the rest of the party lined up for. Against the Bush prescription drug (Medicare) act.
Published in NodeXL book
Think straightforward
Size unclear
Hidden relationships
Based on Lee et al. 2006 taxonomy:
Node count: About how many nodes are in the network?
Articulation point: Which individual node would we remove to disconnect the most nodes from the main network?
Largest motif & size: Which is the largest ( fan | connector | clique ) motif and how many nodes does it contain?
Labels: Which node has the label "XXX"?
Shortest path: What is the length of the shortest path between the two highlighted nodes?
Neighbors: Which of the two highlighted nodes has more neighbors?
Common Neighbors: How many common neighbors are shared by the two highlighted nodes?
Common Neighbors: Which of these two pairs of nodes has more common neighbors?
Clustered with CNM
http://www.boardgamegeek.com/image/1466865/risk
"Never get involved in a land war in Asia"