STING: Spatio-Temporal Interaction
Networks and Graphs for Intel Platforms
David Bader, Jason Riedy,
David Ediger, Rob McColl,
Timothy G. Mattson∗
24 July 2012
Outline
Motivation
STING for streaming, graph-structured data on Intel-based
platforms
World-leading community detection
Community-building interactions
Plans and direction
Note: Without hard limits, academics will use at least all the time...
2 / 57
(Prefix)scale Data Analysis
Health care Finding outbreaks, population epidemiology
Social networks Advertising, searching, grouping
Intelligence Decisions at scale, regulating algorithms
Systems biology Understanding interactions, drug design
Power grid Disruptions, conservation
Simulation Discrete events, cracking meshes
3 / 57
Graphs are pervasive
Graphs: things and relationships
• Different kinds of things, different kinds of relationships, but
graphs provide a framework for analyzing the relationships.
• New challenges for analysis: data sizes, heterogeneity,
uncertainty, data quality.
Astrophysics
Problem Outlier detection
Challenges Massive data
sets, temporal variation
Graph problems Matching,
clustering
Bioinformatics
Problem Identifying target
proteins
Challenges Data
heterogeneity, quality
Graph problems Centrality,
clustering
Social Informatics
Problem Emergent behavior,
information spread
Challenges New analysis,
data uncertainty
Graph problems Clustering,
flows, shortest paths
4 / 57
No shortage of data...
Existing (some out-of-date) data volumes
NYSE 1.5 TB generated daily, maintain an 8 PB archive
Google “Several dozen” 1PB data sets (CACM, Jan 2010)
Wal-Mart 536 TB, 1B entries daily (2006)
eBay 2 PB traditional DB and 6.5 PB streaming; 17
trillion records, 1.5B records/day; each web click carries
50-150 details.
http://www.dbms2.com/2009/04/30/
ebays-two-enormous-data-warehouses/
Facebook 845 M users... and growing(?)
• All data is rich and semantic (graphs!) and changing.
• Base data rates include items and not relationships.
5 / 57
General approaches
• High-performance static graph analysis
• Develop techniques that apply to unchanging massive
graphs.
• Provides useful after-the-fact information, starting points.
• Serves many existing applications well: market research,
much bioinformatics, ...
• High-performance streaming graph analysis
• Focus on the dynamic changes within massive graphs.
• Find trends or new information as they appear.
• Serves upcoming applications: fault or threat detection,
trend analysis, ...
Both very important to different areas.
GT’s primary focus is on streaming.
Note: Not CS theory streaming, but analysis of streaming data.
6 / 57
Why analyze data streams?
Data volumes
NYSE 1.5TB daily
LHC 41TB daily
Facebook, etc. Who
knows?
Data transfer
• 1 Gb Ethernet: 8.7TB daily at
100%, 5-6TB daily realistic
• Multi-TB storage on 10GE: 300TB
daily read, 90TB daily write
• CPU ↔ Memory: QPI, HT:
2 PB/day at 100%
Data growth
• Facebook: > 2×/yr
• Twitter: > 10×/yr
• Growing sources:
Bioinformatics,
µsensors, security
Speed growth
• Ethernet/IB/etc.: 4× in next 2
years. Maybe.
• Flash storage, direct: 10× write,
4× read. Relatively huge cost.
7 / 57
Intel’s non-numeric computing
program (our perspective)
Supporting analysis of massive graphs under rapid change
across the spectrum of Intel-based platforms.
STING on Intel-based platforms
• Maintain a graph and metrics under large data changes
• Performance, documentation, distribution
• Greatly improved ingestion rate, improved example metrics
• Available at http://www.cc.gatech.edu/stinger/.
• Algorithm development
• World-leading community detection (clustering) performance
on Intel-based platforms
• Good for focused or multi-level analysis, visualization, ...
• Developments on community monitoring in dynamic graphs
8 / 57
On to STING...
Motivation
STING for streaming, graph-structured data on Intel-based
platforms
Assumptions about the data
High-level architecture
Algorithms and performance
World-leading community detection
Community-building interactions
Plans and direction
9 / 57
Overall streaming approach
Yifan Hu’s (AT&T) visualization of the
Livejournal data set
Jason’s network via LinkedIn Labs
Assumptions
• A graph represents some real-world phenomenon.
• But not necessarily exactly!
• Noise comes from lost updates, partial information, ...
10 / 57
Overall streaming approach
Yifan Hu’s (AT&T) visualization of the
Livejournal data set
Jason’s network via LinkedIn Labs
Assumptions
• We target massive, “social network” graphs.
• Small diameter, power-law degrees
• Small changes in massive graphs often are unrelated.
10 / 57
Overall streaming approach
Yifan Hu’s (AT&T) visualization of the
Livejournal data set
Jason’s network via LinkedIn Labs
Assumptions
• The graph changes but we don’t need a continuous view.
• We can accumulate changes into batches...
• But not so many that it impedes responsiveness.
10 / 57
STING’s focus
[Diagram: source data flows into STING; STING serves summaries, predictions, and actions to simulation/query, visualization, and control components.]
• STING manages queries against changing graph data.
• Visualization and control often are application specific.
• Ideal: Maintain many persistent graph analysis kernels.
• One current snapshot of the graph, kernels keep smaller
histories.
• Also (a harder goal), coordinate the kernels’ cooperation.
• STING components:
• Framework for batching input changes.
• Lockless data structure, STINGER, accumulating a typed
graph with timestamps.
11 / 57
STINGER
STING Extensible Representation:
• Rule #1: No explicit locking.
• Rely on atomic operations.
• Massive graph: Scattered updates, scattered reads rarely
conflict.
• Use time stamps for some view of time.
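A minimal illustration of the no-locking rule, using GCC's __sync atomics. The slot layout here is hypothetical and is not STINGER's actual edge-block API; it is only a sketch of how scattered, conflict-poor updates can proceed without locks.

#include <stdint.h>

typedef struct {
    int64_t dest;      /* neighbor vertex id, -1 marks an empty slot */
    int64_t weight;
    int64_t time_last; /* timestamp of the most recent update        */
} edge_slot;

/* Claim an empty slot for `dest`, or bump an existing edge to `dest`. */
int update_edge(edge_slot *slot, int64_t dest, int64_t w, int64_t now)
{
    if (__sync_bool_compare_and_swap(&slot->dest, (int64_t)-1, dest)
        || slot->dest == dest) {
        __sync_fetch_and_add(&slot->weight, w); /* scattered updates rarely conflict */
        slot->time_last = now;                  /* last writer wins is good enough   */
        return 1;
    }
    return 0; /* slot already owned by a different neighbor */
}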
12 / 57
STING’s year
Over the past year
• Improved performance on Intel-based platforms by
14× [HPEC12]
• Evaluated multiple dynamic kernels across
platforms [MTAAP12, ICASSP2012]
• Documentation and tutorials
• Assist projects with STING adoption on Intel-based
platforms
13 / 57
STING in use
Where is it?
http://www.cc.gatech.edu/stinger/
Code, development, documentation, presentations...
Who is using it?
• Center for Adaptive Supercomputing Software –
Multithreaded Architectures (CASS-MT) directed by PNNL
• PRODIGAL team for DARPA ADAMS (Anomaly Detection at
Multiple Scales) including GTRI, SAIC, Oregon State U., U.
Mass., and CMU
• DARPA SMISC (Social Media in Strategic Communication)
including GTRI
• Well-attended tutorials given at PPoPP, in MD, and locally.
14 / 57
Experimental setup
Unless otherwise noted
Line Model Speed (GHz) Sockets Cores
Nehalem X5570 2.93 2 4
Westmere X5650 2.66 2 6
Westmere E7-8870 2.40 4 10
• Large Westmere, mirasol, loaned by Intel (thank you!)
• All memory: 1067MHz DDR3, installed appropriately
• Implementations: OpenMP, gcc 4.6.1, Linux ≈ 3.2 kernel
• Artificial graph and edge stream generated by R-MAT
[Chakrabarti, Zhan, & Faloutsos].
• Scale x, edge factor f ⇒ 2^x vertices, ≈ f · 2^x edges
(generator sketched below).
• Edge actions: 7/8th insertions, 1/8th deletions
• Results over five batches of edge actions.
• Caveat: No vector instructions, low-level optimizations yet.
• Portable: OpenMP, Cray XMT family
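For reference, a serial sketch of R-MAT edge generation. The parameters a = 0.57, b = 0.19, c = 0.19 (d implied) are the common defaults, not values stated on this slide, and drand48() stands in for the benchmark's RNG.

#include <stdint.h>
#include <stdlib.h>

/* Pick one edge by recursively choosing adjacency-matrix quadrants. */
void rmat_edge(int scale, double a, double b, double c,
               int64_t *src, int64_t *dst)
{
    int64_t i = 0, j = 0;
    for (int bit = 0; bit < scale; ++bit) {
        double r = drand48();
        if (r < a)              { /* top-left quadrant: add nothing */ }
        else if (r < a + b)     { j |= INT64_C(1) << bit; }
        else if (r < a + b + c) { i |= INT64_C(1) << bit; }
        else                    { i |= INT64_C(1) << bit;
                                  j |= INT64_C(1) << bit; }
    }
    *src = i;
    *dst = j;
}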
15 / 57
STINGER insertion / removal rates
Edges per second
[Plot: aggregate STINGER updates per second vs. number of threads on the E7-8870 and X5570 platforms, for batch sizes from 100 to 3,000,000 edge actions.]
16 / 57
STINGER insertion / removal rates
Seconds per batch, “latency”
[Plot: seconds per processed batch ("latency") vs. number of threads on the E7-8870 and X5570 platforms, for batch sizes from 100 to 3,000,000 edge actions.]
16 / 57
Clustering coefficients
• Used to measure “small-world-ness”
[Watts & Strogatz] and potential
community structure
• Larger clustering coefficient ⇒ more
inter-connected
• Roughly the ratio of the number of actual
to potential triangles
[Diagram: vertex v with neighbors i, j, m, and n; i–v–j is a closed triplet, m–v–n an open triplet.]
• Defined in terms of triplets.
• i – v – j is a closed triplet (triangle).
• m – v – n is an open triplet.
• Clustering coefficient:
# of closed triplets / total # of triplets
• Locally around v or globally for entire graph.
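A serial sketch of the local coefficient computed from sorted adjacency lists. The CSR arrays off[] and adj[] are hypothetical; the streaming kernel instead updates triangle counts incrementally per batch.

#include <stdint.h>

/* Size of the intersection of two sorted neighbor lists. */
static int64_t count_common(const int64_t *a, int64_t na,
                            const int64_t *b, int64_t nb)
{
    int64_t i = 0, j = 0, n = 0;
    while (i < na && j < nb) {
        if (a[i] < b[j]) ++i;
        else if (a[i] > b[j]) ++j;
        else { ++n; ++i; ++j; }
    }
    return n;
}

/* Local clustering coefficient of v: closed triplets / all triplets. */
double local_cc(int64_t v, const int64_t *off, const int64_t *adj)
{
    int64_t deg = off[v+1] - off[v], closed = 0;
    if (deg < 2) return 0.0;
    for (int64_t k = off[v]; k < off[v+1]; ++k) {
        int64_t u = adj[k];
        closed += count_common(&adj[off[v]], deg,
                               &adj[off[u]], off[u+1] - off[u]);
    }
    /* closed counts each triangle around v twice; total triplets = deg*(deg-1). */
    return (double)closed / (double)(deg * (deg - 1));
}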
17 / 57
STINGER clustering coefficient rates
Edges per second
[Plot: clustering coefficient updates per second vs. number of threads on the E7-8870 and X5570 platforms, for batch sizes from 100 to 3,000,000 edge actions.]
18 / 57
STINGER clustering coefficient rates
Speed-up over static recalculation
[Plot: clustering coefficient speedup over static recalculation vs. number of threads on the E7-8870 and X5570 platforms, for batch sizes from 100 to 3,000,000 edge actions.]
18 / 57
Connected components
• Maintain a mapping from vertex to
component.
• Global property, unlike triangle
counts
• In “scale free” social networks:
• Often one big component, and
• many tiny ones.
• Edge changes often sit within
components.
• Remaining insertions merge
components.
• Deletions are more difficult...
19 / 57
Connected components: Deleted
edges
The difficult case
• Very few deletions matter.
• Maintain a spanning tree, &
ignore deletion if
1 not in spanning tree,
2 endpoints share a common neighbor∗,
3 loose endpoint reaches root∗.
• In the last two (∗), also fix
the spanning tree.
• Rules out 99.7% of
deletions.
• (Long history, see related
work in [MTAAP11].)
20 / 57
STINGER component monitoring
rates
Edges per second
[Plot: connected component updates per second vs. number of threads on the E7-8870 and X5570 platforms, for batch sizes from 100 to 3,000,000 edge actions.]
21 / 57
STINGER component monitoring
rates
Seconds per batch, “latency”
[Plot: seconds per processed batch ("latency") vs. number of threads on the E7-8870 and X5570 platforms, for batch sizes from 100 to 3,000,000 edge actions.]
21 / 57
STINGER component monitoring
rates
Speed-up over static recalculation
[Plot: connected components speedup over static recalculation vs. number of threads on the E7-8870 and X5570 platforms, for batch sizes from 100 to 3,000,000 edge actions.]
21 / 57
Community detection
Motivation
STING for streaming, graph-structured data on Intel-based
platforms
World-leading community detection
Defining community detection
Our parallel, agglomerative algorithm
Performance
Community-building interactions
Plans and direction
22 / 57
Community detection
What do we mean?
• Partition a graph’s
vertices into disjoint
communities.
• A community locally
maximizes some metric.
• Modularity,
conductance, ...
• Trying to capture that
vertices are more
similar within one
community than
between communities.
Jason’s network via LinkedIn Labs
23 / 57
Community detection
Assumptions
• Disjoint partitioning of
vertices.
• There is no one unique
answer.
• Many metrics:
NP-complete to optimize [Brandes, et al.].
• Graph is lossy
representation.
• Want an adaptable
detection method.
• For rest, assume
modularity [Newman].
• Important: Edge local
scoring.
Jason’s network via LinkedIn Labs
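For completeness, the modularity score [Newman] assumed below, in its standard form (adjacency A, degrees d, total edge weight m, community assignment c):

Q = \frac{1}{2m} \sum_{i,j} \left( A_{ij} - \frac{d_i d_j}{2m} \right) \delta(c_i, c_j)

The agglomerative method below works with the per-edge change in Q when two communities merge, which needs only edge-local counts.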
23 / 57
Multi-threaded algorithm design
points
Targeting massive graphs
Multi-threading over shared memory, up to 2 TiB on Intel-based
platforms.
A scalable multi-threaded graph analysis algorithm
• ... avoids global locks and frequent global synchronization.
• ... distributes computation over edges rather than only
vertices.
• ... works with data as local to an edge as possible.
• ... uses compact data structures that agglomerate memory
references.
24 / 57
Sequential agglomerative method
[Diagram: example graph on vertices A–G, each starting in its own community.]
• A common method (e.g.
[Clauset, et al.]) agglomerates
vertices into communities.
• Each vertex begins in its own
community.
• An edge is chosen to contract.
• Merging maximally increases
modularity.
• Priority queue.
• Known often to fall into an O(n2)
performance trap with modularity
[Wakita and Tsurumi].
25 / 57
Parallel agglomerative method
[Diagram: example graph on vertices A–G; matched pairs of communities merge simultaneously.]
• We use a matching to avoid the
queue.
• Compute a heavy weight, large
matching.
• Simple greedy algorithm.
• Maximal matching.
• Within factor of 2 in weight.
• Merge all matched communities at
once.
• Maintains some balance.
• Produces different results.
• Agnostic to weighting, matching...
• Can maximize modularity, minimize
conductance.
• Modifying matching permits easy
exploration.
26 / 57
Implementation: Data structures
Extremely basic for graph G = (V, E)
• An array of (i, j; w) weighted edge pairs, each i, j stored
only once and packed, uses 3|E| space
• An array to store self-edges, d(i) = w, |V|
• A temporary floating-point array for scores, |E|
• Additional temporary arrays using 4|V| + 2|E| to store
degrees, matching choices, offsets...
• Weights count number of agglomerated vertices or edges.
• Scoring methods (modularity, conductance) need only
vertex-local counts.
• Storing an undirected graph in a symmetric manner reduces
memory usage drastically and works with our simple
matcher.
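A sketch of that layout in C. The names are illustrative, not the released code's identifiers.

#include <stdint.h>

typedef struct {
    int64_t i, j;  /* each undirected edge stored once, packed   */
    int64_t w;     /* number of agglomerated edges it represents */
} scored_edge;     /* the 3|E| array                             */

typedef struct {
    int64_t      nv, ne;
    scored_edge *el;     /* (i, j; w) edge array, 3|E|                      */
    int64_t     *selfw;  /* self-edge weights d(i) = w, |V|                 */
    double      *score;  /* per-edge merge scores, |E| scratch              */
    int64_t     *ws;     /* degrees, matching choices, offsets: 4|V| + 2|E| */
} community_graph;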
27 / 57
Implementation: Data structures
Extremely basic for graph G = (V, E)
• An array of (i, j; w) weighted edge pairs, each i, j stored
only once and packed, uses 3|E| space
• An array to store self-edges, d(i) = w, |V|
• A temporary floating-point array for scores, |E|
• Additional temporary arrays using 4|V| + 2|E| to store
degrees, matching choices, offsets...
• Original ignored order in edge array, killed OpenMP.
• Roughly bucket edge array by first stored index.
Non-adjacent CSR-like structure [MTAAP12].
• Hash i, j to determine order. Scatter among buckets
[MTAAP12].
27 / 57
Implementation: Routines
Three primitives: Scoring, matching, contracting
Scoring Trivial.
Matching Repeat until no ready, unmatched vertex:
1 For each unmatched vertex in parallel, find the
best unmatched neighbor in its bucket.
2 Try to point remote match at that edge (lock,
check if best, unlock).
3 If pointing succeeded, try to point self-match at
that edge.
4 If both succeeded, yeah! If not and there was
some eligible neighbor, re-add self to ready,
unmatched list.
(Possibly too simple, but...)
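A simplified OpenMP sketch of one matching sweep, using one lock per vertex and a per-edge score array. The real kernel works bucket-by-bucket over the CSR-like structure and keeps an explicit ready list; reads of match[] here are racy but benign for a greedy heuristic.

#include <omp.h>
#include <stdint.h>

/* match[v] == -1 means unmatched; lock[] holds one omp_lock_t per vertex. */
void match_sweep(int64_t nv, const int64_t *off, const int64_t *adj,
                 const double *score, int64_t *match, omp_lock_t *lock,
                 int *again)
{
    *again = 0;
    #pragma omp parallel for schedule(dynamic, 64)
    for (int64_t v = 0; v < nv; ++v) {
        if (match[v] >= 0) continue;
        int64_t best = -1; double bestsc = -1.0;
        for (int64_t k = off[v]; k < off[v+1]; ++k) {   /* step 1: best unmatched neighbor */
            int64_t u = adj[k];
            if (match[u] < 0 && score[k] > bestsc) { best = u; bestsc = score[k]; }
        }
        if (best < 0) continue;            /* no eligible neighbor at all */
        omp_set_lock(&lock[best]);         /* step 2: try to claim the remote side */
        int ok = (match[best] < 0);
        if (ok) match[best] = v;
        omp_unset_lock(&lock[best]);
        if (ok) match[v] = best;           /* step 3: point self at the edge */
        else {
            #pragma omp atomic write
            *again = 1;                    /* step 4: retry in the next sweep */
        }
    }
}

Sweeps repeat while *again is set, matching the "repeat until no ready, unmatched vertex" loop above.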
28 / 57
Implementation: Routines
Contracting
1 Map each i, j to new vertices, re-order by hashing.
2 Accumulate counts for new i bins, prefix-sum for offset.
3 Copy into new bins.
• Only synchronizing in the prefix-sum. That could be
removed if I don't re-order the i, j pair; haven't timed the
difference.
• Actually, the current code copies twice... On short list for
fixing.
• Binning as opposed to original list-chasing enabled
Intel/OpenMP support with reasonable performance.
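A serial sketch of steps 1-3. Hash-based re-ordering and merging of duplicate pairs are omitted, and the names are hypothetical.

#include <stdint.h>
#include <string.h>

/* label[] maps old vertices to new ones; count[] and off[] have nv_new
   (and nv_new + 1) entries; (ni, nj, nw) receive the contracted edges. */
void contract(int64_t ne, const int64_t *ci, const int64_t *cj,
              const int64_t *cw, const int64_t *label, int64_t nv_new,
              int64_t *count, int64_t *off,
              int64_t *ni, int64_t *nj, int64_t *nw)
{
    memset(count, 0, nv_new * sizeof(*count));
    for (int64_t k = 0; k < ne; ++k)          /* count edges per new source */
        ++count[label[ci[k]]];
    off[0] = 0;                               /* prefix sum gives bin offsets */
    for (int64_t v = 0; v < nv_new; ++v)
        off[v+1] = off[v] + count[v];
    for (int64_t k = 0; k < ne; ++k) {        /* scatter into the new bins */
        int64_t a = label[ci[k]], b = label[cj[k]];
        int64_t p = off[a] + --count[a];
        ni[p] = a; nj[p] = b; nw[p] = cw[k];
    }
    /* Edges with a == b fold into the self-edge array; duplicate (a, b)
       pairs within a bin still need to be accumulated. */
}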
29 / 57
Performance summary
Two moderate-sized graphs, one large
Graph            |V|          |E|            Reference
rmat-24-16       15 580 378   262 482 711    [Chakrabarti, et al.], [Bader, et al.]
soc-LiveJournal1 4 847 571    68 993 773     [Leskovec]
uk-2007-05       105 896 555  3 301 876 564  [Boldi, et al.]
Peak processing rates in edges/second
Platform  rmat-24-16   soc-LiveJournal1  uk-2007-05
X5570     1.83 × 10^6  3.89 × 10^6       —
X5650     2.54 × 10^6  4.98 × 10^6       —
E7-8870   5.86 × 10^6  6.90 × 10^6       6.54 × 10^6
XMT       1.20 × 10^6  0.41 × 10^6       —
XMT2      2.11 × 10^6  1.73 × 10^6       3.11 × 10^6
30 / 57
Performance: Time to solution
[Plot: time to solution (seconds) vs. OpenMP threads for rmat-24-16 and soc-LiveJournal1 on the X5570 (4-core), X5650 (6-core), and E7-8870 (10-core) platforms.]
31 / 57
Performance: Rate (edges/second)
[Plot: processing rate (edges per second) vs. OpenMP threads for rmat-24-16 and soc-LiveJournal1 on the X5570 (4-core), X5650 (6-core), and E7-8870 (10-core) platforms.]
32 / 57
Performance: Modularity
[Plot: modularity vs. agglomeration step for coAuthorsCiteseer, eu-2005, and uk-2002 under the coverage, max, and average termination metrics.]
• Timing results: Stop when coverage ≥ 0.5 (communities cover 1/2
edges).
• More work ⇒ higher modularity. Choice up to application.
33 / 57
Performance: Large-scale time
[Plot: time to solution on uk-2007-05 vs. threads (OpenMP on E7-8870) or processors (XMT2); the E7-8870 drops from 6917 s to 504.9 s, the XMT2 from 31568 s to 1063 s.]
34 / 57
Performance: Large-scale speedup
[Plot: speedup over a single thread/processor on uk-2007-05: roughly 13.7× on the E7-8870 (10-core) and 29.6× on the XMT2.]
35 / 57
Building a community around
streaming, graph-structured data
Motivation
STING for streaming, graph-structured data on Intel-based
platforms
World-leading community detection
Community-building interactions
External STING users∗
Graph500 (synergistic)
Conferences, DIMACS Challenge, etc.
Plans and direction
36 / 57
Graph500
Not part of this project, but...
To keep people apprised:
• Adding single-source shortest paths benchmark.
• Re-working the generator to be more scalable, vectorized,
etc.
• Addressing some generator parameter issues. Fewer
duplicate edges.
• Improving shared-memory reference implementation
performance...
• (Possibly adding an OpenCL reference implementation.)
mirasol is very useful for development and reference tuning.
37 / 57
10th DIMACS Implementation Challenge
Graph partitioning and clustering
• Georgia Tech assisted in organizing the 10th DIMACS
Implementation Challenge.
• Co-sponsored by Intel.
• Community detection implementation on mirasol won the
Mix Challenge combining speed and multiple clustering
criteria.
• One of only two implementations that tackled uk-2007-05
with 106 million vertices and 3.3 billion edges.
• Other implementation required multiple days across a
cluster.
• Next closest for performance used Intel’s TBB (Fagginger
Auer & Bisseling) or CUDA (small examples).
38 / 57
Conferences and minisymposiums
• GT hosted the biennial SIAM Parallel Processing
conference.
• Included a well-attended minisymposium organized by Riedy
& Meyerhenke on “Parallel Analysis of Massive Social
Networks,” including KDT work from UCSB.
• Submitting a massive graph minisymposium for SIAM CSE
2013 (25 Feb – 1 Mar)... Looking for speakers.
• List of presentations appended to slides, too many to
include.
• 10 conference presentations / invited talks / posters
• 3 tutorials
• 5 accepted (refereed) publications, 1 in submission
39 / 57
Plans and direction
Motivation
STING for streaming, graph-structured data on Intel-based
platforms
World-leading community detection
Community-building interactions
Plans and direction
Plans
Software
40 / 57
Plans
• Finish the split into a client/server model, including
subgraph visualization, looking into KDT interface.
• Integrate performance counters, build a performance model
/ monitor.
• Evaluate Intel features for bottlenecks (vectors, better NUMA
allocation, etc.)
• Incorporate community detection, seed set expansion.
41 / 57
STING on-line resources
Home page http://www.cc.gatech.edu/stinger/
Download .../downloads.php
Reference guide .../doxygen/
General tutorial (URL too long, linked from above)
The current release of STING includes the following static and
dynamic analysis kernel examples:
• multi-platform benchmarks for edge insertion and removal
(dynamic),
• breadth-first search (static),
• clustering and transitivity coefficients (static and dynamic),
and
• connected components (static and dynamic).
42 / 57
Community detection
Home page http://www.cc.gatech.edu/~jriedy/community-detection/
Git source repository http://www.cc.gatech.edu/~jriedy/gits/community-el.git
• “Experimental” and subject to extreme change.
• Current version packaged as a separate, easy to use /
import executable.
• Being adapted into STING.
43 / 57
Presentations and tutorials I
David A. Bader.
Opportunities and challenges in massive data-intensive
computing.
In 9th International Conference on Parallel Processing and
Applied Mathematics (PPAM11), September 2011.
(invited keynote).
David A. Bader.
Opportunities and challenges in massive data-intensive
computing.
Workshop on Parallel Algorithms and Software for Analysis
of Massive Graphs (ParGraph), December 2011.
(invited keynote).
44 / 57
Presentations and tutorials II
Jason Riedy, David Ediger, David A. Bader, and Henning
Meyerhenke.
Tracking structure of streaming social networks.
2011 Graph Exploitation Symposium hosted by MIT Lincoln
Labs, August 2011.
(presentation).
David A. Bader.
Fourth Graph500 results.
Graph500 Birds of a Feather, 27th International
Supercomputing Conference (ISC), June 2012.
(presentation).
45 / 57
Presentations and tutorials III
David A. Bader.
Massive data analytics using heterogeneous computing.
In 21st International Heterogeneity in Computing Workshop
(HCW 2012), held in conjunction with The International
Parallel and Distributed Processing Symposium (IPDPS
2012), May 2012.
(invited keynote).
David A. Bader.
Opportunities and challenges in massive data-intensive
computing.
Distinguished Lecture, May 2012.
(invited presentation).
46 / 57
Presentations and tutorials IV
David A. Bader.
Opportunities and challenges in massive data-intensive
computing.
From Data to Knowledge: Machine-Learning with Real-time
and Streaming Applications, May 2012.
(invited presentation).
David A. Bader.
Opportunities and challenges in massive data-intensive
computing.
NSF Workshop on Research Directions in the Principles of
Parallel Computation, June 2012.
(invited presentation).
47 / 57
Presentations and tutorials V
David Ediger, Jason Riedy, Rob McColl, and David A. Bader.
Parallel programming for graph analysis.
In 17th ACM SIGPLAN Annual Symposium on Principles
and Practice of Parallel Programming (PPoPP), New
Orleans, LA, February 2012.
(tutorial).
Henning Meyerhenke, E. Jason Riedy, and David A. Bader.
Parallel community detection in streaming graphs.
SIAM Parallel Processing for Scientific Computing, February
2012.
(presentation).
48 / 57
Presentations and tutorials VI
E. Jason Riedy, David Ediger, Henning Meyerhenke, and
David A. Bader.
Sting: Software for analysis of spatio-temporal interaction
networks and graphs.
SIAM Parallel Processing for Scientific Computing, February
2012.
(poster).
E. Jason Riedy and Henning Meyerhenke.
Scalable algorithms for analysis of massive, streaming
graphs.
SIAM Parallel Processing for Scientific Computing, February
2012.
(presentation).
49 / 57
Presentations and tutorials VII
Anita Zakrzewska and Oded Green.
Stinger tutorial.
Presented to GTRI and SAIC software engineers, July 2012.
(tutorial).
50 / 57
Publications I
David Ediger, Rob McColl, Jason Riedy, and David A. Bader.
STINGER: High performance data structure for streaming
graphs.
In IEEE High Performance Extreme Computing Conference
(HPEC), Waltham, MA, September 2012.
(to appear).
Xin Liu, Pushkar Pande, Henning Meyerhenke, and David A.
Bader.
Pasqual: A parallel de novo assembler for next generation
genome sequencing.
IEEE Transactions on Parallel and Distributed Systems,
2012.
(to appear).
51 / 57
Publications II
E. Jason Riedy, David A. Bader, and Henning Meyerhenke.
Scalable multi-threaded community detection in social
networks.
In 6th Workshop on Multithreaded Architectures and
Applications (MTAAP), held in conjunction with The
International Parallel and Distributed Processing Symposium
(IPDPS 2012), May 2012.
(9/15 papers accepted, 60% acceptance).
E. Jason Riedy, Henning Meyerhenke, David Ediger, and
David A. Bader.
Parallel community detection for massive graphs.
In 10th DIMACS Implementation Challenge - Graph
Partitioning and Graph Clustering. (workshop paper),
Atlanta, Georgia, February 2012.
Won first place in the Mix Challenge and Mix Pareto
Challenge.
52 / 57
Publications III
Jason Riedy, Henning Meyerhenke, David A. Bader, David
Ediger, and Timothy G. Mattson.
Analysis of streaming social networks and graphs on
multicore architectures.
In IEEE International Conference on Acoustics, Speech and
Signal Processing (ICASSP). Kyoto, Japan, March 2012.
53 / 57
References I
D. Bader, J. Gilbert, J. Kepner, D. Koester, E. Loh,
K. Madduri, W. Mann, and T. Meuse.
HPCS SSCA#2 Graph Analysis Benchmark Specifications
v1.1.
Jul. 2005.
D.A. Bader and J. McCloskey.
Modularity and graph algorithms.
Presented at UMBC, September 2009.
J.W. Berry., B. Hendrickson, R.A. LaViolette, and C.A.
Phillips.
Tolerating the community detection resolution limit with edge
weighting.
CoRR abs/0903.1072 (2009).
54 / 57
References II
P. Boldi, B. Codenotti, M. Santini, and S. Vigna.
UbiCrawler: A scalable fully distributed web crawler.
Software: Practice & Experience, vol. 34, no. 8, pp. 711–726,
2004.
U. Brandes, D. Delling, M. Gaertler, R. Görke, M. Hoefer,
Z. Nikoloski, and D. Wagner.
On modularity clustering.
In IEEE Trans. Knowledge and Data Engineering, vol. 20,
no. 2, pp. 172–188, 2008.
D. Chakrabarti, Y. Zhan, and C. Faloutsos.
R-MAT: A recursive model for graph mining.
In Proc. 4th SIAM Intl. Conf. on Data Mining (SDM), Orlando,
FL, Apr. 2004. SIAM.
55 / 57
References III
A. Clauset, M. Newman, and C. Moore.
Finding community structure in very large networks.
Physical Review E, vol. 70, no. 6, p. 66111, 2004.
D. Ediger, E. J. Riedy, and D. A. Bader.
Tracking structure of streaming social networks.
In Proceedings of the Workshop on Multithreaded
Architectures and Applications (MTAAP’11), May 2011.
J. Leskovec.
Stanford large network dataset collection.
At http://snap.stanford.edu/data/, Oct. 2011.
M. Newman.
Modularity and community structure in networks.
In Proc. of the National Academy of Sciences, vol. 103,
no. 23, pp. 8577–8582, 2006.
56 / 57
References IV
K. Wakita and T. Tsurumi.
Finding community structure in mega-scale social networks.
CoRR, vol. abs/cs/0702048, 2007.
D. J. Watts and S. H. Strogatz.
Collective dynamics of ‘small-world’ networks.
Nature, 393(6684):440–442, Jun 1998.
57 / 57

The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Exploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the ProcessExploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the Process
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and Backend
 
Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdf
 

• Algorithm development
  • World-leading community detection (clustering) performance on Intel-based platforms
  • Good for focused or multi-level analysis, visualization, ...
  • Developments on community monitoring in dynamic graphs
8 / 57
On to STING...
Motivation
STING for streaming, graph-structured data on Intel-based platforms
  Assumptions about the data
  High-level architecture
  Algorithms and performance
World-leading community detection
Community-building interactions
Plans and direction
9 / 57
Overall streaming approach
[Figures: Yifan Hu’s (AT&T) visualization of the LiveJournal data set; Jason’s network via LinkedIn Labs]
Assumptions
• A graph represents some real-world phenomenon.
  • But not necessarily exactly! Noise comes from lost updates, partial information, ...
• We target massive, “social network” graphs.
  • Small diameter, power-law degrees
  • Small changes in massive graphs often are unrelated.
• The graph changes, but we don’t need a continuous view.
  • We can accumulate changes into batches...
  • But not so many that it impedes responsiveness.
10 / 57
STING’s focus
[Diagram: source data flows into STING, which feeds prediction, action, and summary outputs alongside control, visualization, and simulation / query components]
• STING manages queries against changing graph data.
  • Visualization and control often are application specific.
• Ideal: maintain many persistent graph analysis kernels.
  • One current snapshot of the graph; kernels keep smaller histories.
  • Also (a harder goal), coordinate the kernels’ cooperation.
• STING components:
  • Framework for batching input changes.
  • Lockless data structure, STINGER, accumulating a typed graph with timestamps.
11 / 57
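The batching framework above can be pictured as a small ingest loop. The sketch below is illustrative only: edge_update, batch, next_update, apply_batch, and run_kernels are hypothetical names, not STING interfaces. It simply shows changes accumulating until a batch fills or a deadline passes, then being applied in bulk before the analysis kernels are notified.

    #include <stddef.h>
    #include <time.h>

    /* Hypothetical types: one streamed change and a fixed-capacity batch. */
    struct edge_update { long src, dst, weight, when; int is_delete; };
    struct batch { struct edge_update *upd; size_t n, capacity; };

    /* Hypothetical hooks supplied elsewhere; not STING interfaces. */
    extern int  next_update(struct edge_update *out);   /* 0 when the stream is idle */
    extern void apply_batch(struct batch *b);           /* bulk update of the graph  */
    extern void run_kernels(const struct batch *b);     /* notify persistent kernels */

    /* Accumulate changes into batches, trading a little latency for throughput. */
    void ingest_loop(struct batch *b, double max_wait_sec)
    {
        time_t start = time(NULL);
        for (;;) {   /* a real system would block instead of spinning when idle */
            if (b->n < b->capacity && next_update(&b->upd[b->n]))
                b->n++;
            int full    = (b->n == b->capacity);
            int expired = difftime(time(NULL), start) >= max_wait_sec;
            if ((full || expired) && b->n > 0) {
                apply_batch(b);    /* one bulk change to the shared graph        */
                run_kernels(b);    /* each kernel updates its persistent results */
                b->n = 0;
                start = time(NULL);
            }
        }
    }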
STINGER
STING Extensible Representation:
• Rule #1: No explicit locking.
  • Rely on atomic operations.
• Massive graph: scattered updates and scattered reads rarely conflict.
• Use time stamps for some view of time.
12 / 57
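To make “no explicit locking” concrete, here is a minimal sketch assuming GCC’s __sync atomic builtins (available with the gcc 4.6 toolchain mentioned later in the deck). The edge_slot record and edge_update_atomic function are illustrative only, not STINGER’s actual layout or API.

    #include <stdint.h>

    /* Illustrative edge record (not STINGER's real layout): weight plus the
     * first and most recent modification times. */
    struct edge_slot {
        int64_t neighbor;
        int64_t weight;
        int64_t time_first;
        int64_t time_recent;
    };

    /* Apply a weight change and advance the "recent" timestamp without taking
     * a lock: each field is updated with an atomic read-modify-write, so
     * concurrent updates to the same slot stay consistent. */
    void edge_update_atomic(struct edge_slot *e, int64_t dw, int64_t now)
    {
        __sync_fetch_and_add(&e->weight, dw);                 /* atomic add     */
        int64_t seen = e->time_recent;
        while (seen < now &&
               !__sync_bool_compare_and_swap(&e->time_recent, seen, now))
            seen = e->time_recent;                            /* retry on races */
    }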
STING’s year
Over the past year:
• Improved performance on Intel-based platforms by 14× [HPEC12]
• Evaluated multiple dynamic kernels across platforms [MTAAP12, ICASSP 2012]
• Documentation and tutorials
• Assisted projects with STING adoption on Intel-based platforms
13 / 57
STING in use
Where is it? http://www.cc.gatech.edu/stinger/
Code, development, documentation, presentations...
Who is using it?
• Center for Adaptive Supercomputing Software – Multithreaded Architectures (CASS-MT), directed by PNNL
• PRODIGAL team for DARPA ADAMS (Anomaly Detection at Multiple Scales), including GTRI, SAIC, Oregon State U., U. Mass., and CMU
• DARPA SMISC (Social Media in Strategic Communication), including GTRI
• Well-attended tutorials given at PPoPP, in MD, and locally.
14 / 57
Experimental setup
Unless otherwise noted:

  Line       Model     Speed (GHz)   Sockets   Cores
  Nehalem    X5570     2.93          2         4
  Westmere   X5650     2.66          2         6
  Westmere   E7-8870   2.40          4         10

• Large Westmere, mirasol, loaned by Intel (thank you!)
• All memory: 1067 MHz DDR3, installed appropriately
• Implementations: OpenMP, gcc 4.6.1, Linux ≈ 3.2 kernel
• Artificial graph and edge stream generated by R-MAT [Chakrabarti, Zhan, & Faloutsos].
  • Scale x, edge factor f ⇒ 2^x vertices, ≈ f · 2^x edges.
  • Edge actions: 7/8 insertions, 1/8 deletions
  • Results over five batches of edge actions.
• Caveat: no vector instructions or low-level optimizations yet.
• Portable: OpenMP, Cray XMT family
15 / 57
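For readers unfamiliar with R-MAT, the sketch below generates edges by the standard recursive quadrant selection. The quadrant probabilities 0.55/0.10/0.10/0.25 and the 1/8 deletion draw are placeholders chosen for illustration, not the exact experimental settings used in the deck.

    #include <stdio.h>
    #include <stdlib.h>

    /* One R-MAT edge for a graph with 2^scale vertices.  Quadrant
     * probabilities a, b, c (and implicitly d = 1 - a - b - c) are
     * illustrative placeholders. */
    static void rmat_edge(int scale, double a, double b, double c,
                          long *src, long *dst)
    {
        long i = 0, j = 0;
        for (long half = 1L << (scale - 1); half > 0; half >>= 1) {
            double r = drand48();
            if (r < a)              { /* upper-left quadrant: keep both bits 0 */ }
            else if (r < a + b)     { j += half; }            /* upper-right */
            else if (r < a + b + c) { i += half; }            /* lower-left  */
            else                    { i += half; j += half; } /* lower-right */
        }
        *src = i;
        *dst = j;
    }

    int main(void)
    {
        srand48(1);
        int scale = 10;              /* 2^10 vertices for this small demo */
        for (int k = 0; k < 16; k++) {
            long u, v;
            rmat_edge(scale, 0.55, 0.10, 0.10, &u, &v);
            /* Mimic the edge-action mix: roughly 7/8 insertions, 1/8 deletions. */
            int is_delete = (drand48() < 0.125);
            printf("%s %ld %ld\n", is_delete ? "del" : "ins", u, v);
        }
        return 0;
    }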
STINGER insertion / removal rates
[Plot: aggregate STINGER updates per second (roughly 10^4 to 10^7) versus number of threads on the E7-8870 and X5570, for batch sizes from 100 to 3 000 000 edge actions]
16 / 57
STINGER insertion / removal rates
[Plot: seconds per batch processed, the “latency” view of the same runs (roughly 10^-3 to 10^0 seconds), versus number of threads for the same platforms and batch sizes]
16 / 57
Clustering coefficients
• Used to measure “small-world-ness” [Watts & Strogatz] and potential community structure
• Larger clustering coefficient ⇒ more inter-connected
• Roughly the ratio of the number of actual to potential triangles
[Figure: a vertex v with neighbors i, j, m, n]
• Defined in terms of triplets:
  • i – v – j is a closed triplet (triangle).
  • m – v – n is an open triplet.
• Clustering coefficient: # of closed triplets / total # of triplets
• Locally around v or globally for the entire graph.
17 / 57
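In symbols, using the standard definitions consistent with the triplet description above (d_v is the degree of v and t_v the number of triangles through v):

    % Local clustering coefficient of vertex v, with degree d_v and
    % t_v triangles (closed triplets) centered at v:
    C_v = \frac{t_v}{\binom{d_v}{2}} = \frac{2\,t_v}{d_v\,(d_v - 1)}

    % Global (transitivity) version over the whole graph:
    C = \frac{\#\,\text{closed triplets}}{\#\,\text{triplets}}
      = \frac{3 \times \#\,\text{triangles}}{\#\,\text{connected triplets}}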
STINGER clustering coefficient rates
[Plot: clustering coefficient updates per second (roughly 10^3 to 10^5.5) versus number of threads on the E7-8870 and X5570, for batch sizes from 100 to 3 000 000]
18 / 57
STINGER clustering coefficient rates
[Plot: speed-up of the streaming update over static recalculation (roughly 10^-1 to 10^2) versus number of threads, same platforms and batch sizes]
18 / 57
Connected components
• Maintain a mapping from vertex to component.
• Global property, unlike triangle counts
• In “scale free” social networks:
  • Often one big component, and
  • many tiny ones.
• Edge changes often sit within components.
  • Remaining insertions merge components.
  • Deletions are more difficult...
19 / 57
Connected components: Deleted edges
The difficult case
• Very few deletions matter.
• Maintain a spanning tree, and ignore a deletion if
  1. it is not in the spanning tree,
  2. the endpoints share a common neighbor∗, or
  3. the loose endpoint reaches the root∗.
• In the last two (∗), also fix the spanning tree.
• Rules out 99.7% of deletions.
• (Long history; see related work in [MTAAP11].)
20 / 57
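A minimal sketch of that triage logic, assuming hypothetical graph-access helpers (in_spanning_tree, have_common_neighbor, and reaches_root are placeholders, not STINGER calls). It only decides whether a deletion can be ignored and leaves any spanning-tree repair to the caller.

    #include <stdbool.h>

    /* Hypothetical helpers assumed to exist elsewhere; not part of STINGER. */
    extern bool in_spanning_tree(long u, long v);
    extern bool have_common_neighbor(long u, long v);
    extern bool reaches_root(long u);   /* e.g. walk parent pointers from u */

    /* Decide whether deleting edge (u, v) can change the component structure.
     * Returns true when the deletion is "safe" and needs no recomputation;
     * the starred cases still require fixing the spanning tree afterwards. */
    bool deletion_is_safe(long u, long v)
    {
        if (!in_spanning_tree(u, v))
            return true;                 /* rule 1: not a tree edge            */
        if (have_common_neighbor(u, v))
            return true;                 /* rule 2: a triangle keeps them tied */
        if (reaches_root(u) && reaches_root(v))
            return true;                 /* rule 3: loose side still reaches the
                                            root (conservatively check both)   */
        return false;                    /* otherwise fall back to recomputation */
    }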
STINGER component monitoring rates
[Plot: connected component updates per second (roughly 10^2 to 10^5.5) versus number of threads on the E7-8870 and X5570, for batch sizes from 100 to 3 000 000]
21 / 57
STINGER component monitoring rates
[Plot: seconds per batch processed, the “latency” view (roughly 10^-1.5 to 10^1 seconds), versus number of threads, same platforms and batch sizes]
21 / 57
STINGER component monitoring rates
[Plot: speed-up over static recalculation (roughly 10^-1 to 10^2) versus number of threads, same platforms and batch sizes]
21 / 57
Community detection
Motivation
STING for streaming, graph-structured data on Intel-based platforms
World-leading community detection
  Defining community detection
  Our parallel, agglomerative algorithm
  Performance
Community-building interactions
Plans and direction
22 / 57
Community detection
What do we mean?
• Partition a graph’s vertices into disjoint communities.
• A community locally maximizes some metric.
  • Modularity, conductance, ...
• Trying to capture that vertices are more similar within one community than between communities.
[Figure: Jason’s network via LinkedIn Labs]
23 / 57
Community detection
Assumptions
• Disjoint partitioning of vertices.
• There is no one unique answer.
  • Many metrics; NP-complete to optimize [Brandes, et al.].
  • The graph is a lossy representation.
• Want an adaptable detection method.
• For the rest, assume modularity [Newman].
• Important: edge-local scoring.
[Figure: Jason’s network via LinkedIn Labs]
23 / 57
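For reference, the modularity being maximized is Newman’s standard definition (the deck itself does not restate it):

    Q = \frac{1}{2m} \sum_{i,j} \left( A_{ij} - \frac{k_i\,k_j}{2m} \right)
        \delta(c_i, c_j)

where A is the adjacency matrix, k_i the degree of vertex i, m the number of edges, c_i the community of vertex i, and δ the Kronecker delta.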
Multi-threaded algorithm design points
Targeting massive graphs
Multi-threading over shared memory, up to 2 TiB on Intel-based platforms.
A scalable multi-threaded graph analysis algorithm...
• ... avoids global locks and frequent global synchronization.
• ... distributes computation over edges rather than only vertices.
• ... works with data as local to an edge as possible.
• ... uses compact data structures that agglomerate memory references.
24 / 57
Sequential agglomerative method
[Figure: vertices A–G; successive frames contract one chosen edge at a time, merging pairs such as B–C and A–D into growing communities]
• A common method (e.g. [Clauset, et al.]) agglomerates vertices into communities.
  • Each vertex begins in its own community.
  • An edge is chosen to contract.
  • Merging maximally increases modularity.
  • Priority queue.
• Known often to fall into an O(n²) performance trap with modularity [Wakita and Tsurumi].
25 / 57
Parallel agglomerative method
[Figure: vertices A–G; each frame contracts all edges of a matching simultaneously, merging several pairs per pass]
• We use a matching to avoid the queue.
• Compute a heavy weight, large matching (see the sketch after this slide).
  • Simple greedy algorithm.
  • Maximal matching.
  • Within a factor of 2 in weight.
• Merge all matched communities at once.
  • Maintains some balance.
  • Produces different results.
• Agnostic to weighting, matching...
  • Can maximize modularity, minimize conductance.
  • Modifying the matching permits easy exploration.
26 / 57
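A sequential sketch of the greedy idea named above: scan edges in order of decreasing score and keep an edge whenever both endpoints are still free. This is the textbook 1/2-approximation for weighted matching, not the lock-based parallel variant described on the routines slide; all names are illustrative.

    #include <stdlib.h>

    struct scored_edge { long i, j; double score; };

    static int by_score_desc(const void *a, const void *b)
    {
        double sa = ((const struct scored_edge *)a)->score;
        double sb = ((const struct scored_edge *)b)->score;
        return (sa < sb) - (sa > sb);            /* sort descending by score */
    }

    /* Greedy maximal matching: match[v] ends up as v's partner, or -1.
     * Taking edges in order of decreasing score yields total score within a
     * factor of 2 of the maximum-weight matching. */
    void greedy_matching(struct scored_edge *E, size_t nE, long nV, long *match)
    {
        for (long v = 0; v < nV; v++) match[v] = -1;
        qsort(E, nE, sizeof *E, by_score_desc);
        for (size_t k = 0; k < nE; k++) {
            long i = E[k].i, j = E[k].j;
            if (i != j && match[i] == -1 && match[j] == -1) {
                match[i] = j;                    /* both endpoints now used */
                match[j] = i;
            }
        }
    }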
Implementation: Data structures
Extremely basic, for graph G = (V, E):
• An array of (i, j; w) weighted edge pairs, each i, j stored only once and packed: 3|E| space
• An array to store self-edges, d(i) = w: |V|
• A temporary floating-point array for scores: |E|
• Additional temporary arrays using 4|V| + 2|E| to store degrees, matching choices, offsets, ...
• Weights count the number of agglomerated vertices or edges.
• Scoring methods (modularity, conductance) need only vertex-local counts.
• Storing an undirected graph in a symmetric manner reduces memory usage drastically and works with our simple matcher.
• The original version ignored order in the edge array, which killed OpenMP performance.
  • Roughly bucket the edge array by first stored index: a non-adjacent, CSR-like structure [MTAAP12].
  • Hash i, j to determine order; scatter among buckets [MTAAP12].
27 / 57
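A rough C rendering of that working set. The struct and field names are illustrative and the real code’s layout may differ, but the array sizes follow the 3|E|, |V|, and |E| accounting above, with a few of the |V|-sized temporaries shown.

    #include <stdint.h>
    #include <stdlib.h>

    /* One packed, weighted edge pair (i, j; w): each undirected edge is stored
     * once, giving the 3|E| footprint noted above. */
    struct wedge { int64_t i, j, w; };

    /* Community-graph working set: the edge array plus |V|-sized side arrays. */
    struct cgraph {
        int64_t nv, ne;
        struct wedge *edges;     /* 3|E|  : packed (i, j; w) pairs              */
        int64_t *selfw;          /* |V|   : self-edge weight d(i)               */
        double  *score;          /* |E|   : per-edge score (e.g. modularity)    */
        int64_t *degree;         /* |V|   : degrees (one of the temporaries)    */
        int64_t *match;          /* |V|   : matching choice per community       */
        int64_t *offset;         /* |V|+1 : bucket offsets for the CSR-like view */
    };

    struct cgraph *cgraph_alloc(int64_t nv, int64_t ne)
    {
        struct cgraph *g = calloc(1, sizeof *g);
        g->nv = nv; g->ne = ne;
        g->edges  = malloc(ne * sizeof *g->edges);
        g->selfw  = calloc(nv, sizeof *g->selfw);
        g->score  = malloc(ne * sizeof *g->score);
        g->degree = calloc(nv, sizeof *g->degree);
        g->match  = malloc(nv * sizeof *g->match);
        g->offset = calloc(nv + 1, sizeof *g->offset);
        return g;
    }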
Implementation: Routines
Three primitives: scoring, matching, contracting
Scoring: trivial.
Matching: repeat until there is no ready, unmatched vertex:
  1. For each unmatched vertex in parallel, find the best unmatched neighbor in its bucket.
  2. Try to point the remote match at that edge (lock, check if best, unlock).
  3. If pointing succeeded, try to point the self-match at that edge.
  4. If both succeeded, yeah! If not and there was some eligible neighbor, re-add self to the ready, unmatched list.
(Possibly too simple, but...)
28 / 57
Implementation: Routines
Contracting:
  1. Map each i, j to new vertices; re-order by hashing.
  2. Accumulate counts for the new i bins; prefix-sum for offsets.
  3. Copy into the new bins.
• Only synchronizing in the prefix-sum. That could be removed by not re-ordering the i, j pair; the difference hasn’t been timed.
• Actually, the current code copies twice... on the short list for fixing.
• Binning, as opposed to the original list-chasing, enabled Intel/OpenMP support with reasonable performance.
29 / 57
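A compact, sequential sketch of steps 2 and 3 above: histogram the (already re-mapped) first indices, take an exclusive prefix sum for bucket offsets, then scatter edges into their bins. The OpenMP version would parallelize the counting and copying and synchronize only around the prefix sum; names here are illustrative.

    #include <stdlib.h>
    #include <string.h>

    struct wedge { long i, j, w; };

    /* Bucket edges by their (re-mapped) first index i.
     * offset must have nbins + 1 entries. */
    void contract_bin(const struct wedge *in, size_t ne, long nbins,
                      struct wedge *out, long *offset)
    {
        long *count = calloc(nbins, sizeof *count);

        for (size_t k = 0; k < ne; k++)          /* step 2a: histogram per bin    */
            count[in[k].i]++;

        offset[0] = 0;                           /* step 2b: exclusive prefix sum */
        for (long b = 0; b < nbins; b++)
            offset[b + 1] = offset[b] + count[b];

        memset(count, 0, nbins * sizeof *count);
        for (size_t k = 0; k < ne; k++) {        /* step 3: copy into new bins    */
            long b = in[k].i;
            out[offset[b] + count[b]++] = in[k];
        }
        free(count);
    }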
Performance summary
Two moderate-sized graphs, one large:

  Graph             |V|           |E|             Reference
  rmat-24-16        15 580 378    262 482 711     [Chakrabarti, et al.], [Bader, et al.]
  soc-LiveJournal1  4 847 571     68 993 773      [Leskovec]
  uk-2007-05        105 896 555   3 301 876 564   [Boldi, et al.]

Peak processing rates in edges/second:

  Platform   rmat-24-16    soc-LiveJournal1   uk-2007-05
  X5570      1.83 × 10^6   3.89 × 10^6        n/a
  X5650      2.54 × 10^6   4.98 × 10^6        n/a
  E7-8870    5.86 × 10^6   6.90 × 10^6        6.54 × 10^6
  XMT        1.20 × 10^6   0.41 × 10^6        n/a
  XMT2       2.11 × 10^6   1.73 × 10^6        3.11 × 10^6

30 / 57
Performance: Time to solution
[Plot: time in seconds (log scale, roughly 3 s to 3162 s) versus OpenMP threads (1–128) for rmat-24-16 and soc-LiveJournal1 on the X5570 (4-core), X5650 (6-core), and E7-8870 (10-core)]
31 / 57
Performance: Rate (edges/second)
[Plot: processed edges per second (roughly 10^5.5 to 10^7.5) versus OpenMP threads (1–128) for the same graphs and platforms]
32 / 57
Performance: Modularity
[Plot: modularity (0.0–0.8) over contraction steps (0–30) for coAuthorsCiteseer, eu-2005, and uk-2002, with coverage, max, and average termination metrics marked]
• Timing results: stop when coverage ≥ 0.5 (communities cover half of the edges).
• More work ⇒ higher modularity. The choice is up to the application.
33 / 57
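Coverage here is the standard clustering measure, stated for reference (the slide gives only the informal “communities cover 1/2 edges” reading): the fraction of edges whose endpoints share a community,

    \mathrm{coverage}(\mathcal{C}) =
      \frac{\lvert \{\, \{u, v\} \in E : c_u = c_v \,\} \rvert}{\lvert E \rvert}

so stopping at coverage ≥ 0.5 means at least half of the edges are intra-community.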
Performance: Large-scale time
[Plot: time in seconds (log scale) versus threads (OpenMP) or processors (XMT), 1 to 64, on the E7-8870 (10-core) and XMT2; annotated times include 504.9 s, 1063 s, 6917 s, and 31568 s]
34 / 57
Performance: Large-scale speedup
[Plot: speed-up over a single processor/thread versus threads (OpenMP) or processors (XMT), 1 to 64, on the E7-8870 (10-core) and XMT2; annotated speedups include 13.7× and 29.6×]
35 / 57
Building a community around streaming, graph-structured data
Motivation
STING for streaming, graph-structured data on Intel-based platforms
World-leading community detection
Community-building interactions
  External STING users∗
  Graph500 (synergistic)
  Conferences, DIMACS Challenge, etc.
Plans and direction
36 / 57
Graph500
Not part of this project, but to keep people apprised:
• Adding a single-source shortest paths benchmark.
• Re-working the generator to be more scalable, vectorized, etc.
• Addressing some generator parameter issues: fewer duplicate edges.
• Improving shared-memory reference implementation performance...
• (Possibly adding an OpenCL reference implementation.)
mirasol is very useful for development and reference tuning.
37 / 57
10th DIMACS Implementation Challenge
Graph partitioning and clustering
• Georgia Tech assisted in organizing the 10th DIMACS Implementation Challenge, co-sponsored by Intel.
• The community detection implementation on mirasol won the Mix Challenge, which combines speed and multiple clustering criteria.
• It was one of only two implementations that tackled uk-2007-05, with 106 million vertices and 3.3 billion edges.
  • The other implementation required multiple days across a cluster.
• The next closest for performance used Intel’s TBB (Fagginger Auer & Bisseling) or CUDA (small examples).
38 / 57
Conferences and minisymposia
• GT hosted the biennial SIAM Parallel Processing conference.
  • It included a well-attended minisymposium organized by Riedy & Meyerhenke on “Parallel Analysis of Massive Social Networks,” including KDT work from UCSB.
• Submitting a massive-graph minisymposium for SIAM CSE 2013 (25 Feb – 1 Mar)... Looking for speakers.
• A list of presentations is appended to these slides; too many to include here.
  • 10 conference presentations / invited talks / posters
  • 3 tutorials
  • 5 accepted (refereed) publications, 1 in submission
39 / 57
Plans and direction
Motivation
STING for streaming, graph-structured data on Intel-based platforms
World-leading community detection
Community-building interactions
Plans and direction
  Plans
  Software
40 / 57
Plans
• Finish the split into a client/server model, including subgraph visualization; looking into a KDT interface.
• Integrate performance counters; build a performance model / monitor.
• Evaluate Intel features for bottlenecks (vectors, better NUMA allocation, etc.).
• Incorporate community detection and seed set expansion.
41 / 57
STING on-line resources
Home page: http://www.cc.gatech.edu/stinger/
Download: .../downloads.php
Reference guide: .../doxygen/
General tutorial: (URL too long, linked from above)
The current release of STING includes the following static and dynamic analysis kernel examples:
• multi-platform benchmarks for edge insertion and removal (dynamic),
• breadth-first search (static),
• clustering and transitivity coefficients (static and dynamic), and
• connected components (static and dynamic).
42 / 57
Community detection
Home page: http://www.cc.gatech.edu/~jriedy/community-detection/
Git source repository: http://www.cc.gatech.edu/~jriedy/gits/community-el.git
• “Experimental” and subject to extreme change.
• Current version packaged as a separate, easy to use / import executable.
• Being adapted into STING.
43 / 57
Presentations and tutorials I
• David A. Bader. Opportunities and challenges in massive data-intensive computing. In 9th International Conference on Parallel Processing and Applied Mathematics (PPAM11), September 2011. (Invited keynote.)
• David A. Bader. Opportunities and challenges in massive data-intensive computing. Workshop on Parallel Algorithms and Software for Analysis of Massive Graphs (ParGraph), December 2011. (Invited keynote.)
44 / 57
Presentations and tutorials II
• Jason Riedy, David Ediger, David A. Bader, and Henning Meyerhenke. Tracking structure of streaming social networks. 2011 Graph Exploitation Symposium hosted by MIT Lincoln Labs, August 2011. (Presentation.)
• David A. Bader. Fourth Graph500 results. Graph500 Birds of a Feather, 27th International Supercomputing Conference (ISC), June 2012. (Presentation.)
45 / 57
Presentations and tutorials III
• David A. Bader. Massive data analytics using heterogeneous computing. In 21st International Heterogeneity in Computing Workshop (HCW 2012), held in conjunction with the International Parallel and Distributed Processing Symposium (IPDPS 2012), May 2012. (Invited keynote.)
• David A. Bader. Opportunities and challenges in massive data-intensive computing. Distinguished Lecture, May 2012. (Invited presentation.)
46 / 57
Presentations and tutorials IV
• David A. Bader. Opportunities and challenges in massive data-intensive computing. From Data to Knowledge: Machine-Learning with Real-time and Streaming Applications, May 2012. (Invited presentation.)
• David A. Bader. Opportunities and challenges in massive data-intensive computing. NSF Workshop on Research Directions in the Principles of Parallel Computation, June 2012. (Invited presentation.)
47 / 57
Presentations and tutorials V
• David Ediger, Jason Riedy, Rob McColl, and David A. Bader. Parallel programming for graph analysis. In 17th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP), New Orleans, LA, February 2012. (Tutorial.)
• Henning Meyerhenke, E. Jason Riedy, and David A. Bader. Parallel community detection in streaming graphs. SIAM Parallel Processing for Scientific Computing, February 2012. (Presentation.)
48 / 57
Presentations and tutorials VI
• E. Jason Riedy, David Ediger, Henning Meyerhenke, and David A. Bader. STING: Software for analysis of spatio-temporal interaction networks and graphs. SIAM Parallel Processing for Scientific Computing, February 2012. (Poster.)
• E. Jason Riedy and Henning Meyerhenke. Scalable algorithms for analysis of massive, streaming graphs. SIAM Parallel Processing for Scientific Computing, February 2012. (Presentation.)
49 / 57
Presentations and tutorials VII
• Anita Zakrzewska and Oded Green. STINGER tutorial. Presented to GTRI and SAIC software engineers, July 2012. (Tutorial.)
50 / 57
Publications I
• David Ediger, Rob McColl, Jason Riedy, and David A. Bader. STINGER: High performance data structure for streaming graphs. In IEEE High Performance Extreme Computing Conference (HPEC), Waltham, MA, September 2012. (To appear.)
• Xin Liu, Pushkar Pande, Henning Meyerhenke, and David A. Bader. Pasqual: A parallel de novo assembler for next generation genome sequencing. IEEE Transactions on Parallel and Distributed Systems, 2012. (To appear.)
51 / 57
Publications II
• E. Jason Riedy, David A. Bader, and Henning Meyerhenke. Scalable multi-threaded community detection in social networks. In 6th Workshop on Multithreaded Architectures and Applications (MTAAP), held in conjunction with the International Parallel and Distributed Processing Symposium (IPDPS 2012), May 2012. (9/15 papers accepted, 60% acceptance.)
• E. Jason Riedy, Henning Meyerhenke, David Ediger, and David A. Bader. Parallel community detection for massive graphs. In 10th DIMACS Implementation Challenge – Graph Partitioning and Graph Clustering (workshop paper), Atlanta, Georgia, February 2012. Won first place in the Mix Challenge and Mix Pareto Challenge.
52 / 57
Publications III
• Jason Riedy, Henning Meyerhenke, David A. Bader, David Ediger, and Timothy G. Mattson. Analysis of streaming social networks and graphs on multicore architectures. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan, March 2012.
53 / 57
References I
• D. Bader, J. Gilbert, J. Kepner, D. Koester, E. Loh, K. Madduri, W. Mann, and T. Meuse. HPCS SSCA#2 Graph Analysis Benchmark Specifications v1.1. July 2005.
• D.A. Bader and J. McCloskey. Modularity and graph algorithms. Presented at UMBC, September 2009.
• J.W. Berry, B. Hendrickson, R.A. LaViolette, and C.A. Phillips. Tolerating the community detection resolution limit with edge weighting. CoRR abs/0903.1072, 2009.
54 / 57
References II
• P. Boldi, B. Codenotti, M. Santini, and S. Vigna. UbiCrawler: A scalable fully distributed web crawler. Software: Practice & Experience, vol. 34, no. 8, pp. 711–726, 2004.
• U. Brandes, D. Delling, M. Gaertler, R. Görke, M. Hoefer, Z. Nikoloski, and D. Wagner. On modularity clustering. IEEE Trans. Knowledge and Data Engineering, vol. 20, no. 2, pp. 172–188, 2008.
• D. Chakrabarti, Y. Zhan, and C. Faloutsos. R-MAT: A recursive model for graph mining. In Proc. 4th SIAM Intl. Conf. on Data Mining (SDM), Orlando, FL, April 2004. SIAM.
55 / 57
References III
• A. Clauset, M. Newman, and C. Moore. Finding community structure in very large networks. Physical Review E, vol. 70, no. 6, p. 66111, 2004.
• D. Ediger, E. J. Riedy, and D. A. Bader. Tracking structure of streaming social networks. In Proceedings of the Workshop on Multithreaded Architectures and Applications (MTAAP’11), May 2011.
• J. Leskovec. Stanford large network dataset collection. At http://snap.stanford.edu/data/, October 2011.
• M. Newman. Modularity and community structure in networks. Proc. of the National Academy of Sciences, vol. 103, no. 23, pp. 8577–8582, 2006.
56 / 57
References IV
• K. Wakita and T. Tsurumi. Finding community structure in mega-scale social networks. CoRR, vol. abs/cs/0702048, 2007.
• D. J. Watts and S. H. Strogatz. Collective dynamics of ‘small-world’ networks. Nature, 393(6684):440–442, June 1998.
57 / 57