STINGER
Dynamic Graph Analysis
Contributors
• David Bader
• David Ediger
• Rob McColl
• Jason Riedy
• Kamesh Madduri
• Jason Poovey
Outline
• Motivation


• Dynamic Graph Basics


• What is STINGER?


• What can STINGER do?


• Why STINGER?
Big Data problems need Graph Analysis
• Health Care: Finding outbreaks, population epidemiology
• Social Networks: Advertising, searching, grouping, influence
• Intelligence: Decisions at scale, regulating algorithms
• Systems Biology: Understanding interactions, drug design
• Power Grid: Disruptions, conversion
• Simulation: Discrete events, cracking meshes
Graphs are pervasive
 • Graphs: things and relationships
    • Different kinds of things, different kinds of relationships, but graphs provide a
      framework for analyzing the relationships.
    • New challenges for analysis: data sizes, heterogeneity, uncertainty, data quality.


 • Astrophysics
    • Problem: Outlier detection
    • Challenges: Massive data sets, temporal variation
    • Graph Problems: matching, clustering
 • Bioinformatics
    • Problem: Identifying target proteins
    • Challenges: Data heterogeneity, quality
    • Graph Problems: Centrality, clustering
 • Social Informatics
    • Problem: Emergent behavior, information spread
    • Challenges: New analysis, data uncertainty, scale
    • Graph Problems: clustering, flows, shortest paths
Data rates and volumes are immense
• Facebook:
  • ~1 billion users
  • average 130 friends
  • 30 billion pieces of content shared / month
• Twitter:
   • 500 million active users
   • 340 million tweets / day
• Internet – 100s of exabytes / year
   • 300 million new websites per year
   • 48 hours of video uploaded to YouTube per minute
   • 30,000 YouTube videos played per second
Our focus is streaming graphs
• As relationships change
  • Edges (relationships) are inserted, updated, and removed
  • New vertices (things) join and leave the network


• What are the effects?
  • On information flow
  • On community structure
  • On the integrity of data and structure


• Which actors and relationships are…
  • The key players and influencers in the change?
  • The anomalies and threats?
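As a rough illustration of what "streaming" means here, the sketch below applies a batch of edge insertions, updates, and removals to a small in-memory graph. It is plain Python written for this example, not STINGER code; the class name `StreamingGraph` and the `("+", u, v, weight)` / `("-", u, v)` action format are assumptions.

```python
from collections import defaultdict


class StreamingGraph:
    """Toy undirected graph that absorbs a stream of edge actions.

    Illustrative only: names and action format are hypothetical.
    Assumes no self-loops.
    """

    def __init__(self):
        # adjacency[u][v] = weight; each undirected edge is stored in both directions
        self.adjacency = defaultdict(dict)

    def insert_or_update(self, u, v, weight=1):
        """Insert edge (u, v), or overwrite its weight if it already exists."""
        self.adjacency[u][v] = weight
        self.adjacency[v][u] = weight

    def remove(self, u, v):
        """Remove edge (u, v) and drop any endpoint left with no edges."""
        self.adjacency[u].pop(v, None)
        self.adjacency[v].pop(u, None)
        for vertex in (u, v):
            if not self.adjacency[vertex]:
                del self.adjacency[vertex]

    def apply(self, actions):
        """Apply ('+', u, v, weight) and ('-', u, v) actions in order."""
        for action in actions:
            if action[0] == "+":
                _, u, v, weight = action
                self.insert_or_update(u, v, weight)
            else:
                _, u, v = action
                self.remove(u, v)

    def num_edges(self):
        """Number of undirected edges currently stored."""
        return sum(len(nbrs) for nbrs in self.adjacency.values()) // 2
```

Vertices join implicitly when their first edge arrives and leave when their last edge is removed, which mirrors the "join and leave the network" bullet in a simplified way.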
What is STINGER?
Spatio-Temporal Interaction Networks and Graphs Extensible Representation
D. A. Bader, J. Berry, A. Amos-Binks, D. Chavarría-Miranda, C. Hastings, K. Madduri, S. C. Poulos


• A scalable, high performance in-memory dynamic graph data
  structure
   •   Stores semantic and temporal information.
   •   Designed to be flexible and extendable.
   •   Intended to be useful to the entire “large graph” community.
   •   Permits good performance, recognizing that no single structure is optimal for all uses.
   •   Assumes globally addressable memory access.
   •   Supports multiple parallel readers and a single parallel writer.

• A software suite for dynamic graph analysis
  • Targets large shared-memory x86 and the Cray XMT
  • Written in C with OpenMP and XMT pragma support for parallelism
As a data structure
• Fast insertions, deletions, and updates:
 A data structure that grows and changes at the speed of the data.

• Edge and vertex types and weights:
 Represent complex relationships and multiple simultaneous networks.

• Filtering traversal mechanisms:
 Traverse serially or in parallel on specific edge types, time ranges,
 vertex sets, etc.

• Experimental workflow server:
 Multiple data streams and analytics with one persistent data structure.

• Experimental Java and Python bindings:
 Use productivity-oriented languages without sacrificing performance.
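The filtering traversal idea can be sketched in a few lines: iterate over edges and yield only those matching an edge type, a time range, or a vertex set. This is a serial Python illustration under assumed names (`Edge`, `filtered_edges`); it does not reproduce STINGER's traversal API.

```python
from typing import Iterable, Iterator, NamedTuple, Optional, Set


class Edge(NamedTuple):
    """One typed, weighted, timestamped edge (hypothetical layout)."""
    src: int
    dst: int
    etype: str
    weight: float
    timestamp: int


def filtered_edges(
    edges: Iterable[Edge],
    etype: Optional[str] = None,
    t_min: Optional[int] = None,
    t_max: Optional[int] = None,
    sources: Optional[Set[int]] = None,
) -> Iterator[Edge]:
    """Yield edges that pass every filter that was supplied.

    A filter left as None is not applied.
    """
    for edge in edges:
        if etype is not None and edge.etype != etype:
            continue
        if t_min is not None and edge.timestamp < t_min:
            continue
        if t_max is not None and edge.timestamp > t_max:
            continue
        if sources is not None and edge.src not in sources:
            continue
        yield edge
```

Because the function is a generator, filters compose lazily with whatever analysis consumes the edges; a parallel version would partition the edge set across workers and apply the same predicate in each.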
As an analysis package
• Streaming edge insertions and deletions:
  Performs new edge insertions, updates, and deletions in batches or individually.

• Streaming clustering coefficients:
  Tracks the local and global clustering coefficients of a graph under both edge insertions and deletions.

• Streaming connected components:
  Accurately tracks the connected components of a graph with insertions and deletions.

• Streaming community detection:
  Track and update the community structures within the graph as they change.

• Parallel agglomerative clustering:
  Find clusters that are optimized for a user-defined edge scoring function.

• Streaming Betweenness Centrality:
  Find the key points within information flows and structural vulnerabilities.

• K-core Extraction:
  Extract additional communities and filter noisy high-degree vertices.

• Classic breadth-first search:
  Performs a parallel breadth-first search of the graph starting at a given source vertex to find shortest paths.
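To make the streaming clustering coefficient item concrete, here is a minimal sketch of one common approach: keep a per-vertex triangle count and adjust it by the size of the endpoints' common neighborhood whenever an edge is inserted or deleted. The class name `StreamingTriangles` is an assumption, and this serial Python version is not the parallel algorithm shipped with STINGER.

```python
from collections import defaultdict


class StreamingTriangles:
    """Maintains per-vertex triangle counts under edge insertions and deletions."""

    def __init__(self):
        self.neighbors = defaultdict(set)
        self.triangles = defaultdict(int)  # triangles that include each vertex

    def _adjust(self, u, v, sign):
        # Every common neighbor w of u and v closes (or opens) triangle u-v-w.
        common = self.neighbors[u] & self.neighbors[v]
        self.triangles[u] += sign * len(common)
        self.triangles[v] += sign * len(common)
        for w in common:
            self.triangles[w] += sign

    def insert(self, u, v):
        """Insert undirected edge (u, v); self-loops and duplicates are ignored."""
        if u == v or v in self.neighbors[u]:
            return
        self._adjust(u, v, +1)
        self.neighbors[u].add(v)
        self.neighbors[v].add(u)

    def delete(self, u, v):
        """Delete undirected edge (u, v) if it is present."""
        if v not in self.neighbors[u]:
            return
        self.neighbors[u].discard(v)
        self.neighbors[v].discard(u)
        self._adjust(u, v, -1)

    def local_coefficient(self, v):
        """Local clustering coefficient: triangles / (degree choose 2)."""
        degree = len(self.neighbors[v])
        if degree < 2:
            return 0.0
        return 2.0 * self.triangles[v] / (degree * (degree - 1))
```

Each update touches only the two endpoints and their shared neighbors, so the cost depends on local degree rather than on the size of the whole graph.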
How is the graph stored?
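The figure for this slide did not survive extraction. Published descriptions of STINGER present each vertex's adjacency list as a linked list of fixed-size edge blocks, where a block holds edges of one type and each edge records a neighbor, a weight, and first and most-recent timestamps. The sketch below illustrates that idea in Python; the class names, field names, and the block size of 4 are assumptions for illustration, not the project's actual C structs.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple

BLOCK_SIZE = 4  # hypothetical number of edge slots per block


@dataclass
class EdgeSlot:
    neighbor: int
    weight: int
    time_first: int   # timestamp of the first insertion
    time_recent: int  # timestamp of the latest update


@dataclass
class EdgeBlock:
    etype: int                                   # all edges in a block share one type
    slots: List[EdgeSlot] = field(default_factory=list)
    next: Optional["EdgeBlock"] = None           # next block for the same vertex


class BlockedAdjacency:
    """Per-vertex linked lists of typed edge blocks (illustrative)."""

    def __init__(self):
        self.head: Dict[int, EdgeBlock] = {}

    def insert(self, src, dst, etype, weight, timestamp):
        """Update the edge if present; otherwise place it in a block with room."""
        block = self.head.get(src)
        free = None
        while block is not None:
            if block.etype == etype:
                for slot in block.slots:
                    if slot.neighbor == dst:
                        slot.weight = weight
                        slot.time_recent = timestamp
                        return
                if free is None and len(block.slots) < BLOCK_SIZE:
                    free = block
            block = block.next
        if free is None:
            # No block of this type has room: link a new one at the front.
            free = EdgeBlock(etype=etype, next=self.head.get(src))
            self.head[src] = free
        free.slots.append(EdgeSlot(dst, weight, timestamp, timestamp))

    def edges(self, src) -> Iterator[Tuple[int, EdgeSlot]]:
        """Yield (edge type, slot) for every edge leaving src."""
        block = self.head.get(src)
        while block is not None:
            for slot in block.slots:
                yield block.etype, slot
            block = block.next
```

Grouping edges into blocks trades some wasted slots for fewer pointer hops than a one-edge-per-node linked list, while still allowing growth without reallocating a whole array.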
What can STINGER represent?
• Nearly any set of
  relationships
   •   Healthcare
   •   Social Networks
   •   Intelligence
   •   Systems biology
   •   Power grid
   •   Travel networks

• Example: Twitter
   • Users, hashtags, tweets as vertex types
   • Authorship, retweet, mentions, follows / followed by edge types


• Example: Work Environment
   • Users, PCs, printers, emails, URLs, files, etc. as vertex types
   • Email alias, from, to, access, logon/off, print, IM, etc. as edge types
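The Twitter example can be modeled with a small typed graph in which every vertex and every edge carries a type label. The sketch below is a plain-Python illustration; `TypedGraph`, the account names, and the type strings are made up for this example and are not STINGER identifiers.

```python
from collections import defaultdict


class TypedGraph:
    """Directed multigraph whose vertices and edges both carry a type label."""

    def __init__(self):
        self.vertex_type = {}               # vertex name -> vertex type
        self.out_edges = defaultdict(list)  # source -> [(edge type, destination)]

    def add_vertex(self, name, vtype):
        self.vertex_type[name] = vtype

    def add_edge(self, src, etype, dst):
        """Add a typed edge; both endpoints must already be registered."""
        for endpoint in (src, dst):
            if endpoint not in self.vertex_type:
                raise KeyError(f"unknown vertex: {endpoint}")
        self.out_edges[src].append((etype, dst))

    def neighbors(self, src, etype=None, vtype=None):
        """Destinations of src, optionally restricted by edge or vertex type."""
        return [
            dst
            for (t, dst) in self.out_edges.get(src, [])
            if (etype is None or t == etype)
            and (vtype is None or self.vertex_type[dst] == vtype)
        ]


def build_twitter_example():
    """Build a tiny graph with user, hashtag, and tweet vertices."""
    g = TypedGraph()
    for user in ("@alice", "@bob"):
        g.add_vertex(user, "user")
    g.add_vertex("#sandy", "hashtag")
    g.add_vertex("tweet:1", "tweet")
    g.add_edge("@alice", "follows", "@bob")
    g.add_edge("@bob", "followed_by", "@alice")
    g.add_edge("@bob", "authored", "tweet:1")
    g.add_edge("tweet:1", "mentions", "@alice")
    g.add_edge("tweet:1", "mentions", "#sandy")
    return g
```

Keeping several relationship types in one structure is what lets a single graph stand in for "multiple simultaneous networks": restricting `neighbors` to one edge type recovers the follower network, the mention network, and so on.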
What can STINGER do?
• Optimized to update at rates of over 3 million edges per second on
 graphs of one billion edges
  •   D. Ediger, R. McColl, J. Riedy, and D. A. Bader, "STINGER: High Performance Data Structure for Streaming
      Graphs," The IEEE High Performance Extreme Computing Conference (HPEC), Waltham, MA,
      September 20-22, 2012. Best Paper Award.




                       RMAT – Recursive MATrix graph generator. RMAT(N) indicates 2^N vertices.
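For readers unfamiliar with R-MAT, the generator picks each edge by descending the adjacency matrix one bit at a time, choosing one of four quadrants at every level with fixed probabilities. The sketch below is a minimal Python version; the function name `rmat_edges` and the default probabilities (0.55, 0.10, 0.10, 0.25) are assumptions for illustration and may differ from the settings behind the benchmark.

```python
import random


def rmat_edges(scale, num_edges, a=0.55, b=0.10, c=0.10, d=0.25, seed=0):
    """Generate num_edges (src, dst) pairs over 2**scale vertices.

    a, b, c, d are the probabilities of the top-left, top-right,
    bottom-left, and bottom-right quadrants; they must sum to 1.
    Duplicates and self-loops are not filtered out.
    """
    if abs(a + b + c + d - 1.0) > 1e-9:
        raise ValueError("quadrant probabilities must sum to 1")
    rng = random.Random(seed)
    edges = []
    for _ in range(num_edges):
        src = dst = 0
        for _ in range(scale):
            # Shift in one more bit of each endpoint per recursion level.
            src <<= 1
            dst <<= 1
            r = rng.random()
            if r < a:
                pass                 # top-left: both bits stay 0
            elif r < a + b:
                dst |= 1             # top-right
            elif r < a + b + c:
                src |= 1             # bottom-left
            else:
                src |= 1             # bottom-right
                dst |= 1
        edges.append((src, dst))
    return edges
```

Skewing the probabilities toward one quadrant yields the heavy-tailed degree distributions that make R-MAT graphs a common stand-in for social network data.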
What can STINGER do?
• Maintaining connected components in a graph of half a billion edges
  • Up to 1.26 million updates per sec.
  • 137x faster than recomputing.

• Scalable parallel streaming community detection
  • Built on parallel insert / delete mechanisms.

• Streaming approximate betweenness
  • Used to analyze influencers on Twitter during Hurricane Sandy over time.
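The gap between updating and recomputing is easiest to see in the insertion-only case, where a union-find structure absorbs each new edge in near-constant time instead of re-traversing the graph. The sketch below covers only that case; handling deletions, as the slides describe, needs additional machinery that is not shown. The class name `IncrementalComponents` is an assumption.

```python
class IncrementalComponents:
    """Tracks connected components under edge insertions only (union-find)."""

    def __init__(self):
        self.parent = {}
        self.count = 0  # number of components among vertices seen so far

    def find(self, v):
        """Return the representative of v's component, registering v if new."""
        if v not in self.parent:
            self.parent[v] = v
            self.count += 1
        root = v
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every vertex on the path at the root.
        while self.parent[v] != root:
            nxt = self.parent[v]
            self.parent[v] = root
            v = nxt
        return root

    def insert_edge(self, u, v):
        """Merge the components of u and v if they differ."""
        ru, rv = self.find(u), self.find(v)
        if ru != rv:
            self.parent[ru] = rv
            self.count -= 1

    def same_component(self, u, v):
        """True if u and v are connected; unseen vertices become singletons."""
        return self.find(u) == self.find(v)
```

A deletion can split a component, which union-find cannot undo; that is the hard part a streaming algorithm with deletions has to address.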
What does STINGER not do?
• Does not provide all ACID properties
   • Why: Not intended to be the backing data store.
   • Why: Allows for greater ingest and processing speeds.
   • Alternative: Back STINGER ingest with an ACID DB
   • Alternative: STINGER does provide consistency, partial isolation


• No text-based query language – for now
   • Why: Currently, no language is general enough to describe most or all queries
   • Alternative: Filtering traversal APIs, unlimited query flexibility through code
   • Alternative: Productivity language bindings (Python, Java)


• No distributed / Hadoop-like cluster support
   • Why: A good fit for ingest but a poor one for streaming analysis, because random access is too slow
   • Alternative: Larger shared memory systems such as the Cray XMT and SGI UV systems
   • Alternative: Processing billion-edge graphs in shared memory on affordable Intel servers
   • Alternative: Extract key portions of the graph from a larger data store and perform fast
     in-memory processing in STINGER
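Two of the alternatives above, backing ingest with an ACID database and extracting a portion of a larger store for in-memory work, can be combined in one small pattern. The sketch uses Python's standard `sqlite3` module as a stand-in for the ACID store and a dictionary of sets as a stand-in for the in-memory graph; the table layout and function names are assumptions, and no STINGER API is involved.

```python
import sqlite3
from collections import defaultdict


def open_store():
    """Create an in-memory SQLite database acting as the durable edge store."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE edges (src TEXT, dst TEXT, etype TEXT, ts INTEGER)"
    )
    return conn


def ingest(conn, batch):
    """Write a batch of (src, dst, etype, ts) rows as one transaction.

    Using the connection as a context manager commits on success and
    rolls back if an exception is raised.
    """
    with conn:
        conn.executemany("INSERT INTO edges VALUES (?, ?, ?, ?)", batch)


def extract_subgraph(conn, etype, since):
    """Load edges of one type at or after a timestamp into memory."""
    graph = defaultdict(set)
    rows = conn.execute(
        "SELECT src, dst FROM edges WHERE etype = ? AND ts >= ?",
        (etype, since),
    )
    for src, dst in rows:
        graph[src].add(dst)
        graph[dst].add(src)
    return graph
```

The database keeps the full, transactional history, while the analysis side holds only the slice it needs in a structure tuned for random access.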
What sizes, performance can it handle?
Desktop (Intel Core i7-2600, 16GB DDR3)

V      E      Config   Size (GB)   Connected Components (s)   Updates per Sec.
1M     8M     22-14    1.184       0.316                      2.7M
2M     16M    22-14    2.384       0.75                       2.3M
4M     33M    22-14    4.768       2                          2.3M
8M     67M    24-14    9.536       5.36                       0.85M
4M     67M    24-14    7.984       3                          1.38M
4M     134M   24-14    14.336      5.7                        0.8M

Server (4x Opteron 6282, 256GB DDR3)

V      E      Config   Size (GB)   Connected Components (s)   Updates per Sec.
16M    512M   25-14    60          13.7                       696K
16M    256M   25-14    24.6        9.82                       2.1M

Cray XMT2 (64 Processors, 2TB DDR2)

V      E      Config   Size (GB)   Connected Components (s)   Updates per Sec.
67M    512M   28-32    86          13.8                       3.3M
268M   4.3B   28-32    312         52.3                       2.34M

• The only limitation on size is system memory
  • Billions of vertices and edges are possible

• V vertices and E edges in each graph
  • E counts are undirected
  • STINGER stores both directions
• Config is STINGER-specific parameters
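Because E counts undirected edges and both directions are stored, a rough memory footprint per stored edge can be read off the table by dividing the reported size by 2E. The helper below does that arithmetic; it assumes decimal units (1 GB = 10^9 bytes, 1M = 10^6), which the slides do not state, and the result folds in vertex storage and any preallocated space.

```python
def bytes_per_stored_edge(size_gb, undirected_edges):
    """Approximate bytes per directed edge record.

    Assumes 1 GB = 1e9 bytes and that every undirected edge is stored
    twice, once in each direction.
    """
    stored_edges = 2 * undirected_edges
    return size_gb * 1e9 / stored_edges


# First desktop row: 1.184 GB for 8M undirected edges.
desktop_small = bytes_per_stored_edge(1.184, 8e6)
```

For the first desktop row this works out to about 74 bytes per stored edge. Treat the figure as an order-of-magnitude guide, since it depends on the configuration parameters and on how full the allocated structure is.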
Why not existing technologies?
• Traditional SQL databases
   • Not structured to do any meaningful graph queries with any level of
     efficiency or timeliness

• Graph databases - mostly on-disk
  • Distributed disk can keep up with storing / indexing, but random graph
    access is simply too slow to keep processing current as the graph updates

• Hadoop and HDFS-based projects
  • Not really the right programming model for many structural queries
    over the entire graph; random access performance is poor

• Smaller graph libraries, processing tools
  • Can't scale, can't process dynamic graphs, and frequently lead to
    impossible visualization attempts
Who is GTRI?
• Georgia Tech Research Institute
  • Largest research entity at Georgia Institute of Technology
  • One of the world's premier university-based applied R&D
    organizations for 75 years
  • Non-profit with over 1,600 employees and 21 locations world-wide
  • Over $240 million per year of government and industry contracts


• Innovative Computing Division of the Cyber Technology and Information Security Lab
  • Dedicated to the application of practical HPC expertise and
    cutting-edge fundamental research to solve real-world problems
  • Experts in high-performance computing, algorithms, and big data
How can I start using STINGER?
• Information, code, help
   • http://cc.gatech.edu/stinger
   • robert.mccoll@gtri.gatech.edu


• Together, GTRI and Georgia Tech can offer
   • Consulting
     Understand how your organization can benefit from graph analytics.

  • Training
    Learn how to use graph analysis and apply STINGER to your data.

  • Implementation
    Customize and extend STINGER to suit your needs using our experts.

  • Research Expertise
    Connect with researchers on the cutting edge of big data to develop novel
    solutions to your open problems.

Dynamic Graph Analysis with STINGER

  • 2. Contributors • David Bader • David Ediger • Rob McColl • Jason Riedy • Kamesh Madduri • Jason Poovey
  • 3. Outline
      • Motivation
      • Dynamic Graph Basics
      • What is STINGER?
      • What can STINGER do?
      • Why STINGER?
  • 4. Big Data problems need Graph Analysis
      • Health Care: Finding outbreaks, population epidemiology
      • Social Networks: Advertising, searching, grouping, influence
      • Intelligence: Decisions at scale, regulating algorithms
      • Systems Biology: Understanding interactions, drug design
      • Power Grid: Disruptions, conversion
      • Simulation: Discrete events, cracking meshes
  • 5. Graphs are pervasive
      • Graphs: things and relationships
          • Different kinds of things, different kinds of relationships, but graphs provide a framework for analyzing the relationships.
          • New challenges for analysis: data sizes, heterogeneity, uncertainty, data quality.
      • Astrophysics
          • Problem: Outlier detection
          • Challenges: Massive data sets, temporal variation
          • Graph Problems: matching, clustering
      • Bioinformatics
          • Problem: Identifying target proteins
          • Challenges: Data heterogeneity, quality
          • Graph Problems: Centrality, clustering
      • Social Informatics
          • Problem: Emergent behavior, information spread
          • Challenges: New analysis, data uncertainty, scale
          • Graph Problems: clustering, flows, shortest paths
  • 6. Data rates and volumes are immense
      • Facebook:
          • ~1 billion users
          • average 130 friends
          • 30 billion pieces of content shared / month
      • Twitter:
          • 500 million active users
          • 340 million tweets / day
      • Internet – 100s of exabytes / year
          • 300 million new websites per year
          • 48 hours of video to YouTube per minute
          • 30,000 YouTube videos played per second
  • 7. Our focus is streaming graphs
      • As relationships change
          • Edges (relationships) are inserted, updated, and removed
          • New vertices (things) join and leave the network
      • What are the effects?
          • On information flow
          • On community structure
          • On the integrity of data and structure
      • Which actors and relationships are…
          • The key players and influencers in the change?
          • The anomalies and threats?
  • 8. What is STINGER?
      • Spatio-Temporal Interaction Networks and Graphs Extensible Representation
          • D. A. Bader, J. Berry, A. Amos-Binks, D. Chavarría-Miranda, C. Hastings, K. Madduri, S. C. Poulos
      • A scalable, high performance in-memory dynamic graph data structure
          • Stores semantic and temporal information.
          • Designed to be flexible and extendable.
          • Be useful for the entire “large graph” community.
          • Permit good performance: No single structure is optimal for all.
          • Assume globally addressable memory access.
          • Support multiple, parallel readers and a single parallel writer.
      • A software suite for dynamic graph analysis
          • Targets large shared-memory x86 and the Cray XMT
          • Written in C with OpenMP and XMT pragma support for parallelism
  • 9. As a data structure
      • Fast insertions, deletions, and updates: A data structure that grows and changes at the speed of the data.
      • Edge and vertex types and weights: Represent complex relationships and multiple simultaneous networks.
      • Filtering traversal mechanisms: Traverse serially or in parallel on specific edge types, time ranges, vertex sets, etc.
      • Experimental workflow server: Multiple data streams and analytics with one persistent data structure.
      • Experimental Java and Python bindings: Use efficiency-oriented languages without sacrificing performance-oriented results.
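The combination of typed, weighted, timestamped edges with filtered traversal can be illustrated with a small toy model. This is only a sketch under stated assumptions: `TypedGraph`, `Edge`, `insert_edge`, `remove_edge`, and `edges_of` are hypothetical names chosen for illustration, not the STINGER API, and the dictionary-based storage says nothing about how STINGER lays out memory.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Edge:
    dst: int
    etype: str
    weight: float
    first_seen: int   # timestamp of the first insertion
    last_seen: int    # timestamp of the most recent update


class TypedGraph:
    """Toy in-memory dynamic graph; a hypothetical stand-in, not STINGER."""

    def __init__(self):
        # adjacency: source vertex -> {(edge type, destination): Edge}
        self.adj = defaultdict(dict)

    def insert_edge(self, etype, src, dst, weight, ts):
        """Insert a new edge, or update weight and last_seen if it exists."""
        key = (etype, dst)
        edge = self.adj[src].get(key)
        if edge is None:
            self.adj[src][key] = Edge(dst, etype, weight, ts, ts)
        else:
            edge.weight = weight
            edge.last_seen = ts

    def remove_edge(self, etype, src, dst):
        """Remove an edge; return True if it was present."""
        return self.adj[src].pop((etype, dst), None) is not None

    def edges_of(self, src, etype=None, since=None):
        """Yield edges of src, optionally filtered by type and recency."""
        for edge in self.adj[src].values():
            if etype is not None and edge.etype != etype:
                continue
            if since is not None and edge.last_seen < since:
                continue
            yield edge


g = TypedGraph()
g.insert_edge("follows", 1, 2, 1.0, ts=10)
g.insert_edge("mentions", 1, 3, 1.0, ts=20)
g.insert_edge("follows", 1, 2, 2.0, ts=30)   # update, not a duplicate
recent_follows = [e.dst for e in g.edges_of(1, etype="follows", since=25)]
```

Keeping both a first-seen and a last-seen timestamp on each edge is what lets a traversal filter on a time range without a separate index.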
  • 10. As an analysis package
      • Streaming edge insertions and deletions: Performs new edge insertions, updates, and deletions in batches or individually.
      • Streaming clustering coefficients: Tracks the local and global clustering coefficients of a graph under both edge insertions and deletions.
      • Streaming connected components: Accurately tracks the connected components of a graph with insertions and deletions.
      • Streaming community detection: Track and update the community structures within the graph as they change.
      • Parallel agglomerative clustering: Find clusters that are optimized for a user-defined edge scoring function.
      • Streaming Betweenness Centrality: Find the key points within information flows and structural vulnerabilities.
      • K-core Extraction: Extract additional communities and filter noisy high-degree vertices.
      • Classic breadth-first search: Performs a parallel breadth-first search of the graph starting at a given source vertex to find shortest paths.
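To make "streaming clustering coefficients" concrete, here is a minimal serial sketch of one common way to keep local clustering coefficients current: when an undirected edge (u, v) is inserted or deleted, only the triangle counts of u, v, and their common neighbors change. The class name `StreamingTriangles` and its methods are assumptions made for this illustration; the slides do not show STINGER's own implementation, which is parallel.

```python
from collections import defaultdict


class StreamingTriangles:
    """Per-vertex triangle counts kept current under edge updates."""

    def __init__(self):
        self.nbrs = defaultdict(set)   # undirected adjacency sets
        self.tri = defaultdict(int)    # triangles through each vertex

    def _adjust(self, u, v, sign):
        # Every common neighbor w closes (or opens) one triangle u-v-w.
        common = self.nbrs[u] & self.nbrs[v]
        for w in common:
            self.tri[w] += sign
        self.tri[u] += sign * len(common)
        self.tri[v] += sign * len(common)

    def insert(self, u, v):
        if u == v or v in self.nbrs[u]:
            return                      # ignore self loops and duplicates
        self._adjust(u, v, +1)
        self.nbrs[u].add(v)
        self.nbrs[v].add(u)

    def delete(self, u, v):
        if v not in self.nbrs[u]:
            return                      # nothing to remove
        self.nbrs[u].remove(v)
        self.nbrs[v].remove(u)
        self._adjust(u, v, -1)

    def local_cc(self, v):
        """Local clustering coefficient: 2 * triangles / (deg * (deg - 1))."""
        d = len(self.nbrs[v])
        return 0.0 if d < 2 else 2.0 * self.tri[v] / (d * (d - 1))


s = StreamingTriangles()
for u, v in [(1, 2), (2, 3), (1, 3)]:
    s.insert(u, v)
cc_triangle = s.local_cc(1)   # vertex 1 sits in one closed triangle
s.delete(2, 3)
cc_after_delete = s.local_cc(1)
```

Each update costs one set intersection, so the work is proportional to the degrees of the two endpoints and does not require recounting triangles over the whole graph.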
  • 11. How is the graph stored?
  • 12. What can STINGER represent?
      • Nearly any set of relationships
          • Healthcare
          • Social Networks
          • Intelligence
          • Systems biology
          • Power grid
          • Travel networks
      • Example: Twitter
          • Users, hashtags, tweets as vertex types
          • Authorship, retweet, mentions, follows / followed by edge types
      • Example: Work Environment
          • Users, PCs, printers, emails, URLs, files, etc. as vertex types
          • Email alias, from, to, access, logon/off, print, IM, etc. as edge types
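As a rough illustration of the Twitter example, the sketch below turns one tweet into typed edges between typed vertices. The function `tweet_to_edges`, the tuple layout, and the `tagged` edge type linking a tweet to a hashtag are assumptions made for this example; the slide names the vertex types and the authorship and mentions edge types but does not specify an ingest format.

```python
import re


def tweet_to_edges(tweet_id, author, text):
    """Return (edge_type, source_vertex, destination_vertex) tuples.

    Vertices are (vertex_type, name) pairs so that users, tweets, and
    hashtags live in separate namespaces.
    """
    tweet = ("tweet", tweet_id)
    edges = [("authorship", ("user", author), tweet)]
    for name in re.findall(r"@(\w+)", text):
        edges.append(("mentions", tweet, ("user", name)))
    for tag in re.findall(r"#(\w+)", text):
        # "tagged" is a hypothetical edge type, not one listed on the slide.
        edges.append(("tagged", tweet, ("hashtag", tag.lower())))
    return edges


edges = tweet_to_edges(42, "alice", "Power is out @bob #Sandy #nyc")
```

Carrying the vertex type alongside the name keeps a user called `sandy` distinct from the hashtag `sandy`, which is the point of having vertex types at all.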
  • 13. What can STINGER do?
      • Optimized to update at rates of over 3 million edges per second on graphs of one billion edges
      • D. Ediger, R. McColl, J. Riedy, and D.A. Bader, "STINGER: High Performance Data Structure for Streaming Graphs," The IEEE High Performance Extreme Computing Conference (HPEC), Waltham, MA, September 20-22, 2012. Best Paper Award.
      • RMAT – Recursive MATrix graph generator. RMAT(N) indicates 2^N vertices.
  • 14. What can STINGER do?
      • Maintaining connected components in a graph of half a billion edges
          • Up to 1.26 million updates per sec.
          • 137x faster than recomputing.
      • Scalable parallel streaming community detection
          • Built on parallel insert / delete mechanisms.
      • Streaming approximate betweenness
          • Used to analyze influencers on Twitter during Hurricane Sandy over time.
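Why maintaining components can beat recomputing is easiest to see in the insertion-only case, sketched below with a union-find structure: each new edge touches only the two trees it joins instead of triggering a full traversal of the graph. This is a simplified stand-in, not the STINGER algorithm. The class `ComponentTracker` is hypothetical, and deletions (which the slides say STINGER also handles) are deliberately left out because a union-find alone cannot split a component.

```python
class ComponentTracker:
    """Insertion-only connected components via union-find (a sketch)."""

    def __init__(self):
        self.parent = {}
        self.count = 0            # current number of components

    def find(self, v):
        """Return the representative of v's component, adding v if new."""
        if v not in self.parent:
            self.parent[v] = v
            self.count += 1
        root = v
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every vertex on the path at the root.
        while self.parent[v] != root:
            nxt = self.parent[v]
            self.parent[v] = root
            v = nxt
        return root

    def insert_edge(self, u, v):
        """Merge the components of u and v if they differ."""
        ru, rv = self.find(u), self.find(v)
        if ru != rv:
            self.parent[ru] = rv
            self.count -= 1

    def same_component(self, u, v):
        return self.find(u) == self.find(v)


tracker = ComponentTracker()
for u, v in [(1, 2), (3, 4), (5, 6), (2, 3)]:
    tracker.insert_edge(u, v)
components_now = tracker.count    # {1, 2, 3, 4} and {5, 6}
```

Handling deletions needs extra machinery to decide whether a removed edge actually disconnects its component, which is where most of the difficulty in the streaming setting lies.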
  • 15. What does STINGER not do?
      • Does not provide all ACID properties
          • Why: Not intended to be the backing data store.
          • Why: Allows for greater ingest and processing speeds.
          • Alternative: Back STINGER ingest with an ACID DB
          • Alternative: STINGER does provide consistency, partial isolation
      • No text-based query language – for now
          • Why: Currently, no language is general enough to describe most or all queries
          • Alternative: Filtering traversal APIs, unlimited query flexibility through code
          • Alternative: Productivity language bindings (Python, Java)
      • No distributed / Hadoop-like cluster support
          • Why: Good fit for ingest, but poor for streaming analysis; random access is too slow
          • Alternative: Larger shared memory systems such as the Cray XMT and SGI UV systems
          • Alternative: Processing billion-edge graphs in shared memory on affordable Intel servers
          • Alternative: Extract key portions of the graph from a larger data store and perform fast in-memory processing in STINGER
  • 16. What sizes, performance can it handle?
      • Server (4x Opteron 6282, 256GB DDR3)
          V       E       Config   Size (GB)   Connected Components (s)   Updates per Sec.
          16M     512M    25-14    60          13.7                       696K
          16M     256M    25-14    24.6        9.82                       2.1M
      • Desktop (Intel Core i7-2600, 16GB DDR3)
          V       E       Config   Size (GB)   Connected Components (s)   Updates per Sec.
          1M      8M      22-14    1.184       0.316                      2.7M
          2M      16M     22-14    2.384       0.75                       2.3M
          4M      33M     22-14    4.768       2                          2.3M
          8M      67M     24-14    9.536       5.36                       0.85M
          4M      67M     24-14    7.984       3                          1.38M
          4M      134M    24-14    14.336      5.7                        0.8M
      • Cray XMT2 (64 processors, 2TB DDR2)
          V       E       Config   Size (GB)   Connected Components (s)   Updates per Sec.
          67M     512M    28-32    86          13.8                       3.3M
          268M    4.3B    28-32    312         52.3                       2.34M
      • The only limitation on size is system memory
      • Billions of vertices and edges are possible
      • V vertices and E edges in each graph
      • E counts are undirected
      • STINGER stores both directions
      • Config is STINGER-specific parameters
  • 17. Why not existing technologies?
      • Traditional SQL databases
          • Not structured to do any meaningful graph queries with any level of efficiency or timeliness
      • Graph databases - mostly on-disk
          • Distributed disk can keep up with storing / indexing, but is simply too slow at random graph access to process the graph as it updates
      • Hadoop and HDFS-based projects
          • Not really the right programming model for many structural queries over the entire graph; random access performance is poor
      • Smaller graph libraries, processing tools
          • Can't scale, can't process dynamic graphs, and frequently lead to impossible visualization attempts
  • 18. Who is GTRI?
      • Georgia Tech Research Institute
          • Largest research entity at Georgia Institute of Technology
          • One of the world's premier university-based applied R&D organizations for 75 years
          • Non-profit with over 1,600 employees and 21 locations world-wide
          • Over $240 million per year of government and industry contracts
      • Innovative Computing Division of the Cyber Technology and Information Security Lab
          • Dedicated to the application of practical HPC expertise and cutting-edge fundamental research to solve real-world problems
          • Experts in high-performance computing, algorithms, and big data
  • 19. How can I start using STINGER?
      • Information, code, help
          • http://cc.gatech.edu/stinger
          • robert.mccoll@gtri.gatech.edu
      • Together, GTRI and Georgia Tech can offer
          • Consulting: Understand how your organization can benefit from graph analytics.
          • Training: Learn how to use graph analysis and apply STINGER to your data.
          • Implementation: Customize and extend STINGER to suit your needs using our experts.
          • Research Expertise: Connect with researchers on the cutting edge of big data to develop novel solutions to your open problems.