Exploring Titan and Spark GraphX for
Analyzing Time-Varying Electrical Networks
Hadoop Summit 2016, Dublin, April 13th
Thomas VIAL, OCTO Technology - tvial@octo.com
Guillaume GERMAINE, EDF R&D - guillaume.germaine@edf.fr
Marie-Luce Picard, Benoit Grossin, Martin Soppé, Michel Lutz
Outline
1. CONTEXT AND PROBLEM DESCRIPTION
2. GRAPHS: KEY CONCEPTS AND TECHNICAL INSIGHTS
3. TWO CHALLENGERS: SPARK GRAPHX AND TITAN
4. QUERYING TIME-VARYING GRAPHS
5. SCALING UP ON BIG GRAPHS
6. AS A CONCLUSION
Brice Richard - Flickr
KC Tan Phoyography - Flickr
1. CONTEXT AND PROBLEM DESCRIPTION
EDF: A GLOBAL LEADER IN ELECTRICITY
Electricity generation: 623.5 TWh
2014 investments: €4.5 billion
All electricity-related activities
- Generation
- Transmission & Distribution
- Trading and Sales & Marketing
- Energy services
Key figures*
- €72.9 billion in sales
- 38.5 million customers
- 158,161 employees worldwide
- 84.7% of generation does not emit CO2
*as of 2015
Description of the French electrical network
- 2,200 distribution substations
- 618,000 kilometers of medium voltage network
- 697,000 kilometers of low voltage network
- A great diversity of components: lines, transformers, switches, line breakers, meters…
- More than 100 million components
[Diagram: the transmission network (high voltage) feeds the distribution network (medium voltage and low voltage).]
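French electrical network: more than 100 million pieces of equipment to manage
[Diagram: each of the 2,200 distribution substations feeds a tree of about 50,000 nodes, more than 200 nodes deep, ending at the meters; more than 100 million elements in total.]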
Some business cases to tackle
- BC1: Get a global picture of all powered components, at any given time
- BC2: Track the physical evolution of the network over time (new meters, reconfiguration of the network…)
- BC3: Process the electrical energy balance for any subpart of the network, over any period of time
- BC4: If a piece of equipment fails, figure out the best way to restore power supply as quickly as possible, given the state of the network at that specific time
How to model the problem?
We have to find a way to:
- define relationships between pieces of technical equipment
- associate some properties with each component
- keep track of events across time
- explore the network, with complex calculations at hand
… it’s all about graphs
2. GRAPHS: KEY CONCEPTS AND TECHNICAL INSIGHTS
A property graph as a data model
[Diagram: a property graph made of vertices and edges, with a label and key:value properties attached to its elements.]
A property graph as a data model
- All components of the network are modeled as vertices:
  • Distribution substations
  • Switches
  • Line breakers
  • Transformers
  • Loads
  • Lines
  • …
- Edges only represent symbolic, non-oriented links between components
- Electrical lines are also modeled as vertices!
- Load curves (metering data) are not stored as properties in the graph structure, but in a separate store (HBase in this study)
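The modeling choices above can be sketched in a few lines of Python. This is only an illustration, not the schema used in the study: the `Vertex` and `PropertyGraph` classes, the vertex identifiers, and the `length_km` property are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    vid: str
    label: str                      # e.g. "substation", "line", "switch", "load"
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Tiny in-memory property graph with non-oriented edges (hypothetical)."""

    def __init__(self):
        self.vertices = {}
        self.adjacency = {}

    def add_vertex(self, vid, label, **properties):
        self.vertices[vid] = Vertex(vid, label, properties)
        self.adjacency.setdefault(vid, set())

    def add_edge(self, a, b):
        # Edges are symbolic and non-oriented: store both directions.
        self.adjacency[a].add(b)
        self.adjacency[b].add(a)

    def neighbors(self, vid):
        return sorted(self.adjacency[vid])


g = PropertyGraph()
g.add_vertex("S", "substation")
g.add_vertex("L1", "line", length_km=1.2)   # a line is a vertex, not an edge
g.add_vertex("M1", "load")
g.add_edge("S", "L1")
g.add_edge("L1", "M1")
```

Because the line `L1` is itself a vertex, it can carry its own properties and events, which plain edges in this model cannot.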
Graphs and temporal data: not so simple!
- A single graph records every event that has ever occurred
- All events are recorded as vertex properties, in the form of time intervals:
  • Validity: tracks the equipment life cycle (actual period as a working unit)
  • Failure: tracks equipment failures
  • SwitchState: tracks network reconfiguration through switch state changes (open/closed)
- Each time an event occurs, a time interval is updated or created
The concept: apply graph mutations at each new event, enriching the “maximal” graph
Graphs and temporal data: not so simple!
The concept: apply graph mutations at each new event, enriching the “maximal” graph
[Diagram: a small network of Src, Line, Switch and Load vertices, shown before and after a series of events. Before: every vertex has Validity ]-∞, +∞[ and Failure [], except the switch, which has Validity [t1, +∞[, Failure [], SwitchState [] and DefaultState closed. After: one vertex has Validity ]-∞, t2], another has Failure [t3, +∞[, and the switch has SwitchState [t4, +∞[.]
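As a rough sketch of the idea, the interval properties and their mutation on each event could look like the Python below. The function names, the dictionary layout, and the numeric timestamps are assumptions for illustration; the slides do not show the actual implementation.

```python
import math

INF = math.inf


def new_vertex(valid_from=-INF):
    """Vertex properties stored as lists of (start, end) intervals."""
    return {"validity": [(valid_from, INF)], "failure": [], "switch_state": []}


def open_interval(props, key, t):
    """An event starts at time t: create a new interval, open-ended for now."""
    props[key].append((t, INF))


def close_interval(props, key, t):
    """An event ends at time t: update the last interval of the given kind."""
    start, _ = props[key][-1]
    props[key][-1] = (start, t)


line = new_vertex()
open_interval(line, "failure", 30)        # hypothetical failure starting at t=30
close_interval(line, "failure", 45)       # repaired at t=45

old_load = new_vertex()
close_interval(old_load, "validity", 20)  # decommissioned at t=20
```

Nothing is ever deleted: each event only adds or closes an interval, so the graph keeps the full history needed to answer questions about any past period.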
3. TWO CHALLENGERS: SPARK GRAPHX AND TITAN
Spark GraphX: global architecture
- Offers a graph API built on top of Spark
- Extends RDD to represent property graphs: VertexRDD and EdgeRDD
- Implements a collection of graph algorithms (e.g., PageRank, Triangle Counting, Connected Components…) and also some fundamental operations on graphs (e.g., mapVertices, mapEdges, subgraph, collectNeighbors…)
Spark GraphX: Pregel
- Implements a variant of Pregel, a very popular graph processing architecture described by Google in 2010: “Pregel: a system for large-scale graph processing”, Malewicz et al.
- Based on BSP (Bulk Synchronous Parallel)
- Vertex-centric: edges don't carry any computation
- Runs as a sequence of iterations (supersteps)
- In each superstep, every vertex can:
  • read messages sent to it during the previous superstep
  • execute a user-defined function and generate some messages
  • send messages to other vertices (to be received at the next superstep)
  • vote to halt
Spark GraphX: Pregel
[Diagram: supersteps n-1, n and n+1. In superstep n, each vertex collects the messages sent to it during superstep n-1 and merges them (mergeMsg), runs the vertex program (vprog), then either sends messages (sendMsg) to other vertices for superstep n+1 or halts (haltVertex).]
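To make the superstep loop concrete, here is a minimal single-machine sketch in Python. It is not the GraphX API: the `pregel` function and its argument layout are simplified assumptions that only borrow the `vprog`, `sendMsg` and `mergeMsg` roles, and the example computes hop counts from a source vertex on a made-up graph.

```python
def pregel(vertices, edges, initial_msg, vprog, send_msg, merge_msg, max_supersteps=50):
    """vertices: {id: state}; edges: list of (src, dst) pairs."""
    # First superstep: every vertex receives the initial message.
    state = {v: vprog(v, s, initial_msg) for v, s in vertices.items()}
    for _ in range(max_supersteps):
        inbox = {}
        for src, dst in edges:
            for target, msg in send_msg(src, state[src], dst, state[dst]):
                # Messages addressed to the same vertex are merged pairwise.
                inbox[target] = merge_msg(inbox[target], msg) if target in inbox else msg
        if not inbox:          # no message in flight: every vertex has halted
            break
        # Synchronization barrier: states change only after all messages are collected.
        for v, msg in inbox.items():
            state[v] = vprog(v, state[v], msg)
    return state


INF = float("inf")
vertices = {"S": 0, "A": INF, "B": INF, "C": INF}
edges = [("S", "A"), ("A", "B"), ("S", "C")]


def vprog(vid, dist, msg):
    return min(dist, msg)


def send_msg(src, d_src, dst, d_dst):
    # Only send when the message would improve the destination's state.
    return [(dst, d_src + 1)] if d_src + 1 < d_dst else []


def merge_msg(a, b):
    return min(a, b)


hops = pregel(vertices, edges, INF, vprog, send_msg, merge_msg)
```

A vertex that receives no message simply keeps its state, which plays the role of voting to halt in this simplified version.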
TITAN: global architecture
- Optimized for storing and querying billions of vertices and edges over a cluster
- Supports thousands of concurrent users
- Can execute local queries (OLTP) or distributed queries across a cluster (OLAP)
TITAN: GREMLIN
- Gremlin OLTP: for local graph traversals, queries run in a single process
- Gremlin Hadoop (OLAP): for large graph analysis, queries are distributed across a cluster
- Gremlin Hadoop implements BSP-based vertex-centric computing
- Gremlin is a graph traversal scripting language and is part of TinkerPop, a widely supported open-source graph computing framework (e.g., Neo4j, OrientDB, Sparksee…)
- Blueprints is like a JDBC driver for graphs
[Diagram: a stack combining a storage backend, the graph database, the TinkerPop API, and graph processing.]
4. QUERYING TIME-VARYING GRAPHS
Traversing a time-varying graph
- We have as inputs
  • A period of observation [t1, t2]
  • A particular vertex as a starting point
- We know, for each vertex in the graph, when it was inactive
  • For a piece of equipment: taken down, under maintenance, switched, …
- We want to traverse the graph, following the paths as they vary according to the states of the vertices encountered
- We devise a “Trickling” algorithm that can be easily implemented with either paradigm, Pregel or Gremlin
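Trickling down
What paths from source S can we follow between t1 and t2?
[Diagram: starting from source S with the observation window [t1, t2], the flow reaching each vertex is intersected (&) with that vertex's activity, giving (=) the actual flow that is passed on to the vertices below.]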
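The trickling step can be sketched as an interval intersection propagated from the source downwards. The Python below is an illustration under stated assumptions, not the authors' Gremlin or GraphX code: it assumes a strict tree (one parent per vertex), activity given as sorted, disjoint `(start, end)` intervals (the complement of the inactivity periods), and made-up vertex names and timestamps.

```python
def intersect(a, b):
    """Intersect two sorted lists of disjoint (start, end) intervals."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def trickle(children, activity, source, t1, t2):
    """Return {vertex: intervals during which flow from `source` reaches it}."""
    flow = {source: intersect([(t1, t2)], activity[source])}
    stack = [source]
    while stack:
        v = stack.pop()
        for child in children.get(v, []):
            flow[child] = intersect(flow[v], activity[child])
            if flow[child]:            # stop trickling where nothing flows
                stack.append(child)
    return flow


children = {"S": ["line1"], "line1": ["switch"], "switch": ["load"]}
activity = {
    "S": [(0, 100)],
    "line1": [(0, 100)],
    "switch": [(0, 40), (60, 100)],    # open (inactive) between 40 and 60
    "load": [(0, 80)],                 # removed from the network at 80
}
flow = trickle(children, activity, "S", 10, 90)
```

In a quasi-tree where a vertex can be reached through several paths, the flows arriving from each path would have to be merged (a union of intervals) rather than overwritten.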
BC1 – Get a picture of all powered components
What vertices are reachable from substation S at time t?
[Diagram: the observation window is reduced to an instant, from t to t+ε; trickling the actual flow down from S marks five vertices as CONNECTED and one as DISCONNECTED.]
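For a single instant t, the question reduces to a reachability search that refuses to cross inactive vertices. The sketch below assumes an adjacency dictionary and per-vertex inactivity intervals; both structures, the function names, and the sample network are hypothetical.

```python
from collections import deque


def is_active(inactive_intervals, t):
    """A vertex is active at t unless t falls inside one of its inactivity intervals."""
    return not any(start <= t < end for start, end in inactive_intervals)


def powered_at(adjacency, inactive, source, t):
    """Vertices reachable from `source` through vertices that are active at time t."""
    if not is_active(inactive.get(source, []), t):
        return set()
    seen, queue = {source}, deque([source])
    while queue:
        v = queue.popleft()
        for n in adjacency[v]:
            if n not in seen and is_active(inactive.get(n, []), t):
                seen.add(n)
                queue.append(n)
    return seen


adjacency = {
    "S": ["A", "B"],
    "A": ["S", "A1"],
    "A1": ["A"],
    "B": ["S", "B1"],
    "B1": ["B"],
}
inactive = {"B": [(5, 15)]}            # B is down between t=5 and t=15

during_outage = powered_at(adjacency, inactive, "S", 10)
after_outage = powered_at(adjacency, inactive, "S", 20)
```

Note that `B1` is itself healthy at t=10 but is still reported as unpowered, because its only path to `S` goes through `B`.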
BC2 – Track how long equipment is powered
For how long were vertices reachable from S actually powered between t1 and t2?
[Diagram: the actual flows are trickled down from S over the period from t1 to t2; the powered intervals of the vertices sum to 100%, 100%, 75%, 75%, 55% and 0% of the period.]
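Once the flow intervals of each vertex are known, the percentage is a simple ratio of covered time to observed time. A minimal Python sketch follows, with hypothetical vertex names and interval values chosen only for the example:

```python
def powered_share(intervals, t1, t2):
    """Fraction of [t1, t2] covered by disjoint (start, end) intervals."""
    covered = sum(
        min(end, t2) - max(start, t1)
        for start, end in intervals
        if min(end, t2) > max(start, t1)   # ignore intervals outside the window
    )
    return covered / (t2 - t1)


# Flow intervals per vertex, e.g. as produced by a trickling traversal.
flows = {"line": [(0, 100)], "load_a": [(0, 40), (65, 100)], "load_b": []}
shares = {v: powered_share(iv, 0, 100) for v, iv in flows.items()}
```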
BC3 – Manage energy balance
How much aggregated energy flowed from substation S to the leaves between t1 and t2?
[Diagram: load curves (kW) at three leaves contribute 1,132 MWh, 856 MWh and 0 MWh between t1 and t2, so ∑ loads = 1,988 MWh at S.]
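One plausible way to compute this balance is to sum each leaf's load curve over the intervals during which flow actually reached it. The sketch below assumes load curves sampled at a fixed step (here half-hour slots indexed 0, 1, 2, …) and kept outside the graph, as in the study; the function names, the sampling step, and the sample values are assumptions.

```python
def energy_kwh(load_curve, flow, step_hours):
    """Sum power samples (kW) whose slot falls inside a flow interval."""
    return sum(
        kw * step_hours
        for slot, kw in load_curve
        if any(start <= slot < end for start, end in flow)
    )


def aggregated_energy(load_curves, flows, step_hours=0.5):
    """Total energy over all leaves, counting only the powered slots."""
    return sum(
        energy_kwh(curve, flows.get(leaf, []), step_hours)
        for leaf, curve in load_curves.items()
    )


load_curves = {
    "m1": [(0, 2.0), (1, 4.0), (2, 6.0)],   # (slot index, kW)
    "m2": [(0, 1.0), (1, 1.0), (2, 1.0)],
    "m3": [(0, 9.0), (1, 9.0), (2, 9.0)],
}
flows = {"m1": [(0, 3)], "m2": [(0, 2)]}    # m3 was never reached
total = aggregated_energy(load_curves, flows)
```

A leaf that is never reached, such as `m3` here, contributes nothing to the total even though its load curve is non-zero.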
Trickling with Gremlin
Trickling with GraphX
BC4 – Restore power supply after a failure
From what other substations S’ can vertices below S be reached? → Pattern matching
[Diagram: a path pattern linking S to S’, made of SUBSTATION, LINE, SWITCH and (many things) elements and ending at another SUBSTATION.]
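A simplified version of this pattern search can be written as a breadth-first exploration that stops whenever it meets another substation. This is an illustrative Python sketch, not the Gremlin query from the talk; the labels, vertex names, and the choice to return one shortest path per substation are assumptions.

```python
from collections import deque


def backup_substations(adjacency, labels, source):
    """Map each other substation S' to one path linking it to `source`.

    Paths may cross lines, switches and other equipment, but never pass
    through a third substation.
    """
    paths = {source: [source]}
    found = {}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for n in adjacency[v]:
            if n in paths:
                continue
            paths[n] = paths[v] + [n]
            if labels[n] == "substation":
                found[n] = paths[n]     # match: record it, do not expand S'
            else:
                queue.append(n)
    return found


labels = {
    "S": "substation", "L1": "line", "SW": "switch",
    "L2": "line", "S2": "substation", "M": "load",
}
adjacency = {
    "S": ["L1"], "L1": ["S", "SW", "M"], "SW": ["L1", "L2"],
    "L2": ["SW", "S2"], "S2": ["L2"], "M": ["L1"],
}
backups = backup_substations(adjacency, labels, "S")
switches_to_operate = [v for v in backups["S2"] if labels[v] == "switch"]
```

The switches found along a matching path are the candidates to operate in order to feed the vertices below S from S’.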
Pattern matching with Gremlin
^[d]+.*$!
// God created comments, and saw that it was good
5. SCALING UP ON BIG GRAPHS
Titan at scale
- A set of connected trees adding up to ~ 100 million vertices and ~ 100 million edges, loaded on a 10-node HBase 1.1 cluster
  • The quasi-tree below each substation S contains roughly 50,000 vertices
  • The Titan table is made of 3 regions that are equally solicited
  • The client machine is on the same network as the HBase servers
- Unitary execution with Titan’s OLTP interface

Query                            Method             Approximate time
BC1 – Get powered components     Trickling          1 min
BC3 – Get aggregated load        Trickling          2 min
BC4 – Find backup substations    Pattern matching   1 min
Scaling across substations
- The execution times above are for a single source substation S
- To compute over several substations, we run the same query in parallel, each time with a different input
[Diagram: a client node running Titan + Gremlin sends many queries in parallel to the HBase cluster.]
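Running one query per substation in parallel from a single client could be organized as in the Python sketch below. `run_query` is a hypothetical stand-in for submitting one Gremlin query to Titan (it returns a dummy value here), and the worker count is an arbitrary assumption; only the fan-out pattern is the point.

```python
from concurrent.futures import ThreadPoolExecutor


def run_query(substation_id):
    """Hypothetical stand-in for one Gremlin query sent to Titan."""
    # Dummy result; a real client would return the query output here.
    return {"substation": substation_id, "id_length": len(substation_id)}


def run_for_all(substation_ids, max_workers=8):
    # Same query, different input; the calls are independent of each other.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the order of the inputs.
        return list(pool.map(run_query, substation_ids))


results = run_for_all(["S1", "S22", "S333"])
```

Threads fit this client-side pattern because each call mostly waits on the storage backend; the achievable speed-up then depends on how well the backend absorbs concurrent reads.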
Scaling across substations – Results
What about Titan OLAP?
- The execution times above are predictable but quickly become impractical for interactive querying
- We may want to compute some KPIs in advance with OLAP backends, re-using the same Gremlin queries
  • Titan 1.0.0 or 1.1.0-snapshot?
  • TinkerPop 3.0.1 or 3.1.0?
  • Titan + TinkerPop JARs or the other way around?
  • Giraph or Spark as the backend?
Be patient! Support for Hadoop 2 OLAP is still limited. Bits are being moved around between the two projects, which are undergoing a big refactoring…
Next step: GraphX
- Spark GraphX is another natural candidate for OLAP workloads
- We have yet to benchmark it against our big graph
But wait! GraphFrames for Spark is just being released by Databricks. What’s going on?
http://graphframes.github.io/
6. AS A CONCLUSION
A turning point for graph analytics
- A lot is happening these days
  • ThinkAurelius’s acquisition by DataStax (will they continue to support HBase?)
  • Titan and TinkerPop cross-refactorings
  • Improvements in the computation backends (Giraph, Spark)
  • Better support for Hadoop 2 and YARN
  • Introduction of GraphFrames by Databricks
- Graph analytics frameworks are becoming a commodity, and this is a good thing!
Which framework to use?
- A quick sketch of the typical usages of the frameworks:

Framework      Usages
Titan OLTP     Interactive querying, for a relatively small number of vertices
Titan OLAP     Batch computations, KPI (but wait for Hadoop 2 support if you’re using HBase)
Spark GraphX   Batch computations, KPI
GraphFrames    ? (it’s too early)

- Be sure to test them thoroughly in your environment before making a final decision. Be prepared to go deep into Hadoop and Java/Scala internals!
- Or maybe wait a little bit until the situation gets clearer?
Appendix
Appendix A: The ecosystem of graph-based solutions (Graph Databases)
Appendix B: The ecosystem of graph-based solutions (Graph Processing
frameworks)
Appendix C: Graph traversal
Appendix D: Architecture with Spark GraphX
Appendix E: Architecture with Titan
Appendix A: The ecosystem of graph-based solutions
Graph Databases (OLTP)
- Optimized for local graph exploration (traversal), with low latency
- Optimized for handling multiple concurrent users
- Data can be distributed across several machines
- Queries themselves are not distributed: global graph analyses are inefficient
Appendix B: The ecosystem of graph-based solutions
Graph Processing Frameworks (OLAP)
- Optimized for global graph processing (batch)
- Queries and data are distributed across a cluster, and can handle very large graphs
- Higher latency than OLTP solutions
- Cannot handle many concurrent users
Appendix C: Graph traversal
[Diagram: the same tree, rooted at Src, traversed depth-first and breadth-first, with Job 1 to Job 4 and the computation nodes marked. Labels: “Hard to parallelize!”, “Low latency”, “High latency”.]
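The two traversal orders can be contrasted with a short Python sketch. The tree and vertex names are made up; the point is only that breadth-first traversal naturally produces whole levels, each of which could be handed to parallel workers as one batch, whereas depth-first traversal visits vertices one after another along each branch.

```python
def bfs_levels(children, source):
    """Group vertices by depth; each level only depends on the previous one."""
    levels, frontier = [], [source]
    while frontier:
        levels.append(frontier)
        frontier = [c for v in frontier for c in children.get(v, [])]
    return levels


def dfs_order(children, source):
    """Visit order of an iterative depth-first traversal."""
    order, stack = [], [source]
    while stack:
        v = stack.pop()
        order.append(v)
        # Reverse so that the first child is popped (visited) first.
        stack.extend(reversed(children.get(v, [])))
    return order


tree = {"Src": ["A", "B"], "A": ["A1", "A2"], "B": ["B1"]}
levels = bfs_levels(tree, "Src")
order = dfs_order(tree, "Src")
```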
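Appendix D: Architecture with GraphX
[Diagram: inside Spark, edges, vertices, vertex states and load curves are loaded into an Edge RDD and a Vertex RDD, with intervals added as properties. GraphX processors for use case 1, use case 2, use case 3, … run on them, and their outputs are combined into the final result.]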
Appendix E: Architecture with Titan
[Diagram: inside a JVM, a library of Groovy scripts (use case 1, use case 2, …) uses the Titan APIs (transactional Gremlin) over edges, vertices and vertex states, with intervals added as properties, and the HBase API to combine load curves, which are stored as time series.]

GraphX: Graph analytics for insights about developer communities
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Graph Analytics in Spark
Graph Analytics in SparkGraph Analytics in Spark
Graph Analytics in Spark
 
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph AnalysisICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
 
Numerical inflation: simulation of observational parameters
Numerical inflation: simulation of observational parametersNumerical inflation: simulation of observational parameters
Numerical inflation: simulation of observational parameters
 
TeraGrid Communication and Computation
TeraGrid Communication and ComputationTeraGrid Communication and Computation
TeraGrid Communication and Computation
 
Data Stream Analytics - Why they are important
Data Stream Analytics - Why they are importantData Stream Analytics - Why they are important
Data Stream Analytics - Why they are important
 

Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Networks

  • 1. Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Networks Hadoop Summit 2016, Dublin, April 13th Thomas VIAL, OCTO Technology - tvial@octo.com Guillaume GERMAINE, EDF R&D - guillaume.germaine@edf.fr Marie-Luce Picard, Benoit Grossin, Martin Soppé, Michel Lutz
  • 2. 2 Outline 1. CONTEXT AND PROBLEM DESCRIPTION 2. GRAPHS: KEY CONCEPTS AND TECHNICAL INSIGHTS 3. TWO CHALLENGERS: SPARK GRAPHX AND TITAN 4. QUERYING TIME-VARYING GRAPHS 5. SCALING UP ON BIG GRAPHS 6. AS A CONCLUSION Brice Richard - Flickr KC Tan Phoyography - Flickr
  • 4. 4 EDF: A GLOBAL LEADER IN ELECTRICITY
      - All electricity-related activities: generation; transmission & distribution; trading and sales & marketing; energy services
      - Electricity generation: 623.5 TWh
      - 2014 investments: €4.5 billion
      - Key figures (as of 2015): €72.9 billion in sales; 38.5 million customers; 158,161 employees worldwide; 84.7% of generation does not emit CO2
  • 5. 5 Description of the French electrical network
      - 2200 distribution substations
      - 618 000 kilometers of medium voltage network
      - 697 000 kilometers of low voltage network
      - A great diversity of components: lines, transformers, switches, line breakers, meters…
      - More than 100 million components
      - [Diagram: high voltage (transmission network), then medium voltage and low voltage (distribution network)]
  • 6. 6 French electrical network: more than 100 million pieces of equipment to manage
      - [Diagram: 2200 distribution substations, each at the root of a tree of ~ 50000 nodes that is 200+ nodes deep and ends at the meters; 100m+ elements in total]
  • 7. 7 Some business cases to tackle
      - BC1: Get a global picture of all powered components, at any given time
      - BC2: Track the physical evolution of the network over time (new meters, reconfiguration of the network…)
      - BC3: Compute the electrical energy balance for any subpart of the network, over any period of time
      - BC4: If a piece of equipment fails, figure out the best way to restore power supply as quickly as possible, given the state of the network at that specific time
  • 8. 8 How to model the problem? We have to find a way to:
      - define relationships between pieces of technical equipment
      - associate some properties to each component
      - keep track of events across time
      - explore the network, with complex calculations at hand
      … it’s all about graphs
  • 9. 9 Outline 1. CONTEXT AND PROBLEM DESCRIPTION 2. GRAPHS: KEY CONCEPTS AND TECHNICAL INSIGHTS 3. TWO CHALLENGERS: SPARK GRAPHX AND TITAN 4. QUERYING TIME-VARYING GRAPHS 5. SCALING UP ON BIG GRAPHS 6. AS A CONCLUSION Brice Richard - Flickr KC Tan Phoyography - Flickr
  • 10. 10 A property graph as a data model
      - [Diagram: vertices connected by a labelled edge; vertices and edges carry key:value properties (key1:value, key2:value)]
  • 11. 11 A property graph as a data model
      - All components of the network are modeled as vertices: distribution substations, switches, line breakers, transformers, loads, lines, …
      - Edges only represent symbolic, non-oriented links between components
      - Electrical lines are also modeled as vertices!
      - Load curves (metering data) are not stored as properties in the graph structure, but in a separate storage (HBase in this study)
  • 12. 12 Graphs and temporal data: not so simple!
      The concept: apply graph mutations at each new event, enriching the “maximal” graph
      - A single graph records every event that has ever occurred
      - All events are recorded as vertex properties, in the form of time intervals:
          - Validity: tracks the equipment life cycle (actual period as a working unit)
          - Failure: tracks equipment failures
          - SwitchState: tracks network reconfiguration through switch state changes (open/closed)
      - Each time an event occurs, a time interval is updated or created
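The interval bookkeeping described on this slide could be sketched as follows. This is only an illustration using plain Python dictionaries in place of a real graph store: the helper names (`new_vertex`, `open_interval`, `close_interval`), the use of `math.inf` for unbounded ends, and the reading of SwitchState as "periods spent away from the default state" are assumptions, not something taken from Titan or GraphX.

```python
import math

INF = math.inf

def new_vertex(kind, valid_from=-INF):
    """Create a vertex whose events are stored as lists of (start, end) intervals."""
    return {
        "kind": kind,
        "Validity": [(valid_from, INF)],  # life cycle as a working unit
        "Failure": [],                    # failure periods
        "SwitchState": [],                # assumed: periods in the non-default state
    }

def open_interval(vertex, prop, t):
    """An event starts at time t: create a new open-ended interval."""
    vertex[prop].append((t, INF))

def close_interval(vertex, prop, t):
    """An event ends at time t: update the last interval, which must still be open."""
    start, end = vertex[prop][-1]
    if end != INF:
        raise ValueError(f"no open {prop} interval to close")
    vertex[prop][-1] = (start, t)

# Hypothetical life of one line (times are arbitrary numbers).
line = new_vertex("Line")
open_interval(line, "Failure", 3.0)    # the line fails at t = 3
close_interval(line, "Failure", 5.0)   # it is repaired at t = 5
close_interval(line, "Validity", 9.0)  # it is decommissioned at t = 9
```

Each event either creates an interval or updates the most recent one, so the graph only ever grows, which matches the idea of a “maximal” graph.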
  • 13. 13 Graphs and temporal data: not so simple!
      The concept: apply graph mutations at each new event, enriching the “maximal” graph
      - [Diagram: a small network of Src, Line, Load and Switch vertices, shown before and after a series of events]
      - Initially, every vertex has Validity : ]-∞, +∞[ and Failure : []
      - A switch added at t1 has Validity : [t1, +∞[, Failure : [], SwitchState : [], DefaultState : closed
      - After further events, one vertex has Validity : ]-∞, t2], another has Failure : [t3, +∞[, and the switch has SwitchState : [t4, +∞[
  • 14. 14 Outline 1. CONTEXT AND PROBLEM DESCRIPTION 2. GRAPHS: KEY CONCEPTS AND TECHNICAL INSIGHTS 3. TWO CHALLENGERS: SPARK GRAPHX AND TITAN 4. QUERYING TIME-VARYING GRAPHS 5. SCALING UP ON BIG GRAPHS 6. AS A CONCLUSION Brice Richard - Flickr KC Tan Phoyography - Flickr
  • 15. 15 Spark GraphX: global architecture
      - Offers a graph API built on top of Spark
      - Extends RDD to represent property graphs: VertexRDD and EdgeRDD
      - Implements a collection of graph algorithms (e.g., PageRank, Triangle Counting, Connected Components…) and also some fundamental operations on graphs (e.g., mapVertices, mapEdges, subgraph, collectNeighbors…)
  • 16. 16 Spark GraphX: Pregel
      - Implements a variant of Pregel, a very popular graph processing architecture developed at Google in 2010: “Pregel: a system for large-scale graph processing - Malewicz et al.”
      - Based on BSP (Bulk Synchronous Parallel)
      - Vertex-centric: edges don't carry any computation
      - Runs as a sequence of iterations (supersteps)
      - For each superstep, every vertex can:
          - read messages sent to it during the previous superstep
          - execute a user-defined function and generate some messages
          - send messages to other vertices (that will be received at the next superstep)
          - vote to halt
  • 17. 17 Spark GraphX: Pregel
      - [Diagram: supersteps n-1, n and n+1. In each superstep, the messages sent by other vertices (sendMsg) are collected and merged (mergeMsg), the vertex program (vprog) runs, and the vertex either sends new messages (sendMsg) or halts (haltVertex)]
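To make the superstep cycle concrete, here is a minimal single-process imitation of a vertex-centric BSP loop in plain Python. It is not the GraphX API: only the role names `vprog`, `send_msg` and `merge_msg` echo the slide, while the `pregel` function, its signature and the hop-count example are assumptions made for illustration. Edges are treated as directed in this sketch.

```python
INF = float("inf")

def pregel(vertices, edges, initial_msg, vprog, send_msg, merge_msg, max_supersteps=50):
    """vertices: dict id -> state; edges: list of (src, dst) pairs."""
    # Superstep 0: every vertex receives the initial message.
    state = {v: vprog(v, s, initial_msg) for v, s in vertices.items()}
    active = set(state)  # vertices that received a message in the last superstep
    for _ in range(max_supersteps):
        inbox = {}
        # Active vertices send messages along their outgoing edges.
        for src, dst in edges:
            if src in active:
                for target, msg in send_msg(src, state[src], dst, state[dst]):
                    inbox[target] = merge_msg(inbox[target], msg) if target in inbox else msg
        if not inbox:  # nothing in flight: every vertex has halted
            break
        # Barrier: all messages are merged before any vertex program runs.
        for v, msg in inbox.items():
            state[v] = vprog(v, state[v], msg)
        active = set(inbox)
    return state

# Example: number of hops from source "S" (illustrative graph).
def vprog(vid, dist, msg):
    return min(dist, msg)

def send_msg(src, src_dist, dst, dst_dist):
    # Only send when the message would improve the destination.
    return [(dst, src_dist + 1)] if src_dist + 1 < dst_dist else []

vertices = {"S": 0, "A": INF, "B": INF, "C": INF, "D": INF, "E": INF}
edges = [("S", "A"), ("A", "B"), ("S", "C"), ("B", "D")]
hops = pregel(vertices, edges, INF, vprog, send_msg, min)
```

The loop ends when no vertex has anything left to send, which plays the role of every vertex voting to halt.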
  • 18. 18 TITAN: global architecture
      - Optimized for storing and querying billions of vertices and edges over a cluster
      - Supports thousands of concurrent users
      - Can execute local queries (OLTP) or distributed queries across a cluster (OLAP)
  • 19. 19 TITAN: GREMLIN
      - Gremlin OLTP: for local graph traversal, queries run in a single process
      - Gremlin Hadoop (OLAP): for large graph analysis, queries are distributed across a cluster
      - Gremlin Hadoop implements BSP-based vertex-centric computing
      - Gremlin is a graph traversal scripting language and is part of TinkerPop, a widely supported open-source graph computing framework (e.g., Neo4j, OrientDB, Sparksee…)
      - Blueprints is like a JDBC driver for graphs
      - [Diagram: storage backend + graph database + TinkerPop API + graph processing]
  • 20. 20 Outline 1. CONTEXT AND PROBLEM DESCRIPTION 2. GRAPHS: KEY CONCEPTS AND TECHNICAL INSIGHTS 3. TWO CHALLENGERS: SPARK GRAPHX AND TITAN 4. QUERYING TIME-VARYING GRAPHS 5. SCALING UP ON BIG GRAPHS 6. AS A CONCLUSION Brice Richard - Flickr KC Tan Phoyography - Flickr
  • 21. 21 Traversing a time-varying graph
      - We have as inputs:
          - A period of observation [t1, t2]
          - A particular vertex as a starting point
      - We know, for each vertex in the graph, when it was inactive
          - For a piece of equipment: taken down, under maintenance, switched, …
      - We want to traverse the graph, following the paths as they vary according to the states of the vertices encountered
      - We devise a “Trickling” algorithm that can be easily implemented with either paradigm, Pregel or Gremlin
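The slides do not spell the trickling algorithm out, so the following is one possible reading rather than the authors' implementation. It assumes the network below the source is a tree (each vertex has a single parent), that inactivity is given as sorted, disjoint (start, end) intervals, and that the flow reaching a vertex is the flow reaching its parent intersected with the vertex's own active periods. All names (`intersect`, `subtract`, `trickle`) and the sample network are hypothetical.

```python
def intersect(a, b):
    """Intersect two sorted lists of disjoint (start, end) intervals."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out

def subtract(period, inactive):
    """Remove sorted, disjoint inactive intervals from one (start, end) period."""
    out, cursor = [], period[0]
    for lo, hi in inactive:
        if hi <= cursor or lo >= period[1]:
            continue  # entirely outside what is left of the period
        if lo > cursor:
            out.append((cursor, lo))
        cursor = max(cursor, hi)
    if cursor < period[1]:
        out.append((cursor, period[1]))
    return out

def trickle(children, inactive, source, period):
    """Map each vertex reached from source to the intervals during which flow gets there."""
    reach = {source: subtract(period, inactive.get(source, []))}
    stack = [source]
    while stack:
        v = stack.pop()
        for child in children.get(v, []):
            active = subtract(period, inactive.get(child, []))
            flow = intersect(reach[v], active)
            if flow:  # prune branches that are never fed
                reach[child] = flow
                stack.append(child)
    return reach

# Illustrative network: a switch is open between t=4 and t=6, meter M1 is down throughout.
children = {"S": ["L1"], "L1": ["SW", "M1"], "SW": ["M2"]}
inactive = {"SW": [(4, 6)], "M1": [(0, 10)]}
reach = trickle(children, inactive, "S", (0, 10))
```

In this example M2 sits behind the switch, so it inherits the gap between 4 and 6, while M1 is never reached.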
  • 22. 22 Trickling down: what paths from source S can we follow between t1 and t2?
      - [Diagram: at each vertex below S, VERTEX ACTIVITY & OBSERVATION = ACTUAL FLOW, over the period t1 to t2]
  • 23. 23 BC1 – Get a picture of all powered components: what vertices are reachable from substation S at time t?
      - [Diagram: trickling from S over the interval [t, t+ε]; according to the actual flow, five vertices are CONNECTED and one is DISCONNECTED]
  • 24. 24 BC2 – Track how long equipment is powered: for how long were the vertices reachable from S actually powered between t1 and t2?
      - [Diagram: per-vertex sums of actual-flow intervals over [t1, t2]: 100%, 100%, 75%, 55%, 75% and 0%]
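Once the flow intervals of a vertex are known, the percentage shown on this slide is a simple ratio. The sketch below assumes those intervals are already available as disjoint (start, end) pairs; `powered_share` and the sample values are hypothetical and only chosen to reproduce the kind of figures on the slide.

```python
def powered_share(intervals, period):
    """Fraction of the observation period covered by disjoint (start, end) intervals."""
    t1, t2 = period
    covered = 0
    for start, end in intervals:
        lo, hi = max(start, t1), min(end, t2)  # clip to the observation period
        if hi > lo:
            covered += hi - lo
    return covered / (t2 - t1)

period = (0, 20)
shares = {
    "always_on": powered_share([(0, 20)], period),
    "one_outage": powered_share([(0, 10), (15, 20)], period),
    "never_fed": powered_share([], period),
}
```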
  • 25. 25 BC3 – Manage energy balance: how much aggregated power flowed from substation S to the leaves between t1 and t2?
      - [Diagram: load curves (kW) of three leaves over [t1, t2], contributing 1,132 MWh, 856 MWh and 0 MWh; ∑ loads = 1,988 MWh]
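A possible way to combine flow intervals with load curves is sketched below. It assumes each load curve is a list of (timestamp, energy) samples fetched from the separate time-series storage, and that a sample counts only if its timestamp falls inside a powered interval of its leaf. The function names and all numbers are invented for illustration and are not the figures from the slide.

```python
def energy_delivered(load_curve, powered):
    """Sum the energy of the samples whose timestamp falls inside a powered interval."""
    return sum(e for t, e in load_curve if any(lo <= t < hi for lo, hi in powered))

def energy_balance(load_curves, powered_by_leaf):
    """Return the energy per leaf and the total below the substation."""
    per_leaf = {
        leaf: energy_delivered(curve, powered_by_leaf.get(leaf, []))
        for leaf, curve in load_curves.items()
    }
    return per_leaf, sum(per_leaf.values())

# Made-up samples: (time step, energy consumed during that step).
load_curves = {
    "M1": [(0, 5), (1, 7), (2, 6)],
    "M2": [(0, 3), (1, 4), (2, 2)],
    "M3": [(0, 9), (1, 9), (2, 9)],
}
powered_by_leaf = {"M1": [(0, 3)], "M2": [(0, 1), (2, 3)]}  # M3 is never fed from S
per_leaf, total = energy_balance(load_curves, powered_by_leaf)
```

A leaf that is never reached contributes zero, which is how a 0 MWh branch can appear next to non-zero ones in the same sum.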
  • 28. 28 BC4 – Restore power supply after a failure: from what other substations S’ can vertices below S be reached?
      - Approach: pattern matching
      - [Diagram: path pattern between S and S’ involving SUBSTATION, LINE and SWITCH vertices, with “(many things)” in between]
  • 29. 29 ^[d]+.*$! // God created comments, and saw that it was good Pattern matching with Gremlin
  • 30. 30 Outline 1. CONTEXT AND PROBLEM DESCRIPTION 2. GRAPHS: KEY CONCEPTS AND TECHNICAL INSIGHTS 3. TWO CHALLENGERS: SPARK GRAPHX AND TITAN 4. QUERYING TIME-VARYING GRAPHS 5. SCALING UP ON BIG GRAPHS 6. AS A CONCLUSION Brice Richard - Flickr KC Tan Phoyography - Flickr
  • 31. 31 Titan at scale
      - A set of connected trees adding up to ~ 100 million vertices and ~ 100 million edges, loaded on a 10-node HBase 1.1 cluster
          - The quasi-tree below each substation S contains roughly 50,000 vertices
          - The Titan table is made of 3 regions that are equally solicited
          - The client machine is on the same network as the HBase servers
      - Unitary execution with Titan’s OLTP interface:
          Business case                   Query             Approximate time
          BC1 – Get powered components    Trickling         1 min
          BC3 – Get aggregated load       Trickling         2 min
          BC4 – Find backup substations   Pattern matching  1 min
  • 32. 33 Scaling across substations
      - The execution times above are for a single source substation S
      - To compute over several substations, we run the same query in parallel with a different input
      - [Diagram: a client node running Titan + Gremlin issues many queries in parallel to the HBase cluster]
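The fan-out described on this slide can be imitated on the client side with a thread pool. In the sketch below, `run_query` is a hypothetical stand-in for submitting one Gremlin query rooted at a substation; it does not call Titan, and the worker count is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor

def run_query(substation_id):
    """Hypothetical stand-in: a real version would submit the query for this substation."""
    return {"substation": substation_id, "status": "done"}

def run_for_all(substation_ids, max_workers=8):
    """Run the same query once per substation, several at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with substation_ids.
        return list(pool.map(run_query, substation_ids))

results = run_for_all(["S1", "S2", "S3"])
```

Threads suit this pattern when the client mostly waits on the storage cluster; the achievable parallelism then depends on the cluster rather than on the client.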
  • 34. 35 What about Titan OLAP?
      - The execution times above are predictable but quickly become impractical for interactive querying
      - We may want to compute some KPIs in advance with OLAP backends, re-using the same Gremlin queries
      - Open questions: Titan 1.0.0 or 1.1.0-snapshot? TinkerPop 3.0.1 or 3.1.0? Titan + TinkerPop JARs or the other way around? Giraph or Spark as the backend?
      - Be patient! Support for Hadoop 2 OLAP is still limited. Bits are being moved around between the two projects, which are undergoing a big refactoring…
  • 35. 36 Next step: GraphX
      - Spark GraphX is another natural candidate for OLAP workloads
      - We have yet to benchmark it against our big graph
      - But wait! GraphFrames for Spark is just being released by Databricks. What’s going on? http://graphframes.github.io/
  • 36. 37 Outline 1. CONTEXT AND PROBLEM DESCRIPTION 2. GRAPHS: KEY CONCEPTS AND TECHNICAL INSIGHTS 3. TWO CHALLENGERS: SPARK GRAPHX AND TITAN 4. QUERYING TIME-VARYING GRAPHS 5. SCALING UP ON BIG GRAPHS 6. AS A CONCLUSION Brice Richard - Flickr KC Tan Phoyography - Flickr
  • 37. 38 A turning point for graph analytics
      - A lot is happening these days:
          - Aurelius’s acquisition by DataStax (will they continue to support HBase?)
          - Titan and TinkerPop cross-refactorings
          - Improvements in the computation backends (Giraph, Spark)
          - Better support for Hadoop 2 and YARN
          - Introduction of GraphFrames by Databricks
      - Graph analytics frameworks are becoming a commodity, and this is a good thing!
  • 38. 39 Which framework to use? A quick sketch of the typical usages of the frameworks:
      Framework | Usages
      Titan OLTP | Interactive querying, for a relatively small number of vertices
      Titan OLAP | Batch computations, KPI (but wait for Hadoop 2 support if you’re using HBase)
      Spark GraphX | Batch computations, KPI
      GraphFrames | ? (it’s too early)
    • Be sure to test them thoroughly in your environment before making a final decision. Be prepared to go deep into Hadoop and Java/Scala internals! • Or maybe wait a little bit until the situation gets clearer?
  • 39. 40 Appendix Appendix A: The ecosystem of graph-based solutions (Graph Databases) Appendix B: The ecosystem of graph-based solutions (Graph Processing frameworks) Appendix C: Graph traversal Appendix D: Architecture with Spark GraphX Appendix E: Architecture with Titan
  • 40. 41 Appendix A: The ecosystem of graph-based solutions. Graph Databases (OLTP) • Optimized for local graph exploration (traversal), with low latency • Optimized for handling multiple concurrent users • Data can be distributed across several machines • Queries themselves are not distributed: global graph analyses are inefficient
  • 41. 42 Appendix B: The ecosystem of graph-based solutions. Graph Processing Frameworks (OLAP) • Optimized for global graph processing (batch) • Queries and data are distributed across a cluster, and can handle very large graphs • Higher latency than OLTP solutions • Cannot handle many concurrent users
  • 42. 43 Appendix C: Graph traversal [Diagram: two trees of vertices rooted at a source vertex, spread over computation nodes and processed as Job 1 to Job 4] • Depth-first: low latency, but hard to parallelize! • Breadth-first: high latency
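The contrast between the two traversal orders can be sketched on a toy adjacency list. This is an illustrative single-machine version, not the way Titan or GraphX implement traversal; the graph and vertex names are made up. The point is that breadth-first naturally yields one batch of independent vertices per level, which is what a distributed backend can hand out to computation nodes, whereas depth-first follows one branch at a time.

```python
def depth_first(graph, source):
    """Visit vertices one at a time, following each branch to its end."""
    visited, stack, order = set(), [source], []
    while stack:
        vertex = stack.pop()
        if vertex in visited:
            continue
        visited.add(vertex)
        order.append(vertex)
        # Push children in reverse so that the first child is explored first.
        stack.extend(reversed(graph.get(vertex, [])))
    return order


def breadth_first_levels(graph, source):
    """Visit vertices level by level; each level is one batch of work."""
    visited, frontier, levels = {source}, [source], []
    while frontier:
        levels.append(frontier)
        next_frontier = []
        # Vertices within one level do not depend on each other, so a
        # distributed engine could expand them in parallel.
        for vertex in frontier:
            for child in graph.get(vertex, []):
                if child not in visited:
                    visited.add(child)
                    next_frontier.append(child)
        frontier = next_frontier
    return levels


# Made-up toy tree: a source with two branches.
toy_graph = {"src": ["a", "b"], "a": ["c", "d"], "b": ["e"]}
dfs_order = depth_first(toy_graph, "src")
bfs_levels = breadth_first_levels(toy_graph, "src")
```

On this toy tree the breadth-first version needs three rounds, one per level. On a network that is 200+ nodes deep, that means 200+ synchronized rounds, which is one way to read the "high latency" label on the slide.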
  • 43. 44 Appendix D: Architecture with GraphX [Diagram, inside Spark. Inputs: Edges, Vertices, Vertex States, Load curves. Steps: Add intervals as properties, Combine. Structures: Edge RDD, Vertex RDD. GraphX processors: Use case 1, Use case 2, Use case 3, … Output: Final result]
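The "Add intervals as properties" step suggests that each edge carries the period during which it is valid, so that the graph at a given time can be recovered by filtering. The plain-Python sketch below illustrates that idea only; it does not use Spark, and the `Edge` class, the half-open `[valid_from, valid_to)` convention and the integer timestamps are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    valid_from: int  # inclusive; hypothetical integer timestamp
    valid_to: int    # exclusive


def snapshot(edges, t):
    """Keep only the edges whose validity interval contains time t."""
    return [e for e in edges if e.valid_from <= t < e.valid_to]


def reachable(edges, source, t):
    """Vertices reachable from source using only edges valid at time t."""
    adjacency = {}
    for e in snapshot(edges, t):
        adjacency.setdefault(e.src, []).append(e.dst)
    seen, frontier = {source}, [source]
    while frontier:
        vertex = frontier.pop()
        for child in adjacency.get(vertex, []):
            if child not in seen:
                seen.add(child)
                frontier.append(child)
    return seen


# Made-up example: meter M1 is replaced by meter M2 at time 50.
edges = [
    Edge("S", "T1", 0, 100),
    Edge("T1", "M1", 0, 50),
    Edge("T1", "M2", 50, 100),
]
```

Here `reachable(edges, "S", 10)` contains M1 but not M2, and `reachable(edges, "S", 60)` contains M2 but not M1, which is the kind of "at any given time" question raised by BC1.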
  • 44. 45 Appendix E: Architecture with Titan [Diagram, inside a JVM. Inputs: Edges, Vertices, Vertex States, Load curves. Steps: Add intervals as properties, Combine load curves, Store load curves as time series. Groovy scripts library: Use case 1, Use case 2, … Interfaces: Titan APIs (Transactional, Gremlin), HBase API]
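The "Combine load curves" step can be pictured as an element-wise sum of the time series attached to a set of meters, for instance the meters found below a substation. The sketch below is a simplification under stated assumptions: curves are plain lists of readings sampled at the same instants, the dictionary layout and meter names are hypothetical, and nothing here reflects how the authors actually store the series in HBase.

```python
def combine_load_curves(curves, meter_ids):
    """Sum the load curves of the given meters, reading by reading.

    `curves` maps a meter id to a list of readings; all lists are assumed
    to be aligned on the same timestamps. Unknown meter ids are skipped.
    """
    selected = [curves[m] for m in meter_ids if m in curves]
    if not selected:
        return []
    # zip(*selected) groups the readings taken at the same instant.
    return [sum(values) for values in zip(*selected)]


# Made-up readings for two meters over three time steps.
curves = {
    "M1": [1.0, 2.0, 3.0],
    "M2": [0.5, 0.5, 0.5],
}
total = combine_load_curves(curves, ["M1", "M2"])
```

Combined with a traversal that returns the meters powered at a given time, a function of this kind would give the aggregated load asked for in BC3.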