SlideShare uma empresa Scribd logo
1 de 38
hosted by
HBaseConAsia2018
JanusGraph —
Distributed graph database with HBase
XueMin Zhang @ TalkingData
hosted by
Content
01
02
04
03
About Us
Something about Graph
Introduction to JanusGraph
JanusGraph with HBase
hosted by
Content
01
02
04
03
About Us
Something about Graph
Introduction to JanusGraph
JanusGraph with HBase
hosted by
About us
• Seven years of practical experience in technical research and development(R&D),focusing
on distributed storage, distributed computing, real-time computing, etc.
• Successively worked in Sina Weibo and TalkingData, and served as the big data Team
Leader of Sina r&d center.
• Technical speechers on the platforms of China Hadoop, Strata Hadoop/Data Conference
and DTCC.
About me
• Founded in 2011, TalkingData is China’s leading third-party big data platform. With
SmartDP as the core of its data intelligence application ecosystem, TalkingData empowers
enterprises and helps them achieve a data-driven digital transformation.
• From the beginning, TalkingData’s vision of using “big data for smarter business
decisions and a better world” has allowed it to gradually become China’s leading data
intelligence solution provider. TalkingData creates value for clients and serves as their
“performance partner,” helping modern enterprises achieve data-driven transformation
and accelerating the digitization of clients from various industries. Using data-generated
insights to change how people see the world and themselves, TalkingData hopes to
ultimately improve people’s lives.
About TalkingData
hosted by
Content
01
02
04
03
About Us
Something about Graph
Introduction to JanusGraph
JanusGraph with HBase
hosted by
Something about Graph
What is a Graph Database
 As name suggests, it is a database.
 Uses graph structures for semantic queries with nodes, edges
and properties to represent and store data.
 Allow data in the store to be linked together directly.
 compare with traditional relational databases
 Hybrid relations.
 Handy in finding connections between entities.
hosted by
Something about Graph
Graph Structures - Vertices
 Vertices are the nodes or
points in a graph
structure
 Every vertex may contain
a unique ID.
hosted by
Something about Graph
Graph Structures - Vertices
 Vertices are the nodes or
points in a graph
structure
 Every vertex may contain
a unique ID.
 Vertices can be
associated with a set of
properties (key-value
pairs)
hosted by
Something about Graph
Graph Structures - Edges
 Edges are the
connections between the
vertices in a graph
hosted by
Something about Graph
Graph Structures - Edges
 Edges are the connections
between the vertices in a
graph
 Edges can be
nondirectional, directional,
or bidirectional
hosted by
Something about Graph
Graph Structures - Edges
 Edges are the connections
between the vertices in a
graph
 Edges can be
nondirectional,
directional, or
bidirectional
 Edges like vertices can
have properties and id
hosted by
Something about Graph
Graph Structures - Graph
 G = (V, E)
 The graph is the
collection of vertices,
edges, and associated
properties
 Vertices and edges can
use label classification
hosted by
Something about Graph
Graph Storage Model - Adjacency Matrix
0 1 1 1 0 0
1 0 0 0 0 0
1 0 0 1 0 0
1 0 1 0 1 0
0 0 0 1 0 0
0 0 1 0 0 0
1 2 3 4 5 6G.vertices =
G.edges = 1
2
3
4
5
6
1 2 3 4 5 6
hosted by
Something about Graph
Graph Storage Model - Adjacency Lists
1
2
3
4
5
6
2 3 4 Λ
1 Λ
1 4 6 Λ
1 3 5 Λ
4 Λ
3 Λ
hosted by
Content
01
02
04
03
About Us
Something about Graph
Introduction to JanusGraph
JanusGraph with HBase
hosted by
Introduction to JanusGraph
 Scalable graph database distribute on multi-maching clusters with
pluggable storage and indexing.
 Fully compliant with Apache TinkerPop graph computing framework.
 Optimized for storing/querying billions of vertices and edges.
 Supports thousands of concurrent users.
 Can execute local queries (OLTP) or cross-cluster distributed queries
(OLAP).
 Sponsored by the Linux Foundation.
 Apache License 2.0
hosted by
Introduction to JanusGraph
Architecture
hosted by
Introduction to JanusGraph
Apache Tinkerpop & Gremlin
 A graph computing
framework for both graph
databases (OLTP) and
graph analytic systems
(OLAP)
 Gremlin graph traversal
language
hosted by
Introduction to JanusGraph
Schema and Data Modeling
 Consist of edge labels, property keys, vertex labels ,index
 Explicit or Implicit
 Can evolve over time without database downtime
hosted by
Introduction to JanusGraph
Schema - Edge Label Multiplicity
 MULTI: Multiple edges of the same label between vertices
 SIMPLE: One edge with that label (unique per label)
 MANY2ONE: One outgoing edge with that label
 ONE2MANY: One incoming edge with that label
 ONE2ONE: One incoming, one outgoing edge with that label
hosted by
Introduction to JanusGraph
Schema - Property Key Data Types
hosted by
Introduction to JanusGraph
Schema - Property Key Cardinality
 SINGLE: At most one value per element.
 LIST: Arbitrary number of values per element. Allows duplicates.
 SET: Multiple values, but no duplicates.
hosted by
Introduction to JanusGraph
Storage Model
hosted by
Introduction to JanusGraph
What is Graph Partitioning?
 When the JanusGraph cluster consists of multiple storage backend
instances, the graph must be partitioned across those machines.
 Stores graph in an adjacency list , ssignment of vertices to machines
determines the partitioning.
 Different ways to partition a graph
 Random Graph Partitioning
 Explicit Graph Partitioning
hosted by
Introduction to JanusGraph
Random Graph Partitioning
 Pros
 Very efficient
 Requires no configuration
 Results in balanced partitions
 Cons
 Less efficient query processing as the cluster grows
 Requires more cross-instance communication to retrieve the
desired
hosted by
Introduction to JanusGraph
Explicit Graph Partitioning
 Pros
 Ensures strongly connected subgraphs are stored on the same
instance
 Reduces the communication overhead significantly
 Easy to setup
 Cons
 Only enabled against storage backends that support ordered key
 Hotspot issue
hosted by
Introduction to JanusGraph
Edge Cut & Vertex Cut
 Edge Cut
 Vertices are hosted on separate machines.
 Optimization aims to reduce the cross communication and thereby
improve query execution.
 Vertex Cut (by label)
 A vertex label can be defined as partitioned which means that all
vertices of that label will be partitiond across the cluster.
 In other words, Storing a subset of that vertex’s adjacency list on each
partition .
 Address the hotspot issue caused by vertices with a large number of
incident edges.
hosted by
Introduction to JanusGraph
What is Graph Index?
 graph indexes : efficient retrieval of vertices or edges by their
properties
 Composite Index (supported through the primary storage
backend)
 Mixed Index (supported through external indexing backend)
 vertex-centric indexes : effectively address query performance for
large degree vertices
hosted by
Content
01
02
04
03
About Us
Something about Graph
Introduction to JanusGraph
JanusGraph with HBase
hosted by
JanusGraph with HBase
HBase – Perfect Storage Backend for JanusGraph
 Tight integration with the Apache Hadoop ecosystem.
 Native support for strong consistency.
 Linear scalability with the addition of more machines.
 Scalability and partitioning
 Read and write speed
 Big enough for your biggest graph
 Support for exporting metrics via JMX.
 Great open community
hosted by
JanusGraph with HBase
HBase – Perfect Storage Backend for JanusGraph
 Simple configuration
 storage.backend=hbase
 storage.hostname=zk-host1,zk-host2,zk-host3
 storage.hbase.table=janusgraph
 storage.port=2181
 storage.hbase.ext.zookeeper.znode.parent=/hbase
hosted by
JanusGraph with HBase
HBase – Perfect Storage Backend for JanusGraph
 A variety of reading and writing way
 Batch to mutate
 Get or Multi Get
 Key range scan
 ColumnRangeFilter
 ColumnPaginationFilter
hosted by
JanusGraph with HBase
HBase Storage Model - Column Families
CF attributes can be set. E.g. compression, TTL.
 Edge store -> e
 Index store -> g
 Id store -> i
 Transaction log store -> l
 System property store -> s
hosted by
JanusGraph with HBase
HBase Storage Model - Edge store -> e
 Storage vertex label, edge, property data
 RowKey -> Vertex ID
• Count
• ID padding
• Partition ID
 Vertex label save as edge
 Vertex property and edge save as relation
• Relation ID( Property key id / Edge label id + direction )
hosted by
JanusGraph with HBase
HBase Storage Model - Edge store -> e
hosted by
JanusGraph with HBase
HBase Storage Model - Index store -> g
 Storage graph indexes (Composite Index) data
 Rowkey -> property values
 Cell value->
• relationId
• outVertexId
• typeId
• inVertexId
hosted by
JanusGraph with HBase
Optimization Suggestions
 hbase.regionserver.thread.compaction.large/small
 hbase.hstore.flusher.count
 hbase.hregion.memstore.flush.sizeh
 base.hregion.memstore.block.multiplier
 hbase.hregion.percolumnfamilyflush.size.lower.bound
 hbase.regionserver.global.memstore.size
 hfile.block.cache.size
 hbase.regionserver.global.memstore.size.lower.limit
(hbase.regionserver.global.memstore.lowerLimit)
 Random vs. Explicit Partitioning
hosted by
Thanks

Mais conteúdo relacionado

Mais procurados

Mais procurados (20)

BigData_Chp5: Putting it all together
BigData_Chp5: Putting it all togetherBigData_Chp5: Putting it all together
BigData_Chp5: Putting it all together
 
07 big data sgbd
07 big data sgbd07 big data sgbd
07 big data sgbd
 
Chp1 - Introduction à l'Informatique Décisionnelle
Chp1 - Introduction à l'Informatique DécisionnelleChp1 - Introduction à l'Informatique Décisionnelle
Chp1 - Introduction à l'Informatique Décisionnelle
 
Introduction to Machine learning
Introduction to Machine learningIntroduction to Machine learning
Introduction to Machine learning
 
Operations: Production Readiness Review – How to stop bad things from Happening
Operations: Production Readiness Review – How to stop bad things from HappeningOperations: Production Readiness Review – How to stop bad things from Happening
Operations: Production Readiness Review – How to stop bad things from Happening
 
대용량 분산 아키텍쳐 설계 #1 아키텍쳐 설계 방법론
대용량 분산 아키텍쳐 설계 #1 아키텍쳐 설계 방법론대용량 분산 아키텍쳐 설계 #1 아키텍쳐 설계 방법론
대용량 분산 아키텍쳐 설계 #1 아키텍쳐 설계 방법론
 
Analysing Data in Real-time
Analysing Data in Real-timeAnalysing Data in Real-time
Analysing Data in Real-time
 
Presentation mantis
Presentation mantisPresentation mantis
Presentation mantis
 
[Meetup] a successful migration from elastic search to clickhouse
[Meetup] a successful migration from elastic search to clickhouse[Meetup] a successful migration from elastic search to clickhouse
[Meetup] a successful migration from elastic search to clickhouse
 
Data Modeling for MongoDB
Data Modeling for MongoDBData Modeling for MongoDB
Data Modeling for MongoDB
 
Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1
 
Scikit learn: apprentissage statistique en Python
Scikit learn: apprentissage statistique en PythonScikit learn: apprentissage statistique en Python
Scikit learn: apprentissage statistique en Python
 
Seastar / ScyllaDB, or how we implemented a 10-times faster Cassandra
Seastar / ScyllaDB,  or how we implemented a 10-times faster CassandraSeastar / ScyllaDB,  or how we implemented a 10-times faster Cassandra
Seastar / ScyllaDB, or how we implemented a 10-times faster Cassandra
 
BigData_Chp1: Introduction à la Big Data
BigData_Chp1: Introduction à la Big DataBigData_Chp1: Introduction à la Big Data
BigData_Chp1: Introduction à la Big Data
 
Polymorphisme (cours, résumé)
Polymorphisme (cours, résumé)Polymorphisme (cours, résumé)
Polymorphisme (cours, résumé)
 
[기초개념] Recurrent Neural Network (RNN) 소개
[기초개념] Recurrent Neural Network (RNN) 소개[기초개념] Recurrent Neural Network (RNN) 소개
[기초개념] Recurrent Neural Network (RNN) 소개
 
La gestion des exceptions avec Java
La gestion des exceptions avec JavaLa gestion des exceptions avec Java
La gestion des exceptions avec Java
 
Fondamentaux d’une API REST
Fondamentaux d’une API RESTFondamentaux d’une API REST
Fondamentaux d’une API REST
 
아름답고 유연한 데이터 파이프라인 구축을 위한 Amazon Managed Workflow for Apache Airflow - 유다니엘 A...
아름답고 유연한 데이터 파이프라인 구축을 위한 Amazon Managed Workflow for Apache Airflow - 유다니엘 A...아름답고 유연한 데이터 파이프라인 구축을 위한 Amazon Managed Workflow for Apache Airflow - 유다니엘 A...
아름답고 유연한 데이터 파이프라인 구축을 위한 Amazon Managed Workflow for Apache Airflow - 유다니엘 A...
 
Cours éthique et droit liés aux données numériques
Cours éthique et droit liés aux données numériquesCours éthique et droit liés aux données numériques
Cours éthique et droit liés aux données numériques
 

Semelhante a HBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBase

Martin Junghans – Gradoop: Scalable Graph Analytics with Apache Flink
Martin Junghans – Gradoop: Scalable Graph Analytics with Apache FlinkMartin Junghans – Gradoop: Scalable Graph Analytics with Apache Flink
Martin Junghans – Gradoop: Scalable Graph Analytics with Apache Flink
Flink Forward
 
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Andrey Vykhodtsev
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010
nzhang
 

Semelhante a HBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBase (20)

Oslo bekk2014
Oslo bekk2014Oslo bekk2014
Oslo bekk2014
 
Fishing Graphs in a Hadoop Data Lake
Fishing Graphs in a Hadoop Data LakeFishing Graphs in a Hadoop Data Lake
Fishing Graphs in a Hadoop Data Lake
 
SnappyData Overview Slidedeck for Big Data Bellevue
SnappyData Overview Slidedeck for Big Data Bellevue SnappyData Overview Slidedeck for Big Data Bellevue
SnappyData Overview Slidedeck for Big Data Bellevue
 
Oslo baksia2014
Oslo baksia2014Oslo baksia2014
Oslo baksia2014
 
Processing large-scale graphs with Google Pregel
Processing large-scale graphs with Google PregelProcessing large-scale graphs with Google Pregel
Processing large-scale graphs with Google Pregel
 
aRangodb, un package per l'utilizzo di ArangoDB con R
aRangodb, un package per l'utilizzo di ArangoDB con RaRangodb, un package per l'utilizzo di ArangoDB con R
aRangodb, un package per l'utilizzo di ArangoDB con R
 
Apache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-Ari
 
Martin Junghans – Gradoop: Scalable Graph Analytics with Apache Flink
Martin Junghans – Gradoop: Scalable Graph Analytics with Apache FlinkMartin Junghans – Gradoop: Scalable Graph Analytics with Apache Flink
Martin Junghans – Gradoop: Scalable Graph Analytics with Apache Flink
 
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015
 
Time series database by Harshil Ambagade
Time series database by Harshil AmbagadeTime series database by Harshil Ambagade
Time series database by Harshil Ambagade
 
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010
 
Large Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraphLarge Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraph
 
Large Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraphLarge Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraph
 
Hadoop: An Industry Perspective
Hadoop: An Industry PerspectiveHadoop: An Industry Perspective
Hadoop: An Industry Perspective
 
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
 
Front Range PHP NoSQL Databases
Front Range PHP NoSQL DatabasesFront Range PHP NoSQL Databases
Front Range PHP NoSQL Databases
 
Fishing Graphs in a Hadoop Data Lake
Fishing Graphs in a Hadoop Data Lake Fishing Graphs in a Hadoop Data Lake
Fishing Graphs in a Hadoop Data Lake
 
The ABC of Big Data
The ABC of Big DataThe ABC of Big Data
The ABC of Big Data
 
Apache Hive for modern DBAs
Apache Hive for modern DBAsApache Hive for modern DBAs
Apache Hive for modern DBAs
 

Mais de Michael Stack

Mais de Michael Stack (20)

hbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloud
hbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloudhbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloud
hbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloud
 
hbaseconasia2019 Recent work on HBase at Pinterest
hbaseconasia2019 Recent work on HBase at Pinteresthbaseconasia2019 Recent work on HBase at Pinterest
hbaseconasia2019 Recent work on HBase at Pinterest
 
hbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltd
hbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltdhbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltd
hbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltd
 
hbaseconasia2019 HBase at Didi
hbaseconasia2019 HBase at Didihbaseconasia2019 HBase at Didi
hbaseconasia2019 HBase at Didi
 
hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...
hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...
hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev...
 
hbaseconasia2019 HBase at Tencent
hbaseconasia2019 HBase at Tencenthbaseconasia2019 HBase at Tencent
hbaseconasia2019 HBase at Tencent
 
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
 
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...
 
hbaseconasia2019 Pharos as a Pluggable Secondary Index Component
hbaseconasia2019 Pharos as a Pluggable Secondary Index Componenthbaseconasia2019 Pharos as a Pluggable Secondary Index Component
hbaseconasia2019 Pharos as a Pluggable Secondary Index Component
 
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibabahbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
 
hbaseconasia2019 OpenTSDB at Xiaomi
hbaseconasia2019 OpenTSDB at Xiaomihbaseconasia2019 OpenTSDB at Xiaomi
hbaseconasia2019 OpenTSDB at Xiaomi
 
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Sparkhbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark
 
hbaseconasia2019 Test-suite for Automating Data-consistency checks on HBase
hbaseconasia2019 Test-suite for Automating Data-consistency checks on HBasehbaseconasia2019 Test-suite for Automating Data-consistency checks on HBase
hbaseconasia2019 Test-suite for Automating Data-consistency checks on HBase
 
hbaseconasia2019 Distributed Bitmap Index Solution
hbaseconasia2019 Distributed Bitmap Index Solutionhbaseconasia2019 Distributed Bitmap Index Solution
hbaseconasia2019 Distributed Bitmap Index Solution
 
hbaseconasia2019 HBase Bucket Cache on Persistent Memory
hbaseconasia2019 HBase Bucket Cache on Persistent Memoryhbaseconasia2019 HBase Bucket Cache on Persistent Memory
hbaseconasia2019 HBase Bucket Cache on Persistent Memory
 
hbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACL
hbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACLhbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACL
hbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACL
 
hbaseconasia2019 BDS: A data synchronization platform for HBase
hbaseconasia2019 BDS: A data synchronization platform for HBasehbaseconasia2019 BDS: A data synchronization platform for HBase
hbaseconasia2019 BDS: A data synchronization platform for HBase
 
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in...
 
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
 
HBaseConAsia2019 Keynote
HBaseConAsia2019 KeynoteHBaseConAsia2019 Keynote
HBaseConAsia2019 Keynote
 

Último

➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men 🔝mehsana🔝 Escorts...
➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men  🔝mehsana🔝   Escorts...➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men  🔝mehsana🔝   Escorts...
➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men 🔝mehsana🔝 Escorts...
nirzagarg
 
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
@Chandigarh #call #Girls 9053900678 @Call #Girls in @Punjab 9053900678
 
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRLLucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
imonikaupta
 
Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵
Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵
Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵
Chandigarh Call girls 9053900678 Call girls in Chandigarh
 
💚😋 Bilaspur Escort Service Call Girls, 9352852248 ₹5000 To 25K With AC💚😋
💚😋 Bilaspur Escort Service Call Girls, 9352852248 ₹5000 To 25K With AC💚😋💚😋 Bilaspur Escort Service Call Girls, 9352852248 ₹5000 To 25K With AC💚😋
💚😋 Bilaspur Escort Service Call Girls, 9352852248 ₹5000 To 25K With AC💚😋
nirzagarg
 

Último (20)

➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men 🔝mehsana🔝 Escorts...
➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men  🔝mehsana🔝   Escorts...➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men  🔝mehsana🔝   Escorts...
➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men 🔝mehsana🔝 Escorts...
 
Pune Airport ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready...
Pune Airport ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready...Pune Airport ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready...
Pune Airport ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready...
 
Al Barsha Night Partner +0567686026 Call Girls Dubai
Al Barsha Night Partner +0567686026 Call Girls  DubaiAl Barsha Night Partner +0567686026 Call Girls  Dubai
Al Barsha Night Partner +0567686026 Call Girls Dubai
 
Dubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls Dubai
Dubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls DubaiDubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls Dubai
Dubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls Dubai
 
Real Escorts in Al Nahda +971524965298 Dubai Escorts Service
Real Escorts in Al Nahda +971524965298 Dubai Escorts ServiceReal Escorts in Al Nahda +971524965298 Dubai Escorts Service
Real Escorts in Al Nahda +971524965298 Dubai Escorts Service
 
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting High Prof...
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting  High Prof...VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting  High Prof...
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting High Prof...
 
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
 
All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445
All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445
All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445
 
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRLLucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
 
APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53
 
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
 
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
 
20240508 QFM014 Elixir Reading List April 2024.pdf
20240508 QFM014 Elixir Reading List April 2024.pdf20240508 QFM014 Elixir Reading List April 2024.pdf
20240508 QFM014 Elixir Reading List April 2024.pdf
 
(INDIRA) Call Girl Pune Call Now 8250077686 Pune Escorts 24x7
(INDIRA) Call Girl Pune Call Now 8250077686 Pune Escorts 24x7(INDIRA) Call Girl Pune Call Now 8250077686 Pune Escorts 24x7
(INDIRA) Call Girl Pune Call Now 8250077686 Pune Escorts 24x7
 
Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵
Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵
Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵
 
Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...
Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...
Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...
 
Microsoft Azure Arc Customer Deck Microsoft
Microsoft Azure Arc Customer Deck MicrosoftMicrosoft Azure Arc Customer Deck Microsoft
Microsoft Azure Arc Customer Deck Microsoft
 
Nanded City ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready ...
Nanded City ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready ...Nanded City ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready ...
Nanded City ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready ...
 
💚😋 Bilaspur Escort Service Call Girls, 9352852248 ₹5000 To 25K With AC💚😋
💚😋 Bilaspur Escort Service Call Girls, 9352852248 ₹5000 To 25K With AC💚😋💚😋 Bilaspur Escort Service Call Girls, 9352852248 ₹5000 To 25K With AC💚😋
💚😋 Bilaspur Escort Service Call Girls, 9352852248 ₹5000 To 25K With AC💚😋
 
Pirangut | Call Girls Pune Phone No 8005736733 Elite Escort Service Available...
Pirangut | Call Girls Pune Phone No 8005736733 Elite Escort Service Available...Pirangut | Call Girls Pune Phone No 8005736733 Elite Escort Service Available...
Pirangut | Call Girls Pune Phone No 8005736733 Elite Escort Service Available...
 

HBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBase

  • 1. hosted by HBaseConAsia2018 JanusGraph — Distributed graph database with HBase XueMin Zhang @ TalkingData
  • 2. hosted by Content 01 02 04 03 About Us Something about Graph Introduction to JanusGraph JanusGraph with HBase
  • 3. hosted by Content 01 02 04 03 About Us Something about Graph Introduction to JanusGraph JanusGraph with HBase
  • 4. hosted by About us • Seven years of practical experience in technical research and development(R&D),focusing on distributed storage, distributed computing, real-time computing, etc. • Successively worked in Sina Weibo and TalkingData, and served as the big data Team Leader of Sina r&d center. • Technical speechers on the platforms of China Hadoop, Strata Hadoop/Data Conference and DTCC. About me • Founded in 2011, TalkingData is China’s leading third-party big data platform. With SmartDP as the core of its data intelligence application ecosystem, TalkingData empowers enterprises and helps them achieve a data-driven digital transformation. • From the beginning, TalkingData’s vision of using “big data for smarter business decisions and a better world” has allowed it to gradually become China’s leading data intelligence solution provider. TalkingData creates value for clients and serves as their “performance partner,” helping modern enterprises achieve data-driven transformation and accelerating the digitization of clients from various industries. Using data-generated insights to change how people see the world and themselves, TalkingData hopes to ultimately improve people’s lives. About TalkingData
  • 5. hosted by Content 01 02 04 03 About Us Something about Graph Introduction to JanusGraph JanusGraph with HBase
  • 6. hosted by Something about Graph What is a Graph Database  As name suggests, it is a database.  Uses graph structures for semantic queries with nodes, edges and properties to represent and store data.  Allow data in the store to be linked together directly.  compare with traditional relational databases  Hybrid relations.  Handy in finding connections between entities.
  • 7. hosted by Something about Graph Graph Structures - Vertices  Vertices are the nodes or points in a graph structure  Every vertex may contain a unique ID.
  • 8. hosted by Something about Graph Graph Structures - Vertices  Vertices are the nodes or points in a graph structure  Every vertex may contain a unique ID.  Vertices can be associated with a set of properties (key-value pairs)
  • 9. hosted by Something about Graph Graph Structures - Edges  Edges are the connections between the vertices in a graph
  • 10. hosted by Something about Graph Graph Structures - Edges  Edges are the connections between the vertices in a graph  Edges can be nondirectional, directional, or bidirectional
  • 11. hosted by Something about Graph Graph Structures - Edges  Edges are the connections between the vertices in a graph  Edges can be nondirectional, directional, or bidirectional  Edges like vertices can have properties and id
  • 12. hosted by Something about Graph Graph Structures - Graph  G = (V, E)  The graph is the collection of vertices, edges, and associated properties  Vertices and edges can use label classification
  • 13. hosted by Something about Graph Graph Storage Model - Adjacency Matrix 0 1 1 1 0 0 1 0 0 0 0 0 1 0 0 1 0 0 1 0 1 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 1 2 3 4 5 6G.vertices = G.edges = 1 2 3 4 5 6 1 2 3 4 5 6
  • 14. hosted by Something about Graph Graph Storage Model - Adjacency Lists 1 2 3 4 5 6 2 3 4 Λ 1 Λ 1 4 6 Λ 1 3 5 Λ 4 Λ 3 Λ
  • 15. hosted by Content 01 02 04 03 About Us Something about Graph Introduction to JanusGraph JanusGraph with HBase
  • 16. hosted by Introduction to JanusGraph  Scalable graph database distribute on multi-maching clusters with pluggable storage and indexing.  Fully compliant with Apache TinkerPop graph computing framework.  Optimized for storing/querying billions of vertices and edges.  Supports thousands of concurrent users.  Can execute local queries (OLTP) or cross-cluster distributed queries (OLAP).  Sponsored by the Linux Foundation.  Apache License 2.0
  • 17. hosted by Introduction to JanusGraph Architecture
  • 18. hosted by Introduction to JanusGraph Apache Tinkerpop & Gremlin  A graph computing framework for both graph databases (OLTP) and graph analytic systems (OLAP)  Gremlin graph traversal language
  • 19. hosted by Introduction to JanusGraph Schema and Data Modeling  Consist of edge labels, property keys, vertex labels ,index  Explicit or Implicit  Can evolve over time without database downtime
  • 20. hosted by Introduction to JanusGraph Schema - Edge Label Multiplicity  MULTI: Multiple edges of the same label between vertices  SIMPLE: One edge with that label (unique per label)  MANY2ONE: One outgoing edge with that label  ONE2MANY: One incoming edge with that label  ONE2ONE: One incoming, one outgoing edge with that label
  • 21. hosted by Introduction to JanusGraph Schema - Property Key Data Types
  • 22. hosted by Introduction to JanusGraph Schema - Property Key Cardinality  SINGLE: At most one value per element.  LIST: Arbitrary number of values per element. Allows duplicates.  SET: Multiple values, but no duplicates.
  • 23. hosted by Introduction to JanusGraph Storage Model
  • 24. hosted by Introduction to JanusGraph What is Graph Partitioning?  When the JanusGraph cluster consists of multiple storage backend instances, the graph must be partitioned across those machines.  Stores graph in an adjacency list , ssignment of vertices to machines determines the partitioning.  Different ways to partition a graph  Random Graph Partitioning  Explicit Graph Partitioning
  • 25. hosted by Introduction to JanusGraph Random Graph Partitioning  Pros  Very efficient  Requires no configuration  Results in balanced partitions  Cons  Less efficient query processing as the cluster grows  Requires more cross-instance communication to retrieve the desired
  • 26. hosted by Introduction to JanusGraph Explicit Graph Partitioning  Pros  Ensures strongly connected subgraphs are stored on the same instance  Reduces the communication overhead significantly  Easy to setup  Cons  Only enabled against storage backends that support ordered key  Hotspot issue
  • 27. hosted by Introduction to JanusGraph Edge Cut & Vertex Cut  Edge Cut  Vertices are hosted on separate machines.  Optimization aims to reduce the cross communication and thereby improve query execution.  Vertex Cut (by label)  A vertex label can be defined as partitioned which means that all vertices of that label will be partitiond across the cluster.  In other words, Storing a subset of that vertex’s adjacency list on each partition .  Address the hotspot issue caused by vertices with a large number of incident edges.
  • 28. hosted by Introduction to JanusGraph What is Graph Index?  graph indexes : efficient retrieval of vertices or edges by their properties  Composite Index (supported through the primary storage backend)  Mixed Index (supported through external indexing backend)  vertex-centric indexes : effectively address query performance for large degree vertices
  • 29. hosted by Content 01 02 04 03 About Us Something about Graph Introduction to JanusGraph JanusGraph with HBase
  • 30. hosted by JanusGraph with HBase HBase – Perfect Storage Backend for JanusGraph  Tight integration with the Apache Hadoop ecosystem.  Native support for strong consistency.  Linear scalability with the addition of more machines.  Scalability and partitioning  Read and write speed  Big enough for your biggest graph  Support for exporting metrics via JMX.  Great open community
  • 31. hosted by JanusGraph with HBase HBase – Perfect Storage Backend for JanusGraph  Simple configuration  storage.backend=hbase  storage.hostname=zk-host1,zk-host2,zk-host3  storage.hbase.table=janusgraph  storage.port=2181  storage.hbase.ext.zookeeper.znode.parent=/hbase
  • 32. hosted by JanusGraph with HBase HBase – Perfect Storage Backend for JanusGraph  A variety of reading and writing way  Batch to mutate  Get or Multi Get  Key range scan  ColumnRangeFilter  ColumnPaginationFilter
  • 33. hosted by JanusGraph with HBase HBase Storage Model - Column Families CF attributes can be set. E.g. compression, TTL.  Edge store -> e  Index store -> g  Id store -> i  Transaction log store -> l  System property store -> s
  • 34. hosted by JanusGraph with HBase HBase Storage Model - Edge store -> e  Storage vertex label, edge, property data  RowKey -> Vertex ID • Count • ID padding • Partition ID  Vertex label save as edge  Vertex property and edge save as relation • Relation ID( Property key id / Edge label id + direction )
  • 35. hosted by JanusGraph with HBase HBase Storage Model - Edge store -> e
  • 36. hosted by JanusGraph with HBase HBase Storage Model - Index store -> g  Storage graph indexes (Composite Index) data  Rowkey -> property values  Cell value-> • relationId • outVertexId • typeId • inVertexId
  • 37. hosted by JanusGraph with HBase Optimization Suggestions  hbase.regionserver.thread.compaction.large/small  hbase.hstore.flusher.count  hbase.hregion.memstore.flush.sizeh  base.hregion.memstore.block.multiplier  hbase.hregion.percolumnfamilyflush.size.lower.bound  hbase.regionserver.global.memstore.size  hfile.block.cache.size  hbase.regionserver.global.memstore.size.lower.limit (hbase.regionserver.global.memstore.lowerLimit)  Random vs. Explicit Partitioning