SlideShare a Scribd company logo
1 of 18
Benchmarking graph databases on the problem 
of community detection 
Sotirios Beis, Symeon Papadopoulos, Yiannis Kompatsiaris 
Centre for Research and Technology Hellas, Information Technologies Institute (CERTH-ITI) 
ADBIS 2014 Conference 
Ohrid, Republic of Macedonia, September 7-10, 2014
Important Note 
• This is an update of the actual presentation given in 
September 2014. After the presentation, we received 
comments about the implementation of our benchmark, 
which led us to update the implementation and rerun the 
experiments. We would like to thank @lgarulli and 
@OrientDB for their contribution. 
• The updated paper is now available on: 
http://mklab.iti.gr/files/beis_adbis2014_corrected.pdf 
while the original is still available on the Springer site: 
http://link.springer.com/chapter/10.1007/978-3-319-10518-5_1 
#2
#3 
Overview 
• Problem formulation 
• Related work 
• Workload description 
• Evaluation 
• Conclusions
Motivation 
• The rapid growth of Online Social Networks contributes 
to the creation of high-volume and velocity graph data 
– Twitter had 145M users in 2010 and 300M in 2011 
http://dstevenwhite.com/2011/12/29/social-media-growth-2006-2011/ 
• The need for massive graph data management and 
processing systems is constantly increasing 
– Many methods and technologies available (RDBMS, OODBMS 
and graph databases) 
• Benchmarks to evaluate candidate solutions are 
absolutely essential! 
– We had to choose one in the context of the SocialSensor 
research project 
#4
Problem Formulation 
• Given a set of database management systems assess 
the performance of each system with respect to a 
common mining task: community detection 
• Many DBMS available to use 
– Graph databases selected, as they are designed to store 
and manage efficiently big graph data 
– Benchmarked systems: Titan, OrientDB and Neo4j 
• Many operations used in databases to benchmark 
– Workloads that simulate common operations in OSNs 
employed 
#5
Related Work (1) 
• Evaluation between different DBMS 
– Giatsoglou et al. focused on the Social Tagging System use 
case scenario 
– Angles et al. and Armstrong et al. proposed a synthetic 
graph generator with OSN characteristics and use this data 
for evaluation 
– Grossniklaus et al. used a workloads that cover a wide 
variety of graph data use case 
– Vicknair et al. utilized query workload that simulates 
typical operations performed in provenance system 
#6
Related Work (2) 
• Evaluation between graph DBMS 
– Bader et al. proposed a benchmark with four operations: 
insert, h-hops node and edge selection, while Dominguez 
et al. reports the implementation. 
– A similar workload is proposed by Jouili et al. emphasizing 
the effects of increasing multiple concurrent users. 
– Ciglan et al. proposed a benchmark focusing on graph 
traversal operations. 
– Dayarathna et al. used similar workload, focusing on graph 
databases server and cloud environments. 
#7
Workloads Description (1) 
Our Contribution 
•Clustering Workload (CW) 
– Very important due to its numerous applications, such as 
topic detection, photo clustering and event detection. 
– Until now most community detection algorithms used 
main memory to store the graph and perform the 
required operations  fast execution time, unable to 
manage big graphs. 
– We propose an implementation of Louvain Method on top 
of graph databases employing cache techniques  take 
advantage of both graph database capabilities and in-memory 
execution speed. 
#8
Workload Description (2) 
Supplementary Workloads 
•Massive Insertion Workload (MIW) 
– Populates a graph with bulk load operations. 
– Simulates batch creation of a graph. 
•Single Insertion Workload (SIW) 
– Every object insertion is committed directly. 
– Simulates the growth of a OSN. 
•Query Workload (QW) 
– FindNeighbors query (FN)  Finds the neighbors of all nodes 
• simulates the friends retrieval a Facebook user 
– FindAdjacentNodes query (FA)  Finds the adjacent nodes of all edges 
• used to find whether two user joined the same Facebook group 
– FindShortestPath (FS)  Finds the shortest paths between 100 nodes 
• used to find the level two LinkedIn users are connected 
#9
Experimental Study 
• Datasets 
– Synthetic for CW, generated with LFR generator. 
– Real for MIW, SIW & QW, from SNAP dataset collection. 
• Evaluation Measure: execution time (second) 
#10
Results: Comparison wrt CW 
#11
Results: Cache size impact 
#12 
Titan 
Neo4j 
OrientDB
Results: comparison wrt MIW and QW 
#13
Results: Comparison wrt SIW 
YouTube LiveJournal 
#14
Conclusions 
SUMMARY 
• OrientDB is the most efficient solution for CW, while Titan is 
the winner in SIW 
• Neo4j is the fastest graph database for MIW and QW 
• Titan and OrientDB have competitive performance in MIW 
and QW respectively  don't scale as good as Neo4j 
FUTURE WORK 
• Test with bigger graphs. 
• Run the benchmark employing the distributed version of 
Titan and OrientDB and other graph databases (Sparksee 
already added). 
• Improve performance of the implemented community 
detection method. 
#15
References (1) 
Evaluation between different DBMS 
Giatsoglou, M., Papadopoulos, S., Vakali, A.: Massive graph management for the web 
and web 2.0. In Vakali, A., Jain, L., eds.: New Directions in Web Data 
Management 1. Volume 331 of Studies in Computational Intelligence. Springer 
Berlin Heidelberg (2011) 19-58 
Angles, R., Prat-Perez, A., Dominguez-Sal, D., Larriba-Pey, J.L.: Benchmarking database 
systems for social network applications. In: First International Workshop on 
Graph Data Management Experiences and Systems. GRADES '13, New York, NY, 
USA, ACM (2013) 15:1-15:7 
Armstrong, T.G., Ponnekanti, V., Borthakur, D., Callaghan, M.: Linkbench: a database 
benchmark based on the facebook social graph. (2013) 
Grossniklaus, M., Leone, S., Zaschke, T.: Towards a benchmark for graph data 
management and processing. (2013) 
Vicknair, C., Macias, M., Zhao, Z., Nan, X., Chen, Y., Wilkins, D.: A comparison of a 
graph database and a relational database: A data provenance perspective. In: 
Proceedings of the 48th Annual Southeast Regional Conference. ACM SE '10, 
New York, NY, USA, ACM (2010) 42:1-42:6 
#16
References (2) 
Evaluation between graph DBMS 
Bader, D.A., Feo, J., Gilbert, J., Kepner, J., Koester, D., Loh, E., Madduri, K., Mann, 
B., Meuse, T., Robinson, E.: Hpc scalable graph analysis benchmark (2009) 
Dominguez-Sal, D., Urbn-Bayes, P., Gimnez-Va, A., Gmez-Villamor, S., Martnez- 
Bazn, N., Larriba-Pey, J.: Survey of graph database performance on the hpc 
scalable graph analysis benchmark. In Shen, H., Pei, J., zsu, M., Zou, L., Lu, J., 
Ling, T.W., Yu, G., Zhuang, Y., Shao, J., eds.:Web-Age Information Management. 
Volume 6185 of Lecture Notes in Computer Science. Springer Berlin Heidelberg 
(2010) 37-48 
Ciglan, M., Averbuch, A., Hluchy, L.: Benchmarking traversal operations over graph 
databases. In: Data Engineering Workshops (ICDEW), 2012 IEEE 28th 
International Conference on. (April 2012) 186-189 
Jouili, S., Vansteenberghe, V.: An empirical comparison of graph databases. In: Social 
Computing (SocialCom), 2013 International Conference on. (Sept 2013) 708- 
715 
Dayarathna, M., Suzumura, T.: Xgdbench: A benchmarking platform for graph stores 
in exascale clouds. In: Cloud Computing Technology and Science (Cloud-Com), 
2012 IEEE 4th International Conference on. (Dec 2012) 363-370 
#17
Source: https://github.com/socialsensor/graphdb-benchmarks 
#18 
Questions 
Further contact: 
sotbeis@iti.gr / papadop@iti.gr 
Acknowledgement: 
www.socialsensor.eu

More Related Content

What's hot

Mike Sharples - Generative AI and Large Language Models in Digital Education....
Mike Sharples - Generative AI and Large Language Models in Digital Education....Mike Sharples - Generative AI and Large Language Models in Digital Education....
Mike Sharples - Generative AI and Large Language Models in Digital Education....EADTU
 
Breaking down the AI magic of ChatGPT: A technologist's lens to its powerful ...
Breaking down the AI magic of ChatGPT: A technologist's lens to its powerful ...Breaking down the AI magic of ChatGPT: A technologist's lens to its powerful ...
Breaking down the AI magic of ChatGPT: A technologist's lens to its powerful ...rahul_net
 
If You Think You Can Stay Away from Functional Programming, You Are Wrong
If You Think You Can Stay Away from Functional Programming, You Are WrongIf You Think You Can Stay Away from Functional Programming, You Are Wrong
If You Think You Can Stay Away from Functional Programming, You Are WrongMario Fusco
 
Promoting Academic Integrity in the Age of Generative AI and Contract Cheatin...
Promoting Academic Integrity in the Age of Generative AI and Contract Cheatin...Promoting Academic Integrity in the Age of Generative AI and Contract Cheatin...
Promoting Academic Integrity in the Age of Generative AI and Contract Cheatin...Thomas Lancaster
 
Chat GPT and Generative AI in Higher Education - Empowering Educators and Lea...
Chat GPT and Generative AI in Higher Education - Empowering Educators and Lea...Chat GPT and Generative AI in Higher Education - Empowering Educators and Lea...
Chat GPT and Generative AI in Higher Education - Empowering Educators and Lea...Alain Goudey
 
DASK and Apache Spark
DASK and Apache SparkDASK and Apache Spark
DASK and Apache SparkDatabricks
 
Adversarial machine learning
Adversarial machine learningAdversarial machine learning
Adversarial machine learningRama Chetan
 
An Introduction to Generative AI
An Introduction  to Generative AIAn Introduction  to Generative AI
An Introduction to Generative AICori Faklaris
 
[Machine Learning 15minutes! #61] Azure OpenAI Service
[Machine Learning 15minutes! #61] Azure OpenAI Service[Machine Learning 15minutes! #61] Azure OpenAI Service
[Machine Learning 15minutes! #61] Azure OpenAI ServiceNaoki (Neo) SATO
 
[NEW LAUNCH!] Introducing Amazon Comprehend Medical (AIM398) - AWS re:Invent ...
[NEW LAUNCH!] Introducing Amazon Comprehend Medical (AIM398) - AWS re:Invent ...[NEW LAUNCH!] Introducing Amazon Comprehend Medical (AIM398) - AWS re:Invent ...
[NEW LAUNCH!] Introducing Amazon Comprehend Medical (AIM398) - AWS re:Invent ...Amazon Web Services
 
Using Large Language Models in 10 Lines of Code
Using Large Language Models in 10 Lines of CodeUsing Large Language Models in 10 Lines of Code
Using Large Language Models in 10 Lines of CodeGautier Marti
 
Chat GPT - A Game Changer in Education
Chat GPT - A Game Changer in EducationChat GPT - A Game Changer in Education
Chat GPT - A Game Changer in EducationThiyagu K
 
Episode 2: The LLM / GPT / AI Prompt / Data Engineer Roadmap
Episode 2: The LLM / GPT / AI Prompt / Data Engineer RoadmapEpisode 2: The LLM / GPT / AI Prompt / Data Engineer Roadmap
Episode 2: The LLM / GPT / AI Prompt / Data Engineer RoadmapAnant Corporation
 
Regulating Generative AI - LLMOps pipelines with Transparency
Regulating Generative AI - LLMOps pipelines with TransparencyRegulating Generative AI - LLMOps pipelines with Transparency
Regulating Generative AI - LLMOps pipelines with TransparencyDebmalya Biswas
 

What's hot (20)

Bias in AI
Bias in AIBias in AI
Bias in AI
 
Generative AI.pptx
Generative AI.pptxGenerative AI.pptx
Generative AI.pptx
 
Template C++ OOP
Template C++ OOPTemplate C++ OOP
Template C++ OOP
 
Mike Sharples - Generative AI and Large Language Models in Digital Education....
Mike Sharples - Generative AI and Large Language Models in Digital Education....Mike Sharples - Generative AI and Large Language Models in Digital Education....
Mike Sharples - Generative AI and Large Language Models in Digital Education....
 
ChatGPT.pptx
ChatGPT.pptxChatGPT.pptx
ChatGPT.pptx
 
Breaking down the AI magic of ChatGPT: A technologist's lens to its powerful ...
Breaking down the AI magic of ChatGPT: A technologist's lens to its powerful ...Breaking down the AI magic of ChatGPT: A technologist's lens to its powerful ...
Breaking down the AI magic of ChatGPT: A technologist's lens to its powerful ...
 
If You Think You Can Stay Away from Functional Programming, You Are Wrong
If You Think You Can Stay Away from Functional Programming, You Are WrongIf You Think You Can Stay Away from Functional Programming, You Are Wrong
If You Think You Can Stay Away from Functional Programming, You Are Wrong
 
Promoting Academic Integrity in the Age of Generative AI and Contract Cheatin...
Promoting Academic Integrity in the Age of Generative AI and Contract Cheatin...Promoting Academic Integrity in the Age of Generative AI and Contract Cheatin...
Promoting Academic Integrity in the Age of Generative AI and Contract Cheatin...
 
Chat GPT and Generative AI in Higher Education - Empowering Educators and Lea...
Chat GPT and Generative AI in Higher Education - Empowering Educators and Lea...Chat GPT and Generative AI in Higher Education - Empowering Educators and Lea...
Chat GPT and Generative AI in Higher Education - Empowering Educators and Lea...
 
DASK and Apache Spark
DASK and Apache SparkDASK and Apache Spark
DASK and Apache Spark
 
07. Virtual Functions
07. Virtual Functions07. Virtual Functions
07. Virtual Functions
 
Adversarial machine learning
Adversarial machine learningAdversarial machine learning
Adversarial machine learning
 
OpenAI Chatgpt.pptx
OpenAI Chatgpt.pptxOpenAI Chatgpt.pptx
OpenAI Chatgpt.pptx
 
An Introduction to Generative AI
An Introduction  to Generative AIAn Introduction  to Generative AI
An Introduction to Generative AI
 
[Machine Learning 15minutes! #61] Azure OpenAI Service
[Machine Learning 15minutes! #61] Azure OpenAI Service[Machine Learning 15minutes! #61] Azure OpenAI Service
[Machine Learning 15minutes! #61] Azure OpenAI Service
 
[NEW LAUNCH!] Introducing Amazon Comprehend Medical (AIM398) - AWS re:Invent ...
[NEW LAUNCH!] Introducing Amazon Comprehend Medical (AIM398) - AWS re:Invent ...[NEW LAUNCH!] Introducing Amazon Comprehend Medical (AIM398) - AWS re:Invent ...
[NEW LAUNCH!] Introducing Amazon Comprehend Medical (AIM398) - AWS re:Invent ...
 
Using Large Language Models in 10 Lines of Code
Using Large Language Models in 10 Lines of CodeUsing Large Language Models in 10 Lines of Code
Using Large Language Models in 10 Lines of Code
 
Chat GPT - A Game Changer in Education
Chat GPT - A Game Changer in EducationChat GPT - A Game Changer in Education
Chat GPT - A Game Changer in Education
 
Episode 2: The LLM / GPT / AI Prompt / Data Engineer Roadmap
Episode 2: The LLM / GPT / AI Prompt / Data Engineer RoadmapEpisode 2: The LLM / GPT / AI Prompt / Data Engineer Roadmap
Episode 2: The LLM / GPT / AI Prompt / Data Engineer Roadmap
 
Regulating Generative AI - LLMOps pipelines with Transparency
Regulating Generative AI - LLMOps pipelines with TransparencyRegulating Generative AI - LLMOps pipelines with Transparency
Regulating Generative AI - LLMOps pipelines with Transparency
 

Viewers also liked

OrientDB vs Neo4j - Comparison of query/speed/functionality
OrientDB vs Neo4j - Comparison of query/speed/functionalityOrientDB vs Neo4j - Comparison of query/speed/functionality
OrientDB vs Neo4j - Comparison of query/speed/functionalityCurtis Mosters
 
OrientDB vs Neo4j - and an introduction to NoSQL databases
OrientDB vs Neo4j - and an introduction to NoSQL databasesOrientDB vs Neo4j - and an introduction to NoSQL databases
OrientDB vs Neo4j - and an introduction to NoSQL databasesCurtis Mosters
 
Neo, Titan & Cassandra
Neo, Titan & CassandraNeo, Titan & Cassandra
Neo, Titan & Cassandrajohnrjenson
 
Graph databases: Tinkerpop and Titan DB
Graph databases: Tinkerpop and Titan DBGraph databases: Tinkerpop and Titan DB
Graph databases: Tinkerpop and Titan DBMohamed Taher Alrefaie
 
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN LayersNear-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN LayersSymeon Papadopoulos
 
Addressing performance issues in titan+cassandra
Addressing performance issues in titan+cassandraAddressing performance issues in titan+cassandra
Addressing performance issues in titan+cassandraNakul Jeirath
 
OrientDB Distributed Architecture v2.0
OrientDB Distributed Architecture v2.0OrientDB Distributed Architecture v2.0
OrientDB Distributed Architecture v2.0Orient Technologies
 
Analysis on Cloud Computing Database
Analysis on Cloud Computing DatabaseAnalysis on Cloud Computing Database
Analysis on Cloud Computing DatabaseMayuree Srikulwong
 
Using Graph Databases in Real-Time to Solve Resource Authorization at Telenor...
Using Graph Databases in Real-Time to Solve Resource Authorization at Telenor...Using Graph Databases in Real-Time to Solve Resource Authorization at Telenor...
Using Graph Databases in Real-Time to Solve Resource Authorization at Telenor...Sebastian Verheughe
 
Tackling a 1 billion member social network
Tackling a 1 billion member social networkTackling a 1 billion member social network
Tackling a 1 billion member social networkArtur Bańkowski
 
OrientDB document or graph? Select the right model (old presentation)
OrientDB document or graph? Select the right model (old presentation)OrientDB document or graph? Select the right model (old presentation)
OrientDB document or graph? Select the right model (old presentation)Luca Garulli
 
OrientDB - the 2nd generation of (Multi-Model) NoSQL
OrientDB - the 2nd generation  of  (Multi-Model) NoSQLOrientDB - the 2nd generation  of  (Multi-Model) NoSQL
OrientDB - the 2nd generation of (Multi-Model) NoSQLLuigi Dell'Aquila
 
Semantic Graph Databases: The Evolution of Relational Databases
Semantic Graph Databases: The Evolution of Relational DatabasesSemantic Graph Databases: The Evolution of Relational Databases
Semantic Graph Databases: The Evolution of Relational DatabasesCambridge Semantics
 
Arda Maps - master's thesis defense - Verteidigung der Masterarbeit (German)
Arda Maps - master's thesis defense - Verteidigung der Masterarbeit (German)Arda Maps - master's thesis defense - Verteidigung der Masterarbeit (German)
Arda Maps - master's thesis defense - Verteidigung der Masterarbeit (German)Curtis Mosters
 
Performance comparison: Multi-Model vs. MongoDB and Neo4j
Performance comparison: Multi-Model vs. MongoDB and Neo4jPerformance comparison: Multi-Model vs. MongoDB and Neo4j
Performance comparison: Multi-Model vs. MongoDB and Neo4jArangoDB Database
 
OrientDB introduction - NoSQL
OrientDB introduction - NoSQLOrientDB introduction - NoSQL
OrientDB introduction - NoSQLLuca Garulli
 

Viewers also liked (20)

OrientDB vs Neo4j - Comparison of query/speed/functionality
OrientDB vs Neo4j - Comparison of query/speed/functionalityOrientDB vs Neo4j - Comparison of query/speed/functionality
OrientDB vs Neo4j - Comparison of query/speed/functionality
 
OrientDB vs Neo4j - and an introduction to NoSQL databases
OrientDB vs Neo4j - and an introduction to NoSQL databasesOrientDB vs Neo4j - and an introduction to NoSQL databases
OrientDB vs Neo4j - and an introduction to NoSQL databases
 
Neo, Titan & Cassandra
Neo, Titan & CassandraNeo, Titan & Cassandra
Neo, Titan & Cassandra
 
Graph databases: Tinkerpop and Titan DB
Graph databases: Tinkerpop and Titan DBGraph databases: Tinkerpop and Titan DB
Graph databases: Tinkerpop and Titan DB
 
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN LayersNear-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
 
Addressing performance issues in titan+cassandra
Addressing performance issues in titan+cassandraAddressing performance issues in titan+cassandra
Addressing performance issues in titan+cassandra
 
OrientDB Distributed Architecture v2.0
OrientDB Distributed Architecture v2.0OrientDB Distributed Architecture v2.0
OrientDB Distributed Architecture v2.0
 
MongoDB vs OrientDB
MongoDB vs OrientDBMongoDB vs OrientDB
MongoDB vs OrientDB
 
Analysis on Cloud Computing Database
Analysis on Cloud Computing DatabaseAnalysis on Cloud Computing Database
Analysis on Cloud Computing Database
 
La especificacion MoReq2: Modelo de Requisitos para ERMS
La especificacion MoReq2: Modelo de Requisitos para ERMSLa especificacion MoReq2: Modelo de Requisitos para ERMS
La especificacion MoReq2: Modelo de Requisitos para ERMS
 
Using Graph Databases in Real-Time to Solve Resource Authorization at Telenor...
Using Graph Databases in Real-Time to Solve Resource Authorization at Telenor...Using Graph Databases in Real-Time to Solve Resource Authorization at Telenor...
Using Graph Databases in Real-Time to Solve Resource Authorization at Telenor...
 
Tackling a 1 billion member social network
Tackling a 1 billion member social networkTackling a 1 billion member social network
Tackling a 1 billion member social network
 
OrientDB document or graph? Select the right model (old presentation)
OrientDB document or graph? Select the right model (old presentation)OrientDB document or graph? Select the right model (old presentation)
OrientDB document or graph? Select the right model (old presentation)
 
OrientDB
OrientDBOrientDB
OrientDB
 
OrientDB - the 2nd generation of (Multi-Model) NoSQL
OrientDB - the 2nd generation  of  (Multi-Model) NoSQLOrientDB - the 2nd generation  of  (Multi-Model) NoSQL
OrientDB - the 2nd generation of (Multi-Model) NoSQL
 
Semantic Graph Databases: The Evolution of Relational Databases
Semantic Graph Databases: The Evolution of Relational DatabasesSemantic Graph Databases: The Evolution of Relational Databases
Semantic Graph Databases: The Evolution of Relational Databases
 
Arda Maps - master's thesis defense - Verteidigung der Masterarbeit (German)
Arda Maps - master's thesis defense - Verteidigung der Masterarbeit (German)Arda Maps - master's thesis defense - Verteidigung der Masterarbeit (German)
Arda Maps - master's thesis defense - Verteidigung der Masterarbeit (German)
 
Graph db
Graph dbGraph db
Graph db
 
Performance comparison: Multi-Model vs. MongoDB and Neo4j
Performance comparison: Multi-Model vs. MongoDB and Neo4jPerformance comparison: Multi-Model vs. MongoDB and Neo4j
Performance comparison: Multi-Model vs. MongoDB and Neo4j
 
OrientDB introduction - NoSQL
OrientDB introduction - NoSQLOrientDB introduction - NoSQL
OrientDB introduction - NoSQL
 

Similar to Benchmarking graph databases on the problem of community detection

EarthCube Monthly Community Webinar- Nov. 22, 2013
EarthCube Monthly Community Webinar- Nov. 22, 2013EarthCube Monthly Community Webinar- Nov. 22, 2013
EarthCube Monthly Community Webinar- Nov. 22, 2013EarthCube
 
Survey on MapReduce in Big Data Clustering using Machine Learning Algorithms
Survey on MapReduce in Big Data Clustering using Machine Learning AlgorithmsSurvey on MapReduce in Big Data Clustering using Machine Learning Algorithms
Survey on MapReduce in Big Data Clustering using Machine Learning AlgorithmsIRJET Journal
 
YandongWang_Resume
YandongWang_ResumeYandongWang_Resume
YandongWang_Resumeyandong wang
 
07 data structures_and_representations
07 data structures_and_representations07 data structures_and_representations
07 data structures_and_representationsMarco Quartulli
 
MACHINE LEARNING ON MAPREDUCE FRAMEWORK
MACHINE LEARNING ON MAPREDUCE FRAMEWORKMACHINE LEARNING ON MAPREDUCE FRAMEWORK
MACHINE LEARNING ON MAPREDUCE FRAMEWORKAbhi Jit
 
Pemanfaatan Big Data Dalam Riset 2023.pptx
Pemanfaatan Big Data Dalam Riset 2023.pptxPemanfaatan Big Data Dalam Riset 2023.pptx
Pemanfaatan Big Data Dalam Riset 2023.pptxelisarosa29
 
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...miyurud
 
Camp 4-data workshop presentation
Camp 4-data workshop presentationCamp 4-data workshop presentation
Camp 4-data workshop presentationPaolo Missier
 
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...IJEACS
 
RDA, Data Citation, and PIDs for DataOne
RDA, Data Citation, and PIDs for DataOneRDA, Data Citation, and PIDs for DataOne
RDA, Data Citation, and PIDs for DataOneResearch Data Alliance
 
SSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow TutorialSSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow TutorialSSSW
 
Indexing data on the web a comparison of schema level indices for data search
Indexing data on the web a comparison of schema level indices for data searchIndexing data on the web a comparison of schema level indices for data search
Indexing data on the web a comparison of schema level indices for data searchTill Blume
 
Lecture1 introduction to big data
Lecture1 introduction to big dataLecture1 introduction to big data
Lecture1 introduction to big datahktripathy
 
DATABASE SYSTEMS PERFORMANCE EVALUATION FOR IOT APPLICATIONS
DATABASE SYSTEMS PERFORMANCE EVALUATION FOR IOT APPLICATIONSDATABASE SYSTEMS PERFORMANCE EVALUATION FOR IOT APPLICATIONS
DATABASE SYSTEMS PERFORMANCE EVALUATION FOR IOT APPLICATIONSijdms
 
MUDROD - Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Me...
MUDROD - Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Me...MUDROD - Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Me...
MUDROD - Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Me...Yongyao Jiang
 
Tracking research data footprints - slides
Tracking research data footprints - slidesTracking research data footprints - slides
Tracking research data footprints - slidesARDC
 
Map reduce advantages over parallel databases report
Map reduce advantages over parallel databases reportMap reduce advantages over parallel databases report
Map reduce advantages over parallel databases reportAhmad El Tawil
 
Classification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different FacetsClassification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different FacetsGeoffrey Fox
 

Similar to Benchmarking graph databases on the problem of community detection (20)

EarthCube Monthly Community Webinar- Nov. 22, 2013
EarthCube Monthly Community Webinar- Nov. 22, 2013EarthCube Monthly Community Webinar- Nov. 22, 2013
EarthCube Monthly Community Webinar- Nov. 22, 2013
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
 
Survey on MapReduce in Big Data Clustering using Machine Learning Algorithms
Survey on MapReduce in Big Data Clustering using Machine Learning AlgorithmsSurvey on MapReduce in Big Data Clustering using Machine Learning Algorithms
Survey on MapReduce in Big Data Clustering using Machine Learning Algorithms
 
YandongWang_Resume
YandongWang_ResumeYandongWang_Resume
YandongWang_Resume
 
07 data structures_and_representations
07 data structures_and_representations07 data structures_and_representations
07 data structures_and_representations
 
MACHINE LEARNING ON MAPREDUCE FRAMEWORK
MACHINE LEARNING ON MAPREDUCE FRAMEWORKMACHINE LEARNING ON MAPREDUCE FRAMEWORK
MACHINE LEARNING ON MAPREDUCE FRAMEWORK
 
Pemanfaatan Big Data Dalam Riset 2023.pptx
Pemanfaatan Big Data Dalam Riset 2023.pptxPemanfaatan Big Data Dalam Riset 2023.pptx
Pemanfaatan Big Data Dalam Riset 2023.pptx
 
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
 
Camp 4-data workshop presentation
Camp 4-data workshop presentationCamp 4-data workshop presentation
Camp 4-data workshop presentation
 
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
 
RDA, Data Citation, and PIDs for DataOne
RDA, Data Citation, and PIDs for DataOneRDA, Data Citation, and PIDs for DataOne
RDA, Data Citation, and PIDs for DataOne
 
SSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow TutorialSSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow Tutorial
 
Observlets
Observlets Observlets
Observlets
 
Indexing data on the web a comparison of schema level indices for data search
Indexing data on the web a comparison of schema level indices for data searchIndexing data on the web a comparison of schema level indices for data search
Indexing data on the web a comparison of schema level indices for data search
 
Lecture1 introduction to big data
Lecture1 introduction to big dataLecture1 introduction to big data
Lecture1 introduction to big data
 
DATABASE SYSTEMS PERFORMANCE EVALUATION FOR IOT APPLICATIONS
DATABASE SYSTEMS PERFORMANCE EVALUATION FOR IOT APPLICATIONSDATABASE SYSTEMS PERFORMANCE EVALUATION FOR IOT APPLICATIONS
DATABASE SYSTEMS PERFORMANCE EVALUATION FOR IOT APPLICATIONS
 
MUDROD - Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Me...
MUDROD - Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Me...MUDROD - Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Me...
MUDROD - Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Me...
 
Tracking research data footprints - slides
Tracking research data footprints - slidesTracking research data footprints - slides
Tracking research data footprints - slides
 
Map reduce advantages over parallel databases report
Map reduce advantages over parallel databases reportMap reduce advantages over parallel databases report
Map reduce advantages over parallel databases report
 
Classification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different FacetsClassification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different Facets
 

More from Symeon Papadopoulos

DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...
DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...
DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...Symeon Papadopoulos
 
Deepfakes: An Emerging Internet Threat and their Detection
Deepfakes: An Emerging Internet Threat and their DetectionDeepfakes: An Emerging Internet Threat and their Detection
Deepfakes: An Emerging Internet Threat and their DetectionSymeon Papadopoulos
 
Knowledge-based Fusion for Image Tampering Localization
Knowledge-based Fusion for Image Tampering LocalizationKnowledge-based Fusion for Image Tampering Localization
Knowledge-based Fusion for Image Tampering LocalizationSymeon Papadopoulos
 
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...Deepfake Detection: The Importance of Training Data Preprocessing and Practic...
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...Symeon Papadopoulos
 
COVID-19 Infodemic vs Contact Tracing
COVID-19 Infodemic vs Contact TracingCOVID-19 Infodemic vs Contact Tracing
COVID-19 Infodemic vs Contact TracingSymeon Papadopoulos
 
Similarity-based retrieval of multimedia content
Similarity-based retrieval of multimedia contentSimilarity-based retrieval of multimedia content
Similarity-based retrieval of multimedia contentSymeon Papadopoulos
 
Twitter-based Sensing of City-level Air Quality
Twitter-based Sensing of City-level Air QualityTwitter-based Sensing of City-level Air Quality
Twitter-based Sensing of City-level Air QualitySymeon Papadopoulos
 
Aggregating and Analyzing the Context of Social Media Content
Aggregating and Analyzing the Context of Social Media ContentAggregating and Analyzing the Context of Social Media Content
Aggregating and Analyzing the Context of Social Media ContentSymeon Papadopoulos
 
Verifying Multimedia Content on the Internet
Verifying Multimedia Content on the InternetVerifying Multimedia Content on the Internet
Verifying Multimedia Content on the InternetSymeon Papadopoulos
 
A Web-based Service for Image Tampering Detection
A Web-based Service for Image Tampering DetectionA Web-based Service for Image Tampering Detection
A Web-based Service for Image Tampering DetectionSymeon Papadopoulos
 
Learning to detect Misleading Content on Twitter
Learning to detect Misleading Content on TwitterLearning to detect Misleading Content on Twitter
Learning to detect Misleading Content on TwitterSymeon Papadopoulos
 
Verifying Multimedia Use at MediaEval 2016
Verifying Multimedia Use at MediaEval 2016Verifying Multimedia Use at MediaEval 2016
Verifying Multimedia Use at MediaEval 2016Symeon Papadopoulos
 
Placing Images with Refined Language Models and Similarity Search with PCA-re...
Placing Images with Refined Language Models and Similarity Search with PCA-re...Placing Images with Refined Language Models and Similarity Search with PCA-re...
Placing Images with Refined Language Models and Similarity Search with PCA-re...Symeon Papadopoulos
 
In-depth Exploration of Geotagging Performance
In-depth Exploration of Geotagging PerformanceIn-depth Exploration of Geotagging Performance
In-depth Exploration of Geotagging PerformanceSymeon Papadopoulos
 
Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...Symeon Papadopoulos
 
Web and Social Media Image Forensics for News Professionals
Web and Social Media Image Forensics for News ProfessionalsWeb and Social Media Image Forensics for News Professionals
Web and Social Media Image Forensics for News ProfessionalsSymeon Papadopoulos
 
Predicting News Popularity by Mining Online Discussions
Predicting News Popularity by Mining Online DiscussionsPredicting News Popularity by Mining Online Discussions
Predicting News Popularity by Mining Online DiscussionsSymeon Papadopoulos
 
Finding Diverse Social Images at MediaEval 2015
Finding Diverse Social Images at MediaEval 2015Finding Diverse Social Images at MediaEval 2015
Finding Diverse Social Images at MediaEval 2015Symeon Papadopoulos
 
CERTH/CEA LIST at MediaEval Placing Task 2015
CERTH/CEA LIST at MediaEval Placing Task 2015CERTH/CEA LIST at MediaEval Placing Task 2015
CERTH/CEA LIST at MediaEval Placing Task 2015Symeon Papadopoulos
 

More from Symeon Papadopoulos (20)

DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...
DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...
DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...
 
Deepfakes: An Emerging Internet Threat and their Detection
Deepfakes: An Emerging Internet Threat and their DetectionDeepfakes: An Emerging Internet Threat and their Detection
Deepfakes: An Emerging Internet Threat and their Detection
 
Knowledge-based Fusion for Image Tampering Localization
Knowledge-based Fusion for Image Tampering LocalizationKnowledge-based Fusion for Image Tampering Localization
Knowledge-based Fusion for Image Tampering Localization
 
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...Deepfake Detection: The Importance of Training Data Preprocessing and Practic...
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...
 
COVID-19 Infodemic vs Contact Tracing
COVID-19 Infodemic vs Contact TracingCOVID-19 Infodemic vs Contact Tracing
COVID-19 Infodemic vs Contact Tracing
 
Similarity-based retrieval of multimedia content
Similarity-based retrieval of multimedia contentSimilarity-based retrieval of multimedia content
Similarity-based retrieval of multimedia content
 
Twitter-based Sensing of City-level Air Quality
Twitter-based Sensing of City-level Air QualityTwitter-based Sensing of City-level Air Quality
Twitter-based Sensing of City-level Air Quality
 
Aggregating and Analyzing the Context of Social Media Content
Aggregating and Analyzing the Context of Social Media ContentAggregating and Analyzing the Context of Social Media Content
Aggregating and Analyzing the Context of Social Media Content
 
Verifying Multimedia Content on the Internet
Verifying Multimedia Content on the InternetVerifying Multimedia Content on the Internet
Verifying Multimedia Content on the Internet
 
A Web-based Service for Image Tampering Detection
A Web-based Service for Image Tampering DetectionA Web-based Service for Image Tampering Detection
A Web-based Service for Image Tampering Detection
 
Learning to detect Misleading Content on Twitter
Learning to detect Misleading Content on TwitterLearning to detect Misleading Content on Twitter
Learning to detect Misleading Content on Twitter
 
Verifying Multimedia Use at MediaEval 2016
Verifying Multimedia Use at MediaEval 2016Verifying Multimedia Use at MediaEval 2016
Verifying Multimedia Use at MediaEval 2016
 
Multimedia Privacy
Multimedia PrivacyMultimedia Privacy
Multimedia Privacy
 
Placing Images with Refined Language Models and Similarity Search with PCA-re...
Placing Images with Refined Language Models and Similarity Search with PCA-re...Placing Images with Refined Language Models and Similarity Search with PCA-re...
Placing Images with Refined Language Models and Similarity Search with PCA-re...
 
In-depth Exploration of Geotagging Performance
In-depth Exploration of Geotagging PerformanceIn-depth Exploration of Geotagging Performance
In-depth Exploration of Geotagging Performance
 
Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...
 
Web and Social Media Image Forensics for News Professionals
Web and Social Media Image Forensics for News ProfessionalsWeb and Social Media Image Forensics for News Professionals
Web and Social Media Image Forensics for News Professionals
 
Predicting News Popularity by Mining Online Discussions
Predicting News Popularity by Mining Online DiscussionsPredicting News Popularity by Mining Online Discussions
Predicting News Popularity by Mining Online Discussions
 
Finding Diverse Social Images at MediaEval 2015
Finding Diverse Social Images at MediaEval 2015Finding Diverse Social Images at MediaEval 2015
Finding Diverse Social Images at MediaEval 2015
 
CERTH/CEA LIST at MediaEval Placing Task 2015
CERTH/CEA LIST at MediaEval Placing Task 2015CERTH/CEA LIST at MediaEval Placing Task 2015
CERTH/CEA LIST at MediaEval Placing Task 2015
 

Recently uploaded

[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfOverkill Security
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 

Recently uploaded (20)

[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 

Benchmarking graph databases on the problem of community detection

  • 1. Benchmarking graph databases on the problem of community detection Sotirios Beis, Symeon Papadopoulos, Yiannis Kompatsiaris Centre for Research and Technology Hellas, Information Technologies Institute (CERTH-ITI) ADBIS 2014 Conference Ohrid, Republic of Macedonia, September 7-10, 2014
  • 2. Important Note • This is an update of the actual presentation given in September 2014. After the presentation, we received comments about the implementation of our benchmark, which led us to update the implementation and rerun the experiments. We would like to thank @lgarulli and @OrientDB for their contribution. • The updated paper is now available on: http://mklab.iti.gr/files/beis_adbis2014_corrected.pdf while the original is still available on the Springer site: http://link.springer.com/chapter/10.1007/978-3-319-10518-5_1 #2
  • 3. #3 Overview • Problem formulation • Related work • Workload description • Evaluation • Conclusions
  • 4. Motivation • The rapid growth of Online Social Networks contributes to the creation of high-volume and velocity graph data – Twitter had 145M users in 2010 and 300M in 2011 http://dstevenwhite.com/2011/12/29/social-media-growth-2006-2011/ • The need for massive graph data management and processing systems is constantly increasing – Many methods and technologies available (RDBMS, OODBMS and graph databases) • Benchmarks to evaluate candidate solutions are absolutely essential! – We had to choose one in the context of the SocialSensor research project #4
  • 5. Problem Formulation • Given a set of database management systems assess the performance of each system with respect to a common mining task: community detection • Many DBMS available to use – Graph databases selected, as they are designed to store and manage efficiently big graph data – Benchmarked systems: Titan, OrientDB and Neo4j • Many operations used in databases to benchmark – Workloads that simulate common operations in OSNs employed #5
  • 6. Related Work (1) • Evaluation between different DBMS – Giatsoglou et al. focused on the Social Tagging System use case scenario – Angles et al. and Armstrong et al. proposed a synthetic graph generator with OSN characteristics and use this data for evaluation – Grossniklaus et al. used a workloads that cover a wide variety of graph data use case – Vicknair et al. utilized query workload that simulates typical operations performed in provenance system #6
  • 7. Related Work (2) • Evaluation between graph DBMS – Bader et al. proposed a benchmark with four operations: insert, h-hops node and edge selection, while Dominguez et al. reports the implementation. – A similar workload is proposed by Jouili et al. emphasizing the effects of increasing multiple concurrent users. – Ciglan et al. proposed a benchmark focusing on graph traversal operations. – Dayarathna et al. used similar workload, focusing on graph databases server and cloud environments. #7
  • 8. Workloads Description (1) Our Contribution •Clustering Workload (CW) – Very important due to its numerous applications, such as topic detection, photo clustering and event detection. – Until now most community detection algorithms used main memory to store the graph and perform the required operations  fast execution time, unable to manage big graphs. – We propose an implementation of Louvain Method on top of graph databases employing cache techniques  take advantage of both graph database capabilities and in-memory execution speed. #8
  • 9. Workload Description (2) Supplementary Workloads •Massive Insertion Workload (MIW) – Populates a graph with bulk load operations. – Simulates batch creation of a graph. •Single Insertion Workload (SIW) – Every object insertion is committed directly. – Simulates the growth of a OSN. •Query Workload (QW) – FindNeighbors query (FN)  Finds the neighbors of all nodes • simulates the friends retrieval a Facebook user – FindAdjacentNodes query (FA)  Finds the adjacent nodes of all edges • used to find whether two user joined the same Facebook group – FindShortestPath (FS)  Finds the shortest paths between 100 nodes • used to find the level two LinkedIn users are connected #9
  • 10. Experimental Study • Datasets – Synthetic for CW, generated with LFR generator. – Real for MIW, SIW & QW, from SNAP dataset collection. • Evaluation Measure: execution time (second) #10
  • 12. Results: Cache size impact #12 Titan Neo4j OrientDB
  • 13. Results: comparison wrt MIW and QW #13
  • 14. Results: Comparison wrt SIW YouTube LiveJournal #14
  • 15. Conclusions SUMMARY • OrientDB is the most efficient solution for CW, while Titan is the winner in SIW • Neo4j is the fastest graph database for MIW and QW • Titan and OrientDB have competitive performance in MIW and QW respectively  don't scale as good as Neo4j FUTURE WORK • Test with bigger graphs. • Run the benchmark employing the distributed version of Titan and OrientDB and other graph databases (Sparksee already added). • Improve performance of the implemented community detection method. #15
  • 16. References (1) Evaluation between different DBMS Giatsoglou, M., Papadopoulos, S., Vakali, A.: Massive graph management for the web and web 2.0. In Vakali, A., Jain, L., eds.: New Directions in Web Data Management 1. Volume 331 of Studies in Computational Intelligence. Springer Berlin Heidelberg (2011) 19-58 Angles, R., Prat-Perez, A., Dominguez-Sal, D., Larriba-Pey, J.L.: Benchmarking database systems for social network applications. In: First International Workshop on Graph Data Management Experiences and Systems. GRADES '13, New York, NY, USA, ACM (2013) 15:1-15:7 Armstrong, T.G., Ponnekanti, V., Borthakur, D., Callaghan, M.: Linkbench: a database benchmark based on the facebook social graph. (2013) Grossniklaus, M., Leone, S., Zaschke, T.: Towards a benchmark for graph data management and processing. (2013) Vicknair, C., Macias, M., Zhao, Z., Nan, X., Chen, Y., Wilkins, D.: A comparison of a graph database and a relational database: A data provenance perspective. In: Proceedings of the 48th Annual Southeast Regional Conference. ACM SE '10, New York, NY, USA, ACM (2010) 42:1-42:6 #16
  • 17. References (2) Evaluation between graph DBMS Bader, D.A., Feo, J., Gilbert, J., Kepner, J., Koester, D., Loh, E., Madduri, K., Mann, B., Meuse, T., Robinson, E.: Hpc scalable graph analysis benchmark (2009) Dominguez-Sal, D., Urbn-Bayes, P., Gimnez-Va, A., Gmez-Villamor, S., Martnez- Bazn, N., Larriba-Pey, J.: Survey of graph database performance on the hpc scalable graph analysis benchmark. In Shen, H., Pei, J., zsu, M., Zou, L., Lu, J., Ling, T.W., Yu, G., Zhuang, Y., Shao, J., eds.:Web-Age Information Management. Volume 6185 of Lecture Notes in Computer Science. Springer Berlin Heidelberg (2010) 37-48 Ciglan, M., Averbuch, A., Hluchy, L.: Benchmarking traversal operations over graph databases. In: Data Engineering Workshops (ICDEW), 2012 IEEE 28th International Conference on. (April 2012) 186-189 Jouili, S., Vansteenberghe, V.: An empirical comparison of graph databases. In: Social Computing (SocialCom), 2013 International Conference on. (Sept 2013) 708- 715 Dayarathna, M., Suzumura, T.: Xgdbench: A benchmarking platform for graph stores in exascale clouds. In: Cloud Computing Technology and Science (Cloud-Com), 2012 IEEE 4th International Conference on. (Dec 2012) 363-370 #17
  • 18. Source: https://github.com/socialsensor/graphdb-benchmarks #18 Questions Further contact: sotbeis@iti.gr / papadop@iti.gr Acknowledgement: www.socialsensor.eu