Efficient Processing of Large Data Sets
Part II
Tristan Schneider

January 9, 2014


Contents
Introduction
  Social Graph
  Problems and Motivation
Approaches
  TAO
  Horton
  Pregel
  Trinity
  Unicorn
Conclusion
  Comparison
  Future Work

Social Graph

Consists of nodes and edges
Describes entities and their relations
Used by Facebook, Google, Amazon, etc.
On the order of 100+ million nodes and 10+ billion edges


Problems and Motivation

the amount of data exceeds the capacity of a single machine
data and computation must be distributed
data access is managed by a framework
requirements differ (latency vs. throughput)


TAO

developed by Facebook
read optimized
fixed set of queries
Strength: low-latency access


TAO: Data Model

data identified by 64-bit integer
Objects (id) → (otype, (key → value)*)
Associations (id1, atype, id2) → (time, (key → value)*)
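
The two mappings above can be written down as plain record types. This is only an illustrative sketch: the class names, field names and sample values are assumptions made for the example, not TAO's actual types.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class TaoObject:
    """(id) -> (otype, (key -> value)*)"""
    id: int                      # 64-bit integer identifier
    otype: str
    data: Dict[str, str] = field(default_factory=dict)


@dataclass
class TaoAssociation:
    """(id1, atype, id2) -> (time, (key -> value)*)"""
    id1: int
    atype: str
    id2: int
    time: int
    data: Dict[str, str] = field(default_factory=dict)

    def key(self) -> Tuple[int, str, int]:
        # The triple that identifies an association
        return (self.id1, self.atype, self.id2)


# Hypothetical sample data
alice = TaoObject(id=1, otype="user", data={"name": "Alice"})
page = TaoObject(id=2, otype="page", data={"title": "Computer Science"})
likes = TaoAssociation(id1=alice.id, atype="likes", id2=page.id, time=1389225600)
```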


TAO: API

fixed set of queries
assoc_add, assoc_delete, assoc_change_type
assoc_get, assoc_count, assoc_range, assoc_time_range
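
To make the query set concrete, here is a toy in-memory version of most of these calls. It is a sketch under assumptions: the class `AssocStore`, the simplified signatures and the newest-first ordering by time are chosen for illustration and do not reproduce the real service.

```python
from collections import defaultdict


class AssocStore:
    """Toy association lists kept in memory (illustrative only)."""

    def __init__(self):
        # (id1, atype) -> {id2: (time, data)}
        self._lists = defaultdict(dict)

    def assoc_add(self, id1, atype, id2, time, data=None):
        self._lists[(id1, atype)][id2] = (time, data or {})

    def assoc_delete(self, id1, atype, id2):
        self._lists[(id1, atype)].pop(id2, None)

    def assoc_get(self, id1, atype, id2set):
        lst = self._lists[(id1, atype)]
        return {id2: lst[id2] for id2 in id2set if id2 in lst}

    def assoc_count(self, id1, atype):
        return len(self._lists[(id1, atype)])

    def _newest_first(self, id1, atype):
        # Assumption: association lists are ordered by time, newest first
        items = self._lists[(id1, atype)].items()
        return sorted(items, key=lambda kv: kv[1][0], reverse=True)

    def assoc_range(self, id1, atype, pos, limit):
        return [id2 for id2, _ in self._newest_first(id1, atype)[pos:pos + limit]]

    def assoc_time_range(self, id1, atype, high, low, limit):
        hits = [id2 for id2, (t, _) in self._newest_first(id1, atype)
                if low <= t <= high]
        return hits[:limit]


store = AssocStore()
store.assoc_add(1, "likes", 10, time=100)
store.assoc_add(1, "likes", 11, time=200)
store.assoc_add(1, "likes", 12, time=300)
```

With this data, `store.assoc_range(1, "likes", 0, 2)` returns the two most recent targets, `[12, 11]`.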


TAO: Architecture

data is divided into shards (via hashing)
each server handles one or more shards
objects and their associations are stored in the same shard
an object never changes its shard
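
A minimal sketch of these placement rules follows. The shard count, the server names and the use of a simple modulo as the hash are assumptions for the example; placing an association with its source object `id1` is one way to keep objects and their associations together.

```python
NUM_SHARDS = 8                           # hypothetical shard count
SERVERS = ["srv-a", "srv-b", "srv-c"]    # hypothetical server names


def shard_of(object_id: int) -> int:
    """Depends only on the id, so an object never changes its shard.
    A modulo stands in for the hash function here."""
    return object_id % NUM_SHARDS


def shard_of_assoc(id1: int, atype: str, id2: int) -> int:
    """An association is placed on the shard of its source object id1."""
    return shard_of(id1)


def server_of(shard: int) -> str:
    """Each server handles one or more shards."""
    return SERVERS[shard % len(SERVERS)]
```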


TAO: Architecture

servers are divided into leaders and followers
clients always communicate with followers
cache misses and writes are redirected to the leader
slave servers support master servers if necessary
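
The read and write paths described above can be sketched with two small classes. The names `Leader` and `Follower` and the dictionary-backed storage are assumptions for illustration; cache invalidation between followers is deliberately left out.

```python
class Leader:
    """Holds the authoritative copy (stands in for leader cache plus database)."""

    def __init__(self):
        self.store = {}
        self.reads = 0          # counts how often followers had to ask

    def read(self, key):
        self.reads += 1
        return self.store.get(key)

    def write(self, key, value):
        self.store[key] = value


class Follower:
    """Client-facing cache; misses and writes are forwarded to the leader."""

    def __init__(self, leader):
        self.leader = leader
        self.cache = {}

    def get(self, key):
        if key not in self.cache:                 # cache miss: ask the leader
            self.cache[key] = self.leader.read(key)
        return self.cache[key]

    def put(self, key, value):
        self.leader.write(key, value)             # writes go through the leader
        self.cache[key] = value                   # refresh the local copy
```

A client that writes through one follower and reads through another causes exactly one leader read; repeated reads are then served from the follower's cache.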


TAO: Architecture: Scheme


TAO: Fault Tolerance and Performance

efficiency and availability take priority over consistency
servers that are down are marked globally
followers are interchangeable
a slave database is promoted to master if the master crashes


TAO: Fault Tolerance and Performance

Figure: Write Access Latencies
https://www.facebook.com/download/273893712748848/atc13-bronson.pdf

Horton

query language execution engine
written in C#
Strength: interactive queries with low latency


Horton: Data Model

similar to TAO
divided into partitions
additional data can be attached (e.g. key-value pairs)
directed edges are stored at both source and target


Horton: API

Horton query language
queries are initiated via the client library


Horton: Architecture

Graph Client Library: translates the query into a regular expression
Graph Coordinator: translates the regular expression into a finite state machine and finds the most effective execution plan
Graph Partitions: execute the finite state machine and traverse the graph
Graph Manager: provides an interface to administer the graph
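
The idea of running a finite state machine while traversing the graph can be sketched in a single process. Everything here is an assumption made for the example (the toy graph, the edge labels, the path expression `friend+ likes` and the function `run_query`); the distribution over partitions and the query planning are not modelled.

```python
from collections import deque

# Hypothetical toy graph: node -> list of (edge_label, target)
GRAPH = {
    "alice": [("friend", "bob"), ("friend", "carol")],
    "bob":   [("likes", "page1")],
    "carol": [("likes", "page2"), ("friend", "dave")],
    "dave":  [("likes", "page3")],
}

# Finite state machine for the path expression "friend+ likes":
# state -> {edge_label: next_state}
FSM = {0: {"friend": 1}, 1: {"friend": 1, "likes": 2}, 2: {}}
ACCEPT = {2}


def run_query(graph, fsm, accept, start):
    """Breadth-first traversal over (node, state) pairs.
    A node is a result when it is reached in an accepting state."""
    results = set()
    seen = {(start, 0)}
    queue = deque([(start, 0)])
    while queue:
        node, state = queue.popleft()
        if state in accept:
            results.add(node)
        for label, target in graph.get(node, []):
            nxt = fsm[state].get(label)
            if nxt is not None and (target, nxt) not in seen:
                seen.add((target, nxt))
                queue.append((target, nxt))
    return results
```

Starting at `"alice"`, the query follows one or more `friend` edges and then one `likes` edge, so it reaches the pages liked by friends and by friends of friends.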


Pregel

C++ based
computation consists of parallel iterations
communication via message passing
Strength: high throughput (for analysis)


Pregel: Data Model

graph divided into partitions
partition assignment based on node id (hash(id) mod n)
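
The assignment rule hash(id) mod n is short enough to show directly. This sketch assumes integer vertex ids and uses Python's built-in `hash`; the helper `assign` is hypothetical and only groups ids by partition.

```python
from collections import defaultdict


def partition_of(vertex_id: int, num_partitions: int) -> int:
    """hash(id) mod n; for small non-negative ints Python's hash is the int itself."""
    return hash(vertex_id) % num_partitions


def assign(vertex_ids, num_partitions):
    """Group vertex ids by the partition they fall into."""
    parts = defaultdict(list)
    for v in vertex_ids:
        parts[partition_of(v, num_partitions)].append(v)
    return dict(parts)
```

Because the rule depends only on the id and n, any machine can compute where a vertex lives without a lookup table.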


Pregel: API

implementation of a Vertex class (task)
define methods like Compute(...), SendMessageTo(...)
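
As a rough illustration of this programming model, the following sketch mimics the vertex interface in Python and runs it with a sequential loop. It is not the C++ API: the snake_case method names, the `MaxValueVertex` task and the `run` driver are assumptions for the example, and real execution is parallel and distributed.

```python
class Vertex:
    """Base class; a task subclasses it and overrides compute()."""

    def __init__(self, vertex_id, value, neighbors):
        self.id, self.value, self.neighbors = vertex_id, value, neighbors
        self.active = True
        self.outbox = []                      # (target_id, message) pairs

    def send_message_to(self, target_id, message):
        self.outbox.append((target_id, message))

    def vote_to_halt(self):
        self.active = False

    def compute(self, messages, superstep):
        raise NotImplementedError


class MaxValueVertex(Vertex):
    """Propagates the largest value seen so far to all neighbours."""

    def compute(self, messages, superstep):
        new_value = max([self.value, *messages])
        if superstep == 0 or new_value > self.value:
            self.value = new_value
            for n in self.neighbors:
                self.send_message_to(n, self.value)
        else:
            self.vote_to_halt()


def run(vertices):
    """Sequential stand-in for the superstep loop; returns the superstep count."""
    inbox = {v.id: [] for v in vertices}
    superstep = 0
    while True:
        for v in vertices:
            if inbox[v.id]:
                v.active = True               # a message reactivates a halted vertex
        todo = [v for v in vertices if v.active]
        if not todo:
            return superstep
        for v in todo:
            v.compute(inbox[v.id], superstep)
        # Deliver messages only after all vertices finished this superstep
        inbox = {v.id: [] for v in vertices}
        for v in vertices:
            for target, msg in v.outbox:
                inbox[target].append(msg)
            v.outbox = []
        superstep += 1
```

Running it on a three-vertex chain with values 3, 6 and 2 leaves every vertex holding 6 once all vertices have voted to halt.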


Pregel: Architecture

runs on a cluster management system
uses distributed storage (e.g. GFS or Bigtable)


Pregel: Basic Work Flow

1. copy task to worker machines, one is promoted to master
2. master assigns one or more partitions to each worker
3. master invokes supersteps
4. save graph after computation


Pregel: Fault Tolerance and Performance

workers save their progress at checkpoint supersteps
worker failure is detected using pings
partitions of failed workers are reassigned to available workers
the state of the most recent available checkpoint superstep is reloaded
the job terminates if the master fails


Pregel: Fault Tolerance and Performance

Figure: Varying number of workers on a 1-billion-vertex binary tree
http://kowshik.github.io/JPregel/pregel_paper.pdf

Trinity
developed by Microsoft
flexible in data and computation
supports online query processing and offline computation
runs on top of a well-connected cluster (memory cloud)
based on TFS (similar to HDFS)
Strength: low latency and high throughput (not at the same time)


Trinity: Data Model

key-value store
one table for nodes
one table for each type of relation
relations represented by id-pairs in the specific table
customisation possible with Trinity Structure Language (TSL)
data backed up in persistent file system
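
The table layout described above (one node table, one table of id pairs per relation type) can be sketched as follows. The class `GraphTables` and its methods are hypothetical and only illustrate the layout; TSL and the persistent backup are not modelled.

```python
from collections import defaultdict


class GraphTables:
    """One node table plus one table of (id1, id2) pairs per relation type."""

    def __init__(self):
        self.nodes = {}                       # node id -> attribute dict
        self.relations = defaultdict(set)     # relation type -> {(id1, id2), ...}

    def add_node(self, node_id, **attrs):
        self.nodes[node_id] = attrs

    def add_relation(self, rel_type, id1, id2):
        self.relations[rel_type].add((id1, id2))

    def neighbors(self, rel_type, id1):
        # Scan the relation table for pairs that start at id1
        return sorted(b for a, b in self.relations[rel_type] if a == id1)


g = GraphTables()
g.add_node(1, name="Alice")
g.add_node(2, name="Bob")
g.add_node(3, name="Carol")
g.add_relation("friend", 1, 2)
g.add_relation("friend", 1, 3)
g.add_relation("follows", 2, 1)
```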


Trinity: API

Trinity Desktop Environment (TDE)
supports query requests (similar to Horton/SQL)
supports offline computation (similar to Pregel)


Trinity: Architecture

Slaves: store a part of the data, process tasks and messages.
Proxies: optional middle tier between slaves and clients. Handle messages, do not store data.
Clients: responsible for user interaction with the cluster.


Trinity: Architecture

Figure: Trinity Cluster Structure
https://research.microsoft.com/pubs/161291/trinity.pdf

Trinity: Fault Tolerance and Performance

no ACID support, but individual operations are atomic
dead machines are replaced by live ones, which reload their memory from TFS
a requesting machine waits until the dead machine is replaced
recovery restores the state of the most recent checkpoint superstep (similar to Pregel)


Trinity: Fault Tolerance and Performance

Figure: Response time of subgraph match queries
https://research.microsoft.com/pubs/161291/trinity.pdf


Unicorn

in-memory social graph-aware indexing system
search backend of Facebook
based on Hadoop
Strength: typeahead; good performance on complex queries


Unicorn: Data Model

sharded data (similar to Facebook's TAO)
indices built and converted using custom Hadoop pipeline


Unicorn: API

Queries in Unicorn Query Language
e.g. (term likers:104076956295773)
≈ 6M likers of "Computer Science"
apply: queries a (truncated) set of ids and then uses those to construct a new query
extract: attaches matches as metadata within the forward index of the query set
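
The `term` and `apply` operators can be imitated on a tiny inverted index. This is a sketch under assumptions: the index contents, the `friend:` posting lists and the helper `or_` are invented for the example, and ranking and `extract` are left out.

```python
# Hypothetical inverted index: term -> sorted posting list of ids
INDEX = {
    "likers:104076956295773": [1, 2, 3],
    "friend:1": [2, 7],
    "friend:2": [1, 8],
    "friend:3": [8, 9],
}


def term(t):
    """Look up the posting list of a single term."""
    return INDEX.get(t, [])


def or_(*lists):
    """Union of posting lists, returned as a sorted list of ids."""
    return sorted(set().union(*lists))


def apply(prefix, inner, limit=None):
    """Run the inner query, optionally truncate its result, then OR together
    the terms built as prefix + id for each resulting id."""
    ids = inner[:limit] if limit is not None else inner
    return or_(*(term(f"{prefix}{i}") for i in ids))
```

Here `apply("friend:", term("likers:104076956295773"))` plays the role of a friends-of-likers query; passing `limit=2` shows the effect of truncating the inner result first.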


Unicorn: Architecture

top-aggregator: dispatches the query to one rack-aggregator of each rack, combines and returns the results
rack-aggregator: forwards the query to all index servers of its rack (high bandwidth), combines the results
index server: about 40-80 machines per rack; stores adjacency lists, performs operations
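
The three tiers form a scatter-gather tree, which the following single-process sketch imitates. All names and data are assumptions for the example: an index server is modelled as a dictionary of posting lists, `None` marks an unreachable server, and skipping such servers is an illustrative choice rather than documented behaviour.

```python
def index_server(shard, query):
    """Return this shard's ids for the query; raise if the server is unreachable."""
    if shard is None:
        raise ConnectionError("index server down")
    return shard.get(query, [])


def rack_aggregator(rack, query):
    """Forward the query to every index server of the rack and combine the results."""
    results = []
    for shard in rack:
        try:
            results.extend(index_server(shard, query))
        except ConnectionError:
            continue      # assumption: skip the failed server, keep partial results
    return results


def top_aggregator(racks, query):
    """Dispatch to one rack-aggregator per rack, merge and return the result."""
    merged = set()
    for rack in racks:
        merged.update(rack_aggregator(rack, query))
    return sorted(merged)


# Hypothetical cluster: two racks with two index servers each, one of them down
RACKS = [
    [{"q": [1, 2]}, {"q": [3]}],
    [{"q": [2, 4]}, None],
]
```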


Unicorn: Fault Tolerance and Performance

sharding and replication
automatically replacing machines
serving incomplete results is strongly preferable to serving empty results


Unicorn: Fault Tolerance and Performance
(apply friend: likers:104076956295773) ≈ friends of likers of "Computer Science"

https://www.facebook.com/download/138915572976390/UnicornVLDB-final.pdf

Comparison

Framework   Query language   Low latency   High throughput
TAO         no               yes           no
Horton      yes              yes           no
Pregel      no               no            yes
Trinity     yes              yes           yes
Unicorn     yes              yes           no

Future Work

query language vs. a fixed set of queries
an all-in-one framework is difficult (Trinity is the best attempt)


Thank you for your attention.

Questions?
Sources
1. https://research.microsoft.com/pubs/161291/trinity.pdf
2. http://research.microsoft.com/pubs/162643/icde12_demo_679.pdf
3. http://kowshik.github.io/JPregel/pregel_paper.pdf
4. https://www.facebook.com/download/273893712748848/atc13-bronson.pdf
5. https://www.facebook.com/download/138915572976390/UnicornVLDB-final.pdf

Effziente Verarbeitung von grossen Datenmengen Teil II

Conclusion

Mais conteúdo relacionado

Mais procurados

Scalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data StreamsScalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data StreamsAntonio Severien
 
Scaling Application on High Performance Computing Clusters and Analysis of th...
Scaling Application on High Performance Computing Clusters and Analysis of th...Scaling Application on High Performance Computing Clusters and Analysis of th...
Scaling Application on High Performance Computing Clusters and Analysis of th...Rusif Eyvazli
 
Event-Driven, Client-Server Archetypes for E-Commerce
Event-Driven, Client-Server Archetypes for E-CommerceEvent-Driven, Client-Server Archetypes for E-Commerce
Event-Driven, Client-Server Archetypes for E-Commerceijtsrd
 
REPLICATION STRATEGY BASED ON DATA RELATIONSHIP IN GRID COMPUTING
REPLICATION STRATEGY BASED ON DATA RELATIONSHIP IN GRID COMPUTINGREPLICATION STRATEGY BASED ON DATA RELATIONSHIP IN GRID COMPUTING
REPLICATION STRATEGY BASED ON DATA RELATIONSHIP IN GRID COMPUTINGcsandit
 
Ijircce publish this paper
Ijircce publish this paperIjircce publish this paper
Ijircce publish this paperSANTOSH WAYAL
 
Berlin Hadoop Get Together Apache Drill
Berlin Hadoop Get Together Apache Drill Berlin Hadoop Get Together Apache Drill
Berlin Hadoop Get Together Apache Drill MapR Technologies
 
Final Report_798 Project_Nithin_Sharmila
Final Report_798 Project_Nithin_SharmilaFinal Report_798 Project_Nithin_Sharmila
Final Report_798 Project_Nithin_SharmilaNithin Kakkireni
 
Data mining model for the data retrieval from central server configuration
Data mining model for the data retrieval from central server configurationData mining model for the data retrieval from central server configuration
Data mining model for the data retrieval from central server configurationijcsit
 
Fota Delta Size Reduction Using FIle Similarity Algorithms
Fota Delta Size Reduction Using FIle Similarity AlgorithmsFota Delta Size Reduction Using FIle Similarity Algorithms
Fota Delta Size Reduction Using FIle Similarity AlgorithmsShivansh Gaur
 
" NoSQL Databases: An Overview" Lena Wiese, Research Group Knowledge Engineer...
" NoSQL Databases: An Overview" Lena Wiese, Research Group Knowledge Engineer..." NoSQL Databases: An Overview" Lena Wiese, Research Group Knowledge Engineer...
" NoSQL Databases: An Overview" Lena Wiese, Research Group Knowledge Engineer...Dataconomy Media
 
Using R for Cyber Security Part 1
Using R for Cyber Security Part 1Using R for Cyber Security Part 1
Using R for Cyber Security Part 1Ajay Ohri
 
Web Oriented FIM for large scale dataset using Hadoop
Web Oriented FIM for large scale dataset using HadoopWeb Oriented FIM for large scale dataset using Hadoop
Web Oriented FIM for large scale dataset using Hadoopdbpublications
 
Replication and Synchronization Algorithms for Distributed Databases - Lena W...
Replication and Synchronization Algorithms for Distributed Databases - Lena W...Replication and Synchronization Algorithms for Distributed Databases - Lena W...
Replication and Synchronization Algorithms for Distributed Databases - Lena W...distributed matters
 
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENTLARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENTijwscjournal
 
A Standard Data Format for Computational Chemistry: CSX
A Standard Data Format for Computational Chemistry: CSXA Standard Data Format for Computational Chemistry: CSX
A Standard Data Format for Computational Chemistry: CSXStuart Chalk
 
Towards an Incremental Schema-level Index for Distributed Linked Open Data G...
Towards an Incremental Schema-level Index  for Distributed Linked Open Data G...Towards an Incremental Schema-level Index  for Distributed Linked Open Data G...
Towards an Incremental Schema-level Index for Distributed Linked Open Data G...Till Blume
 
AOTO: Adaptive overlay topology optimization in unstructured P2P systems
AOTO: Adaptive overlay topology optimization in unstructured P2P systemsAOTO: Adaptive overlay topology optimization in unstructured P2P systems
AOTO: Adaptive overlay topology optimization in unstructured P2P systemsZhenyun Zhuang
 

Mais procurados (20)

Scalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data StreamsScalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data Streams
 
Scaling Application on High Performance Computing Clusters and Analysis of th...
Scaling Application on High Performance Computing Clusters and Analysis of th...Scaling Application on High Performance Computing Clusters and Analysis of th...
Scaling Application on High Performance Computing Clusters and Analysis of th...
 
Event-Driven, Client-Server Archetypes for E-Commerce
Event-Driven, Client-Server Archetypes for E-CommerceEvent-Driven, Client-Server Archetypes for E-Commerce
Event-Driven, Client-Server Archetypes for E-Commerce
 
Big Data & Hadoop. Simone Leo (CRS4)
Big Data & Hadoop. Simone Leo (CRS4)Big Data & Hadoop. Simone Leo (CRS4)
Big Data & Hadoop. Simone Leo (CRS4)
 
REPLICATION STRATEGY BASED ON DATA RELATIONSHIP IN GRID COMPUTING
REPLICATION STRATEGY BASED ON DATA RELATIONSHIP IN GRID COMPUTINGREPLICATION STRATEGY BASED ON DATA RELATIONSHIP IN GRID COMPUTING
REPLICATION STRATEGY BASED ON DATA RELATIONSHIP IN GRID COMPUTING
 
Ijircce publish this paper
Ijircce publish this paperIjircce publish this paper
Ijircce publish this paper
 
Berlin Hadoop Get Together Apache Drill
Berlin Hadoop Get Together Apache Drill Berlin Hadoop Get Together Apache Drill
Berlin Hadoop Get Together Apache Drill
 
Final Report_798 Project_Nithin_Sharmila
Final Report_798 Project_Nithin_SharmilaFinal Report_798 Project_Nithin_Sharmila
Final Report_798 Project_Nithin_Sharmila
 
Data mining model for the data retrieval from central server configuration
Data mining model for the data retrieval from central server configurationData mining model for the data retrieval from central server configuration
Data mining model for the data retrieval from central server configuration
 
Fota Delta Size Reduction Using FIle Similarity Algorithms
Fota Delta Size Reduction Using FIle Similarity AlgorithmsFota Delta Size Reduction Using FIle Similarity Algorithms
Fota Delta Size Reduction Using FIle Similarity Algorithms
 
Harvard poster
Harvard posterHarvard poster
Harvard poster
 
" NoSQL Databases: An Overview" Lena Wiese, Research Group Knowledge Engineer...
" NoSQL Databases: An Overview" Lena Wiese, Research Group Knowledge Engineer..." NoSQL Databases: An Overview" Lena Wiese, Research Group Knowledge Engineer...
" NoSQL Databases: An Overview" Lena Wiese, Research Group Knowledge Engineer...
 
Using R for Cyber Security Part 1
Using R for Cyber Security Part 1Using R for Cyber Security Part 1
Using R for Cyber Security Part 1
 
Poster (1)
Poster (1)Poster (1)
Poster (1)
 
Web Oriented FIM for large scale dataset using Hadoop
Web Oriented FIM for large scale dataset using HadoopWeb Oriented FIM for large scale dataset using Hadoop
Web Oriented FIM for large scale dataset using Hadoop
 
Replication and Synchronization Algorithms for Distributed Databases - Lena W...
Replication and Synchronization Algorithms for Distributed Databases - Lena W...Replication and Synchronization Algorithms for Distributed Databases - Lena W...
Replication and Synchronization Algorithms for Distributed Databases - Lena W...
 
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENTLARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
 
A Standard Data Format for Computational Chemistry: CSX
A Standard Data Format for Computational Chemistry: CSXA Standard Data Format for Computational Chemistry: CSX
A Standard Data Format for Computational Chemistry: CSX
 
Towards an Incremental Schema-level Index for Distributed Linked Open Data G...
Towards an Incremental Schema-level Index  for Distributed Linked Open Data G...Towards an Incremental Schema-level Index  for Distributed Linked Open Data G...
Towards an Incremental Schema-level Index for Distributed Linked Open Data G...
 
AOTO: Adaptive overlay topology optimization in unstructured P2P systems
AOTO: Adaptive overlay topology optimization in unstructured P2P systemsAOTO: Adaptive overlay topology optimization in unstructured P2P systems
AOTO: Adaptive overlay topology optimization in unstructured P2P systems
 

Semelhante a Effiziente Verarbeitung von grossen Datenmengen

Google Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 DayGoogle Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 Dayprogrammermag
 
Fundamental question and answer in cloud computing quiz by animesh chaturvedi
Fundamental question and answer in cloud computing quiz by animesh chaturvediFundamental question and answer in cloud computing quiz by animesh chaturvedi
Fundamental question and answer in cloud computing quiz by animesh chaturvediAnimesh Chaturvedi
 
Streaming Data in R
Streaming Data in RStreaming Data in R
Streaming Data in RRory Winston
 
Monitoring as an entry point for collaboration
Monitoring as an entry point for collaborationMonitoring as an entry point for collaboration
Monitoring as an entry point for collaborationJulien Pivotto
 
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...Nati Shalom
 
Big Data Meetup #7
Big Data Meetup #7Big Data Meetup #7
Big Data Meetup #7Paul Lo
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onDony Riyanto
 
My Other Computer is a Data Center: The Sector Perspective on Big Data
My Other Computer is a Data Center: The Sector Perspective on Big DataMy Other Computer is a Data Center: The Sector Perspective on Big Data
My Other Computer is a Data Center: The Sector Perspective on Big DataRobert Grossman
 
Big Data Taiwan 2014 Track2-2: Informatica Big Data Solution
Big Data Taiwan 2014 Track2-2: Informatica Big Data SolutionBig Data Taiwan 2014 Track2-2: Informatica Big Data Solution
Big Data Taiwan 2014 Track2-2: Informatica Big Data SolutionEtu Solution
 
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...IJECEIAES
 
Survey Paper on Big Data and Hadoop
Survey Paper on Big Data and HadoopSurvey Paper on Big Data and Hadoop
Survey Paper on Big Data and HadoopIRJET Journal
 
Hw09 Hadoop Based Data Mining Platform For The Telecom Industry
Hw09   Hadoop Based Data Mining Platform For The Telecom IndustryHw09   Hadoop Based Data Mining Platform For The Telecom Industry
Hw09 Hadoop Based Data Mining Platform For The Telecom IndustryCloudera, Inc.
 
Many Task Applications for Grids and Supercomputers
Many Task Applications for Grids and SupercomputersMany Task Applications for Grids and Supercomputers
Many Task Applications for Grids and SupercomputersIan Foster
 
Dr.Hadoop- an infinite scalable metadata management for Hadoop-How the baby e...
Dr.Hadoop- an infinite scalable metadata management for Hadoop-How the baby e...Dr.Hadoop- an infinite scalable metadata management for Hadoop-How the baby e...
Dr.Hadoop- an infinite scalable metadata management for Hadoop-How the baby e...Dipayan Dev
 
Maximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
Maximizing Data Lake ROI with Data Virtualization: A Technical DemonstrationMaximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
Maximizing Data Lake ROI with Data Virtualization: A Technical DemonstrationDenodo
 

Semelhante a Effiziente Verarbeitung von grossen Datenmengen (20)

Paper ijert
Paper ijertPaper ijert
Paper ijert
 
Google Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 DayGoogle Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 Day
 
Fundamental question and answer in cloud computing quiz by animesh chaturvedi
Fundamental question and answer in cloud computing quiz by animesh chaturvediFundamental question and answer in cloud computing quiz by animesh chaturvedi
Fundamental question and answer in cloud computing quiz by animesh chaturvedi
 
Streaming Data in R
Streaming Data in RStreaming Data in R
Streaming Data in R
 
Monitoring as an entry point for collaboration
Monitoring as an entry point for collaborationMonitoring as an entry point for collaboration
Monitoring as an entry point for collaboration
 
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
 
Big Data Meetup #7
Big Data Meetup #7Big Data Meetup #7
Big Data Meetup #7
 
Hadoop basics
Hadoop basicsHadoop basics
Hadoop basics
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-on
 
My Other Computer is a Data Center: The Sector Perspective on Big Data
My Other Computer is a Data Center: The Sector Perspective on Big DataMy Other Computer is a Data Center: The Sector Perspective on Big Data
My Other Computer is a Data Center: The Sector Perspective on Big Data
 
[IJET V2I2P18] Authors: Roopa G Yeklaspur, Dr.Yerriswamy.T
[IJET V2I2P18] Authors: Roopa G Yeklaspur, Dr.Yerriswamy.T[IJET V2I2P18] Authors: Roopa G Yeklaspur, Dr.Yerriswamy.T
[IJET V2I2P18] Authors: Roopa G Yeklaspur, Dr.Yerriswamy.T
 
Big Data Taiwan 2014 Track2-2: Informatica Big Data Solution
Big Data Taiwan 2014 Track2-2: Informatica Big Data SolutionBig Data Taiwan 2014 Track2-2: Informatica Big Data Solution
Big Data Taiwan 2014 Track2-2: Informatica Big Data Solution
 
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
 
Survey Paper on Big Data and Hadoop
Survey Paper on Big Data and HadoopSurvey Paper on Big Data and Hadoop
Survey Paper on Big Data and Hadoop
 
Hw09 Hadoop Based Data Mining Platform For The Telecom Industry
Hw09   Hadoop Based Data Mining Platform For The Telecom IndustryHw09   Hadoop Based Data Mining Platform For The Telecom Industry
Hw09 Hadoop Based Data Mining Platform For The Telecom Industry
 
disertation
disertationdisertation
disertation
 
BigData
BigDataBigData
BigData
 
Many Task Applications for Grids and Supercomputers
Many Task Applications for Grids and SupercomputersMany Task Applications for Grids and Supercomputers
Many Task Applications for Grids and Supercomputers
 
Dr.Hadoop- an infinite scalable metadata management for Hadoop-How the baby e...
Dr.Hadoop- an infinite scalable metadata management for Hadoop-How the baby e...Dr.Hadoop- an infinite scalable metadata management for Hadoop-How the baby e...
Dr.Hadoop- an infinite scalable metadata management for Hadoop-How the baby e...
 
Maximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
Maximizing Data Lake ROI with Data Virtualization: A Technical DemonstrationMaximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
Maximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
 

Mais de Florian Stegmaier

Ansätze für gemeinschaftliches Filtering
Ansätze für gemeinschaftliches FilteringAnsätze für gemeinschaftliches Filtering
Ansätze für gemeinschaftliches FilteringFlorian Stegmaier
 
Fortschritte im Bereich Collaborative Filtering
Fortschritte im Bereich Collaborative FilteringFortschritte im Bereich Collaborative Filtering
Fortschritte im Bereich Collaborative FilteringFlorian Stegmaier
 
Realtime
 Distributed Analysis
 of Datastreams
Realtime
 Distributed Analysis
 of DatastreamsRealtime
 Distributed Analysis
 of Datastreams
Realtime
 Distributed Analysis
 of DatastreamsFlorian Stegmaier
 
Effiziente Verarbeitung von großen Datenmengen
Effiziente Verarbeitung von großen DatenmengenEffiziente Verarbeitung von großen Datenmengen
Effiziente Verarbeitung von großen DatenmengenFlorian Stegmaier
 
Trust-based recommender systems
Trust-based recommender systemsTrust-based recommender systems
Trust-based recommender systemsFlorian Stegmaier
 
Trust und Interest Similarity und deren Anwendung für Empfehlungssysteme
Trust und Interest Similarity und deren Anwendung für EmpfehlungssystemeTrust und Interest Similarity und deren Anwendung für Empfehlungssysteme
Trust und Interest Similarity und deren Anwendung für EmpfehlungssystemeFlorian Stegmaier
 
Robustheit in Empfehlungssystemen
Robustheit in EmpfehlungssystemenRobustheit in Empfehlungssystemen
Robustheit in EmpfehlungssystemenFlorian Stegmaier
 
Linked Open Data als Basis für Empfehlungssysteme
Linked Open Data als Basis für EmpfehlungssystemeLinked Open Data als Basis für Empfehlungssysteme
Linked Open Data als Basis für EmpfehlungssystemeFlorian Stegmaier
 
Entscheidungshilfe: Recommender System
Entscheidungshilfe: Recommender SystemEntscheidungshilfe: Recommender System
Entscheidungshilfe: Recommender SystemFlorian Stegmaier
 
Funktionsweise und Ansätze von inhaltsbasiertem Filtern
Funktionsweise und Ansätze von inhaltsbasiertem FilternFunktionsweise und Ansätze von inhaltsbasiertem Filtern
Funktionsweise und Ansätze von inhaltsbasiertem FilternFlorian Stegmaier
 
Context Basierte Personalisierungsansätze
Context Basierte PersonalisierungsansätzeContext Basierte Personalisierungsansätze
Context Basierte PersonalisierungsansätzeFlorian Stegmaier
 
Evaluierung von Empfehlungssystemen
Evaluierung von EmpfehlungssystemenEvaluierung von Empfehlungssystemen
Evaluierung von EmpfehlungssystemenFlorian Stegmaier
 
Introduction to the FP7 CODE project @ BDBC
Introduction to the FP7 CODE project @ BDBCIntroduction to the FP7 CODE project @ BDBC
Introduction to the FP7 CODE project @ BDBCFlorian Stegmaier
 
Generische Datenintegration zur semantischen Diagnoseunterstützung im Projekt...
Generische Datenintegration zur semantischen Diagnoseunterstützung im Projekt...Generische Datenintegration zur semantischen Diagnoseunterstützung im Projekt...
Generische Datenintegration zur semantischen Diagnoseunterstützung im Projekt...Florian Stegmaier
 
AIR: Architecture for Interoperable Retrieval on Distributed and Heterogeneou...
AIR: Architecture for Interoperable Retrieval on Distributed and Heterogeneou...AIR: Architecture for Interoperable Retrieval on Distributed and Heterogeneou...
AIR: Architecture for Interoperable Retrieval on Distributed and Heterogeneou...Florian Stegmaier
 


Effiziente Verarbeitung von grossen Datenmengen