Scaling up Linked Data
Presented by:
Marin Dimitrov (Ontotext)
[Figure: overview diagram. Labels: data acquisition (streaming providers, downloads, musical content, metadata, other content; LD wrappers, physical wrapper, R2R transformation, RDF/XML); vocabulary mapping, interlinking, cleansing, integrated dataset; LD dataset access (SPARQL endpoint, publishing, RDFa); application (analysis & mining module, visualization module); EUCLID Objective.]
EUCLID – Scaling up Linked Data

Motivation: Music!
• Our aim: build a music-based portal using Linked Data technologies
• So far, we have studied different mechanisms for:
– Linked Data management via SPARQL queries
– Reasoning over Linked Data
– Linked Data access (RDF dumps, endpoints, RDFa)
– Linked Data storage in repositories
(Chapter markers on the slide: CH 1, CH 2, CH 3, CH 5)
• In this chapter, we will study current research and technologies to scale up to very large volumes of Linked Data

Agenda
1. Introduction to Big (Linked) Data
2. NoSQL databases for Linked Data
3. Hadoop for Linked Data
4. Stream processing for Linked Data
5. … and more

INTRODUCTION TO BIG (LINKED) DATA

Introduction to Big Data
Big Data: management of data which is “too complex” to be processed with traditional solutions
• Big does not stand primarily for size, but is an analogy for “overwhelming”
• Big can mean “high variety”, “high volume” or “high velocity”

The 3 Vs of Big Data
• Variety: different forms of data
• Volume: petabytes of data
• Velocity: real-time data streams

The 3 Vs of Big Data
• Variety
– Data characteristic: structured, semi-structured and unstructured
– Challenge: data integration
– Solution: semantic technologies are a good fit
• Volume
– Data characteristic: large volumes of data
– Challenge: reasoning and querying
– Solution: distributed storage & processing, parallel processing
• Velocity
– Data characteristic: streams, sensors, near real-time data, IoT
– Challenge: reasoning & querying
– Solution: stream reasoning & querying

The Extended Vs of Big Data
Beyond Variety, Volume and Velocity:
• Veracity: uncertainty of the data
• Variability: variation in meaning in different contexts
• Value: turning data into information into insight
• Not easy to measure
• Depend on context and intended use
• Linked Data & Semantic Technologies can help

Beyond Big Data

Beyond Big Data (2)

Semantic Technologies
Semantic technologies extract meaning from data, ranging from quantitative
data and text, to video, voice and images. Many of these techniques have
existed for years and are based on advanced statistics, data mining, machine
learning and knowledge management. One reason they are garnering more
interest is the renewed business requirement for monetizing information as a
strategic asset. Even more pressing is the technical need. Increasing volumes,
variety and velocity — big data — in IM and business operations, requires
semantic technology that makes sense out of data for humans, or
automates decisions
Source: Gartner Inc. “Gartner Identifies Top Technology Trends Impacting Information
Infrastructure in 2013”
Towards Big Linked Data
• Variety: this characteristic is the most inherent to Linked Data
– Agile data model
– Different vocabularies
• Volume
– [Figure: snapshots for 2007, 2008, 2009, 2010 and 2011]
• Velocity
– RDF Streams
– Semantic Sensors

Towards Big Linked Data (2)

Big Linked Data & Linked Big Data
Big Linked Data
• Exponential growth of Linked Data in the last five years
• Big Data approach adopted by the Linked Data community, especially to handle Volume and Velocity
Linked Big Data
• Linked Data approach adopted by the Big Data community
• RDF data model for Variety
• Enrich Big Data with metadata and semantics
• Interlink Big Data sets & reduce duplication
• Simplify data access, discovery & integration
Source: M. Dimitrov. “Semantic Technologies for Big Data”

NOSQL DATABASES FOR LINKED DATA

RDF Databases
• Native or RDBMS based RDF databases
– OWLIM (http://www.ontotext.com/owlim)
– Virtuoso Universal Server (http://virtuoso.openlinksw.com/ )
– Stardog (http://stardog.com)
– AllegroGraph (http://www.franz.com/agraph/allegrograph/ )

– Systap Bigdata (http://www.systap.com/)
– Jena TDB (http://jena.apache.org/documentation/tdb/)
– Oracle, DB2
RDF Database Advantages
• RDF (graph) based data model
– Global identifiers of resources/entities

– Agile schema

• Inference of implicit facts
– Forward, backward, hybrid reasoning strategy

• Expressive query language (SPARQL)
• Compliance with standards

NoSQL Databases
• “Not Only SQL”
• A group of database technologies that do not follow the relational data model

• Typical requirements
– Distributed
– High availability

– Handle big data & query volumes (scalability)
– Hierarchical or graph data structures
– Flexible schema
NoSQL Taxonomy
Conceptual structures
• Key/value stores
– Each key is associated with a value (DHT)
• Wide-column stores
– Each key is associated with many attributes, columns are stored together

Artist       | Album     | Song
The Beatles  | Let it be | Get back
Queen        | Jazz      | Fun it

• Document databases
– Each key is associated with a complex data structure (key: structured document)
• Graph databases
– Data is represented as nodes and edges (data, relationship, data)

Key/Value Stores
• Efficient key/value lookups
• Schema-less
• Simpler read/write operations
– Low latency & high throughput
• Examples
– DynamoDB, Azure Table Storage, Riak, Redis, MemcacheDB, Voldemort

Wide-Column Stores
• A key is associated with several attributes
• Data in the same column is stored together
• Efficient for complex aggregations over data
• Schema-less / dynamic schema
• Easy to add new columns
• Columns can be grouped together (column family)
• Examples:
– HBase (http://hbase.apache.org)
– Cassandra (http://cassandra.apache.org)

Artist       | Album     | Song
The Beatles  | Let it be | Get back
Queen        | Jazz      | Fun it

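The column-oriented idea can be illustrated with a minimal in-memory sketch. This is not how HBase or Cassandra store data internally; the helper names `add_column` and `get_row` are hypothetical, and the table reuses the Artist/Album/Song example from the slide.

```python
# Assumed toy layout: each column is stored contiguously as its own list.
columns = {
    "Artist": ["The Beatles", "Queen"],
    "Album": ["Let it be", "Jazz"],
    "Song": ["Get back", "Fun it"],
}

def add_column(table, name, default=None):
    """Add a new column without rewriting the existing columns."""
    n_rows = len(next(iter(table.values())))
    table[name] = [default] * n_rows

def get_row(table, i):
    """Reassemble row i from the per-column lists."""
    return {name: values[i] for name, values in table.items()}

add_column(columns, "Year")                      # dynamic schema: new column added later
distinct_artists = len(set(columns["Artist"]))   # aggregation reads a single column only
```

Adding "Year" touches no existing column, and the aggregation scans only the "Artist" list, which is the property the bullets above describe.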
HBase
• Open source column-oriented store
• Based on Google’s BigTable
• Built on top of HDFS and Hadoop
• Horizontally scalable, automatic sharding
• High availability / automatic failover
• Strongly consistent reads/writes
• Java/REST API

Document Databases
• Each key associated with a complex data structure
(document)
• Documents can contain key/value pairs, key/array
pairs, or even nested structures
• Schema-less / dynamic schema
– New fields can be easily added to the document structure

• Typical document formats
– JSON, XML

• Examples:
– Couchbase (http://www.couchbase.com)
– MongoDB (http://www.mongodb.org)

Document Databases (2)
Example:
Key: "The Beatles"
{
  "Homepage": "thebeatles.com",
  "Origin": "Liverpool",
  "Albums": [
    {"Title": "Let it be", "Year": "1970", "Duration": "35:16"},
    {"Title": "Help!", "Year": "1965"},
    {"Title": "Revolver", "Year": "1966", "Duration": "35:01"}
  ]
}

Key: "Elvis Presley"
{
  "FullName": "Elvis Aaron Presley",
  "Homepage": "elvis.com",
  "Origin": "Memphis",
  "Albums": [
    {"Title": "Blue Hawaii", "Year": "1961", "Duration": "32:02"}
  ]
}

Couchbase
• Document-oriented database
– Documents are stored as JSON

• Flexible schema
– Document structure easy to change

• Optimised to run in-memory and on several
nodes
– Ejection and eventual persistence

• Incremental views & indexes
• Scalability, rebalancing, replication, failover
• RESTful API
Graph Databases
Motivation
Graphs: representation of highly connected data
[Figures: network of friends in a high school; relationships among artists in Last.fm (http://sixdegrees.hu/last.fm/); a fragment of Facebook; relationships between tweets]

Graph Databases
• Based on the property graph model
• Support for query languages and core graph-based
tasks
– reachability, traversal, adjacency and pattern matching

• Examples
– Neo4j (http://neo4j.org)
– Dex (http://sparsity-technologies.com/dex.php)
– HyperGraphDB (http://www.hypergraphdb.org)

Graph Databases
Example: Property Graph Model
[Figure: property graph]
– Node “The Beatles” (Homepage: thebeatles.com, Origin: Liverpool)
– Node “Elvis Presley” (Fullname: Elvis Aaron Presley, Homepage: elvis.com, Origin: Memphis)
– “The Beatles” created “Let it be” (Year: 1970, Duration: 35:16), “Help!” (Year: 1965) and “Revolver” (Year: 1966, Duration: 35:01)
– “Elvis Presley” created “Blue Hawaii” (Year: 1961, Duration: 32:02)

• Nodes and edges may have properties
• Properties: key-value pairs

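As a rough illustration of the property graph model, the following minimal sketch stores nodes and edges with key-value properties and supports a simple adjacency lookup. The class and method names are hypothetical and do not correspond to the API of any graph database listed above.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    properties: dict = field(default_factory=dict)

@dataclass
class Edge:
    source: str
    label: str
    target: str
    properties: dict = field(default_factory=dict)

class PropertyGraph:
    def __init__(self):
        self.nodes = {}   # node name -> Node
        self.edges = []   # list of Edge

    def add_node(self, name, **properties):
        self.nodes[name] = Node(name, properties)

    def add_edge(self, source, label, target, **properties):
        self.edges.append(Edge(source, label, target, properties))

    def neighbours(self, source, label):
        """Adjacency lookup: nodes reachable from `source` via edges with `label`."""
        return [self.nodes[e.target] for e in self.edges
                if e.source == source and e.label == label]

g = PropertyGraph()
g.add_node("The Beatles", Homepage="thebeatles.com", Origin="Liverpool")
g.add_node("Let it be", Year=1970, Duration="35:16")
g.add_node("Help!", Year=1965)
g.add_node("Revolver", Year=1966, Duration="35:01")
for album in ("Let it be", "Help!", "Revolver"):
    g.add_edge("The Beatles", "created", album)

albums = sorted(node.name for node in g.neighbours("The Beatles", "created"))
```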
Neo4j
• Graph database
– Nodes, Relationships, Properties, Paths
– Indexes over properties
• Flexible schema
• Cypher graph query language
• ACID transactions
• High availability, distributed clusters
• RESTful and Java APIs

Rya
• RDF store based on Accumulo
– Column-store, HDFS
– Sesame query parser, SAIL
implementation

• 3 table index
– SPO, POS, OSP

– Sufficient for all triple patterns
– All triple parts (S, P, O) encoded in
the RowID
– Clustered index
Source: R. Punnoose, A. Crainiceanu, D. Rapp “Rya: A Scalable RDF Triple Store for the Clouds”
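The three-table layout can be sketched in a few lines. This is only an in-memory illustration under stated assumptions, not Rya's or Accumulo's actual code: sorted Python lists stand in for the clustered tables, the separator `SEP` and the functions `build_indexes` and `match` are hypothetical, and the sample triples are made up.

```python
import bisect

SEP = "\x00"  # assumed separator; must not occur inside any RDF term

ORDERS = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}

def build_indexes(triples):
    """One sorted list of row keys per table; the whole triple lives in the key."""
    return {
        name: sorted("".join(t[i] + SEP for i in order) for t in triples)
        for name, order in ORDERS.items()
    }

def match(indexes, s=None, p=None, o=None):
    """Answer a triple pattern (None = variable) with a single prefix range scan."""
    pattern = (s, p, o)
    bound = sum(x is not None for x in pattern)
    for name, order in ORDERS.items():
        prefix = []
        for i in order:
            if pattern[i] is None:
                break
            prefix.append(pattern[i])
        if len(prefix) == bound:  # all bound parts form a key prefix in this table
            break
    start = "".join(part + SEP for part in prefix)
    keys = indexes[name]
    results = []
    for key in keys[bisect.bisect_left(keys, start):]:
        if not key.startswith(start):
            break  # left the contiguous range of matching row keys
        parts = key.split(SEP)[:3]
        triple = [None] * 3
        for position, component in enumerate(order):
            triple[component] = parts[position]
        results.append(tuple(triple))
    return results

indexes = build_indexes([
    ("beatles", "created", "let_it_be"),
    ("beatles", "created", "help"),
    ("elvis", "created", "blue_hawaii"),
    ("beatles", "origin", "liverpool"),
])
```

For any combination of bound positions, one of the three orderings places the bound parts first, which is why three tables are sufficient for all triple patterns.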
Rya (2)
• Query processing
– Sesame (SPARQL) query plan translated to Accumulo range
scans & lookups

– Parallel scans for joins (x10-20 speedup)
– Batch scans (Accumulo) to reduce number of range scans
– Statistics for triple patterns selectivity, query re-ordering

• Performance evaluation (LUBM)
– No significant degradation when data grows by 2-3 orders of magnitude
Source: R. Punnoose, A. Crainiceanu, D. Rapp “Rya: A Scalable RDF Triple Store for the Clouds”
“NoSQL Databases for RDF: An Empirical Evaluation”
• Goal
– Store RDF data in HBase, Couchbase, Hive & Cassandra
– Benchmark query performance against a native
distributed RDF database (4store)

• HBase prototype
– Jena for SPARQL queries

– 3 index tables (SPO, POS, OSP)
– Row key encodes S+P+O, cells are empty
– Jena query plan translated to HBase filters & lookups
Source: Cudre-Mauroux et al. “NoSQL Databases for RDF: An Empirical Evaluation”
“NoSQL Databases for RDF: An Empirical Evaluation” (2)
• Hive+HBase prototype
– SPARQL to HiveQL translation

– Property table
• Row key is S
• a column for each P

• cell value stores O
• Multi-valued attributes have different timestamps
Source: Cudre-Mauroux et al. “NoSQL Databases for RDF: An Empirical Evaluation”
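A property table of this shape can be sketched as nested dictionaries. This is an illustration only, not the prototype's HBase schema: the function names are hypothetical, and the insertion counter is an assumed stand-in for the cell timestamps that distinguish the values of a multi-valued attribute.

```python
from collections import defaultdict

def build_property_table(triples):
    """Row key = subject, one column per predicate, cell = timestamped object versions."""
    table = defaultdict(lambda: defaultdict(list))
    for timestamp, (s, p, o) in enumerate(triples):
        # A repeated (s, p) pair becomes a further version of the same cell
        table[s][p].append((timestamp, o))
    return table

def objects(table, s, p):
    """All values of a (possibly multi-valued) attribute, oldest version first."""
    return [o for _, o in sorted(table.get(s, {}).get(p, []))]

table = build_property_table([
    ("beatles", "origin", "Liverpool"),
    ("beatles", "album", "Let it be"),
    ("beatles", "album", "Help!"),
])
```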
“NoSQL Databases for RDF: An Empirical Evaluation” (3)
• CumulusRDF prototype
– Sesame for SPARQL queries, Cassandra for data management
– 3 index tables (SPO, POS, OSP)

– Sesame query plan translated to Cassandra index lookups

• Couchbase prototype
– Map RDF into JSON documents
• all triples with the same S stored in the same document (molecule)
• 2 JSON arrays for Ps and Os

– Jena as a SPARQL query engine
– 3 indexes (Couchbase views): SPO, POS, OSP
Source: Cudre-Mauroux et al. “NoSQL Databases for RDF: An Empirical Evaluation”
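The "molecule" mapping for the document store can be sketched as follows. The JSON field names "p" and "o" and both function names are assumptions made for illustration; the slide only states that each document holds two arrays, one for predicates and one for objects.

```python
import json

def to_molecules(triples):
    """Group triples by subject: one JSON-serialisable document per subject."""
    docs = {}
    for s, p, o in triples:
        doc = docs.setdefault(s, {"p": [], "o": []})
        doc["p"].append(p)   # predicate and object share the same array position
        doc["o"].append(o)
    return docs

def triples_of(subject, doc):
    """Rebuild the triples stored in one document."""
    return [(subject, p, o) for p, o in zip(doc["p"], doc["o"])]

docs = to_molecules([
    ("beatles", "origin", "Liverpool"),
    ("beatles", "created", "Let it be"),
    ("elvis", "origin", "Memphis"),
])
serialised = json.dumps(docs["beatles"])
```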
“NoSQL Databases for RDF: An Empirical Evaluation” (4)
• Benchmarks
– BSBM 10M, 100M and 1B triples
– 1, 2, 4, 8, 16 node cluster
– AWS cost & query execution time
Source: Cudre-Mauroux et al. “NoSQL Databases for RDF: An Empirical Evaluation”
“NoSQL Databases for RDF: An Empirical Evaluation” (5)
• Results
– Simple SPARQL queries can be executed more
efficiently on a NoSQL datastore

– Data loading time for some NoSQL datastores
comparable or better than the native RDF store
– Complex SPARQL queries perform significantly slower
on NoSQL systems
• Query optimisations are required

– MapReduce operations (Hive & Couchbase) introduce
high latency for view maintenance / query execution
Source: Cudre-Mauroux et al. “NoSQL Databases for RDF: An Empirical Evaluation”
HADOOP FOR LINKED DATA

Working with Distributed Data
• Apache Hadoop (http://hadoop.apache.org) is an open source
implementation of MapReduce
• MapReduce
– Distributed batch processing
– Map phase partitions the input set (K/V pairs), Reduce phase performs
aggregated processing over the partitions in parallel
– Shuffle intermediate results (from Map nodes to Reduce nodes)

• Allows for the processing of distributed large data sets across
clusters of computers
– On a distributed file system (HDFS)
– Scales up to thousands of nodes, each offering local processing power
and storage
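The map, shuffle and reduce phases can be illustrated with a single-process sketch that counts triples per predicate. It does not use the Hadoop API; the function names and sample triples are hypothetical, and a real job would run the map and reduce functions in parallel on different nodes.

```python
from collections import defaultdict

def map_phase(triples):
    """Map: emit one (key, value) pair per input triple."""
    for _s, p, _o in triples:
        yield p, 1

def shuffle(pairs):
    """Shuffle: group intermediate values by key (done by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate the values of each key independently."""
    return {key: sum(values) for key, values in groups.items()}

triples = [
    ("beatles", "created", "let_it_be"),
    ("beatles", "created", "help"),
    ("beatles", "origin", "liverpool"),
]
counts = reduce_phase(shuffle(map_phase(triples)))
```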

“Scalable Distributed Reasoning
with MapReduce”
• Goal
– Utilise Hadoop for large scale reasoning

• Approach
– Implement each RDFS rule (join) via a Map & Reduce function
– Map outputs original triple as value, and the join term as key
– Reducer receives all needed triples to perform the join

Source: Urbani et al. “Scalable Distributed Reasoning with MapReduce”
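The approach can be sketched for one rule. The sketch assumes the RDFS subclass rule (x rdf:type C and C rdfs:subClassOf D imply x rdf:type D) as the example and runs in a single process; the function names are hypothetical and this is not the authors' implementation.

```python
from collections import defaultdict

TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

def map_triple(triple):
    """Map: output the join term as key and the original triple as value."""
    s, p, o = triple
    if p == TYPE:
        yield o, triple   # join term is the class C in (x rdf:type C)
    elif p == SUBCLASS:
        yield s, triple   # join term is the class C in (C rdfs:subClassOf D)

def reduce_join(triples):
    """Reduce: all triples sharing one join term arrive together; perform the join."""
    instances = [s for s, p, _o in triples if p == TYPE]
    superclasses = [o for _s, p, o in triples if p == SUBCLASS]
    for x in instances:
        for d in superclasses:
            yield (x, TYPE, d)

def run(triples):
    groups = defaultdict(list)          # stands in for the shuffle step
    for triple in triples:
        for key, value in map_triple(triple):
            groups[key].append(value)
    derived = set()
    for values in groups.values():
        derived.update(reduce_join(values))
    return derived

derived = run([
    ("ex:LetItBe", TYPE, "ex:Album"),
    ("ex:Album", SUBCLASS, "ex:MusicalWork"),
    ("ex:Beatles", "ex:created", "ex:LetItBe"),
])
```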
“Scalable Distributed Reasoning
with MapReduce” (2)

Source: Urbani et al. “Scalable Distributed Reasoning with MapReduce”
“Scalable Distributed Reasoning
with MapReduce” (3)
• Challenge
– Too many duplicates (unique to derived triple ratio of 1:50)

• Optimisations
– Replicate schema triples on each node (in memory)
• Needed for each join; usually a small set
– Rule re-ordering
• Which rule may be triggered by another rule?
• Reduce the number of required iterations
Source: Urbani et al. “Scalable Distributed Reasoning with MapReduce”
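A minimal sketch of the schema replication idea, assuming the subclass rule as the example: each node holds a copy of the small schema in memory and joins its own partition locally, and a final set union removes duplicate derivations. The names `derive_on_node`, `schema` and `partitions` are hypothetical.

```python
RDF_TYPE = "rdf:type"

def derive_on_node(partition, subclass_of):
    """Local join on one node: the schema lookup happens in local memory."""
    derived = set()
    for s, p, o in partition:
        if p == RDF_TYPE:
            for superclass in subclass_of.get(o, ()):
                derived.add((s, RDF_TYPE, superclass))
    return derived

schema = {"ex:Album": {"ex:MusicalWork"}}   # small set of schema triples
partitions = [                               # instance triples split across two nodes
    [("ex:LetItBe", RDF_TYPE, "ex:Album"), ("ex:Help", RDF_TYPE, "ex:Album")],
    [("ex:LetItBe", RDF_TYPE, "ex:Album")],
]
per_node = [derive_on_node(part, dict(schema)) for part in partitions]  # one schema copy per node
all_derived = set().union(*per_node)         # duplicate removal across nodes
```

Both nodes derive the same triple for ex:LetItBe, which shows on a tiny scale why duplicate removal matters.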
“Scalable Distributed Reasoning
with MapReduce” (4)
• Results
– Throughput of 4.5M triples / sec on a 16-node cluster
– 16+ nodes do not improve the performance
significantly

Source: Urbani et al. “Scalable Distributed Reasoning with MapReduce”
Lessons Learned from Large-scale Reasoning (J. Urbani)
• 1st Law: Treat schema triples differently
– Replicate on all nodes to minimise subsequent data transfer

• 2nd Law: Data skew dominates data distribution
– No universal partitioning scheme for input data
– Computation tasks moved to the nodes storing the data
(data locality)

• 3rd Law: Certain problems only appear at a very large
scale
– Proof-of-concept prototypes are often not representative
Source: Jacopo Urbani “Three Laws Learned from Web-scale Reasoning”
STREAM PROCESSING FOR LINKED DATA

Streaming Data
• A large amount of new data is constantly being created or
data is being updated at a rapid rate
– Traffic data, sensor networks, social networks, financial markets

• Many data sources create a constant “stream of information”
– Not always practical to store all data and then query it
– Continuous queries over transient data

• More recent data is more important
– Describes the current state of a dynamic system
Stream Processing
• Streams are observed through windows
• Continuous queries can be registered over the stream
• Continuous queries are iteratively evaluated over the data in the
current window
– Can leverage static background knowledge (e.g., schema information)

• Generates a stream of answers
[Figure: a window over the stream (along the time axis), together with background knowledge, feeds a continuous query that produces a stream of answers]

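Window-based continuous evaluation can be sketched as follows. This assumes a time-based window covering the interval (end - range, end] that is re-evaluated every step; the function names and the toll-gate readings are hypothetical, and no particular stream engine's semantics is implied.

```python
def sliding_windows(stream, range_sec, step_sec):
    """Yield (evaluation_time, triples) for a time-based sliding window.

    stream: list of (triple, timestamp) pairs sorted by timestamp (seconds, > 0).
    Assumption: each window covers the interval (end - range_sec, end].
    """
    if not stream:
        return
    last = stream[-1][1]
    end = step_sec
    while end - step_sec < last:
        yield end, [t for t, ts in stream if end - range_sec < ts <= end]
        end += step_sec

def distinct_cars(window):
    """The 'continuous query': count distinct cars seen in the current window."""
    return len({car for _toll, _p, car in window})

stream = [
    (("toll1", "registers", "car1"), 5),
    (("toll1", "registers", "car2"), 15),
    (("toll2", "registers", "car3"), 58),
]
# One answer per evaluation: the stream of answers
answers = [(end, distinct_cars(w)) for end, w in sliding_windows(stream, 40, 10)]
```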
Linked Stream Data
• A representation of sensor/stream data following the
Linked Data principles
– Sensor data can be enriched with semantics
– Facilitates data discovery and integration of heterogeneous data
sources

• Challenges
– RDF Triples must be annotated with timestamps
– Extensions to the SPARQL language – windows, continuous queries,
streaming operators
– Continuous semantics
– Scalability (Volume)
– High throughput and low latency (Velocity)
– Approximate reasoning

Querying Streams with
SPARQL Extensions
• The mechanism to evaluate queries over streaming data is the
specification of continuous queries
• The corresponding results to the continuous query are
updated while new data arrives
• Several SPARQL extensions with streaming operators based on
CQL (Continuous Query Language)
– C-SPARQL
– SPARQLStream
– EP-SPARQL, CQELS, Instants
C-SPARQL (1)
C-SPARQL is an extension of SPARQL 1.1
1. RDF Streams: Sequence of RDF triples annotated with timestamps:
<(s,p,o), timestamp>

2. FROM STREAM extension for stream sources and windows
FromStrClause  → 'FROM' ['NAMED'] 'STREAM' StreamIRI '[ RANGE' Window ']'
Window         → LogicalWindow | PhysicalWindow
LogicalWindow  → Number TimeUnit WindowOverlap
TimeUnit       → 'MSEC' | 'SEC' | 'MIN' | 'HOUR' | 'DAY'
WindowOverlap  → 'STEP' Number TimeUnit | 'TUMBLING'
PhysicalWindow → 'TRIPLES' Number

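A physical window (`TRIPLES n`) keeps the most recent n triples regardless of their timestamps. The sketch below illustrates that idea with a bounded queue; emitting the window after every arriving triple is an assumption made for illustration, and `physical_windows` is a hypothetical name.

```python
from collections import deque

def physical_windows(stream, size):
    """After each arriving triple, yield the latest `size` triples."""
    window = deque(maxlen=size)  # the oldest triple is dropped automatically
    for triple in stream:
        window.append(triple)
        yield list(window)

windows = list(physical_windows(["t1", "t2", "t3"], 2))
```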
C-SPARQL (2)
3. Registration
• Creates a continuous query over the data source
• The query output is variable bindings, RDF graph, or a
new stream
Registration → 'REGISTER' ('QUERY'|'STREAM') QName 'AS' Query

C-SPARQL (3)
Example
Query: retrieve the cars and the districts in which each car was registered at a toll gate.

REGISTER QUERY CarsEnteringInDistricts AS
SELECT DISTINCT ?district ?car
FROM STREAM <www.uc.eu/tollgates.trdf> [RANGE 40 SEC STEP 10 SEC]
WHERE {
  ?toll t:registers ?car .
  ?toll c:placedIn ?street .
  ?district c:contains ?street .
}

Source: Barbieri, Davide Francesco, et al. “Querying RDF streams with C-SPARQL.” ACM SIGMOD Record 39.1 (2010): 20-26.

C-SPARQL (4)

Source: M. Balduini et al. “Tutorial on Stream Reasoning for Linked Data (ISWC’2013)”
SPARQLStream (1)
• Utilizes the same definition of RDF streams as in C-SPARQL:
<(s,p,o), timestamp>

• The language is defined as follows:
NamedStream → 'FROM' ['NAMED'] 'STREAM' StreamIRI '[' Window ']'
Window      → 'NOW-' Integer TimeUnit [UpperBound] [Slide]
UpperBound  → 'TO NOW-' Integer TimeUnit
Slide       → 'SLIDE' Integer TimeUnit
TimeUnit    → 'MS' | 'S' | 'MINUTES' | 'HOURS' | 'DAY'
Select      → 'SELECT' [XStream] [DISTINCT | REDUCED] …
XStream     → 'ISTREAM' | 'DSTREAM' | 'RSTREAM'

Source: Jean-Paul Calbimonte and Oscar Corcho. “SPARQLStream: Ontology-based access to data streams.” Tutorial at ISWC 2013

SPARQLStream (2)
Example
Query: retrieve an RSTREAM with the observations captured by all sensors in the last 10 minutes.

PREFIX ssn: <http://purl.oclc.org/NET/ssnx/ssn#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT RSTREAM ?sensor ?observation
FROM STREAM <www.semsorgrid4env.eu/SensorReadings.srdf>
  [FROM NOW - 10 MINUTES TO NOW STEP 1 MINUTE]
WHERE {
  ?observation a ssn:Observation ;
               ssn:observedBy ?sensor .
}

Classification of Existing
Systems

Source: M. Balduini et al. “Tutorial on Stream Reasoning for Linked Data (ISWC’2013)”
W3C Semantic Sensor Networks
• SSN Ontology
– http://www.w3.org/2005/Incubator/ssn/ssnx/ssn
– OWL DL ontology
– Used to semantically describe sensors, sensor networks & data
– Recommendations for applying the ontology for Linked Sensor Data

W3C Semantic Sensor Networks
(2)
• Different perspectives
– Sensor, data/observation, system

… AND MORE

A Trillion RDF Triples
• Use case
– Use RDF and Linked Data for the customer management
database of a big telecom

– Franz Inc / AllegroGraph

uRiKA Appliance
• YarcData
• Big Data appliance for graph
analytics
– 8K processors, 1TB RAM

– In-memory RDF database
– SPARQL 1.1 support

RDFS Reasoning on GPUs
• Similar approach to Urbani et al. for large scale
reasoning with Hadoop
– Handle rules with 2 antecedents
– Rule reordering
– Dictionary encoding

• Shared-memory architecture
– Efficient GPU algorithm implementation is challenging
Source: Norman Heino & Jeff Z. Pan “RDFS Reasoning on Massively Parallel Hardware”, ISWC 2012
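Dictionary encoding, listed above, replaces each RDF term by a compact integer so that triples become fixed-size tuples that are cheap to compare and copy. A minimal sketch with a hypothetical `Dictionary` class (not the authors' data structure):

```python
class Dictionary:
    """Bidirectional mapping between RDF terms and integer ids."""

    def __init__(self):
        self.ids = {}     # term -> id
        self.terms = []   # id -> term

    def encode(self, term):
        if term not in self.ids:
            self.ids[term] = len(self.terms)
            self.terms.append(term)
        return self.ids[term]

    def decode(self, term_id):
        return self.terms[term_id]

d = Dictionary()
triple = ("ex:LetItBe", "rdf:type", "ex:Album")
encoded = tuple(d.encode(term) for term in triple)
decoded = tuple(d.decode(term_id) for term_id in encoded)
```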
RDFS Reasoning on GPUs (2)
• Data parallelism
– Apply one rule (thread) on one instance triple, join to a schema triple
if possible
– Hundreds / thousands of threads working in parallel

• Challenge
– Duplicate removal

• Benchmark
– x5 speedup of computation
– But… memory transfer overhead is significant
Source: Norman Heino & Jeff Z. Pan “RDFS Reasoning on Massively Parallel Hardware”, ISWC 2012

Benchmarks
• BSBM v3.1 (April 2013)
– http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/results/V7/
– Includes benchmarks with up to 150 billion triples
– x750 scale increase since the last BSBM result (200M triples)

• LDBC
– Industry neutral, non-profit organisation
– Benchmarks for RDF and graph databases, similar to TPC
– Big data volume, complex queries

SUMMARY

Summary
• Linked Data is a good fit for the Variety
challenge of Big Data
• Linked Data can simplify data discovery, data
access, data integration challenges for Big Data
• Exponential growth of Linked Data

• Linked Data benchmarks target bigger
workloads

Summary (2)
• Ongoing R&D towards scaling up Linked Data
for high data Volume and Velocity
– NoSQL datastores for RDF data management
– Hadoop for scalable RDF reasoning
– GPUs for scalable RDF reasoning

• Adapting Linked Data & SPARQL for streaming
data scenarios

For exercises, quiz and further material visit our website:
http://www.euclid-project.eu
• Course
• eBook
Other channels: @euclid_project, euclidproject

Mais conteúdo relacionado

Mais procurados

Documents, services, and data on the web
Documents, services, and data on the webDocuments, services, and data on the web
Documents, services, and data on the webChiara Del Vescovo
 
Maximising (Re)Usability of Library metadata using Linked Data
Maximising (Re)Usability of Library metadata using Linked Data Maximising (Re)Usability of Library metadata using Linked Data
Maximising (Re)Usability of Library metadata using Linked Data Asuncion Gomez-Perez
 
DBpedia Tutorial - Feb 2015, Dublin
DBpedia Tutorial - Feb 2015, DublinDBpedia Tutorial - Feb 2015, Dublin
DBpedia Tutorial - Feb 2015, Dublinm_ackermann
 
Wed roman tut_open_datapub
Wed roman tut_open_datapubWed roman tut_open_datapub
Wed roman tut_open_datapubeswcsummerschool
 
Build Narratives, Connect Artifacts: Linked Open Data for Cultural Heritage
Build Narratives, Connect Artifacts: Linked Open Data for Cultural HeritageBuild Narratives, Connect Artifacts: Linked Open Data for Cultural Heritage
Build Narratives, Connect Artifacts: Linked Open Data for Cultural HeritageOntotext
 
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...Linked data demystified:Practical efforts to transform CONTENTDM metadata int...
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...Cory Lampert
 
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the CloudFirst Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the CloudOntotext
 
WWW2014 Overview of W3C Linked Data Platform 20140410
WWW2014 Overview of W3C Linked Data Platform 20140410WWW2014 Overview of W3C Linked Data Platform 20140410
WWW2014 Overview of W3C Linked Data Platform 20140410Arnaud Le Hors
 
Learning W3C Linked Data Platform with examples
Learning W3C Linked Data Platform with examplesLearning W3C Linked Data Platform with examples
Learning W3C Linked Data Platform with examplesNandana Mihindukulasooriya
 
ESWC SS 2012 - Wednesday Tutorial Barry Norton: Building (Production) Semanti...
ESWC SS 2012 - Wednesday Tutorial Barry Norton: Building (Production) Semanti...ESWC SS 2012 - Wednesday Tutorial Barry Norton: Building (Production) Semanti...
ESWC SS 2012 - Wednesday Tutorial Barry Norton: Building (Production) Semanti...eswcsummerschool
 
The web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedThe web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedSören Auer
 
ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing ...
ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing ...ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing ...
ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing ...eswcsummerschool
 
Introduction to Linked Data Platform (LDP)
Introduction to Linked Data Platform (LDP)Introduction to Linked Data Platform (LDP)
Introduction to Linked Data Platform (LDP)Hector Correa
 
Linked data HHS 2015
Linked data HHS 2015Linked data HHS 2015
Linked data HHS 2015Cason Snow
 
Linking Open Government Data at Scale
Linking Open Government Data at Scale Linking Open Government Data at Scale
Linking Open Government Data at Scale Bernadette Hyland-Wood
 
Introduction to RDF & SPARQL
Introduction to RDF & SPARQLIntroduction to RDF & SPARQL
Introduction to RDF & SPARQLOpen Data Support
 

Mais procurados (20)

Documents, services, and data on the web
Documents, services, and data on the webDocuments, services, and data on the web
Documents, services, and data on the web
 
The Semantic Data Web, Sören Auer, University of Leipzig
The Semantic Data Web, Sören Auer, University of LeipzigThe Semantic Data Web, Sören Auer, University of Leipzig
The Semantic Data Web, Sören Auer, University of Leipzig
 
Embedding Linked Data Invisibly into Web Pages: Strategies and Workflows for ...
Embedding Linked Data Invisibly into Web Pages: Strategies and Workflows for ...Embedding Linked Data Invisibly into Web Pages: Strategies and Workflows for ...
Embedding Linked Data Invisibly into Web Pages: Strategies and Workflows for ...
 
Maximising (Re)Usability of Library metadata using Linked Data
Maximising (Re)Usability of Library metadata using Linked Data Maximising (Re)Usability of Library metadata using Linked Data
Maximising (Re)Usability of Library metadata using Linked Data
 
DBpedia Tutorial - Feb 2015, Dublin
DBpedia Tutorial - Feb 2015, DublinDBpedia Tutorial - Feb 2015, Dublin
DBpedia Tutorial - Feb 2015, Dublin
 
Linked data life cycles
Linked data life cyclesLinked data life cycles
Linked data life cycles
 
Wed roman tut_open_datapub
Wed roman tut_open_datapubWed roman tut_open_datapub
Wed roman tut_open_datapub
 
Build Narratives, Connect Artifacts: Linked Open Data for Cultural Heritage
Build Narratives, Connect Artifacts: Linked Open Data for Cultural HeritageBuild Narratives, Connect Artifacts: Linked Open Data for Cultural Heritage
Build Narratives, Connect Artifacts: Linked Open Data for Cultural Heritage
 
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...Linked data demystified:Practical efforts to transform CONTENTDM metadata int...
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...
 
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the CloudFirst Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
 
WWW2014 Overview of W3C Linked Data Platform 20140410
WWW2014 Overview of W3C Linked Data Platform 20140410WWW2014 Overview of W3C Linked Data Platform 20140410
WWW2014 Overview of W3C Linked Data Platform 20140410
 
Learning W3C Linked Data Platform with examples
Learning W3C Linked Data Platform with examplesLearning W3C Linked Data Platform with examples
Learning W3C Linked Data Platform with examples
 
ESWC SS 2012 - Wednesday Tutorial Barry Norton: Building (Production) Semanti...
ESWC SS 2012 - Wednesday Tutorial Barry Norton: Building (Production) Semanti...ESWC SS 2012 - Wednesday Tutorial Barry Norton: Building (Production) Semanti...
ESWC SS 2012 - Wednesday Tutorial Barry Norton: Building (Production) Semanti...
 
The web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedThe web of interlinked data and knowledge stripped
The web of interlinked data and knowledge stripped
 
ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing ...
ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing ...ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing ...
ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing ...
 
Introduction to Linked Data Platform (LDP)
Introduction to Linked Data Platform (LDP)Introduction to Linked Data Platform (LDP)
Introduction to Linked Data Platform (LDP)
 
Linked data HHS 2015
Linked data HHS 2015Linked data HHS 2015
Linked data HHS 2015
 
Linking Open Government Data at Scale
Linking Open Government Data at Scale Linking Open Government Data at Scale
Linking Open Government Data at Scale
 
Introduction to W3C Linked Data Platform
Introduction to W3C Linked Data PlatformIntroduction to W3C Linked Data Platform
Introduction to W3C Linked Data Platform
 
Introduction to RDF & SPARQL
Introduction to RDF & SPARQLIntroduction to RDF & SPARQL
Introduction to RDF & SPARQL
 

Scaling up Linked Data

  • 1. Scaling up Linked Data. Presented by: Marin Dimitrov (Ontotext)
  • 2. EUCLID Objective [architecture diagram; labels: Data acquisition, Streaming providers, Downloads, Musical Content, Metadata, Other content, LD Wrapper, Physical Wrapper, RDF/XML, Integrated Dataset, Vocabulary Mapping, Interlinking, Cleansing, R2R Transf., Publishing, LD Dataset, SPARQL Endpoint, RDFa, Access, Application, Analysis & Mining Module, Visualization Module]
  • 3. Motivation: Music!
      • Our aim: build a music-based portal using Linked Data technologies
      • So far, we have studied different mechanisms for:
          – Linked Data management via SPARQL queries
          – Reasoning over Linked Data
          – Linked Data access (RDF dumps, endpoints, RDFa)
          – Linked Data storage in repositories
      • In this chapter, we will study current research and technologies to scale up to very large volumes of Linked Data
      • (Chapter markers on the slide: CH 1, CH 2, CH 3, CH 5)
  • 4. Agenda
      1. Introduction to Big (Linked) Data
      2. NoSQL databases for Linked Data
      3. Hadoop for Linked Data
      4. Stream processing for Linked Data
      5. … and more
  • 5. INTRODUCTION TO BIG (LINKED) DATA
  • 6. Introduction to Big Data
      • Big Data: management of data that is “too complex” to be processed with traditional solutions
      • “Big” does not stand primarily for size, but is an analogy for “overwhelming”
      • “Big” can mean “high variety”, “high volume” or “high velocity”
  • 7. The 3 Vs of Big Data
      • Variety: different forms of data
      • Volume: petabytes of data
      • Velocity: real-time data streams
  • 8. The 3 Vs of Big Data
      • Variety
          – Data characteristic: structured, semi-structured and unstructured
          – Challenge: data integration
          – Solution: semantic technologies are a good fit
      • Volume
          – Data characteristic: large volumes of data
          – Challenge: reasoning and querying
          – Solution: distributed storage & processing, parallel processing
      • Velocity
          – Data characteristic: streams, sensors, near real-time data, IoT
          – Challenge: reasoning & querying
          – Solution: stream reasoning & querying
  • 9. The Extended Vs of Big Data (in addition to Variety, Volume, Velocity)
      • Veracity: uncertainty of the data
      • Variability: variation in meaning in different contexts
      • Value: turning data into information into insight
      • These are not easy to measure
      • They depend on context and intended use
      • Linked Data & Semantic Technologies can help
  • 10. Beyond Big Data
  • 11. Beyond Big Data (2): Semantic Technologies
      “Semantic technologies extract meaning from data, ranging from quantitative data and text, to video, voice and images. Many of these techniques have existed for years and are based on advanced statistics, data mining, machine learning and knowledge management. One reason they are garnering more interest is the renewed business requirement for monetizing information as a strategic asset. Even more pressing is the technical need. Increasing volumes, variety and velocity — big data — in IM and business operations, requires semantic technology that makes sense out of data for humans, or automates decisions”
      Source: Gartner Inc. “Gartner Identifies Top Technology Trends Impacting Information Infrastructure in 2013”
  • 12. Towards Big Linked Data
      • Variety: this characteristic is the most inherent to Linked Data
          – Agile data model
          – Different vocabularies
      • Volume [figure with snapshots labelled 2007, 2008, 2009, 2010, 2011]
      • Velocity
          – RDF streams
          – Semantic sensors
  • 13. Towards Big Linked Data (2)
  • 14. Big Linked Data & Linked Big Data
      • Big Linked Data
          – Exponential growth of Linked Data in the last five years
          – Big Data approach adopted by the Linked Data community, especially to handle Volume and Velocity
      • Linked Big Data
          – Linked Data approach adopted by the Big Data community
          – RDF data model for Variety
          – Enrich Big Data with metadata and semantics
          – Interlink Big Data sets & reduce duplication
          – Simplify data access, discovery & integration
      Source: M. Dimitrov. “Semantic Technologies for Big Data”
  • 15. NOSQL DATABASES FOR LINKED DATA
  • 16. RDF Databases
      • Native or RDBMS-based RDF databases
          – OWLIM (http://www.ontotext.com/owlim)
          – Virtuoso Universal Server (http://virtuoso.openlinksw.com/)
          – Stardog (http://stardog.com)
          – AllegroGraph (http://www.franz.com/agraph/allegrograph/)
          – Systap Bigdata (http://www.systap.com/)
          – Jena TDB (http://jena.apache.org/documentation/tdb/)
          – Oracle, DB2
  • 17. RDF Database Advantages
      • RDF (graph) based data model
          – Global identifiers of resources/entities
          – Agile schema
      • Inference of implicit facts
          – Forward, backward or hybrid reasoning strategy
      • Expressive query language (SPARQL)
      • Compliance with standards
  • 18. NoSQL Databases
      • “Not Only SQL”
      • A group of database technologies that do not follow the relational data model
      • Typical requirements
          – Distributed
          – High availability
          – Handle big data & query volumes (scalability)
          – Hierarchical or graph data structures
          – Flexible schema
  • 19. NoSQL Taxonomy (conceptual structures)
      • Key/value stores
          – Each key is associated with a value (DHT)
      • Wide-column stores
          – Each key is associated with many attributes; columns are stored together
          – Example columns: Artist, Album, Song; rows: (The Beatles, Let it be, Get back), (Queen, Jazz, Fun it)
      • Document databases
          – Each key is associated with a complex data structure (structured document)
      • Graph databases
          – Data is represented as nodes and edges (data items connected by relationships)
  • 20. Key/Value Stores
      • Efficient key/value lookups
      • Schema-less
      • Simpler read/write operations
          – Low latency & high throughput
      • Examples
          – DynamoDB, Azure Table Storage, Riak, Redis, MemcacheDB, Voldemort
  • 21. Wide-Column Stores
      • A key is associated with several attributes
      • Data in the same column is stored together
      • Efficient for complex aggregations over data
      • Schema-less / dynamic schema
      • Easy to add new columns
      • Columns can be grouped together (column family)
      • Example table: columns Artist, Album, Song; rows (The Beatles, Let it be, Get back), (Queen, Jazz, Fun it)
      • Examples:
          – HBase (http://hbase.apache.org)
          – Cassandra (http://cassandra.apache.org)
  • 22. HBase
      • Open source column-oriented store
      • Based on Google’s BigTable
      • Built on top of HDFS and Hadoop
      • Horizontally scalable, automatic sharding
      • High availability / automatic failover
      • Strongly consistent reads/writes
      • Java/REST API
  • 23. Document Databases
      • Each key is associated with a complex data structure (document)
      • Documents can contain key/value pairs, key/array pairs, or even nested structures
      • Schema-less / dynamic schema
          – New fields can be easily added to the document structure
      • Typical document formats
          – JSON, XML
      • Examples:
          – Couchbase (http://www.couchbase.com)
          – MongoDB (http://www.mongodb.org)
  • 24. Document Databases (2)
      Example (two documents, keyed by artist name):
      The Beatles:
      {
        "Homepage": "thebeatles.com",
        "Origin": "Liverpool",
        "Albums": [
          {"Title": "Let it be", "Year": "1970", "Duration": "35:16"},
          {"Title": "Help!", "Year": "1965"},
          {"Title": "Revolver", "Year": "1966", "Duration": "35:01"}
        ]
      }
      Elvis Presley:
      {
        "FullName": "Elvis Aaron Presley",
        "Homepage": "elvis.com",
        "Origin": "Memphis",
        "Albums": [
          {"Title": "Blue Hawaii", "Year": "1961", "Duration": "32:02"}
        ]
      }
  • 25. Couchbase
      • Document-oriented database
          – Documents are stored as JSON
      • Flexible schema
          – Document structure is easy to change
      • Optimised to run in-memory and on several nodes
          – Ejection and eventual persistence
      • Incremental views & indexes
      • Scalability, rebalancing, replication, failover
      • RESTful API
  • 26. Graph Databases: Motivation
      • Graphs: representation of highly connected data
      • Examples shown: network of friends in a high school; relationships among artists in Last.fm (http://sixdegrees.hu/last.fm/); a fragment of Facebook; relationships between tweets
  • 27. Graph Databases
      • Based on the property graph model
      • Support for query languages and core graph-based tasks
          – Reachability, traversal, adjacency and pattern matching
      • Examples
          – Neo4j (http://neo4j.org)
          – Dex (http://sparsity-technologies.com/dex.php)
          – HyperGraphDB (http://www.hypergraphdb.org)
  • 28. Graph Databases. Example: Property Graph Model
      • Nodes and edges may have properties
      • Properties: key-value pairs
      • Artist nodes in the figure:
          – The Beatles (Homepage: thebeatles.com, Origin: Liverpool)
          – Elvis Presley (Fullname: Elvis Aaron Presley, Homepage: elvis.com, Origin: Memphis)
      • Album nodes in the figure, linked from artists by “created” edges:
          – Let it be (Year: 1970, Duration: 35:16)
          – Help! (Year: 1965)
          – Revolver (Year: 1966, Duration: 35:01)
          – An album node with Year: 1961, Duration: 32:02
  • 29. Neo4j
      • Graph database
          – Nodes, Relationships, Properties, Paths
          – Indexes over properties
      • Flexible schema
      • Cypher graph query language
      • ACID transactions
      • High availability, distributed clusters
      • RESTful and Java APIs
  • 30. Rya
      • RDF store based on Accumulo
          – Column store, HDFS
          – Sesame query parser, SAIL implementation
      • 3-table index
          – SPO, POS, OSP
          – Sufficient for all triple patterns
          – All triple parts (S, P, O) are encoded in the RowID
          – Clustered index
      Source: R. Punnoose, A. Crainiceanu, D. Rapp “Rya: A Scalable RDF Triple Store for the Clouds”
  • 31. Rya (2)
      • Query processing
          – Sesame (SPARQL) query plan is translated to Accumulo range scans & lookups
          – Parallel scans for joins (10–20x speedup)
          – Batch scans (Accumulo) to reduce the number of range scans
          – Statistics for triple pattern selectivity, query re-ordering
      • Performance evaluation (LUBM)
          – No significant degradation when the data grows by 2–3 orders of magnitude
      Source: R. Punnoose, A. Crainiceanu, D. Rapp “Rya: A Scalable RDF Triple Store for the Clouds”
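The claim that three tables (SPO, POS, OSP) suffice for every triple pattern can be illustrated with a small in-memory sketch. This is not Rya's code: the class `TripleIndex`, the function `choose_index` and the `\x00` separator are hypothetical names chosen for illustration, and a sorted Python list stands in for an Accumulo table with range scans.

```python
import bisect

SEP = "\x00"  # assumed separator; terms must not contain it


def choose_index(s, p, o):
    """Pick the table whose row-key prefix covers the bound terms."""
    if s is not None and o is not None and p is None:
        return "OSP"          # O and S form a prefix of OSP
    if s is not None:
        return "SPO"          # S, S+P or S+P+O
    if p is not None:
        return "POS"          # P or P+O
    if o is not None:
        return "OSP"          # O only
    return "SPO"              # nothing bound: full scan


class TripleIndex:
    """Three sorted key lists standing in for the SPO, POS and OSP tables."""

    def __init__(self):
        self.tables = {"SPO": [], "POS": [], "OSP": []}

    def add(self, s, p, o):
        terms = {"S": s, "P": p, "O": o}
        for name, rows in self.tables.items():
            # The whole triple is encoded in the row key, in table order.
            bisect.insort(rows, SEP.join(terms[c] for c in name))

    def match(self, s=None, p=None, o=None):
        name = choose_index(s, p, o)
        terms = {"S": s, "P": p, "O": o}
        prefix_parts = []
        for c in name:
            if terms[c] is None:
                break
            prefix_parts.append(terms[c])
        prefix = SEP.join(prefix_parts)
        if 0 < len(prefix_parts) < 3:
            prefix += SEP
        rows = self.tables[name]
        results = []
        # Range scan: start at the first key >= prefix, stop when it ends.
        for key in rows[bisect.bisect_left(rows, prefix):]:
            if not key.startswith(prefix):
                break
            parts = dict(zip(name, key.split(SEP)))
            if all(terms[c] is None or parts[c] == terms[c] for c in "SPO"):
                results.append((parts["S"], parts["P"], parts["O"]))
        return results
```

Every combination of bound terms maps to a key prefix in one of the three tables, so each triple pattern becomes a single contiguous scan.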
  • 32. “NoSQL Databases for RDF: An Empirical Evaluation”
      • Goal
          – Store RDF data in HBase, Couchbase, Hive & Cassandra
          – Benchmark query performance against a native distributed RDF database (4store)
      • HBase prototype
          – Jena for SPARQL queries
          – 3 index tables (SPO, POS, OSP)
          – Row key encodes S+P+O, cells are empty
          – Jena query plan is translated to HBase filters & lookups
      Source: Cudre-Mauroux et al. “NoSQL Databases for RDF: An Empirical Evaluation”
  • 33. “NoSQL Databases for RDF: An Empirical Evaluation” (2)
      • Hive+HBase prototype
          – SPARQL to HiveQL translation
          – Property table
              • Row key is S
              • A column for each P
              • Cell value stores O
              • Multi-valued attributes have different timestamps
      Source: Cudre-Mauroux et al. “NoSQL Databases for RDF: An Empirical Evaluation”
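The property-table layout described for the Hive+HBase prototype can be sketched in a few lines. This is only an illustration under assumptions: `PropertyTable` is a hypothetical class, nested dictionaries stand in for HBase rows and columns, and a simple counter stands in for cell timestamps.

```python
from collections import defaultdict
from itertools import count


class PropertyTable:
    """Row key = subject, one column per predicate, cell value = object.

    Several objects for the same (subject, predicate) are kept as
    separate versions of the cell, distinguished by timestamp.
    """

    def __init__(self):
        # subject -> predicate -> {timestamp: object}
        self.rows = defaultdict(lambda: defaultdict(dict))
        self._clock = count(1)  # assumed stand-in for real timestamps

    def put(self, s, p, o):
        self.rows[s][p][next(self._clock)] = o

    def get(self, s, p):
        """Return all objects for (s, p), oldest version first."""
        versions = self.rows.get(s, {}).get(p, {})
        return [versions[ts] for ts in sorted(versions)]
```

Fetching every property of one subject is then a single row lookup, which is the access pattern this layout favours.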
  • 34. “NoSQL Databases for RDF: An Empirical Evaluation” (3)
      • CumulusRDF prototype
          – Sesame for SPARQL queries, Cassandra for data management
          – 3 index tables (SPO, POS, OSP)
          – Sesame query plan is translated to Cassandra index lookups
      • Couchbase prototype
          – Map RDF into JSON documents
              • All triples with the same S are stored in the same document (molecule)
              • 2 JSON arrays for Ps and Os
          – Jena as the SPARQL query engine
          – 3 indexes (Couchbase views): SPO, POS, OSP
      Source: Cudre-Mauroux et al. “NoSQL Databases for RDF: An Empirical Evaluation”
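A minimal sketch of the molecule mapping used by the Couchbase prototype, assuming the two arrays are parallel (the i-th predicate pairs with the i-th object). The field names `"p"` and `"o"` and both function names are hypothetical; the paper's actual document layout may differ.

```python
import json


def triples_to_molecules(triples):
    """Group triples by subject into one JSON-style document per subject."""
    docs = {}
    for s, p, o in triples:
        doc = docs.setdefault(s, {"p": [], "o": []})
        doc["p"].append(p)   # predicates array
        doc["o"].append(o)   # objects array, aligned with predicates
    return docs


def molecule_to_triples(s, doc):
    """Rebuild the triples of one subject from its document."""
    return [(s, p, o) for p, o in zip(doc["p"], doc["o"])]


if __name__ == "__main__":
    triples = [
        ("beatles", "origin", "Liverpool"),
        ("beatles", "created", "Let it be"),
        ("elvis", "origin", "Memphis"),
    ]
    print(json.dumps(triples_to_molecules(triples), indent=2))
```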
  • 35. “NoSQL Databases for RDF: An Empirical Evaluation” (4)
      • Benchmarks
          – BSBM with 10M, 100M and 1B triples
          – Clusters of 1, 2, 4, 8 and 16 nodes
          – AWS cost & query execution time
      Source: Cudre-Mauroux et al. “NoSQL Databases for RDF: An Empirical Evaluation”
  • 36. “NoSQL Databases for RDF: An Empirical Evaluation” (5)
      • Results
          – Simple SPARQL queries can be executed more efficiently on a NoSQL datastore
          – Data loading time for some NoSQL datastores is comparable to or better than that of the native RDF store
          – Complex SPARQL queries perform significantly slower on NoSQL systems
              • Query optimisations are required
          – MapReduce operations (Hive & Couchbase) introduce high latency for view maintenance / query execution
      Source: Cudre-Mauroux et al. “NoSQL Databases for RDF: An Empirical Evaluation”
  • 37. HADOOP FOR LINKED DATA
  • 38. Working with Distributed Data
      • Apache Hadoop (http://hadoop.apache.org) is an open source implementation of MapReduce
      • MapReduce
          – Distributed batch processing
          – Map phase partitions the input set (K/V pairs), Reduce phase performs aggregated processing over the partitions in parallel
          – Shuffle intermediate results (from Map nodes to Reduce nodes)
      • Allows for the processing of large distributed data sets across clusters of computers
          – On a distributed file system (HDFS)
          – Scales up to thousands of nodes, each offering local processing power and storage
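The map, shuffle and reduce phases can be sketched on a single machine as follows. This is a toy model, not the Hadoop API: all function names are hypothetical, and the example job (counting triples per predicate) is an assumption chosen to fit the deck's music data.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every input record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group intermediate values by key, as done between Map and Reduce nodes."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Aggregate each key's values; in a cluster, keys are processed in parallel."""
    return {key: reducer(key, values) for key, values in groups.items()}


def run_job(records, mapper, reducer):
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


# Example job: count how many triples use each predicate.
def predicate_mapper(triple):
    _s, p, _o = triple
    yield (p, 1)


def count_reducer(_key, values):
    return sum(values)
```

For example, `run_job(triples, predicate_mapper, count_reducer)` returns a dictionary from predicate to triple count.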
  • 39. “Scalable Distributed Reasoning with MapReduce”
      • Goal
          – Utilise Hadoop for large-scale reasoning
      • Approach
          – Implement each RDFS rule (join) via a Map & Reduce function
          – Map outputs the original triple as value, and the join term as key
          – Reducer receives all triples needed to perform the join
      Source: Urbani et al. “Scalable Distributed Reasoning with MapReduce”
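As an illustration of this approach, the sketch below applies one RDFS rule, rdfs9 (if x has type C and C is a subclass of D, then x has type D), as a single map and reduce pass. It is a single-machine toy, not the authors' implementation: the function names are hypothetical, the prefixed names are plain strings, and reaching the full closure would require repeating the pass until nothing new is derived.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def map_rdfs9(triple):
    """Emit the triple under its join term: the class C shared by both premises."""
    s, p, o = triple
    if p == RDF_TYPE:
        yield (o, triple)      # (x rdf:type C): join on the object
    elif p == SUBCLASS:
        yield (s, triple)      # (C rdfs:subClassOf D): join on the subject


def reduce_rdfs9(_join_term, triples):
    """All triples sharing one class arrive together; perform the join."""
    instances = [s for s, p, _o in triples if p == RDF_TYPE]
    superclasses = [o for _s, p, o in triples if p == SUBCLASS]
    for x in instances:
        for d in superclasses:
            yield (x, RDF_TYPE, d)


def apply_rdfs9(triples):
    """One map/shuffle/reduce pass; returns only newly derived triples."""
    groups = defaultdict(list)
    for triple in triples:
        for key, value in map_rdfs9(triple):
            groups[key].append(value)
    derived = set()
    for key, values in groups.items():
        derived.update(reduce_rdfs9(key, values))
    return derived - set(triples)
```

Collecting the output in a set removes duplicate derivations, which in a real cluster is a separate and costly step.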
  • 40. “Scalable Distributed Reasoning with MapReduce” (2)
      Source: Urbani et al. “Scalable Distributed Reasoning with MapReduce”
  • 41. “Scalable Distributed Reasoning with MapReduce” (3)
      • Challenge
          – Too many duplicates (unique to derived triple ratio of 1:50)
      • Optimisations
          – Replicate schema triples on each node (in memory)
              • Needed for each join; usually a small set
          – Rule re-ordering
              • Which rule may be triggered by another rule?
              • Reduce the number of required iterations
      Source: Urbani et al. “Scalable Distributed Reasoning with MapReduce”
  • 42. “Scalable Distributed Reasoning with MapReduce” (4)
      • Results
          – Throughput of 4.5M triples/sec on a 16-node cluster
          – More than 16 nodes do not improve the performance significantly
      Source: Urbani et al. “Scalable Distributed Reasoning with MapReduce”
  • 43. Lessons Learned from Large-scale Reasoning (J. Urbani)
      • 1st Law: Treat schema triples differently
          – Replicate on all nodes to minimise subsequent data transfer
      • 2nd Law: Data skew dominates data distribution
          – No universal partitioning scheme for input data
          – Computation tasks are moved to the nodes storing the data (data locality)
      • 3rd Law: Certain problems only appear at a very large scale
          – Proof-of-concept prototypes are often not representative
      Source: Jacopo Urbani “Three Laws Learned from Web-scale Reasoning”
  • 44. STREAM PROCESSING FOR LINKED DATA
  • 45. Streaming Data • A large amount of new data is constantly being created or data is being updated at a rapid rate – Traffic data, sensor networks, social networks, financial markets • Many data sources create a constant “stream of information” – Not always practical to store all data and then query it – Continuous queries over transient data • More recent data is more important – Describes the current state of a dynamic system
  • 46. Stream Processing • Streams are observed through windows • Continuous queries can be registered over the stream • Continuous queries are iteratively evaluated over the data in the current window – Can leverage static background knowledge (e.g., schema information) • Generates a stream of answers [Figure: window over the stream along the time axis, background knowledge, continuous query, stream of answers]
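The evaluation loop described above can be sketched as follows. This is a minimal illustration under assumptions: the stream is a list of `(item, timestamp)` pairs sorted by timestamp, a window at time `t` covers the half-open interval `(t - width, t]`, and all names (`windows`, `continuous_query`, `TOLL_DISTRICT`, the toll readings) are hypothetical.

```python
# Static background knowledge: which district each toll gate belongs to (hypothetical).
TOLL_DISTRICT = {"toll1": "district1", "toll2": "district2"}

def windows(stream, width, step):
    """Yield (t, contents) every `step` time units; contents have t - width < timestamp <= t."""
    if not stream:
        return
    t = stream[0][1]
    last = stream[-1][1]
    while t < last + step:
        yield t, [item for item, ts in stream if t - width < ts <= t]
        t += step

def continuous_query(stream, width, step, query):
    """Re-evaluate the registered query over each window, producing a stream of answers."""
    for t, contents in windows(stream, width, step):
        yield t, query(contents)

def cars_per_district(contents):
    """The query: join window contents with the background knowledge."""
    return {(TOLL_DISTRICT[toll], car) for toll, car in contents}

# Hypothetical stream of (toll gate, car) readings with timestamps in seconds.
readings = [
    (("toll1", "carA"), 0),
    (("toll2", "carB"), 10),
    (("toll1", "carC"), 25),
]

answers = list(continuous_query(readings, 20, 10, cars_per_district))
```

Note how `carA` appears in the answers at times 0 and 10 and then drops out once its reading leaves the 20-second window.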
  • 47. Linked Stream Data • A representation of sensor/stream data following the Linked Data principles – Sensor data can be enriched with semantics – Facilitates data discovery and integration of heterogeneous data sources • Challenges – RDF triples must be annotated with timestamps – Extensions to the SPARQL language: windows, continuous queries, streaming operators – Continuous semantics – Scalability (Volume) – High throughput and low latency (Velocity) – Approximate reasoning
  • 48. Querying Streams with SPARQL Extensions • Queries over streaming data are specified as continuous queries • The results of a continuous query are updated as new data arrives • Several SPARQL extensions with streaming operators based on CQL (Continuous Query Language) – C-SPARQL – SPARQLStream – EP-SPARQL, CQELS, INSTANS
  • 49. C-SPARQL (1) C-SPARQL is an extension of SPARQL 1.1
      1. RDF Streams: Sequence of RDF triples annotated with timestamps: <(s,p,o), timestamp>
      2. FROM STREAM extension for stream sources and windows
         FromStrClause → 'FROM' ['NAMED'] 'STREAM' StreamIRI '[RANGE' Window ']'
         Window → LogicalWindow | PhysicalWindow
         LogicalWindow → Number TimeUnit WindowOverlap
         TimeUnit → 'MSEC' | 'SEC' | 'MIN' | 'HOUR' | 'DAY'
         WindowOverlap → 'STEP' Number TimeUnit | 'TUMBLING'
         PhysicalWindow → 'TRIPLES' Number
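The difference between the window types in the grammar can be sketched in a few lines. This is an illustration only: the exact boundary semantics of a C-SPARQL engine may differ, the half-open interval `[t - width, t)` is an assumption, and the function names are hypothetical. A logical window is defined by time (with a `STEP` or `TUMBLING` overlap), a physical window by a number of triples.

```python
def logical_windows(stream, width, step):
    """Time-based windows of length `width`, advancing by `step` (RANGE width STEP step)."""
    if not stream:
        return []
    result = []
    t = stream[0][1] + width          # upper bound of the first window
    last = stream[-1][1]
    while t - width <= last:
        result.append([item for item, ts in stream if t - width <= ts < t])
        t += step
    return result

def tumbling_windows(stream, width):
    """TUMBLING: consecutive windows that do not overlap, i.e. step equals width."""
    return logical_windows(stream, width, width)

def physical_window(stream, n):
    """TRIPLES n: the n most recent stream items, regardless of their timestamps."""
    return stream[-n:] if n > 0 else []

# Hypothetical stream of (item, timestamp) pairs, sorted by timestamp.
stream = [("a", 0), ("b", 5), ("c", 10), ("d", 15)]

sliding = logical_windows(stream, 10, 5)
tumbling = tumbling_windows(stream, 10)
latest_two = physical_window(stream, 2)
```

With a step smaller than the width, an item such as `"b"` is seen by more than one window; with tumbling windows every item is seen exactly once.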
  • 50. C-SPARQL (2) 3. Registration • Creates a continuous query over the data source • The query output is variable bindings, an RDF graph, or a new stream
         Registration → 'REGISTER' ('QUERY'|'STREAM') QName 'AS' Query
  • 51. C-SPARQL (3) Example Query: Retrieve each car together with the district in which a toll gate registered it.
      REGISTER QUERY CarsEnteringInDistricts AS
      SELECT DISTINCT ?district ?car
      FROM STREAM <www.uc.eu/tollgates.trdf> [RANGE 40 SEC STEP 10 SEC]
      WHERE {
        ?toll t:registers ?car .
        ?toll c:placedIn ?street .
        ?district c:contains ?street .
      }
    Source: Barbieri, Davide Francesco, et al. "Querying RDF streams with C-SPARQL." ACM SIGMOD Record 39.1 (2010): 20-26.
  • 52. C-SPARQL (4) Source: M. Balduini et al. “Tutorial on Stream Reasoning for Linked Data (ISWC’2013)”
  • 53. SPARQLStream (1) • Utilizes the same definition of RDF streams as in C-SPARQL: <(s,p,o), timestamp> • The language is defined as follows:
         NamedStream → 'FROM' ['NAMED'] 'STREAM' StreamIRI '[' Window ']'
         Window → 'NOW-' Integer TimeUnit [UpperBound] [Slide]
         UpperBound → 'TO NOW-' Integer TimeUnit
         Slide → 'SLIDE' Integer TimeUnit
         TimeUnit → 'MS' | 'S' | 'MINUTES' | 'HOURS' | 'DAY'
         Select → 'SELECT' [XStream] [DISTINCT | REDUCED] …
         XStream → 'ISTREAM' | 'DSTREAM' | 'RSTREAM'
    Source: Jean-Paul Calbimonte and Oscar Corcho. “SPARQLStream: Ontology-based access to data streams.” Tutorial at ISWC 2013
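The three `XStream` operators come from CQL, where they turn the result of a window evaluation back into a stream. A minimal sketch of their usual CQL meaning, assuming query results are represented as Python sets and compared between two consecutive evaluations (the function names and sample bindings are hypothetical):

```python
def rstream(previous, current):
    """RSTREAM: emit every result of the current evaluation."""
    return set(current)

def istream(previous, current):
    """ISTREAM: emit only results that are new compared to the previous evaluation."""
    return set(current) - set(previous)

def dstream(previous, current):
    """DSTREAM: emit only results that were present before and have now disappeared."""
    return set(previous) - set(current)

# Hypothetical query results for two consecutive window evaluations.
results_t1 = {("sensor1", "obs1"), ("sensor2", "obs2")}
results_t2 = {("sensor2", "obs2"), ("sensor3", "obs3")}

all_now = rstream(results_t1, results_t2)
inserted = istream(results_t1, results_t2)
deleted = dstream(results_t1, results_t2)
```

`ISTREAM` suits consumers that only want changes, while `RSTREAM` repeats unchanged results at every evaluation.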
  • 54. SPARQLStream (2) Example Query: Retrieve an RSTREAM with the observations captured by all sensors in the last 10 minutes.
      PREFIX ssn: <http://purl.oclc.org/NET/ssnx/ssn#>
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      SELECT RSTREAM ?sensor ?observation
      FROM STREAM <www.semsorgrid4env.eu/SensorReadings.srdf> [FROM NOW - 10 MINUTES TO NOW STEP 1 MINUTE]
      WHERE {
        ?observation a ssn:Observation;
                     ssn:observedBy ?sensor .
      }
  • 55. Classification of Existing Systems Source: M. Balduini et al. “Tutorial on Stream Reasoning for Linked Data (ISWC’2013)”
  • 56. W3C Semantic Sensor Networks • SSN Ontology – http://www.w3.org/2005/Incubator/ssn/ssnx/ssn – OWL DL ontology used to semantically describe sensors, sensor networks and data – Recommendations for applying the ontology for Linked Sensor Data
  • 57. W3C Semantic Sensor Networks (2) • Different perspectives – Sensor, data/observation, system
  • 58. … AND MORE
  • 59. A Trillion RDF Triples • Use case – Use RDF and Linked Data for the customer management database of a big telecom – Franz Inc / AllegroGraph
  • 60. uRiKA Appliance • YarcData • Big Data appliance for graph analytics – 8K processors, 1TB RAM – In-memory RDF database – SPARQL 1.1 support
  • 61. RDFS Reasoning on GPUs • Similar approach to Urbani et al. for large scale reasoning with Hadoop – Handle rules with 2 antecedents – Rule reordering – Dictionary encoding • Shared-memory architecture – Efficient GPU algorithm implementation is challenging Source: Norman Heino & Jeff Z. Pan “RDFS Reasoning on Massively Parallel Hardware” ISWC 2012
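Dictionary encoding replaces every RDF term by a compact integer ID, so that triples become fixed-width tuples of integers that are cheap to compare, join and move between memories. The sketch below shows the general technique only; it is not the encoding scheme of Heino & Pan or Urbani et al., and the `Dictionary` class and sample triples are hypothetical.

```python
class Dictionary:
    """Bidirectional mapping between RDF terms and dense integer IDs."""

    def __init__(self):
        self._ids = {}     # term -> id
        self._terms = []   # id -> term

    def encode(self, term):
        """Return the ID of a term, assigning the next free ID on first sight."""
        if term not in self._ids:
            self._ids[term] = len(self._terms)
            self._terms.append(term)
        return self._ids[term]

    def decode(self, term_id):
        """Return the term for a previously assigned ID."""
        return self._terms[term_id]

    def encode_triple(self, triple):
        return tuple(self.encode(term) for term in triple)

    def decode_triple(self, encoded):
        return tuple(self.decode(term_id) for term_id in encoded)

# Hypothetical input triples.
triples = [
    ("ex:a", "rdf:type", "ex:Band"),
    ("ex:b", "rdf:type", "ex:Band"),
]

dictionary = Dictionary()
encoded = [dictionary.encode_triple(t) for t in triples]
decoded = [dictionary.decode_triple(e) for e in encoded]
```

Reasoning then operates on the integer tuples, and results are decoded back to terms only when they are output.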
  • 62. RDFS Reasoning on GPUs (2) • Data parallelism – Apply one rule (thread) on one instance triple, join to a schema triple if possible – Hundreds / thousands of threads working in parallel • Challenge – Duplicate removal • Benchmark – x5 speedup of computation – But… memory transfer overhead is significant Source: Norman Heino & Jeff Z. Pan “RDFS Reasoning on Massively Parallel Hardware” ISWC 2012
  • 63. Benchmarks • BSBM v3.1 (April 2013) – http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/results/V7/ – Includes benchmarks with up to 150 billion triples – x750 scale increase since the last BSBM result (200M triples) • LDBC – Industry neutral, non-profit organisation – Benchmarks for RDF and graph databases, similar to TPC – Big data volume, complex queries
  • 64. SUMMARY
  • 65. Summary • Linked Data is a good fit for the Variety challenge of Big Data • Linked Data can simplify the data discovery, data access and data integration challenges of Big Data • Exponential growth of Linked Data • Linked Data benchmarks target bigger workloads
  • 66. Summary (2) • Ongoing R&D towards scaling up Linked Data for high data Volume and Velocity – NoSQL datastores for RDF data management – Hadoop for scalable RDF reasoning – GPUs for scalable RDF reasoning • Adapting Linked Data & SPARQL for streaming data scenarios
  • 67. For exercises, quizzes and further material, visit our website: http://www.euclid-project.eu • Other channels: @euclid_project, euclidproject