SlideShare uma empresa Scribd logo
LDBC Semantic Publishing Benchmark
Evolution
March 2015
LDBC Semantic Publishing Benchmark Evolution #1Mar 2015
• Motivation
• The New Reference Datasets
• Changes in the generated metadata
• Ontologies and Required Rule-sets
• The Concrete Inference patterns
• Changes in the Basic Interactive Query Mix
• GraphDB Configuration Disclosure
• Future plans
Outline
#2LDBC Semantic Publishing Benchmark Evolution Mar 2015
SPB needed to evolve in order to:
• Allow for retrieval of semantically relevant content
– Based on rich metadata descriptions and diverse reference knowledge
• Demonstrate that triplestores offer simplified and
more efficient querying
And this way to:
• Cover more advanced and pertinent usage of
triplestores
• Present a bigger challenge to the engines
Change is Needed
#3LDBC Semantic Publishing Benchmark Evolution Mar 2015
• Background and issues with SPB 1.0:
– “Reference data” = master data in Dynamic Semantic Publishing
• Essentially, taxonomies and entity datasets used to describe “creative works”
– SPB 1.0 has very small set of reference data
– Reference data is not interconnected – few relations between entities
– All descriptions of content assets (articles, pictures, etc.) refer to the
Reference Data, but not to one another
– Star-shaped graph with small reference dataset in the middle
• This way SPB is not using the full potential of graph databases
• Bigger and better connected Reference Data
– More reference data and more connections between the entities
• A better connected dataset
– Make cross references between the different pieces of content
Directions for Improvement
#4LDBC Semantic Publishing Benchmark Evolution Mar 2015
• Make use of inference
– In SPB 1.0 it is really trivial: couple of subclasses and sub-properties
– It would be relevant to have some transitive, symmetric and inverse
properties
– Such semantics can foster retrieval of relevant content
• More interesting and challenging query mixes
– This can go in plenty of different directions, but the most obvious one
is to evolve the queries so that they take benefit from changes above
– On the bigger picture, there should be a way to make some of the
queries nicer, i.e. simpler and cleaner
• Testing high-availability
– FT acceptance tests for HA cluster are great starting point
– Too ambitious for SPB 2.0
Directions for Improvement (2)
#5LDBC Semantic Publishing Benchmark Evolution Mar 2015
• Motivation
• The New Reference Datasets
• Changes in the generated metadata
• Ontologies and Required Rule-sets
• The Concrete Inference patterns
• Changes in the Basic Interactive Query Mix
• GraphDB Configuration Disclosure
• Future plans
Outline
#6LDBC Semantic Publishing Benchmark Evolution Mar 2015
• 22M explicit statements total in SPB v.2.0
– vs. 127k statements in SPB 1.0; BBC lists + tiny fractions of GeoNames
• Added reference data from DBPedia 2014
– Extracts alike: DESCRIBE ?e WHERE { ?e a dbp-ont:Company }
– Companies (85 000 entities)
– Events (50 000 entities)
– Persons (1M entities)
• Geonames data for Europe
– All European countries, w/o RUS, UKR, BLR; 650 000 locations total
– Some properties and locations types irrelevant to SPB excluded
• owl:sameAs links between DBpedia and Geonames
– 500k owl:sameAs mappings
Extended Reference Dataset
#7LDBC Semantic Publishing Benchmark Evolution Mar 2015
To
Company
To Person To Place /
gn:Feature
To Event
Company 40,797 26,675 218,636 18
Person 89,506 1,324,425 3,380,145 145,892
Event 5,114 154,207 140,579 35,442
Interconnected Reference Dataset
#8LDBC Semantic Publishing Benchmark Evolution
• Substantial volume of connections between entities
• Geonames comes with hierarchical relationship,
defining nesting of locations, gn:parentFeature
• DBPedia inter-entity relationships between entities*:
* numbers shown in table are approximate
Mar 2015
• GraphDB’s RDFRank feature is used to calculate a
rank for all the entities in the Reference Data
– RDFRank calculates a measure of importance for all URIs in an RDF
graph, based on Google’s PageRank algorithm
• These RDFRanks are calculated using GraphDB after
the entire Reference Data is loaded
– This way all sorts of relationships in the graph are considered during
calculations, including such that were logically inferred
• RDFRanks are inserted using predicate
<http://www.ldbcouncil.org/spb#hasRDFRank>
• These ranks are available as a part of the Ref. Data
– No need to get compute them again during loading
RDF-Rank for the Reference Data
#9LDBC Semantic Publishing Benchmark Evolution Mar 2015
• The Data Generator uses a set of “popular entities”
– Those are referred in 30% of the content-to-entity relations/tags
– This is one of the heuristics used by the data generator to produce
more realistic data distributions
– This is implemented in SPB 1.0, no change in SPB 2.0
• Popular entities are those with top 5% RDF Rank
– This is the new thing in SPB v.2.0
– Before that the popular entities were selected randomly
– This way, we get a more realistic dataset where those entities which
are more often used to tag content also have better connectivity in the
Reference Data
• In the future, RDF-ranks can be used for other
purposes also
RDF-Rank Used for Popular Entities
#10LDBC Semantic Publishing Benchmark Evolution Mar 2015
• Motivation
• The New Reference Datasets
• Changes in the generated metadata
• Ontologies and Required Rule-sets
• The Concrete Inference patterns
• Changes in the Basic Interactive Query Mix
• GraphDB Configuration Disclosure
• Future plans
Outline
#11LDBC Semantic Publishing Benchmark Evolution Mar 2015
• Generated three times more relationships between
Creative Works and Entities, than in SPB 1.0
– More recent use cases in publishing, adopt rich metadata descriptions
with more than 10 references to relevant entities and concepts
– In SPB 1.0 there are on average a bit less than 3 entity references per
creative work, based on distributions from an old BBC archive
– While developing SPB 2.0 we didn’t have access to substantial amount
of rich semantic metadata from publisher’s archive, to allow us derive
content-to-concept frequency distributions
– So, we decided to use the same distributions from BBC, but to triple
the concept references for each of them
– In SPB 2.0 there are on average about 8 content-to-concept references
– This results in about 25 statements/creative work description
– All other heuristics and specifics of the metadata generator were
preserved (popular entities, CW sequences alike storylines, etc.)
Changes in the Metadata
#12LDBC Semantic Publishing Benchmark Evolution Mar 2015
• Geonames URIs used for content-to-location
references
– As in SPB v.1.0, each creative work description refers (through
cwork:Mention property) to exactly one location
• These references are exploited in queries with geo-spatial constraints
– While most of the geo-spatial information in the Reference Data
comes from Geonames, for about 500 thousand locations there are
also DBPedia URIs, that come from the DBPedia-to-Geonames
owl:sameAs mappings
– Intentionally, DBPedia URIs are not used when metadata about
creative works is generated
• In contrast, the substitution parameters for locations in the corresponding queries
use only DBpedia locations
• This way the corresponding queries would return no results if proper SameAs
expansion doesn’t exist
Changes in the Metadata (2)
#13LDBC Semantic Publishing Benchmark Evolution Mar 2015
• As a result of changes in the size of the Reference
Data and the metadata size, there are differences in
the number of Creative Works included at the same
scale factor in SPB v.1.0 and in SPB v.2.0
Changes in the Metadata - Stats
#14LDBC Semantic Publishing Benchmark Evolution Mar 2015
SPB 1.0 SPB 2.0
Reference Data size (explicit statements) 170k 22M
Creative Work descr. size (explicit st./CW) 19 25
Metadata in 50M dataset (explicit statements) 50M 28M
Creative Works in 50M datasets (count) 2.6M 1.1M
Creative Work-to-Entity relationships in 50M 7M 9M
Metadata in 1B dataset (explicit statements) 1B 978M
Creative Works in 1B datasets (count) 53M 39M
Creative Work-to-Entity relationships in 1B 137M 313M
• Motivation
• The New Reference Datasets
• Changes in the generated metadata
• Ontologies and Required Rule-sets
• The Concrete Inference patterns
• Changes in the Basic Interactive Query Mix
• GraphDB Configuration Disclosure
• Future plans
Outline
#15LDBC Semantic Publishing Benchmark Evolution Mar 2015
The Rules Sets
LDBC Semantic Publishing Benchmark Evolution #16
• Complete inference in SPB 2.0 requires support for
the following primitives:
– RDFS: subPropertyOf, subClassOf
– OWL: TransitiveProperty, SymmetricProperty, sameAs
• Any triplestore with OWL 2 RL reasoning support will
be able to process SPB v2.0 correctly
– In fact, a much reduced rule-set is sufficient for complete reasoning in
as in OWL 2 RL there are host of primitives and rules that SPB 2.0 does
not make use of. Such examples are all onProperty restrictions, class
and property equivalence, property chains, etc.
– The popular (but not W3C standardized) OWL Horst profile is sufficient
for reasoning with SPB v 2.0
• OWL Horst refers to the pD* entailment defined by Herman Ter Horst in:
Completeness, decidability and complexity of entailment for RDF Schema and a
semantic extension involving the OWL vocabulary. J. Web Sem. 3(2-3): 79-115 (2005)
Mar 2015
• Modified versions of the BBC Ontologies
– As in SPB v.1.0, BBC Core ontology defines relationships between
Person – Organizations – Locations - Events
– Those were mapped to corresponding DBPedia and Geonames classes
and relationships
• bbccore:Thing is defined as super class of dbp-ont:Company, dbp-ont:Event,
foaf:Person, geonames-ontology:Feature (geographic feature)
– dbp-ont:parentCompany is defined to be owl:TransitiveProperty
– These extra definitions are added at the end of the BBC Core ontology
• Added a SPB-modified version of the Geonames
ontology
– Stripped out unnecessary pieces such onProperty restrictions that
impose cardinality constraints
New and Updated Ontologies
LDBC Semantic Publishing Benchmark Evolution #17Mar 2015
• Motivation
• The New Reference Datasets
• Changes in the generated metadata
• Ontologies and Required Rule-sets
• The Concrete Inference patterns
• Changes in the Basic Interactive Query Mix
• GraphDB Configuration Disclosure
• Future plans
Outline
#18LDBC Semantic Publishing Benchmark Evolution Mar 2015
• Transitive closure of location-nesting relationships
– geo-ont:parentFeature property is used in GeoNames to link each
location to larger locations that it is a part of
– geo-ont:parentFeature is owl:TransitiveProperty in the GN ontology
– This way if Munich has gn:parentFeature Bavaria, that on its turn has
gn:parentFeature Germany, than reasoning should make sure that
Germany also appears as gn:parentFeature of Munich
• Transitive closure of dbp-ont:ParentCompany
– Inference unveils indirect company control relationships
• owl:sameAs relations between Geonames features
and DBPedia URIs for the same locations:
– The standard semantics of owl:sameAs requires that each statement
asserted using the Geonames URIs should be inferable/retrievable also
with the mapped DBPedia URIs
Concrete Inference Patterns
LDBC Semantic Publishing Benchmark Evolution #19Mar 2015
• The inference patterns from SPB v.1.0 are still there:
– cwork:tag statements inferred from each of its sub-properties
cwork:about and cwork:mentions
– < ?cw rdf:type cwork:CreativeWork > statements inferred for
instances of each of its sub-classes
– These two are the only reasoning patterns that apply to the generated
content metadata; all the others apply only to the Reference Data
Concrete Inference Patterns (2)
LDBC Semantic Publishing Benchmark Evolution #20Mar 2015
• If brute-force materialization is used, the Reference
Data expands from 22M statements to about 100M
– If owl:sameAs expansion is not considered, materialization on top of
the Reference Data adds about 7M statements – most of those coming
from the transitive closure of geo-ont:parentFeature
– Several triplestores (e.g. ORACLE and GraphDB) that use forward-
chaining have specific mechanism, which allow them to handle
owl:sameAs reasoning in a sort of hybrid manner, without expanding
their indices. After materialization, such engines will have to deal with
29M statements of Reference Data, instead of 100M
• Brute-force materialization of the generated
metadata describing creative works would double it
– In SPB v.1.0 the expansion factor was 1.6, but now there are more
entity references and also owl:sameAs equivalents of the location URIs
Inference Statistics and Ratios
LDBC Semantic Publishing Benchmark Evolution #21Mar 2015
• Motivation
• The New Reference Datasets
• Changes in the generated metadata
• Ontologies and Required Rule-sets
• The Concrete Inference patterns
• Changes in the Basic Interactive Query Mix
• GraphDB Configuration Disclosure
• Future plans
Outline
#22LDBC Semantic Publishing Benchmark Evolution Mar 2015
Modified Queries in Basic Interactive
LDBC Semantic Publishing Benchmark Evolution #23
• SPB 2.0 changes only the Basic Interactive Query-Mix
– The other query mixes are mostly unchanged
• In most of the queries, cwork:about and
cwork:mention properties that link creative work to
an entity, have been replaced with their supper
cwork:tag
– This applies also to the Advanced Interactive and the Analytical use
cases
– For most of the queries, both types of relations are equally relevant
– Using the super-property make the query simpler; as compared to
other approaches to make a query that uses both (e.g. UNION)
Mar 2015
New Queries
LDBC Semantic Publishing Benchmark Evolution #24
• Basic Interactive query-mix now adds 2 new queries
exploring the interconnectedness in reference data
• Q10: Retrieve CWs that mention locations in the
same province (A.ADM1) as the specified one
– There is additional constraint on time interval (5 days)
• Q11: Retrieve the most recent CWs that are tagged
with entities, related to a specific popular entity
– Relations can be inbound and outbound; explicit or inferred
Mar 2015
Q10: News from the region
LDBC Semantic Publishing Benchmark Evolution #25
SELECT ?cw ?title ?dateModified {
<http://dbpedia.org/resource/Sofia> geo-ont:parentFeature ?province .
?province geo-ont:featureCode geo-ont:A.ADM1 .
{
?location geo-ont:parentFeature ?province .
} UNION {
BIND(?province as ?location) .
}
?cw a cwork:CreativeWork ;
cwork:tag ?location ;
cwork:title ?title ;
cwork:dateModified ?dateModified .
FILTER(?dateModified >=
"2011-05-14T00:00:00.000"^^<http://www.w3.org/2001/XMLSchema#dateTime>
&& ?dateModified <
"2011-05-19T23:59:59.999"^^<http://www.w3.org/2001/XMLSchema#dateTime>)
}
LIMIT 100
Mar 2015
Q11: News on about related entities
LDBC Semantic Publishing Benchmark Evolution #26
SELECT DISTINCT ?cw ?title ?description ?dateModified ?primaryContent
{
{
<http://dbpedia.org/resource/Teresa_Fedor> ?p ?e .
}
UNION
{
?e ?p <http://dbpedia.org/resource/Teresa_Fedor> .
}
?e a core:Thing .
?cw cwork:tag ?e ;
cwork:title ?title ;
cwork:description ?description ;
cwork:dateModified ?dateModified ;
bbc:primaryContentOf ?primaryContent .
}
ORDER BY DESC(?dateModified)
LIMIT 100
Mar 2015
Trivial Validation Opportunities
LDBC Semantic Publishing Benchmark Evolution #27
• The validation mechanisms from SPB v.1.0 remain
unchanged
• In addition to this, the changes to the benchmark are
designed such a way that in case that specific
inference patterns are not supported by the engine,
some of the queries will return zero results
– If owl:sameAs inference is not supported Q10 will return 0 results
– If transitive properties are not supported, Q10 will return 0 results
• Q11 will return smaller number of results (higher ratio of zeros also)
– If rdfs:subProperty is not supported, Q2-Q10 will return 0 results
– If rdfs:subClass is not supported, Q1, Q2, Q6, Q8 will return no results
Mar 2015
• Motivation
• The New Reference Datasets
• Changes in the generated metadata
• Ontologies and Required Rule-sets
• The Concrete Inference patterns
• The New Queries
• GraphDB Configuration Disclosure
• Future plans
Outline
#28LDBC Semantic Publishing Benchmark Evolution Mar 2015
1. Reasoning performed through forward-chaining
with a custom ruleset
– This is the rdfs-optimized ruleset of GraphDB with added rules to
support transitive, inverse and symmetric properties
2. owl:sameAs optimization of GraphDB is beneficial
– This is the standard behavior of GraphDB; but it helps a lot
– Query-time tuning is used to disable expansion of results with
respect to owl:sameAs equivalent URIs
3. Geo-spatial index in Q6 of the basic interactive mix
4. Lucene Connector for full-text search in Q8
Note: Points #2-#4 above help query time, but slow
down updates in GraphDB. As materialization does also
GraphDB Configuration
LDBC Semantic Publishing Benchmark Evolution #29Mar 2015
• Experimental run using GraphDB-SE 6.1
• Using LDBC Scale Factor 1 - (22M reference data +
30M generated data)
• Hardware: 32G RAM, Intel i7-4770 CPU @ 3.40GHz,
single SSD Drive Samsung 845 DC
• Benchmark configuration: 8 reading / 2 writing
agents, 1 min warmup, 40 min benchmark
• Updates/sec : 13, Selects /sec : 44
GraphDB – First Results
LDBC Semantic Publishing Benchmark Evolution #30Mar 2015
• Motivation
• The New Reference Datasets
• Changes in the generated metadata
• Ontologies and Required Rule-sets
• The Concrete Inference patterns
• Changes in the Basic Interactive Query Mix
• GraphDB Configuration Disclosure
• Future plans
Outline
#31LDBC Semantic Publishing Benchmark Evolution Mar 2015
• Relationships between pieces of content
– At present Creative Works are related only to Entities, not to other
CWs
– These could be StoryLines or Collections of MM assets, e.g. an article
with few images related to it
• Better modelling of content-to-entity cardinalities
– Based on data from FT
• Enriched query sets and realistic query frequencies
– Based on query logs from FT
• Loading/updating big dataset in live instances
• Testing high-availability
– FT acceptance tests for High Availability cluster are great starting point
Future Plans
#32LDBC Semantic Publishing Benchmark Evolution Mar 2015
• Much bigger reference dataset: from 170k to 22M
– 7M statement from GeoNames about Europe
– 14M statements from DBpedia for Companies, Persons, Events
• Interconnected reference data: more than 5M links
– owl:sameAs links between DBPedia and Geonames
• Much more comprehensive usage of inference
– While still the simplest possible flavor of OWL is used
• rdfs:subClassOf, rdfs:subPorpertyOf, owl:TransitiveProperty, owl:sameAs
– Transitive closure over company control and geographic nesting
– Simpler queries through usage of super-property
• Retrieval of relevant content through links in the
reference data
Summary of SPB v.2.0 Changes
#33LDBC Semantic Publishing Benchmark Evolution Mar 2015

Mais conteúdo relacionado

Mais procurados

19 09-26 source reconfiguration and august release features
19 09-26 source reconfiguration and august release features19 09-26 source reconfiguration and august release features
19 09-26 source reconfiguration and august release features
Dede Sabey
 
Census GSS Initiative (EPAN 2011)
Census GSS Initiative (EPAN 2011)Census GSS Initiative (EPAN 2011)
Census GSS Initiative (EPAN 2011)
WV Assocation of Geospatial Professionals
 
Dev Analytics Overview
Dev Analytics OverviewDev Analytics Overview
Dev Analytics Overview
Sunita Shrivastava
 
Semantika Introduction
Semantika IntroductionSemantika Introduction
Semantika Introduction
Josef Hardi
 
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
Databricks
 
R training at Aimia
R training at AimiaR training at Aimia
R training at Aimia
Ali Arsalan Kazmi
 
Search Quality Evaluation: a Developer Perspective
Search Quality Evaluation: a Developer PerspectiveSearch Quality Evaluation: a Developer Perspective
Search Quality Evaluation: a Developer Perspective
Andrea Gazzarini
 
Graph Features in Spark 3.0: Integrating Graph Querying and Algorithms in Spa...
Graph Features in Spark 3.0: Integrating Graph Querying and Algorithms in Spa...Graph Features in Spark 3.0: Integrating Graph Querying and Algorithms in Spa...
Graph Features in Spark 3.0: Integrating Graph Querying and Algorithms in Spa...
Databricks
 

Mais procurados (8)

19 09-26 source reconfiguration and august release features
19 09-26 source reconfiguration and august release features19 09-26 source reconfiguration and august release features
19 09-26 source reconfiguration and august release features
 
Census GSS Initiative (EPAN 2011)
Census GSS Initiative (EPAN 2011)Census GSS Initiative (EPAN 2011)
Census GSS Initiative (EPAN 2011)
 
Dev Analytics Overview
Dev Analytics OverviewDev Analytics Overview
Dev Analytics Overview
 
Semantika Introduction
Semantika IntroductionSemantika Introduction
Semantika Introduction
 
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
 
R training at Aimia
R training at AimiaR training at Aimia
R training at Aimia
 
Search Quality Evaluation: a Developer Perspective
Search Quality Evaluation: a Developer PerspectiveSearch Quality Evaluation: a Developer Perspective
Search Quality Evaluation: a Developer Perspective
 
Graph Features in Spark 3.0: Integrating Graph Querying and Algorithms in Spa...
Graph Features in Spark 3.0: Integrating Graph Querying and Algorithms in Spa...Graph Features in Spark 3.0: Integrating Graph Querying and Algorithms in Spa...
Graph Features in Spark 3.0: Integrating Graph Querying and Algorithms in Spa...
 

Destaque

20 billion triples in production
20 billion triples in production20 billion triples in production
20 billion triples in production
Ioan Toma
 
Lighthouse: Large-scale graph pattern matching on Giraph
Lighthouse: Large-scale graph pattern matching on GiraphLighthouse: Large-scale graph pattern matching on Giraph
Lighthouse: Large-scale graph pattern matching on Giraph
LDBC council
 
LDBC 6th TUC Meeting conclusions by Peter Boncz
LDBC 6th TUC Meeting conclusions by Peter BonczLDBC 6th TUC Meeting conclusions by Peter Boncz
LDBC 6th TUC Meeting conclusions by Peter Boncz
Ioan Toma
 
GRAPH-TA 2013 - RDF and Graph benchmarking - Jose Lluis Larriba Pey
GRAPH-TA 2013 - RDF and Graph benchmarking - Jose Lluis Larriba PeyGRAPH-TA 2013 - RDF and Graph benchmarking - Jose Lluis Larriba Pey
GRAPH-TA 2013 - RDF and Graph benchmarking - Jose Lluis Larriba Pey
Ioan Toma
 
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...E-Commerce and Graph-driven Applications: Experiences and Optimizations while...
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...
Ioan Toma
 
SADI: A design-pattern for “native” Linked-Data Semantic Web Services
SADI: A design-pattern for “native” Linked-Data Semantic Web ServicesSADI: A design-pattern for “native” Linked-Data Semantic Web Services
SADI: A design-pattern for “native” Linked-Data Semantic Web Services
Ioan Toma
 
Keynote IDEAS2013 - Peter Boncz
Keynote IDEAS2013 - Peter BonczKeynote IDEAS2013 - Peter Boncz
Keynote IDEAS2013 - Peter Boncz
Ioan Toma
 
Social Network Benchmark Interactive Workload
Social Network Benchmark Interactive WorkloadSocial Network Benchmark Interactive Workload
Social Network Benchmark Interactive Workload
Ioan Toma
 
LDBC SNB Benchmark Auditing
LDBC SNB Benchmark AuditingLDBC SNB Benchmark Auditing
LDBC SNB Benchmark Auditing
Ioan Toma
 

Destaque (9)

20 billion triples in production
20 billion triples in production20 billion triples in production
20 billion triples in production
 
Lighthouse: Large-scale graph pattern matching on Giraph
Lighthouse: Large-scale graph pattern matching on GiraphLighthouse: Large-scale graph pattern matching on Giraph
Lighthouse: Large-scale graph pattern matching on Giraph
 
LDBC 6th TUC Meeting conclusions by Peter Boncz
LDBC 6th TUC Meeting conclusions by Peter BonczLDBC 6th TUC Meeting conclusions by Peter Boncz
LDBC 6th TUC Meeting conclusions by Peter Boncz
 
GRAPH-TA 2013 - RDF and Graph benchmarking - Jose Lluis Larriba Pey
GRAPH-TA 2013 - RDF and Graph benchmarking - Jose Lluis Larriba PeyGRAPH-TA 2013 - RDF and Graph benchmarking - Jose Lluis Larriba Pey
GRAPH-TA 2013 - RDF and Graph benchmarking - Jose Lluis Larriba Pey
 
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...E-Commerce and Graph-driven Applications: Experiences and Optimizations while...
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...
 
SADI: A design-pattern for “native” Linked-Data Semantic Web Services
SADI: A design-pattern for “native” Linked-Data Semantic Web ServicesSADI: A design-pattern for “native” Linked-Data Semantic Web Services
SADI: A design-pattern for “native” Linked-Data Semantic Web Services
 
Keynote IDEAS2013 - Peter Boncz
Keynote IDEAS2013 - Peter BonczKeynote IDEAS2013 - Peter Boncz
Keynote IDEAS2013 - Peter Boncz
 
Social Network Benchmark Interactive Workload
Social Network Benchmark Interactive WorkloadSocial Network Benchmark Interactive Workload
Social Network Benchmark Interactive Workload
 
LDBC SNB Benchmark Auditing
LDBC SNB Benchmark AuditingLDBC SNB Benchmark Auditing
LDBC SNB Benchmark Auditing
 

Semelhante a Ldbc spb 2.0 evolution

SPARQL and Linked Data Benchmarking
SPARQL and Linked Data BenchmarkingSPARQL and Linked Data Benchmarking
SPARQL and Linked Data Benchmarking
Kristian Alexander
 
On-Demand RDF Graph Databases in the Cloud
On-Demand RDF Graph Databases in the CloudOn-Demand RDF Graph Databases in the Cloud
On-Demand RDF Graph Databases in the Cloud
Marin Dimitrov
 
The Characteristics of a RESTful Semantic Web and Why They Are Important
The Characteristics of a RESTful Semantic Web and Why They Are ImportantThe Characteristics of a RESTful Semantic Web and Why They Are Important
The Characteristics of a RESTful Semantic Web and Why They Are Important
Chimezie Ogbuji
 
Selecting the right database type for your knowledge management needs.
Selecting the right database type for your knowledge management needs.Selecting the right database type for your knowledge management needs.
Selecting the right database type for your knowledge management needs.
Synaptica, LLC
 
LOD2: State of Play WP2 - Storing and Querying Very Large Knowledge Bases
LOD2: State of Play WP2 - Storing and Querying Very Large Knowledge BasesLOD2: State of Play WP2 - Storing and Querying Very Large Knowledge Bases
LOD2: State of Play WP2 - Storing and Querying Very Large Knowledge Bases
LOD2 Creating Knowledge out of Interlinked Data
 
Intro to the semantic web (for libraries)
Intro to the semantic web (for libraries) Intro to the semantic web (for libraries)
Intro to the semantic web (for libraries)
robin fay
 
Practical approaches to entification in library bibliographic data
Practical approaches to entification in library bibliographic dataPractical approaches to entification in library bibliographic data
Practical approaches to entification in library bibliographic data
Terry Reese
 
An RDF Dataset Generator for the Social Network Benchmark with Real-World Coh...
An RDF Dataset Generator for the Social Network Benchmark with Real-World Coh...An RDF Dataset Generator for the Social Network Benchmark with Real-World Coh...
An RDF Dataset Generator for the Social Network Benchmark with Real-World Coh...
Holistic Benchmarking of Big Linked Data
 
RDF Database-as-a-Service with S4
RDF Database-as-a-Service with S4RDF Database-as-a-Service with S4
RDF Database-as-a-Service with S4
Marin Dimitrov
 
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the CloudFirst Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
Ontotext
 
Data Processing over very Large Relational Databases
Data Processing over very Large Relational DatabasesData Processing over very Large Relational Databases
Data Processing over very Large Relational Databases
kvaderlipa
 
AALL 2015: Hands on Linked Data Tools for Catalogers: MarcEdit and MARCNext
AALL 2015: Hands on Linked Data Tools for Catalogers: MarcEdit and MARCNextAALL 2015: Hands on Linked Data Tools for Catalogers: MarcEdit and MARCNext
AALL 2015: Hands on Linked Data Tools for Catalogers: MarcEdit and MARCNext
Terry Reese
 
Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph-S...
Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph-S...Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph-S...
Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph-S...
Thanh Tran
 
Dwh faqs
Dwh faqsDwh faqs
Dwh faqs
infor123
 
SharePoint User Group Meeting- SharePoint 2013 Search
SharePoint User Group Meeting- SharePoint 2013 SearchSharePoint User Group Meeting- SharePoint 2013 Search
SharePoint User Group Meeting- SharePoint 2013 Search
C/D/H Technology Consultants
 
Whowas: History of resources at APNIC
Whowas: History of resources at APNICWhowas: History of resources at APNIC
Whowas: History of resources at APNIC
APNIC
 
Cloud-based Linked Data Management for Self-service Application Development
Cloud-based Linked Data Management for Self-service Application DevelopmentCloud-based Linked Data Management for Self-service Application Development
Cloud-based Linked Data Management for Self-service Application Development
Peter Haase
 
MongoDB - General Purpose Database
MongoDB - General Purpose DatabaseMongoDB - General Purpose Database
MongoDB - General Purpose Database
Ashnikbiz
 
RDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataRDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival data
Giorgos Santipantakis
 
Linked data and the future of libraries
Linked data and the future of librariesLinked data and the future of libraries
Linked data and the future of libraries
Regan Harper
 

Semelhante a Ldbc spb 2.0 evolution (20)

SPARQL and Linked Data Benchmarking
SPARQL and Linked Data BenchmarkingSPARQL and Linked Data Benchmarking
SPARQL and Linked Data Benchmarking
 
On-Demand RDF Graph Databases in the Cloud
On-Demand RDF Graph Databases in the CloudOn-Demand RDF Graph Databases in the Cloud
On-Demand RDF Graph Databases in the Cloud
 
The Characteristics of a RESTful Semantic Web and Why They Are Important
The Characteristics of a RESTful Semantic Web and Why They Are ImportantThe Characteristics of a RESTful Semantic Web and Why They Are Important
The Characteristics of a RESTful Semantic Web and Why They Are Important
 
Selecting the right database type for your knowledge management needs.
Selecting the right database type for your knowledge management needs.Selecting the right database type for your knowledge management needs.
Selecting the right database type for your knowledge management needs.
 
LOD2: State of Play WP2 - Storing and Querying Very Large Knowledge Bases
LOD2: State of Play WP2 - Storing and Querying Very Large Knowledge BasesLOD2: State of Play WP2 - Storing and Querying Very Large Knowledge Bases
LOD2: State of Play WP2 - Storing and Querying Very Large Knowledge Bases
 
Intro to the semantic web (for libraries)
Intro to the semantic web (for libraries) Intro to the semantic web (for libraries)
Intro to the semantic web (for libraries)
 
Practical approaches to entification in library bibliographic data
Practical approaches to entification in library bibliographic dataPractical approaches to entification in library bibliographic data
Practical approaches to entification in library bibliographic data
 
An RDF Dataset Generator for the Social Network Benchmark with Real-World Coh...
An RDF Dataset Generator for the Social Network Benchmark with Real-World Coh...An RDF Dataset Generator for the Social Network Benchmark with Real-World Coh...
An RDF Dataset Generator for the Social Network Benchmark with Real-World Coh...
 
RDF Database-as-a-Service with S4
RDF Database-as-a-Service with S4RDF Database-as-a-Service with S4
RDF Database-as-a-Service with S4
 
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the CloudFirst Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
 
Data Processing over very Large Relational Databases
Data Processing over very Large Relational DatabasesData Processing over very Large Relational Databases
Data Processing over very Large Relational Databases
 
AALL 2015: Hands on Linked Data Tools for Catalogers: MarcEdit and MARCNext
AALL 2015: Hands on Linked Data Tools for Catalogers: MarcEdit and MARCNextAALL 2015: Hands on Linked Data Tools for Catalogers: MarcEdit and MARCNext
AALL 2015: Hands on Linked Data Tools for Catalogers: MarcEdit and MARCNext
 
Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph-S...
Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph-S...Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph-S...
Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph-S...
 
Dwh faqs
Dwh faqsDwh faqs
Dwh faqs
 
SharePoint User Group Meeting- SharePoint 2013 Search
SharePoint User Group Meeting- SharePoint 2013 SearchSharePoint User Group Meeting- SharePoint 2013 Search
SharePoint User Group Meeting- SharePoint 2013 Search
 
Whowas: History of resources at APNIC
Whowas: History of resources at APNICWhowas: History of resources at APNIC
Whowas: History of resources at APNIC
 
Cloud-based Linked Data Management for Self-service Application Development
Cloud-based Linked Data Management for Self-service Application DevelopmentCloud-based Linked Data Management for Self-service Application Development
Cloud-based Linked Data Management for Self-service Application Development
 
MongoDB - General Purpose Database
MongoDB - General Purpose DatabaseMongoDB - General Purpose Database
MongoDB - General Purpose Database
 
RDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataRDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival data
 
Linked data and the future of libraries
Linked data and the future of librariesLinked data and the future of libraries
Linked data and the future of libraries
 

Mais de Ioan Toma

Parallel and incremental materialisation of RDF/DATALOG in RDFOX
Parallel and incremental materialisation of RDF/DATALOG in RDFOXParallel and incremental materialisation of RDF/DATALOG in RDFOX
Parallel and incremental materialisation of RDF/DATALOG in RDFOX
Ioan Toma
 
MODAClouds Decision Support System for Cloud Service Selection
MODAClouds Decision Support System for Cloud Service SelectionMODAClouds Decision Support System for Cloud Service Selection
MODAClouds Decision Support System for Cloud Service Selection
Ioan Toma
 
MarkLogic Overview and Use Cases
MarkLogic Overview and Use CasesMarkLogic Overview and Use Cases
MarkLogic Overview and Use Cases
Ioan Toma
 
Towards Temporal Graph Management and Analytics
Towards Temporal Graph Management and AnalyticsTowards Temporal Graph Management and Analytics
Towards Temporal Graph Management and Analytics
Ioan Toma
 
The LDBC Social Network Benchmark Interactive Workload - SIGMOD 2015
The LDBC Social Network Benchmark Interactive Workload - SIGMOD 2015The LDBC Social Network Benchmark Interactive Workload - SIGMOD 2015
The LDBC Social Network Benchmark Interactive Workload - SIGMOD 2015
Ioan Toma
 
Querying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge GraphQuerying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge Graph
Ioan Toma
 
Lighthouse: Large-scale graph pattern matching on Giraph
Lighthouse: Large-scale graph pattern matching on GiraphLighthouse: Large-scale graph pattern matching on Giraph
Lighthouse: Large-scale graph pattern matching on Giraph
Ioan Toma
 
HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)
HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)
HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)
Ioan Toma
 
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...
Ioan Toma
 
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter BonczFOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
Ioan Toma
 

Mais de Ioan Toma (10)

Parallel and incremental materialisation of RDF/DATALOG in RDFOX
Parallel and incremental materialisation of RDF/DATALOG in RDFOXParallel and incremental materialisation of RDF/DATALOG in RDFOX
Parallel and incremental materialisation of RDF/DATALOG in RDFOX
 
MODAClouds Decision Support System for Cloud Service Selection
MODAClouds Decision Support System for Cloud Service SelectionMODAClouds Decision Support System for Cloud Service Selection
MODAClouds Decision Support System for Cloud Service Selection
 
MarkLogic Overview and Use Cases
MarkLogic Overview and Use CasesMarkLogic Overview and Use Cases
MarkLogic Overview and Use Cases
 
Towards Temporal Graph Management and Analytics
Towards Temporal Graph Management and AnalyticsTowards Temporal Graph Management and Analytics
Towards Temporal Graph Management and Analytics
 
The LDBC Social Network Benchmark Interactive Workload - SIGMOD 2015
The LDBC Social Network Benchmark Interactive Workload - SIGMOD 2015The LDBC Social Network Benchmark Interactive Workload - SIGMOD 2015
The LDBC Social Network Benchmark Interactive Workload - SIGMOD 2015
 
Querying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge GraphQuerying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge Graph
 
Lighthouse: Large-scale graph pattern matching on Giraph
Lighthouse: Large-scale graph pattern matching on GiraphLighthouse: Large-scale graph pattern matching on Giraph
Lighthouse: Large-scale graph pattern matching on Giraph
 
HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)
HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)
HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)
 
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...
 
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter BonczFOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
 

Último

5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
Postman
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
fredae14
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
Wouter Lemaire
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
David Brossard
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Alpen-Adria-Universität
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 

Último (20)

5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 

Ldbc spb 2.0 evolution

  • 1. LDBC Semantic Publishing Benchmark Evolution March 2015 LDBC Semantic Publishing Benchmark Evolution #1Mar 2015
  • 2. • Motivation • The New Reference Datasets • Changes in the generated metadata • Ontologies and Required Rule-sets • The Concrete Inference patterns • Changes in the Basic Interactive Query Mix • GraphDB Configuration Disclosure • Future plans Outline #2LDBC Semantic Publishing Benchmark Evolution Mar 2015
  • 3. SPB needed to evolve in order to: • Allow for retrieval of semantically relevant content – Based on rich metadata descriptions and diverse reference knowledge • Demonstrate that triplestores offer simplified and more efficient querying And this way to: • Cover more advanced and pertinent usage of triplestores • Present a bigger challenge to the engines Change is Needed #3LDBC Semantic Publishing Benchmark Evolution Mar 2015
  • 4. • Background and issues with SPB 1.0: – “Reference data” = master data in Dynamic Semantic Publishing • Essentially, taxonomies and entity datasets used to describe “creative works” – SPB 1.0 has very small set of reference data – Reference data is not interconnected – few relations between entities – All descriptions of content assets (articles, pictures, etc.) refer to the Reference Data, but not to one another – Star-shaped graph with small reference dataset in the middle • This way SPB is not using the full potential of graph databases • Bigger and better connected Reference Data – More reference data and more connections between the entities • A better connected dataset – Make cross references between the different pieces of content Directions for Improvement #4LDBC Semantic Publishing Benchmark Evolution Mar 2015
  • 5. • Make use of inference – In SPB 1.0 it is really trivial: couple of subclasses and sub-properties – It would be relevant to have some transitive, symmetric and inverse properties – Such semantics can foster retrieval of relevant content • More interesting and challenging query mixes – This can go in plenty of different directions, but the most obvious one is to evolve the queries so that they take benefit from changes above – On the bigger picture, there should be a way to make some of the queries nicer, i.e. simpler and cleaner • Testing high-availability – FT acceptance tests for HA cluster are great starting point – Too ambitious for SPB 2.0 Directions for Improvement (2) #5LDBC Semantic Publishing Benchmark Evolution Mar 2015
  • 6. • Motivation • The New Reference Datasets • Changes in the generated metadata • Ontologies and Required Rule-sets • The Concrete Inference patterns • Changes in the Basic Interactive Query Mix • GraphDB Configuration Disclosure • Future plans Outline #6LDBC Semantic Publishing Benchmark Evolution Mar 2015
  • 7. • 22M explicit statements total in SPB v.2.0 – vs. 127k statements in SPB 1.0; BBC lists + tiny fractions of GeoNames • Added reference data from DBPedia 2014 – Extracts alike: DESCRIBE ?e WHERE { ?e a dbp-ont:Company } – Companies (85 000 entities) – Events (50 000 entities) – Persons (1M entities) • Geonames data for Europe – All European countries, w/o RUS, UKR, BLR; 650 000 locations total – Some properties and locations types irrelevant to SPB excluded • owl:sameAs links between DBpedia and Geonames – 500k owl:sameAs mappings Extended Reference Dataset #7LDBC Semantic Publishing Benchmark Evolution Mar 2015
  • 8. To Company To Person To Place / gn:Feature To Event Company 40,797 26,675 218,636 18 Person 89,506 1,324,425 3,380,145 145,892 Event 5,114 154,207 140,579 35,442 Interconnected Reference Dataset #8LDBC Semantic Publishing Benchmark Evolution • Substantial volume of connections between entities • Geonames comes with hierarchical relationship, defining nesting of locations, gn:parentFeature • DBPedia inter-entity relationships between entities*: * numbers shown in table are approximate Mar 2015
  • 9. • GraphDB’s RDFRank feature is used to calculate a rank for all the entities in the Reference Data – RDFRank calculates a measure of importance for all URIs in an RDF graph, based on Google’s PageRank algorithm • These RDFRanks are calculated using GraphDB after the entire Reference Data is loaded – This way all sorts of relationships in the graph are considered during calculations, including such that were logically inferred • RDFRanks are inserted using predicate <http://www.ldbcouncil.org/spb#hasRDFRank> • These ranks are available as a part of the Ref. Data – No need to get compute them again during loading RDF-Rank for the Reference Data #9LDBC Semantic Publishing Benchmark Evolution Mar 2015
  • 10. • The Data Generator uses a set of “popular entities” – Those are referred in 30% of the content-to-entity relations/tags – This is one of the heuristics used by the data generator to produce more realistic data distributions – This is implemented in SPB 1.0, no change in SPB 2.0 • Popular entities are those with top 5% RDF Rank – This is the new thing in SPB v.2.0 – Before that the popular entities were selected randomly – This way, we get a more realistic dataset where those entities which are more often used to tag content also have better connectivity in the Reference Data • In the future, RDF-ranks can be used for other purposes also RDF-Rank Used for Popular Entities #10LDBC Semantic Publishing Benchmark Evolution Mar 2015
  • 11. • Motivation • The New Reference Datasets • Changes in the generated metadata • Ontologies and Required Rule-sets • The Concrete Inference patterns • Changes in the Basic Interactive Query Mix • GraphDB Configuration Disclosure • Future plans Outline #11LDBC Semantic Publishing Benchmark Evolution Mar 2015
  • 12. • Generated three times more relationships between Creative Works and Entities, than in SPB 1.0 – More recent use cases in publishing, adopt rich metadata descriptions with more than 10 references to relevant entities and concepts – In SPB 1.0 there are on average a bit less than 3 entity references per creative work, based on distributions from an old BBC archive – While developing SPB 2.0 we didn’t have access to substantial amount of rich semantic metadata from publisher’s archive, to allow us derive content-to-concept frequency distributions – So, we decided to use the same distributions from BBC, but to triple the concept references for each of them – In SPB 2.0 there are on average about 8 content-to-concept references – This results in about 25 statements/creative work description – All other heuristics and specifics of the metadata generator were preserved (popular entities, CW sequences alike storylines, etc.) Changes in the Metadata #12LDBC Semantic Publishing Benchmark Evolution Mar 2015
  • 13. • Geonames URIs used for content-to-location references – As in SPB v.1.0, each creative work description refers (through cwork:Mention property) to exactly one location • These references are exploited in queries with geo-spatial constraints – While most of the geo-spatial information in the Reference Data comes from Geonames, for about 500 thousand locations there are also DBPedia URIs, that come from the DBPedia-to-Geonames owl:sameAs mappings – Intentionally, DBPedia URIs are not used when metadata about creative works is generated • In contrast, the substitution parameters for locations in the corresponding queries use only DBpedia locations • This way the corresponding queries would return no results if proper SameAs expansion doesn’t exist Changes in the Metadata (2) #13LDBC Semantic Publishing Benchmark Evolution Mar 2015
  • 14. • As a result of changes in the size of the Reference Data and the metadata size, there are differences in the number of Creative Works included at the same scale factor in SPB v.1.0 and in SPB v.2.0 Changes in the Metadata - Stats #14LDBC Semantic Publishing Benchmark Evolution Mar 2015 SPB 1.0 SPB 2.0 Reference Data size (explicit statements) 170k 22M Creative Work descr. size (explicit st./CW) 19 25 Metadata in 50M dataset (explicit statements) 50M 28M Creative Works in 50M datasets (count) 2.6M 1.1M Creative Work-to-Entity relationships in 50M 7M 9M Metadata in 1B dataset (explicit statements) 1B 978M Creative Works in 1B datasets (count) 53M 39M Creative Work-to-Entity relationships in 1B 137M 313M
  • 15. • Motivation • The New Reference Datasets • Changes in the generated metadata • Ontologies and Required Rule-sets • The Concrete Inference patterns • Changes in the Basic Interactive Query Mix • GraphDB Configuration Disclosure • Future plans Outline #15LDBC Semantic Publishing Benchmark Evolution Mar 2015
  • 16. The Rules Sets LDBC Semantic Publishing Benchmark Evolution #16 • Complete inference in SPB 2.0 requires support for the following primitives: – RDFS: subPropertyOf, subClassOf – OWL: TransitiveProperty, SymmetricProperty, sameAs • Any triplestore with OWL 2 RL reasoning support will be able to process SPB v2.0 correctly – In fact, a much reduced rule-set is sufficient for complete reasoning in as in OWL 2 RL there are host of primitives and rules that SPB 2.0 does not make use of. Such examples are all onProperty restrictions, class and property equivalence, property chains, etc. – The popular (but not W3C standardized) OWL Horst profile is sufficient for reasoning with SPB v 2.0 • OWL Horst refers to the pD* entailment defined by Herman Ter Horst in: Completeness, decidability and complexity of entailment for RDF Schema and a semantic extension involving the OWL vocabulary. J. Web Sem. 3(2-3): 79-115 (2005) Mar 2015
  • 17. • Modified versions of the BBC Ontologies – As in SPB v.1.0, BBC Core ontology defines relationships between Person – Organizations – Locations - Events – Those were mapped to corresponding DBPedia and Geonames classes and relationships • bbccore:Thing is defined as super class of dbp-ont:Company, dbp-ont:Event, foaf:Person, geonames-ontology:Feature (geographic feature) – dbp-ont:parentCompany is defined to be owl:TransitiveProperty – These extra definitions are added at the end of the BBC Core ontology • Added a SPB-modified version of the Geonames ontology – Stripped out unnecessary pieces such onProperty restrictions that impose cardinality constraints New and Updated Ontologies LDBC Semantic Publishing Benchmark Evolution #17Mar 2015
  • 18. • Motivation • The New Reference Datasets • Changes in the generated metadata • Ontologies and Required Rule-sets • The Concrete Inference patterns • Changes in the Basic Interactive Query Mix • GraphDB Configuration Disclosure • Future plans Outline #18LDBC Semantic Publishing Benchmark Evolution Mar 2015
  • 19. • Transitive closure of location-nesting relationships – geo-ont:parentFeature property is used in GeoNames to link each location to larger locations that it is a part of – geo-ont:parentFeature is owl:TransitiveProperty in the GN ontology – This way if Munich has gn:parentFeature Bavaria, that on its turn has gn:parentFeature Germany, than reasoning should make sure that Germany also appears as gn:parentFeature of Munich • Transitive closure of dbp-ont:ParentCompany – Inference unveils indirect company control relationships • owl:sameAs relations between Geonames features and DBPedia URIs for the same locations: – The standard semantics of owl:sameAs requires that each statement asserted using the Geonames URIs should be inferable/retrievable also with the mapped DBPedia URIs Concrete Inference Patterns LDBC Semantic Publishing Benchmark Evolution #19Mar 2015
  • 20. • The inference patterns from SPB v.1.0 are still there: – cwork:tag statements inferred from each of its sub-properties cwork:about and cwork:mentions – < ?cw rdf:type cwork:CreativeWork > statements inferred for instances of each of its sub-classes – These two are the only reasoning patterns that apply to the generated content metadata; all the others apply only to the Reference Data Concrete Inference Patterns (2) LDBC Semantic Publishing Benchmark Evolution #20Mar 2015
  • 21. • If brute-force materialization is used, the Reference Data expands from 22M statements to about 100M – If owl:sameAs expansion is not considered, materialization on top of the Reference Data adds about 7M statements – most of those coming from the transitive closure of geo-ont:parentFeature – Several triplestores (e.g. ORACLE and GraphDB) that use forward- chaining have specific mechanism, which allow them to handle owl:sameAs reasoning in a sort of hybrid manner, without expanding their indices. After materialization, such engines will have to deal with 29M statements of Reference Data, instead of 100M • Brute-force materialization of the generated metadata describing creative works would double it – In SPB v.1.0 the expansion factor was 1.6, but now there are more entity references and also owl:sameAs equivalents of the location URIs Inference Statistics and Ratios LDBC Semantic Publishing Benchmark Evolution #21Mar 2015
  • 22. • Motivation • The New Reference Datasets • Changes in the generated metadata • Ontologies and Required Rule-sets • The Concrete Inference patterns • Changes in the Basic Interactive Query Mix • GraphDB Configuration Disclosure • Future plans Outline #22LDBC Semantic Publishing Benchmark Evolution Mar 2015
  • 23. Modified Queries in Basic Interactive LDBC Semantic Publishing Benchmark Evolution #23 • SPB 2.0 changes only the Basic Interactive Query-Mix – The other query mixes are mostly unchanged • In most of the queries, cwork:about and cwork:mention properties that link creative work to an entity, have been replaced with their supper cwork:tag – This applies also to the Advanced Interactive and the Analytical use cases – For most of the queries, both types of relations are equally relevant – Using the super-property make the query simpler; as compared to other approaches to make a query that uses both (e.g. UNION) Mar 2015
  • 24. New Queries LDBC Semantic Publishing Benchmark Evolution #24 • Basic Interactive query-mix now adds 2 new queries exploring the interconnectedness in reference data • Q10: Retrieve CWs that mention locations in the same province (A.ADM1) as the specified one – There is additional constraint on time interval (5 days) • Q11: Retrieve the most recent CWs that are tagged with entities, related to a specific popular entity – Relations can be inbound and outbound; explicit or inferred Mar 2015
  • 25. Q10: News from the region LDBC Semantic Publishing Benchmark Evolution #25 SELECT ?cw ?title ?dateModified { <http://dbpedia.org/resource/Sofia> geo-ont:parentFeature ?province . ?province geo-ont:featureCode geo-ont:A.ADM1 . { ?location geo-ont:parentFeature ?province . } UNION { BIND(?province as ?location) . } ?cw a cwork:CreativeWork ; cwork:tag ?location ; cwork:title ?title ; cwork:dateModified ?dateModified . FILTER(?dateModified >= "2011-05-14T00:00:00.000"^^<http://www.w3.org/2001/XMLSchema#dateTime> && ?dateModified < "2011-05-19T23:59:59.999"^^<http://www.w3.org/2001/XMLSchema#dateTime>) } LIMIT 100 Mar 2015
  • 26. Q11: News on about related entities LDBC Semantic Publishing Benchmark Evolution #26 SELECT DISTINCT ?cw ?title ?description ?dateModified ?primaryContent { { <http://dbpedia.org/resource/Teresa_Fedor> ?p ?e . } UNION { ?e ?p <http://dbpedia.org/resource/Teresa_Fedor> . } ?e a core:Thing . ?cw cwork:tag ?e ; cwork:title ?title ; cwork:description ?description ; cwork:dateModified ?dateModified ; bbc:primaryContentOf ?primaryContent . } ORDER BY DESC(?dateModified) LIMIT 100 Mar 2015
  • 27. Trivial Validation Opportunities LDBC Semantic Publishing Benchmark Evolution #27 • The validation mechanisms from SPB v.1.0 remain unchanged • In addition to this, the changes to the benchmark are designed such a way that in case that specific inference patterns are not supported by the engine, some of the queries will return zero results – If owl:sameAs inference is not supported Q10 will return 0 results – If transitive properties are not supported, Q10 will return 0 results • Q11 will return smaller number of results (higher ratio of zeros also) – If rdfs:subProperty is not supported, Q2-Q10 will return 0 results – If rdfs:subClass is not supported, Q1, Q2, Q6, Q8 will return no results Mar 2015
  • 28. • Motivation • The New Reference Datasets • Changes in the generated metadata • Ontologies and Required Rule-sets • The Concrete Inference patterns • The New Queries • GraphDB Configuration Disclosure • Future plans Outline #28LDBC Semantic Publishing Benchmark Evolution Mar 2015
  • 29. 1. Reasoning performed through forward-chaining with a custom ruleset – This is the rdfs-optimized ruleset of GraphDB with added rules to support transitive, inverse and symmetric properties 2. owl:sameAs optimization of GraphDB is beneficial – This is the standard behavior of GraphDB; but it helps a lot – Query-time tuning is used to disable expansion of results with respect to owl:sameAs equivalent URIs 3. Geo-spatial index in Q6 of the basic interactive mix 4. Lucene Connector for full-text search in Q8 Note: Points #2-#4 above help query time, but slow down updates in GraphDB. As materialization does also GraphDB Configuration LDBC Semantic Publishing Benchmark Evolution #29Mar 2015
  • 30. • Experimental run using GraphDB-SE 6.1 • Using LDBC Scale Factor 1 - (22M reference data + 30M generated data) • Hardware: 32G RAM, Intel i7-4770 CPU @ 3.40GHz, single SSD Drive Samsung 845 DC • Benchmark configuration: 8 reading / 2 writing agents, 1 min warmup, 40 min benchmark • Updates/sec : 13, Selects /sec : 44 GraphDB – First Results LDBC Semantic Publishing Benchmark Evolution #30Mar 2015
  • 31. • Motivation • The New Reference Datasets • Changes in the generated metadata • Ontologies and Required Rule-sets • The Concrete Inference patterns • Changes in the Basic Interactive Query Mix • GraphDB Configuration Disclosure • Future plans Outline #31LDBC Semantic Publishing Benchmark Evolution Mar 2015
  • 32. • Relationships between pieces of content – At present Creative Works are related only to Entities, not to other CWs – These could be StoryLines or Collections of MM assets, e.g. an article with few images related to it • Better modelling of content-to-entity cardinalities – Based on data from FT • Enriched query sets and realistic query frequencies – Based on query logs from FT • Loading/updating big dataset in live instances • Testing high-availability – FT acceptance tests for High Availability cluster are great starting point Future Plans #32LDBC Semantic Publishing Benchmark Evolution Mar 2015
  • 33. • Much bigger reference dataset: from 170k to 22M – 7M statement from GeoNames about Europe – 14M statements from DBpedia for Companies, Persons, Events • Interconnected reference data: more than 5M links – owl:sameAs links between DBPedia and Geonames • Much more comprehensive usage of inference – While still the simplest possible flavor of OWL is used • rdfs:subClassOf, rdfs:subPorpertyOf, owl:TransitiveProperty, owl:sameAs – Transitive closure over company control and geographic nesting – Simpler queries through usage of super-property • Retrieval of relevant content through links in the reference data Summary of SPB v.2.0 Changes #33LDBC Semantic Publishing Benchmark Evolution Mar 2015