1. BowlognaBench: Benchmarking RDF Analytics
Gianluca Demartini, Iliya Enchev, Joël Gapany, and Philippe Cudré-Mauroux
eXascale Infolab & Faculty of Humanities
University of Fribourg, Switzerland
30-Jun-11, Gianluca Demartini
2. Motivation
• The amount of Semantic Data on the Web keeps increasing
• There is a growing need to run OLAP-type queries
  – How did university student performance evolve over the last 5 years?
• A novel benchmark for Knowledge Bases focusing on complex analytics queries
3. Why do we need a new RDF benchmark?
• Existing RDF benchmarks (e.g., LUBM)
  – Don’t deal with complex analytic queries
  – Don’t look at the temporal dimension
  – Don’t model a realistic setting
• Analytic benchmarks exist for relational systems (e.g., TPC-H)
4. The Bologna Reform
• Started in June 1999
• Framework for higher education systems
• 47 countries
• Common academic degrees
• Common study structure
• Common terminology
5. The university setting after Bologna
• A lot of data is available
  – Not following standard schemas
  – Comprehensive and available data is a success factor
• Shared data
  – Erasmus exchanges
  – Courses in a given language
• Analytic tools may help monitor university performance
6. An ontology about Bologna
• A Lexicon for the Bologna Reform
  – Basic set of terms for the new system
  – Stable across time and institutions
  – Developed by a professional terminologist
7. The ontology creation process
• The Bowlogna Ontology
  – 29 top classes (67 in total)
  – Classes: student, professor, evaluation, teaching unit, ECTS credit, semester, etc.
  – Concept definitions in English, French, and German
9. Bowlogna Ontology
• Private / Public parts
  – Public data can be shared with other universities (e.g., course descriptions)
  – Private data is sensitive (e.g., evaluation results)
• Private data might contain more instances
• Aggregations over private data may be shared (e.g., number of enrolled students)
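The idea of sharing aggregations over private data can be sketched as follows. This is a minimal illustration, assuming hypothetical private enrolment records stored as (student, course) pairs; the record layout and the function name `shareable_enrolment_counts` are invented for the example and are not part of the Bowlogna Ontology. Only per-course counts leave the function, never the individual records.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical private record: (student_id, course_id). Such records stay
# inside the university and are never published directly.
PrivateEnrolment = Tuple[str, str]


def shareable_enrolment_counts(records: Iterable[PrivateEnrolment]) -> Dict[str, int]:
    """Aggregate private enrolment records into per-course student counts.

    Only the aggregate (number of enrolled students per course) is returned,
    which is the kind of value that may be shared with other universities.
    Duplicate (student, course) pairs are counted once.
    """
    unique_pairs = set(records)
    counts = Counter(course for _student, course in unique_pairs)
    return dict(counts)


if __name__ == "__main__":
    private = [
        ("s1", "Databases"),
        ("s2", "Databases"),
        ("s2", "Databases"),  # duplicate enrolment, counted once
        ("s3", "Semantic Web"),
    ]
    print(shareable_enrolment_counts(private))
```
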
10. The Benchmark
• Bowlogna Ontology
  – 67 classes
• 12 analytics queries
  – Natural language versions and SPARQL translations
• Automatic Instance Generator
  – Populates the ontology with a given number of instances
• Tests over varying numbers of instances and universities
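An automatic instance generator of the kind listed above could look roughly like this sketch. It is not the BowlognaBench generator: the namespace `http://example.org/bowlogna#` and the class and property names (`University`, `TeachingUnit`, `Student`, `offeredBy`, `studiesAt`, `enrolledIn`) are hypothetical placeholders. It emits N-Triples lines whose volume grows with the requested numbers of universities and students, and it uses a fixed random seed so every system under test can load identical data.

```python
import random
from typing import List

# Hypothetical namespace and class/property names, for illustration only;
# the real Bowlogna Ontology IRIs are not given on the slides.
BASE = "http://example.org/bowlogna#"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def iri(local: str) -> str:
    """Wrap a local name from the hypothetical namespace as an N-Triples IRI."""
    return f"<{BASE}{local}>"


def generate_instances(num_universities: int, students_per_university: int,
                       courses_per_university: int, seed: int = 42) -> List[str]:
    """Return N-Triples lines for a toy population of the ontology.

    Every student is typed, linked to a university, and enrolled in one
    course of that university chosen at random. The fixed seed makes the
    output reproducible across runs.
    """
    if courses_per_university < 1:
        raise ValueError("need at least one course per university")
    rng = random.Random(seed)
    lines: List[str] = []
    for u in range(num_universities):
        uni = iri(f"University{u}")
        lines.append(f"{uni} {RDF_TYPE} {iri('University')} .")
        courses = [iri(f"Course{u}_{c}") for c in range(courses_per_university)]
        for course in courses:
            lines.append(f"{course} {RDF_TYPE} {iri('TeachingUnit')} .")
            lines.append(f"{course} {iri('offeredBy')} {uni} .")
        for s in range(students_per_university):
            student = iri(f"Student{u}_{s}")
            lines.append(f"{student} {RDF_TYPE} {iri('Student')} .")
            lines.append(f"{student} {iri('studiesAt')} {uni} .")
            lines.append(f"{student} {iri('enrolledIn')} {rng.choice(courses)} .")
    return lines


if __name__ == "__main__":
    for line in generate_instances(1, 2, 2)[:4]:
        print(line)
```

Raising `num_universities` or `students_per_university` scales the dataset, which mirrors the idea of testing over the number of instances and universities.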
11. Analytic Queries
• Count
• Molecule
  – Query 4. Return all information about Student0 within a scope of two
• Max Min
• Ranking and TopK
• Temporal
  – Query 8. What is the average completion time of Bachelor studies for each Study Track?
• Path
• Multiple Universities
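To make the two example queries concrete, the sketch below evaluates them over a tiny in-memory list of triples instead of an RDF store. The data and predicate names (`enrolledIn`, `studyTrack`, `bachelorStart`, `bachelorEnd`, ...) are hypothetical, and "scope of two" is read here as following outgoing edges for at most two hops; the benchmark's SPARQL formulation may differ, for instance by also following incoming edges.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, object]

# Toy data with hypothetical predicate names; not the benchmark's real schema.
TRIPLES: List[Triple] = [
    ("Student0", "type", "Student"),
    ("Student0", "enrolledIn", "Course1"),
    ("Course1", "taughtBy", "Professor7"),
    ("Professor7", "memberOf", "Dept3"),  # three hops from Student0
    ("Student0", "studyTrack", "CS"),
    ("Student0", "bachelorStart", 2005),
    ("Student0", "bachelorEnd", 2008),
    ("Student1", "studyTrack", "CS"),
    ("Student1", "bachelorStart", 2006),
    ("Student1", "bachelorEnd", 2010),
    ("Student2", "studyTrack", "Math"),
    ("Student2", "bachelorStart", 2004),
    ("Student2", "bachelorEnd", 2007),
]


def molecule(triples: List[Triple], root: str, scope: int = 2) -> Set[Triple]:
    """Query 4 style: all triples reachable from `root` in at most `scope` hops.

    Only outgoing edges are followed (an assumption of this sketch).
    """
    result: Set[Triple] = set()
    frontier = {root}
    for _ in range(scope):
        hop = {t for t in triples if t[0] in frontier}
        result |= hop
        # Literal objects (here: ints) cannot be expanded further.
        frontier = {o for _, _, o in hop if isinstance(o, str)}
    return result


def avg_bachelor_completion(triples: List[Triple]) -> Dict[str, float]:
    """Query 8 style: average Bachelor duration in years, grouped by study track."""
    props: Dict[str, Dict[str, object]] = defaultdict(dict)
    for s, p, o in triples:
        props[s][p] = o
    durations: Dict[str, List[int]] = defaultdict(list)
    needed = ("studyTrack", "bachelorStart", "bachelorEnd")
    for attrs in props.values():
        if all(k in attrs for k in needed):
            years = int(attrs["bachelorEnd"]) - int(attrs["bachelorStart"])
            durations[str(attrs["studyTrack"])].append(years)
    return {track: sum(d) / len(d) for track, d in durations.items()}


if __name__ == "__main__":
    for triple in sorted(molecule(TRIPLES, "Student0"), key=str):
        print(triple)
    print(avg_bachelor_completion(TRIPLES))
```

The molecule query is a bounded graph traversal, while the temporal query is a group-by aggregation over date values; both are hard to express with the simple lookup patterns that existing RDF benchmarks emphasize.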
12. Query Classification
Input size measures the number of instances a query involves: we classify a query as having a large input size if it involves more than 5% of instances, and small otherwise. Selectivity measures the number of instances that match the query: we classify a query as having high selectivity if less than 10% of instances match the query, and low otherwise. Complexity measures the number of classes and properties involved in the query: queries are classified as having high or low complexity according to the RDF schema we have defined.
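The two numeric thresholds above (more than 5% of instances involved means a large input; fewer than 10% of instances matching means high selectivity) can be written as a small function. This is a sketch; the `QueryStats` record and its field names are hypothetical. Complexity is left out because it is judged against the RDF schema rather than by a numeric cut-off.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class QueryStats:
    """Hypothetical per-query statistics measured on a populated dataset."""
    instances_involved: int  # instances the query touches (input size)
    instances_matching: int  # instances matching the query (selectivity)
    total_instances: int


def classify(stats: QueryStats) -> Dict[str, str]:
    """Apply the input-size (5%) and selectivity (10%) thresholds."""
    if stats.total_instances <= 0:
        raise ValueError("total_instances must be positive")
    involved = stats.instances_involved / stats.total_instances
    matching = stats.instances_matching / stats.total_instances
    return {
        # Large input: strictly more than 5% of instances are involved.
        "input_size": "Large" if involved > 0.05 else "Small",
        # High selectivity: strictly fewer than 10% of instances match.
        "selectivity": "High" if matching < 0.10 else "Low",
    }


if __name__ == "__main__":
    print(classify(QueryStats(instances_involved=600, instances_matching=50,
                              total_instances=10_000)))
```

Note that both comparisons are strict, so a query involving exactly 5% of instances counts as small, and one matching exactly 10% counts as low selectivity.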
Table 1. Classification of queries according to their need to access private and public
data, input size, selectivity, and complexity.
Query types (in order): Count, Molecule, MaxMin, TopK, Temp, Path, MultiUniv
Query        1      2      3      4      5      6      7      8      9      10     11     12
Public       x x x x x x x x
Private      x x x x x x x x
Input size   Small  Large  Small  Small  Large  Small  Large  Large  Large  Large  Large  Large
Selectivity  High   Low    Low    Low    Low    Low    High   Low    Low    Low    Low    Low
Complexity   Low    Low    Low    High   High   Low    High   Low    High   High   Low    High
As the table shows, the majority of queries have low selectivity, which reflects our intent of performing analytic queries, that is, queries for which a lot of data is retrieved and aggregated. For the same reason, most of the queries have a large input. Finally, queries are equally divided between high and low complexity (six each).
13. From the process analyst’s point of view
• Which system should I pick for my specific problem?
  – Not looking for the best system overall
  – Look at problem-specific query sets
14. Which system to use?
            Count Queries   Path Queries   Rank Queries   Temporal Queries
System A    0.5s            5s             0.1s           2s
System B    3s              0.4s           2s             1s
System C    0.5s            0.5s           0.5s           0.5s
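One way to turn per-query-type timings into a problem-specific choice is to weight them by the expected workload. The sketch below uses the System A/B/C response times from this slide; the workload mixes are hypothetical, and summing time × query count is just one simple cost model, not a procedure prescribed by BowlognaBench.

```python
from typing import Dict

# Response times per query type (seconds), taken from the slide's table.
TIMES: Dict[str, Dict[str, float]] = {
    "System A": {"count": 0.5, "path": 5.0, "rank": 0.1, "temporal": 2.0},
    "System B": {"count": 3.0, "path": 0.4, "rank": 2.0, "temporal": 1.0},
    "System C": {"count": 0.5, "path": 0.5, "rank": 0.5, "temporal": 0.5},
}


def workload_cost(times: Dict[str, float], mix: Dict[str, int]) -> float:
    """Total time to run mix[q] queries of each query type q."""
    return sum(times[q] * n for q, n in mix.items())


def pick_system(mix: Dict[str, int]) -> str:
    """Return the system with the lowest total time for this workload."""
    return min(TIMES, key=lambda name: workload_cost(TIMES[name], mix))


if __name__ == "__main__":
    # Hypothetical workloads: how many queries of each type are expected.
    print(pick_system({"rank": 100, "count": 10}))
    print(pick_system({"path": 100, "temporal": 10}))
    print(pick_system({"count": 10, "path": 10, "rank": 10, "temporal": 10}))
```

With these mixes, a rank-heavy workload favours System A, a path-heavy one System B, and a balanced one System C, which is why the choice depends on the problem-specific query set rather than on a single overall winner.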
15. Conclusions
• BowlognaBench for Analytic Queries
• OWL Ontology for Higher Education Systems
• Next Steps
  – Run a comparative evaluation of RDF systems
  – Set up a wiki-like space where groups can upload experimental results