Semantic Data Mining: an Ontology Based Approach
Agnieszka Lawrynowicz
Institute of Computing Science
Poznan University of Technology
April 12, 2016
Seminar of the Institute of Computing Science
Poznan University of Technology
Agnieszka Lawrynowicz Semantic Data Mining: an Ontology Based Approach 1
Outline
Introduction to semantic data mining
Ontology in computer science
Semantic meta-mining
▸ Use Case: e-LICO Intelligent Discovery Assistant
▸ Background knowledge: Data Mining OPtimization Ontology
▸ DM method: Pattern discovery with Fr-ONT-Qu
▸ Sharing: Standardization of data mining and machine learning schemas
Summary
Introduction: data mining
Input: a data table, text documents, ...
Output: a model, a pattern set
[Figure: data → DATA MINING → model, patterns]
Introduction: using background knowledge in data mining
Using background knowledge in data mining has been extensively
researched
hierarchy/taxonomy of attributes (Michalski et al., 1986, Srikant,
Agrawal, 1995)
Inductive Logic Programming (Muggleton, 1991, Lavrac and
Dzeroski, 1994)
relational learning (Quinlan, 1993, de Raedt, 2008)
semantic data mining tutorial @ ECML/PKDD’2011 (Lavrac,
Vavpetic, Lawrynowicz, Potoniec, Hilario, Kalousis)
Introduction: relational data mining
Input: a relational database, a graph, a set of logical facts, ...
Output: a model, a pattern set
[Figure: RELATIONAL DATA MINING → model, patterns]
Semantic data mining
Input:
a data table, text documents, Web pages, a relational database, a
graph, a set of logical facts, ...
one or more ontologies
Output: a model, a pattern set
[Figure: data and ontologies (linked by annotations, mappings, vocabulary re-use) → SEMANTIC DATA MINING → model, patterns]
Ontology in computer science
“engineering artefact [...]” (Guarino 98)
“An ontology is a
formal specification → machine interpretation
of a shared → group of people, consensus
conceptualization → abstract model of phenomena, concepts
of a domain of interest” → domain knowledge
(Gruber 93, Studer 98)
Ontology = formal specification of terminological knowledge (most often
from a particular domain)
Semantic Web layer cake
[Figure: Semantic Web language stack, with layers labelled "ontology modelling languages" and "data"]
Ontologies + data = knowledge graph
[Figure: knowledge graph in three layers]
RDF: reviewer1 metaReviews paper10; rdf:type links from the individuals to their classes
RDFS: classes MetaReviewer, Reviewer, PeerReviewedPaper; properties metaReviews, reviews; rdfs:domain, rdfs:range, rdfs:subPropertyOf, rdfs:subClassOf
OWL: owl:Restriction with owl:onProperty and owl:someValuesFrom; reviewedBy, owl:inverseOf
Logical meaning of OWL
Description Logics, DLs = family of first order logic-based formalisms
suitable for representing knowledge, especially terminologies, ontologies,
underpinning the Web Ontology Language (OWL).
Basic building blocks: concepts, roles, constructors, individuals
Example
TBox
Atomic concepts: Reviewer, Paper
Roles: reviews, metaReviews, reviewedBy
Constructors: ⊓, ∃
Axiom (concept definition):
PeerReviewedPaper ≡ Paper ⊓ ∃reviewedBy.Reviewer
Axiom (concept description “each meta reviewer is a reviewer”):
MetaReviewer ⊑ Reviewer
ABox
Fact assertion: metaReviews(reviewer1, paper10)
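The TBox and ABox above can be mimicked with a few lines of plain Python. This is only a minimal sketch: the class and role names come from the slide, while the type assertions, the reviewedBy fact and the helper functions are assumptions made for illustration. It also checks membership in a closed-world way, whereas an OWL reasoner works under the open world assumption.

```python
# TBox (from the slide): MetaReviewer ⊑ Reviewer
SUBCLASS_OF = {"MetaReviewer": {"Reviewer"}}

# ABox: these assertions are assumed for illustration only
TYPES = {"reviewer1": {"MetaReviewer"}, "paper10": {"Paper"}}
ROLES = {("paper10", "reviewedBy", "reviewer1")}


def superclasses(cls):
    """Return cls together with every class that (transitively) subsumes it."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUBCLASS_OF.get(current, ()))
    return seen


def is_instance(individual, cls):
    """Check class membership, taking the subclass axioms into account."""
    return any(cls in superclasses(t) for t in TYPES.get(individual, ()))


def is_peer_reviewed_paper(individual):
    """PeerReviewedPaper ≡ Paper ⊓ ∃reviewedBy.Reviewer"""
    has_reviewer = any(
        s == individual and p == "reviewedBy" and is_instance(o, "Reviewer")
        for s, p, o in ROLES
    )
    return is_instance(individual, "Paper") and has_reviewer
```

Here reviewer1 is recognised as a Reviewer only through the subclass axiom, and that inference is what lets paper10 satisfy the definition of PeerReviewedPaper.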
Overview of meta-learning
Meta-learning: learning to learn
application of machine learning techniques to meta-data about past
machine learning experiments;
the goal: to modify some aspect of the learning process to improve
the performance of the resulting model;
meta-mining: meta-learning applied to full data mining process
Overview of the e-LICO system (EU FP7 2009-2012)
[...] platform. This section presents the different components of the e-LICO architecture (see the figure) and
shows how they interact to achieve the user's knowledge discovery goal.
The e-LICO infrastructure (depicted in the figure under the dashed line) is the means by which the
data-mining platform is delivered to scientists. The innovative core of the e-LICO platform is the
Intelligent Discovery Assistant (IDA, above the dashed line) with its planner and meta-learner.
However, to deliver the data-mining platform to its scientist users, there are several other services
and components. The figure shows an overview of e-LICO's components and how they interact with
each other.
Figure: Overview of the e-LICO system.
There are two user-facing components for the e-LICO platform; these allow scientists to access
data-mining operators and/or other data processing services, to compose them into workflows and
execute them, collecting the results for interpretation or further analysis. These two central
infrastructure components are: [...]
Background knowledge: DM OPtimization Ontology
Data Mining OPtimization Ontology (DMOP)
the primary goal of DMOP is to support all decision-making steps
that determine the outcome of the data mining process;
development started in EU FP7 project e-LICO (2009-2012);
DMOP v5.5: 723 classes, 111 properties, 4291 axioms;
highly axiomatized;
represented in Web Ontology Language (OWL 2);
Competency questions
”Given a data mining task/data set, which of the valid or applicable
workflows/algorithms will yield optimal results (or at least better results
than the others)?”
”Given a set of candidate workflows/algorithms for a given task/data
set, which data set/workflow/algorithm characteristics should be
taken into account in order to select the most appropriate one?”
and others more fine-grained, e.g.:
”Which induction algorithms should I use (or avoid) when my dataset
has many more variables than instances?”
Architecture of DMOP knowledge base and its satellite
triple stores
[Figure: DMOP knowledge base and satellite triple stores]
TBox (DMOP, OWL 2): formal conceptual framework of the data mining domain
ABox (Operator DB, OWL 2): accepted knowledge of DM tasks, algorithms, operators
TBox + ABox: meta-miner's prior DM knowledge
RDF triple stores (DMEX-DB1, DMEX-DB2, …, DMEX-DBk): specific DM applications (datasets, workflows, results); meta-miner's training data
The core concepts of DMOP (simplified)
Fig. 1. The core concepts of DMOP.
DMOP: algorithm representation
Alignment of DMOP with DOLCE 1/3
Two main reasons to align DMOP with a foundational ontology:
considerations about attributes and data properties; extant
non-foundational ontology solutions were partial re-inventions of how
they are treated in a foundational ontology;
reuse of the ontology’s object properties;
Alignment of DMOP with DOLCE 2/3
Alignment of DMOP with DOLCE 3/3
Perdurant: DM-Experiment and DM-Operation are subclasses of
dolce:process;
Endurant: most DM classes, such as algorithm, software, strategy,
task, and optimization problem, are subclasses of
dolce:non-physical-endurant;
Quality: characteristics and parameters of DM entities made
subclasses of dolce:abstract-quality;
Abstract: for identifying discrete values, classes added as subclasses
of dolce:abstract-region;
object properties: DMOP reuses mainly DOLCE’s parthood, quality,
and quale relations;
each of the four main DOLCE branches has been used.
Qualities and attributes 1/3
How to handle ’attributes’ in OWL ontologies, and, in a broader context,
measurements?
easy way: attribute is a binary functional relation between a class and
a datatype
Elephant ⊑ =1 hasWeight.integer
Elephant ⊑ =1 hasWeightPrecise.real
Elephant ⊑ =1 hasWeightImperial.integer (in lbs)
drawback: this builds application decisions about how to store the data
(and in which unit) into the ontology itself
Qualities and attributes 2/3
How to handle ’attributes’ in OWL ontologies, and, in a broader context,
measurements?
more elaborate way: unfold the notion of an object’s property (e.g.
weight) from one attribute/OWL data property into at least two
properties:
▸ one OWL object property from the object to the ’reified attribute’
(“quality property” represented as an OWL class)
▸ and another property to the value(s)
favoured in foundational ontologies;
solves the problem of non-reusability of the ’attribute’ and prevents
duplication of data properties;
in DMOP, measurements are more like values of parameters;
Qualities and attributes 3/3
ModelingAlgorithm ⊑ =1 dolce:has-quality.LearningPolicy
LearningPolicy ⊑ =1 dolce:has-quale.Eager-Lazy
Eager-Lazy ⊑ ≤ 1 hasDataValue.anyType
LearningPolicy is a subclass of dolce:quality
Eager-Lazy is a subclass of dolce:abstract-region
In this way, the ontology can be linked to many different applications, which
may even use different data types, yet still agree on the meaning of the
characteristics and parameters (’attributes’) of the algorithms, tasks, and
other DM endurants.
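The quality/quale pattern can be sketched with plain Python dataclasses. This is an illustrative analogy rather than DMOP itself: the names LearningPolicy and Eager-Lazy come from the slide, while the two "applications", their value encodings and the helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quale:
    """A value region (cf. dolce:abstract-region), e.g. Eager-Lazy."""
    region: str
    value: object  # each application may plug in its own data type


@dataclass(frozen=True)
class Quality:
    """A reified attribute (cf. dolce:quality), e.g. LearningPolicy."""
    name: str
    quale: Quale


@dataclass(frozen=True)
class ModelingAlgorithm:
    name: str
    learning_policy: Quality


def learning_policy(value):
    """Build the LearningPolicy quality with a value in the Eager-Lazy region."""
    return Quality("LearningPolicy", Quale("Eager-Lazy", value))


# Two hypothetical applications encode the same characteristic differently.
knn_in_app_a = ModelingAlgorithm("kNN", learning_policy("lazy"))
knn_in_app_b = ModelingAlgorithm("kNN", learning_policy(0))


def same_meaning(a, b):
    """Both records describe the same quality in the same region."""
    qa, qb = a.learning_policy, b.learning_policy
    return qa.name == qb.name and qa.quale.region == qb.quale.region
```

The two records differ in their stored values, yet both point to the same quality and the same region, which is the agreement on meaning that the reified modelling is meant to preserve.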
Meta-modeling in DMOP 1/4
only processes (executions of workflows) and operations (executions
of operators) consume inputs and produce outputs
DM algorithms (as well as operators and workflows) can only specify
the type of input or output
inputs and outputs (DM-Dataset and DM-Hypothesis class hierarchy,
respectively) are modeled as subclasses of IO-Object class
Meta-modeling in DMOP 2/4
DM algorithms: classes or individuals? Individuals.
Problem: expressing types of inputs/outputs associated with
algorithm
“C4.5 specifiesInputClass CategoricalLabeledDataSet”
C4.5: individual (instance of DM-Algorithm)
CategoricalLabeledDataSet: class (subclass of DM-Hypothesis)
Meta-modeling in DMOP 3/4
Initial solution: one artificial class per algorithm, with a single
instance corresponding to that particular algorithm
Problem: hasInput, hasOutput, specifiesInputClass and
specifiesOutputClass were all assigned a common range, IO-Object
“C4.5 specifiesInputClass Iris” ?
C4.5: individual (instance of DM-Algorithm)
Iris: individual (instance of DM-Hypothesis)
Iris is a concrete dataset. Clearly, no DM algorithm is designed
to handle only one particular dataset.
Meta-modeling in DMOP 4/4
Final solution: weak form of punning available in OWL 2
IO-Class: a meta-class, the class of all classes of input and output
objects
“C4.5 specifiesInputClass CategoricalLabeledDataSet”
C4.5: individual (instance of DM-Algorithm)
CategoricalLabeledDataSet: individual (instance of IO-Class)
“DM-Process hasInput some CategoricalLabeledDataSet”
DM-Process: class (subclass of dolce:process)
CategoricalLabeledDataSet: class (subclass of IO-Object)
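A rough programming analogy for punning is a Python metaclass, where one name can be used both as a class (with instances of its own) and as an object (an instance of the metaclass). The sketch below is only that analogy, not how OWL 2 defines punning; the Python class names mirror the slide and everything else is an assumption.

```python
class IOClass(type):
    """Meta-class: its instances are themselves classes of I/O objects."""


class IOObject(metaclass=IOClass):
    """Root of the input/output object hierarchy."""


class CategoricalLabeledDataSet(IOObject):
    def __init__(self, name):
        self.name = name


class DMAlgorithm:
    """An algorithm is an individual that only specifies the type of its input."""

    def __init__(self, name, specifies_input_class):
        self.name = name
        self.specifies_input_class = specifies_input_class


class DMProcess:
    """A process (an execution) consumes a concrete input object."""

    def __init__(self, algorithm, has_input):
        if not isinstance(has_input, algorithm.specifies_input_class):
            raise TypeError(f"{algorithm.name} cannot consume this input")
        self.algorithm = algorithm
        self.has_input = has_input


# CategoricalLabeledDataSet is used as an object here (an instance of IOClass) ...
c45 = DMAlgorithm("C4.5", CategoricalLabeledDataSet)
# ... and as a class here (Iris is one of its instances).
iris = CategoricalLabeledDataSet("Iris")
run = DMProcess(c45, iris)
```

The algorithm refers to a whole class of datasets, and only the process is tied to the concrete Iris dataset, which is the distinction the final solution is after.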
DM method: Fr-ONT-Qu semantic pattern miner
Data mining as search
learning in description logics (DLs) and from other relational data can be
seen as search in a space of concepts / RDF triples / clauses /
(conjunctive / SPARQL) queries, ...
it is possible to impose ordering on this search space, e.g., using
subsumption as natural quasi-order and generality relation between
DL concepts
▸ if D ⊑ C then C covers all instances that are covered by D
refinement operators may be applied to traverse the space by
computing a set of specializations (resp. generalizations) of a concept
/ RDF triples/ clauses/ (conjunctive / SPARQL) queries, ...
Properties of refinement operators
Consider a downward refinement operator ρ, and let C ⇝ρ D denote a
refinement chain from a DL concept C to D
complete: each point in the lattice is reachable (for D ⊑ C there exists E
such that E ≡ D and a refinement chain C ⇝ρ ... ⇝ρ E)
weakly complete: for any concept C with C ⊑ ⊤, a concept E with
E ≡ C can be reached from ⊤
finite: ρ(C) is finite for any concept C
redundant: there exist two different refinement chains from C to D
proper: C ⇝ρ D implies C ≢ D
ideal = complete + proper + finite
Learning in DLs and in clausal languages is hard
Lehmann & Hitzler (ILP 2007, MLJ 2010) proved for many DLs, and
Nienhuys-Cheng & de Wolf (1997) for clausal languages, that no ideal
refinement operator exists.
Fr-ONT-Qu
algorithm for mining patterns in RDF(S) data
patterns expressed as SPARQL queries
generality relation: taxonomical subsumption
consists of: a refinement operator ρ and a strategy to select best
patterns for further refinement
Example SPARQL query
head: SELECT ?x WHERE {
body:   ?x rdf:type :Paper .
        ?x rdf:type :PeerReviewedPaper .
        ?x :reviewedBy ?y
      }
New generality relation: taxonomical subsumption
Taxonomically closed pattern
A pattern Q is taxonomically closed, or t-closed, w.r.t. the background knowledge
G if for each triple of the form (?x rdf:type c) in Q, Q also contains the
transitive closure of (?x rdf:type c) w.r.t. G, and for each triple of the form
(?x p ?y) that appears in the pattern Q, Q also contains the transitive closure
of (?x p ?y) w.r.t. G.
Taxonomical subsumption
Given two patterns Q1 and Q2 over a ρdf dataset G, and their t-closures Q1^t and
Q2^t respectively, Q1 taxonomically subsumes (t-subsumes) Q2 iff there exists a
mapping σ such that the set of triple patterns and FILTER expressions from
σ(body(Q1^t)) is a subset of the set of triple patterns and FILTER expressions from
body(Q2^t).
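The two definitions above can be illustrated with a small brute-force check. This is a simplified sketch under stated assumptions: the background knowledge holds only a class hierarchy (property hierarchies and FILTER expressions are left out), σ ranges over variable-to-variable mappings, and all function names are made up for the example.

```python
from itertools import product

# Hypothetical background knowledge G: direct subclass links only
SUBCLASS_OF = {":PeerReviewedPaper": ":Paper"}


def t_closure(pattern):
    """For every (?x rdf:type c), add the type triples for all superclasses of c."""
    closed = set(pattern)
    for subject, predicate, obj in pattern:
        while predicate == "rdf:type" and obj in SUBCLASS_OF:
            obj = SUBCLASS_OF[obj]
            closed.add((subject, predicate, obj))
    return closed


def variables(pattern):
    """Sorted list of the variables (terms starting with '?') in a pattern."""
    return sorted({t for triple in pattern for t in triple if t.startswith("?")})


def t_subsumes(q1, q2):
    """True if some mapping σ sends the t-closure of q1 into the t-closure of q2."""
    closed1, closed2 = t_closure(q1), t_closure(q2)
    vars1, vars2 = variables(closed1), variables(closed2)
    for image in product(vars2, repeat=len(vars1)):
        sigma = dict(zip(vars1, image))
        mapped = {tuple(sigma.get(t, t) for t in triple) for triple in closed1}
        if mapped <= closed2:
            return True
    return False


general = {("?x", "rdf:type", ":Paper")}
specific = {("?x", "rdf:type", ":PeerReviewedPaper"), ("?x", ":reviewedBy", "?y")}
```

With these two patterns, `general` t-subsumes `specific` only because the t-closure adds the implied `:Paper` type triple to `specific`; the reverse direction fails.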
Input of the algorithm
a declarative bias (B) to limit a search space (i.e. classes and
properties to use) and maximal number of iterations
2 thresholds: for keeping good enough patterns and for refining best
patterns
a choice of quality measure to use with the thresholds (e.g.
support on the knowledge base)
beam search size
Example
B: classes: PeerReviewedPaper, JournalPaper, property: reviewedBy
1 Refine every pattern from the previous iteration by adding a single
restriction for a variable already existing in the pattern. E.g. for
pattern {?x rdf:type :Paper.}, its refinements are:
▸ {?x rdf:type :Paper . ?x rdf:type :PeerReviewedPaper .}
▸ {?x rdf:type :Paper . ?x rdf:type :JournalPaper . }
▸ {?x rdf:type :Paper . ?x :reviewedBy ?y}
2 Evaluate patterns (with some quality measure, such as support on a data
set) and select only the best ones
3 Repeat steps 1-2 as long as there are patterns for refinement and
maximal number of iterations is not exceeded
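The three steps above can be sketched as a small refine-evaluate-select loop. This is not the Fr-ONT-Qu implementation: the toy data is hypothetical, support is the only quality measure, a single threshold plus a beam stands in for the two thresholds, and plain sets replace the trie-based refinement operator.

```python
# Toy RDF-like data; the individuals are hypothetical, only the class and
# property names come from the slide.
DATA = {
    ("p1", "rdf:type", ":Paper"), ("p1", "rdf:type", ":PeerReviewedPaper"),
    ("p1", ":reviewedBy", "r1"),
    ("p2", "rdf:type", ":Paper"), ("p2", "rdf:type", ":JournalPaper"),
    ("p3", "rdf:type", ":Paper"), ("p3", "rdf:type", ":PeerReviewedPaper"),
    ("p3", ":reviewedBy", "r2"),
}
BIAS_CLASSES = [":PeerReviewedPaper", ":JournalPaper"]  # declarative bias B
BIAS_PROPERTIES = [":reviewedBy"]


def is_var(term):
    return term.startswith("?")


def bindings(pattern, binding=None):
    """Yield every variable binding under which all triple patterns match DATA."""
    binding = binding or {}
    if not pattern:
        yield binding
        return
    first, rest = pattern[0], pattern[1:]
    for fact in DATA:
        new = dict(binding)
        if all(new.setdefault(t, f) == f if is_var(t) else t == f
               for t, f in zip(first, fact)):
            yield from bindings(rest, new)


def support(pattern):
    """Number of distinct ?x values covered by the pattern."""
    return len({b["?x"] for b in bindings(pattern)})


def refine(pattern):
    """Step 1: add one restriction on a variable already in the pattern."""
    variables = sorted({t for triple in pattern for t in triple if is_var(t)})
    fresh = f"?y{len(variables)}"
    for var in variables:
        restrictions = [(var, "rdf:type", c) for c in BIAS_CLASSES]
        restrictions += [(var, p, fresh) for p in BIAS_PROPERTIES]
        for triple in restrictions:
            if triple not in pattern:
                yield tuple(sorted(pattern + (triple,)))


def mine(start, min_support, beam_size, max_iterations):
    """Steps 1-3: refine, keep good patterns, refine only the best ones again."""
    kept, frontier = [], [start]
    for _ in range(max_iterations):
        candidates = {c for p in frontier for c in refine(p)}
        good = [c for c in candidates if support(c) >= min_support]
        if not good:
            break
        kept.extend(good)
        frontier = sorted(good, key=lambda c: (-support(c), c))[:beam_size]
    return kept


start = (("?x", "rdf:type", ":Paper"),)
first_round = mine(start, min_support=2, beam_size=2, max_iterations=1)
```

On this toy data the first iteration keeps the refinements with `:PeerReviewedPaper` and with `:reviewedBy`, and drops the `:JournalPaper` refinement because it covers a single paper.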
Refinement operator ρ: uses trie data structure
ρ: (locally) finite and complete
Pattern based classification 1/2
Pattern based classification 2/2
We learn features that are optimized with regard to the (classification) task
Propositionalisation 1/2
Propositionalisation 2/2
In this way, learned features may be consumed by any off-the-shelf
’attribute-value’ classification algorithm
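Propositionalisation can be sketched as turning each mined pattern into one binary column. The example below is deliberately simplified: each example is described by a flat set of (predicate, object) pairs, a pattern covers an example when it is a subset of that description, and the workflow names and the RM-Naive_Bayes operator are hypothetical.

```python
# Hypothetical workflow descriptions as sets of (predicate, object) pairs
EXAMPLES = {
    "wf1": {("executes", "RM-Decision_Tree"),
            ("implements", "ClassificationTreeInductionAlgorithm")},
    "wf2": {("executes", "Weka-J48"),
            ("implements", "ClassificationTreeInductionAlgorithm")},
    "wf3": {("executes", "RM-Naive_Bayes")},
}

# Hypothetical mined patterns, in the same simplified form
PATTERNS = [
    {("implements", "ClassificationTreeInductionAlgorithm")},
    {("executes", "RM-Decision_Tree")},
]


def propositionalise(examples, patterns):
    """One row per example, one binary column per pattern (1 = pattern covers it)."""
    return {
        name: [int(pattern <= description) for pattern in patterns]
        for name, description in examples.items()
    }


table = propositionalise(EXAMPLES, PATTERNS)
```

Each row of `table` is an ordinary attribute-value vector, so it can be passed to any standard classifier together with a class label.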
Comparative experiments on classification of semantic data 1/2
we considered published work with available results and datasets
(including ESWC 2008 best paper, ESWC 2012 best paper)
various types of methods: kernel methods, statistical relational
classifier, concept learning algorithms
we strictly followed the tasks, protocols and experimental setups of
the methods
Comparative experiments on classification of semantic data 2/2
On the classification task, Fr-ONT-Qu outperformed state-of-the-art approaches to
classification of Semantic Web data
(see: “Pattern based feature construction in semantic data mining” by A.
Lawrynowicz, J. Potoniec, IJSWIS 10(1), 2014):
kernel methods: Bloehdorn et al. (2007), Loesch et al. (ESWC 2012
best paper),
statistical relational classifier: SPARQL-ML by Kiefer et al. (ESWC
2008 best paper),
concept learning algorithms: DL-FOIL by Fanizzi et al. (2008),
DL-Learner cutting-edge CELOE variant by Lehmann (2009)
What is RapidMiner? 1/2
What is RapidMiner? 2/2
RapidMiner XML based workflow representation
Creating (meta-)dataset for meta-mining
[Figure: creating the meta-dataset]
Baseline datasets: 11 UCI datasets
Baseline DM experiment set: 1581 executed RapidMiner workflows
Data Characteristics Tool (DCT) and DMOP ontology; transformation to RDF
DMOP-based repository of DM processes (DMEX-DB)
Dataset for training the meta-miner: 85 million RDF triples
Propositionalisation
[Figure: workflow patterns mined from the DMOP-based RDF repository of DM processes are combined with dataset characteristics to form the features of the dataset. The slide includes the following excerpt.]
[...]
?opex2 dmop:hasParameterSetting ?front1 .
?front0 dmop:executes rm:DM-Operator .
?front0 dmop:implements ?front2 .
?front2 a dmop:DM-Algorithm .
?front2 a dmop:InductionAlgorithm .
?front2 a dmop:ModelingAlgorithm .
?front2 a dmop:ClassificationModelingAlgorithm .
?front2 a dmop:ClassificationTreeInductionAlgorithm . }
was mined when Fr-ONT-Qu traversed down the algorithm class hierarchy, specializing
variable ?front2. In this way, it is possible to abstract from the level of operators (algorithm
implementations) to the level of algorithms and their taxonomy. For instance, both
rm:RM-Decision_Tree and weka:Weka-J48 operators implement a classification tree induction
algorithm and one may generalize over it. The patterns containing class hierarchies provide
expressivity similar to that of patterns mined in so-called generalized association rule mining.
The following pattern covers only those workflows that contain a ‘Decision Tree’ operator
for which the parameter minimal size for split has a value between 2 and 5.5:
Q2 = select distinct ?x where { Bd ∪
?opex2 dmop:executes ?front0 .
?opex2 dmop:executes rm:RM-Decision_Tree .
?opex2 dmop:hasParameterSetting ?front1 .
?front0 dmop:executes rm:DM-Operator .
?front1 dmop:setsValueOf ?front2 .
?front1 dmop:hasValue ?front3 .
filter(2.000000 <= xsd:double(?front3) && xsd:double(?front3) <= 16.000000) .
?front2 dmop:hasParameterKey 'minimal_size_for_split' .
?front1 dmop:hasValue ?front3 .
filter(2.000000 <= xsd:double(?front3) && xsd:double(?front3) <= 9.000000) .
?front1 dmop:hasValue ?front3 .
filter(2.000000 <= xsd:double(?front3) && xsd:double(?front3) <= 5.500000) . }
Semantic meta-mining results
McNemar’s test for pairs of classifiers performed with the null
hypothesis that a classifier built using dataset characteristics and a
mined pattern set has the same error rate as the baseline that used
dataset characteristics and only the names of the machine learning
DM operators
Test confirmed that classifiers trained using workflow patterns
performed significantly better (in terms of accuracy) than the baseline
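McNemar's test compares two classifiers on the same test cases using only the cases where they disagree. The sketch below computes the exact (binomial) two-sided version with the standard library; the function name is made up and no counts from the actual experiments are implied.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: cases the baseline classified correctly and the pattern-based classifier did not
    c: cases the pattern-based classifier classified correctly and the baseline did not
    Under the null hypothesis of equal error rates, b ~ Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # the classifiers never disagree
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

A small p-value means the disagreements are too one-sided to be explained by equal error rates; for example, if all ten disagreements favour one classifier, `mcnemar_exact(0, 10)` gives 2/1024.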
Sharing: Standardization of DM/ML schemas
Evolution of the field of DM/ML ontologies
[Figure: timeline 2008-2016 of DM/ML ontologies and vocabularies (OntoDM, DMOP, DMWF, Exposé, MEX, ML Schema Core), platforms (Experiment Databases, OpenML) and events (Data Mining Ontology Jamboree, Slovenia; W3C Machine Learning Schema Community Group; OpenML 2016, Netherlands)]
OntoDM
Pance Panov, Larisa N. Soldatova, Saso Dzeroski: Ontology of core data mining
entities. Data Min. Knowl. Discov. 28(5-6): 1222-1265 (2014)
built in compliance to upper level ontologies BFO, OBI, IAO, modularized
incorporates structured data mining
Use case: generic, middle level ontology for ML; representing QSAR entities for
drug design, used by Eve Robot Scientist
DMOP: Data Mining Optimization Ontology
C. Maria Keet, Agnieszka Lawrynowicz, Claudia d’Amato, Alexandros Kalousis, Phong
Nguyen, Raul Palma, Robert Stevens, Melanie Hilario: The Data Mining OPtimization
Ontology. J. Web Sem. 32: 43-53 (2015)
development started in e-LICO EU FP7 project (2009-2012)
detailed algorithm internal characteristics (’qualities’)
Use case: meta-learning (’whitebox’), meta-mining, used to produce Intelligent
Discovery Assistant for RapidMiner
Exposé
Joaquin Vanschoren, Hendrik Blockeel, Bernhard Pfahringer, Geoffrey Holmes:
Experiment databases - A new way to share, organize and learn from experiments.
Machine Learning 87(2): 127-158 (2012)
re-uses OntoDM (at top-level) and DMOP (at bottom level)
superseded by OpenML DB schema
Use case: experiment databases, ExpML markup
Early work towards aligning DM/ML ontologies (2010)
DM Ontology Jamboree, Jožef Stefan Institute, Slovenia
MEX vocabulary
Diego Esteves, Diego Moussallem, Ciro Baron Neto, Tommaso Soru, Ricardo Usbeck,
Markus Ackermann, Jens Lehmann: MEX vocabulary: a lightweight interchange format
for machine learning experiments. SEMANTICS 2015: 169-176
lightweight interchange format
maps to PROV
Use case: annotating ML experiments and interchanging ML metadata
How to make existing DM/ML ontologies compatible?
W3C Machine Learning Schema Community Group (2015)
https://www.w3.org/community/ml-schema/
OpenML, Lorentz Center, Netherlands (2016)
First draft of ML Schema Core https://github.com/ML-Schema/core
Sharing beyond DM/ML domain
Mapping DMOP to workflow ontologies (Research Objects, OPMW)
(ROHub hosted by Poznan Supercomputing and Networking Center)
Semantic data mining: more information
Semantic data mining tutorial @ ECML/PKDD’2011
http://videolectures.net/ecmlpkdd2011_lavrac_vavpetic_mining/
peculiarities of the learning setting: Open World Assumption, what is a
”truly semantic” similarity measure?, ...
methods, applications, tools
Summary
semantic data mining: data mining with ontologies as
background/prior knowledge, most often from structured data
ontologies work best if engineered with use cases in mind
learning in description logics and clausal languages is hard; heuristics,
dealing with peculiarities
Fr-ONT-Qu semantic pattern mining algorithm: theoretical
properties, practical evaluation
use case: semantic meta-mining for constructing Intelligent Data
Mining Assistant
importance of interoperability (for scientific reproducibility, for
inter-domain applications)
Acknowledgements
Polish National Science Center under the SONATA program
”ARISTOTELES: Methodology and algorithms for automatic revision of
ontologies in task based scenarios” (2014/13/D/ST6/02076) (2015-2018)
Foundation for Polish Science under the POMOST programme, cofinanced
from European Union, Regional Development Fund (POMOST/2013-7/8)
(2013-2015)
EU FP7 ICT-2007.4.4 (231519) ”e-LICO: An e-Laboratory for
Interdisciplinary Collaborative Research in Data Mining and Data-Intensive
Science” (2009-2012)
Fr-ONT-Qu, meta-mining experiments done jointly with Jedrzej Potoniec
Contributors to the development of DMOP and/or other e-LICO
infrastructure used in the research described in this presentation: Melanie
Hilario, C. Maria Keet, Claudia d’Amato, Huyen Do, Simon Fischer, Dragan
Gamberger, Lina Al-Jadir, Simon Jupp, Alexandros Kalousis, Joerg-Uwe
Kietz, Petra Kralj Novak, Babak Mougouie, Phong Nguyen, Raul
Palma, Floarea Serban, Robert Stevens, Anze Vavpetic, Jun Wang, Derry
Wijaya, Adam Woznica
Agnieszka Lawrynowicz Semantic Data Mining: an Ontology Based Approach 66

Mais conteúdo relacionado

Destaque

Semantic Search Engines
Semantic Search EnginesSemantic Search Engines
Semantic Search Engines
Atul Shridhar
 
Intriduction to Ontotext's KIM platform
Intriduction to Ontotext's KIM platformIntriduction to Ontotext's KIM platform
Intriduction to Ontotext's KIM platform
toncho11
 
Use of ontologies in natural language processing
Use of ontologies in natural language processingUse of ontologies in natural language processing
Use of ontologies in natural language processing
ATHMAN HAJ-HAMOU
 

Destaque (20)

Unit 3 part i Data mining
Unit 3 part i Data miningUnit 3 part i Data mining
Unit 3 part i Data mining
 
Dr. Ahmad, origin ontology of future scenario's idea, 3
Dr. Ahmad, origin ontology of future scenario's idea, 3Dr. Ahmad, origin ontology of future scenario's idea, 3
Dr. Ahmad, origin ontology of future scenario's idea, 3
 
Domain-Specific Term Extraction for Concept Identification in Ontology Constr...
Domain-Specific Term Extraction for Concept Identification in Ontology Constr...Domain-Specific Term Extraction for Concept Identification in Ontology Constr...
Domain-Specific Term Extraction for Concept Identification in Ontology Constr...
 
Semantic Search Engines
Semantic Search EnginesSemantic Search Engines
Semantic Search Engines
 
Intriduction to Ontotext's KIM platform
Intriduction to Ontotext's KIM platformIntriduction to Ontotext's KIM platform
Intriduction to Ontotext's KIM platform
 
Adding Semantic Edge to Your Content – From Authoring to Delivery
Adding Semantic Edge to Your Content – From Authoring to DeliveryAdding Semantic Edge to Your Content – From Authoring to Delivery
Adding Semantic Edge to Your Content – From Authoring to Delivery
 
Ontological approach for improving semantic web search results
Ontological approach for improving semantic web search resultsOntological approach for improving semantic web search results
Ontological approach for improving semantic web search results
 
TwitIE: An Open-Source Information Extraction Pipeline for Microblog Text
TwitIE: An Open-Source Information Extraction Pipeline for Microblog TextTwitIE: An Open-Source Information Extraction Pipeline for Microblog Text
TwitIE: An Open-Source Information Extraction Pipeline for Microblog Text
 
A Taxonomy of Semantic Web data Retrieval Techniques
A Taxonomy of Semantic Web data Retrieval TechniquesA Taxonomy of Semantic Web data Retrieval Techniques
A Taxonomy of Semantic Web data Retrieval Techniques
 
Semantic data mining: an ontology based approach

  • 1. Semantic Data Mining: an Ontology Based Approach Agnieszka Lawrynowicz Institute of Computing Science Poznan University of Technology April 12, 2016 Seminar of the Institute of Computing Science Poznan University of Technology Agnieszka Lawrynowicz Semantic Data Mining: an Ontology Based Approach 1
  • 2. Outline: Introduction to semantic data mining; Ontology in computer science; Semantic meta-mining (▸ Use case: e-LICO Intelligent Discovery Assistant ▸ Background knowledge: Data Mining OPtimization Ontology ▸ DM method: pattern discovery with Fr-ONT-Qu ▸ Sharing: standardization of data mining and machine learning schemas); Summary
  • 4. Introduction: data mining. Input: a data table, text documents, ... Output: a model, a pattern set. [Figure: data → DATA MINING → model, patterns]
  • 5. Introduction: using background knowledge in data mining. Using background knowledge in data mining has been extensively researched: hierarchy/taxonomy of attributes (Michalski et al., 1986; Srikant and Agrawal, 1995); Inductive Logic Programming (Muggleton, 1991; Lavrac and Dzeroski, 1994); relational learning (Quinlan, 1993; de Raedt, 2008); semantic data mining tutorial @ ECML/PKDD'2011 (Lavrac, Vavpetic, Lawrynowicz, Potoniec, Hilario, Kalousis).
  • 6. Introduction: relational data mining. Input: a relational database, a graph, a set of logical facts, ... Output: a model, a pattern set. [Figure: RELATIONAL DATA MINING → model, patterns]
  • 7. Semantic data mining. Input: a data table, text documents, Web pages, a relational database, a graph, a set of logical facts, ..., plus one or more ontologies. Output: a model, a pattern set. [Figure: data and ontologies (annotations, mappings, vocabulary re-use) → SEMANTIC DATA MINING → model, patterns]
  • 9. Ontology in computer science. "engineering artefact [...]" (Guarino 98). "An ontology is a formal specification (→ machine interpretation) of a shared (→ group of people, consensus) conceptualization (→ abstract model of phenomena, concepts) of a domain of interest (→ domain knowledge)" (Gruber 93, Studer 98). Ontology = formal specification of terminological knowledge (most often from a particular domain).
  • 10. Semantic Web layer cake. [Figure: the Semantic Web language stack; labelled layers: ontology modeling languages, data]
  • 11. Ontologies + data = knowledge graph. [Figure: a knowledge graph in three layers. RDF: reviewer1 metaReviews paper10. RDFS: classes PeerReviewedPaper, MetaReviewer and Reviewer and properties metaReviews and reviews, connected by rdf:type, rdfs:domain, rdfs:range, rdfs:subPropertyOf and rdfs:subClassOf. OWL: an owl:Restriction using owl:onProperty and owl:someValuesFrom, and the property reviewedBy linked by owl:inverseOf.]
  • 12. Logical meaning of OWL. Description Logics (DLs) = a family of first-order-logic-based formalisms suitable for representing knowledge, especially terminologies and ontologies, underpinning the Web Ontology Language (OWL). Basic building blocks: concepts, roles, constructors, individuals. Example. TBox: atomic concepts: Reviewer, Paper; roles: reviews, metaReviews, reviewedBy; constructors: ⊓, ∃; axiom (concept definition): PeerReviewedPaper ≡ Paper ⊓ ∃reviewedBy.Reviewer; axiom (concept description, "each meta-reviewer is a reviewer"): MetaReviewer ⊑ Reviewer. ABox: fact assertion: metaReviews(reviewer1, paper10).
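The TBox/ABox example above can be made concrete with a small forward-chaining sketch. It is only an illustration, not an OWL reasoner: the two class axioms come from the slide, while the sub-property, inverse and domain axioms, the assertion Paper(paper10) and the function name `saturate` are assumptions introduced here.

```python
# Toy forward chaining over the reviewer example (illustrative only).
SUBCLASS = {"MetaReviewer": "Reviewer"}      # from the slide: MetaReviewer ⊑ Reviewer
SUBPROPERTY = {"metaReviews": "reviews"}     # assumed: metaReviews is a sub-property of reviews
INVERSE = {"reviews": "reviewedBy"}          # assumed: reviewedBy is the inverse of reviews
DOMAIN = {"metaReviews": "MetaReviewer"}     # assumed: domain of metaReviews


def saturate(types, facts):
    """Apply the rules until no new type or role assertion can be derived."""
    types, facts = set(types), set(facts)
    while True:
        new_types, new_facts = set(), set()
        for (prop, subj, obj) in facts:
            if prop in SUBPROPERTY:
                new_facts.add((SUBPROPERTY[prop], subj, obj))
            if prop in INVERSE:
                new_facts.add((INVERSE[prop], obj, subj))
            if prop in DOMAIN:
                new_types.add((subj, DOMAIN[prop]))
            # One direction of PeerReviewedPaper ≡ Paper ⊓ ∃reviewedBy.Reviewer
            if (prop == "reviewedBy" and (subj, "Paper") in types
                    and (obj, "Reviewer") in types):
                new_types.add((subj, "PeerReviewedPaper"))
        for (individual, cls) in types:
            if cls in SUBCLASS:
                new_types.add((individual, SUBCLASS[cls]))
        if new_types <= types and new_facts <= facts:
            return types, facts
        types |= new_types
        facts |= new_facts


# ABox: metaReviews(reviewer1, paper10), plus the assumed assertion Paper(paper10).
types, facts = saturate({("paper10", "Paper")},
                        {("metaReviews", "reviewer1", "paper10")})
```

Under these assumptions the single asserted fact is enough to classify paper10 as a PeerReviewedPaper, which is the kind of implicit knowledge a semantic data mining method can exploit.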
  • 14. Overview of meta-learning. Meta-learning: learning to learn; the application of machine learning techniques to meta-data about past machine learning experiments. The goal: to modify some aspect of the learning process to improve the performance of the resulting model. Meta-mining: meta-learning applied to the full data mining process.
  • 15. Overview of the e-LICO system (EU FP7 2009-2012). [...] platform. This section presents the different components of the e-LICO architecture (Figure 1) and shows how they interact to achieve the user's knowledge discovery goal. The e-LICO infrastructure (depicted in the figure under the dashed line) is the means by which the data-mining platform is delivered to scientists. The innovative core of the e-LICO platform is the Intelligent Discovery Assistant (IDA, above the dashed line) with its planner and meta-learner. However, to deliver the data-mining platform to its scientist users, there are several other services and components. Figure 1 shows an overview of e-LICO's components and how they interact with each other. [Figure 1. Overview of the e-LICO system.] There are two user-facing components for the e-LICO platform; these allow scientists to access data-mining operators and/or other data processing services, to compose them into workflows and execute them, collecting the results for interpretation or further analysis. These two central infrastructure components are:
  • 16. Background knowledge: DM OPtimization Ontology
  • 17. Data Mining OPtimization Ontology (DMOP): the primary goal of DMOP is to support all decision-making steps that determine the outcome of the data mining process; development started in the EU FP7 project e-LICO (2009-2012); DMOP v5.5: 723 classes, 111 properties, 4291 axioms; highly axiomatized; represented in the Web Ontology Language (OWL 2).
  • 18. Competency questions: "Given a data mining task/data set, which of the valid or applicable workflows/algorithms will yield optimal results (or at least better results than the others)?" "Given a set of candidate workflows/algorithms for a given task/data set, which data set/workflow/algorithm characteristics should be taken into account in order to select the most appropriate one?" And others, more fine-grained, e.g.: "Which induction algorithms should I use (or avoid) when my dataset has many more variables than instances?"
  • 19. Architecture of DMOP knowledge base and its satellite triple stores. [Figure: DMOP TBox (OWL 2): formal conceptual framework of the data mining domain. ABox / Operator DB: accepted knowledge of DM tasks, algorithms, operators. Together these are the meta-miner's prior DM knowledge. DMEX-DB1, DMEX-DB2, …, DMEX-DBk (RDF triple stores): specific DM applications (datasets, workflows, results), i.e. the meta-miner's training data.]
  • 20. The core concepts of DMOP (simplified). [Figure: Fig. 1. The core concepts of DMOP.]
  • 21. DMOP: algorithm representation
  • 22. Alignment of DMOP with DOLCE 1/3. Two main reasons to align DMOP with a foundational ontology: (1) considerations about attributes and data properties: extant non-foundational ontology solutions were partial re-inventions of how they are treated in a foundational ontology; (2) reuse of the ontology's object properties.
  • 23. Alignment of DMOP with DOLCE 2/3
  • 24. Alignment of DMOP with DOLCE 3/3. Perdurant: DM-Experiment and DM-Operation are subclasses of dolce:process. Endurant: most DM classes, such as algorithm, software, strategy, task, and optimization problem, are subclasses of dolce:non-physical-endurant. Quality: characteristics and parameters of DM entities are made subclasses of dolce:abstract-quality. Abstract: for identifying discrete values, classes are added as subclasses of dolce:abstract-region. Object properties: DMOP reuses mainly DOLCE's parthood, quality, and quale relations. Each of the four main DOLCE branches has been used.
  • 25. Qualities and attributes 1/3. How to handle 'attributes' in OWL ontologies and, in a broader context, measurements? Easy way: an attribute is a binary functional relation between a class and a datatype: Elephant ⊑ =1 hasWeight.integer; Elephant ⊑ =1 hasWeightPrecise.real; Elephant ⊑ =1 hasWeightImperial.integer (in lbs). This builds application decisions about how to store the data (and in which unit it is) into one's ontology.
  • 26. Qualities and attributes 2/3. How to handle 'attributes' in OWL ontologies and, in a broader context, measurements? More elaborate way: unfold the notion of an object's property (e.g. weight) from one attribute/OWL data property into at least two properties: ▸ one OWL object property from the object to the 'reified attribute' (a "quality property" represented as an OWL class) ▸ and another property to the value(s). This way is favoured in foundational ontologies; it solves the problem of non-reusability of the 'attribute' and prevents duplication of data properties. In DMOP, measurements are more like values for parameters.
  • 27. Qualities and attributes 3/3. ModelingAlgorithm ⊑ =1 dolce:has-quality.LearningPolicy; LearningPolicy ⊑ =1 dolce:has-quale.Eager-Lazy; Eager-Lazy ⊑ ≤ 1 hasDataValue.anyType. LearningPolicy is a subclass of dolce:quality; Eager-Lazy is a subclass of dolce:abstract-region. In this way, the ontology can be linked to many different applications, which may even use different data types, yet still agree on the meaning of the characteristics and parameters ('attributes') of the algorithms, tasks, and other DM endurants.
  • 28. Meta-modeling in DMOP 1/4. Only processes (executions of workflows) and operations (executions of operators) consume inputs and produce outputs. DM algorithms (as well as operators and workflows) can only specify the type of input or output. Inputs and outputs (the DM-Dataset and DM-Hypothesis class hierarchies, respectively) are modeled as subclasses of the IO-Object class.
  • 29. Meta-modeling in DMOP 2/4. DM algorithms: classes or individuals? Individuals. Problem: expressing the types of inputs/outputs associated with an algorithm. In "C4.5 specifiesInputClass CategoricalLabeledDataSet", C4.5 is an individual (instance of DM-Algorithm) while CategoricalLabeledDataSet is a class (subclass of DM-Hypothesis).
  • 30. Meta-modeling in DMOP 3/4. Initial solution: one artificial class per algorithm, with a single instance corresponding to this particular algorithm. Problem: hasInput, hasOutput, specifiesInputClass and specifiesOutputClass are assigned a common range, IO-Object. "C4.5 specifiesInputClass Iris"? Here both C4.5 (instance of DM-Algorithm) and Iris (instance of DM-Hypothesis) are individuals. Iris is a concrete dataset; clearly, no DM algorithm is designed to handle only one particular dataset.
  • 31. Meta-modeling in DMOP 4/4. Final solution: the weak form of punning available in OWL 2. IO-Class: a meta-class, the class of all classes of input and output objects. In "C4.5 specifiesInputClass CategoricalLabeledDataSet", C4.5 is an individual (instance of DM-Algorithm) and CategoricalLabeledDataSet is an individual (instance of IO-Class). In "DM-Process hasInput some CategoricalLabeledDataSet", DM-Process is a class (subclass of dolce:process) and CategoricalLabeledDataSet is a class (subclass of IO-Object).
  • 32. DM method: Fr-ONT-Qu semantic pattern miner
  • 33. Data mining as search. Learning in description logics (DLs) and other relational data can be seen as search in a space of concepts / RDF triples / clauses / (conjunctive / SPARQL) queries, ... It is possible to impose an ordering on this search space, e.g., using subsumption as a natural quasi-order and generality relation between DL concepts ▸ if D ⊑ C then C covers all instances that are covered by D. Refinement operators may be applied to traverse the space by computing a set of specializations (resp. generalizations) of a concept / RDF triples / clauses / (conjunctive / SPARQL) queries, ...
  • 34. Properties of refinement operators. Consider a downward refinement operator ρ and let C ρ D denote a refinement chain from a DL concept C to D. Complete: each point in the lattice is reachable (for D ⊑ C there exists E such that E ≡ D and a refinement chain C ρ ... ρ E). Weakly complete: for any concept C with C ⊑ ⊤, a concept E with E ≡ C can be reached from ⊤. Finite: ρ(C) is finite for any concept C. Redundant: there exist two different refinement chains from C to D. Proper: C ρ D implies C ≢ D. Ideal = complete + proper + finite.
  • 35. Learning in DLs and in clausal languages is hard. Lehmann and Hitzler (ILP 2007, MLJ 2010) proved for many DLs, and Nienhuys-Cheng and de Wolf (1997) for clausal languages, that no ideal refinement operator exists.
  • 36. Fr-ONT-Qu: an algorithm for mining patterns in RDF(S) data. Patterns are expressed as SPARQL queries; generality relation: taxonomical subsumption; consists of a refinement operator ρ and a strategy to select the best patterns for further refinement. Example SPARQL query, head: SELECT ?x; body: WHERE { ?x rdf:type :Paper . ?x rdf:type :PeerReviewedPaper . ?x :reviewedBy ?y }
  • 37. New generality relation: taxonomical subsumption. Taxonomically closed pattern: a pattern Q is taxonomically closed, or t-closed, w.r.t. the background knowledge G if for each triple of the form (?x rdf:type c) in Q, Q also contains the transitive closure of (?x rdf:type c) w.r.t. G, and for each triple of the form (?x p ?y) that appears in the pattern Q, Q also contains the transitive closure of (?x p ?y) w.r.t. G. Taxonomical subsumption: given two patterns Q1 and Q2 over a ρdf dataset G, and their t-closures Q1^t and Q2^t respectively, Q1 taxonomically subsumes (t-subsumes) Q2 iff there exists a mapping σ such that the set of triple patterns and FILTER expressions from σ(body(Q1^t)) is a subset of the set of triple patterns and FILTER expressions from body(Q2^t).
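The two definitions above can be sketched in a few lines of Python. This is a simplified illustration rather than the Fr-ONT-Qu implementation: patterns are plain sets of (subject, predicate, object) triples, FILTER expressions are ignored, σ maps variables only to variables and is found by brute force, and the taxonomies `SUPERCLASSES` and `SUPERPROPS` are hypothetical.

```python
from itertools import product

# Hypothetical background knowledge G: direct superclasses / super-properties.
SUPERCLASSES = {"PeerReviewedPaper": {"Paper"}, "JournalPaper": {"Paper"}}
SUPERPROPS = {"metaReviewedBy": {"reviewedBy"}}


def ancestors(name, parents):
    """All transitive ancestors of `name` in a child -> parents map."""
    seen, stack = set(), [name]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def t_closure(pattern):
    """Add the superclass / super-property triples implied by the taxonomy."""
    closed = set(pattern)
    for (s, p, o) in pattern:
        if p == "rdf:type":
            closed |= {(s, p, c) for c in ancestors(o, SUPERCLASSES)}
        else:
            closed |= {(s, q, o) for q in ancestors(p, SUPERPROPS)}
    return closed


def variables(pattern):
    return sorted({t for triple in pattern for t in triple if t.startswith("?")})


def t_subsumes(q1, q2):
    """True if some variable mapping sends the t-closure of q1 into that of q2."""
    c1, c2 = t_closure(q1), t_closure(q2)
    v1, v2 = variables(c1), variables(c2)
    for image in product(v2, repeat=len(v1)):
        sigma = dict(zip(v1, image))
        mapped = {tuple(sigma.get(t, t) for t in triple) for triple in c1}
        if mapped <= c2:
            return True
    return False


general = {("?x", "rdf:type", "Paper")}
specific = {("?a", "rdf:type", "PeerReviewedPaper"), ("?a", "reviewedBy", "?b")}
```

Here `general` t-subsumes `specific` because the closure of `specific` contains (?a rdf:type Paper), even though that triple is not written in the pattern itself; the reverse does not hold.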
  • 38. Input of the algorithm: a declarative bias (B) to limit the search space (i.e. the classes and properties to use) and a maximal number of iterations; 2 thresholds: one for keeping good enough patterns and one for refining the best patterns; a choice among several quality measures to which the thresholds apply (e.g. support on the knowledge base); beam search size.
  • 39. Example. B: classes: PeerReviewedPaper, JournalPaper; property: reviewedBy. Step 1: refine every pattern from the previous iteration by adding a single restriction for a variable already existing in the pattern. E.g. for the pattern {?x rdf:type :Paper .}, its refinements are: ▸ {?x rdf:type :Paper . ?x rdf:type :PeerReviewedPaper .} ▸ {?x rdf:type :Paper . ?x rdf:type :JournalPaper .} ▸ {?x rdf:type :Paper . ?x :reviewedBy ?y}. Step 2: evaluate the patterns (with some quality measure, such as support on a data set) and select only the best ones. Step 3: repeat steps 1-2 as long as there are patterns to refine and the maximal number of iterations is not exceeded.
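The three steps above can be sketched as a small beam search. This is a deliberately simplified illustration, not the published algorithm: patterns restrict a single variable ?x and are represented as sets of class/property names, support is the only quality measure, one threshold plus a beam size stands in for the two thresholds, and the toy data and the names `DATA`, `BIAS` and `mine` are assumptions made for the example.

```python
# Hypothetical toy data: for each paper, its classes and the properties it has values for.
DATA = {
    "p1": {"Paper", "PeerReviewedPaper", "reviewedBy"},
    "p2": {"Paper", "PeerReviewedPaper", "JournalPaper", "reviewedBy"},
    "p3": {"Paper", "JournalPaper"},
    "p4": {"Paper"},
}
BIAS = ["PeerReviewedPaper", "JournalPaper", "reviewedBy"]  # declarative bias B


def support(pattern):
    """Fraction of items that satisfy every restriction in the pattern."""
    return sum(pattern <= features for features in DATA.values()) / len(DATA)


def refine(pattern):
    """Step 1: add a single restriction from the bias to the pattern."""
    return {pattern | {restriction} for restriction in BIAS if restriction not in pattern}


def mine(start, min_support, beam_size, max_iterations):
    kept, frontier = [], [frozenset(start)]
    for _ in range(max_iterations):
        candidates = {q for p in frontier for q in refine(p)}
        # Step 2: evaluate and keep only the good enough patterns.
        good = [q for q in candidates if support(q) >= min_support]
        if not good:
            break
        kept.extend(q for q in good if q not in kept)
        # Only the best `beam_size` patterns are refined further (step 3).
        good.sort(key=lambda q: (-support(q), sorted(q)))
        frontier = good[:beam_size]
    return kept


patterns = mine({"Paper"}, min_support=0.5, beam_size=2, max_iterations=2)
```

With this toy data the first iteration yields the three refinements listed on the slide, and the second iteration keeps only the one pattern that still reaches the support threshold.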
  • 40. Refinement operator ρ: uses a trie data structure; ρ is (locally) finite and complete.
  • 41. Pattern based classification 1/2
  • 42. Pattern based classification 2/2. We learn features that are optimized with regard to the (classification) task.
  • 43. Propositionalisation 1/2
  • 44. Propositionalisation 2/2. In this way, learned features may be consumed by any off-the-shelf 'attribute-value' classification algorithm.
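A minimal sketch of propositionalisation, assuming the same simplified representation as a set of class/property names per example: each mined pattern becomes one binary column that records whether the example matches it. The example data and pattern list are hypothetical.

```python
# Hypothetical mined patterns (each a set of restrictions on one variable).
PATTERNS = [
    frozenset({"PeerReviewedPaper"}),
    frozenset({"JournalPaper"}),
    frozenset({"Paper", "reviewedBy"}),
]

# Hypothetical examples described by the classes and properties that hold for them.
EXAMPLES = {
    "p1": {"Paper", "PeerReviewedPaper", "reviewedBy"},
    "p3": {"Paper", "JournalPaper"},
    "p4": {"Paper"},
}


def propositionalise(examples, patterns):
    """Return one 0/1 feature vector per example, one column per pattern."""
    return {
        name: [int(pattern <= features) for pattern in patterns]
        for name, features in examples.items()
    }


rows = propositionalise(EXAMPLES, PATTERNS)
```

The resulting rows form an ordinary attribute-value table, so any standard classifier can be trained on them.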
  • 45. Comparative experiments on classification of semantic data 1/2. We considered published work with available results and datasets (including the ESWC 2008 best paper and the ESWC 2012 best paper); various types of methods: kernel methods, a statistical relational classifier, concept learning algorithms; we strictly followed the tasks, protocols and experimental setups of the methods.
  • 46. Comparative experiments on classification of semantic data 2/2. For the classification task, Fr-ONT-Qu outperformed state-of-the-art approaches to classification of Semantic Web data (see: "Pattern based feature construction in semantic data mining" by A. Lawrynowicz, J. Potoniec, IJSWIS 10(1), 2014): kernel methods by Bloehdorn et al. (2007) and Loesch et al. (ESWC 2012 best paper); the statistical relational classifier SPARQL-ML by Kiefer et al. (ESWC 2008 best paper); the concept learning algorithms DL-FOIL by Fanizzi et al. (2008) and the cutting-edge CELOE variant of DL-Learner by Lehmann (2009).
  • 47. What is RapidMiner? 1/2
  • 48. What is RapidMiner? 2/2
  • 49. RapidMiner XML based workflow representation
  • 50. Creating a (meta-)dataset for meta-mining. [Figure: baseline datasets (11 UCI datasets) and the baseline DM experiment set (1581 executed RapidMiner workflows), together with the Data Characteristics Tool (DCT) and the DMOP ontology, go through a transformation to RDF into the DMOP-based repository of DM processes (DMEX-DB), the dataset for training the meta-miner: 85 million RDF triples.]
  • 51. Propositionalisation. [Figure: DMOP-based RDF repository of DM processes → workflow patterns and dataset characteristics → features → dataset.] [...] ?opex2 dmop:hasParameterSetting ?front1 . ?front0 dmop:executes rm:DM-Operator . ?front0 dmop:implements ?front2 . ?front2 a dmop:DM-Algorithm . ?front2 a dmop:InductionAlgorithm . ?front2 a dmop:ModelingAlgorithm . ?front2 a dmop:ClassificationModelingAlgorithm . ?front2 a dmop:ClassificationTreeInductionAlgorithm . } was mined when Fr-ONT-Qu traversed down the algorithm class hierarchy, specializing the variable ?front2. In this way, it is possible to abstract from the level of operators (algorithm implementations) to the level of algorithms and their taxonomy. For instance, both the rm:RM-Decision_Tree and weka:Weka-J48 operators implement a classification tree induction algorithm, and one may generalize over them. The patterns containing class hierarchies provide expressivity similar to that of patterns mined in so-called generalized association rule mining. The following pattern covers only those workflows that contain the 'Decision Tree' operator for which the parameter minimal size for split has a value between 2 and 5.5: Q2 = select distinct ?x where { Bd ∪ ?opex2 dmop:executes ?front0 . ?opex2 dmop:executes rm:RM-Decision_Tree . ?opex2 dmop:hasParameterSetting ?front1 . ?front0 dmop:executes rm:DM-Operator . ?front1 dmop:setsValueOf ?front2 . ?front1 dmop:hasValue ?front3 . filter(2.000000 <= xsd:double(?front3) && xsd:double(?front3) <= 16.000000) . ?front2 dmop:hasParameterKey 'minimal_size_for_split' . ?front1 dmop:hasValue ?front3 . filter(2.000000 <= xsd:double(?front3) && xsd:double(?front3) <= 9.000000) . ?front1 dmop:hasValue ?front3 . filter(2.000000 <= xsd:double(?front3) && xsd:double(?front3) <= 5.500000) . }
  • 52. Semantic meta-mining results. McNemar's test for pairs of classifiers was performed with the null hypothesis that a classifier built using dataset characteristics and a mined pattern set has the same error rate as the baseline, which used dataset characteristics and only the names of the machine learning DM operators. The test confirmed that classifiers trained using workflow patterns performed significantly better (in terms of accuracy) than the baseline.
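For readers unfamiliar with McNemar's test, the following Python sketch shows how two classifiers can be compared on the same test instances. The slide does not say which variant of the test was used; the exact (binomial) two-sided variant is assumed here, and the function name `mcnemar_exact` and the toy predictions are made up for illustration. They are not results from the experiments described above.

```python
from math import comb
from typing import Sequence, Tuple


def mcnemar_exact(
    y_true: Sequence[int],
    pred_baseline: Sequence[int],
    pred_patterns: Sequence[int],
) -> Tuple[int, int, float]:
    """Return (b, c, p_value) for the exact two-sided McNemar test.

    b: instances the baseline classifies correctly and the pattern-based model does not.
    c: instances the pattern-based model classifies correctly and the baseline does not.
    """
    b = sum(
        1
        for t, base, pat in zip(y_true, pred_baseline, pred_patterns)
        if base == t and pat != t
    )
    c = sum(
        1
        for t, base, pat in zip(y_true, pred_baseline, pred_patterns)
        if base != t and pat == t
    )
    n = b + c
    if n == 0:
        # No discordant pairs: no evidence against equal error rates.
        return b, c, 1.0
    # Under the null hypothesis, the discordant pairs follow Binomial(n, 0.5).
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Toy data only: ten test instances, all with true label 1.
y_true = [1] * 10
pred_baseline = [0] * 8 + [1, 1]
pred_patterns = [1] * 8 + [1, 0]
b, c, p_value = mcnemar_exact(y_true, pred_baseline, pred_patterns)
```

Only the discordant pairs (instances on which exactly one of the two classifiers is correct) enter the test, which is why it suits paired comparisons on a shared test set.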
  • 53. Sharing: Standardization of DM/ML schemas
  • 54. Evolution of the field of DM/ML ontologies. Timeline figure (2008 to 2016) of ontologies/vocabularies and events: OntoDM, DMWF, DMOP, Experiment Databases platform, Exposé, Data Mining Ontology Jamboree (Slovenia), MEX, W3C Machine Learning Schema Community Group, OpenML platform, OpenML 2016 (Netherlands), ML Schema Core.
  • 55. OntoDM. Pance Panov, Larisa N. Soldatova, Saso Dzeroski: Ontology of core data mining entities. Data Min. Knowl. Discov. 28(5-6): 1222-1265 (2014). Built in compliance with the upper-level ontologies BFO, OBI and IAO; modularized; incorporates structured data mining. Use case: generic, middle-level ontology for ML; representing QSAR entities for drug design, used by the Eve Robot Scientist.
  • 56. DMOP: Data Mining Optimization Ontology. C. Maria Keet, Agnieszka Lawrynowicz, Claudia d'Amato, Alexandros Kalousis, Phong Nguyen, Raul Palma, Robert Stevens, Melanie Hilario: The Data Mining OPtimization Ontology. J. Web Sem. 32: 43-53 (2015). Development started in the e-LICO EU FP7 project (2009-2012); models detailed internal characteristics ('qualities') of algorithms. Use case: meta-learning ('whitebox'), meta-mining; used to produce the Intelligent Discovery Assistant for RapidMiner.
  • 57. Exposé. Joaquin Vanschoren, Hendrik Blockeel, Bernhard Pfahringer, Geoffrey Holmes: Experiment databases - A new way to share, organize and learn from experiments. Machine Learning 87(2): 127-158 (2012). Re-uses OntoDM (at the top level) and DMOP (at the bottom level); superseded by the OpenML DB schema. Use case: experiment databases, ExpML markup.
  • 58. Early work towards aligning DM/ML ontologies (2010): DMO Ontology Jamboree, Jožef Stefan Institute, Slovenia
  • 59. MEX vocabulary. Diego Esteves, Diego Moussallem, Ciro Baron Neto, Tommaso Soru, Ricardo Usbeck, Markus Ackermann, Jens Lehmann: MEX vocabulary: a lightweight interchange format for machine learning experiments. SEMANTICS 2015: 169-176. Lightweight interchange format; maps to PROV. Use case: annotating ML experiments and interchanging ML metadata.
  • 60. How to make existing DM/ML ontologies compatible?
  • 61. W3C Machine Learning Schema Community Group (2015): https://www.w3.org/community/ml-schema/
  • 62. OpenML, Lorentz Center, Netherlands (2016). First draft of ML Schema Core: https://github.com/ML-Schema/core
  • 63. Sharing beyond the DM/ML domain. Mapping DMOP to workflow ontologies (Research Objects, OPMW); ROHub is hosted by the Poznan Supercomputing and Networking Center.
  • 64. Semantic data mining: more information. Semantic data mining tutorial @ ECML/PKDD'2011: http://videolectures.net/ecmlpkdd2011_lavrac_vavpetic_mining/ Topics: peculiarities of the learning setting (Open World Assumption, what is a "truly semantic" similarity measure?, ...); methods, applications, tools.
  • 65. Summary. Semantic data mining: data mining with ontologies as background/prior knowledge, most often from structured data. Ontologies are best if engineered with use cases in mind. Learning in description logics and clausal languages is hard: heuristics, dealing with peculiarities. Fr-ONT-Qu semantic pattern mining algorithm: theoretical properties, practical evaluation. Use case: semantic meta-mining for constructing an Intelligent Data Mining Assistant. Importance of interoperability (for scientific reproducibility, for inter-domain applications).
  • 66. Acknowledgements. Polish National Science Center under the SONATA programme, "ARISTOTELES: Methodology and algorithms for automatic revision of ontologies in task based scenarios" (2014/13/D/ST6/02076) (2015-2018). Foundation for Polish Science under the POMOST programme, co-financed by the European Union, Regional Development Fund (POMOST/2013-7/8) (2013-2015). EU FP7 ICT-2007.4.4 (231519), "e-LICO: An e-Laboratory for Interdisciplinary Collaborative Research in Data Mining and Data-Intensive Science" (2009-2012). Fr-ONT-Qu and the meta-mining experiments were done jointly with Jedrzej Potoniec. Contributors to the development of DMOP and/or other e-LICO infrastructure used in the research described in this presentation: Melanie Hilario, C. Maria Keet, Claudia d'Amato, Huyen Do, Simon Fischer, Dragan Gamberger, Lina Al-Jadir, Simon Jupp, Alexandros Kalousis, Joerg-Uwe Kietz, Petra Kralj Novak, Babak Mougouie, Phong Nguyen, Raul Palma, Floarea Serban, Robert Stevens, Anze Vavpetic, Jun Wang, Derry Wijaya, Adam Woznica.