Semantic data mining: an ontology based approach
1. Semantic Data Mining: an Ontology Based Approach
Agnieszka Lawrynowicz
Institute of Computing Science
Poznan University of Technology
April 12, 2016
Seminar of the Institute of Computing Science
Poznan University of Technology
Agnieszka Lawrynowicz Semantic Data Mining: an Ontology Based Approach 1
2. Outline
Introduction to semantic data mining
Ontology in computer science
Semantic meta-mining
▸ Use Case: e-LICO Intelligent Discovery Assistant
▸ Background knowledge: Data Mining OPtimization Ontology
▸ DM method: Pattern discovery with Fr-ONT-Qu
▸ Sharing: Standardization of data mining and machine learning schemas
Summary
4. Introduction: data mining
Input: a data table, text documents, ...
Output: a model, a pattern set
[Figure: data → DATA MINING → model, patterns]
5. Introduction: using background knowledge in data mining
Using background knowledge in data mining has been extensively
researched
hierarchy/taxonomy of attributes (Michalski et al., 1986, Srikant,
Agrawal, 1995)
Inductive Logic Programming (Muggleton, 1991, Lavrac and
Dzeroski, 1994)
relational learning (Quinlan, 1993, de Raedt, 2008)
semantic data mining tutorial @ ECML/PKDD’2011 (Lavrac,
Vavpetic, Lawrynowicz, Potoniec, Hilario, Kalousis)
6. Introduction: relational data mining
Input: a relational database, a graph, a set of logical facts, ...
Output: a model, a pattern set
[Figure: RELATIONAL DATA MINING → model, patterns]
7. Semantic data mining
Input:
a data table, text documents, Web pages, a relational database, a
graph, a set of logical facts, ...
one or more ontologies
Output: a model, a pattern set
[Figure: data, ontologies (annotations, mappings, vocabulary re-use) → SEMANTIC DATA MINING → model, patterns]
9. Ontology in computer science
“engineering artefact [...]” (Guarino 98)
“An ontology is a
formal specification → machine interpretation
of a shared → group of people, consensus
conceptualization → abstract model of phenomena, concepts
of a domain of interest” → domain knowledge
(Gruber 93, Studer 98)
Ontology = formal specification of a terminological knowledge (most often
from a particular domain)
10. Semantic Web layer cake
Semantic Web language stack
Ontology modeling languages
Data
12. Logical meaning of OWL
Description Logics, DLs = family of first order logic-based formalisms
suitable for representing knowledge, especially terminologies, ontologies,
underpinning the Web Ontology Language (OWL).
Basic building blocks: concepts, roles, constructors, individuals
Example
TBox
Atomic concepts: Reviewer, Paper
Roles: reviews, metaReviews, reviewedBy
Constructors: ⊓, ∃
Axiom (concept definition):
PeerReviewedPaper ≡ Paper ⊓ ∃reviewedBy.Reviewer
Axiom (concept description ”each meta reviewer is a reviewer”):
MetaReviewer ⊑ Reviewer
ABox
Fact assertion: metaReviews(reviewer1, paper10)
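
A minimal sketch of how the example axioms could be evaluated over a tiny ABox, using plain Python sets rather than a DL reasoner. The extra individuals (paper11, reviewer2), the reviewedBy assertion, and the helper names are assumptions made for illustration, and the evaluation is closed-world, whereas an OWL reasoner works under the open world assumption.

```python
# Toy ABox: concept assertions and role assertions (illustrative data only).
concepts = {
    "Paper": {"paper10", "paper11"},
    "MetaReviewer": {"reviewer1"},
    "Reviewer": {"reviewer2"},
}
roles = {
    "metaReviews": {("reviewer1", "paper10")},
    "reviewedBy": {("paper10", "reviewer2")},
}


def apply_subclass(sub, sup):
    """Apply the axiom sub ⊑ sup: every instance of sub is an instance of sup."""
    concepts.setdefault(sup, set()).update(concepts.get(sub, set()))


def exists(role, filler):
    """Instances of ∃role.filler: individuals with a role successor in filler."""
    fillers = concepts.get(filler, set())
    return {x for (x, y) in roles.get(role, set()) if y in fillers}


# MetaReviewer ⊑ Reviewer
apply_subclass("MetaReviewer", "Reviewer")
# PeerReviewedPaper ≡ Paper ⊓ ∃reviewedBy.Reviewer
peer_reviewed = concepts["Paper"] & exists("reviewedBy", "Reviewer")

print(sorted(concepts["Reviewer"]))
print(sorted(peer_reviewed))
```

In this toy data only paper10 has a reviewedBy successor that is a Reviewer, so it is the only individual classified as a PeerReviewedPaper.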
14. Overview of meta-learning
Meta-learning: learning to learn
application of machine learning techniques to meta-data about past
machine learning experiments;
the goal: to modify some aspect of the learning process to improve
the performance of the resulting model;
meta-mining: meta-learning applied to full data mining process
16. Background knowledge: DM OPtimization Ontology
17. Data Mining OPtimization Ontology (DMOP)
the primary goal of DMOP is to support all decision-making steps
that determine the outcome of the data mining process;
development started in EU FP7 project e-LICO (2009-2012);
DMOP v5.5: 723 classes, 111 properties, 4291 axioms;
highly axiomatized;
represented in Web Ontology Language (OWL 2);
18. Competency questions
”Given a data mining task/data set, which of the valid or applicable
workflows/algorithms will yield optimal results (or at least better results
than the others)?”
”Given a set of candidate workflows/algorithms for a given task/data
set, which data set/workflow/algorithm characteristics should be
taken into account in order to select the most appropriate one?”
and others more fine-grained, e.g.:
”Which induction algorithms should I use (or avoid) when my dataset
has many more variables than instances?”
19. Architecture of DMOP knowledge base and its satellite
triple stores
[Figure: architecture of the DMOP knowledge base]
TBox (OWL 2): DMOP, the formal conceptual framework of the data mining domain
ABox (OWL 2): Operator DB, accepted knowledge of DM tasks, algorithms, operators; the meta-miner's prior DM knowledge
RDF triple stores: DMEX-DB1, DMEX-DB2, ..., DMEX-DBk, specific DM applications (datasets, workflows, results); the meta-miner's training data
20. The core concepts of DMOP (simplified)
Fig. 1. The core concepts of DMOP.
22. Alignment of DMOP with DOLCE 1/3
Two main reasons to align DMOP with a foundational ontology:
considerations about attributes and data properties; extant
non-foundational ontology solutions were partial re-inventions of how
they are treated in a foundational ontology;
reuse of the ontology’s object properties;
23. Alignment of DMOP with DOLCE 2/3
24. Alignment of DMOP with DOLCE 3/3
Perdurant: DM-Experiment and DM-Operation are subclasses of
dolce:process;
Endurant: most DM classes, such as algorithm, software, strategy,
task, and optimization problem, are subclasses of
dolce:non-physical-endurant;
Quality: characteristics and parameters of DM entities made
subclasses of dolce:abstract-quality;
Abstract: for identifying discrete values, classes added as subclasses
of dolce:abstract-region;
object properties: DMOP reuses mainly DOLCE’s parthood, quality,
and quale relations;
each of the four main DOLCE branches has been used.
25. Qualities and attributes 1/3
How to handle ’attributes’ in OWL ontologies, and, in a broader context,
measurements?
easy way: attribute is a binary functional relation between a class and
a datatype
Elephant ⊑ =1 hasWeight.integer
Elephant ⊑ =1 hasWeightPrecise.real
Elephant ⊑ =1 hasWeightImperial.integer (in lbs)
building into one’s ontology application decisions about how to store
the data (and in which unit it is)
26. Qualities and attributes 2/3
How to handle ’attributes’ in OWL ontologies, and, in a broader context,
measurements?
more elaborate way: unfold the notion of an object’s property (e.g.
weight) from one attribute/OWL data property into at least two
properties:
▸ one OWL object property from the object to the ’reified attribute’
(“quality property” represented as an OWL class)
▸ and another property to the value(s)
favoured in foundational ontologies;
solves the problem of non-reusability of the ’attribute’ and prevents
duplication of data properties;
in DMOP, measurements are more like values of parameters;
27. Qualities and attributes 3/3
ModelingAlgorithm ⊑ =1 dolce:has-quality.LearningPolicy
LearningPolicy ⊑ =1 dolce:has-quale.Eager-Lazy
Eager-Lazy ⊑ ≤ 1 hasDataValue.anyType
LearningPolicy is a subclass of dolce:quality
Eager-Lazy is a subclass of dolce:abstract-region
In this way, the ontology can be linked to many different applications, which
may even use different data types, yet still agree on the meaning of the
characteristics and parameters (’attributes’) of the algorithms, tasks, and
other DM endurants.
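
A minimal sketch of the quality / quale pattern as plain triples in Python. The ex: individuals, the dmop: prefix on the class and data property names, and the helper functions are hypothetical; only the property chain has-quality → has-quale → hasDataValue follows the axioms above.

```python
# Triples (subject, predicate, object); all ex: names are hypothetical.
triples = [
    ("ex:C4.5", "rdf:type", "dmop:ModelingAlgorithm"),
    ("ex:C4.5", "dolce:has-quality", "ex:C4.5-learning-policy"),
    ("ex:C4.5-learning-policy", "rdf:type", "dmop:LearningPolicy"),
    ("ex:C4.5-learning-policy", "dolce:has-quale", "ex:eager"),
    ("ex:eager", "rdf:type", "dmop:Eager-Lazy"),
    ("ex:eager", "dmop:hasDataValue", "eager"),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is asserted."""
    return [o for (s, p, o) in triples if s == subject and p == predicate]


def quality_value(entity, quality_class):
    """Follow entity -> quality -> quale -> data value for one quality class."""
    for quality in objects(entity, "dolce:has-quality"):
        if quality_class not in objects(quality, "rdf:type"):
            continue
        for quale in objects(quality, "dolce:has-quale"):
            values = objects(quale, "dmop:hasDataValue")
            if values:
                return values[0]
    return None


print(quality_value("ex:C4.5", "dmop:LearningPolicy"))
```

An application that stores the value with a different data type only needs to change the last triple; the quality and the quale keep their meaning.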
28. Meta-modeling in DMOP 1/4
only processes (executions of workflows) and operations (executions
of operators) consume inputs and produce outputs
DM algorithms (as well as operators and workflows) can only specify
the type of input or output
inputs and outputs (DM-Dataset and DM-Hypothesis class hierarchy,
respectively) are modeled as subclasses of IO-Object class
29. Meta-modeling in DMOP 2/4
DM algorithms: classes or individuals? Individuals.
Problem: expressing types of inputs/outputs associated with
algorithm
”C4.5 specifiesInputClass CategoricalLabeledDataSet”
C4.5: individual (instance of DM-Algorithm)
CategoricalLabeledDataSet: class (subclass of DM-Hypothesis)
30. Meta-modeling in DMOP 3/4
Initial solution: one artificial class per algorithm, with a single
instance corresponding to that particular algorithm
Problem: hasInput, hasOutput, specifiesInputClass and
specifiesOutputClass were assigned a common range, IO-Object
”C4.5 specifiesInputClass Iris” ?
C4.5: individual (instance of DM-Algorithm)
Iris: individual (instance of DM-Hypothesis)
Iris is a concrete dataset. Clearly, no DM algorithm is designed
to handle only one particular dataset.
31. Meta-modeling in DMOP 4/4
Final solution: weak form of punning available in OWL 2
IO-Class: meta-class, i.e. the class of all classes of input and output
objects
”C4.5 specifiesInputClass CategoricalLabeledDataSet”
C4.5: individual (instance of DM-Algorithm)
CategoricalLabeledDataSet: individual (instance of IO-Class)
”DM-Process hasInput some CategoricalLabeledDataSet”
DM-Process: class (subclass of dolce:process)
CategoricalLabeledDataSet: class (subclass of IO-Object)
32. DM method: Fr-ONT-Qu semantic pattern miner
33. Data mining as search
learning in description logics (DLs) and from other relational data can be
seen as search in a space of concepts / RDF triples / clauses /
(conjunctive / SPARQL) queries, ...
it is possible to impose an ordering on this search space, e.g., using
subsumption as a natural quasi-order and generality relation between
DL concepts
▸ if D ⊑ C then C covers all instances that are covered by D
refinement operators may be applied to traverse the space by
computing a set of specializations (resp. generalizations) of a concept
/ RDF triples / clauses / (conjunctive / SPARQL) queries, ...
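
A minimal sketch of the ordering idea, assuming a search space restricted to named classes in a small tree-shaped taxonomy. The taxonomy, the individuals, and the function names are hypothetical; a refinement operator over full DL concepts would be considerably richer.

```python
# Hypothetical taxonomy: child -> parent (direct subclass axioms).
parent = {
    "PeerReviewedPaper": "Paper",
    "JournalPaper": "PeerReviewedPaper",
    "Paper": "Document",
}
# Hypothetical asserted (most specific) types of individuals.
asserted = {
    "p1": "JournalPaper",
    "p2": "PeerReviewedPaper",
    "p3": "Paper",
    "d1": "Document",
}


def subsumed_by(d, c):
    """True if d ⊑ c follows from the taxonomy (reflexive, transitive)."""
    while d is not None:
        if d == c:
            return True
        d = parent.get(d)
    return False


def covers(c):
    """Individuals covered by class c, i.e. whose asserted type is below c."""
    return {i for i, t in asserted.items() if subsumed_by(t, c)}


def refine(c):
    """Downward refinement step: the direct subclasses of c."""
    return sorted(d for d, p in parent.items() if p == c)


print(refine("Paper"))
print(sorted(covers("Paper")))
# Generality: PeerReviewedPaper ⊑ Paper, so its cover is a subset.
print(covers("PeerReviewedPaper") <= covers("Paper"))
```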
34. Properties of refinement operators
Consider a downward refinement operator ρ, and let C ρ D denote a
refinement chain from a DL concept C to D
complete: each point in the lattice is reachable (for D ⊑ C there exists E
such that E ≡ D and a refinement chain C ρ ... ρ E)
weakly complete: for any concept C with C ⊑ ⊤, a concept E with
E ≡ C can be reached from ⊤
finite: ρ(C) is finite for any concept C
redundant: there exist two different refinement chains from C to D
proper: C ρ D implies C ≢ D
ideal = complete + proper + finite
35. Learning in DLs and in clausal languages is hard
Lehmann and Hitzler (ILP 2007, MLJ 2010) proved for many DLs, and
Nienhuys-Cheng and de Wolf (1997) for clausal languages, that no ideal
refinement operator exists.
36. Fr-ONT-Qu
algorithm for mining patterns in RDF(S) data
patterns expressed as SPARQL queries
generality relation: taxonomical subsumption
consists of: a refinement operator ρ and a strategy to select best
patterns for further refinement
Example SPARQL query
head: SELECT ?x WHERE {
body:   ?x rdf:type :Paper .
        ?x rdf:type :PeerReviewedPaper .
        ?x :reviewedBy ?y
      }
37. New generality relation: taxonomical subsumption
Taxonomically closed pattern
A pattern Q is taxonomically closed, or t-closed, w.r.t. the background knowledge
G if for each triple of the form (?x rdf:type c) in Q, Q also contains the
transitive closure of (?x rdf:type c) w.r.t. G, and for each triple of the form
(?x p ?y) that appears in the pattern Q, Q also contains the transitive closure
of (?x p ?y) w.r.t. G.
Taxonomical subsumption
Given two patterns Q1 and Q2 over a ρdf dataset G, and their t-closures Q1^t and
Q2^t respectively, Q1 taxonomically subsumes (t-subsumes) Q2 iff there exists a
mapping σ such that the set of triple patterns and FILTER expressions from
σ(body(Q1^t)) is a subset of the set of triple patterns and FILTER expressions
from body(Q2^t).
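
A minimal sketch of t-closure and t-subsumption for patterns given as sets of triple patterns. It ignores FILTER expressions, finds σ by brute force, and uses a hypothetical background taxonomy (including the metaReviewedBy property); it illustrates the definitions and is not the Fr-ONT-Qu implementation.

```python
from itertools import product

# Hypothetical background knowledge G: direct superclass / superproperty links.
super_class = {"JournalPaper": "PeerReviewedPaper", "PeerReviewedPaper": "Paper"}
super_prop = {"metaReviewedBy": "reviewedBy"}


def ancestors(term, links):
    """All terms reachable from term by following the direct links upward."""
    found = []
    while term in links:
        term = links[term]
        found.append(term)
    return found


def t_closure(pattern):
    """Add every superclass / superproperty variant of each triple pattern."""
    closed = set(pattern)
    for s, p, o in pattern:
        if p == "rdf:type":
            closed |= {(s, p, c) for c in ancestors(o, super_class)}
        else:
            closed |= {(s, q, o) for q in ancestors(p, super_prop)}
    return closed


def is_var(term):
    return term.startswith("?")


def t_subsumes(q1, q2):
    """True if some mapping sigma sends the t-closure of q1 into that of q2."""
    c1, c2 = t_closure(q1), t_closure(q2)
    variables = sorted({t for triple in c1 for t in triple if is_var(t)})
    targets = sorted({t for (s, _, o) in c2 for t in (s, o)})
    for image in product(targets, repeat=len(variables)):
        sigma = dict(zip(variables, image))
        mapped = {tuple(sigma.get(t, t) for t in triple) for triple in c1}
        if mapped <= c2:
            return True
    return False


general = {("?x", "rdf:type", "Paper"), ("?x", "reviewedBy", "?y")}
specific = {("?a", "rdf:type", "JournalPaper"), ("?a", "metaReviewedBy", "?b")}
print(t_subsumes(general, specific))
print(t_subsumes(specific, general))
```

The closure step is what lets the general pattern subsume the specific one even though the two share no triple pattern syntactically.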
38. Input of the algorithm
a declarative bias (B) to limit the search space (i.e. the classes and
properties to use) and a maximal number of iterations
two thresholds: one for keeping good enough patterns and one for refining
the best patterns
a choice of quality measure for the thresholds (e.g. support on the
knowledge base)
beam search size
39. Example
B: classes: PeerReviewedPaper, JournalPaper, property: reviewedBy
1 Refine every pattern from the previous iteration by adding a single
restriction for a variable already existing in the pattern. E.g. for
pattern {?x rdf:type :Paper.}, its refinements are:
▸ {?x rdf:type :Paper . ?x rdf:type :PeerReviewedPaper .}
▸ {?x rdf:type :Paper . ?x rdf:type :JournalPaper . }
▸ {?x rdf:type :Paper . ?x :reviewedBy ?y}
2 Evaluate the patterns (with some quality measure, such as support on a
data set) and select only the best ones
3 Repeat steps 1-2 as long as there are patterns for refinement and the
maximal number of iterations is not exceeded
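
The three steps above can be sketched as a small refine / evaluate / select loop. This is a simplified illustration under stated assumptions, not the Fr-ONT-Qu implementation: patterns only restrict the single variable ?x, one support threshold stands in for the two thresholds, and the toy data and function names are hypothetical.

```python
# Toy RDF data (hypothetical individuals); triples are (subject, predicate, object).
data = {
    ("p1", "rdf:type", "Paper"), ("p1", "rdf:type", "PeerReviewedPaper"),
    ("p1", "reviewedBy", "r1"),
    ("p2", "rdf:type", "Paper"), ("p2", "rdf:type", "PeerReviewedPaper"),
    ("p2", "rdf:type", "JournalPaper"), ("p2", "reviewedBy", "r2"),
    ("p3", "rdf:type", "Paper"),
}
# Declarative bias B from the example.
bias_classes = ["PeerReviewedPaper", "JournalPaper"]
bias_properties = ["reviewedBy"]


def matches(x, restriction):
    """Check one restriction on ?x: a type, or the existence of ?x prop ?y."""
    kind, name = restriction
    if kind == "type":
        return (x, "rdf:type", name) in data
    return any(s == x and p == name for (s, p, _) in data)


def support(pattern):
    """Fraction of subjects in the data that satisfy every restriction."""
    subjects = {s for (s, _, _) in data}
    covered = [x for x in subjects if all(matches(x, r) for r in pattern)]
    return len(covered) / len(subjects)


def refine(pattern):
    """Step 1: add a single restriction on ?x taken from the bias."""
    candidates = [("type", c) for c in bias_classes]
    candidates += [("prop", p) for p in bias_properties]
    return [pattern + (r,) for r in candidates if r not in pattern]


def mine(start, min_support, beam_size, max_iterations):
    kept, frontier = [], [start]
    for _ in range(max_iterations):
        scored = {}
        for pattern in frontier:
            for child in refine(pattern):
                key = frozenset(child)  # same restrictions in any order
                if key not in scored:
                    scored[key] = (support(child), child)
        # Step 2: keep good enough patterns, best first.
        good = sorted(
            (entry for entry in scored.values() if entry[0] >= min_support),
            key=lambda entry: -entry[0],
        )
        kept.extend(good)
        # Step 3: only the best patterns are refined in the next iteration.
        frontier = [pattern for _, pattern in good[:beam_size]]
        if not frontier:
            break
    return kept


result = mine((("type", "Paper"),), min_support=0.5, beam_size=2, max_iterations=2)
for score, pattern in result:
    print(round(score, 2), pattern)
```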
40. Refinement operator ρ: uses trie data structure
ρ: (locally) finite and complete
42. Pattern based classification 2/2
We learn features that are optimized with regard to the (classification) task
44. Propositionalisation 2/2
In this way, learned features may be consumed by any off-the-shelf
’attribute-value’ classification algorithm
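
A minimal sketch of propositionalisation, assuming each example is described by a set of (property, value) pairs and each mined pattern is a set of pairs that must all be present. The workflow names, descriptions, and patterns are hypothetical; the point is only that every pattern becomes one binary column of an attribute-value table.

```python
# Hypothetical examples (workflows) described by (property, value) pairs.
examples = {
    "wf1": {("executes", "RM-Decision_Tree"),
            ("implements", "ClassificationTreeInductionAlgorithm")},
    "wf2": {("executes", "Weka-J48"),
            ("implements", "ClassificationTreeInductionAlgorithm")},
    "wf3": {("executes", "RM-Naive_Bayes"),
            ("implements", "ClassificationModelingAlgorithm")},
}
# Hypothetical mined patterns: each is a set of pairs that must all hold.
patterns = [
    {("implements", "ClassificationTreeInductionAlgorithm")},
    {("executes", "RM-Decision_Tree")},
]


def propositionalise(examples, patterns):
    """Return one binary feature vector per example, one column per pattern."""
    return {
        name: [int(pattern <= description) for pattern in patterns]
        for name, description in examples.items()
    }


table = propositionalise(examples, patterns)
for name, row in table.items():
    print(name, row)
```

The resulting rows can be passed, together with class labels, to any attribute-value learner.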
45. Comparative experiments on classification of semantic data 1/2
we considered published work with available results and datasets
(including ESWC 2008 best paper, ESWC 2012 best paper)
various types of methods: kernel methods, statistical relational
classifier, concept learning algorithms
we strictly followed the tasks, protocols and experimental setups of
the methods
46. Comparative experiments on classification of semantic data 2/2
On the classification task, Fr-ONT-Qu outperformed state-of-the-art approaches
to classification of Semantic Web data
(see: ”Pattern based feature construction in semantic data mining” by A.
Lawrynowicz, J. Potoniec, IJSWIS 10(1), 2014):
kernel methods: Bloehdorn et al. (2007), Loesch et al. (ESWC 2012
best paper),
statistical relational classifier: SPARQL-ML by Kiefer et al. (ESWC
2008 best paper),
concept learning algorithms: DL-FOIL by Fanizzi et al. (2008),
the cutting-edge CELOE variant of DL-Learner by Lehmann (2009)
47. What is RapidMiner? 1/2
48. What is RapidMiner? 2/2
49. RapidMiner XML based workflow representation
50. Creating (meta-)dataset for meta-mining
[Figure: pipeline for creating the meta-dataset]
Baseline datasets: 11 UCI datasets
Baseline DM experiment set: 1581 executed RapidMiner workflows
Processing: Data Characteristics Tool (DCT), DMOP ontology, transformation to RDF
Result: DMOP-based repository of DM processes (DMEX-DB); dataset for training the meta-miner; 85 million RDF triples
51. Propositionalisation
[Figure labels: workflow patterns; dataset; DMOP-based RDF repository of DM processes]
?opex2 dmop:hasParameterSetting ?front1 .
?front0 dmop:executes rm:DM-Operator .
?front0 dmop:implements ?front2 .
?front2 a dmop:DM-Algorithm .
?front2 a dmop:InductionAlgorithm .
?front2 a dmop:ModelingAlgorithm .
?front2 a dmop:ClassificationModelingAlgorithm .
?front2 a dmop:ClassificationTreeInductionAlgorithm . }
was mined when Fr-ONT-Qu traversed down the algorithm class hierarchy, specializing
variable ?front2. In this way, it is possible to abstract from the level of operators (algorithm
implementations) to the level of algorithms and their taxonomy. For instance, both the
rm:RM-Decision_Tree and weka:Weka-J48 operators implement a classification tree induction
algorithm, and one may generalize over it. The patterns containing class hierarchies provide
expressivity similar to that of patterns mined in so-called generalized association rule mining.
The following pattern covers only those workflows that contain the ‘Decision Tree’ operator,
for which the parameter minimal size for split has a value between 2 and 5.5:
Q2 = select distinct ?x where { Bd ∪
?opex2 dmop:executes ?front0 .
?opex2 dmop:executes rm:RM-Decision_Tree .
?opex2 dmop:hasParameterSetting ?front1 .
?front0 dmop:executes rm:DM-Operator .
?front1 dmop:setsValueOf ?front2 .
?front1 dmop:hasValue ?front3 .
filter(2.000000 <= xsd:double(?front3) && xsd:double(?front3) <= 16.000000) .
?front2 dmop:hasParameterKey 'minimal_size_for_split' .
?front1 dmop:hasValue ?front3 .
filter(2.000000 <= xsd:double(?front3) && xsd:double(?front3) <= 9.000000) .
?front1 dmop:hasValue ?front3 .
filter(2.000000 <= xsd:double(?front3) && xsd:double(?front3) <= 5.500000) . }
[Figure labels: dataset characteristics; ...; features]
52. Semantic meta-mining results
McNemar’s test for pairs of classifiers was performed with the null
hypothesis that a classifier built using dataset characteristics and a
mined pattern set has the same error rate as the baseline that used
dataset characteristics and only the names of the machine learning
DM operators
The test confirmed that classifiers trained using workflow patterns
performed significantly better (in terms of accuracy) than the baseline
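
A minimal sketch of McNemar's test for two classifiers evaluated on the same examples, using only the standard library. The predictions are made-up illustrative data, not the results of the experiments reported here; the statistic uses the usual continuity correction and the p-value is the exact two-sided binomial one.

```python
from math import comb


def mcnemar(y_true, pred_a, pred_b):
    """Return (b, c, chi-square statistic, exact two-sided p-value).

    b counts examples that A classifies correctly and B does not;
    c counts examples that B classifies correctly and A does not.
    """
    b = sum(a == t and p != t for t, a, p in zip(y_true, pred_a, pred_b))
    c = sum(a != t and p == t for t, a, p in zip(y_true, pred_a, pred_b))
    n = b + c
    if n == 0:
        # The classifiers never disagree in correctness: no evidence against H0.
        return b, c, 0.0, 1.0
    statistic = (abs(b - c) - 1) ** 2 / n
    # Under H0 the discordant pairs split as Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, statistic, min(1.0, 2 * tail)


# Made-up labels and predictions for ten examples.
y_true = [1] * 10
pattern_based = [1] * 10                      # all correct
baseline = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]     # eight errors

b, c, statistic, p_value = mcnemar(y_true, pattern_based, baseline)
print(b, c, statistic, p_value)
```

A small p-value leads to rejecting the null hypothesis that the two classifiers have the same error rate.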
53. Sharing: Standardization of DM/ML schemas
54. Evolution of the field of DM/ML ontologies
[Figure: timeline of DM/ML ontologies, 2008-2016]
ontologies/vocabularies: OntoDM, DMOP, DMWF, Exposé, MEX (2015), ML Schema Core (2016)
events: Data Mining Ontology Jamboree (Slovenia, 2010), W3C Machine Learning Schema Community Group, OpenML 2016 (Netherlands)
platforms: Experiment Databases, OpenML
55. OntoDM
Pance Panov, Larisa N. Soldatova, Saso Dzeroski: Ontology of core data mining
entities. Data Min. Knowl. Discov. 28(5-6): 1222-1265 (2014)
built in compliance with the upper-level ontologies BFO, OBI, IAO; modularized
incorporates structured data mining
Use case: generic, middle level ontology for ML; representing QSAR entities for
drug design, used by Eve Robot Scientist
56. DMOP: Data Mining Optimization Ontology
C. Maria Keet, Agnieszka Lawrynowicz, Claudia d’Amato, Alexandros Kalousis, Phong
Nguyen, Raul Palma, Robert Stevens, Melanie Hilario: The Data Mining OPtimization
Ontology. J. Web Sem. 32: 43-53 (2015)
development started in e-LICO EU FP7 project (2009-2012)
detailed algorithm internal characteristics (’qualities’)
Use case: meta-learning (’whitebox’), meta-mining, used to produce Intelligent
Discovery Assistant for RapidMiner
57. Exposé
Joaquin Vanschoren, Hendrik Blockeel, Bernhard Pfahringer, Geoffrey Holmes:
Experiment databases - A new way to share, organize and learn from experiments.
Machine Learning 87(2): 127-158 (2012)
re-uses OntoDM (at top-level) and DMOP (at bottom level)
superseded by OpenML DB schema
Use case: experiment databases, ExpML markup
58. Early work towards aligning DM/ML ontologies (2010)
DMO Ontology Jamboree, Jožef Stefan Institute, Slovenia
59. MEX vocabulary
Diego Esteves, Diego Moussallem, Ciro Baron Neto, Tommaso Soru, Ricardo Usbeck,
Markus Ackermann, Jens Lehmann: MEX vocabulary: a lightweight interchange format
for machine learning experiments. SEMANTICS 2015: 169-176
lightweight interchange format
maps to PROV
Use case: annotating ML experiments and interchanging ML metadata
60. How to make existing DM/ML ontologies compatible?
61. W3C Machine Learning Schema Community Group (2015)
https://www.w3.org/community/ml-schema/
62. OpenML, Lorentz Center, Netherlands (2016)
First draft of ML Schema Core https://github.com/ML-Schema/core
63. Sharing beyond DM/ML domain
Mapping DMOP to workflow ontologies (Research Objects, OPMW)
(ROHub hosted by Poznan Supercomputing and Networking Center)
64. Semantic data mining: more information
Semantic data mining tutorial @ ECML/PKDD’2011
http://videolectures.net/ecmlpkdd2011_lavrac_vavpetic_mining/
peculiarities of the learning setting: Open World Assumption, what is a
”truly semantic” similarity measure?, ...
methods, applications, tools
65. Summary
semantic data mining: data mining with ontologies as
background/prior knowledge, most often from structured data
ontologies are best engineered with use cases in mind
learning in description logics and clausal languages is hard; heuristics,
dealing with peculiarities
Fr-ONT-Qu semantic pattern mining algorithm: theoretical
properties, practical evaluation
use case: semantic meta-mining for constructing Intelligent Data
Mining Assistant
importance of interoperability (for scientific reproducibility, for
inter-domain applications)
66. Acknowledgements
Polish National Science Center under the SONATA program
”ARISTOTELES: Methodology and algorithms for automatic revision of
ontologies in task based scenarios” (2014/13/D/ST6/02076) (2015-2018)
Foundation for Polish Science under the POMOST programme, co-financed
by the European Union, Regional Development Fund (POMOST/2013-7/8)
(2013-2015)
EU FP7 ICT-2007.4.4 (231519) ”e-LICO: An e-Laboratory for
Interdisciplinary Collaborative Research in Data Mining and Data-Intensive
Science” (2009-2012)
Fr-ONT-Qu, meta-mining experiments done jointly with Jedrzej Potoniec
Contributors to the development of DMOP and/or other e-LICO
infrastructure used in the research described in this presentation: Melanie
Hilario, C. Maria Keet, Claudia d’Amato, Huyen Do, Simon Fischer, Dragan
Gamberger, Lina Al-Jadir, Simon Jupp, Alexandros Kalousis, Joerg-Uwe
Kietz, Petra Kralj Novak, Babak Mougouie, Phong Nguyen, Raul
Palma, Floarea Serban, Robert Stevens, Anze Vavpetic, Jun Wang, Derry
Wijaya, Adam Woznica