A Domain Specific
ESA-inspired Approach for
Document Semantic
Description
Luca Mazzola, Patrick Siegfried, Andreas Waldis, Michael Kaufmann, and
Alexander Denzler
HSLU - Lucerne University of Applied Sciences,
School of Information Technology,
6343 - Rotkreuz,
Switzerland
9th IEEE International Conference on Intelligent Systems – IS2018
IEEE_IS2018 25/09/2018
Slide 2, 25-Sep-18
- DSS : Decision Support System for
- job placement
- further education suggestion
- profile (CV) similarity identification
- Data driven
- Automatically evolving (no rule definition needed)
- Limiting the cold-start problem.
Motivation
• DSS
• Data-driven
• Limited cold-start
Slide 3, 25-Sep-18
- Unstructured/semi-structured documents
- CV/résumé
- job offer
- education description (high school, professional
instruction, Bachelor, Master, executive ed., …)
- Other general-purpose docs (e.g., websites)
- Mixing with on-the-job training:
- No formal learning objective, no uniform
description
- Consideration of competences due to job
experiences
Issues
• Unstructured data
• Different origin/standard
• Informal and semiformal
Slide 4, 25-Sep-18
- External crowd-based available corpus: Wikipedia
- Good quality
- Concepts = existing page titles
- Vocabulary = page content (stems)
- Metric = normalized TF-IDF
- As suggested by ESA, but transposed
- Domain specific filtering
- Noise reduction by removal of “irrelevant”
concepts / vocabulary
Our Approach
• Wikipedia as data source (ESA)
• nTF-IDF
• Domain specific (noise limiting)
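The concept/vocabulary/metric setup above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each page has already been reduced to a list of stems, reads "normalized" as L2 normalization of each concept row, and uses a hypothetical function name (`build_concept_matrix`) and invented toy pages.

```python
import math
from collections import Counter


def build_concept_matrix(pages):
    """Map each concept (page title) to an L2-normalized TF-IDF row over stems.

    `pages` is a hypothetical input: {page_title: [stem, stem, ...]}.
    """
    n_pages = len(pages)
    # Document frequency: number of pages in which each stem occurs.
    df = Counter()
    for stems in pages.values():
        df.update(set(stems))

    matrix = {}
    for concept, stems in pages.items():
        tf = Counter(stems)
        # Raw TF-IDF weight of each stem within this page.
        row = {
            stem: (count / len(stems)) * math.log(n_pages / df[stem])
            for stem, count in tf.items()
        }
        # Normalize the row to unit length; zero-weight stems are dropped.
        norm = math.sqrt(sum(w * w for w in row.values()))
        matrix[concept] = (
            {stem: w / norm for stem, w in row.items() if w > 0} if norm > 0 else {}
        )
    return matrix


# Toy stand-in for stemmed German Wikipedia pages (invented data).
pages = {
    "Informatik": ["softwar", "comput", "daten", "softwar"],
    "Pflege": ["patient", "spital", "pfleg"],
    "Medizininformatik": ["softwar", "patient", "daten"],
}
matrix = build_concept_matrix(pages)
```

Each row of `matrix` is one concept described by its stems, which is one way to read "as suggested by ESA, but transposed"; the exact weighting used in the paper may differ.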
Slide 5, 25-Sep-18
Semantic matrix building process
• Enriching ( NO Disambiguation,
Virtual pages for Redirect)
• filtering
Data characterization:
DEWiki: ~2.5M
CVs: ~27K
JOB offers: ~30K
Education descr: ~1.1K
Valid “concepts”: ~40K
Valid “stems”: ~66K
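The filtering step can be illustrated with a small sketch. It assumes, as one plausible reading of the slides, that a stem is "valid" when it also occurs in the domain corpus (CVs, job offers, education descriptions) and that a concept is kept only if enough valid stems remain. The function name, the `min_stems` threshold, and the data are hypothetical.

```python
def filter_matrix(matrix, domain_stems, min_stems=2):
    """Keep only domain stems; drop concepts left with too few of them.

    `matrix` maps concept -> {stem: weight}; `domain_stems` is the set of
    stems observed in the domain documents (assumed criterion).
    """
    filtered = {}
    for concept, row in matrix.items():
        # Remove vocabulary that never appears in the domain corpus.
        kept = {stem: w for stem, w in row.items() if stem in domain_stems}
        # Remove concepts that are no longer described by enough stems.
        if len(kept) >= min_stems:
            filtered[concept] = kept
    return filtered


# Invented weights, for illustration only.
toy_matrix = {
    "Informatik": {"softwar": 0.8, "daten": 0.5, "turing": 0.3},
    "Fussball": {"tor": 0.9, "spiel": 0.4},
    "Pflege": {"patient": 0.7, "pfleg": 0.7},
}
domain_stems = {"softwar", "daten", "patient", "pfleg", "spiel"}
filtered = filter_matrix(toy_matrix, domain_stems)
```

In this toy run "Fussball" disappears because only one of its stems occurs in the domain corpus. The sketch does not renormalize the surviving rows; whether the original pipeline does so is not stated on the slides.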
Slide 6, 25-Sep-18
Reference Model building
• Additional distribution data
• Dynamic filtering
Slide 7, 25-Sep-18
- develop a metric to compare documents based on
a common set of attributes
- compare two given documents:
- identify similarities
- extract common “concepts”
- compare a given document against a set:
- assign relevant CVs to a job post
- match educational experiences to a CV based on a
common skill set
- find similar CVs to a given one
Requirements
• Set of requirements
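The requirements above (pairwise comparison, common concepts, one document against a set) can be sketched with an ESA-style projection followed by cosine similarity. This is an assumption-laden illustration: the slides do not name the comparison metric, so cosine is a stand-in, and all function names, weights, and documents are hypothetical.

```python
import math
from collections import Counter

# Hypothetical concept -> {stem: weight} matrix (invented weights).
MATRIX = {
    "Informatik": {"softwar": 0.8, "daten": 0.6},
    "Pflege": {"patient": 0.7, "pfleg": 0.7},
    "Medizin": {"patient": 0.6, "spital": 0.8},
}


def project(doc_stems, matrix):
    """Describe a document as a sparse vector of concept scores."""
    counts = Counter(doc_stems)
    signature = {}
    for concept, row in matrix.items():
        score = sum(row.get(stem, 0.0) * n for stem, n in counts.items())
        if score > 0:
            signature[concept] = score
    return signature


def cosine(a, b):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(w * b.get(concept, 0.0) for concept, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def common_concepts(a, b, top=3):
    """Concepts shared by both documents, strongest joint score first."""
    shared = {concept: a[concept] * b[concept] for concept in a.keys() & b.keys()}
    return sorted(shared, key=shared.get, reverse=True)[:top]


def rank(query, candidates):
    """Order candidate documents by decreasing similarity to the query."""
    return sorted(candidates, key=lambda name: cosine(query, candidates[name]),
                  reverse=True)


cv = project(["softwar", "daten", "daten"], MATRIX)
job_it = project(["softwar", "entwickl"], MATRIX)
job_care = project(["pfleg", "patient", "spital"], MATRIX)
ranking = rank(cv, {"care": job_care, "it": job_it})
```

Here `ranking` places the IT job offer ahead of the nursing one for the toy CV, and `common_concepts(cv, job_it)` names the concept that explains the match. Only the relative ordering is meant to be meaningful in this sketch.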
Slide 8, 25-Sep-18
- Ranked matching between 17 CVs and 44
educational experiences
- Gold standard: manual annotation by business
partner (ordered top-3 educations for each CV)
- Weighted according to the table → expected value
for pure random assignment: E[Q] ~ 0.32
- Obtained result → Q = 6.62 and sd[Q] = 1.68
- Additional analysis, for 5 representative cases:
Non-randomness verification
• Wikipedia as data source (ESA)
• nTF-IDF
• Domain specific (noise limiting)
Rank      #1      #2      #3
Top-1     2       -       -
Top-2     1/2     3/2     -
Top-3     1/3     3/3     5/3
Top-5     1/5     3/5     5/5
Top-10    1/10    3/10    5/10
Slide 9, 25-Sep-18
- We identified a set of 10 heterogeneous
documents in German:
- Doc1 Automobile Mechatroniker EZF (educ exp)
- Doc2 Software Entwickler (JOB offer)
- Doc3 B.Sc. Medizin-Informatiker/in BFH (educ exp)
- Doc4 AutoMechatroniker (JOB offer)
- Doc5 Webpage of «Data Intelligence» team at HSLU (website)
- Doc6 Dipl. Pflegefachperson HF/FH(Privatabteilung) (JOB offer)
- Doc7 Luzerner Kantonspital website - general page (website)
- Doc8 Zuger Kantonspital website – «about us» (website)
- Doc9 Visa hat technische Probleme in ganz Europa (news, 01Jun)
- Doc10 Bayer übernimmt Monsanto für 63 Milliarden (news, 07Jun)
- Analysis to discover relationships (similarities)
amongst them
Experiment
• Experiment setup
(Doc9 and Doc10: noise, from http://www.20min.ch)
Slide 10, 25-Sep-18
Results – pairwise similarities
[Figure: pairwise similarity matrix, annotated with eight check marks and one question mark]
Slide 11, 25-Sep-18
Results – final R measure
Slide 12, 25-Sep-18
Result – Dendrogram by spectral
clustering
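How a tree of documents can be derived from pairwise similarities is sketched below. Note the swapped-in technique: the slide reports spectral clustering, while this sketch uses plain average-linkage agglomerative merging, chosen only because it fits in a few lines of standard-library Python. Labels and similarity values are invented and are not the results of the experiment.

```python
def agglomerate(labels, sim):
    """Average-linkage agglomerative clustering on a similarity matrix.

    `sim` is a symmetric list-of-lists; returns the merge history as
    (left_members, right_members, average_similarity) tuples.
    """
    clusters = [[i] for i in range(len(labels))]
    merges = []

    def avg_sim(a, b):
        # Mean similarity over all cross-cluster document pairs.
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the most similar pair of current clusters.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = avg_sim(clusters[x], clusters[y])
                if best is None or s > best[0]:
                    best = (s, x, y)
        s, x, y = best
        merges.append(([labels[i] for i in clusters[x]],
                       [labels[i] for i in clusters[y]], s))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return merges


# Invented similarities: A and B are close, C is related, N is noise.
labels = ["A", "B", "C", "N"]
sim = [
    [1.0, 0.9, 0.4, 0.1],
    [0.9, 1.0, 0.5, 0.1],
    [0.4, 0.5, 1.0, 0.05],
    [0.1, 0.1, 0.05, 1.0],
]
merges = agglomerate(labels, sim)
```

In the toy data the noise document `N` joins the tree only in the last merge, which is the kind of behaviour a dendrogram makes visible for outliers.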
Slide 13, 25-Sep-18
- An ESA-inspired approach for document
comparison
- Able to work on heterogeneous documents
- Language
- structure
- Domain filtering for better specificity (less noise)
- Better results than random assignment
- Positive manual evaluation by humans
- Clustering capabilities
- Meaningful
- Able to spot and “separate” outliers in a
dataset (noise)
Achievements
• New approach
• Good performance
• Outliers “detection”
Slide 14, 25-Sep-18
- Language dependent
- Currently in German
- No interpretation of absolute distance of
documents
- Only comparisons are meaningful
- No fully meaningful explicit signature of a
document (such as the one offered by ESA)
- Computational complexity of model creation
- But dynamic adjustment partially compensates
Limits
• Language dependency
• Adopted metrics
• Explicit semantic interpretation
Slide 15, 25-Sep-18
- Granular approach usage
- Using, if available, the CV semi-structure
- Customizable metrics for stem weighting
- Different metrics for vector comparison
- Multilanguage version
- Using the Wikipedia metadata for “translated”
pages
- Granular map of the CH educational panorama
Next Steps
• Improve model (metrics)
• Multilanguage support
• Towards a Map of CH education
Research
Dr. Luca Mazzola
Research Associate
T direct +41 41 757 68 90
luca.mazzola@hslu.ch
Rotkreuz
Questions
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxParas Gupta
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...gajnagarg
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制vexqp
 

Recently uploaded (20)

Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
Data Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdfData Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdf
 
PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schs
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptx
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
 
Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...
 

Document semantic characterization

  • 1. A Domain Specific ESA-inspired Approach for Document Semantic Description
    Luca Mazzola, Patrick Siegfried, Andreas Waldis, Michael Kaufmann, and Alexander Denzler
    HSLU - Lucerne University of Applied Sciences, School of Information Technology, 6343 - Rotkreuz, Switzerland
    9th IEEE International Conference on Intelligent Systems – IS2018
    IEEE_IS2018 25/09/2018
  • 2. Motivation
    - DSS: Decision Support System for
      - job placement
      - further education suggestion
      - profile (CV) similarity identification
    - Data-driven
      - Automatically evolving (no rule definition needed)
      - Limiting the cold-start problem
    Key points: DSS; data-driven; limited cold-start
  • 3. Issues
    - Unstructured/semi-structured documents
      - CV/résumé
      - job offer
      - education description (high school, professional instruction, Bachelor, Master, executive ed., …)
      - other general-purpose docs (e.g. websites)
    - Mixing with on-the-job training
      - No formal learning objective, no uniform description
      - Consideration of competences due to job experiences
    Key points: unstructured data; different origin/standard; informal and semiformal
  • 4. Our Approach
    - External, crowd-based, available corpus: Wikipedia
      - Good quality
      - Concepts = existing page titles
      - Vocabulary = page content (stems)
      - Metric = normalized TF-IDF
      - As suggested by ESA, but transposed
    - Domain-specific filtering
      - Noise reduction by removal of “irrelevant” concepts / vocabulary
    Key points: Wikipedia as data source (ESA); nTF-IDF; domain-specific (noise limiting)
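Slide 4 names the ingredients (Wikipedia page titles as concepts, page stems as vocabulary, normalized TF-IDF as the metric) without giving a formula. The following is a minimal sketch of one common way to build such a concept-by-stem matrix. The toy pages, the function name `build_ntfidf`, the logarithmic IDF, and the L2 normalization per concept are all assumptions made for illustration; the slides do not state which normalization the authors used, and stemming is assumed to have happened already.

```python
import math
from collections import Counter


def build_ntfidf(pages):
    """Build a concept -> {stem: weight} mapping from pre-stemmed pages.

    `pages` maps a concept (page title) to its list of stems.
    Assumption: TF is the relative frequency, IDF is log(N / df),
    and each concept row is scaled to unit L2 norm.
    """
    n_pages = len(pages)

    # Document frequency: in how many pages does each stem occur?
    df = Counter()
    for stems in pages.values():
        df.update(set(stems))

    matrix = {}
    for title, stems in pages.items():
        tf = Counter(stems)
        raw = {
            stem: (count / len(stems)) * math.log(n_pages / df[stem])
            for stem, count in tf.items()
        }
        norm = math.sqrt(sum(w * w for w in raw.values()))
        # Stems occurring in every page get IDF 0 and are dropped.
        matrix[title] = (
            {s: w / norm for s, w in raw.items() if w > 0} if norm > 0 else {}
        )
    return matrix


# Hypothetical, already stemmed toy pages (not taken from the slides).
pages = {
    "Software": ["softwar", "programm", "comput", "softwar"],
    "Krankenpflege": ["pfleg", "patient", "spital"],
    "Informatik": ["comput", "programm", "dat"],
}
matrix = build_ntfidf(pages)
for concept, weights in matrix.items():
    print(concept, {s: round(w, 3) for s, w in weights.items()})
```

With unit-length rows, concepts with long and short pages become comparable, which is one plausible reading of "normalized" here.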
  • 5. Semantic matrix building process
    - Enriching (NO disambiguation, virtual pages for redirects)
    - Filtering
    Data characterization:
      DEWiki: ~2.5M
      CVs: ~27K
      Job offers: ~30K
      Education descr.: ~1.1K
      Valid “concepts”: ~40K
      Valid “stems”: ~66K
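The filtering step on slide 5 reduces the Wikipedia-derived matrix to the concepts and stems that matter for the domain, but the slides do not spell out the criterion. The sketch below shows one simple possibility, stated as an assumption: keep only stems that also occur in the domain corpus (CVs, job offers, education descriptions) and drop concepts left with too few stems. The function name `filter_to_domain`, the threshold `min_stems`, and the toy data are hypothetical.

```python
def filter_to_domain(matrix, domain_docs, min_stems=2):
    """Restrict a concept -> {stem: weight} matrix to a domain vocabulary.

    Assumption: a stem is "relevant" if it appears in at least one
    domain document, and a concept is "relevant" if at least
    `min_stems` of its stems survive. Weights are not renormalized.
    """
    domain_vocab = set()
    for stems in domain_docs:
        domain_vocab.update(stems)

    filtered = {}
    for concept, weights in matrix.items():
        kept = {s: w for s, w in weights.items() if s in domain_vocab}
        if len(kept) >= min_stems:
            filtered[concept] = kept
    return filtered


# Hypothetical toy matrix and domain corpus (already stemmed).
matrix = {
    "Software": {"softwar": 0.9, "programm": 0.3, "comput": 0.3},
    "Fussball": {"tor": 0.8, "spiel": 0.6},
    "Krankenpflege": {"pfleg": 0.7, "patient": 0.7, "spital": 0.1},
}
domain_docs = [
    ["softwar", "programm", "erfahr"],
    ["pfleg", "patient", "team", "spiel"],
]
filtered = filter_to_domain(matrix, domain_docs)
print(sorted(filtered))
```

In this toy run the sports concept disappears because only one of its stems occurs in the domain documents, which is the kind of noise reduction the slide describes.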
  • 6. Reference Model building
    - Additional distribution data
    - Dynamic filtering
  • 7. Requirements
    - Develop a metric to compare documents based on a common set of attributes
    - Compare two given documents:
      - identify similarities
      - extract common “concepts”
    - Compare a given document against a set:
      - assign relevant CVs to a job post
      - match educational experiences to a CV on a common skill set
      - find CVs similar to a given one
    Key points: set of requirements
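The requirements on slide 7 (compare two documents, extract common concepts, rank a set against one document) can be illustrated with a small sketch. It uses a textbook ESA-style projection of a document onto concepts followed by cosine similarity as a stand-in; the slides do not name the authors' actual metric, and the talk describes its own approach as "transposed" with respect to ESA. All function names and the toy matrix are hypothetical.

```python
import math
from collections import Counter


def concept_vector(doc_stems, matrix):
    """Project a stemmed document onto concepts (ESA-style stand-in)."""
    tf = Counter(doc_stems)
    vec = {}
    for concept, weights in matrix.items():
        score = sum(n * weights.get(stem, 0.0) for stem, n in tf.items())
        if score > 0:
            vec[concept] = score
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(w * v[c] for c, w in u.items() if c in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def common_concepts(u, v, top=3):
    """Concepts shared by both documents, strongest joint weight first."""
    shared = {c: u[c] * v[c] for c in u if c in v}
    return sorted(shared, key=shared.get, reverse=True)[:top]


def rank_against(query, candidates):
    """Order candidate names by decreasing similarity to the query."""
    return sorted(
        candidates, key=lambda name: cosine(query, candidates[name]), reverse=True
    )


# Hypothetical concept -> {stem: weight} matrix and stemmed documents.
matrix = {
    "Software": {"softwar": 0.8, "programm": 0.6},
    "Krankenpflege": {"pfleg": 0.7, "patient": 0.7},
    "Datenbank": {"dat": 0.9, "programm": 0.4},
}
job = concept_vector(["softwar", "programm", "dat"], matrix)
cvs = {
    "cv_dev": concept_vector(["programm", "softwar", "softwar"], matrix),
    "cv_nurse": concept_vector(["pfleg", "patient", "patient"], matrix),
}
print(rank_against(job, cvs))
print(common_concepts(job, cvs["cv_dev"]))
```

Because only relative comparisons are used for ranking, the sketch is consistent with the limitation noted later in the talk that absolute distances are not interpreted.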
  • 8. Non-randomness verification
    - Ranked matching between 17 CVs and 44 educational experiences
    - Gold standard: manual annotation by the business partner (ordered top-3 educations for each CV)
    - Weighted according to the table below
      → Expected value for purely random assignment: E[Q] ~ 0.32
    - Obtained result → Q = 6.62 and sd[Q] = 1.68
    - Additional analysis for 5 representative cases
    Key points: Wikipedia as data source (ESA); nTF-IDF; domain-specific (noise limiting)

    Rank     #1     #2     #3
    Top-1    2      -      -
    Top-2    1/2    3/2    -
    Top-3    1/3    3/3    5/3
    Top-5    1/5    3/5    5/5
    Top-10   1/10   3/10   5/10
  • 9. Experiment
    - We identified a set of 10 heterogeneous documents in German:
      - Doc1 Automobile Mechatroniker EFZ (educ exp)
      - Doc2 Software Entwickler (job offer)
      - Doc3 B.Sc. Medizin-Informatiker/in BFH (educ exp)
      - Doc4 AutoMechatroniker (job offer)
      - Doc5 Webpage of «Data Intelligence» team at HSLU (website)
      - Doc6 Dipl. Pflegefachperson HF/FH (Privatabteilung) (job offer)
      - Doc7 Luzerner Kantonsspital website, general page (website)
      - Doc8 Zuger Kantonsspital website, «about us» (website)
      - Doc9 Visa hat technische Probleme in ganz Europa (news, 01 Jun)
      - Doc10 Bayer übernimmt Monsanto für 63 Milliarden (news, 07 Jun)
    - Analysis to discover relationships (similarities) amongst them
    (noise, from http://www.20min.ch)
    Key points: experiment setup
  • 10. Results – pairwise similarities [figure]
  • 11. Results – final R measure [figure]
  • 12. Result – Dendrogram by spectral clustering [figure]
  • 13. Achievements
    - An ESA-inspired approach for document comparison
    - Able to work on heterogeneous documents
      - Language
      - Structure
    - Domain filtering for better specificity (less noise)
    - Better results than random assignment
      - Human manual evaluation positive
    - Clustering capabilities
      - Meaningful
      - Able to spot and “separate” outliers in a dataset (noise)
    Key points: new approach; good performance; outlier “detection”
  • 14. Limits
    - Language dependent
      - Currently in German
    - No interpretation of the absolute distance between documents
      - Only comparisons are meaningful
    - No completely meaningful explicit signature of a document (such as the one offered by ESA)
    - Computational complexity of model creation
      - But dynamic adjustment partially compensates
    Key points: language dependency; adopted metrics; explicit semantic interpretation
  • 15. Next Steps
    - Granular approach usage
      - Using, if available, the CV semi-structure
    - Customizable metrics for stem weighting
    - Different metrics for vector comparison
    - Multilanguage version
      - Using the Wikipedia metadata for “translated” pages
    - Granular map of the CH educational panorama
    Key points: improve model (metrics); multilanguage support; towards a map of CH education
  • 16. Questions
    Dr. Luca Mazzola, Research Associate, Research
    T direct +41 41 757 68 90
    luca.mazzola@hslu.ch
    Rotkreuz