DATA SCIENCE LAB PROJECT
Master's Degree: Data Science
Authors: A. Portaluppi, L. Ravazzi and M. Spandri
Academic year 2019-2020
INTRODUCTION
DATA SET: DEMS publications (Dipartimento di Economia, Metodi Quantitativi e Strategie di Impresa).
PURPOSE: Find the topics studied by DEMS university researchers.
TOOLS: Multidimensional scaling techniques and cluster analysis.
STEP BY STEP
DATA MANAGEMENT:
1. Exploration
2. Preprocessing
3. Data Cleaning
4. NLP
MULTIDIMENSIONAL SCALING:
1. Common Multidimensional Scaling
2. Metric Scaling
3. Sammon Mapping
CLUSTER ANALYSIS:
Prototype-based: fuzzy algorithm
DATA VISUALIZATION:
RShiny application
DATA
MANAGEMENT
DATA
VISUALIZATION
CLUSTER
ANALYSIS
MULTIDIMENSIONAL
SCALING
HOW TO MANAGE THE DATA SETS?

ID | TITLE | JOURNAL | ABSTRACT | ABSTRACT_ENG | KEYWORDS | KEYWORDS_ENG
235 | … | … | …

ID | DEMS_AUTHORS
235 | …
235 | …
235 | …
…
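One way to combine the two tables is to group the author rows by ID and attach them to the matching publication. The sketch below is in Python rather than the R used in the project, and every value in it (including the second ID, 236) is a hypothetical placeholder, not a real DEMS record; only the column names come from the slide.

```python
from collections import defaultdict

# Hypothetical rows mirroring the two tables; values are placeholders.
publications = [
    {"ID": 235, "TITLE": "placeholder title", "JOURNAL": "placeholder journal"},
    {"ID": 236, "TITLE": "another title", "JOURNAL": "another journal"},
]
authors = [
    {"ID": 235, "DEMS_AUTHORS": "author A"},
    {"ID": 235, "DEMS_AUTHORS": "author B"},
    {"ID": 235, "DEMS_AUTHORS": "author C"},
    {"ID": 236, "DEMS_AUTHORS": "author A"},
]


def attach_authors(publications, authors):
    """Return one record per publication with the list of its DEMS authors."""
    by_id = defaultdict(list)
    for row in authors:
        by_id[row["ID"]].append(row["DEMS_AUTHORS"])
    # Publications without author rows get an empty list.
    return [dict(pub, DEMS_AUTHORS=by_id.get(pub["ID"], [])) for pub in publications]


merged = attach_authors(publications, authors)
```

Keeping one record per publication avoids counting the same abstract several times when a paper has more than one DEMS author.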
HOW TO CHOOSE RECORDS?
• Journal articles
• Written by assistant professors etc.
• English language
30% of the documents are left.
What does "in English" mean?
Perfect! There is a field which specifies the language.
The language of an article is the language of its abstract (textcat function).
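The idea of labeling each article with the language of its abstract can be sketched as follows. This is not the textcat function named on the slide: it is a much simpler stop-word counting heuristic written in Python, and the stop-word lists, the `guess_language` and `keep_english` helpers and the sample sentences are all assumptions made for illustration. Only the ABSTRACT field name comes from the data set.

```python
import re

# Tiny illustrative stop-word lists; a real detector needs far richer profiles.
STOP_WORDS = {
    "english": {"the", "of", "and", "to", "is", "that", "for", "with", "this"},
    "italian": {"il", "la", "di", "che", "per", "con", "una", "del", "sono"},
}


def guess_language(text):
    """Return the language whose stop words occur most often, or None if none occur."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    scores = {
        lang: sum(tok in words for tok in tokens)
        for lang, words in STOP_WORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def keep_english(records):
    """Keep only the records whose abstract is guessed to be English."""
    return [r for r in records if guess_language(r["ABSTRACT"]) == "english"]


records = [
    {"ID": 1, "ABSTRACT": "The aim of this paper is to study the distribution of income."},
    {"ID": 2, "ABSTRACT": "Lo scopo di questo lavoro è lo studio della distribuzione del reddito."},
]
english_only = keep_english(records)
```

Short or keyword-only abstracts can defeat a heuristic like this one, so returning None for texts with no stop-word hits lets such records be reviewed by hand.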
NATURAL LANGUAGE PROCESSING
1. Start with mixed texts (title, abstract, keywords and journal).
2. Clean the text: drop punctuation, stop words and non-letter characters; convert everything to lower case; apply stemming.
3. Build the bag of words.
4. Compute tf_idf.
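The preprocessing and weighting steps can be sketched in a few lines of Python. This is a minimal illustration, not the project's R pipeline: the stop-word list is a small made-up subset, `naive_stem` is a crude stand-in for a real stemmer, and the weight used is one common tf-idf variant (relative term frequency times the log of the inverse document frequency).

```python
import math
import re
from collections import Counter

STOP_WORDS = {"the", "of", "and", "in", "a", "an", "to", "is", "for", "on"}  # illustrative subset


def naive_stem(token):
    """Crude suffix stripping; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lower-case, keep letter runs only, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def tf_idf(documents):
    """Return one {term: weight} dictionary per document."""
    bags = [Counter(preprocess(doc)) for doc in documents]
    n_docs = len(bags)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for bag in bags for term in bag)
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append(
            {t: (c / total) * math.log(n_docs / doc_freq[t]) for t, c in bag.items()}
        )
    return weights


docs = ["Clustering of economic models", "Models for health statistics"]
weights = tf_idf(docs)
```

A term that appears in every document, such as "model" here, gets weight zero, which is why tf-idf helps separate topics better than raw counts.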
MULTIDIMENSIONAL SCALING
WHAT?
A function that projects data from an N-dimensional space to 2 or 3 dimensions.
WHY?
• Graphical approach (Clustering)
• Increase interpretability
HOW?
• Metric
• Non-metric
Start from a table with the texts and the number of bag-of-words terms in each.
Choose a proximity measure.
Apply the desired technique.
We applied three techniques:
1. Common Multidimensional Scaling (Euclidean distance)
2. Metric Scaling
3. Sammon Mapping (Manhattan distance)
We describe only the last one because it gave good results.
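The two proximity measures named here differ only in how coordinate differences are combined. The Python sketch below builds a pairwise distance matrix with either one; the `rows` values are made-up coordinates, not tf_idf weights from the project.

```python
import math


def euclidean(u, v):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    """City-block distance: sum of the absolute differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def distance_matrix(rows, metric):
    """Full symmetric matrix of pairwise distances between the rows."""
    return [[metric(u, v) for v in rows] for u in rows]


rows = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]  # hypothetical observations
d_euclidean = distance_matrix(rows, euclidean)
d_manhattan = distance_matrix(rows, manhattan)
```

For the first two rows the Euclidean distance is 5 and the Manhattan distance is 7, so the choice of measure changes the input that a scaling technique receives.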
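SAMMON MAPPING
Minimize Sammon Stress:
$$E = \frac{1}{\sum_{i<j} d^{*}_{ij}} \sum_{i<j} \frac{\left(d^{*}_{ij} - d_{ij}\right)^{2}}{d^{*}_{ij}}$$
where $d^{*}_{ij}$ is the distance between the i-th and j-th observation in the initial space, while $d_{ij}$ refers to the final space.
• For metric and non-metric data
• Non-linear transformation approach (different from PCA)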
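To make the quantity being minimized concrete, the Python sketch below evaluates the standard Sammon stress for two given distance matrices: each squared difference between an original and a projected distance is divided by the original distance, and the total is normalized by the sum of the original distances. The names `d_high` (initial space) and `d_low` (final space) and the small matrices are assumptions for illustration; the function only measures the stress and does not perform the minimization.

```python
def sammon_stress(d_high, d_low):
    """Sammon stress between two symmetric distance matrices (lists of lists).

    Assumes every off-diagonal distance in d_high is strictly positive,
    i.e. there are no duplicate observations in the initial space.
    """
    n = len(d_high)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    scale = sum(d_high[i][j] for i, j in pairs)
    error = sum(
        (d_high[i][j] - d_low[i][j]) ** 2 / d_high[i][j] for i, j in pairs
    )
    return error / scale


# Hypothetical distances between three observations.
d_high = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
d_exact = d_high  # a projection that preserves every distance
d_squashed = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]

stress_exact = sammon_stress(d_high, d_exact)
stress_squashed = sammon_stress(d_high, d_squashed)
```

Dividing by the original distance makes errors on small distances count more, which is why Sammon mapping tends to preserve local neighborhoods.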
CLUSTERING
Since an article can touch on different topics, the clustering must be fuzzy.
Manhattan distance is used to build the clusters.
Cluster labels rely on the fifteen most frequent words in the bag of words.
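Fuzzy clustering gives each article a degree of membership in every cluster instead of a single label. The slides do not say which fuzzy algorithm or implementation was used, so the Python sketch below is only an assumption: it applies the standard fuzzy c-means membership formula with Manhattan distance substituted for the usual Euclidean one, a fuzzifier m = 2, and made-up prototypes.

```python
def manhattan(u, v):
    """City-block distance between two coordinate lists."""
    return sum(abs(a - b) for a, b in zip(u, v))


def fuzzy_memberships(point, prototypes, m=2.0):
    """Membership of `point` in each cluster, given fixed cluster prototypes.

    Uses u_k = 1 / sum_j (d_k / d_j) ** (2 / (m - 1)), with m > 1.
    """
    dists = [manhattan(point, p) for p in prototypes]
    # A point lying exactly on one or more prototypes belongs only to those.
    if 0.0 in dists:
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_k / d_j) ** exponent for d_j in dists) for d_k in dists]


prototypes = [[0.0, 0.0], [4.0, 0.0]]  # hypothetical cluster centers
near_first = fuzzy_memberships([1.0, 0.0], prototypes)
in_between = fuzzy_memberships([2.0, 0.0], prototypes)
```

The memberships of a point always sum to one, so an article halfway between two topics is split evenly rather than forced into one of them.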
LABELS
DEMS: ECONOMICS, STATISTICS, BUSINESS STRATEGY
• Finance and Energy
• Economic Policy
• Macroeconomics
• Income Distribution
• Game Theory
• Health Statistics
• Pure Statistics
• Statistics and Finance
• Social Issues
• Industrial Economics
• Corporate Finance
MOVE TO
RSHINY!
SUMMARY
CONCLUSIONS
Multidimensional scaling is a powerful tool to visualize data.
We found the main topics studied by DEMS researchers.
MDS and clustering can show interesting patterns in data.
FUTURE DEVELOPMENTS
Other techniques for scaling, such as Self-Organizing Maps.
Other proximity measures for MDS.
Consider not only single words in the bag of words (Association Analysis).
THANK YOU FOR YOUR ATTENTION
