SlideShare uma empresa Scribd logo
1 de 48
Baixar para ler offline
Interactive Analysis of
Word Vector Embeddings
Florian Heimerl and Michael Gleicher
Department of Computer Sciences
University of Wisconsin - Madison
Florian Heimerl and Michael Gleicher
Department of Computer Sciences
University of Wisconsin - Madison
Interactive Analysis of
Word Vector Embeddings
Summary:
Word vector embeddings offer unique challenges
Task analysis of needs 3 Designs for unmet needs
Buddy Plots
Concept Axis Plots
Co-occurance Matrices
What is an embedding?
General mathematics:
Place a smaller structure into a
larger structure
Computer science:
Place a discrete set of objects
into a vector space
Encode relationships between
objects
Paris
4,8,1,3,…
Atlanta
5,2,1,7,…New York
4,8,1,3,…
London
5,2,1,7,…
Boston
3,2,5,1,…
Tokyo
9,2,6,4,…
Beijing
7,3,2,7,…
San Jose
5,2,1,7,…
Jakarta
3,2,5,1,…
Sydney
9,2,6,4,…
Munich
7,3,2,7,…
High Dimensional Data
Objects have associated Vectors
Kinds of relationships in embeddings
Distance
A is closer to B than to C
B
C
D
A
E
Linear Structure
A is to B as C is to D
Semantic Directions
A is more X than C
B
C
D
AF
F
E
B
A
C
D
Relationships are interesting even if global positions are not
Word vector embeddings
Place words in a high-dimensional vector space
Words similar in meaning should be close in space
Infer similarity by distributional semantics:
similar context implies similar meaning
Construct embeddings by processing a corpus of text
my pet cat is brown
my pet dog is brown
my big car is brown
Several ways to build embeddings
Word2Vec
Skip-gram model
Neural embedding
GLoVE
Co-occurrence model
Factor matrix by optimization
Why use Word Vector Embeddings?
Learn about
Language or Corpora (Texts)
Find similar words/synonyms
Track changes of word usage
Exploring polysemy
Creating lexical resources
Evidence of bias
…
Natural Language Applications
Pre-Processing
Translation
Sentiment Analysis
Interpretation
…
Challenges of Word Vector Embeddings
Large numbers of Words
High-dimensional spaces
Complex relationships – meaningless positions
Complex processes for building embeddings
Complex downstream applications
No ground truth - subjective aspects
Why visualization?
Tasks are inherently human-centric
Variety of tasks involving interpretation
Linguistic and domain knowledge for applications
But what are those tasks?
Task Analysis:
What do people do with Word Embeddings?
Literature survey
111 papers from diverse communities
Consider use cases in Linguistics, HCI, Digitial Humanities, etc.
Augment this list by extrapolation
what tasks would users want – but aren’t doing yet
Use cases suggest tasks
Learn about
Language or Corpora (Texts)
Find similar words/synonyms
Track changes of word usage
Exploring polysemy
Creating lexical resources
Natural Language Applications
Pre-Processing
Translation
Sentiment Analysis
Interpretation
Evaluation
Intrinsic (good embedding?)
Extrinsic (applications success?)
Interpretation
Identify items of interest
Probe values of interest
Linguistic Tasks and Characteristics
We identified 7 distinct linguistic tasks within the literature
Two ways those tasks are relevant
• Automatically test embeddings through human-curated resources
• Users probe embedding
4 characteristics pertinent to those tasks: similarity, arithmetic
structures, concept axis, and co-occurrences
13
Prior Work:
Word Vector Embedding Visualizations
Liu et al., InfoVis 2017
Analogy Relationships
Smilkov et al., 2016
Distance Relationships
Rong and Adar, 2016
Training Process
Tasks and Characteristics vs. Visualizations
Rank word pairs similarity
View neighbors similarity
Select synonyms similarity
Compare concepts average, similarity
Find analogies offset, similarity
Project on Concept concept axis
Predict Contexts co-oc. probability
Smilkov, et al 2016
Liu, et al 2017
Tasks and Characteristics vs. Visualizations
Rank word pairs similarity
View neighbors similarity
Select synonyms similarity
Compare concepts average, similarity
Find analogies offset, similarity
Project on Concept concept axis
Predict Contexts co-oc. probability
Smilkov, et al 2016
Liu, et al 2017
Tasks we seek designs for in this paper
Buddy Plots
Concept Axis Plots
Co-occurance Matrices
1. Similarities (local distances)
2. Co-Occurrances
3. Concept Axes
Buddy Plots
Concept Axis Plots
Co-occurance Matrices
1. Similarities (local distances)
2. Co-Occurrances
3. Concept Axes
Similarities:
Understanding local distances
Distances are meaningful
even if absolute values are not
What is close to a word?
Are there groups of words that are similar?
Ordered lists are useful
Density (how many can you show)
Sense of relative distances
Comparison between words
Embedding Projector
Smilkov, et al. 2016
Buddy Plots (1D lists)
Alexander and Gleicher, 2016 – for Topic Models
Map distance (to selected reference) to horizontal axis
Reference
object
Closest
to ref
Not dimensionality reduction?
Map distance to word to horizontal axis
Focus on a single point – other relations not preserved
?
Embedding Projector
Smilkov, et al. 2016
Stacked/Chained buddy plots
Use color to encode distance
Stacked/Chained buddy plots
Use color to encode distance in the reference row
A
AB
B
Stacked/Chained buddy plots
Use color to encode distance in the reference
Same word… different embedding
Broadcast
Buddy Plots
Concept Axis Plots
Co-occurance Matrices
1. Similarities (local distances)
2. Co-Occurrances
3. Concept Axes
Why are words similar?
Understandng word co-occurence
Similarity based on co-occurrence
count how often one word occurs near another
Co-occurrence matrix
main form of input data
many models approximate the matrix (reconstruct)
Useful for understanding and diagnosing models
Co-occurrence view
How do we view the massive matrix?
color encoding (heat map) – density, relative values
select rows (specify words of interest)
select columns (metrics of interestingness – given rows)
highest values
highest variance
Co-occurrence matrix view
Matrix comparison view
High variance words in one embedding may be low in the other
Buddy Plots
Concept Axis Plots
Co-occurance Matrices
1. Similarities (local distances)
2. Co-Occurrances
3. Concept Axes
Concept Axes:
Understanding Semantic Directions
Opposing concepts
make an axis
Food
Pet
Concept Axes:
Understanding Concept Axes
Define an axis from
one concept to
another
Food
Pet
Dog
Cat
Goldfish
Chicken
Cow
Trout
Ways to define axes
Vector between two concepts
Interaxis - Kim et al., 2015
Classifier between two groups
Explainers – Gleicher, 2013
Food
Pet
Dog
Cat
Goldfish
Chicken
Cow
Trout
Multiple Concept Axes
Use multi-variate plots
2D = Scatterplot
Pet Food
SmallBig
Scatterplot Interaction
man woman
academic
worker
man woman
academic
worker
Application: Word meaning change
Application: Stability Assessment
Implementation
http://graphics.cs.wisc.edu/Vis/EmbVis/
Everything runs on line (simplified interfaces)
cloud version uses small models
Python backend / D3 frontend
Limitations
Implementation
Effectiveness
Completeness
Usability and Scalability
Evaluation (of designs)
More Tasks
Identifying Probes
Explicit Comparison
Connection to (model) evaluation
Feedback to model building
Summary:
Word vector embeddings offer unique challenges
Task analysis of needs Designs for 3 unmet needs
Buddy Plots
Concept Axis Plots
Co-occurance Matrices
Acknowledgments
This work is funded in part by NSF 1162037
and DARPA FA8750-17-2-0107
http://graphics.cs.wisc.edu/Vis/EmbVis/

Mais conteúdo relacionado

Mais procurados

Thinking in clustering yueshen xu
Thinking in clustering yueshen xuThinking in clustering yueshen xu
Thinking in clustering yueshen xuYueshen Xu
 
Report
ReportReport
Reportbutest
 
A Study on Transition of Logic Connectives to Induced Linked Fuzzy Relational...
A Study on Transition of Logic Connectives to Induced Linked Fuzzy Relational...A Study on Transition of Logic Connectives to Induced Linked Fuzzy Relational...
A Study on Transition of Logic Connectives to Induced Linked Fuzzy Relational...ijcoa
 
Assessment of Programming Language Reliability Utilizing Soft-Computing
Assessment of Programming Language Reliability Utilizing Soft-ComputingAssessment of Programming Language Reliability Utilizing Soft-Computing
Assessment of Programming Language Reliability Utilizing Soft-Computingijcsa
 
Contextual Definition Generation
Contextual Definition GenerationContextual Definition Generation
Contextual Definition GenerationSergey Sosnovsky
 
Visual mapping sentence_a_methodological (1)
Visual mapping sentence_a_methodological (1)Visual mapping sentence_a_methodological (1)
Visual mapping sentence_a_methodological (1)Natalia Djahi
 
先進網際服務系統1008
先進網際服務系統1008先進網際服務系統1008
先進網際服務系統1008youji0811
 
Learning Probabilistic Relational Models
Learning Probabilistic Relational ModelsLearning Probabilistic Relational Models
Learning Probabilistic Relational ModelsUniversity of Nantes
 
Order out of Chaos: Construction of Knowledge Models from PDF Textbooks
Order out of Chaos: Construction of Knowledge Models from PDF TextbooksOrder out of Chaos: Construction of Knowledge Models from PDF Textbooks
Order out of Chaos: Construction of Knowledge Models from PDF TextbooksIsaac Alpizar-Chacon
 
Latent Semantic Indexing For Information Retrieval
Latent Semantic Indexing For Information RetrievalLatent Semantic Indexing For Information Retrieval
Latent Semantic Indexing For Information RetrievalSudarsun Santhiappan
 
Random Generation of Relational Bayesian Networks
Random Generation of Relational Bayesian NetworksRandom Generation of Relational Bayesian Networks
Random Generation of Relational Bayesian NetworksUniversity of Nantes
 
Automated Education Propositional Logic Tool (AEPLT): Used For Computation in...
Automated Education Propositional Logic Tool (AEPLT): Used For Computation in...Automated Education Propositional Logic Tool (AEPLT): Used For Computation in...
Automated Education Propositional Logic Tool (AEPLT): Used For Computation in...CSCJournals
 
An Approach to Automated Learning of Conceptual Graphs from Text
An Approach to Automated Learning of Conceptual Graphs from TextAn Approach to Automated Learning of Conceptual Graphs from Text
An Approach to Automated Learning of Conceptual Graphs from TextFulvio Rotella
 
Designing learning activities_on_computa (1)
Designing learning activities_on_computa (1)Designing learning activities_on_computa (1)
Designing learning activities_on_computa (1)Natalia Djahi
 
An exact approach to learning Probabilistic Relational Model
An exact approach to learning Probabilistic Relational ModelAn exact approach to learning Probabilistic Relational Model
An exact approach to learning Probabilistic Relational ModelUniversity of Nantes
 
Data Integration Ontology Mapping
Data Integration Ontology MappingData Integration Ontology Mapping
Data Integration Ontology MappingPradeep B Pillai
 
A cognitive approach for Modelling and Reasoning on Commonsense Knowledge in...
A cognitive  approach for Modelling and Reasoning on Commonsense Knowledge in...A cognitive  approach for Modelling and Reasoning on Commonsense Knowledge in...
A cognitive approach for Modelling and Reasoning on Commonsense Knowledge in...Antonio Lieto
 

Mais procurados (19)

Thinking in clustering yueshen xu
Thinking in clustering yueshen xuThinking in clustering yueshen xu
Thinking in clustering yueshen xu
 
Report
ReportReport
Report
 
A Study on Transition of Logic Connectives to Induced Linked Fuzzy Relational...
A Study on Transition of Logic Connectives to Induced Linked Fuzzy Relational...A Study on Transition of Logic Connectives to Induced Linked Fuzzy Relational...
A Study on Transition of Logic Connectives to Induced Linked Fuzzy Relational...
 
Assessment of Programming Language Reliability Utilizing Soft-Computing
Assessment of Programming Language Reliability Utilizing Soft-ComputingAssessment of Programming Language Reliability Utilizing Soft-Computing
Assessment of Programming Language Reliability Utilizing Soft-Computing
 
Ontology matching
Ontology matchingOntology matching
Ontology matching
 
Contextual Definition Generation
Contextual Definition GenerationContextual Definition Generation
Contextual Definition Generation
 
Visual mapping sentence_a_methodological (1)
Visual mapping sentence_a_methodological (1)Visual mapping sentence_a_methodological (1)
Visual mapping sentence_a_methodological (1)
 
先進網際服務系統1008
先進網際服務系統1008先進網際服務系統1008
先進網際服務系統1008
 
Learning Probabilistic Relational Models
Learning Probabilistic Relational ModelsLearning Probabilistic Relational Models
Learning Probabilistic Relational Models
 
Zrm
ZrmZrm
Zrm
 
Order out of Chaos: Construction of Knowledge Models from PDF Textbooks
Order out of Chaos: Construction of Knowledge Models from PDF TextbooksOrder out of Chaos: Construction of Knowledge Models from PDF Textbooks
Order out of Chaos: Construction of Knowledge Models from PDF Textbooks
 
Latent Semantic Indexing For Information Retrieval
Latent Semantic Indexing For Information RetrievalLatent Semantic Indexing For Information Retrieval
Latent Semantic Indexing For Information Retrieval
 
Random Generation of Relational Bayesian Networks
Random Generation of Relational Bayesian NetworksRandom Generation of Relational Bayesian Networks
Random Generation of Relational Bayesian Networks
 
Automated Education Propositional Logic Tool (AEPLT): Used For Computation in...
Automated Education Propositional Logic Tool (AEPLT): Used For Computation in...Automated Education Propositional Logic Tool (AEPLT): Used For Computation in...
Automated Education Propositional Logic Tool (AEPLT): Used For Computation in...
 
An Approach to Automated Learning of Conceptual Graphs from Text
An Approach to Automated Learning of Conceptual Graphs from TextAn Approach to Automated Learning of Conceptual Graphs from Text
An Approach to Automated Learning of Conceptual Graphs from Text
 
Designing learning activities_on_computa (1)
Designing learning activities_on_computa (1)Designing learning activities_on_computa (1)
Designing learning activities_on_computa (1)
 
An exact approach to learning Probabilistic Relational Model
An exact approach to learning Probabilistic Relational ModelAn exact approach to learning Probabilistic Relational Model
An exact approach to learning Probabilistic Relational Model
 
Data Integration Ontology Mapping
Data Integration Ontology MappingData Integration Ontology Mapping
Data Integration Ontology Mapping
 
A cognitive approach for Modelling and Reasoning on Commonsense Knowledge in...
A cognitive  approach for Modelling and Reasoning on Commonsense Knowledge in...A cognitive  approach for Modelling and Reasoning on Commonsense Knowledge in...
A cognitive approach for Modelling and Reasoning on Commonsense Knowledge in...
 

Semelhante a Interactive Analysis of Word Vector Embeddings

The Essay Scoring Tool (TEST) for Hindi
The Essay Scoring Tool (TEST) for HindiThe Essay Scoring Tool (TEST) for Hindi
The Essay Scoring Tool (TEST) for Hindisinghg77
 
Chat bot using text similarity approach
Chat bot using text similarity approachChat bot using text similarity approach
Chat bot using text similarity approachdinesh_joshy
 
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...acijjournal
 
Semantic Based Model for Text Document Clustering with Idioms
Semantic Based Model for Text Document Clustering with IdiomsSemantic Based Model for Text Document Clustering with Idioms
Semantic Based Model for Text Document Clustering with IdiomsWaqas Tariq
 
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...Quinsulon Israel
 
Meaningful Interaction Analysis
Meaningful Interaction AnalysisMeaningful Interaction Analysis
Meaningful Interaction Analysisfridolin.wild
 
The Geometry of Learning
The Geometry of LearningThe Geometry of Learning
The Geometry of Learningfridolin.wild
 
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...Amit Sheth
 
IRJET- Short-Text Semantic Similarity using Glove Word Embedding
IRJET- Short-Text Semantic Similarity using Glove Word EmbeddingIRJET- Short-Text Semantic Similarity using Glove Word Embedding
IRJET- Short-Text Semantic Similarity using Glove Word EmbeddingIRJET Journal
 
5 Lessons Learned from Designing Neural Models for Information Retrieval
5 Lessons Learned from Designing Neural Models for Information Retrieval5 Lessons Learned from Designing Neural Models for Information Retrieval
5 Lessons Learned from Designing Neural Models for Information RetrievalBhaskar Mitra
 
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATAIDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATAijistjournal
 
Identifying the semantic relations on
Identifying the semantic relations onIdentifying the semantic relations on
Identifying the semantic relations onijistjournal
 
Diagrammatic knowledge modeling for managers – ontology-based approach
Diagrammatic knowledge modeling for managers  – ontology-based approachDiagrammatic knowledge modeling for managers  – ontology-based approach
Diagrammatic knowledge modeling for managers – ontology-based approachDmitry Kudryavtsev
 
Neural Models for Information Retrieval
Neural Models for Information RetrievalNeural Models for Information Retrieval
Neural Models for Information RetrievalBhaskar Mitra
 
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUECOMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUEJournal For Research
 
Inheritance of Handwriting Features
Inheritance of Handwriting FeaturesInheritance of Handwriting Features
Inheritance of Handwriting Featuresirjes
 

Semelhante a Interactive Analysis of Word Vector Embeddings (20)

The Duet model
The Duet modelThe Duet model
The Duet model
 
The Essay Scoring Tool (TEST) for Hindi
The Essay Scoring Tool (TEST) for HindiThe Essay Scoring Tool (TEST) for Hindi
The Essay Scoring Tool (TEST) for Hindi
 
Chat bot using text similarity approach
Chat bot using text similarity approachChat bot using text similarity approach
Chat bot using text similarity approach
 
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...
 
Semantic Based Model for Text Document Clustering with Idioms
Semantic Based Model for Text Document Clustering with IdiomsSemantic Based Model for Text Document Clustering with Idioms
Semantic Based Model for Text Document Clustering with Idioms
 
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
 
Meaningful Interaction Analysis
Meaningful Interaction AnalysisMeaningful Interaction Analysis
Meaningful Interaction Analysis
 
The Geometry of Learning
The Geometry of LearningThe Geometry of Learning
The Geometry of Learning
 
Eurolan 2005 Pedersen
Eurolan 2005 PedersenEurolan 2005 Pedersen
Eurolan 2005 Pedersen
 
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
 
IRJET- Short-Text Semantic Similarity using Glove Word Embedding
IRJET- Short-Text Semantic Similarity using Glove Word EmbeddingIRJET- Short-Text Semantic Similarity using Glove Word Embedding
IRJET- Short-Text Semantic Similarity using Glove Word Embedding
 
5 Lessons Learned from Designing Neural Models for Information Retrieval
5 Lessons Learned from Designing Neural Models for Information Retrieval5 Lessons Learned from Designing Neural Models for Information Retrieval
5 Lessons Learned from Designing Neural Models for Information Retrieval
 
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATAIDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA
 
Identifying the semantic relations on
Identifying the semantic relations onIdentifying the semantic relations on
Identifying the semantic relations on
 
Diagrammatic knowledge modeling for managers – ontology-based approach
Diagrammatic knowledge modeling for managers  – ontology-based approachDiagrammatic knowledge modeling for managers  – ontology-based approach
Diagrammatic knowledge modeling for managers – ontology-based approach
 
Neural Models for Information Retrieval
Neural Models for Information RetrievalNeural Models for Information Retrieval
Neural Models for Information Retrieval
 
Ontology
OntologyOntology
Ontology
 
Ontology learning
Ontology learningOntology learning
Ontology learning
 
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUECOMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
 
Inheritance of Handwriting Features
Inheritance of Handwriting FeaturesInheritance of Handwriting Features
Inheritance of Handwriting Features
 

Último

GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.pptamreenkhanum0307
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSINGmarianagonzalez07
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 

Último (20)

GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.ppt
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 

Interactive Analysis of Word Vector Embeddings

  • 1. Interactive Analysis of Word Vector Embeddings Florian Heimerl and Michael Gleicher Department of Computer Sciences University of Wisconsin - Madison
  • 2. Florian Heimerl and Michael Gleicher Department of Computer Sciences University of Wisconsin - Madison Interactive Analysis of Word Vector Embeddings
  • 3. Summary: Word vector embeddings offer unique challenges Task analysis of needs 3 Designs for unmet needs Buddy Plots Concept Axis Plots Co-occurance Matrices
  • 4. What is an embedding? General mathematics: Place a smaller structure into a larger structure Computer science: Place a discrete set of objects into a vector space Encode relationships between objects Paris 4,8,1,3,… Atlanta 5,2,1,7,…New York 4,8,1,3,… London 5,2,1,7,… Boston 3,2,5,1,… Tokyo 9,2,6,4,… Beijing 7,3,2,7,… San Jose 5,2,1,7,… Jakarta 3,2,5,1,… Sydney 9,2,6,4,… Munich 7,3,2,7,… High Dimensional Data Objects have associated Vectors
  • 5. Kinds of relationships in embeddings Distance A is closer to B than to C B C D A E Linear Structure A is to B as C is to D Semantic Directions A is more X than C B C D AF F E B A C D Relationships are interesting even if global positions are not
  • 6. Word vector embeddings Place words in a high-dimensional vector space Words similar in meaning should be close in space Infer similarity by distributional semantics: similar context implies similar meaning Construct embeddings by processing a corpus of text my pet cat is brown my pet dog is brown my big car is brown
  • 7. Several ways to build embeddings Word2Vec Skip-gram model Neural embedding GLoVE Co-occurrence model Factor matrix by optimization
  • 8. Why use Word Vector Embeddings? Learn about Language or Corpora (Texts) Find similar words/synonyms Track changes of word usage Exploring polysemy Creating lexical resources Evidence of bias … Natural Language Applications Pre-Processing Translation Sentiment Analysis Interpretation …
  • 9. Challenges of Word Vector Embeddings Large numbers of Words High-dimensional spaces Complex relationships – meaningless positions Complex processes for building embeddings Complex downstream applications No ground truth - subjective aspects
  • 10. Why visualization? Tasks are inherently human-centric Variety of tasks involving interpretation Linguistic and domain knowledge for applications But what are those tasks?
  • 11. Task Analysis: What do people do with Word Embeddings? Literature survey 111 papers from diverse communities Consider use cases in Linguistics, HCI, Digitial Humanities, etc. Augment this list by extrapolation what tasks would users want – but aren’t doing yet
  • 12. Use cases suggest tasks Learn about Language or Corpora (Texts) Find similar words/synonyms Track changes of word usage Exploring polysemy Creating lexical resources Natural Language Applications Pre-Processing Translation Sentiment Analysis Interpretation Evaluation Intrinsic (good embedding?) Extrinsic (applications success?) Interpretation Identify items of interest Probe values of interest
  • 13. Linguistic Tasks and Characteristics We identified 7 distinct linguistic tasks within the literature Two ways those tasks are relevant • Automatically test embeddings through human-curated resources • Users probe embedding 4 characteristics pertinent to those tasks: similarity, arithmetic structures, concept axis, and co-occurrences 13
  • 14. Prior Work: Word Vector Embedding Visualizations Liu et al., InfoVis 2017 Analogy Relationships Smilkov et al., 2016 Distance Relationships Rong and Adar, 2016 Training Process
  • 15. Tasks and Characteristics vs. Visualizations Rank word pairs similarity View neighbors similarity Select synonyms similarity Compare concepts average, similarity Find analogies offset, similarity Project on Concept concept axis Predict Contexts co-oc. probability Smilkov, et al 2016 Liu, et al 2017
  • 16. Tasks and Characteristics vs. Visualizations Rank word pairs similarity View neighbors similarity Select synonyms similarity Compare concepts average, similarity Find analogies offset, similarity Project on Concept concept axis Predict Contexts co-oc. probability Smilkov, et al 2016 Liu, et al 2017 Tasks we seek designs for in this paper
  • 17. Buddy Plots Concept Axis Plots Co-occurance Matrices 1. Similarities (local distances) 2. Co-Occurrances 3. Concept Axes
  • 18. Buddy Plots Concept Axis Plots Co-occurance Matrices 1. Similarities (local distances) 2. Co-Occurrances 3. Concept Axes
  • 19. Similarities: Understanding local distances Distances are meaningful even if absolute values are not What is close to a word? Are there groups of words that are similar? Ordered lists are useful Density (how many can you show) Sense of relative distances Comparison between words Embedding Projector Smilkov, et al. 2016
  • 20. Buddy Plots (1D lists) Alexander and Gleicher, 2016 – for Topic Models Map distance (to selected reference) to horizontal axis Reference object Closest to ref
  • 21. Not dimensionality reduction? Map distance to word to horizontal axis Focus on a single point – other relations not preserved ?
  • 23. Stacked/Chained buddy plots Use color to encode distance
  • 24. Stacked/Chained buddy plots Use color to encode distance in the reference row A AB B
  • 25. Stacked/Chained buddy plots Use color to encode distance in the reference
  • 26.
  • 27. Same word… different embedding Broadcast
  • 28. Buddy Plots Concept Axis Plots Co-occurance Matrices 1. Similarities (local distances) 2. Co-Occurrances 3. Concept Axes
  • 29. Why are words similar? Understandng word co-occurence Similarity based on co-occurrence count how often one word occurs near another Co-occurrence matrix main form of input data many models approximate the matrix (reconstruct) Useful for understanding and diagnosing models
  • 30. Co-occurrence view How do we view the massive matrix? color encoding (heat map) – density, relative values select rows (specify words of interest) select columns (metrics of interestingness – given rows) highest values highest variance
  • 32. Matrix comparison view High variance words in one embedding may be low in the other
  • 33. Buddy Plots Concept Axis Plots Co-occurance Matrices 1. Similarities (local distances) 2. Co-Occurrances 3. Concept Axes
  • 34. Concept Axes: Understanding Semantic Directions Opposing concepts make an axis Food Pet
  • 35. Concept Axes: Understanding Concept Axes Define an axis from one concept to another Food Pet Dog Cat Goldfish Chicken Cow Trout
  • 36. Ways to define axes Vector between two concepts Interaxis - Kim et al., 2015 Classifier between two groups Explainers – Gleicher, 2013 Food Pet Dog Cat Goldfish Chicken Cow Trout
  • 37. Multiple Concept Axes Use multi-variate plots 2D = Scatterplot Pet Food SmallBig
  • 40.
  • 42.
  • 43.
  • 46. Implementation http://graphics.cs.wisc.edu/Vis/EmbVis/ Everything runs on line (simplified interfaces) cloud version uses small models Python backend / D3 frontend
  • 47. Limitations Implementation Effectiveness Completeness Usability and Scalability Evaluation (of designs) More Tasks Identifying Probes Explicit Comparison Connection to (model) evaluation Feedback to model building
  • 48. Summary: Word vector embeddings offer unique challenges Task analysis of needs Designs for 3 unmet needs Buddy Plots Concept Axis Plots Co-occurance Matrices Acknowledgments This work is funded in part by NSF 1162037 and DARPA FA8750-17-2-0107 http://graphics.cs.wisc.edu/Vis/EmbVis/