SlideShare uma empresa Scribd logo
1 de 42
Baixar para ler offline
graph
product
KDD 2018
OpenTag: Open Attribute Value
Extraction From Product Profiles
Guineng Zheng*, Subhabrata MukherjeeΔ, Xin Luna DongΔ, FeiFei Li*
ΔAmazon.com, *University of Utah
1
graph
product
KDD 2018
Motivation
2
Alexa, what are
the flavors of nescafe?
Nescafe Coffee flavors include
caramel, mocha, vanilla, coconut,
cappuccino, original/regular,
decaf, espresso, and cafe au lait
decaf.
graph
product
KDD 2018
Attribute value extraction from product profiles
3
Flavor Brand
graph
product
KDD 2018
Characteristics of Attribute Extraction
Open World Assumption
• No Predefined Attribute Value
• New Attribute Value Discovery
Limited semantics, irregular syntax
• Most titles have 10-15 words
• Most bullets have 5-6 words
• Phrases not Sentences
• Lack of regular grammatical
structure in titles and bullets
• Attribute stacking
1. Rachael Ray Nutrish Just 6 Natural Dry Dog
Food, Lamb Meal & Brown Rice Recipe
2. Lamb Meal is the #1 Ingredient
1. beef flavor
2. lamb flavor
3. venison flavor
4
graph
product
KDD 2018
Prior Work and Our Contributions
5
Open World
Assumption
No Lexicon,
No Hand-crafted
Features
Active Learning
Ghani et al. 2003, Putthividhya et al.
2011, Ling et al. 2012, Petrovski et
al. 2017
Huang et al. 2015, Kozareva et al.
2016
Kozareva et al. 2016, Lample et al.
2016, Ma et al. 2016
OpenTag (this work)
graph
product
KDD 2018
Outline
6
• Problem Definition
• Models
• Experiments
• Active Learning
• Experiments
graph
product
KDD 2018
Recap: Problem Statement
7
Input Product Profile Output Extractions
Title Description Bullets Flavor Brand …
CESAR Canine
Cuisine Variety
Pack Filet Mignon
& Porterhouse
Steak Dog Food
(Two 12-Count
Cases)
A Delectable Meaty Meal
for a Small Canine Looking
for the right food … This
delicious dog treat
contains tender slices of
meat in gravy and is
formulated to meet the
nutritional needs of small
dogs.
• Filet Mignon
Flavor;
• Porterhouse Steak
Flavor;
• CESAR Canine
Cuisine provides
complete and
balanced nutrition
…
1.filet mignon
2.porterhouse
steak
cesar
canine
cuisine
Given product profiles (e.g., titles, descriptions, bullets) and a set of
attributes: extract values of attributes from profile texts
graph
product
KDD 2018
Attribute Extraction as Sequence Tagging
w1 w2 w3 w4 w5
x
w6
beef meal & ranch raised lamb
w7
recipe
B I O E B I O E B I O E B I O E B I O E B I O E B I O E y
t1 t2 t3 t4 t5 t6 t7
B
I
O
E
Beginning of attribute value
Inside of attribute value
Outside of attribute value
End of attribute value
x={w1,w2,…,wn} input sequence
y={t1,t2,…,tn} tagging decision
{beef meal} {ranch raise lamb}Flavor Extractions
8
OUTPUT
INPUT
graph
product
KDD 2018
Outline
• Introduction
• Models
• BiLSTM
• BiLSTM + CRF
• Attention Mechanism
• OpenTag Architecture
• Active Learning
9
graph
product
KDD 2018
OpenTag Architecture
10
graph
product
KDD 2018
OpenTag Architecture (1/4): Word Embedding
11
Map ‘beef’, ‘chicken’, ‘pork’ to nearby points in Flavor– embedding
space
graph
product
KDD 2018
OpenTag Architecture (2/4): Bidirectional LSTM
12
Capture long and short range dependencies in input sequence via
forward and backward hidden states
graph
product
KDD 2018
OpenTag Architecture (3/4): CRF
13
• Bi-LSTM captures dependency between token sequences, but not
between output tags
• Conditional Random Field (CRF) enforces tagging consistency
graph
product
KDD 2018
OpenTag Architecture (4/4): Attention
14
• Focus on important hidden concepts, downweight the rest => attention!
• Attention matrix A to attend to important BiLSTM hidden states (ht)
• αt,tʹ ∈ A captures importance of ht w.r.t. htʹ
• Attention-focused representation lt of token xt given by:
graph
product
KDD 2018
OpenTag Architecture
15
graph
product
KDD 2018
Experimental Discussions: Datasets
16
graph
product
KDD 2018
Results
17
Overall, OpenTag obtains
high F-score of 82.8%
graph
product
KDD 2018
Results
18
- Highest improvement in F-score of 5.3% over
BiLSTM-CRF for product descriptions
- However, less accurate than titles
graph
product
KDD 2018
OpenTag discovers new attribute-values not seen during
training with 82.4% F-score
19
No overlap in attribute value between train and test splits
graph
product
KDD 2018
Interpretability via Attention
20
graph
product
KDD 2018
OpenTag achieves better concept clustering
Distribution of word vectors before attention Distribution of word vectors after attention
21
graph
product
KDD 2018
Semantically related words come closer in the
embedding space
22
graph
product
KDD 2018
Outline
• Introduction
• Models
• BiLSTM
• BiLSTM + CRF
• Attention Mechanism
• OpenTag Architecture
• Active Learning
23
graph
product
KDD 2018
Active Learning: Motivation
• Annotating training data is expensive and time-consuming
• Does not scale to thousands of verticals with hundreds of
attributes and thousands of values in each domain
24
graph
product
KDD 2018
Active Learning (Settles, 2009)
25
• Query selection strategy like uncertainty sampling selects
sample with highest uncertainty for annotation
• Ignores difficulty in estimating individual tags
graph
product
KDD 2018
Tag Flip as Query Strategy
• Simulate a committee of OpenTag learners over multiple epochs
• Most informative sample => major disagreement among committee
members for tags of its tokens across epochs
• Use dropout mechanism for simulating committee of learners
26
duck , fillet mignon and ranch raised lamb flavor
B O B E O B I E O
B O B O O O O B O
Tag flips = 4
• Most informative sample has highest tag flips across all the epochs
graph
product
KDD 2018
Tag Flip (red) better than Uncertainty Sampling (blue)
TF v.v. LC on detergent data TF v.v. LC on multi extraction
27
graph
product
KDD 2018
OpenTag reduces burden of human annotation by 3.3x
Learning from scratch on detergent data Learning from scratch on multi extraction
28
• OpenTag requires only 500 training samples to obtain > 90% P-R
• Active learning brings it down to 150 training samples to match
similar performance
graph
product
KDD 2018
Production Impact
Previous Coverage of
Existing Production
System (%)
OpenTag
Coverage (%)
Increase in
Coverage (%)
Attribute_1 23 78 53
Attribute_2 21 72 45
Attribute_3 < 1 56 50
Attribute_4 < 1 49 48
29
graph
product
KDD 2018
Summary
• OpenTag models open world assumption (OWA), multi-word and multiple
attribute value extraction with sequence tagging
• Word embeddings + Bi-LSTM + CRF + attention
• OpenTag + Active learning reduces burden of human annotation (by 3.3x)
• Method of tag flip as query strategy
• Interpretability
• Better concept clustering, attention heatmap, etc.
30
graph
product
KDD 2018
Summary
• OpenTag models open world assumption (OWA), multi-word and multiple
attribute value extraction with sequence tagging
• Word embeddings + Bi-LSTM + CRF + attention
• OpenTag + Active learning reduces burden of human annotation (by 3.3x)
• Method of tag flip as query strategy
• Interpretability
• Better concept clustering, attention heatmap, etc.
31
Thank you for your attention!
graph
product
KDD 2018
Backup Slides
32
graph
product
KDD 2018
Word Embedding
• Map words co-occurring in a similar context to nearby points in
embedding space
• Pre-trained embeddings learn single representation for each word
• But ‘duck’ as a Flavor should have different embedding than ‘duck’ as a Brand
• OpenTag learns word embeddings conditioned on attribute-tags
33
graph
product
KDD 2018
Bi-directional LSTM
• LSTM (Hochreiter, 1997) capture long and short range dependencies between
tokens, suitable for modeling token sequences
• Bi-directional LSTM’s improve over LSTM’s capturing both forward (ft) and
backward (bt) states at each timestep ‘t’
• Hidden state ht at each timestep generated as: ht = !([bt, ft])
34
graph
product
KDD 2018
Bi-directional LSTM
w1
ranch raised beef flavor
e1 e2 e3 e4
f1 f2 f3 f4
b1 b2 b3 b4
h1 h2 h3 h4
B I O E B I O E B I O E B I O E
Word Index
Word Embedding
glove embedding 50
Forward LSTM
100 units
Backward LSTM
100 units
Hidden Vector
100+100=200 units
Cross Entropy Loss
w2 w3 w4
35
graph
product
KDD 2018
Conditional Random Fields (CRF)
• Bi-LSTM captures dependency between token sequences, but not
between output tags
• Likelihood of a token-tag being ‘E’ (end) or ‘I’ (intermediate) increases, if
the previous token-tag was ‘I’ (intermediate)
• Given an input sequence x = {x1,x2, …, xn} with tags y = {y1, y2, …, yn}:
linear-chain CRF models:
graph
product
KDD 2018
Bi-directional LSTM + CRF
w1
ranch
w2
raised
w3
beef
w4
flavor
e1 e2 e3 e4
f1 f2 f3 f4
b1 b2 b3 b4
h1 h2 h3 h4
B I O E B I O E B I O E B I O E
Word Index
Embedding
glove embedding 50
Forward LSTM
100 units
Backward LSTM
100 units
Hidden Vector
100+100=200 units
Cross Entropy Loss
CRF Conditional Random Fields
CRF feature space formed by Bi-
LSTM hidden states
37
graph
product
KDD 2018
Attention Mechanism
• Not all hidden states equally important for the CRF
• Focus on important concepts, downweight the rest => attention!
• Attention matrix A to attend to important BiLSTM hidden states (ht)
• αt,tʹ ∈ A captures importance of ht w.r.t. htʹ
• Attention-focused representation lt of token xt given by:
38
h1 h2 h3 h4
l1 l2 l3 l4
graph
product
KDD 2018
Final Classification
39
Maximize log-likelihood of joint distribution
Best possible tag sequence with highest conditional probability
CRF feature space formed by attention-focused
representation of hidden states
graph
product
KDD 2018
Uncertainty Sampling: Probability as Query Strategy
• Select instance with maximum uncertainty
• Best possible tag sequence from CRF:
• Label instance with maximum uncertainty:
• Considers entire label sequence y, ignores difficulty in estimating
individual tags yt ∈ y
40
graph
product
KDD 2018
Tag Flip as Query Strategy
• Most informative instance has maximum tag flips aggregated over all
of its tokens across all the epochs:
• Top B samples with the highest number of flips are manually
annotated with tags
duck , fillet mignon and ranch raised lamb flavor
B O B E O B I E O
B O B O O O O B O
Tag flips = 4
41
graph
product
KDD 2018
Multiple attribute values
• Predicting multiple attribute values jointly
• Modify tagging strategy to have separate tag-set {Ba, Ia, Oa, Ea} for
each attribute ‘a’
42

Mais conteúdo relacionado

Semelhante a OpenTag: Open Attribute Value Extraction From Product Profiles

Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...
Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...
Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...Sumeet Singh
 
(ATS6-PLAT03) What's behind Discngine collections
(ATS6-PLAT03) What's behind Discngine collections(ATS6-PLAT03) What's behind Discngine collections
(ATS6-PLAT03) What's behind Discngine collectionsBIOVIA
 
Spark Summit EU talk by Herman van Hovell
Spark Summit EU talk by Herman van HovellSpark Summit EU talk by Herman van Hovell
Spark Summit EU talk by Herman van HovellSpark Summit
 
Triton and Symbolic execution on GDB@DEF CON China
Triton and Symbolic execution on GDB@DEF CON ChinaTriton and Symbolic execution on GDB@DEF CON China
Triton and Symbolic execution on GDB@DEF CON ChinaWei-Bo Chen
 
DutchMLSchool. Automating Decision Making
DutchMLSchool. Automating Decision MakingDutchMLSchool. Automating Decision Making
DutchMLSchool. Automating Decision MakingBigML, Inc
 
Jornadas gvSIG 2009 WSS English
Jornadas gvSIG 2009 WSS EnglishJornadas gvSIG 2009 WSS English
Jornadas gvSIG 2009 WSS Englishsabueso81
 
AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)Paul Chao
 
Elegant Graphics for Data Analysis with ggplot2
Elegant Graphics for Data Analysis with ggplot2Elegant Graphics for Data Analysis with ggplot2
Elegant Graphics for Data Analysis with ggplot2yannabraham
 
Leveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the CloudLeveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the CloudDatabricks
 
A look inside pandas design and development
A look inside pandas design and developmentA look inside pandas design and development
A look inside pandas design and developmentWes McKinney
 
A Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.pptA Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.pptSanket Shikhar
 
Designing Architecture-aware Library using Boost.Proto
Designing Architecture-aware Library using Boost.ProtoDesigning Architecture-aware Library using Boost.Proto
Designing Architecture-aware Library using Boost.ProtoJoel Falcou
 
Coding Like the Wind - Tips and Tricks for the Microsoft Visual Studio 2012 C...
Coding Like the Wind - Tips and Tricks for the Microsoft Visual Studio 2012 C...Coding Like the Wind - Tips and Tricks for the Microsoft Visual Studio 2012 C...
Coding Like the Wind - Tips and Tricks for the Microsoft Visual Studio 2012 C...Rainer Stropek
 
Serving predictive models with Redis
Serving predictive models with RedisServing predictive models with Redis
Serving predictive models with RedisTague Griffith
 
DataMass Summit - Machine Learning for Big Data in SQL Server
DataMass Summit - Machine Learning for Big Data  in SQL ServerDataMass Summit - Machine Learning for Big Data  in SQL Server
DataMass Summit - Machine Learning for Big Data in SQL ServerŁukasz Grala
 
Apache Pinot Meetup Sept02, 2020
Apache Pinot Meetup Sept02, 2020Apache Pinot Meetup Sept02, 2020
Apache Pinot Meetup Sept02, 2020Mayank Shrivastava
 
Evolving a Clean, Pragmatic Architecture at JBCNConf 2019
Evolving a Clean, Pragmatic Architecture at JBCNConf 2019Evolving a Clean, Pragmatic Architecture at JBCNConf 2019
Evolving a Clean, Pragmatic Architecture at JBCNConf 2019Victor Rentea
 
XConf 2022 - Code As Data: How data insights on legacy codebases can fill the...
XConf 2022 - Code As Data: How data insights on legacy codebases can fill the...XConf 2022 - Code As Data: How data insights on legacy codebases can fill the...
XConf 2022 - Code As Data: How data insights on legacy codebases can fill the...Alessandro Confetti
 
Scikit-Learn: Machine Learning in Python
Scikit-Learn: Machine Learning in PythonScikit-Learn: Machine Learning in Python
Scikit-Learn: Machine Learning in PythonMicrosoft
 

Semelhante a OpenTag: Open Attribute Value Extraction From Product Profiles (20)

Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...
Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...
Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...
 
(ATS6-PLAT03) What's behind Discngine collections
(ATS6-PLAT03) What's behind Discngine collections(ATS6-PLAT03) What's behind Discngine collections
(ATS6-PLAT03) What's behind Discngine collections
 
Spark Summit EU talk by Herman van Hovell
Spark Summit EU talk by Herman van HovellSpark Summit EU talk by Herman van Hovell
Spark Summit EU talk by Herman van Hovell
 
Triton and Symbolic execution on GDB@DEF CON China
Triton and Symbolic execution on GDB@DEF CON ChinaTriton and Symbolic execution on GDB@DEF CON China
Triton and Symbolic execution on GDB@DEF CON China
 
DutchMLSchool. Automating Decision Making
DutchMLSchool. Automating Decision MakingDutchMLSchool. Automating Decision Making
DutchMLSchool. Automating Decision Making
 
Ml5
Ml5Ml5
Ml5
 
Jornadas gvSIG 2009 WSS English
Jornadas gvSIG 2009 WSS EnglishJornadas gvSIG 2009 WSS English
Jornadas gvSIG 2009 WSS English
 
AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)
 
Elegant Graphics for Data Analysis with ggplot2
Elegant Graphics for Data Analysis with ggplot2Elegant Graphics for Data Analysis with ggplot2
Elegant Graphics for Data Analysis with ggplot2
 
Leveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the CloudLeveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the Cloud
 
A look inside pandas design and development
A look inside pandas design and developmentA look inside pandas design and development
A look inside pandas design and development
 
A Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.pptA Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.ppt
 
Designing Architecture-aware Library using Boost.Proto
Designing Architecture-aware Library using Boost.ProtoDesigning Architecture-aware Library using Boost.Proto
Designing Architecture-aware Library using Boost.Proto
 
Coding Like the Wind - Tips and Tricks for the Microsoft Visual Studio 2012 C...
Coding Like the Wind - Tips and Tricks for the Microsoft Visual Studio 2012 C...Coding Like the Wind - Tips and Tricks for the Microsoft Visual Studio 2012 C...
Coding Like the Wind - Tips and Tricks for the Microsoft Visual Studio 2012 C...
 
Serving predictive models with Redis
Serving predictive models with RedisServing predictive models with Redis
Serving predictive models with Redis
 
DataMass Summit - Machine Learning for Big Data in SQL Server
DataMass Summit - Machine Learning for Big Data  in SQL ServerDataMass Summit - Machine Learning for Big Data  in SQL Server
DataMass Summit - Machine Learning for Big Data in SQL Server
 
Apache Pinot Meetup Sept02, 2020
Apache Pinot Meetup Sept02, 2020Apache Pinot Meetup Sept02, 2020
Apache Pinot Meetup Sept02, 2020
 
Evolving a Clean, Pragmatic Architecture at JBCNConf 2019
Evolving a Clean, Pragmatic Architecture at JBCNConf 2019Evolving a Clean, Pragmatic Architecture at JBCNConf 2019
Evolving a Clean, Pragmatic Architecture at JBCNConf 2019
 
XConf 2022 - Code As Data: How data insights on legacy codebases can fill the...
XConf 2022 - Code As Data: How data insights on legacy codebases can fill the...XConf 2022 - Code As Data: How data insights on legacy codebases can fill the...
XConf 2022 - Code As Data: How data insights on legacy codebases can fill the...
 
Scikit-Learn: Machine Learning in Python
Scikit-Learn: Machine Learning in PythonScikit-Learn: Machine Learning in Python
Scikit-Learn: Machine Learning in Python
 

Mais de Subhabrata Mukherjee

XtremeDistil: Multi-stage Distillation for Massive Multilingual Models
XtremeDistil: Multi-stage Distillation for Massive Multilingual ModelsXtremeDistil: Multi-stage Distillation for Massive Multilingual Models
XtremeDistil: Multi-stage Distillation for Massive Multilingual ModelsSubhabrata Mukherjee
 
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...Subhabrata Mukherjee
 
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...Subhabrata Mukherjee
 
Continuous Experience-aware Language Model
Continuous Experience-aware Language ModelContinuous Experience-aware Language Model
Continuous Experience-aware Language ModelSubhabrata Mukherjee
 
Experience aware Item Recommendation in Evolving Review Communities
Experience aware Item Recommendation in Evolving Review CommunitiesExperience aware Item Recommendation in Evolving Review Communities
Experience aware Item Recommendation in Evolving Review CommunitiesSubhabrata Mukherjee
 
Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construc...
Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construc...Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construc...
Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construc...Subhabrata Mukherjee
 
Leveraging Joint Interactions for Credibility Analysis in News Communities
Leveraging Joint Interactions for Credibility Analysis in News CommunitiesLeveraging Joint Interactions for Credibility Analysis in News Communities
Leveraging Joint Interactions for Credibility Analysis in News CommunitiesSubhabrata Mukherjee
 
People on Drugs: Credibility of User Statements in Health Forums
People on Drugs: Credibility of User Statements in Health ForumsPeople on Drugs: Credibility of User Statements in Health Forums
People on Drugs: Credibility of User Statements in Health ForumsSubhabrata Mukherjee
 
Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...
Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...
Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...Subhabrata Mukherjee
 
Joint Author Sentiment Topic Model
Joint Author Sentiment Topic ModelJoint Author Sentiment Topic Model
Joint Author Sentiment Topic ModelSubhabrata Mukherjee
 
TwiSent: A Multi-Stage System for Analyzing Sentiment in Twitter
TwiSent: A Multi-Stage System for Analyzing Sentiment in TwitterTwiSent: A Multi-Stage System for Analyzing Sentiment in Twitter
TwiSent: A Multi-Stage System for Analyzing Sentiment in TwitterSubhabrata Mukherjee
 
Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...
Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...
Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...Subhabrata Mukherjee
 
Leveraging Sentiment to Compute Word Similarity
Leveraging Sentiment to Compute Word SimilarityLeveraging Sentiment to Compute Word Similarity
Leveraging Sentiment to Compute Word SimilaritySubhabrata Mukherjee
 
WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...
WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...
WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...Subhabrata Mukherjee
 
Feature specific analysis of reviews
Feature specific analysis of reviewsFeature specific analysis of reviews
Feature specific analysis of reviewsSubhabrata Mukherjee
 
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...Subhabrata Mukherjee
 
Sentiment Analysis in Twitter with Lightweight Discourse Analysis
Sentiment Analysis in Twitter with Lightweight Discourse AnalysisSentiment Analysis in Twitter with Lightweight Discourse Analysis
Sentiment Analysis in Twitter with Lightweight Discourse AnalysisSubhabrata Mukherjee
 

Mais de Subhabrata Mukherjee (18)

XtremeDistil: Multi-stage Distillation for Massive Multilingual Models
XtremeDistil: Multi-stage Distillation for Massive Multilingual ModelsXtremeDistil: Multi-stage Distillation for Massive Multilingual Models
XtremeDistil: Multi-stage Distillation for Massive Multilingual Models
 
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...
 
Fact Checking from Text
Fact Checking from TextFact Checking from Text
Fact Checking from Text
 
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co...
 
Continuous Experience-aware Language Model
Continuous Experience-aware Language ModelContinuous Experience-aware Language Model
Continuous Experience-aware Language Model
 
Experience aware Item Recommendation in Evolving Review Communities
Experience aware Item Recommendation in Evolving Review CommunitiesExperience aware Item Recommendation in Evolving Review Communities
Experience aware Item Recommendation in Evolving Review Communities
 
Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construc...
Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construc...Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construc...
Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construc...
 
Leveraging Joint Interactions for Credibility Analysis in News Communities
Leveraging Joint Interactions for Credibility Analysis in News CommunitiesLeveraging Joint Interactions for Credibility Analysis in News Communities
Leveraging Joint Interactions for Credibility Analysis in News Communities
 
People on Drugs: Credibility of User Statements in Health Forums
People on Drugs: Credibility of User Statements in Health ForumsPeople on Drugs: Credibility of User Statements in Health Forums
People on Drugs: Credibility of User Statements in Health Forums
 
Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...
Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...
Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...
 
Joint Author Sentiment Topic Model
Joint Author Sentiment Topic ModelJoint Author Sentiment Topic Model
Joint Author Sentiment Topic Model
 
TwiSent: A Multi-Stage System for Analyzing Sentiment in Twitter
TwiSent: A Multi-Stage System for Analyzing Sentiment in TwitterTwiSent: A Multi-Stage System for Analyzing Sentiment in Twitter
TwiSent: A Multi-Stage System for Analyzing Sentiment in Twitter
 
Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...
Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...
Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...
 
Leveraging Sentiment to Compute Word Similarity
Leveraging Sentiment to Compute Word SimilarityLeveraging Sentiment to Compute Word Similarity
Leveraging Sentiment to Compute Word Similarity
 
WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...
WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...
WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...
 
Feature specific analysis of reviews
Feature specific analysis of reviewsFeature specific analysis of reviews
Feature specific analysis of reviews
 
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
 
Sentiment Analysis in Twitter with Lightweight Discourse Analysis
Sentiment Analysis in Twitter with Lightweight Discourse AnalysisSentiment Analysis in Twitter with Lightweight Discourse Analysis
Sentiment Analysis in Twitter with Lightweight Discourse Analysis
 

Último

module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learninglevieagacer
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...Monika Rani
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspectsmuralinath2
 
An introduction on sequence tagged site mapping
An introduction on sequence tagged site mappingAn introduction on sequence tagged site mapping
An introduction on sequence tagged site mappingadibshanto115
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...Scintica Instrumentation
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxSuji236384
 
Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Silpa
 
Use of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxUse of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxRenuJangid3
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....muralinath2
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsSérgio Sacani
 
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxClimate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxDiariAli
 
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flypumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flyPRADYUMMAURYA1
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsOrtegaSyrineMay
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformationAreesha Ahmad
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .Poonam Aher Patil
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxseri bangash
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)Areesha Ahmad
 

Último (20)

module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspects
 
An introduction on sequence tagged site mapping
An introduction on sequence tagged site mappingAn introduction on sequence tagged site mapping
An introduction on sequence tagged site mapping
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
 
Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.
 
Use of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxUse of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptx
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxClimate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
 
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flypumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its Functions
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 

OpenTag: Open Attribute Value Extraction From Product Profiles

  • 1. graph product KDD 2018 OpenTag: Open Attribute Value Extraction From Product Profiles Guineng Zheng*, Subhabrata MukherjeeΔ, Xin Luna DongΔ, FeiFei Li* ΔAmazon.com, *University of Utah 1
  • 2. graph product KDD 2018 Motivation 2 Alexa, what are the flavors of nescafe? Nescafe Coffee flavors include caramel, mocha, vanilla, coconut, cappuccino, original/regular, decaf, espresso, and cafe au lait decaf.
  • 3. graph product KDD 2018 Attribute value extraction from product profiles 3 Flavor Brand
  • 4. graph product KDD 2018 Characteristics of Attribute Extraction Open World Assumption • No Predefined Attribute Value • New Attribute Value Discovery Limited semantics, irregular syntax • Most titles have 10-15 words • Most bullets have 5-6 words • Phrases not Sentences • Lack of regular grammatical structure in titles and bullets • Attribute stacking 1. Rachael Ray Nutrish Just 6 Natural Dry Dog Food, Lamb Meal & Brown Rice Recipe 2. Lamb Meal is the #1 Ingredient 1. beef flavor 2. lamb flavor 3. venison flavor 4
  • 5. graph product KDD 2018 Prior Work and Our Contributions 5 Open World Assumption No Lexicon, No Hand-crafted Features Active Learning Ghani et al. 2003, Putthividhya et al. 2011, Ling et al. 2012, Petrovski et al. 2017 Huang et al. 2015, Kozareva et al. 2016 Kozareva et al. 2016, Lample et al. 2016, Ma et al. 2016 OpenTag (this work)
  • 6. graph product KDD 2018 Outline 6 • Problem Definition • Models • Experiments • Active Learning • Experiments
  • 7. graph product KDD 2018 Recap: Problem Statement 7 Input Product Profile Output Extractions Title Description Bullets Flavor Brand … CESAR Canine Cuisine Variety Pack Filet Mignon & Porterhouse Steak Dog Food (Two 12-Count Cases) A Delectable Meaty Meal for a Small Canine Looking for the right food … This delicious dog treat contains tender slices of meat in gravy and is formulated to meet the nutritional needs of small dogs. • Filet Mignon Flavor; • Porterhouse Steak Flavor; • CESAR Canine Cuisine provides complete and balanced nutrition … 1.filet mignon 2.porterhouse steak cesar canine cuisine Given product profiles (e.g., titles, descriptions, bullets) and a set of attributes: extract values of attributes from profile texts
  • 8. graph product KDD 2018 Attribute Extraction as Sequence Tagging w1 w2 w3 w4 w5 x w6 beef meal & ranch raised lamb w7 recipe B I O E B I O E B I O E B I O E B I O E B I O E B I O E y t1 t2 t3 t4 t5 t6 t7 B I O E Beginning of attribute value Inside of attribute value Outside of attribute value End of attribute value x={w1,w2,…,wn} input sequence y={t1,t2,…,tn} tagging decision {beef meal} {ranch raise lamb}Flavor Extractions 8 OUTPUT INPUT
  • 9. graph product KDD 2018 Outline • Introduction • Models • BiLSTM • BiLSTM + CRF • Attention Mechanism • OpenTag Architecture • Active Learning 9
  • 11. graph product KDD 2018 OpenTag Architecture (1/4): Word Embedding 11 Map ‘beef’, ‘chicken’, ‘pork’ to nearby points in Flavor– embedding space
  • 12. graph product KDD 2018 OpenTag Architecture (2/4): Bidirectional LSTM 12 Capture long and short range dependencies in input sequence via forward and backward hidden states
  • 13. graph product KDD 2018 OpenTag Architecture (3/4): CRF 13 • Bi-LSTM captures dependency between token sequences, but not between output tags • Conditional Random Field (CRF) enforces tagging consistency
  • 14. graph product KDD 2018 OpenTag Architecture (4/4): Attention 14 • Focus on important hidden concepts, downweight the rest => attention! • Attention matrix A to attend to important BiLSTM hidden states (ht) • αt,tʹ ∈ A captures importance of ht w.r.t. htʹ • Attention-focused representation lt of token xt given by:
  • 17. graph product KDD 2018 Results 17 Overall, OpenTag obtains high F-score of 82.8%
  • 18. graph product KDD 2018 Results 18 - Highest improvement in F-score of 5.3% over BiLSTM-CRF for product descriptions - However, less accurate than titles
  • 19. graph product KDD 2018 OpenTag discovers new attribute-values not seen during training with 82.4% F-score 19 No overlap in attribute value between train and test splits
  • 21. graph product KDD 2018 OpenTag achieves better concept clustering Distribution of word vectors before attention Distribution of word vectors after attention 21
  • 22. graph product KDD 2018 Semantically related words come closer in the embedding space 22
  • 23. graph product KDD 2018 Outline • Introduction • Models • BiLSTM • BiLSTM + CRF • Attention Mechanism • OpenTag Architecture • Active Learning 23
  • 24. graph product KDD 2018 Active Learning: Motivation • Annotating training data is expensive and time-consuming • Does not scale to thousands of verticals with hundreds of attributes and thousands of values in each domain 24
  • 25. graph product KDD 2018 Active Learning (Settles, 2009) 25 • Query selection strategy like uncertainty sampling selects sample with highest uncertainty for annotation • Ignores difficulty in estimating individual tags
  • 26. graph product KDD 2018 Tag Flip as Query Strategy • Simulate a committee of OpenTag learners over multiple epochs • Most informative sample => major disagreement among committee members for tags of its tokens across epochs • Use dropout mechanism for simulating committee of learners 26 duck , fillet mignon and ranch raised lamb flavor B O B E O B I E O B O B O O O O B O Tag flips = 4 • Most informative sample has highest tag flips across all the epochs
  • 27. graph product KDD 2018 Tag Flip (red) better than Uncertainty Sampling (blue) TF v.v. LC on detergent data TF v.v. LC on multi extraction 27
  • 28. graph product KDD 2018 OpenTag reduces burden of human annotation by 3.3x Learning from scratch on detergent data Learning from scratch on multi extraction 28 • OpenTag requires only 500 training samples to obtain > 90% P-R • Active learning brings it down to 150 training samples to match similar performance
  • 29. graph product KDD 2018 Production Impact Previous Coverage of Existing Production System (%) OpenTag Coverage (%) Increase in Coverage (%) Attribute_1 23 78 53 Attribute_2 21 72 45 Attribute_3 < 1 56 50 Attribute_4 < 1 49 48 29
  • 30. graph product KDD 2018 Summary • OpenTag models open world assumption (OWA), multi-word and multiple attribute value extraction with sequence tagging • Word embeddings + Bi-LSTM + CRF + attention • OpenTag + Active learning reduces burden of human annotation (by 3.3x) • Method of tag flip as query strategy • Interpretability • Better concept clustering, attention heatmap, etc. 30
  • 31. graph product KDD 2018 Summary • OpenTag models open world assumption (OWA), multi-word and multiple attribute value extraction with sequence tagging • Word embeddings + Bi-LSTM + CRF + attention • OpenTag + Active learning reduces burden of human annotation (by 3.3x) • Method of tag flip as query strategy • Interpretability • Better concept clustering, attention heatmap, etc. 31 Thank you for your attention!
  • 33. graph product KDD 2018 Word Embedding • Map words co-occurring in a similar context to nearby points in embedding space • Pre-trained embeddings learn single representation for each word • But ‘duck’ as a Flavor should have different embedding than ‘duck’ as a Brand • OpenTag learns word embeddings conditioned on attribute-tags 33
  • 34. graph product KDD 2018 Bi-directional LSTM • LSTM (Hochreiter, 1997) capture long and short range dependencies between tokens, suitable for modeling token sequences • Bi-directional LSTM’s improve over LSTM’s capturing both forward (ft) and backward (bt) states at each timestep ‘t’ • Hidden state ht at each timestep generated as: ht = !([bt, ft]) 34
  • 35. graph product KDD 2018 Bi-directional LSTM w1 ranch raised beef flavor e1 e2 e3 e4 f1 f2 f3 f4 b1 b2 b3 b4 h1 h2 h3 h4 B I O E B I O E B I O E B I O E Word Index Word Embedding glove embedding 50 Forward LSTM 100 units Backward LSTM 100 units Hidden Vector 100+100=200 units Cross Entropy Loss w2 w3 w4 35
  • 36. graph product KDD 2018 Conditional Random Fields (CRF) • Bi-LSTM captures dependency between token sequences, but not between output tags • Likelihood of a token-tag being ‘E’ (end) or ‘I’ (intermediate) increases, if the previous token-tag was ‘I’ (intermediate) • Given an input sequence x = {x1,x2, …, xn} with tags y = {y1, y2, …, yn}: linear-chain CRF models:
  • 37. graph product KDD 2018 Bi-directional LSTM + CRF w1 ranch w2 raised w3 beef w4 flavor e1 e2 e3 e4 f1 f2 f3 f4 b1 b2 b3 b4 h1 h2 h3 h4 B I O E B I O E B I O E B I O E Word Index Embedding glove embedding 50 Forward LSTM 100 units Backward LSTM 100 units Hidden Vector 100+100=200 units Cross Entropy Loss CRF Conditional Random Fields CRF feature space formed by Bi- LSTM hidden states 37
  • 38. graph product KDD 2018 Attention Mechanism • Not all hidden states equally important for the CRF • Focus on important concepts, downweight the rest => attention! • Attention matrix A to attend to important BiLSTM hidden states (ht) • αt,tʹ ∈ A captures importance of ht w.r.t. htʹ • Attention-focused representation lt of token xt given by: 38 h1 h2 h3 h4 l1 l2 l3 l4
  • 39. graph product KDD 2018 Final Classification 39 Maximize log-likelihood of joint distribution Best possible tag sequence with highest conditional probability CRF feature space formed by attention-focused representation of hidden states
  • 40. graph product KDD 2018 Uncertainty Sampling: Probability as Query Strategy • Select instance with maximum uncertainty • Best possible tag sequence from CRF: • Label instance with maximum uncertainty: • Considers entire label sequence y, ignores difficulty in estimating individual tags yt ∈ y 40
  • 41. graph product KDD 2018 Tag Flip as Query Strategy • Most informative instance has maximum tag flips aggregated over all of its tokens across all the epochs: • Top B samples with the highest number of flips are manually annotated with tags duck , fillet mignon and ranch raised lamb flavor B O B E O B I E O B O B O O O O B O Tag flips = 4 41
  • 42. graph product KDD 2018 Multiple attribute values • Predicting multiple attribute values jointly • Modify tagging strategy to have separate tag-set {Ba, Ia, Oa, Ea} for each attribute ‘a’ 42