Building Structured Data from
Product Descriptions
Keiji Shinzato
Product information extraction

An Italian product. This is a fruity
red wine that mainly consists of
sangiovese grapes of Tuscany.

Type | Red
Grape variety | Sangiovese
Region | Italy, Tuscany
2
Background

• Structured data play a crucial role in
making Rakuten a more attractive service.
– Faceted navigation, recommendation, and
market analysis.

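ベリンダ・コーリー キアンティ 2011 750ml
トスカーナ州キャンティ地区のサンジョベーゼ種を主体につくられる、イタリアを代表する赤ワインの一つ。
(English: One of Italy's signature red wines, made mainly from Sangiovese grapes of the Chianti district in Tuscany.)

Attribute | Value
Type | 赤 (Red)
Region | イタリア, トスカーナ州キャンティ地区 (Italy, Chianti district of Tuscany)
Grape | サンジョベーゼ (Sangiovese)
Vintage | 2011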

3
Faceted navigation

Reference: http://www.amazon.com/
4
Background

• An unsupervised methodology is required.
– 100 million products across 40,000 categories.

5
A table is a useful clue, but…

Product page including a table:
WINE > CHILE
Montes Alpha M 2009
Type | Red
Region | Chile
Grape | Cabernet sauvignon, Merlot, Cabernet franc, Petit verdot
Year | 2009

Product page consisting of sentences:
WINE > CHILE
Montes Alpha M 2009
Montes Alpha M is a blend of Cabernet Sauvignon, Merlot, Cabernet Franc, and Petit Verdot. A powerful wine with very good level of soft and rounded tannins. Intense dark red color. The wine is elegant and has a …
6
Product information extraction
WINE > CHILE

Montes Alpha M 2009
Montes Alpha M is a blend
of Cabernet Sauvignon,
Merlot, Cabernet Franc,
and Petit Verdot.
A powerful wine with very
good level of soft and
rounded tannins. Intense
dark red color. The wine is
elegant and has a very
well defined character. …

Product page (unstructured)

Attribute | Value
Type | Red
Region | Chile
Grape | Cabernet sauvignon, Merlot, Cabernet franc, Petit verdot
Vintage | 2009
Company | Montes

Structured data

• Issue 1: How do we know the attributes for a category?
• Issue 2: How do we extract attribute values from full
texts?
7
Attribute name collection
Analyze a large amount of table data
to collect the attributes of an object

Attribute values
Attribute names
of Wine

Reference: http://item.rakuten.co.jp/redbox/odm3000728/
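The slide does not say how the table data are analyzed. As a rough sketch under that caveat, one simple possibility is to count how often each attribute name appears across the tables of a category and keep the frequent ones. The function name `collect_attribute_names`, the `min_share` threshold, and the sample tables are all hypothetical.

```python
from collections import Counter


def collect_attribute_names(tables, min_share=0.5):
    """Return attribute names that appear in at least `min_share` of the tables."""
    counts = Counter()
    for table in tables:
        # Count each name once per table so long tables do not dominate.
        counts.update(set(table))
    threshold = min_share * len(tables)
    return sorted(name for name, n in counts.items() if n >= threshold)


# Hypothetical tables taken from wine product pages (attribute name -> value).
wine_tables = [
    {"Grape variety": "Chardonnay", "Region": "France", "Volume": "750ML"},
    {"Grape variety": "Merlot", "Region": "Spain", "Shipping": "Free"},
    {"Region": "Italy", "Volume": "720ML", "Winery": "Farnese"},
]

print(collect_attribute_names(wine_tables))
```

Names that show up in only one table, such as "Shipping", fall below the threshold and are dropped, which is one way to filter out rows that do not describe the product itself.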
8
Attribute value database (wine)
ぶどう品種 (Grape variety) | 内容量 (Volume) | 産地 (Region) | 生産者 (Winery) | 味わい (Taste)
Chardonnay | 750ML | France | Farnese | Dry
Chardonnay 100% | 720ML | Italy | Mas de Monistrol | Full body
Merlot | 375ML | Spain | Leroy | Medium body
Riesling | 500ML | Chile | M. Chapoutier | Slightly sweet
Syrah | 1500ML | German | Mastroberardino | Sweet
Grenache | 360ML | Australia | Santero | Medium dry
Merlot | 200ML | America | Saltarelli | Extremely sweet
Tempranillo | 3000ML | Bordeaux | Cavicchioli | Medium dry
Sangiovese | 1800ML | Champagne | Fontodi | Red Full body
Syrah100% | 1000ML | Argentina | Ca'Rugate | Middle sweet

Precision is high, but coverage is low.
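To illustrate why a value database alone gives low coverage, here is a minimal sketch of plain dictionary lookup against a product description. It is an illustration only, not the method evaluated in the talk; the function `lookup_values`, the small database, and the sample sentence are assumptions.

```python
def lookup_values(text, database):
    """Return the database values that literally occur in `text`, per attribute."""
    found = {}
    lowered = text.lower()
    for attribute, values in database.items():
        hits = [v for v in values if v.lower() in lowered]
        if hits:
            found[attribute] = hits
    return found


# Hypothetical excerpt of an attribute value database for wine.
database = {
    "Grape variety": ["Chardonnay", "Merlot", "Sangiovese"],
    "Region": ["France", "Italy", "Chile"],
}

text = "A blend of Merlot and Petit Verdot from Chile."
print(lookup_values(text, database))
```

The lookup finds "Merlot" and "Chile" because they are in the database, but it cannot find "Petit Verdot", which is not.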
9
Product information extraction
• Issue 1: How do we know the attributes for each category?
• Issue 2: How do we extract attribute values from product
descriptions?
10
Unsupervised attribute value extraction
- distant supervision approach

• Construction: build a database of attribute-value pairs from semi-structured data, e.g. <Region, Margaux>, <Color, White>.
• Annotation: annotate product pages that include entries in the database, e.g. the page for "Chateau d’Issan 1994": "This is a wine from Margaux. ..."
• Generation: generate extraction rules from the annotated pages, e.g. "wine from x ⇒ x is a Region". Rules are generated through a machine learning algorithm.
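The annotation step can be sketched as a string match between database values and the page text. This is a minimal illustration, assuming exact, case-sensitive matching; the function `annotate` and its tag format are hypothetical and only mimic the tagged corpus style shown in the talk.

```python
import re


def annotate(text, database):
    """Wrap every database value found in `text` with a tag named after its attribute."""
    value_to_attr = {v: attr for attr, values in database.items() for v in values}
    if not value_to_attr:
        return text
    # Try longer values first so a long value wins over a shorter one it contains.
    values = sorted(value_to_attr, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(v) for v in values))

    def tag(match):
        value = match.group(0)
        attr = value_to_attr[value]
        return f"<{attr}>{value}</{attr}>"

    return pattern.sub(tag, text)


# Database entries taken from the slide.
database = {"Region": ["Margaux"], "Color": ["White"]}

print(annotate("This is a wine from Margaux.", database))
```

Because the match is purely on the surface string, it also tags occurrences that do not express the attribute at all, so a corpus annotated this way contains some noise.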
11
Corpus with attribute-value annotations (wine)
• J: <産地>アルザス</産地>で最も香り豊かと言われるスパイシーで華やかなワイン。
E: A spicy and gorgeous wine that is said to have the richest aroma in <production_area> Alsace </production_area>.

• J: 最もお手頃で、<生産者>ドメーヌ・ペゴー</生産者>の美味しさを気軽に楽しめる、とっても嬉しい一本なのです
E: This is a very nice wine because we can easily enjoy the taste of <winery> Domaine Pegau </winery> at the best price.

• J: <ぶどう品種>ソーヴィニヨン・ブラン</ぶどう品種>種の特長がよく表れたワイン。
E: A wine in which the character of <grape_variety> Sauvignon Blanc </grape_variety> is well expressed.

• J: <タイプ>白</タイプ>身魚の塩焼きやシンプルな味付けのソテー、焼き牡蠣、豚のしょうが焼き、ボンゴレビアンコなどと。
E: Goes with salt-grilled <type> white </type>-fleshed fish, simply seasoned sautés, grilled oysters, ginger pork, vongole bianco, and so on.

12
Extraction rule generation
• Algorithm: Conditional random fields [Lafferty+ 2001]
• Chunk tag: Start/End (IOBES) model [Sekine+ 1998]
• Features:
– Token: Surface form of the token.
– Base: Base form of the token.
– PoS: Part-of-Speech tag of the token.
– Char. type: Types of characters in the token.
– Prefix: Double character prefix of the token.
– Suffix: Double character suffix of the token.
– The above features of ±3 tokens surrounding the token.

These features are frequently employed in the task of Japanese
named entity recognition.
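The chunk tags and the surface-level features can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the helper names are hypothetical, the character-type classes are simplified, and the Base and PoS features are left out because they need a morphological analyzer. The resulting feature dictionaries would then be passed to a CRF toolkit.

```python
def iobes_tags(n_tokens, start, end, label):
    """Tag tokens[start:end] (end exclusive) as one chunk; all other tokens get O."""
    tags = ["O"] * n_tokens
    length = end - start
    if length == 1:
        tags[start] = f"S-{label}"          # single-token chunk
    elif length > 1:
        tags[start] = f"B-{label}"          # chunk start
        for i in range(start + 1, end - 1):
            tags[i] = f"I-{label}"          # chunk inside
        tags[end - 1] = f"E-{label}"        # chunk end
    return tags


def char_type(token):
    """Very coarse character type; a real system would distinguish more classes."""
    if token.isdigit():
        return "digit"
    if token.isalpha():
        return "alpha"
    return "other"


def token_features(tokens, i, window=3):
    """Features of token i and of its neighbours within +/- `window` positions."""
    feats = {}
    for offset in range(-window, window + 1):
        j = i + offset
        if 0 <= j < len(tokens):
            tok = tokens[j]
            feats[f"{offset}:token"] = tok
            feats[f"{offset}:char_type"] = char_type(tok)
            feats[f"{offset}:prefix"] = tok[:2]   # double-character prefix
            feats[f"{offset}:suffix"] = tok[-2:]  # double-character suffix
    return feats


tokens = ["This", "is", "a", "wine", "from", "Margaux", "."]
print(iobes_tags(len(tokens), 5, 6, "Region"))
print(token_features(tokens, 5)["-1:token"])
```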
14
Unsupervised attribute value extraction
- distant supervision approach

Apply the rules to a new product page:
Terre di matraja Bianco 2012
This is a wine from Tuscany. ...

Rules:
• wine from x ⇒ x is a Region
• 1800 < x <= 2013 ⇒ x is a Vintage

Attribute | Value
Region | Tuscany
Vintage | 2012
Grape | Chardonnay
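The two example rules on this slide can be mimicked with hand-written patterns. In the talk the rules come from a learned model, so the regular expressions and the function `extract` below are only a hypothetical stand-in to show what applying a rule to a page looks like.

```python
import re


def extract(title, description):
    """Apply two hand-written stand-ins for the example rules on the slide."""
    result = {}

    # Rule: "wine from x" => x is a Region.
    match = re.search(r"wine from (\w+)", description)
    if match:
        result["Region"] = match.group(1)

    # Rule: 1800 < x <= 2013 => x is a Vintage (first four-digit number in range).
    for number in re.findall(r"\b\d{4}\b", f"{title} {description}"):
        if 1800 < int(number) <= 2013:
            result["Vintage"] = number
            break

    return result


print(extract("Terre di matraja Bianco 2012", "This is a wine from Tuscany. ..."))
```

The Grape value from the slide is not reproduced here because the slide does not show the rule that produced it.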
16
Performance (F-score)

Category | Without ML | With ML
Wine | 43.8 pt. | 60.1 pt.
Shampoo | 24.1 pt. | 71.5 pt.
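For reference, an F-score combines precision and recall into one number. The sketch below assumes the balanced F1 variant, which the slide does not state, and uses made-up counts rather than figures from the talk.

```python
def f_score(true_positives, false_positives, false_negatives):
    """Balanced F1: harmonic mean of precision and recall (0.0 if undefined)."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 6 correct values, 2 wrong extractions, 4 missed values.
print(round(100 * f_score(6, 2, 4), 1))
```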
17
Wine / Japanese

An Italian product. This is a fruity
red wine that mainly consists of
sangiovese grapes of Tuscany.

Type | Red
Grape variety | Sangiovese
Region | Italy, Tuscany
18
Shampoo / Japanese

"MCH Natural shampoo 1000ml" is a shampoo
consisting of cypress oil and charcoal.

Category | Shampoo
Product name | MCH Natural shampoo 1000ml
Ingredient | Cypress oil, Charcoal

19
Video game / French

Product type | Nintendo 64, Nintendo DS
Saga | Mario

20
Conclusion
• We are developing a technique for extracting product
information from unstructured data.
– It is independent of category and language.

• Useful services can be built on structured
product data.
• Our paper is available on the web.
– ACL anthology: http://aclweb.org/anthology//I/I13/

21
Thank you for listening!

22
