Distributed Representations of Words and
Phrases and their Compositionality
Nagaoka University of Technology, Natural Language Processing Laboratory
Kanji Takahashi
Mikolov, T., Sutskever, I., Chen, K., Corrado, G., & Dean, J. (2013).
Distributed Representations of Words and Phrases and their
Compositionality. Advances in Neural Information Processing
Systems 26 (NIPS 2013)
Figures from 「word2vecによる自然言語処理」 (Natural Language Processing with word2vec) are used
Paper introduction, April 13, 2016
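Overview
•The word2vec paper by Mikolov et al.
•Compared with the earlier model, training is faster and the vectors are more accurate
•Also learns representations for phrases
➢ e.g. the meaning of “Air Canada” cannot be obtained by simply combining “Air” and “Canada”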
Introduction
•Representing words as vectors has been studied since 1986
•Mikolov et al. (2013) proposed the Skip-gram model
•vec(“Madrid”) - vec(“Spain”) + vec(“France”) ≈ vec(“Paris”)
Skip-gram model (Mikolov et al., 2013)
•Predicts the words in the context of an input word
•This paper extends that model
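Skip-gram model
•Given a word sequence $w_1, w_2, w_3, \ldots, w_T$ and a context size $c$, training maximizes the average log probability $\frac{1}{T}\sum_{t=1}^{T}\sum_{-c \le j \le c,\, j \ne 0} \log p(w_{t+j} \mid w_t)$
•$p(w_O \mid w_I)$ is a softmax over the whole vocabulary: $\frac{\exp(v'^{\top}_{w_O} v_{w_I})}{\sum_{w=1}^{W} \exp(v'^{\top}_{w} v_{w_I})}$
•The vocabulary size $W$ ($10^5$ to $10^7$) is so large that computing this softmax and its gradient is impractical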
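To make the objective concrete, here is a minimal Python sketch, assuming a fixed window of size c and toy list-of-floats vectors; the names `skipgram_pairs` and `full_softmax` are hypothetical, not from the paper or any library. It enumerates the (input, context) pairs the objective sums over and computes the full softmax p(w_O | w_I), which needs one dot product per vocabulary word, i.e. O(W) work for every training pair.

```python
import math


def skipgram_pairs(words, c):
    """Return every (input, context) pair the Skip-gram objective sums over:
    each position t paired with the words at offsets -c..c (j != 0),
    skipping offsets that fall outside the sequence."""
    pairs = []
    for t, center in enumerate(words):
        for j in range(-c, c + 1):
            if j == 0 or not 0 <= t + j < len(words):
                continue
            pairs.append((center, words[t + j]))
    return pairs


def full_softmax(v_in, out_vectors):
    """p(w_O | w_I) for every word in the vocabulary.
    One dot product per output vector, so the cost is linear in W."""
    scores = [sum(a * b for a, b in zip(v_in, v_out)) for v_out in out_vectors]
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


# Toy usage: 4 words, window c = 1.
print(skipgram_pairs(["the", "quick", "brown", "fox"], c=1))
# [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'),
#  ('brown', 'quick'), ('brown', 'fox'), ('fox', 'brown')]
```

The pair count grows with T × 2c, and each pair pays the O(W) softmax, which is why the slides that follow replace the full softmax with cheaper approximations.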
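Hierarchical softmax
•Places the vocabulary at the leaves of a binary tree, so only the nodes on one root-to-leaf path are evaluated instead of every output word
•For a vocabulary of N words, the cost per prediction drops from O(N) to O(log N)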
Uses a binary Huffman tree, which gives frequent words short codes (short paths)
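The Huffman tree used here can be built with a priority queue. The sketch below is a hypothetical `huffman_codes` helper using only Python's standard `heapq`; it returns a 0/1 code per word. In hierarchical softmax each bit corresponds to one binary (sigmoid) decision at an inner node, so frequent words need fewer decisions. The sketch only builds the codes and does not train the inner-node vectors.

```python
import heapq
import itertools


def huffman_codes(freqs):
    """Map each word to a binary Huffman code; frequent words get short codes.
    The code length equals the number of inner-node decisions needed to reach
    that word's leaf in hierarchical softmax."""
    tie = itertools.count()  # unique tie-breaker so heapq never compares dicts
    heap = [(f, next(tie), {w: ""}) for w, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate vocabulary: still give the word one bit
        return {w: "0" for w in freqs}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {w: "0" + code for w, code in left.items()}
        merged.update({w: "1" + code for w, code in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


codes = huffman_codes({"the": 50, "of": 20, "cat": 10, "sat": 10, "zebra": 1})
print(codes["the"], codes["zebra"])  # 1 0110
```

The frequent word "the" sits one decision from the root, while the rare "zebra" needs four.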
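Negative sampling
•Draws about 5 fake (negative) words at random
•Each training pair only has to distinguish the true context word from these negatives with logistic regression, instead of computing the full softmax
•The probability of choosing a word as a negative (incorrect output neuron) is its unigram frequency raised to the 3/4 power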
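A minimal sketch of the two ingredients of negative sampling, assuming toy list-of-floats vectors and hypothetical helper names: the noise distribution (unigram counts raised to the 3/4 power and renormalised) and the paper's objective for one training pair, log σ(v′_{w_O}·v_{w_I}) + Σ_{i=1}^{k} log σ(−v′_{w_i}·v_{w_I}). The 3/4 power flattens the distribution, so rare words are drawn more often than their raw frequency would suggest. For simplicity, this sampler does not exclude the true context word.

```python
import math
import random


def noise_distribution(counts, power=0.75):
    """Unigram counts raised to `power` (3/4 by default), renormalised to sum to 1."""
    weights = {w: c ** power for w, c in counts.items()}
    z = sum(weights.values())
    return {w: x / z for w, x in weights.items()}


def sample_negatives(dist, k, rng):
    """Draw k negative words (with replacement) from the noise distribution."""
    words = list(dist)
    return rng.choices(words, weights=[dist[w] for w in words], k=k)


def log_sigmoid(x):
    # log(1 / (1 + e^-x)), written to avoid overflow for large |x|
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def neg_objective(v_in, v_out_pos, v_out_negs):
    """Negative-sampling objective for one (input, context) pair; training
    maximises this instead of log p(w_O | w_I) under the full softmax."""
    obj = log_sigmoid(dot(v_out_pos, v_in))
    for v_neg in v_out_negs:
        obj += log_sigmoid(-dot(v_neg, v_in))
    return obj


dist = noise_distribution({"the": 1000, "cat": 10})
# Raw counts differ by a factor of 100; after the 3/4 power the gap shrinks.
print(round(dist["the"] / dist["cat"], 2))  # 31.62
negatives = sample_negatives(dist, k=5, rng=random.Random(0))
```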
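Subsampling of frequent words
•Frequent words such as “in”, “the”, and “a” are subsampled
•Each occurrence of word $w_i$ is discarded with probability $P(w_i) = 1 - \sqrt{t / f(w_i)}$
•$f(w_i)$ is the relative frequency of word $w_i$
•$t$ (threshold) is typically around $10^{-5}$
•The more frequent a word, the more often it is discarded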
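In the paper, each occurrence of a word w_i is discarded with probability P(w_i) = 1 − √(t / f(w_i)), where f(w_i) is the word's relative frequency and t is the threshold. Below is a minimal Python sketch of that rule with hypothetical function names; it clips the probability at 0 for words rarer than t, where the formula would go negative.

```python
import math
import random
from collections import Counter


def discard_probability(freq, t=1e-5):
    """P(w) = 1 - sqrt(t / f(w)), clipped at 0 so rare words are always kept."""
    return max(0.0, 1.0 - math.sqrt(t / freq))


def subsample(tokens, t=1e-5, rng=None):
    """Drop each token independently with its discard probability."""
    rng = rng or random.Random()
    counts = Counter(tokens)
    total = len(tokens)
    kept = []
    for w in tokens:
        if rng.random() >= discard_probability(counts[w] / total, t):
            kept.append(w)
    return kept


# A word making up 0.1% of the corpus is dropped about 90% of the time.
print(round(discard_probability(1e-3), 3))  # 0.9
```

Because the probability depends only on f(w_i), the ranking of frequencies is preserved while very frequent words are thinned out aggressively.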
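Experimental results
•Analogy task
➢ Is the vector nearest to vec(“Berlin”) - vec(“Germany”) + vec(“France”) the vector vec(“Paris”)?
•NEG-15 (negative sampling with 15 negative samples) gave the best accuracy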
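The analogy check can be sketched as a nearest-neighbour search by cosine similarity, which is how the paper scores it: compute vec(a) − vec(b) + vec(c) and return the closest word, excluding the three query words. The 4-dimensional vectors below are hand-made toy values (hypothetical, not trained), chosen so the arithmetic is easy to follow.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def analogy(vectors, a, b, c):
    """Word whose vector is most cosine-similar to vec(a) - vec(b) + vec(c),
    excluding the three query words themselves."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Hand-made toy vectors (hypothetical): dimensions = [capital, Germany, France, Spain]
TOY = {
    "Berlin":  [1.0, 1.0, 0.0, 0.0],
    "Germany": [0.0, 1.0, 0.0, 0.0],
    "Paris":   [1.0, 0.0, 1.0, 0.0],
    "France":  [0.0, 0.0, 1.0, 0.0],
    "Madrid":  [1.0, 0.0, 0.0, 1.0],
    "Spain":   [0.0, 0.0, 0.0, 1.0],
}

print(analogy(TOY, "Berlin", "Germany", "France"))  # Paris
print(analogy(TOY, "Madrid", "Spain", "France"))    # Paris
```

Excluding the query words matters: with real embeddings, the input word itself is often the nearest vector to the result.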
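Learning phrases
•The meaning of a phrase is not simply the sum of the meanings of its words
•Candidate bigrams are scored as $\mathrm{score}(w_i, w_j) = \frac{\mathrm{count}(w_i w_j) - \delta}{\mathrm{count}(w_i) \times \mathrm{count}(w_j)}$
•$\delta$ is a discounting coefficient that prevents phrases made of very infrequent words from being formed
•The score is computed from unigram and bigram counts; bigrams scoring above a chosen threshold are treated as phrases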
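A minimal sketch of the phrase-detection score, score(w_i, w_j) = (count(w_i w_j) − δ) / (count(w_i) × count(w_j)), with hypothetical helper names and made-up values for δ and the threshold. The paper runs this over the training data in 2 to 4 passes with a decreasing threshold so longer phrases can form; this sketch does a single pass and joins accepted bigrams with an underscore.

```python
from collections import Counter


def phrase_scores(tokens, delta):
    """score(a, b) = (count(a b) - delta) / (count(a) * count(b)) for every adjacent pair."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {
        pair: (n - delta) / (unigrams[pair[0]] * unigrams[pair[1]])
        for pair, n in bigrams.items()
    }


def merge_phrases(tokens, delta, threshold):
    """One left-to-right pass that joins bigrams scoring above `threshold`."""
    scores = phrase_scores(tokens, delta)
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and scores[(tokens[i], tokens[i + 1])] > threshold:
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


corpus = "air canada flies to new york . air canada is an airline . the air is cold".split()
print(merge_phrases(corpus, delta=1, threshold=0.1))
# ['air_canada', 'flies', 'to', 'new', 'york', '.', 'air_canada', 'is',
#  'an', 'airline', '.', 'the', 'air', 'is', 'cold']
```

"new york" occurs only once here, so δ = 1 discounts its score to 0 and it stays split. With δ = 0 every one-off pair made of singleton words, such as "new york", would score 1.0 and be merged, which is exactly what the discount is meant to prevent.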
Phrase analogy task and results
Examples from the analogy task
Results
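Checking additive compositionality
•Words can be combined by simple vector addition
•The sum behaves like an AND of the two words' contexts
➢ Because similar word sequences appear in similar contexts, they are expected to receive similar vectors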
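The paper explains the AND-like behaviour this way: word vectors feed the softmax linearly, so they act like log context distributions, and adding two vectors multiplies the corresponding context distributions. A context word therefore scores high only if it is likely under both words. The sketch below checks that identity numerically with hand-made toy vectors; the context labels and all values are hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)  # shift for numerical stability; does not change the result
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def context_distribution(v, out_vectors):
    """Softmax over context words of the dot products with one input vector."""
    return softmax([sum(x * y for x, y in zip(v, u)) for u in out_vectors])


def normalised_product(p, q):
    """Element-wise product of two distributions, renormalised (a soft AND)."""
    prod = [x * y for x, y in zip(p, q)]
    z = sum(prod)
    return [x / z for x in prod]


# Hypothetical toy setup: three context words and 2-d vectors.
contexts = ["only_like_a", "only_like_b", "like_both"]
out_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
vec_a, vec_b = [2.0, 0.0], [0.0, 2.0]

# Distribution predicted from the summed vector ...
p_sum = context_distribution([x + y for x, y in zip(vec_a, vec_b)], out_vectors)
# ... equals the renormalised product of the two separate distributions.
p_and = normalised_product(context_distribution(vec_a, out_vectors),
                           context_distribution(vec_b, out_vectors))
print(contexts[p_sum.index(max(p_sum))])  # like_both
```

The identity holds exactly for any vectors because exp(u·(a + b)) = exp(u·a) × exp(u·b); the softmax normaliser only rescales the result.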
Comparison with other distributed representations
Trained on 30 billion words
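Summary
•Vector representations of words and phrases learned with the Skip-gram model
•Shortcuts in the computation (hierarchical softmax, negative sampling, subsampling of frequent words) make training faster and more accurate
•Meaning can be expressed with simple vector arithmetic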