Variational Template Machine for Data-to-Text Generation

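Paper Reading Seminar
Variational Template Machine
for Data-to-Text Generation
Graduate School of Information Science and Technology, Hokkaido University
Harmonious Systems Engineering Laboratory
Takumi Yoshida, first-year doctoral student
Wednesday, July 8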

Paper Information
• Authors
– Rong Ye (1, *), Wenxian Shi (2), Hao Zhou (2), Zhongyu Wei (1), Lei Li (2)
– (1) Fudan University
– (2) ByteDance AI Lab
– (*) research intern
• Venue
– ICLR 2020
• Overview
– Data-to-Text
• structured data (table) -> description
– Existing neural encoder-decoder approaches often lack diversity
– Proposes the Variational Template Machine (VTM), a VAE-based model
• Splits the latent variable into a template part and a content (table) part
– Semi-supervised learning that uses raw text without tables

Introduction
• data-to-text
– Generating biographies, weather forecasts, sports news, and similar text

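Introduction
• Templates are important for increasing the diversity of sentence structure
– Different templates control how a sentence is arranged
• This makes it possible to vary the generated sentences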

Related Work
• Data-to-Text
– End-to-end learning with encoder-decoder models [1,2]
– Introducing templates as latent variables
• Focus on controllable and interpretable generation
• Semi-HMM (hidden semi-Markov model) decoder [3]
• Data2Text Studio [4], which uses a Semi-HMM model
– Interactively extracts templates from table input and generates sentences
Encoder-decoder models can generate fluent sentences but lack sentence diversity
[1] Parag Jain, Anirban Laha, Karthik Sankaranarayanan, Preksha Nema, Mitesh M Khapra, and Shreyas Shetty. A mixed hierarchical attention based encoder-decoder approach for standard table summarization. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics, 2018.
[2] Heng Gong, Xiaocheng Feng, Bing Qin, and Ting Liu. Table-to-text generation with effective hierarchical encoder on three dimensions (row, column and time). In Proceedings of the Conference on Empirical Methods in Natural Language Processing and the International Joint Conference on Natural Language Processing, 2019.
[3] Sam Wiseman, Stuart Shieber, and Alexander Rush. Learning neural templates for text generation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2018.
[4] Longxu Dou, Guanghui Qin, Jinpeng Wang, Jin-Ge Yao, and Chin-Yew Lin. Data2text studio: Automated text generation from structured data. In Proceedings of the Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 2018.

Related Work
• Semi-supervised Learning From Raw Data
– Building paired data-text datasets is expensive
• Raw text data is comparatively easy to obtain
– Encoder-decoder models may fail without enough data [1]
– Back-translation is effective in machine translation [2,3]
This paper proposes a semi-supervised learning method that uses raw text
(inspired by back-translation)
[1] Shuming Ma, Pengcheng Yang, Tianyu Liu, Peng Li, Jie Zhou, and Xu Sun. Key fact as pivot: A two-stage model for low resource table-to-text generation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2019.
[2] Rico Sennrich, Barry Haddow, and Alexandra Birch. Improving neural machine translation models with monolingual data. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2016.
[3] Franck Burlot and François Yvon. Using monolingual data in neural machine translation: a systematic study. In Proceedings of the Conference on Machine Translation: Research Papers, 2018.

Related Work
• Latent Variable Generative Model
– Variational Auto Encoder (VAE) [1]
• RNN-based VAEs can generate diverse, high-quality sentences [2]
– Recent work studies learning disentangled latent variables
• Separating the latent space into a syntactic space and a semantic space [3]
This paper proposes a VAE-based method
that separates the latent space into a template space and a content space
[1] Diederik P. Kingma and Max Welling. Auto-encoding variational bayes. In Proceedings of the International Conference on Learning Representations, 2014.
[2] Samuel Bowman, Luke Vilnis, Oriol Vinyals, Andrew M Dai, Rafal Jozefowicz, and Samy Bengio. Generating sentences from a continuous space. In Proceedings of the Conference on Computational Natural Language Learning, 2016.
[3] Yu Bao, Hao Zhou, Shujian Huang, Lei Li, Lili Mou, Olga Vechtomova, Xinyu Dai, and Jiajun Chen. Generating sentences from disentangled syntactic and semantic spaces. In Proceedings of the Conference of the Association for Computational Linguistics, 2019.

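Variational Auto Encoder (VAE)
• Assumes a probability distribution over the encoded latent variable (z)
– x is generated by sampling z
• Training a VAE
– Goal: maximize the generator's marginal log-likelihood log p_θ(x)
– In practice, the evidence lower bound (ELBO) is maximized instead
[Figure: x -> Encoder q_φ(z|x) -> latent variable z ~ N(μ, Σ) -> Decoder p_θ(x|z) -> x]
[Equation: ELBO. Its KL divergence term measures the distance between distributions]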
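The two ingredients named on this slide, sampling z and the KL term of the ELBO, can be sketched as follows. This is a minimal illustration and not the paper's code: it assumes a diagonal Gaussian posterior parameterized by a mean and a log-variance and a standard normal prior, and the function names are hypothetical.

```python
import math
import random

def sample_latent(mu, log_var, rng):
    """Reparameterization: z = mu + sigma * eps with eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))

def negative_elbo(recon_log_likelihood, mu, log_var):
    """Loss to minimize: -(E_q[log p(x|z)] - KL(q(z|x) || p(z))).

    recon_log_likelihood stands in for a one-sample estimate of
    E_q[log p(x|z)] that a decoder would provide (assumed here).
    """
    return -recon_log_likelihood + kl_to_standard_normal(mu, log_var)

rng = random.Random(0)
mu, log_var = [1.0, 0.0], [0.0, 0.0]
z = sample_latent(mu, log_var, rng)
print(len(z), kl_to_standard_normal(mu, log_var))
```

When the posterior equals the prior (zero mean, unit variance) the KL term is exactly zero, which is the situation that KL collapse drives training toward.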

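Proposed Method (Variational Template Machine)
• Based on the Variational Auto Encoder (VAE)
• Splits the latent variable into a template (z) and content (c)
[Figure: table x -> Encoder f_enc(x) = c -> content latent variable c, deterministic (c is uniquely determined by x); text y -> Encoder q_φz(z|y) -> template latent variable z, stochastic (z is sampled); Decoder p_θ(y|z, c) -> text y]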

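Problem Setting
• Data-to-Text
– The data consists of pairs of a table (x_i) and a text (y_i)
• x_i consists of field, position, and value
– position is the index of the value within its field
– Example: the entry "John Lennon" is represented by the following two triples
» (Name, 1, John)
» (Name, 2, Lennon)
– Encoder for the table (x): built from the embeddings of p, f, and v
– In addition to the table-text pairs, raw texts are also available as data
• D_r = {y_i}_{i=1}^{M}
In most cases M ≫ N, where N is the number of table-text pairs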
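The (field, position, value) representation above can be illustrated with a short sketch. The names `Cell` and `table_to_cells` are hypothetical, and whitespace tokenization with 1-based positions is an assumption chosen to match the "John Lennon" example on the slide.

```python
from typing import Dict, List, NamedTuple

class Cell(NamedTuple):
    field: str
    position: int
    value: str

def table_to_cells(table: Dict[str, str]) -> List[Cell]:
    """Split each field's text on whitespace and index the tokens from 1."""
    cells = []
    for field, text in table.items():
        for position, token in enumerate(text.split(), start=1):
            cells.append(Cell(field, position, token))
    return cells

for cell in table_to_cells({"Name": "John Lennon"}):
    print(cell)
```

Each triple would then be mapped to embeddings of its field, position, and value before entering the table encoder.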

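Training the Proposed VTM (Training procedure)
L_ELBOp: ELBO on the pairwise data
L_ELBOr: ELBO on the raw text data
L_pt: auxiliary loss term for embedding template information in the latent variable
L_pc: auxiliary loss term for embedding content information in the latent variable
L_MI: mutual information term for mitigating KL collapse
Data labels in the figure: pairwise, raw text, pairwise & raw text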
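How these five terms might be combined can be sketched as a weighted sum. The slide's own formula did not survive extraction, so the exact form here is an assumption: every term is taken to be already expressed as a quantity to minimize (a mutual-information term that should be maximized would enter negated), and the default weights are the λ values listed for the SPNLG dataset in the hyperparameter slide. The function name is hypothetical.

```python
def total_loss(elbo_p, elbo_r, l_mi, l_pt, l_pc,
               lambda_mi=0.5, lambda_pt=1.0, lambda_pc=0.5):
    """Assumed overall objective: two ELBO losses plus weighted auxiliary terms."""
    return (elbo_p + elbo_r
            + lambda_mi * l_mi
            + lambda_pt * l_pt
            + lambda_pc * l_pc)

# Illustrative numbers only, not results from the paper.
print(total_loss(elbo_p=1.0, elbo_r=2.0, l_mi=4.0, l_pt=3.0, l_pc=2.0))
```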

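Training the Proposed VTM (pairwise data)
[Equation: ELBO]
q_φz(z|y) is the normal distribution N(μ_φz(y), Σ_φz(y))
p(z) is the standard normal distribution N(0, I)
[Figure: table x -> Encoder -> content latent variable c; text y -> Encoder q_φz(z|y) -> template latent variable z; Decoder p_θ(y|z, c) -> text y]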
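The ELBO on this slide was an image that was lost in extraction. A bound consistent with the definitions stated here (Gaussian posterior q_φz(z|y), standard normal prior p(z), decoder p_θ(y|z, c), and deterministic content c = f_enc(x)) would read as follows; this is a reconstruction from those definitions and may differ in notation from the paper.

```latex
\log p_\theta(y \mid x) \;\ge\;
\mathbb{E}_{q_{\phi_z}(z \mid y)}\big[\log p_\theta(y \mid z, c)\big]
\;-\; D_{\mathrm{KL}}\big(q_{\phi_z}(z \mid y)\,\|\,p(z)\big),
\qquad c = f_{enc}(x)
```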

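Training the Proposed VTM (pairwise data)
• Preserving-Template Loss (L_pt)
– Auxiliary loss term for embedding template information in the latent variable
• Builds a rough template (ỹ) by using the table (x) to replace tokens in the sentence with <ent>
[Figure: table x -> Encoder -> content latent variable c; text y -> Encoder q_φz(z|y) -> template latent variable z; Decoder p_θ(y|z, c) -> text y. A template extractor p_η(ỹ|z) predicts the template ỹ from z; this branch is the Preserving-Template Loss]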
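The rough-template construction described above can be sketched in a few lines. This is an illustration under assumptions: the function name is hypothetical, matching is done per token and case-insensitively, and the example sentence is made up for the demonstration rather than taken from the datasets.

```python
def rough_template(tokens, table_values, placeholder="<ent>"):
    """Replace every token that appears as a table value with the placeholder."""
    values = {v.lower() for v in table_values}
    return [placeholder if tok.lower() in values else tok for tok in tokens]

tokens = "John Lennon was born in Liverpool".split()
template = rough_template(tokens, ["John", "Lennon", "Liverpool"])
print(" ".join(template))
```

What remains after replacement is the sentence skeleton, which is the signal the template latent variable z is asked to preserve.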

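Training the Proposed VTM (raw text)
[Equation: ELBO]
q_φc(c|y) is the normal distribution N(μ_φc(y), Σ_φc(y))
p(z) and p(c) are the standard normal distribution N(0, I)
[Figure: text y -> Encoder q_φz(z|y) -> template latent variable z; text y -> Encoder q_φc(c|y) -> content latent variable c; Decoder -> text y]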
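The raw-text ELBO was also an image that did not survive extraction. Because no table is available, the content c is inferred from the text as well; assuming the posterior factorizes into q_φz(z|y) and q_φc(c|y) and that the priors p(z) and p(c) are independent standard normals, a bound consistent with the definitions on this slide would be:

```latex
\log p_\theta(y) \;\ge\;
\mathbb{E}_{q_{\phi_z}(z \mid y)\, q_{\phi_c}(c \mid y)}\big[\log p_\theta(y \mid z, c)\big]
\;-\; D_{\mathrm{KL}}\big(q_{\phi_z}(z \mid y)\,\|\,p(z)\big)
\;-\; D_{\mathrm{KL}}\big(q_{\phi_c}(c \mid y)\,\|\,p(c)\big)
```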

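Training the Proposed VTM (pairwise text)
• Preserving-Content Loss (L_pc)
– Auxiliary loss term for embedding content information in the latent variable
[Figure: text y -> Encoder q_φz(z|y) -> template latent variable z; text y -> Encoder q_φc(c|y) -> content latent variable c; Decoder -> text y. The Preserving-Content Loss is attached to c]
h = f_enc(x): the encoder used for the pairwise data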

Training the Proposed VTM
• Mutual Information Loss L_MI
– Adds a mutual information term to mitigate KL collapse [1,2,3]
– KL collapse
• One of the problems that arise when training VAEs
• Training drives the posterior of the latent variable to match the prior, so the latent variable carries little information about the input
[Equations: Mutual Information Loss; mutual information]
[1] Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. Infogan: Interpretable representation learning by information maximizing generative adversarial nets. Proceedings of the Advances in Neural Information Processing Systems, 2016.
[2] Shengjia Zhao, Jiaming Song, and Stefano Ermon. Infovae: Information maximizing variational autoencoders. arXiv preprint arXiv:1706.02262, 2017.
[3] Tiancheng Zhao, Kyusong Lee, and Maxine Eskenazi. Unsupervised discrete sentence representation learning for interpretable neural dialog generation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2018.

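Training the Proposed VTM (Training procedure)
L_ELBOp: ELBO on the pairwise data
L_ELBOr: ELBO on the raw text data
L_pc: auxiliary loss term for embedding content information in the latent variable
Data labels in the figure: pairwise, raw text, pairwise & raw text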

Experiment
• Dataset
– Two datasets
SPNLG [1]: restaurant descriptions
WIKI [2,3]: Wikipedia biographies [2,3] (plus animals [3])
– Only part of each dataset's table-text pairs is used as pairs
– Most of the data is used as raw text, keeping only the text
• pairs : raw text = 1 : 10
[1] Lena Reed, Shereen Oraby, and Marilyn Walker. Can neural generators for dialogue learn sentence planning and discourse structuring? In Proceedings of the International Conference on Natural Language Generation, 2018.
[2] Rémi Lebret, David Grangier, and Michael Auli. Neural text generation from structured data with application to the biography domain. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2016.
[3] Qingyun Wang, Xiaoman Pan, Lifu Huang, Boliang Zhang, Zhiying Jiang, Heng Ji, and Kevin Knight. Describing a knowledge base. In Proceedings of the International Conference on Natural Language Generation, 2018.

Experiment Dataset SPNLG
E2E NLG Challenge
http://www.macs.hw.ac.uk/InteractionLab/E2E/
SPNLG is an extension of E2E

Experiment Dataset WIKI
https://en.wikipedia.org/wiki/Jack_Ryder_(cricketer)

Experiment
• Evaluation Metrics
– BLEU
• Higher is better
• Computed from the n-gram overlap between generated and reference sentences
– self-BLEU [1]
• Lower is better
• BLEU computed among the generated sentences themselves
Metrics per dataset (fluency / diversity)
SPNLG: fluency = BLEU-4, NIST, ROUGE-L (F-score); diversity = self-BLEU
WIKI: fluency = BLEU-4, NIST, METEOR, ROUGE-L (F-score), CIDEr; diversity = self-BLEU
[1] Yaoming Zhu, Sidi Lu, Lei Zheng, Jiaxian Guo, Weinan Zhang, Jun Wang, and Yong Yu. Texygen: A benchmarking platform for text generation models. In Proceedings of the International ACM SIGIR Conference on Research & Development in Information Retrieval, 2018.
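To make the self-BLEU idea concrete, the sketch below scores each generated sentence against the other generated sentences and averages the result. It is a simplified stand-in and not the Texygen implementation: BLEU here uses only unigrams and bigrams, no smoothing, and returns 0 when any n-gram precision is 0. All function names are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, references, max_n=2):
    """Simplified BLEU: clipped n-gram precisions (n = 1..max_n),
    geometric mean, and a brevity penalty."""
    if not hypothesis:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp = ngrams(hypothesis, n)
        if not hyp:
            return 0.0
        # Highest count of each n-gram in any single reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in hyp.items())
        if clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / sum(hyp.values())))
    # Brevity penalty against the reference closest in length.
    ref_len = min((len(r) for r in references),
                  key=lambda n_ref: (abs(n_ref - len(hypothesis)), n_ref))
    bp = 1.0 if len(hypothesis) >= ref_len else math.exp(1.0 - ref_len / len(hypothesis))
    return bp * math.exp(sum(log_precisions) / max_n)

def self_bleu(sentences, max_n=2):
    """Average BLEU of each sentence against all the other sentences."""
    scores = [bleu(hyp, sentences[:i] + sentences[i + 1:], max_n)
              for i, hyp in enumerate(sentences)]
    return sum(scores) / len(scores)

same = [s.split() for s in ["the food is good", "the food is good"]]
varied = [s.split() for s in ["the food is good", "it serves cheap pizza"]]
print(self_bleu(same), self_bleu(varied))
```

Identical outputs score at the top of the range and outputs with no shared words score at the bottom, which is why a lower self-BLEU is read as higher diversity.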

Experiment
• Baseline Models
– Table2seq
• Encodes the table, then generates the sentence with Seq2seq
• The table encoder and decoder architectures are the same as in the proposed method
• Trained only on paired data (raw text is not used)
• Decoding
– Beam search producing 5 sentences (Table2seq-beam)
– Forward sampling (Table2seq-sample)
• Decoder pre-trained on raw text (Table2seq-pretrain)
– Decoding uses beam search (same as Table2seq-beam)
– Temp-KN [1]
• Generates a template with a 5-gram Kneser-Ney language model, then replaces the field tokens with words from the table
[1] Rémi Lebret, David Grangier, and Michael Auli. Neural text generation from structured data with application to the biography domain. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2016.

Experiment
• Hyperparameters
– decoder: LSTM with attention mechanism
– word embedding: 300-dimension
– optimizer: Adam
– initial learning rate: 0.001
– SPNLG dataset
• latent template variable: 64-dimension
• latent content variable: 100-dimension
• table: 300-dimension
• λ_MI = 0.5, λ_pt = 1.0, λ_pc = 0.5
– WIKI dataset
• latent template variable: 100-dimension
• latent content variable: 200-dimension
• table: 300-dimension
• λ_MI = 1.0, λ_pt = 1.0, λ_pc = 1.0

Experimental Results on SPNLG Dataset
• Quantitative Analysis
VTM (the proposed method) achieves good fluency while also scoring high on diversity

• Quantitative Analysis
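Table2seq-beam scores high on fluency but low on diversity
Table2seq-sample scores low on fluency but high on diversity
Table2seq-pretrain is even worse on diversity
Temp-KN does well on diversity but cannot generate fluent sentences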

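• Effectiveness of the auxiliary losses
Without L_pc, fluency drops
Removing L_MI or L_pt lowers both fluency and diversity
L_pc: auxiliary loss for embedding content information in the latent variable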

• Effectiveness of raw text
Using raw text improves both fluency and diversity

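• Analysis of the trade-off between fluency and diversity
– Evaluates quality and diversity under different sampling methods
– Uses a softmax function with temperature
• Plotted while varying the temperature τ (0.1, 0.2, 0.3, 0.5, 0.6, 0.9, 1.0)
Models closer to the upper left of the plot perform better
VTM (the proposed method) comes out ahead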
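The temperature mechanism behind this plot can be sketched as follows: logits are divided by τ before the softmax, so a small τ concentrates probability on the top token (more fluent, less diverse) and τ = 1 leaves the distribution unchanged. The function names and the example logits are hypothetical.

```python
import math
import random

def softmax_with_temperature(logits, tau):
    """p_i = exp(l_i / tau) / sum_j exp(l_j / tau), shifted by the max for stability."""
    scaled = [l / tau for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, tau, rng):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, tau)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.0]
for tau in (0.1, 0.5, 1.0):
    print(tau, [round(p, 3) for p in softmax_with_temperature(logits, tau)])
print(sample_token(logits, 0.5, random.Random(0)))
```

Sweeping τ over the values listed above and measuring BLEU and self-BLEU at each setting would trace out one curve per model.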

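• Human evaluation
– 120 generated samples (5 sentences each) are selected at random
– Three annotators rate each sample on a scale of 1 to 5 (Likert scale)
• Accuracy: whether the generated sentences agree with the table contents
• Coherence: whether the generated sentences are coherent
• Diversity: whether the generated sentences cover as many patterns/structures as possible
Bold values are significantly higher than the others (α = 0.01)
Proposed method (VTM)
• Best Accuracy and Coherence
• Diversity comparable to Table2seq-sample and Temp-KN (???)
• Diversity improves compared with VTM trained without raw text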

• Proportion of raw text versus diversity (self-BLEU)
– Trained with different proportions of raw text (raw text : pair data)
• 0.5:1, 1:1, 2:1, 3:1, 5:1, 7:1, 10:1
self-BLEU decreases up to a ratio of about 5:1
and changes little beyond that

• Case Study

• Case Study
Sentences with different template structures are generated, but
the information in the sentences is wrong
(Example)
Sentence 4: "it is a Japanese place."

• Case Study
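Proposed method
• High template diversity, and the generated sentences are accurate
• Can generate sentences that differ in the number of sentences and in the conjunctions used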

Experimental Results on WIKI Dataset
• Quantitative Analysis, Ablation Study
Results are similar to those on the SPNLG dataset

• Comparison with the pseudo-table-based method
– Another way to use raw text
• Build pseudo-tables from raw text with named entity recognition (NER)
– NER+Table2seq
• Train a Bi-LSTM-CRF model [1] on the table-text data and use it to build pseudo-tables for the raw text
• Train Table2seq on both the table-text data and the pseudo table-text data
– Domain change: biographies (841,507 sentences) -> animals (101,807 sentences)
• Shows that the model generalizes
[1] Zhiheng Huang, Wei Xu, and Kai Yu. Bidirectional lstm-crf models for sequence tagging. arXiv preprint arXiv:1508.01991, 2015.

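• Comparison with the pseudo-table-based method
– Analysis of the trade-off between fluency and diversity
• Evaluates quality and diversity under different sampling methods
• Uses a softmax function with temperature
– Plotted while varying the temperature τ (0.1, 0.2, 0.3, 0.5, 0.6, 0.9, 1.0)
Models closer to the upper left of the plot are better
VTM (the proposed method) comes out ahead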

• Computational cost
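– Train: time to reach the lowest error on the validation set
– Test: time to generate 72k sentences on the test set
– hardware
• single Tesla V100 GPU
The proposed method (VTM) takes longer to train than the baselines,
but its inference time is about the same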

• Case Study

• Case Study
https://en.wikipedia.org/wiki/Jack_Ryder_(cricketer)

• Case Study
• Can generate diverse sentences
• Likely to generate incorrect or irrelevant content
(Example)
The club name is wrong in sentence 3

• Case Study
Proposed method (without raw text)
Can generate sentences with multiple templates while keeping them readable

• Case Study
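Proposed method
• Generates more diverse expressions that the other models do not produce
• Suggests that raw text absent from the paired data may enrich the information in the template space
(Example)
5. "[fullname], also known as [nickname] ([birth date] – [death date]) was a [country] [article name 4]."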

Summary
• Data-to-Text
– structured data (table) -> description
• Proposes the Variational Template Machine (VTM), a VAE-based model
– Splits the latent variable into a template part and a content (table) part
• Semi-supervised learning that uses raw text without tables
