[DL輪読会]Diffusion-based Voice Conversion with Fast Maximum Likelihood Sampling Scheme
1. DEEP LEARNING JP
[DL Papers]
http://deeplearning.jp/
Diffusion-based Voice Conversion with Fast
Maximum Likelihood Sampling Scheme
Presenter: Kei Akuzawa (Matsuo Lab, D3)
2. Bibliographic Information
• Title: Diffusion-based Voice Conversion with Fast Maximum Likelihood Sampling Scheme
• Authors: Vadim Popov, Ivan Vovk, Vladimir Gogoryan, Tasnima Sadekova, Mikhail Sergeevich Kudinov, Jiansheng Wei (affiliation: Huawei Noah's Ark Lab)
• Venue: ICLR 2022 (oral)
• Summary: applies diffusion models, a class of deep generative models, to voice conversion
• Reason for presenting: to study diffusion-based generative models, and an interest in voice conversion (VC)
16. References
• Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep
unsupervised learning using nonequilibrium thermodynamics. In International
Conference on Machine Learning, pp. 2256‒2265, 2015.
• Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models.
Advances in Neural Information Processing Systems, 33, 2020.
• Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the
data distribution. In Advances in Neural Information Processing Systems, pp. 11895‒
11907, 2019.
• Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon,
and Ben Poole. Score-Based Generative Modeling through Stochastic Differential
Equations. In International Conference on Learning Representations, 2021.