
Title: Research on Diffusion Model Scheduler (擴散模型排成器之研究)
Author: Hsieh, Chun-Yu (謝竣宇)
Contributors: Tsai, Yen-Lung (蔡炎龍, advisor); Hsieh, Chun-Yu (謝竣宇)
Keywords: Diffusion Models; Schedulers; Image Generation
Date: 2025
Uploaded: 1-Jul-2025 14:40:43 (UTC+8)
Abstract: This study examines how different sampling schedulers affect the quality and efficiency of generated images. Traditionally, Generative Adversarial Networks (GANs) dominated the field, producing high-quality images through adversarial training. In recent years, diffusion models have emerged as a powerful alternative, generating samples through a gradual process of noise addition and denoising that offers strong stability and image fidelity. Latent Diffusion Models (LDMs), which operate in a compressed latent space, further reduce computational cost and serve as the core architecture behind models such as Stable Diffusion. Common sampling strategies include DDPM (Denoising Diffusion Probabilistic Models), DDIM (Denoising Diffusion Implicit Models), and methods based on Stochastic Differential Equations (SDEs) and Ordinary Differential Equations (ODEs). Among these, DDPM tends to be slow, while DDIM improves efficiency through deterministic inference; SDE- and ODE-based approaches reformulate sampling within a unified continuous-time mathematical framework. This study compares these methods in terms of sample quality, runtime, and convergence across various numbers of sampling steps. Experimental results show that deterministic samplers improve output stability, with ODE-based methods achieving the fastest generation. Apart from occasional non-convergence with SDE and DDPM sampling, all methods demonstrate reliable convergence and practical effectiveness.
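The contrast the abstract draws between stochastic DDPM sampling and deterministic DDIM sampling can be sketched in a few lines. The following toy NumPy example is not code from the thesis: `eps_hat` stands in for the output of a trained noise-prediction network, and the alpha/sigma values are illustrative placeholders, but the two update rules follow the standard DDPM and DDIM (eta = 0) formulations.

```python
import numpy as np

def ddpm_step(x_t, eps_hat, alpha_t, alpha_bar_t, sigma_t, rng):
    """One stochastic DDPM reverse step: posterior mean plus fresh Gaussian noise."""
    x_t = np.asarray(x_t, dtype=float)
    eps_hat = np.asarray(eps_hat, dtype=float)
    # Posterior mean: (x_t - (1 - alpha_t)/sqrt(1 - alpha_bar_t) * eps_hat) / sqrt(alpha_t)
    mean = (x_t - (1 - alpha_t) / np.sqrt(1 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_t)
    # Injected noise makes every sampling trajectory random.
    return mean + sigma_t * rng.standard_normal(x_t.shape)

def ddim_step(x_t, eps_hat, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM reverse step (eta = 0): no injected noise."""
    x_t = np.asarray(x_t, dtype=float)
    eps_hat = np.asarray(eps_hat, dtype=float)
    # Predict the clean sample x0 implied by the current noise estimate.
    x0_pred = (x_t - np.sqrt(1 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_bar_t)
    # Move to the previous noise level along a deterministic trajectory.
    return np.sqrt(alpha_bar_prev) * x0_pred + np.sqrt(1 - alpha_bar_prev) * eps_hat
```

Because DDIM injects no noise, repeated calls with the same inputs return identical results, which is the determinism the abstract credits for improved output stability; DDPM's added Gaussian term makes each trajectory random.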
References:
[1] John Butcher. Runge–Kutta methods. Scholarpedia, 2(9):3147, 2007.
[2] Ting Chen. On the importance of noise scheduling for diffusion models. arXiv preprint arXiv:2301.10972, 2023.
[3] Robert J. Elliott and Brian D. O. Anderson. Reverse time diffusions. Stochastic Processes and their Applications, 19(2):327–339, 1985.
[4] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks. Communications of the ACM, 63(11):139–144, 2020.
[5] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.
[6] Aapo Hyvärinen. Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research, 6:695–709, 2005.
[7] Diederik P. Kingma and Max Welling. An introduction to variational autoencoders. Foundations and Trends in Machine Learning, 12(4):307–392, 2019.
[8] Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu. DPM-Solver: A fast ODE solver for diffusion probabilistic model sampling in around 10 steps. Advances in Neural Information Processing Systems, 35:5775–5787, 2022.
[9] Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu. DPM-Solver++: Fast solver for guided sampling of diffusion probabilistic models. arXiv preprint arXiv:2211.01095, 2022.
[10] Shakir Mohamed and Balaji Lakshminarayanan. Learning in implicit generative models. arXiv preprint arXiv:1610.03483, 2016.
[11] Alexander Quinn Nichol and Prafulla Dhariwal. Improved denoising diffusion probabilistic models. In International Conference on Machine Learning, pages 8162–8171. PMLR, 2021.
[12] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021.
[13] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022.
[14] Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning, pages 2256–2265. PMLR, 2015.
[15] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. CoRR, abs/2010.02502, 2020.
[16] Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. Advances in Neural Information Processing Systems, 32, 2019.
[17] Yang Song, Sahaj Garg, Jiaxin Shi, and Stefano Ermon. Sliced score matching: A scalable approach to density and score estimation. In Uncertainty in Artificial Intelligence, pages 574–584. PMLR, 2020.
[18] Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456, 2020.
[19] Mingtian Zhang, Tim Z. Xiao, Brooks Paige, and David Barber. Improving VAE-based representation learning. arXiv e-prints, pages arXiv–2205, 2022.
[20] Wenliang Zhao, Lujia Bai, Yongming Rao, Jie Zhou, and Jiwen Lu. UniPC: A unified predictor-corrector framework for fast sampling of diffusion models. Advances in Neural Information Processing Systems, 36:49842–49869, 2023.
Description: Master's thesis, National Chengchi University, Department of Applied Mathematics, student ID 110751013
Source: http://thesis.lib.nccu.edu.tw/record/#G0110751013
Type: thesis
Identifier: G0110751013
URI: https://nccur.lib.nccu.edu.tw/handle/140.119/157750
Table of Contents: Acknowledgements; Chinese Abstract; Abstract; Contents; List of Tables; List of Figures. Chapter 1: Introduction. Chapter 2: Diffusion Models (Variational Autoencoders; the relationship between noise and diffusion models: the concept of diffusion models, the forward diffusion process, the reverse diffusion process; Latent Diffusion Models; Schedulers). Chapter 3: The Denoising Diffusion Discrete Scheduler. Chapter 4: The Denoising Implicit Discrete Scheduler (concept; the generation process). Chapter 5: Continuous Schedulers Described by Stochastic Differential Equations (concept; the continuous diffusion process: forward and reverse diffusion). Chapter 6: Continuous Schedulers Described by ODE Solvers (overview; recovering images by ODE solving: concept and the recovery process; applications of ODEs in diffusion models). Chapter 7: Experiments (experimental method; experimental results). Chapter 8: Conclusion. References.
Format: 17,845,915 bytes, application/pdf