
Title: Research on Diffusion Model Scheduler (擴散模型排成器之研究)
Author: Hsieh, Chun-Yu (謝竣宇)
Contributors: Tsai, Yen-Lung (蔡炎龍, advisor); Hsieh, Chun-Yu (謝竣宇)
Keywords: Diffusion Models; Schedulers; Image Generation
Date: 2025
Uploaded: 1-Jul-2025 14:40:43 (UTC+8)
Abstract: This study examines how different sampling schedulers affect the quality and efficiency of generated images. Traditionally, Generative Adversarial Networks (GANs) dominated the field, producing high-quality images through adversarial training. In recent years, diffusion models have emerged as a powerful alternative, generating samples through a gradual process of noise addition and denoising that offers strong stability and image fidelity. Latent Diffusion Models (LDMs), which operate in a compressed latent space, further reduce computational cost and serve as the core architecture behind models such as Stable Diffusion. Common sampling strategies include DDPM (Denoising Diffusion Probabilistic Models), DDIM (Denoising Diffusion Implicit Models), and methods based on Stochastic Differential Equations (SDEs) and Ordinary Differential Equations (ODEs). Among these, DDPM tends to be slow, while DDIM improves efficiency through deterministic inference; SDE- and ODE-based approaches reformulate sampling within a unified continuous-time mathematical framework. This study compares these methods in terms of sample quality, runtime, and convergence across various numbers of sampling steps. Experimental results show that deterministic samplers improve output stability, with ODE-based methods achieving the fastest generation. Apart from occasional non-convergence with SDE and DDPM sampling, all methods demonstrate reliable convergence and practical effectiveness.
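The contrast the abstract draws between stochastic DDPM sampling and deterministic DDIM sampling can be sketched in a few lines. The following toy NumPy example is not code from the thesis: `eps_hat` stands in for the output of a trained noise-prediction network, and the alpha/sigma values are illustrative placeholders, but the two update rules follow the standard DDPM and DDIM (eta = 0) formulations.

```python
import numpy as np

def ddpm_step(x_t, eps_hat, alpha_t, alpha_bar_t, sigma_t, rng):
    """One stochastic DDPM reverse step: posterior mean plus fresh Gaussian noise."""
    x_t = np.asarray(x_t, dtype=float)
    eps_hat = np.asarray(eps_hat, dtype=float)
    # Posterior mean: (x_t - (1 - alpha_t)/sqrt(1 - alpha_bar_t) * eps_hat) / sqrt(alpha_t)
    mean = (x_t - (1 - alpha_t) / np.sqrt(1 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_t)
    # Injected noise makes every sampling trajectory random.
    return mean + sigma_t * rng.standard_normal(x_t.shape)

def ddim_step(x_t, eps_hat, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM reverse step (eta = 0): no injected noise."""
    x_t = np.asarray(x_t, dtype=float)
    eps_hat = np.asarray(eps_hat, dtype=float)
    # Predict the clean sample x0 implied by the current noise estimate.
    x0_pred = (x_t - np.sqrt(1 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_bar_t)
    # Move to the previous noise level along a deterministic trajectory.
    return np.sqrt(alpha_bar_prev) * x0_pred + np.sqrt(1 - alpha_bar_prev) * eps_hat
```

Because DDIM injects no noise, repeated calls with the same inputs return identical results, which is the determinism the abstract credits for improved output stability; DDPM's added Gaussian term makes each trajectory random.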
References:
[1] John Butcher. Runge–Kutta methods. Scholarpedia, 2(9):3147, 2007.
[2] Ting Chen. On the importance of noise scheduling for diffusion models. arXiv preprint arXiv:2301.10972, 2023.
[3] Robert J. Elliott and Brian D. O. Anderson. Reverse time diffusions. Stochastic Processes and their Applications, 19(2):327–339, 1985.
[4] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks. Communications of the ACM, 63(11):139–144, 2020.
[5] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.
[6] Aapo Hyvärinen. Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research, 6:695–709, 2005.
[7] Diederik P. Kingma and Max Welling. An introduction to variational autoencoders. Foundations and Trends in Machine Learning, 12(4):307–392, 2019.
[8] Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu. DPM-Solver: A fast ODE solver for diffusion probabilistic model sampling in around 10 steps. Advances in Neural Information Processing Systems, 35:5775–5787, 2022.
[9] Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu. DPM-Solver++: Fast solver for guided sampling of diffusion probabilistic models. arXiv preprint arXiv:2211.01095, 2022.
[10] Shakir Mohamed and Balaji Lakshminarayanan. Learning in implicit generative models. arXiv preprint arXiv:1610.03483, 2016.
[11] Alexander Quinn Nichol and Prafulla Dhariwal. Improved denoising diffusion probabilistic models. In International Conference on Machine Learning, pages 8162–8171. PMLR, 2021.
[12] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021.
[13] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022.
[14] Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning, pages 2256–2265. PMLR, 2015.
[15] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. CoRR, abs/2010.02502, 2020.
[16] Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. Advances in Neural Information Processing Systems, 32, 2019.
[17] Yang Song, Sahaj Garg, Jiaxin Shi, and Stefano Ermon. Sliced score matching: A scalable approach to density and score estimation. In Uncertainty in Artificial Intelligence, pages 574–584. PMLR, 2020.
[18] Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456, 2020.
[19] Mingtian Zhang, Tim Z. Xiao, Brooks Paige, and David Barber. Improving VAE-based representation learning. arXiv e-prints, pages arXiv–2205, 2022.
[20] Wenliang Zhao, Lujia Bai, Yongming Rao, Jie Zhou, and Jiwen Lu. UniPC: A unified predictor-corrector framework for fast sampling of diffusion models. Advances in Neural Information Processing Systems, 36:49842–49869, 2023.
Description: Master's thesis, National Chengchi University, Department of Applied Mathematics, student ID 110751013
Source: http://thesis.lib.nccu.edu.tw/record/#G0110751013
Type: thesis
Identifier: G0110751013
URI: https://nccur.lib.nccu.edu.tw/handle/140.119/157750
Table of Contents: Acknowledgements; Chinese Abstract; Abstract; Contents; List of Tables; List of Figures. Chapter 1: Introduction. Chapter 2: Diffusion Models (Variational Autoencoders; the relationship between noise and diffusion models: the concept of diffusion models, the forward diffusion process, the reverse diffusion process; Latent Diffusion Models; Schedulers). Chapter 3: The Denoising Diffusion Discrete Scheduler. Chapter 4: The Denoising Implicit Discrete Scheduler (concept; the generation process). Chapter 5: Continuous Schedulers Described by Stochastic Differential Equations (concept; the continuous diffusion process: forward and reverse diffusion). Chapter 6: Continuous Schedulers Described by ODE Solvers (overview; recovering images by ODE solving: concept and the recovery process; applications of ODEs in diffusion models). Chapter 7: Experiments (experimental method; experimental results). Chapter 8: Conclusion. References.
Format: 17,845,915 bytes, application/pdf