Title: CycleCoopNet: 基於合作學習的神經網路進行圖片轉換
CycleCoopNet: Image-to-Image Translation with Cooperative Learning Networks
Authors: 翁健豪
Weng, Chien-Hao
Contributors: 郁方
Weng, Chien-Hao
Keywords: Generative cooperative networks (生成式合作網路)
Cooperative learning networks
Image-to-Image Translation
deep learning
neural network
Date: 2019
Issue Date: 2020-01-03 15:53:39 (UTC+8)
Abstract: This paper proposes CycleCoopNet, a new image-to-image translation method. Image-to-image translation changes a picture from one style to another, which makes it possible to create novel pictures that do not exist. CycleCoopNet adopts the CoopNet framework, which has two main models: a generator and a descriptor. The generator produces pictures, which the descriptor then revises by MCMC (Markov chain Monte Carlo) sampling; the revised pictures serve as targets, so the generator is trained by supervised learning guided by the descriptor. The descriptor, in turn, learns from real data by modified contrastive divergence, so that it is adjusted to output the same vector for the revised data and the real data.
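The generator/descriptor loop described above can be illustrated with a small numerical sketch. It is not the thesis implementation: the quadratic descriptor score, the offset-only generator g(z) = z + b, the use of Langevin dynamics as the MCMC sampler, and every name and hyperparameter below (langevin_revise, coop_step, train_toy, the step size, the learning rate) are assumptions chosen so that the example stays short and runnable.

```python
import numpy as np

def langevin_revise(x, w, n_steps=30, step=0.3, rng=None):
    """Revise samples with Langevin dynamics, one form of MCMC sampling.

    Toy descriptor score (an assumption): f(x; w) = -0.5 * ||x - w||^2,
    so the gradient with respect to x is -(x - w).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = x.copy()
    for _ in range(n_steps):
        grad = -(x - w)
        x = x + 0.5 * step**2 * grad + step * rng.standard_normal(x.shape)
    return x

def coop_step(real, z, b, w, lr=0.1, rng=None):
    """One cooperative update of the generator offset b and descriptor centre w."""
    generated = z + b                      # toy generator: g(z) = z + b
    revised = langevin_revise(generated, w, rng=rng)
    # Descriptor: contrastive-divergence-style step. For this score the
    # gradient with respect to w is (x - w), so the update reduces to the
    # difference between the real mean and the revised mean.
    w = w + lr * (real.mean(axis=0) - revised.mean(axis=0))
    # Generator: supervised regression toward the revised samples
    # (gradient step on 0.5 * mean ||revised - g(z)||^2 with respect to b).
    b = b + lr * (revised - generated).mean(axis=0)
    return b, w

def train_toy(n_iters=300, n=256, dim=2, seed=0):
    """Run the toy loop on synthetic 'real' data centred at 3."""
    rng = np.random.default_rng(seed)
    real = 3.0 + rng.standard_normal((n, dim))
    b = np.zeros(dim)
    w = np.zeros(dim)
    for _ in range(n_iters):
        z = rng.standard_normal((n, dim))
        b, w = coop_step(real, z, b, w, rng=rng)
    return b, w

if __name__ == "__main__":
    b, w = train_toy()
    print("generator offset:", b, "descriptor centre:", w)
```

In this toy setting both the generator offset and the descriptor centre drift toward the mean of the real data, because the generator always has a concrete regression target (the revised samples) instead of only a pass/fail signal.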
Several previous works also address image-to-image translation. CycleGAN, one of the best known and the closest to our work, applies the concept of the GAN (generative adversarial network) and performs well on this task. However, CycleGAN generates pictures by unsupervised learning; that is, the generator has no reference answer for its output during training. CycleGAN uses only the discriminator to decide whether a result is correct or incorrect. Because every result only needs to pass the discriminator's test, the generator can learn merely how to pass that test rather than trying to find the correct answer or a wider range of possible answers. This problem, called mode collapse, reduces the variability of the results: the generator keeps producing the same picture to fool the discriminator and obtain a better score.
Our goal is to improve this network by replacing the discriminator with a descriptor. The descriptor model is adapted from CoopNet (Cooperative Neural Network). The idea is to change the output dimension of the discriminator (descriptor) convolutional network. With the descriptor, the generator has labeled answers against which to adjust its model parameters, which turns the problem into a supervised learning problem. Using a descriptor can also prevent mode collapse, so that the generator does not always generate similar patterns.
In our experiments, we use the edges2handbags dataset to observe how pictures change from sketches to bags. We found that our model generates more diverse results, and that these results can be stably recovered to the original picture by a second generator that translates in the opposite direction. In another experiment, we use the vangogh2photo dataset to observe how pictures change from photos to Van Gogh-style pictures, and show that our model produces greater variety.
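The recovery of the original picture by the opposite generator, mentioned in the experiments, can be quantified with a reconstruction error. The sketch below is an illustration under stated assumptions: it uses the mean absolute (L1) difference, the form used for cycle consistency in CycleGAN [2], and the abstract does not say which measure the thesis uses. The flip and identity functions are hypothetical stand-ins for trained generators.

```python
import numpy as np

def cycle_reconstruction_error(x, forward, backward):
    """Mean absolute (L1) error between x and backward(forward(x))."""
    return float(np.mean(np.abs(backward(forward(x)) - x)))

# Hypothetical stand-ins for the two generators.
def flip(images):
    """Horizontal flip; applying it twice restores the input exactly."""
    return images[..., ::-1]

def identity(images):
    """Returns the input unchanged, so it does not undo a flip."""
    return images

rng = np.random.default_rng(0)
batch = rng.random((4, 8, 8))   # four random 8x8 "pictures"

exact = cycle_reconstruction_error(batch, flip, flip)        # perfect recovery
broken = cycle_reconstruction_error(batch, flip, identity)   # no recovery
print(exact, broken)
```

A pair of generators that invert each other gives an error of zero, while a mismatched pair gives a positive error, so a lower value indicates more stable recovery.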
Reference: [1] J. Xie, Y. Lu, R. Gao, and Y. N. Wu, “Cooperative learning of energy-based model and latent variable model via mcmc teaching,” in Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[2] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2223–2232.
[3] J. Hui, “GAN — Why it is so hard to train Generative Adversarial Networks!” hui/ gan-why-it-is-so-hard-to-train-generative-advisory-networks-819a86b3750b, 2018.
[4] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in neural information processing systems, 2014, pp. 2672–2680.
[5] J. Xie, Y. Lu, S.-C. Zhu, and Y. Wu, “A theory of generative convnet,” in International Conference on Machine Learning, 2016, pp. 2635–2644.
[6] Y. Lu, S.-C. Zhu, and Y. N. Wu, “Learning frame models using cnn filters,” arXiv preprint arXiv:1509.08379, 2015.
[7] T. Han, Y. Lu, S.-C. Zhu, and Y. N. Wu, “Alternating back-propagation for generator network,” in Thirty-First AAAI Conference on Artificial Intelligence, 2017.
[8] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 1125–1134.
[9] R. Zhang, P. Isola, and A. A. Efros, “Colorful image colorization,” in European conference on computer vision. Springer, 2016, pp. 649–666.
[10] G. Larsson, M. Maire, and G. Shakhnarovich, “Learning representations for automatic colorization,” in European Conference on Computer Vision. Springer, 2016, pp. 577–593.
[11] S. Iizuka, E. Simo-Serra, and H. Ishikawa, “Let there be color!: joint end-to-end learning of global and local image priors for automatic image colorization with simultaneous classification,” ACM Transactions on Graphics (TOG), vol. 35, no. 4, p. 110, 2016.
[12] A. Hertzmann, C. E. Jacobs, N. Oliver, B. Curless, and D. H. Salesin, “Image analogies,” in Proceedings of the 28th annual conference on Computer graphics and interactive techniques. ACM, 2001, pp. 327–340.
[13] A. A. Efros and T. K. Leung, “Texture synthesis by non-parametric sampling,” in Proceedings of the seventh IEEE international conference on computer vision, vol. 2. IEEE, 1999, pp. 1033–1038.
[14] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 3431–3440.
[15] M. Mirza and S. Osindero, “Conditional generative adversarial nets,” arXiv preprint arXiv:1411.1784, 2014.
[16] P. Sangkloy, J. Lu, C. Fang, F. Yu, and J. Hays, “Scribbler: Controlling deep image synthesis with sketch and color,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 5400–5409.
[17] A. Radford, L. Metz, and S. Chintala, “Unsupervised representation learning with deep convolutional generative adversarial networks,” arXiv preprint arXiv:1511.06434, 2015.
[18] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” arXiv preprint arXiv:1502.03167, 2015.
[19] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention. Springer, 2015, pp. 234–241.
[20] G. E. Hinton, “Training products of experts by minimizing contrastive divergence,” Neural computation, vol. 14, no. 8, pp. 1771–1800, 2002.
[21] R. Salakhutdinov and G. Hinton, “Deep boltzmann machines,” in Artificial intelligence and statistics, 2009, pp. 448–455.
[22] G. E. Hinton, S. Osindero, and Y.-W. Teh, “A fast learning algorithm for deep belief nets,” Neural computation, vol. 18, no. 7, pp. 1527–1554, 2006.
[23] T. Kim and Y. Bengio, “Deep directed generative models with energy-based probability estimation,” arXiv preprint arXiv:1606.03439, 2016.
[24] G. Hinton, O. Vinyals, and J. Dean, “Distilling the knowledge in a neural network,” arXiv preprint arXiv:1503.02531, 2015.
[25] Y. Lu, S.-C. Zhu, and Y. N. Wu, “Learning frame models using cnn filters,” arXiv preprint arXiv:1509.08379, 2015.
[26] A. Dosovitskiy, J. Tobias Springenberg, and T. Brox, “Learning to generate chairs with convolutional neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1538–1546.
[27] A. Adam, E. Rivlin, and I. Shimshoni, “Robust fragments-based tracking using the integral histogram,” in 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), vol. 1. IEEE, 2006, pp. 798–805.
[28] Y. Ma, X. Gu, and Y. Wang, “Histogram similarity measure using variable bin size distance,” Computer Vision and Image Understanding, vol. 114, no. 8, pp. 981–989, 2010.
[29] L.-M. Po and K.-M. Wong, “A new palette histogram similarity measure for mpeg-7 dominant color descriptor,” in 2004 International Conference on Image Processing, 2004. ICIP’04., vol. 3. IEEE, 2004, pp. 1533–1536.
[30] N. Krawetz, “Kind of like that,” The Hacker Factor Blog, 2013.
[31] ——, “Looks like it,” The Hacker Factor Blog, 2011.
[32] C. Zauner, “Implementation and benchmarking of perceptual image hash functions,” 2010.
[33] K. R. Rao and P. Yip, Discrete cosine transform: algorithms, advantages, applications. Academic press, 2014.
[34] C.-H. Weng, “Github of our work, CycleCoopNet,” CycleCoopNet, 2019.
Description: Master's degree
Source URI:
Data Type: thesis
Appears in Collections: [Department of Information Management (資訊管理學系)] Theses (學位論文)

Files in This Item:

File          Size      Format
603401.pdf    7761Kb    Adobe PDF

All items in 學術集成 are protected by copyright, with all rights reserved.
