Re-GAN: residual generative adversarial network algorithm
Abstract
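Objective A generative adversarial network (GAN) is an unsupervised generative model that learns to generate images through a game between a generative model and a discriminative model. The generative model of a GAN generates images directly, level by level, and a lower-level network cannot access the features learned by a higher-level network, so the generated images lack diversity. In addition, as the number of layers increases, the number of parameters grows and backpropagation becomes difficult, which leads to problems such as unstable training and vanishing gradients. To address these problems, a residual generative adversarial network (Re-GAN) is proposed based on the residual network (ResNet) and group normalization (GN). Method Re-GAN builds deep residual network modules in the generative model and fuses the features learned by higher-level networks through skip connections, which enhances the diversity and quality of the generated images, improves backpropagation, enhances the training stability of the GAN, and alleviates vanishing gradients. GN is then adopted to adapt to learning with different batch sizes, making the training process more stable. Result The performance of the algorithm is tested on the Cifar10, CelebA, and LSUN datasets. With a batch size of 64, the mean inception score (IS) of Re-GAN is 5% and 30% higher than those of the deep convolutional GAN (DCGAN) and the Wasserstein-GAN (WGAN), respectively; with a batch size of 4, it is 0.2% and 13% higher, respectively, which indicates that the images generated by Re-GAN have good diversity regardless of batch size. With a batch size of 64, the Fréchet inception distance (FID) of Re-GAN is 18% and 11% lower than those of DCGAN and WGAN, respectively; with a batch size of 4, it is 4% and 10% lower, respectively, which indicates that Re-GAN generates images of better quality. Re-GAN also alleviates the training instability and vanishing gradients that arise during training. Conclusion The experimental results show that, for image generation, Re-GAN produces images of high quality and rich diversity; for network training, Re-GAN is more compatible with different batch sizes, which makes the training process more stable and alleviates vanishing gradients.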
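The two ingredients named in the Method summary above, a skip connection and group normalization, can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names, the plain nested-list feature map (channels by flattened positions, for a single sample), and the `transform` placeholder standing in for the convolutional layers are all assumptions made for illustration, and the learnable scale and shift that GN normally carries are omitted.

```python
import math


def group_norm(x, num_groups, eps=1e-5):
    """Group-normalize one sample.

    x is a list of channels, each a flat list of floats. Channels are split
    into contiguous groups, and the mean and variance are computed inside
    each group of this single sample, so no batch statistics are needed.
    """
    num_channels = len(x)
    if num_channels % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    per_group = num_channels // num_groups
    out = []
    for g in range(num_groups):
        group = x[g * per_group:(g + 1) * per_group]
        values = [v for channel in group for v in channel]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        scale = 1.0 / math.sqrt(var + eps)
        out.extend([[(v - mean) * scale for v in channel] for channel in group])
    return out


def residual_block(x, transform, num_groups):
    """Return x + GN(transform(x)); the skip connection adds the input back."""
    fx = group_norm(transform(x), num_groups)
    return [[a + b for a, b in zip(cx, cf)] for cx, cf in zip(x, fx)]


# Hypothetical 4-channel feature map with 2 positions per channel.
x = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]]
normed = group_norm(x, num_groups=2)

# A stand-in "layer" that doubles every activation (assumption, not a real conv).
y = residual_block(x, lambda t: [[2.0 * v for v in c] for c in t], num_groups=2)
```

Because the statistics are taken per sample and per group, the result for one image does not change with the batch size, which is the property the abstract relies on for small batches. If `transform` contributes nothing, the block returns its input unchanged, which is why the skip path keeps information and gradients flowing.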
Re-GAN: residual generative adversarial network algorithm
Shi Caijuan, Tu Dongjing, Liu Jingyi (College of Artificial Intelligence, North China University of Science and Technology, Tangshan 063210, China)
Abstract
Objective A generative adversarial network (GAN) is a popular unsupervised generative model that generates images through game learning between a generative model and a discriminative model. The generative model maps Gaussian noise to a generated probability distribution, and the discriminative model distinguishes between the generated and real probability distributions. In the ideal state, the discriminative model cannot distinguish between the two data distributions. However, achieving Nash equilibrium between the generative and discriminative models is difficult, and problems such as unstable training, vanishing gradients, and poor image quality occur. Many studies have addressed these problems, and they fall into two directions. One direction selects an appropriate loss function, and the other changes the structure of the GAN, e.g., from a fully connected neural network to a convolutional neural network (CNN). A typical work is the deep convolutional GAN (DCGAN), which adopts a CNN and batch normalization (BN). Although DCGANs have achieved good performance, some problems persist in the training process. Increasing the number of network layers leads to more errors, particularly vanishing gradients when the number of layers is very large. In addition, BN leads to poor training stability, particularly with small batches. In general, as the number of layers increases, the number of parameters increases and backpropagation becomes difficult, resulting in problems such as unstable training and vanishing gradients. Moreover, the generative model generates images directly, step by step, and a lower-level network cannot access the features learned by a higher-level network; thus, the diversity of the generated images is not sufficiently rich.
To address the aforementioned problems, a residual GAN (Re-GAN) is proposed based on a residual network (ResNet) and group normalization (GN). Method ResNet was proposed to solve the network degradation problem caused by too many layers in a deep neural network and has been applied to image classification because of its good performance. In contrast with BN, GN divides channels into groups and calculates the mean and variance used for normalization within each group, so the calculation is stable and independent of batch size. Therefore, we apply ResNet and GN to GAN to propose Re-GAN. First, a residual module is introduced into the generative model of the GAN: the input of a layer is added to the output of its mapping, which helps prevent vanishing gradients and enhances training stability. Moreover, the residual module improves feature transmission between neural network layers and enhances the diversity and quality of the generated images. Second, Re-GAN adopts GN to adapt to learning with different batch sizes. GN reduces the difficulty of normalization caused by a lack of training samples and stabilizes the training process of the network. Moreover, when the number of samples is sufficient, GN makes the calculated results match the sample distribution well and exhibits good compatibility. Result To verify the effectiveness of the proposed Re-GAN, we compare it with DCGAN and Wasserstein-GAN (WGAN) under different batch sizes on three datasets, namely, Cifar10, CelebA, and LSUN bedroom. Two evaluation criteria, i.e., inception score (IS) and Fréchet inception distance (FID), are adopted in our experiments. As a common evaluation criterion for GANs, IS uses the Inception network trained on ImageNet to score the generated images. IS focuses on the quality rather than the diversity of the generated images. A larger IS indicates better quality.
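As a rough illustration of how IS is usually computed, the sketch below applies the commonly used definition, the exponential of the average KL divergence between each image's class distribution p(y|x) and the marginal p(y). The abstract does not give the formula, so this definition, the function name, and the toy probability vectors are assumptions; in practice the probabilities would come from the Inception network rather than being typed in by hand.

```python
import math


def inception_score(probs):
    """Compute exp(mean_x KL(p(y|x) || p(y))) from class-probability vectors.

    probs holds one probability vector per generated image.
    """
    if not probs:
        raise ValueError("probs must contain at least one probability vector")
    n = len(probs)
    num_classes = len(probs[0])
    # Marginal class distribution p(y), averaged over all generated images.
    marginal = [sum(p[k] for p in probs) / n for k in range(num_classes)]
    total_kl = 0.0
    for p in probs:
        # Terms with p_k == 0 contribute nothing; marginal[k] > 0 whenever p_k > 0.
        total_kl += sum(
            pk * (math.log(pk) - math.log(marginal[k]))
            for k, pk in enumerate(p)
            if pk > 0.0
        )
    return math.exp(total_kl / n)


# Toy inputs (hypothetical): two images, two classes.
confident_and_spread = inception_score([[1.0, 0.0], [0.0, 1.0]])
confident_same_class = inception_score([[1.0, 0.0], [1.0, 0.0]])
uncertain = inception_score([[0.5, 0.5], [0.5, 0.5]])
```

In this toy setting the score reaches the number of classes when every image is classified confidently and the predicted classes are spread evenly, and it falls to 1 when the classifier is uncertain or every image lands in the same class. The score never looks at real images and cannot see variation within a class, which is one reason it is normally reported together with FID.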
FID is more robust to noise and more suitable for describing the diversity of the generated images. It is computed from a set of generated images and a set of real images. A smaller FID indicates better diversity. The experimental results are as follows. 1) When the batch size is 64, the IS of the proposed Re-GAN is 5% higher than that of DCGAN and 30% higher than that of WGAN. When the batch size is 4, the IS of Re-GAN is 0.2% higher than that of DCGAN and 13% higher than that of WGAN. These results show that the images generated by Re-GAN exhibit good diversity regardless of batch size. 2) When the batch size is 64, the FID of Re-GAN is 18% lower than that of DCGAN and 11% lower than that of WGAN. When the batch size is 4, the FID of Re-GAN is 4% lower than that of DCGAN and 10% lower than that of WGAN. These results indicate that the proposed Re-GAN can generate images of higher quality. 3) Training instability and vanishing gradients are alleviated during the training process. Conclusion The performance of the proposed Re-GAN is tested on three datasets using two evaluation criteria, i.e., IS and FID. The experimental results indicate the following findings. In terms of image generation, Re-GAN generates high-quality images with rich diversity. In terms of network training, Re-GAN trains compatibly with both large and small batches, which makes the training process more stable and alleviates vanishing gradients. In addition, Re-GAN performs better than DCGAN and WGAN, which can be attributed to the ResNet and GN adopted in Re-GAN.
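For reference, the FID used in the Result paragraph above is conventionally defined as the Fréchet distance between two Gaussians fitted to Inception features of the real and generated image sets. The abstract does not state the formula; the symbols below are introduced here for illustration, with \mu_r, \Sigma_r denoting the feature mean and covariance of the real images and \mu_g, \Sigma_g those of the generated images.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

The first term compares the feature means and the trace term compares the covariances, so the value is zero when the two fitted Gaussians coincide and grows as the generated features drift away from the real ones.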
Keywords
image generation; deep learning; convolutional neural network (CNN); generative adversarial network (GAN); residual network (ResNet); group normalization (GN)