

基于特征增强生成对抗网络的文本生成图像方法

吴春燕 潘龙越 杨有

引用本文: 吴春燕,潘龙越,杨有. 基于特征增强生成对抗网络的文本生成图像方法[J]. 微电子学与计算机,2023,40(6):51-61. doi: 10.19304/J.ISSN1000-7180.2022.0629
Citation: WU C Y, PAN L Y, YANG Y. Text-to-image synthesis method based on feature-enhanced generative adversarial network[J]. Microelectronics & Computer, 2023, 40(6): 51-61. doi: 10.19304/J.ISSN1000-7180.2022.0629


doi: 10.19304/J.ISSN1000-7180.2022.0629
Funding: Chongqing Postgraduate Joint Training Base Program (2019-45); Humanities and Social Sciences Research Planning Project of the Chongqing Municipal Education Commission (21SKGH044)
Article information
    About the authors:

    WU Chunyan: female, born in 1998, M.S. candidate. Her research interest is computer vision.

    PAN Longyue: female, born in 1998, M.S. candidate. Her research interest is computer vision.

    Corresponding author:

    YANG You: male, born in 1965, Ph.D., professor. His research interests include computer vision and digital image processing. E-mail: 565357950@qq.com

  • CLC number: TP391.41

Text-to-image synthesis method based on feature-enhanced generative adversarial network

  • Abstract:

    To address the insufficient use of image visual features and channel features in text-to-image generation, this paper proposes a text-to-image synthesis method based on a feature-enhanced generative adversarial network (FE-GAN). First, during dynamic memory reading, a Memory-on-Memory (MoM) module is designed to attend to and fuse the intermediate features: an attention mechanism performs a first enhancement of the visual features while the memory is read, and the attention result is then fused with the image features produced by the previous generator, giving a second enhancement of the image visual features. Second, channel attention is introduced into the residual blocks to capture the different semantics carried by the image feature channels and to strengthen the correlation between semantically similar channels, achieving channel-feature enhancement. Finally, instance-normalization upsampling blocks are combined with batch-normalization upsampling blocks to increase image resolution while reducing the influence of batch size on generation quality and improving the style diversity of the generated images. Simulation experiments on the CUB-200-2011 and Oxford-102 datasets show that the proposed method reaches Inception Scores (IS) of 4.83 and 4.13, improvements of 1.68% and 5.62% over DM-GAN, respectively. The results indicate that images generated by FE-GAN handle fine details better and agree more closely with the text semantics.
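    As a reading aid for the components named in the abstract, the sketches below show one plausible way to wire them up in PyTorch. They are minimal illustrations written from the abstract's description only: the class names (MoMFusion, ChannelAttention, CARBlock, up_block) and every layer choice are assumptions, not the authors' released implementation. The first sketch covers the Memory-on-Memory (MoM) step: an attention gate over the dynamic-memory read result (first visual-feature enhancement), followed by a gated fusion with the image features from the previous generator (second enhancement); a DM-GAN-like pipeline is assumed.

```python
import torch
import torch.nn as nn


class MoMFusion(nn.Module):
    """Hypothetical Memory-on-Memory fusion step (DM-GAN-style pipeline assumed)."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolutions producing the attended information and its gate
        self.attn_info = nn.Conv2d(channels * 2, channels, kernel_size=1)
        self.attn_gate = nn.Conv2d(channels * 2, channels, kernel_size=1)
        # gate deciding how much of the attended result replaces the old features
        self.fuse_gate = nn.Conv2d(channels * 2, channels, kernel_size=1)

    def forward(self, mem_read: torch.Tensor, prev_img_feat: torch.Tensor) -> torch.Tensor:
        # first enhancement: attend over the memory-read result, conditioned on image features
        pair = torch.cat([mem_read, prev_img_feat], dim=1)
        attended = torch.sigmoid(self.attn_gate(pair)) * self.attn_info(pair)
        # second enhancement: gated fusion with the previous generator's image features
        g = torch.sigmoid(self.fuse_gate(torch.cat([attended, prev_img_feat], dim=1)))
        return g * attended + (1.0 - g) * prev_img_feat


# toy usage on 64-channel, 64x64 feature maps
mom = MoMFusion(channels=64)
out = mom(torch.randn(2, 64, 64, 64), torch.randn(2, 64, 64, 64))
print(out.shape)  # torch.Size([2, 64, 64, 64])
```

    The second sketch covers the other two pieces: a channel-attention residual (CAR) block built with SE-style channel attention (cf. squeeze-and-excitation [13]) and a 2x upsampling block whose normalization can be switched between instance norm and batch norm, following the GLU convention common in StackGAN-family generators. Again, this is an assumption-laden sketch rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """SE-style channel attention: reweight channels so that semantically
    related channels reinforce each other."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        weight = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * weight


class CARBlock(nn.Module):
    """Channel Attention Residual block: residual branch ending in channel attention."""

    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            ChannelAttention(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.body(x)


def up_block(in_ch: int, out_ch: int, instance_norm: bool) -> nn.Sequential:
    """2x upsampling block; the instance-norm variant reduces the dependence on
    batch size, while the batch-norm variant is kept for the remaining stages."""
    norm_ch = out_ch * 2  # GLU halves the channel count again
    norm = nn.InstanceNorm2d(norm_ch, affine=True) if instance_norm else nn.BatchNorm2d(norm_ch)
    return nn.Sequential(
        nn.Upsample(scale_factor=2, mode="nearest"),
        nn.Conv2d(in_ch, norm_ch, 3, padding=1),
        norm,
        nn.GLU(dim=1),
    )


# toy usage: one instance-norm upsample block followed by a CAR block
x = torch.randn(2, 64, 32, 32)
y = up_block(64, 32, instance_norm=True)(x)  # -> (2, 32, 64, 64)
z = CARBlock(32)(y)
print(z.shape)  # torch.Size([2, 32, 64, 64])
```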

     

  • 图 1  FE-GAN总结构图

    Figure 1.  The overall structure of FE-GAN

    图 2  二次记忆模块

    Figure 2.  The module of Memory on Memory (MoM)

    图 3  通道注意残差块

    Figure 3.  The block of Channel Attention Residual (CAR)

    图 4  不同方法在CUB-200-2011数据集上的效果比较

    Figure 4.  Performance comparison of different networks on CUB-200-2011 dataset

    图 5  FE-GAN与DM-GAN在Oxford-102数据集的效果比较

    Figure 5.  Performance comparison on Oxford-102 dataset between FE-GAN and DM-GAN

表  1  在两个不同数据集上不同方法的IS比较

    Table 1.  IS comparison of different methods on two different datasets

    Method          | IS↑ (CUB-200-2011)             | IS↑ (Oxford-102)
    StackGAN[7]     | $ 3.7 \pm 0.04 $               | $ 3.20 \pm 0.01 $
    AttnGAN[1]      | $ 4.36 \pm 0.03 $              | -
    DM-GAN[2]       | $ 4.75 \pm 0.07 $              | $ 3.91 \pm 0.06^{*} $
    CFA-HAGAN[18]   | $ 4.54 \pm 0.04 $              | $ 3.98 \pm 0.03 $
    SegAttnGAN[19]  | $ 4.82 \pm 0.05 $              | $ 3.52 \pm 0.09 $
    CRD-CGAN[20]    | $ 4.75 \pm 0.10 $              | $ 3.53 \pm 0.06 $
    CSM-GAN[21]     | $ 4.62 \pm 0.08 $              | -
    MA-GAN[22]      | $ 4.76 \pm 0.05 $              | $ 4.09 \pm 0.08 $
    Ours (FE-GAN)   | $ \boldsymbol{4.83 \pm 0.05} $ | $ \boldsymbol{4.13 \pm 0.05} $

表  2  在两个不同数据集上不同方法的FID比较

    Table 2.  FID comparison of different methods on two different datasets

    Method          | FID↓ (CUB-200-2011) | FID↓ (Oxford-102)
    StackGAN[7]     | 51.89               | 55.28
    AttnGAN[1]      | 25.12               | -
    DM-GAN[2]       | 16.09               | 43.92*
    CFA-HAGAN[18]   | 22.89               | 45.29
    SSA-GAN[23]     | 15.61               | -
    Diver-GAN[24]   | 15.63               | -
    MA-GAN[22]      | 21.66               | 41.85
    Ours (FE-GAN)   | 15.32               | 42.61

表  3  FE-GAN和基线在数据集Oxford-102上的性能对比

    Table 3.  Performance comparison between FE-GAN and the baseline on the Oxford-102 dataset

    Method    | IS↑                            | FID↓
    Baseline  | $ 3.91 \pm 0.06 $              | 43.92
    FE-GAN    | $ \boldsymbol{4.13 \pm 0.05} $ | 42.61

表  4  CUB-200-2011数据集上的消融实验结果

    Table 4.  Results of the ablation experiment on CUB-200-2011

    Module selection | IS↑                            | FID↓  | R-precision
    Baseline         | $ 4.69 \pm 0.05 $              | 16.29 | $ 71.95 \pm 0.71 $
    +IN              | $ 4.62 \pm 0.06 $              | 18.40 | $ 73.06 \pm 0.73 $
    +MoM             | $ 4.71 \pm 0.03 $              | 19.29 | $ 73.87 \pm 0.80 $
    +CAR             | $ 4.78 \pm 0.06 $              | 15.77 | $ 75.22 \pm 1.06 $
    +IN+MoM          | $ 4.57 \pm 0.07 $              | 15.99 | $ 74.95 \pm 0.60 $
    +IN+CAR          | $ 4.79 \pm 0.04 $              | 20.82 | $ \boldsymbol{75.32 \pm 0.75} $
    +MoM+CAR         | $ 4.80 \pm 0.06 $              | 16.80 | $ 72.02 \pm 0.84 $
    +IN+MoM+CAR      | $ \boldsymbol{4.83 \pm 0.05} $ | 15.32 | $ 74.20 \pm 0.55 $

表  5  IUpBlock数量对FE-GAN的影响

    Table 5.  Influence of the number of IUpBlocks on FE-GAN

    Module selection | IS↑                            | FID↓  | R-precision
    Baseline         | $ 4.69 \pm 0.05 $              | 16.29 | $ 71.95 \pm 0.71 $
    AIN              | $ 4.72 \pm 0.06 $              | 18.27 | $ 72.93 \pm 0.68 $
    SIN              | $ \boldsymbol{4.85 \pm 0.05} $ | 15.32 | $ \boldsymbol{74.20 \pm 0.55} $
  • [1] XU T, ZHANG P C, HUANG Q Y, et al. AttnGAN: fine-grained text to image generation with attentional generative adversarial networks[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 1316-1324.
    [2] ZHU M F, PAN P B, CHEN W, et al. DM-GAN: dynamic memory generative adversarial networks for text-to-image synthesis[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 5795-5803.
    [3] JU S B, XU J, LI Y F. Text-to-single image method based on self-attention[J]. Computer Engineering and Applications,2022,58(3):249-258. DOI: 10.3778/j.issn.1002-8331.2009-0194.
    [4] SESHADRI A D, RAVINDRAN B. Multi-tailed, multi-headed, spatial dynamic memory refined text-to-image synthesis[EB/OL]. [2022-03-22]. https://arxiv.org/pdf/2110.08143.pdf.
    [5] ZHANG Y F, YI Y H, TANG Z W, et al. Text-to-image synthesis method based on channel attention mechanism[J]. Computer Engineering,2022,48(4):206-212,222. DOI: 10.19678/j.issn.1000-3428.0062998.
    [6] REED S E, AKATA Z, YAN X C, et al. Generative adversarial text to image synthesis[C]//Proceedings of the 33rd International Conference on Machine Learning. New York: JMLR. org, 2016: 1060-1069.
    [7] ZHANG H, XU T, LI H S, et al. StackGAN: text to photo-realistic image synthesis with stacked generative adversarial networks[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 5908-5916.
    [8] ZHANG H, XU T, LI H, et al. StackGAN++: realistic image synthesis with stacked generative adversarial networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence,2019,41(8):1947-1962. DOI: 10.1109/tpami.2018.2856256.
    [9] GUO M H, XU T X, LIU J J, et al. Attention mechanisms in computer vision: a survey[J]. Computational Visual Media,2022,8(3):331-368. DOI: 10.1007/s41095-022-0271-y.
    [10] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach: Curran Associates Inc, 2017: 6000–6010.
    [11] ZHANG H, GOODFELLOW I J, METAXAS D N, et al. Self-attention generative adversarial networks[C]//Proceedings of the 36th International Conference on Machine Learning. Long Beach: PMLR, 2019: 7354-7363.
    [12] CHEN L, ZHANG H W, XIAO J, et al. SCA-CNN: spatial and channel-wise attention in convolutional networks for image captioning[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 6298-6306.
    [13] HU J, SHEN L, ALBANIE S, et al. Squeeze-and-excitation networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence,2020,42(8):2011-2023. DOI: 10.1109/TPAMI.2019.2913372.
    [14] WANG Q L, WU B G, ZHU P F, et al. ECA-Net: efficient channel attention for deep convolutional neural networks[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 11531-11539.
    [15] TANG H, BAI S, SEBE N. Dual attention GANs for semantic image synthesis[C]//Proceedings of the 28th ACM International Conference on Multimedia. Seattle: ACM, 2020: 1994-2002.
    [16] HUANG S Y, CHEN Y. Generative adversarial networks with adaptive semantic normalization for text-to-image synthesis[J]. Digital Signal Processing,2022,120:103267. DOI: 10.1016/j.dsp.2021.103267.
    [17] HUANG L, WANG W M, CHEN J, et al. Attention on attention for image captioning[C]//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 4633-4642.
    [18] CHENG Q R, GU X D. Cross-modal feature alignment based hybrid attentional generative adversarial networks for text-to-image synthesis[J]. Digital Signal Processing,2020,107:102866. DOI: 10.1016/j.dsp.2020.102866.
    [19] GOU Y C, WU Q C, LI M B, et al. SegAttnGAN: text to image generation with segmentation attention[EB/OL]. [2022-03-22]. https://arxiv.org/pdf/2005.12444.pdf.
    [20] HU T, LONG C J, XIAO C X. CRD-CGAN: category-consistent and relativistic constraints for diverse text-to-image generation[EB/OL]. [2022-03-22]. https://arxiv.org/pdf/2107.13516.pdf.
    [21] TAN H C, LIU X P, YIN B C, et al. Cross-modal semantic matching generative adversarial networks for text-to-image synthesis[J]. IEEE Transactions on Multimedia,2022,24:832-845. DOI: 10.1109/tmm.2021.3060291.
    [22] YANG Y H, WANG L, XIE D, et al. Multi-sentence auxiliary adversarial networks for fine-grained text-to-image synthesis[J]. IEEE Transactions on Image Processing,2021,30:2798-2809. DOI: 10.1109/tip.2021.3055062.
    [23] LIAO W T, HU K, YANG M Y, et al. Text to image generation with semantic-spatial aware GAN[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 18166-18175.
    [24] ZHANG Z X, SCHOMAKER L. DiverGAN: an efficient and effective single-stage framework for diverse text-to-image generation[J]. Neurocomputing,2022,473:182-198. DOI: 10.1016/j.neucom.2021.12.005.
Publication history
  • Received:  2022-10-11
  • Revised:  2022-10-23
