Text image generation algorithm based on improved diffusion model combined with conditional control
Author: Du Hongbo, Xue Haoyuan, Zhu Lijun
Affiliation:

1. Shenyang University of Technology; 2. School of Science, Shenyang University of Technology, Shenyang, Liaoning; 3. School of Information and Computing Science, Northern University for Nationalities, Yinchuan, Ningxia

Fund Project:

National Natural Science Foundation of China (11861003); Liaoning Provincial Department of Education Basic Research Project for Higher Education Institutions (LJKZ0157)


    Abstract:

    A novel text-to-image generation method based on the diffusion model is proposed to address the problems of existing text-to-image generation methods: low image fidelity, cumbersome generation procedures, and applicability limited to specific task scenarios. The method takes the mainstream diffusion model as the main network and designs a residual block with a new structure, which effectively improves generation performance. A CBAM (Convolutional Block Attention Module) attention module is added to improve the noise-estimation network, enhancing the model's ability to extract key information from images and further raising the quality of the generated images. Finally, a conditional control network is incorporated to generate text-conditioned images with specific poses. Qualitative and quantitative analyses, together with ablation experiments, were conducted on the CelebA-HQ dataset against the leading methods KNN-Diffusion, CogView2, textStyleGAN, and simple diffusion. The evaluation metrics and generation results show that the method effectively improves the quality of text-generated images: FID decreases by 36.4% on average, while IS and structural similarity increase by 11.4% and 3.9% on average, respectively. Combined with the conditional control network, the method accomplishes text-to-image generation with directed actions.
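The abstract names CBAM as the attention module inserted into the noise-estimation network. The paper's modified network is not reproduced here; the following is a minimal NumPy sketch of the standard CBAM block (channel attention followed by spatial attention). All array shapes, the reduction ratio, and the random weights are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(f, w1, w2):
    """Squeeze spatial dims with avg- and max-pooling, pass both
    results through a shared two-layer MLP, and gate each channel."""
    avg = f.mean(axis=(1, 2))                     # (C,)
    mx = f.max(axis=(1, 2))                       # (C,)
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)  # shared MLP with ReLU
    return sigmoid(mlp(avg) + mlp(mx))            # (C,) weights in (0, 1)

def spatial_attention(f, kernel):
    """Stack channel-wise avg and max maps, convolve with a k x k
    kernel, and gate each spatial location."""
    stacked = np.stack([f.mean(axis=0), f.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    p = np.pad(stacked, ((0, 0), (pad, pad), (pad, pad)))
    H, W = stacked.shape[1:]
    out = np.empty((H, W))
    for i in range(H):                  # naive "same" convolution
        for j in range(W):
            out[i, j] = np.sum(p[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)                 # (H, W) weights in (0, 1)

def cbam(f, w1, w2, kernel):
    """Apply channel attention, then spatial attention, to f of shape (C, H, W)."""
    f = f * channel_attention(f, w1, w2)[:, None, None]
    return f * spatial_attention(f, kernel)[None, :, :]

# Illustrative shapes: 8 channels, 16 x 16 feature map, reduction ratio 2,
# 7 x 7 spatial kernel (the kernel size used in the original CBAM design).
rng = np.random.default_rng(0)
C, H, W, r = 8, 16, 16, 2
f = rng.standard_normal((C, H, W))
w1 = 0.1 * rng.standard_normal((C // r, C))
w2 = 0.1 * rng.standard_normal((C, C // r))
kernel = 0.1 * rng.standard_normal((2, 7, 7))
out = cbam(f, w1, w2, kernel)
print(out.shape)  # (8, 16, 16)
```

Because both attention maps are sigmoid-gated, the block rescales the input feature map elementwise without changing its shape, which is what allows it to be dropped between existing layers of a U-Net-style noise estimator.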

Cite this article:

Du Hongbo, Xue Haoyuan, Zhu Lijun. Text image generation algorithm based on improved diffusion model combined with conditional control[J]. Journal of Nanjing University of Information Science & Technology.
History
  • Received: 2024-06-19
  • Revised: 2024-12-24
  • Accepted: 2025-02-27

Address: No. 219 Ningliu Road, Nanjing, Jiangsu    Postal code: 210044

Tel: 025-58731025    E-mail: nxdxb@nuist.edu.cn

Journal of Nanjing University of Information Science & Technology ® 2025. All rights reserved.    Technical support: Beijing Qinyun Technology Development Co., Ltd.