Controllable Text-to-Image Generation

Bowen Li, Xiaojuan Qi, Thomas Lukasiewicz, and Philip H. S. Torr


In this paper, we propose a novel method for manipulating visual attributes, the controllable generative adversarial network (ControlGAN), which can effectively control parts of the image generation according to natural language descriptions while preserving the generation of other content. The key to the ControlGAN approach is a word-level spatial and channel-wise attention-driven generator that can focus on subregions of an image, together with a word-level discriminator that provides the generator with fine-grained training feedback. This design can effectively disentangle different regions of the image and allows a specific visual attribute to be manipulated without affecting the generation of other content. In addition, a semantic preservation model (SPM) is proposed to reduce the randomness involved in the generation and to encourage the generator to reconstruct text-unchanged content. Extensive experiments on benchmark datasets demonstrate that our method outperforms existing state-of-the-art methods, and that it can effectively control the generation of visual attributes and produce high-quality images.
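
As a rough illustration of what word-level spatial and channel-wise attention can look like, the following NumPy sketch computes attention weights between word features and image features in two ways: per subregion and per channel. It is not the ControlGAN implementation. The function names, the tensor shapes, the dot-product scoring, and the assumption that word features have already been projected into a matching space are all hypothetical choices made for this example.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def spatial_word_attention(regions, words):
    """Hypothetical spatial attention: each image subregion attends over words.

    regions: (N, C) array, one C-dimensional feature per subregion.
    words:   (L, C) array, one C-dimensional feature per word
             (assumed to be already projected to C dimensions).
    Returns the word context per subregion (N, C) and the weights (N, L).
    """
    scores = regions @ words.T         # (N, L) region-word similarity
    weights = softmax(scores, axis=1)  # normalize over words for each region
    context = weights @ words          # (N, C) weighted sum of word features
    return context, weights


def channel_word_attention(feature_maps, words_spatial):
    """Hypothetical channel-wise attention: each channel attends over words.

    feature_maps:  (C, N) array, one flattened spatial map per channel.
    words_spatial: (L, N) array, word features assumed to be projected
                   to the N spatial positions.
    Returns the word context per channel (C, N) and the weights (C, L).
    """
    scores = feature_maps @ words_spatial.T  # (C, L) channel-word similarity
    weights = softmax(scores, axis=1)        # normalize over words for each channel
    context = weights @ words_spatial        # (C, N) weighted sum of word features
    return context, weights
```

In this sketch, each row of the returned weights sums to one, so a row can be read as how strongly a given subregion (or channel) is associated with each word of the description.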

Book Title
Proceedings of the 33rd Annual Conference on Neural Information Processing Systems, NeurIPS 2019, Vancouver, Canada, December 8–14, 2019