Image-to-Image Translation with Text Guidance

Li, Bowen; Lukasiewicz, Thomas

Bowen Li, Philip Torr and Thomas Lukasiewicz

Abstract

In this paper, we focus on image-to-image translation with text guidance, where a text description is used to control the visual attributes of the synthetic image produced from a given semantic mask. To accomplish this task, we propose a new multi-stage generative adversarial network with three novel components: (1) a discriminator with dual-directional feedback, which provides the generator at the same stage with fine-grained supervisory feedback related to image regions, encouraging it to produce realistic images with finer regional details, and which also helps generators at the following stages to complete missing contents and correct inappropriate visual attributes; (2) a compatibility loss, which guides generators to produce both realistic objects and a realistic background, and to achieve good compatibility between them; and (3) a part-of-speech tagging-based spatial attention, which better connects image regions with the corresponding semantic words. Experimental results demonstrate that our model can effectively control the image translation using text descriptions. More importantly, the text input allows our model to produce much more diverse results and even new synthetic images that lie outside the distribution of the dataset.

Book Title

Proceedings of the 33rd British Machine Vision Conference 2022, BMVC 2022, London, UK, November 21-24, 2022

Month

November

Pages

581

Year

2022
