Xiaojin Zhu, Andrew B. Goldberg, Mohamed Eldawy, Charles R. Dyer, Bradley Strock
We present a novel Text-to-Picture system that synthesizes a picture from general, unrestricted natural language text. The process is analogous to Text-to-Speech synthesis, but with pictorial output that conveys the gist of the text. Our system integrates multiple AI components, including natural language processing, computer vision, computer graphics, and machine learning. We present an integration framework that combines these components by first identifying informative and "picturable" text units, then searching for the most likely image parts conditioned on the text, and finally optimizing the picture layout conditioned on both the text and image parts. The effectiveness of our system is assessed in two user studies using children's books and news articles. Experiments show that the synthesized pictures convey as much information about children's stories as the original artists' illustrations, and much more information about news articles than their original photos alone. These results suggest that Text-to-Picture synthesis has great potential in augmenting human-computer and human-human communication modalities, with applications in education and health care, among others.
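The three-stage framework described above could be sketched roughly as follows. This is a minimal illustrative outline only: every function name, the keyword filter, the scoring scheme, and the left-to-right layout are assumptions for exposition, not the authors' actual NLP, vision, or layout-optimization components.

```python
# Hypothetical sketch of the three-stage Text-to-Picture pipeline:
# (1) identify picturable text units, (2) select the most likely image
# per unit, (3) lay out the selected images. All details are stand-ins.

def extract_picturable_units(text):
    # Stage 1 (assumption): a toy filter keeping concrete nouns from a
    # hand-made list, standing in for the paper's NLP component.
    PICTURABLE = {"dog", "ball", "tree", "house", "girl"}
    return [w.strip(".,") for w in text.lower().split()
            if w.strip(".,") in PICTURABLE]

def select_image(unit, image_index):
    # Stage 2 (assumption): choose the highest-scoring candidate image
    # for a text unit from a precomputed index.
    candidates = image_index.get(unit, [])
    return max(candidates, key=lambda c: c["score"], default=None)

def layout(images, width=100, height=100):
    # Stage 3 (assumption): a trivial evenly spaced left-to-right
    # placement, standing in for the paper's layout optimization.
    step = width // max(len(images), 1)
    return [(img["name"], i * step, height // 2)
            for i, img in enumerate(images)]

def text_to_picture(text, image_index):
    # End-to-end pipeline: text units -> image parts -> layout.
    units = extract_picturable_units(text)
    images = [img for u in units if (img := select_image(u, image_index))]
    return layout(images)
```

For example, with a toy index mapping "dog" and "ball" to scored images, `text_to_picture("The dog chased the ball.", index)` would return two placed images, one per picturable word.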
Subjects: 6. Computer-Human Interaction; 6.3 User Interfaces
Submitted: Apr 23, 2007