Identifying Image Related Sentences in News Articles
Melike Esma İlter, Lale Akarun, Arzucan Özgür
2019 27th Signal Processing and Communications Applications Conference (SIU)
Abstract
With the increasing availability of images on the web, identifying image-related sentences has become an important problem. This research area also matters to the news publishing community, where it supports automatic captioning of news images and summarization. Although a large body of research has been devoted to image captioning, it remains a challenging problem. Previous work on image captioning mostly focuses on generating new captions for images. This study addresses the problem of identifying image-related sentences in news articles for the first time. The approach is novel because we do not generate a caption from scratch; instead, we select the most appropriate set of sentences for the image from the news text itself. We used the CNN news dataset, which contains only the text of the news articles, as a basis and augmented it by collecting the images of the articles. We generated two-class ground truth for the images and sentences of the news articles by using the cosine similarity of TF-IDF and Word2Vec vectors and the SEMILAR sentence-to-sentence similarity method, respectively. The experimental results show that a Naive Bayes classifier with HOG feature selection gives better results.
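The ground-truth step mentioned in the abstract (cosine similarity over TF-IDF vectors, thresholded into two classes) can be illustrated with a minimal sketch. This is not the authors' implementation. The reference text that sentences are compared against (here, a caption-like string), the smoothed IDF formula, the function names, and the threshold of 0.2 are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a list of texts.

    Uses a smoothed IDF, log((1 + n) / (1 + df)) + 1; this particular
    variant is an assumption, not taken from the paper.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [
        {t: tf * idf[t] for t, tf in Counter(toks).items()}
        for toks in tokenized
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def label_sentences(reference, sentences, threshold=0.2):
    """Return (sentence, score, label) triples.

    label is 1 (related) when the TF-IDF cosine similarity to the
    reference text reaches the hypothetical threshold, else 0.
    """
    vectors = tfidf_vectors([reference] + list(sentences))
    ref_vec, sent_vecs = vectors[0], vectors[1:]
    results = []
    for sentence, vec in zip(sentences, sent_vecs):
        score = cosine(ref_vec, vec)
        results.append((sentence, score, int(score >= threshold)))
    return results


if __name__ == "__main__":
    reference = "The president speaks at a rally in Ohio"
    sentences = [
        "The president held a rally in Ohio on Tuesday.",
        "Stock markets fell sharply overnight.",
    ]
    for sentence, score, label in label_sentences(reference, sentences):
        print(f"{label}  {score:.3f}  {sentence}")
```

In this toy example the first sentence shares most of its vocabulary with the reference text and is labeled related, while the second shares none and is labeled unrelated. A Word2Vec-based variant would replace the sparse vectors with averaged word embeddings and keep the same cosine and threshold logic.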