Multimodal Information Retrieval


The multimodal information retrieval problem is an information retrieval task in which queries and target database elements are represented multimodally; that is, each element is described by information obtained from one or more sensors. A multimodal information retrieval system aims to return an ordered list of the target database elements that are relevant to a given query. The primary challenge in this type of system is the difficulty of comparing incompatible representations from different modalities. To address this challenge, known as “the media gap,” several strategies have been developed, such as learning a mapping from each modality to a shared space (a common representation) or learning a distance function that measures the similarity between different modalities. Multimodal information retrieval also takes advantage of the additional information that multimodal data provide, combining the modalities through data fusion techniques to improve retrieval performance.
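The shared-space strategy described above can be illustrated with a minimal sketch. It assumes that a linear projection matrix for each modality has already been learned (here the matrices are simply hard-coded), and it ranks database elements by cosine similarity to the projected query. All names, dimensions, and values below (TEXT_TO_SHARED, IMAGE_TO_SHARED, IMAGES, the toy feature vectors) are hypothetical and chosen only for illustration; they do not come from any of the cited methods.

```python
import math

def project(vec, matrix):
    """Map a modality-specific feature vector into the shared space (matrix times vec)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank(query_vec, query_proj, database, db_proj):
    """Return database ids ordered from most to least similar to the query."""
    q = project(query_vec, query_proj)
    scored = [(cosine(q, project(feat, db_proj)), item_id) for item_id, feat in database]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in scored]

# Hypothetical toy setup: 3-d text features, 2-d image features, 2-d shared space.
# In practice the projections would be learned from paired multimodal data.
TEXT_TO_SHARED = [[1.0, 0.0, 0.0],
                  [0.0, 1.0, 1.0]]
IMAGE_TO_SHARED = [[1.0, 0.0],
                   [0.0, 1.0]]
IMAGES = [("img_a", [0.9, 0.1]),
          ("img_b", [0.1, 0.9]),
          ("img_c", [0.5, 0.5])]

# A text query retrieves images, even though the raw features are incompatible.
ranking = rank([1.0, 0.0, 0.0], TEXT_TO_SHARED, IMAGES, IMAGE_TO_SHARED)
print(ranking)
```

Because both modalities are compared only after projection, the raw text and image features never need to share a dimensionality. The alternative strategy of learning a distance function would replace `cosine` with a learned cross-modal similarity instead of fixing the metric in advance.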


Victor Hugo Contreras Ordoñez

Juan Sebastián Lara Ramírez

Johan David Rodríguez Portela


Caicedo, J. C., BenAbdallah, J., Gonzalez, F. A., & Nasraoui, O. (2012). Multimodal representation, indexing, automated annotation and retrieval of image collections via non-negative matrix factorization. Neurocomputing, 76(1), 50-60.
Müller, H., Clough, P., Deselaers, T., & Caputo, B. (Eds.). (2010). ImageCLEF: Experimental evaluation in visual information retrieval. The Information Retrieval Series, 32, 1-554.
Wang, D., Cui, P., Ou, M., & Zhu, W. (2015, July). Deep Multimodal Hashing with Orthogonal Regularization. In IJCAI (Vol. 367, pp. 2291-2297).
Mourão, A., Martins, F., & Magalhães, J. (2015). Multimodal medical information retrieval with unsupervised rank fusion. Computerized Medical Imaging and Graphics, 39, 35-45.
Wang, D., Gao, X., Wang, X., & He, L. (2015, July). Semantic Topic Multimodal Hashing for Cross-Media Retrieval. In IJCAI (pp. 3890-3896).
Pereira, J. C., Coviello, E., Doyle, G., Rasiwasia, N., Lanckriet, G. R., Levy, R., & Vasconcelos, N. (2014). On the role of correlation and abstraction in cross-modal multimedia retrieval. IEEE transactions on pattern analysis and machine intelligence, 36(3), 521-535.