Abstract

Multimodal data is pervasive in applications such as e-commerce product listings, social media posts, and short videos. However, existing algorithms for such data still largely focus on uni-modal representation learning, via vision-language alignment and cross-modal retrieval. In this workshop, we aim to introduce a new retrieval problem in which both queries and documents are multimodal. With the rise of vision-language modeling, large language models (LLMs), retrieval-augmented generation (RAG), and multimodal LLMs, we see many new opportunities for multimodal representation and retrieval tasks. This event will be a comprehensive half-day workshop on the subject of multimodal representation and retrieval. The agenda includes keynote speeches, oral presentations, and an interactive panel discussion.

Call for Papers


Our objective with this workshop is to capture the interest of researchers in this emerging challenge. We anticipate that the workshop will serve as a catalyst for establishing a dedicated community around this topic. By highlighting the novelty and significance of the problem, we aim to attract researchers who are eager to explore and contribute to this field. We invite original research and industrial application papers on multimodal data representation and retrieval.

Submission Guidelines

Submissions of short papers must be in English, in PDF format, and at most 4 pages in length (including figures, tables, proofs, appendices, acknowledgments, and all other content except references), with unrestricted space for references, in the current ACM two-column conference format. Suitable LaTeX, Word, and Overleaf templates are available from the ACM website (use the "sigconf" proceedings template for LaTeX and the Interim Template for Word). ACM CCS concepts and keywords are required for review.


For LaTeX, the following should be used:

\documentclass[sigconf,natbib=true,anonymous=true]{acmart}
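
For context, here is a minimal skeleton (illustrative only, not an official template) showing how these class options fit together with the required CCS concepts and keywords in an anonymous acmart submission. The title, author, and concept IDs are placeholders; the real CCSXML block should be generated with the ACM CCS tool and pasted in.

% Minimal illustrative skeleton for an anonymous short-paper submission.
% Placeholders throughout; generate the real CCSXML at the ACM CCS tool.
\documentclass[sigconf,natbib=true,anonymous=true]{acmart}

\begin{document}

\title{Your Paper Title}

\author{Your Name} % concealed from reviewers by anonymous=true
\affiliation{%
  \institution{Your Institution}
  \country{Your Country}}

\begin{abstract}
A one-paragraph summary of the contribution.
\end{abstract}

% CCS concepts and keywords are required for review.
\begin{CCSXML}
<ccs2012>
  <concept>
    <concept_id>00000000.0000000</concept_id> % placeholder ID
    <concept_desc>Information systems~Information retrieval</concept_desc>
    <concept_significance>500</concept_significance>
  </concept>
</ccs2012>
\end{CCSXML}
\ccsdesc[500]{Information systems~Information retrieval}
\keywords{multimodal representation, multimodal retrieval}

\maketitle

\section{Introduction}
...

\bibliographystyle{ACM-Reference-Format}
\bibliography{references}

\end{document}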

Submissions must be anonymous and should be submitted electronically via EasyChair:


https://easychair.org/conferences/?conf=mrr2024

Important dates for submissions to MRR 2024

Topics include but are not limited to

Accepted Papers

Kang Zhao, Xinyu Zhao, Zhipeng Jin, Yi Yang, Xuewu Jiao, Wen Tao, Yafei Li, Cong Han, Shuanglong Li, and Lin Liu. Image Captioning for Baidu Ad Image Generation with Multi-Stage Refinements.

Jing Zhu, Xiang Song, Vassilis Ioannidis, Danai Koutra, and Christos Faloutsos. Improving Feature Representation through Graph-Centric Finetuning.

Wenliang Zhong, Wenyi Wu, Qi Li, Rob Barton, Boxin Du, Karim Bouyarmane, Shioulin Sam, Ismail Tutar, and Junzhou Huang. Enhancing Multimodal Large Language Models with Multi-instance Visual Prompt Generator for Visual Representation Enrichment.

Kevin Dela Rosa. Video Enriched Retrieval Augmented Generation Using Aligned Video Captions.

Mingwei Tang, Meng Liu, Hong Li, Junjie Yang, Chenglin Wei, Boyang Li, Dai Li, Rengan Xu, Yifan Xu, Zehua Zhang, Xiangyu Wang, Linfeng Liu, Yuelei Xie, Chengye Liu, Labib Fawaz, Li Li, Hongnan Wang, Bill Zhu, and Sri Reddy. Async Learned User Embeddings for Ads Delivery Optimization.

Sarthak Srivastava and Kathy Wu. Vision-Language Understanding in Hyperbolic Space.

Program


Time                 | Activity                           | Host
9:00 AM - 9:05 AM    | Opening Remarks                    | Doug Gray
9:05 AM - 9:35 AM    | Keynote Address by Hamed Zamani    | Doug Gray
9:35 AM - 10:35 AM   | Oral Presentations                 | Xinliang Zhu
10:35 AM - 10:45 AM  | Coffee Break                       | -
10:45 AM - 11:15 AM  | Keynote Address by Dinesh Manocha  | Arnab Dhua
11:15 AM - 11:45 AM  | Panel Discussion                   | Arnab Dhua
11:45 AM - 11:50 AM  | Closing Remarks                    | Xinliang Zhu
11:50 AM - 12:15 PM  | Networking                         | -

Speakers

Panel

Organizers