Seminar: Machine Learning for Language and Vision


May 9, 2023. Our grading standard is online.

Apr 26, 2023. Our tentative schedule is online. Please contact us ASAP if you have any questions.

Apr 17, 2023. Our kickoff meeting is scheduled for April 18, 2023, from 12:15 to 13:45, in C7 3 - Seminarraum 1.12. Because demand is very high, we have added more papers to our list. Please attend the first two meetings to secure your spot. If you cannot attend but would still like to participate, please send us an email and we will arrange remote attendance for you.

Mar 27, 2023. If you are interested, please send Xudong Hong an email to register for this seminar.


Date | Topic | Paper Title | Presenter
May 9, 2023 | Contrastive Pre-training - Image | Learning transferable visual models from natural language supervision | Raj Mohan Tumarada
May 9, 2023 | Contrastive Pre-training - Image | BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation | Larisa Ivanova
May 16, 2023 | Seq2seq Pre-training - Image | Simple Visual Language Model Pretraining with Weak Supervision | Julian Schlenker
May 16, 2023 | Seq2seq Pre-training - Image | FLAVA: A foundational language and vision alignment model | Mehrad Zamani
May 23, 2023 | Pre-training - Video | MERLOT: Multimodal neural script knowledge models | Yage Zhang
May 23, 2023 | Pre-training - Video | MERLOT Reserve: Neural script knowledge through vision and language and sound | -
May 30, 2023 | Multitask Learning | Unifying Vision-and-Language Tasks via Text Generation | Nitish Juttu
May 30, 2023 | Multitask Learning | UniT: Multimodal multitask learning with a unified transformer | Zixuan Liu
June 6, 2023 | Multitask Learning | OFA: Unifying architectures, tasks, and modalities through a simple sequence-to-sequence learning framework | Jakob Gürtler
June 6, 2023 | Parameter Efficiency - Prompting | An empirical study of GPT-3 for few-shot knowledge-based VQA | Raphael Maximilian Stephan Maser
June 13, 2023 | Parameter Efficiency - Prompting | Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language | Karen Li
June 13, 2023 | Parameter Efficiency - Prompt Tuning | Multimodal few-shot learning with frozen language models | Muhammad Anas Tahir
June 20, 2023 | Parameter Efficiency - Prompt Tuning | Transitional adaptation of pretrained models for visual storytelling | Mahnoor Shahid
June 20, 2023 | Parameter Efficiency - Prefix-Tuning | HyperPELT: Unified parameter-efficient language model tuning for both language and vision-and-language tasks | Sijie Wu
June 27, 2023 | Parameter Efficiency - Prefix-Tuning | Visual Prompt Tuning | Prathvish Mithare
June 27, 2023 | Parameter Efficiency - Adapters | VL-Adapter: Parameter-efficient transfer learning for vision-and-language tasks | Muhammed Saeed
July 4, 2023 | Parameter Efficiency - Adapters | LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning | Rajkumar Anilkumar Vaghashiya
July 4, 2023 | Generative Model - Text-to-Image | High-resolution image synthesis with latent diffusion models | Shreyash Arya
July 11, 2023 | Generative Model - GPT | GPT-4, GPT-4 Technical Report | Abdul Rafay
July 11, 2023 | Generative Model - GPT | Visual ChatGPT: Talking, Drawing, and Editing with Visual Foundation Models | -
July 18, 2023 | Reinforcement Learning | No Metrics Are Perfect: Adversarial Reward Learning for Visual Storytelling | -
July 18, 2023 | Reinforcement Learning | What Makes a Good Story? Designing Composite Rewards for Visual Storytelling | -
July 25, 2023 | Summary | - | Xudong Hong and Ruitao Feng


Please find it here.


Please find them here.



  1. Kickoff Meeting
  2. Introduction Meeting


Students should have a basic understanding of deep learning, natural language processing and computer vision concepts. Students are expected to actively engage in discussions and critically analyze the papers presented during the seminar. They are also encouraged to share their own insights and perspectives on the topics covered.

Discussion Format

We will hold a group discussion on each paper. The assigned presenter first presents the paper, and then the other participants share their thoughts and insights on the research.

Date and Time

Every Tuesday, 12:15-13:45

Kickoff meeting: Apr 18, 2023

Location: Building C7 3 - Seminarraum 1.12


If you have any questions or concerns, please contact us via email. We look forward to seeing you at the discussion!

Xudong Hong:

Ruitao Feng:

(The following is under construction. Please stay tuned.)