Seminar: Machine Learning for Language and Vision
News
May 9, 2023. Our grading standard is online.
Apr 26, 2023. Our tentative schedule is online. Please contact us ASAP if you have any questions.
Apr 17, 2023. Our kickoff meeting is scheduled for April 18, 2023, from 12:15 to 13:45, in C7 3 - Seminarraum 1.12. Because demand is very high, we have added more papers to our list. Please attend the first two meetings to secure your spot. If you are unable to attend but would still like to participate, please send us an email and we will arrange remote attendance for you.
Mar 27, 2023. If you are interested, please send Xudong Hong an email to register for this seminar.
Schedule
Date | Topic | Paper Title | Presenter
May 9, 2023 | Contrastive Pre-training - Image | Learning transferable visual models from natural language supervision | Raj Mohan Tumarada
May 9, 2023 | Contrastive Pre-training - Image | BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation | Larisa Ivanova
May 16, 2023 | Seq2seq Pre-training - Image | Simple Visual Language Model Pretraining with Weak Supervision | Julian Schlenker
May 16, 2023 | Seq2seq Pre-training - Image | FLAVA: A foundational language and vision alignment model | Mehrad Zamani
May 23, 2023 | Pre-training - Video | MERLOT: Multimodal neural script knowledge models | Yage Zhang
May 23, 2023 | Pre-training - Video | MERLOT Reserve: Neural script knowledge through vision and language and sound | -
May 30, 2023 | Multitask Learning | Unifying Vision-and-Language Tasks via Text Generation | Nitish Juttu
May 30, 2023 | Multitask Learning | UniT: Multimodal multitask learning with a unified transformer | Zixuan Liu
June 6, 2023 | Multitask Learning | OFA: Unifying architectures, tasks, and modalities through a simple sequence-to-sequence learning framework | Jakob Gürtler
June 6, 2023 | Parameter Efficiency - Prompting | An empirical study of GPT-3 for few-shot knowledge-based VQA | Raphael Maximilian Stephan Maser
June 13, 2023 | Parameter Efficiency - Prompting | Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language | Karen Li
June 13, 2023 | Parameter Efficiency - Prompt Tuning | Multimodal few-shot learning with frozen language models | Muhammad Anas Tahir
June 20, 2023 | Parameter Efficiency - Prompt Tuning | Transitional adaptation of pretrained models for visual storytelling | Mahnoor Shahid
June 20, 2023 | Parameter Efficiency - Prefix-Tuning | HyperPELT: Unified parameter-efficient language model tuning for both language and vision-and-language tasks | Sijie Wu
June 27, 2023 | Parameter Efficiency - Prefix-Tuning | Visual Prompt Tuning | Prathvish Mithare
June 27, 2023 | Parameter Efficiency - Adapters | VL-Adapter: Parameter-efficient transfer learning for vision-and-language tasks | Muhammed Saeed
July 4, 2023 | Parameter Efficiency - Adapters | LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning | Rajkumar Anilkumar Vaghashiya
July 4, 2023 | Generative Model - Text-to-Image | High-resolution image synthesis with latent diffusion models | Shreyash Arya
July 11, 2023 | Generative Model - GPT | GPT-4, GPT-4 Technical Report | Abdul Rafay
July 11, 2023 | Generative Model - GPT | Visual ChatGPT: Talking, Drawing, and Editing with Visual Foundation Models | -
July 18, 2023 | Reinforcement Learning | No Metrics Are Perfect: Adversarial Reward Learning for Visual Storytelling | -
July 18, 2023 | Reinforcement Learning | What Makes a Good Story? Designing Composite Rewards for Visual Storytelling | -
July 25, 2023 | Summary | | Xudong Hong and Ruitao Feng
Introduction
Please find it here.
Topics
Please find them here.
Grading
- 10% draft presentation (due each Wednesday)
- 10% questions about the papers (due each Friday)
- 35% final talk
- 5% attendance at all talks and giving feedback
- 5% discussion with the other participants during the talks
- 35% term paper on your understanding of the paper (5 pages, using the ACL 2023 template)
- (optional) 10% Demo
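As a rough illustration of how these weights combine, here is a minimal sketch. It assumes each component is scored as a fraction between 0 and 1, that the optional demo counts as a bonus on top of the required 100%, and that the total is capped at 100%. None of these assumptions is stated on this page, and the names `WEIGHTS`, `DEMO_BONUS`, and `weighted_score` are hypothetical; the official grading standard takes precedence.

```python
# Weights copied from the grading list above (they sum to 100%).
WEIGHTS = {
    "draft_presentation": 0.10,
    "questions": 0.10,
    "final_talk": 0.35,
    "attendance_feedback": 0.05,
    "discussion": 0.05,
    "term_paper": 0.35,
}

# Assumption: the optional demo is a bonus added on top of the required parts.
DEMO_BONUS = 0.10


def weighted_score(scores, demo=None):
    """Combine per-component scores (fractions in [0, 1]) into one total.

    `scores` maps every key in WEIGHTS to a fraction; `demo` is the
    optional demo score, or None if no demo was given.
    """
    missing = [name for name in WEIGHTS if name not in scores]
    if missing:
        raise KeyError(f"missing component scores: {missing}")

    total = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    if demo is not None:
        total += DEMO_BONUS * demo

    # Assumption: the total cannot exceed 100%.
    return min(total, 1.0)
```

For example, under these assumptions a student who scores 0.8 on every required component and gives a full-marks demo would end up with a total of about 0.9.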
Presentations
- Kickoff Meeting
- Introduction Meeting
Requirements
Students should have a basic understanding of deep learning, natural language processing, and computer vision concepts. Students are expected to actively engage in discussions and critically analyze the papers presented during the seminar. They are also encouraged to share their own insights and perspectives on the topics covered.
Each paper is first presented by its assigned participant, followed by a group discussion in which the other participants share their thoughts and insights on the research.
Date and Time
Every Tuesday, 12:15-13:45
Kickoff meeting on Apr 18, 2023
Location: Gebäude C7 3 - Seminarraum 1.12
If you have any questions or concerns, please contact us via email. We look forward to seeing you at the discussion!
Xudong Hong: xhong@coli.uni-saarland.de
Ruitao Feng: fruitao@coli.uni-saarland.de
(The following is under construction. Please stay tuned.)