About Me

I am an assistant professor in the Department of AI at Ewha Womans University. Before joining Ewha Womans University, I was a research scientist at NAVER AI Lab from Dec. 2021 to Feb. 2025. I received my PhD from Yonsei University, advised by Prof. Kwanghoon Sohn. Previously, I interned at Adobe Research in 2021, working with Justin Salamon and Dingzeyu Li, and collaborated with Daniel McDuff at Microsoft Research in 2020.

I am broadly interested in multimodal learning and computer vision. My main interests are audio-visual and vision-language models, generative modeling, and video understanding, though I am not limited to these. For more, check out my CV.

Recruiting Undergraduate Interns / Graduate Master & PhD Students: I am always looking for undergraduate interns and graduate students to collaborate with! If you are interested in doing cool multimodal learning research, please send me an email ({last_name}.{first_name}@ewha.ac.kr) introducing yourself and describing your research interests and experience.

News

04/2025, Multimodal AI Lab @ EWHA website is now open! 👋

02/2025, 1 paper is accepted at CVPR 2025.

02/2025, I will join the Dept. of AI, Ewha Womans University.

12/2024, Giving a talk at Postech AI day (topic: Read, Watch and Scream! Sound Generation from Text and Video).

12/2024, Giving a talk at HUST, Vietnam (topic: Audio Generation from Visual Contents).

12/2024, 1 paper is accepted at AAAI 2025.

10/2024, 1 paper is accepted at the NeurIPS 2024 Workshop on Video-Language Models.

09/2024, 1 paper is accepted in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).

09/2024, I start teaching a course, Topics in Artificial Intelligence: Multimodal Deep Learning Theories and Applications, at Seoul National University (Fall 2024).

06/2024, 1 paper is accepted in Pattern Recognition.

01/2024, 2 papers are accepted at ICLR 2024.

older news

09/2023, I start teaching a course, Topics in Artificial Intelligence: Multimodal Deep Learning Theories and Applications, at Seoul National University (Fall 2023).

07/2023, 2 papers are accepted at ICCV 2023.

04/2023, 1 paper is accepted at ICML 2023.

04/2023, 1 paper is accepted at CVPR Workshop 2023.

02/2023, 1 paper is accepted at CVPR 2023.

02/2023, 1 paper is accepted at ICASSP 2023.

11/2022, 1 paper is accepted at AAAI 2023.

10/2022, 1 paper is accepted at WACV 2023.

09/2022, 1 paper is accepted at NeurIPS 2022.

07/2022, 1 paper is accepted at ECCV 2022.

03/2022, 2 papers are accepted at CVPR 2022.

01/2022, 1 paper is accepted at ICASSP 2022.

01/2022, 1 paper is accepted at CLeaR 2022.

12/2021, I join the NAVER AI Lab.

10/2021, 1 paper is accepted at BMVC 2021.

05/2021, I start a remote internship in the Creative Intelligence Lab at Adobe Research.

05/2021, 1 paper is accepted at ICIP 2021.

03/2021, 2 papers are accepted at CVPR 2021.

07/2020, 1 paper is accepted at ECCV 2020.

05/2020, 1 paper is accepted in IEEE TIP.

01/2020, I will join the Human Understanding and Empathy Group at Microsoft Research, Redmond, United States, this year for a research internship. (Canceled due to COVID-19)

Publication

$^\star$ equal contribution, $^\dagger$ corresponding author(s)

Bootstrap Your Own Views: Masked Ego-Exo Modeling for Fine-grained View-invariant Video Representations
Jungin Park, Jiyoung Lee$^\dagger$, and Kwanghoon Sohn$^\dagger$
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2025.
Paper

Read, Watch and Scream! Sound Generation from Text and Video
Yujin Jeong, Yunji Kim, Sanghyuk Chun, and Jiyoung Lee$^\dagger$
NeurIPS Workshop on Video-Language Models, Dec 2024.
The 39th Annual AAAI Conference on Artificial Intelligence (AAAI), Feb 2025. Paper Project Code

Prototype-guided Attention Distillation for Discriminative Person Search
Hanjae Kim, Jiyoung Lee, and Kwanghoon Sohn$^\dagger$
IEEE Transactions on Pattern Analysis and Machine Intelligence, Sep 2024. Paper

Discriminative Action Tubelet Detector for Weakly-supervised Action Detection
Jiyoung Lee, Seungryong Kim, Sunok Kim, and Kwanghoon Sohn$^\dagger$
Pattern Recognition, Jun 2024. Paper

Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D Generation
Junyoung Seo$^\star$, Wooseok Jang$^\star$, Min-Seop Kwak$^\star$, Hyeonsu Kim, Jaehoon Ko, Junho Kim, Jin-Hwa Kim$^\dagger$, Jiyoung Lee$^\dagger$, and Seungryong Kim$^\dagger$
International Conference on Learning Representations (ICLR), May 2024. Preprint Code Project Demo

Bridging Vision and Language Spaces with Assignment Prediction
Jungin Park, Jiyoung Lee$^\dagger$, and Kwanghoon Sohn$^\dagger$
International Conference on Learning Representations (ICLR), May 2024. Preprint Code

Dense Text-to-Image Generation with Attention Modulation
Yunji Kim, Jiyoung Lee, Jin-Hwa Kim, Jung-Woo Ha, and Jun-Yan Zhu
IEEE/CVF International Conference on Computer Vision (ICCV), Oct 2023. Preprint Code Demo

Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning
Hanjae Kim, Jiyoung Lee, Seongheon Park, and Kwanghoon Sohn
IEEE/CVF International Conference on Computer Vision (ICCV), Oct 2023. Preprint Code

Robust Camera Pose Refinement for Multi-Resolution Hash Encoding
Hwan Heo, Taekyung Kim, Jiyoung Lee, Jaewon Lee, Soohyun Kim, Hyunwoo J Kim, and Jin-Hwa Kim
International Conference on Machine Learning (ICML), Jul 2023.
Preprint

Three Recipes for Better 3D Pseudo-GTs of 3D Human Mesh Estimation in the Wild
Gyeongsik Moon, Hongsuk Choi, Sanghyuk Chun, Jiyoung Lee, and Sangdoo Yun
IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop (CVPRW), Jun 2023.
Preprint Code

Dual-path Adaptation from Image to Video Transformers
JungIn Park$^\star$, Jiyoung Lee$^\star$, and Kwanghoon Sohn
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2023.
Preprint Code

Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech
Jiyoung Lee, Joon Son Chung, and Soo-Whan Chung
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Jun 2023. Preprint Project Code

MIDMs: Matching Interleaved Diffusion Models for Exemplar-based Image Translation
Junyoung Seo, Gyuseong Lee, Seokju Cho, Jiyoung Lee, and Seungryong Kim
Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI), Feb 2023.
Preprint Project Code

Language-free Training for Zero-shot Video Grounding
Dahye Kim, JungIn Park, Jiyoung Lee, Seongheon Park, and Kwanghoon Sohn
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Jan 2023.
Preprint

Mutual Information Divergence: A Unified Metric for Multimodal Generative Models
Jin-Hwa Kim, Yunji Kim, Jiyoung Lee, Kang Min Yoo, and Sang-Woo Lee
Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS), Nov 2022.
Preprint Code

PointFix: Learning to Fix Domain Bias for Robust Online Stereo Adaptation
Kwonyoung Kim, Jungin Park, Jiyoung Lee, Dongbo Min, and Kwanghoon Sohn
European Conference on Computer Vision (ECCV), Oct 2022.
Preprint

Pin the Memory: Learning to Generalize Semantic Segmentation
Jin Kim, Jiyoung Lee, Jungin Park, Dongbo Min$^\dagger$, and Kwanghoon Sohn$^\dagger$
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2022.
Paper Project

Probabilistic Representations for Video Contrastive Learning
Jungin Park, Jiyoung Lee, Ig-Jae Kim, and Kwanghoon Sohn
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2022.
Paper

Multi-domain Unsupervised Image-to-Image Translation with Appearance Adaptive Convolution
Somi Jeong, Jiyoung Lee, and Kwanghoon Sohn
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2022.
Preprint

CausalCity: Complex Simulations with Agency for Causal Discovery and Reasoning
Daniel McDuff, Yale Song, Jiyoung Lee, Vibhav Vineet, Sai Vemprala, Nicholas Alexander Gyde, Hadi Salman, Shuang Ma, Kwanghoon Sohn, and Ashish Kapoor
Causal Learning and Reasoning (CLeaR), Apr 2022.
Preprint Project

Wide and Narrow: Video Prediction from Context and Motion
Jaehoon Cho, Jiyoung Lee, Changjae Oh, Wonil Song, and Kwanghoon Sohn
British Machine Vision Conference (BMVC), Nov 2021.
Paper

Self-balanced Learning for Domain Generalization
Jin Kim, Jiyoung Lee, Jungin Park, Dongbo Min, and Kwanghoon Sohn
IEEE International Conference on Image Processing (ICIP), Sep 2021.
Paper

Looking into Your Speech: Learning Cross-modal Affinity for Audio-visual Speech Separation
Jiyoung Lee$^\star$, Soo-Whan Chung$^\star$, Sunok Kim, Hong-Goo Kang, and Kwanghoon Sohn
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2021.
Paper Project

Bridge to Answer: Structure-aware Graph Interaction Network for Video Question Answering
Jungin Park, Jiyoung Lee, and Kwanghoon Sohn
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2021.
Paper

SumGraph: Video summarization via Recursive Graph Modeling
Jungin Park$^\star$, Jiyoung Lee$^\star$, Ig-Jae Kim, and Kwanghoon Sohn
European Conference on Computer Vision (ECCV), Aug 2020.
Paper

Multi-modal Recurrent Attention Networks for Facial Expression Recognition
Jiyoung Lee, Sunok Kim, Seungryong Kim, and Kwanghoon Sohn
IEEE Transactions on Image Processing (TIP), Mar 2020. Paper

Video Summarization by Learning Relationships between Action and Scene
Jungin Park, Jiyoung Lee, Sangryul Jeon, and Kwanghoon Sohn
IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Oct 2019. (3rd Award)
Paper

Context-Aware Emotion Recognition Networks
Jiyoung Lee, Seungryong Kim, Sunok Kim, Jungin Park, and Kwanghoon Sohn
IEEE/CVF International Conference on Computer Vision (ICCV), Oct 2019.
Paper Project

Graph Regularization Network With Semantic Affinity for Weakly-supervised Temporal Action Localization
Jungin Park, Jiyoung Lee, Sangryul Jeon, Seungryong Kim, and Kwanghoon Sohn
IEEE International Conference on Image Processing (ICIP), Sep 2019.
Paper

Audio-Visual Attention Networks for Emotion Recognition
Jiyoung Lee, Sunok Kim, Seungryong Kim, and Kwanghoon Sohn
ACM Multimedia Workshop (MMW), Oct 2018.
Paper

Learning to Detect, Associate, and Recognize Human Actions and Surrounding Scenes in Untrimmed Videos
Jungin Park, Sangryul Jeon, Seungryong Kim, Jiyoung Lee, Sunok Kim, and Kwanghoon Sohn
ACM Multimedia Workshop (MMW), Oct 2018.
Paper

Spatiotemporal Attention Based Deep Neural Networks for Emotion Recognition
Jiyoung Lee, Sunok Kim, Seungryong Kim, and Kwanghoon Sohn
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr 2018.
Paper

Automatic 2D-to-3D Conversion using Multi-scale Deep Neural Network
Jiyoung Lee, Hyungjoo Jung, Youngjung Kim, and Kwanghoon Sohn
IEEE International Conference on Image Processing (ICIP), Sep 2017.
Paper

Preprint

Language-Guided Recursive Spatiotemporal Graph Modeling for Video Summarization
Jungin Park, Jiyoung Lee, and Kwanghoon Sohn$^\dagger$
International Journal of Computer Vision, Feb 2024 (under review).

Professional Service

Reviewer: ICML 2023-2025, AAAI 2025, SIGGRAPH 2024, NeurIPS 2023, ICCV 2023, CVPR 2022-2024, ICASSP 2023, ECCV 2022-2024, IEEE TPAMI, IEEE TIP, IEEE Access

Lecture

- “Intelligent Algorithm” and “Speech Recognition”, Ewha Womans University, Mar 2025.
- Topics in Artificial Intelligence: Multimodal Deep Learning Theories and Applications, Seoul National University, Fall 2024 (AI773).
- Topics in Artificial Intelligence: Multimodal Deep Learning Theories and Applications, Seoul National University, Fall 2023 (AI773).