About Me

I am a research scientist at NAVER AI Lab, South Korea. I am broadly interested in multimodal learning and computer vision. My main interests include, but are not limited to, video understanding, audio-visual and vision-language models, and generative modeling. For more, check out my CV.

I received my PhD from Yonsei University, advised by Prof. Kwanghoon Sohn. Previously, I interned at Adobe Research in 2021, working with Justin Salamon and Dingzeyu Li, and collaborated with Daniel McDuff at Microsoft Research in 2020.

Internship at NAVER AI Lab: I am always looking for interns to collaborate with! If you are interested in doing a cool multimodal learning project, please send me an email introducing yourself and describing your research interests and experience.


10/2024, 1 paper is accepted at the NeurIPS 2024 Workshop on Video-Language Models.

09/2024, 1 paper is accepted in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).

09/2024, I start teaching a course, Topics in Artificial Intelligence: Multimodal Deep Learning Theories and Applications, at Seoul National University (Fall 2024).

06/2024, 1 paper is accepted in Pattern Recognition.

01/2024, 2 papers are accepted at ICLR 2024.

09/2023, I start teaching a course, Topics in Artificial Intelligence: Multimodal Deep Learning Theories and Applications, at Seoul National University (Fall 2023).

07/2023, 2 papers are accepted at ICCV 2023.

04/2023, 1 paper is accepted at ICML 2023.

04/2023, 1 paper is accepted at CVPR Workshop 2023.

02/2023, 1 paper is accepted at CVPR 2023.

02/2023, 1 paper is accepted at ICASSP 2023.

11/2022, 1 paper is accepted at AAAI 2023.

10/2022, 1 paper is accepted at WACV 2023.

09/2022, 1 paper is accepted at NeurIPS 2022.

07/2022, 1 paper is accepted at ECCV 2022.

03/2022, 2 papers are accepted at CVPR 2022.

01/2022, 1 paper is accepted at ICASSP 2022.

01/2022, 1 paper is accepted at CLeaR 2022.

12/2021, I join the NAVER AI Lab.

10/2021, 1 paper is accepted at BMVC 2021.

05/2021, I start a remote internship in the Creative Intelligence Lab at Adobe Research.

05/2021, 1 paper is accepted at ICIP 2021.

03/2021, 2 papers are accepted at CVPR 2021.

07/2020, 1 paper is accepted at ECCV 2020.

05/2020, 1 paper is accepted in IEEE TIP.

01/2020, I will join the Human Understanding and Empathy Group at Microsoft Research, Redmond, United States, this year for a research internship. (Canceled due to COVID-19)


$^\star$ equal contribution, $^\dagger$ corresponding author(s)

Read, Watch and Scream! Sound Generation from Text and Video
Yujin Jeong, Yunji Kim, Sanghyuk Chun, Jiyoung Lee$\dagger$
NeurIPS Workshop on Video-Language Models, Dec 2024.

Prototype-guided Attention Distillation for Discriminative Person Search
Hanjae Kim, Jiyoung Lee, and Kwanghoon Sohn$\dagger$
IEEE Transactions on Pattern Analysis and Machine Intelligence, Sep 2024.

Discriminative Action Tubelet Detector for Weakly-supervised Action Detection
Jiyoung Lee, Seungryong Kim, Sunok Kim, and Kwanghoon Sohn$\dagger$
Pattern Recognition, Jun 2024.

Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D Generation
Junyoung Seo$\star$, Wooseok Jang$\star$, Min-Seop Kwak$\star$, Hyeonsu Kim, Jaehoon Ko, Junho Kim, Jin-Hwa Kim$\dagger$, Jiyoung Lee$\dagger$, and Seungryong Kim$\dagger$
International Conference on Learning Representations (ICLR), May 2024.

Bridging Vision and Language Spaces with Assignment Prediction
Jungin Park, Jiyoung Lee$\dagger$, and Kwanghoon Sohn$\dagger$
International Conference on Learning Representations (ICLR), May 2024.

Dense Text-to-Image Generation with Attention Modulation
Yunji Kim, Jiyoung Lee, Jin-Hwa Kim, Jung-Woo Ha, and Jun-Yan Zhu
IEEE/CVF International Conference on Computer Vision (ICCV), Oct 2023.

Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning
Hanjae Kim, Jiyoung Lee, Seongheon Park, and Kwanghoon Sohn
IEEE/CVF International Conference on Computer Vision (ICCV), Oct 2023.

Robust Camera Pose Refinement for Multi-Resolution Hash Encoding
Hwan Heo, Taekyung Kim, Jiyoung Lee, Jaewon Lee, Soohyun Kim, Hyunwoo J. Kim, and Jin-Hwa Kim
International Conference on Machine Learning (ICML), Jul 2023.

Three Recipes for Better 3D Pseudo-GTs of 3D Human Mesh Estimation in the Wild
Gyeongsik Moon, Hongsuk Choi, Sanghyuk Chun, Jiyoung Lee, and Sangdoo Yun
IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop (CVPRW), Jun 2023.

Dual-path Adaptation from Image to Video Transformers
Jungin Park$\star$, Jiyoung Lee$\star$, and Kwanghoon Sohn
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2023.

Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech
Jiyoung Lee, Joon Son Chung, and Soo-Whan Chung
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Jun 2023.

MIDMs: Matching Interleaved Diffusion Models for Exemplar-based Image Translation
Junyoung Seo, Gyuseong Lee, Seokju Cho, Jiyoung Lee, and Seungryong Kim
Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI), Feb 2023.

Language-free Training for Zero-shot Video Grounding
Dahye Kim, Jungin Park, Jiyoung Lee, Seongheon Park, and Kwanghoon Sohn
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Jan 2023.

Mutual Information Divergence: A Unified Metric for Multimodal Generative Models
Jin-Hwa Kim, Yunji Kim, Jiyoung Lee, Kang Min Yoo, and Sang-Woo Lee
Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS), Nov 2022.

PointFix: Learning to Fix Domain Bias for Robust Online Stereo Adaptation
Kwonyoung Kim, Jungin Park, Jiyoung Lee, Dongbo Min, and Kwanghoon Sohn
European Conference on Computer Vision (ECCV), Oct 2022.

Pin the Memory: Learning to Generalize Semantic Segmentation
Jin Kim, Jiyoung Lee, Jungin Park, Dongbo Min$\dagger$, and Kwanghoon Sohn$\dagger$
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2022.

Probabilistic Representations for Video Contrastive Learning
Jungin Park, Jiyoung Lee, Ig-Jae Kim, and Kwanghoon Sohn
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2022.

Multi-domain Unsupervised Image-to-Image Translation with Appearance Adaptive Convolution
Somi Jeong, Jiyoung Lee, and Kwanghoon Sohn
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2022.

CausalCity: Complex Simulations with Agency for Causal Discovery and Reasoning
Daniel McDuff, Yale Song, Jiyoung Lee, Vibhav Vineet, Sai Vemprala, Nicholas Alexander Gyde, Hadi Salman, Shuang Ma, Kwanghoon Sohn, and Ashish Kapoor
Causal Learning and Reasoning (CLeaR), Apr 2022.

Wide and Narrow: Video Prediction from Context and Motion
Jaehoon Cho, Jiyoung Lee, Changjae Oh, Wonil Song, and Kwanghoon Sohn
British Machine Vision Conference (BMVC), Nov 2021.

Self-balanced Learning for Domain Generalization
Jin Kim, Jiyoung Lee, Jungin Park, Dongbo Min, and Kwanghoon Sohn
IEEE International Conference on Image Processing (ICIP), Sep 2021.

Looking into Your Speech: Learning Cross-modal Affinity for Audio-visual Speech Separation
Jiyoung Lee$\star$, Soo-Whan Chung$\star$, Sunok Kim, Hong-Goo Kang, and Kwanghoon Sohn
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2021.

Bridge to Answer: Structure-aware Graph Interaction Network for Video Question Answering
Jungin Park, Jiyoung Lee, and Kwanghoon Sohn
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2021.

SumGraph: Video summarization via Recursive Graph Modeling
Jungin Park$\star$, Jiyoung Lee$\star$, Ig-Jae Kim, and Kwanghoon Sohn
European Conference on Computer Vision (ECCV), Aug 2020.

Multi-modal Recurrent Attention Networks for Facial Expression Recognition
Jiyoung Lee, Sunok Kim, Seungryong Kim, and Kwanghoon Sohn
IEEE Transactions on Image Processing, Mar 2020.

Video Summarization by Learning Relationships between Action and Scene
Jungin Park, Jiyoung Lee, Sangryul Jeon, and Kwanghoon Sohn
IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Oct 2019. (3rd Award)

Context-Aware Emotion Recognition Networks
Jiyoung Lee, Seungryong Kim, Sunok Kim, Jungin Park, and Kwanghoon Sohn
IEEE/CVF International Conference on Computer Vision (ICCV), Oct 2019.

Graph Regularization Network With Semantic Affinity for Weakly-supervised Temporal Action Localization
Jungin Park, Jiyoung Lee, Sangryul Jeon, Seungryong Kim, and Kwanghoon Sohn
IEEE International Conference on Image Processing (ICIP), Sep 2019.

Audio-Visual Attention Networks for Emotion Recognition
Jiyoung Lee, Sunok Kim, Seungryong Kim, and Kwanghoon Sohn
ACM Multimedia Workshop (MMW), Oct 2018.

Learning to Detect, Associate, and Recognize Human Actions and Surrounding Scenes in Untrimmed Videos
Jungin Park, Sangryul Jeon, Seungryong Kim, Jiyoung Lee, Sunok Kim, and Kwanghoon Sohn
ACM Multimedia Workshop (MMW), Oct 2018.

Spatiotemporal Attention Based Deep Neural Networks for Emotion Recognition
Jiyoung Lee, Sunok Kim, Seungryong Kim, and Kwanghoon Sohn
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr 2018.

Automatic 2D-to-3D Conversion using Multi-scale Deep Neural Network
Jiyoung Lee, Sunok Kim, Seungryong Kim, and Kwanghoon Sohn
IEEE International Conference on Image Processing (ICIP), Sep 2017.


Language-Guided Recursive Spatiotemporal Graph Modeling for Video Summarization
Jungin Park, Jiyoung Lee, and Kwanghoon Sohn$\dagger$
International Journal of Computer Vision, Feb 2024 (under review).

Professional Service