About Me

I am a research scientist at NAVER AI Lab, South Korea. I am broadly interested in multimodal learning and computer vision. I especially enjoy human-centric research that involves interaction between humans and machines. My particular interests include, but are not limited to, video understanding, vision-language models, and generative modeling. For more, check out my CV.

I received my PhD from Yonsei University, advised by Prof. Kwanghoon Sohn. Previously, I interned at Adobe Research in 2021, working with Justin Salamon and Dingzeyu Li, and collaborated with Daniel McDuff at Microsoft Research in 2020.

Internship at NAVER AI Lab: I am always looking for interns to collaborate with! If you are interested in doing a cool multimodal learning project, please send me an email introducing yourself and describing your research interests and experience.


News

09/2023, I start teaching a lecture course, Topics in Artificial Intelligence: Multimodal Deep Learning Theories and Applications, at Seoul National University (Fall 2023).

07/2023, 2 papers are accepted at ICCV 2023.

04/2023, 1 paper is accepted at ICML 2023.

04/2023, 1 paper is accepted at CVPR Workshop 2023.

02/2023, 1 paper is accepted at CVPR 2023.

02/2023, 1 paper is accepted at ICASSP 2023.

11/2022, 1 paper is accepted at AAAI 2023.

10/2022, 1 paper is accepted at WACV 2023.

09/2022, 1 paper is accepted at NeurIPS 2022.

07/2022, 1 paper is accepted at ECCV 2022.

03/2022, 2 papers are accepted at CVPR 2022.

01/2022, 1 paper is accepted at ICASSP 2022.

01/2022, 1 paper is accepted at CLeaR 2022.

12/2021, I join NAVER AI Lab.

10/2021, 1 paper is accepted at BMVC 2021.

05/2021, I start a remote internship in the Creative Intelligence Lab at Adobe Research.

05/2021, 1 paper is accepted at ICIP 2021.

03/2021, 2 papers are accepted at CVPR 2021.

07/2020, 1 paper is accepted at ECCV 2020.

05/2020, 1 paper is accepted at IEEE TIP.

01/2020, I will join the Human Understanding and Empathy Group at Microsoft Research, Redmond, United States, this year for a research internship. (Canceled due to COVID-19)


Publications

(* equal contribution)

Dense Text-to-Image Generation with Attention Modulation
Yunji Kim, Jiyoung Lee, Jin-Hwa Kim, Jung-Woo Ha, and Jun-Yan Zhu
IEEE/CVF International Conference on Computer Vision (ICCV), Oct, 2023.


Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning
Hanjae Kim, Jiyoung Lee, Seongheon Park, and Kwanghoon Sohn
IEEE/CVF International Conference on Computer Vision (ICCV), Oct, 2023.


Robust Camera Pose Refinement for Multi-Resolution Hash Encoding
Hwan Heo, Taekyung Kim, Jiyoung Lee, Jaewon Lee, Soohyun Kim, Hyunwoo J Kim, and Jin-Hwa Kim
International Conference on Machine Learning (ICML), Jul, 2023.


Three Recipes for Better 3D Pseudo-GTs of 3D Human Mesh Estimation in the Wild
Gyeongsik Moon, Hongsuk Choi, Sanghyuk Chun, Jiyoung Lee, and Sangdoo Yun
IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Jun, 2023.


Dual-path Adaptation from Image to Video Transformers
JungIn Park*, Jiyoung Lee*, and Kwanghoon Sohn
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun, 2023.


Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech
Jiyoung Lee, Joon Son Chung, and Soo-Whan Chung
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Jun, 2023.


MIDMs: Matching Interleaved Diffusion Models for Exemplar-based Image Translation
Junyoung Seo, Gyuseong Lee, Seokju Cho, Jiyoung Lee, and Seungryong Kim
Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI), Feb, 2023.


Language-free Training for Zero-shot Video Grounding
Dahye Kim, JungIn Park, Jiyoung Lee, Seongheon Park, and Kwanghoon Sohn
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Jan, 2023.


Mutual Information Divergence: A Unified Metric for Multimodal Generative Models
Jin-Hwa Kim, Yunji Kim, Jiyoung Lee, Kang Min Yoo, and Sang-Woo Lee
Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS), Nov, 2022.


PointFix: Learning to Fix Domain Bias for Robust Online Stereo Adaptation
Kwonyoung Kim, Jungin Park, Jiyoung Lee, Dongbo Min, and Kwanghoon Sohn
European Conference on Computer Vision (ECCV), Oct, 2022.


Pin the Memory: Learning to Generalize Semantic Segmentation
Jin Kim, Jiyoung Lee, Jungin Park, Dongbo Min, and Kwanghoon Sohn
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun, 2022.


Probabilistic Representations for Video Contrastive Learning
Jungin Park, Jiyoung Lee, Ig-Jae Kim, and Kwanghoon Sohn
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun, 2022.


Multi-domain Unsupervised Image-to-Image Translation with Appearance Adaptive Convolution
Somi Jeong, Jiyoung Lee, and Kwanghoon Sohn
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May, 2022.


CausalCity: Complex Simulations with Agency for Causal Discovery and Reasoning
Daniel McDuff, Yale Song, Jiyoung Lee, Vibhav Vineet, Sai Vemprala, Nicholas Alexander Gyde, Hadi Salman, Shuang Ma, Kwanghoon Sohn, and Ashish Kapoor
Causal Learning and Reasoning (CLeaR), Apr, 2022.


Wide and Narrow: Video Prediction from Context and Motion
Jaehoon Cho, Jiyoung Lee, Changjae Oh, Wonil Song, and Kwanghoon Sohn
British Machine Vision Conference (BMVC), Nov, 2021.


Self-balanced Learning for Domain Generalization
Jin Kim, Jiyoung Lee, Jungin Park, Dongbo Min, and Kwanghoon Sohn
IEEE International Conference on Image Processing (ICIP), Sep, 2021.


Looking into Your Speech: Learning Cross-modal Affinity for Audio-visual Speech Separation
Jiyoung Lee*, Soo-Whan Chung*, Sunok Kim, Hong-Goo Kang, and Kwanghoon Sohn
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun, 2021.


Bridge to Answer: Structure-aware Graph Interaction Network for Video Question Answering
Jungin Park, Jiyoung Lee, and Kwanghoon Sohn
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun, 2021.


SumGraph: Video summarization via Recursive Graph Modeling
Jungin Park*, Jiyoung Lee*, Ig-Jae Kim, and Kwanghoon Sohn
European Conference on Computer Vision (ECCV), Aug, 2020.


Multi-modal Recurrent Attention Networks for Facial Expression Recognition
Jiyoung Lee, Sunok Kim, Seungryong Kim, and Kwanghoon Sohn
IEEE Transactions on Image Processing, Mar, 2020.


Video Summarization by Learning Relationships between Action and Scene
Jungin Park, Jiyoung Lee, Sangryul Jeon, and Kwanghoon Sohn
IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Oct, 2019. (3rd Award)


Context-Aware Emotion Recognition Networks
Jiyoung Lee, Seungryong Kim, Sunok Kim, Jungin Park, and Kwanghoon Sohn
IEEE/CVF International Conference on Computer Vision (ICCV), Oct, 2019.


Graph Regularization Network With Semantic Affinity for Weakly-supervised Temporal Action Localization
Jungin Park, Jiyoung Lee, Sangryul Jeon, Seungryong Kim, and Kwanghoon Sohn
IEEE International Conference on Image Processing (ICIP), Sep, 2019.


Audio-Visual Attention Networks for Emotion Recognition
Jiyoung Lee, Sunok Kim, Seungryong Kim, and Kwanghoon Sohn
ACM Multimedia Workshop (MMW), Oct, 2018.


Learning to Detect, Associate, and Recognize Human Actions and Surrounding Scenes in Untrimmed Videos
Jungin Park, Sangryul Jeon, Seungryong Kim, Jiyoung Lee, Sunok Kim, and Kwanghoon Sohn
ACM Multimedia Workshop (MMW), Oct, 2018.


Spatiotemporal Attention Based Deep Neural Networks for Emotion Recognition
Jiyoung Lee, Sunok Kim, Seungryong Kim, and Kwanghoon Sohn
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr, 2018.


Automatic 2D-to-3D Conversion using Multi-scale Deep Neural Network
Jiyoung Lee, Sunok Kim, Seungryong Kim, and Kwanghoon Sohn
IEEE International Conference on Image Processing (ICIP), Sep, 2017.


Preprints

Language-Guided Recursive Spatiotemporal Graph Modeling for Video Summarization
Jungin Park, Jiyoung Lee, and Kwanghoon Sohn
IEEE Transactions on Pattern Analysis and Machine Intelligence, Dec, 2022. (Under Review).

Learning Discriminative Action Tubelets for Weakly-Supervised Action Detection
Jiyoung Lee, Seungryong Kim, Sunok Kim, and Kwanghoon Sohn
Pattern Recognition, May, 2021. (Under Review).

Professional Service