Posts by Collection

portfolio

publications

Improving Face Recognition with Large Age Gaps by Learning to Distinguish Children

Published in BMVC, 2021

Jungsoo Lee*, Jooyeol Yun*, Sunghyun Park, Yonggyu Kim, Jaegul Choo

TL;DR

Despite the unprecedented improvement of face recognition, existing face recognition models still show considerably low performance in determining whether a pair of child and adult images belongs to the same identity. Previous approaches mainly focused on increasing the similarity between child and adult images of a given identity to overcome the discrepancy in facial appearance caused by aging. However, we observe that reducing the similarity between child images of different identities is crucial for learning distinct features among children and thus for improving face recognition performance on child-adult pairs. Based on this intuition, we propose a novel loss function called the Inter-Prototype loss, which minimizes the similarity between child images. Unlike previous studies, the Inter-Prototype loss does not require additional child images or additional learnable parameters. Our extensive experiments and in-depth analyses show that our approach outperforms existing baselines in face recognition with child-adult pairs.

[Paper]
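To make the core idea concrete, below is a minimal PyTorch-style sketch of an inter-prototype penalty that lowers the pairwise cosine similarity between child-identity prototypes. The function name, tensor shapes, and averaging scheme are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def inter_prototype_penalty(child_prototypes: torch.Tensor) -> torch.Tensor:
    """Average off-diagonal cosine similarity between child-identity prototypes.

    child_prototypes: (N, D) tensor holding one prototype vector per child identity
    (e.g., rows of a classifier weight matrix). Minimizing this value pushes the
    prototypes of different children apart. Illustrative sketch only.
    """
    protos = F.normalize(child_prototypes, dim=1)      # unit-length prototypes
    sim = protos @ protos.t()                          # (N, N) pairwise cosine similarity
    n = sim.size(0)
    off_diag = ~torch.eye(n, dtype=torch.bool, device=sim.device)
    return sim[off_diag].mean()                        # lower is better
```

A term like this would be added to the usual recognition loss so that, while child and adult images of the same identity are pulled together, the prototypes of different children are pushed apart.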

iColoriT: Towards Propagating Local Hint to the Right Region in Interactive Colorization by Leveraging Vision Transformer

Published in WACV, 2023

Jooyeol Yun*, Sanghyeon Lee*, Minho Park*, Jaegul Choo

TL;DR

Point-interactive image colorization aims to colorize grayscale images when a user provides the colors for specific locations. It is essential for point-interactive colorization methods to appropriately propagate user-provided colors (i.e., user hints) throughout the entire image to obtain a reasonably colorized image with minimal user effort. However, existing approaches often produce partially colorized results due to the inefficient design of stacking convolutional layers to propagate hints to distant relevant regions. To address this problem, we present iColoriT, a novel point-interactive colorization Vision Transformer capable of propagating user hints to relevant regions by leveraging the global receptive field of Transformers. The self-attention mechanism of Transformers enables iColoriT to selectively colorize relevant regions with only a few local hints. Our approach colorizes images in real time by utilizing pixel shuffling, an efficient upsampling technique that replaces the decoder architecture. In addition, to mitigate the artifacts caused by pixel shuffling with large upsampling ratios, we present the local stabilizing layer. Extensive quantitative and qualitative results demonstrate that our approach substantially outperforms existing methods for point-interactive colorization, producing accurately colorized images with minimal user effort.

[Paper]
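The pixel-shuffling upsampler mentioned above can be sketched in a few lines of PyTorch: each Vision Transformer token is linearly projected to a patch of chrominance values and rearranged with nn.PixelShuffle instead of being decoded by a stack of convolutions. The embedding size, patch size, and output channels below are assumed defaults for illustration, and the local stabilizing layer is omitted.

```python
import torch
import torch.nn as nn

class PixelShuffleHead(nn.Module):
    """Lightweight upsampling head (sketch): project each ViT token to a patch of
    chrominance values and rearrange with pixel shuffle instead of a conv decoder.
    Dimensions here are illustrative, not the paper's exact configuration."""

    def __init__(self, embed_dim=768, patch_size=16, out_channels=2):
        super().__init__()
        # each token predicts a patch_size x patch_size block of `out_channels` values
        self.proj = nn.Linear(embed_dim, out_channels * patch_size * patch_size)
        self.shuffle = nn.PixelShuffle(patch_size)

    def forward(self, tokens, h_patches, w_patches):
        # tokens: (B, N, D) with N = h_patches * w_patches
        x = self.proj(tokens)                                         # (B, N, C*p*p)
        x = x.transpose(1, 2).reshape(x.size(0), -1, h_patches, w_patches)
        return self.shuffle(x)                                        # (B, C, H, W)
```

Because the head is a single linear projection plus a fixed rearrangement, it adds little computation on top of the Transformer encoder, which helps explain how the approach can colorize in real time.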

Learning to Generate Semantic Layouts for Higher Text-Image Correspondence in Text-to-Image Synthesis

Published in ICCV, 2023

Minho Park*, Jooyeol Yun*, Seunghwan Choi, Jaegul Choo

TL;DR

Existing text-to-image generation approaches have set high standards for photorealism and text-image correspondence, largely benefiting from web-scale text-image datasets, which can include up to 5 billion pairs. However, text-to-image generation models trained on domain-specific datasets, such as urban scenes, medical images, and faces, still suffer from low text-image correspondence due to the lack of text-image pairs. Additionally, collecting billions of text-image pairs for a specific domain can be time-consuming and costly. Thus, ensuring high text-image correspondence without relying on web-scale text-image datasets remains a challenging task. In this paper, we present a novel approach for enhancing text-image correspondence by leveraging available semantic layouts. Specifically, we propose a Gaussian-categorical diffusion process that simultaneously generates both images and corresponding layout pairs. Our experiments reveal that we can guide text-to-image generation models to be aware of the semantics of different image regions by training the model to generate semantic labels for each pixel. We demonstrate that our approach achieves higher text-image correspondence than existing text-to-image generation approaches on the Multi-Modal CelebA-HQ and Cityscapes datasets, where text-image pairs are scarce.

[Paper]
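As a rough illustration of what a joint Gaussian-categorical forward (noising) process could look like, the hypothetical sketch below corrupts the image with Gaussian noise and the one-hot semantic layout with uniform categorical noise at a shared timestep. The schedule and corruption model follow standard DDPM/D3PM-style conventions and are assumptions, not necessarily the paper's exact process.

```python
import torch
import torch.nn.functional as F

def noisy_image_layout_pair(image, layout, t, alphas_cumprod, num_classes):
    """Forward-corrupt an (image, layout) pair at timestep t (illustrative sketch).

    image:  (B, 3, H, W) float tensor in [-1, 1]
    layout: (B, H, W) integer semantic labels
    t:      (B,) integer timesteps; alphas_cumprod: (T,) cumulative noise schedule
    """
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)        # \bar{alpha}_t, shape (B, 1, 1, 1)

    # Gaussian branch: q(x_t | x_0) = N(sqrt(a_bar) * x_0, (1 - a_bar) * I)
    noisy_image = a_bar.sqrt() * image + (1.0 - a_bar).sqrt() * torch.randn_like(image)

    # Categorical branch: keep the true label with probability a_bar,
    # otherwise resample uniformly over the num_classes semantic classes.
    onehot = F.one_hot(layout, num_classes).float()    # (B, H, W, K)
    probs = a_bar * onehot + (1.0 - a_bar) / num_classes
    noisy_layout = torch.multinomial(probs.reshape(-1, num_classes), 1).reshape(layout.shape)

    return noisy_image, noisy_layout
```

A single denoising network would then learn to reverse both corruptions jointly, which is how generating per-pixel semantic labels can make the model aware of the semantics of different image regions.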

talks

Talk 1 on Relevant Topic in Your Field

Published:

TL;DR

This is a description of your talk, which is a markdown file that can be markdown-ified like any other post. Yay markdown!

teaching

Teaching experience 1

Undergraduate course, University 1, Department, 2014

TL;DR

This is a description of a teaching experience. You can use markdown like any other post.

Teaching experience 2

Workshop, University 1, Department, 2015

TL;DR

This is a description of a teaching experience. You can use markdown like any other post.