
Introduction to Professor Jiaying Liu and Her Lab at Peking University

  1. Could you briefly introduce yourself (and your University/Lab)?

I am currently an Associate Professor and Boya Young Fellow at Peking University. I received the Ph.D. degree (Hons.) in computer science from Peking University, Beijing, China, in 2010. I have authored over 100 technical articles in refereed journals and proceedings and hold 50 granted patents. My current research interests include multimedia signal processing, compression, and computer vision.

I am a Senior Member of IEEE, CSIG and CCF. I was a Visiting Scholar with the University of Southern California, Los Angeles, from 2007 to 2008, and a Visiting Researcher with Microsoft Research Asia in 2015, supported by the Star Track Young Faculties Award. I received the IEEE ICME-2020 Best Paper Award and the IEEE MMSP-2015 Top 10% Paper Award. I have served as a member of the MSA TC and VSPC TC in IEEE CASS and of the IEEE ICME Steering Committee. I have also served as an Associate Editor of IEEE TIP, IEEE TCSVT and Elsevier JVCI, as Technical Program Chair of IEEE ICME-2021/ACM ICMR-2021/IEEE VCIP-2019, and as Area Chair of CVPR-2021/ECCV-2020/ICCV-2019. I was an APSIPA Distinguished Lecturer (2016-2017).

  2. What have been your most significant research contributions up to now?

My work focuses on image/video restoration, stylization/translation, and analysis/understanding.

For image/video enhancement, we were one of the first teams to develop deep-learning deraining methods. Our CVPR’17 and TPAMI’21 papers are among the pioneering works on deep-learning (DL) based single-image deraining, and our CVPR’18 and TIP’18 papers opened a new door to DL-based video deraining. In 2020, we also conducted a comprehensive survey of the deraining field, published in TPAMI’21. Recently, we have been working on unsupervised self-learned deraining, which frees deraining entirely from its reliance on synthetic data and moves the field into the data-free age. This work has the potential to profoundly influence and reshape the whole field in the future.

For image stylization, we put forward a new research problem of text effect transfer and artistic text generation, which is of high application value in the design industry and of research value in style transfer and disentangled feature learning. Our CVPR’17 and TIP’19 papers propose the first frameworks for supervised text effect transfer and unsupervised artistic text generation, respectively. Our AAAI’19 and TPAMI’20 papers benchmark text effect transfer with a large-scale text effects dataset constructed to promote the development of this problem. Recently, we explored the potential of dynamic artistic text generation in ICCV’19 and TPAMI’21.

For action analysis/understanding, we mainly focus on skeleton-based action analytics. Our spatio-temporal attention model, published in AAAI’17 and TIP’18, has attracted much attention in the area of action recognition and detection. Our TOMM’20 paper presents PKU-MMD, currently the largest multi-modal video dataset, collected by ourselves to further facilitate the research community. Recently, we have been working on self-supervised learning for action recognition to get rid of tedious annotation (ACM MM’20). Besides, our CVPR’21 paper proposes an elegant framework for referring expression comprehension in videos, which is a step toward exploring the connection between CV and NLP in video analytics.

  3. What problems in your research field deserve more attention (or what problems would you like to solve) in the next few years, and why?

1) Unsupervised restoration/enhancement. Most existing methods heavily rely on synthetic paired data, which can leave large gaps with real applications. It will be desirable to develop unsupervised or semi-supervised methods that generalize well to real applications.

2) Restoration/enhancement for machines. Despite the rapid growth of work on rain removal, it is still challenging to measure whether a method is sufficiently effective. Existing quality assessment methods are still far from capturing the real visual perception of humans. Thus, this is a promising direction that deserves more attention from the community.

3) Few-shot text effect transfer. Due to the high regularity and diversity of text effects, style transfer models still require a large number of style reference images to learn to render plausible text effects. It will be desirable to leverage the unique characteristics of text to simulate the design process of humans and realize style transfer with limited style references.

4) Action analytics based on noisy skeletons. Previous works on skeleton-based action recognition were developed and evaluated with skeletons captured in the laboratory. However, skeletons in real scenarios contain much noise caused by various factors, leading to performance degradation. There is a lack of exploration of skeleton degradation modeling. Besides, dealing with noisy skeletons in the real world is essential for applications of skeleton-based action analytics.