Hi, I am now an Associate Professor with the Multimedia Institute of Tianjin University. I received the B.E. and Ph.D. degrees from Shanghai Jiao Tong University under the supervision of Prof. Xiaokang Yang and Prof. Guangtao Zhai. I was also supervised by Prof. Chang Wen Chen while visiting SUNY@Buffalo from 2014 to 2015. My research interests include image/video processing and cross-modal video content analysis.
I have contributed to deep learning methods for solving artificial intelligence challenges in real-world applications, including image enhancement, anomaly detection, recommendation systems, and cross-modal retrieval. My research has been published in top conferences and journals, including TIP, TCSVT, TMM, DSP, CVPR, and IFTC. My proposal received funding from the National Natural Science Foundation of China in 2023. I have served as the coordinator of the national key research and development program “Collaborative dissemination and recommendation technology of mainstream values”, the director of the Tianjin Natural Science Foundation project “Research on video image bit depth up conversion”, and the director of the National Natural Science Foundation project “Research on key technology of bit depth up conversion based on natural image characteristics”.
We are looking for self-motivated graduate students to work with me. Prospective students, please send your resume and transcript to my email.
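Our group is recruiting master's students for admission in fall 2024. Students applying through either recommendation-based admission or the graduate entrance examination are welcome to contact me by email!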
PhD in Computer Science, 2011-2017
Shanghai Jiao Tong University, China
Visiting Scholar in Computer Science, 2014-2015
State University of New York, U.S.
BSc in Computer Science and Technology, 2007-2011
Shanghai Jiao Tong University, China
Owing to its inherently dynamic nature and economical training cost, offline reinforcement learning (RL) is typically employed to implement an interactive recommender system (IRS). A crucial challenge in offline RL-based IRSs is the data sparsity issue, i.e., it is hard to mine user preferences well from the limited number of user-item interactions. In this paper, we propose a knowledge-enhanced causal reinforcement learning model (KCRL) to mitigate data sparsity in IRSs. We make technical extensions to the offline RL framework in terms of the reward function and state representation. Specifically, we first propose a group preference-injected causal user model (GCUM) to learn user satisfaction (i.e., reward) estimation. We introduce beneficial group preference information, namely, the group effect, via causal inference to compensate for incomplete user interests extracted from sparse data. Then, we learn the RL recommendation policy with the reward given by the GCUM. We propose a knowledge-enhanced state encoder (KSE) to generate knowledge-enriched user state representations at each time step, which is assisted by a self-constructed user-item knowledge graph.
Comprehensive understanding of video content requires both spatial and temporal localization. However, the field lacks a unified video action localization framework, which hinders its coordinated development. Existing 3D CNN methods take inputs of fixed and limited length, at the cost of ignoring temporally long-range cross-modal interaction. On the other hand, despite having a large temporal context, existing sequential methods often avoid dense cross-modal interactions for complexity reasons. To address this issue, in this paper, we propose a unified end-to-end framework that processes the whole video sequentially with long-range and dense visual-linguistic interaction. Specifically, a lightweight relevance filtering based transformer (Ref-Transformer) is designed, which is composed of relevance filtering based attention and a temporally expanded MLP. The text-relevant spatial regions and temporal clips in the video can be efficiently highlighted through the relevance filtering and then propagated across the whole video sequence with the temporally expanded MLP.
📙Scientific Computing Language
📘Big Data Analysis (in English)
📕Big Data Analysis
📓Advanced Course in Big Data Analysis (in English)
📗Multimedia Intelligent Analysis and Computing