I am a PhD student at the University of Chinese Academy of Sciences (UCAS), jointly trained with Harbin Institute of Technology, Shenzhen (HITSZ), advised by Prof. Zhuotao Tian (田倬韬). Previously, I received my Master's degree in Electronic Information from Kunming University of Science and Technology, and my Bachelor's degree in Communication Engineering from Lanzhou Jiaotong University.
My research interests lie in multimodal large models, GUI agents, open-vocabulary object detection, and multimodal retrieval. I have published papers at top-tier venues including ICML, ICLR, NeurIPS, CVPR, AAAI, ACM MM, and ICME, and received the ACM MM 2025 Outstanding Paper Award. I have over two years of internship experience at ByteDance, Tencent, Xiaoice, and Huawei, working on human motion generation, unknown task detection, multimodal perception, and multi-agent research and deployment. Algorithms from this work have been applied in DNF, Honor of Kings, TikTok, and other mainstream platforms.
I expect to graduate in 2027 and am seeking industrial research positions.
* denotes equal contribution; # denotes corresponding author. The full list is on Google Scholar.