Bin Kang

康斌
PhD Student
University of Chinese Academy of Sciences
(Joint training with Harbin Institute of Technology, Shenzhen)
Advisor: Prof. Zhuotao Tian (田倬韬)
Shenzhen, China

Biography

I am a PhD student at the University of Chinese Academy of Sciences (UCAS), in a joint training program with Harbin Institute of Technology, Shenzhen (HITSZ), where I am advised by Prof. Zhuotao Tian (田倬韬). I received my Master's degree in Electronic Information from Kunming University of Science and Technology and my Bachelor's degree in Communication Engineering from Lanzhou Jiaotong University.

My research interests lie in multimodal large models, GUI agents, open-vocabulary object detection, and multimodal retrieval. I have published papers at top-tier venues including ICML, ICLR, NeurIPS, CVPR, AAAI, ACM MM, and ICME, and received the ACM MM 2025 Outstanding Paper Award. I have over two years of internship experience at ByteDance, Tencent, Xiaoice, and Huawei, where I worked on human motion generation, unknown task detection, multimodal perception, and multi-agent research and deployment. Algorithms from this work have been applied to DNF, Honor of Kings, TikTok, and other mainstream platforms.

I expect to graduate in 2027 and am looking for industrial research positions!

Multimodal Large Models · GUI-Agent · Open-Vocabulary Object Detection · Multimodal Retrieval

Education

University of Chinese Academy of Sciences

PhD in Computer Science and Technology (Joint training with HIT Shenzhen)
2023.09 - Present

Kunming University of Science and Technology

Master's in Electronic Information
2020.08 - 2023.06

Lanzhou Jiaotong University

Bachelor's in Communication Engineering
2016.08 - 2020.06

Publications

* denotes equal contribution, # denotes corresponding author. Full list on Google Scholar.

AgentSteer TTS
AgentSteerTTS: A Multi-Agent Closed-Loop Framework for Composite-Instruction Text-to-Speech
Bin Kang, Shaoguo Wen, Yang Fan, Shunlong Wu, Junjie Wang, Yulin Li, Jing Zhao, Ju Wang, Zhiyong Tian

LongHorizon UI
LongHorizonUI: A Unified Framework for Robust Long-Horizon Task Automation of GUI Agent
Bin Kang, Shaoguo Wen, Yifei Bi, Shunlong Wu, Xinbin Yuan, Rui Shao, et al.

Less is More
Less is More, but Where? Dynamic Token Compression via LLM-Guided Keyframe Prior
Yulin Li, Hao Gui, Zhuo Fan, Junjie Wang, Bin Kang, Boce Chen, Zhiyong Tian

DeClip
DeClip: Decoupled Learning for Open-Vocabulary Dense Perception
Junjie Wang, Boce Chen, Yulin Li, Bin Kang, Yichen Chen, Zhiyong Tian

OV-DQUO
OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects Supervision
Junjie Wang, Boce Chen, Bin Kang, Yulin Li, Wei Xian, Yichen Chen, Yao Xu

CalibClip
CalibClip: Contextual Calibration of Dominant Semantics for Text-Driven Image Retrieval
Bin Kang, Boce Chen, Junjie Wang, Yulin Li, Jing Zhao, Ju Wang, Zhiyong Tian

DASS-Net
DASS-Net: Degradation-Adaptive Semi-Symmetric Network for Robust Image Hiding
Jing Zhao, Hao He, Fan Chen, Liang Qu, Bin Kang

Multi-Attr VL
Multi-Attribute Consistency Driven Visual Language Framework for Surface Defect Detection
Bin Kang, Boce Chen, Junjie Wang, Wei Xian, Hao Chang

Multi-Path
Multi-Path Exploration and Feedback Adjustment for Text-to-Image Person Retrieval
Bin Kang, Boce Chen, Junjie Wang, Yao Xu

Experience

ByteDance

Multimodal Algorithm Intern
2026.02 - 2026.05

Tencent

Technical Research - Algorithm Intern
2025.03 - 2026.02

Huawei

Algorithm Intern (Cloud Computing)
2023.02 - 2023.08

Xiaoice (formerly Microsoft XiaoIce)

Computer Vision Algorithm Intern (Research Department)
2022.08 - 2023.03