Bin Kang

康斌

PhD Student
University of Chinese Academy of Sciences
(Joint training with Harbin Institute of Technology, Shenzhen)
Advisor: Prof. Zhuotao Tian (田倬韬)
Shenzhen, China

Email Google Scholar GitHub

Biography

I am a PhD student at the University of Chinese Academy of Sciences (UCAS), jointly trained with Harbin Institute of Technology, Shenzhen (HITSZ), advised by Prof. Zhuotao Tian (田倬韬). Previously, I received my Master's degree in Electronic Information from Kunming University of Science and Technology, and my Bachelor's degree in Communication Engineering from Lanzhou Jiaotong University.

My research interests lie in multimodal large models, GUI agents, open-vocabulary object detection, and multimodal retrieval. I have published papers at top-tier venues including ICML, ICLR, NeurIPS, CVPR, AAAI, ACM MM, and ICME, and received the ACM MM 2025 Outstanding Paper Award. I have more than two years of internship experience at ByteDance, Tencent, Xiaoice, and Huawei, where I worked on human motion generation, unknown task detection, multimodal perception, and multi-agent research and deployment. Algorithms from this work have been applied to DNF, Honor of Kings, TikTok, and other mainstream platforms.

I expect to graduate in 2027 and am looking for industrial research positions.

Multimodal Large Models, GUI-Agent, Open-Vocabulary Object Detection, Multimodal Retrieval

News

2026 One paper on TTS Agent accepted by ICML 2026.
2026 One paper on GUI Agent accepted by ICLR 2026.
2025 One paper on video understanding accepted by NeurIPS 2025.
2025 One paper on open-vocabulary perception accepted by CVPR 2025.
2025 One paper on open-vocabulary detection accepted by AAAI 2025.
2025 One paper on text-driven image retrieval accepted by ACM MM 2025.

Education

🎓

University of Chinese Academy of Sciences

PhD in Computer Science and Technology (Joint training with HIT Shenzhen)

2023.09 - Present

🎓

Kunming University of Science and Technology

Master's in Electronic Information

2020.08 - 2023.06

🎓

Lanzhou Jiaotong University

Bachelor's in Communication Engineering

2016.08 - 2020.06

Publications

* denotes equal contribution; # denotes corresponding author. The full list is on Google Scholar.

AgentSteer TTS

AgentSteerTTS: A Multi-Agent Closed-Loop Framework for Composite-Instruction Text-to-Speech

Bin Kang, Shaoguo Wen, Yang Fan, Shunlong Wu, Junjie Wang, Yulin Li, Jing Zhao, Ju Wang, Zhiyong Tian

Paper (ICML 2026) arXiv

LongHorizon UI

LongHorizonUI: A Unified Framework for Robust Long-Horizon Task Automation of GUI Agent

Bin Kang, Shaoguo Wen, Yifei Bi, Shunlong Wu, Xinbin Yuan, Rui Shao, et al.

Paper (ICLR 2026)

Less is More

Less is More, but Where? Dynamic Token Compression via LLM-Guided Keyframe Prior

Yulin Li, Hao Gui, Zhuo Fan, Junjie Wang, Bin Kang, Boce Chen, Zhiyong Tian

Paper (NeurIPS 2025)

DeClip

DeClip: Decoupled Learning for Open-Vocabulary Dense Perception

Junjie Wang, Boce Chen, Yulin Li, Bin Kang, Yichen Chen, Zhiyong Tian

Paper (CVPR 2025)

OV-DQUO

OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects Supervision

Junjie Wang, Boce Chen, Bin Kang, Yulin Li, Wei Xian, Yichen Chen, Yao Xu

Paper (AAAI 2025)

CalibClip

CalibClip: Contextual Calibration of Dominant Semantics for Text-Driven Image Retrieval

Bin Kang, Boce Chen, Junjie Wang, Yulin Li, Jing Zhao, Ju Wang, Zhiyong Tian

Paper (ACM MM 2025) Outstanding Paper Award

DASS-Net

DASS-Net: Degradation-Adaptive Semi-Symmetric Network for Robust Image Hiding

Jing Zhao, Hao He, Fan Chen, Liang Qu, Bin Kang

Paper (IEEE TCSVT)

Multi-Attr VL

Multi-Attribute Consistency Driven Visual Language Framework for Surface Defect Detection

Bin Kang, Boce Chen, Junjie Wang, Wei Xian, Hao Chang

Paper (ICME 2024)

Multi-Path

Multi-Path Exploration and Feedback Adjustment for Text-to-Image Person Retrieval

Bin Kang, Boce Chen, Junjie Wang, Yao Xu

arXiv

Experience

ByteDance

Multimodal Algorithm Intern

2026.02 - 2026.05

Tencent

Technical Research - Algorithm Intern

2025.03 - 2026.02

Huawei

Algorithm Intern (Cloud Computing)

2023.02 - 2023.08

Xiaoice (formerly Microsoft XiaoIce)

Computer Vision Algorithm Intern (Research Department)

2022.08 - 2023.03