“Be Boundless.”
Haotian Zhang is a Research Scientist at Apple AI/ML, Visual Intelligence. His research aims to enable embodied agents to understand the outside world. To that end, he works on designing sensible modules that learn effective representations of information from 2D/3D image data, as well as natural language. His recent work on GLIP and GLIPv2 was accepted to CVPR 2022 (Best Paper Finalist) and NeurIPS 2022, respectively. He also co-organized the ECCV 2022 workshop on Computer Vision in the Wild.
Prior to joining Apple, he obtained his Ph.D. in the Information Processing Lab at the University of Washington, advised by Prof. Jenq-Neng Hwang, where he focused on monocular 3D object detection and multi-object tracking. He received his B.S. degree from Shanghai Jiao Tong University in 2017, supervised by Prof. Jun-Fa Mao.
He believes that an interesting life comes from doing interesting things with interesting people, and that’s what he hopes to do 🔥.
Download CV here.
PhD in Electrical & Computer Engineering, 2022
University of Washington
MSc in Applied Mathematics, 2021
University of Washington
MSc in Electrical & Computer Engineering, 2019
University of Washington
BSc in Nano & Microelectronics, 2017
Shanghai Jiao Tong University
[10/2023] One paper accepted to WACV 2024: UDA (Empowering Unsupervised Domain Adaptation with Large-scale Pre-trained Vision-Language Models).
[09/2023] A summary of my recent papers: (1) Ferret, a new multimodal LLM that can refer and ground anything anywhere at any granularity; (2) veCLIP, which uses LLMs and multimodal LLMs for alt-text rewriting to improve CLIP training.
[10/2022] Serving as session co-chair for the ECCV CVinW Workshop and responsible for ODinW. Full schedule here: https://computer-vision-in-the-wild.github.io/eccv-2022/.
[10/2022] Selected as one of the Young Scholar Award recipients for NeurIPS 2022.
[09/2022] One paper accepted to NeurIPS 2022: GLIPv2. A team effort to push CVinW.
[08/2022] Updated GLIP Hugging Face Gradio Demo! Feel free to check it out!!!