Haotian Zhang

Research Scientist at Apple
Former Research Intern at MSR AI
Ph.D. at University of Washington

Apple AI/ML


“Be Boundless.”

Haotian Zhang is a Research Scientist at Apple AI/ML, Visual Intelligence. His research aims to enable embodied agents to understand the outside world. To that end, he designs modules that learn effective representations from 2D/3D image data as well as natural language. His recent work on GLIP and GLIPv2 was accepted to CVPR 2022 (Best Paper Finalist) and NeurIPS 2022, respectively. He also co-organized the ECCV 2022 workshop on Computer Vision in the Wild.

Prior to joining Apple, he obtained his Ph.D. in the Information Processing Lab at the University of Washington, advised by Prof. Jenq-Neng Hwang, where he focused on monocular 3D object detection and multi-object tracking. He received his B.S. degree from Shanghai Jiao Tong University in 2017, supervised by Prof. Jun-Fa Mao.

He believes that an interesting life comes from doing interesting things with interesting people, and that’s what he hopes to do 🔥.

Download CV here.

Interests

  • Open-Vocabulary Object Detection
  • Vision-and-Language Pre-training
  • Large-scale Pre-trained Models
  • 3D Monocular Object Detection
  • 2D/3D Multi-object Tracking

Education

  • PhD in Electrical & Computer Engineering, 2022

    University of Washington

  • MSc in Applied Mathematics, 2021

    University of Washington

  • MSc in Electrical & Computer Engineering, 2019

    University of Washington

  • BSc in Nano & Microelectronics, 2017

    Shanghai Jiao Tong University

Recent News

[10/2023] One paper accepted to WACV 2024: UDA (Empowering Unsupervised Domain Adaptation with Large-scale Pre-trained Vision-Language Models).

[09/2023] A summary of my recent papers: (1) Ferret, a new multimodal LLM that can refer and ground anything anywhere at any granularity; (2) veCLIP, which uses LLMs and multimodal LLMs for alt-text rewriting to improve CLIP training.

[10/2022] Serving as session co-chair for the ECCV CVinW Workshop and responsible for ODinW. Full schedule here: https://computer-vision-in-the-wild.github.io/eccv-2022/.

[10/2022] Selected as one of the Young Scholar Award recipients for NeurIPS 2022.

[09/2022] One paper accepted to NeurIPS 2022: GLIPv2. A team effort to push CVinW.

[08/2022] Updated GLIP Hugging Face Gradio Demo! Feel free to check it out!!!

Working Experience

Apple AI/ML
Research Scientist, Visual Intelligence
Jan 2023 – Present Cupertino, California
Research Scientist @ Visual Intelligence Team, directed by Yinfei Yang. I will continue pushing the boundaries of CV (OD) and multimodal intelligence in my new position with this great team.
Microsoft Research
Research Intern, Deep Learning
Jun 2021 – Mar 2022 Redmond, Washington
Research Intern @ Deep Learning Group, mentored by Pengchuan Zhang, Jianwei Yang, Chunyuan Li, and Jianfeng Gao. It’s my great honor and pleasure to work with such a talented team.
Azure AI
Research Intern, Computer Vision
Jun 2021 – Sep 2021 Redmond, Washington
Research Intern @ Visual Document Intelligence Team, mentored by Dinei Florencio, Yijuan Lu, and Guoxin Wang. I appreciate their helpful guidance and suggestions during the internship.
University of Washington
Research Ph.D. student, ECE
Sep 2017 – Present Seattle, Washington
Ph.D. student @ Information Processing Lab, supervised by Prof. Jenq-Neng Hwang. “A teacher for a day is a father for a lifetime.” 👨‍🏫