Haotian Zhang
GLIPv2: Unifying Localization and Vision-Language Understanding
We present GLIPv2, a grounded VL understanding model that serves both localization tasks (e.g., object detection, instance segmentation) and Vision-Language (VL) understanding tasks (e.g., VQA, image captioning).
Haotian Zhang
,
Pengchuan Zhang
,
Xiaowei Hu
,
Yen-Chun Chen
,
Liunian Harold Li
,
Xiyang Dai
,
Lijuan Wang
,
Lu Yuan
,
Jenq-Neng Hwang
,
Jianfeng Gao
PDF
Cite
Code
Project
GLIP: Grounded Language-Image Pre-training
This paper presents a grounded language-image pre-training (GLIP) model for learning object-level, language-aware, and semantic-rich visual representations. GLIP unifies object detection and phrase grounding for pre-training.
Liunian Harold Li
,
Pengchuan Zhang
,
Haotian Zhang
,
Jianwei Yang
,
Chunyuan Li
,
Yiwu Zhong
,
Lijuan Wang
,
Lu Yuan
,
Lei Zhang
,
Jenq-Neng Hwang
,
Kai-Wei Chang
,
Jianfeng Gao
PDF
Cite
Code
Project
Exploit the connectivity: Multi-object tracking with trackletnet
In this paper, we propose an innovative and effective tracking method called TrackletNet Tracker (TNT) that combines temporal and appearance information in a unified framework.
Gaoang Wang
,
Yizhou Wang
,
Haotian Zhang
,
Renshu Gu
,
Jenq-Neng Hwang
PDF
Cite
Project
Eye in the sky: Drone-based object tracking and 3D localization
In this paper, a drone-based multi-object tracking and 3D localization scheme is proposed, built on deep-learning-based object detection.
Haotian Zhang
,
Gaoang Wang
,
Zhichao Lei
,
Jenq-Neng Hwang
PDF
Cite
Project