Multi-object tracking (MOT) is a critical task in both static and moving camera applications, such as traffic flow analysis, autonomous driving, and robotic vision. However, unreliable detection, occlusion, and fast camera motion can easily cause tracked targets to be lost, which makes MOT very challenging. Most recent works exploit spatial and temporal information for MOT, but how to combine appearance and temporal features is still not well addressed. In this paper, we propose an effective tracking method called TrackletNet Tracker (TNT) that combines temporal and appearance information in a unified framework. First, we define a graph model that treats each tracklet as a vertex. The tracklets are generated by associating detection results frame by frame using appearance similarity and spatial consistency. To compensate for camera movement, epipolar constraints are taken into account in the association. Then, for every pair of tracklets, the similarity, called the connectivity in this paper, is measured by our multi-scale TrackletNet. Afterwards, the tracklets are clustered into groups, and each group represents a unique object ID. The proposed TNT can handle most of the challenges in MOT and achieves promising results on the MOT16 and MOT17 benchmark datasets compared with other state-of-the-art methods.
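The final grouping step can be illustrated with a minimal sketch. It assumes the pairwise connectivity scores are already available as a symmetric matrix with values in [0, 1], and it groups tracklets by thresholding those scores and taking connected components with a union-find structure. This is a deliberately simplified stand-in, not the clustering procedure used by TNT; the function name `cluster_tracklets` and the `threshold` parameter are hypothetical.

```python
from typing import Dict, List


def cluster_tracklets(connectivity: List[List[float]],
                      threshold: float = 0.5) -> List[int]:
    """Return one group ID per tracklet (vertex).

    Assumption: connectivity[i][j] is a symmetric similarity score, and
    two tracklets belong to the same object if a chain of pairs with
    score above `threshold` links them.
    """
    n = len(connectivity)
    parent = list(range(n))  # union-find forest, one node per tracklet

    def find(i: int) -> int:
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of tracklets whose connectivity exceeds the threshold.
    for i in range(n):
        for j in range(i + 1, n):
            if connectivity[i][j] > threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[max(ri, rj)] = min(ri, rj)

    # Relabel roots as consecutive object IDs in order of first appearance.
    ids: Dict[int, int] = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


if __name__ == "__main__":
    # Four tracklets: 0 and 1 are strongly connected, as are 2 and 3.
    scores = [
        [1.0, 0.9, 0.1, 0.1],
        [0.9, 1.0, 0.1, 0.1],
        [0.1, 0.1, 1.0, 0.8],
        [0.1, 0.1, 0.8, 1.0],
    ]
    print(cluster_tracklets(scores))
```

Thresholded connected components ignore constraints a real tracker needs, for example that two tracklets overlapping in time cannot belong to the same object, so the sketch only conveys the idea of mapping a tracklet graph to object IDs.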