AllTracker: Efficient Dense Point Tracking at High Resolution

upghost

> The utility of optical flow (i.e., the instantaneous velocity of pixels [16]) toward this goal has long been obvious, yet it has remained challenging to upgrade flows into long-range tracks.

This sentence from the paper makes me feel a little bad that I don't understand why this goal is obvious. I am not tracking why we are tracking pixels.

Is this basically a competing technology with YOLO[1] or SAM[2]?

[1]: https://en.m.wikipedia.org/wiki/You_Only_Look_Once

[2]: https://ai.meta.com/sam2/

Edit: added annotations, should've done that initially

markisus

Back in my earlier days working on autonomous vehicles, I dreamed of something like this.

The issue with bounding boxes is missed detections, occlusions, and impoverished geometrical information. But if you have a hundred points being stably tracked on an object, it's now much easier to keep tracking it through partial occlusions, figure out its 3D geometry and kinematics, and even re-identify it coming in and out of occlusion.
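Here is a minimal sketch of why many tracked points help. It assumes the tracker reports a per-point visibility flag alongside each position, since many point trackers predict occlusion as well as location. The object's motion can then be estimated from whichever points are still visible, and a median shrugs off a few bad tracks. The helper name `object_displacement` is hypothetical, and this is not AllTracker's code.

```python
import numpy as np


def object_displacement(prev_pts, curr_pts, visible):
    """Estimate an object's 2D displacement from many tracked points.

    prev_pts, curr_pts: (N, 2) arrays of point positions on the object.
    visible: (N,) boolean mask, False for points occluded in the current frame.
    Returns the per-axis median displacement of the visible points, or None
    if no point is visible. The median tolerates a minority of bad tracks.
    """
    prev_pts = np.asarray(prev_pts, dtype=float)
    curr_pts = np.asarray(curr_pts, dtype=float)
    visible = np.asarray(visible, dtype=bool)
    if not visible.any():
        # Fully occluded: nothing to estimate from in this frame.
        return None
    deltas = curr_pts[visible] - prev_pts[visible]
    return np.median(deltas, axis=0)
```

If 40 of 100 points are hidden and one of the remaining 60 is a bad track, the median still returns the true shift. A translation-only model is a deliberate simplification. Recovering 3D geometry and kinematics would need a richer model fitted to the same points.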

daemonologist

No, this performs the same task as CoTracker or TAPIR, but is intended to run at higher resolution. Point tracking is useful both for keeping track of a target's position and for "inside-out" positioning of the camera.

YOLO is mostly concerned with detecting objects of certain classes in a single image, and SAM is concerned with essentially classifying pixels as belonging to an object or not.
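The following is a simplified 2D illustration of the "inside-out" case. If the tracked points lie on static background, their motion between two frames reflects the camera's motion, and that motion can be summarized by fitting a transform to the point correspondences. The sketch fits a 2D affine transform by least squares. It assumes a static scene that is roughly planar or distant, and `fit_affine` is a hypothetical helper. Full camera positioning would instead estimate a 3D pose, for example from an essential matrix or with bundle adjustment.

```python
import numpy as np


def fit_affine(src, dst):
    """Least-squares 2D affine transform mapping src points onto dst points.

    src, dst: (N, 2) arrays of corresponding tracked positions in two frames,
    with N >= 3 points that are not all collinear.
    Returns a (2, 3) matrix A such that dst_i is approximately A @ [x_i, y_i, 1].
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    ones = np.ones((src.shape[0], 1))
    X = np.hstack([src, ones])  # (N, 3) homogeneous coordinates
    # Solve X @ M ~= dst for M of shape (3, 2) in the least-squares sense.
    M, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return M.T  # (2, 3): linear part in the first two columns, translation last
```

Least squares averages out small tracking noise across many points. Points on independently moving objects are outliers, though, so in practice you would wrap the fit in a robust estimator such as RANSAC.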

ipsum2

The paper is saying that using optical flow for point tracking is obvious, not that the goal of tracking pixels is obvious.

Regarding your actual question, there are many use cases.

- Tracking players or balls in sports

- Surveillance
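To make "upgrading flows into long-range tracks" concrete, here is a sketch of the naive approach. It assumes dense flow fields stored as NumPy arrays of shape (H, W, 2), each holding (dx, dy) from one frame to the next. For each frame, the flow is sampled at the point's current sub-pixel position and added to that position. This illustrates plain flow chaining, not AllTracker's method, and the function names are hypothetical.

```python
import numpy as np


def bilinear_sample(flow, x, y):
    """Sample an (H, W, 2) flow field at sub-pixel location (x, y)."""
    h, w, _ = flow.shape
    # Clamp so the 2x2 neighbourhood stays inside the image.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    # Interpolate horizontally on the top and bottom rows, then vertically.
    top = (1 - ax) * flow[y0, x0] + ax * flow[y0, x1]
    bottom = (1 - ax) * flow[y1, x0] + ax * flow[y1, x1]
    return (1 - ay) * top + ay * bottom


def chain_flows(flows, start_xy):
    """Turn per-frame flows into one long-range track by accumulation.

    flows[t] maps frame t to frame t+1; flow[..., 0] is dx, flow[..., 1] is dy.
    Returns a (len(flows) + 1, 2) array of (x, y) positions.
    """
    track = [np.asarray(start_xy, dtype=float)]
    for flow in flows:
        x, y = track[-1]
        track.append(track[-1] + bilinear_sample(flow, x, y))
    return np.stack(track)
```

Every step adds its own interpolation and flow-estimation error, so the position drifts over long sequences. Once the point is occluded, the flow at its location describes whatever is in front of it. That is why chaining flows into long-range tracks is harder than it looks.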

sheepscreek

I’m not remotely familiar with either YOLO or SAM, but want to add my own question here. Does the utility of this invention have something to do with the tracking of subjects, like auto-focus for cameras and robotics (to keep the subject in view)?

upghost

Apologies, jargon meanings updated.

jcims

Object segmentation and tracking is such a natural and 'automatic' part of our visual perception that it's difficult to intuit how challenging it is to do with software.

thom

It takes freshly deployed humans a while to master, and only then with fairly high bandwidth training data, so I wouldn’t feel too bad about the complexity of implementing it in software.

jauntywundrkind

Crazy slick results. Nicely done team!
