Speaker Two: That's the letter t. It makes the t sound. Speaker One: t, t, tiger. Aahh! Speaker Two: t, t, tractor. Speaker One: That's the letter p. It makes the p ...
Abstract: Open-vocabulary object detection (OVD) aims to detect novel object concepts by mining region-word correspondences from image-text pairs, yet current methods often produce false ...
Abstract: Referring Multi-Object Tracking (RMOT) aims to dynamically track an arbitrary number of referred targets in a video sequence according to the language expression. Previous methods mainly ...