Unsupervised/Self-supervised
Visual Object Tracking
Yu Huang
Sunnyvale, California
Yu.Huang07@gmail.com
OUTLINE
• A Simple Framework For Contrastive Learning Of Visual Representations
• Tracking Objects As Points
• 𝑆2SiamFC: Self-supervised Fully Convolutional Siamese Network For Visual
Tracking
• CL-MOT: A Contrastive Learning Framework For Multi-object Tracking
• Multi-object Tracking With Self-supervised Associating Network
• Self-supervised Learning For Multi-object Tracking
A Simple Framework For Contrastive Learning Of
Visual Representations
• SimCLR: a simple framework for contrastive learning of visual representations
• (1) composition of data augmentations plays a critical role in defining effective predictive tasks
• (2) introducing a learnable nonlinear transformation between the representation and the
contrastive loss substantially improves the quality of the learned representations
• (3) contrastive learning benefits from larger batch sizes and more training steps compared to
supervised learning.
A Simple Framework For Contrastive Learning Of
Visual Representations
A simple framework for contrastive learning of visual representations
Two separate data augmentation operators are
sampled from the same family of augmentations
and applied to each data example to obtain two
correlated views. A base encoder network f() and
a projection head g() are trained to maximize
agreement using a contrastive loss. After training
is completed, we throw away the projection head
g() and use encoder f() and representation h for
downstream tasks.
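A minimal sketch of the NT-Xent contrastive loss that SimCLR optimizes, under an assumed PyTorch setup; the encoder f, projection head g, batch layout, and temperature value are illustrative, not the exact training configuration.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss over a batch of two augmented views.

    z1, z2: projections g(f(x)) of the two views, shape (N, d).
    Each row's positive is its counterpart in the other view; the
    remaining 2N-2 projections in the batch serve as negatives.
    """
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, d), unit norm
    sim = z @ z.t() / temperature                        # cosine similarities
    sim.fill_diagonal_(float("-inf"))                    # drop self-similarity
    n = z1.shape[0]
    # The positive for row i is row i+N (and i-N for the second half).
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)

# Usage sketch: h1, h2 = f(x1), f(x2); loss = nt_xent_loss(g(h1), g(h2))
```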
A Simple Framework For Contrastive Learning Of
Visual Representations
Data Augmentation
A Simple Framework For Contrastive Learning Of Visual Representations (figure-only result slides)
Tracking Objects As Points
• Tracking is dominated by pipelines that perform object detection followed by temporal
association, also known as tracking-by-detection.
• CenterTrack, a simultaneous detection and tracking algorithm, is simpler, faster, and more accurate.
• It applies a detection model to a pair of images and detections from the prior frame.
• Given this minimal input, CenterTrack localizes objects and predicts their associations with the
previous frame.
• CenterTrack is simple, online (no peeking into the future), and real-time.
• Codes: https://github.com/xingyizhou/centertrack
Tracking Objects As Points
The network takes the current frame, the previous frame, and a heatmap rendered from tracked object
centers as inputs, and produces a center detection heatmap for the current frame, the bounding box size
map, and an offset map. At test time, object sizes and offsets are extracted from peaks in the heatmap.
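The "peaks in the heatmap" decoding can be sketched as below, using the common CenterNet-style max-pooling trick as non-maximum suppression; the top-k count and tensor shapes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def extract_peaks(heatmap, k=100):
    """Return the k strongest local maxima of a center heatmap.

    heatmap: (B, C, H, W) per-class center scores.
    A 3x3 max-pool keeps only local maxima, then top-k selects detections.
    """
    pooled = F.max_pool2d(heatmap, kernel_size=3, stride=1, padding=1)
    peaks = heatmap * (pooled == heatmap).float()        # zero out non-maxima
    b, c, h, w = peaks.shape
    scores, idx = peaks.view(b, -1).topk(k)              # flatten class + space dims
    classes = idx // (h * w)
    ys = (idx % (h * w)) // w                            # row of each peak
    xs = (idx % (h * w)) % w                             # column of each peak
    return scores, classes, xs, ys
```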
Tracking Objects As Points
The training objective based on the focal loss
Size prediction is learned by regression
CenterNet regresses to a refined center local location using an analogous L1
loss. The overall loss of CenterNet is a weighted sum of all three loss terms:
focal loss, size, and local location regression.
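The equations referenced above are not reproduced in this text; for reference, the CenterNet objectives are usually written as below (a penalty-reduced focal loss on the center heatmap plus L1 regression terms, with α=2, β=4, and the overall loss a weighted sum):

```latex
L_k = -\frac{1}{N}\sum_{xyc}
\begin{cases}
(1-\hat{Y}_{xyc})^{\alpha}\,\log\hat{Y}_{xyc} & \text{if } Y_{xyc}=1\\
(1-Y_{xyc})^{\beta}\,\hat{Y}_{xyc}^{\alpha}\,\log(1-\hat{Y}_{xyc}) & \text{otherwise}
\end{cases}

L_{size} = \frac{1}{N}\sum_{i=1}^{N}\bigl|\hat{S}_{p_i}-s_i\bigr|,
\qquad
L_{det} = L_k + \lambda_{size}L_{size} + \lambda_{off}L_{off}
```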
Tracking Objects As Points
To associate detections through time, CenterTrack predicts a 2D displacement
as two additional output channels.
It learns this displacement using the same regression objective as size or
location refinement:
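The displacement objective introduced above appeared as an equation on the slide; a reconstruction consistent with the surrounding text is an L1 regression of the predicted displacement at each object center toward the ground-truth frame-to-frame motion:

```latex
L_{off} = \frac{1}{N}\sum_{i=1}^{N}\Bigl|\hat{D}_{p_i^{(t)}} - \bigl(p_i^{(t-1)} - p_i^{(t)}\bigr)\Bigr|
```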
CenterTrack is first and foremost an object detector, and is trained as such. The
architectural changes from CenterNet to CenterTrack are minor: four additional
input channels and two additional output channels.
This allows CenterTrack to be fine-tuned directly from a pretrained CenterNet detector.
Tracking Objects As Points
• Follow the CenterNet training protocol and train all predictions as multi-task learning.
• Training on static image data: simulate the previous frame by randomly scaling and
translating the current frame (see the sketch after this list).
• To perform mono 3D tracking, it adopts the monocular 3D detection form of CenterNet.
• Specifically, train output heads to predict object depth, rotation (encoded as an 8-
dimensional vector), and 3D extent.
• Since the projection of the center of the 3D bounding box may not align with the center
of the object’s 2D bounding box, a 2D-to-3D center offset is also predicted.
• Backbone is DLA (Deep Layer Aggregation).
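A minimal sketch of the static-image trick from the list above: fabricate a "previous frame" by randomly scaling and translating the current frame. The parameter ranges are illustrative, not the paper's exact augmentation settings, and the ground-truth boxes would be transformed identically.

```python
import random
import torchvision.transforms.functional as TF

def simulate_previous_frame(image, max_scale=0.05, max_shift=0.05):
    """Create a fake previous frame from a single static image.

    image: (C, H, W) tensor. A small random scale and translation mimic
    plausible frame-to-frame motion for training the tracker on images.
    """
    _, h, w = image.shape
    scale = 1.0 + random.uniform(-max_scale, max_scale)
    dx = int(random.uniform(-max_shift, max_shift) * w)
    dy = int(random.uniform(-max_shift, max_shift) * h)
    return TF.affine(image, angle=0.0, translate=[dx, dy],
                     scale=scale, shear=[0.0])
```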
Tracking Objects As Points (figure-only result slides)
𝑆2SiamFC: Self-supervised Fully Convolutional
Siamese Network For Visual Tracking
• It adapts state-of-the-art supervised Siamese-based trackers into unsupervised
ones by exploiting the fact that an image and any cropped region of it can form a
natural pair for self-training.
• It applies Anti-clutter weighting (AC) which can adaptively adjust the weight of
each training sample by determining whether the pair is informative or not.
• It proposes Adversarial Masking which helps the tracker to learn other context
information by adaptively blacking out salient regions of the target.
• Extend SiamFC (“Fully-convolutional Siamese Networks For Object Tracking”) to
S2SiamFC (Self-Supervised).
𝑆2SiamFC: Self-supervised Fully Convolutional
Siamese Network For Visual Tracking
Illustration of the difference between (a) common
unsupervised learning approach and (b) the proposed
self-supervised learning approach. In (b), the regions
are partly overlapping and the positive samples are
highlighted in red and negative samples are
highlighted in black.
𝑆2SiamFC: Self-supervised Fully Convolutional
Siamese Network For Visual Tracking
The challenges of self-supervised tracking are two-fold.
1) In the training phase, randomly cropping a region from an image as the target
template and then extending the chosen region into the search image to form a training
pair can lead to “background content tracking”, due to the randomness in sampling a
training pair from the same image.
2) Self-supervised tracking is challenging because only a limited amount of
appearance variation can be captured during the training phase.
𝑆2SiamFC: Self-supervised Fully Convolutional
Siamese Network For Visual Tracking
The training pipeline mainly consists of two stages: 1) Training pairs are sampled from the same image,
and the loss between the raw template and the search region is calculated first. 2) The values with positive
labels in the response map are chosen to compute channel-wise saliency maps by backpropagation. One of
the thresholded saliency maps is chosen to mask the template image, and the masked template is fed into the
network again for learning appearance-robust features.
𝑆2SiamFC: Self-supervised Fully Convolutional
Siamese Network For Visual Tracking
Equation captions on this slide (equations not reproduced): the SiamFC loss function; the anti-clutter weighting of each training sample; the indicator function; the anti-clutter loss function.
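For reference, the SiamFC loss named above is the per-location logistic loss averaged over the response map; the anti-clutter weighting then re-weights each training pair by an informativeness score (its exact form, and that of the indicator function, are given in the paper):

```latex
\ell(y, v) = \log\bigl(1 + e^{-y v}\bigr),
\qquad
L(y, v) = \frac{1}{|\mathcal{D}|}\sum_{u \in \mathcal{D}} \ell\bigl(y[u], v[u]\bigr),
\quad y[u] \in \{+1, -1\}
```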
𝑆2SiamFC: Self-supervised Fully Convolutional
Siamese Network For Visual Tracking
Illustration of the concept of “background
tracking”. The predicted response map is
resized to 255x255 for better visualization. (a)
denotes a meaningful pair, which has fewer large
positive values in the predicted response map since
the template region is unique in the search
region. (b) denotes a meaningless pair, whose
predicted response map tends to be flat (many
large positive values) since the template region
is a common pattern in the search region.
𝑆2SiamFC: Self-supervised Fully Convolutional
Siamese Network For Visual Tracking
adversarial appearance masking module
Inspired by Grad-CAM, the saliency map is
obtained in a self-guided manner by
backpropagating from the locations whose
ground-truth labels are positive in the response
map. One of those saliency maps is then chosen
as a mask, forcing the model to learn the other
relevant context information of the target. In this
way, the model is forced to correctly predict the
similarity even when some important details are
not available.
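A rough sketch of this self-guided masking step, under simplifying assumptions: gradients are backpropagated from the positive response locations, the channel-wise saliency is aggregated into a single map here for brevity (the paper selects one of the channel-wise maps), and the threshold value is illustrative.

```python
import torch

def adversarial_mask_template(model, template, search, pos_mask, thresh=0.7):
    """Black out salient template pixels before a second forward pass.

    model(template, search) -> response map; pos_mask marks the positive
    (ground-truth +1) locations of the response map.
    """
    template = template.clone().requires_grad_(True)
    response = model(template, search)
    score = (response * pos_mask).sum()                  # positives only
    grad = torch.autograd.grad(score, template)[0]       # (B, C, H, W)
    saliency = grad.abs().sum(dim=1, keepdim=True)       # aggregate channels
    saliency = saliency / (saliency.amax(dim=(2, 3), keepdim=True) + 1e-8)
    mask = (saliency < thresh).float()                   # keep non-salient pixels
    return template.detach() * mask                      # masked template
```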
𝑆2SiamFC: Self-supervised Fully Convolutional Siamese Network For Visual Tracking (figure-only result slides)
CL-MOT: A Contrastive Learning Framework For
Multi-object Tracking
• CL-MOT: A semi-supervised contrastive learning framework for MOT;
• Learn by clustering object embeddings from different views of static frames;
• Transfer an object detector to a tracker within this pretext learning paradigm;
• Codes: https://github.com/danielzgsilva/CL-MOT
CL-MOT: A Contrastive Learning Framework For
Multi-object Tracking
• Comprised of an encoder-decoder backbone, along with separate object detection and embedding
branches, this one-shot tracking network predicts object bounding boxes and appearance embeddings in
a single forward pass;
• Leverage a fully-convolutional ResNet-34 as the backbone, with the deep layer aggregation (DLA) variant;
• Replace convolutions in the up-sampling layers with deformable convolutions to adapt the receptive field;
• Object detection is cast as a center-based keypoint estimation task, with regression to other properties such as
height and width.
• Therefore, three parallel regression heads are appended to the backbone network to predict an object
heatmap, object center offsets and bounding box sizes, respectively.
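A minimal sketch of the three parallel regression heads described above, in PyTorch; the hidden width, kernel sizes, and backbone channel count are illustrative assumptions.

```python
import torch.nn as nn

def make_head(in_ch, out_ch, hidden=256):
    """3x3 conv + ReLU + 1x1 conv head applied to the backbone feature map."""
    return nn.Sequential(
        nn.Conv2d(in_ch, hidden, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(hidden, out_ch, kernel_size=1),
    )

class DetectionHeads(nn.Module):
    def __init__(self, in_ch=64, num_classes=1):
        super().__init__()
        self.heatmap = make_head(in_ch, num_classes)  # object center heatmap
        self.offset = make_head(in_ch, 2)             # center offset (dx, dy)
        self.size = make_head(in_ch, 2)               # bounding box size (w, h)

    def forward(self, feat):
        return self.heatmap(feat), self.offset(feat), self.size(feat)
```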
CL-MOT: A Contrastive Learning Framework For
Multi-object Tracking
• CL-MOT treats tracking as an online multi-object re-identification task;
• The network learns a feature space that discriminates between object instances in a single scene;
• Combine this representation learning framework with an association algorithm in object tracking;
• It "learns to track" objects in a self-supervised manner, forgoing the identity annotations;
• The object detection branch of CL-MOT is trained in a supervised manner;
• A focal loss to the estimated object heatmap, as well as an L1 regression loss to the size and offset predictions;
• Once trained, it runs online and in real time by leveraging an appearance-based association algorithm;
• To finalize the association step, bipartite matching is performed on the combined cost matrix by the
Hungarian algorithm.
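The final bipartite matching step can be sketched with SciPy's Hungarian solver; the cost gating threshold is an illustrative assumption, and the combined cost matrix itself (appearance plus motion terms) is assumed to be built elsewhere.

```python
from scipy.optimize import linear_sum_assignment

def associate(cost_matrix, max_cost=0.8):
    """Match tracks (rows) to detections (columns) on a combined cost matrix.

    cost_matrix: (num_tracks, num_detections) numpy array. Returns matched
    (track_idx, det_idx) pairs plus unmatched tracks and detections; pairs
    whose cost exceeds max_cost are left unmatched.
    """
    rows, cols = linear_sum_assignment(cost_matrix)
    matches, used_r, used_c = [], set(), set()
    for r, c in zip(rows, cols):
        if cost_matrix[r, c] <= max_cost:
            matches.append((r, c))
            used_r.add(r)
            used_c.add(c)
    unmatched_tracks = [r for r in range(cost_matrix.shape[0]) if r not in used_r]
    unmatched_dets = [c for c in range(cost_matrix.shape[1]) if c not in used_c]
    return matches, unmatched_tracks, unmatched_dets
```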
CL-MOT: A Contrastive Learning Framework For
Multi-object Tracking
Multi-object Tracking With Self-supervised
Associating Network
• Tracking by detection: Feature based object re-identification;
• Self-supervised learning using a lot of short unlabeled videos;
• The re-identification network is trained this way to address the lack of training data;
• A self-supervised associating tracker (SSAT): a tracking algorithm that trains the feature
extraction network in a self-supervised manner, free of data constraints, and uses it directly to
re-identify targets for tracking without a separate downstream task.
Multi-object Tracking With Self-supervised
Associating Network
It treats all the frames of one short video as
image patches with the same ID and trains the
network on an N-class classification task, where N
is the number of video clips. It then uses the output
of the backbone network, a 512-channel embedding,
as the feature of the input image to associate
detections with tracks.
Multi-object Tracking With Self-supervised
Associating Network
• Assume that the frames of a sufficiently short video have a similar appearance to each other, and use only
short video clips to train the network.
• Some videos may nonetheless be composed of completely different frames, but it is expected that
this resolves itself once enough data is accumulated and learned.
• Since self-supervised learning is free from labeling problems, it is possible to learn from a large amount of
data.
• The clip length is set to 10 seconds; many short YouTube videos of about 10 seconds are collected and
used for training.
• For a 30 fps video, this yields 300 frames of images assigned to a single ID.
Multi-object Tracking With Self-supervised
Associating Network
• This process is very similar to the face recognition task;
• Thus the network is trained following CosFace (“CosFace: Large Margin Cosine Loss For Deep
Face Recognition”), which is simple and effective in face recognition;
• The backbone network is ResNet-50, which extracts 512-channel features;
• The loss function is the large margin cosine loss (LMCL), which is also the loss function of CosFace
(written out below).
• After training, when the network is applied to the MOT task, the 512-channel feature output by the
backbone is used to compare each patch with tracks.
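For reference, the large margin cosine loss (LMCL) from CosFace has the following form, where s is a scale factor, m the cosine margin, and θ_j the angle between the L2-normalized embedding and the j-th class weight vector:

```latex
L_{lmc} = -\frac{1}{N}\sum_{i=1}^{N}
\log\frac{e^{\,s(\cos\theta_{y_i} - m)}}
{e^{\,s(\cos\theta_{y_i} - m)} + \sum_{j\neq y_i} e^{\,s\cos\theta_j}}
```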
Multi-object Tracking With Self-supervised
Associating Network
• A tracker based on CenterTrack,
which performed well in MOT;
• CenterTrack uses CenterNet to obtain
or refine detection results and then
associates the results by its own method;
• In addition, Un-super-track, the
unsupervised method, is also based
on CenterTrack;
• Note: the detector is trained in a
supervised manner, while the association
network is trained in a self-supervised manner.
Multi-object Tracking With Self-supervised Associating Network (figure-only result slides)
Self-supervised Learning For Multi-object Tracking
• Under the detect-to-track framework: assume that an object detector, trained on image-level
bounding box annotations, is available, but train a tracking model using only unlabeled video;
• Dual-tracker consistency: a self-supervised training method. At a high level, the approach creates
a self-supervisory signal by applying two instances of a tracker model (where the instances share
the same parameters) to two distinct input variations extracted from one video sequence; the
tracker is then trained to produce similar outputs over the video sequence under both inputs;
• Construct two distinct inputs for one video sequence, where each input is a variation of the video
sequence where different information has been hidden;
• Then apply two instances of a tracker independently on each input, and train the model to
produce consistent outputs.
Self-supervised Learning For Multi-object Tracking
• Adopt a tracker model that is similar to “multi-object tracking with neural gating using bilinear LSTM”;
• A self-supervisory signal for training an RNN tracker model is created through a three-step process:
• 1) During training, repeatedly sample a random video segment {I0,I1,…,In}; let Dk be the detections
automatically computed in Ik, with each detection box defining a corresponding image window of Ik. Apply an
input-hiding scheme to produce two input variations of the video segment, where each variation is a modified
sequence of detections in the frames;
• 2) Apply two instances of the tracker model to each input variation to derive two probabilistic
tracking outputs, represented as transition matrices;
• 3) Compare the transition matrices with a dot-product similarity to update the RNN parameters.
Self-supervised Learning For Multi-object Tracking
Tracker model architecture
Self-supervised Learning For Multi-object Tracking
• The tracker maintains a set of active tracks that have not yet left the camera frame;
• Given a video segment {I0,…,In}, and sets of detections Dk detected in each frame Ik, to initialize
the tracking process, create a track ti for each detection d0i in the first video frame I0;
• On each subsequent frame Ik, the model outputs a probability that each active track ti
corresponds to each detection.
• At inference time, apply the Hungarian algorithm to match active tracks with detections based on
these probabilities.
• For each detection in Ik that no track matches to, create a new active track for that detection.
• Similarly, if a track does not match any detection for t_age consecutive frames, remove it from the
active set; thus, t_age is a threshold on the maximum age of a track since it last matched some
detection.
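A compact sketch of the track bookkeeping just described; the matching function is assumed to return Hungarian matches on the model's probabilities, and all names and the default t_age value are illustrative.

```python
class Track:
    def __init__(self, detection, track_id):
        self.id = track_id
        self.detection = detection
        self.age = 0                 # frames since the last matched detection

def step(tracks, detections, match_fn, t_age=5):
    """One frame of track management: update, age out, and spawn tracks."""
    matches, unmatched_tracks, unmatched_dets = match_fn(tracks, detections)
    for track_idx, det_idx in matches:
        tracks[track_idx].detection = detections[det_idx]
        tracks[track_idx].age = 0
    for track_idx in unmatched_tracks:
        tracks[track_idx].age += 1
    # Drop tracks that have gone unmatched for t_age consecutive frames.
    tracks = [t for i, t in enumerate(tracks)
              if not (i in unmatched_tracks and t.age >= t_age)]
    # Start a new track for every detection no existing track matched.
    next_id = max((t.id for t in tracks), default=-1) + 1
    for det_idx in unmatched_dets:
        tracks.append(Track(detections[det_idx], next_id))
        next_id += 1
    return tracks
```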
Self-supervised Learning For Multi-object Tracking
• During training, repeatedly sample segments of up to 16 consecutive video frames.
• Apply one of two input-hiding schemes (occlusion-based and visual spatial), to extract two distinct input
variations from a sampled video segment.
• Then apply instances of the tracker model on each variation, where the instances share the same model
parameters.
• Dual-tracker consistency trains the model by enforcing it to produce similar outputs on both inputs.
• To represent tracker outputs, compute a transition matrix M, whose elements are the probabilities that each
active track ti matches each detection.
• When applying the model over video segments during training, update tracks with new detections
based on the scores output by the model on intermediate frames, but do not create additional active
tracks on frames after I0; thus, each active track ti corresponds directly to a detection in I0.
• Then, applying two instances of the tracker yields two transition matrices A and B.
• Train the model (CNN, RNN, and matching network) end-to-end to maximize the dot-product similarity
between these matrices.
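A minimal sketch of the dual-tracker consistency objective: the two transition matrices A and B produced by the shared-weight tracker instances are compared by dot-product similarity, and the model is trained to maximize it. The per-row normalization here is an added assumption, not necessarily the paper's exact formulation.

```python
import torch

def dual_tracker_consistency_loss(matrix_a, matrix_b, eps=1e-8):
    """Negative dot-product similarity between two transition matrices.

    matrix_a, matrix_b: (num_tracks, num_detections) match-probability
    matrices produced by the two tracker instances on the same segment.
    """
    a = matrix_a / (matrix_a.norm(dim=1, keepdim=True) + eps)
    b = matrix_b / (matrix_b.norm(dim=1, keepdim=True) + eps)
    similarity = (a * b).sum(dim=1).mean()   # average per-track agreement
    return -similarity                       # minimizing this maximizes similarity
```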
Self-supervised Learning For Multi-object Tracking
Occlusion-based hiding produces input variations with
different subsequences of occluded frames where all
detections are hidden from the tracker. It also
independently applies the tracker before and after a
hand-off frame (I4 and I2), and merges the outputs
through the matrix product.
Visual-spatial hiding. One variation includes
only visual inputs, and the other includes
only spatial inputs.
Self-supervised Learning For Multi-object Tracking
Unsupervised/Self-supervvised visual object tracking

Mais conteúdo relacionado

Mais procurados

Deep learning for 3-D Scene Reconstruction and Modeling
Deep learning for 3-D Scene Reconstruction and Modeling Deep learning for 3-D Scene Reconstruction and Modeling
Deep learning for 3-D Scene Reconstruction and Modeling Yu Huang
 
Camera-Based Road Lane Detection by Deep Learning II
Camera-Based Road Lane Detection by Deep Learning IICamera-Based Road Lane Detection by Deep Learning II
Camera-Based Road Lane Detection by Deep Learning IIYu Huang
 
Stereo Matching by Deep Learning
Stereo Matching by Deep LearningStereo Matching by Deep Learning
Stereo Matching by Deep LearningYu Huang
 
Deep Learning’s Application in Radar Signal Data II
Deep Learning’s Application in Radar Signal Data IIDeep Learning’s Application in Radar Signal Data II
Deep Learning’s Application in Radar Signal Data IIYu Huang
 
Depth Fusion from RGB and Depth Sensors III
Depth Fusion from RGB and Depth Sensors  IIIDepth Fusion from RGB and Depth Sensors  III
Depth Fusion from RGB and Depth Sensors IIIYu Huang
 
Pose estimation from RGB images by deep learning
Pose estimation from RGB images by deep learningPose estimation from RGB images by deep learning
Pose estimation from RGB images by deep learningYu Huang
 
Deep Learning’s Application in Radar Signal Data
Deep Learning’s Application in Radar Signal DataDeep Learning’s Application in Radar Signal Data
Deep Learning’s Application in Radar Signal DataYu Huang
 
Lidar for Autonomous Driving II (via Deep Learning)
Lidar for Autonomous Driving II (via Deep Learning)Lidar for Autonomous Driving II (via Deep Learning)
Lidar for Autonomous Driving II (via Deep Learning)Yu Huang
 
fusion of Camera and lidar for autonomous driving II
fusion of Camera and lidar for autonomous driving IIfusion of Camera and lidar for autonomous driving II
fusion of Camera and lidar for autonomous driving IIYu Huang
 
Depth Fusion from RGB and Depth Sensors IV
Depth Fusion from RGB and Depth Sensors  IVDepth Fusion from RGB and Depth Sensors  IV
Depth Fusion from RGB and Depth Sensors IVYu Huang
 
Fisheye Omnidirectional View in Autonomous Driving II
Fisheye Omnidirectional View in Autonomous Driving IIFisheye Omnidirectional View in Autonomous Driving II
Fisheye Omnidirectional View in Autonomous Driving IIYu Huang
 
Driving behaviors for adas and autonomous driving xiv
Driving behaviors for adas and autonomous driving xivDriving behaviors for adas and autonomous driving xiv
Driving behaviors for adas and autonomous driving xivYu Huang
 
Multi sensor calibration by deep learning
Multi sensor calibration by deep learningMulti sensor calibration by deep learning
Multi sensor calibration by deep learningYu Huang
 
Deep VO and SLAM
Deep VO and SLAMDeep VO and SLAM
Deep VO and SLAMYu Huang
 
Deep vo and slam ii
Deep vo and slam iiDeep vo and slam ii
Deep vo and slam iiYu Huang
 
Deep vo and slam iii
Deep vo and slam iiiDeep vo and slam iii
Deep vo and slam iiiYu Huang
 
Depth Fusion from RGB and Depth Sensors by Deep Learning
Depth Fusion from RGB and Depth Sensors by Deep LearningDepth Fusion from RGB and Depth Sensors by Deep Learning
Depth Fusion from RGB and Depth Sensors by Deep LearningYu Huang
 
Driving behaviors for adas and autonomous driving XII
Driving behaviors for adas and autonomous driving XIIDriving behaviors for adas and autonomous driving XII
Driving behaviors for adas and autonomous driving XIIYu Huang
 
Driving Behavior for ADAS and Autonomous Driving VII
Driving Behavior for ADAS and Autonomous Driving VIIDriving Behavior for ADAS and Autonomous Driving VII
Driving Behavior for ADAS and Autonomous Driving VIIYu Huang
 
BEV Semantic Segmentation
BEV Semantic SegmentationBEV Semantic Segmentation
BEV Semantic SegmentationYu Huang
 

Mais procurados (20)

Deep learning for 3-D Scene Reconstruction and Modeling
Deep learning for 3-D Scene Reconstruction and Modeling Deep learning for 3-D Scene Reconstruction and Modeling
Deep learning for 3-D Scene Reconstruction and Modeling
 
Camera-Based Road Lane Detection by Deep Learning II
Camera-Based Road Lane Detection by Deep Learning IICamera-Based Road Lane Detection by Deep Learning II
Camera-Based Road Lane Detection by Deep Learning II
 
Stereo Matching by Deep Learning
Stereo Matching by Deep LearningStereo Matching by Deep Learning
Stereo Matching by Deep Learning
 
Deep Learning’s Application in Radar Signal Data II
Deep Learning’s Application in Radar Signal Data IIDeep Learning’s Application in Radar Signal Data II
Deep Learning’s Application in Radar Signal Data II
 
Depth Fusion from RGB and Depth Sensors III
Depth Fusion from RGB and Depth Sensors  IIIDepth Fusion from RGB and Depth Sensors  III
Depth Fusion from RGB and Depth Sensors III
 
Pose estimation from RGB images by deep learning
Pose estimation from RGB images by deep learningPose estimation from RGB images by deep learning
Pose estimation from RGB images by deep learning
 
Deep Learning’s Application in Radar Signal Data
Deep Learning’s Application in Radar Signal DataDeep Learning’s Application in Radar Signal Data
Deep Learning’s Application in Radar Signal Data
 
Lidar for Autonomous Driving II (via Deep Learning)
Lidar for Autonomous Driving II (via Deep Learning)Lidar for Autonomous Driving II (via Deep Learning)
Lidar for Autonomous Driving II (via Deep Learning)
 
fusion of Camera and lidar for autonomous driving II
fusion of Camera and lidar for autonomous driving IIfusion of Camera and lidar for autonomous driving II
fusion of Camera and lidar for autonomous driving II
 
Depth Fusion from RGB and Depth Sensors IV
Depth Fusion from RGB and Depth Sensors  IVDepth Fusion from RGB and Depth Sensors  IV
Depth Fusion from RGB and Depth Sensors IV
 
Fisheye Omnidirectional View in Autonomous Driving II
Fisheye Omnidirectional View in Autonomous Driving IIFisheye Omnidirectional View in Autonomous Driving II
Fisheye Omnidirectional View in Autonomous Driving II
 
Driving behaviors for adas and autonomous driving xiv
Driving behaviors for adas and autonomous driving xivDriving behaviors for adas and autonomous driving xiv
Driving behaviors for adas and autonomous driving xiv
 
Multi sensor calibration by deep learning
Multi sensor calibration by deep learningMulti sensor calibration by deep learning
Multi sensor calibration by deep learning
 
Deep VO and SLAM
Deep VO and SLAMDeep VO and SLAM
Deep VO and SLAM
 
Deep vo and slam ii
Deep vo and slam iiDeep vo and slam ii
Deep vo and slam ii
 
Deep vo and slam iii
Deep vo and slam iiiDeep vo and slam iii
Deep vo and slam iii
 
Depth Fusion from RGB and Depth Sensors by Deep Learning
Depth Fusion from RGB and Depth Sensors by Deep LearningDepth Fusion from RGB and Depth Sensors by Deep Learning
Depth Fusion from RGB and Depth Sensors by Deep Learning
 
Driving behaviors for adas and autonomous driving XII
Driving behaviors for adas and autonomous driving XIIDriving behaviors for adas and autonomous driving XII
Driving behaviors for adas and autonomous driving XII
 
Driving Behavior for ADAS and Autonomous Driving VII
Driving Behavior for ADAS and Autonomous Driving VIIDriving Behavior for ADAS and Autonomous Driving VII
Driving Behavior for ADAS and Autonomous Driving VII
 
BEV Semantic Segmentation
BEV Semantic SegmentationBEV Semantic Segmentation
BEV Semantic Segmentation
 

Semelhante a Unsupervised/Self-supervvised visual object tracking

Image Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A surveyImage Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A surveyNUPUR YADAV
 
2019 cvpr paper_overview
2019 cvpr paper_overview2019 cvpr paper_overview
2019 cvpr paper_overviewLEE HOSEONG
 
2019 cvpr paper overview by Ho Seong Lee
2019 cvpr paper overview by Ho Seong Lee2019 cvpr paper overview by Ho Seong Lee
2019 cvpr paper overview by Ho Seong LeeMoazzem Hossain
 
Computer vision-nit-silchar-hackathon
Computer vision-nit-silchar-hackathonComputer vision-nit-silchar-hackathon
Computer vision-nit-silchar-hackathonAditya Bhattacharya
 
Integrated Hidden Markov Model and Kalman Filter for Online Object Tracking
Integrated Hidden Markov Model and Kalman Filter for Online Object TrackingIntegrated Hidden Markov Model and Kalman Filter for Online Object Tracking
Integrated Hidden Markov Model and Kalman Filter for Online Object Trackingijsrd.com
 
Real Time Object Dectection using machine learning
Real Time Object Dectection using machine learningReal Time Object Dectection using machine learning
Real Time Object Dectection using machine learningpratik pratyay
 
Deep learning fundamental and Research project on IBM POWER9 system from NUS
Deep learning fundamental and Research project on IBM POWER9 system from NUSDeep learning fundamental and Research project on IBM POWER9 system from NUS
Deep learning fundamental and Research project on IBM POWER9 system from NUSGanesan Narayanasamy
 
AaSeminar_Template.pptx
AaSeminar_Template.pptxAaSeminar_Template.pptx
AaSeminar_Template.pptxManojGowdaKb
 
final_project_1_2k21cse07.pptx
final_project_1_2k21cse07.pptxfinal_project_1_2k21cse07.pptx
final_project_1_2k21cse07.pptxshwetabhagat25
 
Dj31514517
Dj31514517Dj31514517
Dj31514517IJMER
 
Dj31514517
Dj31514517Dj31514517
Dj31514517IJMER
 
3-d interpretation from stereo images for autonomous driving
3-d interpretation from stereo images for autonomous driving3-d interpretation from stereo images for autonomous driving
3-d interpretation from stereo images for autonomous drivingYu Huang
 
IRJET- Identification of Scene Images using Convolutional Neural Networks - A...
IRJET- Identification of Scene Images using Convolutional Neural Networks - A...IRJET- Identification of Scene Images using Convolutional Neural Networks - A...
IRJET- Identification of Scene Images using Convolutional Neural Networks - A...IRJET Journal
 
Video Annotation for Visual Tracking via Selection and Refinement_tran.pptx
Video Annotation for Visual Tracking via Selection and Refinement_tran.pptxVideo Annotation for Visual Tracking via Selection and Refinement_tran.pptx
Video Annotation for Visual Tracking via Selection and Refinement_tran.pptxAlyaaMachi
 
Attentive Relational Networks for Mapping Images to Scene Graphs
Attentive Relational Networks for Mapping Images to Scene GraphsAttentive Relational Networks for Mapping Images to Scene Graphs
Attentive Relational Networks for Mapping Images to Scene GraphsSangmin Woo
 

Semelhante a Unsupervised/Self-supervvised visual object tracking (20)

Image Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A surveyImage Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A survey
 
2019 cvpr paper_overview
2019 cvpr paper_overview2019 cvpr paper_overview
2019 cvpr paper_overview
 
2019 cvpr paper overview by Ho Seong Lee
2019 cvpr paper overview by Ho Seong Lee2019 cvpr paper overview by Ho Seong Lee
2019 cvpr paper overview by Ho Seong Lee
 
Computer vision-nit-silchar-hackathon
Computer vision-nit-silchar-hackathonComputer vision-nit-silchar-hackathon
Computer vision-nit-silchar-hackathon
 
Integrated Hidden Markov Model and Kalman Filter for Online Object Tracking
Integrated Hidden Markov Model and Kalman Filter for Online Object TrackingIntegrated Hidden Markov Model and Kalman Filter for Online Object Tracking
Integrated Hidden Markov Model and Kalman Filter for Online Object Tracking
 
slide-171212080528.pptx
slide-171212080528.pptxslide-171212080528.pptx
slide-171212080528.pptx
 
Real Time Object Dectection using machine learning
Real Time Object Dectection using machine learningReal Time Object Dectection using machine learning
Real Time Object Dectection using machine learning
 
Deep learning fundamental and Research project on IBM POWER9 system from NUS
Deep learning fundamental and Research project on IBM POWER9 system from NUSDeep learning fundamental and Research project on IBM POWER9 system from NUS
Deep learning fundamental and Research project on IBM POWER9 system from NUS
 
kanimozhi2019.pdf
kanimozhi2019.pdfkanimozhi2019.pdf
kanimozhi2019.pdf
 
AaSeminar_Template.pptx
AaSeminar_Template.pptxAaSeminar_Template.pptx
AaSeminar_Template.pptx
 
final_project_1_2k21cse07.pptx
final_project_1_2k21cse07.pptxfinal_project_1_2k21cse07.pptx
final_project_1_2k21cse07.pptx
 
Dj31514517
Dj31514517Dj31514517
Dj31514517
 
Dj31514517
Dj31514517Dj31514517
Dj31514517
 
Object tracking
Object trackingObject tracking
Object tracking
 
3-d interpretation from stereo images for autonomous driving
3-d interpretation from stereo images for autonomous driving3-d interpretation from stereo images for autonomous driving
3-d interpretation from stereo images for autonomous driving
 
IRJET- Identification of Scene Images using Convolutional Neural Networks - A...
IRJET- Identification of Scene Images using Convolutional Neural Networks - A...IRJET- Identification of Scene Images using Convolutional Neural Networks - A...
IRJET- Identification of Scene Images using Convolutional Neural Networks - A...
 
Lalal
LalalLalal
Lalal
 
Video Annotation for Visual Tracking via Selection and Refinement_tran.pptx
Video Annotation for Visual Tracking via Selection and Refinement_tran.pptxVideo Annotation for Visual Tracking via Selection and Refinement_tran.pptx
Video Annotation for Visual Tracking via Selection and Refinement_tran.pptx
 
Attentive Relational Networks for Mapping Images to Scene Graphs
Attentive Relational Networks for Mapping Images to Scene GraphsAttentive Relational Networks for Mapping Images to Scene Graphs
Attentive Relational Networks for Mapping Images to Scene Graphs
 
Presentation roi
Presentation roiPresentation roi
Presentation roi
 

Mais de Yu Huang

Application of Foundation Model for Autonomous Driving
Application of Foundation Model for Autonomous DrivingApplication of Foundation Model for Autonomous Driving
Application of Foundation Model for Autonomous DrivingYu Huang
 
The New Perception Framework in Autonomous Driving: An Introduction of BEV N...
The New Perception Framework  in Autonomous Driving: An Introduction of BEV N...The New Perception Framework  in Autonomous Driving: An Introduction of BEV N...
The New Perception Framework in Autonomous Driving: An Introduction of BEV N...Yu Huang
 
Data Closed Loop in Simulation Test of Autonomous Driving
Data Closed Loop in Simulation Test of Autonomous DrivingData Closed Loop in Simulation Test of Autonomous Driving
Data Closed Loop in Simulation Test of Autonomous DrivingYu Huang
 
Techniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous DrivingTechniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous DrivingYu Huang
 
BEV Joint Detection and Segmentation
BEV Joint Detection and SegmentationBEV Joint Detection and Segmentation
BEV Joint Detection and SegmentationYu Huang
 
BEV Object Detection and Prediction
BEV Object Detection and PredictionBEV Object Detection and Prediction
BEV Object Detection and PredictionYu Huang
 
Fisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VIFisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VIYu Huang
 
Fisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving VFisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving VYu Huang
 
Fisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IVFisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IVYu Huang
 
Prediction,Planninng & Control at Baidu
Prediction,Planninng & Control at BaiduPrediction,Planninng & Control at Baidu
Prediction,Planninng & Control at BaiduYu Huang
 
Cruise AI under the Hood
Cruise AI under the HoodCruise AI under the Hood
Cruise AI under the HoodYu Huang
 
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)Yu Huang
 
Scenario-Based Development & Testing for Autonomous Driving
Scenario-Based Development & Testing for Autonomous DrivingScenario-Based Development & Testing for Autonomous Driving
Scenario-Based Development & Testing for Autonomous DrivingYu Huang
 
How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?Yu Huang
 
Annotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous DrivingAnnotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous DrivingYu Huang
 
Simulation for autonomous driving at uber atg
Simulation for autonomous driving at uber atgSimulation for autonomous driving at uber atg
Simulation for autonomous driving at uber atgYu Huang
 
Prediction and planning for self driving at waymo
Prediction and planning for self driving at waymoPrediction and planning for self driving at waymo
Prediction and planning for self driving at waymoYu Huang
 
Jointly mapping, localization, perception, prediction and planning
Jointly mapping, localization, perception, prediction and planningJointly mapping, localization, perception, prediction and planning
Jointly mapping, localization, perception, prediction and planningYu Huang
 
Data pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous drivingData pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous drivingYu Huang
 
Open Source codes of trajectory prediction & behavior planning
Open Source codes of trajectory prediction & behavior planningOpen Source codes of trajectory prediction & behavior planning
Open Source codes of trajectory prediction & behavior planningYu Huang
 

Mais de Yu Huang (20)

Application of Foundation Model for Autonomous Driving
Application of Foundation Model for Autonomous DrivingApplication of Foundation Model for Autonomous Driving
Application of Foundation Model for Autonomous Driving
 
The New Perception Framework in Autonomous Driving: An Introduction of BEV N...
The New Perception Framework  in Autonomous Driving: An Introduction of BEV N...The New Perception Framework  in Autonomous Driving: An Introduction of BEV N...
The New Perception Framework in Autonomous Driving: An Introduction of BEV N...
 
Data Closed Loop in Simulation Test of Autonomous Driving
Data Closed Loop in Simulation Test of Autonomous DrivingData Closed Loop in Simulation Test of Autonomous Driving
Data Closed Loop in Simulation Test of Autonomous Driving
 
Techniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous DrivingTechniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous Driving
 
BEV Joint Detection and Segmentation
BEV Joint Detection and SegmentationBEV Joint Detection and Segmentation
BEV Joint Detection and Segmentation
 
BEV Object Detection and Prediction
BEV Object Detection and PredictionBEV Object Detection and Prediction
BEV Object Detection and Prediction
 
Fisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VIFisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VI
 
Fisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving VFisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving V
 
Fisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IVFisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IV
 
Prediction,Planninng & Control at Baidu
Prediction,Planninng & Control at BaiduPrediction,Planninng & Control at Baidu
Prediction,Planninng & Control at Baidu
 
Cruise AI under the Hood
Cruise AI under the HoodCruise AI under the Hood
Cruise AI under the Hood
 
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
 
Scenario-Based Development & Testing for Autonomous Driving
Scenario-Based Development & Testing for Autonomous DrivingScenario-Based Development & Testing for Autonomous Driving
Scenario-Based Development & Testing for Autonomous Driving
 
How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?
 
Annotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous DrivingAnnotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous Driving
 
Simulation for autonomous driving at uber atg
Simulation for autonomous driving at uber atgSimulation for autonomous driving at uber atg
Simulation for autonomous driving at uber atg
 
Prediction and planning for self driving at waymo
Prediction and planning for self driving at waymoPrediction and planning for self driving at waymo
Prediction and planning for self driving at waymo
 
Jointly mapping, localization, perception, prediction and planning
Jointly mapping, localization, perception, prediction and planningJointly mapping, localization, perception, prediction and planning
Jointly mapping, localization, perception, prediction and planning
 
Data pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous drivingData pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous driving
 
Open Source codes of trajectory prediction & behavior planning
Open Source codes of trajectory prediction & behavior planningOpen Source codes of trajectory prediction & behavior planning
Open Source codes of trajectory prediction & behavior planning
 

Último

Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdfSuman Jyoti
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Christo Ananth
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdfKamal Acharya
 
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...tanu pandey
 
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLPVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLManishPatel169454
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...SUHANI PANDEY
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdfankushspencer015
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXssuser89054b
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756dollysharma2066
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdfKamal Acharya
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTbhaskargani46
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)simmis5
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringmulugeta48
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 

Último (20)

Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
 
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLPVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
NFPA 5000 2024 standard .
NFPA 5000 2024 standard                                  .NFPA 5000 2024 standard                                  .
NFPA 5000 2024 standard .
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineering
 
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 

Unsupervised/Self-supervvised visual object tracking

  • 1. Unsupervised/Self-supervised Visual Object Tracking Yu Huang Sunnyvale, California Yu.Huang07@gmail.com
  • 2. OUTLINE • A Simple Framework For Contrastive Learning Of Visual Representations • Tracking Objects As Points • 𝑆2SiamFC: Self-supervised Fully Convolutional Siamese Network For Visual Tracking • CL-MOT: A Contrastive Learning Framework For Multi-object Tracking • Multi-object Tracking With Self-supervised Associating Network • Self-supervised Learning For Multi-object Tracking
  • 3. A Simple Framework For Contrastive Learning Of Visual Representations • SimCLR: a simple framework for contrastive learning of visual representations • (1) composition of data augmentations plays a critical role in defining effective predictive tasks • (2) introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations • (3) contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning.
  • 4. A Simple Framework For Contrastive Learning Of Visual Representations A simple framework for contrastive learning of visual representations Two separate data augmentation operators are sampled from the same family of augmentations and applied to each data example to obtain two correlated views. A base encoder network f() and a projection head g() are trained to maximize agreement using a contrastive loss. After training is completed, we throw away the projection head g() and use encoder f() and representation h for downstream tasks.
  • 5. A Simple Framework For Contrastive Learning Of Visual Representations Data Augmentation
  • 6. A Simple Framework For Contrastive Learning Of Visual Representations
  • 7. A Simple Framework For Contrastive Learning Of Visual Representations
  • 8. A Simple Framework For Contrastive Learning Of Visual Representations
  • 9. Tracking Objects As Points • Tracking is dominated by pipelines that perform object detection followed by temporal association, also known as tracking-by-detection. • Centertrack, a simultaneous detection and tracking algorithm, is simpler, faster, and more accurate. • It applies a detection model to a pair of images and detections from the prior frame. • Given this minimal input, Centertrack localizes objects and predicts their associations with the previous frame. • Centertrack is simple, online (no peeking into the future), and real-time. • Codes: https://github.com/xingyizhou/centertrack
  • 10. Tracking Objects As Points The network takes the current frame, the previous frame, and a heatmap rendered from tracked object centers as inputs, and produces a center detection heatmap for the current frame, the bounding box size map, and an offset map. At test time, object sizes and offsets are extracted from peaks in the heatmap.
  • 11. Tracking Objects As Points The training objective based on the focal loss Size prediction is learned by regression CenterNet regresses to a refined center local location using an analogous L1 loss. The overall loss of CenterNet is a weighted sum of all three loss terms: focal loss, size, and local location regression.
  • 12. Tracking Objects As Points To associate detections through time, CenterTrack predicts a 2D displacement as two additional output channels. It learns this displacement using the same regression objective as size or location refinement: CenterTrack is first and foremost an object detector, and trained as such. The architectural changed from CenterNet to CenterTrack are minor: four additional input channels and two output channels. This allows to fine-tune CenterTrack directly from a pretrained CenterNet detector.
  • 13. Tracking Objects As Points • Follow the CenterNet training protocol and train all predictions as multi-task learning. • Training on static image data: simulate the previous frame by randomly scaling and translating the current frame. • To perform mono 3D tracking, it adopts the monocular 3D detection form of CenterNet. • Specifically, train output heads to predict object depth, rotation (encoded as an 8- dimensional vector), and 3D extent. • Since the projection of the center of the 3D bounding box may not align with the center of the object’s 2D bounding box, also predict a 2d-to-3d center offset. • Backbone is DLA (Deep Layer Aggregation).
  • 18. 𝑆2SiamFC: Self-supervised Fully Convolutional Siamese Network For Visual Tracking • It adapts the state-of-the-art supervised Siamese based trackers into unsupervised ones by utilizing the fact that an image and any cropped region of it can form a natural pair for self-training. • It applies Anti-clutter weighting (AC) which can adaptively adjust the weight of each training sample by determining whether the pair is informative or not. • It proposes Adversarial Masking which helps the tracker to learn other context information by adaptively blacking out salient regions of the target. • Extend SiamFC (“Fully-convolutional Siamese Networks For Object Tracking”) to S2SiamFC (Self-Supervised).
  • 19. 𝑆2SiamFC: Self-supervised Fully Convolutional Siamese Network For Visual Tracking Illustration of the difference between (a) common unsupervised learning approach and (b) the proposed self-supervised learning approach. In (b), the regions are partly overlapping and the positive samples are highlighted in red and negative samples are highlighted in black.
  • 20. 𝑆2SiamFC: Self-supervised Fully Convolutional Siamese Network For Visual Tracking The challenges of self-supervised tracking are two-fold. 1) in the training phase, when you randomly crop a region from an image as the target template and then extend the chosen region as the search image as a training pair, it may lead to a potential issue which is about “background content tracking” due to the randomness in the process of sampling a training pair from the same image. 2) The self-supervised tracking is challenging because only a limited amount of appearance variations can be captured during the training phase.
  • 21. 𝑆2SiamFC: Self-supervised Fully Convolutional Siamese Network For Visual Tracking The training pipeline mainly consists of two stages: 1) The training pairs are sampled from the same image and calculate the loss between the raw template and the search region first. 2) The values with the positive labels in response map are chosen to calculate the channel-wise saliency maps by backpropagation. One of the thresholded saliency maps is chosen to mask the template image and feed the masked template into the network again for learning appearance-robust features.
  • 22. 𝑆2SiamFC: Self-supervised Fully Convolutional Siamese Network For Visual Tracking The SiamFC loss function Anticlutter weighting of each training sample indicator function Anti-clutter loss function
  • 23. 𝑆2SiamFC: Self-supervised Fully Convolutional Siamese Network For Visual Tracking Illustrations of the concept about “background tracking”. The predicted response map is resized to 255x255 for better visualization. (a) denotes a meaningful pair that has fewer large positive values of predicted response map since the template region is unique in the search region. (b) denotes a meaningless pair and the predicted response map tend to be flat (many large positive values) since the template region is a common pattern in the search region.
  • 24. 𝑆2SiamFC: Self-supervised Fully Convolutional Siamese Network For Visual Tracking adversarial appearance masking module Inspired by Grad-CAM, obtain the saliency map in a self-guidance manner by doing backpropagation from the location of the ground truth label which is positive in the response map. Then choose one of those saliency maps as a mask and force the model to learn the other relevant context info of the target. In this way, the model is forced to correctly predict the similarity when some important details are not available.
  • 25. 𝑆2SiamFC: Self-supervised Fully Convolutional Siamese Network For Visual Tracking
  • 26. 𝑆2SiamFC: Self-supervised Fully Convolutional Siamese Network For Visual Tracking
  • 27. 𝑆2SiamFC: Self-supervised Fully Convolutional Siamese Network For Visual Tracking
• 28. CL-MOT: A Contrastive Learning Framework For Multi-object Tracking
• CL-MOT: a semi-supervised contrastive learning framework for MOT;
• Learns by clustering object embeddings from different views of static frames;
• Transfers an object detector into a tracker within this pretext learning paradigm;
• Codes: https://github.com/danielzgsilva/CL-MOT
• 29. CL-MOT: A Contrastive Learning Framework For Multi-object Tracking
• Comprised of an encoder-decoder backbone with separate object detection and embedding branches, this one-shot tracking network predicts object bounding boxes and appearance embeddings in a single forward pass;
• The backbone is a fully convolutional ResNet-34 with the deep layer aggregation (DLA) variant;
• Convolutions in the up-sampling layers are replaced with deformable convolutions to adapt the receptive field;
• Object detection is cast as center-based keypoint estimation, with regression to other properties such as height and width;
• Therefore, three parallel regression heads are appended to the backbone to predict an object heatmap, object center offsets, and bounding box sizes, respectively (sketched below).
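A minimal sketch of the three parallel heads; channel widths and kernel sizes are assumptions, not CL-MOT's exact configuration.

```python
import torch
import torch.nn as nn

class DetectionHeads(nn.Module):
    """Three parallel regression heads on top of the backbone feature map."""
    def __init__(self, backbone_channels=64, head_channels=256, num_classes=1):
        super().__init__()
        def head(out_channels):
            return nn.Sequential(
                nn.Conv2d(backbone_channels, head_channels, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(head_channels, out_channels, 1),
            )
        self.heatmap = head(num_classes)  # object center heatmap
        self.offset = head(2)             # center offset (dx, dy)
        self.size = head(2)               # bounding box size (w, h)

    def forward(self, features):
        return {
            "heatmap": torch.sigmoid(self.heatmap(features)),
            "offset": self.offset(features),
            "size": self.size(features),
        }
```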
• 30. CL-MOT: A Contrastive Learning Framework For Multi-object Tracking
• CL-MOT treats tracking as an online multi-object re-identification task;
• The network learns a feature space that discriminates between object instances in a single scene;
• This representation learning framework is combined with an association algorithm for object tracking;
• It "learns to track" objects in a self-supervised manner, forgoing identity annotations;
• The object detection branch of CL-MOT is trained in a supervised manner;
• A focal loss is applied to the estimated object heatmap, and an L1 regression loss to the size and offset predictions;
• Once trained, it runs online and in real time by leveraging an appearance-based association algorithm;
• To finalize the association step, bipartite matching is performed on the combined cost matrix with the Hungarian algorithm (sketched below).
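A minimal sketch of the appearance-based association step via the Hungarian algorithm; how appearance and motion costs are combined is omitted here, and the gating threshold is an assumption.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(track_embeddings, det_embeddings, max_cost=0.7):
    """Match active tracks to detections by cosine distance of their
    appearance embeddings. `max_cost` is an illustrative gating threshold.
    """
    t = track_embeddings / np.linalg.norm(track_embeddings, axis=1, keepdims=True)
    d = det_embeddings / np.linalg.norm(det_embeddings, axis=1, keepdims=True)
    cost = 1.0 - t @ d.T                      # (num_tracks, num_dets)

    rows, cols = linear_sum_assignment(cost)  # Hungarian algorithm
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_cost]
    unmatched_tracks = set(range(len(t))) - {r for r, _ in matches}
    unmatched_dets = set(range(len(d))) - {c for _, c in matches}
    return matches, unmatched_tracks, unmatched_dets
```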
  • 31. CL-MOT: A Contrastive Learning Framework For Multi-object Tracking
• 32. Multi-object Tracking With Self-supervised Associating Network
• Tracking by detection: feature-based object re-identification;
• Self-supervised learning from a large number of short unlabeled videos;
• The re-identification network is trained this way to address the lack of training data;
• The self-supervised associating tracker (SSAT) trains the feature extraction network in a self-supervised manner, free of data constraints, and uses it directly to re-identify targets for tracking without a separate downstream task.
• 33. Multi-object Tracking With Self-supervised Associating Network
It considers all frames of one short video as image patches with the same ID and trains the network on an N-class classification task, where N is the number of video clips. The output of the backbone network, a 512-channel embedding, is then used as the feature of an input image to associate detections with tracks.
• 34. Multi-object Tracking With Self-supervised Associating Network
• Assuming that the frames of a sufficiently short video have similar appearance to each other, only short video clips are used to train the network.
• Some clips may nonetheless contain completely different frames, but this is expected to wash out once enough data is accumulated and learned.
• Since self-supervised learning is free from labeling constraints, it can learn from a large amount of data.
• The clip length is set to 10 seconds, and a large collection of roughly 10-second YouTube videos is used for training (a clip-construction sketch follows below).
• For a 30 fps video, this yields 300 frames of images assigned to a single ID.
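A minimal sketch of this clip-as-identity data construction, assuming a hypothetical `clips/<clip_id>/<frame>.jpg` directory layout.

```python
import os
import torch
from torch.utils.data import Dataset
from torchvision.io import read_image

class ClipAsClassDataset(Dataset):
    """Every frame of a short clip shares one class label; the class index is
    simply the clip index, so N classes correspond to N clips.
    """
    def __init__(self, root):
        self.samples = []
        for clip_idx, clip_id in enumerate(sorted(os.listdir(root))):
            clip_dir = os.path.join(root, clip_id)
            for frame in sorted(os.listdir(clip_dir)):
                # All ~300 frames of a 10 s, 30 fps clip get the same label.
                self.samples.append((os.path.join(clip_dir, frame), clip_idx))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, i):
        path, label = self.samples[i]
        return read_image(path).float() / 255.0, label
```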
• 35. Multi-object Tracking With Self-supervised Associating Network
• This process is very similar to the face recognition task;
• The network is therefore trained following CosFace ("CosFace: Large Margin Cosine Loss For Deep Face Recognition"), which is simple and effective in face recognition;
• The backbone network is ResNet-50, which extracts 512-channel features;
• The loss function is the large margin cosine loss (LMCL), also the loss function of CosFace (sketched below);
• After training, when the network is applied to the MOT task, the 512-channel feature output by the backbone is used to compare each patch with the tracks.
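A minimal sketch of the large margin cosine loss; the scale s and margin m below are typical CosFace values, not necessarily SSAT's settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LargeMarginCosineLoss(nn.Module):
    """LMCL: embeddings and class weights are L2-normalized, a margin m is
    subtracted from the target-class cosine, and logits are scaled by s.
    """
    def __init__(self, num_classes, embed_dim=512, s=64.0, m=0.35):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, embed_dim))
        self.s, self.m = s, m

    def forward(self, embeddings, labels):
        cosine = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        margin = F.one_hot(labels, cosine.size(1)).float() * self.m
        logits = self.s * (cosine - margin)
        return F.cross_entropy(logits, labels)
```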
• 36. Multi-object Tracking With Self-supervised Associating Network
• The tracker is based on CenterTrack, which performs well in MOT;
• CenterTrack uses CenterNet to obtain or refine detection results and then associates the results with its own method;
• In addition, Un-super-track, an unsupervised method, is also based on CenterTrack;
• Note: the detector is trained with supervision, while the association network is trained in a self-supervised manner.
  • 37. Multi-object Tracking With Self-supervised Associating Network
  • 38. Multi-object Tracking With Self-supervised Associating Network
• 39. Self-supervised Learning For Multi-object Tracking
• Under the detect-to-track framework: an object detector, trained on image-level bounding box annotations, is assumed to be available, but the tracking model is trained using only unlabeled video;
• Dual-tracker consistency is a self-supervised training method: at a high level, it creates a self-supervisory signal by applying two instances of a tracker model (sharing the same parameters) to two distinct input variations extracted from one video sequence, and trains the tracker to produce similar outputs over the sequence under both inputs;
• Two distinct inputs are constructed for one video sequence, each a variation of the sequence in which different information has been hidden;
• Two instances of the tracker are then applied independently on each input, and the model is trained to produce consistent outputs.
• 40. Self-supervised Learning For Multi-object Tracking
• The tracker model is similar to "Multi-object Tracking With Neural Gating Using Bilinear LSTM";
• A self-supervisory signal trains the RNN tracker model through a three-step process:
• 1) During training, repeatedly sample a random video segment {I0, I1, …, In}; let Dk be the detections automatically computed in Ik, each detection having an image window cropped from Ik at its box; apply an input-hiding scheme to produce two input variations for the segment, each a modified sequence of the per-frame detections;
• 2) Apply two instances of the tracker model to each input variation to derive two probabilistic tracking outputs, represented as transition matrices;
• 3) Compare the transition matrices with dot-product similarity to update the RNN parameters.
  • 41. Self-supervised Learning For Multi-object Tracking Tracker model architecture
• 42. Self-supervised Learning For Multi-object Tracking
• The tracker maintains a set of active tracks that have not yet left the camera frame;
• Given a video segment {I0, …, In} and sets of detections Dk in each frame Ik, the tracking process is initialized by creating a track ti for each detection d0i in the first frame I0;
• On each subsequent frame Ik, the model outputs the probability that each active track ti corresponds to each detection;
• At inference time, the Hungarian algorithm matches active tracks with detections based on these probabilities;
• For each detection in Ik that no track matches, a new active track is created;
• Similarly, if a track does not match any detection for tage consecutive frames, it is removed from the active set; thus, tage is a threshold on the maximum age of a track since it last matched a detection (a lifecycle sketch follows below).
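A minimal sketch of this inference-time track lifecycle, assuming the model has already produced a track-to-detection probability matrix for the current frame; the thresholds are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def track_step(tracks, det_probs, frame_idx, t_age=5, min_prob=0.5):
    """One inference step: match, spawn new tracks, and expire stale tracks.

    `tracks` is a list of dicts with 'detections' and 'last_matched' fields;
    `det_probs` is a (len(tracks), num_dets) matrix of match probabilities.
    """
    if det_probs.size:
        rows, cols = linear_sum_assignment(-det_probs)   # maximize probability
    else:
        rows, cols = np.array([], dtype=int), np.array([], dtype=int)

    matched_dets = set()
    for r, c in zip(rows, cols):
        if det_probs[r, c] >= min_prob:
            tracks[r]["detections"].append(c)
            tracks[r]["last_matched"] = frame_idx
            matched_dets.add(c)

    # New active track for every unmatched detection.
    for d in range(det_probs.shape[1]):
        if d not in matched_dets:
            tracks.append({"detections": [d], "last_matched": frame_idx})

    # Drop tracks that have not matched for t_age consecutive frames.
    return [t for t in tracks if frame_idx - t["last_matched"] <= t_age]
```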
• 43. Self-supervised Learning For Multi-object Tracking
• During training, segments of up to 16 consecutive video frames are repeatedly sampled;
• One of two input-hiding schemes (occlusion-based and visual-spatial) is applied to extract two distinct input variations from a sampled segment;
• Instances of the tracker model, sharing the same parameters, are then applied on each variation;
• Dual-tracker consistency trains the model by enforcing similar outputs on both inputs;
• Tracker outputs are represented as a transition matrix M, whose elements give the probability that each active track ti matches each detection;
• When applying the model over video segments during training, tracks are updated with new detections based on the scores output on intermediate frames, but no additional active tracks are created on frames after I0; thus, each active track ti corresponds directly to a detection in I0;
• Applying the two instances of the tracker then yields two transition matrices A and B;
• The model (CNN, RNN, and matching network) is trained end-to-end to maximize the dot-product similarity between these matrices (a sketch of this objective follows below).
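A minimal sketch of the consistency objective on the two transition matrices; the row normalization and the exact form of the similarity are assumptions for illustration.

```python
import torch

def dual_tracker_consistency_loss(a, b):
    """Maximize dot-product similarity between transition matrices A and B,
    each of shape (num_tracks, num_dets), by minimizing its negative.
    """
    a = a / (a.sum(dim=1, keepdim=True) + 1e-8)   # each row: a match distribution
    b = b / (b.sum(dim=1, keepdim=True) + 1e-8)
    similarity = (a * b).sum(dim=1).mean()        # mean per-track dot product
    return -similarity
```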
• 44. Self-supervised Learning For Multi-object Tracking
Occlusion-based hiding produces input variations with different subsequences of occluded frames, in which all detections are hidden from the tracker. It also applies the tracker independently before and after a hand-off frame (I4 and I2 in the figure), and merges the outputs through a matrix product. Visual-spatial hiding: one variation includes only visual inputs, and the other only spatial inputs.
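A minimal sketch of the hand-off merge, assuming both partial tracker outputs are already expressed as track-to-detection probability matrices.

```python
import torch

def merge_across_handoff(m_before, m_after):
    """Chain two probabilistic assignments across a hand-off frame:
    (tracks in I0 -> detections at hand-off) @ (detections at hand-off ->
    detections in the final frame) gives tracks in I0 -> final detections.
    """
    return m_before @ m_after
```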
  • 45. Self-supervised Learning For Multi-object Tracking