All posts by 孙愉亚

Weekly Report (SUN YUYA)

(1)  I am still studying how to apply trajectory prediction to long-term object tracking.

(2) For the display of achievements in August, I am collecting some of my previous projects, which involve object tracking, deep reinforcement learning, large language models, and optical character recognition.

Weekly Report (SUN YUYA)

This week, I continued to conduct experiments on trajectory prediction.
I visualized the output of the code for the paper “Social-STGCNN: A Social Spatio-Temporal Graph Convolutional Neural Network for Human Trajectory Prediction”.

At first, the predicted trajectory looks reasonable.

However, at later time steps the predicted trajectory lags behind the actual position of the pedestrians.

In addition, the predicted trajectory always appears in the middle of the image, which is also a problem.

I will look for the cause and fix these problems.
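One way to narrow down both symptoms is to check the coordinate handling before plotting. The sketch below assumes the model outputs per-step relative displacements that must be accumulated from the last observed position; this is an assumption about the setup and should be verified against the actual code. The helpers `rel_to_abs` and `per_step_error` are hypothetical names, not part of the Social-STGCNN code, and the numbers are made-up toy data.

```python
import math

def rel_to_abs(rel_steps, last_observed):
    """Accumulate per-step displacements starting from the last observed point."""
    x, y = last_observed
    abs_points = []
    for dx, dy in rel_steps:
        x, y = x + dx, y + dy
        abs_points.append((x, y))
    return abs_points

def per_step_error(pred, truth):
    """Euclidean distance between prediction and ground truth at each time step."""
    return [math.dist(p, t) for p, t in zip(pred, truth)]

# Toy data (made up for illustration): the pedestrian walks 1.0 per step,
# while the prediction only moves 0.8 per step, so it falls behind.
last_obs = (10.0, 5.0)
pred_rel = [(0.8, 0.0)] * 4
truth = [(11.0, 5.0), (12.0, 5.0), (13.0, 5.0), (14.0, 5.0)]

pred_abs = rel_to_abs(pred_rel, last_obs)
errors = per_step_error(pred_abs, truth)
print([round(e, 2) for e in errors])
```

An error that grows steadily with the time step is consistent with lag. If the plotted points cluster in one place, it may be worth checking whether relative or normalized coordinates are being drawn without conversion to pixel coordinates.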

Weekly Report (SUN YUYA)

Continued reading papers about long-term tracking.

  1. Robust Long-Term Object Tracking via Improved Discriminative Model Prediction

The paper tries to turn SuperDiMP into a long-term tracker. Its main components are:

(1) Baseline tracker using random erasing.

Method: Erase random small rectangular areas of the image to check whether the prediction is reliable.

Evaluation: I hope it works.
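The reliability check can be sketched as follows. This is a minimal illustration, not the paper's implementation: `track` stands for any tracker call that returns a box `(x, y, w, h)`, `toy_track` is a hypothetical stand-in, the image is a plain nested list, and the IoU threshold of 0.5 is an assumed value.

```python
import random

def erase_random_patch(image, patch_h, patch_w, rng, fill=0):
    """Return a copy of `image` with one random patch_h x patch_w area set to `fill`."""
    height, width = len(image), len(image[0])
    top = rng.randrange(0, height - patch_h + 1)
    left = rng.randrange(0, width - patch_w + 1)
    erased = [row[:] for row in image]
    for r in range(top, top + patch_h):
        for c in range(left, left + patch_w):
            erased[r][c] = fill
    return erased

def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ix = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def is_reliable(track, image, trials=5, patch=(2, 2), min_iou=0.5, seed=0):
    """Treat the prediction as reliable if it stays stable under random erasing."""
    rng = random.Random(seed)
    reference = track(image)
    for _ in range(trials):
        erased = erase_random_patch(image, patch[0], patch[1], rng)
        if iou(reference, track(erased)) < min_iou:
            return False
    return True

# Hypothetical stand-in tracker: returns the bounding box of all non-zero pixels.
def toy_track(image):
    pts = [(c, r) for r, row in enumerate(image) for c, v in enumerate(row) if v]
    if not pts:
        return (0, 0, 0, 0)
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)

# 8x8 toy image with a 4x4 target in the middle.
image = [[1 if 2 <= r < 6 and 2 <= c < 6 else 0 for c in range(8)] for r in range(8)]
print(is_reliable(toy_track, image))
```

With a real tracker, `trials`, the patch size, and `min_iou` would need tuning; a prediction that jumps when a small area is erased is the case this check is meant to flag.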

(2) Global search using random searching.

Method: First, we create global searching templates with a predetermined interval. Next, we adaptively determine the number of searches according to the ratio of the image size to the target size. Then, an object is detected within a randomly selected searching area.
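The three steps of the global search can be sketched as below. The concrete choices are assumptions made for illustration only: the grid layout, the rule in `num_searches` (square root of the area ratio, capped by a hypothetical `max_searches`), and all parameter values.

```python
import math
import random

def make_search_centers(img_w, img_h, stride):
    """Grid of candidate search-region centers placed every `stride` pixels."""
    xs = range(stride // 2, img_w, stride)
    ys = range(stride // 2, img_h, stride)
    return [(x, y) for y in ys for x in xs]

def num_searches(img_w, img_h, tgt_w, tgt_h, max_searches=8):
    """Assumed rule: smaller targets (relative to the image) get more searches."""
    ratio = math.sqrt((img_w * img_h) / (tgt_w * tgt_h))
    return max(1, min(max_searches, math.ceil(ratio)))

def pick_search_regions(img_w, img_h, tgt_w, tgt_h, stride, rng):
    """Randomly select the search centers to be visited in this frame."""
    centers = make_search_centers(img_w, img_h, stride)
    k = min(len(centers), num_searches(img_w, img_h, tgt_w, tgt_h))
    return rng.sample(centers, k)

rng = random.Random(0)
regions = pick_search_regions(640, 480, tgt_w=64, tgt_h=48, stride=160, rng=rng)
print(regions)
```

Each selected center would then be handed to the tracker as the middle of a search area; that call is omitted here because it depends on the tracker.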

(3) Score penalty.

However, the probability of an object disappearing and suddenly appearing at a distant location is very low. To prevent this sudden detection, we penalize a confidence score through spatio-temporal constraints, which is expressed as follows:
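The formula itself is not reproduced in this post. Purely as an illustration of a spatio-temporal penalty, the sketch below assumes a Gaussian falloff in the distance from the last known position, with a tolerance that widens as more frames pass since the target was last seen. The functional form and the parameters `base_sigma` and `growth` are assumptions, not the paper's equation.

```python
import math

def penalized_score(score, candidate, last_pos, frames_lost,
                    base_sigma=50.0, growth=10.0):
    """Down-weight a detection score by its distance from the last known position.

    Assumed form: score * exp(-d^2 / (2 * sigma^2)), where sigma grows linearly
    with the number of frames since the target was last seen.
    """
    d = math.dist(candidate, last_pos)
    sigma = base_sigma + growth * frames_lost
    return score * math.exp(-(d * d) / (2.0 * sigma * sigma))

near = penalized_score(0.9, (105, 100), (100, 100), frames_lost=1)
far = penalized_score(0.9, (500, 100), (100, 100), frames_lost=1)
far_later = penalized_score(0.9, (500, 100), (100, 100), frames_lost=30)
print(near > far, far_later > far)
```

In this sketch a distant detection right after the loss is strongly penalized, while the same detection after a long absence is penalized less.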

Weekly Report (SUN YUYA)

The details of some long-term trackers.

1. SiamX: An Efficient Long-term Tracker Using Cross-level Feature Correlation and Adaptive Tracking Scheme.

The key is “ADAPTIVE TRACKING SCHEME”.

(1) Momentum Compensation.

It uses the concept of “fast motion” to judge whether the target object is lost.

“If the target displacements between consecutive frames exceeds target sizes, it considers the target object is at a fast-moving state. To avoid targets leaving the search regions, the search center drifts in the direction of momentum:”
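The quoted rule can be sketched as below. The update equation from the paper is not shown in this post, so the sketch assumes the simplest reading: the momentum is the displacement between the last two frames, the comparison with the target size is done per axis, and in the fast-moving state the search center is shifted by the displacement scaled by a hypothetical factor `gain`.

```python
def next_search_center(prev_center, curr_center, target_size, gain=1.0):
    """Return (search_center, is_fast) for the next frame.

    prev_center, curr_center: (x, y) target centers in the last two frames.
    target_size: (w, h) of the target.
    """
    dx = curr_center[0] - prev_center[0]
    dy = curr_center[1] - prev_center[1]
    # Assumed per-axis test for the "fast motion" state.
    fast = abs(dx) > target_size[0] or abs(dy) > target_size[1]
    if fast:
        # Drift the search center in the direction of the momentum.
        return (curr_center[0] + gain * dx, curr_center[1] + gain * dy), True
    return curr_center, False

print(next_search_center((100, 100), (105, 102), (40, 60)))
print(next_search_center((100, 100), (160, 100), (40, 60)))
```

The first call keeps the search centered on the target; the second detects fast motion and moves the search center ahead of the target.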

Conclusion: Fake paper. Its released code lacks the long-term tracker.

2. Combining complementary trackers for enhanced long-term visual object tracking.

It runs two trackers.

But we can borrow its scoring method for re-detection.

3. GUSOT: Green and Unsupervised Single Object Tracking for Long Video Sequences

Rule: if s1(f∗, x1) > s1(f∗, x2) and s2(f∗, x1) ≤ s2(f∗, x2), re-detect; otherwise, continue.

Key: motion residual. It builds on “UHP-SOT”.
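The decision rule above is small enough to write down directly. Since the definitions of s1 and s2 are not given in this post, the sketch takes them as plain callables; the toy scorers at the bottom are hypothetical and only exercise the logic.

```python
def should_redetect(s1, s2, f_star, x1, x2):
    """Re-detect only when s1 prefers x1 while s2 does not prefer x1."""
    return s1(f_star, x1) > s1(f_star, x2) and s2(f_star, x1) <= s2(f_star, x2)

# Hypothetical toy scorers on scalars, for illustration only.
toy_s1 = lambda f, x: -abs(f - x)   # higher when x is closer to f
toy_s2 = lambda f, x: f * x         # higher when x is larger

print(should_redetect(toy_s1, toy_s2, 1.0, 1.2, 0.5))
print(should_redetect(toy_s1, toy_s2, 1.0, 0.9, 1.5))
```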

 

4. High-Performance Long-Term Tracking with Meta-Updater

(1) Appearance model (LSTM)

(2) Re-detection (using the flag of DiMP?)

Conclusion: Another fake paper. The most important component is DiMP!

 

5. UHP-SOT: An Unsupervised High-Performance Single Object Tracker (2021)

Method: It combines three components:

(1) Trajectory-based box prediction (principal component analysis)

(2) Background motion modeling (optical flow)

(3) Appearance model (normal tracker)

 

6. Object Tracking Using Background Subtraction and Motion Estimation in MPEG Videos (2005)

Key: Use the four corners of the frame to compute the background motion (optical flow).

7. Fast Object Tracking Using Adaptive Block Matching (2005)

Key: Apply a ‘mode filter’ to straighten out noisy motion vectors (optical flow).
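A mode filter on a motion-vector field can be sketched as follows: each vector is replaced by the most frequent vector in its neighbourhood. The 3x3 window and the tie-breaking (whichever most-common vector `Counter` returns first) are assumptions made for this illustration, not details taken from the paper.

```python
from collections import Counter

def mode_filter(field):
    """Replace each motion vector by the most frequent one in its 3x3 window.

    field: 2D list of (dx, dy) integer motion vectors, one per block.
    """
    rows, cols = len(field), len(field[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Collect the neighbours, clipping the window at the borders.
            window = [field[rr][cc]
                      for rr in range(max(0, r - 1), min(rows, r + 2))
                      for cc in range(max(0, c - 1), min(cols, c + 2))]
            out[r][c] = Counter(window).most_common(1)[0][0]
    return out

noisy = [
    [(2, 0), (2, 0), (2, 0)],
    [(2, 0), (9, -7), (2, 0)],   # one outlier vector in the middle
    [(2, 0), (2, 0), (2, 0)],
]
smoothed = mode_filter(noisy)
print(smoothed[1][1])
```

Unlike a mean filter, the mode filter does not blend the outlier into its neighbours; it discards it in favour of the dominant motion.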

 

 

Weekly Report (SUN YUYA)

A long-term object tracker must be able to retrieve lost targets. So I want to predict the locations where the target might reappear based on the historical motion trajectory of the object. Trajectory prediction requires the camera motion, but current object tracking methods can’t provide camera information such as camera pose or motion.

In order to get depth maps and camera poses, I am reading papers about monocular SLAM, including unsupervised learning methods.
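To make the dependence on camera motion concrete, the sketch below uses a constant-velocity model, which is a deliberately simple stand-in for a real trajectory predictor. It assumes the camera motion between frames can be approximated by a 2D image shift per frame (`camera_shifts`, a hypothetical input); real camera motion involves rotation and depth and is more complex.

```python
def predict_positions(track, camera_shifts, horizon):
    """Predict future image positions of a target with a constant-velocity model.

    track: observed (x, y) image positions, one per frame (at least two).
    camera_shifts: per-frame (dx, dy) image shift caused by camera motion,
        where camera_shifts[i] is the shift between frame i and frame i + 1.
    horizon: number of future frames to predict.
    """
    # Remove camera-induced shifts so positions live in a stabilized frame.
    stabilized = [track[0]]
    cam_x = cam_y = 0.0
    for (x, y), (sx, sy) in zip(track[1:], camera_shifts):
        cam_x += sx
        cam_y += sy
        stabilized.append((x - cam_x, y - cam_y))

    # Average velocity of the object itself in the stabilized frame.
    n = len(stabilized) - 1
    vx = (stabilized[-1][0] - stabilized[0][0]) / n
    vy = (stabilized[-1][1] - stabilized[0][1]) / n

    # Extrapolate, assuming the camera stays still from now on; the result
    # is expressed in the coordinates of the last observed frame.
    last_x, last_y = track[-1]
    return [(last_x + vx * k, last_y + vy * k) for k in range(1, horizon + 1)]

# Toy data: the object moves 2 px/frame, the camera adds 5 px/frame.
track = [(100, 50), (107, 50), (114, 50)]
camera_shifts = [(5, 0), (5, 0)]
print(predict_positions(track, camera_shifts, horizon=3))
```

Without the compensation, the apparent velocity in this toy example would be 7 px/frame instead of 2 px/frame, so the predicted search location would drift away from the object.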

  1. Future Person Localization in First-Person Videos

Purpose: Predicting future locations of people observed in first-person videos.

Key points: a) ego-motion; b) scale of the target person; c) KCF for tracking; d) feature concatenation.

Evaluation: Excellent introductory work. But how do we get the ego-motion information?

2. Unsupervised Learning of Depth and Ego-Motion from Video

Purpose: Presenting an unsupervised learning framework for the task of monocular depth and camera motion estimation from unstructured video sequences.

Key points: a) view synthesis; b) unsupervised learning.

Evaluation: Nice paper. But it still needs the camera intrinsics.
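The dependence on intrinsics comes from the view-synthesis step. In the paper's formulation, a pixel p_t in the target frame (in homogeneous coordinates) is projected to a location p_s in a source frame using the predicted depth, the predicted relative pose, and the intrinsics matrix K:

```latex
p_s \sim K \, \hat{T}_{t \rightarrow s} \, \hat{D}_t(p_t) \, K^{-1} \, p_t
```

Since K appears on both sides of the transformation, the warped image and hence the photometric loss cannot be computed without K or an estimate of it.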

3. Depth from Videos in the Wild: Unsupervised Monocular Depth Learning from Unknown Cameras

Purpose: Presenting a novel method for simultaneously learning depth, egomotion, object motion, and camera intrinsics from monocular videos, using only consistency across neighboring video frames as a supervision signal.

Key point: a) Predicting the camera intrinsics.

Evaluation: Nice paper. Code is provided. But it may be too slow.

Weekly Report (SUN YUYA)

(1) I am still learning how to compute the trajectory of an object seen by a monocular camera. The traditional object tracking task amounts to simple object detection and ignores depth and camera pose.

We can recast the task as monocular SLAM, but we don’t know the camera intrinsics.

There are some unsupervised learning methods for this, and I am reading those papers.

Weekly Report (SUN YUYA)

(1) Writing the paper for ICIAE2024.

(2) In the experiments on long-term tracking, I found that the similarity between the templates and the current appearance is not reliable. What we need is a function that can decide whether two images show the same object, rather than a similarity distance between two images.

During tracking, the same object can take on different appearances, so a similarity score between 0 and 1 is not suitable for making this judgment.

So we should look for another approach in the field of image classification.