Overview
As artificial intelligence continues to grow more sophisticated, one area of study that becomes more useful every day is computer vision. Computer vision is an interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos. From an engineering perspective, it seeks to understand and automate tasks that the human visual system can do. In this blog I will touch on a few methods that are commonly used in computer vision.
The way the computer actually "sees" the images we give it stays essentially the same even as our methods of object tracking change: each pixel is measured by its color intensity, or, if the image is black and white, by how dark or light it is.
Optical Flow
Optical flow (or optic flow) is the pattern of apparent motion of objects, surfaces, and edges in a visual scene caused by the relative motion between an observer and the scene. It can also be defined as the distribution of apparent velocities of movement of brightness patterns in an image. The Shi-Tomasi corner detector is how the computer finds what to track when it is fed an image, and the Lucas-Kanade method is then used to track the movement of the points detected by the Shi-Tomasi corner detector.
I will include some of my code snippets to give you an idea of how it all works. For the full code, you can go to my GitHub page and check out my Jupyter notebooks.
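The tracking loop below assumes some setup has already happened: the video has been opened, the first frame has been read and converted to grayscale, Shi-Tomasi corners have been detected, and the Lucas-Kanade parameters have been defined. A minimal sketch of that setup looks something like this (the file path and parameter values here are only illustrative, not necessarily what is in my notebook):
import cv2
import numpy as np

# Open the video (path is just a placeholder)
cap = cv2.VideoCapture('video.mp4')

# Read the first frame and convert it to grayscale
ret, first_frame = cap.read()
prev_gray = cv2.cvtColor(first_frame, cv2.COLOR_BGR2GRAY)

# Shi-Tomasi corner detection gives the initial points to track
prev = cv2.goodFeaturesToTrack(prev_gray, maxCorners=100,
                               qualityLevel=0.3, minDistance=7, blockSize=7)

# Parameters for the Lucas-Kanade optical flow
lk_params = dict(winSize=(15, 15), maxLevel=2,
                 criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))

# Blank image for drawing the flow tracks, and a color for the lines
mask = np.zeros_like(first_frame)
color = (0, 255, 0)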
while cap.isOpened():
    # Read the next frame from the capture
    ret, frame = cap.read()
    if not ret:
        break
    # Convert the frame to grayscale (previously we did only the first frame)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Calculate optical flow by Lucas-Kanade
    next_pts, status, error = cv2.calcOpticalFlowPyrLK(prev_gray, gray, prev, None, **lk_params)
    # Select the good features at their previous positions
    good_old = prev[status == 1]
    # Select the good features at their new positions
    good_new = next_pts[status == 1]
    # Draw the optical flow tracks
    for i, (new, old) in enumerate(zip(good_new, good_old)):
        # Coordinates of the new point
        a, b = new.ravel()
        # Coordinates of the old point
        c, d = old.ravel()
        # Draw a line between the new and old positions
        mask = cv2.line(mask, (int(a), int(b)), (int(c), int(d)), color, 2)
        # Draw a filled circle at the new position
        frame = cv2.circle(frame, (int(a), int(b)), 3, color, -1)
    # Overlay the optical flow tracks on the original frame
    output = cv2.add(frame, mask)
    # Update the previous frame
    prev_gray = gray.copy()
    # Update the previous good features
    prev = good_new.reshape(-1, 1, 2)
    # Display the output in a window
    cv2.imshow('Optical Flow', output)
    # Exit on Esc
    if cv2.waitKey(10) & 0xFF == 27:
        break
The end result of this code is that the detected points are selected and tracked frame by frame.

Dense Optical Flow
The Lucas-Kanade method from the previous section computes optical flow for a sparse feature set (the corners detected by the Shi-Tomasi algorithm). OpenCV provides another algorithm to find dense optical flow: it computes the optical flow for all the points in the frame. It is based on Gunnar Farnebäck's algorithm, which is explained in "Two-Frame Motion Estimation Based on Polynomial Expansion" (Farnebäck, 2003). This finds the magnitude and direction of the flow and then color-codes the results for better visualization: direction corresponds to the hue of the image, and magnitude corresponds to the value plane. Below is some of the code specific to dense optical flow.
# Calculate dense optical flow by Farneback
flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                    0.5, 3, 15, 3, 5, 1.2, 0)
# Compute the magnitude and angle of the flow vectors
magn, angle = cv2.cartToPolar(flow[..., 0], flow[..., 1])
# Set the image hue depending on the optical flow direction
mask[..., 0] = angle * 180 / np.pi / 2
# Normalize the magnitude and use it as the value plane
mask[..., 2] = cv2.normalize(magn, None, 0, 255, cv2.NORM_MINMAX)
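To actually see the color-coded result, the HSV mask is converted back to BGR and shown. A short sketch of that display step, assuming mask was created as an HSV-style image with its saturation plane set to 255:
# Convert the HSV encoding (hue = direction, value = magnitude) to BGR for display
rgb_flow = cv2.cvtColor(mask, cv2.COLOR_HSV2BGR)
cv2.imshow('Dense Optical Flow', rgb_flow)

# The current frame becomes the previous frame for the next iteration
prev_gray = gray.copy()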
See the results below of Charlie Chaplin dancing.


MeanShift
The intuition behind meanshift is simple. Consider that you have a set of points (it can be a pixel distribution such as a histogram back-projection). You are given a small window (it may be a circle), and you have to move that window to the area of maximum pixel density (or maximum number of points). To do this you need to set up an initial tracking window and the region of interest for tracking. Once that region is specified, you create a histogram of it, normalize it, and back-project it onto each frame for the meanshift calculation. That is all the setup you need to run the meanshift algorithm. A minimal sketch of the setup and the tracking loop is below, followed by what the results look like.
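This sketch assumes an opened video capture and a hand-picked initial window; the window coordinates and histogram settings are only illustrative.
import cv2

cap = cv2.VideoCapture('video.mp4')
ret, frame = cap.read()

# Initial tracking window (hand-picked here for illustration)
x, y, w, h = 300, 200, 100, 100
track_window = (x, y, w, h)

# Region of interest and its hue histogram, normalized to 0-255
roi = frame[y:y + h, x:x + w]
hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
roi_hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

# Stop after 10 iterations or when the window moves by less than 1 pixel
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    # Back-project the histogram onto the current frame
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    dst = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
    # Shift the window to the area of maximum density
    ret, track_window = cv2.meanShift(dst, track_window, term_crit)
    # Draw the window at its new position
    x, y, w, h = track_window
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow('MeanShift', frame)
    # Exit on Esc
    if cv2.waitKey(10) & 0xFF == 27:
        break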


Camshift
The one problem with the meanshift algorithm is that the tracking window stays the same size, which is not ideal when objects get closer to or farther from the camera. This problem can be solved with the CAMshift algorithm. It applies meanshift first; once meanshift converges, it updates the size of the window and also calculates the orientation of the best-fitting ellipse. It then applies meanshift again with the newly scaled search window and the previous window location, and the process continues until the required accuracy is met. The code is almost the same as for meanshift, but the call returns a rotated rectangle (that is our result) and the box parameters (which are passed as the search window in the next iteration), as sketched below.
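Only the body of the tracking loop really changes compared to meanshift; the sketch below reuses the roi_hist, track_window, and term_crit names (and the cv2 and np imports) from the meanshift sketch above, so it is not self-contained on its own.
import numpy as np

# Back-project the histogram onto the current frame
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
dst = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
# CamShift returns a rotated rectangle plus the resized search window
rotated_rect, track_window = cv2.CamShift(dst, track_window, term_crit)
# Draw the rotated rectangle on the frame
pts = cv2.boxPoints(rotated_rect).astype(np.int32)
frame = cv2.polylines(frame, [pts], True, (0, 255, 0), 2)
cv2.imshow('CamShift', frame)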
Single Object Tracking
OpenCV has eight trackers already built into its library which you can choose from to track objects in images or videos: the BOOSTING, MIL, KCF, CSRT, MedianFlow, TLD, MOSSE, and GOTURN trackers. Each one has its advantages and disadvantages, and it is up to the user to figure out which one is best for tracking the objects in their image. For my function I gave users the option to pick which tracking method they wanted to use; a sketch of that selection step is shown below.
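A minimal sketch of that selection step, assuming an opencv-contrib build that exposes the cv2.Tracker*_create constructors (in OpenCV 4.5 and later several of them live under cv2.legacy instead), with the user's choice coming from a simple input prompt:
import cv2

# Map a user-facing name to its OpenCV constructor
# (these constructors need the opencv-contrib build;
#  on OpenCV 4.5+ several of them moved under cv2.legacy)
TRACKER_CONSTRUCTORS = {
    'BOOSTING':   cv2.TrackerBoosting_create,
    'MIL':        cv2.TrackerMIL_create,
    'KCF':        cv2.TrackerKCF_create,
    'CSRT':       cv2.TrackerCSRT_create,
    'MEDIANFLOW': cv2.TrackerMedianFlow_create,
    'TLD':        cv2.TrackerTLD_create,
    'MOSSE':      cv2.TrackerMOSSE_create,
    'GOTURN':     cv2.TrackerGOTURN_create,
}

# Let the user pick a tracker by name
tracker_name = input('Choose a tracker ' + str(list(TRACKER_CONSTRUCTORS)) + ': ')
tracker = TRACKER_CONSTRUCTORS[tracker_name]()

# Read the first frame, let the user draw the region of interest,
# and initialize the tracker with it
cap = cv2.VideoCapture('video.mp4')
ret, frame = cap.read()
roi = cv2.selectROI('Select object', frame)
tracker.init(frame, roi)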
while True:
    # Read the capture
    ret, frame = cap.read()
    if not ret:
        break
    # Update the tracker with the new frame
    success, roi = tracker.update(frame)
    # roi -> from tuple of floats to ints
    (x, y, w, h) = tuple(map(int, roi))
    # Draw a rectangle as the tracker moves
    if success:
        # Success: draw the box around the tracked object
        pts1 = (x, y)
        pts2 = (x + w, y + h)
        cv2.rectangle(frame, pts1, pts2, (255, 125, 5), 3)
    else:
        # Failure: report that the object was lost
        cv2.putText(frame, 'Failed to track the object', (100, 200),
                    cv2.FONT_HERSHEY_SCRIPT_SIMPLEX, 1, (25, 125, 225), 3)
    # Display the tracker name
    cv2.putText(frame, tracker_name, (20, 400),
                cv2.FONT_HERSHEY_SCRIPT_SIMPLEX, 1, (255, 255, 0), 3)
    # Display the result
    cv2.imshow(tracker_name, frame)
    # Exit on Esc
    if cv2.waitKey(10) & 0xFF == 27:
        break
Multi-Object Tracking
This takes the algorithm one step further and tracks multiple objects in a single video at the same time. The code is not too different from the single-object case and only needs a few modifications: the bounding boxes of all the objects you want to track are collected first (one way of doing that is sketched below), and then each one is added to a multi-tracker.
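The rects and colors names below match the snippet that follows; the use of cv2.selectROIs here is just one possible way of gathering the boxes on the first frame, not necessarily how my notebook does it.
import cv2
import numpy as np

cap = cv2.VideoCapture('video.mp4')
ret, frame = cap.read()

# Draw one box per object on the first frame
# (Enter/Space confirms a box, Esc finishes the selection)
rects = [tuple(map(int, box)) for box in cv2.selectROIs('Select objects', frame)]

# One drawing color per tracked object, used later when drawing the boxes
colors = [tuple(int(c) for c in np.random.randint(0, 255, 3)) for _ in rects]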
# Create the multi-tracker
multitracker = cv2.MultiTracker_create()
# Initialize the multi-tracker: add a new tracker of the chosen type for each selected box
for rect_box in rects:
    multitracker.add(tracker_name(tracker_types), frame, rect_box)

# Video and tracking loop
while cap.isOpened():
    success, frame = cap.read()
    if not success:
        break
    # Update the locations of all tracked objects
    success, boxes = multitracker.update(frame)
    # Draw the tracked objects
    for i, newbox in enumerate(boxes):
        pts1 = (int(newbox[0]), int(newbox[1]))
        pts2 = (int(newbox[0] + newbox[2]),
                int(newbox[1] + newbox[3]))
        cv2.rectangle(frame, pts1, pts2, colors[i], 2, 1)
    # Display the frame
    cv2.imshow('MultiTracker', frame)
    # Exit on Esc
    if cv2.waitKey(10) & 0xFF == 27:
        break
Again, these bits of code are not the entire program; for the full code, please visit my GitHub.
Data and info sourced from Coursera.