Manual to Object Detection with Machine Learning

January 13, 2019

I don’t know about you, but my house is often cluttered in the mornings. Finding my keys in the clutter sometimes takes a lot of time and becomes quite an agonizing endeavor. If I could scan the room with some sort of computer algorithm, I would not have to waste minutes looking for my keys on those wretched mornings, right? That’s where object detection comes in. While we are still fine-tuning object detection in the real world, it is already entirely possible on digital media, thanks to the remarkable power of object detection algorithms.

So, what exactly is object detection? It refers to the process of identifying instances of real-life objects in digital images and videos. Objects are identified by class, for example cars, buildings, or humans. Object detection is widely used in computer vision tasks such as face detection, face recognition, video object co-segmentation, and VR travel. It is also applied in object tracking, for instance following a person through a video or a ball during a soccer match.

There are two primary families of object detection methods: deep learning approaches and classical machine learning approaches built on hand-crafted features. In this post, I will focus on the latter and give you a simple guide to how each one works in practice.


Doing Object Detection with Machine Learning

Let me walk you through three machine learning approaches to object detection:


•  The Viola-Jones Framework

Based on Haar-like features, this was the first object detection framework to deliver competitive detection rates in real time. Paul Viola and Michael Jones proposed it in 2001. Though motivated primarily by the face detection problem, it can also be trained to detect other object classes. The framework combines Haar-like features that are fast to evaluate, a variant of the AdaBoost learning algorithm for selecting features and training classifiers, and a cascade architecture that discards obvious non-object regions early.
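To make the Haar feature idea concrete, here is a minimal sketch in plain Python (not the toolbox syntax used elsewhere in this post). It assumes the usual trick of precomputing an integral image (summed-area table), so that the sum over any rectangle costs four lookups. The function names and the toy 4x4 image are made up for illustration.

```python
def integral_image(img):
    """Return a summed-area table with one extra row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle with top-left (x, y), width w, height h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_two_rect_horizontal(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: right half minus left half (w even)."""
    half = w // 2
    left = rect_sum(ii, x, y, half, h)
    right = rect_sum(ii, x + half, y, half, h)
    return right - left


# Toy 4x4 "image": dark left half (0), bright right half (9).
image = [[0, 0, 9, 9] for _ in range(4)]
ii = integral_image(image)
edge_response = haar_two_rect_horizontal(ii, 0, 0, 4, 4)
print(edge_response)  # large positive value: a dark-to-bright vertical edge
```

A real detector evaluates many thousands of such features at different positions and scales, and AdaBoost picks the few that separate faces from background best.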

How can you use Viola-Jones to track objects? When analyzing videos with moving objects, you do not have to run object detection on every frame. Instead, a tracking algorithm such as Kanade-Lucas-Tomasi (KLT) can identify salient features inside the detection bounding boxes and follow their movement between frames.
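The detect-occasionally, track-in-between pattern can be sketched roughly as follows. This is plain Python with stand-in functions: `detect` and `track` are hypothetical callables (a real setup would plug in a cascade detector and a KLT point tracker), and the "frames" are just integers giving the object's x position.

```python
def detect_then_track(frames, detect, track, redetect_every=10):
    """Run the expensive detector on every Nth frame and a tracker otherwise.

    detect(frame) returns a list of boxes; track(prev_frame, frame, boxes)
    returns the boxes moved into the new frame. Both are hypothetical.
    """
    results = []
    boxes, prev = [], None
    for index, frame in enumerate(frames):
        if index % redetect_every == 0 or not boxes:
            boxes = detect(frame)              # fresh detection
        else:
            boxes = track(prev, frame, boxes)  # cheap update between detections
        results.append(list(boxes))
        prev = frame
    return results


calls = {"detect": 0, "track": 0}


def toy_detect(frame):
    calls["detect"] += 1
    return [(frame, 0, 2, 2)]  # (x, y, width, height)


def toy_track(prev_frame, frame, boxes):
    calls["track"] += 1
    dx = frame - prev_frame    # the toy object only moves along x
    return [(x + dx, y, w, h) for (x, y, w, h) in boxes]


frames = list(range(12))       # object moves one pixel to the right per frame
tracked = detect_then_track(frames, toy_detect, toy_track, redetect_every=5)
print(calls)
```

Re-running the detector every few frames, or whenever the tracker loses all boxes, keeps drift in check without paying the detection cost on every frame.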

Below is the syntax for creating a cascade object detector with MATLAB’s Computer Vision Toolbox:

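detector = vision.CascadeObjectDetector              % default model, trained for frontal faces

detector = vision.CascadeObjectDetector(model)       % a named built-in model, e.g. 'UpperBody'

detector = vision.CascadeObjectDetector(XMLFILE)     % a custom classification model stored in an XML file

detector = vision.CascadeObjectDetector(Name,Value)  % set detector properties as name-value pairs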


•  SIFT (Scale-Invariant Feature Transform)

SIFT is a computer vision algorithm for both detecting and describing local features in images. David Lowe published SIFT in 1999, and the University of British Columbia in Canada patented it. SIFT can be used in image stitching, navigation and robotic mapping, object recognition, gesture recognition, 3D modeling, individual wildlife identification, match moving, and video tracking.
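Most of those applications come down to matching descriptors between two images. Here is a minimal sketch of nearest-neighbour matching with the ratio test Lowe proposed, in plain Python. The descriptors below are tiny made-up 4-dimensional vectors (real SIFT descriptors have 128 dimensions), and `match_descriptors` is an illustrative helper, not part of any library.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_descriptors(query, train, ratio=0.8):
    """Return (query_index, train_index) pairs that pass the ratio test."""
    matches = []
    for qi, q in enumerate(query):
        # Distances from this query descriptor to every candidate, smallest first.
        ranked = sorted((euclidean(q, t), ti) for ti, t in enumerate(train))
        if len(ranked) < 2:
            continue
        (best, best_idx), (second, _) = ranked[0], ranked[1]
        # Keep the match only if the best is clearly closer than the runner-up.
        if best < ratio * second:
            matches.append((qi, best_idx))
    return matches


query = [
    [1.0, 0.0, 0.0, 0.0],  # has one clearly closest partner in `train`
    [0.5, 0.5, 0.5, 0.5],  # has two almost equally close partners
]
train = [
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.5, 0.5, 0.4, 0.6],
    [0.5, 0.5, 0.6, 0.4],
]
matches = match_descriptors(query, train)
print(matches)
```

The second query descriptor is dropped because its two best candidates are about equally close, which is exactly the kind of ambiguous match the ratio test is meant to filter out.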


•  HOG (Histogram of Oriented Gradients)

HOG is a feature descriptor used in image processing and computer vision for detecting objects. It counts occurrences of gradient orientations in localized portions of an image. The approach is similar to shape contexts, scale-invariant feature transform descriptors, and edge orientation histograms. The main difference is that HOG is computed on a dense grid of uniformly spaced cells and uses overlapping local contrast normalization for better accuracy. HOG is implemented in five steps: gradient computation, orientation binning, computation of descriptor blocks, block normalization, and finally object recognition. Below is the syntax for creating a detector for unoccluded people standing upright:

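peopleDetector = vision.PeopleDetector              % default model

peopleDetector = vision.PeopleDetector(model)       % 'UprightPeople_128x64' or 'UprightPeople_96x48'

peopleDetector = vision.PeopleDetector(Name,Value)  % set detector properties as name-value pairs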
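To see what the gradient, binning, and normalization steps actually compute, here is a minimal sketch in plain Python. It makes several simplifying assumptions: a single cell, hard assignment of each pixel to one of 9 unsigned orientation bins (no interpolation between bins), and L2 normalization applied to that one cell rather than to a block of neighbouring cells. The function names and the toy patch are illustrative only.

```python
import math


def cell_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 deg)."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]  # centred difference, [-1, 0, 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


def l2_normalize(vector, eps=1e-6):
    """Scale the vector to (almost) unit length; eps avoids division by zero."""
    norm = math.sqrt(sum(v * v for v in vector) + eps ** 2)
    return [v / norm for v in vector]


# Toy 6x6 patch with a vertical edge: intensity changes only along x.
patch = [[0, 0, 0, 10, 10, 10] for _ in range(6)]
hist = cell_histogram(patch)
descriptor = l2_normalize(hist)
print(hist)
```

All of the gradient energy lands in the first bin because the only intensity change is horizontal. In a full HOG pipeline, the normalized histograms from every block are concatenated into one long feature vector and passed to a classifier for the final recognition step.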


Final Word

So far, we have discussed three machine learning approaches to object detection. I have also provided some code snippets to give you a more practical feel for these detection strategies. If you need further clarification, feel free to comment below. Till then, friends!
