Computer Vision vs Human Vision: Filling the Void is Indeed Difficult

Researchers are working to make the gap between computer vision and human vision disappear

One thing people want from technology is human-level intelligence. Whether it is computer vision or chatbots, they want machines to see and speak like humans. While the goal looks simple from the outside, the inner mechanism of computer vision is complex, especially when the aim is to make machines see as humans do. Researchers are working to close the gap between computer vision and human vision.

Machines have the ability to learn from datasets, and learning is surely one of the puzzle pieces that make up our own intelligence. Machines are trained to act like humans by mimicking this process, and the idea behind making machines replicate humans is to follow the way the human brain functions. However, discovering and rebuilding the human brain in mechanical form is not easy, because it involves complex systems, some of which cannot be replaced. This is why machine learning is used to train machines to act like humans. Computer vision is one such technology, and it aims to perform as well as human vision. It is known for its similarities to the way the human brain processes visual information. But recent research suggests that computer vision cannot perform on par with human vision, because it is difficult for machines to process visual information as humans do. Computers have to process visual information in a data space formed by robustly detectable but less meaningful features such as colours and textures.

Recently, a group of researchers from various German organizations and universities joined hands to address the challenges of evaluating how well deep learning processes visual data. In the paper titled 'The Notorious Difficulty of Comparing Human and Machine Perception,' the researchers highlight the problems in current methods that compare deep neural networks with the human visual system. Before going through the comparison of computer vision with human vision and the brain, let us first cover the basics of how the human visual system and computer vision work individually.

The functionalities of the human brain

Computer vision was taken very lightly in the past century. People thought that replicating the functions of the human visual system would be easy and that technology would get them there. Little did they know about the role that neural networks and deep learning algorithms would play in materializing the concept. Seeing is easy for us, and so our visual abilities, such as binocular integration, depth perception, hand-eye coordination and visual pattern recognition, were taken for granted, even though they are some of the most biologically complex tasks we undertake as humans. One of the key abilities of the human brain is that we identify an object without realizing it, no matter how the object varies in size, rotation, illumination or position. We can even roughly recognize blurry images. It is not the same with computer vision. A computer vision system has to go through many stages of training with machine learning, deep learning algorithms and neural networks to reach that level of recognition. Even after all that effort, the results can still fall short.

The functionalities of computer vision

Computer vision is a field of artificial intelligence that trains computers to interpret and understand the visual world. Using digital images from cameras and video together with deep learning models, machines can identify and classify objects accurately and may also react to what they see. Humans are biologically equipped to see, whereas machines have to be fitted with external mechanisms and technologies such as computer vision to do the same. The aim is for computers to make inferences about images without human assistance, which is comparatively difficult and sometimes cannot be fully achieved.

An experiment to address the void between computer and human vision

Researchers from various German organizations and universities have worked on addressing the gap between computer vision and human vision. They conducted a series of experiments to make their point. The experiments dig beneath the surface of deep learning results and compare them to the workings of the human visual system.

Experiment 1: In experiment 1, the researchers wanted to know how neural networks perceive contours. The human and AI participants were shown an image and asked whether it contained a closed contour or not. The researchers wanted to learn whether a deep learning algorithm can tell closed shapes from open ones. The result showed that a well-trained neural network was able to grasp the idea of a closed contour. Even though the model was trained only on straight lines, it was capable of detecting closed contours right away.
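
To make the closed-contour question concrete, here is a minimal sketch of the task itself, not of the neural network used in the study. It swaps in a classical flood fill as an assumption for illustration: if the background cannot reach some empty pixel from the image border, a contour must be sealing it in. The function name `has_closed_contour` and the tiny example grids are hypothetical.

```python
from collections import deque


def has_closed_contour(grid):
    """Return True if drawn pixels (1) seal off at least one background pixel (0)."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    queue = deque()

    # Seed the flood fill with every background pixel on the image border.
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and grid[r][c] == 0:
                seen[r][c] = True
                queue.append((r, c))

    # Spread through 4-connected background pixels.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and not seen[nr][nc] and grid[nr][nc] == 0:
                seen[nr][nc] = True
                queue.append((nr, nc))

    # Any background pixel the fill never reached is enclosed by a contour.
    return any(
        grid[r][c] == 0 and not seen[r][c]
        for r in range(rows)
        for c in range(cols)
    )


# Hypothetical 5x5 test images: a sealed square and the same square with a gap.
closed_square = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
open_square = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

print(has_closed_contour(closed_square))
print(has_closed_contour(open_square))
```

A hand-written rule like this works only on clean binary grids. The point of the experiment was whether a network can learn the same concept from examples alone.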

Experiment 2: In the second phase of the experiment, the deep learning algorithm was asked questions that require an understanding of the relations between different shapes in a picture. The test included same-different tasks and spatial tasks that a human observer would easily work out and answer. Deep learning algorithms that were trained in a low-data regime found it difficult to reach a conclusion.

Experiment 3: Experiment 3 deals with the recognition gap. It tests whether computer vision can still detect an object when the image is zoomed in to a small region. We humans need to see a certain amount of overall shape and pattern to be able to recognize an object in an image. The more we zoom in, the less likely we are to recognize what it is. Surprisingly, deep learning performs better than humans in such cases. Neural networks sometimes find minuscule features that are imperceptible to the human eye but remain detectable even when you zoom in very closely.
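
The zooming idea can be sketched as a loop that keeps shrinking a centred crop until a classifier stops recognizing the object. This is only an illustration under stated assumptions, not the procedure from the paper: `center_crop`, `smallest_recognized_crop` and `toy_classifier` are hypothetical names, and the toy classifier stands in for a real model by demanding that a crop show both the object and some background.

```python
def center_crop(image, size):
    """Return the size x size patch from the middle of a 2D list."""
    rows, cols = len(image), len(image[0])
    top = (rows - size) // 2
    left = (cols - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


def smallest_recognized_crop(image, classify, target, min_size=1):
    """Shrink the centred crop one pixel at a time while `classify` still returns `target`."""
    size = min(len(image), len(image[0]))
    smallest = None
    while size >= min_size:
        if classify(center_crop(image, size)) != target:
            break  # recognition failed, so the previous size was the limit
        smallest = size
        size -= 1
    return smallest


def toy_classifier(crop):
    """Hypothetical stand-in for a model: needs object (9) and background (0) pixels."""
    values = {pixel for row in crop for pixel in row}
    return "object" if values == {0, 9} else "unknown"


# Hypothetical 6x6 image with a 2x2 bright object in the middle.
image = [
    [9 if r in (2, 3) and c in (2, 3) else 0 for c in range(6)]
    for r in range(6)
]

print(smallest_recognized_crop(image, toy_classifier, "object"))
```

Running the same loop with a human observer and with a trained network as the `classify` function would give two crop sizes, and the difference between them is one way to picture the recognition gap described above.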
