How Differently Do Humans and Computers Visually Identify Objects?

Image credit: Salvatore Vuono via freedigitalphotos.net

Humans and computers most likely identify objects in different ways. Compared with today's most prominent image identification and facial recognition technologies, the way humans recognize objects appears to be considerably different. A group of American and Israeli researchers set out to explore this difference in an experimental study published in the journal Proceedings of the National Academy of Sciences. What would it take to fully replicate the human ability to identify objects and faces? What do current computer visual identification technologies lack? Which features and processes should they emulate?

The Study

The study was conducted by Liav Assif and Ethan Fetaya of the Weizmann Institute of Science, Shimon Ullman of the Massachusetts Institute of Technology, and Daniel Harari of the McGovern Institute for Brain Research. It explores a “phase transition” phenomenon in minimal images, in which a minor change to an image has a dramatic effect on how it is identified or recognized.

Today’s leading technologies for identifying objects and faces have made significant advances by employing neural network models, including biological and deep network models, and their recognition capabilities are now close to those of humans. However, it remains unclear whether these models perceive and identify objects the way humans do, and whether studying the human visual identification process could still suggest new ideas for improving them. After all, scientists are still uncertain how humans actually identify objects visually.

How the Study Was Conducted

The researchers set up a project on Amazon’s Mechanical Turk in which they paid workers to identify objects in image patches of different sizes and levels of blurriness. More than 14,000 workers took part, together viewing and identifying a total of 3,553 image patches. The same patches were also shown to a computer system capable of visual object identification, and the results were compared.

The Findings

As expected, humans produced better results. However, the analysis showed that human recognition has a “drop-off point”: a level of blurriness at which the ability to identify an object drops sharply. Past this point, only a few of the human participants were able to identify the objects in the images. The computer, on the other hand, showed no such drop-off point.
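To make the idea of a drop-off point concrete, here is a minimal Python sketch, not the researchers’ analysis method, that scans recognition rates measured at increasing blur levels and reports the first level where the rate falls by at least a chosen amount compared with the previous level. The function name `find_drop_off`, the `min_drop` threshold of 0.3, and the sample numbers are all hypothetical, chosen only for illustration; they are not data from the study.

```python
from typing import Optional, Sequence


def find_drop_off(
    blur_levels: Sequence[float],
    recognition_rates: Sequence[float],
    min_drop: float = 0.3,  # hypothetical threshold for a "sharp" drop
) -> Optional[float]:
    """Return the first blur level whose recognition rate is at least
    `min_drop` lower than the rate at the previous (less blurred) level.

    Returns None when the rate never falls that steeply between two
    consecutive levels, i.e. there is no drop-off point.
    """
    if len(blur_levels) != len(recognition_rates):
        raise ValueError("blur_levels and recognition_rates must be the same length")
    for i in range(1, len(recognition_rates)):
        # Compare each level with the one just before it.
        if recognition_rates[i - 1] - recognition_rates[i] >= min_drop:
            return blur_levels[i]
    return None


# Hypothetical recognition rates (fraction of correct identifications),
# invented for illustration only; they are not results from the study.
blur = [1, 2, 3, 4, 5]
human_like = [0.92, 0.90, 0.85, 0.20, 0.10]
model_like = [0.80, 0.72, 0.63, 0.55, 0.47]

print(find_drop_off(blur, human_like))  # → 4
print(find_drop_off(blur, model_like))  # → None
```

With these made-up numbers, the human-like curve collapses at level 4, while the model-like curve declines gradually and never crosses the threshold. The same check works for any ordered sequence of image reductions, such as shrinking a patch instead of blurring it.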

The researchers suggest that this means humans and computers identify objects visually by different methods, involving different processes or sequences of steps. They suspect that humans use a bottom-up approach first and a top-down approach afterward, particularly when they cannot identify an object at first glance.

In a bottom-up approach, the observer first looks at the elementary features of the object and combines them to identify it. A top-down approach, by contrast, relies mainly on the observer’s knowledge and experience, as well as on context. Put simply, bottom-up means identifying the small features first, while top-down means perceiving the image as a whole and immediately associating it with what the observer already knows.

Moreover, the researchers’ psychophysical studies showed that when humans are shown small images with few details, minute changes can drastically affect how an image is identified. Building on this, they ran simulations and found that current computer visual identification models fail to register such small changes in the minimal images used in the experiment. The researchers suggest that this is likely why computers still cannot match the accuracy of human visual recognition.

Image credit: deepagopi via freedigitalphotos.net

Significance of the Study

Computer vision and object identification technologies have progressed greatly over the years, moving from simply comparing an image against databases of stored images to using neural networks. Still, computers are far from identifying objects as accurately as humans do. This study offers useful insights into how current computer-based visual object identification technologies might be improved. Researchers working on computer visual identification or facial recognition may want to consider reworking their algorithms or changing the sequence of steps in their identification processes. If the goal is to emulate the human ability to identify objects visually, this study can suggest which improvements to make.