In many ways, the story of computer vision is a story about artificial intelligence. Both disciplines imitate biological processes based on an understanding of how the brain works, and each has been advanced by the emergence of artificial neural networks, better computing resources, and big data. In this article, we’ll take a deep dive into the origins of computer vision in the context of artificial intelligence. Not familiar with the field of computer vision yet? Check out this article for a brief introduction.
Most of us assume that modern minds created the concept of artificial intelligence, but our ancient ancestors also put forth theories of thinking robots. Nearly 3000 years ago in the Iliad, Homer described Hephaestus, the god of fire, fashioning mechanical serving maids from gold and endowing them with reason and learning. Centuries before self-driving cars, the ancient Greek author Apollonius of Rhodes imagined Talos, a bronze automaton tasked with defending the island of Crete. Tales of self-moving statues imbued with human intelligence abound in historical literature.
But these historical narratives don’t accurately describe the kinds of artificial intelligence being developed today – our Roombas are a far cry from Hephaestus’ golden maids! Despite the hype, we’re a long way away from creating general intelligence. What modern artificial intelligence systems are good at is performing a single task extremely well. While most programs can’t generalize knowledge beyond the tasks that they’re designed to perform, they often meet or exceed human levels of accuracy for their intended purpose.
The Foundations of Computer Vision
Much of what we know today about visual perception comes from neurophysiological research conducted on cats in the 1950s and 1960s. By studying how neurons in the visual cortex react to various stimuli, two scientists, David Hubel and Torsten Wiesel, observed that visual processing is hierarchical. Neurons first detect simple features like edges; these feed into more complex features like shapes, which in turn feed into richer visual representations.
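To make the first stage of that hierarchy concrete, here is a minimal sketch of an edge detector in plain Python. It is an illustration only, not a model of real neurons: the tiny image and the `filter2d` helper are invented for this example, and the Sobel kernel is just one common choice of edge filter.

```python
# Hypothetical 4x6 grayscale image: dark on the left (0), bright on the right (9).
IMAGE = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]

# Sobel kernel that responds to vertical edges (left-to-right brightness changes).
SOBEL_X = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]


def filter2d(image, kernel):
    """Slide the kernel over the image and return the response at each position.

    This is cross-correlation (the kernel is not flipped), computed only where
    the kernel fits entirely inside the image.
    """
    size = len(kernel)
    out_rows = len(image) - size + 1
    out_cols = len(image[0]) - size + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            for i in range(size):
                for j in range(size):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


edge_map = filter2d(IMAGE, SOBEL_X)
for row in edge_map:
    print(row)
```

The response is zero where the brightness is uniform and large where the window straddles the dark-to-bright boundary. Later stages of a vision system can then combine many such edge responses into shapes.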
Armed with this knowledge, computer scientists have focused on recreating human neurological structures in digital form. Like their biological counterparts, computer vision systems take a hierarchical approach to perceiving and analyzing visual stimuli. In the following sections, we’ll explain how a handful of experiments spawned today’s burgeoning industry of AI-enabled computer vision.
The field of artificial intelligence was founded at a summer seminar held on the campus of Dartmouth College in 1956, when scientists brought together several disparate fields to clarify and develop ideas about thinking machines. “This was the first evidence of an institutional tendency towards overconfident predictions that has plagued the field of AI since the beginning,” explains Hooman Shariati, a Machine Learning Developer at Motion Metrics who specializes in deep neural networks.
Computer vision began in earnest during the 1960s at universities that viewed the project as a stepping stone to artificial intelligence. Early researchers were extremely optimistic about the future of these related fields and promoted artificial intelligence as a technology that could transform the world. Some predicted that a machine as intelligent as a human being would be created within a generation. The hype earned researchers millions of dollars in public and private funding. Research centres popped up around the globe. But the international effort to develop artificial intelligence was throttled by a failure to live up to lofty expectations.
With their tremendous optimism, researchers had raised public expectations impossibly high while failing to appreciate the difficulty of the challenge they had set for themselves. When the results failed to live up to the hype, the field faced intense criticism and serious financial setbacks.
Early computing resources could not keep pace technically with the complexity of problems advanced by scientists, and even the most impressive projects could solve only trivial problems. Moreover, most researchers worked in isolated groups and lacked the scientific support to advance the field in a meaningful way.
In 1966, Marvin Minsky, an American computer scientist and co-founder of the MIT AI Lab, received a summer grant to hire a first-year undergraduate student, Gerald Sussman, to link a camera to a computer and get the computer to describe what it saw. “Needless to say, Sussman didn’t make the deadline,” says Hooman. “Vision turned out to be one of the most difficult and frustrating challenges in AI over the next four decades. As machine vision expert Berthold Horn once pointed out, Sussman opted never to work in vision again.”
By the mid-1970s, governments and corporations were losing faith in artificial intelligence. Funding dried up, and the period that followed became known as the ‘AI winter’. While there were small resurgences in the 1980s and 1990s, artificial intelligence was mostly relegated to the realm of science fiction and the term was avoided by serious computer scientists.
Breakthrough at the University of Toronto
As the internet became a mainstay, computer scientists gained access to more data than ever before. Computing hardware continued to improve as costs went down. The rudimentary neural networks and algorithms developed in the 1980s and 1990s improved. Now more than half a century old, the field of artificial intelligence finally had its breakthrough moment in 2012 at the ImageNet Large Scale Visual Recognition Challenge (ILSVRC).
The ILSVRC is an annual image classification competition where research teams evaluate their algorithms on a shared data set and compete to achieve the highest accuracy on several visual recognition tasks. In 2010 and 2011, the error rate for ILSVRC winners hovered around 26%. Then, in 2012, a team from the University of Toronto entered a deep neural network called AlexNet that changed the game for artificial intelligence and computer vision projects.
Deep neural networks revolutionized the field of artificial intelligence. AlexNet achieved an error rate of 16.4% and, in the years that followed, error rates at the ILSVRC fell to a few percent; now, deep neural networks are the gold standard for image recognition tasks. These achievements paved the way for artificial intelligence to infiltrate Silicon Valley.
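Error rates for the ILSVRC classification task are commonly reported as top-5 error: a prediction counts as correct if the true label appears among the model's five highest-ranked guesses. The sketch below shows how such a figure could be computed. The function name, labels, and predictions are all invented for illustration and are not taken from the competition's own tooling.

```python
def top_k_error(ranked_predictions, true_labels, k=5):
    """Return the fraction of examples whose true label is not in the top k guesses.

    ranked_predictions: one list per image, ordered from most to least confident.
    true_labels: the correct label for each image, in the same order.
    """
    misses = 0
    for guesses, truth in zip(ranked_predictions, true_labels):
        if truth not in guesses[:k]:
            misses += 1
    return misses / len(true_labels)


# Made-up predictions for four images (hypothetical data).
predictions = [
    ["cat", "dog", "fox", "wolf", "lynx"],
    ["truck", "car", "bus", "van", "tram"],
    ["apple", "pear", "plum", "fig", "lime"],
    ["oak", "elm", "ash", "fir", "yew"],
]
labels = ["dog", "bus", "kiwi", "oak"]

top5 = top_k_error(predictions, labels, k=5)
top1 = top_k_error(predictions, labels, k=1)
print(f"top-5 error: {top5:.2f}, top-1 error: {top1:.2f}")
```

In this toy data, only the third image has a true label missing from all five guesses, so the top-5 error is much lower than the top-1 error. The same gap explains why top-5 figures look more forgiving than strict first-guess accuracy.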
The Future of AI-Based Computer Vision
Artificial intelligence has already been seamlessly integrated into many aspects of our daily lives. “AI has found tremendous success in many areas of research in recent years,” says Hooman. “Game-playing systems like AlphaGo have used reinforcement learning to teach themselves new strategies. Hearing aids use deep learning algorithms to filter out ambient noise. These technologies even power the natural language processing and translation, object recognition, and pattern matching systems that we take for granted on Google, Amazon, iTunes, and similar services.”
This trend shows no signs of slowing down – there are many small, repetitive tasks that we can automate to free up our time. Although we have made incredible strides in the field of artificial intelligence, we still need to be realistic about its applications for computer vision – it will be a long time before computers are able to interpret images as well as humans can.
In the short term, it’s more likely that artificial intelligence will be used to augment and extend human capabilities. “My best guess is that the boundary between humans and machines will blur in the foreseeable future,” says Hooman.
The healthcare sector is an obvious example of the positive ways that humans and machines can interface. “A bit of historical perspective suggests that innovations like cortical prosthetics will become mainstream,” he says. “These futuristic devices will one day be as ubiquitous and common as technical jogging shoes or prescription eyeglasses.”
This article is the second in a series of articles on computer vision in the context of artificial intelligence. Read this interview with our Director of Core Research and Development to learn about the ways that Motion Metrics uses deep learning.