Brief history of machine vision

First, it’s worth understanding the terminology. A distinction must be made between computer vision and machine vision. Computer vision is both a theory and a set of related technologies. It is concerned with how machines can visually sense objective reality; simply put, how computers see the world.

The first to talk about computer vision, science fiction writers aside, was the British-born scientist Oliver Selfridge. In his 1955 article, Eyes and Ears for Computers, he predicted much of our current reality. Facial recognition systems are one prominent example: today we post a party photo on a social network, and in a split second artificial intelligence recognizes a friend in the photo and offers to tag them.

The following stages can be distinguished in the development of machine vision:

1955: Massachusetts Institute of Technology (MIT) researcher Oliver Selfridge published the article Eyes and Ears for Computers, putting forward the theoretical idea of equipping a computer with sound and image recognition tools.

Oliver Gordon Selfridge

1958: Cornell University psychologist Frank Rosenblatt created a computer implementation of the perceptron (from “perception”), a device simulating pattern recognition by the human brain.

Frank Rosenblatt

The perceptron was first modeled in 1958, and its training required about half an hour of computer time on an IBM 704. The hardware implementation, the Mark I Perceptron, was built in 1960 and intended for visual image recognition. However, machine vision tasks remained largely theoretical, since neither the technology nor the mathematical foundations needed to solve such complex tasks were yet available.
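At its core, Rosenblatt’s learning rule is remarkably small: nudge the weights whenever a sample is misclassified. A minimal sketch in modern Python (the toy data, learning rate and epoch count here are illustrative, not taken from the Mark I):

```python
# Minimal Rosenblatt perceptron: learns a linear decision boundary
# by adjusting weights whenever a sample is misclassified.

def train_perceptron(samples, labels, epochs=20, lr=0.1):
    """samples: list of feature tuples; labels: 0 or 1."""
    n = len(samples[0])
    w = [0.0] * n          # weights
    b = 0.0                # bias (threshold)
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            # Step activation: fire if the weighted sum exceeds the threshold
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = y - pred  # -1, 0, or +1
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Toy, linearly separable data: logical AND of two binary inputs
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]
w, b = train_perceptron(X, y)

def predict(x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

print([predict(x) for x in X])  # → [0, 0, 0, 1]
```

For linearly separable data like this, the perceptron convergence theorem guarantees the rule eventually classifies every sample correctly; what it cannot do, as Minsky and Papert later showed, is learn non-separable functions such as XOR.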

1960s: the first software image processing systems emerged (mainly to remove noise from aircraft and satellite photos), and applied research in printed character recognition began to develop. Progress was still constrained by the lack of cheap optical data input devices and by the limitations and rather narrow application range of computing systems. Even so, computer vision advanced rapidly throughout the 60s, driven by the expanding use of computers and the obvious need for faster, more efficient human-computer communication. By the early 60s, computer vision tasks mainly served space research, which required processing large amounts of digital information.

1963: Lawrence Roberts, a graduate student at the Massachusetts Institute of Technology, put forward the concept of machine generation of three-dimensional object models from analysis of their two-dimensional images. At this stage, deeper data analysis began, and various approaches to object recognition in an image started to develop, including structural, feature-based and texture-based methods.

Lawrence Gilman Roberts

1979: Professor Hans-Hellmut Nagel of the University of Hamburg laid the foundations of dynamic scene analysis, making it possible to recognize moving objects in a video stream.

In the late 1980s, robots were created that could assess the world around them more or less satisfactorily and act independently in natural environments.

The 80s and 90s saw the emergence of a new generation of sensors for two-dimensional digital information fields of various physical natures. New measuring systems and methods for recording two-dimensional digital fields in real time made it possible to obtain time-stable images from these sensors for analysis. Improvements in sensor production technology significantly reduced their cost, expanding their range of applications.

From the early 90s onward, the image processing sequence has been organized, in algorithmic terms, according to the so-called modular paradigm. This paradigm, proposed by D. Marr on the basis of long-term study of human visual perception, states that image processing should proceed through several successive levels of an ascending information pipeline: from the iconic representation of objects (bitmap images, unstructured information) to symbolic representation (vector and attribute data in structured form, relational structures, etc.). In the mid-90s, the first commercial automatic vehicle navigation systems appeared, and effective computational methods for motion analysis were developed by the end of the 20th century.

One of the first industrial machine vision systems, Autovision II from Automatix, was demonstrated at a trade show in 1983. A camera on a tripod was pointed down at a backlit table, producing a clear on-screen image that could be inspected for foreign matter.

Machine vision is a little different: it is about applying that knowledge and technology. Machine vision helps make the production of goods and services more efficient using the same principles as computer vision. The first company to produce solutions in this area is considered to be Automatix (USA), which in the early 1980s brought to market several machines capable of soldering chips. They were equipped with analog cameras that transferred images to a processor, which calculated image parameters and, based on them, sent commands to the parts of the system directly involved in production.

Machine vision, then, is a technology that helps equipment see a production process, analyze the data and make informed decisions, all in a split second.

How is this better than human vision?

Let's figure out how we ourselves see the world. Light particles (photons) are constantly reflected from objects and land on the retina of the eye. Each eye contains approximately 126 million light-sensitive cells that decode information and send it to the brain. These cells come in two types: cones and rods. Cones are responsible for color recognition; rods let us see at night, detecting only shades of gray. We have three types of cones: some respond to blue light, others to green, and others to red. Together they give us the whole rainbow.

Our visual system, however, is not the most advanced on Earth. The eyes of the mantis shrimp, for example, are far more complex: it has as many as 16 types of cones, its eyes move independently of each other, and each eye is divided into three parts. At the same time, the mantis shrimp has a very small and primitive brain compared to ours. It cannot process large amounts of data; instead, it receives a ready-made, detailed transcript from the eyes. Humans, by contrast, have somewhat simpler eyes but the most powerful brain of any species.

Computer vision uses both approaches. Some systems use ordinary digital (sometimes even analog) cameras together with special sensors that detect when something is wrong: they receive a raw image, process it, recognize elements and patterns, make a decision and send a signal to other systems. Others use smart cameras, which is exactly the mantis shrimp's case: the cameras perform part of the analysis themselves to offload the system's processors.

Who is more accurate, a machine or a human?

Just five years ago, machine vision technologies were much less advanced and successfully recognized only 65% to 70% of the objects in their field of view. That figure was high but still insufficient for machine vision to be entrusted with important tasks. Now machines recognize up to 98% of objects, and they truly recognize them: not only recording their presence, but identifying what exactly they see and even deciding what to do next.

Human perception systems still remain more flexible. For example, we interpret context better; indeed, we are the only ones who really know what context is. Machines thoroughly study situations that are new to them, but a person can always invent something to confuse the machine. At least for now. That is why the successful recognition rate remains at 98% and does not reach 100%.

However, machine vision systems have one undeniable advantage over human vision. We can usually concentrate on only three to seven objects in view at once; the exact number depends on the person, but it is rarely much higher. Computer vision systems register absolutely every object and action that passes through their processors as images. A computer's attention cannot be distracted: everything that happens is of equal importance to it.

What tasks can be solved using machine vision?

Imagine a tray of 50 nuts in front of you. Of these, 48 are normal, high-quality nuts, one has a scratch on its side, and another has a bulge on one of its edges. And for some reason there is a bolt among the nuts. You’ll probably spot the abnormal and defective parts in a couple of seconds. But then a second tray of nuts immediately appears in front of you. Then another. And so on for eight hours.

This is a typical shift for a production inspector. It is likely that this employee (regardless of skill) will lose concentration within a couple of hours, thinking for a second about lunch or the ending of last night’s TV show, or getting distracted by a colleague’s remark. In any case, sooner or later he or she will most likely miss a couple of defective parts. This is normal: the oversight factor is probably already built into the performance indicators. However, if a machine vision system controls production instead of a person, it can work equally reliably for a whole year without interruption. The process is as follows: sensors scan all the parts and send a signal if something is wrong. Cameras, working together with LED lighting, capture the parts carefully and transfer the images to a computer. The computer already holds a large database of nut photos for this particular series and instantly commands a robot downstream on the conveyor to sort them.
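The comparison step at the heart of such a system can be surprisingly simple. Below is a hypothetical sketch, not a real inspection product: each part’s image is compared against a reference (“good nut”) template, and anything whose pixel difference exceeds a threshold is rejected. The tiny synthetic images and the threshold value are invented for illustration.

```python
import numpy as np

# Hypothetical inspection step: compare each part's grayscale image
# against a reference template and flag outliers.

def inspect(part_img: np.ndarray, reference: np.ndarray,
            threshold: float = 10.0) -> str:
    """Return 'pass' or 'reject' based on mean absolute pixel difference."""
    diff = np.abs(part_img.astype(float) - reference.astype(float))
    return "reject" if diff.mean() > threshold else "pass"

# Synthetic 8x8 "images": a clean nut, a scratched one, and a bolt
reference = np.full((8, 8), 120, dtype=np.uint8)
good_nut  = reference.copy()
scratched = reference.copy()
scratched[3, :] = 30                             # dark scratch across the part
bolt      = np.full((8, 8), 200, dtype=np.uint8)  # wrong part entirely

for name, img in [("good", good_nut), ("scratched", scratched), ("bolt", bolt)]:
    print(name, inspect(img, reference))
```

A production system would of course normalize lighting, align the part before comparison, and use learned models rather than a fixed template, but the pass/reject logic follows the same pattern.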

This solution saves costs. And a production inspector can always be retrained as an operator of such a system; his or her experience will come in handy when setting up the machine. Today these systems are quite simple and user-friendly: you do not need to be a deep learning expert to transfer your knowledge to the system and show it the specifics of your work.

Another common application area is security. Working in the same way as with the nuts, a machine vision system will instantly analyze the workshop and find a worker who forgot to put on a safety helmet, then simply lock his or her machine or issue a warning over the loudspeaker.

The third machine vision application area is the Internet of Things, the name given to the set of technologies that let various devices interact with each other. For example, there are already refrigerators that use computer vision to detect spoiled food.

Such solutions can be implemented not only on factory floors, but also in warehouses, retail, banking, logistics and transport systems, agriculture and livestock breeding, etc. In the US market, machine vision systems were adopted earlier and more actively (thanks to the greater number of solutions on offer), and they are now used in many industries, from automotive to pharmaceuticals.
