Image Recognition

What am I seeing in this picture?

Aside from intelligent text analysis through machine learning methods, images and videos make up a large part of the content you might be dealing with. The problem with graphical data is that an ordinary text crawler is only able to retrieve information stored in the file, such as META data or the file name. The "dumb" program does not know the actual meaning of the picture or what is displayed in the first place. Machine learning closes this gap by recognizing the valuable information and creating patterns for data yet to come.

The image recognition cognitive service has not been designed for the speedy detection of live images but rather for large quantities of data and for improving search capabilities. Picture an organization or company with a large pile of images (machines, drawings, etc.) in which a trainee needs a certain image, e.g. for a presentation. Instead of going to another colleague, he might use the image recognition cognitive service to trace the image and let the machine find out what it is actually showing.

Triangulation

How machine learning understands images

As said before, the main obstacle to using image data for information retrieval is the problem of missing or insufficient crawling content. A raster graphic such as a JPEG, PNG or GIF usually contains only pixel information: which color a certain pixel at a certain position has. That would be the only data a search engine could crawl, but it would be useless if someone tried to find an image of a chair. In the past, search engines heavily relied on embedded image information attached to the file itself - the so-called META information. There, a photographer could add information such as a description of what is seen in the picture, some geo information or camera settings. If he was too lazy to do this, a search engine wouldn't have a chance to create a meaningful hit. The last resort could only be the file name itself (e.g. a_dog.jpg, picture_of_my_house.png, etc.) but again, it all depends on the creator of the file.
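To see how little a crawler has to work with, here is a minimal Python sketch (using the Pillow library) that reads out everything a classic crawler can access: the raw pixels and whatever META (EXIF) information the photographer embedded. The file name photo.jpg is just a placeholder.

```python
from PIL import Image
from PIL.ExifTags import TAGS

# "photo.jpg" is a placeholder for any raster file in the archive.
image = Image.open("photo.jpg")

# Pixel information: dimensions and the color of a single pixel.
# Without META data, this is all the file itself reveals.
print(image.size)              # e.g. (4032, 3024)
print(image.getpixel((0, 0)))  # e.g. (142, 168, 201)

# Embedded META (EXIF) information, if any was ever added.
for tag_id, value in image.getexif().items():
    print(f"{TAGS.get(tag_id, tag_id)}: {value}")
```

If the EXIF loop prints nothing, the file is effectively invisible to a classic text crawler.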

Machine learning provides new opportunities to tackle this problem. Instead of relying on insufficient data, the machine learning method tries to understand - or to recognize - the image. Just as a toddler learns from its parents that the thing on the street that moves fast, has four wheels and has other humans inside is a car, the machine needs an initial training to develop a method and patterns for future inputs (other cars). Sometimes the pattern may vary in shape, size or color, and the machine needs to adapt its method. But in the end, this approach is an intelligent way to efficiently add images to a search.
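What such an initial training can look like in practice is sketched below, under the assumption of a folder training_images/ with one sub-folder per category (the folder layout, model choice and hyperparameters are illustrative, not the service's actual setup). A network that has already seen millions of pictures is merely taught the new categories - much like the toddler, who does not learn to see from scratch for every new object.

```python
import torch
from torch import nn
from torchvision import datasets, models, transforms

# Assumed layout: training_images/car/, training_images/truck/, ...
data = datasets.ImageFolder(
    "training_images",
    transform=transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ]),
)
loader = torch.utils.data.DataLoader(data, batch_size=16, shuffle=True)

# Start from a pretrained network and replace only its last layer,
# so it learns to separate our categories (transfer learning).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, len(data.classes))

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

model.train()
for epoch in range(3):  # a few passes suffice for a demonstration
    for images, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
```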

Once an image is uploaded, the machine automatically tries to find meaningful patterns in it, looking for contrast and color variations and placing a triangular pattern over it. These mathematical figures are added to a model, and at a certain point of training the machine starts to decide which pattern of a newly uploaded image might match a previous one.
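The service's exact method is not public, but the idea of a triangular pattern can be illustrated: pick out points of strong contrast and connect them into triangles whose proportions can later be compared. The sketch below uses OpenCV's ORB keypoint detector and SciPy's Delaunay triangulation purely as stand-ins, and chair.jpg is again a placeholder.

```python
import cv2
import numpy as np
from scipy.spatial import Delaunay

# "chair.jpg" stands for any uploaded raster image.
image = cv2.imread("chair.jpg", cv2.IMREAD_GRAYSCALE)

# Detect points of strong contrast variation.
orb = cv2.ORB_create(nfeatures=200)
keypoints = orb.detect(image, None)
points = np.array([kp.pt for kp in keypoints])

# Connect the points into a triangular mesh; the triangles are the
# mathematical figures that can be stored in a model and compared
# against the meshes of newly uploaded images.
mesh = Delaunay(points)
print(f"{len(points)} keypoints, {len(mesh.simplices)} triangles")
```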

A quick demonstration

To illustrate how the image recognition works, we trained a model to recognize random images containing various image motifs. The file names had been changed, erasing any hint, and the META data had been deleted as well.

One might say that the above images had been used to train the machine, or that the software only checks whether the content of the file (the pixels and their locations) corresponds to a database in the background. But this is not true. The bulk of the images used for training came from picture pools such as Wikipedia. That is the magic of the machine learning technology: adding new input and matching it with already known patterns.

For a short demonstration, we created a simple tool to drag and drop images, using a trained machine in the background to recognize the meaning of the pictures.
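Reduced to its core, the step behind the drag-and-drop surface looks roughly like the following sketch. A generic pretrained network stands in here for our trained model, and dropped_image.jpg is a placeholder for whatever file the user drops.

```python
import torch
from PIL import Image
from torchvision import models

# A generic pretrained classifier as a stand-in for the trained model.
weights = models.ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights)
model.eval()
preprocess = weights.transforms()

# "dropped_image.jpg" is a placeholder for the dropped file.
image = Image.open("dropped_image.jpg").convert("RGB")
batch = preprocess(image).unsqueeze(0)

with torch.no_grad():
    probabilities = model(batch).softmax(dim=1)[0]

# Report the best-matching category and the model's confidence.
top = probabilities.argmax().item()
print(weights.meta["categories"][top], f"{probabilities[top].item():.2%}")
```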

Limitations

Since the system needs training, the results sometimes might be fuzzy and require some adjustment. But this happens to the toddler as well. The first time it sees a car with more than four wheels (e.g. a truck), it might ask its parents whether this is still a car or something else.

The results for the above picture might not be satisfying. The data science engineer then needs to re-train the model, weeding out the wrong findings.

And it isn't only shape that matters for recognition; it's color as well. Changing the color of the crab, just for the sake of argument, from orange to green irritates the model and may require new training.
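One common remedy - whether or not this particular service applies it internally - is to vary colors artificially during training, so the model learns that a green crab is still a crab. A minimal sketch using torchvision's ColorJitter transform:

```python
from torchvision import transforms

# Randomly shift brightness, contrast, saturation and hue so the
# model does not latch onto one specific coloring (orange vs. green).
augment = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ColorJitter(brightness=0.3, contrast=0.3,
                           saturation=0.5, hue=0.2),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
```

Plugged in as the transform of the training dataset from the earlier sketch, every epoch then sees differently colored variants of the same pictures.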

What's the use case for the service?

Wherever there are large quantities of pictures (e.g. archives, newspapers, etc.), either coming from historical collections or from a steady stream of new input, images need to be sifted and categorized. Modern digital asset management (DAM) systems have some mechanisms to do this but rarely rely on machine learning methods. Instead, the tedious task of keywording still rests on human shoulders. This is where image recognition cognitive services will step in to deliver more accurate results, leaving time for working with the generated content.
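As a closing illustration of what such a keywording step could look like - a hypothetical batch job, not a description of any particular DAM product - the sketch below walks a folder of images and writes machine-generated keyword suggestions to a JSON file. The names archive/ and keywords.json are placeholders.

```python
import json
from pathlib import Path

import torch
from PIL import Image
from torchvision import models

# A generic pretrained classifier as a stand-in keywording model.
weights = models.ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights)
model.eval()
preprocess = weights.transforms()

keywords = {}
for path in Path("archive").glob("*.jpg"):  # "archive/" is a placeholder
    image = Image.open(path).convert("RGB")
    with torch.no_grad():
        probs = model(preprocess(image).unsqueeze(0)).softmax(dim=1)[0]
    # Keep the five best-matching categories as keyword suggestions.
    top5 = probs.topk(5).indices.tolist()
    keywords[path.name] = [weights.meta["categories"][i] for i in top5]

Path("keywords.json").write_text(json.dumps(keywords, indent=2))
```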