Thanks to digital cameras and especially smartphones, everyday photography has increased sharply over the last few years. In 2017 nearly one trillion (that's a number with twelve zeros) pictures were taken - more than 100 billion more than in 2016. A growing share of these are images of faces, such as selfies. Cameras are now very good at recognizing patterns like faces, mostly by using differences in contrast or color together with geometric features.
Machine learning builds on this technology to automatically detect and recognize which person is shown in an image. This requires some initial training, but at a certain point the machine has learned to decide autonomously who is pictured. The resulting data can be added either to the metadata of the image or to a digital asset management (DAM) system. This way, not only is image search greatly enhanced, but new relations between images can be established as well.
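To make this concrete, here is a minimal sketch of the second half of that idea: once a recognition step has tagged each image with person names (the filenames and names below are purely illustrative), a simple inverted index lets a DAM answer person queries and surface relations between images that show the same person.

```python
from collections import defaultdict

# Hypothetical output of a face-recognition step: recognized
# person names attached to each image as metadata tags.
image_tags = {
    "img_001.jpg": ["Angela Merkel", "Vladimir Putin"],
    "img_002.jpg": ["Angela Merkel"],
    "img_003.jpg": ["Vladimir Putin"],
}

# Inverted index: person name -> set of images showing that person.
index = defaultdict(set)
for filename, persons in image_tags.items():
    for person in persons:
        index[person].add(filename)

def related_images(filename):
    """Images sharing at least one recognized person with `filename`."""
    related = set()
    for person in image_tags.get(filename, []):
        related |= index[person]
    related.discard(filename)
    return sorted(related)

print(sorted(index["Angela Merkel"]))   # all images tagged with Merkel
print(related_images("img_001.jpg"))    # images related via shared persons
```

A real DAM would store this index in a database rather than in memory, but the relation "these images show the same person" is exactly what the tags make computable.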
Picture the following situation: you work at a big publishing house, which runs some sort of picture archive or media service department collecting and categorizing a constant stream of images coming from photographers all over the world. Your job is to tag each image with keywords for the digital asset management (DAM) system. The image files rarely contain further information, making it nearly impossible to include them in a meaningful search. Your editors, however, do not have the time to comb through the vast amount of data for a fitting image of, say, Angela Merkel meeting Vladimir Putin at last year's summit in Moscow.
With the help of machine learning methods, a trained machine is able to autonomously recognize faces and add the findings to the file itself or to the DAM system. The data scientist uses photos during training and teaches the machine who is shown. After a short while - the number of training files per person is usually around 20 samples - the machine starts to work on its own, using previous experience to evaluate probabilities. So whenever you add a new image to the system, it automatically recognizes the person (if trained before) and adds this information.
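The recognition step can be sketched as nearest-neighbor matching over face embeddings. This is a simplified illustration, not the system described in the article: real pipelines compute embeddings with a neural network (typically 128 or more dimensions, with around 20 training samples per person), while the tiny hand-written vectors and names below are stand-ins.

```python
import math

# Hypothetical face embeddings; real ones come from a trained
# neural network, with many samples per person.
training_samples = [
    ("Angela Merkel", [0.1, 0.9, 0.2]),
    ("Angela Merkel", [0.2, 0.8, 0.3]),
    ("Vladimir Putin", [0.9, 0.1, 0.7]),
    ("Vladimir Putin", [0.8, 0.2, 0.8]),
]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognize(embedding, threshold=0.6):
    """Return the training label closest to `embedding`,
    or None when no sample is within `threshold` (unknown face)."""
    label, dist = min(
        ((name, euclidean(embedding, sample))
         for name, sample in training_samples),
        key=lambda pair: pair[1],
    )
    return label if dist <= threshold else None

print(recognize([0.15, 0.85, 0.25]))  # near the Merkel samples
print(recognize([5.0, 5.0, 5.0]))     # far from everything -> None
```

The threshold is what turns raw distances into a yes/no decision: faces the system was never trained on fall outside it and stay untagged instead of being mislabeled.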
Each digital picture has so-called EXIF data embedded in the file, stating camera details such as brand, model, or lens used. Modern cameras feature a GPS chip that adds latitude and longitude as well. Together with the automated tags created by machine learning, this makes a powerful combination. Your editor is now able to search for a picture by entering the name of the person and the location, and instantly receives results without any prior human intervention.
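A minimal sketch of such a combined query, assuming each catalog record merges EXIF-derived GPS coordinates with the person tags from the recognition step (filenames, names, and coordinates are illustrative; a production system would query a database and use proper geodesic distance):

```python
# Hypothetical combined records: EXIF GPS fields plus person tags
# added by the recognition step.
catalog = [
    {"file": "img_001.jpg",
     "persons": ["Angela Merkel", "Vladimir Putin"],
     "lat": 55.7558, "lon": 37.6173},   # Moscow
    {"file": "img_002.jpg",
     "persons": ["Angela Merkel"],
     "lat": 52.5200, "lon": 13.4050},   # Berlin
]

def search(person, lat, lon, radius_deg=0.5):
    """Naive combined search: person tag match plus a simple
    bounding box (in degrees) around the requested coordinates."""
    return [
        rec["file"] for rec in catalog
        if person in rec["persons"]
        and abs(rec["lat"] - lat) <= radius_deg
        and abs(rec["lon"] - lon) <= radius_deg
    ]

print(search("Angela Merkel", 55.7558, 37.6173))  # -> ["img_001.jpg"]
```

Neither field alone would answer "Merkel in Moscow": the person tag comes from machine learning, the location from EXIF, and only their combination narrows the result to the right image.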
The following machine has been trained with images of German politicians and celebrities. For the demonstration, we use photographs found on Wikipedia. All pictures have been renamed and stripped of their metadata and EXIF information.
As already pointed out, one of the major use cases could be the autonomous tagging of photographs of people. This would greatly improve the accuracy of image search results.
- content analysis
- content enrichment
- search improvement