When machine learning methods are applied to text, the related words service automatically finds connections between terms. Displayed as a tag cloud, this cognitive service shows you not just synonyms but also terms whose connection to the word only becomes apparent at a second glance. Once trained, the machine digs through a large amount of data and creates a model that makes independent decisions whenever new input arrives or known data is updated.
Let's say you have a large quantity of documents that need to be categorized to give them structure. Instead of manually scanning and classifying the content, you need a system that helps you cope with the flood of data. The information could come from an email server, from files uploaded through a web form, or from various other sources. Insurance companies, for instance, struggle with the daily input from their customers. Whether they are complaints, claim notifications, or contract issues, the messages need to be routed correctly and efficiently.
So, what exactly happens when the related words cognitive service is applied to data? At the initial stage, a data scientist or machine learning expert needs to train the machine and shape the model by evaluating its results. At a certain point, the machine is ready to decide on its own, based on previous experience and assumptions, and needs only minor human adjustment. To stick with our insurance example above, the content of a document is scanned on receipt, and all relevant words are mapped into a multi-dimensional vector space. Each term has a unique location in that space, and the model clusters all related words.
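The idea of locating terms in a vector space and clustering the related ones can be sketched in a few lines. The word list and vector values below are purely illustrative stand-ins for a trained model; a real embedding would have hundreds of dimensions learned from the corpus.

```python
import numpy as np

# Toy word vectors standing in for a trained model (illustrative values only;
# a production model learns these dimensions from the insurance corpus).
vectors = {
    "claim":    np.array([0.9, 0.1, 0.0]),
    "damage":   np.array([0.8, 0.2, 0.1]),
    "contract": np.array([0.1, 0.9, 0.0]),
    "policy":   np.array([0.2, 0.8, 0.1]),
    "attic":    np.array([0.1, 0.1, 0.9]),
}

def cosine(a, b):
    """Cosine similarity: 1.0 means identical direction, 0.0 unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def related_words(term, top_n=2):
    """Return the top_n words closest to `term` in the vector space."""
    query = vectors[term]
    scores = {w: cosine(query, v) for w, v in vectors.items() if w != term}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

With these toy values, `related_words("claim")` ranks "damage" first, while "attic" stays far away, which is exactly the clustering effect described above.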
What the picture above clearly shows is that the words related to the dominant term do not necessarily belong to each other. Some connections might only be understandable from a human perspective. Although an attic is part of a house, from a computer's point of view the two might not be related at all. With the help of machine learning techniques, the computer independently starts to "understand" the content, its intention, and the overall meaning of the information.
The beauty of vectorizing language into a mathematical model is that it brings additional advantages. Not only can you find relations between terms, you can also transfer your model to different languages. Technically, you map your model from one language space to another and look for similar patterns.
This also works for languages with a different cultural background, such as Arabic, Korean, or Sanskrit. The model only needs a sufficient amount of training data to create a solid vector space. The German Patent and Trade Mark Office uses this cognitive service in its patent search, but other fields of application face similar fundamental tasks.
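One common way to transfer a model between language spaces is to learn a linear map between two independently trained embeddings from a small seed dictionary of translation pairs (the orthogonal Procrustes approach). The vectors below are illustrative toy values, not real embeddings, and the source text does not state which alignment method the service uses; this is just a sketch of the general technique.

```python
import numpy as np

# Toy embeddings for three seed translation pairs: rows of `src` are
# source-language vectors (e.g. German), rows of `tgt` their translations
# (e.g. English). Values are made up for illustration.
src = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [1.0, 1.0]])
tgt = np.array([[0.0, 1.0],
                [-1.0, 0.0],
                [-1.0, 1.0]])

# Solve the orthogonal Procrustes problem: find a rotation W minimizing
# ||src @ W - tgt||. The solution is U @ Vt from the SVD of src.T @ tgt.
u, _, vt = np.linalg.svd(src.T @ tgt)
W = u @ vt

# Any new source-language vector can now be projected into the target
# space and compared against target-language words by similarity.
projected = src @ W
```

Because the mapping is orthogonal, distances and angles inside the source space are preserved, so the "similar patterns" the text mentions survive the transfer.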
To illustrate the related words cognitive service, we use a single word to browse a catalogue of German patents. The service autonomously browses the data and finds the word's location in the vectorized model. Next to the search word appear related documents that correspond to the initial search term. By dragging the marker you can browse through the results and alter the scope.
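The adjustable scope can be modeled as a similarity threshold: widening the scope lowers the threshold and admits more distant documents. The document names and vectors below are hypothetical placeholders, not entries from the actual patent catalogue.

```python
import numpy as np

# Hypothetical document vectors living in the same space as the search
# term; names and values are illustrative only.
docs = {
    "DE-patent-001": np.array([0.9, 0.1]),
    "DE-patent-002": np.array([0.7, 0.3]),
    "DE-patent-003": np.array([0.1, 0.9]),
}

def search(query, scope=0.8):
    """Return documents whose cosine similarity to `query` meets `scope`.

    Lowering `scope` widens the result set, like dragging the marker
    outward in the interactive view.
    """
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    hits = [(name, cos(query, v)) for name, v in docs.items()]
    return [name for name, s in sorted(hits, key=lambda h: -h[1]) if s >= scope]

query = np.array([1.0, 0.0])
```

With this toy data, a tight scope of 0.8 returns the two nearby patents, while loosening it to 0.1 pulls in the third, more distant document as well.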
- content categorization
- knowledge management
- automated content analysis