Named Entities

What is the text all about?

With the Named Entities Cognitive Service you can automatically recognize what a text is about. The machine learning method scans the text, spots key terms, and infers the meaning from how those terms relate to each other.

Suppose you have a text about a topic whose central term can carry different meanings. Jaguar is a good example, because it can be both a wild cat and a car brand. For the example we first use the Wikipedia entry for the wild cat and later the one for the car brand to illustrate how the system works. The machine has been trained with other content from the online encyclopedia and has built a model to recognize relevance.

Jaguar - The wild cat species

The text we use is taken from the Wikipedia entry and pasted into the search box.

"The jaguar (Panthera onca) is a wild cat species and the only extant member of the genus Panthera native to the Americas. The jaguar's present range extends from Southwestern United States and Mexico across much of Central America and south to Paraguay and northern Argentina. Though there are single cats now living within the western United States, the species has largely been extirpated from the United States since the early 20th century. It is listed as Near Threatened on the IUCN Red List; and its numbers are declining. Threats include loss and fragmentation of habitat. [...]"

The model is trained to recognize locations and relevant terms and automatically links them to a Wikipedia entry.

What at first glance seems to be a no-brainer turns out to be really sophisticated. The model chose on its own which terms in the text might be relevant, because it understood the actual meaning. It did not link words like "cougar", "predator" or "leopard", but rather favored terms like "near threatened", "food chain" or "indigenous American". You might assume the system simply checks which terms have a Wikipedia page and links every one it finds. But this is not the case. Keep in mind that there is a machine in the background that, after an initial training, decides on its own.

Jaguar - The car

What would the model do when you insert another article on a Jaguar, this time the car instead of the cat? Would the model recognize the difference at all? Once again, we take an article from Wikipedia:

"Jaguar (UK: /ˈdʒæɡjuər/, US: /ˈdʒæɡwɑːr/) is the luxury vehicle brand of Jaguar Land Rover,[2][1] a British multinational car manufacturer with its headquarters in Whitley, Coventry, England and owned by the Indian company Tata Motors since 2008.[3] Jaguar Cars was the company that was responsible for the production of Jaguar cars until its operations were fully merged with those of Land Rover to form Jaguar Land Rover on 1 January 2013."

Conventionally, a program would have trouble telling a cat from a car simply by searching for the word "jaguar". But since other automotive-related words are found in the text, the machine concludes that it is about automobiles and not biology. Interestingly, the model was trained to find not just locations but also organizations (e.g. company groups) and persons of historical importance, all linking to a Wikipedia entry.
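To make the idea of context-based disambiguation more tangible, here is a minimal Python sketch. It is not how the service works internally (the service relies on a trained model); it only counts hand-picked cue words around the ambiguous term. The cue word lists, the sense labels and the function name disambiguate are assumptions made up for this illustration.

```python
import re

# Hypothetical cue words per sense. A trained model would learn such
# signals from data; here they are hand-picked for illustration only.
SENSE_CUES = {
    "Jaguar (animal)": {"cat", "species", "genus", "habitat", "panthera"},
    "Jaguar (car brand)": {"vehicle", "brand", "car", "cars", "manufacturer", "motors"},
}


def disambiguate(text):
    """Return the sense whose cue words overlap most with the text.

    Returns None when no cue word of any sense occurs in the text.
    """
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    scores = {sense: len(tokens & cues) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


cat_text = "The jaguar is a wild cat species and the only extant member of the genus Panthera."
car_text = "Jaguar is the luxury vehicle brand of Jaguar Land Rover, a British car manufacturer."

print(disambiguate(cat_text))
print(disambiguate(car_text))
```

Both texts contain the word "jaguar", yet the surrounding vocabulary points to different senses. A simple keyword search for "jaguar" alone could not make that distinction.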

But why does that matter?

What's the use case for the service?

Although this service might be a nice gimmick when you want to know more about jaguars (cars or cats), what is its real relevance? Well, with the Named Entities service you can categorize and link your data in a unified approach to collect the knowledge in your organization. Once all of your data is connected to the enterprise search, the machine would start to "understand" the meaning of the content. A document could be classified automatically according to its content. Data retrieved from the file could be inserted into the metadata to be used in other applications. The fields of use are wide-ranging, and the service should be viewed as a helper to cope with the ever-growing requirements of information management.
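As a rough illustration of the metadata idea, the following Python sketch groups recognized entities by type and stores them alongside a document's existing metadata. The shape of the entity records (a name and a type), the field name "entities" and the function enrich_metadata are assumptions for this example and do not reflect the actual output format of the service.

```python
def enrich_metadata(metadata, entities):
    """Return a copy of the metadata with entity names grouped by type."""
    enriched = dict(metadata)  # leave the caller's dictionary untouched
    grouped = {}
    for entity in entities:
        grouped.setdefault(entity["type"], []).append(entity["name"])
    enriched["entities"] = grouped
    return enriched


# Hypothetical result of a recognition step for the car article above.
entities = [
    {"name": "Jaguar Land Rover", "type": "Organization"},
    {"name": "Tata Motors", "type": "Organization"},
    {"name": "Coventry", "type": "Location"},
]

doc = enrich_metadata({"title": "Jaguar Cars"}, entities)
print(doc)
```

Other applications could then filter or search on these fields, for example to list all documents that mention a given organization.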