Russian NER (with inflexional gazetteer and orthomatcher)

A named entity recognition pipeline for documents in the Russian language. It identifies basic entity types such as Person, Location, Organization, Money amounts, and Time and Date expressions.

This version of the pipeline includes an inflexional gazetteer to recognise more morphological variants of target names, and an orthomatcher to perform basic coreference resolution based on orthographic similarity.

Default annotations
:Person Standard named entity types
:Address Includes email and IP addresses as well as street addresses
Additional annotations available if selected
:Money Monetary amounts
:Percent Expressions representing percentages
:Token The individual tokens of the text, with a "category" feature giving the part-of-speech (POS) tag
:SpaceToken The spaces between tokens
:Sentence Sentences detected by the sentence splitter
:Lookup Individual gazetteer lookups; those that come from the inflexional gazetteer also include a "lemma" feature giving the base word form
:MSD "Morpho-Syntactic Description" for selected tokens, including features for "lemma" (the base form of inflected words) and "type" (roughly equivalent to a part of speech tag in English, though more complex as it encodes features such as gender, grammatical case, etc.)
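
As an illustration of how these annotation types and their features might be consumed, here is a minimal Python sketch. It assumes the service returns JSON holding the document "text" plus an "entities" map from annotation type to a list of annotations, each with an "indices" pair of character offsets and its features as extra keys. That shape is an assumption and is not confirmed by this page. The function name entity_spans and the sample document are hypothetical, and the sample is hand-written, not real pipeline output.

```python
def entity_spans(response, annotation_type):
    """Return (start, end, covered_text, features) for each annotation of one type.

    Assumes the response shape described above, which is an assumption.
    """
    text = response["text"]
    spans = []
    for ann in response.get("entities", {}).get(annotation_type, []):
        start, end = ann["indices"]
        # Everything except the offsets is treated as an annotation feature.
        features = {k: v for k, v in ann.items() if k != "indices"}
        spans.append((start, end, text[start:end], features))
    return spans


# Hand-written sample in the assumed shape; NOT real pipeline output.
sample = {
    "text": "Анна живет в Москве.",
    "entities": {
        "Person": [{"indices": [0, 4]}],
        "Location": [{"indices": [13, 19]}],
        "Lookup": [{"indices": [13, 19], "lemma": "Москва"}],
    },
}

for start, end, covered, features in entity_spans(sample, "Lookup"):
    print(start, end, covered, features)
```

Keeping the offsets alongside the covered text makes it straightforward to line up a Lookup with any Person or Location annotation over the same span.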
1,200 free requests / day
Larger batches GBP0.80 / CPU hour

Use this pipeline

Single documents

You can process up to 1,200 documents per day free of charge using the REST API, at an average rate of 2 documents/sec. Higher quotas are available for research users by arrangement; contact us for details.
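
The quota and rate above can be respected on the client side. The following Python sketch shows one way to do that using only the standard library. Several details are assumptions that this page does not confirm: that documents are sent as plain text in a POST body, that the API key is supplied as HTTP Basic credentials (a key id and password), and that results come back as JSON. ENDPOINT is a placeholder, not the real URL, and build_request, Throttle and process_all are hypothetical names.

```python
import base64
import time
import urllib.request

# Placeholder only: substitute the endpoint shown for this pipeline.
ENDPOINT = "https://example.invalid/process/russian-ner"


def build_request(endpoint, text, key_id, key_password):
    """Build (but do not send) a POST request carrying one plain-text document.

    Basic auth and the content types are assumptions, not confirmed here.
    """
    token = base64.b64encode(f"{key_id}:{key_password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        endpoint,
        data=text.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "text/plain; charset=UTF-8",
            "Accept": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


class Throttle:
    """Space calls at least min_interval seconds apart (0.5 s = 2 documents/sec)."""

    def __init__(self, min_interval=0.5, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def process_all(texts, key_id, key_password, endpoint=ENDPOINT, daily_quota=1200):
    """Send each text in turn, stopping at the free daily quota."""
    throttle = Throttle()
    results = []
    for text in texts[:daily_quota]:
        throttle.wait()
        request = build_request(endpoint, text, key_id, key_password)
        with urllib.request.urlopen(request) as response:
            results.append(response.read().decode("utf-8"))
    return results
```

At 2 documents/sec, the full free quota of 1,200 documents takes about 10 minutes to submit, so a simple sequential loop like this is usually enough.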

The API endpoint for this pipeline is:


Batches of documents

You can process any amount of data with this pipeline on a pay-as-you-go basis, at GBP0.80 per CPU hour. The input can be data you upload yourself, data you have collected from Twitter, or the results of a previous job.
