Russian NER (with inflexional gazetteer and orthomatcher)
A named entity recognition pipeline that identifies basic entity types, such as Person, Location, Organization, Money amounts, and Time and Date expressions. It works on documents in the Russian language.
This version of the pipeline includes an inflexional gazetteer to recognise more morphological variants of target names, and an orthomatcher to perform basic coreference resolution based on orthographic similarity.
Default annotations | |
:Person | Standard named entity types |
:Location | |
:Organization | |
:Date | |
:Address | Includes email and IP addresses as well as street addresses |
Additional annotations available if selected | |
:Money | Monetary amounts |
:Percent | Expressions representing percentages |
:Token | The individual tokens of the text, with "category" feature for POS |
:SpaceToken | The spaces between tokens |
:Sentence | Sentences detected by the sentence splitter |
:Lookup | Individual gazetteer lookups; lookups that come from the inflexional gazetteer also carry a "lemma" feature giving the base word form |
:MSD | "Morpho-Syntactic Description" for selected tokens, including features for "lemma" (the base form of inflected words) and "type" (roughly equivalent to a part of speech tag in English, though more complex as it encodes features such as gender, grammatical case, etc.) |
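The selectors listed in the table can be gathered into a single request parameter. The following is a minimal sketch, assuming the service accepts a comma-separated list of selectors in a query parameter named `annotations`; that parameter name and the helper `annotation_query` are illustrative assumptions, not something this page states.

```python
from urllib.parse import urlencode

# Selectors copied from the table above.
DEFAULT_SELECTORS = [":Person", ":Location", ":Organization", ":Date", ":Address"]
OPTIONAL_SELECTORS = [":Money", ":Percent", ":Token", ":SpaceToken",
                      ":Sentence", ":Lookup", ":MSD"]


def annotation_query(extra=()):
    """Build a query string selecting the defaults plus any optional extras.

    Assumption: the service reads a comma-separated `annotations` parameter.
    """
    unknown = [s for s in extra if s not in OPTIONAL_SELECTORS]
    if unknown:
        raise ValueError(f"unknown selectors: {unknown}")
    selected = DEFAULT_SELECTORS + list(extra)
    # Keep ':' and ',' unescaped so the selectors stay readable.
    return urlencode({"annotations": ",".join(selected)}, safe=":,")
```

For example, `annotation_query([":MSD"])` adds the morpho-syntactic descriptions to the default entity types, which is useful when the lemma of each inflected name is needed.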
Use this pipeline
You can process up to 1,200 documents per day free of charge using the REST API, at an average rate of 2 documents/sec. Higher quotas are available for research users by arrangement; contact us for details.
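A client can pace itself to stay inside the free quota. The sketch below shows one possible approach under stated assumptions: it treats 2 documents/sec as a minimum gap of 0.5 seconds between documents and counts them against the 1,200 daily limit. The class name `QuotaPacer` is hypothetical, the counter is not reset automatically at the end of a day, and the way the service itself measures the average rate is not specified here.

```python
import time

DAILY_QUOTA = 1200     # documents per day, from the free tier described above
MIN_INTERVAL = 1 / 2   # seconds between documents at 2 documents/sec


class QuotaPacer:
    """Client-side pacing only; the server's own accounting may differ."""

    def __init__(self, clock=time.monotonic, sleep=time.sleep):
        self._clock = clock
        self._sleep = sleep
        self._last = None   # time the previous document was released
        self.sent = 0       # documents released so far

    def wait_for_slot(self):
        """Block until the next document may be sent; raise once the quota is used."""
        if self.sent >= DAILY_QUOTA:
            raise RuntimeError("daily free quota exhausted")
        now = self._clock()
        if self._last is not None:
            remaining = MIN_INTERVAL - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
        self.sent += 1


# At the average rate, the whole daily quota is used up in 600 s (10 minutes).
SECONDS_TO_EXHAUST = DAILY_QUOTA * MIN_INTERVAL
```

Calling `wait_for_slot()` before each request keeps a batch job within the stated rate. Because the full daily allowance can be consumed in about ten minutes at that rate, larger collections are likely a better fit for the pay-as-you-go option.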
The API endpoint for this pipeline is:
You can process any amount of data with this pipeline on a pay-as-you-go basis, for £0.80 per hour. This can be data you upload yourself, data you collected from Twitter, or the results of a previous job.