Universal Dependencies POS Tagger for en / English
A POS tagger for en / English using the Universal Dependencies POS tagset.
This tagger is based on a simple maximum entropy model trained on the corpora from the Universal Dependencies collection using the GATE Learning Framework plugin.
The model is trained on all available corpora except the test corpus. Evaluation on the UD_English_test set gives 0.9377 accuracy. Accuracy on out-of-vocabulary words (words not seen in the training set) is 0.7735 (case-sensitive) / 0.8099 (case-insensitive). Evaluation on the Penn test set gives 0.9054 accuracy. Accuracy on out-of-vocabulary words (words not seen in the training set) is 0.8597 (case-sensitive) / 0.8769 (case-insensitive).
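To make the three kinds of figure concrete, here is a minimal Python sketch of how overall accuracy and out-of-vocabulary accuracy could be computed. It is an illustration only, not the evaluation code used for this tagger: the function names and the toy data are made up, and it assumes that "case-insensitive" means a test word counts as seen if its lowercased form matches any lowercased training word.

```python
def accuracy(pairs):
    """Fraction of (gold, predicted) pairs that agree; None if there are no pairs."""
    pairs = list(pairs)
    if not pairs:
        return None
    return sum(1 for gold, pred in pairs if gold == pred) / len(pairs)


def evaluate(train_words, test_items):
    """Compute overall and out-of-vocabulary (OOV) tagging accuracy.

    train_words: iterable of word forms seen in training.
    test_items:  iterable of (word, gold_tag, predicted_tag) triples.
    """
    train_words = list(train_words)
    test_items = list(test_items)
    vocab_cs = set(train_words)                 # exact word forms
    vocab_ci = {w.lower() for w in train_words}  # assumed case-insensitive vocabulary
    return {
        "overall": accuracy((g, p) for _, g, p in test_items),
        "oov_case_sensitive": accuracy(
            (g, p) for w, g, p in test_items if w not in vocab_cs
        ),
        "oov_case_insensitive": accuracy(
            (g, p) for w, g, p in test_items if w.lower() not in vocab_ci
        ),
    }


# Toy data, invented for illustration only.
train_words = ["The", "cat", "sat", "on", "the", "mat"]
test_items = [
    ("the", "DET", "DET"),
    ("Cat", "NOUN", "PROPN"),   # OOV only when case matters
    ("slept", "VERB", "VERB"),  # OOV in both settings
    ("on", "ADP", "ADP"),
    ("Mars", "PROPN", "NOUN"),  # OOV in both settings
]
result = evaluate(train_words, test_items)
print(result)
```

In this toy data "Cat" is out of vocabulary under the case-sensitive definition but not under the case-insensitive one, so the two definitions score different subsets of the test words.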
Default annotations | |
:Token | Tokens generated with the default tokeniser. The Universal Dependencies POS tag is stored in the feature "upos". |
Additional annotations available if selected | |
:Sentence | The sentence annotation created by the default regular expression sentence splitter. |
:SpaceToken | As generated with the default tokeniser. |
Use this pipeline
You can process up to 1,200 documents per day free of charge using the REST API, at an average rate of 2 documents/sec. Higher quotas are available for research users by arrangement; contact us for details.
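A client can stay within the free quota by pacing its own requests. The sketch below is one possible way to do that in Python and is not part of the service: `Throttle`, `QuotaExceeded`, `process_all` and the `send` callable are hypothetical names, the fixed 0.5-second spacing is a conservative reading of "average rate of 2 documents/sec", and the daily counter is never reset automatically (create a new `Throttle` each day). No HTTP call is made here; `send` stands for whatever function you write to post one document to the endpoint.

```python
import time


class QuotaExceeded(Exception):
    """Raised when the assumed daily document limit has been used up."""


class Throttle:
    """Client-side pacing: at most max_per_sec calls per second, max_per_day in total."""

    def __init__(self, max_per_day=1200, max_per_sec=2.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_per_day = max_per_day
        self.min_interval = 1.0 / max_per_sec  # seconds between consecutive calls
        self.clock = clock
        self.sleep = sleep
        self.sent_today = 0
        self.last_sent = None

    def wait(self):
        """Block until the next call is allowed, or raise if the daily cap is reached."""
        if self.sent_today >= self.max_per_day:
            raise QuotaExceeded("daily document limit reached")
        now = self.clock()
        if self.last_sent is not None:
            remaining = self.min_interval - (now - self.last_sent)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_sent = now
        self.sent_today += 1


def process_all(documents, send, throttle):
    """Call send(doc) for each document, pacing the calls with the throttle.

    send is a caller-supplied function (hypothetical here) that submits one
    document and returns the service's response.
    """
    results = []
    for doc in documents:
        throttle.wait()
        results.append(send(doc))
    return results
```

A call such as `process_all(docs, send, Throttle())` would then submit the documents one at a time. The `clock` and `sleep` parameters exist only so the pacing logic can be tested without real delays.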
The API endpoint for this pipeline is:
You can process any amount of data with this pipeline on a pay-as-you-go basis, for £0.80 per hour. This can be data you upload yourself, data you collected from Twitter, or the results of a previous job.