OpenNLP English Pipeline

This pipeline provides the English tokeniser, sentence splitter, POS tagger, phrase chunker and named-entity recogniser from Apache OpenNLP. The components are based on the maxent (maximum entropy) machine learning algorithm and produce Token and Sentence annotations in a form compatible with other standard GATE tools.

Default annotations
`:Person`	Standard named entity type
`:Location`	Standard named entity type
`:Organization`	Standard named entity type
`:Date`	Standard named entity type
Additional annotations available if selected
`:Money`	Monetary amounts
`:Percentage`	Expressions representing percentages
`:Time`	Time expressions
`:Sentence`	Sentences detected by the sentence splitter
`:Token`	The individual tokens of the text, with a "category" feature for the POS tag and a "chunk" feature for the I/O/B-style chunk tags. Complete chunks derived from the tags are also available as their respective annotation types (e.g. a sequence of tokens tagged B-NP, I-NP, I-NP gives rise to an "NP" annotation spanning the sequence).
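
The way complete chunks follow from the per-token tags can be illustrated with a short sketch. This is not the pipeline's own code; the function name `chunks_from_bio` and the `(start, end, tag)` tuple layout are made up for this example, and a stray `I-` tag with no preceding `B-` is treated leniently as the start of a new chunk.

```python
def chunks_from_bio(tokens):
    """Merge B-/I-/O chunk tags into chunk spans.

    tokens: iterable of (start, end, tag) tuples, e.g. (0, 3, "B-NP").
    Returns a list of (chunk_type, start, end) tuples.
    """
    chunks = []
    current = None  # [chunk_type, start, end] of the chunk being built
    for start, end, tag in tokens:
        prefix, _, kind = tag.partition("-")
        # An I- tag of the same type extends the chunk in progress.
        if prefix == "I" and current and current[0] == kind:
            current[2] = end
            continue
        # Anything else closes the chunk in progress, if there is one.
        if current:
            chunks.append(tuple(current))
            current = None
        # B- (or a stray I-) opens a new chunk; O opens nothing.
        if prefix in ("B", "I") and kind:
            current = [kind, start, end]
    if current:
        chunks.append(tuple(current))
    return chunks


# Character offsets for the text "The quick fox jumps."
tokens = [
    (0, 3, "B-NP"), (4, 9, "I-NP"), (10, 13, "I-NP"),
    (14, 19, "B-VP"), (19, 20, "O"),
]
print(chunks_from_bio(tokens))
```

Here the three tokens tagged B-NP, I-NP, I-NP collapse into a single "NP" span from offset 0 to 13, matching the behaviour described above.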

1,200 free requests / day

Larger batches £0.80 / CPU hour

Use this pipeline

Single documents

You can process up to 1,200 documents per day free of charge using the REST API, at an average rate of 2 documents/sec. Higher quotas are available for research users by arrangement; contact us for details.
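
A client that sends many documents may want to pace itself to stay within these limits. The following is a minimal sketch of one way to do that on the client side. The `Throttle` class is hypothetical and not part of any GATE library; it spaces calls at least 0.5 seconds apart, which is stricter than an average of 2 documents/sec, and its daily counter does not reset by itself. The service's own quota enforcement may work differently.

```python
import time


class Throttle:
    """Client-side pacing: a minimum gap between calls plus a call budget."""

    def __init__(self, per_second=2.0, per_day=1200,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / per_second
        self.remaining = per_day      # calls left in the current budget
        self._clock = clock           # injectable so the class can be tested
        self._sleep = sleep
        self._last = None             # time of the previous call, if any

    def wait(self):
        """Block until the next request may be sent, then count it."""
        if self.remaining <= 0:
            raise RuntimeError("daily free quota used up")
        now = self._clock()
        if self._last is not None:
            delay = self.min_interval - (now - self._last)
            if delay > 0:
                self._sleep(delay)
                now = self._clock()
        self._last = now
        self.remaining -= 1
```

Call `wait()` immediately before each request; it raises `RuntimeError` once the budget is spent rather than sending a request that would be over quota.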

The API endpoint for this pipeline is:

https://cloud-api.gate.ac.uk/process/opennlp-english-pipeline
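
As an illustration, a request to this endpoint could be built with the Python standard library roughly as follows. Only the URL comes from this page. The plain-text request body, the JSON `Accept` header, HTTP Basic authentication with an API key ID and password, and the `annotations` query parameter for selecting the additional annotation types are all assumptions that should be checked against the GATE Cloud API documentation. The helper names `build_request` and `send` are invented for this sketch.

```python
import base64
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://cloud-api.gate.ac.uk/process/opennlp-english-pipeline"


def build_request(text, annotations=None, key_id=None, password=None):
    """Build (but do not send) a POST request carrying one document."""
    url = ENDPOINT
    if annotations:
        # Assumed parameter: comma-separated selectors such as ":Money".
        query = urllib.parse.urlencode({"annotations": ",".join(annotations)})
        url += "?" + query
    headers = {
        "Content-Type": "text/plain; charset=UTF-8",
        "Accept": "application/json",
    }
    if key_id and password:
        # Assumed scheme: HTTP Basic auth with the API key ID and password.
        token = base64.b64encode(f"{key_id}:{password}".encode("utf-8"))
        headers["Authorization"] = "Basic " + token.decode("ascii")
    return urllib.request.Request(
        url, data=text.encode("utf-8"), headers=headers, method="POST"
    )


def send(request):
    """Send the request and decode the response body as JSON."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


req = build_request(
    "Ada Lovelace was born in London.",
    annotations=[":Person", ":Location"],
    key_id="YOUR_KEY_ID",        # placeholder credentials
    password="YOUR_PASSWORD",
)
print(req.get_method(), req.full_url)
# result = send(req)  # uncomment once real credentials are in place
```

Building the request separately from sending it makes it easy to inspect the URL and headers before spending any of the daily quota.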


Batches of documents

You can process any amount of data with this pipeline on a pay-as-you-go basis, for £0.80 per CPU hour. This can be data you upload yourself, data you collected from Twitter, or the results of a previous job.
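
For budgeting, the cost of a batch can be estimated from the CPU time it is expected to use. The sketch below assumes the charge is simply the CPU hours multiplied by the £0.80 rate and rounded to the nearest penny; the actual billing granularity (for example, rounding up to whole hours) is not stated here, and `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP

RATE_GBP_PER_CPU_HOUR = Decimal("0.80")


def estimate_cost(cpu_hours):
    """Estimated batch cost in pounds, assuming pro-rata billing."""
    cost = Decimal(str(cpu_hours)) * RATE_GBP_PER_CPU_HOUR
    # Round to whole pence.
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(estimate_cost(2.5))  # 2.5 CPU hours at £0.80 per hour
```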
