OpenNLP Dutch Pipeline
This pipeline provides the Dutch tokeniser, sentence splitter, POS tagger, phrase chunker and named-entity recogniser from Apache OpenNLP. The components are based on the maximum entropy (maxent) machine learning algorithm, and produce Token and Sentence annotations in a form compatible with other standard GATE tools.
|Standard named entity types
|Additional annotations available if selected
|Expressions representing percentages
|Sentences detected by the sentence splitter
|The individual tokens of the text, with a "category" feature for the POS tag and a "chunk" feature for the I/O/B-style chunk tag. Complete chunks derived from the tags are also available as their respective annotation types (e.g. a sequence of tokens tagged B-NP, I-NP, I-NP gives rise to an "NP" annotation spanning the sequence).
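To illustrate how complete chunks can be derived from I/O/B-style tags, here is a minimal Python sketch. It is not the pipeline's own code: the function name chunks_from_tags and the (type, start, end) tuple layout are assumptions made for this example, and token positions stand in for the character offsets a real annotation would carry.

```python
def chunks_from_tags(tags):
    """Group per-token I/O/B tags into (chunk_type, start, end) spans.

    `start` and `end` are token indices, with `end` exclusive.
    Illustrative helper only, not part of OpenNLP or GATE.
    """
    spans = []
    kind = None   # type of the chunk currently open, e.g. "NP"
    start = None  # token index where the open chunk began

    for i, tag in enumerate(tags):
        continues = tag.startswith("I-") and tag[2:] == kind
        if continues:
            continue  # token extends the chunk that is already open

        # Any other tag ends the open chunk, if there is one.
        if kind is not None:
            spans.append((kind, start, i))

        if tag == "O":
            kind, start = None, None
        else:
            # "B-X" opens a new chunk; a stray "I-X" is treated leniently
            # as if it were "B-X".
            kind, start = tag[2:], i

    # Close a chunk that runs to the end of the sequence.
    if kind is not None:
        spans.append((kind, start, len(tags)))
    return spans


if __name__ == "__main__":
    print(chunks_from_tags(["B-NP", "I-NP", "I-NP", "O", "B-VP"]))
```

With the tag sequence from the description above, B-NP, I-NP, I-NP yields a single "NP" span covering the three tokens.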
Use this pipeline
You can process up to 1,200 documents per day free of charge using the REST API, at an average rate of 2 documents/sec. Higher quotas are available for research users by arrangement; contact us for details.
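A client can stay within these limits by pacing its own requests. The following Python sketch is one way to do that, under stated assumptions: the class name QuotaPacer is hypothetical, the daily counter is never reset automatically, and how the service itself measures or enforces the quota is not described here.

```python
import time


class QuotaPacer:
    """Client-side pacing for a per-second rate and a per-day cap.

    Hypothetical helper for illustration; the defaults mirror the free
    quota quoted above (2 documents/sec, 1,200 documents/day).
    """

    def __init__(self, per_second=2.0, per_day=1200,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / per_second  # seconds between requests
        self.per_day = per_day
        self.clock = clock    # injectable so the pacer can be tested
        self.sleep = sleep
        self.sent = 0         # requests granted so far (not reset here)
        self.last = None      # time the previous request was granted

    def acquire(self):
        """Block until the next request may be sent, or raise if the cap is hit."""
        if self.sent >= self.per_day:
            raise RuntimeError("daily free quota exhausted")
        now = self.clock()
        if self.last is not None:
            wait = self.min_interval - (now - self.last)
            if wait > 0:
                self.sleep(wait)
                now = self.clock()
        self.last = now
        self.sent += 1
```

Calling acquire() before each document submission spaces requests at least half a second apart and stops after 1,200 calls; a long-running client would need to create a fresh pacer (or reset the counter) each day.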
The API endpoint for this pipeline is:
You can process any amount of data with this pipeline on a pay-as-you-go basis, for GBP 0.80 per hour. This can be data you upload yourself, data you have collected from Twitter, or the results of a previous job.