spaCy English Named Entity Recognition
An application that annotates English documents with the spaCy Named Entity Recognition component, using the en_core_web_sm model.
Default annotations | |
:PERSON | People, including fictional. |
:GPE | Countries, cities, states. |
:LOC | Non-GPE locations, mountain ranges, bodies of water. |
:ORG | Companies, agencies, institutions, etc. |
:NORP | Nationalities or religious or political groups. |
:DATE | Absolute or relative dates or periods. |
:TIME | Times smaller than a day. |
:FAC | Buildings, airports, highways, bridges, etc. |
Additional annotations available if selected | |
:PRODUCT | Objects, vehicles, foods, etc. (Not services.) |
:EVENT | Named hurricanes, battles, wars, sports events, etc. |
:WORK_OF_ART | Titles of books, songs, etc. |
:LAW | Named documents made into laws. |
:LANGUAGE | Any named language. |
:PERCENT | Percentage, including "%". |
:MONEY | Monetary values, including unit. |
:QUANTITY | Measurements, as of weight or distance. |
:ORDINAL | "first", "second", etc. |
:CARDINAL | Numerals that do not fall under another type. |
:Token | The individual tokens of the text, with a "category" feature holding the part-of-speech (POS) tag. |
:SpaceToken | The spaces between tokens. |
:Sentence | Sentences detected by the sentence splitter. |
:NounChunk | Noun chunks detected by the noun chunker. |
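The split between default and optional annotation types can be sketched in a few lines of Python. This is only an illustration of the selection logic, not the service's own code: the function name `select_annotations` and the `(type, start, end)` tuple layout are assumptions made for the example, and the type names are copied from the tables above.

```python
# Annotation types returned by default (from the first table above).
DEFAULT_TYPES = frozenset(
    {"PERSON", "GPE", "LOC", "ORG", "NORP", "DATE", "TIME", "FAC"}
)

# Annotation types that are only returned when selected (second table).
OPTIONAL_TYPES = frozenset(
    {
        "PRODUCT", "EVENT", "WORK_OF_ART", "LAW", "LANGUAGE",
        "PERCENT", "MONEY", "QUANTITY", "ORDINAL", "CARDINAL",
        "Token", "SpaceToken", "Sentence", "NounChunk",
    }
)


def select_annotations(annotations, extra=()):
    """Keep default types plus any explicitly requested optional types.

    `annotations` is an iterable of (type, start, end) tuples; this layout
    is hypothetical and chosen only to keep the example short.
    """
    extra = set(extra)
    unknown = extra - OPTIONAL_TYPES
    if unknown:
        raise ValueError(f"not an optional annotation type: {sorted(unknown)}")
    wanted = DEFAULT_TYPES | extra
    return [ann for ann in annotations if ann[0] in wanted]


# Made-up annotations over an imaginary document.
sample = [("PERSON", 0, 10), ("MONEY", 20, 24), ("GPE", 30, 36)]

print(select_annotations(sample))
print(select_annotations(sample, extra=["MONEY"]))
```

With no extras requested, the MONEY annotation is dropped. Passing `extra=["MONEY"]` keeps it alongside the default types.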
Use this pipeline
You can process up to 1,200 documents per day free of charge using the REST API, at an average rate of 2 documents per second. Higher quotas are available for research users by arrangement; contact us for details.
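A client can stay within the free quota by pacing its own requests. The sketch below is one conservative way to do that, assuming a fixed gap between calls and a simple per-run counter. How the service enforces the quota, and when the daily allowance resets, are not stated here, so `QuotaThrottle`, `process_all`, and the `send` callable are all hypothetical names for illustration.

```python
import time


class QuotaThrottle:
    """Client-side pacing: at most `per_second` calls/sec, `per_day` in total.

    The daily reset is not modelled; create a new throttle for each day.
    """

    def __init__(self, per_day=1200, per_second=2.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.remaining = per_day
        self.min_interval = 1.0 / per_second
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def acquire(self):
        """Block until the next call is allowed, or raise if the quota is spent."""
        if self.remaining <= 0:
            raise RuntimeError("daily quota exhausted")
        now = self._clock()
        if self._last is not None:
            wait = self.min_interval - (now - self._last)
            if wait > 0:
                self._sleep(wait)
                now = self._clock()
        self._last = now
        self.remaining -= 1


def process_all(documents, send, throttle):
    """Call `send(doc)` for each document, pacing calls with `throttle`.

    `send` stands in for whatever function submits one document to the API.
    """
    results = []
    for doc in documents:
        throttle.acquire()
        results.append(send(doc))
    return results


if __name__ == "__main__":
    # Stand-in for a real request: just report the document length.
    throttle = QuotaThrottle()
    print(process_all(["first text", "second text", "third text"], len, throttle))
    print("documents left today:", throttle.remaining)
```

Spacing every call by half a second is stricter than the stated limit, which is an average rate, so short bursts might be allowed in practice. The clock and sleep functions are injectable so the pacing can be tested without real delays.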
The API endpoint for this pipeline is:
You can process any amount of data with this pipeline on a pay-as-you-go basis, for GBP0.80 per hour. This can be data you upload yourself, data you collected from Twitter, or the results of a previous job.