TwitIE Named Entity Recognizer for Tweets
TwitIE is a named entity recognition pipeline specially tuned for use with Twitter data. It performs:
- tokenisation, sentence splitting and part-of-speech tagging using a model trained specifically for Tweets
- normalisation of abbreviations and shortened word forms frequently found in Tweets ("brb", "ttyl", "gr8", "2day", etc.)
- tagging of Twitter-specific entities such as hashtags and @mentions, as well as URLs and emoticons
- general named-entity recognition, to identify basic entity types such as Person, Location, Organization, Money amounts, Time and Date expressions.
| Default annotations | |
| :Person | Standard named entity type |
| :Location | Standard named entity type |
| :Organization | Standard named entity type |
| :Date | Standard named entity type |
| :Address | Includes email and IP addresses as well as street addresses |
| :Token | The individual tokens of the text, with "category" feature for POS |
| :Emoticon | Emoticons such as :-) |
| :Hashtag | Hashtags, including the leading # character |
| :URL | URL mentions |
| :UserID | The username part of @user mentions, not including the leading @ sign |
| Additional annotations available if selected | |
| :Money | Monetary amounts |
| :Percent | Expressions representing percentages |
| :SpaceToken | The spaces between tokens |
| :Sentence | Sentences detected by the sentence splitter |
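As an illustration of how these annotation types relate to the original text, the sketch below pulls out the text covered by each annotation. The response shape used here (a "text" string plus an "entities" map whose entries carry "indices" as start and end character offsets) is an assumption made for the example and is not specified on this page; the helper name `covered_text` and the sample tweet are likewise hypothetical.

```python
from typing import List

# ASSUMED response shape (not taken from this page):
# {"text": "...", "entities": {"Hashtag": [{"indices": [start, end]}, ...]}}


def covered_text(response: dict, ann_type: str) -> List[str]:
    """Return the text span covered by each annotation of the given type."""
    text = response["text"]
    spans = response.get("entities", {}).get(ann_type, [])
    return [text[a["indices"][0]:a["indices"][1]] for a in spans]


# Hand-written sample, not real service output.
sample = {
    "text": "Thanks @gate_nlp for #TwitIE :-)",
    "entities": {
        "UserID": [{"indices": [8, 16]}],     # excludes the leading @
        "Hashtag": [{"indices": [21, 28]}],   # includes the leading #
        "Emoticon": [{"indices": [29, 32]}],
    },
}

print(covered_text(sample, "UserID"))
print(covered_text(sample, "Hashtag"))
print(covered_text(sample, "Emoticon"))
```

The sample offsets mirror the table: the :UserID span starts after the @ sign, while the :Hashtag span includes the # character.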
Use this pipeline
You can process up to 1,200 documents per day free of charge using the REST API, at an average rate of 2 documents/sec. Higher quotas are available for research users by arrangement; contact us for details.
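A client can stay within these limits by pacing its own requests. The following is a minimal sketch, not part of the service: the `throttled` helper is hypothetical, and it enforces a fixed half-second spacing, which is stricter than the stated average rate. How the service itself measures or enforces the quota is not described here.

```python
import time
from itertools import islice
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")

MAX_PER_DAY = 1200    # free daily quota stated above
MIN_INTERVAL = 0.5    # seconds between documents, i.e. 2 documents/sec


def throttled(docs: Iterable[T],
              clock: Callable[[], float] = time.monotonic,
              sleep: Callable[[float], None] = time.sleep) -> Iterator[T]:
    """Yield at most MAX_PER_DAY documents, spaced MIN_INTERVAL apart."""
    next_slot = clock()
    for doc in islice(docs, MAX_PER_DAY):
        delay = next_slot - clock()
        if delay > 0:
            sleep(delay)  # wait until the next permitted send time
        next_slot = max(next_slot, clock()) + MIN_INTERVAL
        yield doc


if __name__ == "__main__":
    for tweet in throttled(["first tweet", "second tweet", "third tweet"]):
        # Send `tweet` to the API endpoint here.
        print("would send:", tweet)
```

At 2 documents/sec, the free quota of 1,200 documents corresponds to 600 seconds (10 minutes) of sustained processing, so a long-running job needs either a higher quota or the pay-as-you-go option.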
The API endpoint for this pipeline is:
You can process any amount of data with this pipeline on a pay-as-you-go basis, for £0.80 per hour. This can be data you upload yourself, data you collected from Twitter, or the results of a previous job.
