TwitIE Named Entity Recognizer for Tweets

TwitIE is a named entity recognition pipeline specially tuned for use with Twitter data. It performs:

- tokenisation, sentence splitting and part-of-speech tagging using a model trained specifically for Tweets
- normalisation of abbreviations and shortened word forms frequently found in Tweets ("brb", "ttyl", "gr8", "2day", etc.)
- tagging of Twitter-specific entities such as hashtags and @mentions, as well as URLs and emoticons
- general named-entity recognition, to identify basic entity types such as Person, Location, Organization, Money amounts, Time and Date expressions

Default annotations
`:Person`, `:Location`, `:Organization`, `:Date`	Standard named entity types
`:Address`	Includes email and IP addresses as well as street addresses
`:Token`	The individual tokens of the text, with "category" feature for POS
`:Emoticon`	Emoticons such as `:-)`
`:Hashtag`	Hashtags, including the leading # character
`:URL`	URL mentions
`:UserID`	The username part of @user mentions, not including the leading @ sign
Additional annotations available if selected
`:Money`	Monetary amounts
`:Percent`	Expressions representing percentages
`:SpaceToken`	The spaces between tokens
`:Sentence`	Sentences detected by the sentence splitter
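
As a rough illustration of how these annotation types might be consumed, the sketch below assumes the service returns JSON containing the document `text` plus an `entities` map from annotation type to a list of annotations, each with `indices` given as `[start, end]` character offsets. That response shape is an assumption rather than something stated on this page, and the sample response is made up for the example. The helper name `covered_text` is hypothetical.

```python
from typing import Dict, List


def covered_text(response: dict) -> Dict[str, List[str]]:
    """Map each annotation type to the text spans it covers.

    Assumes (not confirmed by this page) a response of the form
    {"text": ..., "entities": {"Type": [{"indices": [start, end]}, ...]}}.
    """
    text = response["text"]
    spans: Dict[str, List[str]] = {}
    for ann_type, annotations in response.get("entities", {}).items():
        spans[ann_type] = [
            text[ann["indices"][0]:ann["indices"][1]] for ann in annotations
        ]
    return spans


# Made-up sample response, for illustration only.
sample = {
    "text": "@alice loves #nlp :-)",
    "entities": {
        "UserID": [{"indices": [1, 6]}],     # excludes the leading @
        "Hashtag": [{"indices": [13, 17]}],  # includes the leading #
        "Emoticon": [{"indices": [18, 21]}],
    },
}

spans = covered_text(sample)
print(spans)
```

The sample offsets follow the conventions in the table: the `:UserID` span leaves out the `@` sign, while the `:Hashtag` span keeps the `#`.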

1,200 free requests / day

Larger batches £0.80 / CPU hour

Use this pipeline

Single documents

You can process up to 1,200 documents per day free of charge using the REST API, at an average rate of 2 documents/sec. Higher quotas are available for research users by arrangement; contact us for details.
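
A client can stay within these limits by pacing its own calls. The following is a minimal sketch, assuming a hypothetical `handler` callable that submits one document: it spaces calls at least half a second apart (a conservative reading of "2 documents/sec on average") and stops after 1,200 items. The count is local bookkeeping only; how the service itself enforces the quota is not described here.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def throttled(
    items: Iterable[T],
    handler: Callable[[T], R],
    max_per_second: float = 2.0,   # free-tier rate quoted above
    daily_quota: int = 1200,       # free-tier daily limit quoted above
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[R]:
    """Yield handler(item) for each item, pausing between calls.

    `clock` and `sleep` are injectable so the pacing can be tested
    without real waiting.
    """
    min_interval = 1.0 / max_per_second
    last_call = None
    for count, item in enumerate(items):
        if count >= daily_quota:
            break  # local quota reached; remaining items are not processed
        if last_call is not None:
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        yield handler(item)


# Usage (hypothetical `send_document` function):
#   for result in throttled(tweets, send_document):
#       ...
```

Spacing every call evenly is stricter than an average-rate limit requires, but it is the simplest scheme that cannot exceed it.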

The API endpoint for this pipeline is:

https://cloud-api.gate.ac.uk/process/twitie-named-entity-recognizer-for-tweets
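
A request to this endpoint could be built along the following lines. This is a sketch under assumptions not stated on this page: that the service accepts a plain-text POST body, returns JSON when asked via the `Accept` header, authenticates with HTTP Basic credentials made of an API key ID and password, and lets callers choose annotation types through an `annotations` query parameter. Check the API documentation before relying on any of these. The function name `build_request` is hypothetical.

```python
import base64
import urllib.parse
import urllib.request
from typing import Optional, Sequence

ENDPOINT = (
    "https://cloud-api.gate.ac.uk/process/"
    "twitie-named-entity-recognizer-for-tweets"
)


def build_request(
    text: str,
    key_id: Optional[str] = None,
    password: Optional[str] = None,
    annotations: Optional[Sequence[str]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for one document."""
    url = ENDPOINT
    if annotations:
        # Assumed query parameter, e.g. annotations=:Person,:Location
        query = urllib.parse.urlencode({"annotations": ",".join(annotations)})
        url = f"{url}?{query}"
    headers = {
        "Content-Type": "text/plain; charset=UTF-8",  # assumed input format
        "Accept": "application/json",                 # assumed output format
    }
    if key_id and password:
        # Assumed auth scheme: HTTP Basic with API key ID and password.
        token = base64.b64encode(f"{key_id}:{password}".encode("utf-8"))
        headers["Authorization"] = "Basic " + token.decode("ascii")
    return urllib.request.Request(
        url, data=text.encode("utf-8"), headers=headers, method="POST"
    )


req = build_request(
    "Meeting @alice in London 2day", annotations=[":Person", ":Location"]
)
print(req.get_method(), req.full_url)

# To actually send it (requires network access):
#   import json
#   with urllib.request.urlopen(req) as resp:
#       result = json.load(resp)
```

Building the request separately from sending it keeps the example runnable offline and makes the headers easy to inspect.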

Batches of documents

You can process any amount of data with this pipeline on a pay-as-you-go basis, for £0.80 per CPU hour. This can be data you upload yourself, data you have collected from Twitter, or the results of a previous job.
