Part-of-Speech Tagger for Tweets

TwitIE is a named entity recognition pipeline specially tuned for use with Twitter data. This pipeline is a cut-down version of TwitIE that tokenizes tweets and assigns a Part-of-Speech label to each token.

Tweet: an annotation spanning a single tweet.
Token: the annotation covering each individual word. Features:
    kind: the type of token (word, punctuation, number, etc.)
    length: the length of the token, in characters
    string: the text of the token
    category: the Part-of-Speech label of the token
UserID: the username part of @user mentions, not including the leading @ sign. Features:
    user: the username, not including the leading @ sign
Emoticon: emoticons such as :-). Features:
    normalized: the normalized form of the emoticon; for example, both :) and :-) normalize to :)
URL: any URL occurring within the tweet. Features:
    string: the URL address
Hashtag: hashtags, including the leading # character.
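To make the annotation types above more concrete, here is a toy Python sketch that mimics a few of them with regular expressions. It is not TwitIE's tokenizer or gazetteer: the patterns, the Annotation record and the toy_annotate and normalize_emoticon helpers are hypothetical, and no Token annotations (and therefore no POS categories) are produced. It only illustrates how spans and features such as user, normalized and string relate to the tweet text.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Hypothetical annotation record: type, character offsets and features."""
    type: str
    start: int
    end: int
    features: dict = field(default_factory=dict)


# Toy patterns only; TwitIE's real tokenizer and gazetteers are far more thorough.
PATTERNS = {
    "URL": re.compile(r"https?://\S+"),
    "UserID": re.compile(r"@(\w+)"),
    "Hashtag": re.compile(r"#\w+"),
    "Emoticon": re.compile(r"[:;]-?[()DP]"),
}


def normalize_emoticon(text: str) -> str:
    """Drop the optional '-' nose, so ':-)' and ':)' both become ':)'."""
    return text.replace("-", "")


def toy_annotate(tweet: str) -> list:
    anns = [Annotation("Tweet", 0, len(tweet))]
    for ann_type, pattern in PATTERNS.items():
        for m in pattern.finditer(tweet):
            if ann_type == "UserID":
                # Span covers the username only, without the leading '@'.
                anns.append(Annotation(ann_type, m.start(1), m.end(1),
                                       {"user": m.group(1)}))
            elif ann_type == "Emoticon":
                anns.append(Annotation(ann_type, m.start(), m.end(),
                                       {"normalized": normalize_emoticon(m.group())}))
            elif ann_type == "URL":
                anns.append(Annotation(ann_type, m.start(), m.end(),
                                       {"string": m.group()}))
            else:
                # Hashtag span includes the leading '#'.
                anns.append(Annotation(ann_type, m.start(), m.end()))
    # Document order; for equal starts, longer spans come first.
    return sorted(anns, key=lambda a: (a.start, -a.end))
```

For example, toy_annotate("@alice loving #nlp :-) https://gate.ac.uk") yields a Tweet span plus one UserID, Hashtag, Emoticon and URL annotation, with the emoticon normalized to :).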
1,200 free requests / day
Larger batches £0.80 / CPU hour

Use this pipeline

Single documents

You can process up to 1,200 documents per day free of charge using the REST API, at an average rate of 2 documents/sec. Higher quotas are available to research users by arrangement; contact us for details.
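At an average of 2 documents/sec, the 1,200 free daily documents take roughly 10 minutes (1,200 / 2 = 600 seconds) to process. If you script against the free tier, a small client-side throttle can help you stay within both limits. The QuotaThrottle class below is a hypothetical sketch: it spaces calls at least half a second apart and counts only the calls made through one instance (the counter does not reset at midnight). The service's own quota enforcement may behave differently.

```python
import time


class QuotaThrottle:
    """Hypothetical client-side pacing for the free tier.

    Allows at most `per_day` calls per instance and spaces them at least
    1/`rate` seconds apart. The counter never resets on its own.
    """

    def __init__(self, per_day=1200, rate=2.0, clock=time.monotonic, sleep=time.sleep):
        self.per_day = per_day
        self.interval = 1.0 / rate
        self.clock = clock      # injectable for testing
        self.sleep = sleep
        self.used = 0
        self.next_ok = clock()  # earliest time the next call may start

    def acquire(self):
        """Block until the next call is allowed; raise once the quota is spent."""
        if self.used >= self.per_day:
            raise RuntimeError("free daily quota used up")
        now = self.clock()
        if now < self.next_ok:
            self.sleep(self.next_ok - now)
            now = self.next_ok
        self.next_ok = now + self.interval
        self.used += 1
```

Call acquire() on a single shared instance immediately before each request to the endpoint.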

The API endpoint for this pipeline is:

https://cloud-api.gate.ac.uk/process/twitie-posTagger-pipeline
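Below is a minimal Python sketch of calling the endpoint using only the standard library. The request format is our assumption rather than something stated on this page: a plain-text POST body, an Accept header of application/gate+json, HTTP Basic authentication with the API key ID and secret, and a JSON response whose top-level "entities" object maps each annotation type to a list of entries carrying "indices" plus the features listed above. Check the GATE Cloud API documentation before relying on any of these details. build_request and pos_tags are hypothetical helper names.

```python
import base64
import json
import urllib.request

ENDPOINT = "https://cloud-api.gate.ac.uk/process/twitie-posTagger-pipeline"


def build_request(text, key_id=None, key_secret=None):
    """Build (but do not send) a POST request carrying one tweet as plain text."""
    req = urllib.request.Request(
        ENDPOINT,
        data=text.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "text/plain; charset=UTF-8",
            "Accept": "application/gate+json",  # assumed JSON response format
        },
    )
    if key_id and key_secret:
        # Assumed: HTTP Basic auth with the API key ID and secret.
        creds = base64.b64encode(f"{key_id}:{key_secret}".encode("utf-8")).decode("ascii")
        req.add_header("Authorization", "Basic " + creds)
    return req


def pos_tags(response_body):
    """Return (token text, POS category) pairs in document order."""
    doc = json.loads(response_body)
    tokens = doc.get("entities", {}).get("Token", [])
    tokens = sorted(tokens, key=lambda t: t["indices"][0])
    # 'string' and 'category' are the Token features described above.
    return [(t.get("string"), t.get("category")) for t in tokens]
```

To actually send a request, pass the Request object to urllib.request.urlopen and give the decoded response body to pos_tags.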


Batches of documents

You can process any amount of data with this pipeline on a pay-as-you-go basis, at £0.80 per CPU hour. This can be data you upload yourself, data you collected from Twitter, or the results of a previous job.
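A rough batch cost is CPU hours × £0.80. The sketch below assumes charges are pro rata per CPU second (the actual billing granularity and any minimum charge are not stated here) and that you have measured an average CPU time per document on a small sample yourself. estimate_cost and estimate_batch_cost are hypothetical helpers; Decimal keeps the money arithmetic free of binary floating-point rounding.

```python
from decimal import Decimal, ROUND_HALF_UP

RATE_GBP_PER_CPU_HOUR = Decimal("0.80")


def estimate_cost(cpu_seconds: float) -> Decimal:
    """Estimated charge in GBP, assuming pro-rata billing per CPU second."""
    hours = Decimal(str(cpu_seconds)) / Decimal(3600)
    return (hours * RATE_GBP_PER_CPU_HOUR).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def estimate_batch_cost(n_docs: int, cpu_seconds_per_doc: float) -> Decimal:
    """Scale a measured per-document CPU time up to a whole batch."""
    return estimate_cost(n_docs * cpu_seconds_per_doc)
```

For example, assuming 10,000 tweets at a measured 0.5 CPU seconds each, the batch needs about 1.39 CPU hours, which comes to roughly £1.11.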
