Twitter user classification

A pipeline that attempts to classify the author of a tweet as a person, a location or an organization, based on clues in the "user" profile metadata embedded in the tweet. Within each broad "major type", a number of narrower "minor type" categories are also used.

Output is given as an annotation AuthorClassification spanning the whole document, and when Twitter JSON is selected as the output format the classification is also added as a property "gate_classification" to the top-level "user" object in the tweet.
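As a rough illustration of reading the result back, the sketch below pulls "gate_classification" out of the top-level "user" object of a tweet returned in Twitter JSON format. The function name and the sample document are hypothetical, and the shape of the classification value (shown here as an object with a "major_type" key) is an assumption, so the code treats it as opaque.

```python
import json

def get_author_classification(tweet_json: str):
    """Return user.gate_classification from a tweet, or None if it is absent."""
    tweet = json.loads(tweet_json)
    if not isinstance(tweet, dict):
        return None
    user = tweet.get("user")
    if not isinstance(user, dict):
        return None
    # The structure of this value is not specified here, so return it unchanged.
    return user.get("gate_classification")

# Hypothetical output document; the inner keys are assumptions for illustration only.
sample = json.dumps({
    "text": "Hello",
    "user": {
        "screen_name": "example",
        "gate_classification": {"major_type": "person"},
    },
})
classification = get_author_classification(sample)
```

Returning None rather than raising keeps the helper usable on a mixed stream where some documents were never classified.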

Note: because this pipeline operates on the user profile information and not on the actual text of the document, it can only run on documents that are in Twitter JSON format. It will not produce useful output on plain text or other non-JSON documents.
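Since non-JSON input produces no useful output, it may be worth filtering documents before sending them. The following is a minimal client-side sketch, not part of the service: the function name is hypothetical, and the check only confirms that the document parses as JSON and carries a top-level "user" object.

```python
import json

def looks_like_tweet_with_user(raw: str) -> bool:
    """Heuristic pre-check: valid JSON object with a top-level "user" object."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError:
        # Plain text and other non-JSON documents end up here.
        return False
    return isinstance(doc, dict) and isinstance(doc.get("user"), dict)
```

A document that passes this check is not guaranteed to be a well-formed tweet; the check only rules out input the pipeline clearly cannot use.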

1,200 free requests / day
Larger batches £0.80 / CPU hour

Use this pipeline

Single documents

You can process up to 1,200 documents per day free of charge using the REST API, at an average rate of 2 documents/sec. Higher quotas are available for research users by arrangement; contact us for details.
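One way to stay within these limits is to pace requests on the client. The sketch below is an assumption about how a caller might do this, not a description of how the service enforces its quota: the class name is hypothetical, the counter is a simple per-process budget that does not reset at a day boundary, and the clock and sleep functions are injectable so the logic can be tested without waiting.

```python
import time

class Throttle:
    """Client-side pacing: at most rate_per_sec calls, up to daily_quota in total."""

    def __init__(self, rate_per_sec=2.0, daily_quota=1200,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate_per_sec
        self.remaining = daily_quota
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may start

    def acquire(self):
        """Block until the next request may be sent; raise once the budget is spent."""
        if self.remaining <= 0:
            raise RuntimeError("daily quota exhausted")
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
        self.remaining -= 1
```

Calling acquire() immediately before each request spaces calls at least half a second apart with the default settings.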

The API endpoint for this pipeline is:

https://cloud-api.gate.ac.uk/process/sobigdata-user-classification
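A request to this endpoint could be assembled roughly as follows. Only the URL comes from this page. The rest are assumptions to check against the API documentation: that the service accepts a POST with the tweet JSON as the request body, that the API key is sent as HTTP Basic credentials, and that the default Content-Type shown is the right one for Twitter JSON. The function and parameter names are hypothetical. The sketch builds the request without sending it.

```python
import base64
import urllib.request

ENDPOINT = "https://cloud-api.gate.ac.uk/process/sobigdata-user-classification"

def build_request(tweet_json: bytes, key_id: str, key_password: str,
                  content_type: str = "application/json") -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one tweet.

    Assumptions: Basic auth with the API key, and a placeholder Content-Type;
    replace both with whatever the API documentation specifies.
    """
    token = base64.b64encode(f"{key_id}:{key_password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        ENDPOINT,
        data=tweet_json,
        method="POST",
        headers={
            "Content-Type": content_type,
            "Accept": "application/json",
            "Authorization": f"Basic {token}",
        },
    )

# To send the request: urllib.request.urlopen(build_request(body, key_id, key_password))
```

Keeping construction separate from sending makes it easy to inspect headers, or to swap in a different HTTP client, before any quota is consumed.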

Create API Key

Batches of documents

You can process any amount of data with this pipeline on a pay-as-you-go basis, for £0.80 per CPU hour. This can be data you upload yourself, data you collected from Twitter, or the results of a previous job.
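For budgeting, the charge is the hourly rate multiplied by the hours consumed. The sketch below assumes billing is strictly proportional to hours used, with no rounding or minimum charge, which this page does not confirm. The function name is hypothetical, and the number of hours a given data set needs has to be estimated separately.

```python
from decimal import Decimal

RATE_GBP_PER_HOUR = Decimal("0.80")  # rate quoted on this page

def estimate_batch_cost(hours) -> Decimal:
    """Estimated cost in GBP, assuming simple proportional billing."""
    return RATE_GBP_PER_HOUR * Decimal(str(hours))

# Example: a job estimated at 12.5 hours.
estimated = estimate_batch_cost("12.5")
```

Decimal is used instead of float so that currency amounts are not affected by binary rounding.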

Reserve a job