GATE Hate

A service that tags abusive utterances in text. Each annotation includes a "type" feature indicating the type of abuse, such as sexist or racist, and a "target" feature indicating whether the abuse was aimed at the addressee or some other party. It can be run on any English-language text.

It will also tag UK members of parliament for the 2015, 2017 and 2019 general elections, and candidates for the 2017 and 2019 elections. Where an individual has run for election or been elected multiple times, multiple "Politician" annotations will appear with different "minorType" features. In this way, a person's recent political career can be tracked. The current parliament is the 58th parliament, with previous parliaments counting down, so that MPs with a minorType feature of "mp55" are those that were MPs before the general election in 2015.
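The parliament numbering described above can be sketched as a small lookup. This is an illustrative helper, not part of the service: only the "mp55" value is confirmed by the text, the values "mp56" to "mp58" are assumed to follow the same pattern, and the function names are hypothetical.

```python
import re

# General election that returned each parliament.
# Assumption: minorType values other than "mp55" follow the same "mpNN" pattern.
ELECTION_YEAR = {55: 2010, 56: 2015, 57: 2017, 58: 2019}


def parliament_of(minor_type):
    """Return (parliament_number, election_year) for values like 'mp55', else None."""
    match = re.fullmatch(r"mp(\d+)", minor_type)
    if match is None:
        # For example a candidate value, whose format is not documented here.
        return None
    number = int(match.group(1))
    return number, ELECTION_YEAR.get(number)


def career(minor_types):
    """Sort the parliaments found across one person's Politician annotations."""
    found = [parliament_of(value) for value in minor_types]
    return sorted(entry for entry in found if entry is not None)
```

Calling `career` with the minorType values collected from all "Politician" annotations for one person gives the parliaments they sat in, oldest first.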

The service will also tag a range of politically relevant topics, as well as entities such as persons, locations and organizations, and Twitter entities such as hashtags and user mentions. It is designed to run on tweets in the original Twitter JSON input format, for which it will also produce metadata such as whether the tweet is a reply or a retweet. Upload your own or harvest some with our Twitter Collector. However, it can be run on any text.

Note that this is an updated and more generic version of our original GATE Hate for Politics app. It should be used in preference to that service unless you want to replicate previously published work, as improvements will only be included in this version going forward.

Default annotations
:Abuse Abusive phrases. Includes a feature, "type", indicating the type of abuse, such as racist, religious etc., and a "target" feature indicating whether the abuse is intended for the addressee or some other party.
:Topic Mentions of topics relevant to UK politics, based largely on the topic classification used on gov.uk.
:Politician Recognised UK politicians such as MPs, parliamentary candidates and other significant individuals such as party leaders who are not MPs. MP mentions include a feature distinguishing them by parliament; for example, the parliament sitting as of 2020 is the 58th. Where an individual has been elected to more than one parliament, they receive one annotation per parliament.
:Party UK political parties.
:Hashtag From the original tweet, if it was run on tweets.
:UserID From the original tweet, if it was run on tweets.
:URL From the original tweet, if it was run on tweets.
:Organization Entities found by GATE's ANNIE named entity recogniser.
:Person Entities found by GATE's ANNIE named entity recogniser.
:Address Entities found by GATE's ANNIE named entity recogniser.
:Date Entities found by GATE's ANNIE named entity recogniser.
:Location Entities found by GATE's ANNIE named entity recogniser.
:Money Entities found by GATE's ANNIE named entity recogniser.
:Percent Entities found by GATE's ANNIE named entity recogniser.
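As a rough illustration of how the "type" and "target" features of Abuse annotations might be read from a JSON response, here is a minimal sketch. The response shape (a "text" string plus an "entities" map keyed by annotation type, each entry holding "indices" and its features) is an assumption about the output format, and the sample document and its feature values are made-up placeholders, not real service output.

```python
def abuse_annotations(response):
    """Yield (matched_text, type, target) for each Abuse annotation.

    Assumes the response looks like:
    {"text": "...", "entities": {"Abuse": [{"indices": [start, end], ...}]}}
    """
    text = response["text"]
    for annotation in response.get("entities", {}).get("Abuse", []):
        start, end = annotation["indices"]
        yield text[start:end], annotation.get("type"), annotation.get("target")


# Made-up sample; the feature values are placeholders, not documented values.
sample = {
    "text": "@someone you are an idiot",
    "entities": {
        "Abuse": [
            {
                "indices": [20, 25],
                "type": "placeholder-type",
                "target": "placeholder-target",
            }
        ]
    },
}

for phrase, abuse_type, target in abuse_annotations(sample):
    print(phrase, abuse_type, target)
```

The same loop works for any other annotation type in the list above by changing the key looked up under "entities".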

Additional annotations available if selected
:AbusePhrase Abusive phrases that may, or may not, contribute to the creation of an Abuse annotation.
:Sentence Sentences
:SentenceSentiment Sentences which convey sentiment. These have multiple features detailing the relevant words, polarity etc.
:Tweet Original tweet data
:TweetSegment When a JSON object from Twitter contains multiple tweets, for example with a quote tweet, each individual tweet gets a TweetSegment annotation.
:Author Information on the Author of a Tweet.
:ReplyTo Information on the Twitter user being replied to.
:RetweetedAuthor Information on the Twitter user being retweeted.
:SlurLookup A slur term found in the document, but which may not have contributed to the detected abuse.
:OffensiveLookup An offensive term found in the document, but which may not have contributed to the detected abuse.
:SensitiveLookup A sensitive term found in the document, but which may not have contributed to the detected abuse.

When the input is Twitter JSON and the output is saved as GATE XML or sent to Mímir, the following additional information is extracted from the tweet metadata and made available as document-level features:

author
The screen name of the Tweet author
tweet_id
The ID of this tweet
tweet_uri
URL of the tweet in the form https://twitter.com/{author}/status/{id}
tweet_kind
original, retweet or reply
retweet_of_screen_name
If it's a retweet, who is it a retweet of?
retweet_of_status_id
If it's a retweet, the status ID of the tweet it's a retweet of.
in_reply_to_screen_name
If it's a reply, who is it a reply to?
in_reply_to_status_id
If it's a reply, the status ID of the tweet it's a reply to.
timestamp
ISO8601-formatted representation of the tweet "created_at" timestamp
hour_timestamp and minute_timestamp
Numeric representation of the timestamp to hour (YYYYMMDDHH) or minute (YYYYMMDDHHmm) granularity
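To make the relationship between these features and the raw tweet concrete, the sketch below derives them from a Twitter v1.1-style JSON object. It is an approximation under stated assumptions, not the service's own code: the input field names (`created_at`, `id_str`, `user.screen_name`, `retweeted_status`, `in_reply_to_status_id_str`, `in_reply_to_screen_name`) are the standard Twitter ones, while the precedence of retweet over reply and the exact ISO 8601 rendering are guesses.

```python
from datetime import datetime


def document_features(tweet):
    """Approximate the document-level features for one tweet (a dict)."""
    # Twitter's created_at looks like "Wed Oct 10 20:19:24 +0000 2018".
    created = datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")
    author = tweet["user"]["screen_name"]
    tweet_id = tweet["id_str"]
    features = {
        "author": author,
        "tweet_id": tweet_id,
        "tweet_uri": f"https://twitter.com/{author}/status/{tweet_id}",
        "tweet_kind": "original",
        "timestamp": created.isoformat(),
        "hour_timestamp": int(created.strftime("%Y%m%d%H")),
        "minute_timestamp": int(created.strftime("%Y%m%d%H%M")),
    }
    retweeted = tweet.get("retweeted_status")
    if retweeted is not None:
        # Assumption: a retweet is classified as such even if it is also a reply.
        features["tweet_kind"] = "retweet"
        features["retweet_of_screen_name"] = retweeted["user"]["screen_name"]
        features["retweet_of_status_id"] = retweeted["id_str"]
    elif tweet.get("in_reply_to_status_id_str"):
        features["tweet_kind"] = "reply"
        features["in_reply_to_screen_name"] = tweet.get("in_reply_to_screen_name")
        features["in_reply_to_status_id"] = tweet["in_reply_to_status_id_str"]
    return features
```

For a tweet created at "Wed Oct 10 20:19:24 +0000 2018", this yields an hour_timestamp of 2018101020 and a minute_timestamp of 201810102019, which makes it easy to group tweets by hour or minute with simple numeric comparisons.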

1,200 free requests / day
Larger batches £0.80 / CPU hour

Use this pipeline

Single documents

You can process up to 1,200 documents per day free of charge using the REST API, at an average rate of 2 documents/sec. Higher quotas are available for research users by arrangement; contact us for details.
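A client can stay within these limits by pacing its own requests. The class below is a minimal client-side sketch, not something the service provides: the name `Throttle` and its parameters are hypothetical, it spaces calls evenly (stricter than the "average rate" wording requires), and it does not reset the counter at the day boundary.

```python
import time


class Throttle:
    """Space out calls to at most `per_second` and stop after `daily_quota` calls."""

    def __init__(self, per_second=2.0, daily_quota=1200,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / per_second
        self.remaining = daily_quota
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next request may be sent, or raise if the quota is spent."""
        if self.remaining <= 0:
            raise RuntimeError("daily quota used up")
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
        self.remaining -= 1
```

Call `wait()` immediately before each request. The clock and sleep functions are injectable so the pacing logic can be checked without real delays.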

The API endpoint for this pipeline is:

https://cloud-api.gate.ac.uk/process/gate-hate-generic
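A request to this endpoint might be assembled as in the sketch below. Only the URL comes from this page. The use of POST with a plain-text body, the `Accept: application/json` header, and HTTP Basic authentication with an API key ID and password are assumptions that should be checked against the API documentation, and `build_request` is a hypothetical helper name.

```python
import base64
import urllib.request

ENDPOINT = "https://cloud-api.gate.ac.uk/process/gate-hate-generic"


def build_request(text, key_id=None, key_password=None):
    """Build (but do not send) a POST request carrying one plain-text document."""
    headers = {
        # Assumption: plain text in, JSON annotations out.
        "Content-Type": "text/plain; charset=utf-8",
        "Accept": "application/json",
    }
    if key_id and key_password:
        # Assumption: the API key is supplied via HTTP Basic authentication.
        raw = f"{key_id}:{key_password}".encode("utf-8")
        headers["Authorization"] = "Basic " + base64.b64encode(raw).decode("ascii")
    return urllib.request.Request(
        ENDPOINT, data=text.encode("utf-8"), headers=headers, method="POST"
    )
```

To send it, pass the result to `urllib.request.urlopen` and decode the response body with `json.load`. Keeping request construction separate from sending makes it easy to inspect the headers before using up any quota.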

Batches of documents

You can process any amount of data with this pipeline on a pay-as-you-go basis, for £0.80 per CPU hour. This can be data you upload yourself, data you collected from Twitter, or the results of a previous job.
