GATE Hate
A service that tags abusive utterances in English-language text. Each abuse annotation carries a "type" feature indicating the kind of abuse (sexist, racist, etc.) and a "target" feature indicating whether the abuse was aimed at the addressee or at some other party.
It will also tag UK members of parliament elected at the 2015, 2017 and 2019 general elections, and candidates for the 2017 and 2019 elections. Where an individual has run for election or been elected multiple times, multiple "Politician" annotations will appear with different "minorType" features. In this way, a person's recent political career can be tracked. Parliaments are identified by number: the current parliament is the 58th, and earlier parliaments count down from there, so a minorType feature of "mp55" marks someone who was an MP before the 2015 general election.
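The numbering scheme can be sketched in a few lines of Python. This is an illustration only: the helper names are hypothetical, the "mp" + number pattern is taken from the "mp55" example above, and the table mapping parliaments 56 to 58 onto election years is inferred from the "counting down" description rather than from any published list of minorType values.

```python
import re

# Assumption: the general election that opened each parliament, inferred from
# the text (58th = current parliament, numbers count down from there).
ELECTION_YEAR = {56: 2015, 57: 2017, 58: 2019}


def parliament_number(minor_type):
    """Return the parliament number from a value such as 'mp55', else None."""
    match = re.fullmatch(r"mp(\d+)", minor_type)
    return int(match.group(1)) if match else None


def describe(minor_type):
    """Give a short human-readable reading of an MP minorType value."""
    number = parliament_number(minor_type)
    if number is None:
        return None  # not an "mpNN" value, e.g. a candidate annotation
    if number in ELECTION_YEAR:
        year = ELECTION_YEAR[number]
        return f"MP in parliament {number}, elected at the {year} general election"
    if number < 56:
        return f"MP in parliament {number}, which sat before the 2015 general election"
    return f"MP in parliament {number}"
```

Collecting `parliament_number` over all "Politician" annotations for one person gives the set of parliaments they sat in.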
The service will also tag a range of politically relevant topics, entities such as persons, locations and organizations, and Twitter entities such as hashtags and user mentions. It is designed to run on tweets in the original Twitter JSON input format, for which it also produces metadata such as whether the tweet is a reply or a retweet. You can upload your own tweets or harvest some with our Twitter Collector. The service can nevertheless be run on any text.
Note that this is an updated and more generic version of our original GATE Hate for Politics app. It should be used in preference to that service unless you want to replicate previously published work, as improvements will only be included in this version going forward.
Default annotations | |
:Abuse | Abusive phrases. Includes a feature, "type", indicating the type of abuse, such as racist, religious etc., and a "target" feature indicating whether the abuse is intended for the addressee or some other party. |
:Topic | Mentions of topics relevant to UK politics, based largely on the topic classification used on gov.uk. |
:Politician | Recognised UK politicians such as MPs, parliamentary candidates and other significant individuals such as party leaders who are not MPs. MP mentions include a feature distinguishing them by parliament, e.g. as of 2020 we are in the 58th parliament. Where an individual has been elected to more than one parliament, they receive multiple annotations indicating this. |
:Party | UK political parties. |
:Hashtag | From the original tweet, if it was run on tweets. |
:UserID | From the original tweet, if it was run on tweets. |
:URL | From the original tweet, if it was run on tweets. |
:Organization | Entities found by GATE's ANNIE named entity recogniser. |
:Person | Entities found by GATE's ANNIE named entity recogniser. |
:Address | Entities found by GATE's ANNIE named entity recogniser. |
:Date | Entities found by GATE's ANNIE named entity recogniser. |
:Location | Entities found by GATE's ANNIE named entity recogniser. |
:Money | Entities found by GATE's ANNIE named entity recogniser. |
:Percent | Entities found by GATE's ANNIE named entity recogniser. |
Additional annotations available if selected | |
:AbusePhrase | Abusive phrases that may, or may not, contribute to the creation of an Abuse annotation. |
:Sentence | Sentences |
:SentenceSentiment | Sentences which convey sentiment. These have multiple features detailing the relevant words, polarity etc. |
:Tweet | Original tweet data |
:TweetSegment | When a JSON object from Twitter contains multiple tweets, for example with a quote tweet, each individual tweet gets a TweetSegment annotation. |
:Author | Information on the Author of a Tweet. |
:ReplyTo | Information on the Twitter user being replied to. |
:RetweetedAuthor | Information on the Twitter user being retweeted. |
:SlurLookup | A slur term found in the document, but which may not have contributed to the detected abuse. |
:OffensiveLookup | An offensive term found in the document, but which may not have contributed to the detected abuse. |
:SensitiveLookup | A sensitive term found in the document, but which may not have contributed to the detected abuse. |
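As a rough illustration of how the "type" and "target" features on Abuse annotations might be consumed, the sketch below summarises a processed document in Python. The response shape is an assumption, not something stated on this page: a dict with a "text" string and an "entities" map from annotation type to a list of objects, each holding "indices" (start and end offsets) plus its features. The function names and the sample feature values are hypothetical.

```python
from collections import Counter


def abuse_annotations(response):
    """Return the list of Abuse annotations, or an empty list if none were found."""
    # Assumed shape: {"text": "...", "entities": {"Abuse": [{"indices": [s, e], ...}]}}
    return response.get("entities", {}).get("Abuse", [])


def abuse_summary(response):
    """Count Abuse annotations by their (type, target) feature values."""
    return Counter((a.get("type"), a.get("target")) for a in abuse_annotations(response))


def abusive_spans(response):
    """Return the stretches of text covered by Abuse annotations."""
    text = response["text"]
    return [text[a["indices"][0]:a["indices"][1]] for a in abuse_annotations(response)]
```

Grouping on the pair of features rather than on fixed values keeps the sketch independent of the exact strings the service emits.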
When the input is Twitter JSON and the output is saved as GATE XML or sent to Mímir, the following additional information is extracted from the tweet metadata and made available as document-level features:
- author: the screen name of the tweet author
- tweet_id: the ID of this tweet
- tweet_uri: URL of the tweet in the form https://twitter.com/{author}/status/{id}
- tweet_kind: original, retweet or reply
- retweet_of_screen_name: if it's a retweet, the screen name of the user being retweeted
- retweet_of_status_id: if it's a retweet, the status ID of the tweet it's a retweet of
- in_reply_to_screen_name: if it's a reply, the screen name of the user being replied to
- in_reply_to_status_id: if it's a reply, the status ID of the tweet it's a reply to
- timestamp: ISO8601-formatted representation of the tweet "created_at" timestamp
- hour_timestamp and minute_timestamp: numeric representation of the timestamp to hour (YYYYMMDDHH) or minute (YYYYMMDDHHmm) granularity
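To make the derived features concrete, here is a minimal Python sketch that computes several of them from a tweet in the classic (v1.1) Twitter JSON format. It is not the service's own code. The function name is hypothetical, and two behaviours are assumptions: that timestamps are normalised to UTC, and that a retweet is classified as "retweet" even if it also carries reply fields.

```python
from datetime import datetime, timezone


def tweet_features(tweet):
    """Derive some of the document features listed above from a v1.1 tweet dict."""
    author = tweet["user"]["screen_name"]
    tweet_id = tweet["id_str"]

    # v1.1 "created_at" values look like "Wed Oct 10 20:19:24 +0000 2018".
    created = datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")
    created = created.astimezone(timezone.utc)  # assumption: features use UTC

    # Assumption: retweet takes precedence over reply when both apply.
    if "retweeted_status" in tweet:
        kind = "retweet"
    elif tweet.get("in_reply_to_status_id_str"):
        kind = "reply"
    else:
        kind = "original"

    return {
        "author": author,
        "tweet_id": tweet_id,
        "tweet_uri": f"https://twitter.com/{author}/status/{tweet_id}",
        "tweet_kind": kind,
        "timestamp": created.isoformat(),
        "hour_timestamp": int(created.strftime("%Y%m%d%H")),
        "minute_timestamp": int(created.strftime("%Y%m%d%H%M")),
    }
```

The numeric hour and minute forms sort chronologically as plain integers, which makes them convenient for bucketing tweets by time.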
Use this pipeline
You can process up to 1,200 documents per day free of charge using the REST API, at an average rate of 2 documents/sec. Higher quotas are available for research users by arrangement; contact us for details.
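A client that wants to stay inside the free quota could pace its own requests. The Python sketch below is one possible approach under stated assumptions: the `Throttle` class is hypothetical, it treats the 2 documents/sec average as a minimum gap of 0.5 seconds between calls, and it counts documents per process run rather than tracking when the daily allowance resets.

```python
import time


class Throttle:
    """Space out calls to at most `rate` per second and stop at a daily cap."""

    def __init__(self, rate=2.0, daily_limit=1200,
                 clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate      # minimum gap between documents
        self.daily_limit = daily_limit  # free allowance quoted above
        self.clock = clock              # injectable for testing
        self.sleep = sleep
        self.sent = 0
        self.next_slot = None           # earliest time the next call may start

    def wait(self):
        """Block until the next document may be sent, or raise if the cap is hit."""
        if self.sent >= self.daily_limit:
            raise RuntimeError("daily free quota used up")
        now = self.clock()
        if self.next_slot is not None and now < self.next_slot:
            self.sleep(self.next_slot - now)
            now = self.next_slot
        self.next_slot = now + self.interval
        self.sent += 1
```

Call `throttle.wait()` immediately before each request to the pipeline. At this pace the full daily allowance of 1,200 documents takes about ten minutes to submit.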
The API endpoint for this pipeline is:
You can process any amount of data with this pipeline on a pay-as-you-go basis, for GBP0.80 per hour. This can be data you upload yourself, data you collected from Twitter, or the results of a previous job.