Political Abuse Monitor
A service that tags abusive utterances in any text. Each abusive utterance carries a "type" feature indicating the type of abuse, if any (for example sexist or racist), and a "target" feature indicating whether the abuse was aimed at the addressee or at some other party. It can be run on any English-language text. You can also check which words or phrases were deemed potentially abusive via the SlurLookup, SensitiveLookup and OffensiveLookup annotations.
It will also tag UK members of parliament for the 2015, 2017 and 2019 general elections, candidates for the 2017 and 2019 elections, and members of the Irish Dáil Éireann from 20th February 2020. Where an individual has stood for election or been elected more than once, multiple "Politician" annotations will appear, each with a different "minorType" feature, so that a person's recent political career can be tracked. Parliaments are numbered in sequence: at the time of writing (2020) the current UK parliament is the 58th, and earlier parliaments count down from there, so MPs with a minorType feature of "mp55" are those who sat in the parliament that preceded the 2015 general election.
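To show how the parliament numbering can be used to reconstruct a career, here is a minimal sketch that groups MP minorType values by person. It assumes each Politician annotation has been exported as a dict with a hypothetical "name" key alongside its "minorType" feature, and that MP values follow the "mpNN" pattern of the "mp55" example; other minorType values (for instance those for candidates) are not documented here, so the sketch simply skips them. The election years are those of the UK general elections that returned the 55th to 58th parliaments.

```python
import re
from collections import defaultdict

# UK general elections that returned each parliament (55th to 58th).
ELECTION_YEAR = {55: 2010, 56: 2015, 57: 2017, 58: 2019}

# Assumed format of MP minorType values, based on the "mp55" example.
MP_PATTERN = re.compile(r"^mp(\d+)$")


def parliaments_by_person(annotations):
    """Group MP parliament numbers by person.

    `annotations` is a list of dicts with a hypothetical "name" key and the
    "minorType" feature. Values that do not match "mpNN" are skipped.
    """
    career = defaultdict(set)
    for ann in annotations:
        match = MP_PATTERN.match(ann.get("minorType", ""))
        if match:
            career[ann["name"]].add(int(match.group(1)))
    # Sort each person's parliaments into chronological order.
    return {name: sorted(nums) for name, nums in career.items()}


def describe(parliaments):
    """Label each parliament number with the election year that returned it."""
    parts = []
    for n in parliaments:
        year = ELECTION_YEAR.get(n)
        label = f"elected {year}" if year else "election year not listed"
        parts.append(f"parliament {n} ({label})")
    return ", ".join(parts)


if __name__ == "__main__":
    sample = [
        {"name": "Example MP", "minorType": "mp57"},
        {"name": "Example MP", "minorType": "mp55"},
        {"name": "Example MP", "minorType": "mp58"},
    ]
    for person, nums in parliaments_by_person(sample).items():
        print(person, "->", describe(nums))
```

Using a set per person means duplicate annotations for the same parliament (for example, several mentions of one MP in a document) collapse into a single entry.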
The service will also tag a range of politically relevant topics, as well as entities such as persons, locations and organizations, and Twitter entities such as hashtags and user mentions. It is designed to run on tweets in the original Twitter JSON input format, from which it also extracts metadata such as whether the tweet is a reply or a retweet. You can upload your own tweets or harvest some with our Twitter Collector, but the pipeline can be run on any text.
Default annotations | |
:Abuse | Abusive phrases. Includes a "type" feature indicating the type of abuse, such as racist or religious, and a "target" feature indicating whether the abuse is aimed at the addressee or at some other party. |
:Topic | Mentions of topics relevant to politics, based largely on the topic classification used on gov.uk and therefore UK-focused. |
:Politician | Recognised UK politicians: MPs, parliamentary candidates and other significant individuals, such as party leaders, who are not MPs. MP mentions carry a feature identifying the parliament (as of 2020, the current parliament is the 58th). Where an individual has been elected to more than one parliament, they receive one annotation per parliament. |
:Party | UK political parties. |
:MP | The subset of politicians who are UK MPs. |
:DailMP | The subset of politicians who are members of the Irish Dáil Éireann. |
:Hashtag | Extracted from the original tweet, when the input is tweets. |
:UserID | Extracted from the original tweet, when the input is tweets. |
:URL | Extracted from the original tweet, when the input is tweets. |
Additional annotations available if selected | |
:Sentence | Sentences |
:Tweet | Original tweet data |
:SlurLookup | Potential slurs |
:SensitiveLookup | Sensitive terms which might be related to abuse |
:OffensiveLookup | Offensive terms |
:Organization | Entities found by GATE's ANNIE named entity recogniser. |
:Person | Entities found by GATE's ANNIE named entity recogniser. |
:Address | Entities found by GATE's ANNIE named entity recogniser. |
:Date | Entities found by GATE's ANNIE named entity recogniser. |
:Location | Entities found by GATE's ANNIE named entity recogniser. |
:Money | Entities found by GATE's ANNIE named entity recogniser. |
:Percent | Entities found by GATE's ANNIE named entity recogniser. |
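The annotations above can be pulled out of a processed document with a few lines of code. The sketch below assumes the result is returned in a GATE-style JSON layout: a top-level "text" string plus an "entities" object mapping each annotation type to a list of annotations, where each annotation has an "indices" pair of character offsets [start, end] and its features as sibling keys. That layout, and the exact feature values, are assumptions to check against a real response rather than a documented contract.

```python
import json

LOOKUP_TYPES = ("SlurLookup", "SensitiveLookup", "OffensiveLookup")


def abuse_spans(response_text):
    """Return (covered text, type, target) for each Abuse annotation.

    Assumes a GATE-style JSON document with "text" and an "entities" object
    mapping annotation types to lists of {"indices": [start, end], ...features}.
    """
    doc = json.loads(response_text)
    text = doc["text"]
    results = []
    for ann in doc.get("entities", {}).get("Abuse", []):
        start, end = ann["indices"]
        results.append((text[start:end], ann.get("type"), ann.get("target")))
    return results


def lookup_terms(response_text, kinds=LOOKUP_TYPES):
    """Collect the words matched by the optional lookup annotation types."""
    doc = json.loads(response_text)
    text = doc["text"]
    entities = doc.get("entities", {})
    found = {}
    for kind in kinds:
        # Types that were not selected (or not found) yield an empty list.
        found[kind] = [text[a["indices"][0]:a["indices"][1]] for a in entities.get(kind, [])]
    return found
```

Keeping the lookup types as a parameter makes it easy to restrict the check to, say, SlurLookup alone, since the lookup annotations are only present when selected.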
When the input is Twitter JSON and the output is saved as GATE XML or sent to Mímir, the following additional information is extracted from the tweet metadata and made available as document-level features:
- author: The screen name of the tweet author
- tweet_id: The ID of this tweet
- tweet_uri: The URL of the tweet, in the form https://twitter.com/{author}/status/{id}
- tweet_kind: original, retweet or reply
- retweet_of_screen_name: If the tweet is a retweet, the screen name of the author of the original tweet
- retweet_of_status_id: If the tweet is a retweet, the status ID of the original tweet
- in_reply_to_screen_name: If the tweet is a reply, the screen name of the user it replies to
- in_reply_to_status_id: If the tweet is a reply, the status ID of the tweet it replies to
- timestamp: ISO 8601 representation of the tweet's "created_at" timestamp
- hour_timestamp and minute_timestamp: Numeric representation of the timestamp to hour (YYYYMMDDHH) or minute (YYYYMMDDHHmm) granularity
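To make these definitions concrete, the sketch below derives equivalent values from a Twitter v1.1 tweet object using its standard fields (user.screen_name, id_str, created_at, retweeted_status, in_reply_to_status_id_str, in_reply_to_screen_name). It illustrates what each feature means; it is not the pipeline's own implementation, and details such as whether the numeric timestamps are stored as integers are assumptions.

```python
from datetime import datetime, timezone

# Twitter v1.1 "created_at" format, e.g. "Wed Oct 10 20:19:24 +0000 2018".
TWITTER_TIME = "%a %b %d %H:%M:%S %z %Y"


def tweet_features(tweet):
    """Approximate the document-level features from a Twitter v1.1 tweet dict."""
    author = tweet["user"]["screen_name"]
    tweet_id = tweet["id_str"]
    created = datetime.strptime(tweet["created_at"], TWITTER_TIME).astimezone(timezone.utc)

    features = {
        "author": author,
        "tweet_id": tweet_id,
        "tweet_uri": f"https://twitter.com/{author}/status/{tweet_id}",
        "timestamp": created.isoformat(),
        # Numeric timestamps at hour and minute granularity (assumed to be ints).
        "hour_timestamp": int(created.strftime("%Y%m%d%H")),
        "minute_timestamp": int(created.strftime("%Y%m%d%H%M")),
    }

    original = tweet.get("retweeted_status")
    if original:
        features["tweet_kind"] = "retweet"
        features["retweet_of_screen_name"] = original["user"]["screen_name"]
        features["retweet_of_status_id"] = original["id_str"]
    elif tweet.get("in_reply_to_status_id_str"):
        features["tweet_kind"] = "reply"
        features["in_reply_to_screen_name"] = tweet.get("in_reply_to_screen_name")
        features["in_reply_to_status_id"] = tweet["in_reply_to_status_id_str"]
    else:
        features["tweet_kind"] = "original"
    return features
```

The retweet check comes first because, in the v1.1 format, a retweet's own reply fields are normally empty while the reply information, if any, sits on the embedded original tweet.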
Use this pipeline
You can process up to 1,200 documents per day free of charge using the REST API, at an average rate of 2 documents/sec. Higher quotas are available for research users by arrangement; contact us for details.
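At 2 documents/sec, the full free quota of 1,200 documents represents about 600 seconds (10 minutes) of processing. A client that wants to stay within both figures could pace its submissions as in the sketch below. Whether the API enforces the rate as a hard limit is not stated here, so treat this as a courtesy throttle; the clock and sleep functions are injectable only so the pacing can be tested without waiting.

```python
import time
from itertools import islice

DAILY_QUOTA = 1200   # free documents per day (from this page)
RATE_PER_SEC = 2.0   # average processing rate (from this page)


def paced(documents, quota=DAILY_QUOTA, rate=RATE_PER_SEC,
          clock=time.monotonic, sleep=time.sleep):
    """Yield at most `quota` documents, spaced at least 1/rate seconds apart."""
    interval = 1.0 / rate
    next_slot = clock()
    # islice stops before pulling documents beyond the daily quota.
    for doc in islice(documents, quota):
        wait = next_slot - clock()
        if wait > 0:
            sleep(wait)
        yield doc
        next_slot = max(next_slot, clock()) + interval
```

Each yielded document would then be sent to the pipeline's API endpoint by whatever HTTP client you use; documents left over once the quota is reached can be queued for the next day.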
The API endpoint for this pipeline is:
You can process any amount of data with this pipeline on a pay-as-you-go basis, for £0.80 per hour. This can be data you upload yourself, data you collected from Twitter, or the results of a previous job.
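As a worked example of the pay-as-you-go price, a job that runs for 2.5 hours would cost 2.5 × £0.80 = £2.00 if billing is strictly pro rata. How partial hours are charged is not stated here, so the sketch below offers both a pro-rata estimate and a rounded-up-to-whole-hours estimate (£2.40 for the same job); both billing modes are assumptions.

```python
import math
from decimal import Decimal, ROUND_HALF_UP

HOURLY_RATE_GBP = Decimal("0.80")  # pay-as-you-go price from this page


def estimate_cost(hours, round_up_to_whole_hours=False):
    """Estimate a job's cost in GBP.

    Billing granularity is an assumption: either pro rata, or rounded up
    to whole hours. Check the actual billing terms before relying on this.
    """
    billed = Decimal(str(hours))
    if round_up_to_whole_hours:
        billed = Decimal(math.ceil(billed))
    # Round to whole pence.
    return (billed * HOURLY_RATE_GBP).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

Decimal is used rather than float so that monetary amounts such as 0.80 are represented exactly.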