Rumour veracity classifier
User-generated content such as tweets often makes claims that are unsubstantiated and possibly untrue. This service attempts to classify whether a text discusses a rumour that is likely to be true, likely to be false, or unverified (which also covers cases where the classification is unclear). Our approach uses only the tweet content, which it passes through LSTM units that learn to distinguish between the three classes we aim to predict (true, false or unverified). The distinctive part of our approach is that, before the tweet reaches the LSTM layer, an attention layer looks within the tweet for recurring information that is typically used to spread rumours and adjusts the input accordingly: words carrying useful information are kept as they are, while the contribution of other words is downgraded. We evaluated our approach on the RumourEval 2017 shared task test data and achieved over 60% accuracy, which is currently the state-of-the-art performance for this task.
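The attention-then-LSTM pipeline described above can be sketched as a forward pass in Python with NumPy. This is a minimal illustration under stated assumptions, not the service's actual implementation: the per-token sigmoid gate standing in for the attention layer, the layer sizes, the random untrained weights, and the names `AttentionLSTMSketch`, `attend`, `lstm` and `predict` are all hypothetical. Reporting the highest class probability as the confidence score is likewise an assumption.

```python
import numpy as np

# Class labels as used in the rumour_label feature.
LABELS = ["true", "false", "unverified"]


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    e = np.exp(x - np.max(x))  # subtract max for numerical stability
    return e / e.sum()


class AttentionLSTMSketch:
    """Hypothetical, untrained attention + LSTM classifier (illustration only)."""

    def __init__(self, emb_dim, hidden_dim, n_classes=3, seed=0):
        rng = np.random.default_rng(seed)
        self.hidden_dim = hidden_dim
        # Assumed attention parameters: one relevance score per token.
        self.w_att = rng.normal(scale=0.1, size=emb_dim)
        # LSTM weights for the input, forget, cell and output gates, stacked.
        self.W = rng.normal(scale=0.1, size=(4 * hidden_dim, emb_dim))
        self.U = rng.normal(scale=0.1, size=(4 * hidden_dim, hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        # Output layer: final hidden state -> class scores.
        self.W_out = rng.normal(scale=0.1, size=(n_classes, hidden_dim))
        self.b_out = np.zeros(n_classes)

    def attend(self, X):
        # Scale each token embedding by a weight in (0, 1): tokens with a
        # high score pass nearly unchanged, the rest are scaled down.
        a = sigmoid(X @ self.w_att)
        return X * a[:, None], a

    def lstm(self, X):
        H = self.hidden_dim
        h = np.zeros(H)
        c = np.zeros(H)
        for x_t in X:  # one step per token
            z = self.W @ x_t + self.U @ h + self.b
            i = sigmoid(z[:H])
            f = sigmoid(z[H:2 * H])
            g = np.tanh(z[2 * H:3 * H])
            o = sigmoid(z[3 * H:])
            c = f * c + i * g
            h = o * np.tanh(c)
        return h  # final hidden state summarises the tweet

    def predict(self, X):
        X_att, weights = self.attend(X)
        probs = softmax(self.W_out @ self.lstm(X_att) + self.b_out)
        k = int(np.argmax(probs))
        return LABELS[k], float(probs[k]), weights


if __name__ == "__main__":
    # Stand-in data: 12 tokens with 50-dimensional random embeddings.
    tweet = np.random.default_rng(1).normal(size=(12, 50))
    label, confidence, weights = AttentionLSTMSketch(50, 32).predict(tweet)
    print(label, round(confidence, 3))
```

Because the weights here are random, the predicted label carries no meaning; in a trained model the attention and LSTM parameters would typically be learned jointly from labelled tweets such as the RumourEval data.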
| Annotation type | Description |
|---|---|
| :Veracity | Annotation spanning the whole text, with features "rumour_label" (the raw label "true", "false" or "unverified" from the classifier), "status" (a more human-oriented version of the rumour_label) and "confidence" (the confidence score) |
Use this pipeline
You can process up to 1,200 documents per day free of charge using the REST API, at an average rate of 2 documents/sec. Higher quotas are available to research users by arrangement; contact us for details.
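At the stated average of 2 documents per second, the free daily quota of 1,200 documents corresponds to 1,200 / 2 = 600 seconds, or 10 minutes, of processing. The sketch below shows one way a client might pace its calls and stop at the daily quota. It is a minimal illustration only: `process_one` is a hypothetical callable standing in for whatever client code sends a single document to the REST API, and treating 2 documents/sec as a target spacing between calls is an assumption, not a documented limit.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

DAILY_FREE_QUOTA = 1200  # documents per day on the free tier
RATE_PER_SEC = 2.0       # average rate quoted for the REST API


def seconds_to_use_quota(quota: int = DAILY_FREE_QUOTA,
                         rate: float = RATE_PER_SEC) -> float:
    """Time needed to process `quota` documents at `rate` documents/second."""
    return quota / rate


def paced_process(docs: Iterable[T],
                  process_one: Callable[[T], R],  # hypothetical API call wrapper
                  rate: float = RATE_PER_SEC,
                  quota: int = DAILY_FREE_QUOTA,
                  sleep: Callable[[float], None] = time.sleep) -> List[R]:
    """Process at most `quota` docs, spacing calls about 1/rate seconds apart."""
    interval = 1.0 / rate
    results: List[R] = []
    for n, doc in enumerate(docs):
        if n >= quota:
            break  # stop before exceeding the daily free allowance
        start = time.monotonic()
        results.append(process_one(doc))
        elapsed = time.monotonic() - start
        if elapsed < interval:
            sleep(interval - elapsed)  # wait out the rest of this call's slot
    return results
```

For example, `seconds_to_use_quota()` returns `600.0`. The `sleep` parameter is injectable so the pacing can be tested without real delays.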
The API endpoint for this pipeline is:
You can process any amount of data with this pipeline on a pay-as-you-go basis, for GBP 0.80 per hour. This can be data you upload yourself, data you collected from Twitter, or the results of a previous job.
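A rough pay-as-you-go cost estimate is the processing time in hours multiplied by GBP 0.80. The sketch below assumes a throughput figure that the text above does not give for pay-as-you-go jobs (the 2 documents/sec quoted for the free REST API is used only as an illustrative input), and because it is not stated whether partial hours are pro-rated or rounded up, both options are shown. The function name `estimate_cost_gbp` is hypothetical.

```python
import math
from decimal import Decimal, ROUND_HALF_UP

RATE_GBP_PER_HOUR = Decimal("0.80")  # pay-as-you-go price quoted above


def estimate_cost_gbp(n_docs: int, docs_per_sec: float,
                      round_up_hours: bool = False) -> Decimal:
    """Estimated cost in GBP; docs_per_sec and rounding are assumptions."""
    hours = n_docs / docs_per_sec / 3600
    if round_up_hours:
        hours = math.ceil(hours)  # assume billing in whole hours
    cost = RATE_GBP_PER_HOUR * Decimal(str(hours))
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

For example, 72,000 documents at 2 documents/sec take 10 hours, so `estimate_cost_gbp(72000, 2.0)` returns `Decimal('8.00')`. `Decimal` is used so that the amount is rounded to whole pence without binary floating-point artefacts.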