Offensive Language Classifier

This classifier is a RoBERTa-base model fine-tuned with the simpletransformers toolkit. It was trained on the OLIDv1 dataset from OffensEval 2019, which contains tweets labelled as offensive or non-offensive.

Default annotations
:OffensiveLanguage, an annotation that spans the entire document and has the following features:
  • isOffensive: true if the classifier has determined that the text is offensive, false otherwise
  • probability: the probability with which the classifier assigned the isOffensive label
1,200 free requests / day
Larger batches GBP0.80 / CPU hour
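
The :OffensiveLanguage annotation described above can be reduced to a single "how likely is this text offensive" score. The following is a minimal sketch, not the service's documented client code. It assumes the response is a JSON object with an "entities" map keyed by annotation type, that isOffensive arrives as a boolean, and that probability refers to whichever label was assigned (so a non-offensive label with probability 0.9 implies an offensive score of 0.1). The function name offensive_score and the response layout are hypothetical; check them against a real response before relying on them.

```python
from typing import Any, Dict, Optional

# Annotation type named in this page's "Default annotations" section.
ANNOTATION_TYPE = ":OffensiveLanguage"


def offensive_score(response: Dict[str, Any]) -> Optional[float]:
    """Return an estimated probability that the document is offensive.

    ASSUMPTION: `response` looks like
        {"entities": {":OffensiveLanguage": [{"isOffensive": bool,
                                              "probability": float, ...}]}}
    This layout is illustrative and not confirmed by the page.
    Returns None when no :OffensiveLanguage annotation is present.
    """
    annotations = response.get("entities", {}).get(ANNOTATION_TYPE, [])
    if not annotations:
        return None

    # The annotation spans the whole document, so one entry is expected.
    ann = annotations[0]
    probability = float(ann["probability"])

    # `probability` belongs to the assigned label, so flip it when the
    # assigned label is "not offensive".
    return probability if ann["isOffensive"] else 1.0 - probability
```

A caller could then apply its own threshold, for example treating scores above 0.5 as offensive, instead of relying only on the isOffensive flag.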

Use this pipeline

Single documents

You can process up to 1,200 documents per day free of charge using the REST API, at an average rate of 2 documents/sec. Higher quotas are available for research users by arrangement; contact us for details.
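
A client can stay within the free quota by pacing its own requests. The sketch below is one possible approach, assuming a caller-supplied function that sends a single document to the API; the helper name throttled_map and its parameters are hypothetical and not part of the service. It spaces calls evenly at 2 per second, which is stricter than the "average rate" the quota describes, and stops after 1,200 documents.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def throttled_map(
    func: Callable[[T], R],
    items: Iterable[T],
    max_per_sec: float = 2.0,      # rate quoted for the free tier
    daily_quota: int = 1200,       # free documents per day
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[R]:
    """Apply `func` to each item, spacing calls at least 1/max_per_sec apart.

    Stops after `daily_quota` items; remaining items are left unprocessed.
    `sleep` and `clock` are injectable so the pacing can be tested
    without real waiting.
    """
    min_interval = 1.0 / max_per_sec
    next_allowed = clock()
    for count, item in enumerate(items):
        if count >= daily_quota:
            break
        delay = next_allowed - clock()
        if delay > 0:
            sleep(delay)
        # Schedule the next call relative to when this one actually starts.
        next_allowed = max(clock(), next_allowed) + min_interval
        yield func(item)
```

Here func would wrap whatever HTTP call is made to the pipeline's endpoint with your API key. The quota counter is per call to throttled_map, so a long-running process would still need to track usage across the day itself.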

The API endpoint for this pipeline is:


Batches of documents

You can process any amount of data with this pipeline on a pay-as-you-go basis, for GBP0.80 per CPU hour. This can be data you upload yourself, data you have collected from Twitter, or the results of a previous job.
