Multilingual News Genre Classifier

This multilingual classifier can identify the news genre of texts in 104 languages, but it performs best on the six languages it was trained on: English, French, German, Italian, Polish and Russian.

It was developed as part of the vera.ai project and is based on our submission to SemEval 2023 Task 3, where it was the best-performing system on English data.

The classifier recognises three news genre categories:

  • Opinionated news
  • Objective reporting
  • Satire

The maximum input length is 512 tokens. If an article is longer than 512 tokens, the classifier selects sentences from the beginning and the end of the article, preserving their original order, until the 512-token limit is reached; the text in the middle is not analysed.
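The head-and-tail truncation can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: it assumes sentences are taken alternately from the start and the end of the article, and it approximates tokens by whitespace-separated words, whereas the real model counts subword tokens. The names `count_tokens` and `select_head_and_tail` are hypothetical.

```python
from typing import List

MAX_TOKENS = 512  # input limit stated for the classifier


def count_tokens(sentence: str) -> int:
    # Assumption: whitespace-separated words stand in for the model's subword tokens.
    return len(sentence.split())


def select_head_and_tail(sentences: List[str], max_tokens: int = MAX_TOKENS) -> List[str]:
    """Take sentences alternately from the start and the end of the article
    until the next one would exceed the token budget, then return the kept
    sentences in their original order (the middle is dropped)."""
    head: List[int] = []
    tail: List[int] = []
    left, right = 0, len(sentences) - 1
    used = 0
    take_from_head = True  # assumption: strict alternation, starting at the head
    while left <= right:
        idx = left if take_from_head else right
        cost = count_tokens(sentences[idx])
        if used + cost > max_tokens:
            break  # budget exhausted; everything between left and right is skipped
        used += cost
        if take_from_head:
            head.append(idx)
            left += 1
        else:
            tail.append(idx)
            right -= 1
        take_from_head = not take_from_head
    # Head indices are ascending and all smaller than tail indices,
    # so sorting the tail restores the original document order.
    kept = head + sorted(tail)
    return [sentences[i] for i in kept]
```

With a budget of 6 "tokens" and five two-word sentences, the sketch keeps the first two sentences and the last one, dropping the middle of the text.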

In addition to the overall genre label, the classifier can indicate which sentences in the document (within the 512 tokens analysed) it considered most important. To request this information, enable the :Important_Sentence annotation type.

Default annotations
For each input text, the classifier returns a single genre label together with a score representing the probability it assigns to that genre.
Genre Definition
:OPINION An article is considered an opinion piece if it expresses what someone thinks or feels about a topic. It is a person's attempt to persuade readers to adopt a particular position on an event, or to change their thinking, feelings, or actions. Opinions do not necessarily rest on fact or knowledge, though the most respected opinions generally do.
:REPORTING An article aims at objective news reporting when it involves discovering all relevant facts, selecting and presenting the important ones, and weaving them into a comprehensive story. The generic structure of news reporting acts to naturalise and to obscure the operation of underlying ideological positions.
:SATIRE A satirical piece is a factually incorrect article written with the intent not to deceive, but rather to call out, ridicule, or expose behaviour that is shameful, corrupt, or otherwise 'bad'. It deliberately exposes real-world individuals, organisations, and events to ridicule. Satirical pieces use a variety of rhetorical devices, such as hyperbole, absurdity, and obscenity, to shock or unsettle readers, and they tend to mimic genuine articles, incorporating irony in an attempt to provide humorous insights.
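As an illustration of how one label and one probability can come out of a three-way classifier, here is a minimal sketch. It assumes the model produces one raw score (logit) per genre and that the reported score is the softmax probability of the winning genre; this is not confirmed by the pipeline description, and `top_genre` and the example logits are hypothetical.

```python
import math
from typing import Dict, Tuple


def top_genre(logits: Dict[str, float]) -> Tuple[str, float]:
    """Return the most probable genre and its softmax probability."""
    # Subtract the largest logit before exponentiating for numerical stability;
    # this does not change the resulting probabilities.
    m = max(logits.values())
    exps = {genre: math.exp(v - m) for genre, v in logits.items()}
    total = sum(exps.values())
    probs = {genre: e / total for genre, e in exps.items()}
    best = max(probs, key=probs.get)
    return best, probs[best]


label, score = top_genre({"OPINION": 2.0, "REPORTING": 0.5, "SATIRE": -1.0})
print(label, round(score, 3))
```

Because the probabilities over the three genres sum to 1, a score close to 1/3 signals that the classifier found the genres hard to tell apart.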
Additional annotations available if selected
:Important_Sentence Identifies the sentences in the document that were most influential to the classifier in making its decision.
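When :Important_Sentence is enabled, the selected sentences could be pulled out of the response with a helper like the one below. It is a sketch that assumes a GATE-style JSON response in which each annotation type maps to a list of annotations carrying `[start, end]` character offsets under `indices`; check an actual response before relying on this layout. `important_sentences` is a hypothetical name.

```python
from typing import Any, Dict, List


def important_sentences(response: Dict[str, Any]) -> List[str]:
    """Return the text of each Important_Sentence annotation in document order.

    Assumed response layout (not confirmed by this page):
    {"text": "...", "entities": {"Important_Sentence": [{"indices": [start, end]}, ...]}}
    """
    text = response["text"]
    spans = response.get("entities", {}).get("Important_Sentence", [])
    # Sort by start offset so sentences come back in the order they appear.
    ordered = sorted(spans, key=lambda ann: ann["indices"][0])
    return [text[ann["indices"][0]:ann["indices"][1]] for ann in ordered]
```

If the annotation type was not requested, the helper simply returns an empty list.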
1,200 free requests / day
Batch processing not available

Use this pipeline

Single documents

You can process up to 1,200 documents per day free of charge using the REST API, at an average rate of 2 documents per second. Higher quotas are available for research users by arrangement; contact us for details.
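Because batch processing is not available, documents have to be sent one request at a time. The sketch below paces single-document requests to roughly the stated average rate and stops at the free daily quota. `send` stands for whatever function actually calls the API (hypothetical here), and treating 2 documents per second as a client-side pacing target is an assumption, not a documented limit.

```python
import time
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")

DAILY_FREE_QUOTA = 1200  # free documents per day, as stated for this pipeline
MIN_INTERVAL_S = 0.5     # 2 documents/sec, used here as an assumed pacing target


def process_paced(
    docs: Iterable[str],
    send: Callable[[str], T],
    quota: int = DAILY_FREE_QUOTA,
    interval: float = MIN_INTERVAL_S,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[T]:
    """Send documents one at a time, at most `quota` of them,
    starting each request at least `interval` seconds after the previous one."""
    results: List[T] = []
    last: Optional[float] = None
    for i, doc in enumerate(docs):
        if i >= quota:
            break  # stop rather than exceed the free allowance
        if last is not None:
            wait = interval - (clock() - last)
            if wait > 0:
                sleep(wait)
        last = clock()
        results.append(send(doc))
    return results
```

At 0.5 s per document, the full free quota of 1,200 documents takes at least 600 seconds (10 minutes). The cap applies per call, so a script that runs several times a day still needs to track its own daily total.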

The API endpoint for this pipeline is:

https://cloud-api.gate.ac.uk/process/news-genre-classifier
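A request to this endpoint might be prepared with Python's standard library as sketched below. The plain-text request body, the `Accept: application/json` header, the repeated `annotations` query parameter for optional annotation types, and HTTP Basic authentication with an API key ID and password are all assumptions about the service's interface, not details confirmed on this page; `build_request` is a hypothetical helper.

```python
import base64
import urllib.parse
import urllib.request
from typing import Optional, Sequence

ENDPOINT = "https://cloud-api.gate.ac.uk/process/news-genre-classifier"


def build_request(
    text: str,
    annotations: Sequence[str] = (),
    key_id: Optional[str] = None,
    password: Optional[str] = None,
) -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying one document as plain text."""
    url = ENDPOINT
    if annotations:
        # Assumption: optional annotation types such as ":Important_Sentence"
        # are requested through repeated `annotations` query parameters.
        url += "?" + urllib.parse.urlencode([("annotations", a) for a in annotations])
    headers = {
        "Content-Type": "text/plain; charset=utf-8",
        "Accept": "application/json",  # assumption: JSON response format
    }
    if key_id and password:
        # Assumption: the API key ID and password are sent as HTTP Basic credentials.
        token = base64.b64encode(f"{key_id}:{password}".encode("utf-8")).decode("ascii")
        headers["Authorization"] = f"Basic {token}"
    return urllib.request.Request(url, data=text.encode("utf-8"), headers=headers, method="POST")
```

The prepared request can then be sent with `urllib.request.urlopen(req)` and the response body decoded with `json.loads(resp.read())`. Without a key, the free daily quota applies.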
