Multilingual News Topic Classifier
This multilingual classifier can identify common news topics in 104 languages, but it performs best on the six languages it was trained on: English, French, German, Italian, Polish and Russian.
It was developed as part of the EDMO Ireland project and is based on our submission to SemEval 2023 Task 3, where it performed best on English data.
The classifier recognises 9 topic categories:
- Economy and Resources
- Religious, Ethical and Cultural
- Fairness, Equality and Rights
- Law and Justice System
- Crime and Punishment
- Security, Defense and Well-being
- Health and Safety
- Politics
- International Relations
The maximum input document length is 512 words. Any text after the first 512 words is ignored by the classifier.
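To see how much of a document falls outside that limit before submitting it, a minimal sketch along the following lines may help. It assumes that a "word" is a whitespace-separated token, which may not match how the classifier itself counts; `split_at_limit` is a hypothetical helper, not part of the service.

```python
from __future__ import annotations

MAX_WORDS = 512  # input limit stated in the description above


def split_at_limit(text: str, max_words: int = MAX_WORDS) -> tuple[str, str]:
    """Return (kept, ignored) parts of text, split after max_words words.

    Assumption: words are whitespace-separated tokens.
    """
    words = text.split()
    kept = " ".join(words[:max_words])
    ignored = " ".join(words[max_words:])
    return kept, ignored


# Usage: a synthetic 600-word document loses its last 88 words.
document = " ".join(f"w{i}" for i in range(600))
kept, ignored = split_at_limit(document)
print(len(kept.split()), len(ignored.split()))
```

For long articles, the ignored part could be inspected, or the document could be split into several submissions, depending on what the analysis needs.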
Each news topic annotation has a score that represents the probability assigned to the label by the classifier.
| Annotation | Description |
|---|---|
| :Economy_and_Resources | costs, benefits or other financial implications; availability of physical, human or financial resources, and capacity of current systems |
| :Religious_Ethical_and_Cultural | religious or ethical implications; traditions, customs or values in relation to a policy issue |
| :Fairness_Equality_and_Rights | balance or distribution of rights, responsibilities, and resources |
| :Law_and_Justice_System | rights, freedoms, and authority of individuals, corporations, and government |
| :Crime_and_Punishment | effectiveness and implications of laws and their enforcement |
| :Security_Defense_and_Well_being | threats to welfare of the individual, community, or nation; threats and opportunities for the individual's wealth, happiness, and well-being |
| :Health_and_Safety | health care, sanitation, public safety |
| :Politics | considerations related to politics and politicians, including lobbying, elections, and attempts to sway voters; attitudes and opinions of the general public, including polling and demographics; specific policies aimed at addressing problems |
| :International_Relations | international reputation or foreign policy |
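Because each annotation carries a probability score, a consumer will usually want to keep only the confident topics. The sketch below assumes the scores have already been parsed into a plain dictionary from label to score. The actual response format is not described here, and the 0.5 threshold and the example values are illustrative assumptions, not real classifier output.

```python
from __future__ import annotations


def select_topics(
    scores: dict[str, float], threshold: float = 0.5
) -> list[tuple[str, float]]:
    """Keep labels scoring at or above threshold, highest score first."""
    chosen = [(label, p) for label, p in scores.items() if p >= threshold]
    return sorted(chosen, key=lambda pair: pair[1], reverse=True)


# Made-up scores for illustration only.
example_scores = {
    "Health_and_Safety": 0.07,
    "Politics": 0.64,
    "Economy_and_Resources": 0.81,
}
print(select_topics(example_scores))
# prints [('Economy_and_Resources', 0.81), ('Politics', 0.64)]
```

A lower threshold returns more topics per document at the cost of precision; the right value depends on the application.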
Use this pipeline
You can process up to 1,200 documents per day free of charge using the REST API, at an average rate of 2 documents/sec. Higher quotas are available for research users by arrangement; contact us for details.
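A client can stay within the free quota by pacing its own requests. The following is a minimal sketch under stated assumptions: `QuotaPacer` is a hypothetical helper, the two limits are taken from the figures above, and the counter is never reset, since how the service defines a "day" is not specified here. It makes no network calls itself.

```python
import time


class QuotaPacer:
    """Client-side pacing: at most docs_per_sec requests, up to daily_limit."""

    def __init__(self, daily_limit=1200, docs_per_sec=2.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.daily_limit = daily_limit
        self.min_interval = 1.0 / docs_per_sec  # 0.5 s between documents
        self.clock = clock
        self.sleep = sleep
        self.sent = 0
        self.last_sent = None

    def wait_for_slot(self) -> None:
        """Block until the next document may be sent, or raise if over quota."""
        if self.sent >= self.daily_limit:
            raise RuntimeError("daily free quota used up")
        if self.last_sent is not None:
            remaining = self.min_interval - (self.clock() - self.last_sent)
            if remaining > 0:
                self.sleep(remaining)
        self.last_sent = self.clock()
        self.sent += 1
```

In use, the caller would invoke `wait_for_slot()` immediately before each request and create a fresh pacer when a new quota period begins.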
The API endpoint for this pipeline is: