French Named Entity Recognizer for Tweets

A named entity recognition pipeline that identifies basic entity types, such as Person, Location and Organization, in French-language tweets.

French TwitIE is a named entity recognition pipeline specially tuned for use with French Twitter data. It performs

  • tokenisation, sentence splitting and part-of-speech tagging;
  • normalisation of abbreviations and shortened word forms frequently found in tweets;
  • tagging of Twitter-specific entities such as hashtags and @mentions, as well as URLs and emoticons; and
  • general named entity recognition, to identify basic entity types such as Person, Location and Organization.

Default annotations

  :Person      Standard named entity types
  :Emoticon    Emoticons such as :-)
  :Hashtag     Hashtags, including the leading # character
  :URL         URL mentions
  :UserID      The username part of @user mentions, not including the leading @ sign

Additional annotations available if selected

  :Token       The individual tokens of the text, with a "category" feature holding the part-of-speech tag
  :SpaceToken  The spaces between tokens
  :Sentence    Sentences detected by the sentence splitter
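The span conventions for :Hashtag and :UserID listed above can be illustrated with a small local sketch. It is not the pipeline's implementation: the regular expressions and the function name `twitter_spans` are assumptions made up for this example, and the real service may tokenise differently. It only shows which characters each annotation type covers.

```python
import re

# Illustrative patterns only (assumptions); the pipeline's own rules are not shown on this page.
HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@(\w+)")


def twitter_spans(text):
    """Return (type, start, end, covered_text) tuples, ordered by start offset."""
    spans = []
    for m in HASHTAG.finditer(text):
        # A Hashtag span includes the leading '#'.
        spans.append(("Hashtag", m.start(), m.end(), m.group(0)))
    for m in MENTION.finditer(text):
        # A UserID span covers only the username, so it starts after the '@'.
        spans.append(("UserID", m.start(1), m.end(1), m.group(1)))
    return sorted(spans, key=lambda s: s[1])


if __name__ == "__main__":
    for span in twitter_spans("Merci @gate_nlp pour #TwitIE !"):
        print(span)
```

Note that the UserID offsets begin one character after the @ sign, so code that wants the full mention has to extend the span by one character to the left.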

1,200 free requests per day
Larger batches: GBP0.80 per CPU hour

Use this pipeline

Single documents

You can process up to 1,200 documents per day free of charge using the REST API, at an average rate of 2 documents/sec. Higher quotas are available for research users by arrangement; contact us for details.
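A client can stay inside the free tier by pacing its own requests. The following is a minimal sketch, assuming the 2 documents/sec figure is treated as a ceiling and that requests are sent one at a time. The function `process_within_free_tier` and the `send` callback are hypothetical names for this example; `send` stands in for whatever code posts one document to the pipeline, since the request format is not described here.

```python
import itertools
import time

FREE_DAILY_QUOTA = 1200   # documents per day, as stated above
MIN_INTERVAL = 1.0 / 2    # seconds between requests for 2 documents/sec


def process_within_free_tier(documents, send, sleep=time.sleep, clock=time.monotonic):
    """Send documents one by one, spacing calls and stopping at the daily quota.

    `send` is a caller-supplied function (hypothetical) that submits one
    document and returns the response. `sleep` and `clock` are injectable
    so the pacing logic can be exercised without real waiting.
    """
    results = []
    last_call = None
    for doc in itertools.islice(documents, FREE_DAILY_QUOTA):
        if last_call is not None:
            wait = MIN_INTERVAL - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        results.append(send(doc))
    return results
```

At this pace, the full daily allowance of 1,200 documents takes at least 600 seconds, or about ten minutes, to submit. The sketch does not track usage across runs, so a real client would also need to remember how many requests it has already made that day.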

The API endpoint for this pipeline is:

Batches of documents

You can process any amount of data with this pipeline on a pay-as-you-go basis, for GBP0.80 per CPU hour. This can be data you upload yourself, data you collected from Twitter, or the results of a previous job.
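As a rough budgeting aid, the batch price can be turned into a cost estimate. This sketch assumes billing is simply proportional to CPU time at the quoted rate. Any rounding to whole hours or minimum charge is not stated here and is not modelled. The function name `estimate_batch_cost` is made up for this example.

```python
from decimal import Decimal

RATE_GBP_PER_CPU_HOUR = Decimal("0.80")  # price quoted above


def estimate_batch_cost(cpu_hours):
    """Estimate the cost in GBP for a given number of CPU hours (assumed linear billing)."""
    cost = Decimal(str(cpu_hours)) * RATE_GBP_PER_CPU_HOUR
    return cost.quantize(Decimal("0.01"))


if __name__ == "__main__":
    print(estimate_batch_cost(2.5))  # 2.5 CPU hours at GBP0.80 each
```

How many CPU hours a given batch needs depends on the data and is not something this page lets you predict, so the input here is best taken from a small trial job.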
