Tweet processing with GATE Cloud
With GATE Cloud, anyone can access the same core tweet processing technologies used in the latest international research by the GATE team at the University of Sheffield. This page gives an overview of the tools available through the cloud and the problems they address.
Finding and storing tweets
Twitter provides a powerful method for discovering and downloading tweets in real time, but using this requires code development to access the Twitter API. We have simplified this process with the Twitter Collector service.
The Twitter Collector can fetch, in real time, tweets that match selected keywords, that are posted by selected users, or that are tagged as being from specific places. It can fetch up to 60 tweets a second, the maximum that Twitter provides.
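The selection criteria described above can be sketched in a few lines of Python. This is only an illustration, not the Twitter Collector's implementation: the `TweetFilter` class, its field names and the dictionary layout of a tweet are all hypothetical, and the sketch assumes that a tweet is kept when any one of the three criteria matches.

```python
from dataclasses import dataclass, field


@dataclass
class TweetFilter:
    """Hypothetical filter: a tweet is kept if ANY criterion matches (assumed)."""
    keywords: set = field(default_factory=set)
    users: set = field(default_factory=set)
    places: set = field(default_factory=set)

    def matches(self, tweet: dict) -> bool:
        # Keyword match: case-insensitive substring search in the tweet text.
        text = tweet.get("text", "").lower()
        if any(k.lower() in text for k in self.keywords):
            return True
        # User match: the author is one of the selected accounts.
        if tweet.get("user", "").lower() in {u.lower() for u in self.users}:
            return True
        # Place match: the tweet carries a place tag that was selected.
        place = tweet.get("place")
        return place is not None and place.lower() in {p.lower() for p in self.places}


f = TweetFilter(keywords={"climate"}, users={"gate_team"}, places={"Sheffield"})
print(f.matches({"text": "Climate action now", "user": "someone"}))
print(f.matches({"text": "Good morning", "user": "someone"}))
```

The first call matches on the keyword; the second matches none of the criteria.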
Basic processing is carried out using GATE as tweets are gathered, producing statistics that help you understand exactly what you’re getting. Once collected, tweets can be bundled for direct processing in GATE Cloud, stored in your own Amazon S3 bucket, or downloaded directly.
The Twitter Collector is not currently available; we are awaiting clarification from Twitter on the future of the APIs on which it depends.
Finding named entities
Named entities are mentions, in the text of a tweet, of a person, location or organisation. GATE Cloud can be combined with simple data mining to answer questions such as "Who is tweeting about whom?" and "Which brands are getting the most attention with which endorsers?".
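As a rough sketch of the kind of simple data mining meant here, the Python below counts how often each author mentions each entity. The input records are made up for illustration; the field names (`author`, `entities`) and entity type labels are assumptions, not the output format of GATE Cloud.

```python
from collections import Counter

# Hypothetical records: each tweet's author plus the entities found in its text.
tweets = [
    {"author": "alice", "entities": [("Acme", "Organization")]},
    {"author": "bob", "entities": [("Acme", "Organization"), ("Sheffield", "Location")]},
    {"author": "alice", "entities": [("Acme", "Organization"), ("Jane Doe", "Person")]},
]


def count_mentions(tweets, entity_type=None):
    """Count (author, entity) pairs, optionally restricted to one entity type."""
    counts = Counter()
    for tweet in tweets:
        # Use a set so an entity repeated within one tweet is counted once.
        names = {name for name, etype in tweet["entities"]
                 if entity_type in (None, etype)}
        for name in names:
            counts[(tweet["author"], name)] += 1
    return counts


# "Who is tweeting about which organisation?"
for (author, name), n in count_mentions(tweets, "Organization").most_common():
    print(f"{author} -> {name}: {n}")
```

Restricting the count to one entity type, such as organisations, is one way to approach the brand and endorser question.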
We offer an experimentally validated process to automatically find named entities in tweets. The pipeline also does all of the following:
- Splits tweets into sentences and individual words, and tags each word with its part of speech (such as noun or adjective)
- Normalises abbreviations and shortened word forms, so they don’t interfere with data mining
- Tags hashtags, mentions, URLs and emoticons
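To give a feel for the last step, here is a minimal Python sketch that tags hashtags, mentions, URLs and a few emoticons with regular expressions. It is not the GATE pipeline: the patterns and the `tag_tweet` function are illustrative assumptions, and they ignore many edge cases that a real tweet tokeniser handles.

```python
import re

# Illustrative patterns only; real tweet tokenisation covers many more cases.
PATTERNS = {
    "url": re.compile(r"https?://\S+"),
    "hashtag": re.compile(r"(?<!\w)#\w+"),
    "mention": re.compile(r"(?<!\w)@\w+"),
    "emoticon": re.compile(r"(?<!\w)[:;=][-']?[)(DPp]"),
}


def tag_tweet(text):
    """Return (start, end, kind, matched_text) annotations sorted by position."""
    annotations = []
    for kind, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            annotations.append((m.start(), m.end(), kind, m.group()))
    return sorted(annotations)


example = "Loving #EarthHour with @gate_team :) https://example.org/eh"
for start, end, kind, value in tag_tweet(example):
    print(f"{start:>3}-{end:<3} {kind:<8} {value}")
```

Each annotation records character offsets alongside its type, so later steps can refer back to the exact span in the original tweet.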
Learn attitudes towards climate change
At the University of Sheffield, we analysed Earth Hour on Twitter (https://www.decarbonet.eu/eh16/). Our Earth Hour pipeline, developed for the Decarbonet European Commission project, discovers attitudes towards climate change and climate change interventions.
Tweets are classified according to their sentiment, that is, the feeling expressed by a specific person towards a specific target. Classifications can include positive, negative and neutral attitudes, as well as fine-grained emotions such as fear, anger and joy.
Terms related to climate change are also connected to Linked Open Data ontologies using ClimaTerm. Linguistic features such as conditionals, questions, directives and so on are also included.