Results 1 -
3 of
3
A Support Platform for Event Detection using Social Intelligence
"... This paper describes a system designed to support event detection over Twitter. The system operates by querying the data stream with a user-specified set of keywords, filtering out non-English messages, and probabilistically geolocating each message. The user can dynamically set a probability thresh ..."
Abstract
- Add to MetaCart
This paper describes a system designed to support event detection over Twitter. The system operates by querying the data stream with a user-specified set of keywords, filtering out non-English messages, and probabilistically geolocating each message. The user can dynamically set a probability threshold over the geolocation predictions, and also the time interval to present data for. 1
Processing Informal, Romanized Pakistani Text Messages
"... Regardless of language, the standard character set for text messages (SMS) and many other social media platforms is the Roman alphabet. There are romanization conventions for some character sets, but they are used inconsistently in informal text, such as SMS. In this work, we convert informal, roman ..."
Abstract
- Add to MetaCart
Regardless of language, the standard character set for text messages (SMS) and many other social media platforms is the Roman alphabet. There are romanization conventions for some character sets, but they are used inconsistently in informal text, such as SMS. In this work, we convert informal, romanized Urdu messages into the native Arabic script and normalize non-standard SMS language. Doing so prepares the messages for existing downstream processing tools, such as machine translation, which are typically trained on well-formed, native script text. Our model combines information at the word and character levels, allowing it to handle out-of-vocabulary items. Compared with a baseline deterministic approach, our system reduces both word and character error rate by over 50%. 1
Automatically Constructing a Normalisation Dictionary for Microblogs
"... Microblog normalisation methods often utilise complex models and struggle to differentiate between correctly-spelled unknown words and lexical variants of known words. In this paper, we propose a method for constructing a dictionary of lexical variants of known words that facilitates lexical normali ..."
Abstract
- Add to MetaCart
Microblog normalisation methods often utilise complex models and struggle to differentiate between correctly-spelled unknown words and lexical variants of known words. In this paper, we propose a method for constructing a dictionary of lexical variants of known words that facilitates lexical normalisation via simple string substitution (e.g. tomorrow for tmrw). We use context information to generate possible variant and normalisation pairs and then rank these by string similarity. Highlyranked pairs are selected to populate the dictionary. We show that a dictionary-based approach achieves state-of-the-art performance

