Results 1 - 10
of
113
You Are What You Tweet: Analyzing Twitter for Public Health
"... Analyzing user messages in social media can measure different population characteristics, including public health measures. For example, recent work has correlated Twitter messages with influenza rates in the United States; but this has largely been the extent of mining Twitter for public health. In ..."
Abstract
-
Cited by 140 (15 self)
- Add to MetaCart
Analyzing user messages in social media can measure different population characteristics, including public health measures. For example, recent work has correlated Twitter messages with influenza rates in the United States; but this has largely been the extent of mining Twitter for public health. In this work, we consider a broader range of public health applications for Twitter. We apply the recently introduced Ailment Topic Aspect Model to over one and a half million health related tweets and discover mentions of over a dozen ailments, including allergies, obesity and insomnia. We introduce extensions to incorporate prior knowledge into this model and apply it to several tasks: tracking illnesses over times (syndromic surveillance), measuring behavioral risk factors, localizing illnesses by geographic region, and analyzing symptoms and medication usage. We show quantitative correlations with public health data and qualitative evaluations of model output. Our results suggest that Twitter has broad applicability for public health research.
Beyond Trending Topics: Real-World Event Identification on Twitter
"... User-contributed messages on social media sites such as Twitter have emerged as powerful, real-time means of information sharing on the Web. These short messages tend to reflect a variety of events in real time, making Twitter particularly well suited as a source of real-time event content. In this ..."
Abstract
-
Cited by 97 (3 self)
- Add to MetaCart
User-contributed messages on social media sites such as Twitter have emerged as powerful, real-time means of information sharing on the Web. These short messages tend to reflect a variety of events in real time, making Twitter particularly well suited as a source of real-time event content. In this paper, we explore approaches for analyzing the stream of Twitter messages to distinguish between messages about real-world events and non-event messages. Our approach relies on a rich family of aggregate statistics of topically similar message clusters. Large-scale experiments over millions of Twitter messages show the effectiveness of our approach for surfacing real-world event content on Twitter. 1
Open Domain Event Extraction from Twitter
"... Tweets are the most up-to-date and inclusive stream of information and commentary on current events, but they are also fragmented and noisy, motivating the need for systems that can extract, aggregate and categorize important events. Previous work on extracting structured representations of events h ..."
Abstract
-
Cited by 52 (3 self)
- Add to MetaCart
(Show Context)
Tweets are the most up-to-date and inclusive stream of information and commentary on current events, but they are also fragmented and noisy, motivating the need for systems that can extract, aggregate and categorize important events. Previous work on extracting structured representations of events has focused largely on newswire text; Twitter’s unique characteristics present new challenges and opportunities for open-domain event extraction. This paper describes TwiCal— the first open-domain event-extraction and categorization system for Twitter. We demonstrate that accurately extracting an open-domain calendar of significant events from Twitter is indeed feasible. In addition, we present a novel approach for discovering important event categories and classifying extracted events based on latent variable models. By leveraging large volumes of unlabeled data, our approach achieves a 14 % increase in maximum F1 over a supervised baseline. A continuously updating demonstration of our system can be viewed at
Predicting the political alignment of twitter users
- In Proceedings of the IEEE Third International Conference on Social Computing (SocialCom
, 2011
"... Abstract—The widespread adoption of social media for po-litical communication creates unprecedented opportunities to monitor the opinions of large numbers of politically active individuals in real time. However, without a way to distinguish between users of opposing political alignments, conflicting ..."
Abstract
-
Cited by 49 (4 self)
- Add to MetaCart
(Show Context)
Abstract—The widespread adoption of social media for po-litical communication creates unprecedented opportunities to monitor the opinions of large numbers of politically active individuals in real time. However, without a way to distinguish between users of opposing political alignments, conflicting signals at the individual level may, in the aggregate, obscure partisan differences in opinion that are important to political strategy. In this article we describe several methods for predicting the political alignment of Twitter users based on the content and structure of their political communication in the run-up to the 2010 U.S. midterm elections. Using a data set of 1,000 manually-annotated individuals, we find that a support vector machine (SVM) trained on hashtag metadata outperforms an SVM trained on the full text of users ’ tweets, yielding predictions of political affiliations with 91 % accuracy. Applying latent semantic analysis to the content of users ’ tweets we identify hidden structure in the data strongly associated with political affiliation, but do not find that topic detection improves prediction performance. All of these content-based methods are outperformed by a classifier based on the segregated community structure of political information diffusion networks (95 % accuracy). We conclude with a practical application of this machinery to web-based political advertising, and outline several approaches to public opinion monitoring based on the techniques developed herein. I.
Event Summarization using Tweets
"... Twitter has become exceedingly popular, with hundreds of millions of tweets being posted every day on a wide variety of topics. This has helped make real-time search applications possible with leading search engines routinely displaying relevant tweets in response to user queries. Recent research ha ..."
Abstract
-
Cited by 45 (0 self)
- Add to MetaCart
(Show Context)
Twitter has become exceedingly popular, with hundreds of millions of tweets being posted every day on a wide variety of topics. This has helped make real-time search applications possible with leading search engines routinely displaying relevant tweets in response to user queries. Recent research has shown that a considerable fraction of these tweets are about “events”, and the detection of novel events in the tweet-stream has attracted a lot of research interest. However, very little research has focused on properly displaying this real-time information about events. For instance, the leading search engines simply display all tweets matching the queries in reverse chronological order. In this paper we argue that for some highly structured and recurring events, such as sports, it is better to use more sophisticated techniques to summarize the relevant tweets. We formalize the problem of summarizing event-tweets and give a solution based on learning the underlying hidden state representation of the event via Hidden Markov Models. In addition, through extensive experiments on real-world data we show that our model significantly outperforms some intuitive and competitive baselines.
Hip and Trendy: Characterizing Emerging Trends on Twitter
"... Twitter, Facebook, and other related systems that we call social awareness streams are rapidly changing the information and communication dynamics of our society. These systems, where hundreds of millions of users share short messages in real time, expose the aggregate interests and attention of glo ..."
Abstract
-
Cited by 41 (5 self)
- Add to MetaCart
(Show Context)
Twitter, Facebook, and other related systems that we call social awareness streams are rapidly changing the information and communication dynamics of our society. These systems, where hundreds of millions of users share short messages in real time, expose the aggregate interests and attention of global and local communities. In particular, emerging temporal trends in these systems, especially those related to a single geographic area, are a significant and revealing source of information for, and about, a local community. This study makes two essential contributions for interpreting emerging temporal trends in these information systems. First, based on a large dataset of Twitter messages from one geographic area, we develop a taxonomy of the trends present in the data. Second, we identify important dimensions according to which trends can be categorized, as well as the key distinguishing features of trends that can be derived from their associated messages. We quantitatively examine the computed features for different categories of trends, and establish that significant differences can be detected across categories. Our study advances the understanding of trends on Twitter and other social awareness streams, which will enable powerful applications and activities, including user-driven real-time information services for local communities.
Emerging Topic Detection using Dictionary Learning Shiva Prasad
"... Streaming user-generated content in the form of blogs, microblogs, forums, and multimedia sharing sites, provides a rich source of data from which invaluable information and insights maybe gleaned. Given the vast volume of such social media data being continually generated, one of the challenges is ..."
Abstract
-
Cited by 27 (9 self)
- Add to MetaCart
(Show Context)
Streaming user-generated content in the form of blogs, microblogs, forums, and multimedia sharing sites, provides a rich source of data from which invaluable information and insights maybe gleaned. Given the vast volume of such social media data being continually generated, one of the challenges is to automatically tease apart the emerging topics of discussion from the constant background chatter. Such emerging topics can be identified by the appearance of multiple posts on a unique subject matter, which is distinct from previous online discourse. We address the problem of identifying emerging topics through the use of dictionary learning. We propose a two stage approach respectively based on detection and clustering of novel user-generated content. We derive a scalable approach by using the alternating directions method to solve the resulting optimization problems. Empirical results show that our proposed approach is more effective than several baselines in detecting emerging topics in traditional news story and newsgroup data. We also demonstrate the practical application to social media analysis, based on a study on streaming data from Twitter. 1.
Smoothing Techniques for Adaptive Online Language Models: Topic Tracking in Tweet Streams
"... We are interested in the problem of tracking broad topics such as “baseball ” and “fashion ” in continuous streams of short texts, exemplified by tweets from the microblogging service Twitter. The task is conceived as a language modeling problem where per-topic models are trained using hashtags in t ..."
Abstract
-
Cited by 27 (0 self)
- Add to MetaCart
(Show Context)
We are interested in the problem of tracking broad topics such as “baseball ” and “fashion ” in continuous streams of short texts, exemplified by tweets from the microblogging service Twitter. The task is conceived as a language modeling problem where per-topic models are trained using hashtags in the tweet stream, which serve as proxies for topic labels. Simple perplexity-based classifiers are then applied to filter the tweet stream for topics of interest. Within this framework, we evaluate, both intrinsically and extrinsically, smoothing techniques for integrating “foreground ” models (to capture recency) and “background ” models (to combat sparsity), as well as different techniques for retaining history. Experiments show that unigram language models smoothed using a normalized extension of stupid backoff and a simple queue for history retention performs well on the task.