I wanted to predict elections with Twitter and all I got was this lousy paper: A balanced survey on election prediction using Twitter data
, 2012
Cited by 28 (0 self)
Predicting X from Twitter is a popular fad within the Twitter research subculture. It seems both appealing and relatively easy. Among such studies, electoral prediction is maybe the most attractive, and at this moment there is a growing body of literature on the topic. This is not only an interesting research problem but, above all, it is extremely difficult. However, most of the authors seem to be more interested in claiming positive results than in providing sound and reproducible methods. It is also especially worrisome that many recent papers seem to only acknowledge those studies supporting the idea of Twitter predicting elections, instead of conducting a balanced literature review showing both sides of the matter. After reading many such papers I have decided to write such a survey myself. Hence, in this paper, every study relevant to the matter of electoral prediction using social media is commented on. From this review it can be concluded that the predictive power of Twitter regarding elections has been greatly exaggerated, and that hard research problems still lie ahead.
Who Does What on the Web: A Large-scale Study of Browsing Behavior
Cited by 16 (0 self)
As the Web has become integrated into daily life, understanding how individuals spend their time online impacts domains ranging from public policy to marketing. It is difficult, however, to measure even simple aspects of browsing behavior via conventional methods, including surveys and site-level analytics, due to limitations of scale and scope. In part addressing these limitations, large-scale Web panel data are a relatively novel means for investigating patterns of Internet usage. In one of the largest studies of browsing behavior to date, we pair Web histories for 250,000 anonymized individuals with user-level demographics (including age, sex, race, education, and income) to investigate three topics. First, we examine how behavior changes as individuals spend more time online, showing that the heaviest users devote nearly twice as much of their time to social media relative to typical individuals. Second, we revisit the digital divide, finding that the frequency with which individuals turn to the Web for research, news, and healthcare is strongly related to educational background, but not as closely tied to gender and ethnicity. Finally, we demonstrate that browsing histories are a strong signal for inferring user attributes, including ethnicity and household income, a result that may be leveraged to improve ad targeting.
“How Old Do You Think I Am?”: A Study of Language and Age in Twitter
Cited by 14 (3 self)
In this paper we focus on the connection between age and language use, exploring age prediction of Twitter users based on their tweets. We discuss the construction of a fine-grained annotation effort to assign ages and life stages to Twitter users. Using this dataset, we explore age prediction in three different ways: classifying users into age categories, by life stages, and predicting their exact age. We find that an automatic system achieves better performance than humans on these tasks and that both humans and the automatic systems have difficulties predicting the age of older people. Moreover, we present a detailed analysis of variables that change with age. We find strong patterns of change, and that most changes occur at young ages.
What’s in a name? Using first names as features for gender inference in Twitter
- Symposium on Analyzing Microtext,
, 2013
Cited by 13 (1 self)
Despite significant work on the problem of inferring a Twitter user's gender from her online content, no systematic investigation has been made into leveraging the most obvious signal of a user's gender: first name. In this paper, we perform a thorough investigation of the link between gender and first name in English tweets. Our work makes several important contributions. The first and most central contribution is two different strategies for incorporating the user's self-reported name into a gender classifier. We find that this yields a 20% increase in accuracy over a standard baseline classifier. These classifiers are the most accurate gender inference methods for Twitter data developed to date. In order to evaluate our classifiers, we developed a novel way of obtaining gender labels for Twitter users that does not require analysis of the user's profile or textual content. This is our second contribution. Our approach eliminates the troubling issue of a label being somehow derived from the same text that a classifier will use to infer the label. Finally, we built a large dataset of gender-labeled Twitter users and, crucially, have published this dataset for community use. To our knowledge, this is the first gender-labeled Twitter dataset available for researchers. Our hope is that this will provide a basis for comparison of gender inference methods.
Using twitter to examine smoking behavior and perceptions of emerging tobacco products
- Journal of Medical Internet Research
, 2013
Cited by 12 (1 self)
Background: Social media platforms such as Twitter are rapidly becoming key resources for public health surveillance applications, yet little is known about Twitter users’ levels of informedness and sentiment toward tobacco, especially with regard to the emerging tobacco control challenges posed by hookah and electronic cigarettes. Objective: To develop a content and sentiment analysis of tobacco-related Twitter posts and build machine learning classifiers to detect tobacco-relevant posts and sentiment towards tobacco, with a particular focus on new and emerging products like hookah and electronic cigarettes. Methods: We collected 7362 tobacco-related Twitter posts at 15-day intervals from December 2011 to July 2012. Each tweet was manually classified using a triaxial scheme, capturing genre, theme, and sentiment. Using the collected data, machine-learning classifiers were trained to detect tobacco-related vs irrelevant tweets as well as positive vs negative sentiment, using Naïve Bayes, k-nearest neighbors, and Support Vector Machine (SVM) algorithms. Finally, phi contingency coefficients were computed between
The Tweets They are a-Changin’: Evolution of Twitter Users and Behavior
Cited by 12 (0 self)
The microblogging site Twitter is now one of the most popular Web destinations. Due to the relative ease of data access, there has been significant research based on Twitter data, ranging from measuring the spread of ideas through society to predicting the behavior of real-world phenomena such as the stock market. Unfortunately, relatively little work has studied the changes in the Twitter ecosystem itself; most research that uses Twitter data is typically based on a small time-window of data, generally ranging from a few weeks to a few months. Twitter is known to have evolved significantly since its founding, and it remains unclear whether prior results still hold, and whether the (often implicit) assumptions of proposed systems are still valid. In this paper, we take a first step towards answering these questions by focusing on the evolution of Twitter’s users and their behavior. Using a set of over 37 billion tweets spanning over seven years, we quantify how the users, their behavior, and the site as a whole have evolved. We observe and quantify a number of trends including the spread of Twitter across the globe, the rise of spam and malicious behavior, the rapid adoption of tweeting conventions, and the shift from desktop to mobile usage. Our results can be used to interpret and calibrate previous Twitter work, as well as to make future projections of the site as a whole.
A meta-analysis of state-of-the-art electoral prediction from Twitter data
, 2012
Cited by 12 (1 self)
NOTICE: This is the author’s version of a work accepted for publication by SAGE Publications. Changes resulting from the publishing process, including peer review, editing, corrections, structural formatting and other quality control mechanisms, may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was published
Finger On The Pulse: Identifying Deprivation Using Transit Flow Analysis
Cited by 11 (8 self)
A common metaphor to describe the movement of people within a city is that of blood flowing through the veins of a living organism. We often speak of the ‘pulse of the city’ when referring to flow patterns we observe. Here we extend this metaphor by hypothesising that by monitoring the flow of people through a city we can assess the city’s health, as a nurse takes a patient’s heart-rate and blood pressure during a routine health check. Using an automated fare collection dataset of journeys made on the London rail system, we build a classification model that identifies areas of high deprivation as measured by the Indices of Multiple Deprivation, and achieve a precision, sensitivity and specificity of 0.805, 0.733 and 0.810, respectively. We conclude with a discussion of the potential benefits this work provides to city planning, policymaking, and citizen engagement initiatives.
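For readers unfamiliar with the three metrics reported in this abstract, the sketch below shows how precision, sensitivity and specificity are conventionally derived from the four cells of a binary confusion matrix. The class name `ConfusionCounts` and the example counts are hypothetical, chosen only for illustration; they are not taken from the paper and do not reproduce its reported figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix (hypothetical container)."""
    tp: int  # areas correctly flagged as high deprivation
    fp: int  # areas wrongly flagged as high deprivation
    tn: int  # areas correctly flagged as not high deprivation
    fn: int  # high-deprivation areas the model missed


def precision(c: ConfusionCounts) -> float:
    # Share of flagged areas that really are high deprivation.
    return c.tp / (c.tp + c.fp)


def sensitivity(c: ConfusionCounts) -> float:
    # Share of truly high-deprivation areas that were flagged (recall).
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # Share of truly non-deprived areas that were correctly left unflagged.
    return c.tn / (c.tn + c.fp)


# Illustrative counts only; NOT data from the study.
counts = ConfusionCounts(tp=30, fp=10, tn=50, fn=10)

if __name__ == "__main__":
    print(f"precision   = {precision(counts):.3f}")
    print(f"sensitivity = {sensitivity(counts):.3f}")
    print(f"specificity = {specificity(counts):.3f}")
```

Reporting all three together, as the abstract does, guards against a model that looks accurate only because one class dominates the data.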
The Twitter of Babel: Mapping World Languages through Microblogging Platforms
, 1212
Cited by 11 (1 self)
Large scale analysis and statistics of socio-technical systems that just a few short years ago would have required the use of consistent economic and human resources can nowadays be conveniently performed by mining the enormous amount of digital data produced by human activities. Although a characterization of several aspects of our societies is emerging from the data revolution, a number of questions concerning the reliability and the biases inherent to the big data “proxies” of social life are still open. Here, we survey worldwide linguistic indicators and trends through the analysis of a large-scale dataset of microblogging posts. We show that available data allow for the study of language geography at scales ranging from country-level aggregation to specific city neighborhoods. The high resolution and coverage of the data allow us to investigate different indicators such as the linguistic homogeneity of different countries, the touristic seasonal patterns within countries and the geographical distribution of different languages in multilingual regions. This work highlights the potential of geolocalized studies of open data sources to improve current analysis and develop indicators for major social phenomena in specific communities.
A Demographic Analysis of Online Sentiment during Hurricane Irene
Cited by 9 (1 self)
We examine the response to the recent natural disaster Hurricane Irene on Twitter.com. We collect over 65,000 Twitter messages relating to Hurricane Irene from August 18th to August 31st, 2011, and group them by location and gender. We train a sentiment classifier to categorize messages based on level of concern, and then use this classifier to investigate demographic differences. We report three principal findings: (1) the number of Twitter messages related to Hurricane Irene in directly affected regions peaks around the time the hurricane hits that region; (2) the level of concern in the days leading up to the hurricane’s arrival is dependent on region; and (3) the level of concern is dependent on gender, with females being more likely to express concern than males. Qualitative linguistic variations further support these differences. We conclude that social media analysis provides a viable, real-time complement to traditional survey methods for understanding public perception towards an impending disaster.