Modeling the Impact of Lifestyle on Health at Scale
Cited by 12 (4 self)
Research in computational epidemiology to date has concentrated on estimating summary statistics of populations and simulated scenarios of disease outbreaks. Detailed studies have been limited to small domains, as scaling the methods involved poses considerable challenges. By contrast, we model the associations of a large collection of social and environmental factors with the health of particular individuals. Instead of relying on surveys, we apply scalable machine learning techniques to noisy data mined from online social media and infer the health state of any given person in an automated way. We show that the learned patterns can be subsequently leveraged in descriptive as well as predictive fine-grained models of human health. Using a unified statistical model, we quantify the impact of social status, exposure to pollution, interpersonal interactions, and other important lifestyle factors on one’s health. Our model explains more than 54% of the variance in people’s health (as estimated from their online communication), and predicts the future health status of individuals with 91% accuracy. Our methods complement traditional studies in life sciences, as they enable us to perform large-scale and timely measurement, inference, and prediction of previously elusive factors that affect our everyday lives.
Carmen: A Twitter Geolocation System with Applications to Public Health
Cited by 6 (2 self)
Public health applications using social media often require accurate, broad-coverage location information. However, the standard information provided by social media APIs, such as Twitter's, covers a limited number of messages. This paper presents Carmen, a geolocation system that can determine structured location information for messages provided by the Twitter API. Our system utilizes geocoding tools and a combination of automatic and manual alias resolution methods to infer location structures from GPS positions and user-provided profile data. We show that our system is accurate and covers many locations, and we demonstrate its utility for improving influenza surveillance.
Estimating county health statistics with Twitter
In CHI, 2014
Cited by 6 (2 self)
Understanding the relationships among environment, behavior, and health is a core concern of public health researchers. While a number of recent studies have investigated the use of social media to track infectious diseases such as influenza, little work has been done to determine if other health concerns can be inferred. In this paper, we present a large-scale study of 27 health-related statistics, including obesity, health insurance coverage, access to healthy foods, and teen birth rates. We perform a linguistic analysis of the Twitter activity in the top 100 most populous counties in the U.S., and find a significant correlation with 6 of the 27 health statistics. When compared to traditional models based on demographic variables alone, we find that augmenting models with Twitter-derived information improves predictive accuracy for 20 of 27 statistics, suggesting that this new methodology can complement existing approaches.
Towards Understanding Global Spread of Disease from Everyday Interpersonal Interactions
Cited by 5 (2 self)
Monitoring and forecasting the global spread of infectious diseases is difficult, mainly due to a lack of fine-grained and timely data. Previous work in computational epidemiology has shown that mining data from the web can improve the predictability of high-level aggregate patterns of epidemics. By contrast, this paper explores how individuals contribute to the global spread of disease. We consider the important task of predicting the prevalence of flu-like illness in a given city based on interpersonal interactions of the city’s residents with the outside world. We use the geo-tagged status updates of traveling Twitter users to infer properties of the flow of individuals between cities. While previous research considered only the raw volume of passengers, we estimate a number of latent variables, including the number of sick (symptomatic) travelers and the number of sick individuals to whom each traveler was exposed. We show that AI techniques provide insights into the mechanisms of disease spread and significantly improve predictability of future flu outbreaks. Our experiments involve over 51,000 individuals traveling between 75 cities prior to and during a severe ongoing flu epidemic (October 2012 to January 2013). Our model leverages the text and interpersonal interactions recorded in over 6.5 million online status updates without any active user participation, enabling scalable public health applications.
nEmesis: Which Restaurants Should You Avoid Today?
Cited by 5 (1 self)
Computational approaches to health monitoring and epidemiology continue to evolve rapidly. We present an end-to-end system, nEmesis, that automatically identifies restaurants posing public health risks. Leveraging a language model of Twitter users’ online communication, nEmesis finds individuals who are likely suffering from a foodborne illness. People’s visits to restaurants are modeled by matching GPS data embedded in the messages with restaurant addresses. As a result, we can assign each venue a “health score” based on the proportion of customers that fell ill shortly after visiting it. Statistical analysis reveals that our inferred health score correlates (r = 0.30) with the official inspection data from the Department of Health and Mental Hygiene (DOHMH). We investigate the joint associations of multiple factors mined from online data with the DOHMH violation scores and find that over 23% of variance can be explained by our factors. We demonstrate that readily accessible online data can be used to detect cases of foodborne illness in a timely manner. This approach offers an inexpensive way to enhance current methods to monitor food safety (e.g., adaptive inspections) and identify potentially problematic venues in near-real time.
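The venue score and correlation described in the nEmesis abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the input format (pairs of venue and an inferred illness flag) and the function names `ill_fraction_by_venue` and `pearson_r` are assumptions made for illustration, and the score here is simply the raw proportion of visitors flagged as ill.

```python
from math import sqrt


def ill_fraction_by_venue(visits):
    """Return venue -> fraction of recorded visits followed by illness.

    `visits` is a hypothetical input: an iterable of (venue, fell_ill) pairs,
    where `fell_ill` is a boolean produced by some upstream classifier.
    """
    totals, ill = {}, {}
    for venue, fell_ill in visits:
        totals[venue] = totals.get(venue, 0) + 1
        ill[venue] = ill.get(venue, 0) + (1 if fell_ill else 0)
    return {venue: ill[venue] / totals[venue] for venue in totals}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)
```

In use, the per-venue fractions would be paired with an external reference score for the same venues (inspection data in the abstract) and passed to `pearson_r`.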
Crowdphysics: Planned and Opportunistic Crowdsourcing for Physical Tasks
Cited by 3 (0 self)
Research on human computation and crowdsourcing has concentrated on tasks that can be accomplished remotely over the Internet. We introduce a general class of problems we call crowdphysics (CP): crowdsourcing tasks that require people to collaborate and synchronize both in time and physical space. As an illustrative example, we focus on a crowd-powered delivery service, a specific CP instance where people go about their daily lives, but have the opportunity to carry packages to be delivered to specific locations or individuals. Each package is handed off from person to person based on overlaps in time and space until it is delivered. We formulate CP tasks by reduction to a graph-planning problem, and analyze the performance using a large sample of geotagged tweets as a proxy for people’s location. We show that packages can be delivered with remarkable speed and coverage. These results hold both when we know people’s future locations and when routing without global knowledge, making only local greedy decisions. To our knowledge, this is the first empirical evidence that dynamic networks of mobile individuals are highly navigable.
Flu Gone Viral: Syndromic Surveillance of Flu on Twitter using Temporal Topic Models
Cited by 3 (1 self)
Surveillance of epidemic outbreaks and spread from social media is an important tool for governments and public health authorities. Machine learning techniques for nowcasting the flu have made significant inroads into correlating social media trends to case counts and prevalence of epidemics in a population. There is a disconnect between data-driven methods for forecasting flu incidence and epidemiological models that adopt a state-based understanding of transitions, which can lead to sub-optimal predictions. Furthermore, models of epidemiological activity and models of social activity, such as activity on Twitter, predict different shapes and have important differences. We propose a temporal topic model to capture hidden states of a user from the user's tweets and aggregate states in a geographical region for better estimation of trends. We show that our approach helps fill the gap between phenomenological methods for disease surveillance and epidemiological models. We validate this approach by modeling the flu using Twitter in multiple countries of South America. We demonstrate that our model can consistently outperform plain vocabulary assessment in flu case-count predictions, and at the same time get better flu-peak predictions than competitors. We also show that our fine-grained modeling can reconcile some contrasting behaviors between epidemiological and social models.
Using Matched Samples to Estimate the Effects of Exercise on Mental Health from Twitter
Cited by 3 (0 self)
Recent work has demonstrated the value of social media monitoring for health surveillance (e.g., tracking influenza or depression rates). It is an open question whether such data can be used to make causal inferences (e.g., determining which activities lead to increased depression rates). Even in traditional, restricted domains, estimating causal effects from observational data is highly susceptible to confounding bias. In this work, we estimate the effect of exercise on mental health from Twitter, relying on statistical matching methods to reduce confounding bias. We train a text classifier to estimate the volume of a user’s tweets expressing anxiety, depression, or anger, then compare two groups: those who exercise regularly (identified by their use of physical activity trackers like Nike+), and a matched control group. We find that those who exercise regularly have significantly fewer tweets expressing depression or anxiety; there is no significant difference in rates of tweets expressing anger. We additionally perform a sensitivity analysis to investigate how the many experimental design choices in such a study impact the final conclusions, including the quality of the classifier and the construction of the control group.
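The idea of comparing a treated group against a matched control group can be sketched in a few lines of Python. This is a generic illustration of greedy one-to-one nearest-neighbor matching, not the procedure from the paper: the abstract does not say which covariates or matching algorithm were used, so the single numeric covariate and the names `match_controls` and `mean_outcome_difference` are assumptions.

```python
def match_controls(treated, controls):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    Each user is a hypothetical (covariate, outcome) pair, e.g. a single
    numeric confounder and a rate of tweets flagged by a classifier.
    Returns a list of (treated_user, matched_control_user) pairs.
    """
    available = list(controls)
    pairs = []
    for user in treated:
        if not available:
            break  # no controls left to match
        # Index of the remaining control closest on the covariate.
        best = min(range(len(available)),
                   key=lambda i: abs(available[i][0] - user[0]))
        pairs.append((user, available.pop(best)))
    return pairs


def mean_outcome_difference(pairs):
    """Average of (treated outcome - control outcome) over matched pairs."""
    if not pairs:
        raise ValueError("no matched pairs")
    return sum(t[1] - c[1] for t, c in pairs) / len(pairs)
```

A negative difference would indicate a lower outcome rate in the treated group than among its matched controls. Greedy matching depends on the order of the treated users, which is one of the design choices a sensitivity analysis would vary.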
Modeling Fine-Grained Dynamics of Mood at Scale
Cited by 1 (1 self)
Mental health affects all aspects of people’s lives, yet it remains difficult to obtain accurate data about influential factors. This work investigates quantifying, inferring, and predicting, via social media data, the day-to-day mental state of individuals. We develop a statistical model of the affective state (mood) of specific individuals with up to hourly temporal resolution. This model enables us to quantify, in a unified way, aggregate mood trends, as well as patterns specific to individuals and groups of friends. It finds key features of mood variation over time and allows us to decompose a person’s emotional state into a weighted sum of contributing factors, shedding new light on how mood affects, and is affected by, the environment. We then show that individuals’ mood can be accurately predicted days into the future based on online behavior.