Modeling spread of disease from social interactions
In Sixth AAAI International Conference on Weblogs and Social Media (ICWSM), 2012
Cited by 38 (10 self)
Abstract
Research in computational epidemiology to date has concentrated on coarse-grained statistical analysis of populations, often synthetic ones. By contrast, this paper focuses on fine-grained modeling of the spread of infectious diseases throughout a large real-world social network. Specifically, we study the roles that social ties and interactions between specific individuals play in the progress of a contagion. We focus on public Twitter data, where we find that for every health-related message there are more than 1,000 unrelated ones. This class imbalance makes classification particularly challenging. Nonetheless, we present a framework that accurately identifies sick individuals from the content of online communication. Evaluation on a sample of 2.5 million geo-tagged Twitter messages shows that social ties to infected, symptomatic people, as well as the intensity of recent co-location, sharply increase one’s likelihood of contracting the illness in the near future. To our knowledge, this work is the first to model the interplay of social activity, human mobility, and the spread of infectious disease in a large real-world population. Furthermore, we provide the first quantifiable estimates of the characteristics of disease transmission on a large scale without active user participation, a step towards our ability to model and predict the emergence of global epidemics from day-to-day interpersonal interactions.
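To illustrate why a ratio of more than 1,000 unrelated messages per health-related one makes classification hard, consider a trivial baseline that labels every message as unrelated. The sketch below is illustrative only: the counts are hypothetical, chosen to match the quoted ratio, and `confusion_metrics` is a helper written for this example, not part of the authors' framework.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, precision and recall for binary labels (1 = health-related)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    # With no positive predictions precision is undefined; report 0.0 by convention.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical sample mirroring the quoted ratio: 10 health-related, 10,000 unrelated.
y_true = [1] * 10 + [0] * 10_000
# Trivial baseline that never flags a message as health-related.
y_pred = [0] * len(y_true)

accuracy, precision, recall = confusion_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.4f} precision={precision:.2f} recall={recall:.2f}")
```

The baseline reaches about 99.9% accuracy yet finds none of the sick users, which is why precision and recall, rather than raw accuracy, are the informative measures under this kind of imbalance.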
Predicting disease transmission from geo-tagged micro-blog data
In Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012
Cited by 26 (10 self)
Abstract
Researchers have begun to mine social network data in order to predict a variety of social, economic, and health-related phenomena. While previous work has focused on predicting aggregate properties, such as the prevalence of seasonal influenza in a given country, we consider the task of fine-grained prediction of the health of specific people from noisy and incomplete data. We construct a probabilistic model that can predict if and when an individual will fall ill with high precision and good recall on the basis of his social ties and co-locations with other people, as revealed by their Twitter posts. Our model is highly scalable and can be used to predict general dynamic properties of individuals in large real-world social networks. These results provide a foundation for research on fundamental questions of public health, including the identification of non-cooperative disease carriers (“Typhoid Marys”), adaptive vaccination policies, and our understanding of the emergence of global epidemics from day-to-day interpersonal interactions.
Geographic Dissection of the Twitter Network
Cited by 22 (3 self)
Abstract
Geography plays an important role in shaping societal interactions in the offline world. However, as more and more social interactions occur online via social networking sites like Twitter and Facebook, users can interact with others unconstrained by their geolocations, raising the question: does offline geography still matter in online social networks? In this paper, we attempt to address this question by dissecting the Twitter social network based on users’ geolocations and investigating how users’ geolocation impacts their participation in Twitter, including their connections to others and the information they exchange with them. Our in-depth analysis reveals that geography continues to have a significant impact on user interactions in the Twitter social network. The influence of geography could potentially be explained by the shared national, linguistic, and cultural backgrounds of users from the same geographic neighborhood.
Mining User Mobility Features for Next Place Prediction in Location-based Services
Cited by 20 (2 self)
Abstract
Mobile location-based services are thriving, providing an unprecedented opportunity to collect fine-grained spatiotemporal data about the places users visit. This multi-dimensional source of data offers new possibilities to tackle established research problems on human mobility, but it also opens avenues for the development of novel mobile applications and services. In this work we study the problem of predicting the next venue a mobile user will visit, by exploring the predictive power offered by different facets of user behavior. We first analyze about 35 million check-ins made by about 1 million Foursquare users in over 5 million venues across the globe, spanning a period of five months. We then propose a set of features that aim to capture the factors that may drive users’ movements. Our features exploit information on transitions between types of places, mobility flows between venues, and spatio-temporal characteristics of user check-in patterns. We further extend our study by combining all individual features in two supervised learning models, based on linear regression and M5 model trees, resulting in a higher overall prediction accuracy. We find that the supervised methodology based on the combination of multiple features offers the highest levels of prediction accuracy: M5 model trees rank the venue actually visited among the top fifty candidates for one in two user check-ins, out of thousands of candidate items in the prediction list.
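The headline result, that M5 model trees place the actual next venue within the top fifty candidates for one in two check-ins, is a top-k hit rate (often called accuracy@k). The sketch below computes that metric under the assumption that each check-in comes with a score for every candidate venue; the function name `hit_rate_at_k`, the venues and the scores are hypothetical and only illustrate the metric, not the paper's models.

```python
def hit_rate_at_k(predictions, actual, k):
    """Fraction of check-ins whose true venue appears among the top-k ranked candidates.

    predictions: list of dicts mapping candidate venue -> predicted score (higher is better)
    actual: venues actually visited, aligned with predictions
    """
    hits = 0
    for scores, venue in zip(predictions, actual):
        # Rank candidates by descending score and keep the first k.
        top_k = sorted(scores, key=scores.get, reverse=True)[:k]
        if venue in top_k:
            hits += 1
    return hits / len(actual)


# Two hypothetical check-ins with model scores for three candidate venues.
predictions = [
    {"cafe": 0.9, "gym": 0.5, "office": 0.2},
    {"cafe": 0.1, "gym": 0.3, "office": 0.8},
]
actual = ["gym", "cafe"]
print(hit_rate_at_k(predictions, actual, k=2))  # 0.5
```

With k = 50 and thousands of candidates per check-in, a value of 0.5 corresponds to the "one in two" figure quoted in the abstract.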
A Random Walk Around the City: New Venue Recommendation in Location-Based Social Networks
Cited by 16 (1 self)
Abstract
The popularity of location-based social networks available on mobile devices means that large, rich datasets that contain a mixture of behavioral (users visiting venues), social (links between users), and spatial (distances between venues) information are available for mobile location recommendation systems. However, these datasets greatly differ from those used in other online recommender systems, where users explicitly rate items: it remains unclear how they capture user preferences and how they can be leveraged for accurate recommendation. This paper seeks to bridge this gap with a three-fold contribution. First, we examine how venue discovery behavior characterizes the large check-in datasets from two different location-based social services, Foursquare and Gowalla: by using large-scale datasets containing both user check-ins and social ties, our analysis reveals that, across 11 cities, between 60% and 80% of users’ visits are in venues that were not visited in the previous 30 days. We then show that state-of-the-art filtering algorithms, including latent space models, make constraining assumptions about user mobility and consequently do not produce high-quality recommendations. Finally, we propose a new model based on personalized random walks over a user-place graph that, by seamlessly combining social network and venue visit frequency data, obtains between 5% and 18% improvement over other models. Our results pave the way to a new approach for place recommendation in location-based social systems.
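A common way to realize a personalized random walk over a user-place graph is random walk with restart (personalized PageRank): the walker follows weighted edges but jumps back to the target user with a fixed probability, and unvisited venues are ranked by the resulting stationary scores. The sketch below uses that formulation on a tiny hypothetical graph; the restart probability of 0.15, the edge weights and all node names are assumptions, and this is not claimed to be the paper's exact model.

```python
def personalized_pagerank(graph, source, restart=0.15, iters=100):
    """Random walk with restart on a weighted undirected graph.

    graph: dict mapping node -> {neighbour: edge weight}
    source: node the walk keeps restarting from (the target user)
    restart: probability of teleporting back to `source` at each step
    """
    rank = {node: 0.0 for node in graph}
    rank[source] = 1.0
    for _ in range(iters):
        new_rank = {node: 0.0 for node in graph}
        new_rank[source] = restart  # total mass is 1, so this is the restart share
        for node, mass in rank.items():
            total = sum(graph[node].values())
            if total == 0:  # isolated node: return its mass to the source
                new_rank[source] += (1 - restart) * mass
                continue
            for neighbour, weight in graph[node].items():
                new_rank[neighbour] += (1 - restart) * mass * weight / total
        rank = new_rank
    return rank


def add_edge(graph, a, b, weight):
    """Insert an undirected weighted edge."""
    graph.setdefault(a, {})[b] = weight
    graph.setdefault(b, {})[a] = weight


graph = {}
add_edge(graph, "alice", "bob", 1.0)    # social tie
add_edge(graph, "bob", "carol", 1.0)    # social tie
add_edge(graph, "alice", "cafe", 3.0)   # user-venue edges weighted by visit counts
add_edge(graph, "bob", "cafe", 1.0)
add_edge(graph, "bob", "museum", 4.0)
add_edge(graph, "carol", "park", 1.0)

venues = {"cafe", "museum", "park"}
scores = personalized_pagerank(graph, "alice")
# Recommend venues alice has not visited yet, best first.
ranking = sorted((v for v in venues if v not in graph["alice"]), key=scores.get, reverse=True)
print(ranking)  # ['museum', 'park']
```

The museum, frequently visited by alice's direct friend, outranks the park, which is reachable only through a friend of a friend; this is how social ties and visit frequencies combine in a single walk.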
Far Out: Predicting Long-Term Human Mobility
Cited by 14 (2 self)
Abstract
Much work has been done on predicting where one is going to be in the immediate future, typically within the next hour. By contrast, we address the open problem of predicting human mobility far into the future, on a scale of months and years. We propose an efficient nonparametric method that extracts significant and robust patterns in location data, learns their associations with contextual features (such as day of week), and subsequently leverages this information to predict the most likely location at any given time in the future. The entire process is formulated in a principled way as an eigendecomposition problem. Evaluation on a massive dataset with more than 32,000 days’ worth of GPS data across 703 diverse subjects shows that our model predicts the correct location with high accuracy, even years into the future. This result opens a number of interesting avenues for future research and applications.
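As a rough illustration of how recurring patterns can be pulled out of location data by eigendecomposition (not the authors' algorithm), one can encode each day as a vector of per-time-slot location indicators and take the dominant eigenvector of the day-to-day covariance matrix. The sketch below does this with plain power iteration; the four time slots, the "at work" encoding and the weekday/weekend data are hypothetical.

```python
import math


def covariance(rows):
    """Sample covariance matrix of a list of equal-length feature vectors."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def dominant_eigenpair(matrix, iters=200):
    """Power iteration: largest eigenvalue and its unit eigenvector (symmetric PSD input)."""
    d = len(matrix)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # zero matrix: keep the current vector
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the converged unit vector.
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(a * b for a, b in zip(v, mv))
    return eigenvalue, v


# Hypothetical data: one row per day, one column per time slot
# (morning, midday, afternoon, night); 1 means the subject was at work.
weekday = [1, 1, 1, 0]
weekend = [0, 0, 0, 0]
days = [weekday] * 5 + [weekend] * 2

value, eigenday = dominant_eigenpair(covariance(days))
print(round(value, 3), [round(x, 3) for x in eigenday])
```

The dominant eigenvector weights the three working slots equally and gives the night slot zero weight, capturing the weekday routine; a predictor could then associate how strongly each day expresses such components with contextual features like day of week.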
Modeling the Impact of Lifestyle on Health at Scale
Cited by 12 (4 self)
Abstract
Research in computational epidemiology to date has concentrated on estimating summary statistics of populations and simulated scenarios of disease outbreaks. Detailed studies have been limited to small domains, as scaling the methods involved poses considerable challenges. By contrast, we model the associations of a large collection of social and environmental factors with the health of particular individuals. Instead of relying on surveys, we apply scalable machine learning techniques to noisy data mined from online social media and infer the health state of any given person in an automated way. We show that the learned patterns can be subsequently leveraged in descriptive as well as predictive fine-grained models of human health. Using a unified statistical model, we quantify the impact of social status, exposure to pollution, interpersonal interactions, and other important lifestyle factors on one’s health. Our model explains more than 54% of the variance in people’s health (as estimated from their online communication), and predicts the future health status of individuals with 91% accuracy. Our methods complement traditional studies in life sciences, as they enable us to perform large-scale and timely measurement, inference, and prediction of previously elusive factors that affect our everyday lives.
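"Explaining more than 54% of the variance" is conventionally measured by the coefficient of determination, R² = 1 − SS_res / SS_tot, so the claim corresponds to R² above 0.54. A minimal sketch of that computation follows; the health scores and predictions are made-up numbers for illustration, not data from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: share of variance in y_true explained by y_pred."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean) ** 2 for y in y_true)               # total sum of squares
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # residual sum of squares
    return 1 - ss_res / ss_tot


# Made-up health scores and model predictions, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
print(f"R^2 = {r_squared(observed, predicted):.2f}")  # R^2 = 0.90
```

A model that always predicts the mean scores R² = 0, so 0.54 means the model removes a little over half of the squared error left by that naive baseline.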
Location-based reasoning about complex multiagent behavior
In Journal of Artificial Intelligence Research. AI Access Foundation, 2011
Cited by 11 (3 self)
Abstract
Recent research has shown that surprisingly rich models of human activity can be learned from GPS (positional) data. However, most effort to date has concentrated on modeling single individuals or statistical properties of groups of people. Moreover, prior work focused solely on modeling actual successful executions (and not failed or attempted executions) of the activities of interest. We, in contrast, take on the task of understanding human interactions, attempted interactions, and intentions from noisy sensor data in a fully relational multi-agent setting. We use a real-world game of capture the flag to illustrate our approach in a well-defined domain that involves many distinct cooperative and competitive joint activities. We model the domain using Markov logic, a statistical-relational language, and learn a theory that jointly denoises the data and infers occurrences of high-level activities, such as a player capturing an enemy. Our unified model combines constraints imposed by the geometry of the game area, the motion model of the players, and by the rules and dynamics of the game in a probabilistically and logically sound fashion. We show that while it may be impossible to directly detect a multi-agent activity due to sensor noise or malfunction, the occurrence of the activity can still be inferred by considering both its impact on the
Beyond “Local”, “Categories” and “Friends”: Clustering foursquare Users with Latent “Topics”
Cited by 10 (1 self)
Abstract
In this work, we use foursquare check-ins to cluster users via topic modeling, a technique commonly used to classify text documents according to latent “themes”. Here, however, the latent variables which group users can be thought of not as themes but rather as factors which drive check-in behaviors, allowing for a qualitative understanding of influences on user check-ins. Our model is agnostic of geo-spatial location, time, users’ friends on social networking sites and the venue categories; we treat the existence of and intricate interactions between these factors as being latent, allowing them to emerge entirely from the data. We instantiate our model on data from New York and the San Francisco Bay Area and find evidence that the model is able to identify groups of people which are of different types (e.g. tourists), communities (e.g. users tightly clustered in space) and interests (e.g. people who enjoy athletics).
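One way to apply a text topic model to check-ins is to treat each user as a "document" and each checked-in venue as a "word", then fit latent Dirichlet allocation (LDA). The sketch below is a generic collapsed Gibbs sampler under that assumption; the paper's exact representation and inference procedure may differ, and the hyperparameters (`alpha=0.1`, `beta=0.01`), venues and users are hypothetical.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for latent Dirichlet allocation (LDA).

    docs: one list of integer word ids per document (here: per user).
    Returns per-document topic counts and per-topic word counts.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []
    for d, doc in enumerate(docs):  # random initial topic for every token
        z_doc = [rng.randrange(num_topics) for _ in doc]
        for w, z in zip(doc, z_doc):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    topics = range(num_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                # Take the token out of the counts before resampling it.
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalised full conditional p(z = k | all other assignments).
                weights = [(doc_topic[d][k] + alpha) * (topic_word[k][w] + beta)
                           / (topic_total[k] + vocab_size * beta) for k in topics]
                z = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


# Hypothetical check-ins: word ids index the venues below; each user is one "document".
venues = ["museum", "gallery", "gym", "stadium"]
users = [[0, 1, 0, 1], [1, 0, 1, 1], [2, 3, 2, 3], [3, 2, 3, 3]]
doc_topic, topic_word = lda_gibbs(users, num_topics=2, vocab_size=len(venues))
for u, counts in enumerate(doc_topic):
    print(f"user {u}: dominant topic {counts.index(max(counts))}")
```

On a toy input like this, the culture-oriented and sports-oriented users typically end up dominated by different topics, but Gibbs sampling is stochastic, so the outcome depends on the seed and the number of iterations.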
Text-Based Twitter User Geolocation Prediction
Cited by 9 (0 self)
Abstract
Geographical location is vital to geospatial applications like local search and event detection. In this paper, we investigate and improve on the task of text-based geolocation prediction of Twitter users. Previous studies on this topic have typically assumed that geographical references (e.g., gazetteer terms, dialectal words) in a text are indicative of its author’s location. However, these references are often buried in informal, ungrammatical, and multilingual data, and are therefore non-trivial to identify and exploit. We present an integrated geolocation prediction framework and investigate which factors affect prediction accuracy. First, we evaluate a range of feature selection methods to obtain “location indicative words”. We then evaluate the impact of non-geotagged tweets, language, and user-declared metadata on geolocation prediction. In addition, we evaluate the impact of temporal variance on model generalisation, and discuss how users differ in terms of their geolocatability. We achieve state-of-the-art results for the text-based Twitter user geolocation task, and also provide the most extensive exploration of the task to date. Our findings provide valuable insights into the design of robust, practical text-based geolocation prediction systems.
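One family of feature-selection criteria for finding "location indicative words" scores each word by how much knowing whether a user used it reduces uncertainty about that user's city. The sketch below computes plain information gain for that purpose; it illustrates the general idea and is not necessarily one of the methods evaluated in the paper. The users, cities and words (including the regional words "yinz" and "hella") are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(users, word):
    """Reduction in city-label entropy from knowing whether a user used `word`.

    users: list of (set_of_words, city) pairs.
    """
    cities = [city for _, city in users]
    with_word = [city for words, city in users if word in words]
    without_word = [city for words, city in users if word not in words]
    remainder = sum(len(part) / len(users) * entropy(part)
                    for part in (with_word, without_word) if part)
    return entropy(cities) - remainder


# Hypothetical users: the regional words are location indicative, the others are not.
users = [
    ({"yinz", "game"}, "Pittsburgh"),
    ({"yinz", "coffee"}, "Pittsburgh"),
    ({"hella", "game"}, "Oakland"),
    ({"hella", "coffee"}, "Oakland"),
]
vocabulary = sorted(set().union(*(words for words, _ in users)))
ranked = sorted(vocabulary, key=lambda w: information_gain(users, w), reverse=True)
print(ranked)
```

Here the regional words each carry a full bit of information about the city, while "game" and "coffee" carry none; keeping only the top-ranked words is a simple way to filter a vocabulary down to location indicative features.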