Results 11 - 20
of
119
Learning Cross-modality Similarity for Multinomial Data
"... Many applications involve multiple-modalities such as text and images that describe the problem of interest. In order to leverage the information present in all the modalities, one must model the relationships between them. While some techniques have been proposed to tackle this problem, they either ..."
Abstract
-
Cited by 21 (0 self)
- Add to MetaCart
(Show Context)
Many applications involve multiple-modalities such as text and images that describe the problem of interest. In order to leverage the information present in all the modalities, one must model the relationships between them. While some techniques have been proposed to tackle this problem, they either are restricted to words describing visual objects only, or require full correspondences between the different modalities. As a consequence, they are unable to tackle more realistic scenarios where a narrative text is only loosely related to an image, and where only a few image-text pairs are available. In this paper, we propose a model that addresses both these challenges. Our model can be seen as a Markov random field of topic models, which connects the documents based on their similarity. As a consequence, the topics learned with our model are shared across connected documents, thus encoding the relations between different modalities. We demonstrate the effectiveness of our model for image retrieval from a loosely related text. 1.
Social Influence Locality for Modeling Retweeting Behaviors
"... We study an interesting phenomenon of social influence locality in a large microblogging network, which suggests that users ’ behaviors are mainly influenced by close friends in their ego networks. We provide a formal definition for the notion of social influence locality and develop two instantiati ..."
Abstract
-
Cited by 19 (10 self)
- Add to MetaCart
We study an interesting phenomenon of social influence locality in a large microblogging network, which suggests that users ’ behaviors are mainly influenced by close friends in their ego networks. We provide a formal definition for the notion of social influence locality and develop two instantiation functions based on pairwise influence and structural diversity. The defined influence locality functions have strong predictive power. Without any additional features, we can obtain a F1-score of 71.65 % for predicting users ’ retweet behaviors by training a logistic regression classifier based on the defined functions. Our analysis also reveals several intriguing discoveries. For example, though the probability of a user retweeting a microblog is positively correlated with the number of friends who have retweeted the microblog, it is surprisingly negatively correlated with the number of connected circles that are formed by those friends. 1
Probabilistic Mining of Socio-Geographic Routines from Mobile Phone Data
"... There is relatively little work on the investigation of large-scale human data in terms of multimodality for human activity discovery. In this paper we suggest that human interaction data, or human proximity, obtained by mobile phone Bluetooth sensor data, can be integrated with human location data, ..."
Abstract
-
Cited by 17 (7 self)
- Add to MetaCart
(Show Context)
There is relatively little work on the investigation of large-scale human data in terms of multimodality for human activity discovery. In this paper we suggest that human interaction data, or human proximity, obtained by mobile phone Bluetooth sensor data, can be integrated with human location data, obtained by mobile cell tower connections, to mine meaningful details about human activities from large and noisy datasets. We propose a model, called bag of multimodal behavior, that integrates the modeling of variations of location over multiple time-scales, and the modeling of interaction types from proximity. Our representation is simple yet robust to characterize real-life human behavior sensed from mobile phones, which are devices capable of capturing large-scale data known to be noisy and incomplete. We use an unsupervised approach, based on probabilistic topic models, to discover latent human activities in terms of the joint interaction and location behaviors of 97 individuals over the course of approximately a 10 month period using data from MIT’s Reality Mining project. Some of the human activities discovered with our multimodal data representation include “going out from 7pm-midnight alone ” and “working from 11am-5pm with 3-5 other people”, further finding that this activity dominantly occurs on specific days of the week. Our methodology also finds dominant work patterns occurring on other days of the week. We further demonstrate the feasibility of the topic modeling framework to discover human routines to predict missing multimodal phone data on specific times of the day. Copyright (c) 2008 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org.
Entity disambiguation with hierarchical topic models
- In the SIGKDD international conference on Knowledge discovery and data mining
, 2011
"... Disambiguating entity references by annotating them with unique ids from a catalog is a critical step in the enrichment of unstructured content. In this paper, we show that topic models, such as Latent Dirichlet Allocation (LDA) and its hierarchical variants, form a natural class of models for learn ..."
Abstract
-
Cited by 15 (0 self)
- Add to MetaCart
(Show Context)
Disambiguating entity references by annotating them with unique ids from a catalog is a critical step in the enrichment of unstructured content. In this paper, we show that topic models, such as Latent Dirichlet Allocation (LDA) and its hierarchical variants, form a natural class of models for learning accurate entity disambiguation models from crowd-sourced knowledge bases such as Wikipedia. Our main contribution is a semi-supervised hierarchical model called Wikipedia-based Pachinko Allocation Model (WPAM) that exploits: (1) All words in the Wikipedia corpus to learn word-entity associations (unlike existing approaches that only use words in a small fixed window around annotated entity references in Wikipedia pages), (2) Wikipedia annotations to appropriately bias the assignment of entity labels to annotated (and co-occurring unannotated) words during model learning, and (3) Wikipedia’s category hierarchy to capture co-occurrence patterns among entities. We also propose a scheme for pruning spurious nodes from Wikipedia’s crowd-sourced category hierarchy. In our experiments with multiple real-life datasets, we show that WPAM outperforms state-of-the-art baselines by as much as 16 % in terms of disambiguation accuracy.
What’s Worthy of Comment? Content and Comment Volume in Political Blogs ∗
"... In this paper we aim to model the relationship between the text of a political blog post and the comment volume—that is, the total amount of response—that a post will receive. We seek to accurately identify which posts will attract a high-volume response, and also to gain insight about the community ..."
Abstract
-
Cited by 15 (3 self)
- Add to MetaCart
In this paper we aim to model the relationship between the text of a political blog post and the comment volume—that is, the total amount of response—that a post will receive. We seek to accurately identify which posts will attract a high-volume response, and also to gain insight about the community of readers and their interests. We design and evaluate variations on a latentvariable topic model that links text to comment volume.
Mining contentions from discussions and debates
- In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’12, 841–849
"... Social media has become a major source of information for many applications. Numerous techniques have been proposed to analyze network structures and text contents. In this paper, we focus on fine-grained mining of contentions in discussion/debate forums. Contentions are perhaps the most important f ..."
Abstract
-
Cited by 12 (6 self)
- Add to MetaCart
(Show Context)
Social media has become a major source of information for many applications. Numerous techniques have been proposed to analyze network structures and text contents. In this paper, we focus on fine-grained mining of contentions in discussion/debate forums. Contentions are perhaps the most important feature of forums that discuss social, political and religious issues. Our goal is to discover contention and agreement indicator expressions, and contention points or topics both at the discussion collection level and also at each individual post level. To the best of our knowledge, limited work has been done on such detailed analysis. This paper proposes three models to solve the problem, which not only model both contention/agreement expressions and discussion topics, but also, more importantly, model the intrinsic nature of discussions/debates, i.e., interactions among discussants or debaters and topic sharing among posts through quoting and replying relations. Evaluation results using real-life discussion/debate posts from several domains demonstrate the effectiveness of the proposed models.
Mining group nonverbal conversational patterns using probabilistic topic models
- IEEE TRANS. MULTIMED
, 2010
"... The automatic discovery of group conversational behavior is a relevant problem in social computing. In this paper, we present an approach to address this problem by defining a novel group descriptor called bag of group-nonverbal-patterns (NVPs) defined on brief observations of group interaction, an ..."
Abstract
-
Cited by 11 (5 self)
- Add to MetaCart
(Show Context)
The automatic discovery of group conversational behavior is a relevant problem in social computing. In this paper, we present an approach to address this problem by defining a novel group descriptor called bag of group-nonverbal-patterns (NVPs) defined on brief observations of group interaction, and by using principled probabilistic topic models to discover topics. The proposed bag of group NVPs allows fusion of individual cues and facilitates the eventual comparison of groups of varying sizes. The use of topic models helps to cluster group interactions and to quantify how different they are from each other in a formal probabilistic sense. Results of behavioral topics discovered on the Augmented Multi-Party Interaction (AMI) meeting corpus are shown to be meaningful using human annotation with multiple observers. Our method facilitates “group behavior-based” retrieval of group conversational segments without the need of any previous labeling.
Reducing the sampling complexity of topic models.
- In 20th ACM SIGKDD Intl. Conf. Knowledge Discovery and Data Mining,
, 2014
"... ABSTRACT Inference in topic models typically involves a sampling step to associate latent variables with observations. Unfortunately the generative model loses sparsity as the amount of data increases, requiring O(k) operations per word for k topics. In this paper we propose an algorithm which scal ..."
Abstract
-
Cited by 11 (1 self)
- Add to MetaCart
(Show Context)
ABSTRACT Inference in topic models typically involves a sampling step to associate latent variables with observations. Unfortunately the generative model loses sparsity as the amount of data increases, requiring O(k) operations per word for k topics. In this paper we propose an algorithm which scales linearly with the number of actually instantiated topics k d in the document. For large document collections and in structured hierarchical models k d k. This yields an order of magnitude speedup. Our method applies to a wide variety of statistical models such as PDP At its core is the idea that dense, slowly changing distributions can be approximated efficiently by the combination of a Metropolis-Hastings step, use of sparsity, and amortized constant time sampling via Walker's alias method.
H.: An unsupervised approach to modeling personalized contexts of mobile users
- In: IEEE International Conference on Data Mining (ICDM
, 2010
"... Abstract—Mobile context modeling is a process of recogniz-ing and reasoning about contexts and situations in a mobile environment, which is critical for the success of context-aware mobile services. While there are prior work on mobile context modeling, the use of unsupervised learning techniques fo ..."
Abstract
-
Cited by 10 (5 self)
- Add to MetaCart
(Show Context)
Abstract—Mobile context modeling is a process of recogniz-ing and reasoning about contexts and situations in a mobile environment, which is critical for the success of context-aware mobile services. While there are prior work on mobile context modeling, the use of unsupervised learning techniques for mobile context modeling is still under-explored. Indeed, unsupervised techniques have the ability to learn personalized contexts which are difficult to be predefined. To that end, in this paper, we propose an unsupervised approach to modeling personalized contexts of mobile users. Along this line, we first segment the raw context data sequences of mobile users into context sessions where a context session contains a group of adjacent context records which are mutually similar and usually reflect the similar contexts. Then, we exploit topic models to learn personalized contexts in the form of probabilistic distributions of raw context data from the context sessions. Finally, experimental results on real-world data show that the proposed approach is efficient and effective for mining personalized contexts of mobile users. Keywords-mobile context modeling; unsupervised approach; I.
CQARank: Jointly Model Topics and Expertise in Community Question Answering
, 2013
"... Community Question Answering (CQA) websites, where peo-ple share expertise on open platforms, have become large reposi-tories of valuable knowledge. To bring the best value out of these knowledge repositories, it is critically important for CQA services to know how to find the right experts, retriev ..."
Abstract
-
Cited by 9 (0 self)
- Add to MetaCart
Community Question Answering (CQA) websites, where peo-ple share expertise on open platforms, have become large reposi-tories of valuable knowledge. To bring the best value out of these knowledge repositories, it is critically important for CQA services to know how to find the right experts, retrieve archived similar questions and recommend best answers to new questions. To tackle