Topic sentiment mixture: modeling facets and opinions in weblogs
- In Proc. of the 16th Int. Conference on World Wide Web, 2007
- Cited by 181 (11 self)
In this paper, we define the problem of topic-sentiment analysis on Weblogs and propose a novel probabilistic model to capture the mixture of topics and sentiments simultaneously. The proposed Topic-Sentiment Mixture (TSM) model can reveal the latent topical facets in a Weblog collection, the subtopics in the results of an ad hoc query, and their associated sentiments. It could also provide general sentiment models that are applicable to any ad hoc topics. With a specifically designed HMM structure, the sentiment models and topic models estimated with TSM can be utilized to extract topic life cycles and sentiment dynamics. Empirical experiments on different Weblog datasets show that this approach is effective for modeling the topic facets and sentiments and extracting their dynamics from Weblog collections. The TSM model is quite general; it can be applied to any text collection with a mixture of topics and sentiments, and thus has many potential applications, such as search result summarization, opinion tracking, and user behavior prediction.
A Probabilistic Approach to Spatiotemporal Theme Pattern Mining on Weblogs
- 2006
- Cited by 100 (9 self)
Mining subtopics from weblogs and analyzing their spatiotemporal patterns have applications in multiple domains. In this paper, we define the novel problem of mining spatiotemporal theme patterns from weblogs and propose a novel probabilistic approach to model the subtopic themes and spatiotemporal theme patterns simultaneously. The proposed model discovers spatiotemporal theme patterns by (1) extracting common themes from weblogs; (2) generating theme life cycles for each given location; and (3) generating theme snapshots for each given time period. Evolution of patterns can be discovered by comparative analysis of theme life cycles and theme snapshots. Experiments on three different data sets show that the proposed approach can discover interesting spatiotemporal theme patterns effectively. The proposed probabilistic model is general and can be used for spatiotemporal text mining on any domain with time and location information.
Automatic labeling of multinomial topic models
- In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Association for Computing Machinery, 2007
- Cited by 90 (2 self)
MONIC -- Modeling and monitoring cluster transitions
- KDD'06, 2006
- Cited by 66 (9 self)
There is much recent work on detecting and tracking change in clusters, often based on the study of the spatiotemporal properties of a cluster. For the many applications where cluster change is relevant, among them customer relationship management, fraud detection and marketing, it is also necessary to provide insights about the nature of cluster change: Is a cluster corresponding to a group of customers simply disappearing or are its members migrating to other clusters? Is a new emerging cluster reflecting a new target group of customers or does it rather consist of existing customers whose preferences shift? To answer such questions, we propose the framework MONIC for modeling and tracking of cluster transitions. Our cluster transition model encompasses changes that involve more than one cluster, thus allowing for insights on cluster change in the whole clustering. Our transition tracking mechanism is not based on the topological properties of clusters, which are only available for some types of clustering, but on the contents of the underlying data stream. We present our first results on monitoring cluster transitions over the ACM digital library.
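The content-based idea in this abstract (deciding from shared members, not from topology, whether a cluster survives, splits, or disappears) can be illustrated with a minimal sketch. This is not the MONIC implementation: the function names, the thresholds `tau` and `tau_split`, and the simplified three-way classification are assumptions made here for illustration only.

```python
def overlap(old, new):
    """Fraction of the old cluster's members that also appear in `new`."""
    return len(old & new) / len(old) if old else 0.0


def classify_transition(old, new_clusters, tau=0.6, tau_split=0.25):
    """Classify what happened to `old` given the clusters at the next time point.

    `new_clusters` maps a cluster name to a set of member ids.
    `tau` and `tau_split` are hypothetical thresholds, not values from the paper.
    """
    scores = {name: overlap(old, members) for name, members in new_clusters.items()}
    best = max(scores, key=scores.get, default=None)
    # One new cluster holds most of the old members: the cluster survives.
    if best is not None and scores[best] >= tau:
        return ("survives", [best])
    # Several new clusters each hold a sizeable share and together cover
    # most of the old members: the cluster has split.
    parts = sorted(name for name, s in scores.items() if s >= tau_split)
    covered = set().union(*(new_clusters[name] for name in parts)) if parts else set()
    if len(parts) > 1 and overlap(old, covered) >= tau:
        return ("splits", parts)
    # Otherwise the members have scattered or left: the cluster disappears.
    return ("disappears", [])
```

Because the comparison uses only member identifiers, the same check would apply to any clustering method, which is the property the abstract emphasises.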
FacetNet: A Framework for Analyzing Communities and Their Evolutions in Dynamic Networks
- Cited by 64 (14 self)
We discover communities from social network data, and analyze the community evolution. These communities are inherent characteristics of human interaction in online social networks, as well as paper citation networks. Also, communities may evolve over time, due to changes to individuals’ roles and social status in the network as well as changes to individuals’ research interests. We present an innovative algorithm that deviates from the traditional two-step approach to analyze community evolutions. In the traditional approach, communities are first detected for each time slice, and then compared to determine correspondences. We argue that this approach is inappropriate in applications with noisy data. In this paper, we propose FacetNet for analyzing communities and their evolutions through a robust unified
Online LDA: Adaptive Topic Model for Mining Text Streams with Application on Topic Detection and Tracking
- Proceedings of IEEE International Conference on Data Mining (ICDM08), 2008
- Cited by 57 (2 self)
This paper presents Online Topic Model (OLDA), a topic model that automatically captures the thematic patterns and identifies emerging topics of text streams and their changes over time. Our approach allows the topic modeling framework, specifically the Latent Dirichlet Allocation (LDA) model, to work in an online fashion such that it incrementally builds an up-to-date model (mixture of topics per document and mixture of words per topic) when a new document (or a set of documents) appears. A solution based on the Empirical Bayes method is proposed. The idea is to incrementally update the current model according to the information inferred from the new stream of data with no need to access previous data. The dynamics of the proposed approach also provide an efficient means to track the topics over time and detect the emerging topics in real time. Our method is evaluated both qualitatively and quantitatively using benchmark datasets. In our experiments, the OLDA has discovered interesting patterns by just analyzing a fraction of data at a time. Our tests also prove the ability of OLDA to align the topics across the epochs with which the evolution of the topics over time is captured. The OLDA is also comparable to, and sometimes better than, the original LDA in predicting the likelihood of unseen documents.
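One way to picture "updating the current model with no need to access previous data" is to carry the previous topic-word estimates forward as a Dirichlet prior for the next batch. The sketch below shows only that carry-forward step, under assumptions made here: `next_prior`, `posterior_mean`, `weight`, and `base` are hypothetical names and parameters, and the per-batch word counts are taken as given instead of being inferred by a sampler. It should not be read as the paper's algorithm.

```python
def next_prior(prev_topic_word, weight, base=0.01):
    """Build Dirichlet pseudo-counts for the next batch.

    prev_topic_word: one probability row per topic (previous estimates).
    weight: how strongly the previous model influences the next one (assumed).
    base: small symmetric smoothing term (assumed).
    """
    return [[base + weight * p for p in row] for row in prev_topic_word]


def posterior_mean(prior_row, counts_row):
    """Combine prior pseudo-counts with word counts from the new batch."""
    total = sum(prior_row) + sum(counts_row)
    return [(a + c) / total for a, c in zip(prior_row, counts_row)]


# Previous estimate for a single topic over a two-word vocabulary.
previous = [[0.5, 0.5]]
prior = next_prior(previous, weight=10, base=0.0)

# Counts assigned to this topic in the new batch only; older documents
# are never revisited.
updated = posterior_mean(prior[0], [10, 0])
```

A larger `weight` makes the updated topic stay closer to its previous form, while a smaller one lets the new batch dominate; which setting is appropriate would depend on how quickly the stream drifts.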
Mining correlated bursty topic patterns from coordinated text streams
- ACM SIGKDD conference (KDD), 2007
- Cited by 54 (6 self)
Previous work on text mining has almost exclusively focused on a single stream. However, we often have available multiple text streams indexed by the same set of time points (called coordinated text streams), which offer new opportunities for text mining. For example, when a major event happens, all the news articles published by different agencies in different languages tend to cover the same event for a certain period, exhibiting a correlated bursty topic pattern in all the news article streams. In general, mining correlated bursty topic patterns from coordinated text streams can reveal interesting latent associations or events behind these streams. In this paper, we define and study this novel text mining problem. We propose a general probabilistic algorithm which can effectively discover correlated bursty patterns and their bursty periods across text streams even if the streams have completely different vocabularies (e.g., English vs Chinese). Evaluation of the proposed method on a news data set and a literature data set shows that it can effectively discover quite meaningful topic patterns from both data sets: the patterns discovered from the news data set accurately reveal the major common events covered in the two streams of news articles (in English and Chinese, respectively), while the patterns discovered from two database publication streams match well with the major research paradigm shifts in database research. Since the proposed method is general and does not require the streams to share vocabulary, it can be applied to any coordinated text streams to discover correlated topic patterns that burst in multiple streams in the same period.
Detecting communities and their evolutions in dynamic social networks -- a Bayesian approach
- MACH LEARN, 2010
Connecting the Dots Between News Articles
- Cited by 48 (4 self)
The process of extracting useful knowledge from large datasets has become one of the most pressing problems in today’s society. The problem spans entire sectors, from scientists to intelligence analysts and web users, all of whom are constantly struggling to keep up with the larger and larger amounts of content published every day. With this much data, it is often easy to miss the big picture. In this paper, we investigate methods for automatically connecting the dots – providing a structured, easy way to navigate within a new topic and discover hidden connections. We focus on the news domain: given two news articles, our system automatically finds a coherent chain linking them together. For example, it can recover the chain of events starting with the decline of home prices (January 2007), and ending with the ongoing health-care debate. We formalize the characteristics of a good chain and provide an efficient algorithm (with theoretical guarantees) to connect two fixed endpoints. We incorporate user feedback into our framework, allowing the stories to be refined and personalized. Finally, we evaluate our algorithm over real news data. Our user studies demonstrate the algorithm’s effectiveness in helping users understand the news.