Results 1 - 4 of 4
B (2014) Aspect extraction with automated prior knowledge learning
- In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Cited by 3 (2 self)
Abstract Aspect extraction is an important task in sentiment analysis. Topic modeling is a popular method for the task. However, unsupervised topic models often generate incoherent aspects. To address the issue, several knowledge-based models have been proposed to incorporate prior knowledge provided by the user to guide modeling. In this paper, we take a major step forward and show that in the big data era, without any user input, it is possible to learn prior knowledge automatically from a large amount of review data available on the Web. Such knowledge can then be used by a topic model to discover more coherent aspects. There are two key challenges: (1) learning quality knowledge from reviews of diverse domains, and (2) making the model fault-tolerant to handle possibly wrong knowledge. A novel approach is proposed to solve these problems. Experimental results using reviews from 36 domains show that the proposed approach achieves significant improvements over state-of-the-art baselines.
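As a rough illustration of how mined prior knowledge might steer a topic model (a minimal sketch, not the paper's fault-tolerant model; the names `must_links`, `eta`, and all hyperparameters are assumptions), one could bias an LDA-style Gibbs sampling step toward topics in which a must-linked partner word is already prominent:

```python
from collections import defaultdict

# Hypothetical sketch: compute unnormalized topic-assignment weights for a
# word w in a Gibbs sampling step, boosting topics where a must-linked
# partner word already occurs. All parameter names are illustrative.

def topic_weights(w, doc_topic, topic_word, must_links, K,
                  alpha=0.1, beta=0.01, eta=2.0, vocab_size=1000):
    weights = []
    for k in range(K):
        # Standard collapsed-Gibbs-style term: document-topic count times
        # smoothed topic-word probability.
        p = (doc_topic[k] + alpha) * (topic_word[k][w] + beta) / (
            sum(topic_word[k].values()) + beta * vocab_size)
        # Boost topics in which a must-linked partner of w is present.
        for linked in must_links.get(w, []):
            if topic_word[k][linked] > 0:
                p *= eta
        weights.append(p)
    return weights
```

With a must-link ("battery", "life"), the topic already containing "life" receives a multiplicative boost, making "battery" more likely to join it.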
Mining topics in documents: standing on the shoulders of big data
- In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014
Cited by 2 (0 self)
ABSTRACT Topic modeling has been widely used to mine topics from documents. However, a key weakness of topic modeling is that it needs a large amount of data (e.g., thousands of documents) to provide reliable statistics for generating coherent topics. In practice, many document collections do not have that many documents. Given a small number of documents, the classic topic model LDA generates very poor topics. Even with a large volume of data, unsupervised learning of topic models can still produce unsatisfactory results. In recent years, knowledge-based topic models have been proposed, which ask human users to provide prior domain knowledge to guide the model to produce better topics. Our research takes a radically different approach. We propose to learn as humans do, i.e., retaining the results learned in the past and using them to help future learning. When faced with a new task, we first mine some reliable (prior) knowledge from past learning/modeling results and then use it to guide model inference to generate more coherent topics. This approach is possible because of the big data readily available on the Web. The proposed algorithm mines two forms of knowledge: must-links (meaning that two words should be in the same topic) and cannot-links (meaning that two words should not be in the same topic). It also deals with two problems of the automatically mined knowledge: wrong knowledge and knowledge transitivity. Experimental results using review documents from 100 product domains show that the proposed approach makes dramatic improvements over state-of-the-art baselines.
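The two knowledge forms described above can be sketched with a toy miner (an illustrative assumption, not the paper's algorithm; the thresholds `must_support` and `cannot_support` are invented for the example): word pairs that repeatedly co-occur in the top words of past topics become must-link candidates, while frequent words that never co-occur become cannot-link candidates.

```python
from itertools import combinations
from collections import Counter

# Illustrative sketch: mine candidate must-links and cannot-links from the
# top-word lists of topics learned in past domains.

def mine_links(past_topics, must_support=2, cannot_support=3):
    # past_topics: list of top-word lists, one per previously learned topic.
    pair_count = Counter()
    word_count = Counter()
    for top_words in past_topics:
        words = set(top_words)
        word_count.update(words)
        pair_count.update(combinations(sorted(words), 2))
    # Must-link: the pair co-occurred in enough past topics.
    must = {p for p, c in pair_count.items() if c >= must_support}
    # Cannot-link: both words are frequent, but they never co-occurred.
    cannot = {
        (a, b)
        for a, b in combinations(sorted(word_count), 2)
        if word_count[a] >= cannot_support and word_count[b] >= cannot_support
        and pair_count[(a, b)] == 0
    }
    return must, cannot
```

A real system would also have to handle the wrong-knowledge and transitivity problems the abstract mentions; this sketch omits both.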
Latent Aspect Mining via Exploring Sparsity and Intrinsic Information∗
We investigate the latent aspect mining problem, which aims at automatically discovering aspect information from a collection of review texts in a domain in an unsupervised manner. One goal is to discover a set of aspects which are previously unknown for the domain, and to predict the user's ratings on each aspect for each review. Another goal is to detect key terms for each aspect. Existing works on predicting aspect ratings fail to handle the aspect sparsity problem in review texts, leading to unreliable predictions. We propose a new generative model to tackle the latent aspect mining problem in an unsupervised manner. By considering the user and item side information of review texts, we introduce two latent variables, namely, user intrinsic aspect interest and item intrinsic aspect quality, facilitating better modeling of aspect generation and leading to improvements in the accuracy and reliability of predicted aspect ratings. Furthermore, we provide an analytical investigation of the Maximum A Posteriori (MAP) optimization problem used in our proposed model and develop a new block coordinate gradient descent algorithm to solve the optimization efficiently with closed-form updating formulas. We also analyze its convergence. Experimental results on two real-world product review corpora demonstrate that our proposed model outperforms existing state-of-the-art models.
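The block coordinate scheme with closed-form updates can be illustrated on a toy objective (the quadratic objective and parameter names here are assumptions for the sketch, not the paper's MAP problem): alternately minimize over each parameter block while holding the other fixed, with each block update available in closed form.

```python
# Toy block coordinate descent (illustrative assumption, not the paper's
# model). Objective: f(u, v) = (u + v - 3)^2 + lam*u^2 + lam*v^2.
# Setting the partial derivative of each block to zero gives a closed-form
# update: u = (3 - v) / (1 + lam), and symmetrically for v.

def block_coordinate_descent(iters=100, lam=0.1):
    u = v = 0.0
    for _ in range(iters):
        u = (3 - v) / (1 + lam)  # argmin over u with v fixed
        v = (3 - u) / (1 + lam)  # argmin over v with u fixed
    return u, v
```

For lam = 0.1 the iterates converge to the fixed point u = v = 10/7, where both partial derivatives vanish simultaneously.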
Clustering Aspect-related Phrases by Leveraging Sentiment Distribution Consistency
Clustering aspect-related phrases in terms of a product's properties is a precursor process to aspect-level sentiment analysis, which is a central task in sentiment analysis. Most existing methods for addressing this problem are context-based models which assume that domain synonymous phrases share similar co-occurrence contexts. In this paper, we explore a novel idea, sentiment distribution consistency, which states that different phrases (e.g. “price”, “money”, “worth”, and “cost”) of the same aspect tend to have consistent sentiment distributions. Through formalizing sentiment distribution consistency as a soft constraint, we propose a novel unsupervised model in the framework of Posterior Regularization (PR) to cluster aspect-related phrases. Experiments demonstrate that our approach outperforms baselines remarkably.
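The sentiment-distribution-consistency intuition can be sketched with a simple greedy clusterer (an illustration under assumptions: the threshold, the use of Jensen-Shannon divergence, and the greedy merging are all stand-ins, not the paper's Posterior Regularization model): phrases whose sentiment rating distributions are close end up in the same cluster.

```python
import math

# Illustrative sketch: cluster phrases by the similarity of their sentiment
# distributions, merging a phrase into a cluster when the Jensen-Shannon
# divergence to the cluster centroid is below a threshold.

def js_divergence(p, q):
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def cluster_phrases(dists, threshold=0.05):
    # dists: {phrase: sentiment distribution over ratings, summing to 1}
    clusters = []  # each cluster is a list of phrases
    for phrase, d in dists.items():
        for cl in clusters:
            centroid = [sum(dists[p][i] for p in cl) / len(cl)
                        for i in range(len(d))]
            if js_divergence(d, centroid) < threshold:
                cl.append(phrase)
                break
        else:
            clusters.append([phrase])
    return clusters
```

On toy data, "price" and "cost" (mostly negative ratings) land in one cluster while "screen" and "display" (mostly positive) land in another, mirroring the paper's example of same-aspect phrases sharing a sentiment distribution.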