Opinion Mining and Sentiment Analysis
Cited by 149 (3 self)
An important part of our information-gathering behavior has always been to find out what other people think. With the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, new opportunities and challenges arise as people now can, and do, actively use information technologies to seek out and understand the opinions of others. The sudden eruption of activity in the area of opinion mining and sentiment analysis, which deals with the computational treatment of opinion, sentiment, and subjectivity in text, has thus occurred at least in part as a direct response to the surge of interest in new systems that deal directly with opinions as a first-class object. This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems. Our focus is on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis. We include material on summarization of evaluative text and on broader issues regarding privacy, manipulation, and economic impact that the development of opinion-oriented information-access services gives rise to. To facilitate future work, a discussion of available resources, benchmark datasets, and evaluation campaigns is also provided.
Opinion spam and analysis
- In Proceedings of the International Conference on Web Search and Web Data Mining (WSDM), 2008
Cited by 33 (8 self)
Evaluative texts on the Web have become a valuable source of opinions on products, services, events, individuals, etc. Recently, many researchers have studied such opinion sources as product reviews, forum posts, and blogs. However, existing research has been focused on classification and summarization of opinions using natural language processing and data mining techniques. An important issue that has been neglected so far is opinion spam or trustworthiness of online opinions. In this paper, we study this issue in the context of product reviews, which are opinion rich and are widely used by consumers and product manufacturers. In the past two years, several startup companies also appeared which aggregate opinions from product reviews. It is thus high time to study spam in reviews. To the best of our knowledge, there is still no published study on this topic, although Web spam and email spam have been investigated extensively. We will see that opinion spam is quite different from Web spam and email spam, and thus requires different detection techniques. Based on the analysis of 5.8 million reviews and 2.14 million reviewers from amazon.com, we show that opinion spam in reviews is widespread. This paper analyzes such spam activities and presents some novel techniques to detect them.
Identifying comparative sentences in text documents
- In Proc. of the 29th SIGIR, 2006
Cited by 25 (2 self)
This paper studies the problem of identifying comparative sentences in text documents. The problem is related to but quite different from sentiment/opinion sentence identification or classification. Sentiment classification studies the problem of classifying a document or a sentence based on the subjective opinion of the author. An important application area of sentiment/opinion identification is business intelligence, as a product manufacturer always wants to know consumers’ opinions on its products. Comparisons, on the other hand, can be subjective or objective. Furthermore, a comparison is not concerned with an object in isolation. Instead, it compares the object with others. An example opinion sentence is “the sound quality of CD player X is poor”. An example comparative sentence is “the sound quality of CD player X is not as good as that of CD player Y”. Clearly, these two sentences give different information. Their language constructs are quite different too. Identifying comparative sentences is also useful in practice because direct comparisons are perhaps one of the most convincing ways of evaluation, which may even be more important than opinions on each individual object. This paper proposes to study the comparative sentence identification problem. It first categorizes comparative sentences into different types, and then presents a novel integrated pattern discovery and supervised learning approach to identifying comparative sentences from text documents. Experimental results using three types of documents (news articles, consumer reviews of products, and Internet forum postings) show a precision of 79% and recall of 81%. More detailed results are given in the paper.
Sentiment analysis and subjectivity
- Handbook of Natural Language Processing, Second Edition. Taylor and Francis Group, Boca, 2010
Cited by 17 (6 self)
Textual information in the world can be broadly categorized into two main types: facts and opinions. Facts are objective expressions about entities, events and their properties. Opinions are usually subjective expressions that describe people’s sentiments, appraisals or feelings toward entities, events and their properties. The concept of opinion is very broad. In this chapter, we only focus on opinion expressions that convey people’s positive or negative sentiments. Much of the existing research on textual information processing has been focused on mining and retrieval of factual information, e.g., information retrieval, Web search, text classification, text clustering and many other text mining and natural language processing tasks. Little work had been done on the processing of opinions until only recently. Yet, opinions are so important that whenever we need to make a decision we want to hear others’ opinions. This is not only true for individuals but also true for organizations. One of the main reasons for the lack of study on opinions is the fact that there was little opinionated text available before the World Wide Web. Before the Web, when an individual needed to make a decision, he/she typically asked for opinions from friends and families. When an organization wanted to find the opinions or sentiments of the general public about its products and services, it conducted opinion polls, surveys, and focus groups. However, with the Web, especially with the explosive growth of the user-generated
Cross-domain sentiment classification via spectral feature alignment
- In WWW, 2010
Cited by 15 (3 self)
Sentiment classification aims to automatically predict sentiment polarity (e.g., positive or negative) of users publishing sentiment data (e.g., reviews, blogs). Although traditional classification algorithms can be used to train sentiment classifiers from manually labeled text data, the labeling work can be time-consuming and expensive. Meanwhile, users often use some different words when they express sentiment in different domains. If we directly apply a classifier trained in one domain to other domains, the performance will be very low due to the differences between these domains. In this work, we develop a general solution to sentiment classification when we do not have any labels in a target domain but have some labeled data in a different domain, regarded as source domain. In this cross-domain sentiment classification setting, to bridge the gap between the domains, we propose a spectral feature
Top 10 algorithms in data mining
- 2007
Cited by 9 (0 self)
This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART. These top 10 algorithms are among the most influential data mining algorithms in the research community. With each algorithm, we provide a description of the algorithm, discuss the impact of the algorithm, and review current and further research on the algorithm. These 10 algorithms cover classification,
Flickr: Who is Looking
- In WI ’07: Proc. of the Intl. Conf. on Web Intelligence, 2007
Cited by 7 (0 self)
This article presents a characterization of user behavior on Flickr, a popular on-line photo sharing service that allows users to store, search, sort and share their photos. Based on a subset of photos uploaded during a 10-day window, we track the interest of users in those photos over a period of 50 days. In particular, we investigate the user behavior on temporal, social, and spatial dimensions. Results show that the users are able to discover new photos within hours after being uploaded and that 50% of the photo views are generated within the first two days. The social networking behavior of users and photo pooling are identified as the two major indicators related to a photo’s popularity. Finally, we show that the geographic distribution is more focussed around a geographic location for the infrequently viewed photos than for the photos that attract a large number of views.
Blogosphere: Research Issues, Tools, and Applications
Cited by 4 (0 self)
Weblogs, or Blogs, have enabled people to express their thoughts, voice their opinions, and share their experiences and ideas. Individuals experience a sense of community, a feeling of belonging, a bonding that members matter to one another and their niche needs will be met through online interactions. Its open standards and low barrier to publication have transformed information consumers into producers. This has created a plethora of open-source intelligence, or “collective wisdom”, that acts as the storehouse of overwhelming amounts of knowledge about the members, their environment and the symbiosis between them. Nonetheless, vast amounts of this knowledge still remain to be discovered and exploited in a suitable way. In this paper, we introduce various state-of-the-art research issues, review some key elements of research such as tools and methodologies in Blogosphere, and present a case study of identifying the influential bloggers in a community to exemplify the integration of some major aspects discussed in this paper. Towards the end, we also compare and contrast the blogosphere and social networks and the research therein.
Grouping Product Features Using Semi-Supervised Learning with Soft-Constraints
Cited by 4 (4 self)
In opinion mining of product reviews, one often wants to produce a summary of opinions based on product features/attributes. However, for the same feature, people can express it with different words and phrases. To produce a meaningful summary, these words and phrases, which are domain synonyms, need to be grouped under the same feature group. This paper proposes a constrained semi-supervised learning method to solve the problem. Experimental results using reviews from five different domains show that the proposed method is competent for the task. It outperforms the original EM and the state-of-the-art
Comparing Natural Language Identification Methods based on Markov Processes
- In: Slovko, International Seminar on Computer Treatment of Slavic and East European Languages, 2007
Cited by 3 (0 self)
We discover and experiment with categorization-based methods for natural language identification. Two approaches to language identification based on Markov processes are compared; both methods treat the incoming text on the character level. We performed a series of experiments with the aim of verifying the high precision of the selected methods in the language identification task and of comparing them against each other. Experimental evaluation was based on the large-scale Multilingual Reuters Corpus with various European and Slavic languages. Our research results showed that both methods are comparable in the task of natural language identification, achieving recall as high as 99.75%.
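The abstract does not specify the two Markov-based methods it compares, so the following is only a minimal sketch of the general idea of character-level Markov language identification, not the authors' implementation. It assumes a first-order (bigram) chain with add-one smoothing; the function names, the `VOCAB_SIZE` constant, and the tiny training sentences are all hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter

# Assumed smoothing constant: a rough upper bound on distinct characters.
VOCAB_SIZE = 256

def train(text):
    """Count character transitions (previous char -> next char) in a sample."""
    padded = " " + text.lower()
    transitions = Counter()
    contexts = Counter()
    for prev, nxt in zip(padded, padded[1:]):
        transitions[(prev, nxt)] += 1
        contexts[prev] += 1
    return transitions, contexts

def score(model, text):
    """Log-likelihood of the text under a model, with add-one smoothing."""
    transitions, contexts = model
    padded = " " + text.lower()
    total = 0.0
    for prev, nxt in zip(padded, padded[1:]):
        numerator = transitions[(prev, nxt)] + 1
        denominator = contexts[prev] + VOCAB_SIZE
        total += math.log(numerator / denominator)
    return total

def identify(models, text):
    """Return the language whose Markov model gives the text the highest score."""
    return max(models, key=lambda lang: score(models[lang], text))

# Toy training data (hypothetical; a real system would train on a large corpus).
MODELS = {
    "en": train("the quick brown fox jumps over the lazy dog and the cat sat on the mat"),
    "sk": train("rýchla hnedá líška skáče cez lenivého psa a mačka sedí na rohoži"),
}

if __name__ == "__main__":
    print(identify(MODELS, "the dog sat on the mat"))
    print(identify(MODELS, "mačka sedí na rohoži"))
```

Each language gets its own transition table, and an unknown text is assigned to the language whose table makes its character sequence most probable. Higher-order contexts and larger corpora would normally be needed to approach the recall figures reported in the abstract.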

