Results 1 - 10 of 44
Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis
- In Proceedings of HLT-EMNLP, 2005
"... This paper presents a new approach to phrase-level sentiment analysis that first determines whether an expression is neutral or polar and then disambiguates the polarity of the polar expressions. With this approach, the system is able to automatically identify the contextual polarity for a large sub ..."
Abstract
-
Cited by 454 (15 self)
This paper presents a new approach to phrase-level sentiment analysis that first determines whether an expression is neutral or polar and then disambiguates the polarity of the polar expressions. With this approach, the system is able to automatically identify the contextual polarity for a large subset of sentiment expressions, achieving results that are significantly better than baseline.
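The two-step design in this abstract, first filtering neutral from polar expressions and only then disambiguating polarity, can be pictured as two classifiers chained together. The toy phrases, bag-of-words features, and scikit-learn models below are assumptions made for illustration, not the authors' actual features or data.

```python
# Minimal sketch of a two-step contextual-polarity pipeline (illustrative only):
# step 1 decides neutral vs. polar, step 2 disambiguates the polarity of polar phrases.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Step 1: does the phrase express sentiment at all? (toy training data)
neutral_vs_polar = make_pipeline(CountVectorizer(), LogisticRegression())
neutral_vs_polar.fit(
    ["the report was released today", "a truly brilliant performance",
     "the meeting is scheduled for monday", "an utterly disappointing result"],
    ["neutral", "polar", "neutral", "polar"])

# Step 2: for phrases judged polar, positive vs. negative.
pos_vs_neg = make_pipeline(CountVectorizer(), LogisticRegression())
pos_vs_neg.fit(
    ["a truly brilliant performance", "an utterly disappointing result",
     "warm and generous support", "a hostile and bitter exchange"],
    ["positive", "negative", "positive", "negative"])

def contextual_polarity(phrase):
    if neutral_vs_polar.predict([phrase])[0] == "neutral":
        return "neutral"
    return pos_vs_neg.predict([phrase])[0]

print(contextual_polarity("a generous and brilliant report"))
```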
Sentiment classification of movie reviews using contextual valence shifters
- Computational Intelligence, 2006
"... We present two methods for determining the sentiment expressed by a movie review. The semantic orientation of a review can be positive, negative, or neutral. We examine the effect of valence shifters on classifying the reviews. We examine three types of valence shifters: negations, intensifiers, an ..."
Abstract
-
Cited by 114 (1 self)
We present two methods for determining the sentiment expressed by a movie review. The semantic orientation of a review can be positive, negative, or neutral. We examine the effect of valence shifters on classifying the reviews. We examine three types of valence shifters: negations, intensifiers, and diminishers. Negations are used to reverse the semantic polarity of a particular term, while intensifiers and diminishers are used to increase and decrease, respectively, the degree to which a term is positive or negative. The first method classifies reviews based on the number of positive and negative terms they contain. We use the General Inquirer to identify positive and negative terms, as well as negation terms, intensifiers, and diminishers. We also use positive and negative terms from other sources, including a dictionary of synonym differences and a very large Web corpus. To compute corpus-based semantic orientation values of terms, we use their association scores with a small group of positive and negative terms. We show that extending the term-counting method with contextual valence shifters improves the accuracy of the classification. The second method uses a Machine Learning algorithm, Support Vector Machines. We start with unigram features and then add bigrams that consist of a valence shifter and another word. The accuracy of classification is very high, and the valence shifter bigrams slightly improve it. The features that contribute to the high accuracy are the words in the lists of positive and negative terms. Previous work focused on either the term-counting method or the Machine Learning method. We show that combining the two methods achieves better results than either method alone.
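The term-counting method with valence shifters lends itself to a short sketch. The word lists below are tiny hypothetical stand-ins for the General Inquirer and corpus-derived lexicons the paper uses, and the weighting scheme is one plausible reading of the description rather than the authors' exact formula.

```python
# Toy term-counting with valence shifters: negations flip polarity, intensifiers and
# diminishers scale it. Lexicons here are illustrative stand-ins, not the General Inquirer.
POSITIVE = {"good", "great", "enjoyable", "brilliant"}
NEGATIVE = {"bad", "boring", "awful", "dull"}
NEGATIONS = {"not", "never", "no"}
INTENSIFIERS = {"very", "extremely"}
DIMINISHERS = {"slightly", "somewhat"}

def review_score(tokens):
    score, weight, flip = 0.0, 1.0, 1
    for tok in tokens:
        t = tok.lower()
        if t in NEGATIONS:
            flip = -1
        elif t in INTENSIFIERS:
            weight = 2.0
        elif t in DIMINISHERS:
            weight = 0.5
        elif t in POSITIVE or t in NEGATIVE:
            score += flip * weight * (1 if t in POSITIVE else -1)
            flip, weight = 1, 1.0  # shifters apply only to the next polar term
    return score  # > 0 suggests a positive review, < 0 a negative one

print(review_score("not a very enjoyable film , somewhat dull".split()))  # -2.5
```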
Sentiment analysis in multiple languages
- ACM Transactions on Information Systems, 2008
"... The Internet is frequently used as a medium for exchange of information and opinions, as well as propaganda dissemination. In this study the use of sentiment analysis methodologies is proposed for classification of web forum opinions in multiple languages. The utility of stylistic and syntactic feat ..."
Abstract
-
Cited by 99 (6 self)
The Internet is frequently used as a medium for exchange of information and opinions, as well as propaganda dissemination. In this study the use of sentiment analysis methodologies is proposed for classification of web forum opinions in multiple languages. The utility of stylistic and syntactic features is evaluated for sentiment classification of English and Arabic content. Specific feature extraction components are integrated to account for the linguistic characteristics of Arabic. The Entropy Weighted Genetic Algorithm (EWGA) is also developed, which is a hybridized genetic algorithm that incorporates the information gain heuristic for feature selection. EWGA is designed to improve performance and get a better assessment of the key features. The proposed features and techniques are evaluated on a benchmark movie review data set and U.S. and Middle Eastern web forum postings. The experimental results using EWGA with SVM indicate high performance levels, with accuracy over 95% on the benchmark data set and over 93% for both the U.S. and Middle Eastern forums. Stylistic features significantly enhanced performance across all test beds while EWGA also outperformed other feature selection methods, indicating the utility of these features and techniques for document level classification of sentiments.
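EWGA itself is a genetic search, but the information gain heuristic it incorporates can be sketched on its own. The hand-rolled scoring and toy "forum posts" below are illustrative assumptions; the genetic-algorithm wrapper that EWGA adds around this ranking is omitted.

```python
# Information-gain scoring of candidate features (the heuristic EWGA hybridizes with a
# genetic search). Documents and labels are toy stand-ins for forum postings.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(docs, labels, term):
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    remainder = sum(len(p) / n * entropy(p) for p in (with_term, without) if p)
    return entropy(labels) - remainder

docs = ["great thoughtful post".split(), "hateful misleading nonsense".split(),
        "thoughtful balanced reply".split(), "misleading propaganda".split()]
labels = ["pos", "neg", "pos", "neg"]

vocab = sorted({t for d in docs for t in d})
ranked = sorted(vocab, key=lambda t: -information_gain(docs, labels, t))
print(ranked[:3])  # high-IG terms a genetic search could be seeded with
```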
Recognizing Contextual Polarity: An Exploration of Features for Phrase-Level Sentiment Analysis
- Computational Linguistics, 2009
"... Many approaches to automatic sentiment analysis begin with a large lexicon of words marked with their prior polarity (also called semantic orientation). However, the contextual polarity of the phrase in which a particular instance of a word appears may be quite different from the word’s prior polari ..."
Abstract
-
Cited by 95 (2 self)
Many approaches to automatic sentiment analysis begin with a large lexicon of words marked with their prior polarity (also called semantic orientation). However, the contextual polarity of the phrase in which a particular instance of a word appears may be quite different from the word’s prior polarity. Positive words are used in phrases expressing negative sentiments, or vice versa. Also, quite often words that are positive or negative out of context are neutral in context, meaning they are not even being used to express a sentiment. The goal of this work is to automatically distinguish between prior and contextual polarity, with a focus on understanding which features are important for this task. Because an important aspect of the problem is identifying when polar terms are being used in neutral contexts, features for distinguishing between neutral and polar instances are evaluated, as well as features for distinguishing between positive and negative contextual polarity. The evaluation includes assessing the performance of features across multiple machine learning algorithms. For all learning algorithms except one, the combination of all features together gives the best performance. Another facet of the evaluation considers how the presence of neutral instances affects the performance of features for distinguishing between positive and negative polarity. These experiments show that the presence of neutral instances greatly degrades the performance of these features, and that perhaps the best way to improve performance across all polarity classes is to improve the system’s ability to identify when an instance is neutral.
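One way to picture the cross-algorithm evaluation described here is a loop that scores the same feature representation with several learners under cross-validation. The phrases, labels, and choice of scikit-learn classifiers below are placeholders, not the paper's feature set or its actual learning algorithms.

```python
# Sketch: score one feature representation with several learners via cross-validation,
# over three contextual-polarity classes (positive / negative / neutral). Toy data only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

phrases = ["a brilliant result", "a disappointing result", "the report was filed",
           "generous support", "a bitter exchange", "the meeting on monday"] * 5
labels = ["positive", "negative", "neutral"] * 10  # aligns with the repeated phrase pattern
X = CountVectorizer().fit_transform(phrases)

for clf in (MultinomialNB(), LogisticRegression(max_iter=1000), LinearSVC()):
    acc = cross_val_score(clf, X, labels, cv=5).mean()
    print(type(clf).__name__, round(acc, 3))
```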
Which side are you on? Identifying perspectives at the document and sentence levels
- In Proceedings of the Tenth Conference on Computational Natural Language Learning, 2006
"... In this paper we investigate a new problem of identifying the perspective from which a document is written. By perspective we mean a point of view, for example, from the perspective of Democrats or Republicans. Can computers learn to identify the perspective of a document? Not every sentence is writ ..."
Abstract
-
Cited by 93 (6 self)
In this paper we investigate a new problem of identifying the perspective from which a document is written. By perspective we mean a point of view, for example, from the perspective of Democrats or Republicans. Can computers learn to identify the perspective of a document? Not every sentence is written strongly from a perspective. Can computers learn to identify which sentences strongly convey a particular perspective? We develop statistical models to capture how perspectives are expressed at the document and sentence levels, and evaluate the proposed models on articles about the Israeli-Palestinian conflict. The results show that the proposed models successfully learn how perspectives are reflected in word usage and can identify the perspective of a document with high accuracy.
Recognizing strong and weak opinion clauses
- Computational Intelligence, 2006
"... There has been a recent swell of interest in the automatic identification and extraction of opinions and emotions in text. In this paper, we present the first experimental results classifying the intensity of opinions and other types of subjectivity and classifying the subjectivity of deeply nested ..."
Abstract
-
Cited by 37 (0 self)
There has been a recent swell of interest in the automatic identification and extraction of opinions and emotions in text. In this paper, we present the first experimental results classifying the intensity of opinions and other types of subjectivity and classifying the subjectivity of deeply nested clauses. We use a wide range of features, including new syntactic features developed for opinion recognition. We vary the learning algorithm and the feature organization to explore the effect this has on the classification task. In 10-fold cross-validation experiments using support vector regression, we achieve improvements in mean-squared error over baseline ranging from 49% to 51%. Using boosting, we achieve improvements in accuracy ranging from 23% to 96%.
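The intensity-classification setup can be pictured as a regression from clause features to an ordinal intensity score. The sketch below uses TF-IDF features and scikit-learn's support vector regression with a made-up 0-3 intensity scale; the paper's syntactic features and exact experimental setup are not reproduced.

```python
# Sketch of opinion-intensity scoring with support vector regression
# (illustrative features, scale, and clauses).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVR

clauses = ["the senate met on tuesday", "it seems like a reasonable plan",
           "this is a truly outrageous decision", "an absolutely catastrophic failure"] * 3
intensity = [0.0, 1.0, 2.0, 3.0] * 3  # 0 = no opinion ... 3 = high-intensity opinion

model = make_pipeline(TfidfVectorizer(), SVR())
model.fit(clauses, intensity)
print(model.predict(["a truly catastrophic plan"]))
```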
Interactive feature selection
- In Proceedings of IJCAI, 2005
"... We execute a careful study of the effects of feature selection and human feedback on features in active learn-ing settings. Our experiments on a variety of text categorization tasks indicate that there is significant potential in improving classifier performance by feature reweighting, beyond that a ..."
Abstract
-
Cited by 35 (7 self)
We execute a careful study of the effects of feature selection and human feedback on features in active learning settings. Our experiments on a variety of text categorization tasks indicate that there is significant potential in improving classifier performance by feature reweighting, beyond that achieved via selective sampling alone (standard active learning) if we have access to an oracle that can point to the important (most predictive) features. Consistent with previous findings, we find that feature selection based on the labeled training set has little effect. But our experiments on human subjects indicate that human feedback on feature relevance can identify a sufficient proportion (65%) of the most relevant features. Furthermore, these experiments show that feature labeling takes much less (about 1/5th) time than document labeling. We propose an algorithm that interleaves labeling features and documents which significantly accelerates active learning. Feature feedback can complement traditional active learning in applications like filtering, personalization, and recommendation.
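The interleaving idea, alternating queries for document labels with queries about feature relevance, can be sketched with a simulated oracle. The synthetic data, the uncertainty-sampling rule, and the x5 reweighting of confirmed features are all assumptions made for illustration, not the published algorithm's details.

```python
# Sketch: interleave uncertainty-sampling document queries with feature-relevance queries.
# The oracle is simulated; reweighting confirmed features by x5 is an illustrative choice.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 20)).astype(float)
y = (X[:, :3].sum(axis=1) >= 2).astype(int)  # only the first 3 features matter

labeled = [int(np.where(y == 0)[0][0]), int(np.where(y == 0)[0][1]),
           int(np.where(y == 1)[0][0]), int(np.where(y == 1)[0][1])]
pool = [i for i in range(200) if i not in labeled]
confirmed = set()

for step in range(10):
    Xw = X.copy()
    Xw[:, list(confirmed)] *= 5.0
    clf = LogisticRegression(max_iter=1000).fit(Xw[labeled], y[labeled])
    if step % 2 == 0:  # even step: ask for the label of the most uncertain document
        probs = clf.predict_proba(Xw[pool])[:, 1]
        idx = pool[int(np.argmin(np.abs(probs - 0.5)))]
        pool.remove(idx)
        labeled.append(idx)
    else:              # odd step: ask the oracle about the highest-weight feature
        top = int(np.argmax(np.abs(clf.coef_[0])))
        if top < 3:    # simulated oracle confirms truly predictive features
            confirmed.add(top)

print("documents labeled:", len(labeled), "features confirmed:", sorted(confirmed))
```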
Active learning with feedback on both features and instances
- Journal of Machine Learning Research, 2006
"... We extend the traditional active learning framework to include feedback on features in addition to labeling instances, and we execute a careful study of the effects of feature selection and human feedback on features in the setting of text categorization. Our experiments on a variety of categorizati ..."
Abstract
-
Cited by 30 (0 self)
We extend the traditional active learning framework to include feedback on features in addition to labeling instances, and we execute a careful study of the effects of feature selection and human feedback on features in the setting of text categorization. Our experiments on a variety of categorization tasks indicate that there is significant potential in improving classifier performance by feature re-weighting, beyond that achieved via membership queries alone (traditional active learning) if we have access to an oracle that can point to the important (most predictive) features. Our experiments on human subjects indicate that human feedback on feature relevance can identify a sufficient proportion of the most relevant features (over 50% in our experiments). We find that on average, labeling a feature takes much less time than labeling a document. We devise an algorithm that interleaves labeling features and documents which significantly accelerates standard active learning in our simulation experiments. Feature feedback can complement traditional active learning in applications such as news filtering, e-mail classification, and personalization, where the human teacher can have significant knowledge on the relevance of features.
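The headline effect, that re-weighting oracle-identified features helps beyond instance labels alone, can be illustrated with a small synthetic comparison. The data, the choice of logistic regression, and the x10 scaling factor are assumptions for the sketch, not the paper's experimental protocol.

```python
# Sketch: compare a plain classifier with one whose oracle-flagged feature columns are
# scaled up before training. Synthetic data; the x10 factor is an illustrative choice.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 50))
y = (X[:, :5].sum(axis=1) > 0).astype(int)  # only the first 5 features are predictive
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=20, random_state=1, stratify=y)

plain = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

X_tr_rw, X_te_rw = X_tr.copy(), X_te.copy()
X_tr_rw[:, :5] *= 10.0  # the "oracle" flags these features as relevant
X_te_rw[:, :5] *= 10.0
reweighted = LogisticRegression(max_iter=1000).fit(X_tr_rw, y_tr)

print("plain:", plain.score(X_te, y_te), " reweighted:", reweighted.score(X_te_rw, y_te))
```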
Sentiment mining in WebFountain
- In Proceedings of the 21st International Conference on Data Engineering (ICDE ’05), 2005
"... WebFountain is a platform for very large-scale text an-alytics applications that allows uniform access to a wide variety of sources. It enables the deployment of a variety of document-level and corpus-level miners in a scalable man-ner, and feeds information that drives end-user applications through ..."
Abstract
-
Cited by 27 (0 self)
WebFountain is a platform for very large-scale text analytics applications that allows uniform access to a wide variety of sources. It enables the deployment of a variety of document-level and corpus-level miners in a scalable manner, and feeds information that drives end-user applications through a set of hosted Web services. Sentiment (or opinion) mining is one of the most useful analyses for various end-user applications, such as reputation management. Instead of classifying the sentiment of an entire document about a subject, our sentiment miner determines the sentiment of each subject reference using natural language processing techniques. In this paper, we describe the fully functional system environment and the algorithms, and report the performance of the sentiment miner. The performance of the algorithms was verified on online product review articles, and more general documents including Web pages and news articles.
Using “annotator rationales” to improve machine learning for text categorization
- In Proceedings of NAACL HLT, 2007
"... We propose a new framework for supervised machine learning. Our goal is to learn from smaller amounts of supervised training data, by collecting a richer kind of training data: annotations with “rationales.” When annotating an example, the human teacher will also highlight evidence supporting this a ..."
Abstract
-
Cited by 25 (3 self)
We propose a new framework for supervised machine learning. Our goal is to learn from smaller amounts of supervised training data, by collecting a richer kind of training data: annotations with “rationales.” When annotating an example, the human teacher will also highlight evidence supporting this annotation, thereby teaching the machine learner why the example belongs to the category. We provide some rationale-annotated data and present a learning method that exploits the rationales during training to boost performance significantly on a sample task, namely sentiment classification of movie reviews. We hypothesize that in some situations, providing rationales is a more fruitful use of an annotator’s time than annotating more examples.
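In the paper, the rationales feed contrast constraints into the learner: the original example should be classified more confidently than a copy with its rationale text removed. The sketch below only shows the data-construction side, building such a contrast example from hypothetical highlighted substrings; the constrained learner itself is not shown.

```python
# Build a contrast example by deleting the annotator-highlighted rationale text.
# The review and the highlighted substrings are hypothetical.
def contrast_example(text, rationales):
    """Delete each rationale substring; the learner should be less confident on the result."""
    for r in rationales:
        text = text.replace(r, " ")
    return " ".join(text.split())

review = "The plot drags, but the acting is superb and the score is hauntingly beautiful."
rationales = ["the acting is superb", "the score is hauntingly beautiful"]
print(contrast_example(review, rationales))  # -> "The plot drags, but and ."
```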