Semi-Supervised Learning Literature Survey
2006
Cited by 268 (7 self)
Abstract
We review the literature on semi-supervised learning, which is an area in machine learning and, more generally, artificial intelligence. There has been a whole spectrum of interesting ideas on how to learn from both labeled and unlabeled data, i.e. semi-supervised learning. This document is a chapter excerpt from the author’s doctoral thesis (Zhu, 2005). However, the author plans to update the online version frequently to incorporate the latest developments in the field. Please obtain the latest version at http://www.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf
Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis
In Proceedings of HLT-EMNLP, 2005
Cited by 129 (7 self)
Abstract
This paper presents a new approach to phrase-level sentiment analysis that first determines whether an expression is neutral or polar and then disambiguates the polarity of the polar expressions. With this approach, the system is able to automatically identify the contextual polarity for a large subset of sentiment expressions, achieving results that are significantly better than baseline.
Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales
In Proc. 43rd ACL, 2005
Cited by 115 (1 self)
Abstract
We address the rating-inference problem, wherein rather than simply decide whether a review is “thumbs up” or “thumbs down”, as in previous sentiment analysis work, one must determine an author’s evaluation with respect to a multi-point scale (e.g., one to five “stars”). This task represents an interesting twist on standard multi-class text categorization because there are several different degrees of similarity between class labels; for example, “three stars” is intuitively closer to “four stars” than to “one star”. We first evaluate human performance at the task. Then, we apply a meta-algorithm, based on a metric labeling formulation of the problem, that alters a given n-ary classifier’s output in an explicit attempt to ensure that similar items receive similar labels. We show that the meta-algorithm can provide significant improvements over both multi-class and regression versions of SVMs when we employ a novel similarity measure appropriate to the problem.
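The abstract's central idea is that similar items should receive similar labels. A small metric-labeling-style objective can illustrate it. This sketch is an illustration under stated assumptions, not the paper's meta-algorithm. Each item pays its base classifier's cost for a label. It also pays a penalty, weighted by a hypothetical similarity score and a hypothetical parameter `alpha`, proportional to the distance in stars between its label and the labels of other items. The objective is minimized with a simple greedy coordinate-descent pass (iterated conditional modes), an optimizer swapped in for brevity.

```python
from typing import List, Sequence


def metric_label(
    costs: Sequence[Sequence[float]],  # costs[i][l]: base-classifier cost of giving item i label l
    sim: Sequence[Sequence[float]],    # sim[i][j]: hypothetical similarity of items i and j (0 = unrelated)
    alpha: float = 1.0,                # weight of the "similar items, similar labels" term (assumed)
    iters: int = 10,
) -> List[int]:
    n = len(costs)
    num_labels = len(costs[0])
    # Start from the base classifier's own cheapest label for each item.
    labels = [min(range(num_labels), key=lambda l: costs[i][l]) for i in range(n)]
    for _ in range(iters):
        changed = False
        for i in range(n):
            def total(l: int) -> float:
                # Own cost plus |l - label_j| (distance in stars), weighted by similarity to item j.
                penalty = sum(sim[i][j] * abs(l - labels[j]) for j in range(n) if j != i)
                return costs[i][l] + alpha * penalty

            best = min(range(num_labels), key=total)
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:  # local optimum: no single-item change lowers the objective
            break
    return labels
```

With `alpha=0.0` the function returns the base classifier's own choices. Raising `alpha` pulls similar items toward nearby star ratings.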
PageRank without hyperlinks: structural re-ranking using links induced by language models
In Proceedings of SIGIR, 2005
Cited by 66 (10 self)
Abstract
Inspired by the PageRank and HITS (hubs and authorities) algorithms for Web search, we propose a structural re-ranking approach to ad hoc information retrieval: we reorder the documents in an initially retrieved set by exploiting asymmetric relationships between them. Specifically, we consider generation links, which indicate that the language model induced from one document assigns high probability to the text of another; in doing so, we take care to prevent bias against long documents. We study a number of re-ranking criteria based on measures of centrality in the graphs formed by generation links, and show that integrating centrality into standard language-model-based retrieval is quite effective at improving precision at top ranks.
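Here is a minimal sketch of the centrality idea, assuming the generation scores are already computed. The hypothetical matrix `gen` holds, for each pair of documents, how well one document's language model generates the other's text. Each score is treated as a weighted link toward the generating document. A PageRank-style power iteration with a damping factor (0.85 here, an assumed value) then gives a centrality score per document.

Multiplying centrality by the initial retrieval score is one simple, assumed way to fold centrality back into the ranking. The paper studies several criteria, and this does not reproduce any particular one. The sketch also omits the correction against long-document bias that the abstract mentions.

```python
import numpy as np


def generation_pagerank(gen: np.ndarray, damping: float = 0.85,
                        tol: float = 1e-10, max_iter: int = 1000) -> np.ndarray:
    """PageRank-style centrality over a weighted 'generation' graph.

    gen[i, j] >= 0 is a hypothetical score for how well document i's language
    model generates document j's text. It is treated as an edge j -> i, so
    documents whose models generate many others accumulate centrality.
    """
    w = np.asarray(gen, dtype=float).T.copy()  # w[j, i]: weight of edge j -> i
    n = w.shape[0]
    np.fill_diagonal(w, 0.0)                   # ignore self-generation
    out = w.sum(axis=1, keepdims=True)
    # Row-normalize out-links; a node with no out-links jumps uniformly.
    trans = np.where(out > 0, w / np.where(out > 0, out, 1.0), 1.0 / n)
    score = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new = (1 - damping) / n + damping * (score @ trans)
        if np.abs(new - score).sum() < tol:
            return new
        score = new
    return score


def rerank(initial_scores: np.ndarray, gen: np.ndarray, damping: float = 0.85) -> np.ndarray:
    """Document indices ordered by initial score times centrality (an assumed combination)."""
    combined = np.asarray(initial_scores, dtype=float) * generation_pagerank(gen, damping)
    return np.argsort(-combined)
```

Because every row of the transition matrix sums to one, the scores remain a probability distribution at each iteration.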
Get out the vote: Determining support or opposition from Congressional floor-debate transcripts
In Proceedings of EMNLP, 2006
Cited by 56 (2 self)
Abstract
We investigate whether one can determine from the transcripts of U.S. Congressional floor debates whether the speeches represent support of or opposition to proposed legislation. To address this problem, we exploit the fact that these speeches occur as part of a discussion; this allows us to use sources of information regarding relationships between discourse segments, such as whether a given utterance indicates agreement with the opinion expressed by another. We find that the incorporation of such information yields substantial improvements over classifying speeches in isolation.
Topic sentiment mixture: modeling facets and opinions in weblogs
In Proc. of the 16th Int. Conference on World Wide Web, 2007
Cited by 48 (7 self)
Abstract
In this paper, we define the problem of topic-sentiment analysis on Weblogs and propose a novel probabilistic model to capture the mixture of topics and sentiments simultaneously. The proposed Topic-Sentiment Mixture (TSM) model can reveal the latent topical facets in a Weblog collection, the subtopics in the results of an ad hoc query, and their associated sentiments. It could also provide general sentiment models that are applicable to any ad hoc topic. With a specifically designed HMM structure, the sentiment models and topic models estimated with TSM can be utilized to extract topic life cycles and sentiment dynamics. Empirical experiments on different Weblog datasets show that this approach is effective for modeling the topic facets and sentiments and extracting their dynamics from Weblog collections. The TSM model is quite general; it can be applied to any text collection with a mixture of topics and sentiments, and thus has many potential applications, such as search result summarization, opinion tracking, and user behavior prediction.
Identifying Sources of Opinions with Conditional Random Fields and Extraction Patterns
In HLT/EMNLP, 2005
Cited by 42 (11 self)
Abstract
Recent systems have been developed for sentiment classification, opinion recognition, and opinion analysis (e.g., detecting polarity and strength). We pursue another aspect of opinion analysis: identifying the sources of opinions, emotions, and sentiments. We view this problem as an information extraction task and adopt a hybrid approach that combines Conditional Random Fields (Lafferty et al., 2001) and a variation of AutoSlog (Riloff, 1996a). While CRFs model source identification as a sequence tagging task, AutoSlog learns extraction patterns. Our results show that the combination of these two methods performs better than either one alone. The resulting system identifies opinion sources with 79.3% precision and 59.5% recall using a head noun matching measure, and 81.2% precision and 60.6% recall using an overlap measure.
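The abstract reports precision and recall separately. The worked check below assumes the usual balanced F-measure, which the abstract itself does not report, and combines the two reported operating points:

```latex
F_1 = \frac{2PR}{P + R}
\quad\Rightarrow\quad
F_1^{\text{head}} = \frac{2(0.793)(0.595)}{0.793 + 0.595} \approx 0.680,
\qquad
F_1^{\text{overlap}} = \frac{2(0.812)(0.606)}{0.812 + 0.606} \approx 0.694
```

Under either measure, precision is roughly 20 points higher than recall, so the balanced score sits closer to recall.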
Structured Models for Fine-to-Coarse Sentiment Analysis
In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, 2007
Cited by 41 (6 self)
Abstract
In this paper we investigate a structured model for jointly classifying the sentiment of text at varying levels of granularity. Inference in the model is based on standard sequence classification techniques using constrained Viterbi to ensure consistent solutions. The primary advantage of such a model is that it allows classification decisions from one level in the text to influence decisions at another. Experiments show that this method can significantly reduce classification error relative to models trained in isolation.
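The abstract names constrained Viterbi as the inference step. Below is a minimal, generic sketch of that technique: standard log-space Viterbi decoding over a label sequence, plus a hypothetical `allowed` mask that forbids particular labels at particular positions so that only consistent labelings can be returned. The emission and transition scores, and the form of the constraint, are assumptions. The paper's model ties document-level and sentence-level labels together, which this flat sketch does not reproduce.

```python
from typing import List, Optional, Sequence

NEG_INF = float("-inf")


def constrained_viterbi(
    emit: Sequence[Sequence[float]],   # emit[t][s]: log-score of label s at position t
    trans: Sequence[Sequence[float]],  # trans[p][s]: log-score of moving from label p to label s
    allowed: Optional[Sequence[Sequence[bool]]] = None,  # allowed[t][s] False forbids s at t (hypothetical)
) -> List[int]:
    T, S = len(emit), len(emit[0])

    def ok(t: int, s: int) -> bool:
        return allowed is None or allowed[t][s]

    # best[t][s]: best log-score of any permitted label sequence ending in s at position t.
    best = [[NEG_INF] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    for s in range(S):
        if ok(0, s):
            best[0][s] = emit[0][s]
    for t in range(1, T):
        for s in range(S):
            if not ok(t, s):
                continue  # forbidden label keeps score -inf, so no path can use it
            p = max(range(S), key=lambda q: best[t - 1][q] + trans[q][s])
            best[t][s] = best[t - 1][p] + trans[p][s] + emit[t][s]
            back[t][s] = p
    last = max(range(S), key=lambda s: best[T - 1][s])
    if best[T - 1][last] == NEG_INF:
        raise ValueError("constraints admit no valid labeling")
    # Follow back-pointers from the best final label.
    path = [last]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]
```

Forcing a label at one position can change the best labels at other positions. This is the sense in which a decision in one part of the sequence influences decisions elsewhere.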
Pulse: Mining Customer Opinions from Free Text
In Proc. of the 6th International Symposium on Intelligent Data Analysis, 2005
Cited by 40 (0 self)
Abstract
We present a prototype system, code-named Pulse, for mining topics and sentiment orientation jointly from free text customer feedback. We describe the application of the prototype system to a database of car reviews. Pulse enables the exploration of large quantities of customer free text. The user can examine customer opinion “at a glance” or explore the data at a finer level of detail. We describe a simple but effective technique for clustering sentences, the application of a bootstrapping approach to sentiment classification, and a novel user interface.
Using random walks for question-focused sentence retrieval
In Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), 2005
Cited by 39 (2 self)
Abstract
We consider the problem of question-focused sentence retrieval from complex news articles describing multi-event stories published over time. Annotators generated a list of questions central to understanding each story in our corpus. Because of the dynamic nature of the stories, many questions are time-sensitive (e.g. “How many victims have been found?”). Judges found sentences providing an answer to each question. To address the sentence retrieval problem, we apply a stochastic, graph-based method for comparing the relative importance of the textual units, which was previously used successfully for generic summarization. Here, we present a topic-sensitive version of our method and hypothesize that it can outperform a competitive baseline, which compares the similarity of each sentence to the input question via IDF-weighted word overlap. In our experiments, the method achieves a TRDR score that is significantly higher than that of the baseline.
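The abstract describes a topic-sensitive version of a graph-based random-walk method. Below is a minimal sketch of one common form of such a question-biased walk, assuming precomputed scores. The hypothetical inputs are a relevance score of each sentence to the question and a sentence-to-sentence similarity matrix. At each step the walker does one of two things:

- With probability `bias` (0.7 here, an assumed value), it jumps to a sentence in proportion to its question relevance.
- Otherwise, it moves to a sentence in proportion to its similarity to the current one.

The stationary distribution ranks the sentences. How relevance and similarity are computed (for example, the IDF-weighted word overlap the abstract mentions for the baseline) is left to the caller.

```python
import numpy as np


def biased_random_walk(relevance: np.ndarray, sim: np.ndarray, bias: float = 0.7,
                       tol: float = 1e-10, max_iter: int = 1000) -> np.ndarray:
    """Stationary distribution of a question-biased random walk over sentences.

    relevance[i] >= 0: hypothetical relevance of sentence i to the question.
    sim[i, j] >= 0: hypothetical similarity between sentences i and j.
    """
    rel = np.asarray(relevance, dtype=float)
    if rel.sum() <= 0:
        raise ValueError("at least one sentence must have positive relevance")
    jump = rel / rel.sum()  # question-biased jump distribution
    w = np.asarray(sim, dtype=float)
    row = w.sum(axis=1, keepdims=True)
    # Row-normalize similarities; a sentence with no neighbours falls back to the jump distribution.
    trans = np.where(row > 0, w / np.where(row > 0, row, 1.0), jump)
    p = np.full(len(rel), 1.0 / len(rel))
    for _ in range(max_iter):
        new = bias * jump + (1 - bias) * (p @ trans)
        if np.abs(new - p).sum() < tol:
            return new
        p = new
    return p
```

For example, take relevance [1, 0, 0], with sentence 1 similar only to sentence 0 and sentence 2 isolated. The walk gives sentence 1 a nonzero score (about 0.23) even though it has no direct relevance to the question, while sentence 2 gets none.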

