Opinion observer: analyzing and comparing opinions on the web
- In WWW2005: the 14th international conference on World Wide Web, 2005
Cited by 277 (12 self)
The Web has become an excellent source for gathering consumer opinions. There are now numerous Web sites containing such opinions, e.g., customer reviews of products, forums, discussion groups, and blogs. This paper focuses on online customer reviews of products. It makes two contributions. First, it proposes a novel framework for analyzing and comparing consumer opinions of competing products. A prototype system called Opinion Observer is also implemented. With a single glance at its visualization, the user can clearly see the strengths and weaknesses of each product in the minds of consumers in terms of various product features. This comparison is useful to both potential customers and product manufacturers. A potential customer can see a visual side-by-side and feature-by-feature comparison of consumer opinions on these products, which helps him/her decide which product to buy. For a product manufacturer, the comparison makes it easy to gather marketing intelligence and product benchmarking information. Second, a new technique based on language pattern mining is proposed to extract product features from Pros and Cons in a particular type of review. Such features form the basis for the above comparison. Experimental results show that the technique is highly effective and outperforms existing methods significantly.
Mining Opinion Features in Customer Reviews
- In Proceedings of the Nineteenth National Conference on Artificial Intelligence (AAAI)
, 2004
Cited by 192 (3 self)
It is a common practice that merchants selling products on the Web ask their customers to review the products and associated services. As e-commerce is becoming more and more popular, the number of customer reviews that a product receives grows rapidly. For a popular product, the number of reviews can be in the hundreds. This makes it difficult for a potential customer to read them in order to make a decision on whether to buy the product. In this project, we aim to summarize all the customer reviews of a product. This summarization task is different from traditional text summarization because we are only interested in the specific features of the product that customers have opinions on and also whether the opinions are positive or negative. We do not summarize the reviews by selecting or rewriting a subset of the original sentences from the reviews to capture their main points as in classic text summarization. In this paper, we only focus on mining opinion/product features that the reviewers have commented on. A number of techniques are presented to mine such features. Our experimental results show that these techniques are highly effective.
Espresso: Leveraging generic patterns for automatically harvesting semantic relations
, 2006
Cited by 183 (2 self)
In this paper, we present Espresso, a weakly-supervised, general-purpose, and accurate algorithm for harvesting semantic relations. The main contributions are: i) a method for exploiting generic patterns by filtering incorrect instances using the Web; and ii) a principled measure of pattern and instance reliability enabling the filtering algorithm. We present an empirical comparison of Espresso with various state-of-the-art systems, on corpora of different sizes and genres, on extracting various general and specific relations. Experimental results show that our exploitation of generic patterns substantially increases system recall with small effect on overall precision.
The Rhetorical Parsing, Summarization, and Generation of Natural Language Texts
, 1997
Cited by 139 (9 self)
This thesis is an inquiry into the nature of the high-level, rhetorical structure of unrestricted natural language texts, computational means to enable its derivation, and two applications (in automatic summarization and natural language generation) that follow from the ability to build such structures automatically. The thesis proposes a first-order formalization of the high-level, rhetorical structure of text. The formalization assumes that text can be sequenced into elementary units; that discourse relations hold between textual units of various sizes; that some textual units are more important to the writer's purpose than others; and that trees are a good approximation of the abstract structure of text. The formalization also introduces a linguistically motivated compositionality criterion, which is shown to hold for the text structures that are valid. The thesis proposes, analyzes theoretically, and compares empirically four algorithms for determining the valid text structures of ...
Automatic Recognition of Multi-Word Terms: the C-value/NC-value Method
- INTERNATIONAL JOURNAL ON DIGITAL LIBRARIES
, 2000
Cited by 132 (12 self)
Technical terms (henceforth called terms) are important elements for digital libraries. In this paper we present a domain-independent method for the automatic extraction of multi-word terms from machine-readable special language corpora. The method (C-value/NC-value) combines linguistic and statistical information. The first part, C-value, enhances the common statistical measure of frequency of occurrence for term extraction, making it sensitive to a particular type of multi-word terms, the nested terms. The second part, NC-value, gives: 1) a method for the extraction of term context words (words that tend to appear with terms), 2) the incorporation of information from term context words to the extraction of terms.
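To make the idea of a frequency measure that is sensitive to nested terms more concrete, here is a minimal Python sketch. The abstract does not state a formula; the scoring rule used below (log2 of the term length times the frequency, reduced by the average frequency of the longer candidates that contain the term) is the commonly cited form of C-value and is an assumption here. The names `c_value` and `is_nested` and the toy frequencies are hypothetical, and the linguistic filtering that would normally produce the candidate list is omitted.

```python
import math
from collections import defaultdict


def is_nested(short, long):
    """Return True if the word tuple `short` occurs contiguously inside `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def c_value(freq):
    """Score candidate multi-word terms.

    `freq` maps a candidate term (a tuple of words) to its corpus frequency.
    Assumed scoring rule (not given in the abstract):
      - term not nested in any longer candidate: log2(len) * f(term)
      - nested term: log2(len) * (f(term) - mean f of the containing candidates)
    """
    # For each candidate, collect the longer candidates that contain it.
    containers = defaultdict(list)
    for a in freq:
        for b in freq:
            if len(b) > len(a) and is_nested(a, b):
                containers[a].append(b)

    scores = {}
    for a, f_a in freq.items():
        weight = math.log2(len(a))
        longer = containers[a]
        if longer:
            mean_container_freq = sum(freq[b] for b in longer) / len(longer)
            scores[a] = weight * (f_a - mean_container_freq)
        else:
            scores[a] = weight * f_a
    return scores


# Toy, made-up frequencies for illustration only.
toy_freq = {
    ("soft", "contact", "lens"): 5,
    ("contact", "lens"): 8,
}
toy_scores = c_value(toy_freq)
```

In this toy input, "contact lens" appears 8 times but 5 of those are inside "soft contact lens", so its score is discounted relative to its raw frequency, which is the nesting sensitivity the abstract describes.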
Improved automatic keyword extraction given more linguistic knowledge
- Proc. of the 2003 Conf. on Empirical Methods in NLP
, 2003
Cited by 122 (1 self)
In this paper, experiments on automatic extraction of keywords from abstracts using a supervised machine learning algorithm are discussed. The main point of this paper is that by adding linguistic knowledge to the representation (such as syntactic features), rather than relying only on statistics (such as term frequency and n-grams), a better result is obtained as measured by keywords previously assigned by professional indexers. In more detail, extracting NP-chunks gives a better precision than n-grams, and by adding the POS tag(s) assigned to the term as a feature, a dramatic improvement of the results is obtained, independent of the term selection approach applied.
TERMIGHT: identifying and translating technical terminology
- 4th Conference on Applied Natural Language Processing
, 1994
Comparing corpora
- International Journal of Corpus Linguistics
Cited by 107 (6 self)
Corpus linguistics lacks strategies for describing and comparing corpora. Currently most descriptions of corpora are textual, and questions such as ‘what sort of a corpus is this?’, or ‘how does this corpus compare to that?’ can only be answered impressionistically. This paper considers various ways in which different corpora can be compared more objectively. First we address the issue, ‘which words are particularly characteristic of a corpus?’, reviewing and critiquing the statistical methods which have been applied to the question and proposing the use of the Mann-Whitney ranks test. Results of two corpus comparisons using the ranks test are presented. Then, we consider measures for corpus similarity. After discussing limitations of the idea of corpus similarity, we present a method for evaluating corpus similarity measures. We consider several measures and establish that a χ²-based one performs best. All methods considered in this paper are based on word and n-gram frequencies; the strategy is defended.
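As an illustration of what a χ²-based comparison over word frequencies can look like, here is a minimal Python sketch. It is not taken from the paper: the abstract does not give the procedure, so the choice of the most frequent words of the combined corpus, the `top_n` cutoff, and the function name `chi_square_distance` are all assumptions made for this example.

```python
from collections import Counter


def chi_square_distance(tokens_a, tokens_b, top_n=500):
    """Chi-square statistic over the `top_n` most frequent words of both corpora.

    Expected counts assume each word is spread across the two corpora in
    proportion to corpus size. Both token lists must be non-empty.
    A lower value means the two frequency profiles are more alike.
    """
    count_a, count_b = Counter(tokens_a), Counter(tokens_b)
    size_a, size_b = len(tokens_a), len(tokens_b)
    total = size_a + size_b
    joint = count_a + count_b

    stat = 0.0
    for word, joint_count in joint.most_common(top_n):
        for observed, size in ((count_a[word], size_a), (count_b[word], size_b)):
            expected = joint_count * size / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Toy token lists for illustration only.
same = chi_square_distance("a b a b".split(), "a b a b".split())
different = chi_square_distance(["a", "a"], ["b", "b"])
```

Two corpora with identical word distributions give a statistic of zero, and the value grows as the distributions diverge. Real use would need tokenization and much larger samples than these toy lists.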
Disambiguating Proteins, Genes, and RNA in Text: A Machine Learning Approach
, 2001
Cited by 78 (1 self)
We present an automated system for assigning protein, gene, or mRNA class labels to biological terms in free text. Three machine learning algorithms and several extended ways for defining contextual features for disambiguation are examined, and a fully unsupervised manner for obtaining training examples is proposed. We train and evaluate our system over a collection of 9 million words of molecular biology journal articles, obtaining accuracy rates up to 85%.