Opinion Observer: Analyzing and Comparing Opinions on the Web
- In WWW ’05: Proceedings of the 14th international conference on World Wide Web, 2005
Cited by 91 (8 self)
The Web has become an excellent source for gathering consumer opinions. There are now numerous Web sites containing such opinions, e.g., customer reviews of products, forums, discussion groups, and blogs. This paper focuses on online customer reviews of products. It makes two contributions. First, it proposes a novel framework for analyzing and comparing consumer opinions of competing products. A prototype system called Opinion Observer is also implemented. With a single glance at its visualization, the user can clearly see the strengths and weaknesses of each product in the minds of consumers in terms of various product features. This comparison is useful to both potential customers and product manufacturers. A potential customer can see a visual side-by-side and feature-by-feature comparison of consumer opinions on these products, which helps him/her decide which product to buy. For a product manufacturer, the comparison makes it easy to gather marketing intelligence and product benchmarking information. Second, a new technique based on language pattern mining is proposed to extract product features from Pros and Cons in a particular type of review. Such features form the basis for the above comparison. Experimental results show that the technique is highly effective and outperforms existing methods significantly.
Wikify!: linking documents to encyclopedic knowledge
- In CIKM ’07: Proceedings of the sixteenth ACM Conference on Information and Knowledge Management, 2007
Cited by 57 (3 self)
This paper introduces the use of Wikipedia as a resource for automatic keyword extraction and word sense disambiguation, and shows how this online encyclopedia can be used to achieve state-of-the-art results on both these tasks. The paper also shows how the two methods can be combined into a system able to automatically enrich a text with links to encyclopedic knowledge. Given an input document, the system identifies the important concepts in the text and automatically links these concepts to the corresponding Wikipedia pages. Evaluations of the system show that the automatic annotations are reliable and hardly distinguishable from manual annotations. ... providing the users a quick way of accessing additional information. Wikipedia contributors perform these annotations by hand following a Wikipedia “manual of style,” which gives guidelines concerning the selection of important concepts in a text, as well as the assignment of links to appropriate related articles. For instance, Figure 1 shows an example of a Wikipedia page, including the definition for one of the meanings of the word “plant.”
Opinion extraction and summarization on the Web
- In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI-2006), Nectar Paper Track, 2006
Cited by 10 (0 self)
The Web has become an excellent source for gathering consumer opinions. There are now numerous Web sources containing such opinions, e.g., product reviews, forums, discussion groups, and blogs. Techniques are now being developed to exploit these sources to help organizations and individuals to gain such important information easily and quickly. In this paper, we first discuss several aspects of the problem in the AI context, and then present some results of our existing work published in KDD-04 and WWW-05.
A Survey of Paraphrasing and Textual Entailment Methods
- 2010
Cited by 6 (3 self)
Paraphrasing methods recognize, generate, or extract phrases, sentences, or longer natural language expressions that convey almost the same information. Textual entailment methods, on the other hand, recognize, generate, or extract pairs of natural language expressions, such that a human who reads (and trusts) the first element of a pair would most likely infer that the other element is also true. Paraphrasing can be seen as bidirectional textual entailment and methods from the two areas are often similar. Both kinds of methods are useful, at least in principle, in a wide range of natural language processing applications, including question answering, summarization, text generation, and machine translation. We summarize key ideas from the two areas by considering in turn recognition, generation, and extraction methods, also pointing to prominent articles and resources.
A symbolic approach to automatic multiword term structuring
- Computer Speech and Language (CSL), Special issue on Multiword Expressions, Elsevier, 20p. [Forthcoming], 2005
Cited by 3 (2 self)
This paper presents a three-level structuring of multiword terms (MWTs) based on lexical inclusion, WordNet similarity and a clustering approach. Term clustering by automatic data analysis methods offers an interesting way of organizing a domain’s knowledge structures, useful for several information-oriented tasks such as science and technology watch, text mining, computer-assisted ontology population, and Question Answering (Q-A). This paper explores how this three-level term structuring brings to light the knowledge structures from a corpus of genomics and compares the mapping of the domain topics against a hand-built ontology (the GENIA ontology). Ways of integrating the results into a Q-A system are discussed.
Towards the Web of Concepts: Extracting Concepts from Large Datasets
Cited by 3 (0 self)
Concepts are sequences of words that represent real or imaginary entities or ideas that users are interested in. As a first step towards building a web of concepts that will form the backbone of the next generation of search technology, we develop a novel technique to extract concepts from large datasets. We approach the problem of concept extraction from corpora as a market-basket problem, adapting statistical measures of support and confidence. We evaluate our concept extraction algorithm on datasets containing data from a large number of users (e.g., the AOL query log data set), and we show that a high-precision concept set can be extracted.
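The market-basket framing in this abstract can be illustrated with a minimal sketch. It assumes, hypothetically, that each query is treated as a basket of words and that a two-word candidate concept is scored with the textbook association-rule definitions of support and confidence. The abstract does not give the paper's adapted measures, so the function name, the adjacency requirement, and the toy queries below are illustrative assumptions only.

```python
def support_and_confidence(queries, first, second):
    """Textbook association-rule measures for the adjacent word pair (first, second).

    support    = fraction of queries containing the pair as adjacent words
    confidence = (queries containing the pair) / (queries containing `first`)
    Both definitions are assumptions here, not the paper's adapted measures.
    """
    pair_count = 0
    first_count = 0
    for query in queries:
        words = query.lower().split()
        if first in words:
            first_count += 1
        # Count the query once if the two words appear next to each other.
        if any(a == first and b == second for a, b in zip(words, words[1:])):
            pair_count += 1
    support = pair_count / len(queries) if queries else 0.0
    confidence = pair_count / first_count if first_count else 0.0
    return support, confidence


# Toy, made-up query log (not data from the paper).
TOY_QUERIES = [
    "new york hotels",
    "new york weather",
    "new car prices",
    "cheap flights to new york",
]

support, confidence = support_and_confidence(TOY_QUERIES, "new", "york")
print(support, confidence)
```

Under these assumptions, a candidate would be kept as a concept only if both scores clear some chosen thresholds; how such thresholds are set is not stated in the abstract.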
ATA -- Automatic Term Acquisition
- In Proceedings of the Workshop on Extraction of Knowledge from Databases, 2001
Cited by 1 (1 self)
Terminological acquisition is an important issue when learning about Natural Language Processing (NLP) due to the constant terminological renewal caused by technological changes. Terms play a key role in several NLP activities such as machine translation, automatic indexing, text understanding, and information retrieval. This is especially true at this time when corpora in electronic format keep growing in number and variety. In this work we start by using morphological and syntactic information to locate candidate noun phrases, and then we use statistical information to improve result accuracy.
Comparative evaluation of modular automatic summarisation systems using CAST
- 2006
Cited by 1 (0 self)
This work or any part thereof has not previously been presented in any form to the University or to any other body whether for the purposes of assessment, publication or for any other purpose (unless otherwise indicated). Save for any express acknowledgments, references and/or bibliographies cited in the work, I confirm that the intellectual content of the work is the result of my own efforts and of no other person. The right of Constantin Orăsan to be identified as author of this work is asserted in accordance with ss.77 and 78 of the Copyright, Designs and Patents Act 1988. At this date copyright is owned by the author.
Contextual Acquisition of Information Categories: what has been done and what can be done automatically?
- 2002
Unification of multi-lingual scientific terminological resources using the ISO 16642 standard. The TermSciences initiative
Cited by 1 (1 self)
The TermSciences initiative aims at building a multi-purpose and multi-lingual knowledge system from different source vocabularies produced by major French research institutions, which were initially intended to be used for indexing and cataloguing scientific literature. Since the construction of language resource repositories is costly and time-consuming, the producers of these vocabularies wished both to share their terminological material and to develop common tools for the collaborative management of the integrated resource. Sharing terminologies poses some problems because of the heterogeneous nature of the source data (i.e., coverage, granularity and compositionality of concepts, etc.), and because of the discrepancy between partner needs (i.e., simple diffusion of the terminological material, use of the shared material to enhance information engineering tasks, etc.). This paper presents the TermSciences portal, which deals with the implementation of a conceptual model that uses the recent ISO 16642 standard (Terminological Markup Framework). This standard turned out to be suitable for concept modeling since it allowed the original resources to be organized by concept and the various terms for a given concept to be associated. Additional structuring is produced by sharing conceptual relationships, that is, cross-linking of resource results through the introduction of semantic relations which may initially have been missing. A special emphasis is put on medical resources used in this project, i.e. the French translation by the Institut National de la Santé et de la Recherche Médicale (INSERM) of the MeSH thesaurus from the US National Library of Medicine, the public health thesaurus of the Banque de Données de Santé Publique (BDSP) and the dictionary of human and mammals reproduction biotechnology of the Institut National de la Recherche Agronomique (INRA).

