Results 1 - 10
of
137
Unsupervised Models for Named Entity Classification
- In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora
, 1999
"... This paper discusses the use of unlabeled examples for the problem of named entity classification. A large number of rules is needed for coverage of the domain, suggesting that a fairly large number of labeled examples should be required to train a classifier. However, we show that the use of unlabe ..."
Abstract
-
Cited by 359 (3 self)
- Add to MetaCart
This paper discusses the use of unlabeled examples for the problem of named entity classification. A large number of rules is needed for coverage of the domain, suggesting that a fairly large number of labeled examples should be required to train a classifier. However, we show that the use of unlabeled data can reduce the requirements for supervision to just 7 simple “seed ” rules. The approach gains leverage from natural redundancy in the data: for many named-entity instances both the spelling of the name and the context in which it appears are sufficient to determine its type. We present two algorithms. The first method uses a similar algorithm to that of (Yarowsky 95), with modifications motivated by (Blum and Mitchell 98). The second algorithm extends ideas from boosting algorithms, designed for supervised learning tasks, to the framework suggested by (Blum and Mitchell 98). 1
Extracting product features and opinions from reviews
, 2005
"... Consumers are often forced to wade through many on-line reviews in order to make an informed product choice. This paper introduces OPINE, an unsupervised informationextraction system which mines reviews in order to build a model of important product features, their evaluation by reviewers, and their ..."
Abstract
-
Cited by 151 (2 self)
- Add to MetaCart
Consumers are often forced to wade through many on-line reviews in order to make an informed product choice. This paper introduces OPINE, an unsupervised informationextraction system which mines reviews in order to build a model of important product features, their evaluation by reviewers, and their relative quality across products. Compared to previous work, OPINE achieves 22 % higher precision (with only 3 % lower recall) on the feature extraction task. OPINE’s novel use of relaxation labeling for finding the semantic orientation of words in context leads to strong performance on the tasks of finding opinion phrases and their polarity. 1
Towards the Self-Annotating Web
, 2004
"... The success of the Semantic Web depends on the availability of ontologies as well as on the proliferation of web pages annotated with metadata conforming to these ontologies. Thus, a crucial question is where to acquire these metadata. In this paper we propose PANKOW (Pattern-based Annotation throug ..."
Abstract
-
Cited by 117 (10 self)
- Add to MetaCart
The success of the Semantic Web depends on the availability of ontologies as well as on the proliferation of web pages annotated with metadata conforming to these ontologies. Thus, a crucial question is where to acquire these metadata. In this paper we propose PANKOW (Pattern-based Annotation through Knowledge on the Web), a method which employs an unsupervised, pattern-based approach to categorize instances with regard to an ontology. The approach is evaluated against the manual annotations of two human subjects. The approach is implemented in OntoMat, an annotation tool for the Semantic Web and shows very promising results.
Text2Onto - A Framework for Ontology Learning and Data-driven Change Discovery
, 2005
"... In this paper we present Text2Onto, a framework for ontology learning from textual resources. Three main features distinguish Text2Onto from our earlier framework TextToOnto as well as other state-of-the-art ontology learning frameworks. First, by representing the learned knowledge at a meta-lev ..."
Abstract
-
Cited by 101 (18 self)
- Add to MetaCart
In this paper we present Text2Onto, a framework for ontology learning from textual resources. Three main features distinguish Text2Onto from our earlier framework TextToOnto as well as other state-of-the-art ontology learning frameworks. First, by representing the learned knowledge at a meta-level in the form of instantiated modeling primitives within a so called Probabilistic Ontology Model (POM), we remain independent of a concrete target language while being able to translate the instantiated primitives into any (reasonably expressive) knowledge representation formalism. Second, user interaction is a core aspect of Text2Onto and the fact that the system calculates a confidence for each learned object allows to design sophisticated visualizations of the POM. Third, by incorporating strategies for data-driven change discovery, we avoid processing the whole corpus from scratch each time it changes, only selectively updating the POM according to the corpus changes instead. Besides increasing e#ciency in this way, it also allows a user to trace the evolution of the ontology with respect to the changes in the underlying corpus.
Espresso: Leveraging generic patterns for automatically harvesting semantic relations
, 2006
"... In this paper, we present Espresso, a weakly-supervised, general-purpose, and accurate algorithm for harvesting semantic relations. The main contributions are: i) a method for exploiting generic patterns by filtering incorrect instances using the Web; and ii) a principled measure of pattern and inst ..."
Abstract
-
Cited by 80 (1 self)
- Add to MetaCart
In this paper, we present Espresso, a weakly-supervised, general-purpose, and accurate algorithm for harvesting semantic relations. The main contributions are: i) a method for exploiting generic patterns by filtering incorrect instances using the Web; and ii) a principled measure of pattern and instance reliability enabling the filtering algorithm. We present an empirical comparison of Espresso with various state of the art systems, on different size and genre corpora, on extracting various general and specific relations. Experimental results show that our exploitation of generic patterns substantially increases system recall with small effect on overall precision. 1
Learning Concept Hierarchies from Text Corpora Using Formal Concept Analysis
- Journal of Artificial Intelligence research
, 2005
"... We present a novel approach to the automatic acquisition of taxonomies or concept hierarchies from a text corpus. The approach is based on Formal Concept Analysis (FCA), a method mainly used for the analysis of data, i.e. for investigating and processing explicitly given information. We follow Ha ..."
Abstract
-
Cited by 73 (4 self)
- Add to MetaCart
We present a novel approach to the automatic acquisition of taxonomies or concept hierarchies from a text corpus. The approach is based on Formal Concept Analysis (FCA), a method mainly used for the analysis of data, i.e. for investigating and processing explicitly given information. We follow Harris' distributional hypothesis and model the context of a certain term as a vector representing syntactic dependencies which are automatically acquired from the text corpus with a linguistic parser. On the basis of this context information, FCA produces a lattice that we convert into a special kind of partial order constituting a concept hierarchy. The approach is evaluated by comparing the resulting concept hierarchies with hand-crafted taxonomies for two domains: tourism and finance. We also directly compare our approach with hierarchical agglomerative clustering as well as with Bi-Section-KMeans as an instance of a divisive clustering algorithm. Furthermore, we investigate the impact of using different measures weighting the contribution of each attribute as well as of applying a particular smoothing technique to cope with data sparseness.
Learning Taxonomic Relations from Heterogeneous Evidence
"... We present a novel approach to the automatic acquisition of taxonomic relations. The main difference to earlier approaches is that we do not only consider one single source of evidence, i.e. a specific algorithm or approach, but examine the possibility of learning taxonomic relations by considerin ..."
Abstract
-
Cited by 63 (8 self)
- Add to MetaCart
We present a novel approach to the automatic acquisition of taxonomic relations. The main difference to earlier approaches is that we do not only consider one single source of evidence, i.e. a specific algorithm or approach, but examine the possibility of learning taxonomic relations by considering various and heterogeneous forms of evidence. In particular, we derive these different evidences by using well-known NLP techniques and resources and combine them via two simple strategies. Our approach shows very promising results compared to other results from the literature. The main aim of the work presented in this paper is (i) to gain insight into the behaviour of different approaches to learn taxonomic relations, (ii) to provide a first step towards combining these different approaches, and (iii) to establish a baseline for further research.
Gimme’ The Context: Context-driven Automatic Semantic Annotation with C-PANKOW
, 2005
"... Without the proliferation of formal semantic annotations, the Semantic Web is certainly doomed to failure. In earlier work we presented a new paradigm to avoid this: the ’Self Annotating Web’, in which globally available knowledge is used to annotate resources such as web pages. In particular, we pr ..."
Abstract
-
Cited by 60 (2 self)
- Add to MetaCart
Without the proliferation of formal semantic annotations, the Semantic Web is certainly doomed to failure. In earlier work we presented a new paradigm to avoid this: the ’Self Annotating Web’, in which globally available knowledge is used to annotate resources such as web pages. In particular, we presented a concrete method instantiating this paradigm, called PANKOW (Pattern-based ANnotation through Knowledge On the Web). In PANKOW, a named entity to be annotated is put into several linguistic patterns that convey competing semantic meanings. The patterns that are matched most often on the Web indicate the meaning of the named entity — leading to automatic or semi-automatic annotation. In this paper we present C-PANKOW (Context-driven PANKOW), which alleviates several shortcomings of PANKOW. First, by downloading abstracts and processing them off-line, we avoid the generation of large number of linguistic patterns and correspondingly large number of Google queries. Second, by linguistically analyzing and normalizing the downloaded abstracts, we increase the coverage of our pattern matching mechanism and overcome several limitations of the earlier pattern generation process. Third, we use the annotation context in order to distinguish the significance of a pattern match for the given annotation task. Our experiments show that C-PANKOW inherits all the advantages of PANKOW (no training required etc.), but in addition it is far more efficient and effective.
Sentiment analyzer: Extracting sentiments about a given topic using natural language processing techniques
- In IEEE Intl. Conf. on Data Mining (ICDM
, 2003
"... We present Sentiment Analyzer (SA) that extracts sentiment (or opinion) about a subject from online text documents. Instead of classifying the sentiment of an entire document about a subject, SA detects all references to the given subject, and determines sentiment in each of the references using nat ..."
Abstract
-
Cited by 60 (1 self)
- Add to MetaCart
We present Sentiment Analyzer (SA) that extracts sentiment (or opinion) about a subject from online text documents. Instead of classifying the sentiment of an entire document about a subject, SA detects all references to the given subject, and determines sentiment in each of the references using natural language processing (NLP) techniques. Our sentiment analysis consists of 1) a topic specific feature term extraction, 2) sentiment extraction, and 3) (subject, sentiment) association by relationship analysis. SA utilizes two linguistic resources for the analysis: the sentiment lexicon and the sentiment pattern database. The performance of the algorithms was verified on online product review articles (“digital camera ” and “music ” reviews), and more general documents including general webpages and news articles. 1.
Learning Semantic Constraints for the Automatic Discovery of Part-Whole Relations
- In Proceedings of HLT/NAACL-03
, 2003
"... The discovery of semantic relations from text becomes increasingly important for applications such as Question Answering, Information Extraction, Text Summarization, Text Understanding, and others. The semantic relations are detected by checking selectional constraints. ..."
Abstract
-
Cited by 49 (1 self)
- Add to MetaCart
The discovery of semantic relations from text becomes increasingly important for applications such as Question Answering, Information Extraction, Text Summarization, Text Understanding, and others. The semantic relations are detected by checking selectional constraints.

