Multidocument Summarization via Information Extraction
- In Proceedings of the HLT Conference
, 2001
- Cited by 24 (4 self)
We present and evaluate the initial version of RIPTIDES, a system that combines information extraction, extraction-based summarization, and natural language generation to support user-directed multidocument summarization.
The state of the art in ontology learning: a framework for comparison
- Knowledge Engineering Review
, 2003
- Cited by 11 (0 self)
In recent years there have been several efforts to automate the ontology acquisition and construction process. The proposed systems differ from each other in some distinguishing factors and have many features in common. This paper presents the state of the art in ontology learning (OL) and introduces a framework for classifying and comparing OL systems. The dimensions of the framework answer questions about what to learn, from where to learn, and how to learn. They include features of the input, the methods of learning and knowledge acquisition, the elements learned, the resulting ontology, and the evaluation process. To derive the framework, over 50 OL systems or modules from recent workshops, conferences, and journals were studied, and seven prominent systems that differ most from one another were selected for comparison according to our framework. After a brief description of the seven selected systems, we describe the framework dimensions and then place the representative OL systems into our framework. Finally, we describe the differences, strengths, and weaknesses of the various values for our dimensions, as a guideline for researchers choosing the appropriate features (dimensions' values) to create or use an OL system for their own domain or application.
Novelty Detection Based on Sentence Level Patterns
- CIKM
, 2005
- Cited by 7 (0 self)
The detection of new information in a document stream is an important component of many potential applications. In this paper, a new novelty detection approach based on the identification of sentence level patterns is proposed. Given a user’s information need, some patterns in sentences such as combinations of query words, named entities and phrases, may contain more important and relevant information than single words. Therefore, the proposed novelty detection approach focuses on the identification of previously unseen query-related patterns in sentences. Specifically, a query is preprocessed and represented with patterns that include both query words and required answer types. These patterns are used to retrieve sentences, which are then determined to be novel if it is likely that a new answer is present. An analysis of patterns in sentences was performed with data from the TREC 2002 novelty track and experiments on novelty detection were carried out on data from the TREC 2003 and 2004 novelty tracks. The experimental results show that the proposed pattern-based approach significantly outperforms all three baselines in terms of precision at top ranks.
Voted NER System using Appropriate Unlabeled Data
- Cited by 2 (0 self)
This paper reports a voted Named Entity Recognition (NER) system that uses appropriate unlabeled data. The proposed method is based on classifiers such as Maximum Entropy (ME), Conditional Random Field (CRF), and Support Vector Machine (SVM), and has been tested for Bengali. The system uses language-independent features, in the form of contextual and orthographic word-level features, along with language-dependent features extracted from the Part of Speech (POS) tagger and gazetteers. Context patterns generated from the unlabeled data using an active learning method are used as features in each of the classifiers. A semi-supervised method has been used to describe the measures to automatically select effective documents and sentences from unlabeled data. Finally, the models are combined into a final system by a weighted voting technique. Experimental results show the effectiveness of the proposed approach, with overall Recall, Precision, and F-Score values of 93.81%, 92.18%, and 92.98%, respectively. We also show how the language-dependent features can improve system performance.
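As a rough illustration of the weighted voting step mentioned in this abstract, the sketch below combines per-token tags from three classifiers. It is only a minimal sketch: the function name `weighted_vote`, the example tags, and the weight values are hypothetical, and the abstract does not say how the paper actually sets its weights.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine per-token tags from several taggers by weighted voting.

    predictions: dict mapping model name -> list of tags (one per token).
    weights:     dict mapping model name -> vote weight (assumed values).
    """
    names = list(predictions)
    n_tokens = len(predictions[names[0]])
    combined = []
    for i in range(n_tokens):
        scores = defaultdict(float)
        for name in names:
            # Each model adds its weight to the tag it proposes for token i.
            scores[predictions[name][i]] += weights[name]
        # Iterate tags in sorted order so ties resolve deterministically.
        combined.append(max(sorted(scores), key=lambda tag: scores[tag]))
    return combined


# Hypothetical outputs of the three classifiers for a three-token sentence.
preds = {
    "ME": ["B-PER", "O", "B-LOC"],
    "CRF": ["B-PER", "O", "O"],
    "SVM": ["O", "O", "B-LOC"],
}
# Hypothetical weights, e.g. derived from held-out F-scores.
weights = {"ME": 0.30, "CRF": 0.40, "SVM": 0.35}

print(weighted_vote(preds, weights))
```

With these assumed weights, two weaker models that agree can outvote a single stronger one; giving one model a dominant weight would instead let it override the others.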
Improving novelty detection for general topics using sentence level information patterns
- In Proceedings of
, 2006
- Cited by 1 (0 self)
The detection of new information in a document stream is an important component of many potential applications. In this work, a new novelty detection approach based on the identification of sentence level information patterns is proposed. First, the information-pattern concept for novelty detection is presented, with emphasis on new information patterns for general topics (queries) that cannot simply be turned into specific questions whose answers are specific named entities (NEs). We then present a thorough analysis of sentence level information patterns on data from the TREC novelty tracks, including sentence lengths, named entities, and sentence level opinion patterns. This analysis provides guidelines for applying those patterns in novelty detection, particularly for the general topics. Finally, a unified pattern-based approach to novelty detection is presented for both general and specific topics, with a focus on the new method for dealing with general topics. Experimental results show that the proposed approach significantly improves the performance of novelty detection for general topics, as well as the overall performance for all topics from the 2002-2004 TREC novelty tracks.
Effects of Developmental Heuristics for Natural Language Learning
, 2003
Machine learning in natural language has been a widely pursued area of research. However, few learning techniques model themselves after human learning, despite the nature of the task being closely connected to human cognition. In particular, the idea of learning language in stages is a common approach for human learning, as can be seen in practice in the education system and in research on language acquisition. However, staged learning for natural language is an area largely overlooked by machine learning researchers. This thesis proposes a developmental learning heuristic for natural language models, to evaluate its performance on natural language tasks. The heuristic simulates human learning stages by training on child, teenage and adult text, provided by the British National Corpus. The three staged learning techniques that are proposed take advantage of these stages to create a single developed Hidden Markov Model. This model is then applied to the task of part-of-speech tagging to observe the effects of development on
Information Extraction From Broadcast News Speech Data
- Proceedings Of The DARPA Broadcast News Workshop, February 28-March 3
, 1999
In this paper we describe a robust algorithm for information extraction from spoken language data. Our probabilistic algorithm builds on results in language modeling, using class-based smoothing to produce state-of-the-art performance for a wide range of speech error rates. We show that our system performs well with sparse data, as well as with out-of-domain data. 1. INTRODUCTION Extracting linguistic structure such as proper names, noun phrases, and verb phrases is an important first step in many systems aimed at automatic language understanding. While significant progress has been made on this problem, most of the work has focused on "clean" textual data such as newswire texts, where cues such as capitalization and punctuation are important for obtaining high accuracy results. However, there are many data sources where these cues are not reliable, such as in spoken language data or single-case text. Spoken language sources, in particular, pose additional problems because of disflu...
Semantic Language Models for
In this work, we present a new semantic language modeling approach to model news stories in the Topic Detection and Tracking (TDT) task. In the new approach, we build a unigram language model for each semantic class in a news story. We also cast the link detection subtask of TDT as a two-class classification problem in which the features of each sample consist of the generative log-likelihood ratios from each semantic class. We then compute a linear discriminant classifier using the perceptron learning algorithm on the training set. Results on the test set show a marginal improvement over the unigram performance, but are not very encouraging on the whole.
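The classification step described in this abstract (a linear discriminant trained with the perceptron algorithm on per-class log-likelihood ratios) could look roughly like the sketch below. It is a minimal sketch under stated assumptions: the feature vectors are made-up stand-ins for the log-likelihood ratios of three semantic classes, and `train_perceptron`, `predict`, and the +1/-1 labels (linked / not linked) are hypothetical names and conventions, not the authors' implementation.

```python
def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Classic perceptron: update the weights only on misclassified samples."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # wrong side of (or on) the boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:  # every training sample is classified correctly
            break
    return w, b


def predict(w, b, x):
    """Return +1 (stories linked) or -1 (not linked) for one story pair."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Made-up features: one log-likelihood ratio per semantic class, per story pair.
X = [
    [2.0, 1.5, 0.5],
    [1.0, 2.0, 1.0],
    [-1.5, -0.5, -1.0],
    [-2.0, -1.0, -0.5],
]
y = [1, 1, -1, -1]

w, b = train_perceptron(X, y)
print([predict(w, b, x) for x in X])
```

The toy data here is linearly separable, so training stops early; on real link-detection data the loop would typically run for the full number of epochs.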

