Ranking Definitions with Supervised Learning Methods
In Proceedings of the 14th International World Wide Web Conference, 2005
Cited by 17 (6 self)
This paper is concerned with the problem of definition search. Specifically, given a term, we are to retrieve definitional excerpts of the term and rank the extracted excerpts according to their likelihood of being good definitions. This is in contrast to the traditional approaches of either generating a single combined definition or simply outputting all retrieved definitions. Definition ranking is essential for the task. Methods for performing definition ranking are proposed in this paper, which formalize the problem as either classification or ordinal regression. A specification for judging the goodness of a definition is given. We employ SVM as the classification model and Ranking SVM as the ordinal regression model respectively, such that they rank definition candidates according to their likelihood of being good definitions. Features for constructing the SVM and Ranking SVM models are defined. An enterprise search system based on this method has been developed and has been put into practical use. Experimental results indicate that the use of SVM and Ranking SVM can significantly outperform the baseline methods of using heuristic rules or employing the conventional information retrieval method of Okapi. This is true both when the answers are paragraphs and when they are sentences. Experimental results also show that SVM or Ranking SVM models trained in one domain can be adapted to another domain, indicating that generic models for definition ranking can be constructed.
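The ordinal-regression formulation mentioned in this abstract can be illustrated with the usual pairwise reduction: every pair of candidates with different goodness grades becomes one training example on the difference of their feature vectors. The following is a minimal sketch, not the authors' system. The three features, the grades, and the perceptron-style update (used here in place of a real Ranking SVM optimizer) are all assumptions made for illustration.

```python
from itertools import combinations

def pairwise_examples(candidates):
    """Turn (features, grade) candidates into (difference vector, +1/-1) pairs."""
    pairs = []
    for (xa, ga), (xb, gb) in combinations(candidates, 2):
        if ga == gb:
            continue  # equal grades carry no ordering information
        diff = [a - b for a, b in zip(xa, xb)]
        pairs.append((diff, 1 if ga > gb else -1))
    return pairs

def train_linear_ranker(pairs, epochs=50, lr=0.1):
    """Perceptron-style stand-in for the Ranking SVM optimizer (an assumption)."""
    w = [0.0] * len(pairs[0][0])
    for _ in range(epochs):
        for diff, label in pairs:
            score = sum(wi * di for wi, di in zip(w, diff))
            if label * score <= 0:  # misordered pair: nudge the weights
                w = [wi + lr * label * di for wi, di in zip(w, diff)]
    return w

def rank(w, feature_vectors):
    """Sort feature vectors by descending linear score."""
    return sorted(feature_vectors,
                  key=lambda x: -sum(wi * xi for wi, xi in zip(w, x)))

# Hypothetical features: [contains "is a", term starts the excerpt, length penalty]
# Hypothetical grades: 2 = good definition, 1 = indifferent, 0 = bad
train = [([1, 1, 0.1], 2), ([1, 0, 0.3], 1), ([0, 0, 0.9], 0), ([0, 1, 0.5], 1)]
w = train_linear_ranker(pairwise_examples(train))
ranked = rank(w, [x for x, _ in train])
```

After training, `ranked` lists the candidates from most to least definition-like under the learned linear scoring function. A production system would replace the perceptron update with a margin-based solver.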
Unsupervised learning of soft patterns for generating definitions from online news
In Proceedings of the 13th World Wide Web Conference, 2004
Cited by 15 (4 self)
Breaking news often contains timely definitions and descriptions of current terms, organizations and personalities. We utilize such web sources to construct definitions for such terms. Previous work has identified definitions using hand-crafted rules or supervised learning that constructs rigid, hard text patterns. In contrast, we demonstrate a new approach that uses flexible, soft matching patterns to characterize definition sentences. Our soft patterns are able to effectively accommodate the diversity of definition sentence structure exhibited in news. We use pseudo-relevance feedback to automatically label sentences for use in soft pattern generation. The application of our unsupervised method significantly improves baseline systems on both the standardized TREC corpus and crawled online news articles, by 27% and 30% respectively, in terms of F measure. When applied to a state-of-the-art definition generation system recently fielded in the TREC 2003 definitional question answering task, it improves the performance by 14%.
Learning to identify definitions using syntactic features
Proceedings of the EACL workshop on Learning Structured Information in Natural Language Applications, 2006
Cited by 15 (4 self)
This paper describes an approach to learning concept definitions which operates on fully parsed text. A subcorpus of the Dutch version of Wikipedia was searched for sentences which have the syntactic properties of definitions. Next, we experimented with various text classification techniques to distinguish actual definitions from other sentences. A maximum entropy classifier which incorporates features referring to the position of the sentence in the document, as well as various syntactic features, gives the best results.
A practically unsupervised learning method to identify single-snippet answers to definition questions on the web
In HLT/EMNLP, 2005
Cited by 12 (2 self)
We present a practically unsupervised learning method to produce single-snippet answers to definition questions in question answering systems that supplement Web search engines. The method exploits on-line encyclopedias and dictionaries to automatically generate an arbitrarily large number of positive and negative definition examples, which are then used to train an SVM to separate the two classes. We show experimentally that the proposed method is viable, that it outperforms the alternative of training the system on questions and news articles from TREC, and that it helps the search engine handle definition questions significantly better.
A Comparative Study on Sentence Retrieval for Definitional Question Answering
SIGIR Workshop on Information Retrieval for Question Answering (IR4QA), 2004
Cited by 10 (3 self)
Most definitional question answering (QA) systems integrate statistical ranking using Web and WordNet as external resources and pattern matching to retrieve relevant sentences for further processing. We examine the impact of using these two common resources in answering definition questions by varying the use of WordNet and two types of Web resources in statistical ranking, and definition pattern modules in a typical definitional QA system. In particular, we show that an appropriate combination of Web-reinforced ranking and soft matching pattern produces an unsupervised system that outperforms the best system at TREC-12 by 6%.
Soft Pattern Matching Models for Definitional Question Answering
Cited by 6 (0 self)
We explore probabilistic lexico-syntactic pattern matching, also known as soft pattern matching, in a definitional question answering system. Most current systems use regular expression-based hard matching patterns to identify definition sentences. Such rigid surface matching often fares poorly when faced with language variations. We propose two soft matching models to address this problem: one based on bigrams and the other on the Profile Hidden Markov Model (PHMM). Both models provide a theoretically sound method to model pattern matching as a probabilistic process that generates token sequences. We demonstrate the effectiveness of the models on definition sentence retrieval for definitional question answering. We show that both models significantly outperform the state-of-the-art manually constructed hard matching patterns on recent TREC data. A critical difference between the two models is that the PHMM has a more complex topology. We experimentally show that the PHMM can handle language variations more effectively but requires more training data to converge. While we evaluate soft pattern models only on definitional question answering, we believe that both models are generic and can be extended to other areas where lexico-syntactic pattern matching can be applied.
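The idea of treating pattern matching as a probabilistic process that generates token sequences can be sketched with a small bigram model. This is an illustrative sketch only, not the models from the paper. The `<TERM>` and `NN` placeholder tokens, the add-one smoothing, the length normalization, and the toy training sequences are all assumptions.

```python
import math
from collections import Counter

class BigramSoftPattern:
    """Add-one-smoothed bigram model over token sequences from definition contexts."""

    def __init__(self, training_sequences):
        self.bigrams = Counter()
        self.unigrams = Counter()
        self.vocab = {"<s>", "</s>"}
        for seq in training_sequences:
            tokens = ["<s>"] + list(seq) + ["</s>"]
            self.vocab.update(tokens)
            for prev, cur in zip(tokens, tokens[1:]):
                self.bigrams[(prev, cur)] += 1
                self.unigrams[prev] += 1

    def log_prob(self, seq):
        """Length-normalized log-probability of generating the token sequence."""
        tokens = ["<s>"] + list(seq) + ["</s>"]
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen tokens
        total = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            count = self.bigrams[(prev, cur)]
            total += math.log((count + 1) / (self.unigrams[prev] + v))
        return total / (len(tokens) - 1)

# Hypothetical generalized contexts: <TERM> marks the search term, NN a noun slot.
model = BigramSoftPattern([
    ["<TERM>", "is", "a", "NN"],
    ["<TERM>", ",", "a", "NN", ","],
    ["<TERM>", "is", "the", "NN"],
])

near_miss = model.log_prob(["<TERM>", "is", "an", "NN"])
unrelated = model.log_prob(["NN", "said", "<TERM>", "yesterday"])
```

A hard pattern such as the literal sequence "<TERM> is a NN" would reject the `near_miss` input outright. The soft model instead assigns it a graded score that is higher than the score of the `unrelated` sequence.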
PAGE CONTENT RANK: AN APPROACH TO THE WEB CONTENT MINING
Cited by 4 (0 self)
Methods of web data mining can be divided into several categories according to the kind of information mined and the goals each category sets: Web Structure Mining (WSM), Web Usage Mining (WUM), and Web Content Mining (WCM). The objective of this paper is to propose a new WCM method for ranking page relevance based on exploration of the page content. The method, which we call Page Content Rank (PCR), combines a number of heuristics that seem to be important for analysing the content of Web pages. Page importance is determined on the basis of the importance of the terms the page contains. The importance of a term is specified with respect to a given query q and is based on the term's statistical and linguistic features. As the source set of pages for mining, we use the set of pages returned by a search engine in response to the query q. PCR uses a neural network as its inner classification structure. We describe an implementation of the proposed method and compare its results with those of another existing classification system, the PageRank algorithm.
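The relationship between term importance and page importance described in this abstract can be sketched as follows. This is a rough illustration under stated assumptions, not the PCR implementation. The paper uses a neural network over statistical and linguistic features to determine term importance; here a hypothetical hand-written rule (document frequency in the result set, doubled for query terms) stands in for it, and page importance is taken as the mean importance of the page's distinct terms.

```python
def term_importance(term, query_terms, pages):
    """Hypothetical rule: share of result pages containing the term,
    doubled when the term also occurs in the query."""
    doc_freq = sum(1 for page in pages if term in page)
    base = doc_freq / len(pages)
    return base * (2.0 if term in query_terms else 1.0)

def page_content_scores(pages, query_terms):
    """Score each page as the mean importance of its distinct terms."""
    scores = []
    for page in pages:
        terms = set(page)
        if not terms:
            scores.append(0.0)  # an empty page carries no content evidence
            continue
        total = sum(term_importance(t, query_terms, pages) for t in terms)
        scores.append(total / len(terms))
    return scores

# Toy result set for the hypothetical query {"definition", "ranking"};
# each page is represented as a list of tokens.
pages = [
    ["svm", "ranking", "definition"],
    ["definition", "search", "ranking"],
    ["cooking", "recipe"],
]
scores = page_content_scores(pages, {"definition", "ranking"})
```

In this toy example the two pages that share the query vocabulary receive equal scores, and both outrank the off-topic page.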
Discovering grammar rules for Automatic Extraction of Definitions
In Doctoral Consortium at the Eurolan Summer School 2007, Iasi, Romania, 2007
Cited by 3 (3 self)
Automatic extraction of definitions from text documents can be very useful in various scenarios, especially in eLearning systems. In this paper, we propose an approach aimed at assisting the discovery of grammar rules which can be used to identify definitions, using Genetic Algorithms and Genetic Programming. By categorising definitions to enable the learning of more specialised grammars, we expect to improve the performance of our learning programs. A genetic algorithm will be used to learn the relative importance of particular predefined features in definitions. To support this algorithm, we also propose a genetic program to evolve new features from existing ones.
Integrating language technology and semantic web techniques in elearning
Proceedings of ICL 2006
Cited by 2 (1 self)
In the LT4eL project, we will improve the retrieval of learning objects within
Term Graph Model for Text Classification
Cited by 1 (0 self)
Most existing text classification methods (and text mining methods at large) are based on representing the documents using the traditional vector space model. We argue that important information, such as the relationship among words, is lost. We propose a term graph model to represent not only the content of a document but also the relationship

