Results 11 - 20 of 67
A framework for named entity recognition in the open domain
- In Proceedings of the Recent Advances in Natural Language Processing (RANLP
, 2003
Cited by 6 (0 self)
In this paper, a system for Named Entity Recognition in the Open domain (NERO) is described. It is concerned with recognition of various types of entity, types that will be appropriate for Information Extraction in any scenario context. The recognition task is performed by identifying normally capitalised phrases in a document and then submitting queries to a search engine to find potential hypernyms of the capitalised sequences. These hypernyms are then clustered to derive a typology of named entities for the document. The hypernyms of the normally capitalised phrases are used to classify them with respect to this typology. The method is tested on a small corpus and its classifications are evaluated. Finally, conclusions are drawn and future work considered.
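The steps named in this abstract (find capitalised phrases, look up hypernyms, derive a typology, classify) can be sketched roughly as below. This is only an illustration under stated assumptions, not the NERO system: the `TOY_HYPERNYMS` table is a hypothetical stand-in for the search-engine queries, the regular expression is a simplified notion of "normally capitalised", and simple frequency counting replaces the clustering step the abstract mentions.

```python
import re
from collections import Counter

# Hypothetical stand-in for search-engine hypernym queries (assumption).
TOY_HYPERNYMS = {
    "Paris": ["city", "capital"],
    "Berlin": ["city", "capital"],
    "Nokia": ["company", "manufacturer"],
    "Siemens": ["company"],
}

def capitalised_phrases(text):
    """Return capitalised word sequences that are not sentence-initial."""
    phrases = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        # Skip the first word: it is capitalised whatever its type.
        rest = " ".join(sentence.split()[1:])
        phrases += re.findall(r"[A-Z][a-z]+(?:\s[A-Z][a-z]+)*", rest)
    return phrases

def classify(phrases, hypernyms):
    """Label each phrase with its hypernym that is most frequent overall."""
    # Frequency counting here is a simplification of hypernym clustering.
    counts = Counter(h for p in phrases for h in hypernyms.get(p, []))
    labels = {}
    for p in phrases:
        candidates = hypernyms.get(p, [])
        labels[p] = max(candidates, key=lambda h: counts[h]) if candidates else None
    return labels

text = "We flew from Paris to Berlin. Then Nokia and Siemens signed a deal."
labels = classify(capitalised_phrases(text), TOY_HYPERNYMS)
```

In this toy run the document-level typology reduces to the most frequent hypernyms, so each phrase is labelled with whichever of its candidate hypernyms is shared most widely across the document.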
Automated hierarchy discovery for planning in partially observable domains
- Advances in Neural Information Processing Systems 19
, 2006
Cited by 5 (2 self)
author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners. I understand that my thesis may be made electronically available to the public.
Conditional probabilistic context-free grammars
, 2004
Cited by 5 (2 self)
In this note we present a discriminative framework for learning distributions over parse trees of context-free languages, which we call conditional probabilistic context-free grammars (CPCFGs). The best-performing approaches to learning statistical parsing models are generative, in that they estimate the joint distribution p(t, x) over parse trees
Statistical relational learning for natural language information extraction
- In Getoor, L., & Taskar, B. (Eds.), Statistical Relational Learning, forthcoming book
, 2005
Cited by 4 (0 self)
Understanding natural language presents many challenging problems that lend themselves to statistical relational learning (SRL). Historically, both logical and probabilistic methods have found wide application in natural language processing (NLP). NLP inevitably involves reasoning about an arbitrary number of entities
Improving Information Retrieval with Textual Analysis: Bayesian Models and Beyond
- Master’s thesis, MIT
, 2001
Cited by 3 (0 self)
Information retrieval (IR) is a difficult problem. While many have attempted to model text documents and improve search results by doing so, the most successful text retrieval to date has been developed in an ad-hoc manner. One possible reason for this is that in developing these models very little focus has been placed on the actual properties of text. In this thesis, we discuss a principled Bayesian approach we take to information retrieval, which we base on the standard IR probabilistic model. Not surprisingly, we find this approach to be less successful than traditional ad-hoc retrieval. Using data analysis to highlight the discrepancies between our model and the actual properties of text documents, we hope to arrive at a better model for our corpus, and thus a better information retrieval strategy. Specifically, we believe we will find it is inaccurate to assume that whether a term occurs in a document is independent of whether it has already occurred, and we will suggest a way to improve upon this without adding complexity to the solution.
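The independence assumption questioned at the end of this abstract can be probed with a crude diagnostic: compare how often a term appears in a document at all with how often it appears again once it has appeared. The sketch below is an illustration only; the function name, the whitespace tokenisation, and the toy documents are assumptions, not the thesis's method or data.

```python
def occurrence_stats(docs, term):
    """Compare P(term appears) with P(term appears again | it appeared)."""
    # Whitespace tokenisation is a simplifying assumption.
    counts = [doc.lower().split().count(term) for doc in docs]
    at_least_once = sum(c >= 1 for c in counts)
    at_least_twice = sum(c >= 2 for c in counts)
    p_once = at_least_once / len(docs)
    p_again = at_least_twice / at_least_once if at_least_once else 0.0
    return p_once, p_again

# Toy corpus, invented for illustration.
docs = [
    "the parser builds a tree and the parser stops",
    "the parser calls the parser again",
    "a retrieval model ranks documents",
    "documents are ranked by score",
]
p_once, p_again = occurrence_stats(docs, "parser")
```

If repeat occurrences were no more likely than first occurrences, the two values would be of similar size; a much larger second value suggests that a term's occurrences within a document are not independent.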
Descriptive Clustering as a Method for Exploring Text Collections
, 2006
Cited by 3 (2 self)
Descriptive clustering as a method for exploring collections of text documents
On the utility of curricula in unsupervised learning of probabilistic grammars (supplementary material
, 2011
Cited by 3 (1 self)
We examine the utility of a curriculum (a means of presenting training samples in a meaningful order) in unsupervised learning of probabilistic grammars. We introduce the incremental construction hypothesis that explains the benefits of a curriculum in learning grammars and offers some useful insights into the design of curricula as well as learning algorithms. We present results of experiments with (a) carefully crafted synthetic data that provide support for our hypothesis and (b) a natural language corpus that demonstrate the utility of curricula in unsupervised learning of probabilistic grammars.
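A curriculum in the sense used here, presenting training samples in a meaningful order, might look like the following sketch. Using sentence length as the ordering criterion and growing the training set in equal-sized stages are assumptions made for illustration; the abstract does not say which ordering the authors use, and `curriculum_stages` is a hypothetical name.

```python
def curriculum_stages(sentences, num_stages):
    """Yield growing training sets, shortest sentences first."""
    # Sentence length in words is an assumed proxy for difficulty.
    ordered = sorted(sentences, key=lambda s: len(s.split()))
    for stage in range(1, num_stages + 1):
        cutoff = len(ordered) * stage // num_stages
        yield ordered[:cutoff]

sentences = [
    "dogs bark",
    "the old dog barks loudly",
    "cats sleep",
    "the cat sleeps",
]
stages = list(curriculum_stages(sentences, 2))
```

A learner would be trained on each stage in turn, so that later stages add longer sentences to the ones already seen.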
CoLesIR at CLEF 2006: rapid prototyping of a N-gram-based CLIR system
- In Nardi et al. [8
, 2006
Cited by 2 (2 self)
In this, our first joint participation as the CoLesIR group, our team has participated in the Portuguese monolingual ad-hoc task and in all robust ad-hoc tasks: all monolingual tasks, the English-to-German bilingual task, and the multilingual task. We have developed an n-gram model inspired by the previous work of the Johns Hopkins University Applied Physics Lab. Our approach makes generalized use of freely available resources (such as the Europarl parallel corpus, the GIZA++ word-alignment toolkit, and the Terrier retrieval platform) and employs a new n-gram direct translation technique. This new technique takes as input previously existing aligned word lists and obtains as output aligned n-gram lists. It can also handle word translation probabilities, as in the case of statistical word alignments. This new n-gram-based approach shares the main advantages of the original proposal. This solution avoids the need for word normalization during indexing or translation, and it can also deal with out-of-vocabulary words. Since it does not rely on language-specific processing, it can be applied to very different languages, even when
Holistic Query Expansion Using Graphical Models
, 2004
Cited by 2 (0 self)
In this paper we present a method for answering relationship questions, as posed for example in the spring of 2003 evaluation exercise of the AQUAINT program, which has funded this research
A mathematical model of context
- Modeling and Using Context. LNAI 2680
, 2003
Cited by 2 (0 self)
Context is vital for deciding which of the possible senses of a word is being used in a particular situation, a task known as disambiguation. Motivated by a survey of disambiguation techniques in natural language processing, this paper presents a mathematical model describing the relationship between words, meanings and contexts, giving examples of how context-groups can be used to distinguish different senses of ambiguous words. Many aspects of this model have interesting similarities with quantum theory.

