Expansion of Multi-Word Terms for Indexing and Retrieval Using Morphology and Syntax
- In proceedings of the 35th Annual Meeting of the ACL
, 1997
Cited by 33 (7 self)
A system for the automatic production of controlled index terms is presented using linguistically-motivated techniques. This includes a finite-state part of speech tagger, a derivational morphological processor for analysis and generation, and a unification-based shallow-level parser using transformational rules over syntactic patterns. The contribution of this research is the successful combination of parsing over a seed term list coupled with derivational morphology to achieve greater coverage of multi-word terms for indexing and retrieval. Final results are evaluated for precision and recall, and implications for indexing and retrieval are discussed.
The Problem of Logical-Form Equivalence
- Computational Linguistics
, 1992
Cited by 29 (0 self)
In this paper, I review the problem, and attempt to highlight certain salient aspects of it that have been lost in the pursuing of solutions, in order to reconcile the apparently contradictory claims of the problem's intractability and its resolution.
Formal grammar and information theory: Together again?
- PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY
, 2000
Cited by 22 (0 self)
In the last 40 years, research on models of spoken and written language has been split between two seemingly irreconcilable traditions: formal linguistics in the Chomsky tradition, and information theory in the Shannon tradition. Zellig Harris had advocated a close alliance between grammatical and information-theoretic principles in the analysis of natural language, and early formal-language theory provided another strong link between information theory and linguistics. Nevertheless, in most research on language and computation, grammatical and information-theoretic approaches had moved far apart. Today, after many years on the defensive, the information-theoretic approach has gained new strength and achieved practical successes in speech recognition, information retrieval, and, increasingly, in language analysis and machine translation. The exponential increase in the speed and storage capacity of computers is the proximate cause of these engineering successes, allowing the automatic estimation of the parameters of probabilistic models of language by counting occurrences of linguistic events in very large bodies of text and speech. However, I will argue that information-theoretic and computational ideas are also playing an increasing role in the scientific understanding of language, and will help bring together formal-linguistic and information-theoretic perspectives.
Novel Estimation Methods for Unsupervised Discovery of Latent Structure in Natural Language Text
, 2006
Cited by 20 (7 self)
This thesis is about estimating probabilistic models to uncover useful hidden structure in data; specifically, we address the problem of discovering syntactic structure in natural language text. We present three new parameter estimation techniques that generalize the standard approach, maximum likelihood estimation, in different ways. Contrastive estimation maximizes the conditional probability of the observed data given a “neighborhood” of implicit negative examples. Skewed deterministic annealing locally maximizes likelihood using a cautious parameter search strategy that starts with an easier optimization problem than likelihood, and iteratively moves to harder problems, culminating in likelihood. Structural annealing is similar, but starts with a heavy bias toward simple syntactic structures and gradually relaxes the bias. Our estimation methods do not make use of annotated examples. We consider their performance in both an unsupervised model selection setting, where models trained under different initialization and regularization settings are compared by evaluating the training objective on a small set of unseen, unannotated development data, and supervised model selection, where the most accurate model on the development set (now with annotations)
Corpus-Based Thesaurus Construction for Image Retrieval in Specialist Domains
- in Proceedings of the 25th European Conference on Advances in Information Retrieval (ECIR)
, 2003
Cited by 13 (4 self)
This paper explores the use of texts that are related to an image collection, also known as collateral texts, for building thesauri in specialist domains to aid in image retrieval. Corpus linguistic and information extraction methods are used for identifying key terms and conceptual relationships in specialist texts that may be used for query expansion purposes. The specialist domain context imposes certain constraints on the language used in the texts, which makes the texts computationally more tractable. The effectiveness of such an approach is demonstrated through a prototype system that has been developed for the storage and retrieval of images and texts, applied in the forensic science domain.
Unsupervised Efficient Learning and Representation of Language Structure
- PROC. 25TH CONFERENCE OF THE COGNITIVE SCIENCE SOCIETY
, 2003
Cited by 11 (7 self)
We describe a linguistic pattern acquisition algorithm that learns, in an unsupervised fashion, a streamlined representation of corpus data. This is achieved by compactly coding recursively structured constituent patterns, and by placing strings that have an identical backbone and similar context structure into the same equivalence class. The resulting representations constitute an efficient encoding of linguistic knowledge and support systematic generalization to unseen sentences.
Parsing with Finite-State Transducers
, 1996
Cited by 8 (0 self)
Accurately parsing natural language sentences requires large-scale and detailed lexical grammars. We will see that for the problem of parsing natural language sentences, finite-state models are not an efficient but somewhat inaccurate tool but rather one of the best formalisms to represent complex linguistic phenomena accurately. Finite-state transducers should appeal to the linguist looking for a precise and natural description of complex syntactic structures, while the wide range of formal operations on finite-state transducers provides the designer of parsing programs with powerful tools to improve parsing efficiency. The parsing programs derived from this approach are simple, linguistically precise, and very efficient.
Symbolic Machine Learning: A Different Answer to the Problem of the Acquisition of Lexical Knowledge from Corpora
- in TripleC (Cognition, Communication, Co-operation), special issue: selected papers from ECAP 2005 - European Computing and Philosophy Conference 2005
Cited by 1 (0 self)
One relevant way to structure the domain of lexical knowledge acquisition from corpora (complex terms, or relations between lexical units) is to oppose numerical and symbolic techniques. Numerical approaches exploit the frequency aspect of the data and use statistical techniques, while symbolic approaches exploit the structural aspect of the data and use structural or symbolic information. Methods from the former approach have been widely used and produce portable, robust, and fully automatic systems. However, they provide poor explanations of their results and may have difficulty capturing very specific relations. The symbolic approach groups two strategies. The first is the symbolic linguistic approach, in which operational definitions of the elements to acquire are manually established by linguists (usually in the form of morpholexical patterns that carry the studied terms or relations) or by a list of linguistic clues. However, when such patterns or clues are unknown, but examples of elements respecting the target terms or relations are known, techniques from the second strategy can be used, namely symbolic machine learning (ML) methods. This facet of the symbolic approach, far less known and used, is only beginning to appear and spread in the natural language processing community. The aim of this paper is to highlight the value of such techniques, and to show how they can be used to infer efficient and expressive extraction patterns of complex terms or lexical relations from examples of elements that verify the target relations or the form of the terms. However, these techniques are often supervised, i.e. they must be (manually) supplied with examples. We also explain that one method from each of the numerical and symbolic ML approaches can be combined in order to keep the advantages of both: meaningful patterns, efficient extraction, and portability.

