The GF Resource grammar library
- August, 2002
Cited by 7 (2 self)
The GF Resource Grammar Library is a set of natural language grammars implemented in GF (Grammatical Framework). These grammars are in a strong sense parallel: they are built upon a common abstract syntax, i.e. a common tree structure. Individual languages are obtained via compositional mappings from abstract syntax trees to feature structures specific to each language. The grammar defines, for each language, a complete set of morphological paradigms and a syntax fragment comparable to CLE (Core Language Engine). It is available as open-source software under the GNU LGPL License.
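The idea of one abstract syntax with per-language linearizations can be sketched in a few lines of Python. This is a toy illustration, not GF syntax and not the library's API: the `Pred` tree type, the lexicons, and the `linearize_*` functions are hypothetical names introduced here, and the German lexicon carries a gender feature only to show that each language may consult its own features.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pred:
    """Abstract syntax tree shared by all languages: subject + verb."""
    subject: str  # abstract noun identifier, e.g. "cat"
    verb: str     # abstract verb identifier, e.g. "sleep"


# Hypothetical toy lexicons (assumptions for illustration only).
ENGLISH_NOUNS = {"cat": "cat", "dog": "dog"}
ENGLISH_VERBS = {"sleep": "sleeps"}  # third person singular, present

GERMAN_NOUNS = {"cat": ("Katze", "fem"), "dog": ("Hund", "masc")}
GERMAN_VERBS = {"sleep": "schläft"}
GERMAN_ARTICLES = {"fem": "die", "masc": "der", "neut": "das"}


def linearize_english(tree: Pred) -> str:
    """Map the abstract tree to an English string."""
    return f"the {ENGLISH_NOUNS[tree.subject]} {ENGLISH_VERBS[tree.verb]}"


def linearize_german(tree: Pred) -> str:
    """Map the same tree to German; the article agrees with noun gender."""
    noun, gender = GERMAN_NOUNS[tree.subject]
    return f"{GERMAN_ARTICLES[gender]} {noun} {GERMAN_VERBS[tree.verb]}"
```

Both functions consume the same `Pred` value, so a translation amounts to building one tree and linearizing it twice.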
Cross-Entropy and Linguistic Typology
- Proceedings of New Methods in Language Processing 3, 1998
Cited by 6 (1 self)
The idea of "familial relationships" among languages is well-established and accepted, although some controversies persist in a few specific instances. By painstakingly recording and identifying regularities and similarities and comparing these to the historical record, linguists have been able to produce a general "family tree" incorporating most natural languages. We suggest here that much of this tree can be determined automatically by a complementary technique of distributional analysis. Recent work by Farach et al. (1995) and Juola (1997) suggests that Kullback-Leibler divergence (or cross-entropy) can be meaningfully measured from small samples, in some cases as few as 20 or so words. Using these techniques, we define and measure a distance function between translations of a small corpus (c. 70 words/sample) covering much of the accepted Indo-European family, and reconstruct a relationship tree by hierarchical cluster analysis. The resulting tree shows remarkable similarity to the accepted Indo-European family; this we read as evidence both for the immense power of this measurement technique and for the validity of this kind of mechanical similarity judgement in the identification of typological relationships. Furthermore, this technique is in theory sensitive to different sorts of relationships than more common word-list based methods and may help illuminate these from a different direction.
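The pipeline this abstract describes (a divergence-based distance between text samples, followed by hierarchical clustering) might be sketched as follows. This is a minimal sketch under stated assumptions, not the estimator used in the paper: it substitutes a symmetrised Kullback-Leibler divergence over smoothed character frequencies, and the function names and the average-linkage choice are introduced here for illustration.

```python
import math
from collections import Counter


def char_distribution(text, alphabet, alpha=1.0):
    """Add-alpha smoothed character distribution over a fixed alphabet."""
    counts = Counter(ch for ch in text if ch in alphabet)
    total = sum(counts.values()) + alpha * len(alphabet)
    return {ch: (counts[ch] + alpha) / total for ch in alphabet}


def kl_divergence(p, q):
    """D(p || q) in bits; smoothing guarantees q[ch] > 0."""
    return sum(p[ch] * math.log2(p[ch] / q[ch]) for ch in p)


def distance(text_a, text_b, alphabet):
    """Symmetrised KL divergence (an assumed stand-in for the paper's measure)."""
    p = char_distribution(text_a, alphabet)
    q = char_distribution(text_b, alphabet)
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def cluster(samples):
    """Average-linkage agglomerative clustering; returns nested tuples of labels."""
    alphabet = sorted(set("".join(samples.values())))
    labels = list(samples)
    pair_dist = {
        frozenset((x, y)): distance(samples[x], samples[y], alphabet)
        for i, x in enumerate(labels)
        for y in labels[i + 1:]
    }
    # Each cluster is (subtree, list of leaf labels it contains).
    clusters = [(label, [label]) for label in labels]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i][1], clusters[j][1]
                avg = sum(pair_dist[frozenset((x, y))] for x in a for y in b)
                avg /= len(a) * len(b)
                if best is None or avg < best[0]:
                    best = (avg, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]
```

Calling `cluster` on a dictionary mapping language names to sample texts yields a binary tree as nested tuples, which can then be compared by eye against an accepted family tree.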
Inference of divergence times as a statistical inverse problem
- Phylogenetic Methods and the Prehistory of Languages. McDonald Institute Monographs, 2004
Cited by 5 (1 self)
A familiar complaint about statisticians and applied mathematicians is that they are the possessors of a relatively small number of rather elegant hammers with which they roam the world seeking convenient nails to pound, or at least screws they can pretend are nails. One all too often hears
Untangling our past: Languages, Trees, Splits and Networks
- David Bryant
Cited by 4 (0 self)
in the logic and practice of the other. Rhizomophiles correctly point out that standard phylogeny programmes always produce a tree, even when a tree model is not appropriate. Phylophiles rightly argue that they are only using a tree as an initial starting point in their investigations of the complexities of human history (Jordan & Gray, 2001). While phylogenetic programmes do indeed always output a tree (or trees), it is possible to evaluate how well the data fits a tree model using these programmes. For example, consistency and retention indices can be used to measure the fit of a data set on a tree. However, what is not possible using these programmes is to evaluate explicit alternatives to a pure tree model. What is needed to get beyond the impasse of these polarised a priori views is an analytic approach that enables us to assess where on the continuum between a pure tree and a totally tangled network any particular case may lie. More specifically, this approach should be able both
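The consistency and retention indices mentioned above can be computed from a character matrix and a tree. The sketch below assumes the standard definitions (CI = M/S and RI = (G - S)/(G - M), with S the parsimony length on the tree, M the minimum conceivable number of changes, and G the maximum on any tree) and uses Fitch's algorithm for unordered characters on a rooted binary tree; the function names and data layout are hypothetical.

```python
from collections import Counter


def fitch_steps(tree, states):
    """Fitch parsimony for one character.

    tree: a leaf name (str) or a pair (left, right) of subtrees.
    states: dict mapping leaf name -> observed character state.
    Returns (candidate state set at this node, number of changes below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch_steps(tree[0], states)
    right_set, right_steps = fitch_steps(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1


def fit_indices(tree, matrix):
    """Return (consistency index, retention index) summed over all characters.

    matrix: list of dicts, one per character, each mapping leaf -> state.
    """
    min_steps = observed = max_steps = 0
    for states in matrix:
        counts = Counter(states.values())
        min_steps += len(counts) - 1                  # M: distinct states - 1
        observed += fitch_steps(tree, states)[1]      # S: changes on this tree
        max_steps += len(states) - max(counts.values())  # G: worst case
    ci = min_steps / observed if observed else 1.0
    ri = ((max_steps - observed) / (max_steps - min_steps)
          if max_steps != min_steps else 1.0)
    return ci, ri
```

A value of 1.0 for both indices means every character fits the tree without homoplasy; lower values indicate conflict, but, as the text notes, they do not by themselves test an explicit network alternative.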
Cross-Entropy and Linguistic Typology
- 1998
The idea of "familial relationships" among languages is well-established and accepted, although some controversies persist in a few specific instances. By painstakingly recording and identifying regularities and similarities and comparing these to the historical record, linguists have been able to produce a general "family tree" incorporating most natural languages.
A Probabilistic Approach to Diachronic Phonology
We present a probabilistic model of diachronic phonology in which individual word forms undergo stochastic edits along the branches of a phylogenetic tree. Our approach allows us to achieve three goals with a single unified model: (1) reconstruction of both ancient and modern word forms, (2) discovery of general phonological changes, and (3) selection among different phylogenies. We learn our model using a Monte Carlo EM algorithm and present quantitative results validating the model.
A Probabilistic Approach to Language Change
- 2008
We present a probabilistic approach to language change in which word forms are represented by phoneme sequences that undergo stochastic edits along the branches of a phylogenetic tree. This framework combines the advantages of the classical comparative method with the robustness of corpus-based probabilistic models. We use this framework to explore the consequences of two different schemes for defining probabilistic models of phonological change, evaluating these schemes by reconstructing ancient word forms of Romance languages. The result is an efficient inference procedure for automatically inferring ancient word forms from modern languages, which can be generalized to support inferences about linguistic phylogenies.
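The generative picture described here, phoneme sequences undergoing stochastic edits along the branches of a tree, can be illustrated with a small forward simulation. This is a sketch under simplifying assumptions, not the paper's model: edits are context-independent, the rates `p_sub`, `p_del`, and `p_ins` are the same on every branch, and all names are hypothetical. It covers only the forward direction, not inference.

```python
import random


def mutate(word, rng, p_sub, p_del, p_ins, inventory):
    """Apply independent per-phoneme edits along a single branch."""
    out = []
    for phoneme in word:
        r = rng.random()
        if r < p_del:
            pass                                  # deletion: drop the phoneme
        elif r < p_del + p_sub:
            out.append(rng.choice(inventory))     # substitution
        else:
            out.append(phoneme)                   # unchanged
        if rng.random() < p_ins:
            out.append(rng.choice(inventory))     # insertion after this slot
    return out


def evolve(tree, word, rng, **rates):
    """Propagate a root word form down a tree.

    tree: a leaf name (str) or a tuple of subtrees.
    Returns a dict mapping each leaf name to its derived form.
    """
    if isinstance(tree, str):
        return {tree: word}
    forms = {}
    for child in tree:
        # One call to mutate() per branch leading to each child.
        child_word = mutate(word, rng, **rates)
        forms.update(evolve(child, child_word, rng, **rates))
    return forms
```

For example, `evolve((("it", "es"), "fr"), list("fokus"), random.Random(0), p_sub=0.1, p_del=0.05, p_ins=0.02, inventory=list("aeioufgkst"))` returns one simulated form per leaf; reconstruction runs this process in reverse, inferring the root form from the leaves.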
Quantifying semantic shift for reconstructing language families
In comparative historical linguistics, one must weigh evidence from large numbers of putative cognates in order to arrive at the best hypothesis of the family tree and reconstructions. The comparativist presently uses unquantified knowledge of these processes. We present a typological study of word polysemy in order to construct a quantified network of semantic similarity among basic vocabulary items for comparative historical research. We investigate 22 concepts denoting natural objects in the Swadesh list across a typological sample of over 50 languages. In addition to its value for comparative historical linguistics, the study also reveals universals of lexical conceptual space.
Simulating and reconstructing language change
In this work we probe phylogenetic algorithms for their ability to reconstruct historic language relationships. We present a formal model for the development of languages incorporating vertical (genealogical) and horizontal (language contact) effects. As a distinctive feature, we also added a geographic model to mimic the effects of constrained population movements. Using our model, we generated a large number of simulated language histories whose results were analyzed by a variety of established phylogenetic algorithms. Therein, we systematically investigated the effects of different contact intensities and of geographic as well as genealogic topologies. We found that tree-based algorithms are robust under a variety of different settings and are capable of inferring (parts of) the relationships correctly even under high levels of network-like influences. We also studied the SplitsTree algorithm which should be more appropriate to cope with network-like effects. However, although SplitsTree clearly performs better in some settings, it generally shows a rather erratic behavior.
Meaning and Form in a Language Computer Simulation
, 801
Thousands of different forms (words) are associated with thousands of different meanings (concepts) in a language computer model. Reasonable agreement with reality is found for the number of languages in a family and the Hamming distances between languages. (Authors are listed in inverse alphabetical order.)
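The Hamming distance between two languages can be made concrete with a short sketch. The representation is an assumption made here, not taken from the paper: each language is a sequence with one form identifier per meaning, and the distance counts the meanings whose forms differ.

```python
def hamming(lang_a, lang_b):
    """Number of meanings expressed by different forms in the two languages.

    lang_a, lang_b: equal-length sequences; position i holds the form
    (any hashable identifier) used for meaning i. This layout is assumed.
    """
    if len(lang_a) != len(lang_b):
        raise ValueError("languages must cover the same set of meanings")
    return sum(a != b for a, b in zip(lang_a, lang_b))
```

Dividing the result by the number of meanings gives a normalised distance between 0 and 1, which makes languages with vocabularies of different sizes comparable across simulation runs.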

