Probabilistic Syntax, 2002
Cited by 27 (1 self)
istic methods for syntax, just as for a long time McCarthy and Hayes (1969) discouraged exploration of probabilistic methods in Artificial Intelligence. Among his arguments were that: (i) Probabilistic models wrongly mix in world knowledge (New York occurs more in text than Dayton, Ohio, but for no linguistic reason), (ii) Probabilistic models don't model grammaticality (neither Colorless green ideas sleep furiously nor Furiously sleep ideas green colorless have previously been uttered -- and hence must be estimated to have probability zero, Chomsky wrongly assumes -- but the former is grammatical while the latter is not), and (iii) Use of probabilities does not meet the goal of describing the mind-internal I-language as opposed to the observed-in-the-world E-language. This chapter is not meant to be a detailed critique of Chomsky's arguments -- Abney (1996) provides a survey and a rebuttal, and Pereira (2000) has further useful discussion -- but some of these concerns are still importa
A Latent Variable Model for Geographic Lexical Variation
Cited by 21 (6 self)
The rapid growth of geotagged social media raises new computational possibilities for investigating geographic linguistic variation. In this paper, we present a multi-level generative model that reasons jointly about latent topics and geographical regions. High-level topics such as “sports” or “entertainment” are rendered differently in each geographic region, revealing topic-specific regional distinctions. Applied to a new dataset of geotagged microblogs, our model recovers coherent topics and their regional variants, while identifying geographic areas of linguistic consistency. The model also enables prediction of an author’s geographic location from raw text, outperforming both text regression and supervised topic models.
Discovering Sociolinguistic Associations with Structured Sparsity
Cited by 5 (2 self)
We present a method to discover robust and interpretable sociolinguistic associations from raw geotagged text data. Using aggregate demographic statistics about the authors’ geographic communities, we solve a multi-output regression problem between demographics and lexical frequencies. By imposing a composite ℓ1,∞ regularizer, we obtain structured sparsity, driving entire rows of coefficients to zero. We perform two regression studies. First, we use term frequencies to predict demographic attributes; our method identifies a compact set of words that are strongly associated with author demographics. Next, we conjoin demographic attributes into features, which we use to predict term frequencies. The composite regularizer identifies a small number of features, which correspond to communities of authors united by shared demographic and linguistic properties.
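As a rough illustration of the regularizer named in this abstract (not the authors' code), the ℓ1,∞ norm of a coefficient matrix is the sum, over rows, of the largest absolute coefficient in each row. Because a row contributes nothing only when every entry in it is zero, penalizing this quantity tends to zero out whole rows at once. The sketch below assumes a matrix with one row per input feature and one column per output; the function names and the toy values are hypothetical.

```python
def l1_inf_norm(coefs):
    """Sum over rows of the largest absolute coefficient in each row."""
    return sum(max(abs(c) for c in row) for row in coefs)


def active_rows(coefs, tol=1e-12):
    """Indices of rows that keep at least one nonzero coefficient."""
    return [i for i, row in enumerate(coefs) if max(abs(c) for c in row) > tol]


# Hypothetical 3-feature, 2-output coefficient matrix (made-up values).
B = [
    [0.5, -2.0],  # row maximum in absolute value: 2.0
    [0.0, 0.0],   # an entire row at zero: this feature is dropped
    [1.5, 1.0],   # row maximum in absolute value: 1.5
]

print(l1_inf_norm(B))
print(active_rows(B))
```

Here the second feature has been removed for every output simultaneously, which is the row-level sparsity the abstract describes.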
Pronunciation Modeling in Speech Synthesis, 1998
Cited by 4 (0 self)
ACKNOWLEDGMENTS I am very pleased to have had the encouragement and support of a committee of three linguists for whom I have the greatest respect and admiration: Mark Liberman, William Labov and Eugene Buckley. Each of them made my transition back to Penn pleasant after what seemed like a long absence. It was a great pleasure to have Mark Randolph both as an external reader and as a colleague at Motorola. Mark’s work at MIT a decade ago has served as an inspiration to me. Orhan Karaali made this dissertation possible in this millennium. As my manager for over two years at Motorola, Orhan insisted on making my dissertation a priority at work. Harry Bliss provided his voice to this project and our whole group is very grateful for his patience and cooperation. My colleagues at Motorola listened to my ideas and provided technical and theoretical assistance at every turn: Noel
SOCIOPHONETIC APPLICATIONS OF SPEECH PERCEPTION EXPERIMENTS
Cited by 2 (0 self)
Although studies of perception are still largely assigned to the realms of experimental phonetics or psychology, sociolinguists have been recognizing the importance of perception. Several lines of experimental inquiry have emerged. Nevertheless, perception has been studied far less by sociolinguists than has speech production. One reason is that speech perception is daunting at first. Examining it requires careful attention to experimental design, a considerable amount of preparation, and, in many cases, use of a speech synthesizer. Even so, research on perception can be highly productive. This paper attempts to review the sorts of experiments that have been conducted in the past and to provide guidelines for sociolinguists interested in studying perception, with suggestions for future work. Although perception has been a neglected stepsister of production in sociolinguistics, it, like Cinderella, may have its day soon. Two important factors could—and should—move perception to the forefront of sociophonetic research. One is simply the huge potential for sociolinguistic perception studies because the area has been neglected for so long. The other reason is a more practical one: although perception experiments require extreme attention to detail in the preparation phase, data analysis is generally less time-consuming than in production studies, and this difference may make it more attractive to researchers. The aversion of much of sociolinguistics to perception has been, to some extent, more apparent than real. Many sociolinguistic studies over the past generation, especially instrumental studies, have succeeded in divorcing speech production from speech perception. However, perception issues may play a hidden role in studies that ostensibly address production. The reason is that variationists have not always carefully distinguished production from perception. This tendency is an artifact of the reliance of sociolinguistics on impressionistic transcription. 
The impressionistic tradition, based on the development of the International Phonetic Alphabet
Social Exclusion and Ethnic Groups: The Challenge to Economics
Cited by 2 (0 self)
This article discusses the concept of social exclusion with an eye to assessing its utility in the study of ethnic and racial group inequality in the modern nation state. A brief review of the literature and some methodological discussion are offered. The article then examines race-based social exclusion in the United States, showing how race and ethnicity can inhibit the full participation of individuals in a society’s economic life. The concept of social capital (the role of nonmarket relations in aiding or impeding investments in human skills) is stressed. The article concludes with a discussion of the legitimacy of race-based remedies for the problem of exclusion. The concept of social exclusion has gained wide currency in recent years. But is this concept useful in studying racial and ethnic inequality? Social divisions between racial and ethnic groups, along economic, cultural, and political lines, are a central feature of public life throughout the world. The problem spans geographic and political boundaries and reflects universal social dynamics. Accordingly, much can be learned from comparing such tensions across national boundaries. Inequality and conflict between groups entail not just economic but also,
TRANSMISSION AND DIFFUSION
Cited by 2 (0 self)
The transmission of linguistic change within a speech community is characterized by incrementation within a faithfully reproduced pattern characteristic of the family tree model, while diffusion across communities shows weakening of the original pattern and a loss of structural features. It is proposed that this is the result of the difference between the learning abilities of children and adults. Evidence is drawn from two studies of geographic diffusion. (1) Structural constraints are lost in the diffusion of the New York City pattern for tensing short-a to four other communities: northern New Jersey, Albany, Cincinnati and New Orleans. (2) The spread of the Northern Cities Shift from Chicago to Saint Louis is found to represent the borrowing of individual sound changes, rather than the diffusion of the structural pattern as a whole.
On the boundaries of linguistic competence: Matched-guise experiments as evidence of knowledge of grammar
- Lingua, Special Issue on Data in Syntax, Semantics, and Pragmatics 1579–1598
Cited by 1 (0 self)
According to the standard definition, linguistic competence, the object of study of generative
The C-ORAL-ROM Corpus: A Multilingual Resource of Spontaneous Speech for Romance Languages
Cited by 1 (0 self)
The C-ORAL-ROM project has delivered a multilingual corpus of spontaneous speech for the main Romance languages (Italian, French, Portuguese and Spanish). The collection aims to represent the variety of speech acts performed in everyday language and to enable the description of prosodic and syntactic structures in the four Romance languages. Sampling criteria are defined in a corpus design scheme. C-ORAL-ROM adopts two different sampling strategies, one for the formal and one for the informal part: while a set of typical domains of application is selected to document the formal use of language, the informal part documents speech variation using parameters referring to the event’s structure (dialogue vs. monologue) and the sociological domain of use (family-private vs. public). The four Romance corpora are tagged with respect to terminal and non-terminal prosodic breaks. Terminal breaks are assumed to be the more relevant cues for the identification of relevant linguistic domains in spontaneous speech (utterances). Relations with other concurrent criteria are discussed. The multimedia storage of the C-ORAL-ROM corpus is based on this principle; each textual string ending with a terminal break is aligned, through the WinPitch speech software, to its acoustic counterpart, generating the database of all utterances.

