Collostructions: Investigating the interaction of words and constructions, 2003
Cited by 15 (4 self)
This paper introduces an extension of collocational analysis that takes into account grammatical structure and is specifically geared to investigating the interaction of lexemes and the grammatical constructions associated with them. The method is framed in a construction-based approach to language, i.e. it assumes that grammar consists of signs (form-meaning pairs), and is thus not fundamentally different from the lexicon. The method is applied to linguistic expressions at various levels of abstraction (words, semi-fixed phrases, argument structures, tense, aspect and mood). The method has two main applications: first, to increase the adequacy of grammatical description by providing an objective way of identifying the meaning of a grammatical construction and determining the degree to which particular slots in it prefer or are restricted to a particular set of lexemes; second, to provide data for linguistic theory-building.
Covarying Collexemes in the Into-causative
- Empirical and Experimental Methods in Cognitive/Functional Research, 2004
Cited by 6 (3 self)
In this paper we extend a 'single-slot' methodology developed in Stefanowitsch and Gries (2003) to the investigation of potential interactions between two slots and apply it to the into-causative. We show that such interactions exist, i.e. that cause and result predicates 'covary' systematically. We then consider two factors influencing this covariation: a cognitive ...
Author note: The order of authors is arbitrary. The authors would like to thank Britta Mondorf and Andr Schfer for supplying the raw data from The Guardian used in this study.
Gravity counts for the boundaries of collocations
- International Journal of Corpus Linguistics, 2004
Cited by 4 (1 self)
This paper compares several methods (MI, T-score, Dice) for the extraction of collocations and presents a new method called Gravity Counts. The respective methods are evaluated and compared, measuring the combinability and collocability for each pair of words within the moving span of three words in the corpus of “The Times” newspaper for the year 1995. The collocability of words is the basis for detection of the collocational chains, i.e. frequent recurrent uninterrupted strings of word-forms, with clear-cut boundaries, found in the corpus. Collocational chains obtained with the help of different methods are compared and their lexical, grammatical and semantic features discussed.
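The three established measures named in this abstract can be sketched from their textbook definitions. This is only an illustration: the function name `association_scores` and the counts in the example are hypothetical, and the paper's own Gravity Counts method is not reproduced here because the abstract does not define it.

```python
import math

def association_scores(f_xy, f_x, f_y, n):
    """Return MI, t-score and Dice for one word pair.

    f_xy: how often the two words co-occur within the span
    f_x, f_y: corpus frequency of each word on its own
    n: corpus size in tokens
    """
    # Co-occurrences expected if the two words were independent.
    expected = f_x * f_y / n
    mi = math.log2(f_xy / expected)
    t_score = (f_xy - expected) / math.sqrt(f_xy)
    dice = 2 * f_xy / (f_x + f_y)
    return {"mi": mi, "t_score": t_score, "dice": dice}

# Hypothetical counts, not taken from the paper's corpus.
scores = association_scores(f_xy=30, f_x=100, f_y=300, n=1_000_000)
```

MI rewards pairs that co-occur far more often than chance predicts, even when they are rare; the t-score favours frequent pairs; Dice ignores corpus size altogether. This is one reason the measures can rank the same pairs differently.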
From lexis to syntax: the use of multi-word units in grammatical description
Cited by 2 (0 self)
We describe an approach to the description of sentence structures based on a linear model. The sentence is segmented using automatically identified multi-word units from a large corpus; recurrent elements from the corpus are matched up with fragments of the sentence. After positioning the current work in relation to recent related research we present two sample analyses and discuss the usefulness of this approach to syntactic description and possible applications.
Corpus linguistics and theoretical linguistics: A love–hate relationship? Not necessarily…
Cited by 1 (1 self)
[I]t is common now to address theoretical issues through the examination of bodies of naturally occurring language use.
Semantic prosodies in English and Portuguese: A contrastive study, 2000
Cited by 1 (0 self)
The present study is aimed at describing the semantic prosody of equivalent items in English and Portuguese. Semantic prosody is the connotation conveyed by the regular co-occurrence of lexical items, as revealed by the exploration of a computer-readable corpus. Although there are several studies dealing with semantic prosody in English, only one has looked at this issue contrasting English and Portuguese. In general, the findings indicate important similarities and differences between the two languages, and point toward inadequacies in contemporary dictionaries. The study was based on the exploration of a Portuguese computer corpus of over 140 million words, one of the largest for Portuguese. The BNC, with 100 million words, was also used in the course of the investigation. The general conclusion of the study presented here is that information on connotation, especially that based on the exploration of corpora, should be part of glossaries and dictionaries, particularly production and bilingual dictionaries, which are tools that translators rely upon regularly.
Validating the Construct of Word in Applied Corpus-based Vocabulary Research: A Critical Survey
Cited by 1 (0 self)
Corpus-based vocabulary research has had a profound impact on English language education, and there is abundant evidence that this will remain the case for the foreseeable future. Perhaps the greatest challenge of such research is the determination of what constitutes a Word for counting and analysis purposes. Decisions in this regard have important ramifications not only for the lexical findings themselves, but also for the pedagogical theories and practices that derive from them. This article surveys several fields of study in order to discuss this dilemma, with a particular focus on three problematic areas relating to computer-processed corpora: (a) morphological relationships between words, (b) homonymy and polysemy, and (c) multiword items. The article concludes with recommendations for assessing the validity of the Word construct in applied corpus-based vocabulary research. The influence of corpora and corpus-based research on educational theories and practices is well-established in both first language (L1) and second
EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar
Cited by 1 (1 self)
We introduce a new method for learning to detect grammatical errors in learners' writing and provide suggestions. The method involves parsing a reference corpus and inferring grammar patterns in the form of a sequence of content words, function words, and parts-of-speech (e.g., “play ~ role in Ving” and “look forward to Ving”). At runtime, the given passage submitted by the learner is matched using an extended Levenshtein algorithm against the set of pattern rules in order to detect errors and provide suggestions. We present a prototype implementation of the proposed method, EdIt, that can handle a broad range of errors. Promising results are illustrated with three common types of errors in nonnative writing.
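The matching step described here can be illustrated with plain token-level Levenshtein distance. This is a minimal sketch under stated assumptions, not EdIt's extended algorithm, whose details the abstract does not give. The helper names, the simplified pattern lists (the "~" slot is dropped) and the `V`/`Ving` token labels are all hypothetical.

```python
def token_edit_distance(source, target):
    """Standard Levenshtein distance between two token sequences."""
    # prev[j] is the distance between the source tokens seen so far
    # and the first j target tokens.
    prev = list(range(len(target) + 1))
    for i, s_tok in enumerate(source, start=1):
        curr = [i]
        for j, t_tok in enumerate(target, start=1):
            cost = 0 if s_tok == t_tok else 1
            curr.append(min(prev[j] + 1,           # delete s_tok
                            curr[j - 1] + 1,       # insert t_tok
                            prev[j - 1] + cost))   # match or substitute
        prev = curr
    return prev[-1]

def closest_pattern(tokens, patterns):
    """Return the pattern with the smallest edit distance to tokens."""
    return min(patterns, key=lambda p: token_edit_distance(tokens, p))

# Hypothetical, simplified pattern rules and learner input.
patterns = [
    ["look", "forward", "to", "Ving"],
    ["play", "role", "in", "Ving"],
]
learner = ["look", "forward", "to", "V"]  # e.g. "look forward to see"
best = closest_pattern(learner, patterns)
```

Here the learner sequence is one substitution away from the first pattern, so that pattern is selected, and the mismatched position (`V` where `Ving` is expected) marks where a suggestion could be offered.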
Riddle posed by computer (6): The Computer Generation of Cryptic Crossword Clues
Cited by 1 (1 self)
Thesis submitted in fulfilment of requirements for degree of PhD. The copyright of this thesis rests with the author. Due acknowledgement must always be made of any quotation or information derived from it.
This thesis describes the development of a system (ENIGMA) that generates a wide variety of cryptic crossword clues for any given word. A valid cryptic clue must appear to make sense as a fragment of natural language, but it must also be possible for the reader to interpret it using the syntax and semantics of crossword convention and solve the puzzle that it sets. It should also have a fluent surface reading that conflicts with the crossword clue interpretation of the same text. Consider, for example, the following clue for the word THESIS: Article with spies’ view (6). This clue sets a puzzle in which the (the definite article) must be combined with SIS (Special Intelligence Service) to create the word thesis (a view), but the reader is distracted by the apparent meaning and structure of the natural language fragment when attempting to solve it. I introduce the term Natural Language Creation (NLC) to describe the process through which ENIGMA generates a text with two layers of meaning. It starts with a representation of the puzzle meaning of the clue, and generates both a layered text that communicates this puzzle and a fluent surface meaning for that same text with a shallow, lexically-bound semantic representation. This parallel generation process reflects my intuition of the creative process through which cryptic clues are written, and domain expert commentary on clue-setting. The system first determines all the ways in which a clue might be formed for the input word (which is known as the “light”). For example, a light might be an anagram of some other word, or it might be possible to form the light by embedding one word inside another.
There are typically a great many of these clue-forming possibilities, each of which forms a different puzzle reading, and the system constructs a data message for each one and annotates it with lexical, syntactic and semantic information. As the puzzle reading is lexicalised a hybrid language understanding process locates syntactic and semantic fits between the elements of the clue text and constructs the surface reading of the clue as it emerges.
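The first stage described in this abstract, enumerating ways a light could be formed, can be sketched for the two mechanisms it names: anagrams and embeddings. This is an illustrative sketch only. The function names and the tiny lexicon are hypothetical, and nothing here reflects how ENIGMA itself is implemented.

```python
def anagrams_of(light, lexicon):
    """Words in the lexicon that rearrange to the light (excluding itself)."""
    key = sorted(light)
    return [w for w in lexicon if w != light and sorted(w) == key]

def embeddings_of(light, lexicon):
    """Pairs (outer, inner) where placing inner inside outer spells the light."""
    words = set(lexicon)
    found = []
    # The inner word must sit strictly inside the light, so at least one
    # letter of the outer word remains on each side.
    for start in range(1, len(light) - 1):
        for end in range(start + 1, len(light)):
            inner = light[start:end]
            outer = light[:start] + light[end:]
            if inner in words and outer in words:
                found.append((outer, inner))
    return found

# Hypothetical toy lexicon; a real system would use a full word list.
lexicon = ["stapler", "per", "last", "plate", "rest"]
light = "plaster"
anagram_options = anagrams_of(light, lexicon)
embedding_options = embeddings_of(light, lexicon)
```

For this toy input the light can be clued as an anagram of "stapler" or as "last" placed inside "per". Each such option corresponds to one puzzle reading in the abstract's terms, which the system would go on to annotate and lexicalise.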

