Retrieving Collocations from Text: Xtract
- Computational Linguistics
, 1993
Cited by 229 (1 self)
Natural languages are full of collocations, recurrent combinations of words that co-occur more often than expected by chance and that correspond to arbitrary word usages. Recent work in lexicography indicates that collocations are pervasive in English; apparently, they are common in all types of writing, including both technical and nontechnical genres. Several approaches have been proposed to retrieve various types of collocations from the analysis of large samples of textual data. These techniques automatically produce large numbers of collocations along with statistical figures intended to reflect the relevance of the associations. However, none of these techniques provides functional information along with the collocation. Also, the results produced often contained improper word associations reflecting some spurious aspect of the training corpus that did not stand for true collocations. In this paper, we describe a set of techniques based on statistical methods for retrieving and identifying collocations from large textual corpora. These techniques produce a wide range of collocations and are based on some original filtering methods that allow the production of richer and higher-precision output. These techniques have been implemented and resulted in a lexicographic tool, Xtract. The techniques are described and some results are presented on a 10-million-word corpus of stock market news reports. A lexicographic evaluation of Xtract as a collocation retrieval tool has been made, and the estimated precision of Xtract is 80%.
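The notion of words that "co-occur more often than expected by chance" can be illustrated with a minimal sketch. The code below is not Xtract's method, which the abstract does not spell out; it swaps in pointwise mutual information (PMI) over adjacent word pairs as a simple stand-in. The function name `pmi_scores`, the `min_count` threshold, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter


def pmi_scores(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    PMI compares the observed probability of a pair with the probability
    expected if the two words were independent. Positive values mean the
    pair occurs more often than chance would predict.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        # Hypothetical frequency filter: rare pairs give unreliable estimates.
        if count < min_count:
            continue
        p_pair = count / n_bigrams
        p_independent = (unigrams[w1] / n_unigrams) * (unigrams[w2] / n_unigrams)
        scores[(w1, w2)] = math.log2(p_pair / p_independent)
    return scores


# Toy corpus (an assumption, not data from the paper).
toy_corpus = (
    "stock market fell . the stock market rose . "
    "the market for the bonds fell ."
).split()

scores = pmi_scores(toy_corpus)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

A raw association score like this is exactly the kind of "statistical figure" the abstract says earlier techniques produced; the filtering and functional information it attributes to Xtract are not modeled here.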
The English noun phrase in its sentential aspect
, 1987
Cited by 193 (4 self)
This dissertation is a defense of the hypothesis that the noun phrase is headed by a functional element (i.e., "non-lexical" category) D, identified with the determiner. In this way, the structure of the noun phrase parallels that of the sentence, which is headed by Infl(ection), under assumptions now standard within the Government-Binding (GB) framework. The central empirical problem addressed is the question of the proper analysis of the so-called "Poss-ing" gerund in English. This construction possesses simultaneously many properties of sentences, and many properties of noun phrases. The problem of capturing this dual aspect of the Poss-ing construction is heightened by current restrictive views of X-bar theory, which, in particular, rule out the obvious structure for Poss-ing, [NP NP VPing], by virtue of its exocentricity. Consideration of languages in which nouns, even the most basic concrete nouns, show agreement (AGR) with their possessors, points to an analysis
Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis
- In Proceedings of HLT-EMNLP
, 2005
Cited by 129 (7 self)
This paper presents a new approach to phrase-level sentiment analysis that first determines whether an expression is neutral or polar and then disambiguates the polarity of the polar expressions. With this approach, the system is able to automatically identify the contextual polarity for a large subset of sentiment expressions, achieving results that are significantly better than baseline.
English Relative Clause Constructions
- JOURNAL OF LINGUISTICS
, 1997
Cited by 125 (9 self)
This paper sketches a grammar of English relative clause constructions (including infinitival and reduced relatives) based on the notions of construction type and type constraints. Generalizations about dependency relations and clausal functions are factored into distinct dimensions contributing constraints to specific construction types in a multiple inheritance type hierarchy. The grammar presented here provides an account of extraction, pied piping and relative clause 'stacking' without appeal to transformational operations, transderivational competition, or invisible ('empty') categories of any kind.
Revision-Based Generation of Natural Language Summaries Providing Historical Background -- Corpus-Based Analysis, Design, Implementation and Evaluation
, 1994
Cited by 100 (6 self)
Automatically summarizing vast amounts of on-line quantitative data with a short natural language paragraph has a wide range of real-world applications. However, this specific task raises a number of difficult issues that are quite distinct from the generic task of language generation: conciseness, complex sentences, floating concepts, historical background, paraphrasing power and implicit content. In this thesis, I address these specific issues by proposing a new generation model in which a first pass builds a draft containing only the essential new facts to report and a second pass incrementally revises this draft to opportunistically add as many background facts as can fit within the space limit. This model requires a new type of linguistic knowledge: revision operations, which specify the various ways a draft can...
Annotating expressions of opinions and emotions in language
- Language Resources and Evaluation (formerly Computers and the Humanities)
, 2005
Cited by 90 (13 self)
This paper describes a corpus annotation project to study issues in the manual annotation of opinions, emotions, sentiments, speculations, evaluations and other private states in language. The resulting corpus annotation scheme is described, as well as examples of its use. In addition, the manual annotation process and the results of an inter-annotator agreement study on a 10,000-sentence corpus of articles drawn from the world press are presented.
Learning Subjective Nouns Using Extraction Pattern Bootstrapping
, 2003
Cited by 89 (5 self)
We explore the idea of creating a subjectivity classifier that uses lists of subjective nouns learned by bootstrapping algorithms. The goal of our research is to develop a system that can distinguish subjective sentences from objective sentences. First, we use two bootstrapping algorithms that exploit extraction patterns to learn sets of subjective nouns. Then we train a Naive Bayes classifier using the subjective nouns, discourse features, and subjectivity clues identified in prior research. The bootstrapping algorithms learned over 1000 subjective nouns, and the subjectivity classifier performed well, achieving 77% recall with 81% precision.
Part-of-Speech Tagging and Partial Parsing
- Corpus-Based Methods in Language and Speech
, 1996
Cited by 85 (0 self)
m we can carve off next. 'Partial parsing' is a cover term for a range of different techniques for recovering some but not all of the information contained in a traditional syntactic analysis. Partial parsing techniques, like tagging techniques, aim for reliability and robustness in the face of the vagaries of natural text, by sacrificing completeness of analysis and accepting a low but non-zero error rate.
1 Tagging
The earliest taggers [35, 51] had large sets of hand-constructed rules for assigning tags on the basis of words' character patterns and on the basis of the tags assigned to preceding or following words, but they had only small lexica, primarily for exceptions to the rules. TAGGIT [35] was used to generate an initial tagging of the Brown corpus, which was then hand-edited. (Thus it provided the data that has since been used to train other taggers [20].) The tagger described by Garside [56, 34], CLAWS, was a probabilistic version of TAGGIT, and the DeRose tagger improved on
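The description of early rule-based taggers (a small exception lexicon, rules keyed on a word's character patterns, and rules keyed on neighboring tags) can be sketched roughly as follows. This is a toy illustration only, not TAGGIT or CLAWS: the tag names, the lexicon entries, the suffix rules, and the single context rule are all assumptions invented for the example.

```python
import re

# Hypothetical small lexicon, consulted first (exceptions to the rules).
LEXICON = {"the": "DET", "a": "DET", "is": "VERB", "of": "PREP"}

# Hypothetical character-pattern rules, tried in order.
SUFFIX_RULES = [
    (re.compile(r"ly$"), "ADV"),
    (re.compile(r"ing$"), "VERB"),
    (re.compile(r"ed$"), "VERB"),
    (re.compile(r"s$"), "NOUN"),
]

DEFAULT_TAG = "NOUN"


def tag(words):
    """Assign one tag per word using lexicon, suffix, and context rules."""
    tags = []
    for word in words:
        lower = word.lower()
        if lower in LEXICON:
            tags.append(LEXICON[lower])
            continue

        # Character-pattern rules: the first matching suffix wins.
        for pattern, candidate in SUFFIX_RULES:
            if pattern.search(lower):
                chosen = candidate
                break
        else:
            chosen = DEFAULT_TAG

        # Context rule (an assumption): a verb-like form directly after
        # a determiner is retagged as a modifier.
        if chosen == "VERB" and tags and tags[-1] == "DET":
            chosen = "ADJ"
        tags.append(chosen)
    return tags


print(tag("the running dog barked loudly".split()))
```

The sketch only looks at the preceding tag; the taggers described above also used following tags, which would need a second pass or lookahead.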

