Results 1 - 4 of 4
An architecture for language processing for scientific texts
- In Proceedings of the 4th UK E-Science All Hands Meeting, 2006
- Cited by 4 (0 self)
We describe the architecture for language processing adopted on the eScience project ‘Extracting the Science from Scientific Publications’ (nicknamed SciBorg). In this approach, papers from different sources are first processed to give a common XML format (SciXML). Language processing modules operate on the SciXML in an architecture that allows for (partially) parallel deep and shallow processing and for a flexible combination of domain-independent and domain-dependent techniques. Robust Minimal Recursion Semantics (RMRS) acts both as a language for representing the output of processing and as an integration language for combining different modules. Language processing produces RMRS markup represented as standoff annotation on the original SciXML. Information extraction (IE) of various types is defined as operating on RMRSs. Rhetorical analysis of the texts also partially depends on IE-like patterns and supports novel methods of information access.
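As a rough illustration of what standoff annotation means in general, the sketch below keeps the source text untouched and stores annotations separately as character offsets into it. This is not the SciBorg or RMRS format: the `StandoffAnnotation` class, its fields, and the labels and module names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandoffAnnotation:
    start: int   # character offset where the span begins (inclusive)
    end: int     # character offset where the span ends (exclusive)
    label: str   # hypothetical annotation label
    source: str  # hypothetical name of the module that produced it


def span_text(text: str, ann: StandoffAnnotation) -> str:
    """Recover the annotated span from the unmodified source text."""
    return text[ann.start:ann.end]


def annotate(text: str, phrase: str, label: str, source: str) -> StandoffAnnotation:
    """Build an annotation for the first occurrence of `phrase` in `text`."""
    start = text.index(phrase)
    return StandoffAnnotation(start, start + len(phrase), label, source)


text = "Papers are first processed to give a common XML format."
annotations = [
    annotate(text, "Papers", "noun", "shallow"),
    annotate(text, "common XML format", "noun_phrase", "deep"),
]

for ann in annotations:
    print(ann.source, ann.label, repr(span_text(text, ann)))
```

Because the annotations only point into the text, several modules can add their own layers without rewriting the underlying document.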
Annotating Question Types in Social Q&A Sites
Abstract. In all domains, including eHumanities, it is crucial to understand how people seek information and what kinds of questions they ask. In this paper, we present an annotation study of domain-specific questions collected from the current leading social Question and Answer site, namely Yahoo! Answers. We define
WITP Recognizing Citations in Public Comments Arguello, Callan
ABSTRACT. Notice and comment rulemaking is central to how U.S. federal agencies craft new regulation. E-rulemaking, the process of soliciting and considering public comments that are submitted electronically, poses a challenge for agencies. The large volume of comments received makes it difficult to distill and address the most substantive concerns of the public. This work attempts to alleviate this burden by applying existing machine learning techniques to the problem of recognizing citation sentences. A citation in this context is defined as a statement in which the author of the public comment references an external source of factual information that is associated with a specific person or organization. The problem is formulated as a binary classification problem: Is a specific person or organization mentioned in a sentence being referenced as an external source of information? We show that our definition of a citation is reproducible by human judges and that citations can be detected using machine learning techniques with some success. Casting this as a machine learning problem requires selecting an appropriate representation of the sentence. Several feature sets are evaluated individually and in combination. Superior results are obtained by combining feature sets. Syntactic features, which characterize the structure of the sentence rather than its content, significantly improve accuracy when combined with other features, but not when used in isolation. Although prediction
Jaime Arguello is a Ph.D. student at the Language Technologies Institute at Carnegie Mellon University. His work focuses on text data mining, information retrieval, and natural language processing.
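The idea of representing a sentence with several feature sets, used individually or in combination, can be sketched as follows. The extractors here are hypothetical stand-ins and are not the feature sets evaluated in the paper; in particular, `shape_features` is only a crude surface proxy for real syntactic features.

```python
import re


def lexical_features(sentence):
    """Hypothetical content features: presence of each lowercase word."""
    return {f"word={w}": 1 for w in re.findall(r"[a-z']+", sentence.lower())}


def shape_features(sentence):
    """Hypothetical structural cues, a crude stand-in for syntactic features."""
    return {
        "has_quote": int('"' in sentence),
        "has_according_to": int("according to" in sentence.lower()),
    }


def combine(sentence, extractors):
    """Merge several feature sets, prefixing each key with its set name."""
    combined = {}
    for name, extract in extractors.items():
        for key, value in extract(sentence).items():
            combined[f"{name}:{key}"] = value
    return combined


sentence = "According to the EPA, emissions fell last year."

# One feature set in isolation, then both in combination.
only_shape = combine(sentence, {"shape": shape_features})
features = combine(sentence, {"lex": lexical_features, "shape": shape_features})
print(sorted(features))
```

Prefixing keys by feature set keeps the sets separable, so a binary classifier could be trained on any subset to compare them individually and in combination.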
Annotating Underquantification
Many noun phrases in text are ambiguously quantified: syntax doesn’t explicitly tell us whether they refer to a single entity or to several, and what portion of the set denoted by the Nbar actually takes part in the event expressed by the verb. We describe this ambiguity phenomenon in terms of underspecification, or rather underquantification. We attempt to validate the underquantification hypothesis by producing and testing an annotation scheme for quantification resolution, the aim of which is to associate a single quantifier with each noun phrase in our corpus.
1 Quantification resolution
We are concerned with ambiguously quantified noun phrases (NPs) and their interpretation, as illustrated by the following examples:
1. Cats are mammals = All cats...
2. Cats have four legs = Most cats...
3. Cats were sleeping by the fire = Some cats...
4. The beans spilt out of the bag = Most/All of the beans...
5. Water was dripping through the ceiling = Some water...
We are interested in quantification resolution, that is, the process of giving an ambiguously quantified NP a formalisation which expresses a unique set relation appropriate to the semantics of the utterance. For instance, we wish to arrive at:
6. All cats are mammals. |φ ∩ ψ| = |φ|, where φ is the set of all cats and ψ the set of all mammals.
Resolving the quantification value of NPs is important for many NLP tasks. Let us imagine an information extraction system having retrieved the triples ‘cat – is – mammal’ and ‘cat – chase –
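The set-relation reading of a quantifier in example 6 can be sketched with Python sets. Only the relation for "all" (|φ ∩ ψ| = |φ|) comes from the text above; the readings used here for "some" (non-empty intersection) and "most" (more than half of φ) are common textbook assumptions, and the toy sets are invented for illustration.

```python
def holds_all(phi, psi):
    """'All': |phi ∩ psi| = |phi|, as in example 6."""
    return len(phi & psi) == len(phi)


def holds_some(phi, psi):
    """'Some' (assumed reading): the intersection is non-empty."""
    return len(phi & psi) > 0


def holds_most(phi, psi):
    """'Most' (assumed reading): more than half of phi is in psi."""
    return len(phi & psi) > len(phi) / 2


# Toy sets, invented for illustration.
cats = {"tom", "felix", "garfield"}
mammals = cats | {"rex"}
four_legged = {"tom", "felix"}
sleeping_by_fire = {"tom"}

print(holds_all(cats, mammals))            # All cats are mammals
print(holds_most(cats, four_legged))       # Most cats have four legs
print(holds_some(cats, sleeping_by_fire))  # Some cats were sleeping by the fire
```

Under these assumptions, resolving an ambiguously quantified NP amounts to choosing which of these relations the utterance is taken to assert.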

