Results 1-7 of 7
A Maximum Entropy Approach to Identifying Sentence Boundaries
In Proceedings of the Fifth Conference on Applied Natural Language Processing, 1997
Cited by 145 (3 self)
We present a trainable model for identifying sentence boundaries in raw text. Given a corpus annotated with sentence boundaries, our model learns to classify each occurrence of '.', '?', and '!' as either a valid or invalid sentence boundary. The training procedure requires no hand-crafted rules, lexica, part-of-speech tags, or domain-specific information. The model can therefore be trained easily on any genre of English, and should be trainable on any other Roman-alphabet language. Performance is comparable to or better than the performance of similar systems, but we emphasize the simplicity of retraining for new domains.
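The abstract does not spell out the model's features or how it is trained, so the following is only a rough illustration of the general idea: every '.', '?' and '!' is a candidate, and a binary classifier learned from annotated sentences decides whether it ends a sentence. With two outcomes, a conditional maximum entropy model over binary features is equivalent to logistic regression. The feature set (neighbouring tokens, capitalisation, the mark itself), the helper names `candidates`, `labelled_examples`, `train` and `split`, the toy training sentences, and the use of plain stochastic gradient ascent rather than whatever estimation procedure the authors used are all assumptions of this sketch, not the paper's method.

```python
import math
import re
from collections import defaultdict


def candidates(text):
    """Yield (index, features) for every '.', '?' or '!' in text (assumed feature set)."""
    for m in re.finditer(r"[.?!]", text):
        i = m.start()
        # Token glued to the mark on its left, e.g. "Dr" in "Dr." (empty after whitespace).
        prefix = re.search(r"\S*\Z", text[:i]).group()
        # Next whitespace-delimited token to the right (empty at end of text).
        suffix = re.match(r"\s*(\S*)", text[i + 1:]).group(1)
        feats = ["bias", "mark=" + text[i],
                 "prefix=" + prefix.lower(), "suffix=" + suffix.lower()]
        if i + 1 == len(text) or text[i + 1].isspace():
            feats.append("space_after")
        if prefix[:1].isupper():
            feats.append("prefix_cap")
        if suffix[:1].isupper():
            feats.append("suffix_cap")
        yield i, feats


def labelled_examples(sentences):
    """Turn gold sentences (assumed to end in '.', '?' or '!') into (features, label) pairs."""
    text = " ".join(sentences)
    ends, start = set(), 0
    for s in sentences:
        ends.add(start + len(s) - 1)  # index of the sentence-final mark
        start += len(s) + 1           # skip the joining space
    return [(feats, int(i in ends)) for i, feats in candidates(text)]


def train(examples, epochs=50, lr=0.5):
    """Fit a two-class maximum entropy (logistic regression) model by stochastic gradient ascent."""
    w = defaultdict(float)
    for _ in range(epochs):
        for feats, y in examples:
            z = max(-30.0, min(30.0, sum(w[f] for f in feats)))
            p = 1.0 / (1.0 + math.exp(-z))  # P(boundary | context)
            for f in feats:
                w[f] += lr * (y - p)        # log-likelihood gradient step
    return w


def split(text, w):
    """Cut text after every candidate whose boundary probability exceeds 0.5."""
    sentences, start = [], 0
    for i, feats in candidates(text):
        if sum(w.get(f, 0.0) for f in feats) > 0:  # z > 0  <=>  p > 0.5
            sentences.append(text[start:i + 1].strip())
            start = i + 1
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


# Hypothetical toy "annotated corpus": one string per gold sentence.
gold = ["Dr. Smith arrived.", "He sat down.", "Mr. Jones left early!",
        "Was it raining?", "Mrs. Brown stayed.", "It was late."]
w = train(labelled_examples(gold))
print(split("Dr. Brown left. He was tired.", w))
```

The point of the sketch is that nothing in it is a hand-written rule or abbreviation list: the model learns from the annotations alone that a capitalised token such as "Dr" before a period signals a non-boundary. A realistic system would need far more annotated text and probably richer context features.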
Document Structure
Computational Linguistics, 2003
Cited by 30 (8 self)
... document structure can be seen as an extension of Nunberg's 'text-grammar'; it is also closely related to 'logical' mark-up in languages like HTML and LaTeX. We show that by using this intermediate representation, several subtasks in language generation and language understanding can be defined more cleanly.
Resolving Attachment and Clause Boundary Ambiguities for Simplifying Relative Clause Constructs
2002
Cited by 8 (3 self)
says [51-year-old Cathy Tinsall] from [South had [five children]. [The suicide note] included [lurid references] to [the economy] run under [the influence] of [Herr Pohl], might stop [a British government] from running [its own economic policy].
ACL Student Research Workshop, 10 July 2002
How Important is this Problem? In the Penn Wall Street Journal Treebank (Marcus et al., 1993): 19% clauses 24% clauses are preceded by complex noun phrases having the Prep
Clause Attachment: Why not use a parser? Sentences in need of simplification don't come through a parser very well. Applications require speed. Non-restrictive relative clauses are increasingly being treated the attachment decisions to anaphora resolution
Example: The board is dominated heirs late John T. Dorrance Jr., controlled about 58% of Campbell's stock. (T/leta_s (S/np_vp (NP/det_n The_AT (N1/n board_NN1)) (V/be_ppart/-
Shallow vs. Deep Techniques for Handling Linguistic Constraints and Optimisations
In Proceedings of the KI-99 Workshop on May I Speak Freely: Between Templates and Free Choice in Natural Language Generation, 1999
Cited by 7 (0 self)
An important aspect of many NLG systems is ensuring that all generated texts obey linguistic constraints and are (near-)optimal under linguistic quality measures. Where they are possible, deep techniques can automate the enforcement of linguistic constraints and optimisations. In contrast, shallow techniques require developers to explicitly enforce constraints and optimisations. Deep techniques therefore offer the potential of improving system robustness and decreasing development time. Unfortunately, deep techniques cannot be used for many types of optimisations and constraints because of gaps in our understanding of linguistic phenomena, or because the necessary software would be very expensive to create. This discussion is illustrated by examining where deep and shallow techniques are used in the STOP system, which produces personalised smoking cessation leaflets.
1 Introduction
Applied Natural Language Generation (NLG) systems should be robust, that is, they should produ...
Generating textual diagrams and diagrammatic texts
Cooperative Multimodal, 2001
Cited by 1 (0 self)
There are obvious ways in which text and diagrams within a document should be coordinated: for instance, the placement of a diagram might influence the wording of the text. However, there is a more subtle interaction between text and diagrams, which has emerged from work on generating technical documents that make extensive use of layout. Constituents that would normally be classified as textual may contain diagrammatic features (e.g., when multiple indenting is used); conversely, non-pictorial diagrams usually contain short strings of text (e.g., labels within boxes). We argue that text and diagrams really lie on a continuum, and that for generating documents of this kind we need a descriptive framework that combines linguistic and graphical features in the same representation.
Using an Rhetorical Representation to Generate a Variety of Pragmatically Congruent Texts
2000
In order for a text planner to produce all the possible pragmatically congruent texts and only these, we distinguish between abstract and concrete rhetorical representations of a text. We discuss these representations and present our methodology for exploring the mappings from the underlying message to the actual surface discourse.
Layout Annotation in a Corpus of Patient Information Leaflets
In Proceedings of the Language Resources and Evaluation Conference (LREC), 2000
We discuss the problems and issues that arose during the development of a procedure for annotating layout in a corpus of Patient Information Leaflets. We show how the genre of the corpus as well as the aim of the annotation influenced the annotation scheme. We also describe the automatic annotation procedure.

