Semantics of paragraphs
Computational Linguistics, 1991
Cited by 17 (3 self)
We present a computational theory of the paragraph. Within it, we formally define coherence, give semantics to the adversative conjunction "but" and to the Gricean maxim of quantity, and present some new methods for anaphora resolution. The theory precisely characterizes the relationship between the content of the paragraph and the background knowledge needed to understand it. This is achieved by introducing a new type of logical theory with three levels: an object level, corresponding to the content of the paragraph; a referential level, a new logical level encoding background knowledge; and a metalevel containing constraints on models of discourse (e.g., a formal version of the Gricean maxims). We also propose specific mechanisms of interaction between these levels, resembling both classical provability and abduction. Paragraphs are then represented by a class of structures called p-models.
Analysis and Evaluation of Comparable Corpora for Under Resourced Areas of Machine Translation
Proceedings of the 3rd Workshop on Building and Using Comparable Corpora. Applications of Parallel and Comparable Corpora in Natural Language Engineering and the Humanities, 2010
Cited by 2 (1 self)
The lack of sufficient linguistic resources and parallel corpora for many languages and domains is currently one of the major obstacles to further advances in automated translation. The solution proposed in this paper is to exploit the fact that non-parallel bilingual or multilingual text resources are much more widely available than parallel translation data. This position paper presents previous research in this field and the research plans of the ACCURAT project, whose goal is to find, analyze, and evaluate novel methods that exploit comparable corpora to compensate for the shortage of linguistic resources and, ultimately, to significantly improve MT quality for under-resourced languages and narrow domains.
Creating general-purpose corpora using automated search engine queries
Cited by 1 (0 self)
The Internet is a natural source of linguistic data, providing an abundance of texts of various types in a large number of languages. These texts are already in an electronic form suitable for corpus studies, either as downloadable pages or as a resource to be searched with search engines. On the other hand, large representative corpora of the size of the British National Corpus (BNC; Aston and Burnard, 1998) exist for ...
Building an International Corpus of Arabic (ICA): Progress of Compilation Stage
This paper focuses on three axes. The first axis surveys the importance of corpora in language studies (e.g., lexicography, grammar, semantics, natural language processing, and other areas). The second axis shows how the Arabic language lacks textual resources, such as corpora and tools for corpus analysis, and the effect of this lack on the quality of Arabic language applications. Since successful attempts at compiling Arabic corpora are rare, the third axis presents the technical design of the International Corpus of Arabic (ICA), a newly established representative corpus of Arabic that is intended to cover the Arabic language as it is used throughout the Arab world. The corpus is planned to support various Arabic studies that depend on authentic data, in addition to the building of Arabic natural language processing applications.
Supporting Research Environment for Swedish and Turkish
Language resources such as corpora consisting of annotated texts and utterances have been shown to be a central component in language studies and natural language processing as they, when ...

