How To Detect Grammatical Errors In A Text Without Parsing It
Cited by 13 (2 self)
The Constituent Likelihood Automatic Word-tagging System (CLAWS) was originally designed for the low-level grammatical analysis of the million-word LOB Corpus of English text samples. CLAWS does not attempt a full parse, but uses a first-order Markov model of language to assign word-class labels to words. CLAWS can be modified to detect grammatical errors, essentially by flagging unlikely word-class transitions in the input text. This may seem to be an intuitively implausible and theoretically inadequate model of natural language syntax, but nevertheless it can successfully pinpoint most grammatical errors in a text. Several modifications to CLAWS have been explored. The resulting system cannot detect all errors in typed documents; but then neither do far more complex systems, which attempt a full parse, requiring much greater computation.
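The idea of flagging unlikely word-class transitions can be illustrated with a minimal sketch. This is not the CLAWS implementation: the tag names, the toy training data, the function names and the 0.05 threshold are all assumptions chosen for illustration, and the input is taken to be already tagged.

```python
from collections import Counter

def train_transitions(tagged_sentences):
    """Estimate P(tag | previous tag) from sentences given as lists of tags."""
    pair_counts = Counter()
    prev_counts = Counter()
    for tags in tagged_sentences:
        padded = ["<s>"] + list(tags)  # "<s>" marks the sentence start
        for prev, cur in zip(padded, padded[1:]):
            pair_counts[(prev, cur)] += 1
            prev_counts[prev] += 1
    return {pair: n / prev_counts[pair[0]] for pair, n in pair_counts.items()}

def flag_unlikely(tags, probs, threshold=0.05):
    """Return positions whose transition from the previous tag is below threshold."""
    padded = ["<s>"] + list(tags)
    flagged = []
    for i, (prev, cur) in enumerate(zip(padded, padded[1:])):
        # Unseen transitions get probability 0.0 and are therefore flagged.
        if probs.get((prev, cur), 0.0) < threshold:
            flagged.append(i)
    return flagged

# Toy training data with hypothetical tags (not the LOB tagset).
training = [
    ["DET", "NOUN", "VERB", "DET", "NOUN"],
    ["DET", "ADJ", "NOUN", "VERB"],
    ["PRON", "VERB", "DET", "NOUN"],
]
probs = train_transitions(training)
suspect = flag_unlikely(["DET", "VERB", "NOUN"], probs)
clean = flag_unlikely(["DET", "NOUN", "VERB"], probs)
```

Here `suspect` holds the positions of the two transitions never seen in training (DET to VERB, VERB to NOUN), while `clean` is empty. A real system would need smoothing and a far larger training corpus before such a threshold became meaningful.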
Automatic Extraction of Tagset Mappings from Parallel-Annotated Corpora
, 1995
Cited by 9 (3 self)
Several research projects around the world are building grammatically analysed corpora; that is, collections of text annotated with part-of-speech wordtags and syntax trees. However, projects have used quite different wordtagging and parsing schemes. Developers of corpora adhere to a variety of competing models or theories of grammar and parsing, with the effect of restricting the accessibility of their respective corpora, and the potential for collation into a single fully parsed corpus. In view of this heterogeneity, we have begun to investigate and develop methods of automatically mapping between the annotation schemes of the most widely known corpora, thus assessing their differences and improving their reusability. Annotating a single corpus with the different schemes allows for comparisons and will provide a rich testbed for automatic parsers. Collation of all the included corpora into a single large annotated corpus will provide a more detailed language model to be developed for...
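One simple way to picture automatic mapping between annotation schemes is to count which tags co-occur on the same tokens when one text carries both annotations. The sketch below assumes token-aligned tag sequences and maps each tag to its most frequent counterpart; the tag names and the function `infer_mapping` are hypothetical, and the abstract does not say that the authors used this particular method.

```python
from collections import Counter, defaultdict

def infer_mapping(tags_a, tags_b):
    """Map each scheme-A tag to the scheme-B tag it most often aligns with."""
    if len(tags_a) != len(tags_b):
        raise ValueError("both annotations must cover the same tokens")
    cooccur = defaultdict(Counter)
    for a, b in zip(tags_a, tags_b):
        cooccur[a][b] += 1
    return {a: counts.most_common(1)[0][0] for a, counts in cooccur.items()}

# The same six tokens annotated under two hypothetical schemes.
scheme_a = ["AT", "NN", "VBD", "AT", "JJ", "NN"]
scheme_b = ["DET", "N", "V", "DET", "ADJ", "N"]
mapping = infer_mapping(scheme_a, scheme_b)
```

A one-to-one lookup like this cannot express one-to-many or context-dependent correspondences, which real tagsets often require.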
Is there anybody out there?: The detection of intelligent and generic language-like features
- JOURNAL OF THE BRITISH INTERPLANETARY SOCIETY
Cited by 7 (4 self)
The authors present an overview of their language-detection research to date, along with considerations for further research. The research focuses on the unique structure of communication, seeking to identify whether a given signal has features within it that display intelligence or language-like characteristics, and comparing this with current methods used in searches for extra-terrestrial intelligence, in particular the SETI (Search for Extra Terrestrial Intelligence) Institute’s Project Phoenix. Project Phoenix looks for signals within a pre-defined bandwidth, on the basis that if they occur it would indicate a source of intelligence outside our own. In this active research area, the reported research looks beyond this for patterns in a signal which should indicate if intelligence is present by applying formulated algorithms and using tailor-made software which will sense if similar structures exist. The objective is therefore to investigate algorithms that will accomplish this goal. The research reported concentrates on ascertaining whether inter-species communication displays generic attributes that distinguish it from other sources, such as music and white noise. First contact may come from eavesdropping on radio broadcasts of their own natural language.
Unsupervised Grammar Inference Systems for Natural Language
, 2002
Cited by 7 (0 self)
In recent years there have been significant advances in the field of Unsupervised Grammar Inference (UGI) for Natural Languages such as English or Dutch. This paper presents a broad range of UGI implementations, where we can begin to see how the theory has been put into practice. Several mature systems are emerging, built using complex models and capable of deriving natural language grammatical phenomena. The range of systems is classified into: models based on Categorial Grammar (GraSp, CLL, EMILE); Memory Based Learning models (FAMBL, RISE); Evolutionary computing models (ILM, LAgts); and string-pattern searches (ABL, GB). An objectively measurable statistical comparison of performance of the systems reviewed is not yet feasible. However, their merits and shortfalls are discussed, as well as a look at what the future has in store for UGI.
Automatic Acquisition of Word Classification using Distributional Analysis of Content Words with Respect to Function Words
, 2002
Cited by 4 (3 self)
This project describes a method which can automatically infer word classification. Previous systems designed to assign parts-of-speech to words sought the use of training data or were built upon rules devised by experts in linguistics. The report details the use of an unsupervised approach that can significantly reduce the reliance on prior linguistic intuition. The study looks into how words behave relative to the function words. As these are the most common words, there is a great deal of information that can be obtained. It was possible to analyse how the content words from a given body of text were distributed with respect to the function words. This information could be used as a profile, and therefore content words with a similar profile against the function words could be assumed to be of similar word class. Agglomerative hierarchical clustering techniques were applied to partition words into different clusters. Words that were deemed similar were grouped together, and thus each cluster should contain words that possess the same part-of-speech. This project performed many experiments to investigate how the many factors affected the overall clustering performance, in order to find the optimal parameters. The results report an accuracy of 87% when performed on the LOB corpus. Experiments were also carried out with an alternative Spanish corpus and the clustering accuracy achieved 85%. Semantic clustering was also observed, indicating the effectiveness of the described approach for the task of automatically acquiring word classification.
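The profile-and-cluster idea described above can be sketched in a few lines. Everything specific here is an assumption made for illustration rather than the project's actual setup: the profile uses only the immediately adjacent function words, similarity is cosine, linkage is average, and the toy text and function-word list are invented.

```python
from collections import Counter
from math import sqrt

def profiles(tokens, function_words):
    """For each content word, count function words seen directly left and right of it."""
    prof = {}
    for i, word in enumerate(tokens):
        if word in function_words:
            continue
        counts = prof.setdefault(word, Counter())
        if i > 0 and tokens[i - 1] in function_words:
            counts[("L", tokens[i - 1])] += 1
        if i + 1 < len(tokens) and tokens[i + 1] in function_words:
            counts[("R", tokens[i + 1])] += 1
    return prof

def cosine(p, q):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(p[k] * q[k] for k in p if k in q)
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

def cluster(prof, n_clusters):
    """Average-linkage agglomerative clustering: merge the most similar pair each step."""
    clusters = [[w] for w in sorted(prof)]

    def sim(c1, c2):
        total = sum(cosine(prof[a], prof[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = max(pairs, key=lambda ij: sim(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters.pop(j)  # j > i, so index i is unaffected
        clusters[i].extend(merged)
    return clusters

# Invented toy text and function-word list.
function_words = {"the", "a", "on", "to"}
tokens = ("the cat sat on the mat the dog ran to the park "
          "a cat ran on a mat a dog sat on a park").split()
groups = cluster(profiles(tokens, function_words), 2)
```

On this toy input the two resulting groups separate the nouns (`cat`, `dog`, `mat`, `park`) from the verbs (`ran`, `sat`). With real corpora the choice of window, similarity measure and linkage would all need tuning, which is the kind of parameter search the abstract mentions.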
The Use of Corpora for Automatic Evaluation of Grammar Inference Systems
, 2003
The evaluation of grammar inference systems is clearly a non-trivial task, as it is possible to have more than one correct grammar for a given language. The `looks good to me' approach, carried out by computational linguists analysing their own grammar inference system results, has prevailed for many years. This paper explores why this method has been so popular, in terms of its strengths, and also why it is no longer adequate as a reliable means of measuring performance. Corpus-based methods that can be performed automatically are investigated to see how they can meet the needs of this difficult problem.
CORPUS LINGUISTICS AND THE DESIGN OF A RESPONSE MESSAGE
, 2001
Most research related to SETI, the Search for Extra-Terrestrial Intelligence, is focussed on techniques for detection of possible incoming signals from extraterrestrial intelligent sources, and algorithms for analysis of these signals to identify intelligent language-like characteristics. However, another issue for research and debate is the nature of our response, should a signal arrive and be detected. The design of potentially the most significant communicative act in history should not be decided solely by astrophysicists; the Corpus Linguistics
aConCorde: Towards an open-source, extendable concordancer for Arabic
There is currently a surge of activity surrounding Arabic corpus linguistics. As the number of available Arabic corpora continues to grow, there is an increasing need for robust tools that can process this data, whether for research or teaching. One such tool that is useful for both of these purposes is the concordancer – a simple tool for displaying a specified target word in its context. However, obtaining one that can reliably cope with the Arabic language had proved difficult. Also, there was a desire to add some novel features to the standard concordancer to enhance its usefulness within the classroom – easy-to-use root- and stem-based concordance and integration with corpus clustering algorithms are two examples. Therefore, aConCorde was created to provide such a tool to the community.
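For readers unfamiliar with concordancers, the core operation (showing a target word with a window of context on each side) can be sketched as below. This is a generic keyword-in-context routine, not aConCorde's code: the function name, the `width` parameter and the English sample text are assumptions, and it matches exact tokens only, with none of the root- or stem-based matching mentioned above.

```python
def concordance(tokens, target, width=3):
    """Return (left context, keyword, right context) for each occurrence of target."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

# Invented sample text, pre-split on whitespace.
text = "the cat sat on the mat and the dog sat by the door".split()
hits = concordance(text, "sat", width=2)
for left, keyword, right in hits:
    # Right-align the left context so the keywords line up in a column.
    print(f"{left:>10}  {keyword}  {right}")
```

Because the routine works on a list of Unicode strings, it does not depend on the script of the text, though right-to-left display and Arabic tokenisation are separate problems it does not address.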
Grammatical Inference and Corpus Linguistics
, 2009
The candidate confirms that the work submitted is his/her own and that appropriate credit has been given where reference has been made to the work of others.
Acknowledgements
I acknowledge the support of members of the Language research group, particularly Latifa Al-Sulaiti and Bayan Abu Shawar. I am grateful for all they have taught me. We had many fun times during our time in Leeds and it’s been a pleasure to make such ever-lasting friendships. I’d also like to extend my thanks to the School of Computing. I’ve enjoyed my time here tremendously; the staff, postgrads and students are first-class. This thesis is dedicated to my supervisor, Eric Atwell – a man of unlimited generosity, encouragement and patience. I’m not sure if I would have achieved a fraction of what I have – either academically or professionally – if it wasn’t for him. My thanks also to all my friends and family who supported me during my studies.
This thesis looks at ways to bring together two research fields: Grammatical Inference and Corpus Linguistics. Grammatical Inference is research on using Unsupervised Machine Learning to extract grammatical descriptions from a corpus or text dataset, and Corpus Linguistics is all about using corpora in linguistic research; so on the face of it the two fields should be related. The thesis starts with a review of Grammatical Inference research, and an overview of Corpus Linguistics research as represented at the International Conference on Corpus Linguistics. Part 1 presents a review of Unsupervised Grammar Inference Systems for Natural Language. Part 2 presents a snapshot of current research topics in Corpus Linguistics, as represented at the International Conference on Corpus Linguistics, and reviews the extent to which Grammar Inference has been used in Corpus Linguistics.
It transpires that GI researchers focus on development of novel algorithms and have little or no interest in standardisation of corpus resources; and Corpus Linguists are largely unaware of GI as a potential tool for their research. To introduce the two research communities to each other, this thesis presents an application of Corpus Linguistics principles to GI

