Results 11 - 20 of 20
Statistical Language Processing based on Self-Organising Word Classification
, 1994
Cited by 4 (2 self)
An automatic word classification system has been designed which processes word unigram and bigram frequency statistics extracted from a corpus of natural language utterances. The system implements a type of simulated annealing which employs an average class mutual information metric. Resulting classifications are hierarchical, allowing variable class granularity. Words are represented as structural tags: unique n-bit numbers, the most significant bit-patterns of which incorporate class information. Therefore, access to a structural tag immediately provides access to all classification levels for the corresponding word. The classification system has successfully revealed some of the structure of two natural languages, from the phonemic to the semantic level. The system has been favourably compared, directly and indirectly, with other word classification systems. Class-based interpolated language models have been constructed to exploit the extra information supplied by structural...
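The structural-tag idea described in this abstract can be illustrated with a small sketch. It assumes, for illustration only, that the class of a word at depth k of the hierarchy is simply the k most significant bits of its n-bit tag; the function names and the 8-bit example tags are hypothetical and are not taken from the paper.

```python
def class_at_level(tag: int, level: int, n_bits: int) -> int:
    """Return the class id of `tag` at a given depth of the hierarchy.

    Assumption: the class at depth `level` is the `level` most significant
    bits of the n-bit tag. This encoding is illustrative, not the paper's.
    """
    if not 0 <= level <= n_bits:
        raise ValueError("level must lie between 0 and n_bits")
    # Dropping the low-order bits leaves only the class prefix.
    return tag >> (n_bits - level)


def same_class(tag_a: int, tag_b: int, level: int, n_bits: int) -> bool:
    """True if both tags fall in the same class at the given depth."""
    return class_at_level(tag_a, level, n_bits) == class_at_level(tag_b, level, n_bits)


# Two hypothetical 8-bit tags that share their first four bits.
TAG_A = 0b10110010
TAG_B = 0b10111100

if __name__ == "__main__":
    print(same_class(TAG_A, TAG_B, 4, 8))  # coarse level: same class
    print(same_class(TAG_A, TAG_B, 6, 8))  # finer level: different classes
```

Under this assumed encoding a single integer carries every level of the classification, so choosing a granularity is just a matter of choosing how many leading bits to compare.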
A Review of Statistical Language Processing Techniques
- Artificial Intelligence Review
, 1995
Cited by 4 (0 self)
We present a review of some recently developed techniques in the field of natural language processing. This area has witnessed a confluence of approaches which are inspired by theories from linguistics and those which are inspired by theories from information theory: statistical language models are becoming more linguistically sophisticated and the models of language used by linguists are incorporating stochastic techniques to help resolve ambiguities. We include a discussion about the underlying similarities between some of these systems and mention two approaches to the evaluation of statistical language processing systems.

1 Introduction

Within the last decade, a great deal of attention has been paid to techniques for processing large natural language corpora. The purpose of much of this activity has been to refine computational models of language so that the performance of various technical applications can be improved (e.g. speech recognisers [67], speech synthesisers [32], optica...
Structural Tags, Annealing and Automatic Word Classification
- Artificial Intelligence and the Simulation of Behaviour Quarterly
, 1994
Cited by 4 (3 self)
We describe an automatic word classification system which uses a locally optimal annealing algorithm and average class mutual information. A new word-class representation, the structural tag, is introduced and its advantages for use in statistical language modelling are presented. A summary of some results with the one million word LOB corpus is given; the algorithm is also shown to discover the vowel-consonant distinction and displays an ability to cluster words syntactically in a Latin corpus. Finally, a comparison is made between the current classification system and several alternative systems, which shows that the current system performs tolerably well.

1 Introduction

This paper contains a description of some work on an automatic word classification system which uses a technique similar to annealing [1]. The automatic acquisition of word classes corresponds to the paradigmatic component [5] of the syntagmatic-paradigmatic bootstrapping problem [19]. The best of the recent classifi...
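The abstract does not spell out the metric, so the sketch below assumes the usual definition of average class mutual information over adjacent class pairs, M = sum over (c1, c2) of P(c1, c2) * log2(P(c1, c2) / (P(c1) * P(c2))), estimated from bigram counts. The function name, the toy counts and the class assignments are all hypothetical.

```python
import math
from collections import Counter


def average_class_mutual_information(bigram_counts, word_class):
    """Average mutual information between the classes of adjacent words.

    bigram_counts maps (word1, word2) to a count; word_class maps each word
    to a class id. The formula is the standard one and is an assumption
    about what the paper computes.
    """
    total = sum(bigram_counts.values())
    pair, left, right = Counter(), Counter(), Counter()
    for (w1, w2), n in bigram_counts.items():
        c1, c2 = word_class[w1], word_class[w2]
        pair[(c1, c2)] += n   # joint count of the class bigram
        left[c1] += n         # marginal count of the first class
        right[c2] += n        # marginal count of the second class
    score = 0.0
    for (c1, c2), n in pair.items():
        p_joint = n / total
        p_indep = (left[c1] / total) * (right[c2] / total)
        score += p_joint * math.log2(p_joint / p_indep)
    return score


# Toy bigram counts: "the" alternates with "cat" and "dog".
BIGRAMS = {("the", "cat"): 2, ("the", "dog"): 2,
           ("cat", "the"): 2, ("dog", "the"): 2}
GOOD = {"the": 0, "cat": 1, "dog": 1}   # nouns grouped together
BAD = {"the": 0, "cat": 0, "dog": 1}    # "cat" grouped with "the"

if __name__ == "__main__":
    print(average_class_mutual_information(BIGRAMS, GOOD))
    print(average_class_mutual_information(BIGRAMS, BAD))
```

A search procedure of the annealing kind would repeatedly move a word to another class and keep the move when this score rises; the toy data shows why, since grouping the two nouns together scores higher than the alternative.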
Discussions at the data border: from generalised hypertext to structural computing
- Journal of Network and Computer Applications
, 2003
Cited by 3 (0 self)
Structural Computing grew from the trend in hypertext research towards generalised systems; it asserts the primacy of structure over data. As a philosophy it has been compared to Structuralism in anthropology and linguistics and has given birth to a new trend in systems design known as Multiple Open Services (MOS). The Fundamental Open Hypermedia Model (FOHM) is an alternative approach to Generalised Hypertext that views the various Hypertext domains as continuous rather than discrete. Its relationships to Structural Computing, Structuralism and MOS have never been fully explored. This paper examines these relationships. We explore how FOHM might be implemented in MOS environments and describe the Data Border, the point where Structure meets Data. We then use this to explore how FOHM and Generalised Hypermedia are related to Structuralism and Structural Computing.
I am not a number: I am a free variable
- In Haskell workshop
, 2004
Cited by 3 (0 self)
In this paper, we show how to manipulate syntax with binding using a mixed representation of names for free variables (with respect to the task in hand) and de Bruijn indices [dB72] for bound variables. By doing so, we retain the advantages of both representations: naming supports easy, arithmetic-free manipulation of terms; de Bruijn indices eliminate the need for α-conversion. Further, we have ensured that not only the user but also the implementation need never deal with de Bruijn indices, except within key basic operations. Moreover, we give a representation for names which readily supports a power structure naturally reflecting the structure of the implementation. Name choice is safe and straightforward. Our technology combines easily with an approach to syntax manipulation inspired by Huet’s ‘zippers’ [Hue97]. Without the technology in this paper, we could not have implemented Epigram [McB04]. Our example, constructing inductive elimination operators for datatype families, is but one of many where it proves invaluable.

Prologue

In conversation, we like to have names for the people we’re talking about. If we had to say things like ‘the person three to the left of me’ rather than ‘Fred’, things would get complicated whenever anyone went to the lavatory. You don’t need to have formalized the strengthening property for Pure Type Systems [MP99] to appreciate this basic phenomenon of social interaction. It is in the company of strangers that more primitive pointing-based modes of reference acquire a useful rôle as a way of indicating unambiguously an individual with no socially agreed name. Even so, once a stranger enters the context of the conversation, he or she typically acquires a name. What this name is and who chooses it depends on the power relationships between those involved, as we learned in the playground at school.
Moreover, if we are having a conversation about hypothetical individuals (say, Alice, Bob and Unscrupulous Charlie), we have a tendency to name them locally to the discussion. We do not worry about whether Unscrupulous Charlie might actually turn out to be called Shameless David whenever he turns up. That is, we exploit naming locally to assist the construction of explanations which apply to individuals regardless of what they are called.
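The mixed representation summarised in this abstract (names for free variables, de Bruijn indices for bound ones) can be sketched for a tiny lambda calculus. The paper works in Haskell; this is a Python sketch in which the class names, `abstract` and `instantiate` are chosen for illustration, and substituted terms are assumed to contain no dangling bound indices.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Free:
    name: str      # a free variable, referred to by name


@dataclass(frozen=True)
class Bound:
    index: int     # a bound variable, referred to by de Bruijn index


@dataclass(frozen=True)
class App:
    fun: "Term"
    arg: "Term"


@dataclass(frozen=True)
class Lam:
    body: "Term"   # the binder itself carries no name


Term = Union[Free, Bound, App, Lam]


def abstract(name: str, term: Term, depth: int = 0) -> Term:
    """Turn free occurrences of `name` into the index of a new outer binder."""
    if isinstance(term, Free):
        return Bound(depth) if term.name == name else term
    if isinstance(term, Bound):
        return term
    if isinstance(term, App):
        return App(abstract(name, term.fun, depth), abstract(name, term.arg, depth))
    if isinstance(term, Lam):
        return Lam(abstract(name, term.body, depth + 1))
    raise TypeError(f"not a term: {term!r}")


def instantiate(image: Term, body: Term, depth: int = 0) -> Term:
    """Replace the outermost bound variable of `body` with `image`.

    Assumes `image` has no dangling indices, so no index shifting is needed.
    """
    if isinstance(body, Bound):
        return image if body.index == depth else body
    if isinstance(body, Free):
        return body
    if isinstance(body, App):
        return App(instantiate(image, body.fun, depth), instantiate(image, body.arg, depth))
    if isinstance(body, Lam):
        return Lam(instantiate(image, body.body, depth + 1))
    raise TypeError(f"not a term: {body!r}")


if __name__ == "__main__":
    lam = Lam(abstract("x", App(Free("x"), Free("y"))))
    print(lam)
    print(instantiate(Free("z"), lam.body))
```

Index arithmetic is confined to these two functions: code that goes under a binder calls `instantiate` with a fresh name and then works with named terms only, which is the division of labour the abstract describes.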
Web-based Multimedia Support for Distributed Cooperative Software Engineering
- In Proceedings, International Symposium on Multimedia Software Engineering
, 2000
Cited by 3 (2 self)
The Tatami project is building a system to support software engineering over the internet, exploiting recent advances in web technology, interface design, and specification. Our effort to improve the usability of such systems led us into algebraic semiotics, while our effort to develop better formal methods for distributed concurrent systems led us into hidden algebra. We discuss the Tatami system design, especially user interface issues, and sketch an extension of algebraic semiotics for interface dynamics.

1 Introduction

The Tatami project has pursued three main goals:
1. explore novel multimedia interface design principles, for easing the use of complex interactive systems;
2. build and use a generic distributed environment for cooperative work; and
3. verify distributed concurrent software.
We discuss these goals in turn. The first is motivated by the difficulties many practicing engineers have with formal methods tools. We have taken theorem provers as a typically difficult c...
An Overview of Multimodal Video Representation for Semantic Analysis
- European Workshop on the Integration of Knowledge, Semantics and Digital Media Technologies (EWIMT 2005), IEE
, 2005
Cited by 3 (0 self)
This paper gives an overview of approaches to video representation targeting semantic analysis for content-based indexing and retrieval. It highlights the major achievements of the existing methodologies and sheds new light on the challenges that are still unsolved. The problem of adaptive representation of digital multimedia is critically assessed and some novel ideas are presented. In addition, the concept of video multimodality is reevaluated and redefined in order to introduce modalities such as editing technique. An extensive literature survey on the topics involved is given.
Tossing Algebraic Flowers down the Great Divide
- In People and Ideas in Theoretical Computer Science
, 1999
Cited by 3 (0 self)
Data Types and Algebraic Semantics

The history of programming languages, and to a large extent of software engineering as a whole, can be seen as a succession of ever more powerful abstraction mechanisms. The first stored program computers were programmed in binary, which soon gave way to assembly languages that allowed symbolic codes for operations and addresses. Fortran began the spread of "high level" programming languages, though at the time it was strongly opposed by many assembly programmers; important features that developed later include blocks, recursive procedures, flexible types, classes, inheritance, modules, and genericity. Without going into the philosophical problems raised by abstraction (which in view of the discussion of realism in Section 4 may be considerable), it seems clear that the mathematics used to describe programming concepts should in general get more abstract as the programming concepts get more abstract. Nevertheless, there has been great resistance to u...
A Survey on Multimodal Video Representation for Semantic Retrieval
Cited by 1 (0 self)
This paper surveys the approaches to video representation, focusing on semantic analysis for content-based indexing and retrieval. A problem of adaptive representation of digital multimedia is critically assessed and some novel ideas are presented. Furthermore, the concept of video multimodality is reevaluated and redefined in order to introduce modalities such as editing technique or affect to the audience.

