Web Data Mining
, 2006
Vol. 53: Soft Computing Approach to Pattern Recognition and Image Processing
Topology of Strings: Median String is NP-Complete
 Theoretical Computer Science
, 2000
Given a set of strings, the problem of finding a string that minimises its distance to the set is directly related to problems frequently encountered in areas involving Pattern Recognition or Computational Biology. Based on the Levenshtein (or edit) distance, different definitions of the distance between a string and a set of strings can be adopted. In particular, if this definition is the sum of the distances to each string of the set, the string that minimises this distance is the (generalised) median string. Finding this string corresponds in Speech Recognition to giving a model for a set of acoustic sequences, and in Computational Biology to constructing an optimal evolutionary tree when the given phylogeny is a star. Efficient algorithms are known only for finding approximate solutions. The results in this paper are combinatorial and negative. We prove that computing the median string corresponds to an NP-complete decision problem, thus proving that this problem is NP-hard.
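As a rough illustration of the definition above, the following Python sketch computes the unit-cost Levenshtein distance, the sum-of-distances objective, and an exhaustive search for a median string. The function names, the bounded candidate length `max_len`, and the explicit `alphabet` argument are assumptions made for this example and do not come from the paper. The exhaustive search enumerates every string up to `max_len`, so its cost grows exponentially with the length, which is consistent with the hardness result.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance, computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or match at cost 0
            ))
        previous = current
    return previous[-1]


def sum_of_distances(candidate: str, strings: list[str]) -> int:
    """Objective minimised by the (generalised) median string."""
    return sum(levenshtein(candidate, s) for s in strings)


def brute_force_median(strings: list[str], alphabet: str, max_len: int) -> str:
    """Exhaustive search over all strings of length <= max_len (illustrative only)."""
    candidates = (
        "".join(symbols)
        for length in range(max_len + 1)
        for symbols in product(alphabet, repeat=length)
    )
    return min(candidates, key=lambda c: sum_of_distances(c, strings))


if __name__ == "__main__":
    data = ["abc", "abd", "ab"]
    median = brute_force_median(data, alphabet="abcd", max_len=3)
    print(median, sum_of_distances(median, data))
```

The candidate need not belong to the input set, which is what distinguishes the generalised median from the set median.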
Use of Median String for Classification
, 2000
A string that minimizes the sum of distances to the strings of a given set is known as the (generalized) median string of the set. This concept is important in Pattern Recognition for modelling a (large) set of garbled strings or patterns. The search for such a string is an NP-hard problem and, therefore, no efficient algorithm to compute the median string is expected to exist. Recently a greedy approach was proposed to compute an approximate median string of a set of strings. In this work an algorithm is proposed that iteratively improves the approximate solution given above.
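One generic way to improve an approximate median iteratively is hill climbing over single edit operations. The sketch below is such a local search under stated assumptions: it is not the algorithm proposed in the work above, whose details the abstract does not give, and the names `single_edit_neighbours` and `improve`, as well as the explicit `alphabet` argument, are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance, computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or match at cost 0
            ))
        previous = current
    return previous[-1]


def single_edit_neighbours(s: str, alphabet: str):
    """Yield every string reachable from s by one deletion, substitution or insertion."""
    for i in range(len(s)):
        yield s[:i] + s[i + 1:]                  # deletion at position i
        for symbol in alphabet:
            if symbol != s[i]:
                yield s[:i] + symbol + s[i + 1:]  # substitution at position i
    for i in range(len(s) + 1):
        for symbol in alphabet:
            yield s[:i] + symbol + s[i:]          # insertion before position i


def improve(candidate: str, strings: list[str], alphabet: str) -> tuple[str, int]:
    """Accept any single edit that lowers the sum of distances; stop at a local optimum."""
    def cost(c: str) -> int:
        return sum(levenshtein(c, s) for s in strings)

    best, best_cost = candidate, cost(candidate)
    improved = True
    while improved:
        improved = False
        for neighbour in single_edit_neighbours(best, alphabet):
            neighbour_cost = cost(neighbour)
            if neighbour_cost < best_cost:
                best, best_cost = neighbour, neighbour_cost
                improved = True
                break  # restart the scan from the improved candidate
    return best, best_cost


if __name__ == "__main__":
    data = ["abcd", "abcd", "abxd"]
    print(improve("abxd", data, alphabet="abcdx"))
```

The loop terminates because the cost is a non-negative integer that strictly decreases with every accepted edit. The result is only a local optimum, so it may still differ from the true median string.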
Towards Formal Structural Representation of Spoken Language: An Evolving Transformation System (ETS) Approach
, 2005
Speech recognition has been a very active area of research over the past twenty years. Despite evident progress, practitioners in the field generally agree that the performance of current speech recognition systems is rather suboptimal and that new approaches are needed. The motivation behind the undertaken research is the observation that the notion of representation of objects and concepts, once considered central in the early days of pattern recognition, has been largely marginalised by the advent of statistical approaches. As a consequence of a predominantly statistical approach to the speech recognition problem, and due to the numeric, feature-vector-based nature of the representation, the classes inductively discovered from real data using decision-theoretic techniques have little meaning outside the statistical framework. This is because decision surfaces or probability distributions are difficult to analyse linguistically. Because of the latter limitation, it is doubtful that the gap between speech recognition and linguistic research can be bridged by numeric representations. This thesis investigates an alternative, structural, approach to spoken language representation and categorisa
Structural Representation of Speech for Phonetic Classification
 In: Proc. 17th ICPR. Volume 3
, 2004
This paper explores the issues involved in using symbolic metric algorithms for automatic speech recognition (ASR), via a structural representation of speech. This representation is based on a set of phonological distinctive features, which is a linguistically well-motivated alternative to the "beads-on-a-string" view of speech that is standard in current ASR systems. We report the promising results of phoneme classification experiments conducted on a standard continuous speech task.
An approximate median search algorithm in nonmetric spaces
, 2001
Given a set of data points and a distance function, the median point is defined as the point (in the set) that minimizes the sum of the distances to the remaining points of the set.
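The definition above can be written down directly. The following is a minimal sketch of the exact (direct) computation, not the approximate search algorithm the paper proposes; the function name `set_median` is chosen for this example. It makes no use of symmetry or the triangle inequality, so it also applies when the distance function is not a metric.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def set_median(points: Sequence[T], distance: Callable[[T, T], float]) -> T:
    """Return the point of the set with the smallest sum of distances to the others."""
    def total(i: int) -> float:
        return sum(distance(points[i], q) for j, q in enumerate(points) if j != i)

    # Ties are resolved in favour of the earliest point in the sequence.
    return points[min(range(len(points)), key=total)]


if __name__ == "__main__":
    print(set_median([1, 2, 3, 10], lambda a, b: abs(a - b)))
```

For a set of n points this direct method needs n(n - 1) distance evaluations, which is the cost that approximate methods try to avoid.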
Improving classification using median string and NN rules
, 2001
In Pattern Recognition, the concept of (generalized) median string is important for modelling a (large) set of garbled strings or patterns. The search for such a string is a difficult computational problem; thus, only approximations of the median string can be computed with reasonable effort. Recently, a greedy algorithm was proposed to compute an approximate median string of a set of strings using an iterative improvement. In this work, we propose an alternative definition of median string which is introduced into this algorithm. Experiments have been carried out with real data to compare the performance of Nearest-Neighbour classifiers based on set median and median strings. These experiments showed that the new definition of median string gives better results than those obtained with set median and classical median strings.
An Algorithm for Fast Median Search
, 1997
Searching for a median of a set of patterns is a well-known technique to model the set or to aid searching for more accurate models such as a generalized median or a k-median. While medians and generalized medians are (relatively) easy to compute in the case of Euclidean representation spaces, this is unfortunately no longer true when more complex distance measures are to be used, which often happens in practical situations. In these cases, a direct method to perform median search is not feasible for large sets of patterns. In order to cope with this computational problem, an algorithm for median search is proposed which is faster than the direct method in terms of distance computations.
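To make the cost measure concrete, the sketch below compares the direct method with a simple early-abandoning variant and counts distance computations for both. The early-abandoning rule is a stand-in chosen for illustration and is not the algorithm proposed in the paper, whose details the abstract does not describe. It assumes only that distances are non-negative; both function names are hypothetical.

```python
from math import inf
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def set_median_direct(points: Sequence[T], distance: Callable[[T, T], float]) -> Tuple[T, int]:
    """Direct method: evaluate every pairwise distance, n * (n - 1) calls in total."""
    calls = 0
    best_index, best_total = 0, inf
    for i, p in enumerate(points):
        total = 0.0
        for j, q in enumerate(points):
            if j != i:
                total += distance(p, q)
                calls += 1
        if total < best_total:
            best_index, best_total = i, total
    return points[best_index], calls


def set_median_early_abandon(points: Sequence[T], distance: Callable[[T, T], float]) -> Tuple[T, int]:
    """Stop summing for a candidate once its partial sum reaches the best total so far."""
    calls = 0
    best_index, best_total = 0, inf
    for i, p in enumerate(points):
        total = 0.0
        for j, q in enumerate(points):
            if j == i:
                continue
            total += distance(p, q)
            calls += 1
            if total >= best_total:
                break  # with non-negative distances this candidate cannot win
        if total < best_total:
            best_index, best_total = i, total
    return points[best_index], calls


if __name__ == "__main__":
    data = [1, 2, 3, 10, 11, 12, 50]
    absolute = lambda a, b: abs(a - b)
    print(set_median_direct(data, absolute))
    print(set_median_early_abandon(data, absolute))
```

Both functions return the same median because a candidate is abandoned only when it can no longer be strictly better than the current best. How many computations are saved depends on the data and on the order in which the points are visited.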