Results 11–20 of 333
A Survey of Kernels for Structured Data
, 2003
Abstract

Cited by 146 (2 self)
Kernel methods in general and support vector machines in particular have been successful in various learning tasks on data represented in a single table. Much ‘real-world’ data, however, is structured – it has no natural representation in a single table. Usually, to apply kernel methods to ‘real-world’ data, extensive preprocessing is performed to embed the data into a real vector space and thus in a single table. This survey describes several approaches to defining positive definite kernels on structured instances directly.
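One simple member of the family of kernels such surveys cover is the spectrum (k-mer) kernel on strings, which counts shared substrings of a fixed length k. A minimal pure-Python sketch (function names are illustrative, not taken from the survey):

```python
from collections import Counter

def spectrum_features(s, k=2):
    """Map a string to counts of its length-k substrings (its k-spectrum)."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))

def spectrum_kernel(s, t, k=2):
    """Inner product of the two k-spectra: a positive definite kernel on strings."""
    fs, ft = spectrum_features(s, k), spectrum_features(t, k)
    return sum(count * ft[sub] for sub, count in fs.items())

# "abab" has 2-mers {ab: 2, ba: 1}; "abba" has {ab: 1, bb: 1, ba: 1};
# shared mass is ab -> 2*1 plus ba -> 1*1, so the kernel value is 3.
```

Because the kernel is an explicit inner product of count vectors, positive definiteness is immediate, which is the property the survey's constructions must guarantee.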
Parsing the Wall Street Journal using a Lexical-Functional Grammar and Discriminative Estimation Techniques
 In Proceedings of the 40th Meeting of the ACL
, 2002
Abstract

Cited by 141 (12 self)
We present a stochastic parsing system consisting of a Lexical-Functional Grammar (LFG), a constraint-based parser and a stochastic disambiguation model. We report on the results of applying this system to parsing the UPenn Wall Street Journal (WSJ) treebank. The model combines full and partial parsing techniques to reach full grammar coverage on unseen data. The treebank annotations are used to provide partially labeled data for discriminative statistical estimation using exponential models. Disambiguation performance is evaluated by measuring matches of predicate-argument relations on two distinct test sets. On a gold standard of manually annotated f-structures for a subset of the WSJ treebank, this evaluation reaches 79% F-score. An evaluation on a gold standard of dependency relations for Brown corpus data achieves 76% F-score.
A kernel between sets of vectors
 In International Conference on Machine Learning
, 2003
Abstract

Cited by 130 (8 self)
In various application domains, including image recognition, it is natural to represent each example as a set of vectors. With a base kernel we can implicitly map these vectors to a Hilbert space and fit a Gaussian distribution to the whole set using Kernel PCA. We define our kernel between examples as Bhattacharyya’s measure of affinity between such Gaussians. The resulting kernel is computable in closed form and enjoys many favorable properties, including graceful behavior under transformations, potentially justifying the vector set representation even in cases when more conventional representations also exist.
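In the simplest special case – scalar features and a linear base kernel – the construction reduces to fitting a one-dimensional Gaussian (mean and variance) to each set and evaluating the closed-form Bhattacharyya coefficient between the two Gaussians. A sketch of that reduced case (the regularization constant is an assumption added for numerical stability, not part of the paper's formulation):

```python
import math

def fit_gaussian(xs, reg=1e-3):
    """Fit mean and (regularized) variance to a set of scalars."""
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs) + reg
    return mu, var

def bhattacharyya_kernel(xs, ys):
    """Closed-form Bhattacharyya affinity between the Gaussians fit to two sets.
    Equals 1 for identical sets and decays as the fitted Gaussians separate."""
    mu1, v1 = fit_gaussian(xs)
    mu2, v2 = fit_gaussian(ys)
    coeff = math.sqrt(2.0 * math.sqrt(v1 * v2) / (v1 + v2))
    return coeff * math.exp(-(mu1 - mu2) ** 2 / (4.0 * (v1 + v2)))
```

The full method replaces the scalar mean and variance with a mean and covariance estimated in the base kernel's feature space via Kernel PCA, but the closed-form affinity has the same shape.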
Dimensionality reduction of multimodal labeled data by local Fisher discriminant analysis
 Journal of Machine Learning Research
, 2007
Abstract

Cited by 123 (11 self)
Reducing the dimensionality of data without losing intrinsic information is an important preprocessing step in high-dimensional data analysis. Fisher discriminant analysis (FDA) is a traditional technique for supervised dimensionality reduction, but it tends to give undesired results if samples in a class are multimodal. An unsupervised dimensionality reduction method called locality-preserving projection (LPP) can work well with multimodal data due to its locality preserving property. However, since LPP does not take the label information into account, it is not necessarily useful in supervised learning scenarios. In this paper, we propose a new linear supervised dimensionality reduction method called local Fisher discriminant analysis (LFDA), which effectively combines the ideas of FDA and LPP. LFDA has an analytic form of the embedding transformation and the solution can be easily computed just by solving a generalized eigenvalue problem. We demonstrate the practical usefulness and high scalability of the LFDA method in data visualization and classification tasks through extensive simulation studies. We also show that LFDA can be extended to nonlinear dimensionality reduction scenarios by applying the kernel trick.
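For intuition about the generalized eigenvalue problem involved, plain FDA in the two-class, two-dimensional case has a closed form: the leading generalized eigenvector of the between-class and within-class scatter matrices is w = S_w⁻¹(μ₁ − μ₂). A sketch of that special case (LFDA itself additionally reweights the scatter matrices by a locality-based affinity, which is not shown here):

```python
def mean(X):
    n = len(X)
    return [sum(x[i] for x in X) / n for i in range(len(X[0]))]

def within_scatter(X, mu):
    """2x2 within-class scatter contribution of one class."""
    S = [[0.0, 0.0], [0.0, 0.0]]
    for x in X:
        d = [x[0] - mu[0], x[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                S[i][j] += d[i] * d[j]
    return S

def fisher_direction(X1, X2, reg=1e-6):
    """w = S_w^{-1}(mu1 - mu2): the leading generalized eigenvector of
    (S_b, S_w) for two classes, via an explicit 2x2 inverse."""
    m1, m2 = mean(X1), mean(X2)
    S1, S2 = within_scatter(X1, m1), within_scatter(X2, m2)
    a = S1[0][0] + S2[0][0] + reg
    b = S1[0][1] + S2[0][1]
    c = S1[1][0] + S2[1][0]
    d = S1[1][1] + S2[1][1] + reg
    det = a * d - b * c
    dm = [m1[0] - m2[0], m1[1] - m2[1]]
    return [(d * dm[0] - b * dm[1]) / det, (-c * dm[0] + a * dm[1]) / det]
```

On two classes separated along the first axis with noise along the second, the recovered direction aligns with the first axis, since the within-class scatter down-weights the noisy dimension.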
Fast Kernels for String and Tree Matching
, 2004
Abstract

Cited by 110 (7 self)
Introduction Many problems in machine learning require a data classification algorithm to work with a set of discrete objects. Common examples include biological sequence analysis where data is represented as strings (Durbin et al., 1998) and Natural Language Processing (NLP) where the data is given in the form of a string combined with a parse tree (Collins and Duffy, 2001) or an annotated sequence (Altun et al., 2003). In order to apply kernel methods one defines a measure of similarity between discrete structures via a feature map φ : X → F. Here X is the set of discrete structures (e.g. the set of all parse trees of a language) and F is a Hilbert space. Since φ(x) ∈ F we can define a kernel by evaluating the scalar products k(x, x′) = ⟨φ(x), φ(x′)⟩ (1.1) where x, x′ ∈ X. The success of a kernel method employing k depends both on the faithful representation of discrete data and an efficient means of computing k. Recent research effort has focused on defining meaningful kernels
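Equation (1.1) can be made concrete by taking φ to count every contiguous substring of a string and computing k(x, x′) = ⟨φ(x), φ(x′)⟩ by brute-force enumeration. This is exactly the quadratic-cost baseline that fast suffix-tree-based methods such as this paper's are designed to replace; the sketch below is the naive version only:

```python
from collections import Counter

def all_substring_features(s):
    """phi(s): counts of every non-empty contiguous substring of s."""
    return Counter(s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1))

def naive_string_kernel(s, t):
    """k(s, t) = <phi(s), phi(t)>, computed by explicit enumeration.
    A string of length n has O(n^2) substrings, so this costs far more
    than the linear-time suffix-tree algorithms it illustrates."""
    ps, pt = all_substring_features(s), all_substring_features(t)
    return sum(c * pt[sub] for sub, c in ps.items())
```

For example, k("ab", "ab") = 3, one unit each for the shared substrings "a", "b" and "ab".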
Question classification using support vector machines
 In Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
, 2003
Abstract

Cited by 109 (1 self)
Question classification is very important for question answering. This paper presents our research work on automatic question classification through machine learning approaches. We have experimented with five machine learning algorithms: Nearest
A Review of Kernel Methods in Machine Learning
, 2006
Abstract

Cited by 95 (4 self)
We review recent methods for learning with positive definite kernels. All these methods formulate learning and estimation problems as linear tasks in a reproducing kernel Hilbert space (RKHS) associated with a kernel. We cover a wide range of methods, ranging from simple classifiers to sophisticated methods for estimation with structured data.
Fast Methods for Kernel-based Text Analysis
, 2003
Abstract

Cited by 92 (1 self)
Kernel-based learning (e.g., Support Vector Machines) has been successfully applied to many hard problems in Natural Language Processing (NLP). In NLP, although feature combinations are crucial to improving performance, they are heuristically selected. Kernel methods change this situation. The merit of kernel methods is that effective feature combinations are implicitly expanded without loss of generality and without increasing the computational costs. Kernel-based text analysis shows excellent performance in terms of accuracy; however, these methods are usually too slow to apply to large-scale text analysis. In this paper, we extend a Basket Mining algorithm to convert a kernel-based classifier into a simple and fast linear classifier. Experimental results on English BaseNP Chunking, Japanese Word Segmentation and Japanese Dependency Parsing show that our new classifiers are about 30 to 300 times faster than the standard kernel-based classifiers.
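The conversion rests on a standard identity: a low-degree polynomial kernel equals a linear dot product over explicitly enumerated feature combinations, so a kernel classifier can be rewritten as a linear one over the expanded features (the paper's contribution is mining only the combinations that matter). A minimal sketch of the identity for the degree-2 homogeneous case (function names are illustrative):

```python
from itertools import combinations_with_replacement

def poly2_features(x):
    """Explicit degree-2 feature combinations of a vector x.
    The sqrt(2) coefficient on i < j pairs makes the dot product
    of two expansions equal (x . z)^2 exactly."""
    feats = {}
    for i, j in combinations_with_replacement(range(len(x)), 2):
        coeff = 1.0 if i == j else 2.0 ** 0.5
        feats[(i, j)] = coeff * x[i] * x[j]
    return feats

def kernel_value(x, z):
    """Implicit form: the degree-2 homogeneous polynomial kernel."""
    return sum(a * b for a, b in zip(x, z)) ** 2

def linear_value(fx, fz):
    """Explicit form: dot product of the expanded feature dictionaries."""
    return sum(v * fz.get(key, 0.0) for key, v in fx.items())
```

Both forms give the same value, but the explicit form lets classification run as a single sparse dot product instead of one kernel evaluation per support vector, which is the source of the reported 30- to 300-fold speedups.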
Ranking Algorithms for Named-Entity Extraction: Boosting and the Voted Perceptron
, 2002
Abstract

Cited by 87 (2 self)
We describe algorithms that rerank the top N hypotheses from a maximum-entropy tagger, the application being named-entity recognition in a corpus of web data. The first approach uses a boosting algorithm for ranking problems. The second approach uses the voted perceptron algorithm. Both algorithms give comparable, significant improvements over the maximum-entropy baseline. The voted perceptron algorithm can be considerably more efficient to train, at some cost in computation on test examples.
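The perceptron update for reranking is simple: score each hypothesis with the current weights, and when the top-scoring hypothesis is not the gold one, add the gold hypothesis's features and subtract the predicted one's. A sketch using weight averaging as a common stand-in for the voted variant (the names and toy data are illustrative, not from the paper):

```python
def score(w, f):
    """Dot product of a sparse weight dict and a sparse feature dict."""
    return sum(w.get(k, 0.0) * v for k, v in f.items())

def train_rerank_perceptron(examples, epochs=5):
    """examples: list of (hypotheses, gold_index), each hypothesis a sparse
    feature dict. Returns averaged weights; averaging approximates the
    vote over all intermediate weight vectors."""
    w, w_sum, n = {}, {}, 0
    for _ in range(epochs):
        for hyps, gold in examples:
            pred = max(range(len(hyps)), key=lambda i: score(w, hyps[i]))
            if pred != gold:
                for k, v in hyps[gold].items():
                    w[k] = w.get(k, 0.0) + v
                for k, v in hyps[pred].items():
                    w[k] = w.get(k, 0.0) - v
            for k, v in w.items():
                w_sum[k] = w_sum.get(k, 0.0) + v
            n += 1
    return {k: v / n for k, v in w_sum.items()}
```

Training touches each N-best list once per epoch with only sparse dot products, which is why the abstract notes the perceptron can be considerably cheaper to train than boosting.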
Maximum margin semi-supervised learning for structured variables
 Advances in Neural Information Processing Systems 18
, 2005
Abstract

Cited by 65 (0 self)
Many real-world classification problems involve the prediction of multiple interdependent variables forming some structural dependency. Recent progress in machine learning has mainly focused on supervised classification of such structured variables. In this paper, we investigate structured classification in a semi-supervised setting. We present a discriminative approach that utilizes the intrinsic geometry of input patterns revealed by unlabeled data points and we derive a maximum-margin formulation of semi-supervised learning for structured variables. Unlike transductive algorithms, our formulation naturally extends to new test points.