Statistical Dependency Analysis with Support Vector Machines
- In Proceedings of IWPT, 2003
Cited by 83 (0 self)
In this paper, we propose a method for analyzing word-word dependencies in a deterministic bottom-up manner using Support Vector Machines. We experimented with dependency trees converted from Penn Treebank data and achieved over 90% accuracy of word-word dependency. Though the result is a little worse than the most up-to-date phrase-structure-based parsers, it looks satisfactorily accurate considering that our parser uses no information from phrase structures.
Learning globally-consistent local distance functions for shape-based image retrieval and classification
- In ICCV, 2007
Cited by 50 (2 self)
We address the problem of visual category recognition by learning an image-to-image distance function that attempts to satisfy the following property: the distance between images from the same category should be less than the distance between images from different categories. We use patch-based feature vectors common in object recognition work as a basis for our image-to-image distance functions. Our large-margin formulation for learning the distance functions is similar to formulations used in the machine learning literature on distance metric learning; however, we differ in that we learn local distance functions (a different parameterized function for every image of our training set), whereas typically a single global distance function is learned. This was a novel approach first introduced in Frome, Singer, & Malik, NIPS 2006. In that work we learned the local distance functions independently, and the outputs of these functions could not be compared at test time without the use of additional heuristics or training. Here we introduce a different approach that has the advantage that it learns distance functions that are globally consistent in that they can be directly compared for purposes of retrieval and classification. The outputs of the learning algorithm are weights assigned to the image features, which is intuitively appealing in the computer vision setting: some features are more salient than others, and which are more salient depends on the category, or image, being considered. We train and test using the Caltech 101 object recognition benchmark. Using fifteen training images per category, we achieved a mean recognition rate of 63.2% and
Extracting relations with integrated information using kernel methods
- In Proceedings of the annual meeting of ACL, 2005
Cited by 39 (4 self)
Entity relation detection is a form of information extraction that finds predefined relations between pairs of entities in text. This paper describes a relation detection approach that combines clues from different levels of syntactic processing using kernel methods. Information from three different levels of processing is considered: tokenization, sentence parsing and deep dependency analysis. Each source of information is represented by kernel functions. Then composite kernels are developed to integrate and extend individual kernels so that processing errors occurring at one level can be overcome by information from other levels. We present an evaluation of these methods on the 2004 ACE relation detection task, using Support Vector Machines, and show that each level of syntactic processing contributes useful information for this task. When evaluated on the official test data, our approach produced very competitive ACE value scores. We also compare the SVM with KNN on different kernels.
CONTRAlign: discriminative training for protein sequence alignment
- In: International Conference on Research in Computational Molecular Biology (RECOMB), 2006
Cited by 18 (3 self)
In comparative structural biology studies, analyzing or predicting protein three-dimensional structure often begins with identifying patterns of amino acid substitution via protein sequence alignment. While the evolutionary information obtained from alignments can provide insights into protein structure, constructing accurate alignments may be difficult when proteins share significant structural similarity but little sequence similarity. Indeed, for modern alignment tools, alignment quality drops rapidly when the sequences compared have lower than 25% identity, the "twilight zone" of protein alignment [1].
Frequency sensitive competitive learning for balanced clustering on high-dimensional hyperspheres
- IEEE Transactions on Neural Networks, 2004
Cited by 10 (6 self)
Competitive learning mechanisms for clustering in general suffer from poor performance for very high-dimensional (> 1000) data because of “curse of dimensionality” effects. In applications such as document clustering, it is customary to normalize the high-dimensional input vectors to unit length, and it is sometimes also desirable to obtain balanced clusters, i.e., clusters of comparable sizes. The spherical kmeans (spkmeans) algorithm, which normalizes the cluster centers as well as the inputs, has been successfully used to cluster normalized text documents in 2000+ dimensional space. Unfortunately, like regular kmeans and its soft EM-based version, spkmeans tends to generate extremely imbalanced clusters in high-dimensional spaces when the desired number of clusters is large (tens or more). In this paper, we first show that the spkmeans algorithm can be derived from a certain maximum likelihood formulation using a mixture of von Mises-Fisher distributions as the generative model, and in fact it can be considered a batch-mode version of (normalized) competitive learning. The proposed generative model is then adapted in a principled way to yield three frequency sensitive competitive learning variants that are applicable to static data and produce high-quality and well-balanced clusters for high-dimensional data. Like kmeans, each iteration is linear in the number of data points and in the number of clusters for all three algorithms. We also propose a frequency sensitive algorithm to cluster streaming data. Experimental results on clustering of high-dimensional text data sets are provided to show the effectiveness and applicability of the proposed techniques.
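As a rough illustration of the baseline described in this abstract, the following is a minimal sketch of plain spherical kmeans only: inputs and cluster centers are both normalized to unit length, and points are assigned by cosine similarity. It is not the paper's frequency sensitive variants, and the function names, the `iters` and `seed` parameters, and the random initialization are assumptions made for the example.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    """Inner product; for unit vectors this equals cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def spkmeans(points, k, iters=20, seed=0):
    """Plain spherical kmeans sketch (hypothetical helper, not the paper's code)."""
    rng = random.Random(seed)
    data = [normalize(p) for p in points]   # inputs projected onto the unit sphere
    centers = rng.sample(data, k)           # assumed initialization: k random inputs
    labels = [0] * len(data)
    for _ in range(iters):
        # Assignment step: each point joins the center with the largest cosine similarity.
        labels = [max(range(k), key=lambda j: dot(p, centers[j])) for p in data]
        # Update step: sum each cluster's members, then renormalize the center.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = normalize([sum(col) for col in zip(*members)])
    return labels, centers
```

Each iteration touches every point once per cluster, which matches the linear per-iteration cost mentioned in the abstract. Nothing in this sketch discourages imbalanced clusters; that is the problem the frequency sensitive variants are said to address.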
A Hierarchy of Support Vector Machines for Pattern Detection
- Journal of Artificial Intelligence Research, 2006
Cited by 7 (0 self)
We introduce a computational design for pattern detection based on a tree-structured network of support vector machines (SVMs). An SVM is associated with each cell in a recursive partitioning of the space of patterns (hypotheses) into increasingly finer subsets. The hierarchy is traversed coarse-to-fine and each chain of positive responses from the root to a leaf constitutes a detection.
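The coarse-to-fine traversal in this abstract can be sketched as a recursive walk over a tree of binary classifiers. This is only an illustration under stated assumptions: `Node`, `accepts`, and `detect` are hypothetical names, and simple threshold tests stand in for trained SVMs.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class Node:
    """One cell of the recursive partition, with its own binary classifier."""
    name: str
    accepts: Callable[[Sequence[float]], bool]  # stand-in for a trained SVM decision
    children: List["Node"] = field(default_factory=list)


def detect(node, x):
    """Return the leaves reached by an unbroken chain of positive responses."""
    if not node.accepts(x):
        return []            # a negative response prunes the whole subtree
    if not node.children:
        return [node.name]   # the positive chain reached a leaf: a detection
    hits = []
    for child in node.children:
        hits.extend(detect(child, x))
    return hits


# Hypothetical toy hierarchy: threshold tests stand in for trained SVMs.
tree = Node("all", lambda x: x[0] > 0, [
    Node("small", lambda x: x[0] < 5),
    Node("large", lambda x: x[0] >= 5, [
        Node("huge", lambda x: x[0] > 100),
    ]),
])

print(detect(tree, [3]))   # the chain all -> small is positive
print(detect(tree, [7]))   # the chain breaks below "large", so nothing is detected
```

Because a negative response at a coarse cell prunes everything beneath it, most inputs are rejected after evaluating only a few classifiers near the root.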
Kernel PCA for similarity invariant shape recognition
- In the Journal of Neurocomputing, 2006
Cited by 7 (3 self)
We present in this paper a novel approach for shape description based on kernel principal component analysis (KPCA). The strength of this method resides in the similarity (rotation, translation and particularly scale) invariance of KPCA when using a family of triangular conditionally positive definite kernels. Besides this invariance, the method provides an effective way to capture non-linearities in shape geometry. A given two-dimensional curve is described using the eigenvalues of the underlying manifold modeled in a high-dimensional Hilbert space. Using Fourier analysis, we will show that this eigenvalue description captures low to high variations of the shape frequencies. Experiments conducted on standard databases, including the SQUID, the Swedish and the Smithsonian leaf databases, show that the method is effective in capturing invariance and generalizes well for shape matching and retrieval.
Authorship Identification for Heterogeneous Documents
- 2002
Cited by 6 (2 self)
The study of authorship identification in Japanese has for the most part been restricted to literary texts using basic statistical methods. In the present study, authors of mailing list messages are identified using a machine learning technique (Support Vector Machines). In addition, the classifier trained on the mailing list data is applied to identify the author of Web documents in order to investigate performance in authorship identification for more heterogeneous documents. Experimental results show better identification performance when the features include not only conventional word N-gram information but also frequent sequential patterns extracted by a data mining technique (PrefixSpan).
Browsing and sorting digital pictures using automatic image classification and quality analysis
- In Proceedings of HCI International ’07, 2007
Cited by 5 (4 self)
In this paper we describe a new interface for browsing and sorting digital pictures. Our approach is two-fold. First, we present a new method to automatically identify similar images and rate them based on the sharpness and exposure quality of the images. Second, we present a zoomable user interface based on the details-on-demand paradigm, enabling users to browse large collections of digital images and select only the best images for further processing or sharing. Key words: Photoware, digital photography, image analysis, similarity measurement, informed browsing, zoomable user interfaces, content based image retrieval.
Machine learning-based dependency analyzer for Chinese
- In Proc. ICCC, 2005
Cited by 4 (0 self)
In this paper, we present a deterministic dependency structure analyzer for Chinese. This analyzer implements two algorithms – the Yamada and Nivre algorithms – and two sorts of classifiers – Support Vector Machines and Maximum Entropy models. We compare the performance of these 2x2 combinations. We evaluate the methods on a dependency tagged corpus derived from the CKIP Treebank corpus. Then, we analyze the errors in the experiments and find that some errors are caused by mistakes in nominal compound analysis. Therefore, we adopt a noun phrase chunker to overcome this problem.

