Results 1–10 of 337
ADE4: a multivariate analysis and graphical display software
 Stat. Comput.
, 1997
Cited by 109 (12 self)
… searching, zooming, selection of points, and display of data values on factor maps. The user interface is simple and homogeneous among all the programs; this contributes to making the use of ADE4 very easy for nonspecialists in statistics, data analysis or computer science. Keywords: Multivariate analysis, principal component analysis, correspondence analysis, instrumental variables, canonical correspondence analysis, partial least squares regression, co-inertia analysis, graphics, multivariate graphics, interactive graphics, Macintosh, HyperCard, Windows 95.
1. Introduction. ADE4 is a multivariate analysis and graphical display software for Apple Macintosh and Windows 95 microcomputers. It is made up of several stand-alone applications, called modules, that feature a wide range of multivariate analysis methods, from simple one-table analysis to three-way table analysis and two-table coupling methods. It also provides many possibilities …
A Framework for Robust Subspace Learning
 International Journal of Computer Vision
, 2003
Cited by 106 (6 self)
Many computer vision, signal processing and statistical problems can be posed as problems of learning low-dimensional linear or multilinear models. These models have been widely used for the representation of shape, appearance, motion, etc., in computer vision applications.
A Fast Clustering Algorithm to Cluster Very Large Categorical Data Sets in Data Mining
Cited by 92 (2 self)
Partitioning a large set of objects into homogeneous clusters is a fundamental operation in data mining. The k-means algorithm is best suited for implementing this operation because of its efficiency in clustering large data sets. However, working only on numeric values limits its use in data mining because data sets in data mining often contain categorical values. In this paper we present an algorithm, called k-modes, to extend the k-means paradigm to categorical domains. We introduce new dissimilarity measures to deal with categorical objects, replace means of clusters with modes, and use a frequency-based method to update modes in the clustering process to minimise the clustering cost function. Tested with the well-known soybean disease data set, the algorithm has demonstrated a very good classification performance. Experiments on a very large health insurance data set consisting of half a million records and 34 categorical attributes show that the algorithm is scalable in terms of both the number of clusters and the number of records.
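The method the abstract outlines — simple-matching dissimilarity, modes in place of means, and a frequency-based mode update — can be sketched roughly as follows. This is a toy illustration, not the paper's implementation; the record format and parameter names are assumptions.

```python
import random
from collections import Counter

def dissimilarity(a, b):
    # Simple matching: number of attributes on which the records differ.
    return sum(x != y for x, y in zip(a, b))

def kmodes(records, k, iters=10, seed=0):
    # Toy k-modes: categorical analogue of k-means, with cluster modes
    # in place of means and a frequency-based mode update.
    rng = random.Random(seed)
    modes = rng.sample(records, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for rec in records:
            j = min(range(k), key=lambda i: dissimilarity(rec, modes[i]))
            clusters[j].append(rec)
        for j, members in enumerate(clusters):
            if members:
                # Mode = most frequent value in each attribute position.
                modes[j] = tuple(Counter(col).most_common(1)[0][0]
                                 for col in zip(*members))
    return modes, clusters
```

Because dissimilarity and mode extraction touch only equality and counts, the same loop runs unchanged on purely categorical data where a mean is undefined.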
Using Correspondence Analysis to Combine Classifiers
 Machine Learning
, 1998
Cited by 58 (0 self)
Several effective methods have been developed recently for improving predictive performance by generating and combining multiple learned models. The general approach is to create a set of learned models either by applying an algorithm repeatedly to different versions of the training data, or by applying different learning algorithms to the same data. The predictions of the models are then combined according to a voting scheme. This paper focuses on the task of combining the predictions of a set of learned models. The method described uses the strategies of stacking and Correspondence Analysis to model the relationship between the learning examples and their classification by a collection of learned models. A nearest-neighbor method is then applied within the resulting representation to classify previously unseen examples. The new algorithm does not perform worse than, and frequently performs significantly better than, other combining techniques on a suite of data sets. Keywords: Clas...
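The combining step can be sketched in simplified form — here a plain nearest-neighbour lookup in the space of base-model predictions, omitting the correspondence-analysis projection the paper applies first. All names and the data layout are illustrative assumptions.

```python
import numpy as np

def stacked_nn_classify(base_preds_train, y_train, base_preds_new):
    # Classify by nearest neighbour in the space of base-model predictions.
    # base_preds_train: (n_train, n_models) labels from each base model.
    # base_preds_new:   (n_new,  n_models) labels for unseen examples.
    # (The paper first projects this table with correspondence analysis;
    # that step is omitted here for brevity.)
    out = []
    for row in base_preds_new:
        # Hamming distance to every training row of model predictions.
        d = (base_preds_train != row).sum(axis=1)
        out.append(y_train[int(np.argmin(d))])
    return np.array(out)
```

The point of the representation is that examples are compared by how the ensemble classified them, not by their raw features.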
Recognizing Subjectivity: A Case Study of Manual Tagging
 Natural Language Engineering
, 1999
Cited by 49 (8 self)
In this paper, we describe a case study of a sentence-level categorization in which tagging instructions are developed and used by four judges to classify clauses from the Wall Street Journal as either subjective or objective. Agreement among the four judges is analyzed, and, based on that analysis, each clause is given a final classification. To provide empirical support for the classifications, correlations are assessed in the data between the subjective category and a basic semantic class posited by Quirk et al. (1985).
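Inter-judge agreement of this kind is commonly quantified with a chance-corrected statistic; a minimal two-judge Cohen's kappa is shown below as an illustration (the paper's own agreement analysis may use a different statistic).

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    # Chance-corrected agreement between two judges over the same items.
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    ca, cb = Counter(labels_a), Counter(labels_b)
    # Expected agreement if both judges labelled at random with their
    # own observed category frequencies.
    expected = sum(ca[c] * cb[c] for c in set(ca) | set(cb)) / (n * n)
    return (observed - expected) / (1 - expected)
```

Kappa of 0 means agreement no better than chance; 1 means perfect agreement.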
Pursuing failure: The distribution of program failures in a profile space
 In Proceedings of the 8th European Software Engineering Conference held jointly with 9th ACM SIGSOFT International Symposium on Foundations of Software Engineering
, 2001
Cited by 46 (5 self)
Observation-based testing calls for analyzing profiles of executions induced by potential test cases, in order to select a subset of executions to be checked for conformance to requirements. A family of techniques for selecting such a subset is evaluated experimentally. These techniques employ automatic cluster analysis to partition executions, and they use various sampling techniques to select executions from clusters. The experimental results support the hypothesis that with appropriate profiling, failures often have unusual profiles that are revealed by cluster analysis. The results also suggest that failures often form small clusters or chains in sparsely populated areas of the profile space. A form of adaptive sampling called failure-pursuit sampling is proposed for revealing failures in such regions, and this sampling method is evaluated experimentally. The results suggest that failure-pursuit sampling is effective.
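The selection scheme described — partition executions by clustering their profiles, sample from the clusters, then pursue the neighbours of any failure found — might be sketched as follows. This is a hypothetical simplification: cluster assignments and the profile-space neighbour map are assumed to be given.

```python
import random

def one_per_cluster_sample(exec_ids, cluster_of, seed=0):
    # Simple sampling: pick one execution from each profile cluster.
    rng = random.Random(seed)
    by_cluster = {}
    for e in exec_ids:
        by_cluster.setdefault(cluster_of[e], []).append(e)
    return [rng.choice(members) for members in by_cluster.values()]

def failure_pursuit(sample, fails, neighbors):
    # Failure-pursuit: whenever a checked execution fails, also check its
    # nearest neighbours, chasing chains of failures in the same region
    # of the profile space.
    to_check, checked, found = list(sample), set(), []
    while to_check:
        e = to_check.pop()
        if e in checked:
            continue
        checked.add(e)
        if fails(e):
            found.append(e)
            to_check.extend(neighbors.get(e, []))
    return found
```

The pursuit step is what lets a single sampled failure reveal a whole small cluster or chain of failures.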
Euclidean embedding of co-occurrence data
 Advances in Neural Information Processing Systems 17
, 2005
Cited by 41 (2 self)
Embedding algorithms search for low-dimensional structure in complex data, but most algorithms only handle objects of a single type for which pairwise distances are specified. This paper describes a method for embedding objects of different types, such as images and text, into a single common Euclidean space based on their co-occurrence statistics. The joint distributions are modeled as exponentials of Euclidean distances in the low-dimensional embedding space, which links the problem to convex optimization over positive semidefinite matrices. The local structure of our embedding corresponds to the statistical correlations via random walks in the Euclidean space. We quantify the performance of our method on two text datasets, and show that it consistently and significantly outperforms standard methods of statistical correspondence modeling, such as multidimensional scaling and correspondence analysis.
1 Introduction. Embeddings of objects in a low-dimensional space are an important tool in unsupervised learning and in preprocessing data for supervised learning algorithms. They are especially valuable for exploratory data analysis and visualization by providing easily interpretable representations of the relationships among objects. Most current embedding techniques build low-dimensional mappings that preserve certain relationships among objects and differ in the relationships they choose to preserve, which range from pairwise distances in multidimensional scaling (MDS) [4] to neighborhood structure in locally linear embedding [12]. All these methods operate on objects of a single type endowed with a measure of similarity or dissimilarity. However, real-world data often involve objects of several very different types without a natural measure of similarity. For example, typical web pages or scientific papers contain …
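The model the abstract describes — joint probabilities proportional to exponentials of negative squared embedding distances — can be illustrated as follows. This is a simplified sketch: the paper also incorporates marginal terms and fits the embedding via convex optimization, neither of which is shown here.

```python
import numpy as np

def cooccurrence_model(phi, psi):
    # Joint distribution over (x, y) pairs modelled as exponentials of
    # negative squared Euclidean distances between their embeddings.
    # phi: (n_x, d) embeddings of x-objects; psi: (n_y, d) of y-objects.
    d2 = ((phi[:, None, :] - psi[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2)
    return w / w.sum()  # normalise so the entries sum to one
```

Pairs whose embeddings lie close together receive high joint probability, so fitting the embedding to observed co-occurrence counts pulls frequently co-occurring objects of different types toward each other.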
Distributed weighted-multidimensional scaling for node localization in sensor networks
 ACM Trans. Sens. Netw.
, 2006
Cited by 39 (0 self)
Accurate, distributed localization algorithms are needed for a wide variety of wireless sensor network applications. This article introduces a scalable, distributed weighted-multidimensional scaling (dwMDS) algorithm that adaptively emphasizes the most accurate range measurements and naturally accounts for communication constraints within the sensor network. Each node adaptively chooses a neighborhood of sensors, updates its position estimate by minimizing a local cost function and then passes this update to neighboring sensors. Derived bounds on communication requirements provide insight on the energy efficiency of the proposed distributed method versus a centralized approach. For received signal-strength (RSS) based range measurements, we demonstrate via simulation that location estimates are nearly unbiased with variance close to the Cramér–Rao lower bound. Further, RSS and time-of-arrival (TOA) channel measurements are used to demonstrate performance as good as the centralized maximum-likelihood estimator (MLE) in a real-world sensor network.
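The per-node update described above might look roughly like this: an illustrative gradient step on a weighted stress cost, under assumed data structures, rather than the paper's exact majorisation-based update.

```python
import numpy as np

def local_update(i, positions, neighbors, dist, weights, step=0.1):
    # One dwMDS-style local step: node i nudges its position estimate so
    # that distances to its neighbours better match its range measurements.
    # positions: node id -> coordinate array; dist/weights keyed by (i, j).
    xi = positions[i]
    grad = np.zeros_like(xi)
    for j in neighbors[i]:
        diff = xi - positions[j]
        dij = np.linalg.norm(diff)
        if dij > 0:
            # Gradient of w_ij * (measured - actual)^2 with respect to x_i.
            grad += -2.0 * weights[(i, j)] * (dist[(i, j)] - dij) * diff / dij
    return xi - step * grad
```

Because each node touches only its own neighbourhood, the update can run fully distributed: a node computes its new estimate locally and broadcasts it to its neighbours.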