Results 1–10 of 24
Quantization
IEEE Trans. Inform. Theory, 1998
Abstract

Cited by 782 (12 self)
The history of the theory and practice of quantization dates to 1948, although similar ideas had appeared in the literature as long ago as 1898. The fundamental role of quantization in modulation and analog-to-digital conversion was first recognized during the early development of pulse-code modulation systems, especially in the 1948 paper of Oliver, Pierce, and Shannon. Also in 1948, Bennett published the first high-resolution analysis of quantization and an exact analysis of quantization noise for Gaussian processes, and Shannon published the beginnings of rate-distortion theory, which would provide a theory for quantization as analog-to-digital conversion and as data compression. Beginning with these three papers of fifty years ago, we trace the history of quantization from its origins through this decade, and we survey the fundamentals of the theory and many of the popular and promising techniques for quantization.
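The basic analog-to-digital model whose error Bennett's 1948 high-resolution analysis characterizes is the uniform scalar quantizer, which can be sketched in a few lines. The function name and step size below are illustrative, not anything defined in the survey.

```python
import numpy as np

def uniform_quantize(x, step):
    """Mid-tread uniform scalar quantizer: snap each sample to the
    nearest multiple of `step`."""
    return step * np.round(np.asarray(x, dtype=float) / step)

samples = np.array([0.12, -0.47, 0.95, 0.31])
codes = uniform_quantize(samples, step=0.25)   # [0.0, -0.5, 1.0, 0.25]
# the quantization error of a mid-tread quantizer is at most step/2
assert np.all(np.abs(samples - codes) <= 0.125)
```

High-resolution analysis studies exactly this error in the limit of small `step`, where it behaves approximately like uniform noise of variance step²/12.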
Clustering Using a Similarity Measure Based on Shared Nearest Neighbors
IEEE Transactions on Computers, 1973
Abstract

Cited by 157 (0 self)
Abstract: A nonparametric clustering technique incorporating the concept of similarity based on the sharing of near neighbors is presented. In addition to being an essentially parallel approach, the computational elegance of the method is such that the scheme is applicable to a wide class of practical problems involving large sample size and high dimensionality. No attempt is made to show how a priori problem knowledge can be introduced into the procedure. Index Terms: Clustering, nonparametric, pattern recognition, shared near neighbors, similarity measure.
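The shared-near-neighbor similarity the paper builds on can be sketched in a few lines: two points are similar to the extent that their k-nearest-neighbor lists overlap. The neighbor lists below are invented for illustration.

```python
def snn_similarity(knn_a, knn_b):
    """Shared-nearest-neighbor similarity: the number of points that
    appear in both k-nearest-neighbor lists."""
    return len(set(knn_a) & set(knn_b))

# hypothetical 4-nearest-neighbor lists for two points
knn = {"a": ["b", "c", "d", "e"], "b": ["a", "c", "d", "f"]}

# a and b share neighbors c and d, so their similarity is 2;
# clustering then links points whose similarity clears a threshold
sim = snn_similarity(knn["a"], knn["b"])
```

Because the similarity depends only on neighbor-list overlap rather than raw distances, it stays meaningful in high dimensions, which is what the abstract's claim about high dimensionality rests on.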
Iterate: A conceptual clustering algorithm for data mining
IEEE Transactions on Systems, Man and Cybernetics, 1998
Abstract

Cited by 20 (0 self)
The data exploration task can be divided into three interrelated subtasks: (i) feature selection, (ii) discovery, and (iii) interpretation. This paper describes an unsupervised discovery method with biases geared toward partitioning objects into clusters that improve interpretability. The algorithm, ITERATE, employs: (i) a data ordering scheme and (ii) an iterative redistribution operator to produce maximally cohesive and distinct clusters. Cohesion or intra-class similarity is measured in terms of the match between individual objects and their assigned cluster prototype. Distinctness or inter-class dissimilarity is measured by an average of the variance of the distribution match between clusters. We demonstrate that interpretability, from a problem-solving viewpoint, is addressed by the intra- and inter-class measures. Empirical results demonstrate the properties of the discovery algorithm, and its applications to problem solving.
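The iterative redistribution idea can be sketched roughly as follows. Note this is only a stand-in: ITERATE itself matches objects to prototypes probabilistically over nominal features, whereas the sketch uses Euclidean distance as the match score.

```python
import numpy as np

def redistribute(points, prototypes, iters=10):
    """Rough sketch of an iterative redistribution operator: repeatedly
    move each object to its best-matching cluster prototype, then refresh
    each prototype from its members, until the partition stabilizes."""
    for _ in range(iters):
        # distance from every point to every prototype
        d = np.linalg.norm(points[:, None, :] - prototypes[None, :, :], axis=2)
        labels = d.argmin(axis=1)          # assign to the closest prototype
        for k in range(len(prototypes)):   # refresh prototypes
            members = points[labels == k]
            if len(members):
                prototypes[k] = members.mean(axis=0)
    return labels

# two obviously cohesive, distinct groups
pts = np.array([[0., 0.], [0., 1.], [10., 10.], [10., 11.]])
labels = redistribute(pts, np.array([[0., 0.], [10., 10.]]))
```

Each pass tightens cohesion (objects sit closer to their prototype) while the separation between prototypes supplies the distinctness the abstract describes.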
Iterate: A conceptual clustering method for knowledge discovery in databases
In Braunschweig, B., &amp; Day, R. (Eds.), Innovative Applications of Artificial Intelligence in the Oil and Gas Industry, 1995
Exploratory Analysis of Marketing Data: Trees vs. Regression
Abstract

Cited by 8 (5 self)
This article compares the predictive ability of models developed by two different statistical methods, tree analysis and regression analysis. Each was used in an exploratory study to develop a model to make predictions for a specific marketing situation. The Statistical Methods: The regression model is well known and no description is provided here. Tree analysis, however, is less well known. To add to the confusion, it has been labeled in a number of ways, e.g., multiple classification, multi-level cross-tabulations, or configurational analysis. Whatever the names, the basic idea is to classify objects in cells so that the objects in the cells are similar to one another yet different from the objects in other cells. Similarity is judged by the score on a given dependent or criterion variable (which differentiates this method from cluster or factor analysis, where the similarity is based only upon scores on a set of descriptive variables). Tree analysis is an extension to n variables of the simple cross-classification approach. Consider the following example: a researcher is studying the factors which determine whether a family owns two or more automobiles. He finds that income may be used to classify respondents. Illustrative results for his sample are provided in Figure 1. He then decides that the number of drivers in the family may also be important for high-income families.
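The two-car example reads directly as nested cross-classification: split first on income, then split the high-income cell on drivers, and report the ownership rate per cell. The sample records below are invented for illustration.

```python
# hypothetical records: (income, number_of_drivers, owns_two_plus_cars)
families = [
    ("high", 2, True), ("high", 3, True), ("high", 1, False),
    ("low", 2, False), ("low", 1, False), ("high", 2, True),
]

def cell_rate(rows):
    """Share of families in a cell that own two or more cars."""
    return sum(owns for _, _, owns in rows) / len(rows)

# first split on income ...
high = [r for r in families if r[0] == "high"]
low = [r for r in families if r[0] == "low"]
# ... then split the high-income cell on the number of drivers
high_multi = [r for r in high if r[1] >= 2]
high_single = [r for r in high if r[1] < 2]
rates = (cell_rate(high_multi), cell_rate(high_single), cell_rate(low))
```

Each leaf cell is internally similar on the criterion variable (ownership rate) while differing from the other cells, which is the tree-analysis idea in miniature.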
An Application of Econometric Models to International Marketing
Journal of Marketing Research, 1970
Abstract

Cited by 8 (5 self)
This article considers the various ways in which firms might estimate market size by country, with particular consideration given to the use of econometric models. The article aims at three related questions. First, what has happened over the past thirty years in the use of econometric models for measuring geographical markets? Second, is it possible to demonstrate that currently available econometric techniques lead to "improved" measurement of geographical markets and, in particular, for international markets? Finally, have advances in applied econometric analysis over the past thirty years led to any demonstrable progress in measuring geographical markets? Methods For Measuring Sales Rates By Country: Trade and Production Data
Extending Iterate Conceptual Clustering Scheme In Dealing With Numeric Data
1995
Abstract

Cited by 3 (0 self)
[Figure 1: The Key Steps in Conceptual Clustering Systems] ...grouping the data objects into clusters or groups based on the similarity of properties among the objects. The goal is to derive more general concepts that describe the problem-solving task. The task of interpretation involves determining whether the induced concepts are useful for the problem-solving tasks that the user is interested in. This task involves the examination of the intensional description of a class in the context of background knowledge about the domain. Overview of the Clustering Methods: Traditional approaches to cluster analysis (numerical taxonomy) represent the objects to be clustered as points in a multidimensional metric space and adopt distance metrics, such as Euclidean and Mahalanobis measures, to define dissimilarity between objects. Cluster analysis methods take on one of two different forms: 1. parametric methods: they assume t...
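The contrast between the two distance metrics the overview names can be shown in a few lines; the covariance matrix below is invented for illustration.

```python
import numpy as np

def mahalanobis(x, y, cov):
    """Mahalanobis distance: Euclidean distance after whitening by the
    inverse covariance, so high-variance directions count for less."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

x, y = [2.0, 0.0], [0.0, 0.0]
cov = np.array([[4.0, 0.0], [0.0, 1.0]])             # first axis twice as spread out
d_euclid = float(np.linalg.norm(np.subtract(x, y)))  # 2.0
d_mahal = mahalanobis(x, y, cov)                     # 2 / sqrt(4) = 1.0
```

Euclidean distance treats every axis equally, while Mahalanobis distance discounts separation along directions where the data is naturally spread out, which changes which objects count as dissimilar.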
Face Recognition
Abstract

Cited by 2 (0 self)
The objective of DARPA’s Human ID at a Distance (HID) program “is to develop automated biometric identification technologies to detect, recognize and identify humans at great distances.” While nominally intended for security applications, if deployed widely, such technologies could become an enormous privacy threat, making practical the automatic surveillance of individuals on a grand scale. Face recognition, as the HID technology most rapidly approaching maturity, deserves immediate research attention in order to understand its strengths and limitations, with an objective of reliably foiling it when it is used inappropriately. This paper is a status report for a research program designed to achieve this objective within a larger goal of similarly defeating all HID technologies.
MASKS: Maintaining Anonymity by Sequestering Key Statistics
University of Pennsylvania, 2009
Abstract

Cited by 2 (1 self)
High-resolution digital cameras are becoming ever-larger parts of our daily lives, whether as part of closed-circuit surveillance systems or as part of portable digital devices that many of us carry around with us. Combining the broadening reach of these cameras with automatic face recognition technology creates a sensor network that is ripe for abuse: our every action could be recorded and tagged with our identities, the date, and our location, as if we each had an investigator tasked only with keeping us under constant surveillance. Add the continually falling cost of data storage to this mix, and we are left with a situation where the privacy abuses don't need to happen today: the stored imagery can be mined and re-mined forever, while the sophistication of automatic analysis continues to grow. The MASKS project takes the first steps toward addressing this problem. If we would like to be able to de-identify faces before the images are shared with others, we cannot do so with ad hoc techniques applied identically to all faces. Since each face is unique, the method of disguising that face must be equally unique. In order to hide or reduce those critical identifying characteristics, we are delivering the following foundational contributions toward characterizing the nature of facial information: We have created a new pose-controlled, high-resolution database of facial images. The most prominent anatomical markers on each face have been marked for position and shape, establishing a new gold standard for facial segmentation. A parameterized model of the diversity of our subject population was built based on statistical analysis of the
Streaming k-means on Well-Clusterable Data
Abstract

Cited by 2 (1 self)
One of the central problems in data analysis is k-means clustering. In recent years, considerable attention in the literature addressed the streaming variant of this problem, culminating in a series of results (Har-Peled and Mazumdar; Frahling and Sohler; Frahling, Monemizadeh, and Sohler; Chen) that produced a (1 + ε)-approximation for k-means clustering in the streaming setting. Unfortunately, since optimizing the k-means objective is Max-SNP hard, all algorithms that achieve a (1 + ε)-approximation must take time exponential in k unless P=NP. Thus, to avoid exponential dependence on k, some additional assumptions must be made to guarantee high-quality approximation and polynomial running time. A recent paper of Ostrovsky, Rabani, Schulman, and Swamy (FOCS 2006) introduced the very natural assumption of data separability: the assumption closely reflects how k-means is used in practice and allowed the authors to create a high-quality approximation for k-means clustering in the non-streaming setting with polynomial running time even for large values of k. Their work left open a natural and important question: are similar results possible in a streaming setting? This is the question we answer in this paper, albeit using substantially different techniques. We show a near-optimal streaming approximation algorithm for k-means in high-dimensional Euclidean space with sublinear memory and a single pass, under the same data separability assumption. Our algorithm offers significant improvements in both space and run
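For contrast with the paper's algorithm, the classic single-pass online k-means update also keeps only O(k) state; it is merely a baseline sketch, not the paper's method, and it behaves well mainly on well-separated data of the kind the separability assumption describes.

```python
import numpy as np

def streaming_kmeans(stream, k):
    """Single-pass online k-means baseline: the first k points seed the
    centers; every later point pulls its nearest center toward it by a
    1/count step, so each center tracks the running mean of its cluster."""
    centers, counts = [], []
    for x in stream:
        x = np.asarray(x, dtype=float)
        if len(centers) < k:               # seed centers from the first k points
            centers.append(x.copy())
            counts.append(1)
            continue
        j = min(range(k), key=lambda i: np.linalg.norm(x - centers[i]))
        counts[j] += 1
        centers[j] += (x - centers[j]) / counts[j]
    return centers

# two well-separated clusters arriving interleaved in one pass
centers = streaming_kmeans(
    [[0, 0], [10, 10], [0, 1], [10, 11], [0, -1], [10, 9]], k=2)
```

On well-clusterable streams like this one the running means settle on the true cluster centers, which is the intuition behind demanding separability before expecting a single-pass, small-memory algorithm to succeed.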