Groups of Parts and Their Balances in Compositional Data Analysis
- Mathematical Geology, 2005
Cited by 25 (8 self)
Amalgamation of parts of a composition has been extensively used as a technique of analysis to achieve reduced dimension, as was discussed during the CoDaWork’03 meeting (Girona, Spain, 2003). It was shown to be a non-linear operation in the simplex that does not preserve distances under perturbation. The discussion motivated the introduction in the present paper of concepts such as group of parts, balance between groups, and sequential binary partition, which are intended to provide tools of compositional data analysis for dimension reduction. Key concepts underlying this development are the established tools of subcomposition, coordinates in an orthogonal basis of the simplex, balancing element and, in general, the Aitchison geometry in the simplex. The main new results are: a method to analyze grouped parts of a compositional vector through the adequate coordinates in an ad hoc orthonormal basis; and the study of balances of groups of parts (inter-group analysis) as an orthogonal projection similar to that used in standard subcompositional analysis (intra-group analysis). A simulated example compares results when testing equal centers of two populations using amalgamated parts and balances; it shows that, in certain circumstances, results from the two analyses can disagree.
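The contrast between balances and amalgamation can be illustrated with a small sketch. It assumes the usual definition of a balance between two groups of parts, sqrt(rs/(r+s)) times the log of the ratio of the groups' geometric means; the function names, the example compositions and the perturbation vector are illustrative choices, not taken from the paper.

```python
import math

def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def balance(x, group_r, group_s):
    """Balance between two groups of parts, given as lists of indices into x."""
    r, s = len(group_r), len(group_s)
    g_r = geometric_mean([x[i] for i in group_r])
    g_s = geometric_mean([x[i] for i in group_s])
    return math.sqrt(r * s / (r + s)) * math.log(g_r / g_s)

def amalgamated_logratio(x, group_r, group_s):
    """Log-ratio of the summed (amalgamated) groups, shown for comparison."""
    return math.log(sum(x[i] for i in group_r) / sum(x[i] for i in group_s))

def perturb(x, p):
    """Perturbation without re-closure; both measures below are scale invariant."""
    return [a * b for a, b in zip(x, p)]

# Hypothetical three-part compositions and a perturbation vector.
x = [0.1, 0.3, 0.6]
y = [0.4, 0.2, 0.4]
p = [5.0, 1.0, 1.0]
R, S = [0, 1], [2]

# Difference between x and y along the balance, before and after perturbing both.
balance_gap = balance(x, R, S) - balance(y, R, S)
balance_gap_perturbed = balance(perturb(x, p), R, S) - balance(perturb(y, p), R, S)

# The same comparison using amalgamated parts.
amalg_gap = amalgamated_logratio(x, R, S) - amalgamated_logratio(y, R, S)
amalg_gap_perturbed = (amalgamated_logratio(perturb(x, p), R, S)
                       - amalgamated_logratio(perturb(y, p), R, S))
```

In this sketch the gap measured along the balance is the same before and after perturbation, while the gap measured on amalgamated parts changes, which is the sense in which amalgamation fails to preserve distances under perturbation.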
Outlier detection for compositional data using robust methods
- Mathematical Geosciences, 2008
Cited by 13 (8 self)
Outlier detection based on the Mahalanobis distance (MD) requires an appropriate transformation in the case of compositional data. For the family of logratio transformations (additive, centered and isometric logratio transformation) it is shown that the MDs based on classical estimates are invariant to these transformations, and that the MDs based on affine equivariant estimators of location and covariance are the same for the additive and isometric logratio transformations. Moreover, for 3-dimensional compositions the data structure can be visualized by contour lines, and in higher dimensions the MDs of closed and opened data give an impression of the multivariate data behavior. KEY WORDS: Mahalanobis distance, robust statistics, ternary diagram, multivariate outliers, logratio transformation.
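The invariance claim for classical estimates can be checked numerically on a toy data set. The sketch below is restricted to 3-part compositions so that the covariance matrix is 2 x 2 and can be inverted by hand; the data, the particular ilr basis and all function names are assumptions made for illustration. It uses the sample mean and sample covariance only, not the robust estimators discussed in the paper.

```python
import math

def alr(x):
    """Additive logratio of a 3-part composition, last part as divisor."""
    return [math.log(x[0] / x[2]), math.log(x[1] / x[2])]

def ilr(x):
    """Isometric logratio of a 3-part composition (one possible orthonormal basis)."""
    z1 = math.sqrt(1 / 2) * math.log(x[0] / x[1])
    z2 = math.sqrt(2 / 3) * math.log(math.sqrt(x[0] * x[1]) / x[2])
    return [z1, z2]

def mahalanobis_distances(points):
    """Mahalanobis distance of each 2-D point from the sample mean,
    using the classical sample covariance."""
    n = len(points)
    m0 = sum(p[0] for p in points) / n
    m1 = sum(p[1] for p in points) / n
    centered = [(p[0] - m0, p[1] - m1) for p in points]
    s00 = sum(c[0] * c[0] for c in centered) / (n - 1)
    s01 = sum(c[0] * c[1] for c in centered) / (n - 1)
    s11 = sum(c[1] * c[1] for c in centered) / (n - 1)
    det = s00 * s11 - s01 * s01
    # Entries of the inverse of the 2 x 2 covariance matrix.
    i00, i01, i11 = s11 / det, -s01 / det, s00 / det
    return [math.sqrt(c[0] * (i00 * c[0] + i01 * c[1])
                      + c[1] * (i01 * c[0] + i11 * c[1]))
            for c in centered]

# Hypothetical 3-part compositions (each row sums to 1).
data = [
    [0.20, 0.30, 0.50],
    [0.10, 0.60, 0.30],
    [0.40, 0.40, 0.20],
    [0.30, 0.10, 0.60],
    [0.25, 0.25, 0.50],
    [0.60, 0.30, 0.10],
]

md_alr = mahalanobis_distances([alr(x) for x in data])
md_ilr = mahalanobis_distances([ilr(x) for x in data])
```

Because the alr and ilr coordinates are related by a nonsingular linear map, `md_alr` and `md_ilr` agree up to floating-point error.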
Compositional data and their analysis: An introduction
- Geological Society, London, Special Publications, 2006
Cited by 7 (0 self)
Compositional data are those which contain only relative information. They are parts of some whole. In most cases they are recorded as closed data, i.e. data summing to a constant, such as 100 %, whole-rock geochemical data being classic examples. Compositional data have important and particular properties that preclude the application of standard statistical techniques on such data in raw form. Standard techniques are designed to be used with data that are free to range from -∞ to +∞. Compositional data are always positive and range only from 0 to 100, or any other constant, when given in closed form. If one component increases, others must, perforce, decrease, whether or not there is a genetic link between these components. This means that the results of standard statistical analysis of the relationships between raw components or parts in a compositional dataset are clouded by spurious effects. Although such analyses may give apparently interpretable results, they are, at best, approximations and need to be treated with considerable circumspection. The methods outlined in this volume are based on the premise that it is the relative variation of components which is of interest, rather than absolute variation. Log-ratios of components provide the natural means of studying compositional data. In this contribution the basic terms and operations are introduced using simple numerical examples to illustrate their com-
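A simple numerical example of the basic operations mentioned above might look as follows. The values in `raw` and the function names are hypothetical; the sketch shows closure to a constant sum of 100 and the centered logratio (clr) transformation, and checks that a log-ratio between two parts is unaffected by closure.

```python
import math

def closure(x, total=100.0):
    """Rescale positive parts so that they sum to a constant (here 100 %)."""
    s = sum(x)
    return [total * v / s for v in x]

def clr(x):
    """Centered logratio: log of each part relative to the geometric mean."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

# Hypothetical raw measurements of three components, in arbitrary units.
raw = [12.0, 30.0, 18.0]
closed = closure(raw)

# A log-ratio carries only relative information, so closure does not change it.
logratio_raw = math.log(raw[0] / raw[1])
logratio_closed = math.log(closed[0] / closed[1])

# The clr coordinates sum to zero and are the same for raw and closed data.
clr_raw = clr(raw)
clr_closed = clr(closed)
```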
Ground Metric Learning
Cited by 3 (1 self)
Optimal transport distances have been used for more than a decade in machine learning to compare histograms of features. They have one parameter: the ground metric, which can be any metric between the features themselves. As is the case for all parameterized distances, optimal transport distances can only prove useful in practice when this parameter is carefully chosen. To date, the only option available to practitioners to set the ground metric parameter was to rely on a priori knowledge of the features, which limited considerably the scope of application of optimal transport distances. We propose to lift this limitation and consider instead algorithms that can learn the ground metric using only a training set of labeled histograms. We call this approach ground metric learning. We formulate the problem of learning the ground metric as the minimization of the difference of two convex polyhedral functions over a convex set of metric matrices. We follow the presentation of our algorithms with promising experimental results which show that this approach is useful both for retrieval and binary/multiclass classification tasks.
Application of discriminant function analysis and change-point problem in dating volcanic ashes
- In proceedings of Compositional Data Analysis Workshop, 2005
Cited by 1 (1 self)
The application of discriminant function analysis (DFA) is not a new idea in the study of tephrochronology. In this paper, DFA is applied to compositional datasets of two different types of tephras from Mount Ruapehu in New Zealand and Mount Rainier in the USA. The canonical variables from the analysis are further investigated with a statistical methodology of change-point problems in order to gain a better understanding of the change in compositional pattern over time. Finally, a special case of segmented regression is proposed to model both the time of change and the change in pattern. This model can be used to estimate the age of unknown tephras using Bayesian statistical calibration.
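To make the idea of a change-point concrete, here is a minimal sketch that locates a single shift in the mean of an ordered series by least squares. It is a generic stand-in, not the segmented-regression model or the Bayesian calibration of the paper, and the series `canonical_scores` is invented for illustration.

```python
def sum_squared_error(values):
    """Sum of squared deviations from the mean of a non-empty list."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def single_change_point(values):
    """Return the index k that best splits the series into values[:k] and
    values[k:], each modelled by its own constant mean (least squares)."""
    best_k, best_cost = None, float("inf")
    for k in range(1, len(values)):
        cost = sum_squared_error(values[:k]) + sum_squared_error(values[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k

# Hypothetical scores on one canonical variable, ordered by stratigraphic age.
canonical_scores = [1.0, 1.1, 0.9, 1.0, 3.0, 3.1, 2.9, 3.0]
change_index = single_change_point(canonical_scores)
```

For this series the best split falls between the fourth and fifth observations, so `change_index` is 4.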
Sequential allocation and balancing prognostic factors
- Clinics
Cited by 1 (1 self)
In controlled clinical trials, each of several prognostic factors should be balanced across the trial arms. Traditional restricted randomization may prove inadequate, especially with small sample sizes. In psychiatric disorders such as obsessive compulsive disorder (OCD), small trials prevail. Therefore, procedures to minimize the chance of imbalance between treatment arms are advisable. This paper describes a minimization procedure specifically designed for a clinical trial that evaluates treatment efficacy for OCD patients. Aitchison’s compositional distance was used to calculate vectors for each possibility of allocation in a covariate adaptive method. Two different procedures were designed to allocate patients in small blocks or sequentially one-by-one. Partial results of this allocation procedure as well as simulated ones are shown. In the clinical trial for which this procedure was developed, balance between treatment arms was achieved successfully. Simulations considering different arrival orders of patients showed that most patients are allocated to a different treatment arm if the arrival order is modified. Results show that a random factor is maintained with the random arrival order of patients. This specific procedure allows the use of a large number of prognostic factors for the allocation decision.
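The role of Aitchison's distance in such an allocation can be sketched as follows. The distance itself is the standard one (Euclidean distance between clr coordinates). The allocation rule `choose_arm`, the pseudo-count of 0.5 used to avoid zero counts, and the example counts are hypothetical simplifications for a single prognostic factor and two arms; they are not the procedure described in the paper.

```python
import math

def clr(x):
    """Centered logratio of a vector of strictly positive parts."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

def aitchison_distance(x, y):
    """Aitchison distance: Euclidean distance between clr coordinates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))

def choose_arm(counts_a, counts_b, category, pseudo=0.5):
    """Hypothetical rule: place the new patient in the arm that leaves the two
    arms' category profiles closest in Aitchison distance.
    counts_a, counts_b: patients per category of one prognostic factor."""
    def distance_after(add_to_a):
        a = [c + pseudo for c in counts_a]  # pseudo-count avoids log(0)
        b = [c + pseudo for c in counts_b]
        if add_to_a:
            a[category] += 1
        else:
            b[category] += 1
        return aitchison_distance(a, b)
    return "A" if distance_after(True) <= distance_after(False) else "B"

# Arm A is short of category-1 patients, so a new category-1 patient goes to A.
arm = choose_arm(counts_a=[5, 1], counts_b=[3, 3], category=1)
```

Because the distance depends only on ratios between parts, this rule balances the profile of the factor across arms rather than the arm sizes; a practical scheme would need to handle both, along with several factors at once.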
Compositional Data in Biomedical Research
Modern methods of compositional data analysis are not well known in biomedical research. Moreover, there appear to be few mathematical and statistical researchers working on compositional biomedical problems. Like the earth and environmental sciences, biomedicine has many problems in which the relevant scientific information is encoded in the relative abundance of key species or categories. I introduce three problems in cancer research in which analysis of compositions plays an important role. The problems involve 1) the classification of serum proteomic profiles for early detection of lung cancer, 2) inference of the relative amounts of different tissue types in a diagnostic tumor biopsy, and 3) the subcellular localization of the BRCA1 protein, and its role in breast cancer patient prognosis. For each of these problems I outline a partial solution. However, none of these problems is “solved”. I attempt to identify areas in which additional statistical development is needed with the hope of encouraging more compositional data analysts to become involved in biomedical research.
CLINICS 2009;64(6):511-8 CLINICAL SCIENCE
Sequential allocation to balance prognostic factors in a psychiatric clinical trial
Scoring from contests
- 2012
This article presents a new method of scoring alternatives from “contest” data. The model is a generalization of the method of paired comparison to accommodate comparisons between arbitrarily sized sets of alternatives in which outcomes are any division of a fixed prize. Our approach is also applicable to contests between varying quantities of alternatives. We prove that under a reasonable condition on the comparability of alternatives, there exists a unique collection of scores that produces unbiased estimates of the overall performance of each alternative and satisfies a well-known axiom regarding choice probabilities. We apply the method to several canonical problems in which varying choice sets and continuous outcomes may create problems for standard scoring methods. These problems include measuring centrality in network data and the scoring of political candidates via a “feeling thermometer.” In the latter case, we also use the method to uncover and solve a potential difficulty with common methods of rescaling thermometer data to account for issues of interpersonal comparability.