@MISC{Pacheco_applicationsof, author = {Alexia Pacheco}, title = {Applications of Histogram Principal Components Analysis}, year = {} }

Abstract

In [8, Rodríguez, Diday and Winsberg (2000)], the authors propose an algorithm for Principal Components Analysis when the variables are of histogram type. This algorithm also works when the data table mixes variables of interval type and histogram type. If all the variables are of interval type, it produces the same output as the Centers Method algorithm proposed in [2, Cazes, Chouakria, Diday and Schektman (1997)]. In this paper the effectiveness and applicability of the method are illustrated with an example using real runoff data from the Térraba River Basin in Costa Rica, gathered by the Costa Rican Institute of Electricity (ICE).

1 The algorithm

This section summarizes the algorithm for Histogram Principal Components Analysis; the details can be found in [8, Rodríguez, Diday and Winsberg (2000)]. In this algorithm, instead of representing the histograms themselves in the factorial plane, the Empirical Distribution Function $F_Y$ associated with each histogram, as defined in [1, Bock and Diday (2000)], is represented.

Definition 1. Let $X = (x_{ij})$, $i = 1, 2, \dots, m$, $j = 1, 2, \dots, n$, be a symbolic data table with variables of continuous, interval and histogram types, and let $k = \max\{s\}$, where $s$ is the number of modalities of $Y^j$ and the maximum is taken over those $j = 1, 2, \dots, n$ for which $Y^j$ is of histogram type. We define the vector–succession of intervals associated with each cell of $X$ as:

1. if $x_{ij} = [a, b]$ then the associated vector–succession of intervals is: $x_{ij}^{\downarrow} =$ [a, b] [a, b]