## Bankruptcy Analysis with Self-Organizing Maps in Learning Metrics (2001)

Venue: IEEE Transactions on Neural Networks

Citations: 48 (19 self-citations)

### BibTeX

@ARTICLE{Kaski01bankruptcyanalysis,

author = {Samuel Kaski and Janne Sinkkonen and Jaakko Peltonen},

title = {Bankruptcy Analysis with Self-Organizing Maps in Learning Metrics},

journal = {IEEE Transactions on Neural Networks},

year = {2001},

volume = {12},

pages = {936--947}

}

### Abstract

We introduce a method for deriving a metric, locally based on the Fisher information matrix, into the data space. A Self-Organizing Map is computed in the new metric to explore financial statements of enterprises. The metric measures local distances in terms of changes in the distribution of an auxiliary random variable that reflects what is important in the data. In this paper the variable indicates bankruptcy within the next few years. The conditional density of the auxiliary variable is first estimated, and the change in the estimate resulting from local displacements in the primary data space is measured using the Fisher information matrix. When a Self-Organizing Map is computed in the new metric it still visualizes the data space in a topology-preserving fashion, but represents the (local) directions in which the probability of bankruptcy changes the most.
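The pipeline in the abstract (estimate p(c|x), turn it into a local Fisher metric, train a SOM in that metric) can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the conditional density is a hypothetical 3-class softmax model with invented weights, standing in for the estimated mixture model the paper uses. Every map unit is compared with the local quadratic form evaluated at the data point, which is accurate only for nearby units. The update keeps the ordinary SOM step toward x. All names (`class_probs`, `fisher_matrix`, `fisher_dist2`, `train_som`) are illustrative.

```python
import math
import random

# Hypothetical stand-in for the estimated conditional density p(c|x): a 3-class
# softmax model on 2-D inputs with invented weights (the paper fits a mixture model).
WEIGHTS = [[1.0, 0.0], [-0.5, 1.0], [-0.5, -1.0]]  # one weight vector per class
BIASES = [0.0, 0.0, 0.0]


def class_probs(x):
    """Return [p(c|x) for each class c] under the softmax model."""
    logits = [sum(wk * xk for wk, xk in zip(w, x)) + b for w, b in zip(WEIGHTS, BIASES)]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(a - top) for a in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fisher_matrix(x):
    """J(x) = sum_c p(c|x) g_c g_c^T, where g_c is the gradient of log p(c|x)."""
    p = class_probs(x)
    dim = len(x)
    # For a softmax model, g_c = w_c - sum_k p(k|x) w_k.
    mean_w = [sum(p[c] * WEIGHTS[c][k] for c in range(len(p))) for k in range(dim)]
    J = [[0.0] * dim for _ in range(dim)]
    for c, p_c in enumerate(p):
        g = [WEIGHTS[c][k] - mean_w[k] for k in range(dim)]
        for i in range(dim):
            for j in range(dim):
                J[i][j] += p_c * g[i] * g[j]
    return J


def fisher_dist2(x, m, J):
    """Local squared distance (x - m)^T J (x - m)."""
    d = [a - b for a, b in zip(x, m)]
    return sum(d[i] * J[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))


def train_som(data, n_units=5, epochs=20, seed=0):
    """1-D SOM whose winning unit is chosen in the local Fisher metric at x."""
    rng = random.Random(seed)
    units = [list(rng.choice(data)) for _ in range(n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        alpha = 0.5 * frac                        # decreasing learning rate
        radius = max(1.0, 0.5 * n_units * frac)   # shrinking neighbourhood width
        for x in data:
            J = fisher_matrix(x)
            winner = min(range(n_units), key=lambda i: fisher_dist2(x, units[i], J))
            for i, m in enumerate(units):
                h = math.exp(-((i - winner) ** 2) / (2.0 * radius ** 2))
                for k in range(len(m)):
                    m[k] += alpha * h * (x[k] - m[k])  # ordinary SOM step toward x
    return units


rng = random.Random(1)
data = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(100)]
units = train_som(data)
```

With two classes and a linear softmax, J(x) would have rank one. The sketch therefore uses three classes, which makes the metric positive definite in 2-D.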

### Citations

8134 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context ...riminant analysis and our work will be discussed in more detail in Section II-D). Now j, ji, and j will all be estimated from the data. Formulas for estimating the model with the EM algorithm [16] are presented in Appendix B. D. Related works According to our knowledge the introduced principle is new. Works in which some aspects resemble our approach exist, however. Amari and Wu [17] have augm...

3259 | The self-organizing map
- Kohonen
- 1990
Citation Context ... practical point of view is to develop methods for analyzing and understanding the different corporate behavior types and their relation to bankruptcy. In this task, the Self-Organizing Map (SOM) [1], [2] has been found a valuable tool, mainly because of its good visualization capabilities. The present paper introduces a further development of SOM-based data analysis. Our results show that it yields m...

1161 | Information Theory and Statistics
- Kullback
- 1968
Citation Context ...n the financial state x are then those that change the probability of bankruptcy, the distribution p(c|x). A change in distributions can be measured by the Kullback-Leibler divergence D. An old result [7] gives a formula for the local Kullback-Leibler divergence as $D(p(c|x) \,\|\, p(c|x+dx)) = dx^T J(x)\, dx$ (1), where $J(x) = E_{p(c|x)}\left\{ \frac{\partial}{\partial x} \log p(c|x) \left( \frac{\partial}{\partial x} \log p(c|x) \right)^T \right\}$ (2) is the Fisher information...
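Equations (1) and (2) can be sanity-checked numerically. The sketch below uses a hypothetical 3-class softmax model for p(c|x) with invented weights. It computes J(x) from Eq. (2) and compares the exact Kullback-Leibler divergence for a small displacement dx with the quadratic form. The textbook second-order expansion carries a factor ½, $D \approx \tfrac{1}{2}\, dx^T J(x)\, dx$, so the comparison uses half the quadratic form. A constant factor rescales all distances equally and does not change which map unit is nearest.

```python
import math

# Hypothetical 3-class softmax model for p(c|x) on 2-D inputs (invented parameters).
W = [[1.0, 0.0], [-0.5, 1.0], [-0.5, -1.0]]
B = [0.2, 0.0, -0.3]


def probs(x):
    """p(c|x) for every class c."""
    logits = [w[0] * x[0] + w[1] * x[1] + b for w, b in zip(W, B)]
    top = max(logits)
    e = [math.exp(a - top) for a in logits]
    s = sum(e)
    return [v / s for v in e]


def kl(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def fisher(x):
    """J(x) = E_{p(c|x)}[g_c g_c^T] with g_c the gradient of log p(c|x) w.r.t. x."""
    p = probs(x)
    wbar = [sum(pc * w[k] for pc, w in zip(p, W)) for k in range(2)]
    J = [[0.0, 0.0], [0.0, 0.0]]
    for pc, w in zip(p, W):
        g = [w[0] - wbar[0], w[1] - wbar[1]]  # softmax score vector
        for i in range(2):
            for j in range(2):
                J[i][j] += pc * g[i] * g[j]
    return J


x = [0.4, -0.1]
dx = [1e-3, 2e-3]
J = fisher(x)
quad = sum(dx[i] * J[i][j] * dx[j] for i in range(2) for j in range(2))
exact = kl(probs(x), probs([x[0] + dx[0], x[1] + dx[1]]))
print(exact, 0.5 * quad)  # agree to leading order in |dx|
```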

1116 | Pattern Recognition and Neural Networks
- Ripley
- 1996
Citation Context ...nsity p(c|x) of the auxiliary random variable C, conditioned on X. Plenty of alternative methods are available. Many of them have been developed for classification purposes (for reviews see, e.g., [12], [13]). Most such methods would typically be suboptimal for our purpose, however, because a good classifier optimizes the (sometimes implicit) probability density function (PDF) estimate near the class bord...

1098 | Self-organized formation of topologically correct feature maps
- Kohonen
- 1982
Citation Context ...m the practical point of view is to develop methods for analyzing and understanding the different corporate behavior types and their relation to bankruptcy. In this task, the Self-Organizing Map (SOM) [1], [2] has been found a valuable tool, mainly because of its good visualization capabilities. The present paper introduces a further development of SOM-based data analysis. Our results show that it yie...

396 | Exploiting generative models in discriminative classifiers
- Jaakkola, Haussler
- 1999
Citation Context ...machines by making an isotropic change to the metric near the class border. In contrast to this, our metric is non-isotropic and changes the metric everywhere. Jaakkola and Haussler [18] induced a distance measure into a discrete input space using a generative probability model. The crucial differences are that they do not use external information, and that they do not constrain the m...

290 | Natural gradient works efficiently in learning
- Amari
- 1998
Citation Context ...In a Euclidean metric, the update is given by the gradient $\partial \|x - m_i\|^2 / \partial m_i$. The Fisher metric, however, is a Riemannian metric, and steepest descent in a Riemannian metric is given by the so-called natural gradient [25]. Generally, the natural gradient is equal to the conventional gradient multiplied by the inverse of the matrix representing the metric tensor. Because in the original coordinate system the metric...
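The natural-gradient remark can be made concrete under one assumption. Suppose the local squared distance from a map unit $m_i$ to a sample $x$ is $(x - m_i)^T J(x) (x - m_i)$, with $J(x)$ evaluated at $x$ and therefore constant in $m_i$. Its ordinary gradient with respect to $m_i$ is $-2J(x)(x - m_i)$. Multiplying by $J(x)^{-1}$ leaves $-2(x - m_i)$, so steepest descent in this metric moves $m_i$ straight toward $x$, as in the Euclidean SOM. The 2×2 check below uses an arbitrary positive-definite matrix. The helper names are illustrative.

```python
def solve2(A, b):
    """Solve the 2x2 linear system A z = b by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - b[0] * A[1][0]) / det]


def natural_gradient(J, x, m):
    """Natural gradient of (x - m)^T J (x - m) w.r.t. m, with J held fixed."""
    d = [x[0] - m[0], x[1] - m[1]]
    # Ordinary gradient: -2 J (x - m)
    grad = [-2 * (J[0][0] * d[0] + J[0][1] * d[1]),
            -2 * (J[1][0] * d[0] + J[1][1] * d[1])]
    # Natural gradient: J^{-1} times the ordinary gradient
    return solve2(J, grad)


J = [[2.0, 0.5], [0.5, 1.0]]   # any symmetric positive-definite metric (assumed)
x, m = [1.0, 2.0], [0.0, 0.5]
print(natural_gradient(J, x, m))  # [-2.0, -3.0], i.e. -2 * (x - m)
```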

226 | Differential-Geometrical Methods in Statistics
- Amari
- 1985

206 | Self Organization of a Massive Document Collection
- Kohonen, Kaski, et al.
- 2000
Citation Context ... with the lightest shade and the probability 0.040 as pure black. ... amount of computation required for the simple Euclidean metric. Note that there exist several speedup methods for the SOM (see, e.g., [26]). We have not investigated in detail their use with the Fisher metric but many of them are applicable. IV. Application to bankruptcy analysis The method presented in the previous chapters is applied ...

166 | Financial ratios and the probabilistic prediction of bankruptcy
- Ohlson
- 1980
Citation Context ...and co-workers (summarized in [22]), who applied Linear Discriminant Analysis to this problem. Later, almost every statistical method, including neural network approaches, has been proposed (see, e.g., [27], [28], [29], [30], [31], [32], [33], [34], [35], [36]). Generally, it has been observed that some of these methods, especially the more "advanced" ones such as neural network models, have slightly outperformed LDA. However, in al...

154 | Self-organizing semantic maps
- Ritter, Kohonen
- 1989
Citation Context ...n, and that they do not constrain the metric to preserve the topology. In some earlier works auxiliary information has been incorporated directly into the representations of the data (see, e.g., [2], [19]; note, however, that the goal in these works is different from ours). The auxiliary information can be encoded for example in the 1-out-of-C manner and concatenated to the data vectors x. The main pro...
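The earlier alternative mentioned here is simple to state in code: encode the auxiliary class in a 1-out-of-C manner and concatenate it to x. The sketch below is illustrative. The `weight` factor, which balances the label code against the data dimensions, is a hypothetical parameter added for this example.

```python
def one_hot(label, n_classes):
    """1-out-of-C code: 1.0 at the label's position, 0.0 elsewhere."""
    return [1.0 if c == label else 0.0 for c in range(n_classes)]


def augment(data, labels, n_classes, weight=1.0):
    """Concatenate a (weighted) 1-out-of-C label code to every data vector."""
    return [list(x) + [weight * v for v in one_hot(y, n_classes)]
            for x, y in zip(data, labels)]


print(augment([[0.5, 1.2], [2.0, -0.3]], [1, 0], n_classes=2))
# [[0.5, 1.2, 0.0, 1.0], [2.0, -0.3, 1.0, 0.0]]
```

One consequence is visible in the sketch: `augment` needs the label of every vector, so a new, unlabeled sample cannot be represented in the same way.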

151 | Discriminant analysis by Gaussian mixtures
- Hastie, Tibshirani
- 1994
Citation Context ...aussian mixture When the component densities in the model (7) are chosen to be Gaussians parameterized by their means, the model is equivalent to the Mixture Discriminant Analysis² in [14] (cf. also [15]; the relation between mixture discriminant analysis and our work will be discussed in more detail in Section II-D). Now j, ji, and j will all be estimated from the data. Formulas for estimati...
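For the Gaussian-mixture case, a model of the general form $p(x, c) = \sum_j \pi_j \psi_{jc} N(x; \mu_j, \sigma^2 I)$ gives a closed-form Fisher matrix. The symbols $\pi_j$, $\psi_{jc}$, $\mu_j$, $\sigma$ and all numeric values below are assumptions for illustration, since the symbols in the quoted passage were lost in extraction. Because every component shares the width $\sigma$, the normalizing constants cancel. The score becomes $\partial_x \log p(c|x) = (E[\mu_j \mid x, c] - E[\mu_j \mid x]) / \sigma^2$: the difference between the class-conditional and the unconditional posterior means of the component centres.

```python
import math

# Hypothetical mixture parameters (not from the paper): two Gaussian components in 2-D.
MEANS = [[0.0, 0.0], [2.0, 1.0]]   # component centres mu_j
PRIORS = [0.6, 0.4]                # component weights pi_j
PSI = [[0.9, 0.1], [0.2, 0.8]]     # PSI[j][c] = p(c | component j)
SIGMA = 1.0                        # shared isotropic width


def _components(x):
    """Unnormalized pi_j * N(x; mu_j, sigma^2 I) for every component j."""
    out = []
    for pi_j, mu in zip(PRIORS, MEANS):
        sq = sum((a - b) ** 2 for a, b in zip(x, mu))
        out.append(pi_j * math.exp(-sq / (2 * SIGMA ** 2)))
    return out


def cond_probs(x):
    """p(c|x) = sum_j pi_j psi_jc N_j(x) / sum_j pi_j N_j(x)."""
    comp = _components(x)
    total = sum(comp)
    return [sum(comp[j] * PSI[j][c] for j in range(len(comp))) / total
            for c in range(len(PSI[0]))]


def score(x, c):
    """Gradient of log p(c|x): (E[mu_j | x, c] - E[mu_j | x]) / sigma^2."""
    comp = _components(x)
    a = [comp[j] * PSI[j][c] for j in range(len(comp))]
    a_sum, b_sum = sum(a), sum(comp)
    dim = len(x)
    mean_c = [sum(a[j] * MEANS[j][k] for j in range(len(a))) / a_sum for k in range(dim)]
    mean_all = [sum(comp[j] * MEANS[j][k] for j in range(len(comp))) / b_sum for k in range(dim)]
    return [(mc - ma) / SIGMA ** 2 for mc, ma in zip(mean_c, mean_all)]


def fisher(x):
    """J(x) = sum_c p(c|x) g_c g_c^T using the closed-form score."""
    p = cond_probs(x)
    dim = len(x)
    J = [[0.0] * dim for _ in range(dim)]
    for c, p_c in enumerate(p):
        g = score(x, c)
        for i in range(dim):
            for j in range(dim):
                J[i][j] += p_c * g[i] * g[j]
    return J


x = [0.7, 0.3]
J = fisher(x)
```

With only two components, every score vector is a multiple of $\mu_2 - \mu_1$, so J(x) here has rank one. Local distances are then non-zero only along the direction joining the two centres.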

147 | Information and the accuracy attainable in the estimation of statistical parameters
- Rao
Citation Context ...ut space. Note 1: The Fisher information matrix was originally derived for measuring the effect that a change in the model parameters produces on the probability distributions that the models generate [8]. The resulting distance is called (Fisher) information distance or (Fisher) information metric in the information geometry literature (see, e.g., [9], [10], [11]). Here we measure the effect...

90 | Managerial Applications of Neural Networks: The Case of Bank Failure Predictions
- Tam, Kiang
- 1992
Citation Context ...rs (summarized in [22]), who applied Linear Discriminant Analysis to this problem. Later, almost every statistical method, including neural network approaches, has been proposed (see, e.g., [27], [28], [29], [30], [31], [32], [33], [34], [35], [36]). Generally, it has been observed that some of these methods, especially the more "advanced" ones such as neural network models, have slightly outperformed L...

79 | Clustering based on conditional distributions in an auxiliary space
- Sinkkonen, Kaski
Citation Context ...raction of the Fisher metric is similar to the recently introduced Kullback-Leibler clustering algorithm [40], [41]. This connection will be detailed in future papers. In summary, we have extended the SOM-based exploratory analyses of the factors affecting bankruptcy risk in different kinds of companies by the new l...

77 | Geometrical Foundations of Asymptotic Inference
- Kass, Vos
- 1997
Citation Context ...distributions that the models generate [8]. The resulting distance is called (Fisher) information distance or (Fisher) information metric in the information geometry literature (see, e.g., [9], [10], [11]). Here we measure the effect of a change in the location in the primary data space to obtain a metric there. We...

64 | Improving support vector machine classifiers by modifying kernel functions
- Amari, Wu
- 1999
Citation Context ...algorithm [16] are presented in Appendix B. D. Related works According to our knowledge the introduced principle is new. Works in which some aspects resemble our approach exist, however. Amari and Wu [17] have augmented support vector machines by making an isotropic change to the metric near the class border. In c...

64 | Mutual information maximization: Models of cortical self-organization
- Becker
- 1996
Citation Context ...to the latter task, whereas LDA usually emphasizes the former. The classical canonical correlation analysis has recently been generalized by replacing the linear combinations with nonlinear functions [23], [24]. Our framework could as well be adapted to the task of finding statistical dependencies between two data sets by replacing the discrete auxiliary random variable with a parametrized set of featu...

57 | Bankruptcy prediction using neural networks. Decis Support Syst 11(5): 545–557
- RL, Sharda
- 1994
Citation Context ...mmarized in [22]), who applied Linear Discriminant Analysis to this problem. Later, almost every statistical method, including neural network approaches, has been proposed (see, e.g., [27], [28], [29], [30], [31], [32], [33], [34], [35], [36]). Generally, it has been observed that some of these methods, especially the more "advanced" ones such as neural network models, have slightly outperformed LDA. Ho...

52 | Differential geometry and statistics
- Murray, Rice
- 1993
Citation Context ...robability distributions that the models generate [8]. The resulting distance is called (Fisher) information distance or (Fisher) information metric in the information geometry literature (see, e.g., [9]–[11]). Here we measure the effect of a change in the location in the primary data space to obtain a metric there. We will call the resulting metric the Fisher metric and call the approach semisupervi...

49 | Mutual information in learning feature transformation
- Torkkola, Campbell
- 2000
Citation Context ...ch is the general definition of feature extraction, have a relation to our method. Automatic methods for optimizing such mappings, for example by maximizing mutual information, have been proposed [20], [21]. Unlike in a standard separate feature extraction stage, however, the change of the metric in our method defines a manifold which cannot in general be projected to a Euclidean space of the same or low...

41 | A methodology for information theoretic feature extraction
- Fisher, Principe
- 1998
Citation Context ...e, which is the general definition of feature extraction, have a relation to our method. Automatic methods for optimizing such mappings, for example by maximizing mutual information, have been proposed [20], [21]. Unlike in a standard separate feature extraction stage, however, the change of the metric in our method defines a manifold which cannot in general be projected to a Euclidean space of the same ...

26 | Differential-Geometrical Methods in Statistics
- Amari
- 1985
Citation Context ...ility distributions that the models generate [8]. The resulting distance is called (Fisher) information distance or (Fisher) information metric in the information geometry literature (see, e.g., [9], [10], [11]). Here we measure the effect of a change in the location in the primary data space to obtain a metric the...

24 | Forecasting with Neural Networks: An Application Using Bankruptcy Data
- Fletcher, Goss
- 1993
Citation Context ...[22]), who applied Linear Discriminant Analysis to this problem. Later, almost every statistical method, including neural network approaches, has been proposed (see, e.g., [27], [28], [29], [30], [31], [32], [33], [34], [35], [36]). Generally, it has been observed that some of these methods, especially the more "advanced" ones such as neural network models, have slightly outperformed LDA. However, in al...

21 | Flexible discriminant and mixture models
- Hastie, Tibshirani, et al.
- 1995
Citation Context ...kernels. C.2 Gaussian mixture When the component densities in the model (7) are chosen to be Gaussians parameterized by their means, the model is equivalent to the Mixture Discriminant Analysis² in [14] (cf. also [15]; the relation between mixture discriminant analysis and our work will be discussed in more detail in Section II-D). Now j, ji, and j will all be estimated from the data. Formul...

21 | Natural gradient works efficiently in learning
- Amari
- 1998
Citation Context ...ric, the update is given by the gradient $\partial \|x - m_i\|^2 / \partial m_i$. The Fisher metric, however, is a Riemannian metric, and steepest descent in a Riemannian metric is given by the so-called natural gradient [25]. Generally, the natural gradient is equal to the conventional gradient multiplied by the inverse of the matrix representing the metric tensor. Because in the original coordinate system the metric tensor is...

18 | Corporate collapse: the causes and symptoms
- Argenti
- 1976
Citation Context ...ankruptcy analysis: trying to understand the different corporate behaviors and their relation to the risk of bankruptcy. A very influential qualitative work in this area has been carried out by Argenti [37]. One of his observations was that there are several different bankruptcy types ("failure trajectories") that differ in their causes, symptoms, and length. Along these lines of thought, a research proje...

15 | A comparison of discriminant analysis versus artificial neural networks
- YO, Swales, et al.
- 1993
Citation Context ...ed in [22]), who applied Linear Discriminant Analysis to this problem. Later, almost every statistical method, including neural network approaches, has been proposed (see, e.g., [27], [28], [29], [30], [31], [32], [33], [34], [35], [36]). Generally, it has been observed that some of these methods, especially the more "advanced" ones such as neural network models, have slightly outperformed LDA. However,...

13 | Predicting bankruptcies with the self-organizing map. Neurocomputing
- Kiviluoto
- 1998
Citation Context ...ed in a supervised manner. We will apply the new metric to analyze the bankruptcy risk of enterprises on the basis of financial statements. The setting is similar to that of Kiviluoto and Bergius [4], [5], [6]. They [Footnote 1: Even though the mapping is continuous, it is not topology preserving since it may be projective.] h...

13 | Neural network performance on bankruptcy classification problem
- Udo
(Show Context)
Citation Context ...pplied Linear Discriminant Analysis to this problem. Later, almost every statistical method, including neural network approaches, has been proposed (see, e.g., [27], [28], [29], [30], [31], [32], [33], [34], [35], [36]). Generally, it has been observed that some of these methods, especially the more "advanced" ones such as neural network models, have slightly outperformed LDA. However, in all cases the ...

12 | Multivariate normality and forecasting of business bankruptcy
- Karels, Prakash
- 1987
Citation Context ...-workers (summarized in [22]), who applied Linear Discriminant Analysis to this problem. Later, almost every statistical method, including neural network approaches, has been proposed (see, e.g., [27], [28], [29], [30], [31], [32], [33], [34], [35], [36]). Generally, it has been observed that some of these methods, especially the more "advanced" ones such as neural network models, have slightly outperfo...

10 | Metrics that learn relevance
- Kaski, Sinkkonen
- 2000
Citation Context ...tly transform the original metric of the input space. The space is locally scaled so that the new (local) distances will measure the change of the auxiliary information (for a preliminary account see [3]). Proximity relations or, loosely speaking¹, topology of the input space is still retained. Note that by contrast, a change of the metric that does not preserve the proximity relations would map so...

10 | A neural implementation of canonical correlation analysis
- Lai, Fyfe
- 1999
Citation Context ... latter task, whereas LDA usually emphasizes the former. The classical canonical correlation analysis has recently been generalized by replacing the linear combinations with nonlinear functions [23], [24]. Our framework could as well be adapted to the task of finding statistical dependencies between two data sets by replacing the discrete auxiliary random variable with a parametrized set of features co...

10 | Neural and statistical classifiers: taxonomy and two case studies
- Holmstrom, Koistinen, et al.
- 1997
Citation Context ...robability density p(c|x) of the auxiliary random variable C, conditioned on X. Plenty of alternative methods are available. Many of them have been developed for classification purposes (for reviews see, e.g., [12], [13]). Fig. 1. The metric generated by a pdf estimate for a 2-D two-class data set (x = 1000). The first class is sampled from a symmetrical Gaussian with ... (the topmost cluster in the f...

6 | Predicting Japanese Corporate Bankruptcy in terms of financial data using neural networks
- Tsukada, Baba
- 1994
Citation Context ... who applied Linear Discriminant Analysis to this problem. Later, almost every statistical method, including neural network approaches, has been proposed (see, e.g., [27], [28], [29], [30], [31], [32], [33], [34], [35], [36]). Generally, it has been observed that some of these methods, especially the more "advanced" ones such as neural network models, have slightly outperformed LDA. However, in all case...

4 | Exploring corporate bankruptcy with two-level self-organizing maps. Decision technologies for computational management science
- Kiviluoto, Bergius
- 1998
Citation Context ... a supervised manner. We will apply the new metric to analyze the bankruptcy risk of enterprises on the basis of financial statements. The setting is similar to that of Kiviluoto and Bergius [4], [5], [6]. They [Footnote 1: Even though the mapping is continuous, it is not topology preserving since it may be projective.] have u...

3 | Analyzing financial statements with the self-organizing map
- Kiviluoto, Bergius
- 1997
Citation Context ...induced in a supervised manner. We will apply the new metric to analyze the bankruptcy risk of enterprises on the basis of financial statements. The setting is similar to that of Kiviluoto and Bergius [4], [5], [6]. They [Footnote 1: Even though the mapping is continuous, it is not topology preserving since it may be projective.] ...

3 | Neural and statistical classifiers: taxonomy and two case studies
- Holmström, Koistinen, et al.
- 1997
Citation Context ...ity density p(c|x) of the auxiliary random variable C, conditioned on X. Plenty of alternative methods are available. Many of them have been developed for classification purposes (for reviews see, e.g., [12], [13]). Most such methods would typically be suboptimal for our purpose, however, because a good classifier optimizes the (sometimes implicit) probability density function (PDF) estimate near the clas...

3 | Neural network analysis of Russian banks
- Shumsky, Yarovoy
- 1997
Citation Context ...r Discriminant Analysis to this problem. Later, almost every statistical method, including neural network approaches, has been proposed (see, e.g., [27], [28], [29], [30], [31], [32], [33], [34], [35], [36]). Generally, it has been observed that some of these methods, especially the more "advanced" ones such as neural network models, have slightly outperformed LDA. However, in all cases the improvement ...

3 | Clustering by similarity in an auxiliary space
- Sinkkonen, Kaski
- 2000
Citation Context ...he extraction of the Fisher metric is similar to the recently introduced Kullback-Leibler clustering algorithm [40], [41]. This connection will be detailed in future papers. In summary, we have extended the SOM-based exploratory analyses of the factors affecting bankruptcy risk in different kinds of companies by the...

3 | Analyzing financial statements with the self-organizing map
- Kiviluoto, Bergius
- 1997
Citation Context ...nduced in a supervised manner. We will apply the new metric to analyze the bankruptcy risk of enterprises on the basis of financial statements. The setting is similar to that of Kiviluoto and Bergius [4]–[6]. They have used SOMs to extend bankruptcy analysis from traditional straightforward prediction of bankruptcy to visual exploratory analyses of the relationship between the financial statements 1 ...

3 | Self-organizing neural networks for the analysis and representation of data: some financial cases, Neural Computing
- Martín del Brío, B., et al.
- 1993

3 | Comparing 2D and 3D self-organizing maps in financial data visualization
- Kiviluoto
- 1998
Citation Context ...ject in Helsinki University of Technology has attempted to quantify [p. 942: IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 12, NO. 4, JULY 2001] and visualize these different behavior patterns [5], [6], [38], [39]. Because the present study is closely related to this project, some of its findings and also challenges are briefly summarized below. First, the SOM does not increase the accuracy in bankruptcy predi...

2 | A complete guide to predicting, avoiding, and dealing with bankruptcy
- Altman
- 1983
Citation Context ...reducing) nonlinear mapping. The change of the metric can additionally be interpreted as a kind of nonlinear version of linear discriminant analysis (LDA; for applications of LDA in finance see, e.g., [22]). The LDA finds a linear transformation, defined globally for the whole data space, that aims at maximizing class separability. In a more recently proposed variant called mixture discriminant analysis ...
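For contrast with the locally defined Fisher metric, classical two-class LDA reduces to one global direction, $w \propto S_W^{-1}(\mu_1 - \mu_0)$, where $S_W$ is the pooled within-class scatter matrix. The sketch below computes it for 2-D data in plain Python. The function names and the toy data are illustrative.

```python
def mean(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(len(rows[0]))]


def lda_direction(class0, class1):
    """Two-class Fisher LDA direction w = S_W^{-1} (mu1 - mu0) for 2-D data."""
    m0, m1 = mean(class0), mean(class1)
    S = [[0.0, 0.0], [0.0, 0.0]]  # pooled within-class scatter matrix
    for rows, m in ((class0, m0), (class1, m1)):
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    S[i][j] += d[i] * d[j]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    # Explicit 2x2 inverse: [[d, -b], [-c, a]] / det
    return [(S[1][1] * diff[0] - S[0][1] * diff[1]) / det,
            (-S[1][0] * diff[0] + S[0][0] * diff[1]) / det]


w = lda_direction([[0, 0], [1, 0], [0, 1], [1, 1]],
                  [[3, 0], [4, 0], [3, 1], [4, 1]])
print(w)  # [1.5, 0.0]: the classes differ only along the first axis
```

The same direction w applies everywhere in the data space, whereas J(x) in the Fisher metric can change from point to point.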

2 | Comparing 2D and 3D self-organizing maps in financial data visualization, in Methodologies for the Conception, Design and Application of Soft Computing
- Kiviluoto
- 1998
Citation Context ...s, symptoms, and length. Along these lines of thought, a research project in Helsinki University of Technology has attempted to quantify and visualize these different behavior patterns [5], [6], [38], [39]. Because the present study is closely related to this project, some of its findings and also challenges are briefly summarized below. First, the SOM does not increase the accuracy in bankruptcy predict...

2 | Two-level self-organizing maps for analysis of financial statements
- Kiviluoto, Bergius
- 1998
Citation Context ...ch project in Helsinki University of Technology has attempted to quantify and visualize these different behavior patterns [5], [6], [38], [39]. Because the present study is closely related to this project, some of its findings and also challenges are briefly summarized below. First, the SOM does not increase the accuracy in bankruptcy...

1 | Self-organizing neural networks for the analysis and representation of data: Some financial cases
- Martín del Brío, Serrano-Cinca
- 1993
Citation Context ... Linear Discriminant Analysis to this problem. Later, almost every statistical method, including neural network approaches, has been proposed (see, e.g., [27], [28], [29], [30], [31], [32], [33], [34], [35], [36]). Generally, it has been observed that some of these methods, especially the more "advanced" ones such as neural network models, have slightly outperformed LDA. However, in all cases the improv...

1 | Two-level self-organizing maps for analysis of financial statements
- Kiviluoto, Bergius
- 1998
Citation Context ... causes, symptoms, and length. Along these lines of thought, a research project in Helsinki University of Technology has attempted to quantify and visualize these different behavior patterns [5], [6], [38], [39]. Because the present study is closely related to this project, some of its findings and also challenges are briefly summarized below. First, the SOM does not increase the accuracy in bankruptcy p...

1 | Corporate Collapse: The Causes and Symptoms
- Argenti
- 1976
Citation Context ...kruptcy analysis: trying to understand the different corporate behaviors and their relation to the risk of bankruptcy. A very influential qualitative work in this area has been carried out by Argenti [37]. One of his observations was that there are several different bankruptcy types (“failure trajectories”) that differ in their causes, symptoms, and length. Along these lines of thought, a research pro...

1 | Semisupervised clustering based on conditional distributions in an auxiliary space
- 2000
Citation Context ...on factor can be used. In finding the relevant local features of the input space, the extraction of the Fisher metric is similar to the recently introduced Kullback-Leibler clustering algorithm [40], [41]. This connection will be detailed in future papers. In summary, we have extended the SOM-based exploratory analyses of the factors affecting bankruptcy risk in different kinds of companies by the new...