## Addressing the problems of Bayesian network classification of video using high-dimensional features

Venue: IEEE Transactions on Knowledge and Data Engineering

Citations: 5 (1 self)

### BibTeX

@ARTICLE{Mittal_addressingthe,

author = {Ankush Mittal and Loong-fah Cheong},

title = {Addressing the problems of Bayesian network classification of video using high-dimensional features},

journal = {IEEE Transactions on Knowledge and Data Engineering},

year = {2004},

pages = {230--244}

}

### Abstract

Bayesian theory is of great interest in pattern classification. In this paper, we present an approach to aid in the effective application of Bayesian networks in tasks like video classification, where descriptors originate from varied sources and are large in number. In order to extend the application of conventional Bayesian theory to the case of continuous and nonparametric descriptor space, dimension partitioning into attributes by minimizing the discrete Bayes error is proposed. The partitioning output goes to the dimensionality reduction module. A new algorithm for dimensionality reduction for improving the classification accuracy is proposed based on the class pair discriminative capacity of the dimensions. It is also shown how attributes can be weighted automatically in a single-label assignment based on comparing the class pairs. A computationally efficient method to assign multiple labels to the samples is also presented. Comparison with standard classification tools on video data of more than 4,000 segments shows the potential of our approach in pattern classification.

Index Terms: Content-based retrieval, discrete Bayes error, partitioning, dimensionality reduction, multiple labels assignment, Bayesian networks.
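The discrete Bayes error that drives the partitioning step can be illustrated on a single dimension. The following is a minimal sketch, not the authors' algorithm: it assumes the standard empirical definition (one minus the fraction of samples that belong to the majority class of their partition), and the function name `discrete_bayes_error`, the boundary lists, and the toy data are all hypothetical.

```python
import bisect
from collections import Counter


def discrete_bayes_error(values, labels, boundaries):
    """Empirical Bayes error of one dimension cut at the sorted `boundaries`.

    Assumption (not from the paper): the error is estimated as
    1 - sum over partitions of max_class P(partition, class).
    """
    joint = Counter()
    for x, y in zip(values, labels):
        # Index of the partition that x falls into.
        part = bisect.bisect_right(boundaries, x)
        joint[(part, y)] += 1

    # For each partition, keep the count of its most frequent class.
    majority = {}
    for (part, _label), count in joint.items():
        majority[part] = max(majority.get(part, 0), count)

    return 1.0 - sum(majority.values()) / len(values)


# Hypothetical toy data: one descriptor dimension, two classes.
values = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
labels = ["a", "a", "b", "b", "b", "a"]

coarse = discrete_bayes_error(values, labels, [0.5])        # one cut
fine = discrete_bayes_error(values, labels, [0.25, 0.85])   # two cuts
```

On this toy data the single cut leaves one misassigned sample in each partition, while the two-cut partitioning separates the classes completely, which is the kind of trade-off between partition count and error that a discretization procedure has to weigh.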

### Citations

7072 | Probabilistic reasoning in intelligent systems: Networks of plausible inference
- Pearl
- 1988
Citation Context ...c1i1 = 1 and c2i1 = 0, i.e., V1 class causes attribute si1 while V2 does not. If attribute si1 becomes active on some assignment, the likelihood of V2 reduces (this is called the “explaining away” effect [32]). 4.3 Multiple Labels Assignment The computational complexity of exact inference on Bayesian networks is NP-hard. For small networks, inference is still practical. However, for large, richly connecte...

4957 | C4.5: Programs for Machine Learning
- Quinlan
- 1993
Citation Context ...proach with Neural Networks (ANN), Support vector machines (SVM), K-Nearest Neighbor classifier (KNN), and decision trees. 5.1 Comparison Some of the most well-known decision tree algorithms are C4.5 [34] and its improved successor C5. We chose a C5 decision tree package for the purpose of comparison since it has many nice features like accurate and fast rule sets and fuzzy thresholding. The applica...

3647 | Neural Networks: A Comprehensive Foundation
- Haykin
- 1998
Citation Context ...y on decreasing , the number of partitions can be increased. The trade-off in performance by increasing the number of partitions is the increase in size of Bayesian CBR network. It has been argued in [19] that the structure of AI tools like Neural networks and support vector machines needs to be altered with changes in the dimensionality of the descriptor space, i.e., the size of the input vector, or ...

2663 | Introduction to statistical pattern recognition (2nd Ed.)
- Fukunaga
- 1990
Citation Context ...[12]) and the other is to extract a smaller set of “features” as linear or nonlinear functions of the original set of “features” using Principal Component Analysis (PCA) [39] or discriminant analysis [17]. The present approach is based on the former technique, i.e., choosing few “features” from the original set because of two reasons: First, the approach is more appropriate for meaningful “feature” eva...

1290 | Local computation with probabilities on graphical structures and their application to expert systems
- Lauritzen, Spiegelhalter
- 1988
Citation Context ...ain Monte Carlo simulations (Pearl [32]). For multiply connected networks, the standard ways of dealing with loops are clustering and conditioning. Clustering (as given in Lauritzen and Spiegelhalter [24]) involves forming compound variables in such a way that the resulting network of clusters is singly connected. Conditioning involves breaking the communication pathways along the loops by instantiati...

646 | Pattern Recognition: A Statistical Approach. Englewood Cliffs
- Devijver, Kittler
- 1982
Citation Context ...reduction: one is to select a limited set of “features” (here, “feature” is used as is generally used in pattern recognition task and is equivalent to dimension in our notation) out of the total set ([12]) and the other is to extract a smaller set of “features” as linear or nonlinear functions of the original set of “features” using Principal Component Analysis (PCA) [39] or discriminant analysis [17]...

252 | SVMTorch: Support vector machines for large-scale regression problems
- Collobert, Bengio
- 2001
Citation Context ...s like accurate and fast rule sets and fuzzy thresholding. The application of SVM to a domain of more than two target classes is still in the development phase; however, we use SVMTorch 2 C++ package [9] where the iterative process is performed by treating one class as +1 and the others as -1, thereby getting |V| SVM models, where V is the set of classes. In neural networks, a feedforward backpropaga...

150 | Stochastic simulation algorithms for dynamic probabilistic networks
- Kanazawa, Koller, et al.
- 1995
Citation Context ...idence. Given the intractability of exact inference on large, complex networks, researchers have pursued general purpose approximate methods based on stochastic sampling such as likelihood weighting ([21]) and Markov chain Monte Carlo simulations (Pearl [32]). For multiply connected networks, the standard ways of dealing with loops are clustering and conditioning. Clustering (as given in Lauritzen and...

96 | Learning boolean concepts in the presence of many irrelevant features
- Almuallim, Dietterich
- 1994
Citation Context ...sing PCA or discriminant analysis a formidable exercise. Some of the well known “features” selection algorithms are not apposite in their application in the domain of multimedia classification. FOCUS [2] is intractable in data mining applications with thousands or even hundreds of “features” because it selects the minimal subset of “features” by exhaustively examining all the subsets of “features.” P...

96 | Probabilistic Inference Using Belief Networks is NP-Hard
- Cooper
- 1987
Citation Context ...s breaking the communication pathways along the loops by instantiating a select group of variables. Both the methods are liable to combinatorial problems if there are many intersecting cycles. Cooper [10] has shown that the problem of inference to obtain conditional probabilities in an arbitrary belief network is NP-hard. This suggests that its...

73 | Feature selection for case-based classification of cloud types: an empirical comparison
- Aha, Bankert
Citation Context ...y reduction can eliminate some irrelevant and/or redundant dimensions of the descriptors. By using feature selection, classification algorithms can in general improve their predictive accuracy (as in [1]), shorten the learning period [25], and result in saving in the memory requirements and the computation time. There are two techniques which are commonly found for dimensionality reduction: one is to...

61 | Techniques and Systems for Image and Video Retrieval
- Aslandogan, Yu
- 1999
Citation Context ...o the field of Content-Based Retrieval (CBR). The goal of CBR systems is to retrieve images or video sequences (called, in short, segments) as per the interest of the user (for review on CBR, refer to [4]). The challenges inherent in video classification in CBR systems include, among others, 1) forming close association between the descriptor space and the meaningful classes, 2) performing...

49 | Semantic Modeling and Knowledge Representation in Multimedia Databases
- Al-Khatib, Day, et al.
- 1999
Citation Context ...database along with the segments and through query manager and querying session are matched to the user’s choice and retrieved. 2.2 Feature Extraction Descriptors can be classified as global or local [22]. Global or coarse-grained feature extraction techniques transform the whole image into a functional representation where minute details within the individual portion of the multimedia are ignored. It...

46 | Feature Selection Using Rough Sets Theory
- Modrzejewski
- 1993
Citation Context ...tractable in data mining applications with thousands or even hundreds of “features” because it selects the minimal subset of “features” by exhaustively examining all the subsets of “features.” PRESET [28] works only in a noise-free domain. We devise a class-pair distinctive “feature” selection which is efficient in computation even for a large data and can work on noisy data as well. 3.2.1 Relationshi...

20 | Structure and parameter learning for causal independence and causal interaction models
- Meek, Heckerman
- 1997
Citation Context ...CBR system. Hidden nodes and, subsequently, the structure learning algorithm (like [26]) would be then necessary in such cases. The discretization and dimensionality reduction algorithms would have to be extended for these cases. APPENDIX On taking the ratio of MH with MH , we get: MH M...

18 | JACOB: just a content-based query system for video databases
- ML, Ardizzone
- 1996
Citation Context ...elevance of a dimension sf is reflected in its weight wf. This approach of weighing the dimensions is similar to that of some of the present CBR systems such as QBIC [16], Virage [15], and JACOB [7] which use a weighted linear method to combine the similarity measures of different dimensions. They rely on the user to specify the relative weights to the dimensions. However, a user has to be knowl...

18 | Query by image and video content
- Flickner, Sawhney, et al.
- 1995
Citation Context ...l decision for vmax and the relevance of a dimension sf is reflected in its weight wf. This approach of weighing the dimensions is similar to that of some of the present CBR systems such as QBIC [16], Virage [15], and JACOB [7] which use a weighted linear method to combine the similarity measures of different dimensions. They rely on the user to specify the relative weights to the dimensions. How...

14 | An iterative improvement approach for the discretization of numeric attributes in Bayesian classifiers
- Pazzani
- 1995
Citation Context ...he values for a descriptor are normally distributed about some mean. Mean and standard deviation of a class for a descriptor are evaluated using a common statistical approach. It was shown in Pazzani [30] that Gaussian assumption of numeric data may lead to poor performance in many practical systems like electrical faults. He suggested the discretization of the variables into a small fixed number of p...

9 | Dimensionality reduction via discretization
- Liu, Setiono
- 1996
Citation Context ...ass 2 points from that of class 8, the dimension could be effectively used. DABER algorithm strategy differs from several other well known algorithms such as by Pfahringer [33] and by Liu and Setiono [25]. Pfahringer partitions the variable value to a large number of partitions in a binary tree and uses the MDL metric in a best first search to determine best partitions. Liu and Setiono partition the d...

9 | Estimating the Bayes error rate through classifier combining
- Tumer, Ghosh
Citation Context ...e functions and therefore, in practice the Bayes error can be computed directly only for a limited number of problems. Approximations and bounds on the Bayes error are instead commonly calculated. In [37], the outputs of various classifiers are used to calculate the upper and the lower bounds on the Bayes error rate. Similarly, an approximation of the Bayes error was used by Kohn et al. [23] based on ...

8 | Bounds on the Bayes classification error based on pairwise risk functions
- Garber, Djouadi
- 1988
Citation Context ...ass i. $|V|$ is the number of video classes. $p(s)$, the probability distribution function of $s$, is given by $p(s) = \sum_{i=1}^{|V|} p(s \mid v_i)\,\pi_i$. The Bayes error which is associated with Bayes classifier is given by [18]: $$E_s = \int_R \left[ 1 - \max_i p(v_i \mid s) \right] p(s)\,ds, \tag{3}$$ where $R$ is the descriptor space and $p(v_i \mid s)$ is the a posteriori probability of class $v_i$, $i = 1, 2, 3, \ldots, |V|$. Evaluating the Bayes error $E_s$ might ent...

7 | The virage search engine: An open framework for image management
- Bach, Fuller, et al.
- 1996
Citation Context ...r vmax and the relevance of a dimension sf is reflected in its weight wf. This approach of weighing the dimensions is similar to that of some of the present CBR systems such as QBIC [16], Virage [15], and JACOB [7] which use a weighted linear method to combine the similarity measures of different dimensions. They rely on the user to specify the relative weights to the dimensions. However, a user ...

5 | A Semantic Classification and Composite Indexing Approach to Robust Image Retrieval
- Yang, Kuo
- 1999
Citation Context ... to develop with the application of tools like Neural networks (for example, see Doulamis et al. [13]), decision trees (see Demsar and Solina [11]) and K-nearest neighbor classifier (see Yang and Kuo [40]). These works have different paradigms of operation from our CBR system in the sense that they do not envisage autonomous development of high-level classes from the knowledge extraction processes as ...

4 | Entropy and MDL discretization of continuous variables for Bayesian belief networks
- Clarke, Barton
- 2000
Citation Context ...ons for each descriptor. It is not optimal to have fixed partitions all of equal sizes as some partitions become densely populated leading to poor discrimination (see several examples and analysis in [8]). Thus, we propose a statistical approach for finding the boundary points of a variable number of partitions. The discretization algorithm gives optimum number of partitions and with good discriminab...

4 | Using machine learning for content-based image retrieving
- Demsar, Solina
- 1996
Citation Context ...assification in content-based classification are beginning to develop with the application of tools like Neural networks (for example, see Doulamis et al. [13]), decision trees (see Demsar and Solina [11]) and K-nearest neighbor classifier (see Yang and Kuo [40]). These works have different paradigms of operation from our CBR system in the sense that they do not envisage autonomous development of high...

2 | Feature Selection via Concave Minimization and
- Bradley, Mangasarian
- 1998
Citation Context ...n rule and, since no probability density is estimated, it becomes highly sensitive to the curse of dimensionality [35]. With a finite training sample, a high-dimensional feature space is almost empty [6] and many separators in SVM tool may perform well on the training data, but only few would generalize well. It has been shown by Weston et al. [38] that both linear SVMs and nonlinear SVMs perform bad...

2 | A Neural Network Approach to Interactive Content-Based Retrieval of Video Databases
- Doulamis, Doulamis, et al.
- 1999
Citation Context ...2. The ideas of performing association and classification in content-based classification are beginning to develop with the application of tools like Neural networks (for example, see Doulamis et al. [13]), decision trees (see Demsar and Solina [11]) and K-nearest neighbor classifier (see Yang and Kuo [40]). These works have different paradigms of operation from our CBR system in the sense that they d...

1 | Construction of a Classifier with Prior Domain Knowledge Formalized as Bayesian Network
- Antal
- 1998
Citation Context ... Bayesian network. Bayesian network shows the dependence-independence relations in an understandable form that renders the tasks of decomposition, feature selection, or transformation more principled [3], besides providing a sound inference mechanism. However, Bayesian Network requires a priori knowledge of many probabilities, which are usually estimated based on assumptions about the form of the und...

1 | Selection of Features for the Classification of Wood Board Defects
- Estevez, Fernandez, et al.
- 1999
Citation Context ... dimensionality reduction is superior to the standard statistical methods which have been used in diverse applications involving regression or classification tasks like classification of wood defects [14], or numeral recognition [20]. These methods use measures like intraclass variation, interclass variation, or correlation, etc., to differentiate how well the dimensions differentiate between the clas...

1 | Achieving Semantic Coupling in the Domain of High-Dimensional Video Indexing Application
- Mittal, Cheong
- 2001
Citation Context ... SVMs and nonlinear SVMs perform badly in the situation of many irrelevant features and they show how SVM performance can be improved by feature selection. In fact, we have shown in our previous work [27] that feature selection can improve SVM accuracy to 88 percent on a similar video classification problem. The performance of tools is also dependent on the distribution of the data. For instance, SVM is...

1 | Compression Based Discretization of Continuous Variables
- Pfahringer
- 1995
Citation Context ...purpose of distinguishing class 2 points from that of class 8, the dimension could be effectively used. DABER algorithm strategy differs from several other well known algorithms such as by Pfahringer [33] and by Liu and Setiono [25]. Pfahringer partitions the variable value to a large number of partitions in a binary tree and uses the MDL metric in a best first search to determine best partitions. Liu...

1 | How Good are SVMS
- Raudys
- 2000
Citation Context ...(usually small) number of training-set vectors determine the parameters of the decision rule and, since no probability density is estimated, it becomes highly sensitive to the curse of dimensionality [35]. With a finite training sample, a high-dimensional feature space is almost empty [6] and many separators in SVM tool may perform well on the training data, but only few would generalize well. It has ...

1 | The Connection between the Bayesian Risk and the Kolmogorov Distance and a Modification of It in Recognition Problems, Eng. Cybernetics
- Smirnov, Tikheyeva
Citation Context ...te in (3) can be expressed as: $$E_s = \frac{1}{2} \left[ 1 - \int_R \left| p(s \mid v_1)\,\pi_1 - p(s \mid v_2)\,\pi_2 \right| ds \right], \tag{8}$$ where $\pi_i$ is the prior probability of class $v_i$. The integral in the above equation is known as the Kolmogorov distance [36] which is theoretically a sound distance measure as compared to the other measures. Our distance measure is a modification of the Kolmogorov distance for the discrete case. To map the problem of multi...

1 | A Critical Evaluation of Intrinsic Dimensionality Reduction Algorithms, Pattern Recognition in Practice
- Wyse, Dubes, et al.
- 1980
Citation Context ...tation) out of the total set ([12]) and the other is to extract a smaller set of “features” as linear or nonlinear functions of the original set of “features” using Principal Component Analysis (PCA) [39] or discriminant analysis [17]. The present approach is based on the former technique, i.e., choosing few “features” from the original set because of two reasons: First, the approach is more appropriat...