Association analysis-based transformations for protein interaction networks: a function prediction case study
In KDD ’07: Proceedings of the 13th ACM SIGKDD international …, 2007
Cited by 6 (2 self)
Protein interaction networks are one of the most promising types of biological data for the discovery of functional modules and the prediction of individual protein functions. However, these networks are known to be both incomplete and inaccurate, i.e., they contain spurious edges and lack biologically valid ones. One way to handle this problem is to transform the original interaction graph into new graphs that remove spurious edges, add biologically valid ones, and assign reliability scores to the edges of the final network. We investigate existing methods for this task and propose a robust association analysis-based method. This method is based on the concept of h-confidence, a measure that can be used to extract groups of objects that are highly similar to each other. Experimental evaluation on several protein interaction data sets shows that hyperclique-based transformations significantly enhance the performance of standard function prediction algorithms, and thus have merit.
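A minimal sketch of an h-confidence graph transformation follows; it is not the authors' implementation. It assumes, hypothetically, that each protein's transaction is its neighbor set plus the protein itself. With a symmetric adjacency list, the support of a protein pair {a, b} is then the size of the overlap of their transactions, so h-confidence reduces to |T(a) ∩ T(b)| / max(|T(a)|, |T(b)|). Pairs at or above a threshold become edges of the transformed graph, weighted by their score. The function names and the threshold value are illustrative.

```python
from itertools import combinations

def transactions_from_graph(adj):
    # Assumption: each protein's transaction is its neighbor set plus itself.
    return {p: set(nbrs) | {p} for p, nbrs in adj.items()}

def h_confidence(a, b, tx):
    # supp({a, b}) / max(supp({a}), supp({b})) in this symmetric encoding.
    ta, tb = tx[a], tx[b]
    return len(ta & tb) / max(len(ta), len(tb))

def hyperclique_transform(adj, threshold):
    # Hypothetical helper: keep protein pairs whose h-confidence meets the
    # threshold and use the score as the new edge weight.
    tx = transactions_from_graph(adj)
    edges = {}
    for a, b in combinations(sorted(tx), 2):
        h = h_confidence(a, b, tx)
        if h >= threshold:
            edges[(a, b)] = h
    return edges

adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(hyperclique_transform(adj, 0.5))
# {('A', 'B'): 1.0, ('A', 'C'): 0.75, ('B', 'C'): 0.75, ('C', 'D'): 0.5}
```

In this toy graph the pendant edge C–D scores lower than the edges inside the triangle A–B–C, so a stricter threshold would drop it as a candidate spurious edge.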
Association Analysis Techniques for Bioinformatics Problems
Cited by 1 (1 self)
Association analysis is one of the most popular analysis paradigms in data mining. Despite the solid foundation of association analysis and its potential applications, this group of techniques is not as widely used as classification and clustering, especially in the domain of bioinformatics and computational biology. In this paper, we present different types of association patterns and discuss some of their applications in bioinformatics. We present a case study showing the usefulness of association analysis-based techniques for pre-processing protein interaction networks for the task of protein function prediction. Finally, we discuss some of the challenges that need to be addressed to make association analysis-based techniques more applicable to a number of interesting problems in bioinformatics.
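As an illustration of the basic measures behind such patterns, the minimal sketch below computes support, rule confidence, and h-confidence (the measure that defines hyperclique patterns) over a toy list of transactions. Exact fractions avoid floating-point noise. The gene names and transactions are hypothetical.

```python
from fractions import Fraction

def support(itemset, transactions):
    # Fraction of transactions that contain every item in the itemset.
    items = set(itemset)
    return Fraction(sum(1 for t in transactions if items <= t), len(transactions))

def confidence(lhs, rhs, transactions):
    # conf(lhs -> rhs) = supp(lhs | rhs) / supp(lhs); 0 if lhs never occurs.
    base = support(lhs, transactions)
    return support(set(lhs) | set(rhs), transactions) / base if base else Fraction(0)

def h_confidence(itemset, transactions):
    # hconf(P) = supp(P) / max over items i in P of supp({i}).
    top = max(support({i}, transactions) for i in itemset)
    return support(itemset, transactions) / top if top else Fraction(0)

T = [{"g1", "g2", "g3"}, {"g1", "g2"}, {"g1", "g3"},
     {"g2", "g3", "g4"}, {"g1", "g2", "g3", "g4"}]
print(confidence({"g1"}, {"g2"}, T), h_confidence({"g1", "g2", "g3"}, T))  # 3/4 1/2
```

Because h-confidence divides by the support of the most frequent item, it stays low when a rare item co-occurs only incidentally with a very common one, which is what makes hyperclique patterns suitable for finding groups of mutually similar objects.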
Association Analysis Techniques for Analyzing Complex Biological Data Sets
Cited by 1 (1 self)
Association analysis is one of the most popular analysis paradigms in data mining. In this paper, we present different types of association patterns and discuss some of their applications in bioinformatics. We present a case study showing the usefulness of association analysis-based techniques for pre-processing protein interaction networks. Finally, we discuss some of the challenges that need to be addressed to make association analysis-based techniques more applicable to bioinformatics.
Enhancing Concept Detection by Pruning Data with MCA-based Transaction Weights
With the rapid increase in the amount of multimedia data, research on semantic information retrieval faces a very challenging problem: the number of positive data instances (those containing the target concept, object, or event) is much smaller than the number of negative data instances (those without it). This is known as the data imbalance issue. Therefore, one of the popular topics in multimedia information processing and retrieval is data pruning, a technique that automatically identifies and prunes data instances from the training data set so that the pruned data set enhances the performance of model learning, classification, and concept detection. In this paper, we propose a novel data pruning framework that gives each transaction a weight based on multiple correspondence analysis (MCA). These transaction weights are used as the measure for pruning the training data set. The testing data set can also be weighted and pruned, so that the computational cost is reduced not only when building the model but also when applying the classifiers. In experiments with 18 high-level concepts and the benchmark (both balanced and imbalanced) data sets from TRECVID, the proposed framework achieves promising results, enhancing the concept detection performance of three well-known classifiers commonly used for concept detection. Keywords: data pruning; transaction weight; concept detection; multiple correspondence analysis.
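The abstract does not spell out how the MCA-based weights are computed, so the following is only a rough sketch under stated assumptions, not the authors' framework. It runs correspondence analysis on the one-hot indicator matrix of integer-coded nominal features plus the class label (y == 1 marks the target concept), scores each feature-value category by the cosine between its coordinates and those of the positive class on the first k dimensions, sums those scores per transaction, and then drops a fraction of the negative transactions with the highest weights. The weighting formula, the pruning rule, and all function names are hypothetical.

```python
import numpy as np

def indicator_matrix(X):
    # One-hot encode integer-coded nominal columns; labels are (column, value).
    X = np.asarray(X)
    cols, labels = [], []
    for j in range(X.shape[1]):
        for v in np.unique(X[:, j]):
            cols.append((X[:, j] == v).astype(float))
            labels.append((j, v))
    return np.column_stack(cols), labels

def mca_category_coords(Z, k=2):
    # Correspondence analysis of the indicator matrix via SVD of the
    # standardized residuals; returns principal coordinates of the categories.
    P = Z / Z.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    _, s, Vt = np.linalg.svd(S, full_matrices=False)
    return Vt[:k].T * s[:k] / np.sqrt(c)[:, None]

def transaction_weights(X, y, k=2):
    # Hypothetical weighting: cosine between each feature-value category and
    # the positive-class category, summed over the categories in each row.
    X, y = np.asarray(X), np.asarray(y)
    Z, labels = indicator_matrix(np.column_stack([X, y]))
    G = mca_category_coords(Z, k)
    cls = X.shape[1]
    pos = labels.index((cls, 1))
    norms = np.linalg.norm(G, axis=1) + 1e-12
    cos = (G @ G[pos]) / (norms * norms[pos])
    feat = [i for i, (j, _) in enumerate(labels) if j != cls]
    return Z[:, feat] @ cos[feat]

def prune_negatives(X, y, frac=0.2, k=2):
    # Hypothetical pruning rule: drop the fraction `frac` of negative
    # transactions whose weights are closest to the positive class.
    w = transaction_weights(X, y, k)
    y = np.asarray(y)
    neg = np.flatnonzero(y == 0)
    n_drop = int(frac * len(neg))
    drop = neg[np.argsort(-w[neg], kind="stable")[:n_drop]]
    return np.setdiff1d(np.arange(len(y)), drop)
```

Dropping the most positive-looking negatives is only one plausible rule; pruning the lowest-weight negatives instead would be a change to the sort order in `prune_negatives`.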
Unsupervised Clustering Using Hyperclique Pattern Constraints
A novel unsupervised clustering algorithm called Hyperclique Pattern-KMEANS (HP-KMEANS) is presented. Motivated by the recent success of semi-supervised clustering with pairwise constraints, we propose an unsupervised clustering method that selects constraints automatically based on hyperclique patterns. The COP-KMEANS framework is then adopted to cluster the instances of a data set into corresponding groups. Experiments show promising results compared with classical unsupervised k-means clustering.
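The COP-KMEANS step can be sketched as follows. This is a minimal sketch, not HP-KMEANS itself: the must-link pairs are passed in directly (hypothetically, they would be pairs of instances grouped together by a hyperclique pattern), the initial centers are given as row indices, and the helper names are illustrative. Each point goes to the nearest center whose choice does not break a constraint with an already assigned point; if no center is feasible, the run fails.

```python
import numpy as np

def _violates(i, c, assign, must_link, cannot_link):
    # A placement breaks a constraint only against points already assigned
    # in the current pass (-1 means "not yet assigned").
    for a, b in must_link:
        if i in (a, b):
            j = b if a == i else a
            if assign[j] != -1 and assign[j] != c:
                return True
    for a, b in cannot_link:
        if i in (a, b):
            j = b if a == i else a
            if assign[j] == c:
                return True
    return False

def cop_kmeans(X, k, must_link=(), cannot_link=(), init=None, max_iter=100):
    X = np.asarray(X, dtype=float)
    centers = X[list(init)] if init is not None else X[:k].copy()
    assign = np.full(len(X), -1)
    for _ in range(max_iter):
        new = np.full(len(X), -1)
        for i in range(len(X)):
            # Try clusters from nearest to farthest; take the first feasible one.
            order = np.argsort(np.linalg.norm(centers - X[i], axis=1))
            for c in order:
                if not _violates(i, c, new, must_link, cannot_link):
                    new[i] = c
                    break
            else:
                raise ValueError(f"no feasible cluster for point {i}")
        if np.array_equal(new, assign):
            break
        assign = new
        centers = np.array([X[assign == c].mean(axis=0) if np.any(assign == c)
                            else centers[c] for c in range(k)])
    return assign, centers
```

For example, with points (0, 0), (0, 1), (10, 0), (10, 1), (6, 0) and initial centers at rows 0 and 2, plain k-means puts (6, 0) with the right-hand group, while the must-link pair (0, 4) pulls it into the left-hand cluster.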
An Empirical Study of Class Noise Impacts on Supervised Learning Algorithms and Measures
Noise in data is a major concern for many machine learning techniques. Previous research has studied the impact of noise only on particular learning algorithms. We empirically study the impact of noise at different intensities on four representative learning algorithms (decision tree, naïve Bayes, support vector machine, and logistic regression) and on two popular measures (accuracy and AUC). Our empirical results show that AUC is more tolerant to noise than accuracy. Among the four algorithms, naïve Bayes is the most resistant to noise, but it performs the worst in accuracy. The other algorithms perform much better than naïve Bayes, especially once the noise level is below 40%. When we develop approaches to improve data quality (reduce the noise level) and build models with higher accuracy, decision tree is the most preferred algorithm, followed by logistic regression and support vector machine. However, logistic regression performs the best in AUC.
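The abstract does not say how noise of a given intensity is injected. One common approach is to flip a fixed fraction of class labels to a different class; the minimal sketch below assumes that scheme, and the function name and seed handling are hypothetical. A rate of 0.4 corresponds to the 40% noise level mentioned above.

```python
import random

def inject_class_noise(labels, rate, classes, seed=0):
    # Flip round(rate * n) randomly chosen labels to a different class,
    # drawn uniformly from the remaining classes.
    rng = random.Random(seed)
    noisy = list(labels)
    for i in rng.sample(range(len(noisy)), round(rate * len(noisy))):
        noisy[i] = rng.choice([c for c in classes if c != labels[i]])
    return noisy
```

Training each learner on the noisy labels and scoring it on clean test labels, across several rates, gives the kind of noise-intensity curves that the study compares for accuracy and AUC.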
Intelligent Data Analysis Using Data Mining Techniques
The power of modern computing technology makes data gathering and storage easier, which creates a new range of problems and challenges for data analysis. In this study, an approach to outlier detection based on clustering techniques is presented. First, the EM-Cluster algorithm is applied to identify missing values, forming small clusters. Then, a univariate outlier detection method is applied to identify outliers. When applied to a synthetic data set, the proposed approach gave effective results within optimal time and space.
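The abstract does not name the univariate outlier detection method. As one common choice, the minimal sketch below applies Tukey's interquartile-range fences to a single attribute; it is not necessarily the method used in the study, and the multiplier k = 1.5 is the conventional default rather than a value from the source.

```python
import statistics

def iqr_outliers(values, k=1.5):
    # Tukey's fences: flag values below Q1 - k*IQR or above Q3 + k*IQR.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

print(iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))  # [100]
```

In a two-stage pipeline like the one described, a rule of this kind would be applied per attribute after the clustering stage has handled missing values.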
Testing Various Similarity Metrics and their Permutations with Clustering Approach in Context Free Data Cleaning
Organizations can sustain growth in this knowledge era through proficient data analysis, which relies heavily on data quality. This paper focuses on using sequence similarity metrics with a clustering approach in context-free data cleaning to improve data quality by reducing noise. The authors propose an algorithm that tests whether a value is suitable for correcting other values of the same attribute, based on the distance between them. Sequence similarity metrics such as Needleman-Wunsch, Jaro-Winkler, Chapman Ordered Name Similarity, and Smith-Waterman are used to compute the distance between two values. Experimental results show that the approach can effectively clean data without reference data.
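To make the idea concrete, the minimal sketch below implements the Jaro-Winkler similarity (prefix scale 0.1, prefix capped at four characters) and a hypothetical cleaning rule: a value is replaced by a more frequent value of the same attribute when their similarity meets a threshold. The rule, the 0.9 threshold, and the function names are assumptions for illustration, not the authors' algorithm.

```python
from collections import Counter

def jaro(s, t):
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_hit, t_hit = [False] * len(s), [False] * len(t)
    m = 0
    for i, ch in enumerate(s):
        for j in range(max(0, i - window), min(i + window + 1, len(t))):
            if not t_hit[j] and t[j] == ch:
                s_hit[i] = t_hit[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # Transpositions: half the matched characters that appear out of order.
    s_seq = [s[i] for i in range(len(s)) if s_hit[i]]
    t_seq = [t[j] for j in range(len(t)) if t_hit[j]]
    trans = sum(a != b for a, b in zip(s_seq, t_seq)) // 2
    return (m / len(s) + m / len(t) + (m - trans) / m) / 3

def jaro_winkler(s, t, p=0.1):
    # Boost the Jaro score by the length of the common prefix (max 4).
    j = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:4], t[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)

def clean_attribute(values, threshold=0.9):
    # Hypothetical rule: map each value to a more frequent, similar value.
    counts = Counter(values)
    ranked = [v for v, _ in counts.most_common()]
    fix = {v: next((c for c in ranked if counts[c] > counts[v]
                    and jaro_winkler(v, c) >= threshold), v) for v in ranked}
    return [fix[v] for v in values]

print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```

Swapping in Needleman-Wunsch or Smith-Waterman would only change the similarity function passed into the cleaning rule.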
Knowledge Discovery from Road Traffic Accident Data in Ethiopia: Data Quality, Ensembling and Trend Analysis for Improving Road Safety
Descriptive analyses of the magnitude and state of road safety in general, and of road accidents in particular, are important, but understanding data quality, the factors related to dangerous situations, and other interesting patterns in the data is of even greater importance. Under the umbrella of information architecture research for road safety in developing countries, the objective of this experimental machine learning research is to explore data quality issues, analyze trends, and predict the role of road users in possible injury risks. The research employed TreeNet, Classification and Regression Trees (CART), Random Forest (RF), and a hybrid ensemble approach. To identify relevant patterns and illustrate the performance of these techniques in the road safety domain, road accident data collected from the Addis Ababa Traffic Office is subjected to several analyses. Empirical results show that data quality is a major problem that needs architectural guidelines, and that the prototype models could classify accidents with promising accuracy. In addition, an ensemble technique proves better in terms of predictive accuracy in the domain under study.
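The abstract does not describe how the hybrid ensemble combines TreeNet, CART, and Random Forest. Plain majority voting over the models' predicted labels is one common combination rule; the minimal sketch below assumes that rule, and the severity labels in the example are invented.

```python
from collections import Counter

def majority_vote(per_model_predictions):
    # Combine several models' predictions instance by instance; on a tie,
    # the label seen first (i.e. from the earlier model) wins.
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*per_model_predictions)]

preds = [["fatal", "minor", "minor"],     # e.g. model 1
         ["fatal", "fatal", "minor"],     # e.g. model 2
         ["minor", "fatal", "serious"]]   # e.g. model 3
print(majority_vote(preds))  # ['fatal', 'fatal', 'minor']
```

Weighted voting or stacking are alternatives when the member models differ markedly in accuracy.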

