Mining Positional Association Super-Rules on Fixed-Size Protein Sequence Motifs (2009 Ninth IEEE International Conference on Bioinformatics and Bioengineering)
Abstract: Protein sequence motif information is crucial to the analysis of biologically significant regions. The conserved regions have the potential to determine the role of the proteins. Many algorithms and techniques for discovering motifs require a predefined fixed window size. Because of the fixed size, these approaches often deliver a number of similar motifs that are simply shifted by some bases or that include mismatches. To confront the shifted motifs problem, we incorporate the Super-Rule-Tree (SRT) concept, which is designed for solving the mismatched motifs problem, and propose a new Positional Association Rules algorithm. In the Positional Association Rules algorithm, a new parameter named distance assurance is introduced to search for frequent distances appearing in association rules. By analyzing the motif results generated by our approach on our dataset, we provide the optimal minimum support, confidence, and distance assurance. We believe the Positional Association Super-Rules algorithm can play an important role in similar research that requires a predefined fixed window size. Index Terms: Positional Association Rules, Super-rules, protein sequence motif.
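The abstract does not define its parameters, so the following is only a rough sketch of how support, confidence, and a distance-based measure could be computed for pairs of motifs. The function name `positional_rules`, the input layout (one list of `(motif_id, position)` pairs per protein), and in particular the reading of "distance assurance" as the share of co-occurrences that fall at the single most frequent offset are assumptions made for illustration, not the authors' definitions.

```python
from collections import Counter, defaultdict

def positional_rules(sequences, min_support=0.5, min_confidence=0.5,
                     min_distance_assurance=0.5):
    """Find rules A -> B that tend to occur at one fixed offset.

    sequences: list of lists of (motif_id, position) occurrences.
    "Distance assurance" is ASSUMED here to be the fraction of (A, B)
    co-occurrences that share the most frequent offset.
    """
    n = len(sequences)
    motif_count = Counter()               # sequences containing each motif
    pair_count = Counter()                # sequences containing both A and B
    distance_hist = defaultdict(Counter)  # (A, B) -> histogram of offsets

    for occurrences in sequences:
        motif_count.update({m for m, _ in occurrences})
        seen_pairs = set()
        for a, pos_a in occurrences:
            for b, pos_b in occurrences:
                if a == b:
                    continue
                seen_pairs.add((a, b))
                distance_hist[(a, b)][pos_b - pos_a] += 1
        pair_count.update(seen_pairs)

    rules = []
    for (a, b), together in pair_count.items():
        support = together / n
        confidence = together / motif_count[a]
        hist = distance_hist[(a, b)]
        distance, hits = hist.most_common(1)[0]
        assurance = hits / sum(hist.values())
        if (support >= min_support and confidence >= min_confidence
                and assurance >= min_distance_assurance):
            rules.append((a, b, distance, support, confidence, assurance))
    return sorted(rules)

# Toy data: M2 follows M1 at offset 1 in two of three proteins.
toy = [
    [("M1", 0), ("M2", 1)],
    [("M1", 5), ("M2", 6)],
    [("M1", 2), ("M2", 9)],
    [("M3", 0)],
]
rules = positional_rules(toy, min_distance_assurance=0.6)
```

Raising `min_distance_assurance` above 2/3 drops both rules on this toy input, which is how such a threshold would filter out pairs without a stable relative position.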
Extraction of Protein Sequence Motifs Information by Bi-Clustering Algorithm
Abstract: The activities and functions of proteins can potentially be determined by protein sequence motifs. Therefore, obtaining protein sequence motifs that are universally conserved and cross protein family boundaries is crucial. In this study, a fuzzy C-means and an improved K-means clustering algorithm are applied to granulate the entire dataset and analyze each granule separately. In addition, a modified bi-clustering algorithm is employed to improve cluster quality. This is the first time a bi-clustering algorithm has been implemented for cluster extraction purposes. Compared with the traditional shrink method, the modified bi-clustering algorithm generates more clusters with secondary structure similarity greater than 60% at the same data filtering percentage. Moreover, the bi-clustering algorithm is shown to be able to select meaningful amino acids that biologists are interested in. Keywords: Bi-Clustering, Protein Sequence Motifs, FGK
Constructing Super Rule Tree (SRT) for Protein Motif Clusters Using DBSCAN
Searching for protein sequence and structural motifs is one of the most important topics in Bioinformatics, because the motifs are able to determine the role of the proteins. Most motif-searching algorithms require a fixed window size to be defined in advance. The fixed window size may result in a number of similar motifs that are shifted by one to several bases or that include mismatches. In this study, to confront the mismatched motifs problem, we use the super-rule concept to construct a Super-Rule-Tree (SRT), which is generated by the DBSCAN clustering algorithm. This SRT recognizes the similar motifs. Analysis of the Super-Rule-Tree generated by hierarchical DBSCAN shows better quality in secondary structure similarity evaluation than previous studies. We believe that the combination of DBSCAN and the SRT concept may provide a new point of view for similar research that requires a predefined fixed window size. Keywords: Super-Rule-Tree (SRT), DBSCAN, protein sequence motif.
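The shifted-motif effect of a fixed window can be illustrated with a small sketch. It assumes motifs are plain strings compared by exact match, which is a simplification (the papers listed here work with clustered motif profiles), and the helper names `windows` and `shift_between` are hypothetical.

```python
def windows(sequence, size):
    """Return every fixed-size window of the sequence, left to right."""
    return [sequence[i:i + size] for i in range(len(sequence) - size + 1)]

def shift_between(a, b, max_shift=3):
    """Return the smallest s in 1..max_shift such that b equals a shifted
    by s positions (the tail of a matches the head of b), or None."""
    if len(a) != len(b):
        return None
    for s in range(1, min(max_shift, len(a) - 1) + 1):
        if a[s:] == b[:len(b) - s]:
            return s
    return None

# A conserved region of length 7 scanned with a window of size 5
# produces three windows that are shifted copies of one another.
found = windows("ACDEFGH", 5)
```

Here `found` is `["ACDEF", "CDEFG", "DEFGH"]`, and `shift_between` reports shifts of 1 and 2 relative to the first window, so all three would count as one motif seen at different offsets.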
and Mutlu Mete 2
Abstract: Protein sequence motifs are used to predict functional or structural portions of other proteins, including prosthetic attachment sites, enzyme-binding sites, DNA/RNA binding sites, and so on. Many algorithms and techniques require a predefined fixed window size to discover protein sequence motifs. However, the predefined window size may deliver a number of similar motifs that are simply shifted by some bases or that include mismatches. In this paper, we use the positional association rules algorithm to form a motif network and adapt a Structural Clustering Algorithm for Networks named SCAN to recognize similar motifs. Although association-rule-based algorithms have been widely adopted in association analysis and classification, few of them are designed as clustering methods. With the SCAN analysis, the quality of the clusters is further improved.
Effective Clustering Algorithms for Gene Expression Data
Microarrays have made it possible to simultaneously monitor the expression profiles of thousands of genes under various experimental conditions. Identification of co-expressed genes and coherent patterns is the central goal in microarray or gene expression data analysis and is an important task in Bioinformatics research. In this paper, a K-Means algorithm hybridised with the Cluster Centre Initialization Algorithm (CCIA) is proposed for gene expression data. The proposed algorithm overcomes the drawback of having to specify the number of clusters in K-Means methods. Experimental analysis shows that the proposed method performs well on gene expression data when compared with traditional K-Means clustering and the Silhouette Coefficient cluster measure.
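For readers unfamiliar with the Silhouette Coefficient mentioned above, here is a minimal sketch of its standard definition using Euclidean distance. It is not taken from the paper; the function name `silhouette_score`, the tuple-based point format, and the convention of scoring singleton clusters as 0 are choices made for this illustration.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself adds nothing to the sum)
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != label)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Two well-separated groups: a sensible labelling scores close to 1,
# while a labelling that mixes the groups scores below 0.
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(pts, [0, 0, 1, 1])
bad = silhouette_score(pts, [0, 1, 0, 1])
```

Values near 1 indicate compact, well-separated clusters, which is why the measure is often used to compare clusterings produced with different numbers of clusters.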

