Results 1 - 10 of 399
Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions
- J. Mol. Biol., 1997
"... We explore the ability of a simple simulated annealing procedure to assemble native-like structures from fragments of unrelated protein structures with similar local sequences using Bayesian scoring functions. Environment and residue pair specific contributions to the scoring functions appear as the ..."
Cited by 393 (70 self)
as the first two terms in a series expansion for the residue probability distributions in the protein database; the decoupling of the distance and environment dependencies of the distributions resolves the major problems with current database-derived scoring functions noted by Thomas and Dill. The simulated
Parallel and distributed frequent itemset mining on dynamic datasets
- In Proc. of the High Performance Computing Conference, HiPC, 2003
"... Abstract Traditional methods for data mining typically make the assumption that data is centralized and static. This assumption is no longer tenable. Such methods waste computational and I/O resources when data is dynamic, and they impose excessive communication overhead when data is distributed. As ..."
Cited by 3 (2 self)
As a result, the knowledge discovery process is harmed by slow response times. Efficient implementation of incremental data mining ideas in distributed computing environments is thus becoming crucial for ensuring scalability and facilitate knowledge discovery when data is dynamic and distributed
Distributed Frequent Closed Itemsets Mining
- 2008
"... As many large organizations have multiple data sources and the scale of dataset becomes larger and larger, it is inevitable to carry out data mining in the distributed environment. In this paper, we address the problem of mining global frequent closed itemsets in distributed environment. A novel a ..."
Finding the True Frequent Itemsets
- 2014
"... Frequent Itemsets (FIs) mining is a fundamental primitive in knowledge discovery. It requires to identify all itemsets appearing in at least a fraction θ of a transactional dataset D. Often though, the ultimate goal of mining D is not an analysis of the dataset per se, but the understanding of the u ..."
Cited by 2 (2 self)
of the underlying process that generated it. Specifically, in many applications D is a collection of samples obtained from an unknown probability distribution π on transactions, and by extracting the FIs in D one attempts to infer itemsets that are frequently (i.e., with probability at least θ) generated by π
Parallel and distributed methods for incremental frequent itemset mining
- IEEE Transactions on Systems, Man and Cybernetics, 2004
"... Abstract—Traditional methods for data mining typically make the assumption that the data is centralized, memory-resident, and static. This assumption is no longer tenable. Such methods waste computational and input/output (I/O) resources when data is dynamic, and they impose excessive communication ..."
Cited by 7 (1 self)
, which imposes minimal communication overhead for mining distributed dynamic datasets. Our distributed approach is capable of generating local models (in which each site has a summary of its own database) as well as the global model of frequent itemsets (in which all sites have a summary of the entire
A Novel Methodology of Frequent Itemset Mining on Hadoop
"... Abstract — Frequent Itemset Mining is one of the classical data mining problems in most of the data mining applications. It requires very large computations and I/O traffic capacity. Also resources like single processor’s memory and CPU are very limited, which degrades the performance of algorithm. ..."
In this paper we have proposed one such distributed algorithm which will run on Hadoop – one of the recent most popular distributed frameworks which mainly focus on MapReduce paradigm. The proposed approach takes into account inherent characteristics of the Apriori algorithm related to the frequent itemset
MAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH
"... ABSTRACT Finding frequent itemsets in a data source is a fundamental operation behind Association Rule Mining. Generally, many algorithms use either the bottom-up or top-down approaches for finding these frequent itemsets. When the length of frequent itemsets to be found is large, the traditional a ..."
minimal key discovery and theory extraction. In this paper, we suggest a novel method for finding the maximal frequent itemset from huge data sources using the concept of segmentation of data source and prioritization of segments. Empirical evaluation shows that this method outperforms various other known
Distributed Frequent Itemsets Mining in Heterogeneous Platforms
"... Abstract. Huge amounts of datasets with different sizes are naturally distributed over the network. In this paper we propose a distributed algorithm for frequent itemsets generation on heterogeneous clusters and grid environments. In addition to the disparity in the performance and the workload cap ..."
Cited by 4 (0 self)
DTFIM: Distributed Trie-based Frequent Itemset Mining
"... Abstract — Finding association rules is one of the most investigated fields of data mining. Computation and communication are two important factors in distributed association rule mining. In this problem Association rules are generated by first mining of frequent itemsets in distributed data. In thi ..."
Cited by 1 (0 self)
In this paper we proposed a new distributed trie-based algorithm (DTFIM) to find frequent itemsets. This algorithm is proposed for a multi-computer environment. In second phase we added an idea from FDM algorithm for candidate generation step. Experimental evaluations on different sort of distributed data show
Fast Parallel Mining of Frequent Itemsets
"... Association rule mining has become an essential data mining technique in various fields and the massive growth of the available data demands more and more computational power. To address this issue, it is necessary to study parallel implementations of such algorithms. In this paper, we propose a par ..."
Cited by 1 (0 self)
dynamic task scheduling strategy at different stages of the algorithm to achieve good workload balancing among processors at runtime. According to experimental results with data sets generated by the IBM synthetic data generator on a 32 processor distributed memory environment (Terascale Computing System