Results 1 - 10 of 399

Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions

by Kim T. Simons, Charles Kooperberg, Enoch Huang, David Baker - J. Mol. Biol., 1997
"... We explore the ability of a simple simulated annealing procedure to assemble native-like structures from fragments of unrelated protein structures with similar local sequences using Bayesian scoring functions. Environment and residue pair specific contributions to the scoring functions appear as the ..."
Cited by 393 (70 self)
as the first two terms in a series expansion for the residue probability distributions in the protein database; the decoupling of the distance and environment dependencies of the distributions resolves the major problems with current database-derived scoring functions noted by Thomas and Dill. The simulated

Parallel and distributed frequent itemset mining on dynamic datasets

by Adriano Veloso, Matthew Eric Otey, Srinivasan Parthasarathy, Wagner Meira - In Proc. of the High Performance Computing Conference, HiPC, 2003
"... Traditional methods for data mining typically make the assumption that data is centralized and static. This assumption is no longer tenable. Such methods waste computational and I/O resources when data is dynamic, and they impose excessive communication overhead when data is distributed. As ..."
Cited by 3 (2 self)
As a result, the knowledge discovery process is harmed by slow response times. Efficient implementation of incremental data mining ideas in distributed computing environments is thus becoming crucial for ensuring scalability and facilitating knowledge discovery when data is dynamic and distributed

Distributed Frequent Closed Itemsets Mining

by Zheng Zheng, Chun Liu, Zheng Zheng, Kai-yuan Cai, Shichao Zhang , 2008
"... As many large organizations have multiple data sources and the scale of datasets becomes larger and larger, it is inevitable to carry out data mining in a distributed environment. In this paper, we address the problem of mining global frequent closed itemsets in a distributed environment. A novel a ..."

Finding the True Frequent Itemsets

by Matteo Riondato, Fabio Vandin, 2014
"... Frequent Itemsets (FIs) mining is a fundamental primitive in knowledge discovery. It requires identifying all itemsets appearing in at least a fraction θ of a transactional dataset D. Often though, the ultimate goal of mining D is not an analysis of the dataset per se, but the understanding of the u ..."
Cited by 2 (2 self)
of the underlying process that generated it. Specifically, in many applications D is a collection of samples obtained from an unknown probability distribution π on transactions, and by extracting the FIs in D one attempts to infer itemsets that are frequently (i.e., with probability at least θ) generated by π

Parallel and distributed methods for incremental frequent itemset mining

by Matthew Eric Otey, Srinivasan Parthasarathy, Chao Wang, Adriano Veloso, Wagner Meira - IEEE Transactions on Systems, Man and Cybernetics , 2004
"... Traditional methods for data mining typically make the assumption that the data is centralized, memory-resident, and static. This assumption is no longer tenable. Such methods waste computational and input/output (I/O) resources when data is dynamic, and they impose excessive communication ..."
Cited by 7 (1 self)
... which imposes minimal communication overhead for mining distributed dynamic datasets. Our distributed approach is capable of generating local models (in which each site has a summary of its own database) as well as the global model of frequent itemsets (in which all sites have a summary of the entire

A Novel Methodology of Frequent Itemset Mining on Hadoop

by Dhamdhere Jyoti L., Prof. Deshpande Kiran B.
"... Frequent Itemset Mining is one of the classical data mining problems in most data mining applications. It requires very large computations and I/O traffic capacity. Also, resources such as a single processor’s memory and CPU are very limited, which degrades the performance of the algorithm. ..."
In this paper we have proposed one such distributed algorithm, which will run on Hadoop, one of the most popular recent distributed frameworks, which mainly focuses on the MapReduce paradigm. The proposed approach takes into account inherent characteristics of the Apriori algorithm related to the frequent itemset

MAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH

by M. Rajalakshmi, Dr. T. Purusothaman, Dr. R. Nedunchezhian
"... Finding frequent itemsets in a data source is a fundamental operation behind Association Rule Mining. Generally, many algorithms use either the bottom-up or top-down approaches for finding these frequent itemsets. When the length of frequent itemsets to be found is large, the traditional a ..."
minimal key discovery and theory extraction. In this paper, we suggest a novel method for finding the maximal frequent itemset from huge data sources using the concept of segmentation of data source and prioritization of segments. Empirical evaluation shows that this method outperforms various other known

Distributed Frequent Itemsets Mining in Heterogeneous Platforms

by Lamine M. Aouad, Nhien-an Le-khac, Tahar M. Kechadi
"... Huge amounts of datasets with different sizes are naturally distributed over the network. In this paper we propose a distributed algorithm for frequent itemsets generation on heterogeneous clusters and grid environments. In addition to the disparity in the performance and the workload cap ..."
Cited by 4 (0 self)

DTFIM: Distributed Trie-based Frequent Itemset Mining

by E. Ansari, G. H. Dastghaibifard, M. Keshtkaran
"... Finding association rules is one of the most investigated fields of data mining. Computation and communication are two important factors in distributed association rule mining. In this problem, association rules are generated by first mining frequent itemsets in distributed data. In thi ..."
Cited by 1 (0 self)
In this paper we proposed a new distributed trie-based algorithm (DTFIM) to find frequent itemsets. This algorithm is proposed for a multi-computer environment. In the second phase we added an idea from the FDM algorithm for the candidate generation step. Experimental evaluations on different sorts of distributed data show

Fast Parallel Mining of Frequent Itemsets

by H. D. K. Moonesinghe, Moon-jung Chung, Pang-ning Tan
"... Association rule mining has become an essential data mining technique in various fields and the massive growth of the available data demands more and more computational power. To address this issue, it is necessary to study parallel implementations of such algorithms. In this paper, we propose a par ..."
Cited by 1 (0 self)
dynamic task scheduling strategy at different stages of the algorithm to achieve good workload balancing among processors at runtime. According to experimental results with data sets generated by the IBM synthetic data generator on a 32-processor distributed memory environment (Terascale Computing System

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University