Results 1–10 of 28
Scalable Algorithms for Association Mining
IEEE Transactions on Knowledge and Data Engineering, 2000
Cited by 182 (22 self)
Association rule discovery has emerged as an important problem in knowledge discovery and data mining. The association mining task consists of identifying the frequent itemsets and then forming conditional implication rules among them. In this paper we present efficient algorithms for the discovery of frequent itemsets, which forms the compute-intensive phase of the task. The algorithms utilize the structural properties of frequent itemsets to facilitate fast discovery. The items are organized into a subset lattice search space, which is decomposed into small, independent chunks (sublattices) that can be solved in memory. We present efficient lattice traversal techniques that quickly identify all the long frequent itemsets and, if required, their subsets. We also evaluate the effect of using different database layout schemes combined with the proposed decomposition and traversal techniques. We experimentally compare the new algorithms against the previous approaches, obtaining ...
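The decomposition described above can be illustrated with a small depth-first miner over a vertical (item to transaction-id set) layout, in which each prefix defines an independent class of itemsets and supports are obtained by intersecting tid-sets. This is a minimal sketch under those assumptions, not the paper's implementation; the function names `vertical_layout`, `eclat`, and `frequent_itemsets` are hypothetical, and `min_support` is assumed to be an absolute transaction count.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

def vertical_layout(transactions: List[Set[str]]) -> Dict[str, FrozenSet[int]]:
    """Convert horizontal transactions into an item -> tid-set (vertical) layout."""
    tids: Dict[str, Set[int]] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tids.setdefault(item, set()).add(tid)
    return {item: frozenset(s) for item, s in tids.items()}

def eclat(prefix: Tuple[str, ...],
          members: List[Tuple[str, FrozenSet[int]]],
          min_support: int,
          out: Dict[Tuple[str, ...], int]) -> None:
    """Depth-first search over one prefix-based class (a sublattice)."""
    for i, (item_i, tids_i) in enumerate(members):
        itemset = prefix + (item_i,)
        out[itemset] = len(tids_i)
        # Build the child class: extend itemset with each later member.
        child = []
        for item_j, tids_j in members[i + 1:]:
            common = tids_i & tids_j  # support via tid-set intersection
            if len(common) >= min_support:
                child.append((item_j, common))
        if child:
            eclat(itemset, child, min_support, out)

def frequent_itemsets(transactions: List[Set[str]],
                      min_support: int) -> Dict[Tuple[str, ...], int]:
    vertical = vertical_layout(transactions)
    # Frequent single items, sorted by item name for a canonical order.
    singles = sorted(((item, t) for item, t in vertical.items()
                      if len(t) >= min_support), key=lambda p: p[0])
    out: Dict[Tuple[str, ...], int] = {}
    eclat((), singles, min_support, out)
    return out
```

For example, with the five transactions `{a,b,c}`, `{a,b}`, `{a,c}`, `{b,c}`, `{a,b,c}` and `min_support=3`, every single item and every pair is frequent, while `{a,b,c}` (support 2) is not. Because each class only needs its own tid-sets, the classes can be processed independently of one another.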
Subgraph Isomorphism in Planar Graphs and Related Problems
1999
Cited by 109 (3 self)
We solve the subgraph isomorphism problem in planar graphs in linear time, for any pattern of constant size. Our results are based on a technique of partitioning the planar graph into pieces of small treewidth, and applying dynamic programming within each piece. The same methods can be used to solve other planar graph problems including connectivity, diameter, girth, induced subgraph isomorphism, and shortest paths.
Theoretical foundations of association rules
In 3rd ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 1998
Cited by 94 (10 self)
In this paper we describe a formal framework for the problem of mining association rules. The theoretical foundation is based on the field of formal concept analysis. A concept is composed of closed subsets of attributes (itemsets) and objects (transactions). We show that all frequent itemsets are uniquely determined by the frequent concepts. We further show how this lattice-theoretic framework can be used to find a small rule-generating set, from which one can infer all other association rules.
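The closure operator behind formal concept analysis can be sketched directly: the closure of an itemset is the set of items shared by every transaction that contains it, and an itemset always has the same support as its closure. The brute-force enumeration of frequent closed itemsets below is only an illustration for tiny inputs (it is exponential in the number of items); the helper names `tidset`, `closure`, and `frequent_closed` are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Set

def tidset(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> FrozenSet[int]:
    """Objects (transaction ids) containing every item of the itemset."""
    return frozenset(t for t, items in enumerate(transactions) if itemset <= items)

def closure(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> FrozenSet[str]:
    """Items common to all transactions that contain the itemset."""
    tids = tidset(itemset, transactions)
    if not tids:
        # No supporting transaction: by convention the closure is every item.
        return frozenset().union(*transactions)
    return frozenset.intersection(*(transactions[t] for t in tids))

def frequent_closed(transactions: List[FrozenSet[str]], min_support: int) -> Set[FrozenSet[str]]:
    """Brute force: close every itemset and keep the frequent closures."""
    items = sorted(frozenset().union(*transactions))
    closed: Set[FrozenSet[str]] = set()
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            c = closure(frozenset(combo), transactions)
            if len(tidset(c, transactions)) >= min_support:
                closed.add(c)
    return closed
```

With transactions `abc`, `ab`, `ac`, `abd` and a minimum support of 2, the frequent closed itemsets are `{a}`, `{a,b}`, and `{a,c}`; the frequent itemset `{b}` is not closed, but it has the same support (3) as its closure `{a,b}`.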
On the Complexity of Generating Maximal Frequent and Minimal Infrequent Sets
2002
Cited by 39 (9 self)
Let A be an m × n binary matrix, t ∈ {1, ..., m} be a threshold, and ε > 0 be a positive parameter. We show that, given a family of O(n^ε) maximal t-frequent column sets for A, it is NP-complete to decide whether A has any further maximal t-frequent sets or not, even when the number of such additional maximal t-frequent column sets may be exponentially large. In contrast, all minimal t-infrequent sets of columns of A can be enumerated in incremental quasi-polynomial time. The proof of the latter result follows from the inequality α ≤ (m − t + 1)β, where α and β are respectively the numbers of all maximal t-frequent and all minimal t-infrequent sets of columns of the matrix A. We also discuss the complexity of generating all closed t-frequent column sets for a given binary matrix.
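The definitions of t-frequent and t-infrequent column sets can be checked by brute force on a small matrix. The sketch below lists the maximal t-frequent and minimal t-infrequent sets and compares their counts against the bound α ≤ (m − t + 1)β stated in the abstract. It is exponential in n and only illustrates the definitions, unlike the incremental quasi-polynomial enumeration the abstract describes; the function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

Matrix = List[List[int]]

def is_frequent(cols: FrozenSet[int], A: Matrix, t: int) -> bool:
    """A column set is t-frequent if at least t rows are 1 on every column in it."""
    return sum(all(row[j] for j in cols) for row in A) >= t

def maximal_frequent_minimal_infrequent(
        A: Matrix, t: int) -> Tuple[Set[FrozenSet[int]], Set[FrozenSet[int]]]:
    n = len(A[0])
    subsets = [frozenset(c) for k in range(n + 1) for c in combinations(range(n), k)]
    frequent = {s for s in subsets if is_frequent(s, A, t)}
    # t-frequency is preserved under taking subsets, so single-column
    # extensions decide maximality and single-column removals decide minimality.
    maximal = {s for s in frequent
               if all(s | {j} not in frequent for j in range(n) if j not in s)}
    minimal = {s for s in subsets
               if s not in frequent and all(s - {j} in frequent for j in s)}
    return maximal, minimal
```

For the 3 × 3 matrix with rows `110`, `101`, `011` and t = 2, the maximal 2-frequent sets are the three single columns and the minimal 2-infrequent sets are the three column pairs, so α = β = 3, well within (m − t + 1)β = 6. The check on one example does not, of course, prove the inequality.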
New Algorithms for Enumerating All Maximal Cliques
2004
Cited by 33 (1 self)
In this paper, we consider the problems of generating all maximal (bipartite) cliques in a given (bipartite) graph G = (V, E) with n vertices and m edges. We propose two algorithms for enumerating all maximal cliques. One runs with O(M(n)) time delay and in O(n^2) space, and the other runs with O(∆^4) time delay and in O(n + m) space, where ∆ denotes the maximum degree of G and M(n) denotes the time needed to multiply two n × n matrices; the latter requires O(nm) preprocessing time. For a given bipartite graph G, we propose three algorithms for enumerating all maximal bipartite cliques. The first algorithm runs with O(M(n)) time delay and in O(n^2) space, which immediately follows from the algorithm for the non-bipartite case. The second one runs with O(∆^3) time delay and in O(n + m) space, and the last one runs with O(∆^2) time delay and in O(n + m + N∆) space, where N denotes the number of all maximal bipartite cliques in G; both of these algorithms require O(nm) preprocessing time. Our algorithms improve upon all the existing algorithms when G is either dense or sparse. Furthermore, computational experiments show that our algorithms for sparse graphs perform very well on randomly generated graphs and on graphs that appear in real-world problems.
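For comparison, the classic Bron–Kerbosch procedure with pivoting also enumerates all maximal cliques of a graph. It is a different, well-known technique, not one of the polynomial-delay algorithms proposed in the paper, and it offers no O(∆^4) delay guarantee. The sketch assumes the graph is given as a dictionary of adjacency sets; the function name is hypothetical.

```python
from typing import Dict, List, Set

def bron_kerbosch(adj: Dict[int, Set[int]]) -> List[Set[int]]:
    """Enumerate all maximal cliques (Bron-Kerbosch with pivoting)."""
    cliques: List[Set[int]] = []

    def expand(r: Set[int], p: Set[int], x: Set[int]) -> None:
        # r: current clique, p: candidates, x: vertices already explored.
        if not p and not x:
            cliques.append(r)
            return
        # Pivot on the vertex with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques
```

On a triangle 0-1-2 with a pendant edge 2-3, it reports the two maximal cliques `{0, 1, 2}` and `{2, 3}`, in an order that depends on set iteration.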
Confluent Drawings: Visualizing Non-Planar Diagrams in a Planar Way
Graph Drawing (Proc. GD ’03), volume 2912 of Lecture Notes in Computer Science, 2003
Cited by 29 (8 self)
We introduce a new approach for drawing diagrams. Our approach is to use a technique we call confluent drawing for visualizing non-planar graphs in a planar way. This approach allows us to draw, in a crossing-free manner, graphs (such as software interaction diagrams) that would normally have many crossings. The main idea of this approach is quite simple: we allow groups of edges to be merged together and drawn as “tracks” (similar to train tracks). Producing such confluent drawings automatically from a graph with many crossings is quite challenging; however, we offer a heuristic algorithm (one version for undirected graphs and one version for directed ones) to test if a non-planar graph can be drawn efficiently in a confluent way. In addition, we identify several large classes of graphs that can be completely categorized as being either confluently drawable or confluently non-drawable.
Consensus Algorithms for the Generation of All Maximal Bicliques
2002
Cited by 26 (4 self)
We describe a new algorithm for generating all maximal bicliques (i.e. complete bipartite, not necessarily induced subgraphs) of a graph. The algorithm is inspired by, and is quite similar to, the consensus method used in propositional logic. We show that some variants of the algorithm are totally polynomial, and even incrementally polynomial. The total complexity of the most efficient variant of the algorithms presented here is polynomial in the input size, and only linear in the output size. Computational experiments demonstrate its high efficiency on randomly generated graphs with up to 2,000 vertices and 20,000 edges.
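A simplified, consensus-style closure can be sketched for the special case of a bipartite graph with sides U and W: start from the stars of U, repeatedly combine two bicliques by intersecting their W-sides, and extend each result to all U-vertices adjacent to that whole intersection. This is an illustration under stated assumptions, not the paper's algorithm: the paper handles arbitrary graphs and non-induced bicliques, and this version carries none of its complexity guarantees. The function name and input format are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Biclique = Tuple[FrozenSet[str], FrozenSet[str]]

def maximal_bicliques(adj: Dict[str, Set[str]]) -> Set[Biclique]:
    """adj maps each vertex of side U to its set of neighbours on side W."""
    def extend(w_side: FrozenSet[str]) -> Biclique:
        # Largest U-side whose vertices are adjacent to all of w_side.
        u_side = frozenset(u for u, nbrs in adj.items() if w_side <= nbrs)
        return (u_side, w_side)

    # Initial family: one extended star per U-vertex that has a neighbour.
    found: Set[Biclique] = {extend(frozenset(nbrs)) for nbrs in adj.values() if nbrs}
    frontier: List[Biclique] = list(found)
    while frontier:
        new: List[Biclique] = []
        for _, w1 in frontier:
            for _, w2 in list(found):
                common = w1 & w2  # consensus-like combination of two bicliques
                if common:
                    candidate = extend(common)
                    if candidate not in found:
                        found.add(candidate)
                        new.append(candidate)
        frontier = new
    return found
```

Every W-side produced is an intersection of neighbourhoods, and extending its U-side closes both sides, so each result is a maximal biclique; conversely, the W-side of any maximal biclique is the intersection of its U-vertices' neighbourhoods and is therefore reached.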
On Maximal Frequent and Minimal Infrequent Sets in Binary Matrices
2002
Cited by 9 (2 self)
Given an m × n binary matrix A, a subset C of the columns is called t-frequent if there are at least t rows in A in which all entries belonging to C are nonzero. Let us denote by α the number of maximal t-frequent sets of A, and let β denote the number of those minimal column subsets of A which are not t-frequent (so-called t-infrequent sets). We prove
Undue influence: Eliminating the impact of link plagiarism on web search rankings
In Proceedings of the 21st Annual ACM Symposium on Applied Computing, 2006
Cited by 6 (1 self)
Link farm spam and replicated pages can greatly degrade link-based ranking algorithms such as HITS. In order to identify and neutralize link farm spam and replicated pages, we look for sufficient material copied from one page to another. In particular, we focus on the use of “complete hyperlinks” to distinguish link targets by the anchor text used. We build and analyze the bipartite graph of documents and their complete hyperlinks to find pages that share anchor text and link targets. Link farms and replicated pages are identified in this process, permitting the influence of problematic links to be reduced in a weighted adjacency matrix. Experiments and user evaluation show significant improvement in the quality of results produced using HITS-like methods.
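To see how reducing link weights in the adjacency matrix changes HITS scores, here is a minimal weighted HITS power iteration. The weights in the usage example (0.1 for suspected farm links) are hypothetical; the paper derives its reductions from the complete-hyperlink analysis, which this sketch does not implement.

```python
import math
from typing import Dict, Tuple

Edge = Tuple[str, str]

def hits(weights: Dict[Edge, float],
         iterations: int = 50) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Weighted HITS: weights[(u, v)] scales the influence of the link u -> v."""
    nodes = sorted({u for u, _ in weights} | {v for _, v in weights})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}

    def normalize(scores: Dict[str, float]) -> Dict[str, float]:
        norm = math.sqrt(sum(s * s for s in scores.values())) or 1.0
        return {n: s / norm for n, s in scores.items()}

    for _ in range(iterations):
        # Authority score: weighted sum of hub scores of pages linking in.
        new_auth = {n: 0.0 for n in nodes}
        for (u, v), w in weights.items():
            new_auth[v] += w * hub[u]
        auth = normalize(new_auth)
        # Hub score: weighted sum of authority scores of pages linked to.
        new_hub = {n: 0.0 for n in nodes}
        for (u, v), w in weights.items():
            new_hub[u] += w * auth[v]
        hub = normalize(new_hub)
    return hub, auth
```

With two ordinary pages linking to `target` and three farm pages linking to `spam`, unit weights give `spam` the higher authority score; down-weighting the three farm links to 0.1 reverses the ranking, so `target` comes out ahead.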