Scalable Algorithms for Association Mining
- IEEE Transactions on Knowledge and Data Engineering
, 2000
Cited by 138 (21 self)
Association rule discovery has emerged as an important problem in knowledge discovery and data mining. The association mining task consists of identifying the frequent itemsets, and then forming conditional implication rules among them. In this paper we present efficient algorithms for the discovery of frequent itemsets, which forms the compute-intensive phase of the task. The algorithms utilize the structural properties of frequent itemsets to facilitate fast discovery. The items are organized into a subset lattice search space, which is decomposed into small independent chunks or sub-lattices, which can be solved in memory. Efficient lattice traversal techniques are presented, which quickly identify all the long frequent itemsets, and their subsets if required. We also present the effect of using different database layout schemes combined with the proposed decomposition and traversal techniques. We experimentally compare the new algorithms against the previous approaches, obtaining ...
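As a rough illustration of the ideas in this abstract (a vertical database layout plus prefix-based decomposition of the itemset lattice), here is a minimal sketch. It is not the paper's algorithm; the function name `frequent_itemsets`, the `min_support` parameter, and the toy data are assumptions made for this example.

```python
def frequent_itemsets(transactions, min_support):
    """Depth-first lattice search using tid-set intersections (illustrative sketch)."""
    # Vertical layout: map each item to the ids of the transactions containing it.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # candidates: (item, tidset) pairs that are frequent together with prefix.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix + (item,)
            frequent[itemset] = len(tids)
            # Each prefix spans its own sub-lattice, explored independently.
            next_candidates = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_support:
                    next_candidates.append((other, shared))
            extend(itemset, next_candidates)

    roots = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend((), roots)
    return frequent


# Toy database of five transactions (hypothetical data).
db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = frequent_itemsets(db, min_support=3)
```

With this toy database, every single item and every pair reaches the support threshold of 3, while the triple {a, b, c} appears in only two transactions and is pruned.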
Subgraph Isomorphism in Planar Graphs and Related Problems
, 1999
Cited by 89 (1 self)
We solve the subgraph isomorphism problem in planar graphs in linear time, for any pattern of constant size. Our results are based on a technique of partitioning the planar graph into pieces of small tree-width, and applying dynamic programming within each piece. The same methods can be used to solve other planar graph problems including connectivity, diameter, girth, induced subgraph isomorphism, and shortest paths.
Theoretical foundations of association rules
- In 3rd ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery
, 1998
Cited by 77 (10 self)
In this paper we describe a formal framework for the problem of mining association rules. The theoretical foundation is based on the field of formal concept analysis. A concept is composed of closed subsets of attributes (itemsets) and objects (transactions). We show that all frequent itemsets are uniquely determined by the frequent concepts. We further show how this lattice-theoretic framework can be used to find a small rule generating set, from which one can infer all other association rules.
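The notion of a closed itemset can be made concrete with the standard closure operator from formal concept analysis: the closure of an itemset is the intersection of all transactions that contain it. The sketch below is only an illustration under that standard definition; the names `closure` and `support` and the toy data are assumptions, not taken from the paper.

```python
def support(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def closure(itemset, transactions):
    """Intersection of all transactions containing itemset."""
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        # Convention assumed here: an unsupported itemset closes to all items.
        return frozenset().union(*transactions)
    return frozenset.intersection(*covering)


# Toy transactions (hypothetical data).
T = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("abc")]
b_closed = closure(frozenset("b"), T)
```

Here the closure of {b} is {a, b}, and both itemsets have support 3, which is the sense in which the support of any itemset can be read off from its closed superset.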
On the Complexity of Generating Maximal Frequent and Minimal Infrequent Sets
, 2002
Cited by 35 (9 self)
Let A be an m × n binary matrix, t ∈ {1, . . . , m} be a threshold, and ε > 0 be a positive parameter. We show that given a family of O(n^ε) maximal t-frequent column sets for A, it is NP-complete to decide whether A has any further maximal t-frequent sets, or not, even when the number of such additional maximal t-frequent column sets may be exponentially large. In contrast, all minimal t-infrequent sets of columns of A can be enumerated in incremental quasi-polynomial time. The proof of the latter result follows from the inequality α ≤ (m − t + 1)β, where α and β are respectively the numbers of all maximal t-frequent and all minimal t-infrequent sets of columns of the matrix A. We also discuss the complexity of generating all closed t-frequent column sets for a given binary matrix.
Confluent drawings: Visualizing Non-Planar Diagrams in a Planar Way
- Graph Drawing (Proc. GD ’03), volume 2912 of Lecture Notes Comput. Sci.
, 2003
Cited by 21 (5 self)
We introduce a new approach for drawing diagrams. Our approach is to use a technique we call confluent drawing for visualizing non-planar graphs in a planar way. This approach allows us to draw, in a crossing-free manner, graphs—such as software interaction diagrams—that would normally have many crossings. The main idea of this approach is quite simple: we allow groups of edges to be merged together and drawn as “tracks” (similar to train tracks). Producing such confluent drawings automatically from a graph with many crossings is quite challenging; however, we offer a heuristic algorithm (one version for undirected graphs and one version for directed ones) to test if a non-planar graph can be drawn efficiently in a confluent way. In addition, we identify several large classes of graphs that can be completely categorized as being either confluently drawable or confluently non-drawable.
New Algorithms for Enumerating All Maximal Cliques
, 2004
Cited by 21 (1 self)
In this paper, we consider the problems of generating all maximal (bipartite) cliques in a given (bipartite) graph G = (V, E) with n vertices and m edges. We propose two algorithms for enumerating all maximal cliques. One runs with O(M(n)) time delay and in O(n²) space and the other runs with O(∆⁴) time delay and in O(n + m) space, where ∆ denotes the maximum degree of G, M(n) denotes the time needed to multiply two n × n matrices, and the latter one requires O(nm) time as a preprocessing. For a given bipartite graph G, we propose three algorithms for enumerating all maximal bipartite cliques. The first algorithm runs with O(M(n)) time delay and in O(n²) space, which immediately follows from the algorithm for the nonbipartite case. The second one runs with O(∆³) time delay and in O(n + m) space, and the last one runs with O(∆²) time delay and in O(n + m + N∆) space, where N denotes the number of all maximal bipartite cliques in G and both algorithms require O(nm) time as a preprocessing. Our algorithms improve upon all the existing algorithms, when G is either dense or sparse. Furthermore, computational experiments show that our algorithms for sparse graphs have significantly good performance for graphs which are generated randomly and appear in real-world problems.
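For readers unfamiliar with the enumeration problem itself, the sketch below lists all maximal cliques of a small graph using the classical Bron–Kerbosch procedure with pivoting. This is a different, older technique swapped in for illustration; it is not one of the algorithms proposed in the paper and does not carry their delay or space bounds. The adjacency-dictionary format and the name `maximal_cliques` are assumptions of this example.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch and pivoting (illustrative)."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates to add, x: already-explored vertices.
        if not p and not x:
            cliques.append(frozenset(r))
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


# Triangle 1-2-3 with a pendant edge 3-4 (hypothetical graph).
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
found = set(maximal_cliques(graph))
```

On this graph the maximal cliques are the triangle {1, 2, 3} and the edge {3, 4}.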
Consensus Algorithms for the Generation of All Maximal Bicliques
, 2002
Cited by 20 (4 self)
We describe a new algorithm for generating all maximal bicliques (i.e. complete bipartite, not necessarily induced subgraphs) of a graph. The algorithm is inspired by, and is quite similar to, the consensus method used in propositional logic. We show that some variants of the algorithm are totally polynomial, and even incrementally polynomial. The total complexity of the most efficient variant of the algorithms presented here is polynomial in the input size, and only linear in the output size. Computational experiments demonstrate its high efficiency on randomly generated graphs with up to 2,000 vertices and 20,000 edges.
Undue influence: Eliminating the impact of link plagiarism on web search rankings
- In Proceedings of the 21st Annual ACM Symposium on Applied Computing
, 2006
Cited by 6 (1 self)
Link farm spam and replicated pages can greatly deteriorate link-based ranking algorithms like HITS. In order to identify and neutralize link farm spam and replicated pages, we look for sufficient material copied from one page to another. In particular, we focus on the use of “complete hyperlinks” to distinguish link targets by the anchor text used. We build and analyze the bipartite graph of documents and their complete hyperlinks to find pages that share anchor text and link targets. Link farms and replicated pages are identified in this process, permitting the influence of problematic links to be reduced in a weighted adjacency matrix. Experiments and user evaluation show significant improvement in the quality of results produced using HITS-like methods.
On maximal frequent and minimal infrequent sets in binary matrices
- Annals of Mathematics and Artificial Intelligence
, 2003
Cited by 5 (1 self)
Given an m × n binary matrix A, a subset C of the columns is called t-frequent if there are at least t rows in A in which all entries belonging to C are nonzero. Let us denote by α the number of maximal t-frequent sets of A, and let β denote the number of those minimal column subsets of A which are not t-frequent (so called t-infrequent sets). We prove that the inequality α ≤ (m − t + 1)β holds for any binary matrix A in which not all column subsets are t-frequent. This inequality is sharp, and allows for an incremental quasi-polynomial algorithm for generating all minimal t-infrequent sets. We also prove that the analogous generation problem for maximal t-frequent sets is NP-hard. Finally, we discuss the complexity of generating closed frequent sets and some other related problems.
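The definitions of α and β can be checked by brute force on a tiny matrix. The following sketch enumerates every column subset, so it is exponential in n and only meant to illustrate the inequality α ≤ (m − t + 1)β, not the generation algorithms discussed in the paper. The function names and the example matrix are assumptions of this example.

```python
from itertools import combinations


def support(matrix, cols):
    """Number of rows whose entries are nonzero in every column of cols."""
    return sum(all(row[c] for c in cols) for row in matrix)


def border_counts(matrix, t):
    """Return (alpha, beta): counts of maximal t-frequent and minimal t-infrequent sets."""
    n = len(matrix[0])
    subsets = [frozenset(c) for k in range(n + 1) for c in combinations(range(n), k)]
    frequent = {s for s in subsets if support(matrix, s) >= t}
    # Support never grows when columns are added, so checking
    # single-column extensions and removals is sufficient.
    maximal = [
        s for s in frequent
        if all(s | {c} not in frequent for c in range(n) if c not in s)
    ]
    minimal_infrequent = [
        s for s in subsets
        if s not in frequent and all(s - {c} in frequent for c in s)
    ]
    return len(maximal), len(minimal_infrequent)


# A 4 x 3 binary matrix (hypothetical data) with threshold t = 2.
A = [[1, 1, 0], [1, 1, 1], [0, 1, 1], [1, 0, 1]]
m, t = len(A), 2
alpha, beta = border_counts(A, t)
```

For this matrix the three column pairs are the maximal 2-frequent sets (α = 3) and the full column set is the only minimal 2-infrequent set (β = 1), so the bound (m − t + 1)β = 3 is met with equality.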
An Approximation Ratio for Biclustering
, 2007
Cited by 5 (1 self)
The problem of biclustering consists of the simultaneous clustering of rows and columns of a matrix such that each of the submatrices induced by a pair of row and column clusters is as uniform as possible. In this paper we approximate the optimal biclustering by applying one-way clustering algorithms independently on the rows and on the columns of the input matrix. We show that such a solution yields a worst-case approximation ratio of 1 + √2 under L1-norm for 0–1 valued matrices, and of 2 under L2-norm for real valued matrices. Keywords: Approximation algorithms; Biclustering; One-way clustering

