## Automated Database Schema Design Using Mined Data Dependencies (1998)

Venue: | J. Amer. Soc. Inform. Sci |

Citations: | 6 - 0 self |

### BibTeX

@ARTICLE{Wong98automateddatabase,

author = {S. K. M. Wong and C. J. Butz and Y. Xiang},

title = {Automated Database Schema Design Using Mined Data Dependencies},

journal = {J. Amer. Soc. Inform. Sci},

year = {1998},

volume = {49},

pages = {455--470}

}

### Abstract

Data dependencies are used in database schema design to enforce the correctness of a database as well as to reduce redundant data. These dependencies are usually determined from the semantics of the attributes and are then enforced upon the relations. This paper describes a bottom-up procedure for discovering multivalued dependencies (MVDs) in observed data without knowing a priori the relationships amongst the attributes. The proposed algorithm is an application of the technique we designed for learning conditional independencies in probabilistic reasoning. A prototype system for automated database schema design has been implemented. Experiments were carried out to demonstrate both the effectiveness and efficiency of our method.

### Citations

7052 |
Probabilistic reasoning in intelligent systems: networks of plausible inference
- Pearl
- 1988
Citation Context ...s equivalent to probabilistic conditional independence in a uniform distribution. While many researchers (Dechter, 1990; Hill, 1993; Lauritzen & Spiegelhalter, 1988; Lee, 1983; Pearl and Verma, 1987; Pearl, 1988) have noticed similarities between these two distinct but closely related knowledge systems, the relationship delves far beyond mere similarities. It has been shown that a Bayesian network can be rep... |

1284 | Local Computations with Probabilities on Graphical Structures and Their Application to Expert Systems (with Discussion) - Lauritzen, Spiegelhalter - 1988 |

1240 | On information and sufficiency - Kullback, Leibler - 1951 |

1132 |
Algorithmic graph theory and perfect graphs
- Golumbic
- 1980
Citation Context ...ize of the clique searched for during the execution of Algorithm 1. We now analyze the worst case time complexity of the algorithm. Testing the chordality of G-pass can be performed in $O(|N|)$ time (Golumbic, 1980). A hypertree (junction tree) can be computed by a maximal spanning tree algorithm (Jensen, 1988). A maximal spanning tree of a graph with v nodes and e links can be computed in $O((v + e) \log v)$ time... |

1075 | A Bayesian Method for the Induction - Cooper, Herskovits - 1992 |

905 |
An Introduction to Bayesian Networks
- Jensen
- 1996
Citation Context ..., the above example clearly demonstrates that the notion of multivalued dependency in relational databases is closely connected to that of probabilistic conditional independence in Bayesian networks (Jensen, 1996; Pearl, 1988). We can take advantage of this relationship by applying our algorithm for learning probabilistic conditional independencies to one which learns multivalued dependencies. This is achieve... |

903 | Learning Bayesian networks: The combination of knowledge and statistical data - Heckerman, Geiger, et al. - 1995 |

431 |
The Theory of Relational Databases
- Maier
- 1983
Citation Context ... design has been implemented. Experiments were carried out to demonstrate both the effectiveness and efficiency of our method. 1 Introduction Traditionally, data dependencies in relational databases (Maier, 1983) are constraints inferred by the database designer from the semantics of the attributes involved. These constraints are the rules that the data must obey in all instances to safeguard the correctness... |

239 |
The ALARM monitoring system: A case study with two probabilistic inference techniques for belief networks
- Beinlich, Suermondt, et al.
- 1989
Citation Context ...sisted of a large database which contained duplicate tuples. These large databases were originally constructed from Bayesian networks of famous probabilistic experiments, namely the Alarm experiment (Beinlich et al., 1989) and the Fire experiment (Poole & Neufeld, 1988). As no duplicates are allowed in relational theory, the only task remaining was to remove all duplicates from the database. All probabilistic conditio... |

164 |
On the desirability of acyclic database schemes
- Beeri, Fagin, et al.
- 1983
Citation Context ...g all the discovered MVDs. In fact, the output schema satisfies an acyclic join dependency (Wong, Xiang & Nie, 1994c; Wong, Butz & Xiang, 1995) which is known to possess several desirable properties (Beeri et al., 1983) in database applications. Thereby, not only do we mine dependencies which hold in the observed data, but we also determine a desirable database schema. Our method can be seen as a bottom-up approach... |

151 |
Introduction to Algorithms: A Creative Approach
- Manber
- 1989
Citation Context ... A hypertree (junction tree) can be computed by a maximal spanning tree algorithm (Jensen, 1988). A maximal spanning tree of a graph with v nodes and e links can be computed in $O((v + e) \log v)$ time (Manber, 1989). Since a complete graph has $O(v^2)$ links, a maximal spanning tree can be computed in $O(v^2 \log v)$ time. Equivalently, computation of a hypertree of a chordal graph with k nodes with v cliques takes... |
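
As a rough illustration of the spanning-tree step quoted in the context above, the following Python sketch builds a maximal spanning tree over a complete graph whose nodes are cliques and whose link weights are the sizes of the clique intersections. It uses Kruskal's algorithm with a union-find structure, which is one standard way to reach the stated $O((v + e) \log v)$ bound; whether the paper uses this particular algorithm is not stated in the excerpt. The function name `maximal_spanning_tree` and the example cliques are hypothetical.

```python
from itertools import combinations


def maximal_spanning_tree(num_nodes, edges):
    """Kruskal's algorithm, taking the heaviest links first.

    edges is a list of (weight, u, v) triples with 0 <= u, v < num_nodes.
    Returns the chosen links as (u, v, weight) triples.
    """
    parent = list(range(num_nodes))

    def find(u):
        # Union-find lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    tree = []
    # Sorting dominates the cost: O(e log e), which is O(e log v).
    for weight, u, v in sorted(edges, reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the link joins two separate components
            parent[root_u] = root_v
            tree.append((u, v, weight))
    return tree


# Hypothetical cliques of a chordal graph (illustrative data only).
cliques = [{"a1", "a2", "a3"}, {"a2", "a3", "a4"}, {"a4", "a5"}]

# Complete graph over the cliques; weight = number of shared attributes.
edges = [
    (len(cliques[i] & cliques[j]), i, j)
    for i, j in combinations(range(len(cliques)), 2)
]

tree = maximal_spanning_tree(len(cliques), edges)
```

With v cliques the complete graph has v(v - 1)/2 links, which is where the $O(v^2 \log v)$ figure in the quoted passage comes from.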

150 |
Probabilistic Reasoning in Expert Systems
- Neapolitan
- 1990
Citation Context ...re similarities. It has been shown that a Bayesian network can be represented as an extended relational data model (Wong, Xiang & Nie, 1994c). A probabilistic model (Hajek, Havranek & Jirousek, 1992; Neapolitan, 1990; Pearl, 1988) can actually be implemented as a generalized relational database (Wong, Butz & Xiang, 1995). Furthermore, the Chase, a relational technique for determining the implication of a set of d... |

117 | Multivalued dependencies and a new normal form for relational databases
- Fagin
- 1977
Citation Context ... dependencies amongst the attributes. Dependencies such as functional dependencies (FDs) and multivalued dependencies (MVDs) have been studied extensively (Beeri, Fagin & Howard, 1977; Delobel, 1978; Fagin, 1977) as they play a key role in the design of desirable database schemas. In many applications, however, the exact relationships (i.e., data dependencies), determined by the semantics of the attributes i... |

81 | An algorithm for fast recovery of sparse causal graphs - Spirtes, Glymour - 1991 |

62 | A Complete Axiomatization for Functional and Multivalued Dependencies in Database Relations - Beeri, Fagin, et al. |

59 | KUTATO: An Entropy-Driven System for Construction of Probabilistic Expert Systems from Databases - Herskovits, Cooper - 1990 |

57 | A simplified universal relation assumption and its properties - Fagin, Mendelzon, et al. - 1982 |

41 |
Properties of Bayesian belief network learning algorithms
- Bouckaert
- 1994
Citation Context ...$a_3, a_4) = p(a_1, a_3) \cdot p(a_2) \cdot p(a_4)$ determined using the marginal distributions shown in Figure 20. As already stated, the problem of finding all MVDs which hold is NP-complete (Bouckaert, 1994). We should perhaps comment here why the MVD $a_4 \twoheadrightarrow a_3$ is not found. Algorithm 1 derives an acyclic hypergraph (i.e., a hypertree) using the observed data. Beeri et al. (Beeri et al., 1983) proved t... |

35 | Uncertain Information Processing in Expert Systems - Hájek, Havránek, et al. - 1992 |

31 |
Junction tree and decomposable hypergraphs
- Jensen
- 1988
Citation Context ...ime complexity of the algorithm. Testing the chordality of G-pass can be performed in $O(|N|)$ time (Golumbic, 1980). A hypertree (junction tree) can be computed by a maximal spanning tree algorithm (Jensen, 1988). A maximal spanning tree of a graph with v nodes and e links can be computed in $O((v + e) \log v)$ time (Manber, 1989). Since a complete graph has $O(v^2)$ links, a maximal spanning tree can be compute... |

29 | A method for implementing a probabilistic model as a relational database - Wong, Butz, et al. - 1995 |

28 | Critical remarks on single link search in learning belief networks - Xiang, Wong, et al. - 1996 |

27 | Learning Bayesian networks: an approach based on the MDL principle - Lam, Bacchus - 1994 |

26 | Bottom-up Induction of Functional Dependencies from Relations - Savnik, Flach - 1993 |

23 | A ‘microscopic’ study of minimum entropy search in learning decomposable Markov networks - Xiang, Wong, et al. - 1997 |

21 | Construction of a Markov network from data for probabilistic inference
- Wong, Xiang
- 1994
Citation Context ...ustrate the close relationship between MVD and probabilistic conditional independence. A formal examination of the relationship between MVD and probabilistic conditional independence can be found in (Wong, 1994a). Definition 2 Let $r[R]$ be a relation with $X, Y \subseteq R$. We say that the multivalued dependency (MVD) $X \twoheadrightarrow Y$ holds on relation $r[R]$ if for any $t_1, t_2 \in r$ with $t_1[X] = t_2[X]$, there exists a tuple t... |
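
The definition quoted above is cut off in this excerpt. As a minimal sketch, the Python code below tests the standard textbook condition for an MVD, assuming that is what the truncated text goes on to state: within each group of tuples that agree on X, every observed Y-value must appear together with every observed value of the remaining attributes Z = R - XY. The function name `mvd_holds` and the course/teacher/book data are hypothetical and do not come from the paper.

```python
from itertools import product


def mvd_holds(rows, attrs, x, y):
    """Return True if the MVD X ->> Y holds on the relation `rows`.

    rows  : iterable of tuples, one value per attribute in `attrs`
    attrs : attribute names, in column order
    x, y  : attribute names forming X and Y
    Assumption: standard textbook MVD condition (see lead-in).
    """
    index = {name: i for i, name in enumerate(attrs)}
    x = list(x)
    y = [a for a in y if a not in x]
    z = [a for a in attrs if a not in x and a not in y]

    def project(row, cols):
        return tuple(row[index[c]] for c in cols)

    # A relation is a set, so duplicate tuples are dropped first.
    seen = {(project(t, x), project(t, y), project(t, z)) for t in set(rows)}

    # Collect, per X-value, the Y-values and Z-values that occur with it.
    groups = {}
    for xv, yv, zv in seen:
        ys, zs = groups.setdefault(xv, (set(), set()))
        ys.add(yv)
        zs.add(zv)

    # The MVD holds iff every (Y, Z) combination is present in each group.
    return all(
        (xv, yv, zv) in seen
        for xv, (ys, zs) in groups.items()
        for yv, zv in product(ys, zs)
    )


# Hypothetical relation: teachers and books vary independently per course.
attrs = ("course", "teacher", "book")
full = [
    ("db", "smith", "ullman"),
    ("db", "smith", "maier"),
    ("db", "jones", "ullman"),
    ("db", "jones", "maier"),
]

holds_on_full = mvd_holds(full, attrs, ["course"], ["teacher"])
holds_on_partial = mvd_holds(full[:-1], attrs, ["course"], ["teacher"])
```

Dropping any one of the four tuples breaks the cross-product structure inside the `db` group, so the check fails on `full[:-1]`.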

20 | The Logic of Representing Dependencies by Directed Graphs - Pearl, Verma - 1987 |

19 |
Decomposing a relation into a tree of binary relations
- Dechter
- 1990
Citation Context ...en probabilistic reasoning systems and traditional relational database systems. In fact, MVD is equivalent to probabilistic conditional independence in a uniform distribution. While many researchers (Dechter, 1990; Hill, 1993; Lauritzen & Spiegelhalter, 1988; Lee, 1983; Pearl and Verma, 1987; Pearl, 1988) have noticed similarities between these two distinct but closely related knowledge systems, the relationsh... |

19 |
An axiomatic study of computation in hypertrees
- Shafer
- 1991
Citation Context ...ditional independencies from a joint distribution to discover multivalued dependencies in observed raw data. Before discussing our learning algorithm, let us first introduce the notions of hypertree (Shafer, 1991) and Markov distributions (Hajek, Havranek & Jirousek, 1992). Let $N = \{a_1, a_2, \ldots, a_m\}$ denote a set of attributes. We say that G is a hypergraph, if G is a subset of the power set $2^N$. An eleme... |

17 |
An Extended Relational Data Model for Probabilistic Reasoning
- Wong
- 1997

13 | Representation of Bayesian networks as relational databases
- Wong, Xiang, et al.
- 1994
Citation Context ...uation (15). Thus, for a given p, minimizing $I(p; p')$ is equivalent to minimizing the entropy $H(p')$. That is, $\min_{p'' \in \{p'\}} I(p; p'') = \min_{p'' \in \{p'\}} H(p'')$ (22). An approximate method (Wong & Xiang, 1994b) for computing $p'$ is outlined as follows. Initially, we may assume that all the attributes are probabilistically independent, i.e., there exists no edge between any two nodes (attributes) in the un... |
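
The excerpt above does not show Equation (15), so the following Python sketch only illustrates why minimizing the divergence and minimizing the entropy can coincide. It assumes that I(p; p') is the Kullback-Leibler divergence and takes the simplest case, where p' is the product of the single-attribute marginals of p (the "all attributes independent" starting point mentioned in the passage). Under those assumptions I(p; p') = H(p') - H(p), and since H(p) is fixed for a given p, the two minimizations agree. The joint distribution and all function names are hypothetical.

```python
from math import log2


def entropy(dist):
    """Shannon entropy in bits of a distribution given as {outcome: prob}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def kl_divergence(p, q):
    """Kullback-Leibler divergence I(p; q) in bits (assumed meaning of I)."""
    return sum(pv * log2(pv / q[k]) for k, pv in p.items() if pv > 0)


def marginal(dist, axis):
    """Marginal distribution of one attribute of a joint distribution."""
    out = {}
    for outcome, prob in dist.items():
        out[outcome[axis]] = out.get(outcome[axis], 0.0) + prob
    return out


# Hypothetical joint distribution over two binary attributes.
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# p' treats the attributes as independent, using the marginals of p.
m0, m1 = marginal(p, 0), marginal(p, 1)
p_indep = {(a, b): m0[a] * m1[b] for a in m0 for b in m1}

# Difference between I(p; p') and H(p') - H(p); zero up to rounding.
gap = kl_divergence(p, p_indep) - (entropy(p_indep) - entropy(p))
```

The identity relies on p' being assembled from marginals of p itself; for an arbitrary p' it does not hold in general.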

12 |
Algebraic theory of relational databases
- Lee
- 1983
Citation Context ...l database systems. In fact, MVD is equivalent to probabilistic conditional independence in a uniform distribution. While many researchers (Dechter, 1990; Hill, 1993; Lauritzen & Spiegelhalter, 1988; Lee, 1983; Pearl and Verma, 1987; Pearl, 1988) have noticed similarities between these two distinct but closely related knowledge systems, the relationship delves far beyond mere similarities. It has been show... |

11 |
Normalization and hierarchical dependencies in the relational data model
- Delobel
- 1978
Citation Context ...on the semantic dependencies amongst the attributes. Dependencies such as functional dependencies (FDs) and multivalued dependencies (MVDs) have been studied extensively (Beeri, Fagin & Howard, 1977; Delobel, 1978; Fagin, 1977) as they play a key role in the design of desirable database schemas. In many applications, however, the exact relationships (i.e., data dependencies), determined by the semantics of the... |

8 | Sound probabilistic inference in Prolog: an executable specification of influence diagrams - Poole, Neufeld - 1988 |

1 | Inductive characterization of database relations. Methodologies for Intelligent Systems - Flach - 1990 |

1 |
The recovery of causal play-trees from statistical data
- Rebane
- 1987
Citation Context ...itional independencies in probabilistic reasoning. There are many possible techniques (Cooper & Herskovits, 1992; Heckerman, Geiger & Chickering, 1995; Herskovits & Cooper, 1990; Lam & Bacchus, 1994; Rebane, 1987; Spirtes & Glymour, 1991) for learning a Bayesian distribution from observed data. As already mentioned, a Bayesian distribution involves embedded conditional independencies, whereas a Markov distrib... |