Topology-free querying of protein interaction networks
In Proceedings of 13th RECOMB, 2009
Cited by 26 (2 self)
Abstract. In the network querying problem, one is given a protein complex or pathway of species A and a protein–protein interaction network of species B; the goal is to identify subnetworks of B that are similar to the query. Existing approaches mostly depend on knowledge of the interaction topology of the query in the network of species A; however, in practice, this topology is often not known. To address this problem, we develop a topology-free querying algorithm, which we call Torque. Given a query, represented as a set of proteins, Torque seeks a matching set of proteins that are sequence-similar to the query proteins and span a connected region of the network, while allowing both insertions and deletions. The algorithm uses, alternatively, dynamic programming or integer linear programming for the search task. We test Torque with queries from yeast, fly, and human, where we compare it to the topology-based QNet approach, and with queries from less studied species, where only topology-free algorithms apply. Torque detects many more matches than QNet, while in both cases giving results that are highly functionally coherent.
GADDI: Distance index based subgraph matching in biological networks
In Proceedings of the 12th International Conference on Extending Database Technology (EDBT '09), 2009
Cited by 25 (2 self)
Currently, a huge amount of biological data can be naturally represented by graphs, e.g., protein interaction networks, gene regulatory networks, etc. The need for indexing large graphs is an urgent research problem of great practical importance. The main challenge is size: each graph may contain thousands of vertices or more. Most of the previous work focuses on indexing a set of small or medium-sized database graphs (with only tens of vertices) and finding whether a query graph occurs in any of these. In this paper, we are interested in finding all the matches of a query graph in a given large graph of thousands of vertices, which is a very important task in many biological applications. This increases the complexity significantly. We propose a novel distance measurement which reintroduces the idea of frequent substructures in a single large graph. We devise a novel structure-distance-based approach (GADDI) to efficiently find matches of the query graph. GADDI is further optimized by the use of a dynamic matching scheme to minimize redundant calculations. Finally, a number of real and synthetic data sets are used to evaluate the efficiency and scalability of our proposed method.
SAPPER: Subgraph Indexing and Approximate Matching in Large Graphs
Cited by 22 (0 self)
With the emergence of new applications, e.g., computational biology, new software engineering techniques, social networks, etc., more data is in the form of graphs. Locating occurrences of a query graph in a large database graph is an important research topic. Due to the existence of noise (e.g., missing edges) in the large database graph, we investigate the problem of approximate subgraph indexing, i.e., finding the occurrences of a query graph in a large database graph with (possible) missing edges. The SAPPER method is proposed to solve this problem. Utilizing the hybrid neighborhood unit structures in the index, SAPPER takes advantage of pregenerated random spanning trees and a carefully designed graph enumeration order. Real and synthetic data sets are employed to demonstrate the efficiency and scalability of our approximate subgraph indexing method.
Parameterized Algorithms and Hardness Results for Some Graph Motif Problems
Cited by 18 (2 self)
Abstract. We study the NP-complete Graph Motif problem: given a vertex-colored graph G = (V, E) and a multiset M of colors, does there exist an S ⊆ V such that G[S] is connected and carries exactly (also with respect to multiplicity) the colors in M? We present an improved randomized algorithm for Graph Motif with running time O(4.32^{|M|} · |M|^2 · |E|). We extend our algorithm to list-colored graph vertices and the case where the motif G[S] need not be connected. By way of contrast, we show that extending the request for motif connectedness to the somewhat "more robust" motif demands of biconnectedness or bridge-connectedness leads to W[1]-complete problems. Actually, we show that the presumably simpler problems of finding (uncolored) biconnected or bridge-connected subgraphs are W[1]-complete with respect to the subgraph size. Answering an open question from the literature, we further show that the parameter "number of connected motif components" leads to W[1]-hardness even when restricted to graphs that are paths.
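The Graph Motif problem stated in this abstract can be made concrete with a small exhaustive checker. This is only an illustrative sketch of the problem definition, not the randomized algorithm from the paper: it tries every vertex subset of size |M|, so it is exponential and suited to toy inputs. The function names, the adjacency-dictionary representation, and the example graph are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def is_connected(vertices, adj):
    """Return True if the subgraph induced by `vertices` is connected."""
    vertices = set(vertices)
    if not vertices:
        return False
    start = next(iter(vertices))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for w in adj.get(u, ()):
            # Only walk along edges that stay inside the candidate set.
            if w in vertices and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == vertices


def graph_motif_bruteforce(adj, color, motif):
    """Return a set S with G[S] connected and colors equal to `motif`
    (as a multiset), or None if no such set exists.

    Brute force over all subsets of size |M|; NOT the paper's algorithm.
    """
    target = Counter(motif)
    k = sum(target.values())
    for subset in combinations(sorted(adj), k):
        same_colors = Counter(color[v] for v in subset) == target
        if same_colors and is_connected(subset, adj):
            return set(subset)
    return None


# Hypothetical toy instance: the path a - b - c - d.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
color = {"a": "red", "b": "blue", "c": "red", "d": "green"}

match = graph_motif_bruteforce(adj, color, ["red", "blue", "green"])
```

On the toy path, the motif {red, blue, green} is matched by the connected set {b, c, d}, whereas {a, b, d} has the right colors but is rejected because it is not connected.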
Counting Stars and Other Small Subgraphs in Sublinear Time
Cited by 10 (3 self)
Detecting and counting the number of copies of certain subgraphs (also known as network motifs or graphlets) is motivated by applications in a variety of areas ranging from Biology to the study of the World Wide Web. Several polynomial-time algorithms have been suggested for counting or detecting the number of occurrences of certain network motifs. However, a need for more efficient algorithms arises when the input graph is very large, as is indeed the case in many applications of motif counting. In this paper we design sublinear-time algorithms for approximating the number of copies of certain constant-size subgraphs in a graph G. That is, our algorithms do not read the whole graph, but rather query parts of the graph. Specifically, we consider algorithms that may query the degree of any vertex of their choice and may ask for any neighbor of any vertex of their choice. The main focus of this work is on the basic problem of counting the number of length-2 paths and more generally on counting the number of stars of a certain size. Specifically, we design an algorithm that, given an approximation parameter 0 < ε < 1 and query access to a graph G, outputs an estimate ν̂_s such that with high constant probability, (1 − ε)·ν_s(G) ≤ ν̂_s ≤ (1 + ε)·ν_s(G), where ν_s(G) denotes the number of stars of size s + 1 in the graph. The expected query complexity and running time of the algorithm are O ...
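For intuition about the quantity ν_s(G) in this abstract, the following sketch computes it exactly from vertex degrees, using the standard identity that a vertex of degree d is the center of C(d, s) stars with s leaves (for s ≥ 2). It also includes a naive estimator that samples vertices uniformly and uses only degree queries. That estimator is an assumption made for illustration; it is not the sublinear algorithm of the paper and carries none of its guarantees. All names are hypothetical.

```python
import random
from math import comb


def count_stars_exact(adj, s):
    """Exact number of stars with s leaves (s + 1 vertices), for s >= 2.

    Each vertex of degree d is the center of comb(d, s) such stars.
    Assumes a simple graph given as {vertex: list of distinct neighbors}.
    """
    return sum(comb(len(neighbors), s) for neighbors in adj.values())


def estimate_stars_naive(adj, s, samples, rng=None):
    """Unbiased but naive estimate from uniformly sampled degree queries.

    Illustration only: NOT the paper's algorithm, and it can have very
    high variance when a few high-degree vertices hold most of the stars.
    """
    rng = rng or random.Random(0)
    vertices = list(adj)
    total = 0
    for _ in range(samples):
        v = rng.choice(vertices)          # pick a uniformly random vertex
        total += comb(len(adj[v]), s)     # one degree query per sample
    return len(vertices) * total / samples


# Hypothetical toy graphs: a star with 4 leaves and a 5-cycle.
star = {"c": [1, 2, 3, 4], 1: ["c"], 2: ["c"], 3: ["c"], 4: ["c"]}
cycle = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}

length2_paths_in_star = count_stars_exact(star, 2)
length2_paths_in_cycle = count_stars_exact(cycle, 2)
```

With s = 2 the count is the number of length-2 paths, the basic case the abstract highlights. On a regular graph such as the 5-cycle every sample contributes the same value, so the naive estimate is exact; on the star graph almost all samples hit a leaf and contribute zero, which illustrates why uniform vertex sampling alone is a poor strategy in general.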
Algorithm Engineering for Color-Coding with Applications to Signaling Pathway Detection
2007
Cited by 8 (0 self)
Color-coding is a technique to design fixed-parameter algorithms for several NP-complete subgraph isomorphism problems. Somewhat surprisingly, not much work has so far been spent on the actual implementation of algorithms that are based on color-coding, despite the elegance of this technique and its wide range of applicability to practically important problems. This work gives various novel algorithmic improvements for color-coding, both from a worst-case perspective as well as under practical considerations. We apply the resulting implementation to the identification of signaling pathways in protein interaction networks, demonstrating that our improvements speed up the color-coding algorithm by orders of magnitude over previous implementations. This allows more complex and larger structures to be identified in reasonable time; many biologically relevant instances can even be solved in seconds where, previously, hours were required.
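As background for this abstract, here is a minimal sketch of textbook color-coding applied to one classic problem: deciding whether a graph contains a simple path on k vertices. Vertices are colored at random with k colors, and a dynamic program over (end vertex, set of used colors) looks for a path whose colors are all distinct, which is necessarily simple. This is the basic technique only and includes none of the engineering improvements the paper describes; the function names, trial count, and seed are assumptions for illustration.

```python
import random


def has_colorful_path(adj, coloring, k):
    """Return True if some path on k vertices uses k distinct colors."""
    # paths[v] holds the color sets of colorful paths that end at v.
    paths = {v: {frozenset([coloring[v]])} for v in adj}
    for _ in range(k - 1):
        longer = {v: set() for v in adj}
        for u in adj:
            for used in paths[u]:
                for w in adj[u]:
                    c = coloring[w]
                    if c not in used:       # extend only with a new color
                        longer[w].add(used | {c})
        paths = longer
    return any(paths[v] for v in adj)


def has_simple_path(adj, k, trials=200, seed=0):
    """Randomized test for a simple path on k vertices.

    One-sided error: True is always correct; False may be wrong with a
    probability that shrinks as `trials` grows.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        coloring = {v: rng.randrange(k) for v in adj}
        if has_colorful_path(adj, coloring, k):
            return True
    return False


# Hypothetical toy graphs: a path on 4 vertices and a star with 3 leaves.
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
star3 = {"c": ["x", "y", "z"], "x": ["c"], "y": ["c"], "z": ["c"]}

rainbow = has_colorful_path(path4, {0: 0, 1: 1, 2: 2, 3: 3}, 4)
```

A fixed k-vertex path becomes colorful with probability k!/k^k per random coloring, so the number of trials needed grows exponentially in k but not in the graph size, which is what makes the approach fixed-parameter tractable.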
Parameterized Algorithmics for Finding Connected Motifs in Biological Networks
IEEE Transactions on Computational Biology and Bioinformatics
Cited by 8 (0 self)
We study the NP-hard LIST-COLORED GRAPH MOTIF problem which, given an undirected list-colored graph G = (V, E) and a multiset M of colors, asks for maximum-cardinality sets S ⊆ V and M′ ⊆ M such that G[S] is connected and contains exactly (with respect to multiplicity) the colors in M′. LIST-COLORED GRAPH MOTIF has applications in the analysis of biological networks. We study LIST-COLORED GRAPH MOTIF with respect to three different parameterizations. For the parameters motif size |M| and solution size |S| we present fixed-parameter algorithms, whereas for the parameter |V| − |M| we show W[1]-hardness for general instances and achieve fixed-parameter tractability for a special case of LIST-COLORED GRAPH MOTIF. We implemented the fixed-parameter algorithms for parameters |M| and |S|, developed further speed-up heuristics for these algorithms, and applied them in the context of querying protein-interaction networks, demonstrating their usefulness for realistic instances. Furthermore, we show that extending the request for motif connectedness to stronger demands such as biconnectedness or bridge-connectedness leads to W[1]-hard problems when the parameter is the motif size |M|.
Approximating the number of Network Motifs
Cited by 7 (2 self)
Abstract. The World Wide Web, the Internet, coupled biological and chemical systems, neural networks, and social interacting species are only a few examples of systems composed of a large number of highly interconnected dynamical units. These networks contain characteristic patterns, termed network motifs, which occur far more often than in randomized networks with the same degree sequence. Several algorithms have been suggested for counting or detecting the number of induced or non-induced occurrences of network motifs in the form of trees and bounded-treewidth subgraphs of size O(log n), and of size at most 7 for some motifs. In addition, counting the number of motifs a node is part of was recently suggested as a method to classify nodes in the network. The promise is that the distribution of motifs a node participates in is an indication of its function in the network. Therefore, counting the number of network motifs a node is part of provides a major challenge. However, no such practical algorithm exists. We present several algorithms with time complexity O(e^{2k} · k · n · |E| · log(1/δ) / ε^2) that, for the first time, approximate for every vertex the number of non-induced occurrences of the motif the vertex is part of, for k-length cycles, k-length cycles with a chord, and (k − 1)-length paths, where k = O(log n), and for all motifs of size at most four. In addition, we show algorithms that approximate the total number of non-induced occurrences of these network motifs, when no efficient algorithm exists. Some of our algorithms use the color-coding technique.