Results 1 - 10
of
42
Tensor Decompositions and Applications
- SIAM REVIEW
, 2009
"... This survey provides an overview of higher-order tensor decompositions, their applications, and available software. A tensor is a multidimensional or N -way array. Decompositions of higher-order tensors (i.e., N -way arrays with N ⥠3) have applications in psychometrics, chemometrics, signal proce ..."
Abstract
-
Cited by 95 (13 self)
- Add to MetaCart
This survey provides an overview of higher-order tensor decompositions, their applications, and available software. A tensor is a multidimensional or N -way array. Decompositions of higher-order tensors (i.e., N -way arrays with N ⥠3) have applications in psychometrics, chemometrics, signal processing, numerical linear algebra, computer vision, numerical analysis, data mining, neuroscience, graph analysis, etc. Two particular tensor decompositions can be considered to be higher-order extensions of the matrix singular value decompo-
sition: CANDECOMP/PARAFAC (CP) decomposes a tensor as a sum of rank-one tensors, and the Tucker decomposition is a higher-order form of principal components analysis. There are many other tensor decompositions, including INDSCAL, PARAFAC2, CANDELINC, DEDICOM, and PARATUCK2 as well as nonnegative variants of all of the above. The N-way Toolbox and Tensor Toolbox, both for MATLAB, and the Multilinear Engine are examples of software packages for working with tensors.
Efficient MATLAB computations with sparse and factored tensors
- SIAM JOURNAL ON SCIENTIFIC COMPUTING
, 2007
"... In this paper, the term tensor refers simply to a multidimensional or $N$-way array, and we consider how specially structured tensors allow for efficient storage and computation. First, we study sparse tensors, which have the property that the vast majority of the elements are zero. We propose stori ..."
Abstract
-
Cited by 33 (12 self)
- Add to MetaCart
In this paper, the term tensor refers simply to a multidimensional or $N$-way array, and we consider how specially structured tensors allow for efficient storage and computation. First, we study sparse tensors, which have the property that the vast majority of the elements are zero. We propose storing sparse tensors using coordinate format and describe the computational efficiency of this scheme for various mathematical operations, including those typical to tensor decomposition algorithms. Second, we study factored tensors, which have the property that they can be assembled from more basic components. We consider two specific types: A Tucker tensor can be expressed as the product of a core tensor (which itself may be dense, sparse, or factored) and a matrix along each mode, and a Kruskal tensor can be expressed as the sum of rank-1 tensors. We are interested in the case where the storage of the components is less than the storage of the full tensor, and we demonstrate that many elementary operations can be computed using only the components. All of the efficiencies described in this paper are implemented in the Tensor Toolbox for MATLAB.
Unsupervised multiway data analysis: A literature survey
- IEEE Transactions on Knowledge and Data Engineering
, 2008
"... Multiway data analysis captures multilinear structures in higher-order datasets, where data have more than two modes. Standard two-way methods commonly applied on matrices often fail to find the underlying structures in multiway arrays. With increasing number of application areas, multiway data anal ..."
Abstract
-
Cited by 24 (7 self)
- Add to MetaCart
Multiway data analysis captures multilinear structures in higher-order datasets, where data have more than two modes. Standard two-way methods commonly applied on matrices often fail to find the underlying structures in multiway arrays. With increasing number of application areas, multiway data analysis has become popular as an exploratory analysis tool. We provide a review of significant contributions in literature on multiway models, algorithms as well as their applications in diverse disciplines including chemometrics, neuroscience, computer vision, and social network analysis. 1.
Scalable tensor decompositions for multi-aspect data mining
- In ICDM 2008: Proceedings of the 8th IEEE International Conference on Data Mining
, 2008
"... Modern applications such as Internet traffic, telecommunication records, and large-scale social networks generate massive amounts of data with multiple aspects and high dimensionalities. Tensors (i.e., multi-way arrays) provide a natural representation for such data. Consequently, tensor decompositi ..."
Abstract
-
Cited by 17 (1 self)
- Add to MetaCart
Modern applications such as Internet traffic, telecommunication records, and large-scale social networks generate massive amounts of data with multiple aspects and high dimensionalities. Tensors (i.e., multi-way arrays) provide a natural representation for such data. Consequently, tensor decompositions such as Tucker become important tools for summarization and analysis. One major challenge is how to deal with highdimensional, sparse data. In other words, how do we compute decompositions of tensors where most of the entries of the tensor are zero. Specialized techniques are needed for computing the Tucker decompositions for sparse tensors because standard algorithms do not account for the sparsity of the data. As a result, a surprising phenomenon is observed by practitioners: Despite the fact that there is enough memory to store both the input tensors and the factorized output tensors, memory overflows occur during the tensor factorization process. To address this intermediate blowup problem, we propose Memory-Efficient Tucker (MET). Based on the available memory, MET adaptively selects the right execution strategy during the decomposition. We provide quantitative and qualitative evaluation of MET on real tensors. It achieves over 1000X space reduction without sacrificing speed; it also allows us to work with much larger tensors that were too big to handle before. Finally, we demonstrate a data mining case-study using MET. 1
Proximity Tracking on Time-Evolving Bipartite Graphs
"... Given an author-conference network that evolves over time, which are the conferences that a given author is most closely related with, and how do they change over time? Large time-evolving bipartite graphs appear in many settings, such as social networks, co-citations, market-basket analysis, and co ..."
Abstract
-
Cited by 14 (5 self)
- Add to MetaCart
Given an author-conference network that evolves over time, which are the conferences that a given author is most closely related with, and how do they change over time? Large time-evolving bipartite graphs appear in many settings, such as social networks, co-citations, market-basket analysis, and collaborative filtering. Our goal is to monitor (i) the centrality of an individual node (e.g., who are the most important authors?); and (ii) the proximity of two nodes or sets of nodes (e.g., who are the most important authors with respect to a particular conference?) Moreover, we want to do this efficiently and incrementally, and to provide “any-time ” answers. We propose pTrack and cTrack, which are based on random walk with restart, and use powerful matrix tools. Experiments on real data show that our methods are effective and efficient: the mining results agree with intuition; and we achieve up to 15∼176 times speed-up, without any quality loss. 1
Window-based Tensor Analysis on High-dimensional and Multi-aspect Streams
"... Data stream values are often associated with multiple aspects. For example, each value from environmental sensors may have an associated type (e.g., temperature, humidity, etc) as well as location. Aside from timestamp, type and location are the two additional aspects. How to model such streams? How ..."
Abstract
-
Cited by 12 (3 self)
- Add to MetaCart
Data stream values are often associated with multiple aspects. For example, each value from environmental sensors may have an associated type (e.g., temperature, humidity, etc) as well as location. Aside from timestamp, type and location are the two additional aspects. How to model such streams? How to simultaneously find patterns within and across the multiple aspects? How to do it incrementally in a streaming fashion? In this paper, all these problems are addressed through a general data model, tensor streams, and an effective algorithmic framework, window-based tensor analysis (WTA). Two variations of WTA, independentwindow tensor analysis (IW) and moving-window tensor analysis (MW), are presented and evaluated extensively on real datasets. Finally, we illustrate one important application, Multi-Aspect Correlation Analysis (MACA), which uses WTA and we demonstrate its effectiveness on an environmental monitoring application. 1
Colibri: Fast Mining of Large Static and Dynamic Graphs
"... Low-rank approximations of the adjacency matrix of a graph are essential in finding patterns (such as communities) and detecting anomalies. Additionally, it is desirable to track the low-rank structure as the graph evolves over time, efficiently and within limited storage. Real graphs typically have ..."
Abstract
-
Cited by 10 (4 self)
- Add to MetaCart
Low-rank approximations of the adjacency matrix of a graph are essential in finding patterns (such as communities) and detecting anomalies. Additionally, it is desirable to track the low-rank structure as the graph evolves over time, efficiently and within limited storage. Real graphs typically have thousands or millions of nodes, but are usually very sparse. However, standard decompositions such as SVD do not preserve sparsity. This has led to the development of methods such as CUR and CMD, which seek a nonorthogonal basis by sampling the columns and/or rows of the sparse matrix. However, these approaches will typically produce overcomplete bases, which wastes both space and time. In this paper we propose the family of Colibri methods to deal with these challenges. Our version for static graphs, Colibri-S, iteratively finds a nonredundant basis and we prove that it has no loss of accuracy compared to the best competitors (CUR and CMD), while achieving significant savings in space and time: on real data, Colibri-S requires much less space and is orders of magnitude faster (in proportion to the square of the number of non-redundant columns). Additionally, we propose an efficient update algorithm for dynamic, time-evolving graphs, Colibri-D. Our evaluation on a large, real network traffic dataset shows that Colibri-D is over 100 times faster than the best published competitor (CMD).
Closed Patterns Meet n-ary Relations
, 2009
"... Set pattern discovery from binary relations has been extensively studied during the last decade. In particular, many complete and efficient algorithms for frequent closed set mining are now available. Generalizing such a task to n-ary relations (n ≥ 2) appears as a timely challenge. It may be import ..."
Abstract
-
Cited by 8 (6 self)
- Add to MetaCart
Set pattern discovery from binary relations has been extensively studied during the last decade. In particular, many complete and efficient algorithms for frequent closed set mining are now available. Generalizing such a task to n-ary relations (n ≥ 2) appears as a timely challenge. It may be important for many applications, for example, when adding the time dimension to the popular objects × features binary case. The generality of the task (no assumption being made on the relation arity or on the size of its attribute domains) makes it computationally challenging. We introduce an algorithm called DATA-PEELER. From an n-ary relation, it extracts all closed n-sets satisfying given piecewise (anti) monotonic constraints. This new class of constraints generalizes both monotonic and antimonotonic constraints. Considering the special case of ternary relations, DATA-PEELER outperforms the state-of-the-art algorithms CUBEMINER and TRIAS by orders of magnitude. These good performances must be granted to a new clever enumeration strategy allowing to efficiently enforce the closeness property. The relevance of the extracted closed n-sets is assessed on real-life 3-and 4-ary relations. Beyond natural 3-or 4-ary relations, expanding a relation with an additional attribute can help in enforcing rather abstract constraints such as the robustness with respect to binarization. Furthermore, a collection of closed n-sets is shown to be an excellent starting point
Multivis: Content-based social network exploration through multi-way visual analysis
- In SDM
, 2009
"... With the explosion of social media, scalability becomes a key challenge. There are two main aspects of the problems that arise: 1) data volume: how to manage and analyze huge datasets to efficiently extract patterns, 2) data understanding: how to facilitate understanding of the patterns by users? To ..."
Abstract
-
Cited by 6 (2 self)
- Add to MetaCart
With the explosion of social media, scalability becomes a key challenge. There are two main aspects of the problems that arise: 1) data volume: how to manage and analyze huge datasets to efficiently extract patterns, 2) data understanding: how to facilitate understanding of the patterns by users? To address both aspects of the scalability challenge, we present a hybrid approach that leverages two complementary disciplines, data mining and information visualization. In particular, we propose 1) an analytic data model for content-based networks using tensors; 2) an efficient high-order clustering framework for analyzing the data; 3) a scalable context-sensitive graph visualization to present the clusters. We evaluate the proposed methods using both synthetic and real datasets. In terms of computational efficiency, the proposed methods are an order of magnitude faster compared to the baseline. In terms of effectiveness, we present several case studies of real corporate social networks. 1
Learning patterns in the dynamics of biological networks
- In KDD
, 2009
"... Our dynamic graph-based relational mining approach has been developed to learn structural patterns in biological networks as they change over time. The analysis of dynamic networks is important not only to understand life at the system-level, but also to discover novel patterns in other structural d ..."
Abstract
-
Cited by 6 (0 self)
- Add to MetaCart
Our dynamic graph-based relational mining approach has been developed to learn structural patterns in biological networks as they change over time. The analysis of dynamic networks is important not only to understand life at the system-level, but also to discover novel patterns in other structural data. Most current graph-based data mining approaches overlook dynamic features of biological networks, because they are focused on only static graphs. Our approach analyzes a sequence of graphs and discovers rules that capture the changes that occur between pairs of graphs in the sequence. These rules represent the graph rewrite rules that the first graph must go through to be isomorphic to the second graph. Then, our approach feeds the graph rewrite rules into a machine learning system that learns general transformation rules describing the types of changes that occur for a class of dynamic biological networks. The discovered graph-rewriting rules show how biological networks change over time, and the transformation rules show the repeated patterns in the structural changes. In this paper, we apply our approach to biological networks to evaluate our approach and to understand how the biosystems change over time. We evaluate our results using coverage and prediction metrics, and compare to biological literature.

