Inventing discovery tools: combining information visualization with data mining
- Information Visualization
, 2002
Cited by 37 (2 self)
The growing use of information visualization tools and data mining algorithms stems from two separate lines of research. Information visualization researchers believe in the importance of giving users an overview and insight into the data distributions, while data mining researchers believe that statistical algorithms and machine learning can be relied on to find the interesting patterns. This paper discusses two issues that influence design of discovery tools: statistical algorithms vs. visual data presentation, and hypothesis testing vs. exploratory data analysis. I claim that a combined approach could lead to novel discovery tools that preserve user control, enable more effective exploration, and promote responsibility.
Combining Visual and Automated Data Mining for Near-Real-Time Anomaly Detection and Analysis in BGP
, 2004
Cited by 19 (1 self)
The security of Internet routing is a major concern because attacks and errors can result in data packets not reaching their intended destination and/or falling into the wrong hands. A key step in improving routing security is to analyze and understand it. In the past, we and other researchers have presented various visual-based, statistical-based, and signature-based methods of analyzing Internet routing data. In this paper, we describe an integration of visual and automated data mining methods for discovering and investigating anomalies in Internet routing. We show how these different components are combined in such a way as to complement each other, creating a very effective and useful analysis tool. In addition to performing analysis on archived data, our system is able to collect, process and visualize data in near-real-time.
PaintingClass: interactive construction, visualization and exploration of decision trees
- In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
, 2003
Cited by 12 (1 self)
Decision trees are commonly used for classification. We propose to use decision trees not just for classification but also for the wider purpose of knowledge discovery, because visualizing the decision tree can reveal much valuable information in the data. We introduce PaintingClass, a system for interactive construction, visualization and exploration of decision trees. PaintingClass provides an intuitive layout and convenient navigation of the decision tree. PaintingClass also provides the user the means to interactively construct the decision tree. Each node in the decision tree is displayed as a visual projection of the data. Through actual examples and comparison with other classification methods, we show that the user can effectively use PaintingClass to construct a decision tree and explore the decision tree to gain additional knowledge.
Towards Effective and Interpretable Data Mining by Visual Interaction
- In SIGKDD Explorations
, 2002
Cited by 7 (0 self)
The primary aim of most data mining algorithms is to facilitate the discovery of concise and interpretable information from large amounts of data. However, many of the current formalizations of data mining algorithms have not quite reached this goal. One of the reasons for this is that the focus on using purely automated techniques has imposed several constraints on data mining algorithms. For example, any data mining problem such as clustering or association rules requires the specification of particular problem formulations, objective functions, and parameters. Such systems fail to take the user's needs into account very effectively. This makes it necessary to keep the user in the loop in a way which is both efficient and interpretable. One unique way of achieving this is by leveraging human visual perceptions on intermediate data mining results. Such a system combines the computational power of a computer and the intuitive abilities of a human to provide solutions which cannot be achieved by either. This paper will discuss a number of recent approaches to several data mining algorithms along these lines.
Towards Meaningful High-Dimensional Nearest Neighbor Search by Human-Computer Interaction
- In ICDE
, 2002
Cited by 7 (2 self)
Nearest Neighbor search is an important and widely used problem in a number of important application domains. In many of these domains, the dimensionality of the data representation is often very high. Recent theoretical results have shown that the concept of proximity or nearest neighbors may not be very meaningful for the high dimensional case. Therefore, it is often a complex problem to find good quality nearest neighbors in such data sets. Furthermore, it is also difficult to judge the value and relevance of the returned results. In fact, it is hard for any fully automated system to satisfy a user about the quality of the nearest neighbors found unless he is directly involved in the process. This is especially the case for high dimensional data in which the meaningfulness of the nearest neighbors found is questionable. In this paper, we address the complex problem of high dimensional nearest neighbor search from the user perspective by designing a system which uses effective cooperation between the human and the computer. The system provides the user with visual representations of carefully chosen subspaces of the data in order to repeatedly elicit his preferences about the data patterns which are most closely related to the query point. These preferences are used in order to determine and quantify the meaningfulness of the nearest neighbors. Our system is not only able to find and quantify the meaningfulness of the nearest neighbors, but is also able to diagnose situations in which the nearest neighbors found are truly not meaningful.
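The loss of meaning in high-dimensional proximity that this abstract refers to can be illustrated with a small experiment. The sketch below is not the paper's system; it assumes uniformly random points in the unit cube and a hypothetical helper named `relative_contrast`, and measures how much farther the farthest point is than the nearest one, relative to the nearest distance.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (farthest - nearest) / nearest for one random query point.

    Illustrative assumption: points are drawn uniformly from [0, 1]^dim.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    nearest, farthest = min(dists), max(dists)
    return (farthest - nearest) / nearest


if __name__ == "__main__":
    for dim in (2, 10, 100, 500):
        print(dim, round(relative_contrast(dim), 3))
```

Under these assumptions the contrast shrinks as the dimensionality grows, so the "nearest" neighbor is only marginally closer than any other point, which is the situation in which user feedback on chosen subspaces becomes useful.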
Neural Projection Techniques for the Visual Inspection of Network Traffic
Cited by 5 (4 self)
A crucial aspect in network monitoring for security purposes is the visual inspection of the traffic pattern, mainly aimed to provide the network manager with a synthetic and intuitive representation of the current situation. Toward that end, neural projection techniques can map high-dimensional data into a low-dimensional space adaptively, for the user-friendly visualization of monitored network traffic. This work proposes two projection methods, namely, Cooperative Maximum Likelihood Hebbian Learning and Auto-Associative Back-Propagation networks, for the visual inspection of network traffic. This set of methods may be seen as a complementary tool in network security as it allows the visual inspection and comprehension of the traffic data internal structure. The proposed methods have been evaluated in two complementary and practical network security scenarios: the on-line processing of network traffic at packet level, and the offline processing of connection records, e.g. for post-mortem analysis or batch investigation. The empirical verification of the projection methods involved two experimental domains derived from the standard corpora for evaluation of computer network intrusion detection: the MIT Lincoln Laboratory DARPA dataset.
Model-Driven Visual Analytics
Cited by 5 (0 self)
We describe a Visual Analytics (VA) infrastructure, rooted on techniques in machine learning and logic-based deductive reasoning that will assist analysts to make sense of large, complex data sets by facilitating the generation and validation of models representing relationships in the data. We use Logic Programming (LP) as the underlying computing machinery to encode the relations as rules and facts and compute with them. A unique aspect of our approach is that the LP rules are automatically learned, using Inductive Logic Programming, from examples of data that the analyst deems interesting when viewing the data in the high-dimensional visualization interface. Using this system, analysts will be able to construct models of arbitrary relationships in the data, explore the data for scenarios that fit the model, refine the model if necessary, and query the model to automatically analyze incoming (future) data exhibiting the encoded relationships. In other words it will support both model-driven data exploration, as well as data-driven model evolution. More importantly, by basing the construction of models on techniques from machine learning and logic-based deduction, the VA process will be both flexible in terms of modeling arbitrary, user-driven relationships in the data as well as readily scale across different data domains.
StarClass: Interactive Visual Classification Using Star Coordinates
- SIAM SDM
, 2003
Cited by 3 (0 self)
Classification operations in a data-mining task are often performed using decision trees. The visual-based approach to decision tree construction has gained increasing popularity. We present Star-Class, a new interactive visual classification method. This method maps multi-dimensional data to the visual display using star coordinates, allowing the user to interact with the display to create a decision tree. Preliminary evaluation indicates that this new technique is as effective as state-of-the-art algorithmic classification methods, and more effective than the previous visual-based methods. Star-Class also offers additional advantages such as improving the user’s understanding of the data. Keywords: visual data mining, classification, decision trees, information visualization, interactive visualization
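The star-coordinates mapping mentioned in this abstract can be sketched as follows. This is a minimal illustration of the general star-coordinates idea, not the Star-Class implementation: it assumes evenly spaced unit-length axes and min-max scaling per dimension, and the function name `star_coordinates` is hypothetical.

```python
import math


def star_coordinates(rows):
    """Project d-dimensional rows to 2D points using star coordinates.

    Assumptions for illustration: axis j is a unit vector at angle
    2*pi*j/d, and each dimension is min-max scaled to [0, 1] first.
    """
    d = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(d)]
    highs = [max(r[j] for r in rows) for j in range(d)]
    axes = [
        (math.cos(2 * math.pi * j / d), math.sin(2 * math.pi * j / d))
        for j in range(d)
    ]
    points = []
    for r in rows:
        x = y = 0.0
        for j in range(d):
            span = highs[j] - lows[j]
            # A constant dimension carries no information; map it to 0.
            t = (r[j] - lows[j]) / span if span else 0.0
            x += t * axes[j][0]
            y += t * axes[j][1]
        points.append((x, y))
    return points


if __name__ == "__main__":
    data = [[0, 0, 0, 0], [1, 0, 0, 0], [1, 1, 1, 1]]
    for p in star_coordinates(data):
        print(p)
```

Each 2D point is the sum of the axis vectors weighted by the scaled attribute values, so distinct records can land on the same spot (here the all-zero and all-one rows both map to the origin). In an interactive tool the user would rotate or rescale axes to separate such overlaps before drawing class regions.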
Investigating and reflecting on the integration of automatic data analysis and visualization in knowledge discovery
- SigKDD Explorations Journal
, 2009
Cited by 3 (1 self)
The aim of this work is to survey and reflect on the various ways visualization and data mining can be integrated to achieve effective knowledge discovery by involving the best of human and machine capabilities. Following a bottom-up bibliographic research approach, the article categorizes the observed techniques in classes, highlighting current trends, gaps, and potential future directions for research. In particular it looks at strengths and weaknesses of information visualization (infovis) and data mining, and for which purposes researchers in infovis use data mining techniques and reversely how researchers in data mining employ infovis techniques. The article then proposes, on the basis of the extracted patterns, a series of potential extensions not found in literature. Finally, we use this information to analyze the discovery process by comparing the analysis steps from the perspective of information visualization and data mining. The comparison brings to light new perspectives on how mining and visualization can best employ human and machine strengths. This activity leads to a series of reflections and research questions that can help to further advance the science of visual analytics.
Efficient mining of understandable patterns from multivariate interval time series
- Data Mining and Knowledge Discovery
, 2007
Cited by 3 (0 self)
We present a new method for the understandable description of local temporal relationships in multivariate data, called Time Series Knowledge Mining (TSKM). We define the Time Series Knowledge Representation (TSKR) as a new language for expressing temporal knowledge in time interval data. The patterns have a hierarchical structure, with levels corresponding to the temporal concepts duration, coincidence, and partial order. The patterns are very compact, but offer details for each element on demand. In comparison with related approaches, the TSKR is shown to have advantages in robustness, expressivity, and comprehensibility. The search for coincidence and partial order in interval data can be formulated as instances of the well-known frequent itemset problem. Efficient algorithms for the discovery of the patterns are adapted accordingly. A novel form of search space pruning effectively reduces the size of the mining result to ease interpretation and speed up the algorithms. Human interaction is used during the mining to analyze and validate partial results as early as possible and guide further processing steps. The efficacy of the methods is demonstrated using two real life data sets. In an application to sports medicine the results were recognized as valid and useful by an expert of the field. Keywords: knowledge discovery, time series, interval patterns, Allen’s relations

