Results 11 - 20
of
4,362
Mining time-changing data streams
- IN PROC. OF THE 2001 ACM SIGKDD INTL. CONF. ON KNOWLEDGE DISCOVERY AND DATA MINING
, 2001
"... Most statistical and machine-learning algorithms assume that the data is a random sample drawn from a station-ary distribution. Unfortunately, most of the large databases available for mining today violate this assumption. They were gathered over months or years, and the underlying pro-cesses genera ..."
Abstract
-
Cited by 338 (5 self)
- Add to MetaCart
Most statistical and machine-learning algorithms assume that the data is a random sample drawn from a station-ary distribution. Unfortunately, most of the large databases available for mining today violate this assumption. They were gathered over months or years, and the underlying pro
Frequent Subgraph Discovery
, 2001
"... Over the years, frequent itemset discovery algorithms have been used to solve various interesting problems. As data mining techniques are being increasingly applied to non-traditional domains, existing approaches for finding frequent itemsets cannot be used as they cannot model the requirement of th ..."
Abstract
-
Cited by 406 (10 self)
- Add to MetaCart
of these domains. An alternate way of modeling the objects in these data sets, is to use a graph to model the database objects. Within that model, the problem of finding frequent patterns becomes that of discovering subgraphs that occur frequently over the entire set of graphs. In this paper we present a
The Determinants of Credit Spread Changes.
- Journal of Finance
, 2001
"... ABSTRACT Using dealer's quotes and transactions prices on straight industrial bonds, we investigate the determinants of credit spread changes. Variables that should in theory determine credit spread changes have rather limited explanatory power. Further, the residuals from this regression are ..."
Abstract
-
Cited by 422 (2 self)
- Add to MetaCart
is that, after controlling for these aggregate determinants, the systematic movement of credit spread changes still remains, and indeed, is the dominant factor. Brown The rest of the paper is organized as follows. In Section I, we examine the theoretical determinants of credit spread changes from a
External Memory Algorithms and Data Structures
, 1998
"... Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this paper, we surve ..."
Abstract
-
Cited by 349 (23 self)
- Add to MetaCart
Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this paper, we
Interactive Control of Avatars Animated with Human Motion Data
, 2002
"... Real-time control of three-dimensional avatars is an important problem in the context of computer games and virtual environments. Avatar animation and control is difficult, however, because a large repertoire of avatar behaviors must be made available, and the user must be able to select from this s ..."
Abstract
-
Cited by 369 (38 self)
- Add to MetaCart
for controlling avatar motion using this data structure: the user selects from a set of available choices, sketches a path through an environment, or acts out a desired motion in front of a video camera. We demonstrate the flexibility of the approach through four different applications and compare the avatar
Mining anomalies using traffic feature distributions
- In ACM SIGCOMM
, 2005
"... The increasing practicality of large-scale flow capture makes it possible to conceive of traffic analysis methods that detect and identify a large and diverse set of anomalies. However the challenge of effectively analyzing this massive data source for anomaly diagnosis is as yet unmet. We argue tha ..."
Abstract
-
Cited by 322 (8 self)
- Add to MetaCart
The increasing practicality of large-scale flow capture makes it possible to conceive of traffic analysis methods that detect and identify a large and diverse set of anomalies. However the challenge of effectively analyzing this massive data source for anomaly diagnosis is as yet unmet. We argue
Finding Interesting Rules from Large Sets of Discovered Association Rules
, 1994
"... Association rules, introduced by Agrawal, Imielinski, and Swami, are rules of the form "for 90 % of the rows of the relation, if the row has value 1 in the columns in set W , then it has 1 also in column B". Efficient methods exist for discovering association rules from large collections o ..."
Abstract
-
Cited by 240 (9 self)
- Add to MetaCart
of data. The number of discovered rules can, however, be so large that browsing the rule set and finding interesting rules from it can be quite difficult for the user. We show how a simple formalism of rule templates makes it possible to easily describe the structure of interesting rules. We also give
An effective hash-based algorithm for mining association rules
, 1995
"... In this paper, we examine the issue of mining association rules among items in a large database of sales transactions. The mining of association rules can be mapped into the problem of discovering large itemsets where a large itemset is a group of items which appear in a sufficient number of transac ..."
Abstract
-
Cited by 283 (3 self)
- Add to MetaCart
where a large k-itemset is a large itemset with k items. To determine large itemsets from a huge number of candidate large itemsets in early iterations is usually the dominating factor for the overall data mining performance. To address this issue, we propose an effective hash-based algorithm
Efficient data mining for path traversal patterns
- IEEE Transactions on Knowledge and Data Engineering
, 1998
"... Abstract—In this paper, we explore a new data mining capability that involves mining path traversal patterns in a distributed information-providing environment where documents or objects are linked together to facilitate interactive access. Our solution procedure consists of two steps. First, we der ..."
Abstract
-
Cited by 217 (16 self)
- Add to MetaCart
Abstract—In this paper, we explore a new data mining capability that involves mining path traversal patterns in a distributed information-providing environment where documents or objects are linked together to facilitate interactive access. Our solution procedure consists of two steps. First, we
DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language
"... DryadLINQ is a system and a set of language extensions that enable a new programming model for large scale distributed computing. It generalizes previous execution environments such as SQL, MapReduce, and Dryad in two ways: by adopting an expressive data model of strongly typed.NET objects; and by s ..."
Abstract
-
Cited by 273 (27 self)
- Add to MetaCart
made up of thousands of computers, ensures efficient, reliable execution of this plan. We describe the implementation of the DryadLINQ compiler and runtime. We evaluate DryadLINQ on a varied set of programs drawn from domains such as web-graph analysis, large-scale log mining, and machine learning. We
Results 11 - 20
of
4,362