Statistical Learning Techniques for Costing XML Queries
- In VLDB, 2005
Cited by 21 (4 self)
Developing cost models for query optimization is significantly harder for XML queries than for traditional relational queries. The reason is that XML query operators are much more complex than relational operators such as table scans and joins. In this paper, we propose a new approach, called Comet, to modeling the cost of XML operators; to our knowledge, Comet is the first method ever proposed for addressing the XML query costing problem. As in relational cost estimation, Comet exploits a set of system catalog statistics that summarizes the XML data; the set of "simple path" statistics that we propose is new, and is well suited to the XML setting.
Sequential Cost-Sensitive Decision Making with Reinforcement Learning
- 2002
Cited by 15 (2 self)
Recently, there has been increasing interest in the issues of cost-sensitive learning and decision making in a variety of applications of data mining. A number of approaches have been developed that are effective at optimizing cost-sensitive decisions when each decision is considered in isolation. However, the issue of sequential decision making, with the goal of maximizing total benefits accrued over a period of time instead of immediate benefits, has rarely been addressed. In the present paper, we propose a novel approach to sequential decision making based on the reinforcement learning framework. Our approach attempts to learn decision rules that optimize a sequence of cost-sensitive decisions so as to maximize the total benefits accrued over time. We use the domain of targeted marketing as a testbed for empirical evaluation of the proposed method. We conducted experiments using approximately two years of monthly promotion data derived from the well-known KDD Cup 1998 donation data set. The experimental results show that the proposed method for optimizing total accrued benefits outperforms the usual targeted-marketing methodology of optimizing each promotion in isolation. We also analyze the behavior of the targeting rules that were obtained and discuss their appropriateness to the application domain.
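To make the contrast between optimizing each decision in isolation and optimizing total accrued benefit concrete, here is a minimal sketch. It is not the paper's method or data: the two-state customer model, its rewards, and all names (`MODEL`, `q_iteration`, `total_reward`) are hypothetical, and plain Q-value iteration on a known deterministic model stands in for the learning procedure.

```python
# Hypothetical toy model (not from the paper):
# (state, action) -> (immediate reward, next state)
MODEL = {
    ("lapsed", "skip"): (0.0, "lapsed"),
    ("lapsed", "mail"): (-1.0, "engaged"),   # mailing costs money up front
    ("engaged", "skip"): (1.0, "lapsed"),
    ("engaged", "mail"): (3.0, "engaged"),
}
STATES = ("lapsed", "engaged")
ACTIONS = ("skip", "mail")


def myopic_policy():
    """Pick the action with the best immediate reward in each state."""
    return {s: max(ACTIONS, key=lambda a: MODEL[(s, a)][0]) for s in STATES}


def q_iteration(gamma=0.9, sweeps=200):
    """Synchronous Q-value iteration on the known deterministic model."""
    q = {key: 0.0 for key in MODEL}
    for _ in range(sweeps):
        q = {
            (s, a): r + gamma * max(q[(s2, a2)] for a2 in ACTIONS)
            for (s, a), (r, s2) in MODEL.items()
        }
    return q


def greedy_policy(q):
    """Pick the action with the best long-run value in each state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}


def total_reward(policy, start="lapsed", steps=12):
    """Undiscounted reward accrued by following a policy for `steps` periods."""
    state, total = start, 0.0
    for _ in range(steps):
        reward, state = MODEL[(state, policy[state])]
        total += reward
    return total


if __name__ == "__main__":
    print("myopic:", total_reward(myopic_policy()))
    print("sequential:", total_reward(greedy_policy(q_iteration())))
```

In this toy setting the myopic rule never mails a lapsed customer because the immediate reward is negative, while the value-based rule accepts that cost to reach the more profitable state.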
Transform Regression and the Kolmogorov Superposition Theorem
- IBM Research Report RC 23227, IBM Research Division, Yorktown Heights, NY 10598, 2004
Cited by 6 (2 self)
This paper presents a new predictive modeling algorithm that draws inspiration from the Kolmogorov superposition theorem. An initial version of the algorithm is presented that combines gradient boosting with decision-tree methods to construct models that have the same overall mathematical structure as Kolmogorov’s superposition equation. Improvements to the algorithm are then presented that significantly increase its rate of convergence. The resulting algorithm, dubbed “transform regression,” generates surprisingly good models compared to those produced by the underlying decision-tree method when the latter is applied outside the transform regression framework.
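As background for the combination of gradient boosting with decision trees mentioned in the abstract, the following is a minimal sketch of ordinary least-squares gradient boosting with one-split trees (stumps) on a single feature. It is not transform regression itself, whose details the abstract does not give; the function names, the learning rate, and the number of rounds are all assumptions chosen for illustration.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump: one threshold, one mean per side.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, rounds=50, learning_rate=0.5):
    """Fit each new stump to the residuals of the current additive model."""
    base = sum(ys) / len(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    for _ in range(rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


def mse(predict, xs, ys):
    """Mean squared error of a predictor on the given points."""
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    xs = list(range(10))
    ys = [x * x for x in xs]
    mean_y = sum(ys) / len(ys)
    print("constant model MSE:", mse(lambda x: mean_y, xs, ys))
    print("boosted model MSE:", mse(boost(xs, ys), xs, ys))
```

With squared-error loss the negative gradient is simply the residual, which is why each round fits a stump to the current residuals.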
A grid-based approach for enterprise-scale data mining
- Future Generation Computer Systems 23, 2007
Cited by 1 (0 self)
We describe a grid-based approach for enterprise-scale data mining that leverages database technology for I/O parallelism, and on-demand compute servers for compute parallelism in the statistical computations. By enterprise-scale, we mean the highly automated use of data mining in vertical business applications, where the data is stored on one or more relational database systems, and where a distributed architecture comprising high-performance compute servers or a network of low-cost, commodity processors is used to improve application performance and provide the application deployment flexibility for overall workload management. The approach relies on an algorithmic decomposition of the data mining kernel on the data and compute grids, which makes it possible to exploit the parallelism on the respective grids in a simple way, while minimizing the data transfer between them. The overall approach is compatible with existing database standards for data mining task specification and results reporting, and hence external applications using these standards-based interfaces do not have to be modified in order to realize the benefits of this grid-based approach.
Index Terms: Data mining, Grid computing, Predictive modeling, Parallel databases.
Data-mining technologies that automate the generation and application of statistical models from data are of interest in a variety of applications cutting across industry sectors. These applications include, for example, customer relationship management (Retail, Banking and Finance, Telecom), fraud detection (Banking and Finance, Telecom), lead generation
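One common way to split a modeling kernel between the place where data lives and the place where computation runs is to reduce each data partition to a small set of sufficient statistics and ship only those. The sketch below illustrates that general idea for a one-variable linear fit; it is an assumption made for illustration, not the decomposition used in the paper, and the names `partial_stats`, `merge`, and `fit_line` are hypothetical.

```python
from functools import reduce


def partial_stats(rows):
    """Data side: reduce one partition of (x, y) rows to five numbers."""
    n = len(rows)
    sx = sum(x for x, _ in rows)
    sy = sum(y for _, y in rows)
    sxx = sum(x * x for x, _ in rows)
    sxy = sum(x * y for x, y in rows)
    return (n, sx, sy, sxx, sxy)


def merge(a, b):
    """Sufficient statistics from two partitions combine by addition."""
    return tuple(p + q for p, q in zip(a, b))


def fit_line(stats):
    """Compute side: least-squares slope and intercept from merged stats."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


if __name__ == "__main__":
    # Two hypothetical partitions of the same table, e.g. on different nodes.
    partitions = [[(0, 1), (1, 3)], [(2, 5), (3, 7)]]
    merged = reduce(merge, (partial_stats(p) for p in partitions))
    print(fit_line(merged))
```

Only five numbers per partition cross the boundary, regardless of how many rows each partition holds, which is the sense in which data transfer stays small in this sketch.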
Empirical Comparison of Various Reinforcement Learning Strategies for Sequential Targeted Marketing
- In Proceedings of the IEEE International Conference on Data Mining, 2002
Cited by 1 (0 self)
We empirically evaluate the performance of various reinforcement learning methods in applications to sequential targeted marketing. In particular, we propose and evaluate a progression of reinforcement learning methods, ranging from the "direct" or "batch" methods to "indirect" or "simulation based" methods, and those that we call "semi-direct" methods that fall between them. We conduct a number of controlled experiments to evaluate the performance of these competing methods. Our results indicate that while the indirect methods can perform better in a situation in which nearly perfect modeling is possible, under the more realistic situations in which the system's modeling parameters have restricted attention, the indirect methods' performance tends to degrade. We also show that semi-direct methods are effective in reducing the amount of computation necessary to attain a given level of performance, and often result in more profitable policies.

