Results 1 -
6 of
6
Reducing the risk of query expansion via robust constrained optimization
- Proceedings of the Eighteenth International Conference on Information and Knowledge Management (CIKM 2009). ACM. Hong
"... We introduce a new theoretical derivation, evaluation methods, and extensive empirical analysis for an automatic query expansion framework in which model estimation is cast as a robust constrained optimization problem. This framework provides a powerful method for modeling and solving complex expans ..."
Abstract
-
Cited by 9 (5 self)
- Add to MetaCart
We introduce a new theoretical derivation, evaluation methods, and extensive empirical analysis for an automatic query expansion framework in which model estimation is cast as a robust constrained optimization problem. This framework provides a powerful method for modeling and solving complex expansion problems, by allowing multiple sources of domain knowledge or evidence to be encoded as simultaneous optimization constraints. Our robust optimization approach provides a clean theoretical way to model not only expansion benefit, but also expansion risk, by optimizing over uncertainty sets for the data. In addition, we introduce risk-reward curves to visualize expansion algorithm performance and analyze parameter sensitivity. We show that a robust approach significantly reduces the number and magnitude of expansion failures for a strong baseline algorithm, with no loss in average gain. Our approach is implemented as a highly efficient post-processing step that assumes little about the baseline expansion method used as input, making it easy to apply to existing expansion methods. We provide analysis showing that this approach is a natural and effective way to do selective expansion, automatically reducing or avoiding expansion in risky scenarios, and successfully attenuating noise in poor baseline methods.
Compressed sensing with quantized measurements
, 2010
"... We consider the problem of estimating a sparse signal from a set of quantized, Gaussian noise corrupted measurements, where each measurement corresponds to an interval of values. We give two methods for (approximately) solving this problem, each based on minimizing a differentiable convex function p ..."
Abstract
-
Cited by 7 (0 self)
- Add to MetaCart
We consider the problem of estimating a sparse signal from a set of quantized, Gaussian noise corrupted measurements, where each measurement corresponds to an interval of values. We give two methods for (approximately) solving this problem, each based on minimizing a differentiable convex function plus an regularization term. Using a first order method developed by Hale et al, we demonstrate the performance of the methods through numerical simulation. We find that, using these methods, compressed sensing can be carried out even when the quantization is very coarse, e.g., 1 or 2 bits per measurement.
ℓ1 Trend Filtering
, 2007
"... The problem of estimating underlying trends in time series data arises in a variety of disciplines. In this paper we propose a variation on Hodrick-Prescott (H-P) filtering, a widely used method for trend estimation. The proposed ℓ1 trend filtering method substitutes a sum of absolute values (i.e., ..."
Abstract
-
Cited by 5 (3 self)
- Add to MetaCart
The problem of estimating underlying trends in time series data arises in a variety of disciplines. In this paper we propose a variation on Hodrick-Prescott (H-P) filtering, a widely used method for trend estimation. The proposed ℓ1 trend filtering method substitutes a sum of absolute values (i.e., an ℓ1-norm) for the sum of squares used in H-P filtering to penalize variations in the estimated trend. The ℓ1 trend filtering method produces trend estimates that are piecewise linear, and therefore is well suited to analyzing time series with an underlying piecewise linear trend. The kinks, knots, or changes in slope, of the estimated trend can be interpreted as abrupt changes or events in the underlying dynamics of the time series. Using specialized interior-point methods, ℓ1 trend filtering can be carried out with not much more effort than H-P filtering; in particular, the number of arithmetic operations required grows linearly with the number of data points. We describe the method and some of its basic properties, and give some illustrative examples. We show how the method is related to ℓ1 regularization based methods in sparse signal recovery and feature selection, and list some extensions of the basic method.
SUBMITTED TO IEEE TRANS. ON SIGNAL PROCESSING 1 Fault Identification via Non-parametric Belief Propagation
"... Abstract—We consider the problem of identifying a pattern of faults from a set of noisy linear measurements. Unfortunately, maximum a posteriori probability estimation of the fault pattern is computationally intractable. To solve the fault identification problem, we propose a non-parametric belief p ..."
Abstract
- Add to MetaCart
Abstract—We consider the problem of identifying a pattern of faults from a set of noisy linear measurements. Unfortunately, maximum a posteriori probability estimation of the fault pattern is computationally intractable. To solve the fault identification problem, we propose a non-parametric belief propagation approach. We show empirically that our belief propagation solver is more accurate than recent state-of-the-art algorithms including interior point methods and semidefinite programming. Our superior performance is explained by the fact that we take into account both the binary nature of the individual faults and the sparsity of the fault pattern arising from their rarity. Index Terms—compressed sensing, fault identification, message passing, non-parametric belief propagation, stochastic approximation. I.
Probabilistic Management of OCR Data using an RDBMS Arun Kumar University of Wisconsin-Madison
"... The digitization of scanned forms and documents is changing the data sources that enterprises manage. To integrate these new data sources with enterprise data, the current state-ofthe-art approach is to convert the images to ASCII text using optical character recognition (OCR) software and then to s ..."
Abstract
- Add to MetaCart
The digitization of scanned forms and documents is changing the data sources that enterprises manage. To integrate these new data sources with enterprise data, the current state-ofthe-art approach is to convert the images to ASCII text using optical character recognition (OCR) software and then to store the resulting ASCII text in a relational database. The OCR problem is challenging, and so the output of OCR often contains errors. In turn, queries on the output of OCR may fail to retrieve relevant answers. State-of-theart OCR programs, e.g., the OCR powering Google Books, use a probabilistic model that captures many alternatives during the OCR process. Only when the results of OCR are stored in the database, do these approaches discard the uncertainty. In this work, we propose to retain the probabilistic models produced by OCR process in a relational database management system. A key technical challenge is that the probabilistic data produced by OCR software is very large (a single book blows up to 2GB from 400kB as ASCII). As a result, a baseline solution that integrates these models with an RDBMS is over 1000x slower versus standard text processing for single table select-project queries. However, many applications may have quality-performance needs that are in between these two extremes of ASCII and the complete model output by the OCR software. Thus, we propose a novel approximation scheme called Staccato that allows a user to trade recall for query performance. Additionally, we provide a formal analysis of our scheme’s properties, and describe how we integrate our scheme with standard-RDBMS text indexing. 1.
Real-Time Convex Optimization . . . -- Recent advances that make it easier to design and implement algorithms
, 2010
"... Convex optimization has been used in signal processing for a long time to choose coefficients for use in fast (linear) algorithms, such as in filter or array design; more recently, it has been used to carry out (nonlinear) processing on the signal itself. Examples of the latter case include total va ..."
Abstract
- Add to MetaCart
Convex optimization has been used in signal processing for a long time to choose coefficients for use in fast (linear) algorithms, such as in filter or array design; more recently, it has been used to carry out (nonlinear) processing on the signal itself. Examples of the latter case include total variation denoising, compressed sensing, fault detection, and image classification. In both scenarios, the optimization is carried out on time scales of seconds or minutes and without strict time constraints. Convex optimization has traditionally been considered computationally expensive, so its use has been limited to applications where plenty of time is available. Such restrictions are no longer justified. The combination of dramatically increased computing power, modern algorithms, and new coding approaches has delivered an enormous speed increase, which makes it possible to solve modest-sized convex optimization problems on microsecond or millisecond time scales and with strict deadlines. This enables real-time convex optimization in signal processing.

