Results 1 – 10 of 65
Approximating Polygons and Subdivisions with Minimum-Link Paths
, 1991
Abstract

Cited by 63 (12 self)
We study several variations on one basic approach to the task of simplifying a plane polygon or subdivision: Fatten the given object and construct an approximation inside the fattened region. We investigate fattening by convolving the segments or vertices with disks and attempt to approximate objects with the minimum number of line segments, or with near the minimum, by using efficient greedy algorithms. We give some variants that have linear or O(n log n) algorithms approximating polygonal chains of n segments. We also show that approximating subdivisions and approximating with chains with no self-intersections are NP-hard.
Time Series Segmentation for Context Recognition in Mobile Devices
, 2001
Abstract

Cited by 52 (7 self)
Recognizing the context of use is important in making mobile devices as simple to use as possible. Finding out what the user's situation is can help the device and underlying service in providing an adaptive and personalized user interface. The device can infer parts of the context of the user from sensor data: the mobile device can include sensors for acceleration, noise level, luminosity, humidity, etc. In this paper we consider context recognition by unsupervised segmentation of time series produced by sensors. Dynamic programming can be used to find segments that minimize the intra-segment variances. While this method produces optimal solutions, it is too slow for long sequences of data. We present and analyze randomized variations of the algorithm. One of them, Global Iterative Replacement or GIR, gives approximately optimal results in a fraction of the time required by dynamic programming. We demonstrate the use of time series segmentation in context recognition for mobile phone applications.
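As a concrete illustration of the replacement idea in the abstract above, here is a minimal Python sketch of a GIR-style segmenter, assuming a piecewise-constant model where a segment's cost is its squared error around the segment mean. The function name `gir_segment` and its parameters are hypothetical, and this simplification omits details of the published algorithm.

```python
import random

def gir_segment(x, k, iters=100, seed=0):
    """GIR-style sketch: keep k segments (k-1 interior breakpoints);
    repeatedly remove one breakpoint at random and re-insert it at the
    globally best position. Assumes 2 <= k <= len(x)."""
    rng = random.Random(seed)
    n = len(x)
    # prefix sums give O(1) squared-error cost per segment
    s, s2 = [0.0], [0.0]
    for v in x:
        s.append(s[-1] + v)
        s2.append(s2[-1] + v * v)

    def seg_cost(i, j):  # squared error of x[i:j] fit by its mean
        t = s[j] - s[i]
        return (s2[j] - s2[i]) - t * t / (j - i)

    def total(bps):
        pts = [0] + bps + [n]
        return sum(seg_cost(a, b) for a, b in zip(pts, pts[1:]))

    bps = sorted(rng.sample(range(1, n), k - 1))  # random initial breakpoints
    best = total(bps)
    for _ in range(iters):
        victim = rng.choice(bps)             # remove one breakpoint ...
        rest = [b for b in bps if b != victim]
        for p in range(1, n):                # ... and re-insert it at the
            if p in rest:                    # globally best position
                continue
            trial = sorted(rest + [p])
            c = total(trial)
            if c < best:
                best, bps = c, trial
    return best, [0] + bps + [n]
```

Each pass never increases the cost, so the segmentation improves monotonically toward a local (and in practice near-global) optimum.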
Online Amnesic Approximation of Streaming Time Series
 In ICDE
, 2004
Abstract

Cited by 38 (2 self)
The past decade has seen a wealth of research on time series representations, because the manipulation, storage, and indexing of large volumes of raw time series data is impractical. The vast majority of research has concentrated on representations that are calculated in batch mode and represent each value with approximately equal fidelity. However, the increasing deployment of mobile devices and real-time sensors has brought home the need for representations that can be incrementally updated, and can approximate the data with fidelity proportional to its age. The latter property allows us to answer queries about the recent past with greater precision, since in many domains recent information is more useful than older information. We call such representations amnesic.
Finding recurrent sources in sequences
, 2003
Abstract

Cited by 24 (3 self)
Many genomic sequences and, more generally, (multivariate) time series display tremendous variability. However, often it is reasonable to assume that the sequence is actually generated by or assembled from a small number of sources, each of which might contribute several segments to the sequence. That is, there are h hidden sources such that the sequence can be written as a concatenation of k > h pieces, each of which stems from one of the h sources. We define this (k, h)-segmentation problem and show that it is NP-hard in the general case. We give approximation algorithms achieving approximation ratios of 3 for the L1 error measure and √5 for the L2 error measure, and generalize the results to higher dimensions. We give empirical results on real (chromosome 22) and artificial data showing that the methods work well in practice.
ℓ1 Trend Filtering
, 2007
Abstract

Cited by 21 (5 self)
The problem of estimating underlying trends in time series data arises in a variety of disciplines. In this paper we propose a variation on Hodrick-Prescott (HP) filtering, a widely used method for trend estimation. The proposed ℓ1 trend filtering method substitutes a sum of absolute values (i.e., an ℓ1-norm) for the sum of squares used in HP filtering to penalize variations in the estimated trend. The ℓ1 trend filtering method produces trend estimates that are piecewise linear, and therefore is well suited to analyzing time series with an underlying piecewise linear trend. The kinks, knots, or changes in slope, of the estimated trend can be interpreted as abrupt changes or events in the underlying dynamics of the time series. Using specialized interior-point methods, ℓ1 trend filtering can be carried out with not much more effort than HP filtering; in particular, the number of arithmetic operations required grows linearly with the number of data points. We describe the method and some of its basic properties, and give some illustrative examples. We show how the method is related to ℓ1-regularization-based methods in sparse signal recovery and feature selection, and list some extensions of the basic method.
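The two objectives contrasted in the abstract above can be written side by side (a sketch from the description only: y is the data, x the estimated trend, λ ≥ 0 the regularization weight; both penalize the second difference of x):

```latex
% Hodrick-Prescott filtering: quadratic smoothness penalty
\min_{x}\ \tfrac{1}{2}\sum_{t=1}^{n}(y_t - x_t)^2
  + \lambda \sum_{t=2}^{n-1}\left(x_{t-1} - 2x_t + x_{t+1}\right)^2

% l1 trend filtering: same fit term, absolute-value smoothness penalty
\min_{x}\ \tfrac{1}{2}\sum_{t=1}^{n}(y_t - x_t)^2
  + \lambda \sum_{t=2}^{n-1}\left\lvert x_{t-1} - 2x_t + x_{t+1}\right\rvert
```

The ℓ1 penalty drives most second differences exactly to zero, which is why the resulting trend estimate is piecewise linear with kinks only where the penalty term is nonzero.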
Harvesting Relational Tables from Lists on the Web
Abstract

Cited by 20 (3 self)
A large number of web pages contain data structured in the form of “lists”. Many such lists can be further split into multi-column tables, which can then be used in more semantically meaningful tasks. However, harvesting relational tables from such lists can be a challenging task. The lists are manually generated and hence need not have well-defined templates – they have inconsistent delimiters (if any) and often have missing information. We propose a novel technique for extracting tables from lists. The technique is domain-independent and operates in a fully unsupervised manner. We first use multiple sources of information to split individual lines into multiple fields, and then compare the splits across multiple lines to identify and fix incorrect splits and bad alignments. In particular, we exploit a corpus of HTML tables, also extracted from the Web, to identify likely fields and good alignments. For each extracted table, we compute an extraction score that reflects our confidence in the table’s quality. We conducted an extensive experimental study using both real web lists and lists derived from tables on the Web. The experiments demonstrate the ability of our technique to extract tables with high accuracy. In addition, we applied our technique on a large sample of about 100,000 lists crawled from the Web. The analysis of the extracted tables has led us to believe that there are likely to be tens of millions of useful and queryable relational tables extractable from lists on the Web.
Efficient algorithms for sequence segmentation
Abstract

Cited by 19 (4 self)
The sequence segmentation problem asks for a partition of the sequence into k non-overlapping segments that cover all data points such that each segment is as homogeneous as possible. This problem can be solved optimally using dynamic programming in O(n² k) time, where n is the length of the sequence. Since sequences in practice can be very long, a quadratic algorithm is often not fast enough. Here, we present an alternative constant-factor approximation algorithm with running time O(n^{4/3} k^{5/3}). We call this algorithm the DNS algorithm. We also consider the recursive application of the DNS algorithm, which results in a faster algorithm (O(n log log n) running time) with an O(log n) approximation factor, and study the accuracy/efficiency tradeoff. Extensive experimental results show that these algorithms outperform other widely used heuristics. The same algorithms can speed up solutions for other variants of the basic segmentation problem while keeping their approximation factors constant. Our techniques can also be used in a streaming setting, with sublinear memory requirements.
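The O(n² k) dynamic program the abstract starts from can be sketched directly. This is a generic illustration, assuming a piecewise-constant model with squared-error segment cost; `segment_dp` is a hypothetical name, not the authors' code.

```python
def segment_dp(x, k):
    """Optimal k-segmentation by dynamic programming, O(n^2 k) time.
    Minimizes total within-segment squared error around segment means."""
    n = len(x)
    # prefix sums make each segment cost an O(1) computation
    s, s2 = [0.0], [0.0]
    for v in x:
        s.append(s[-1] + v)
        s2.append(s2[-1] + v * v)

    def cost(i, j):  # squared error of x[i:j] (half-open) fit by its mean
        t = s[j] - s[i]
        return (s2[j] - s2[i]) - t * t / (j - i)

    INF = float("inf")
    # dp[p][j]: min error splitting x[:j] into p segments; back for recovery
    dp = [[INF] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    dp[0][0] = 0.0
    for p in range(1, k + 1):
        for j in range(p, n + 1):
            for i in range(p - 1, j):  # last segment is x[i:j]
                c = dp[p - 1][i] + cost(i, j)
                if c < dp[p][j]:
                    dp[p][j], back[p][j] = c, i
    # walk back pointers to recover the segment boundaries
    segs, j = [], n
    for p in range(k, 0, -1):
        i = back[p][j]
        segs.append((i, j))
        j = i
    return dp[k][n], segs[::-1]
```

The three nested loops give the quadratic-in-n cost the abstract refers to; the DNS algorithm trades a constant-factor loss in accuracy for a subquadratic running time.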
Robust sketched symbol fragmentation using templates
 In IUI ’04
, 2004
Abstract

Cited by 15 (1 self)
Analysis of sketched digital ink is often aided by the division of stroke points into perceptually-salient fragments based on geometric features. Fragmentation has many applications in intelligent interfaces for digital ink capture and manipulation, as well as higher-level symbolic and structural analyses. It is our intuitive belief that the most robust fragmentations closely match a user’s natural perception of the ink, thus leading to more effective recognition and useful user feedback. We present two optimal fragmentation algorithms that fragment common geometries into a basis set of line segments and elliptical arcs. The first algorithm uses an explicit template in which the order and types of bases are specified. The other only requires the number of fragments of each basis type. For the set of symbols under test, both algorithms achieved a 100% fragmentation accuracy rate for symbols with line bases, >99% accuracy for symbols with elliptical bases, and >90% accuracy for symbols with mixed line and elliptical bases.
Operationally Optimal Vertex-Based Shape Coding
 IEEE Signal Processing Magazine
, 1998
Abstract

Cited by 15 (6 self)
In this paper, we present a review of our work on rate-distortion-based operationally optimal vertex-based lossy shape encoding techniques. We approximate the boundary of a given shape by a low order curve, such as a polygon, and consider the problem of finding the approximation which leads to the smallest distortion for a given number of bits. We also address the dual problem of finding the approximation which leads to the smallest bit rate for a given distortion. We consider two different classes of distortion measures. The first class is based on the maximum operator (such as the maximum distance between a boundary and its approximation) and the second class is based on the summation operator (such as the total number of error pels between a boundary and its approximation). For the first class, we derive a fast and operationally optimal scheme which is based on a shortest path algorithm for a weighted directed acyclic graph. For the second class, we propose a solution approach which...
Relation-Based Aggregation: Finding Objects In Large Spatial Datasets
, 2000
Abstract

Cited by 14 (0 self)
Regularities exist in datasets describing spatially distributed physical phenomena. Human experts often understand and verbalize the regularities as abstract spatial objects evolving coherently and interacting with each other in the domain space. We describe a novel computational approach for identifying and extracting these abstract spatial objects through the construction of a hierarchy of spatial relations. We demonstrate the approach with an application to finding pressure trough features in weather data sets.