Results 11 - 15 of 15
A Polygon-based Methodology for Mining Related Spatial Datasets
Abstract

Cited by 1 (1 self)
Polygons can serve an important role in the analysis of georeferenced data: they provide a natural representation for particular types of spatial objects, and they can be used as models for spatial clusters. This paper claims that polygon analysis is particularly useful for mining related spatial datasets. A novel methodology for clustering polygons that have been extracted from different spatial datasets is proposed; it consists of a meta-clustering module that clusters polygons and a summary generation module that creates a final clustering from a polygonal meta-clustering based on user preferences. Moreover, a density-based polygon clustering algorithm is introduced. Our methodology is evaluated in a real-world case study involving ozone pollution in Texas; it was able to reveal interesting relationships between different ozone hotspots and interesting associations between ozone hotspots and other meteorological variables. Keywords: spatial data mining, polygon clustering algorithms, mining related
Problem of Matching Far InfraRed Astronomical Sources to Optical Counterparts
2005
Abstract
The problem of record linkage is often seen simply in terms of making links between data points that might be generated from the same source. However, in many cases the grounds for linking items are themselves not certain. In fact, it is often desirable to learn, in an unsupervised manner, what form linked objects take in different databases. One simple case of this is the “one-to-many” linkage problem, where each object in one dataset is potentially linked to one of many objects in another dataset, and where the candidate matches are mutually exclusive. We show how the Expectation Maximisation algorithm can be used for this matching problem, both to calculate the probability of a match and to learn something about the characteristics that matched objects have. The approach is derived for the specific astronomical problem of linking far infrared observations to optical counterparts, but is generally applicable. This report outlines the theory of this record linkage procedure, but does not discuss its application or any
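The one-to-many matching idea can be illustrated with a minimal EM sketch. This is not the report's derivation; it rests on several assumptions made only for illustration: every source has exactly one true counterpart among its candidates, each candidate is described by a single scalar feature, and that feature follows one Gaussian for true matches and another for unrelated background objects. The function names (`em_one_to_many`, `e_step`, `weighted_normal`) are hypothetical.

```python
import math


def log_normal_pdf(x, mean, std):
    """Log density of a univariate Gaussian."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def e_step(candidates, match, background):
    """Posterior that each candidate is the single true match of its source.

    Because candidates are mutually exclusive, the posterior for candidate j
    is proportional to the likelihood ratio match(x_j) / background(x_j).
    """
    posteriors = []
    for feats in candidates:
        log_ratio = [log_normal_pdf(x, *match) - log_normal_pdf(x, *background)
                     for x in feats]
        top = max(log_ratio)  # subtract the maximum for numerical stability
        weights = [math.exp(v - top) for v in log_ratio]
        total = sum(weights)
        posteriors.append([w / total for w in weights])
    return posteriors


def weighted_normal(pairs, min_std):
    """Weighted mean and standard deviation from (weight, value) pairs."""
    total = sum(w for w, _ in pairs)
    mean = sum(w * x for w, x in pairs) / total
    var = sum(w * (x - mean) ** 2 for w, x in pairs) / total
    return mean, max(math.sqrt(var), min_std)


def em_one_to_many(candidates, match=(1.0, 1.0), background=(0.0, 1.0),
                   n_iter=30, min_std=1e-3):
    """Alternate E and M steps; `match` and `background` are (mean, std)."""
    for _ in range(n_iter):
        post = e_step(candidates, match, background)
        flat = [(p, x) for ps, feats in zip(post, candidates)
                for p, x in zip(ps, feats)]
        # M-step: matched objects are weighted by p, background by 1 - p.
        match = weighted_normal(flat, min_std)
        bg_pairs = [(1.0 - p, x) for p, x in flat]
        if sum(w for w, _ in bg_pairs) > 0.0:
            background = weighted_normal(bg_pairs, min_std)
    return match, background, e_step(candidates, match, background)
```

Calling `em_one_to_many(candidates)` with `candidates[i]` holding the feature values of the candidate counterparts of source `i` returns the learned match and background parameters together with the per-candidate match probabilities. A realistic version would also need a "no counterpart" hypothesis and multivariate features, which this sketch omits.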
Research Track Poster: A Cross-Collection Mixture Model for Comparative Text Mining
Abstract
In this paper, we define and study a novel text mining problem, which we refer to as Comparative Text Mining (CTM). Given a set of comparable text collections, the task of comparative text mining is to discover any latent common themes across all collections and to summarize the similarities and differences of these collections along each common theme. This general problem subsumes many interesting applications, including business intelligence and opinion summarization. We propose a generative probabilistic mixture model for comparative text mining. The model simultaneously performs cross-collection clustering and within-collection clustering, and can be applied to an arbitrary set of comparable text collections. The model can be estimated efficiently using the Expectation-Maximization (EM) algorithm. We evaluate the model on two different text data sets (i.e., a news article data set and a laptop review data set), and compare it with a baseline clustering method also based on a mixture model. Experimental results show that the model is quite effective in discovering the latent common themes across collections and performs significantly better than our baseline mixture model.
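To make the idea of combining cross-collection and within-collection clustering concrete, the sketch below shows one plausible form of the word-generation probability in such a mixture. The abstract does not give the formula, so the structure is an assumption: a background distribution mixed with themes, where each theme blends a word distribution shared by all collections with one specific to the current collection. The names `word_probability`, `lambda_b`, and `lambda_c` are hypothetical.

```python
def word_probability(word, theme_weights, background, common_themes,
                     specific_themes, lambda_b=0.9, lambda_c=0.5):
    """Probability of `word` in a document from one collection.

    theme_weights   -- per-document mixing weights over themes (sum to 1)
    background      -- dict word -> probability, shared background model
    common_themes   -- list of dicts, theme distributions shared across collections
    specific_themes -- list of dicts, theme distributions for this collection only
    lambda_b        -- assumed weight of the background model
    lambda_c        -- assumed weight of the common part within each theme
    """
    themed = 0.0
    for j, weight in enumerate(theme_weights):
        common = common_themes[j].get(word, 0.0)
        specific = specific_themes[j].get(word, 0.0)
        themed += weight * (lambda_c * common + (1.0 - lambda_c) * specific)
    return lambda_b * background.get(word, 0.0) + (1.0 - lambda_b) * themed
```

In an EM fit of a model like this, the E-step would assign each word occurrence a posterior over "background", "common theme j", and "collection-specific theme j", and the M-step would re-estimate the distributions from those soft counts. Only the likelihood term is sketched here.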
RANKING RELATIONS USING ANALOGIES IN BIOLOGICAL AND INFORMATION NETWORKS
2009
Abstract
Analogical reasoning depends fundamentally on the ability to learn and generalize about relations between objects. We develop an approach to relational learning which, given a set of pairs of objects $S = \{A^{(1)}:B^{(1)}, A^{(2)}:B^{(2)}, \ldots, A^{(N)}:B^{(N)}\}$, measures how well other pairs $A:B$ fit in with the set $S$. Our work addresses the following question: is the relation between objects $A$ and $B$ analogous to those relations found in $S$? Such questions are particularly relevant in information retrieval, where an investigator might want to search for analogous pairs of objects that match the query set of interest. There are many ways in which objects can be related, making the task of measuring analogies very challenging. Our approach combines a similarity measure on function spaces with Bayesian analysis to produce a ranking. It requires data containing features of the objects of interest and a link matrix specifying which relationships exist; no further attributes of such relationships are necessary. We illustrate the potential of our method on text analysis and information networks. An application on discovering functional interactions
Features
2004
Abstract
We develop an event detection framework that has two significant advantages over past work. First, we introduce an extended set of time-wise and object-wise statistical features including not only the trajectories but also histograms and HMMs of speed, orientation, location, size and aspect ratio. The proposed features are more expressive and enable detection of events that cannot be detected with the trajectory-based features reported so far. Second, we introduce a spectral clustering method that can estimate the optimal number of clusters automatically. This novel clustering technique is not adversely affected by high dimensionality. Unlike the conventional approaches that fit predefined models to events, we determine unusual events by analyzing the conformity scores. We compute affinity matrices and apply eigenvalue decomposition to find clusters to obtain the usual events. We prove that the number of clusters governs the number of eigenvectors used to span the feature similarity space. We also improve the feature selection process.
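The link between an affinity matrix, its eigenvalues, and the number of clusters can be sketched as follows. This is not the paper's estimation procedure, which the abstract does not spell out; it swaps in the standard eigengap heuristic on the normalized graph Laplacian as a stand-in. The eigenvalues are computed with a plain cyclic Jacobi iteration so the example needs no external libraries; all function names are hypothetical.

```python
import math


def jacobi_eigenvalues(matrix, max_sweeps=100, tol=1e-18):
    """Eigenvalues (ascending) of a symmetric matrix via Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[p][q] ** 2 for p in range(n) for q in range(p + 1, n))
        if off < tol:
            break
        for p in range(n):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (
                    abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def normalized_laplacian(affinity):
    """L = I - D^(-1/2) W D^(-1/2) for a symmetric affinity matrix W."""
    n = len(affinity)
    degree = [sum(row) for row in affinity]
    return [[(1.0 if i == j else 0.0)
             - affinity[i][j] / math.sqrt(degree[i] * degree[j])
             for j in range(n)] for i in range(n)]


def estimate_num_clusters(affinity):
    """Eigengap heuristic: k is where the sorted eigenvalues jump the most."""
    eig = jacobi_eigenvalues(normalized_laplacian(affinity))
    gaps = [eig[i + 1] - eig[i] for i in range(len(eig) - 1)]
    return max(range(len(gaps)), key=lambda i: gaps[i]) + 1
```

For an affinity matrix with strong within-group and weak between-group similarities, the Laplacian has one near-zero eigenvalue per group, so the largest gap falls right after them. The heuristic becomes unreliable when groups overlap heavily, in which case a validity score computed per candidate `k` is a common alternative.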