From Baby Steps to Leapfrog: How “Less is More” in unsupervised dependency parsing
- IN NAACL-HLT
Cited by 19 (5 self)
We present three approaches for unsupervised grammar induction that are sensitive to data complexity and apply them to Klein and Manning’s Dependency Model with Valence. The first, Baby Steps, bootstraps itself via iterated learning of increasingly longer sentences and requires no initialization. This method substantially exceeds Klein and Manning’s published scores and achieves 39.4% accuracy on Section 23 (all sentences) of the Wall Street Journal corpus. The second, Less is More, uses a low-complexity subset of the available data: sentences up to length 15. Focusing on fewer but simpler examples trades off quantity against ambiguity; it attains 44.1% accuracy, using the standard linguistically informed prior and batch training, beating state-of-the-art. Leapfrog, our third heuristic, combines Less is More with Baby Steps by mixing their models of shorter sentences, then rapidly ramping up exposure to the full training set, driving up accuracy to 45.0%. These trends generalize to the Brown corpus; awareness of data complexity may improve other parsing models and unsupervised algorithms.
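The Baby Steps schedule described above can be sketched as a curriculum loop. This is only an illustration under stated assumptions, not the authors' implementation: `train_fn` is a hypothetical stand-in for one round of training the Dependency Model with Valence, and `toy_train` merely records how many sentences each stage saw.

```python
from typing import Callable, List, Optional, Sequence

Sentence = Sequence[str]


def baby_steps(
    corpus: List[Sentence],
    train_fn: Callable[[List[Sentence], Optional[object]], object],
    max_len: int,
) -> Optional[object]:
    """Train on increasingly longer sentences, warm-starting each stage."""
    model = None  # the first stage starts without any initializer
    for cutoff in range(1, max_len + 1):
        subset = [s for s in corpus if len(s) <= cutoff]
        if not subset:
            continue  # no sentence this short in the corpus
        # Each stage is initialized with the model from the previous stage.
        model = train_fn(subset, model)
    return model


# Hypothetical trainer: it only logs the size of each stage's training set.
def toy_train(subset, previous):
    history = list(previous) if previous is not None else []
    history.append(len(subset))
    return history


corpus = [["a"], ["a", "b"], ["a", "b", "c"], ["x", "y"]]
stages = baby_steps(corpus, toy_train, max_len=3)
```

Less is More would correspond to a single call to `train_fn` on the subset with `cutoff = 15`, without the staged warm starts.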
Evidence Combination for Multi-Point Query Learning in Content-Based Image Retrieval
, 2004
Cited by 10 (6 self)
In Multi-Point Query Learning a number of query representatives are selected based on the positive feedback samples. The similarity score to a multi-point query is obtained from merging the individual scores. In this paper, we investigate three different combination strategies and present a comparative evaluation of their performance. Results show that the performance of multi-point queries relies heavily on the right choice of settings for the fusion. Unlike previous results, suggesting that multi-point queries generally perform better than a single query representation, our evaluation results do not allow such an overall conclusion. Instead our study points to the type of queries for which query expansion is better suited than a single query, and vice versa.
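Merging individual scores into one multi-point score can be sketched as follows. The abstract does not name its three combination strategies, so the `max`, `min`, and `mean` rules here are common fusion choices assumed purely for illustration; the `similarity` function and the feature vectors are likewise hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def similarity(a: Vector, b: Vector) -> float:
    """Map Euclidean distance into (0, 1]; 1 means identical vectors."""
    return 1.0 / (1.0 + math.dist(a, b))


# Assumed fusion rules; the strategies compared in the paper may differ.
FUSION: Dict[str, Callable[[List[float]], float]] = {
    "max": max,
    "min": min,
    "mean": lambda scores: sum(scores) / len(scores),
}


def multi_point_score(image: Vector, query_points: List[Vector], rule: str) -> float:
    """Score one image against every query representative, then fuse."""
    scores = [similarity(image, q) for q in query_points]
    return FUSION[rule](scores)


query_points = [(0.0, 0.0), (4.0, 0.0)]
image = (0.0, 0.0)
scores = {rule: multi_point_score(image, query_points, rule) for rule in FUSION}
```

An image that matches one representative exactly scores highest under `max` and lowest under `min`, which is one way the choice of fusion setting can change the ranking.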
An information fusion demonstrator for tactical intelligence processing in network-based defense
- Information Fusion
, 2007
Cited by 8 (6 self)
The Swedish Defence Research Agency (FOI) has developed a concept demonstrator called the Information Fusion Demonstrator 2003 (IFD03) for demonstrating information fusion methodology suitable for a future Network Based Defense (NBD) C4ISR system. The focus of the demonstrator is on real-time tactical intelligence processing at the division level in a ground warfare scenario. The demonstrator integrates novel force aggregation, particle filtering, and sensor allocation methods to create, dynamically update, and maintain components of a tactical situation picture. This is achieved by fusing physically modelled and numerically simulated sensor reports from several different sensor types with realistic a priori information sampled from both a high-resolution terrain model and an enemy organizational and behavioral model. This represents a key step toward the goal of creating in real time a dynamic, high fidelity representation of a moving battalion-sized organization, based on sensor data as well as a priori intelligence and terrain information, employing fusion, tracking, aggregation, and resource allocation methods all built on well-founded theories of uncertainty. The motives behind this project, the fusion methods developed for the system, as well as its scenario model and simulator architecture are described. The main services of the demonstrator are discussed and early experience from using the system is shared.
Sequential clustering with particle filtering - Estimating the number of clusters from data
- Proceedings of the Eighth International Conference on Information Fusion (FUSION 2005)
, 2005
Cited by 7 (4 self)
In this paper we develop a particle filtering approach for grouping observations into an unspecified number of clusters. Each cluster corresponds to a potential target from which the observations originate. A potential clustering with a specified number of clusters is represented by an association hypothesis. Whenever a new report arrives, a posterior distribution over all hypotheses is iteratively calculated from a prior distribution, an update model and a likelihood function. The update model is based on an association probability for clusters given the probability of false detection and a derived probability of an unobserved target. The likelihood of each hypothesis is derived from a cost value of associating the current report with its corresponding cluster according to the hypothesis. A set of hypotheses is maintained by Monte Carlo sampling. In this case, the state-space, i.e., the space of all hypotheses, is discrete with a linearly growing dimensionality over time. To lower the complexity further, hypotheses are combined if their clusters are close to each other in the observation space. Finally, for each time-step, the posterior distribution is projected into a distribution over the number of clusters. Compared to earlier information-theoretic approaches for finding the number of clusters, this approach does not require a large number of trial clusterings, since it maintains an estimate of the number of clusters along with the cluster configuration.
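The final projection step, from a posterior over association hypotheses to a distribution over the number of clusters, can be sketched as below. The representation is an assumption made for illustration: each particle is a tuple of cluster labels (one per report) paired with a weight. The prior, update model, and likelihood described in the abstract are not modelled here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed encoding: position i holds the cluster label of report i.
Hypothesis = Tuple[int, ...]


def cluster_count_distribution(
    particles: List[Tuple[Hypothesis, float]]
) -> Dict[int, float]:
    """Sum normalized particle weights by the number of distinct clusters."""
    total = sum(weight for _, weight in particles)
    dist: Dict[int, float] = defaultdict(float)
    for hypothesis, weight in particles:
        dist[len(set(hypothesis))] += weight / total
    return dict(dist)


particles = [
    ((0, 0, 1), 0.5),   # two clusters
    ((0, 1, 2), 0.25),  # three clusters
    ((0, 1, 1), 0.25),  # two clusters
]
distribution = cluster_count_distribution(particles)
most_likely = max(distribution, key=distribution.get)
```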
Supervised Clustering: Algorithms and Application
- Suffolk University Law Review
, 2005
Cited by 5 (3 self)
This work centers on a novel data mining technique we term supervised clustering. Unlike traditional clustering, supervised clustering assumes that the examples are classified and has the goal of identifying class-uniform clusters that have high probability densities. Three representative-based algorithms for supervised clustering are introduced: two greedy algorithms, SRIDHCR and SPAM, and an evolutionary computing algorithm named SCEC. The three algorithms were evaluated using a benchmark consisting of UCI machine learning datasets. Study of the solution landscape for the fitness function used by supervised clustering shows that the landscape seems to have a “Canyonland” shape, thereby increasing the difficulty of the clustering task for the greedy algorithms. Furthermore, we introduce a technique for class decomposition and demonstrate with experimental results how it could enhance the performance of simple classifiers. We also present a dataset editing technique, which we call supervised clustering editing (SCE), that replaces the examples of a learned cluster by the cluster representative. Our experimental results demonstrate how dataset editing techniques in general, and the SCE technique in particular, enhance the performance of NN classifiers. Other potential applications of supervised clustering, such as summary generation, discovery of interesting regions in spatial databases, and distance function learning, are discussed as well.
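The dataset editing idea (replace each learned cluster by its representative, then classify with nearest neighbour) can be sketched as follows. The details are assumptions, not the paper's procedure: the clusters are given rather than learned by SRIDHCR, SPAM, or SCEC, the representative is taken to be the medoid, and it is labelled with the cluster's majority class.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], str]


def medoid(cluster: List[Example]) -> Sequence[float]:
    """Point with the smallest total distance to the rest of its cluster."""
    return min(
        (p for p, _ in cluster),
        key=lambda p: sum(math.dist(p, q) for q, _ in cluster),
    )


def edit_dataset(clusters: List[List[Example]]) -> List[Example]:
    """Replace each cluster by one representative (assumed: medoid + majority class)."""
    edited = []
    for cluster in clusters:
        majority = Counter(label for _, label in cluster).most_common(1)[0][0]
        edited.append((medoid(cluster), majority))
    return edited


def nn_classify(edited: List[Example], x: Sequence[float]) -> str:
    """1-nearest-neighbour prediction against the edited dataset."""
    return min(edited, key=lambda ex: math.dist(ex[0], x))[1]


clusters = [
    [((0.0, 0.0), "a"), ((1.0, 0.0), "a"), ((2.0, 0.0), "a")],
    [((10.0, 0.0), "b"), ((11.0, 0.0), "b"), ((12.0, 0.0), "a")],
]
edited = edit_dataset(clusters)
prediction = nn_classify(edited, (9.0, 0.0))
```

Here six training examples shrink to two representatives, and the stray "a" example inside the second cluster no longer influences nearest-neighbour predictions.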
Learning states and rules for detecting anomalies in time series
- Applied Intelligence
Cited by 4 (1 self)
The normal operation of a device can be characterized in different temporal states. To identify these states, we introduce a segmentation algorithm called Gecko that can determine a reasonable number of segments using our proposed L method. We then use the RIPPER classification algorithm to describe these states in logical rules. Finally, transitional logic between the states is added to create a finite state automaton. Our empirical results, on data obtained from the NASA shuttle program, indicate that the Gecko segmentation algorithm is comparable to a human expert in identifying states, and our L method performs better than the existing permutation tests method when determining the number of segments to return in segmentation algorithms. Empirical results have also shown that our overall system can track normal behavior and detect anomalies.
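The abstract does not spell out the L method. A common reading of it is to look for the knee of an evaluation curve (error versus number of segments) by fitting two straight lines and choosing the split with the lowest size-weighted RMSE. The sketch below follows that reading; the weighting, the minimum of two points per side, and the sample curve are all assumptions for illustration.

```python
import math
from typing import Sequence, Tuple


def line_rmse(xs: Sequence[float], ys: Sequence[float]) -> float:
    """RMSE of the least-squares line through the points (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    sq_err = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sq_err / n)


def l_method_knee(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Return the x value where a two-line fit has the lowest weighted RMSE."""
    n = len(xs)
    best: Tuple[float, float] = (math.inf, xs[0])
    for split in range(2, n - 1):  # each side keeps at least two points
        left = line_rmse(xs[:split], ys[:split])
        right = line_rmse(xs[split:], ys[split:])
        # Weight each side's error by its share of the points.
        total = (split / n) * left + ((n - split) / n) * right
        if total < best[0]:
            best = (total, xs[split - 1])
    return best[1]


# Hypothetical evaluation curve: error drops steeply, then flattens.
num_segments = [1, 2, 3, 4, 5, 6, 7, 8]
error = [100, 70, 40, 10, 8, 7, 6, 5]
knee = l_method_knee(num_segments, error)
```

The returned value is the last point on the steep line, which is the suggested number of segments under these assumptions.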
Clustering decomposed belief functions using generalized weights of conflict
, 2008
Cited by 3 (3 self)
We develop a method for clustering all types of belief functions, in particular non-consonant belief functions. Such clustering is done when the belief functions concern multiple events, and all belief functions are mixed up. Clustering is performed by decomposing all belief functions into simple support and inverse simple support functions that are clustered based on their pairwise generalized weights of conflict, constrained by weights of attraction assigned to keep track of all decompositions. The generalized conflict $c \in (-\infty, \infty)$ and generalized weight of conflict $J \in (-\infty, \infty)$ are derived in the combination of simple support and inverse simple support functions.
EGO: A personalised multimedia management tool
- In Proc. of the 2nd Int. Workshop on Adaptive Multimedia Retrieval
, 2004
Cited by 3 (0 self)
The problems of Content-Based Image Retrieval (CBIR) systems can be attributed to the semantic gap between the low-level data representation and the high-level concepts the user associates with images, on the one hand, and the time-varying and often vague nature of the underlying information need, on the other. These problems can be addressed by improving the interaction between the user and the system. In this paper, we sketch the development of CBIR interfaces, and introduce our view on how to solve some of the problems of the studied interfaces. To address the semantic gap and long-term multifaceted information needs, we propose a “retrieval in context” system. EGO is a tool for the management of image collections, supporting the user through personalisation and adaptation. We will describe how it learns from the user’s personal organisation, allowing it to recommend relevant images to the user. The recommendation algorithm, which is based on relevance feedback techniques, is detailed.
Clump: A scalable and robust framework for structure discovery
- In ICDM ’05: Proceedings of the Fifth IEEE International Conference on Data Mining
, 2005
Cited by 2 (1 self)
We introduce a robust and efficient framework called CLUMP (CLustering Using Multiple Prototypes) for unsupervised discovery of structure in data. CLUMP relies on finding multiple prototypes that summarize the data. Clustering the prototypes enables our algorithm to scale up to extremely large and high-dimensional domains such as text data. Other desirable properties include robustness to noise and parameter choices. In this paper, we describe the approach in detail, characterize its performance on a variety of datasets, and compare it to some existing model selection approaches.
A Spectroscopy of Texts for Effective Clustering
- In: Proc. 8th PKDD
, 2004
Cited by 1 (1 self)
For many clustering algorithms, such as k-means, EM, and CLOPE, there is usually a requirement to set some parameters. Often, these parameters directly or indirectly control the number of clusters to return. In the presence of different data characteristics and analysis contexts, it is often difficult for the user to estimate the number of clusters in the data set. This is especially true in text collections such as Web documents, images or biological data. The fundamental question this paper addresses is: "How can we effectively estimate the natural number of clusters in a given text collection?". We propose to use spectral analysis, which analyzes the eigenvalues (not eigenvectors) of the collection, as the solution to the above. We first present the relationship between a text collection and its underlying spectra. We then show how the answer to this question enhances the clustering process. Finally, we conclude with empirical results and related work.

