A Polynomial Time Computable Metric Between Point Sets
, 2000
Cited by 35 (3 self)
Measuring the similarity or distance between two sets of points in a metric space is an important problem in machine learning and also has applications in other disciplines, e.g., in computational geometry, philosophy of science, methods for updating or changing theories, ... Recently Eiter and Mannila have proposed a new measure which is computable in polynomial time. However, it is not a distance function in the mathematical sense because it does not satisfy the triangle inequality.
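To make the triangle-inequality remark concrete, here is a minimal sketch using a sum-of-minimum-distances measure on point sets. That this is the specific measure the abstract has in mind is an assumption; the function name `sum_min_dist` and the example sets are hypothetical and chosen only for illustration.

```python
from typing import Callable, Sequence


def sum_min_dist(a: Sequence[float], b: Sequence[float],
                 d: Callable[[float, float], float]) -> float:
    """Half the sum of every point's distance to the nearest point of the other set.

    Assumed definition for illustration; not taken from the abstract itself.
    """
    a_to_b = sum(min(d(x, y) for y in b) for x in a)
    b_to_a = sum(min(d(x, y) for x in a) for y in b)
    return 0.5 * (a_to_b + b_to_a)


def abs_diff(x: float, y: float) -> float:
    """Ordinary distance on the real line (a true metric on points)."""
    return abs(x - y)


# Three small point sets on the real line (hypothetical example data).
A = [0.0]
B = [0.0, 10.0]
C = [10.0, 11.0]

d_ab = sum_min_dist(A, B, abs_diff)
d_bc = sum_min_dist(B, C, abs_diff)
d_ac = sum_min_dist(A, C, abs_diff)

# The direct route A -> C is longer than the detour through B,
# so the triangle inequality fails for this set measure.
print(d_ab, d_bc, d_ac, d_ac <= d_ab + d_bc)
```

Even though the underlying point distance is a metric, the lifted set measure is not, which is the kind of defect a true metric between point sets has to avoid.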
Using Semantic Roles to Improve Question Answering
- In Proceedings of EMNLP 2007
, 2007
Cited by 29 (1 self)
Shallow semantic parsing, the automatic identification and labeling of sentential constituents, has recently received much attention. Our work examines whether semantic role information is beneficial to question answering. We introduce a general framework for answer extraction which exploits semantic role annotations in the FrameNet paradigm. We view semantic role assignment as an optimization problem in a bipartite graph and answer extraction as an instance of graph matching. Experimental results on the TREC datasets demonstrate improvements over state-of-the-art models.
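The idea of treating role assignment as an optimization problem in a bipartite graph can be sketched as a maximum-weight assignment. The abstract does not say which algorithm the authors use, so the brute-force search below is only a stand-in; `best_assignment` and the score matrix are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def best_assignment(score: Sequence[Sequence[int]]) -> Tuple[List[int], int]:
    """Return the assignment of left nodes to distinct right nodes with maximum total score.

    score[i][j] is the compatibility of left node i (e.g. a constituent)
    with right node j (e.g. a role). Assumes len(score) <= len(score[0]).
    Exhaustive search: fine for tiny graphs, exponential in general.
    """
    n_left, n_right = len(score), len(score[0])
    best_perm, best_total = None, float("-inf")
    for perm in permutations(range(n_right), n_left):
        total = sum(score[i][perm[i]] for i in range(n_left))
        if total > best_total:
            best_perm, best_total = perm, total
    return list(best_perm), best_total


# Hypothetical scores: a greedy choice (row 0 takes column 0) would be suboptimal.
scores = [
    [9, 8],
    [8, 1],
]
assignment, total = best_assignment(scores)
print(assignment, total)
```

For realistic graph sizes a polynomial-time method such as the Hungarian algorithm would replace the exhaustive loop.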
Using Sets of Feature Vectors for Similarity Search on Voxelized CAD Objects
- IN PROC. ACM SIGMOD INT. CONF. ON MANAGEMENT OF DATA (SIGMOD’03)
, 2003
Cited by 28 (12 self)
In modern application domains such as multimedia, molecular biology and medical imaging, similarity search in database systems is becoming an increasingly important task. Especially for CAD applications, suitable similarity models can help to reduce the cost of developing and producing new parts by maximizing the reuse of existing parts. Most of the existing similarity models are based on feature vectors. In this paper, we briefly review three models which pursue this paradigm. Based on the most promising of these three models, we explain how sets of feature vectors can be used for more effective and still efficient similarity search. We first introduce an intuitive distance measure on sets of feature vectors together with an algorithm for its efficient computation. Furthermore, we present a method for accelerating the processing of similarity queries on vector set data. The experimental evaluation is based on two real world test data sets and points out that our new similarity approach yields more meaningful results in comparatively short time.
A Framework for Defining Distances Between First-Order Logic Objects
, 1998
Cited by 26 (3 self)
In this paper we develop a framework for distances between clauses and distances between models. The framework can be parametrised by a measure for the distance between atoms. It takes into account subterms common to distinct atoms of a set of atoms in the measurement of the distance between sets. Moreover, for a constant number of variables, the complexity of the distance computation is polynomially bounded by the size of the objects. Initial experiments show that the framework can be the basis of good clustering algorithms. The framework consists of three levels: At the first level one chooses a distance between atoms. The second level upgrades this distance to a distance between sets of atoms. We propose a framework that is a generalisation of three polynomial time computable similarity measures proposed by Eiter and Mannila, and an instance which is a real distance function, computable in polynomial time. We also develop a binary prototype function for sets of points. Prototype fun
Thesus: Organizing Web Document Collections Based on Link Semantics
- VLDB J
, 2003
Cited by 15 (3 self)
The requirements for effective search and management of the WWW are stronger than ever. Currently web documents are classified based on their content, not taking into account the fact that these documents are connected to each other by links. We claim that a page's classification is enriched by the detection of its incoming links' semantics. This would enable effective browsing and enhance the validity of search results in the WWW context. Another aspect that is under-addressed and is strictly related to the tasks of browsing and searching is the similarity of documents at the semantic level. The above observations lead us to the adoption of a hierarchy of concepts (ontology) and a thesaurus to exploit links and provide a better characterization of web documents. The enhancement of the documents' characterization makes operations such as clustering and labeling very interesting. To this end, we devised a system called THESUS. The system deals with an initial set of web documents, extracts keywords from all pages' incoming links and converts them to semantics by mapping them to a domain's ontology. Subsequently, a clustering algorithm is applied to discover groups of web documents. The effectiveness of the clustering process is based on the use of a novel similarity measure between documents characterized by sets of terms. Web documents are organized into thematic subsets based on their semantics. The subsets are then labeled, thus enabling easier management (browsing, searching, querying) of the Web. In this article, we detail the process of this system and give an experimental analysis of its results.
Context-Based Similarity Measures for Categorical Databases
- In PKDD
Cited by 15 (1 self)
Similarity between complex data objects is one of the central notions in data mining. We propose certain similarity (or distance) measures between various components of a 0/1 relation. We define measures between attributes, between rows, and between subrelations of the database. They find important applications in clustering, classification, and several other data mining processes. Our measures are based on the contexts of individual components. For example, two products (i.e., attributes) are deemed similar if their respective sets of customers (i.e., subrelations) are similar. This reveals more subtle relationships between components, something that is usually missing in simpler measures. Our problem of finding distance measures can be formulated as a system of nonlinear equations. We present an iterative algorithm which, when seeded with random initial values, converges quickly to stable distances in practice (typically requiring fewer than five iterations). The algorithm requires only one database scan. Results on artificial and real data show that our method is efficient and produces results with intuitive appeal.
Factors influencing the origins of colour categories
- Laboratory Vrije Universiteit Brussel
, 2002
Cited by 6 (2 self)
of the academic degree of Doctor of Science, to be defended in public on Friday, 8 March 2002. Acknowledgements I started as a research assistant in the Artificial Intelligence Laboratory in autumn 1996. My first interests were in behavioural robotics and robot ecosystems. As a continuation of my “licentiaats” thesis I started building a camera system to extend the sensory perception of the lab’s robots (Belpaeme and Birk, 1997a,b; Belpaeme, 1998; Birk and Belpaeme, 1998; Birk et al., 1998, 1999; Belpaeme and Birk, 2001). It was around that time that Luc Steels got interested in the origins of language. His early experiments formed the seed for what is now one of the most important paradigms for exploring linguistic interactions with computer simulations. Luc soon wanted more and had plans to implement a language experiment in the real world, for which I delivered the visual perception (Belpaeme et al., 1998; Belpaeme, 1999). This got me interested in visual features, and my research
Similarity Search on Time Series based on Threshold Queries
Cited by 5 (3 self)
Abstract. Similarity search in time series data is required in many application fields. The most prominent work has focused on similarity search considering either complete time series or similarity according to subsequences of time series. For many domains like financial analysis, medicine, environmental meteorology, or environmental observation, the detection of temporal dependencies between different time series is very important. In contrast to traditional approaches which consider the course of the time series for the purpose of matching, coarse trend information about the time series could be sufficient to solve the above mentioned problem. In particular, temporal dependencies in time series can be detected by determining the points of time at which the time series exceeds a specific threshold. In this paper, we introduce the novel concept of threshold queries in time series databases which report those time series exceeding a user-defined query threshold at similar time frames compared to the query time series. We present a new efficient access method which uses the fact that only partial information of the time series is required at query time. The performance of our solution is demonstrated by an extensive experimental evaluation on real world and artificial time series data.
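The threshold idea can be illustrated with a small sketch that reduces each series to the time frames in which it exceeds a threshold and then compares those frames. The Jaccard overlap used here is an assumed stand-in, not the similarity or access method of the paper; `threshold_intervals`, `frame_overlap`, and the sample data are hypothetical.

```python
from typing import List, Sequence, Tuple


def threshold_intervals(series: Sequence[float], tau: float) -> List[Tuple[int, int]]:
    """Return half-open index intervals [start, end) where series[i] > tau."""
    intervals = []
    start = None
    for i, value in enumerate(series):
        if value > tau and start is None:
            start = i                      # series rises above the threshold
        elif value <= tau and start is not None:
            intervals.append((start, i))   # series drops back to or below it
            start = None
    if start is not None:                  # still above tau at the end
        intervals.append((start, len(series)))
    return intervals


def frame_overlap(a: Sequence[float], b: Sequence[float], tau: float) -> float:
    """Jaccard overlap of the time indices at which a and b exceed tau (assumed measure)."""
    ia = {i for s, e in threshold_intervals(a, tau) for i in range(s, e)}
    ib = {i for s, e in threshold_intervals(b, tau) for i in range(s, e)}
    union = ia | ib
    return len(ia & ib) / len(union) if union else 1.0


query = [0, 2, 3, 1, 0, 4, 5, 0]
other = [0, 0, 3, 3, 0, 4, 0, 0]
print(threshold_intervals(query, 1.5))
print(frame_overlap(query, other, 1.5))
```

Only the crossing intervals are needed for the comparison, which matches the abstract's remark that partial information about each series suffices at query time.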
Kernel-based distances for relational learning
- In Proceedings of the Workshop on Multi-Relational Data Mining at KDD-2004
, 2004
Cited by 4 (0 self)
Abstract. In this paper we present a novel and general framework for kernel-based learning over relational schemata. We exploit the notion of foreign keys to perform the leap from a flat attribute-value representation to a structured representation that underlies relational learning. We define a new attribute type, which we call instance-set, that builds on the notion of foreign keys. It is shown that this more database-oriented approach enables intuitive modeling of relational problems. We also define some kernel functions over relational schemata and adapt them so that they are used as a basis for a relational instance-based learning algorithm. We check the performance of our algorithm on a number of well-known relational benchmark datasets.
Distance semantics for database repair
- Annals of Mathematics and Artificial Intelligence
Cited by 4 (0 self)
Abstract. In many scenarios, a database instance violates a given set of integrity constraints. In such cases, it is often required to repair the database, that is, to restore its consistency. A primary motif behind the repairing approaches is the principle of minimal change, which is the aspiration to keep the recovered data as faithful as possible to the original (inconsistent) database. In this paper, we represent this qualitative principle quantitatively, in terms of distance functions and some underlying metrics, and so introduce a general framework for repairing inconsistent databases by distance-based considerations. The uniform way of representing repairs and their semantics clarifies the essence behind several approaches to consistency restoration in database systems, helps to compare the underlying formalisms, and relates them to existing methods of defining belief revision operators, merging data sets, and integrating information systems.

