Query answering and ontology population: An inductive approach
- IN PROC. ESWC-2008
Cited by 13 (10 self)
In order to overcome the limitations of deductive logic-based approaches to deriving operational knowledge from ontologies, especially when data come from distributed sources, inductive (instance-based) methods may be better suited, since they are usually efficient and noise-tolerant. In this paper we propose an inductive method for improving instance retrieval and enriching the ontology population. By casting retrieval as a classification problem whose goal is to assess the class memberships of individuals w.r.t. the query concepts, we propose an extension of the k-Nearest Neighbor algorithm for OWL ontologies based on an entropic distance measure. The procedure can classify individuals w.r.t. the known concepts, and it can also be used to retrieve the individuals belonging to query concepts. Experimentally, we show that the behavior of the classifier is comparable with that of a standard reasoner. Moreover, we show that new knowledge (not logically derivable) is induced; this knowledge can be suggested to the knowledge engineer for validation during the ontology population task.
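As a rough illustration of casting retrieval as classification, the sketch below predicts an individual's membership value for one query concept by a distance-weighted vote among its k nearest training individuals. Assumptions not taken from the abstract: memberships are encoded as +1 (instance of the concept), -1 (instance of its complement) and 0 (unknown); `dist` is a placeholder for the paper's entropic distance measure, which is not implemented here; the 1/d² weighting is a common k-NN choice, not necessarily the authors'. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Callable, Hashable, Sequence, Tuple


def knn_membership(query, training: Sequence[Tuple[Hashable, int]],
                   dist: Callable, k: int = 3) -> int:
    """Predict the membership value (+1, -1 or 0) of `query` for one concept.

    `dist` stands in for the (not implemented) entropic distance measure.
    """
    # The k training individuals closest to the query.
    neighbors = sorted(training, key=lambda item: dist(query, item[0]))[:k]
    votes = defaultdict(float)
    for individual, label in neighbors:
        d = dist(query, individual)
        if d == 0:
            # Identical to a known individual: reuse its membership value.
            return label
        # Closer neighbors get a larger say (assumed 1/d^2 weighting).
        votes[label] += 1.0 / d ** 2
    # Membership value with the largest accumulated weight.
    return max(votes, key=votes.get)
```

Retrieval for a query concept would then return the individuals whose predicted value is +1; those predicted +1 without being logically entailed are the kind of induced knowledge the abstract suggests handing to the knowledge engineer.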
Effective Proximity Retrieval by Ordering Permutations
, 2007
Cited by 13 (4 self)
We introduce a new probabilistic proximity search algorithm for range and K-nearest neighbor (K-NN) searching in both coordinate and metric spaces. Although there exist solutions for these problems, they boil down to a linear scan when the space is intrinsically high-dimensional, as is the case in many pattern recognition tasks. This, for example, renders the K-NN approach to classification rather slow in large databases. Our novel idea is to predict closeness between elements according to how they order their distances towards a distinguished set of anchor objects. Each element in the space sorts the anchor objects from closest to farthest to it, and the similarity between orders turns out to be an excellent predictor of the closeness between the corresponding elements. We present extensive experiments comparing our method against state-of-the-art exact and approximate techniques, both in synthetic and real, metric and non-metric databases, measuring both CPU time and distance computations. The experiments demonstrate that our technique almost always improves upon the performance of alternative techniques, in some cases by a wide margin.
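The permutation idea can be sketched as follows. Each database element stores the order in which it sees the anchors; at query time, elements are ranked by how similar their permutation is to the query's, and only the most promising ones are compared with the query using the real distance. Assumptions: permutations are compared with Spearman's footrule (sum of absolute position differences), and `budget` is a hypothetical parameter for how many candidates are actually checked. Since candidates outside the budget are never examined, the answer is approximate, which matches the probabilistic nature of the method.

```python
from typing import Callable, List, Sequence


def permutation(x, anchors: Sequence, dist: Callable) -> List[int]:
    """Anchor indices ordered from closest to farthest from x."""
    return sorted(range(len(anchors)), key=lambda i: dist(x, anchors[i]))


def footrule(p: List[int], q: List[int]) -> int:
    """Spearman's footrule: sum over anchors of |position in p - position in q|."""
    pos_q = {a: i for i, a in enumerate(q)}
    return sum(abs(i - pos_q[a]) for i, a in enumerate(p))


def approx_knn(query, database: Sequence, anchors: Sequence, dist: Callable,
               k: int, budget: int) -> List[int]:
    """Indices of (approximately) the k nearest database elements to `query`."""
    # In a real index these permutations are computed once, at build time.
    perms = [permutation(x, anchors, dist) for x in database]
    q_perm = permutation(query, anchors, dist)
    # Rank elements by how similar their anchor order is to the query's.
    order = sorted(range(len(database)), key=lambda i: footrule(q_perm, perms[i]))
    # Only the `budget` most promising elements are compared with the real distance.
    candidates = sorted(order[:budget], key=lambda i: dist(query, database[i]))
    return candidates[:k]
```

For example, with `database = [0, 1, 2, 10, 11, 12]`, `anchors = [0, 12]` and absolute difference as the distance, a query at 1.5 sees the anchors in the same order as elements 0, 1 and 2, so only those three are checked and `approx_knn(1.5, database, anchors, dist, 2, 3)` returns `[1, 2]`.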
Large-Scale Malware Indexing Using Function-Call Graphs
Cited by 9 (0 self)
A major challenge for the anti-virus (AV) industry is how to effectively process the huge influx of malware samples it receives every day. One possible solution to this problem is to quickly determine whether a new malware sample is similar to any previously seen malware program. In this paper, we design, implement and evaluate a malware database management system called SMIT (Symantec Malware Indexing Tree) that can efficiently make such a determination based on malware function-call graphs, a structural representation known to be less susceptible to the instruction-level obfuscations commonly employed by malware writers to evade detection by AV software. Because each malware program is represented as a graph, the problem of searching a database for the malware program most similar to a given sample is cast as a nearest-neighbor search problem in a graph database. To speed
Graphs for Metric Space Searching
, 2008
Cited by 7 (6 self)
The problem of Similarity Searching consists in finding the elements of a set that are similar to a given query under some criterion. If the similarity is expressed by means of a metric, the problem is called Metric Space Searching. In this thesis we present new methodologies to solve this problem using graphs G(V,E) to represent the metric database. In G, the set V corresponds to the objects of the metric space and E to a small subset of edges from V × V, whose weights are computed according to the metric of the space under consideration. In particular, we study k-nearest neighbor graphs (knngs). The knng is a weighted graph connecting each element of V (or, equivalently, each object of the metric space) to its k nearest neighbors. We develop algorithms both to construct knngs in general metric spaces, and to use
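For reference, the most direct way to build a knng is to compute all pairwise distances and keep the k smallest for each object. The sketch below does exactly that, with O(n²) distance evaluations; it is a naive baseline assuming a symmetric `dist` callable, not one of the construction algorithms developed in the thesis, which aim to avoid this quadratic cost. The adjacency-list layout `{u: [(v, weight), ...]}` and the function name are illustrative choices.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple


def build_knng(objects: Sequence, dist: Callable, k: int) -> Dict[int, List[Tuple[int, float]]]:
    """Naive k-nearest-neighbor graph: object index -> [(neighbor index, edge weight)]."""
    graph = {}
    for u in range(len(objects)):
        # k smallest (distance, neighbor) pairs among all other objects;
        # ties on distance are broken by the smaller neighbor index.
        nearest = heapq.nsmallest(
            k,
            ((dist(objects[u], objects[v]), v) for v in range(len(objects)) if v != u),
        )
        # Edge weights are the metric distances, as in the knng definition.
        graph[u] = [(v, d) for d, v in nearest]
    return graph
```

With the points `[0, 1, 3, 7]`, absolute difference and `k = 2`, the point 3 (index 2) gets edges to 1 (weight 2) and 0 (weight 3).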
Spatial selection of sparse pivots for similarity search in metric spaces
- IN: SOFSEM 2007: 33RD CONFERENCE ON CURRENT TRENDS IN THEORY AND PRACTICE OF COMPUTER SCIENCE. LNCS (4362
, 2007
Cited by 7 (2 self)
Similarity search is a fundamental operation for applications that deal with unstructured data sources. In this paper we propose a new pivot-based method for similarity search, called Sparse Spatial Selection (SSS). The main characteristic of this method is that it guarantees a good pivot selection more efficiently than previously proposed methods. In addition, SSS adapts itself to the dimensionality of the metric space at hand, without the need to specify in advance the number of pivots to use. Furthermore, SSS is dynamic, that is, it supports object insertions into the database efficiently; it can work with both continuous and discrete distance functions; and it is suitable for secondary-memory storage. In this work we provide experimental results that confirm the advantages of the method on several vector and metric spaces. We also show that the efficiency of our proposal is similar to that of existing methods over vector spaces, and better over general metric spaces.
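The abstract does not spell out the selection rule. As we understand the SSS method, it works roughly as follows, and the sketch below assumes this form: with M the maximum distance in the space and α a fraction in (0, 1), an object becomes a new pivot when its distance to every current pivot is at least αM. Because the test is applied object by object, the number of pivots is not fixed in advance and newly inserted objects can be checked the same way. `max_dist` is assumed to be known or estimated; the function name is hypothetical.

```python
from typing import Callable, List, Sequence


def sss_pivots(objects: Sequence, dist: Callable, alpha: float, max_dist: float) -> List:
    """Select pivots incrementally with a sparse spatial selection rule (assumed form)."""
    pivots: List = []
    threshold = alpha * max_dist
    for x in objects:  # objects are processed in insertion order
        # all() over an empty list is True, so the first object is always a pivot;
        # later objects become pivots only if they are far from every current pivot.
        if all(dist(x, p) >= threshold for p in pivots):
            pivots.append(x)
    return pivots
```

For the points `[0, 1, 5, 6, 10]` with α = 0.4 and M = 10, the threshold is 4 and the selected pivots are `[0, 5, 10]`: 1 and 6 are each too close to an existing pivot.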
Randomized Metric Induction and Evolutionary Conceptual Clustering for Semantic Knowledge Bases
- ACM-CIKM 2007
, 2007
Cited by 6 (4 self)
We present an evolutionary clustering method that can be applied to multi-relational knowledge bases storing resource annotations expressed in the standard languages of the Semantic Web. The method exploits an effective, language-independent semi-distance measure defined for the space of individual resources, based on a finite number of dimensions corresponding to a committee of discriminating features (represented by concept descriptions). A maximally discriminating group of features can be obtained with the randomized optimization methods described in the paper. The clustering algorithm represents the possible clusterings as variable-length strings of central elements (medoids w.r.t. the given metric). Hence, the number of clusters is not required as a parameter, since the method is able to find an optimal choice by means of the evolutionary operators and a proper fitness function. We also show how to assign each cluster a newly constructed intensional definition in the employed concept language. Experiments with several ontologies show the feasibility of our method and its effectiveness in terms of clustering validity indices.
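A semi-distance built on a committee of features can be sketched as follows. The abstract does not give the formula, so its form here is an assumption: each feature concept acts as a projection returning 1 if the knowledge base entails that the individual is an instance of it, 0 if it entails membership in the complement, and 1/2 otherwise; the distance is the L_p combination of per-feature differences divided by the committee size. The second function only shows how individuals would be assigned to one candidate string of medoids; the evolutionary search over medoid strings and the fitness function are not shown. The toy projections in the usage note are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

# A projection maps an individual to 1.0 (entailed instance of the feature
# concept), 0.0 (entailed instance of its complement) or 0.5 (unknown).
Projection = Callable[[str], float]


def semi_distance(a: str, b: str, committee: Sequence[Projection], p: int = 1) -> float:
    """L_p combination of per-feature projection differences, divided by committee size."""
    total = sum(abs(pi(a) - pi(b)) ** p for pi in committee)
    return total ** (1.0 / p) / len(committee)


def assign_to_medoids(individuals: Sequence[str], medoids: Sequence[str],
                      committee: Sequence[Projection]) -> Dict[str, List[str]]:
    """Assign each individual to its closest medoid under the semi-distance."""
    clusters: Dict[str, List[str]] = {m: [] for m in medoids}
    for x in individuals:
        nearest = min(medoids, key=lambda m: semi_distance(x, m, committee))
        clusters[nearest].append(x)
    return clusters
```

It is only a semi-distance because two different individuals with identical projections on every feature are at distance 0. With a two-feature committee where `a` projects to (1, 0), `b` to (1, 0.5), `c` to (0, 1) and `d` to (0, 0.5), the medoid string `["a", "c"]` yields the clusters `{a, b}` and `{c, d}`.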
Similarity search using sparse pivots for efficient multimedia information retrieval
- IN: PROC. OF THE 8TH IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM’06
, 2006
Cited by 6 (2 self)
Similarity search is a fundamental operation for applications that deal with unstructured data sources. In this paper we propose a new pivot-based method for similarity search, called Sparse Spatial Selection (SSS). This method guarantees a good pivot selection more efficiently than previously proposed methods. In addition, SSS adapts itself to the dimensionality of the metric space at hand, and the number of pivots to extract need not be specified in advance. Furthermore, SSS is dynamic: it supports object insertions into the database efficiently, it can work with both continuous and discrete distance functions, and it is suitable for secondary-memory storage. In this work we provide experimental results that confirm the advantages of the method on several vector and metric spaces.
A Dynamic Pivot Selection Technique for Similarity Search
Cited by 6 (4 self)
All pivot-based algorithms for similarity search use a set of reference points called pivots. A pivot-based search algorithm precomputes some distances to these reference points, which are used to discard objects during a search without comparing them directly with the query. Although most of the algorithms proposed to date select these reference points at random, previous works have shown that selecting them intelligently matters for index performance. However, the proposed pivot selection techniques need to know the complete database beforehand to obtain good results, which inevitably makes the index static. More recent works have addressed this problem, proposing techniques that dynamically select pivots as the database grows. This paper presents a new technique for choosing pivots that combines the good properties of previous proposals with the recently proposed dynamic selection. The experimental evaluation in this paper shows that the new technique outperforms state-of-the-art methods for selecting pivots.
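The pruning step described in this abstract (discarding objects without comparing them with the query) relies on the triangle inequality: for any pivot p, |d(q, p) - d(o, p)| ≤ d(q, o), so if that lower bound exceeds the search radius r, object o cannot be an answer. The sketch below applies this filter to a range query over a precomputed object-to-pivot table. It illustrates pivot-based search in general, not the paper's pivot selection technique, and the function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def build_table(database: Sequence, pivots: Sequence, dist: Callable) -> List[List[float]]:
    """Precompute d(o, p) for every object o and pivot p (done at index build time)."""
    return [[dist(o, p) for p in pivots] for o in database]


def range_search(query, r: float, database: Sequence, pivots: Sequence,
                 table: List[List[float]], dist: Callable) -> Tuple[List[int], int]:
    """Return (indices within distance r of query, number of objects compared directly)."""
    dq = [dist(query, p) for p in pivots]
    result, compared = [], 0
    for i, o in enumerate(database):
        # Triangle inequality: |d(q,p) - d(o,p)| is a lower bound on d(q,o).
        if any(abs(dqp - dop) > r for dqp, dop in zip(dq, table[i])):
            continue  # discarded without comparing o with the query
        compared += 1
        if dist(query, o) <= r:
            result.append(i)
    return result, compared
```

With the objects `[0, 1, 2, 8, 9, 10]`, pivots `[0, 10]`, query 1.5 and radius 1, only objects 1 and 2 survive the filter, so two direct comparisons are made instead of six. Better-placed pivots tighten the lower bounds, which is why pivot selection affects index performance.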
Combinatorial algorithms for nearest neighbors, near-duplicates and small-world design
- In Proceedings of the 20th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA’09
, 2009
Cited by 6 (1 self)
We study the so-called combinatorial framework for algorithmic problems in similarity spaces. Namely, the input dataset is represented by a comparison oracle that, given three points x, y, y′, answers whether y or y′ is closer to x. We assume that the similarity order of the dataset satisfies the four variations of the following disorder inequality: if x is the a-th most similar object to y and y is the b-th most similar object to z, then x is among the D(a + b) most similar objects to z, where D is a relatively small disorder constant. Although the oracle gives much less information than the standard general metric space model, where distance values are given, one can still design very efficient algorithms for various fundamental computational tasks. For nearest neighbor search we present a deterministic and exact algorithm with almost linear time and space complexity of preprocessing, and near-logarithmic time complexity of search. Then, for near-duplicate detection we present the first known deterministic algorithm that requires just near-linear time plus time proportional to the size of the output. Finally, we show that for any dataset satisfying the disorder inequality a visibility graph can be constructed: all out-degrees are near-logarithmic and greedy routing deterministically converges to the nearest neighbor of a target in a logarithmic number of steps. The latter result is the first known work-around for Navarro’s impossibility of generalizing Delaunay graphs. The technical contribution of the paper consists of handling “false positives” in data structures and an algorithmic technique, up-aside-down-filter.
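To make the combinatorial model concrete, the sketch below wraps an ordinary distance function as a comparison oracle and then computes ranks, and the smallest disorder constant D of a tiny dataset, using oracle calls only. Ranks are 1-based positions among the other points, so rank_y(x) = a means x is the a-th most similar object to y; this convention, the tie handling and all function names are assumptions for illustration. Only the form of the inequality quoted above is checked, by brute force over ordered triples, so it is meant for small inputs; the paper's search and near-duplicate algorithms are not reproduced here.

```python
from functools import cmp_to_key
from itertools import permutations


def make_oracle(dist):
    """Wrap a distance function as a comparison oracle (x, y, y2) -> -1, 0 or 1."""
    def oracle(x, y, y2):
        # -1: y is closer to x than y2; 1: y2 is closer; 0: tie
        dy, dy2 = dist(x, y), dist(x, y2)
        if dy < dy2:
            return -1
        if dy > dy2:
            return 1
        return 0
    return oracle


def rank(oracle, points, center, target):
    """1-based position of `target` among the other points, ordered by similarity to `center`."""
    others = [p for p in points if p != center]
    # Sorting needs only pairwise comparisons, so the oracle is enough.
    ordered = sorted(others, key=cmp_to_key(lambda u, v: oracle(center, u, v)))
    return ordered.index(target) + 1


def disorder_constant(oracle, points):
    """Smallest D with rank_z(x) <= D * (rank_y(x) + rank_z(y)) over distinct x, y, z."""
    d = 0.0
    for x, y, z in permutations(points, 3):
        a = rank(oracle, points, y, x)  # x is the a-th most similar object to y
        b = rank(oracle, points, z, y)  # y is the b-th most similar object to z
        c = rank(oracle, points, z, x)  # position of x as seen from z
        d = max(d, c / (a + b))
    return d
```

For the points `[0, 1, 3, 7]` with absolute difference, `rank(oracle, points, 3, 7)` is 3, since from 3 the order of the other points is 1, 0, 7.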
DLMedia: An ontology mediated multimedia information retrieval system
- IN: PROC. DL-2007
, 2007
Cited by 5 (4 self)
We outline DLMedia, an ontology-mediated multimedia information retrieval system, which combines logic-based retrieval with multimedia feature-based similarity retrieval. An ontology layer may be used to define (in terms of a DLR-Lite-like description logic) the relevant abstract concepts and relations of the application domain, while a content-based multimedia retrieval system is used for feature-based retrieval.

