F.: Unifying reasoning and search to web scale
- IEEE Internet Computing, 2007
Cited by 31 (0 self)
We recently heard about a telecom project that required reasoning about 10 billion RDF triples (statements of the form <subject, relation, object>) in less than 100 ms. The use case was defined around generating revenue streams through new context-sensitive and personalized mobile services. Existing approaches can handle Resource Description Framework Schema (RDFS) queries for roughly 100 million triples in 100 ms, but this project required sophisticated reasoning with a set of triples that’s two orders of magnitude larger — and the requirements will certainly grow. Indeed, scale requirements could increase much faster over time than any progress in reasoning algorithms, clever coding, and improved hardware can compensate. Being forced to turn away potential customers led us to wonder why this problem even existed. Problems usually become intractable through improper conceptualization — asking intelligence to introduce assumptions that make the problem solvable, on the one hand, without restricting them on the other hand to ensure usefulness. So the question is: Why isn’t reasoning scaling for the Web and how can this be fixed?
Querying the Web: A Multiontology Disambiguation Method
- 2006
Cited by 30 (10 self)
The lack of explicit semantics in the current Web can lead to ambiguity problems: for example, current search engines return unwanted information since they do not take into account the exact meaning given by the user to the keywords used. Though disambiguation is a very well-known problem in Natural Language Processing and other domains, traditional methods are not flexible enough to work in a Web-based context. In this paper we identify some desirable properties that a Web-oriented disambiguation method should fulfill, and make a proposal according to them. The proposed method processes a set of related keywords in order to discover and extract their implicit semantics, obtaining their most suitable senses according to their context. The possible senses are extracted from the knowledge represented by a pool of ontologies available on the Web. This method applies an iterative disambiguation algorithm that uses a semantic relatedness measure based on Google frequencies. Our proposal makes explicit the semantics of keywords by means of ontology terms; this information can be used for different purposes, such as improving the search and retrieval of underlying relevant information.
Automatic Extraction of Meaning from the Web
- IEEE International Symposium on Information Theory, 2006
Cited by 8 (1 self)
We consider similarity distances for two types of objects: literal objects that as such contain all of their meaning, like genomes or books, and names for objects. The latter may have literal embodiments like the first type, but may also be abstract like “red” or “christianity.” For the first type we consider a family of computable distance measures corresponding to parameters expressing similarity according to particular features between pairs of literal objects. For the second type we consider similarity distances generated by web users corresponding to particular semantic relations between the (names for) the designated objects. For both families we give universal similarity distance measures, incorporating all particular distance measures in the family. In the first case the universal distance is based on compression and in the second case it is based on Google page counts related to search terms. In both cases experiments on a massive scale give evidence of the viability of the approaches.
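The page-count distance mentioned in this abstract is usually written as the normalized Google distance, NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f counts the pages matching a query and N is the number of indexed pages. The abstract itself does not state the formula, so the following is only a minimal sketch under that assumption. The function name, its parameters, and the page counts in the usage lines are hypothetical; no search engine is queried.

```python
import math


def normalized_google_distance(fx: int, fy: int, fxy: int, n: int) -> float:
    """Sketch of the normalized Google distance from raw page counts.

    fx, fy -- hypothetical page counts for each term on its own
    fxy    -- hypothetical page count for pages containing both terms
    n      -- assumed total number of indexed pages
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # No hits, or the terms never co-occur: treat as maximally distant.
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


# Illustrative, made-up counts (not real search results).
close_pair = normalized_google_distance(1000, 1000, 1000, 10**6)
loose_pair = normalized_google_distance(1000, 1000, 10, 10**6)
print(close_pair, loose_pair)
```

Terms that always appear together get a distance of zero, and the distance grows as the joint count shrinks relative to the individual counts.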
Using Semantic Distances for Reasoning with Inconsistent Ontologies
Cited by 5 (0 self)
Re-using and combining multiple ontologies on the Web is bound to lead to inconsistencies between the combined vocabularies. Even many of the ontologies that are in use today turn out to be inconsistent once some of their implicit knowledge is made explicit. However, robust and efficient methods to deal with inconsistencies are lacking from current Semantic Web reasoning systems, which are typically based on classical logic. In earlier papers, we have proposed the use of syntactic relevance functions as a method for reasoning with inconsistent ontologies. In this paper, we extend that work to the use of semantic distances. We show how Google distances can be used to develop semantic relevance functions to reason with inconsistent ontologies. In essence we are using the implicit knowledge hidden in the Web for explicit reasoning purposes. We have implemented this approach as part of the PION reasoning system. We report on experiments with several realistic ontologies. The test results show that a mixed syntactic/semantic approach can significantly improve reasoning performance over the purely syntactic approach. Furthermore, our methods allow trading off computational cost against inferential completeness. Our experiment shows that we only have to give up a little quality to obtain a high performance gain.
Similarity of objects and the meaning of words
- In Proc. 3rd Annual Conference on Theory and Applications of Models of Computation (TAMC’06), volume 3959 of LNCS, 2006
Cited by 4 (0 self)
We survey the emerging area of compression-based, parameter-free, similarity distance measures useful in data-mining, pattern recognition, learning and automatic semantics extraction. Given a family of distances on a set of objects, a distance is universal up to a certain precision for that family if it minorizes every distance in the family between every two objects in the set, up to the stated precision (we do not require the universal distance to be an element of the family). We consider similarity distances for two types of objects: literal objects that as such contain all of their meaning, like genomes or books, and names for objects. The latter may have literal embodiments like the first type, but may also be abstract like “red” or “christianity.” For the first type we consider a family of computable distance measures corresponding to parameters expressing similarity according to particular features between pairs of literal objects. For the second type we consider similarity distances generated by web users corresponding to particular semantic relations between the (names for) the designated objects. For both families we give universal similarity distance measures, incorporating all particular distance measures in the family. In the first case the universal distance is based on compression and in the second case it is based on Google page counts related to search terms. In both cases experiments on a massive scale give evidence of the viability of the approaches.
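The compression-based distance surveyed here is commonly written as the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C gives the compressed length. The abstract does not spell the formula out, so the sketch below rests on that assumption. It uses zlib as a stand-in compressor, which is a choice made for this illustration and not one named in the abstract; the function names and sample inputs are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (stand-in for C)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Sketch of the normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


# Illustrative inputs: repeated English text versus a repeated byte ramp.
text = b"the quick brown fox jumps over the lazy dog " * 20
ramp = bytes(range(256)) * 4
print(ncd(text, text), ncd(text, ramp))
```

Because a real compressor only approximates the ideal, the value for identical inputs is small but typically not exactly zero.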
Dialogue Systems: Simulations or Interfaces
- in Dialor'05: Proceedings of the ninth workshop on the semantics and pragmatics of dialogue, C. Gardent and
, 2005
Cited by 3 (0 self)
This paper raises the question of the aim and scope of formal research on dialogue. Two possible answers are distinguished – the “engineering” and the “simulation” view – and an argument against the soundness of the “simulation” position is reviewed. This argument centres on the (im)possibility of formalising the context (or “background”) needed for human-level language understanding. This argument is then applied to formal dialogue research and some consequences are discussed.
Social search and discovery using a unified approach
- In Proceedings of HyperText
Cited by 3 (1 self)
We explore new ways of improving a search engine using data from Web 2.0 applications such as blogs and social bookmarks. This data contains entities such as documents, people and tags, and relationships between them. We propose a simple yet effective method, based on faceted search, that treats all entities in a unified manner: returning all of them (documents, people and tags) on every search, and allowing all of them to be used as search terms. We describe an implementation of such a social search engine on the intranet of a large enterprise, and present large-scale experiments which verify the validity of our approach.
Using Term-matching Algorithms for the Annotation of Geo-services
- Web Mining 2.0, Workshop at ECML, 2007
Cited by 2 (0 self)
This paper presents an approach for automating semantic annotation within service-oriented architectures that provide interfaces to databases of spatial-information objects. The automation of the annotation process facilitates the transition from the current state-of-the-art architectures towards semantically-enabled architectures. We see the annotation process as the task of matching an arbitrary word or term with the most appropriate concept in the domain ontology. The term matching techniques that we present are based on text mining. To determine the similarity between two terms, we first associate a set of documents [that we obtain from a Web search engine] with each term. We then transform the documents into feature vectors and thus transition the similarity assessment into the feature space. After that, we compute the similarity by training a classifier to distinguish between ontology concepts. Apart from text mining approaches, we also present two alternative techniques, namely hypothesis checking (i.e. using linguistic patterns such as “term 1 is a term 2” as a query to a search engine) and Google Distance.
Using Ontologies and the Web to Learn Lexical Semantics
Cited by 1 (0 self)
A variety of text processing tasks require or benefit from semantic resources such as ontologies and lexicons. Creating these resources manually is tedious, time consuming, and prone to error. We present a new algorithm for using the web to determine the correct concept in an existing ontology to lexicalize previously unknown words, such as might be discovered while processing texts. A detailed empirical comparison of our algorithm with two existing algorithms (Cilibrasi & Vitanyi 2004, Maedche et al. 2002) is described, leading to insights into the sources of the algorithms’ strengths and weaknesses.
EFFECTIVE SOLUTIONS FOR NAME LINKAGE AND THEIR APPLICATIONS
In order to identify entities, their names (e.g., the names of persons or movies) are among the most commonly chosen identifiers. However, since names are often ambiguous and not unique, confusion inevitably occurs. In particular, when a variety of names are used for the same real-world entity, detecting all variants and consolidating them into a single canonical entity is a significant problem. This problem is known as the record linkage or entity resolution problem. To solve it effectively, we first propose a novel approach that advocates the use of the Web as the source of collective knowledge, especially for cases where current approaches fail due to incompleteness of data. Secondly, we attempt to mine semantic knowledge hidden in the entity context and use it with the existing approaches to improve the performance further. Finally, we address the mixed type of linkage problem, where contents of different entities are mixed in the same pool. Our goal is to group different contents into different clusters by focusing on extraction of the most relevant input pieces. We also illustrate the use of the proposed techniques in various real-world applications.

