Towards Breaking the Quality Curse. A Web-Querying Approach to Web People Search.
Cited by 7 (5 self)
Searching for people on the Web is one of the most common query types submitted to web search engines today. However, when a person name is queried, the returned webpages often contain documents related to several distinct namesakes who have the queried name. The task of disambiguating and finding the webpages related to the specific person of interest is left to the user. Many Web People Search (WePS) approaches have been developed recently that attempt to automate this disambiguation process. Nevertheless, the disambiguation quality of these techniques leaves major room for improvement. This paper presents a new server-side WePS approach. It is based on collecting co-occurrence information from the Web and thus it uses the Web as an external data source. A skyline-based classification technique
Comparing Classification Tree Structures: A Special Case of Comparing q-Ary Relations, 1997
Cited by 3 (0 self)
Comparing q-ary relations on a set O of elementary objects is one of the most fundamental problems of classification and combinatorial data analysis. In this paper the specific comparison task that involves classification tree structures (binary or not) is considered in this context. Two mathematical representations are proposed. One is defined in terms of a weighted binary relation; the second uses a four-ary relation. The most classical approaches to tree comparison are discussed in the context of a set-theoretic representation of these relations. Formal and combinatorial computing aspects of a construction method for a very general family of association coefficients between relations are presented. The main purpose of this article is to specify the components of this construction, based on a permutational procedure, when the structures to be compared are classification trees.
Knowledge Discovery From Symbolic Data And The Sodas Software
Conf. on Principles and Practice of Knowledge Discovery in Databases, PPKDD-2000, 2000
Cited by 1 (0 self)
The data descriptions of the units are called "symbolic" when they are more complex than the standard ones because they contain internal variation and are structured. Symbolic data arise from many sources, for instance when summarising huge relational databases by their underlying concepts. "Extracting knowledge" means getting explanatory results; that is why "symbolic objects" are introduced and studied in this paper. They model concepts and constitute an explanatory output for data analysis. Moreover, they can be used to define queries on a relational database and to propagate concepts between databases. We define "Symbolic Data Analysis" (SDA) as the extension of standard data analysis to symbolic data tables as input in order to find symbolic objects as output. In this paper we give an overview of recent developments in SDA. We present some tools and methods of SDA and introduce the SODAS software prototype (resulting from the work of 17 teams from nine countries involved in a European project of EUROSTAT).
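As an illustration of what a description with "internal variation" can look like, the sketch below summarises ordinary records into one description per concept, with an interval for each numeric variable and a set of observed values for each categorical variable. This is only a minimal sketch under our own assumptions: the function name `summarise`, the field names, and the choice of intervals and sets are hypothetical, and none of it reflects the SODAS software or its interface.

```python
from collections import defaultdict


def summarise(records, concept_key, numeric_vars, categorical_vars):
    """Summarise flat records into one symbolic description per concept.

    Hypothetical helper, not part of SODAS. Numeric variables become
    (min, max) intervals; categorical variables become sets of values.
    """
    # Group the individual records by the concept they belong to.
    groups = defaultdict(list)
    for record in records:
        groups[record[concept_key]].append(record)

    descriptions = {}
    for concept, rows in groups.items():
        description = {}
        for var in numeric_vars:
            values = [row[var] for row in rows]
            description[var] = (min(values), max(values))
        for var in categorical_vars:
            description[var] = {row[var] for row in rows}
        descriptions[concept] = description
    return descriptions


# Made-up example data, for illustration only.
records = [
    {"town": "A", "age": 25, "job": "clerk"},
    {"town": "A", "age": 40, "job": "nurse"},
    {"town": "B", "age": 31, "job": "clerk"},
]
symbolic_table = summarise(records, "town", ["age"], ["job"])
```

Each entry of `symbolic_table` is then a row of a symbolic data table in the loose sense used above: a unit described by intervals and sets rather than by single values.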
Similarity measures for binary and numerical data: a survey
Abstract: Similarity measures aim at quantifying the extent to which objects resemble each other. Many techniques in data mining, data analysis or information retrieval require a similarity measure, and selecting an appropriate measure for a given problem is a difficult task. In this paper, the diverse forms similarity measures can take are examined, as well as their relationships and respective properties. Their semantic differences are highlighted and numerical tools to quantify these differences are proposed, considering several points of view and including global and local comparisons, order-based and value-based comparisons, and mathematical properties such as derivability. The paper studies similarity measures for two types of data: binary and numerical data, i.e., set data represented by the presence or absence of characteristics and data represented by real vectors.
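To make the two data types concrete, the sketch below computes one well-known measure for each: the Jaccard index for binary data (objects given as sets of present characteristics) and cosine similarity for numerical data (objects given as real vectors). These two measures are our own choice of examples; the abstract does not say which measures the survey covers, and the function names are hypothetical.

```python
import math


def jaccard(a, b):
    """Jaccard index between two sets of present characteristics."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumed convention: two empty descriptions count as identical.
        return 1.0
    return len(a & b) / len(union)


def cosine(x, y):
    """Cosine similarity between two real vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


# Binary data: share 2 of 4 distinct characteristics.
binary_sim = jaccard({"a", "b", "c"}, {"b", "c", "d"})

# Numerical data: orthogonal vectors.
numeric_sim = cosine([1.0, 0.0], [0.0, 1.0])
```

Here `binary_sim` is 0.5 and `numeric_sim` is 0.0. The Jaccard index ignores characteristics absent from both objects, and cosine similarity ignores vector length. Differences in behaviour of this kind are what the abstract refers to as semantic differences between measures.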
Search engines are among the most important Web techno...
Searching for people on the Web is one of the most common query types submitted to Web search engines today. However, when a person name is queried, the returned Webpages often contain documents related to several distinct namesakes who have the queried name. The task of disambiguating and finding the Webpages related to the specific person of interest is left to the user. Many Web People Search (WePS) approaches have been developed recently that attempt to automate this disambiguation process. Nevertheless, the disambiguation quality of these techniques leaves major room for improvement. In this article, we present a new WePS approach. It is based on issuing additional auxiliary queries to the Web to gain additional knowledge about the Webpages that need to be disambiguated. Thus, the approach uses the Web as an external data source by issuing queries to collect co-occurrence statistics. These statistics are used to assess the overlap of the contextual entities extracted from the Webpages. The article also proposes a methodology to make this Web querying technique efficient. Further, the article proposes an approach that is capable of combining various types of disambiguating information, including other common types of similarities, by applying a correlation clustering approach with after-clustering of singleton clusters. These properties allow

