Hierarchical Clustering for Datamining
- in Proceedings of KES-2001 Fifth International Conference on Knowledge-Based Intelligent Information Engineering Systems & Allied Technologies, 2001
Cited by 7 (2 self)
This paper presents hierarchical probabilistic clustering methods for unsupervised and supervised learning in datamining applications. The probabilistic clustering is based on the previously suggested Generalizable Gaussian Mixture model. A soft version of the Generalizable Gaussian Mixture model is also discussed. The proposed hierarchical scheme is agglomerative and based on an L2 distance metric. The unsupervised and supervised schemes are successfully tested on artificial data and on the segmentation of e-mails.
Probabilistic Hierarchical Clustering with Labeled and Unlabeled Data
- International Journal of Knowledge-Based Intelligent Engineering Systems, 2001
Cited by 6 (2 self)
This paper presents hierarchical probabilistic clustering methods for unsupervised and supervised learning in datamining applications, where supervised learning is performed using both labeled and unlabeled examples. The probabilistic clustering is based on the previously suggested Generalizable Gaussian Mixture model and is extended using a modified Expectation Maximization procedure for learning with both unlabeled and labeled examples. The proposed hierarchical scheme is agglomerative and based on probabilistic similarity measures. Here, we compare an L2 dissimilarity measure, an error confusion similarity, and an accumulated posterior cluster probability measure. The unsupervised and supervised schemes are successfully tested on artificial data and on e-mail segmentation.
Hypergraph Models and Algorithms for Data-Pattern Based Clustering
- DATA MINING AND KNOWLEDGE DISCOVERY, 2004
Cited by 6 (1 self)
In traditional approaches for clustering market basket type data, relations among transactions are modeled according to the items occurring in these transactions. However, an individual item might induce different relations in different contexts. Since such contexts might be captured by interesting patterns in the overall data, we represent each transaction as a set of patterns by modifying the conventional pattern semantics. By clustering the patterns in the dataset, we infer a clustering of the transactions represented this way. For this, we propose a novel hypergraph model to represent the relations among the patterns. Instead of a local measure that depends only on common items among patterns, we propose a global measure that is based on the co-occurrences of these patterns in the overall data. The success of existing hypergraph partitioning based algorithms in other domains depends on the sparsity of the hypergraph and on explicit objective metrics. For this, we propose a two-phase clustering approach for the above hypergraph, which is expected to be dense. In the first phase, the vertices of the hypergraph are merged in a multilevel algorithm to obtain a large number of high quality clusters. Here, we propose new quality metrics for merging decisions in hypergraph clustering specifically for this domain. To enable the use of existing metrics in the second phase, we introduce a vertex-to-cluster affinity concept to devise a method for constructing a sparse hypergraph based on the obtained clustering. The experiments we have performed show the effectiveness of the proposed framework.
Efficient Discovery of Services Specified in Description Logics Languages
Cited by 5 (3 self)
Semantic service descriptions are frequently given using expressive ontology languages based on description languages. The expressiveness of these languages, however, often implies problems for efficient service discovery, especially when increasing numbers of services become available in large organizations and on the Web. To remedy this problem, we propose an efficient service discovery/retrieval method grounded on a conceptual clustering approach, where services are specified in Description Logics as class definitions [10] and are retrieved by defining a class expression as a query and by computing the individual subsumption relationship between the query and the available descriptions. We present a new conceptual clustering method that constructs tree indices for clustered services, where available descriptions are the leaf nodes, while inner nodes are intensional descriptions (generalizations) of their child nodes. The matchmaking is performed by following the tree branches whose nodes might satisfy the query. Query answering time may improve substantially, since the number of retrieval steps may decrease from O(n) to O(log n) for concise queries. We also show that the proposed method is sound and complete.
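The tree-index idea in this abstract can be illustrated with a toy sketch. It is not the paper's algorithm: descriptions are modeled here as plain attribute/value dictionaries (an assumed stand-in for Description Logics class definitions), subsumption is approximated by "the description satisfies every constraint of the query", and all names (`Node`, `generalize`, `retrieve`, the sample services) are hypothetical.

```python
from dataclasses import dataclass, field


def subsumed_by(desc, query):
    """True if desc satisfies every attribute/value constraint in query."""
    return all(desc.get(a) == v for a, v in query.items())


def may_match(general, query):
    """False only if the generalization contradicts the query on a shared attribute."""
    return all(general[a] == v for a, v in query.items() if a in general)


def generalize(descs):
    """Attribute/value pairs shared by all given descriptions."""
    first, *rest = descs
    return {a: v for a, v in first.items() if all(d.get(a) == v for d in rest)}


@dataclass
class Node:
    general: dict                      # leaf: the description; inner: generalization
    children: list = field(default_factory=list)
    name: str = ""                     # set for leaves only


def leaf(name, desc):
    return Node(general=desc, name=name)


def inner(*children):
    return Node(general=generalize([c.general for c in children]),
                children=list(children))


def retrieve(node, query):
    """Follow only the branches whose generalization might satisfy the query."""
    if not may_match(node.general, query):
        return []                      # whole subtree pruned
    if not node.children:
        return [node.name] if subsumed_by(node.general, query) else []
    found = []
    for child in node.children:
        found.extend(retrieve(child, query))
    return found


flights = inner(leaf("cheap_flights", {"type": "flight", "class": "economy"}),
                leaf("lux_flights", {"type": "flight", "class": "business"}))
hotels = inner(leaf("city_hotels", {"type": "hotel", "area": "city"}),
               leaf("beach_hotels", {"type": "hotel", "area": "beach"}))
root = inner(flights, hotels)

print(retrieve(root, {"type": "hotel", "area": "beach"}))
```

In this toy model a subtree is skipped as soon as its shared attributes contradict the query, which is where the saving over a linear scan of all descriptions would come from when the tree is reasonably balanced.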
Textual Similarity based on Proper Names
- Proceedings of the workshop Mathematical/Formal Methods in Information Retrieval (MFIR’2002) at the 25th ACM SIGIR Conference, 2002
Cited by 5 (0 self)
Proper names represent about 10% of English or French newspaper articles. Their quantity and informational quality are already used in different Information Extraction systems. Proper names have been widely studied in the MUC conferences designed to promote research in Information Extraction. We have created our own named entity extraction tool based on a linguistic description with automata. The extracted names are used in an information retrieval process: we want to cluster journalistic texts with a high precision level and to provide a description of the topic of the clusters. We verify the value of using proper names in a similarity measure to improve clustering.
Information Retrieval on the Web: Selected Topics
- IBM Research, Tokyo Research Laboratory, IBM, 1999
Cited by 4 (0 self)
In this paper we review studies on the growth of the Internet and technologies which are useful for information search and retrieval on the Web. In the first section, we present data on the Internet from several different sources, e.g., current as well as projected number of users, hosts and Web sites. Although the numerical figures vary, the overall trends cited by the sources are consistent and point to exponential growth during the coming decade. And Internet users are increasingly using search engines and search services to find specific information of interest. However, users are not satisfied with the performance of the current generation of search engines; the slow speed of retrieval, communication delays, and poor quality of retrieved results (e.g., noise and broken links) are commonly cited problems. The main body of our paper focuses on linear algebraic models and techniques for solving these problems. keywords: clustering, indexing, information retrieval, Internet, late...
Vocabulary Problem in Internet Resource Discovery
- in Proceedings of the Second International Workshop on Next Generation Information Technologies and Systems, Naharia, 1994
Cited by 4 (2 self)
When searching for information in a retrieval system, people use a variety of terms to describe their information needs. When the terms used in a query are different from those indexed by the system, users fail to obtain the information they want. This is called the vocabulary problem. This problem has been studied and discussed in information retrieval for decades. Recently, Deerwester et al. proposed a new technique based on singular value decomposition and obtained promising results. In this paper, we describe how to apply this technique to Internet resource discovery.
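The singular value decomposition technique mentioned in this abstract can be sketched as follows. This is a minimal illustration under assumptions, not the paper's system: it uses NumPy, a made-up five-term, three-document matrix, and hypothetical helper names (`lsi_fit`, `lsi_project`, `cosine`). The query term "car" never occurs in the second document, yet the two end up close in the reduced space because both co-occur with "engine".

```python
import numpy as np


def lsi_fit(term_doc, k):
    """Rank-k truncated SVD of a term-by-document matrix (rows: terms)."""
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Term basis U_k, singular values S_k, and one k-dim row per document (V_k).
    return u[:, :k], s[:k], vt[:k, :].T


def lsi_project(query_vec, u_k, s_k):
    """Fold a query term vector into the latent space: q^T U_k S_k^{-1}."""
    return (query_vec @ u_k) / s_k


def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


# Hypothetical toy data. Rows: car, automobile, engine, flower, petal.
term_doc = np.array([
    [1, 0, 0],   # car
    [0, 1, 0],   # automobile
    [1, 1, 0],   # engine
    [0, 0, 1],   # flower
    [0, 0, 1],   # petal
], dtype=float)

u_k, s_k, docs_k = lsi_fit(term_doc, k=2)
query = np.array([1, 0, 0, 0, 0], dtype=float)   # the single term "car"
q_k = lsi_project(query, u_k, s_k)
sims = [cosine(q_k, d) for d in docs_k]
print(sims)
```

The choice of `k` controls how aggressively terms are merged into shared latent dimensions; the value 2 here is only chosen to suit the toy matrix.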
Efficient Discovery of Services Specified in Description Logics Languages
Cited by 3 (3 self)
Semantic service descriptions are frequently given using expressive ontology languages based on description languages. The expressiveness of these languages, however, often implies problems for efficient service discovery, especially when increasing numbers of services become available in large organizations and on the Web. To remedy this problem, we propose an efficient service discovery/retrieval method grounded on a conceptual clustering approach, where services are specified in Description Logics as class definitions [10] and are retrieved by defining a class expression as a query and by computing the individual subsumption relationship between the query and the available descriptions. We present a new conceptual clustering method that constructs tree indices for clustered services, where available descriptions are the leaf nodes, while inner nodes are intensional descriptions (generalizations) of their child nodes. The matchmaking is performed by following the tree branches whose nodes might satisfy the query. Query answering time may improve substantially, since the number of retrieval steps may decrease from O(n) to O(log n) for concise queries. We also show that the proposed method is sound and complete.
Adaptive Web Sites: Concept and Case Study
- In Artificial Intelligence, 2001
Cited by 3 (0 self)
In this article, we consider the basic design decisions that underlie any kind of adaptive web site, and consider several kinds of adaptive web sites by way of illustration. Next, to demonstrate the potential power of adaptive web sites, we consider the index page synthesis case study in more depth. We introduce the IndexFinder page synthesis system and describe its application to a live Web site. More detail about our algorithms and experiments can be found in [6].

