Unsupervised Learning from Dyadic Data
, 1998
Cited by 89 (9 self)
Dyadic data refers to a domain with two finite sets of objects in which observations are made for dyads, i.e., pairs with one element from either set. This includes event co-occurrences, histogram data, and single stimulus preference data as special cases. Dyadic data arises naturally in many applications ranging from computational linguistics and information retrieval to preference analysis and computer vision. In this paper, we present a systematic, domain-independent framework for unsupervised learning from dyadic data by statistical mixture models. Our approach covers different models with flat and hierarchical latent class structures and unifies probabilistic modeling and structure discovery. Mixture models provide both a parsimonious yet flexible parameterization of probability distributions with good generalization performance on sparse data and structural information about data-inherent grouping structure. We propose an annealed version of the standard Expectation Maximization algorithm for model fitting, which is empirically evaluated on a variety of data sets from different domains.
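As a rough illustration of the idea in this abstract, the sketch below fits a flat latent class mixture P(x, y) = sum_z P(z) P(x|z) P(y|z) to a co-occurrence count matrix. The abstract does not give the annealing scheme, so the form used here is an assumption: the E-step posterior is raised to a power `beta` (an inverse temperature) that is stepped up to 1. The function name `annealed_em`, the `betas` schedule, and the smoothing constant are all hypothetical choices, not the paper's.

```python
import random

def annealed_em(counts, n_classes, betas=(0.5, 0.75, 1.0), iters_per_beta=20, seed=0):
    """Fit P(x, y) = sum_z P(z) P(x|z) P(y|z) to a count matrix with tempered EM.

    counts[x][y] is the number of observations of the dyad (x, y).
    The beta schedule is an assumed annealing scheme, not the paper's.
    """
    rng = random.Random(seed)
    n_x, n_y = len(counts), len(counts[0])

    def normalize(values):
        total = sum(values)
        return [v / total for v in values]

    # Random positive initialization of the class-conditional distributions.
    p_z = [1.0 / n_classes] * n_classes
    p_x_z = [normalize([rng.random() + 0.1 for _ in range(n_x)]) for _ in range(n_classes)]
    p_y_z = [normalize([rng.random() + 0.1 for _ in range(n_y)]) for _ in range(n_classes)]

    for beta in betas:
        for _ in range(iters_per_beta):
            # Accumulators start at a tiny constant so no probability becomes exactly zero.
            new_z = [1e-12] * n_classes
            new_x = [[1e-12] * n_x for _ in range(n_classes)]
            new_y = [[1e-12] * n_y for _ in range(n_classes)]
            for x in range(n_x):
                for y in range(n_y):
                    n = counts[x][y]
                    if n == 0:
                        continue
                    # Annealed E-step: the posterior over z is flattened when beta < 1.
                    post = normalize(
                        [(p_z[z] * p_x_z[z][x] * p_y_z[z][y]) ** beta for z in range(n_classes)]
                    )
                    for z in range(n_classes):
                        weight = n * post[z]
                        new_z[z] += weight
                        new_x[z][x] += weight
                        new_y[z][y] += weight
            # M-step: renormalize the expected counts.
            p_z = normalize(new_z)
            p_x_z = [normalize(row) for row in new_x]
            p_y_z = [normalize(row) for row in new_y]
    return p_z, p_x_z, p_y_z
```

With `betas=(1.0,)` this reduces to plain EM for the same model; lower starting values of `beta` keep early assignments soft, which is the usual motivation for annealing on sparse counts.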
Functional Analysis and
- Semi-Groups, Amer. Math. Soc. Colloq. Publ
, 1957
Cited by 70 (1 self)
In this paper we describe a Cross Document Summarizer XDoX designed specifically to summarize large document sets (50-500 documents and more). Such sets of documents are typically obtained from routing or filtering systems run against a continuous stream of data, such as a newswire. XDoX works by identifying the most salient themes within the set (at the granularity level that is regulated by the user) and composing an extraction summary, which reflects these main themes. In the current version, XDoX is not optimized to produce a summary based on a few unrelated documents; indeed, such summaries are best obtained simply by concatenating summaries of individual documents. We show examples of summaries obtained in our tests as well as from our participation in the first Document
Map Displays for Information Retrieval
- Journal of the American Society for Information Science
, 1997
Cited by 66 (1 self)
The focus of this article is to develop a map display for information retrieval. Through an examination of relationships among visual displays, information retrieval, and browsing, advantages of visual displays for information retrieval are characterized as (1) the ability to convey a large amount of information in a limited space, (2) the potential to reveal semantic relationships of terms and documents, and (3) the facilitation of browsing and perceptual inferences on retrieval interfaces. These advantages are further demonstrated through a map display generated by a neural network’s self-organizing algorithm. The map display detects complex relationships among given documents, and reveals the relationships through a spatial arrangement of terms abstracted from the documents. The map display also provides interactive tools to allow the user to interact with the underlying information. Examples of the map displays show that such map displays can be used both as an overview tool and an access or exploration tool, and the map displays will likely increase the amount of information that the user is willing to browse.
to stack in the dark, without knowing what stacks that we have walked through. We can get a few books each time and walk out of the library to see if the book we are searching for is among them. If not, we have to walk in again, based on our knowledge and our previous experience, to a location where we hope the book would be. In this situation, success in finding a book greatly depends on whether we can walk to the right place in the dark (to generate a good query), and whether we know how to adjust our locations until we get to the right place (to modify queries interactively). Can we turn on the light for such a library? Can we develop some visible cues in our retrieval systems so that we can use our perception for information seeking in the digital environment? Answers to these questions may reside in the recent advance of information technology. “Information processing has been evolving from numerical computation to character handling, and now to visual information processing” (Kunii, 1989). As a result, visu-
Learning approaches for Detecting and Tracking News Events
- IEEE Intelligent Systems
, 1999
Cited by 58 (6 self)
This paper studies the effective use of information retrieval and machine learning techniques in a new task, event detection and tracking. The objective is to automatically detect novel events from chronologically-ordered streams of news stories, and track events of interest over time. We extended existing supervised learning and unsupervised clustering algorithms to allow document classification based on both information content and temporal aspects of events. A task-oriented evaluation was conducted using Reuters and CNN news stories. We found agglomerative document clustering highly effective (82% in the F1 measure) for retrospective event detection, and single-pass clustering with time windowing a better choice for on-line alerting of novel events. We also observed robust learning behavior for k-nearest neighbor (kNN) classification and a decision-tree approach in event tracking, under the difficult condition when the number of positive training examples is extremely small.
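The single-pass clustering with time windowing mentioned in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's method: stories are represented by raw term counts rather than any weighting scheme the authors may have used, similarity is plain cosine, and the names `single_pass`, `threshold`, and `window` are hypothetical. A story that matches no cluster updated within the window is flagged as a novel event.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def single_pass(stories, threshold=0.3, window=3):
    """Cluster (time, text) stories in one chronological pass.

    Returns one (cluster_id, is_novel) pair per story. Only clusters updated
    within `window` time units are candidates (assumed hard cut-off).
    """
    clusters = []  # each cluster: {"centroid": Counter, "last_seen": time}
    labels = []
    for time, text in stories:
        vec = Counter(text.lower().split())
        best_id, best_sim = None, 0.0
        for cid, cluster in enumerate(clusters):
            if time - cluster["last_seen"] > window:
                continue  # cluster is outside the time window
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best_id, best_sim = cid, sim
        if best_id is not None and best_sim >= threshold:
            # Merge the story into the closest recent cluster.
            clusters[best_id]["centroid"].update(vec)
            clusters[best_id]["last_seen"] = time
            labels.append((best_id, False))
        else:
            # No sufficiently similar recent cluster: treat as a novel event.
            clusters.append({"centroid": vec, "last_seen": time})
            labels.append((len(clusters) - 1, True))
    return labels
```

For example, with the default settings a story at time 10 that repeats the wording of a story at time 0 still starts a new cluster, because the earlier cluster has fallen outside the window.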
Information retrieval on the Web
- ACM Computing Surveys
, 2000
Cited by 58 (0 self)
In this paper we review studies of the growth of the Internet and technologies that are useful for information search and retrieval on the Web. We present data on the Internet from several different sources, e.g., current as well as projected number of users, hosts, and Web sites. Although numerical figures vary, overall trends cited
DEMON: Mining and Monitoring Evolving Data
- IEEE Transactions on Knowledge and Data Engineering
, 2000
Cited by 49 (1 self)
Data mining algorithms have been the focus of much research recently. In practice, the input data to a data mining process resides in a large data warehouse whose data is kept up-to-date through periodic or occasional addition and deletion of blocks of data. Most data mining algorithms have either assumed that the input data is static, or have been designed for arbitrary insertions and deletions of data records.
Scalable, Distributed Data Mining Using An Agent Based Architecture
- Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, AAAI Press, Menlo Park, California
, 1997
Cited by 44 (7 self)
Algorithm scalability and the distributed nature of both data and computation deserve serious attention in the context of data mining. This paper presents PADMA (PArallel Data Mining Agents), a parallel agent based system, that makes an effort to address these issues. PADMA contains modules for (1) parallel data accessing operations, (2) parallel hierarchical clustering, and (3) web-based data visualization. This paper describes the general architecture of PADMA and experimental results.
Hillol Kargupta, Ilker Hamzaoglu, Brian Stafford. Computational Science Methods Group, X Division, Los Alamos National Laboratory, P.O. Box 1663, MS F645, Los Alamos, NM 87545. LAUR-96-3491; shorter version published in the Proceedings of High Performance Computing'97 & Knowledge Discovery and Data Mining'97.
Evaluating Document Clustering for Interactive Information Retrieval
- In Proceedings of the Tenth International Conference on Information and Knowledge Management (CIKM)
, 2001
Cited by 43 (3 self)
We consider the problem of organizing and browsing the top ranked portion of the documents returned by an information retrieval system. We study the effectiveness of a document organization in helping a user to locate the relevant material among the retrieved documents as quickly as possible. In this context we examine a set of clustering algorithms and experimentally show that a clustering of the retrieved documents can be significantly more effective than the traditional ranked list approach. We also show that the clustering approach can be as effective as interactive relevance feedback based on query expansion while retaining an important advantage -- it provides the user with a valuable sense of control over the feedback process.
Document Expansion for Speech Retrieval
, 1999
Cited by 42 (1 self)
Advances in automatic speech recognition allow us to search large speech collections using traditional information retrieval methods. The problem of "aboutness" for documents --- is a document about a certain concept --- has been at the core of document indexing for the entire history of IR. This problem is more difficult for speech indexing since automatic speech transcriptions often contain mistakes. In this study we show that document expansion can be successfully used to alleviate the effect of transcription mistakes on speech retrieval. The loss
The cluster-abstraction model: Unsupervised learning of topic hierarchies from text data
- In IJCAI'99
, 1999
Cited by 41 (0 self)
This paper presents a novel statistical latent class model for text mining and interactive information access. The described learning architecture, called Cluster-Abstraction Model (CAM), is purely data driven and utilizes context-specific word occurrence statistics. In an intertwined fashion, the CAM extracts hierarchical relations between groups of documents as well as an abstractive organization of keywords. An annealed version of the Expectation-Maximization (EM) algorithm for maximum likelihood estimation of the model parameters is derived. The benefits of the CAM for interactive retrieval and automated cluster summarization are investigated experimentally.

