Results 1-10 of 18
A Probabilistic Learning Approach for Document Indexing
ACM Transactions on Information Systems, 1991
Cited by 93 (13 self)
We describe a method for probabilistic document indexing using relevance feedback data that has been collected from a set of queries. Our approach is based on three new concepts: (1) Abstraction from specific terms and documents, which overcomes the restriction of limited relevance information for parameter estimation. (2) Flexibility of the representation, which allows the integration of new text analysis and knowledge-based methods in our approach as well as the consideration of document structures or different types of terms. (3) Probabilistic learning or classification methods for the estimation of the indexing weights, making better use of the available relevance information. Our approach can be applied under restrictions that hold for real applications. We give experimental results for five test collections which show improvements over other indexing methods.
Machine learning for information retrieval: neural networks, symbolic learning, and genetic algorithms
Journal of the American Society for Information Science, 1995
Cited by 66 (9 self)
Information retrieval using probabilistic techniques has attracted significant attention on the part of researchers in information and computer science over the past few decades. In the 1980s, knowledge-based techniques also made an impressive contribution to “intelligent” information retrieval and indexing. More recently, information science researchers have turned to other newer artificial-intelligence-based inductive learning techniques including neural networks, symbolic learning, and genetic algorithms. These newer techniques, which are grounded on diverse paradigms, have provided great opportunities for researchers to enhance the information processing and retrieval capabilities of current information storage and retrieval systems. In this article, we first provide an overview of these newer techniques and their use in information science research. To familiarize readers with these techniques, we present three popular methods: the connectionist Hopfield network; the symbolic ID3/ID5R; and evolution-based genetic algorithms. We discuss their knowledge representations and algorithms in the context of information retrieval. Sample implementation and testing results from our own research are also provided for each technique. We believe these techniques are promising in their ability to analyze user queries, identify users’ information needs, and suggest alternatives for search. With proper user-system interactions, these methods can greatly complement the prevailing full-text, keyword-based, probabilistic, and knowledge-based techniques.
A probabilistic framework for vague queries and imprecise information in databases
Proceedings of the 16th International Conference on Very Large Databases, 1990
Cited by 58 (13 self)
A probabilistic learning model for vague queries and missing or imprecise information in databases is described. Instead of retrieving only a set of answers, our approach yields a ranking of objects from the database in response to a query. By using relevance judgements from the user about the objects retrieved, the ranking for the actual query as well as the overall retrieval quality of the system can be further improved. For specifying different kinds of conditions in vague queries, the notion of vague predicates is introduced. Based on the underlying probabilistic model, imprecise or missing attribute values can also be treated easily. In addition, the corresponding formulas can be applied in combination with standard predicates (from two-valued logic), thus extending standard database systems for coping with missing or imprecise data.
Probabilistic Information Retrieval as Combination of Abstraction, Inductive Learning and Probabilistic Assumptions
1994
Cited by 24 (1 self)
We show that former approaches in probabilistic information retrieval are based on one or two of the three concepts abstraction, inductive learning and probabilistic assumptions, and we propose a new approach which combines all three concepts. This approach is illustrated for the case of indexing with a controlled ...
On the Necessity of Term Dependence in a Query Space for Weighted Retrieval
Journal of the American Society for Information Science, 1998
Cited by 13 (2 self)
In recent years, in the context of the vector space model, the view, held by many researchers, that documents, queries, terms, etc. are all elements of a common space has been challenged (Bollmann-Sdorra and Raghavan, 1993). In particular, it was noted that term independence has to be investigated in the context of user preferences and it was shown, through counterexamples, that term independence can hold in the document space, but not in the query space and vice versa. In this paper, we continue the investigation of query and document spaces with respect to the property of term independence. We prove, under realistic assumptions, that requiring term independence to hold in the query space is inconsistent with the goal of achieving better performance by means of weighted retrieval. The result that term independence in the query space is undesirable is obtained without making any assumption about whether or not the property of term independence holds in the document space. The result...
Combining Model-Oriented and Description-Oriented Approaches for Probabilistic Indexing
Cited by 12 (7 self)
We distinguish model-oriented and description-oriented approaches in probabilistic information retrieval. The former refer to certain representations of documents and queries and use additional independence assumptions, whereas the latter map documents and queries onto feature vectors which form the input to certain classification procedures or regression methods. Description-oriented approaches are more flexible with respect to the underlying representations, but the definition of the feature vector is a heuristic step. In this paper, we combine a probabilistic model for the Darmstadt Indexing Approach with logistic regression. Here the probabilistic model forms a guideline for the definition of the feature vector. Experiments with the purely theoretical approach and with several heuristic variations show that heuristic assumptions may yield significant improvements.
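As a rough illustration of the description-oriented idea in the abstract above, the following sketch fits a logistic regression that maps a feature vector for a (term, document) pair to an indexing weight. It is not the Darmstadt Indexing Approach or the paper's method: the two features (a normalized within-document frequency and a title-occurrence flag), the toy training data, and the function names `fit_logistic` and `indexing_weight` are all assumptions made for the example.

```python
import math


def sigmoid(z):
    """Logistic function, mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit bias + weights by batch gradient descent on the log-loss.

    features: list of feature vectors, one per (term, document) pair.
    labels:   1 if the term was a correct descriptor in a relevant
              document, else 0 (hypothetical relevance-feedback labels).
    """
    w = [0.0] * (len(features[0]) + 1)  # w[0] is the bias
    n = len(features)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(features, labels):
            p = sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
            err = p - y
            grad[0] += err
            for j, xj in enumerate(x, start=1):
                grad[j] += err * xj
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


def indexing_weight(w, x):
    """Estimated probability that the term is a good index term."""
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


# Hypothetical features: [normalized term frequency, occurs in title (0/1)]
X = [[0.1, 0], [0.2, 0], [0.3, 0], [0.7, 1], [0.8, 1], [0.9, 0]]
y = [0, 0, 0, 1, 1, 1]
weights = fit_logistic(X, y)

if __name__ == "__main__":
    print(indexing_weight(weights, [0.9, 1]))
    print(indexing_weight(weights, [0.1, 0]))
```

Because the model only sees feature vectors, not specific terms or documents, new kinds of evidence can be added as extra features without changing the estimation procedure, which is the flexibility the abstract attributes to description-oriented approaches.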
Set-based vector model: An efficient approach for correlation-based ranking
ACM Transactions on Information Systems, 2005
Cited by 11 (2 self)
This work presents a new approach for ranking documents in the vector space model. The novelty lies on two fronts. First, patterns of term co-occurrence are taken into account and are processed efficiently. Second, term weights are generated using a data mining technique called association rules. This leads to a new ranking mechanism called the set-based vector model. The components of our model are no longer index terms but index termsets, where a termset is a set of index terms. Termsets capture the intuition that semantically related terms appear close to each other in a document. They can be efficiently obtained by limiting the computation to small passages of text. Once termsets have been computed, the ranking is calculated as a function of the termset frequency in the document and its scarcity in the document collection. Experimental results show that the set-based vector model improves average precision for all collections and query types evaluated, while keeping computational costs small. For the 2-gigabyte TREC-8 collection, the set-based vector model leads to a gain in average precision figures of 14.7% and 16.4% for disjunctive and conjunctive queries, respectively, with respect to the standard vector space model. These gains increase to 24.9% and 30.0%, respectively, when proximity information is taken into account. Query processing times are larger but, on average, still comparable to those obtained
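The termset idea in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's algorithm: passages are taken to be fixed-size, non-overlapping token windows, termsets are enumerated by brute force rather than by association-rule mining, and the weight `(1 + log f) * log(1 + N / df)` is a tf-idf-style stand-in for "frequency in the document and scarcity in the collection". The names `termset_counts` and `rank_documents` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def termset_counts(tokens, query_terms, passage_size=5, max_len=2):
    """Count, per termset, the passages in which all its terms co-occur."""
    counts = Counter()
    query = sorted(set(query_terms))
    for start in range(0, len(tokens), passage_size):
        passage = set(tokens[start:start + passage_size])
        present = [t for t in query if t in passage]
        # Every non-empty subset (up to max_len terms) found in this passage
        for k in range(1, max_len + 1):
            for termset in combinations(present, k):
                counts[termset] += 1
    return counts


def rank_documents(docs, query_terms, passage_size=5, max_len=2):
    """Rank document ids by summed termset weights (assumed formula)."""
    per_doc = {
        doc_id: termset_counts(tokens, query_terms, passage_size, max_len)
        for doc_id, tokens in docs.items()
    }
    n_docs = len(docs)
    doc_freq = Counter()  # number of documents containing each termset
    for counts in per_doc.values():
        doc_freq.update(counts.keys())
    scores = {
        doc_id: sum(
            (1 + math.log(freq)) * math.log(1 + n_docs / doc_freq[ts])
            for ts, freq in counts.items()
        )
        for doc_id, counts in per_doc.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    docs = {
        "a": "information retrieval systems rank documents".split(),
        "b": "information about weather and later retrieval of data".split(),
        "c": "cooking recipes".split(),
    }
    print(rank_documents(docs, ["information", "retrieval"]))
```

In this toy collection, document "a" contains both query terms inside one passage, so it is credited with the two-term termset in addition to the single-term ones, whereas document "b" has the terms in different passages and only receives the single-term weights.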
A Bayesian Approach to User Profiling In Information Retrieval
Technology Letters, 2000
Cited by 9 (2 self)
Numerous probability models have been suggested for information retrieval (IR) over the years. These models have been applied to try to manage the inherent uncertainty in IR, for instance, document and query representation, relevance feedback, and evaluating the effectiveness of IR systems. On the other hand, Bayesian networks have become an established probabilistic framework for uncertainty management in artificial intelligence. In this
Optimum Probability Estimation from Empirical Distributions
Information Processing and Management, 1989
Cited by 7 (4 self)
Probability estimation is important for the application of probabilistic models as well as for any evaluation in IR. We discuss the interdependencies between parameter estimation and certain properties of probabilistic models: dependence assumptions, binary vs. non-binary features, estimation sample selection. Then we define an optimum estimate for binary features which can be applied to various typical estimation problems in IR. A method for computing this estimate using empirical data is described. Some experiments show the applicability of our method, whereas comparable approaches are partially based on false assumptions or yield biased estimates.
1 Parameter estimation in IR
In IR the development of theoretical models and their evaluation in experiments is of equal importance: A model which cannot be evaluated (applied) is of very little use, while an evaluation can show its weaknesses and strengths and give evidence for further developments. As will be discussed below, any evaluation in IR involves some kind of parameter estimation, even for non-probabilistic models. So it is interesting to note that the problem of parameter estimation has been discussed only by a few authors ([Rijsbergen 77], [Robertson & Bovey 82], [Bookstein 83], [?]). In this paper, an attempt is
A Medical Text Classification Agent Using Snomed And Formal Concept Analysis
Computer Science Department, The University of Adelaide, 1995
Cited by 3 (2 self)
This report describes a new approach to text retrieval systems. It first reviews current and previous approaches to information retrieval and then describes the construction of a text classification agent using SNOMED and formal concept analysis. A set of 9,000 medical patient discharge summaries from the Thoracic unit at the Royal Adelaide Hospital are indexed using SNOMED (Systematized Nomenclature of Medicine). The discharge summaries are semi-structured free text documents. The structure of the discharge summaries is described using SGML (Standard Generalized Markup Language). Information from the SNOMED hierarchy is merged with information about the medical domain explicated in the documents by the ways in which SNOMED concepts are combined. This merge of information, from two differently structured sources, is achieved by embedding SNOMED in a formal concept lattice. A user submits a query as the result of a walk through the SNOMED hierarchy. This walk specifies a mapping i...