## Linear Discriminant Model for Information Retrieval (2005)

Venue: Proceedings of SIGIR 2005

Citations: 56 (16 self)

### BibTeX

@INPROCEEDINGS{Gao05discriminantmodel,
  author = {Jianfeng Gao and Haoliang Qi and Xinsong Xia and Jian-Yun Nie},
  title = {Linear discriminant model for information retrieval},
  booktitle = {Proceedings of SIGIR 2005},
  year = {2005},
  pages = {290--297}
}

### Abstract

This paper presents a new discriminative model for information retrieval (IR), referred to as the linear discriminant model (LDM), which provides a flexible framework for incorporating arbitrary features. LDM differs from most existing models in that it takes into account a variety of linguistic features derived from the component models of the hidden Markov model (HMM) widely used in language modeling approaches to IR; LDM is therefore a means of melding discriminative and generative models for IR. We present two parameter-learning algorithms for LDM. One optimizes average precision (AP) directly using an iterative procedure. The other is a perceptron-based algorithm that minimizes the number of discordant document pairs in a ranked list. The effectiveness of our approach has been evaluated on the task of ad hoc retrieval using six English and Chinese TREC test sets. Results show that (1) on most test sets, LDM significantly outperforms the state-of-the-art language modeling approaches and the classical probabilistic retrieval model; (2) it is more appropriate to train LDM on a measure of AP rather than likelihood if the IR system is graded on AP; and (3) linguistic features (e.g., phrases and dependencies) are effective for IR if they are incorporated properly.
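Since average precision is the quantity that the paper's MaxAP training targets, it helps to have its definition concrete. The sketch below is a minimal illustration; function and variable names are our own, not from the paper.

```python
# Hedged sketch: minimal average-precision (AP) computation over a ranked list.
# AP is the mean of precision@k taken over every rank k that holds a
# relevant document, divided by the total number of relevant documents.

def average_precision(ranked_docs, relevant):
    """AP of a ranked list given the set of relevant document ids."""
    hits = 0
    precisions = []
    for k, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / k)  # precision at this relevant rank
    return sum(precisions) / len(relevant) if relevant else 0.0
```

For example, ranking a relevant document first and another at rank 3 out of two relevant documents gives AP = (1/1 + 2/3) / 2.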

### Citations

9804 | The nature of statistical learning theory
- Vapnik
- 1995

Citation Context: ...kely y. The latter model the posterior P(y|x) directly. Recently, discriminative classifiers are preferred to generative ones due to several compelling reasons, one of which, as pointed out by Vapnik [29], is that “one should solve a (classification) problem directly and avoid solving a more general problem as an intermediate step (such as modeling P(x|y)).” As discussed in [19], most of the existing ...

2095 | Pattern Classification
- Duda, Hart, et al.
- 2001

Citation Context: ...hich will be described in the next section. 3. LDM for IR Linear discriminant model in this study follows the general framework of linear discriminant functions widely used for pattern classification [4], and has been recently introduced into NLP tasks in [2]. In the LDM framework, we assume a set of N+1 features fi(q, c, d), for i = 0, …, N. The features are arbitrary functions that map (q, c, d) to...
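The scoring rule described in this context, a weighted sum of N+1 arbitrary feature functions f_i(q, c, d), can be sketched as follows. The concrete feature functions below are hypothetical illustrations, not the paper's features.

```python
# Hedged sketch of a linear discriminant score: sum_i lambda_i * f_i(q, c, d).
# The feature functions f0 and f1 are toy assumptions for illustration.

def ldm_score(weights, features, q, c, d):
    """Linear discriminant score of document d for query q and concepts c."""
    return sum(w * f(q, c, d) for w, f in zip(weights, features))

# Two illustrative (hypothetical) feature functions:
f0 = lambda q, c, d: 1.0                                    # constant bias feature
f1 = lambda q, c, d: len(set(q) & set(d)) / max(len(q), 1)  # query-term overlap
```

With weights [0.5, 2.0], a document sharing one of two query terms scores 0.5 + 2.0 * 0.5 = 1.5; ranking then sorts documents by this score.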

1535 | Making large-scale SVM learning practical
- Joachims
- 1999

Citation Context: ...se difference vector λ(f(q, di) – f(q, dj)), and can be solved using decomposition algorithms similar to those used for SVM classification. In our experiments, we have adapted the SVM light algorithm [13]. It achieves similar performance to that of the perceptron-based algorithm. As described in Section 3.2, the IR problem can be cast as a special case of ordinal regression discussed in [12]. In ordin...
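The pairwise reduction mentioned here — training a classifier on difference vectors f(q, di) − f(q, dj) — can be sketched as below. The data layout is an assumption for illustration; a real setup would feed these examples to an SVM such as SVMlight.

```python
# Hedged sketch: build ranking-SVM training examples. Every
# (relevant, non-relevant) document pair yields the difference vector
# f(q, d_i) - f(q, d_j) with label +1 (relevant should outrank non-relevant).

def pairwise_examples(feat_vecs, relevant_ids):
    """feat_vecs: {doc_id: feature vector}; returns (difference vector, label) pairs."""
    examples = []
    for di, fi in feat_vecs.items():
        for dj, fj in feat_vecs.items():
            if di in relevant_ids and dj not in relevant_ids:
                diff = [a - b for a, b in zip(fi, fj)]
                examples.append((diff, +1))
    return examples
```

A linear classifier separating these examples from the origin yields a weight vector that orders documents by dot product with their feature vectors.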

1197 | Practical Methods of Optimization
- Fletcher
- 1987

Citation Context: ...the LDM approach in large-scale TREC experiments. We leave it to future work. Finally, it is worth noting that the MaxAP algorithm is a simple example of the non-smooth optimization (NSO) algorithms [5]. Most parameter learning problems in NLP and IR tasks can be considered as a multi-dimensional function optimization problem. However, the objective function, such as the classification error rate of ...

945 | A language modeling approach to information retrieval
- Ponte, Croft
- 1998

Citation Context: ...n Language modeling (LM) approaches to information retrieval (IR) assume that the relevance of a document, given a query, can be estimated as the generative probability of the query from the document [23]. One of the most appealing properties of this approach is its ability to incorporate linguistic information of language such as phrases and dependencies into the retrieval model in a systematic manner....

925 | Optimizing search engines using clickthrough data
- Joachims
- 2002

Citation Context: ...starting point, and pick the parameter setting that achieves the maximal AP. 3.2 Perceptron-based Training This section first formulates the ranking problem under the framework of ordinal regression [14, 12]; then presents a loss function which is closely associated with AP, and a perceptron-based algorithm to optimize the parameter setting with respect to the loss function. The ranking problem in IR can ...
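A perceptron-style pairwise update in the spirit of the training described here might look like the following sketch; the learning rate, epoch count, and data layout are illustrative assumptions, not the paper's exact algorithm.

```python
# Hedged sketch: whenever a non-relevant document scores at least as high as
# a relevant one (a discordant pair), shift the weights toward the relevant
# document's feature vector, perceptron-style.

def pairwise_perceptron(pairs, dim, epochs=10, lr=0.1):
    """pairs: list of (f_relevant, f_nonrelevant) feature vectors; returns weights."""
    w = [0.0] * dim
    for _ in range(epochs):
        for f_rel, f_non in pairs:
            score_rel = sum(wi * x for wi, x in zip(w, f_rel))
            score_non = sum(wi * x for wi, x in zip(w, f_non))
            if score_rel <= score_non:  # discordant pair: apply update
                w = [wi + lr * (a - b) for wi, a, b in zip(w, f_rel, f_non)]
    return w
```

After training, documents are ranked by the dot product of the learned weights with their feature vectors.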

553 | An efficient boosting algorithm for combining preferences
- Freund, Iyer, et al.
- 1998

Citation Context: ...special case of ordinal regression discussed in [12]. In ordinal regression, all objects are ranked on the same scale, while in IR documents need to be ranked with respect to one query. Freund et al. [6] proposed a learning approach based on a boosting algorithm to linearly combine multiple rankings provided by experts. If we consider each expert as a feature function, it appears to be equivalent to our...

394 | On discriminative vs. generative classifiers: A comparison of logistic regression and Naive Bayes
- Ng, Jordan
- 2001

386 | Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval
- Robertson, Walker
- 1994

Citation Context: ...ls make their prediction by using the Bayes rules to estimate P(x|y). Similarly, [19] shows that the classical probabilistic models, including the BIR model [15] and its variant, the two-Poisson model [26], also belong to generative models. Although all these generative models achieve state-of-the-art performance in large-scale IR experiments, there are several appealing reasons to explore discriminativ...

342 | Learning to order things
- Cohen, Schapire, et al.
- 1998

Citation Context: ...systems, we have to handle unseen queries which might be very different from previously seen queries. We leave the extension of the method to future work. Earlier work along this line can be found in [1]. Another similar approach is the Pranking algorithm proposed by Crammer and Singer [3]. Unlike the perceptron-based algorithm described in Section 3.2, which reduces the total rank into a set of docu...

321 | Document language models, query models, and risk minimization for information retrieval
- Lafferty, Zhai
- 2001

Citation Context: ...neral problem as an intermediate step (such as modeling P(x|y)).” As discussed in [19], most of the existing retrieval models can be viewed as generative models. For example, the LM approach to IR [17, 30] assumes each document is a unique class (y), and the task of the IR model is to classify a query (x) into its most likely class as given by the posterior P(y|x). Then, language models make their predicti...

314 | Large margin rank boundaries for ordinal regression
- Herbrich, Obermayer, et al.
- 2000

Citation Context: ...starting point, and pick the parameter setting that achieves the maximal AP. 3.2 Perceptron-based Training This section first formulates the ranking problem under the framework of ordinal regression [14, 12]; then presents a loss function which is closely associated with AP, and a perceptron-based algorithm to optimize the parameter setting with respect to the loss function. The ranking problem in IR can ...

297 | A probabilistic model of information retrieval: development and comparative experiments
- Jones, Walker, et al.
- 2000

219 | Two-Stage language models for information retrieval
- Zhai, Lafferty
- 2002

Citation Context: ...by the probability that a query q = {q1, …, qm} would be generated from the respective document model P(q|d). Most state-of-the-art approaches assume a unigram Markov model, where P(q|d) = ∏i=1…m P(qi|d) [30]. The hidden Markov model is an extension of the Markov model obtained by introducing hidden variables. In the context of language modeling, the hidden variables can be used to represent any linguistic concepts th...
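The unigram query-likelihood model stated here, P(q|d) = ∏i P(qi|d), can be sketched as below. Jelinek-Mercer interpolation against a collection model is used as an illustrative assumption; the cited work discusses other smoothing choices.

```python
# Hedged sketch: log P(q|d) under a unigram language model, with linear
# interpolation between the document model and a collection background
# model to avoid zero probabilities for unseen query terms.

import math

def unigram_log_prob(query, doc, collection, lam=0.5):
    """log P(q|d) = sum_i log(lam * P(q_i|d) + (1 - lam) * P(q_i|C))."""
    dlen, clen = len(doc), len(collection)
    logp = 0.0
    for term in query:
        p_doc = doc.count(term) / dlen       # MLE document model
        p_col = collection.count(term) / clen  # collection background model
        logp += math.log(lam * p_doc + (1 - lam) * p_col)
    return logp
```

Documents are then ranked by this log-likelihood for a given query.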

203 | A hidden Markov model information retrieval system
- Miller, Leek, et al.
- 1999

Citation Context: ...ropriate algorithm. The second reason concerns the flexibility of incorporating arbitrary features. In generative models, as described earlier, the incorporation can be achieved in two ways (see e.g. [9, 20, 18, 28]). The first approach is to model those features as hidden variables of the HMM and integrate them into the generation process. The second approach is to model those features independently, and combine th...

201 | A general language model for information retrieval
- Song, Croft
- 1999

Citation Context: ...ropriate algorithm. The second reason concerns the flexibility of incorporating arbitrary features. In generative models, as described earlier, the incorporation can be achieved in two ways (see e.g. [9, 20, 18, 28]). The first approach is to model those features as hidden variables of the HMM and integrate them into the generation process. The second approach is to model those features independently, and combine th...

173 | Pranking with ranking
- Crammer, Singer
- 2001

Citation Context: ...seen queries. We leave the extension of the method to future work. Earlier work along this line can be found in [1]. Another similar approach is the Pranking algorithm proposed by Crammer and Singer [3]. Unlike the perceptron-based algorithm described in Section 3.2, which reduces the total rank into a set of document pairs, the Pranking algorithm maintains a totally ordered set via projection, and adjus...

143 | Minimum Classification Error Rate Methods for Speech Recognition
- Juang, Chou, et al.
- 1997

Citation Context: ...leave it to further study. An alternative approach to the NSO problem is to use an approximated but smoothed objective function that can be easily optimized, such as the one suggested by Juang et al. [16]. The comparison of those NSO methods forms another area of future work. 6. Conclusions We have presented a discriminative model for IR, referred to as LDM. It provides a flexible framework to incorpo...

80 | Discriminative models for information retrieval
- Nallapati
- 2004

Citation Context: ...pointed out by Vapnik [29], is that “one should solve a (classification) problem directly and avoid solving a more general problem as an intermediate step (such as modeling P(x|y)).” As discussed in [19], most of the existing retrieval models can be viewed as generative models. For example, the LM approach to IR [17, 30] assumes each document is a unique class (y), and the task of the IR model is to c...

72 | The Eleventh Text REtrieval Conference (TREC 2002)
- Voorhees, Buckland
- 2002

Citation Context: ...lish queries are TREC topics 201 to 250 (description field only) on TREC disks 2 and 3. Those topics are “natural language” queries consisting of one sentence each of length 10 to 15 words. Following [11], for the three English TREC collections, we remove those queries that have no relevant document. The Chinese queries are TREC topics CH1 to CH79. We use long queries that contain the title, descripti...

31 | Capturing term dependencies using a language model based on sentence trees
- Nallapati, Allan
- 2002

Citation Context: ...ropriate algorithm. The second reason concerns the flexibility of incorporating arbitrary features. In generative models, as described earlier, the incorporation can be achieved in two ways (see e.g. [9, 20, 18, 28]). The first approach is to model those features as hidden variables of the HMM and integrate them into the generation process. The second approach is to model those features independently, and combine th...

27 | Microsoft Cambridge at TREC-9: Filtering track
- Robertson, Walker
- 2000

Citation Context: ...stem, which is the best-known implementation of BIR. Among the great number of term weighting functions provided by Okapi, we choose BM2500 for it has achieved good performance in previous experiments [27]. UGM (Unigram Model) is an implementation of the unigram language model approach to IR proposed in [30]. It serves as the baseline LM approach in our experiments. Over all six TREC test sets, UGM ach...

24 | Dependency tree translation: Syntactically informed phrasal SMT
- Quirk, Menezes, et al.
- 2005

Citation Context: ...imal λ as well as its corresponding AP(.) by traversing the sequence. In our experiments, we found that the MaxAP algorithm can converge to different maxima given different starting points. Following [25], we run the algorithm multiple times, each from a different, random starting point, and pick the parameter setting that achieves the maximal AP. 3.2 Perceptron-based Training This sect...
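The multiple-random-restart strategy described here can be sketched as follows. The coordinate-wise hill climber is an illustrative stand-in for a local optimizer, not the paper's MaxAP line search, and all names and parameters are assumptions.

```python
# Hedged sketch: run a local optimizer from several random starting points and
# keep the parameter setting with the best objective value (AP in the paper).

import random

def random_restarts(objective, dim, n_starts=5, steps=50, step=0.1, seed=0):
    """Maximize objective over R^dim via restarted coordinate-wise hill climbing."""
    rng = random.Random(seed)
    best_x, best_val = None, float("-inf")
    for _ in range(n_starts):
        x = [rng.uniform(-1, 1) for _ in range(dim)]  # random starting point
        for _ in range(steps):
            for i in range(dim):          # greedy coordinate moves
                for delta in (+step, -step):
                    cand = list(x)
                    cand[i] += delta
                    if objective(cand) > objective(x):
                        x = cand
        val = objective(x)
        if val > best_val:                # keep the best restart
            best_x, best_val = x, val
    return best_x, best_val
```

Restarting guards against the behavior noted in the context: a local search of a non-smooth objective can settle on different maxima from different starting points.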

18 | Minimum sample risk methods for language modeling
- Gao, Yu, et al.
- 2005

Citation Context: ...t sets. This is largely due to its property of directly optimizing the performance measure (i.e., AP). Similar observations have been reported in experiments on machine translation [22, 25] and LM [7]. Though the method shows empirical benefits, the lack of theoretical underpinnings (such as optimality and stability) is the major concern. We leave it to further study. An alternative approach to th...

17 | The Use of Clustering Techniques for Language Modeling - Application to Asian languages
- Gao, Goodman, et al.
- 2001

Citation Context: ...igram probability was linearly interpolated with the unigram probability. All n-gram (n = 1 or 2 in BGM) probabilities are estimated via MLE with a modified version of absolute-discount smoothing [10]. BGM can be viewed as a special case of the HMM described in Section 2, where the concept sequence c is a sequence of adjacent word pairs. Results show that BGM substantially outperforms UGM in all Engli...

1 | A pragmatic approach to Chinese word segmentation. Tech-Report of Microsoft Research
- Gao, Li, Wu, Huang
- 2004

Citation Context: ...concept sequence c for each sentence, which is used for constructing LDM, we process the collections as follows. All Chinese texts have been word-segmented using the word segmentation system MSRSeg 2 [8]. The system also identifies factoids and named entities of various types. We then used an in-house HMM chunk parser to detect phrases such as NP and VP, as described in Table 1. Similarly, all Englis...

1 | Dependence language model for information retrieval
- Gao, Nie, Wu, Cao
- 2004

Citation Context: ...ost appealing properties of this approach is its ability to incorporate linguistic information of language such as phrases and dependencies into the retrieval model in a systematic manner. For example, [9] assumes ...