
## Dimensionality Reduction and Representation for Nearest Neighbour Learning (1999)

Citations: 4 (1 self)

### Citations

6599 |
Neural Networks for Pattern Recognition
- Bishop
Citation Context ...John, Kohavi, & Pfleger 1994; Payne & Edwards 1996; Payne & Edwards 1998a), and whether any of the attributes can be combined to generate higher-order dimensions (Michie, Spiegelhalter, & Taylor 1994; Bishop 1995; Wu, Berry, Shivakumar, & McLarty 1995; Payne & Edwards 1998a). If the dimensionality of the input space is large, but the number of instances is small, the distribution of instances within this inpu... |

6599 |
C4.5: Programs for machine learning
- Quinlan
- 1993
Citation Context ...classifications are performed. The nearest neighbour algorithm requires less computation time during the training phase than most eager learning algorithms (such as the rule induction algorithm, C4.5 (Quinlan 1993)). However, the consequence of using this lazy learning approach is that the computational cost of classifying a new query instance can be high. The power of the nearest neighbour approach has been d... |
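The trade-off this snippet describes (cheap "training", costly queries) can be sketched with a minimal 1-nearest-neighbour classifier. This is an illustrative sketch, not the thesis's code; the function names and toy data are invented.

```python
import math

def train(instances):
    # "Training" for 1-NN is just storing the labelled instances.
    return list(instances)

def classify(model, query):
    # Every query scans the whole instance store: O(n) work per classification,
    # which is the cost the snippet attributes to lazy learning.
    def dist(features):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(features, query)))
    features, label = min(model, key=lambda inst: dist(inst[0]))
    return label

model = train([((0.0, 0.0), "a"), ((1.0, 1.0), "b"), ((0.9, 1.2), "b")])
print(classify(model, (1.0, 0.9)))  # nearest stored instance is (1.0, 1.0) -> "b"
```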

5960 | Classification and regression trees - Breiman, Friedman, et al. - 1984 |

5148 | Optimization by simulated annealing - Kirkpatrick, Gelatt, et al. - 1983 |

4371 | Induction of decision trees
- Quinlan
- 1986
Citation Context ...orithms have been developed which use a variety of metrics as part of their learning bias to select relevant attributes when building decision trees. Such metrics include the information gain metric (Quinlan, 1986) or the distance-based gain ratio (De Mantaras, 1991). Studies have shown that the biases used by rule induction algorithms to favour smaller numbers of attributes and smaller decision trees fail to... |

3770 | Indexing by latent semantic analysis
- Deerwester, Dumais, et al.
- 1990
Citation Context ...ny text categorisation systems are similar to those categorised by the filter model. Table 3.4 lists a sample of the different evaluation methods used by various systems. Latent Semantic Indexing (LSI) (Deerwester et al., 1990; Schutze et al., 1995; Weiner et al., 1995) uses an orthogonal decomposition technique to determine a lower dimensional representation for each document. A corpus can be expressed as a term by docum... |
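The orthogonal-decomposition step LSI performs on a term-by-document matrix can be sketched with NumPy's SVD; truncating to the k largest singular values gives the lower-dimensional document representation the snippet mentions. The tiny matrix and the rank k are invented for illustration, and NumPy is assumed to be available.

```python
import numpy as np

# Rows = terms, columns = documents (a toy 4-term, 3-document corpus).
A = np.array([[2., 0., 1.],
              [1., 0., 0.],
              [0., 3., 1.],
              [0., 2., 0.]])

# Orthogonal decomposition: A = U diag(s) Vt.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2  # keep only the k largest singular values
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # rank-k approximation of the corpus

# Each document is now represented by k coordinates instead of 4 term weights.
doc_coords = np.diag(s[:k]) @ Vt[:k, :]
print(doc_coords.shape)  # (2, 3)
```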

3696 | Learning internal representations by error propagation - Rumelhart, Hinton, et al. - 1986 |

3464 |
UCI repository of machine learning databases
- Blake, Merz
- 1998
Citation Context ... algorithms on a variety of domains. These approaches will be combined with a nearest neighbour algorithm, and systematically evaluated over a number of standard UCI data sets (Merz & Murphy 1996). A dimensionality reduction technique known as Latent Semantic Indexing has been successfully used to reduce the number of attributes used to represent documents. This technique, which is similar ... |

1389 | Instance-based learning algorithms
- Aha, Kibler, et al.
- 1991
Citation Context ...ased learning algorithms, some of which reduce the number of instances needed to represent a concept (Aha, Kibler, & Albert 1991), and attempt to handle instances with noisy or irrelevant attributes (Aha & Kibler 1989; Aha 1992b). MBRtalk (Stanfill & Waltz 1986) defines similarity over symbolic values, and the EACH algorithm (Salzberg 1991a) uses the Nested Generalised Exemplar (NGE) theory to create compact represe... |

1280 | A study of cross-validation and bootstrap for accuracy estimation and model selection
- Kohavi
- 1995
Citation Context ...d test sets. The training set is used by the learning algorithm to generate the target concept, and the test set is used to evaluate its accuracy or error rate. Various partitioning strategies exist (Kohavi 1995), and are discussed in detail in Sections 5.1.3 and 6.2. 2.2 The Nearest Neighbour Learning Paradigm: The nearest neighbour learning paradigm has been th... |
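One of the partitioning strategies the snippet alludes to, k-fold cross-validation, splits the data into k disjoint test folds so each instance is tested exactly once. A minimal sketch (the helper name is invented, not from the thesis):

```python
def k_fold_splits(instances, k):
    # Partition the data into k disjoint folds; each instance appears in
    # exactly one test fold and in the training set of the other k-1 splits.
    folds = [instances[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

data = list(range(10))
for train, test in k_fold_splits(data, 5):
    # Every split uses all the data, with no overlap between train and test.
    assert sorted(train + test) == data
    assert len(test) == 2 and len(train) == 8
```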

1260 |
Agents that Reduce Work and Information Overload
- Maes
- 1997
Citation Context ...oming increasingly difficult to identify what documents or articles are interesting or relevant to the user. For this reason, information agents (Mitchell, Caruana, Freitag, McDermott, & Zabowski 1994; Maes 1994) have been proposed as one possible solution to the problem of assisting users in accessing relevant information and performing simple tasks. Many information agents have been described as personal a... |

829 | Multi-interval discretization of continuous-valued attributes for classification learning
- Fayyad, Irani
- 1993
Citation Context ...rically. Some existing learning algorithms utilise n-fold cross validation techniques to identify suitable estimates (Kohavi & John 1995), or utilise a stopping criterion to identify partition points (Fayyad & Irani 1993). An automated method for reliably estimating values for the threshold and the rank of these algorithms would be extremely useful. Appendix A: A Summary of the Attribute Selection Results. This Appendi... |

813 |
Cluster Analysis
- Everitt
- 1993
Citation Context ...y; iii) D(i, j) + D(j, k) ≥ D(i, k) (Triangle Inequality). Several distance metrics have been proposed (Wilson & Martinez 1997), and include the Chi-square metric (Anderberg 1973), the Mahalanobis metric (Everitt 1974), the Nonlinear metric (Devijver & Kittler 1982), the Cosine Similarity metric (Salton & McGill 1983), the Quadratic metric (Fukunaga & Flick 1984), the Minkowskian metric (Salzberg 1991b), the Modifie... |

770 | Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm - Littlestone - 1988 |

755 | Irrelevant Features and the Subset Selection Problem
- John, Kohavi, et al.
- 1994
Citation Context ...represent a drop in accuracy. Several other studies have investigated the utility of the wrapper method for attribute selection when learning with a rule induction algorithm (Caruana & Freitag, 1994; John et al., 1994; Kohavi, 1994) or a Nearest Neighbour algorithm (Salzberg, 1992; Aha & Bankert, 1994; Moore & Lee, 1994; Skalak, 1994) (see Table 3.2). The rule induction studies utilised variants of the Forward Sel... |

559 | Newsweeder: Learning to filter netnews
- Lang
- 1995
Citation Context ...997), neural networks (McElligott & Sorensen, 1994; Mitchell et al., 1994; Pannu & Sycara, 1996; Boone, 1998), naive Bayesian techniques (Pazzani et al., 1996), Minimum Description Length techniques (Lang, 1995), clustering techniques (Green & Edwards, 1996), relational learning algorithms (Cohen, 1995) and nearest neighbour (or instance based) techniques (Metral, 1993; Kozierok & Maes, ... |

529 |
A practical approach to feature selection
- Kira, Rendell
- 1992
Citation Context ...ation column indicates the method used to update weights during the evaluation phase. The last column refers either to the type of algorithm used during attribute selection, or in the case of RELIEF (Kira & Rendell, 1992a) and the extensions to RELIEF (Kononenko, 1994), to the learning algorithm used once the attributes have been... [table from Section 3.4 (Weighted Model): Authors (System), Selection, Evaluatio...] |

489 |
Theory and Applications of Correspondence Analysis
- Greenacre
- 1984
Citation Context ... Indexing has been successfully used to reduce the number of attributes used to represent documents. This technique, which is similar to a subspace mapping technique known as Correspondence Analysis (Greenacre 1984), will be used to reduce the dimensionality of a number of standard UCI data sets, which will then be presented to a nearest neighbour learning algorithm. The resulting behaviour will be compared and... |

474 | Estimating Attributes: Analysis and Extensions of RELIEF
- Kononenko
- 1994
Citation Context ...hts during the evaluation phase. The last column refers either to the type of algorithm used during attribute selection, or in the case of RELIEF (Kira & Rendell, 1992a) and the extensions to RELIEF (Kononenko, 1994), to the learning algorithm used once the attributes have been... [table from Section 3.4 (Weighted Model): Authors (System), Selection, Evaluation, Testing Alg.; first row: Aha, 1992b, Wei...] |

423 | Discriminatory analysis – nonparametric discrimination: consistency properties - Fix, Hodges - 1951 |

365 |
The Feature Selection Problem: Traditional Methods and a New Algorithm
- Kira, Rendell
- 1992
Citation Context ...ation column indicates the method used to update weights during the evaluation phase. The last column refers either to the type of algorithm used during attribute selection, or in the case of RELIEF (Kira & Rendell, 1992a) and the extensions to RELIEF (Kononenko, 1994), to the learning algorithm used once the attributes have been... [table from Section 3.4 (Weighted Model): Authors (System), Selection, Evaluatio...] |

352 | Syskill and webert: Identifying interesting web sites,
- Pazzani, Muramatsu, et al.
- 1996
Citation Context ... 1992; Payne, 1994; Bayer, 1995; Cohen, 1996a; Payne et al., 1997), neural networks (McElligott & Sorensen, 1994; Mitchell et al., 1994; Pannu & Sycara, 1996; Boone, 1998), naive Bayesian techniques (Pazzani et al., 1996), Minimum Description Length techniques (Lang, 1995), clustering techniques (Green & Edwards, 1996), relational learning algorithms (Cohen, 1995) and nearest neighbour (or instanc... |

338 |
Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques. Los Alamitos,
- Dasarathy
- 1990
Citation Context ...eir study of the nonparametric discrimination problem, in which they described a number of different procedures and demonstrated that these had asymptotically optimum properties for large sample sets (Dasarathy 1991). However, it was not until 1967 that Cover & Hart formally defined the nearest neighbour rule and applied it to the problem of pattern classification. Many aspects of this classification algorithm have... |

336 | A comparison of two learning algorithms for text categorization - Lewis, Ringuette - 1994 |

309 | A weighted nearest neighbor algorithm for learning with symbolic features.
- Cost, Salzberg
- 1993
Citation Context ...emonstrated in a number of real-world domains, such as in the pronunciation of English words (Stanfill & Waltz 1986), recognition of DNA & RNA sequences (Cost & Salzberg 1993), thyroid disease diagnosis (Kibler & Aha 1987), speech recognition (Bradshaw 1987), clinical audiology diagnosis (Bareiss & Porter 1987), meeting predictions (Kozierok & Maes 1993) and Internet info... |

252 | Learning with many irrelevant features
- Almuallim, Dietterich
- 1991
Citation Context ...own that the biases used by rule induction algorithms to favour smaller numbers of attributes and smaller decision trees fail to find the minimal subset of attributes necessary to identify the concept (Almuallim & Dietterich 1991). John et al. (1994) have attempted to define the notion of attribute `relevance'. They claim that there are two degrees of relevance: Strong Relevance: An attribute is indispensable in learning a conc... |

246 | Experience with a Learning Personal Assistant
- Mitchell, Caruana, et al.
- 1994
Citation Context ...ing genetic algorithms (Sheth, 1994), symbolic rule induction algorithms (Dent et al., 1992; Payne, 1994; Bayer, 1995; Cohen, 1996a; Payne et al., 1997), neural networks (McElligott & Sorensen, 1994; Mitchell et al., 1994; Pannu & Sycara, 1996; Boone, 1998), naive Bayesian techniques (Pazzani et al., 1996), Minimum Description Length techniques (Lang, 1995), clustering techniques (Green & Edwards, ... |

236 |
Choice, similarity, and the context theory of classification,
- Nosofsky
- 1984
Citation Context ...ned as follows: D(i, j) = max_{a=0..A} |i_a − j_a| (2.4). Various psychological studies have argued that the Manhattan metric is appropriate for domains that have separable (i.e. orthogonal) dimensions (Nosofsky 1984). Salzberg (1991b) investigated the performance of both distance metrics to determine whether or not this hypothesis could be tested, but failed to find any significant difference between either distance... |
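The maximum-difference form suggested by the snippet's equation (2.4) compares two instances by their single largest per-attribute gap. A quick sketch (toy vectors are invented):

```python
def max_metric(i, j):
    # D(i, j) = max over attributes a of |i_a - j_a|  (cf. Eq. 2.4 in the snippet)
    return max(abs(a - b) for a, b in zip(i, j))

# Per-attribute gaps are 1.0, 3.0 and 0.5; only the largest counts.
print(max_metric((1.0, 4.0, 2.0), (2.0, 1.0, 2.5)))  # -> 3.0
```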

231 |
Pattern Classification and Scene Analysis
- Duda, Hart
- 1973
Citation Context ...e space. However, a distance metric is necessary to determine the relative distance between two instances. A distance metric should satisfy the following criteria for all points within a given space (Duda & Hart 1973): i) D(i, j) ≥ 0 and D(i, j) = 0 if and only if i = j (Positivity); ii) D(i, j) = D(j, i) (Symmetry); iii) D(i, j) + D(j, k) ≥ D(i, k) (Triangle Inequality). Several distance metrics have been proposed (Wilson... |
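The three Duda & Hart criteria can be checked numerically for any candidate metric. This sketch tests the Euclidean distance on a few arbitrarily chosen points; it illustrates the axioms rather than proving them.

```python
import itertools
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

points = [(0.0, 0.0), (3.0, 4.0), (1.0, -2.0)]

for i, j in itertools.product(points, repeat=2):
    assert euclidean(i, j) >= 0                    # positivity
    assert (euclidean(i, j) == 0) == (i == j)      # D(i, j) = 0 iff i = j
    assert euclidean(i, j) == euclidean(j, i)      # symmetry

for i, j, k in itertools.product(points, repeat=3):
    # triangle inequality, with a tiny slack for floating-point rounding
    assert euclidean(i, j) + euclidean(j, k) >= euclidean(i, k) - 1e-12
```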

218 | Greedy attribute selection.
- Caruana, Freitag
- 1994
Citation Context ...ill climbing approach (Devijver & Kittler 1982) to search for sub-optimal solutions. These methods have been investigated using a variety of other learning algorithms (Caruana & Freitag 1994; John, Kohavi, & Pfleger 1994; Langley & Sage 1994b; Moore & Lee 1994; Singh & Provan 1995). The two methods differ in their starting conditions; the Forward Selection algorithm starts with no members ... |
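The Forward Selection strategy described here starts from the empty attribute set and greedily adds whichever attribute most improves the evaluation, stopping when no addition helps. A hedged sketch, where the evaluation function is a stand-in for cross-validated accuracy and the toy scoring rule is invented:

```python
def forward_selection(attributes, evaluate):
    # Greedy hill climbing from the empty set: add the best remaining
    # attribute each step; stop when no addition improves the score.
    selected = set()
    best_score = evaluate(selected)
    while True:
        candidates = [(evaluate(selected | {a}), a)
                      for a in attributes if a not in selected]
        if not candidates:
            break
        score, attr = max(candidates)
        if score <= best_score:
            break
        selected.add(attr)
        best_score = score
    return selected, best_score

# Toy evaluation rewarding attributes 1 and 3, penalising extras.
score = lambda s: len(s & {1, 3}) - 0.2 * len(s - {1, 3})
sel, val = forward_selection(range(5), score)
print(sorted(sel), val)  # -> [1, 3] 2.0
```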

210 | Learning trees and rules with set-valued features. - Cohen - 1996 |

198 | Learning rules that classify e-mail.
- Cohen
- 1996
Citation Context ...rning mechanisms have been employed within intelligent information agents, including genetic algorithms (Sheth, 1994), symbolic rule induction algorithms (Dent et al., 1992; Payne, 1994; Bayer, 1995; Cohen, 1996a; Payne et al., 1997), neural networks (McElligott & Sorensen, 1994; Mitchell et al., 1994; Pannu & Sycara, 1996; Boone, 1998), naive Bayesian techniques (Pazzani et al., 1996), Minimum Description L... |

194 | Ideals, central tendency, and frequency of instantiation as determinants of graded structure in categories.
- Barsalou
- 1985
Citation Context ...) uses the Nested Generalised Exemplar (NGE) theory to create compact representations of concepts. There are also advantages to the nearest neighbour approach. Such methods can learn graded concepts (Barsalou 1985; Aha 1989), relational concepts (Emde & Wettschereck 1996), and provide a basis for exploring prototypical learning (Zhang 1992; Biberman 1995; Datta & Kibler 1995; Datta & Kibler 1997). There have a... |

181 | Tolerating noisy, irrelevant and novel attributes in instance-based learning algorithms. - AHA - 1992 |

179 |
Representation and Learning in Information Retrieval
- Lewis
- 1992
Citation Context ... & Maes, 1993; Green & Edwards, 1996; Pazzani et al., 1996; Payne et al., 1997; Boone, 1998). There are a number of differences between the task of learning from documents and other learning problems (Lewis, 1992). The organisation of text within most documents is generally either unstructured (i.e. single paragraphs of free text) or semi-structured, such as electronic mail messages (Malone et al., 1987). Doc... |

176 |
Numerical recipes in C: the art of scientific computing
- Press, Teukolsky, et al.
- 1992
Citation Context ... space can then be approximated (by approximating the decomposed matrices) resulting in a lower dimensional representation of the points in the approximated space. Singular Value Decomposition (SVD) (Press, 1992; Greenacre, 1984, Appx A.)... [table from Section 3.5 (IR/Text Categorisation Approaches): Technique, Study, Learning Algorithms; first row: Information Gain, Lewis & Ringuette, 199...] |

154 | A Probibilistic Approach to Feature Selection—A Filter Solution
- Liu, Setiono
- 1996
Citation Context ...best = eval; done; return(selectset). (Figure 6.7: The Monte Carlo Algorithm.) The Monte Carlo algorithm utilises a stochastic, or random sample approach to search the state space (Skalak 1994; Liu & Setiono 1996a; Liu & Setiono 1996b). Unlike the search methods described above, this method does not traverse the search space in search of sub-optimal states, but rather evaluates a fixed number of random states. ... |
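The Monte Carlo search the snippet describes, which evaluates a fixed number of random attribute subsets rather than traversing neighbouring states, might look like this in Python. This is an illustrative sketch, not the thesis's Figure 6.7 code; the evaluation function is an invented stand-in for running the learning algorithm.

```python
import random

def monte_carlo_select(attributes, evaluate, n_samples, seed=0):
    # Evaluate a fixed number of random attribute subsets and keep the best,
    # instead of walking the search space neighbour by neighbour.
    rng = random.Random(seed)
    best_set, best_score = None, float("-inf")
    for _ in range(n_samples):
        subset = frozenset(a for a in attributes if rng.random() < 0.5)
        score = evaluate(subset)
        if score > best_score:
            best_set, best_score = subset, score
    return best_set, best_score

# Toy evaluation: pretend only attributes 0 and 2 are relevant.
relevant = {0, 2}
score = lambda s: len(s & relevant) - 0.1 * len(s - relevant)
best, val = monte_carlo_select(range(6), score, n_samples=500)
print(sorted(best), val)
```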

150 | Efficient algorithms for minimizing cross validation error.
- Moore, Lee
- 1994
Citation Context ...lutions. These methods have been investigated using a variety of other learning algorithms (Caruana & Freitag 1994; John, Kohavi, & Pfleger 1994; Langley & Sage 1994b; Moore & Lee 1994; Singh & Provan 1995). The two methods differ in their starting conditions; the Forward Selection algorithm starts with no members in its attribute subset, and incrementally adds new attributes to thi... |

141 | A comparative evaluation of sequential feature selection algorithms,” Learning from
- Aha, Bankert
- 1996
Citation Context ...ss 3 Figure 3.8: The 13-dimension UCI Wine data set approximated in a 2-dimensional subspace. Several studies have shown that the wrapper model can identify better attribute sets than the filter model (Aha & Bankert 1995; John, Kohavi, & Pfleger 1994). However, induction is performed at every search state visited. This can result in an exponential rise in the time taken for an exhaustive search to locate the optimal s... |

140 |
Intelligent information-sharing systems.
- Malone, Grant, et al.
- 1987
Citation Context ...ing problems (Lewis, 1992). The organisation of text within most documents is generally either unstructured (i.e. single paragraphs of free text) or semi-structured, such as electronic mail messages (Malone et al., 1987). Documents may be organised into sections, subsections, paragraphs, etc., but any given document may contain an arbitrary number of each. In addition, there may be other information available, such ... |

134 |
The distance-weighted k-nearest-neighbor rule
- Dudani
- 1976
Citation Context ... of the largest group. Other voting strategies have been proposed, including the Qualified k-NN Rule (Devijver & Kittler 1982), the Variable Threshold Rule (Tomek 1976), and the Distance Weighted Rule (Dudani 1976). The value of k can significantly affect the performance of the k-nearest neighbour algorithm. If the Majority Rule is used when k → n (where n is the number of instances in the instance space), and t... |
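One of the alternative voting strategies mentioned here can be sketched as inverse-distance voting, where nearer neighbours carry larger votes (a common formulation; Dudani's exact weighting scheme differs in detail). The data and names are invented for illustration.

```python
import math
from collections import defaultdict

def weighted_knn(instances, query, k):
    # Nearer neighbours get larger votes (weight = 1/distance), so a close
    # minority class can outvote a distant majority.
    def dist(features):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(features, query)))
    neighbours = sorted(instances, key=lambda inst: dist(inst[0]))[:k]
    votes = defaultdict(float)
    for features, label in neighbours:
        d = dist(features)
        votes[label] += 1.0 / d if d > 0 else float("inf")
    return max(votes, key=votes.get)

data = [((0.0, 0.0), "a"), ((5.0, 5.0), "b"), ((5.0, 6.0), "b")]
# With k=3, plain majority voting would return "b"; here the much closer
# "a" instance dominates under inverse-distance weighting.
print(weighted_knn(data, (0.5, 0.0), 3))  # -> "a"
```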

123 | A Personal Learning Apprentice
- Dent, Boticario, et al.
- 1992
Citation Context ...n agents to assist the user. A variety of learning mechanisms have been employed within intelligent information agents, including genetic algorithms (Sheth, 1994), symbolic rule induction algorithms (Dent et al., 1992; Payne, 1994; Bayer, 1995; Cohen, 1996a; Payne et al., 1997), neural networks (McElligott & Sorensen, 1994; Mitchell et al., 1994; Pannu & Sycara, 1996; Boone, 1998), naive Bayesian techniques (Pazza... |

118 | Generalizing from case studies: A case study.
- Aha
- 1992
Citation Context ...orter 1987), meeting predictions (Kozierok & Maes 1993) and Internet information filtering (Payne, Edwards, & Green 1997). However, this learning paradigm has also been the subject of strong criticism (Aha 1992b). Breiman, Friedman, Olshen, & Stone (1984, p.17) highlight five objections to such approaches for concept learning: 1. They are expensive due to their large storage requirements; 2. They are sensitiv... |

106 |
A Learning Interface Agent for Scheduling Meetings,"
- Kozierok, Maes
- 1993
Citation Context ...iques (Lang, 1995), clustering techniques (Green & Edwards, 1996), relational learning algorithms (Cohen, 1995) and nearest neighbour (or instance based) techniques (Metral, 1993; Kozierok & Maes, 1993; Green & Edwards, 1996; Pazzani et al., 1996; Payne et al., 1997; Boone, 1998). There are a number of differences between the task of learning from documents and other learning problems (Lewis, 1992)... |

101 | Unifying instance-based and rulebased induction.
- Domingos
- 1996
Citation Context ...gorithms, as this eliminates the need to determine the optimal value of k for each individual data set (Shepherd, 1983; Weiss & Kapouleas, 1989; Aha, 1992a; Michie et al., 1994; Rachlin et al., 1994; Domingos, 1996; Kubat et al., 1997). 2.2.3 Distance Metrics: The location of instances within the instance space is defined by the representation of the instances and th... |

96 | Concept features in re:Agent, an intelligent email agent.
- Boone
- 1998
Citation Context ...13 Comparing the learning curves of IBPL1, IBPL2 and PIBPL on the USENET news data set (p. 99). 5.14 Generating features from concept centroids with Re:Agent (Boone, 1998) (p. 102). 6.1 The various search states in a four dimensional ASV space (after Langley 1994)... |

85 |
Fast Effective Rule Induction. In:
- Cohen
- 1995
Citation Context ...aive Bayesian techniques (Pazzani et al., 1996), Minimum Description Length techniques (Lang, 1995), clustering techniques (Green & Edwards, 1996), relational learning algorithms (Cohen, 1995) and nearest neighbour (or instance based) techniques (Metral, 1993; Kozierok & Maes, 1993; Green & Edwards, 1996; Pazzani et al., 1996; Payne et al., 1997; Boone, 1998). There are a number of differe... |

84 |
A study of instance-based algorithms for supervised learning tasks: mathematical, empirical, and psychological evaluations [Ph.D.
- Aha
- 1990
Citation Context ...t within the space to determine error rates; and editing the space to reduce the number of instances required by the algorithm. Nearest neighbour learning algorithms are also known as Instance-Based (Aha 1990) or Exemplar-Based (Bareiss & Porter 1987) learning algorithms, and fall within the category of lazy-learning algorithms (Mitchell 1997), as they defer the induction or generalisation process until cla... |

81 | Trading MIPS and memory for knowledge engineering. - Creecy, Masand, et al. - 1992 |

80 |
Learning representative exemplars as concepts: An initial case study.
- Kibler, Aha
- 1987
Citation Context ... of real-world domains, such as in the pronunciation of English words (Stanfill & Waltz 1986), recognition of DNA & RNA sequences (Cost & Salzberg 1993), thyroid disease diagnosis (Kibler & Aha 1987), speech recognition (Bradshaw 1987), clinical audiology diagnosis (Bareiss & Porter 1987), meeting predictions (Kozierok & Maes 1993) and Internet information filtering (Payne, Edwards, & Green 1997)... |

79 | Relational Instance-Base Learning.
- Emde, Wettschereck
- 1996
Citation Context ...eory to create compact representations of concepts. There are also advantages to the nearest neighbour approach. Such methods can learn graded concepts (Barsalou 1985; Aha 1989), relational concepts (Emde & Wettschereck 1996), and provide a basis for exploring prototypical learning (Zhang 1992; Biberman 1995; Datta & Kibler 1995; Datta & Kibler 1997). There have also been a number of theoretical studies on the nearest ne... |

71 | Feature subset selection using the wrapper method: Overfitting and dynamic search space topology - Kohavi, Sommerfield - 1995 |

66 |
Protos: An exemplar-based learning apprentice.
- Bareiss, Porter
- 1987
Citation Context ...ne error rates; and editing the space to reduce the number of instances required by the algorithm. Nearest neighbour learning algorithms are also known as Instance-Based (Aha 1990) or Exemplar-Based (Bareiss & Porter 1987) learning algorithms, and fall within the category of lazy-learning algorithms (Mitchell 1997), as they defer the induction or generalisation process until classifications are performed. The nearest nei... |

64 | An adaptive agent for automated web browsing. - Balabanovic, Shoham, et al. - 1995 |

62 | Interface Agents that Learn: An Investigation of Learning Issues in a Mail Agent Interface.
- Payne, Edwards
- 1997
Citation Context ...ents such as email messages or Web pages can be complex, and it can be difficult to map the data within the document into a representation suitable for presentation to a learning algorithm (Payne 1994; Payne & Edwards 1997). This complexity can be illustrated by an example. Consider an application that maintains a database of bibliographic entries. Each entry refers to a published paper, and includes the abstract and t... |

58 | Low-rank orthogonal decompositions for information retrieval applications
- Berry, Fierro
- 1996
Citation Context ... is normally used to perform the decomposition, although a recent study has shown that other orthogonal decomposition approaches, such as ULV decomposition (Berry & Fierro 1996), can be used to replace SVD for this task. Studies have demonstrated that a significant reduction in dimensionality can be achieved when used within IR systems; for example from 5000-7000 dimensions ... |

54 | Automatic Parameter Selection by Minimizing Estimated Error,
- Kohavi, John
- 1995
Citation Context ...ank of the approximated subspace. At present, these parameters are determined empirically. Some existing learning algorithms utilise n-fold cross validation techniques to identify suitable estimates (Kohavi & John 1995), or utilise a stopping criterion to identify partition points (Fayyad & Irani 1993). An automated method for reliably estimating values for the threshold and the rank of these algorithms would be ext... |

52 | Hybrid learning using genetic algorithms and decision trees for pattern - Bala, Huang, et al. - 1995 |

49 |
Incremental, instance-based learning of independent and graded concept descriptions. In:
- Aha
- 1989
Citation Context ...ed Generalised Exemplar (NGE) theory to create compact representations of concepts. There are also advantages to the nearest neighbour approach. Such methods can learn graded concepts (Barsalou 1985; Aha 1989), relational concepts (Emde & Wettschereck 1996), and provide a basis for exploring prototypical learning (Zhang 1992; Biberman 1995; Datta & Kibler 1995; Datta & Kibler 1997). There have also been a... |

49 | Feature subset selection as search with probabilistic estimates, in "AAAI Fall Symposium on Relevance",
- Kohavi
- 1994
Citation Context ... accuracy. Several other studies have investigated the utility of the wrapper method for attribute selection when learning with a rule induction algorithm (Caruana & Freitag, 1994; John et al., 1994; Kohavi, 1994) or a Nearest Neighbour algorithm (Salzberg, 1992; Aha & Bankert, 1994; Moore & Lee, 1994; Skalak, 1994) (see Table 3.2). The rule induction studies utilised variants of the Forward Selection and Bac... |

46 | Average-case analysis of a nearest neighbor algorithm.
- Langley, Iba
- 1993
Citation Context ... (p. 34). 2.4 Comparing symbolic values with the value difference metric (p. 39). 2.5 The performance of the NN algorithm as the number of relevant (left) and irrelevant (right) dimensions increases (Langley & Iba, 1993, reproduced with permission) (p. 48). 3.1 The Filter Model (p. 52). 3.2 The Wrapper Model... |

43 | Oblivious decision trees and abstract cases.
- Langley, Sage
- 1994
Citation Context ...rch for sub-optimal solutions. These methods have been investigated using a variety of other learning algorithms (Caruana & Freitag 1994; John, Kohavi, & Pfleger 1994; Langley & Sage 1994b; Moore & Lee 1994; Singh & Provan 1995). The two methods differ in their starting conditions; the Forward Selection algorithm starts with no members in its attribute subset, and incrementally adds ne... |

39 | The utility of feature weighting in nearest-neighbor algorithms. - Kohavi, Langley, et al. - 1997 |

38 | Towards a better understanding of memory-based reasoning systems
- Kasif, Salzberg, et al.
Citation Context ...er machine learning algorithms, as this eliminates the need to determine the optimal value of k for each individual data set (Shepherd, 1983; Weiss & Kapouleas, 1989; Aha, 1992a; Michie et al., 1994; Rachlin et al., 1994; Domingos, 1996; Kubat et al., 1997). 2.2.3 Distance Metrics: The location of instances within the instance space is defined by the representation of the ... |

33 | Relevance feedback in information retrieval - unknown authors - 1971 |

31 |
Nearest neighbor pattern classification
- Cover, Hart
- 1967
Citation Context ...(x_q, x_i) ≥ 0 (2.1). The nearest neighbour learning paradigm is based on an assumption that relates the distribution of values in the instance space to the distribution of values in the output space (Cover & Hart 1967; Michie, Spiegelhalter, & Taylor 1994). It states that if two instances, x_i and x_q, are located close to each other within the instance space (accord... |

27 | Learning prototypical concept descriptions.
- Datta, Kibler
- 1995
Citation Context .... Such methods can learn graded concepts (Barsalou 1985; Aha 1989), relational concepts (Emde & Wettschereck 1996), and provide a basis for exploring prototypical learning (Zhang 1992; Biberman 1995; Datta & Kibler 1995; Datta & Kibler 1997). There have also been a number of theoretical studies on the nearest neighbour algorithm, including PAC Analysis (Albert & Aha 1991), Average Case Analysis (Langley & ... |

25 |
A context similarity measure.
- Biberman
- 1994
Citation Context ...on & McGill 1983), the Quadratic metric (Fukunaga & Flick 1984), the Minkowskian metric (Salzberg 1991b), the Modified Value Difference metric (Cost & Salzberg 1993), and the Context Similarity metric (Biberman 1994). The choice of distance metric is significant, as it can greatly affect the learning bias of the algorithm. Ideally, the distance metric should minimise the distance between two similarly classified in... |

25 | Growing simpler decision trees to facilitate knowledge discovery - Cherkauer, Shavlik - 1996 |

24 | An Overview of Issues in Developing Industrial Data Mining and Knowledge Discovery Applications. - Piatetsky-Shapiro, Brachman, et al. - 1996 |

23 | Experience with learning agents which Manage Internet-Based Information
- Edwards, Bayer, et al.
- 1996
Citation Context ...nts. Intelligent information agents such as email filters (Metral, 1993; Payne, 1994; Cohen, 1996a; Boone, 1998) and Web agents (Armstrong et al., 1995; Balabanovic et al., 1995; Green & Edwards, 1996; Edwards et al., 1996; Pazzani et al., 1996) utilise a learning algorithm to determine what action should be taken on new or incoming documents. An email agent may attempt to categorise an incoming email and save it in th... |

23 | Experience with rule induction and k-nearest neighbor methods for interface agents that learn
- Payne, Edwards, et al.
- 1997
Citation Context ...ms have been employed within intelligent information agents, including genetic algorithms (Sheth, 1994), symbolic rule induction algorithms (Dent et al., 1992; Payne, 1994; Bayer, 1995; Cohen, 1996a; Payne et al., 1997), neural networks (McElligott & Sorensen, 1994; Mitchell et al., 1994; Pannu & Sycara, 1996; Boone, 1998), naive Bayesian techniques (Pazzani et al., 1996), Minimum Description Length techniques (Lan... |

16 |
Learning about speech sounds: the NEXUS project
- Bradshaw
- 1987
(Show Context)
Citation Context ...ns, such as in the pronunciation of English words (Stanfill & Waltz 1986), recognition of DNA & RNA sequences (Cost & Salzberg 1993), thyroid disease diagnosis (Kibler & Aha 1987), speech recognition (=-=Bradshaw 1987-=-), clinical audiology diagnosis (Bareiss & Porter 1987), meeting predictions (Kozierok & Maes 1993) and Internet information filtering (Payne, Edwards, & Green 1997). However, this learning paradigm has...

15 |
An evolutionary connectionist approach to personal information filtering
- McElligott, Sorensen
- 1994
(Show Context)
Citation Context ...nt information agents, including genetic algorithms (Sheth, 1994), symbolic rule induction algorithms (Dent et al., 1992; Payne, 1994; Bayer, 1995; Cohen, 1996a; Payne et al., 1997), neural networks (=-=McElligott & Sorensen, 1994-=-; Mitchell et al., 1994; Pannu & Sycara, 1996; Boone, 1998), naive Bayesian techniques (Pazzani et al., 1996), Minimum Description Length techniques (Lang, 1995), clustering techniques...

15 | Learning Email Filtering Rules with Magi, A Mail Agent Interface
- Payne
- 1994
(Show Context)
Citation Context ...the user. A variety of learning mechanisms have been employed within intelligent information agents, including genetic algorithms (Sheth, 1994), symbolic rule induction algorithms (Dent et al., 1992; =-=Payne, 1994-=-; Bayer, 1995; Cohen, 1996a; Payne et al., 1997), neural networks (McElligott & Sorensen, 1994; Mitchell et al., 1994; Pannu & Sycara, 1996; Boone, 1998), naive Bayesian techniques (Pazzani et al., 19... |

14 |
Pattern Recognition: A Statistical Approach, Prentice-Hall,
- Devijer, Kittler
- 1982
(Show Context)
Citation Context ...he value of their class label, and returns the class of the largest group. Other voting strategies have been proposed, including the Qualified k-NN Rule (=-=Devijer & Kittler 1982-=-), the Variable Threshold Rule (Tomek 1976), and the Distance Weighted Rule (Dudani 1976). The value of k can significantly affect the performance of the k-nearest neighbour algorithm. If the Majority R...
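The majority-vote rule described in this context ("returns the class of the largest group") can be sketched as follows; the function name and the squared-Euclidean default distance are illustrative assumptions, not the cited authors' code:

```python
from collections import Counter

def knn_classify(query, training, k=3, dist=None):
    """Majority-vote k-NN over stored (vector, label) pairs.

    `dist` defaults to squared Euclidean distance; vote ties are broken
    arbitrarily by Counter.most_common."""
    if dist is None:
        dist = lambda x, y: sum((a - b) ** 2 for a, b in zip(x, y))
    # Take the k stored instances closest to the query ...
    neighbours = sorted(training, key=lambda inst: dist(query, inst[0]))[:k]
    # ... and return the class held by the largest group of them.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```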

14 | Implicit Feature Selection with the Value Difference Metric.
- Payne, Edwards
- 1998
(Show Context)
Citation Context ...d to describe the space can vary depending on the number of irrelevant or redundant attributes present in the data set (Almuallim & Dietterich 1991; John, Kohavi, & Pfleger 1994; Payne & Edwards 1996; =-=Payne & Edwards 1998-=-a), and whether any of the attributes can be combined to generate higher-order dimensions (Michie, Spiegelhalter, & Taylor 1994; Bishop 1995; Wu, Berry, Shivakumar, & McLarty 1995; Payne & Edwards 199...

13 |
A Learning Agent for Resource Discovery on the World Wide Web
- Bayer
- 1995
(Show Context)
Citation Context ...ariety of learning mechanisms have been employed within intelligent information agents, including genetic algorithms (Sheth, 1994), symbolic rule induction algorithms (Dent et al., 1992; Payne, 1994; =-=Bayer, 1995-=-; Cohen, 1996a; Payne et al., 1997), neural networks (McElligott & Sorensen, 1994; Mitchell et al., 1994; Pannu & Sycara, 1996; Boone, 1998), naive Bayesian techniques (Pazzani et al., 1996), Minimum ... |

13 | Using machine learning to enhance software tools for internet information management
- Green, Edwards
- 1996
(Show Context)
Citation Context ...hell et al., 1994; Pannu & Sycara, 1996; Boone, 1998), naive Bayesian techniques (Pazzani et al., 1996), Minimum Description Length techniques (Lang, 1995), clustering techniques (=-=Green & Edwards, 1996-=-), relational learning algorithms (Cohen, 1995) and nearest neighbour (or instance based) techniques (Metral, 1993; Kozierok & Maes, 1993; Green & Edwards, 1996; Pazzani et al., 1996; Payne et al., 19...

12 | Discovering patterns in EEG signals: comparative study of a few methods - Kubat, Flotzinger, et al. - 1993 |

11 | Skousen's analogical modeling algorithm: A comparison with lazy learning - DAELEMANS, GILLIS, et al. - 1994 |

11 |
Induction of selective Bayesian classifiers
- Langley, Sage
- 1994
(Show Context)
Citation Context ...rch for sub-optimal solutions. These methods have been investigated using a variety of other learning algorithms (Caruana & Freitag 1994; John, Kohavi, & Pfleger 1994; =-=Langley & Sage 1994-=-b; Moore & Lee 1994; Singh & Provan 1995). The two methods differ in their starting conditions; the Forward Selection algorithm starts with no members in its attribute subset, and incrementally adds ne...
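The Forward Selection search described here can be sketched as a greedy loop; the function is a hypothetical illustration, with `evaluate` standing in for whatever wrapper estimate (e.g. cross-validated accuracy) the surrounding work uses:

```python
def forward_selection(attributes, evaluate):
    """Greedy forward selection: start with the empty attribute subset and
    repeatedly add the single attribute that most improves evaluate(subset),
    stopping when no addition helps."""
    selected, best_score = set(), evaluate(set())
    improved = True
    while improved:
        improved = False
        for attr in attributes - selected:
            score = evaluate(selected | {attr})
            if score > best_score:
                best_score, best_attr, improved = score, attr, True
        if improved:
            selected.add(best_attr)
    return selected
```

Backward Elimination is the mirror image: start from the full attribute set and greedily remove attributes instead.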

11 | A Learning Personal Agent for Text Filtering and Notification
- Pannu, K
- 1996
(Show Context)
Citation Context ...(Sheth, 1994), symbolic rule induction algorithms (Dent et al., 1992; Payne, 1994; Bayer, 1995; Cohen, 1996a; Payne et al., 1997), neural networks (McElligott & Sorensen, 1994; Mitchell et al., 1994; =-=Pannu & Sycara, 1996-=-; Boone, 1998), naive Bayesian techniques (Pazzani et al., 1996), Minimum Description Length techniques (Lang, 1995), clustering techniques (Green & Edwards, 1996), relational lear...

10 |
Design of a generic learning interface agent. B.Sc
- Metral
- 1993
(Show Context)
Citation Context ...n Length techniques (Lang, 1995), clustering techniques (Green & Edwards, 1996), relational learning algorithms (Cohen, 1995) and nearest neighbour (or instance based) techniques (=-=Metral, 1993-=-; Kozierok & Maes, 1993; Green & Edwards, 1996; Pazzani et al., 1996; Payne et al., 1997; Boone, 1998). There are a number of differences between the task of learning from documents and other learning ...

9 |
Feature selection for case-based classification of cloud types: An empirical comparison
- Aha, Bankert
- 1994
(Show Context)
Citation Context ...f the wrapper method for attribute selection when learning with a rule induction algorithm (Caruana & Freitag, 1994; John et al., 1994; Kohavi, 1994) or a Nearest Neighbour algorithm (Salzberg, 1992; =-=Aha & Bankert, 1994-=-; Moore & Lee, 1994; Skalak, 1994) (see Table 3.2). The rule induction studies utilised variants of the Forward Selection and Backward Elimination searches. In general, the wrapper method was observed... |

8 |
The Role of Prototypicality in Exemplar-Based Learning
- Biberman
- 1995
(Show Context)
Citation Context ...bour learning algorithms to improve their accuracy when learning to categorise documents. For example, tf-idf 1 weighting strategies have already been used to construct prototype instances (Zhang 1992; =-=Biberman 1995-=-; Datta & Kibler 1997), which in turn are used to train nearest neighbour learning algorithms within some information agent systems (Lang 1995; Cohen 1996b). 1 Term Frequency/Inverse Document Frequenc...
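The Term Frequency/Inverse Document Frequency weighting mentioned in the footnote can be sketched minimally as follows; raw term counts with log(N/df) is one common variant, and the cited systems may use others:

```python
import math
from collections import Counter

def tfidf(documents):
    """Compute tf-idf weights for a list of tokenised documents.

    tf is the raw count of a term in the document; idf is log(N / df),
    where df is the number of documents containing the term."""
    n = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(doc))          # count each term once per document
    weights = []
    for doc in documents:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights
```

Note that a term appearing in every document gets weight 0, which is exactly the "down-weight ubiquitous terms" behaviour that makes tf-idf useful for building prototypes.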

7 |
Flick: An Optimal Global Nearest Neighbour Metric
- Fukunaga, E
- 1984
(Show Context)
Citation Context ...he Chi-square metric (Anderberg 1973), the Mahalanobis metric (Everitt 1974), the Nonlinear metric (Devijer & Kittler 1982), the Cosine Similarity metric (Salton & McGill 1983), the Quadratic metric (=-=Fukunaga & Flick 1984-=-), the Minkowskian metric (Salzberg 1991b), the Modified Value Difference metric (Cost & Salzberg 1993), and the Context Similarity metric (Biberman 1994). The choice of distance metric is significant, a...

7 |
Theoretical Analysis of the Nearest Neighbor Classifier in Noisy Domains
- Okamoto, Yugami
- 1996
(Show Context)
Citation Context ...luding PAC Analysis (Albert & Aha 1991), Average Case Analysis (Langley & Iba 1993), and various studies on the effects of noisy domains (Dasarathy 1991; =-=Okamoto & Yugami 1996-=-). 2.2.1 The Basic Learning Paradigm The basic nearest neighbour learning algorithm is trained by simply storing the training instances until classification time. When a query (i.e. unclassified) instan...
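The basic paradigm described in this context — training is nothing more than storing the instances, and classification returns the label of the closest stored instance — can be sketched as (names are illustrative):

```python
def nn_classify(query, training, dist):
    """1-NN: `training` is simply the stored list of (vector, label) pairs;
    classification returns the label of the single nearest instance."""
    nearest_vector, nearest_label = min(training, key=lambda inst: dist(query, inst[0]))
    return nearest_label
```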

6 |
An algorithm for suffix stripping
- Porter
- 1980
(Show Context)
Citation Context ...n be used to identify the individual fields in the database record. Fields containing text can be parsed to identify terms, remove punctuation, stem the terms (i.e. remove suffixes such as `ing' or `ed' (=-=Porter, 1980-=-)), and sort the terms in dictionary order. Figure 1.1 shows an illustrative entry that has been retrieved from a database and processed in this way. This data now has to be mapped to a training examp...
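A crude illustration of the suffix stripping this context describes; the actual Porter (1980) stemmer applies several staged, condition-guarded rewrite rules and is far more careful than this sketch:

```python
def strip_suffix(term, suffixes=('ing', 'ed', 's')):
    """Remove the first matching suffix, but only if a reasonably long
    stem (here: at least 3 characters, an arbitrary guard) remains."""
    for suffix in suffixes:
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[:-len(suffix)]
    return term
```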

6 | Performing Eective Feature Selection by Investigating the Deep Structure of the Data - Richeldi, Lanzi - 1996 |

5 | Learning Mechanisms for Information Filtering Agents
- Payne, Edwards
- 1995
(Show Context)
Citation Context ...sets of values. The PIBPL algorithm explores the utility of the VDM weight (Equation 2.8) as a means of selecting or rejecting terms from within the sets. The weight reflects the typicality of a term (=-=Payne & Edwards 1995-=-), i.e. how likely the term is to occur in one class but not in others. Terms which appear with equal frequency in all classes have a low weight, whereas those that mostly appear in a single class hav...

4 | Learning symbolic prototypes
- Datta, Kibler
- 1997
(Show Context)
Citation Context ...lgorithms to improve their accuracy when learning to categorise documents. For example, tf-idf 1 weighting strategies have already been used to construct prototype instances (Zhang 1992; Biberman 1995; =-=Datta & Kibler 1997-=-), which in turn are used to train nearest neighbour learning algorithms within some information agent systems (Lang 1995; Cohen 1996b). 1 Term Frequency/Inverse Document Frequency (Rocchio Jr 1971; S...

4 |
A Survey of Feature Selection Methods. Unpublished Draft
- Payne, Edwards
- 1996
(Show Context)
Citation Context ...mber of dimensions used to describe the space can vary depending on the number of irrelevant or redundant attributes present in the data set (Almuallim & Dietterich 1991; John, Kohavi, & Pfleger 1994; =-=Payne & Edwards 1996-=-; Payne & Edwards 1998a), and whether any of the attributes can be combined to generate higher-order dimensions (Michie, Spiegelhalter, & Taylor 1994; Bishop 1995; Wu, Berry, Shivakumar, & McLarty 199...

3 |
Adaptive Control Processes: A Guided Tour .New
- Bellman
- 1961
(Show Context)
Citation Context ...ed as a function of the number of attributes (dimensions) used to describe each instance and the number of instances in the training set. This is generally referred to as the curse of dimensionality (=-=Bellman 1961-=-). The following subsections discuss the issues listed above with respect to the nearest neighbour learning paradigm, and discuss ways in which their impact can be minimised. 2.3.1 The Problem of Redu... |

3 | selection: a useful preprocessing step - Moulinier |

2 |
Feature Selection and Classification - A Probabilistic Wrapper Approach
- Liu, Setiono
- 1996
(Show Context)
Citation Context ...best = eval; done; return(selectset) (Figure 6.7: The Monte Carlo Algorithm). The Monte Carlo algorithm utilises a stochastic, or random sample approach to search the state space (Skalak 1994; =-=Liu & Setiono 1996-=-a; Liu & Setiono 1996b). Unlike the search methods described above, this method does not traverse the search space in search of sub-optimal states, but rather evaluates a fixed number of random states. ...
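The stochastic search this context describes — evaluate a fixed number of random attribute subsets and keep the best, rather than traversing the space — can be sketched as follows; the names, the fixed seed, and the 0.5 inclusion probability per attribute are assumptions, not details from Figure 6.7:

```python
import random

def monte_carlo_search(attributes, evaluate, samples=100, seed=0):
    """Score `samples` random attribute subsets; return the best-scoring one."""
    rng = random.Random(seed)          # fixed seed for repeatable runs
    attrs = sorted(attributes)         # deterministic iteration order
    best_subset, best_score = set(), evaluate(set())
    for _ in range(samples):
        # Each attribute is included with probability 0.5 (an assumption).
        subset = {a for a in attrs if rng.random() < 0.5}
        score = evaluate(subset)
        if score > best_score:
            best_score, best_subset = score, subset
    return best_subset
```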

2 | A Framework for Comparing Text Categorisation Approaches - Moulinier - 1996 |

1 |
Cluster Analysis for Applications. New York: Academic Press.
- Anderberg
- 1973
(Show Context)
Citation Context ...i) D(i, j) ≥ 0 Positivity ii) D(i, j) = D(j, i) Symmetry iii) D(i, j) + D(j, k) ≥ D(i, k) Triangle Inequality. Several distance metrics have been proposed (Wilson & Martinez 1997), and include the Chi-square metric (=-=Anderberg 1973-=-), the Mahalanobis metric (Everitt 1974), the Nonlinear metric (Devijer & Kittler 1982), the Cosine Similarity metric (Salton & McGill 1983), the Quadratic metric (Fukunaga & Flick 1984), the Minkowsk...

1 | Dimensionality Reduction through Correspondence Analysis. Unpublished Draft
- Payne, Edwards
- 1998
(Show Context)
Citation Context ...d to describe the space can vary depending on the number of irrelevant or redundant attributes present in the data set (Almuallim & Dietterich 1991; John, Kohavi, & Pfleger 1994; Payne & Edwards 1996; =-=Payne & Edwards 1998-=-a), and whether any of the attributes can be combined to generate higher-order dimensions (Michie, Spiegelhalter, & Taylor 1994; Bishop 1995; Wu, Berry, Shivakumar, & McLarty 1995; Payne & Edwards 199...

1 |
RALPH: Rapidly Adapting Lateral Position Holder
- Pomerleau
- 1995
(Show Context)
Citation Context ...l, and illustrated by means of an example. A machine learning algorithm is a program that can learn to perform a given task, such as driving autonomous vehicles (=-=Pomerleau 1995-=-), detecting fraud (Piatetsky-Shapiro, Brachman, Khabaza, Kloesgen, & Simoudis 1996), playing games (Mitchell 1997), or classifying unclassified data. The algorithm is generally trained by presenting i...