## Geometry in Learning (1997)

Venue: Geometry at Work

Citations: 19 (6 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Bennett97geometryin,
  author    = {Kristin P. Bennett and Erin J. Bredensteiner},
  title     = {Geometry in Learning},
  booktitle = {Geometry at Work},
  year      = {1997}
}
```

### Abstract

One of the fundamental problems in learning is identifying members of two different classes. For example, to diagnose cancer, one must learn to discriminate between benign and malignant tumors. Through examination of tumors with previously determined diagnosis, one learns some function for distinguishing the benign and malignant tumors. Then the acquired knowledge is used to diagnose new tumors. The perceptron is a simple biologically inspired model for this two-class learning problem. The perceptron is trained or constructed using examples from the two classes. Then the perceptron is used to classify new examples. We describe geometrically what a perceptron is capable of learning. Using duality, we develop a framework for investigating different methods of training a perceptron. Depending on how we define the "best" perceptron, different minimization problems are developed for training the perceptron. The effectiveness of these methods is evaluated empirically on four practical applic...
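The two-class learning loop the abstract describes (train on labeled examples, then classify new points) can be sketched with the classic Rosenblatt update rule. This is a minimal illustration of the perceptron model, not the training formulations developed in the paper; all names and the toy data are ours.

```python
import numpy as np

def train_perceptron(X, y, epochs=100, lr=1.0):
    """Classic Rosenblatt update: whenever a point is on the wrong
    side of the plane, move the plane toward it. Labels are +/-1."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:   # misclassified point
                w += lr * yi * xi
                b += lr * yi
                errors += 1
        if errors == 0:                  # converged (separable data)
            break
    return w, b

def classify(X, w, b):
    """Apply the trained perceptron to new points."""
    return np.sign(X @ w + b)

# Toy linearly separable data: two clusters.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = train_perceptron(X, y)
print(classify(X, w, b))  # matches y on this separable toy set
```

On linearly separable data the loop is guaranteed to terminate; on inseparable data (the case the paper's RLP/GOP formulations address) it cycles, which is why the `epochs` cap is needed.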

### Citations

8980 | Statistical Learning Theory
- Vapnik
- 1998
Citation Context: ...α = γ + 1 and β = γ - 1, then Problem (7) becomes min_{w,γ} (1/2)||w||^2 s.t. Aw - (γ + 1)e >= 0, -Bw + (γ - 1)e >= 0 (12). Problem (12) is exactly the "Optimal Plane" proposed by Vapnik [33]. By using optimality conditions it can be shown that Problem (12) and Problem (6) are equivalent on separable problems. The proof is provided in the Appendix. The above are only a few of the many exi...
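For readability, the quadratic program labeled (12) in this excerpt can be written in standard notation (our reconstruction of the extraction-garbled symbols, with γ the threshold and e a vector of ones):

```latex
\min_{w,\gamma}\ \tfrac{1}{2}\,\lVert w\rVert^2
\quad\text{s.t.}\quad
Aw-(\gamma+1)e \ \ge\ 0,
\qquad
-Bw+(\gamma-1)e \ \ge\ 0 \tag{12}
```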

3921 | Pattern Classification and Scene Analysis
- Duda, Hart
- 1973

Citation Context: ...and the Motzkin-Schoenberg algorithm for finding the solution of linear inequalities [23]. A perceptron is also known as a linear discriminant, so any linear discriminant algorithm, such as those in the book [7], may be used. A single linear program can be used to construct a separating plane in polynomial time [14, 11]. Edelsbrunner proposed an algorithm with O(log m + log k) complexity [8]. Statistical meth...

2649 | Introduction to Statistical Pattern Recognition (2nd edition)
- Fukunaga
- 1990

Citation Context: ...t a separating plane in polynomial time [14, 11]. Edelsbrunner proposed an algorithm with O(log m + log k) complexity [8]. Statistical methods such as Fisher's Linear Discriminant may also be applied [9]. ¹When applied to inseparable problems, the solution that misclassified the least number of points is selected. The problems formulated in this section are only for the linearly separable case. Ca...

2171 | Support vector networks
- Cortes, Vapnik
- 1995
Citation Context: ...maximum classification error, which corresponds to the distance between the relaxed supporting planes. Since this problem is a very minor variation of the Generalized Optimal Plane of Cortes and Vapnik [5, 33], we refer to it as GOP. The problem is a nonlinear perturbation of the RLP. When the perturbation parameter is close to 0, the RLP objective is emphasized. Using theorems on nonlinear perturbation of linear programming in [18],...

1774 | Introduction to the Theory of Neural Computation
- Hertz, Palmer
- 1991

Citation Context: ...max_{i=1,...,n} |x_i| is denoted by ||x||_∞. [Section 2: A Simple Learning Model] The perceptron model can be motivated biologically. The brain consists of an interconnecting network of about 10^11 neurons or nerve cells [13]. Each neuron receives stimuli from other cells. A stimulus may be excitatory or inhibitory. If the combined stimuli exceed some threshold, then the cell "fires". In the perceptron, the stimuli are mo...
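The threshold behavior this excerpt describes (a cell "fires" when its combined excitatory and inhibitory stimuli exceed a threshold) reduces to a one-line rule. A toy sketch; the function name, weights, and threshold are invented for illustration.

```python
def neuron_fires(stimuli, weights, threshold):
    """A single threshold unit: fires iff the weighted sum of
    excitatory (positive-weight) and inhibitory (negative-weight)
    stimuli exceeds the threshold."""
    total = sum(s * w for s, w in zip(stimuli, weights))
    return total > threshold

# Two excitatory inputs and one inhibitory input, combined: 0.5 + 0.8 - 0.3 = 1.0
print(neuron_fires([1.0, 1.0, 1.0], [0.5, 0.8, -0.3], 0.9))  # True
```

The perceptron replaces the biological stimuli with attribute values and learns the weights and threshold from data.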

741 | UCI repository of machine learning databases, http://www.ics.uci.edu/~mlearn/MLRepository.html
- Murphy, Aha
- 1992

Citation Context: ...publicly available via the World Wide Web. All of the above datasets are available via anonymous file transfer protocol (ftp) from the UCI Repository of Machine Learning Databases and Domain Theories [24] at ftp://ftp.ics.uci.edu/pub/machine-learning-databases. The following section contains computational results for the MSM, RLP, RLP-P, and GOP methods on these datasets. [Section 7: Computational Results] This ...

721 | Cross-Validatory Choice and Assessment of Statistical Predictions (with Discussion)
- Stone
- 1974

Citation Context: ...The best algorithm is the one that generalizes best, i.e., the algorithm most accurate on future points. Since the future points are unknown, we use an experimental technique called cross-validation [31] to estimate the accuracy on future points. In 10-fold cross-validation, a dataset is divided randomly into ten disjoint parts of equal size. The method is then trained on nine of these parts. These ni...
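The 10-fold procedure sketched in this excerpt (random split into ten disjoint parts, train on nine, test on the held-out tenth, average) can be written in a few lines. A minimal sketch; the function names are ours, and `fit`/`predict` stand in for any of the training methods compared in the paper.

```python
import random

def ten_fold_split(n, seed=0):
    """Divide n example indices randomly into ten disjoint,
    (near-)equal-size folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::10] for i in range(10)]

def cross_validate(X, y, fit, predict):
    """Train on nine folds, test on the held-out fold, and return
    the average of the ten test accuracies."""
    folds = ten_fold_split(len(X))
    accuracies = []
    for i, held_out in enumerate(folds):
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        model = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        correct = sum(predict(model, X[j]) == y[j] for j in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / len(accuracies)
```

Averaging over ten held-out folds gives an estimate of accuracy on future points without touching any test example during training.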

515 | Perceptrons: An Introduction to Computational Geometry
- Minsky, Papert
- 1969

Citation Context: ...re points will be while the function is being constructed. Many types of classification functions are possible, but in this paper we will restrict ourselves to a linear function called the perceptron [27, 21]. While the perceptron model is quite simple, it works very well on many practical problems, including the Wisconsin Breast Cancer Diagnosis problem described above. We present computational results fo...

250 | A system for induction of oblique decision trees
- Murthy, Kasif, et al.
- 1994
Citation Context: ...largely all of one class, the algorithm stops. If not, the process is repeated in each of the half spaces until the desired accuracy is achieved. Perceptrons are one such way to construct the decision [26, 4, 1, 17]. The final method is to construct nonlinear mappings of the attributes and then construct a linear discriminant in the augmented attribute space. The resulting problem is linear in its parameters but...

210 | Robust linear programming discrimination of two linearly inseparable sets
- Bennett, Mangasarian
- 1992
Citation Context: ...variant of the Robust Linear Programming (RLP) approach of Bennett and Mangasarian [2]. [Figure 4: Minimizing the maximum error in the inseparable case; planes x'w = α, x'w = (α + β)/2, and x'w = β separating sets A and B.] min_{w,α,β,y,z} (1/m)e'y + (1/k)e'z s.t. Aw - αe + y >= 0, -Bw + βe + z >= 0, α - β = 2, y >= 0, z >= 0 (13). The nonnegative slack variable y relaxes the constraints that all points in A be on ...
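Formulation (13) is an ordinary linear program, so it can be handed to a generic LP solver. A sketch using SciPy's `linprog`; the variable ordering and the toy data are our choices, and this is an illustration of the formulation rather than the paper's own implementation.

```python
import numpy as np
from scipy.optimize import linprog

def rlp_plane(A, B):
    """Solve the RLP linear program: minimize (1/m)e'y + (1/k)e'z
    subject to Aw - alpha*e + y >= 0, -Bw + beta*e + z >= 0,
    alpha - beta = 2, y >= 0, z >= 0.
    Variable ordering (ours): [w (n), alpha, beta, y (m), z (k)]."""
    m, n = A.shape
    k = B.shape[0]
    nv = n + 2 + m + k
    c = np.zeros(nv)
    c[n + 2:n + 2 + m] = 1.0 / m       # (1/m) e'y
    c[n + 2 + m:] = 1.0 / k            # (1/k) e'z
    # Rewrite the ">= 0" rows as "<= 0" for linprog's A_ub @ x <= b_ub form.
    row1 = np.hstack([-A, np.ones((m, 1)), np.zeros((m, 1)),
                      -np.eye(m), np.zeros((m, k))])   # -Aw + alpha*e - y <= 0
    row2 = np.hstack([B, np.zeros((k, 1)), -np.ones((k, 1)),
                      np.zeros((k, m)), -np.eye(k)])   # Bw - beta*e - z <= 0
    A_ub = np.vstack([row1, row2])
    b_ub = np.zeros(m + k)
    A_eq = np.zeros((1, nv))
    A_eq[0, n], A_eq[0, n + 1] = 1.0, -1.0             # alpha - beta = 2
    b_eq = np.array([2.0])
    bounds = [(None, None)] * (n + 2) + [(0, None)] * (m + k)  # slacks >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n], res.x[n], res.x[n + 1]

# Toy separable sets: the average slack can be driven to zero.
A = np.array([[3.0, 3.0], [4.0, 2.0]])
B = np.array([[-3.0, -3.0], [-2.0, -4.0]])
w, alpha, beta = rlp_plane(A, B)
```

Dividing the slack sums by m and k weights each set's average violation equally, which is what makes RLP robust when the two classes have different sizes.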

193 | Nonlinear Programming
- Mangasarian
- 1994

Citation Context: ...has no solution. (5) At first glance, these two characterizations of what a perceptron can learn, equations (2) and (5), look unrelated. But in fact, using Gordan's Theorem of the Alternative [16], it can be shown that equations (2) have a solution if and only if equations (5) have no solution (see Theorem A.1 in the Appendix). The two formulations are in different spaces, but they solve the same u...

180 |
Analysis of hidden units in a layered network trained to classify sonar targets
- Gorman, Sejnowski
- 1988
Citation Context: ...ar signal is transmitted at various angles with rises in frequency. A similar procedure is performed to obtain the rock attributes. The publicly available Sonar dataset represents 208 mines and rocks [12]. Sixty real-valued attributes between 0.0 and 1.0 are collected for each mine or rock. The value of the attribute represents the amount of energy within a particular frequency band, integrated over a...

132 | Multisurface method of pattern separation for medical diagnosis applied to breast cytology
- Wolberg, Mangasarian
- 1990

124 | Breast cancer diagnosis and prognosis via linear programming
- Mangasarian, Street, et al.
- 1995

Citation Context: ...urements such as clump thickness, uniformity of cell size, and uniformity of cell shape are collected. A mathematical programming approach incorporating the RLP has been employed in clinical practice [20, 34]. This data was collected before computer imaging techniques were used for measuring attributes, as discussed in the introduction. They report 100% correctness on 131 new cases that have been diagn...

119 | Multivariate decision trees
- Brodley, Utgoff
- 1995
Citation Context: ...ing methods for training a perceptron. We provide a few pointers to other approaches; this is not a comprehensive list. A notable class of algorithms comprises Rosenblatt's Perceptron algorithm [27, 10, 4] and the Motzkin-Schoenberg algorithm for finding the solution of linear inequalities [23]. A perceptron is also known as a linear discriminant, so any linear discriminant algorithms such as in the boo...

76 | MINOS 5.4 User's Guide
- Murtagh, Saunders
- 1993

Citation Context: ...n the two parallel planes: x'w = (α + β)/2. Both Problems (6) and (7) are quadratic programming problems with linear constraints. They can be solved using standard mathematical programming packages [25, 22]. The choice of which problem to use in practice depends on the characteristics of the underlying problem. In Problem (6), the constraints are very simple, and the number of variables depends only on ...

71 | Concept Acquisition Through Representational Adjustment (Technical Report 87-19)
- Schlimmer
- 1987

Citation Context: ...es. The voting patterns of congressmen can be used to determine party affiliation. A specific example of this application is the 1984 United States Congressional Voting Records Database (House Votes) [30]. This is a publicly available dataset. Each instance of the dataset represents a U.S. House of Representatives congressman. Information on 435 congressmen is given. Each congressman is classified as ...

69 | Pattern recognition via linear programming: Theory and application to medical diagnosis. Large-scale numerical optimization
- Mangasarian, Setiono, et al.
- 1990
Citation Context: ...This approach, called the Multisurface Method of Pattern Recognition (MSM) [15], was used in the initial implementation of the automated breast cancer diagnosis system described in the introduction [19, 34]. The second general method is to fix α - β > 0. If we set α - β = 2 by defining α = γ + 1 and β = γ - 1, then Problem (7) becomes min_{w,γ} (1/2)||w||^2 s.t. Aw - (γ + 1)e >= 0, ...

55 | Linear and Nonlinear Separation of Patterns by Linear Programming
- Mangasarian
- 1965

Citation Context: ...also known as a linear discriminant, so any linear discriminant algorithms such as in the book [7] may be used. A single linear program can be used to construct a separating plane in polynomial time [14, 11]. Edelsbrunner proposed an algorithm with O(log m + log k) complexity [8]. Statistical methods such as Fisher's Linear Discriminant may also be applied [9]. ¹When applied to inseparable problems, the...

54 | Nonlinear perturbation of linear programs
- Mangasarian, Meyer
- 1979

Citation Context: ..., 33], we refer to it as GOP. The problem is a nonlinear perturbation of the RLP. When the perturbation parameter is close to 0, the RLP objective is emphasized. Using theorems on nonlinear perturbation of linear programming in [18], we know that there exists some positive threshold such that, for any parameter value in (0, that threshold], there exists a solution of GOP (14) that also solves RLP (13). This means that for a sufficiently small parameter, GOP will choose one of the...

52 | The perceptron: A perceiving and recognizing automaton
- Rosenblatt
- 1957

Citation Context: ...re points will be while the function is being constructed. Many types of classification functions are possible, but in this paper we will restrict ourselves to a linear function called the perceptron [27, 21]. While the perceptron model is quite simple, it works very well on many practical problems, including the Wisconsin Breast Cancer Diagnosis problem described above. We present computational results fo...

50 | International application of a new probability algorithm for the diagnosis of coronary artery disease
- Detrano, Janosi, et al.
- 1989

Citation Context: ...es with respect to the separating plane, a diagnosis is made. The Cleveland Heart Disease Database (Heart) is a publicly available dataset that contains information on 297 patients using 13 attributes [6]. A second application, as discussed previously, is the diagnosis of breast cancer. To evaluate whether a tumor is benign or malignant, a fine needle aspiration is performed, collecting a small amount ...

46 | Computing the extreme distances between two convex polygons
- Edelsbrunner
- 1985

Citation Context: ...ch as in the book [7] may be used. A single linear program can be used to construct a separating plane in polynomial time [14, 11]. Edelsbrunner proposed an algorithm with O(log m + log k) complexity [8]. Statistical methods such as Fisher's Linear Discriminant may also be applied [9]. ¹When applied to inseparable problems, the solution that misclassified the least number of points is selected. Th...

40 | Mathematical programming in neural networks
- Mangasarian
- 1993
Citation Context: ...ators. The most popular is the multilayer perceptron, or neural network. A neural network is created from an interconnecting network of threshold/perceptron-type units. We invite the reader to consult [13, 17] or one of the many books on this subject. Another approach is decision trees. In decision trees, a linear separation is constructed that divides the attribute space into two parts. If the parts conta...

39 | Decision tree construction via linear programming
- Bennett
- 1992

Citation Context: ...largely all of one class, the algorithm stops. If not, the process is repeated in each of the half spaces until the desired accuracy is achieved. Perceptrons are one such way to construct the decision [26, 4, 1, 17]. The final method is to construct nonlinear mappings of the attributes and then construct a linear discriminant in the augmented attribute space. The resulting problem is linear in its parameters but...

38 | Multi-surface method of pattern separation
- Mangasarian
- 1968

Citation Context: ...w_j = -1 (11). A solution of one of the 2n problems with the greatest value of α - β is the optimal answer.¹ This approach, called the Multisurface Method of Pattern Recognition (MSM) [15], was used in the initial implementation of the automated breast cancer diagnosis system described in the introduction [19, 34]. The second general method is to fix α - β > 0. If we set α -...

38 | Nuclear Feature Extraction for Breast Tumor Diagnosis
- Street, Wolberg, et al.
- 1993

Citation Context: ...generalize" the knowledge you learned by applying it to diagnosing new tumors. At the University of Wisconsin-Madison, a computer system has been developed that has "learned" to diagnose breast cancer [35, 32, 34]. The prepared slide of the fine needle aspirate is inserted into a computer imaging system that measures and determines low-level features of the nuclei of the cells within the tumor. The tumor is th...

37 | Optimal linear discriminants
- Gallant
- 1986

Citation Context: ...ing methods for training a perceptron. We provide a few pointers to other approaches; this is not a comprehensive list. A notable class of algorithms comprises Rosenblatt's Perceptron algorithm [27, 10, 4] and the Motzkin-Schoenberg algorithm for finding the solution of linear inequalities [23]. A perceptron is also known as a linear discriminant, so any linear discriminant algorithms such as in the boo...

35 | Bilinear separation of two sets in n-space
- Bennett, Mangasarian
- 1992
Citation Context: ...well for the separable case. As noted above, on problems with noisy data the method performs poorly, since it minimizes the maximum error [2]. RLP achieves very strong results for inseparable problems [3, 35]. For separable problems, however, any separating plane, if scaled appropriately, is optimal for RLP since e'y = e'z = 0. So RLP is not very well defined in the separable case. The next two approa...

30 | Improved linear programming models for discriminant analysis, Decision Sciences
- Glover
- 1990

Citation Context: ...also known as a linear discriminant, so any linear discriminant algorithms such as in the book [7] may be used. A single linear program can be used to construct a separating plane in polynomial time [14, 11]. Edelsbrunner proposed an algorithm with O(log m + log k) complexity [8]. Statistical methods such as Fisher's Linear Discriminant may also be applied [9]. ¹When applied to inseparable problems, the...

24 | An Algorithm to Generate Radial Basis Function (RBF)-like Nets for Classification Problems
- Roy, Govil, et al.
- 1995

Citation Context: ...three more attributes: x1^2, x2^2, and x1x2. The separating surface would now be x1w1 + x2w2 + x1^2 w3 + x2^2 w4 + x1x2 w5 = γ. Support Vector Networks [33] and Polynomial Networks [29, 28] are examples of this type of approach. [Section 9: Conclusions] We have studied the problem of training a perceptron to classify points from two sets. We showed that a perceptron can only correctly classify ...
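The attribute-augmentation idea in this excerpt, adding x1^2, x2^2, and x1x2 so that a linear discriminant in the enlarged space is quadratic in the original one, is easy to make concrete. A small sketch; the function name is ours.

```python
import numpy as np

def quadratic_features(X):
    """Map each point (x1, x2) to (x1, x2, x1^2, x2^2, x1*x2).
    A separating plane in the 5-dimensional feature space is a
    quadratic separating surface in the original 2-dimensional space."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1 ** 2, x2 ** 2, x1 * x2])

X = np.array([[1.0, 2.0], [3.0, -1.0]])
print(quadratic_features(X))
```

Any linear training method (perceptron, RLP, the optimal plane) can then be run unchanged on the augmented attributes; the cost is that the number of parameters grows with the number of added monomials.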

23 | A polynomial time algorithm for the construction and training of a class of multilayer perceptrons (Neural Networks 6)
- Roy, Kim, et al.
- 1993

Citation Context: ...three more attributes: x1^2, x2^2, and x1x2. The separating surface would now be x1w1 + x2w2 + x1^2 w3 + x2^2 w4 + x1x2 w5 = γ. Support Vector Networks [33] and Polynomial Networks [29, 28] are examples of this type of approach. [Section 9: Conclusions] We have studied the problem of training a perceptron to classify points from two sets. We showed that a perceptron can only correctly classify ...

9 | Image analysis and machine learning applied to Breast cancer diagnosis and prognosis
- Wolberg, Street, et al.
- 1995
Citation Context: ...generalize" the knowledge you learned by applying it to diagnosing new tumors. At the University of Wisconsin-Madison, a computer system has been developed that has "learned" to diagnose breast cancer [35, 32, 34]. The prepared slide of the fine needle aspirate is inserted into a computer imaging system that measures and determines low-level features of the nuclei of the cells within the tumor. The tumor is th...

4 | The relaxation method for linear inequalities
- Motzkin, Schoenberg
- 1954

Citation Context: ...t a comprehensive list. A notable class of algorithms comprises Rosenblatt's Perceptron algorithm [27, 10, 4] and the Motzkin-Schoenberg algorithm for finding the solution of linear inequalities [23]. A perceptron is also known as a linear discriminant, so any linear discriminant algorithms such as in the book [7] may be used. A single linear program can be used to construct a separating plane in...