## The Learning Behavior of Single Neuron Classifiers on Linearly Separable or Nonseparable Input

Venue: Proceedings of the 1999 International Joint Conference on Neural Networks

Citations: 10 (6 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Basu_thelearning,
  author    = {Mitra Basu and Tin Kam Ho},
  title     = {The Learning Behavior of Single Neuron Classifiers on Linearly Separable or Nonseparable Input},
  booktitle = {Proceedings of the 1999 International Joint Conference on Neural Networks},
  year      = {1999}
}
```

### Abstract

Determining linear separability is an important way of understanding structures present in data. We explore the behavior of several classical descent procedures for determining linear separability and training linear classifiers in the presence of linearly nonseparable input. We compare the adaptive procedures to linear programming methods using many pairwise discrimination problems from a public database. We found that the adaptive procedures have serious implementation problems that make them less preferable than linear programming.

### Citations

2963 citations
UCI Repository of Machine Learning Databases, University of California, Irvine, CA. www.ics.uci.edu/mlearn/MLRepository.html
- Blake, Merz
- 1998

Citation Context: ...although it is understood that it does not converge for nonseparable input. The problems were discrimination between all pairs of classes in 14 data sets from the UC-Irvine Machine Learning Repository [2]. The data sets were chosen so that each set has at least 500 input vectors and no missing values in the features. The names of the data sets are as follows: abalone, car, german, kr-vs-kp, letter, lrs, ...

579 citations
Adaptive Switching Circuits
- Widrow, Hoff
- 1960

Citation Context: ...minimize J. However, differences in the details of the algorithms lead to drastically different results. The α-LMS algorithm, or Widrow-Hoff delta rule, embodies the minimal disturbance principle¹ [19]. It is designed to handle both linear and nonlinear input. The criterion function is minimized with respect to the weight vector w using the gradient descent method. The unknown vector b is chosen arbitr...
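As a concrete illustration, the sample-by-sample descent described in this excerpt can be sketched in a few lines. The data layout (augmented input vectors as rows of Z, with class-2 vectors negated, and an arbitrarily chosen target margin vector b) follows the conventions of the excerpts; the step size and epoch count are our illustrative assumptions, not values from the paper.

```python
import numpy as np

def alpha_lms(Z, b, alpha=0.05, epochs=200):
    """Sketch of the alpha-LMS (Widrow-Hoff delta) rule.

    Z : (m, d) matrix of augmented input vectors, class-2 rows negated,
        so a separating w satisfies Z @ w > 0.
    b : (m,) arbitrarily chosen target margin vector.
    Performs per-sample gradient descent on J(w) = sum_k (b_k - z_k.w)^2,
    with the step normalized by ||z_k||^2 (the usual alpha-LMS form).
    """
    w = np.zeros(Z.shape[1])
    for _ in range(epochs):
        for k in range(len(Z)):
            err = b[k] - Z[k] @ w              # residual against target margin
            w += alpha * err * Z[k] / (Z[k] @ Z[k])
    return w
```

Note that this minimizes squared error against b rather than enforcing Z @ w > 0 directly, which is exactly why, as noted elsewhere in these excerpts, the MSE solution can fail to separate even linearly separable data.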

379 citations
AMPL: A Modeling Language for
- Fourer, Gay, et al.
- 1993

Citation Context: ...ing doubts on the practicality of these adaptive algorithms. The only reservation we have about this conclusion is that for linear programming we used the MINOS solver [13] through the AMPL interface [4], which was highly optimized commercial code, whereas the adaptive procedures were run under the simplest implementation by ourselves in C, so affordable (elapsed) time may not mean the same thing for...

252 citations
Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms
- Rosenblatt
- 1962

Citation Context: ...up D will be covered in the next section. Group A: Error correction procedures. Among the adaptive procedures, the fixed-increment perceptron training rule is the most well known. It can be shown [14] that if the input vectors are linearly separable this rule will produce a solution vector w in a finite number of steps. However, for input vectors that are not linearly separable, the perceptron algorith...
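The fixed-increment rule itself is short enough to state as code. Below is a minimal sketch under the same data convention (augmented vectors, class-2 rows negated); the epoch cap is an assumption we add so the loop terminates on nonseparable input.

```python
import numpy as np

def fixed_increment_perceptron(Z, max_epochs=1000):
    """Fixed-increment perceptron rule sketch.

    Z : (m, d) augmented input vectors with class-2 rows negated; a solution
    w satisfies Z @ w > 0 for every row. For separable input the rule finds
    such a w in a finite number of corrections; otherwise it never settles,
    so we stop after max_epochs sweeps.
    """
    w = np.zeros(Z.shape[1])
    for _ in range(max_epochs):
        corrections = 0
        for z in Z:
            if z @ w <= 0:        # misclassified: add the vector itself
                w = w + z
                corrections += 1
        if corrections == 0:      # a full error-free sweep: done
            return w
    return None                   # gave up: possibly nonseparable
```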

217 citations
Robust Linear Programming Discrimination of Two Linearly Inseparable Sets
- Bennett, Mangasarian
- 1992

Citation Context: ...perplane. With a properly defined objective function, a separating hyperplane can be obtained by solving a linear programming problem. Several alternative formulations have been proposed in the past ([3], [5], [10], [15], [17]) employing different objective functions. An early survey of these methods is given in [7]. Here we mention a few representative formulations. In a very simple formulation desc...
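To make the linear-programming route concrete, here is one minimal slack-based formulation (minimize the total slack t subject to Z w + t ≥ 1, t ≥ 0), sketched with SciPy's `linprog`. This is an illustrative formulation in the spirit of the cited ones, not a reproduction of any particular reference, and the use of `scipy.optimize.linprog` is our assumption.

```python
import numpy as np
from scipy.optimize import linprog

def lp_separability(Z):
    """LP sketch: minimize sum(t) subject to Z @ w + t >= 1, t >= 0.

    Z : (m, d) augmented data with class-2 rows negated. The data are
    linearly separable iff the optimal total slack is zero, in which case
    the recovered w satisfies Z @ w >= 1.
    """
    m, d = Z.shape
    c = np.concatenate([np.zeros(d), np.ones(m)])   # objective: sum of slacks
    A_ub = np.hstack([-Z, -np.eye(m)])              # -(Z w) - t <= -1
    b_ub = -np.ones(m)
    bounds = [(None, None)] * d + [(0, None)] * m   # w free, t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:d], res.fun                       # (w, total slack)
```

Unlike the descent rules, this either certifies separability (zero slack) or returns a hyperplane minimizing total constraint violation in a single solve.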

84 citations
Interior Methods for Constrained Optimization
- Wright
- 1992

Citation Context: ...ery vertex formed by the intersections of the constraining hyperplanes before reaching an optimum. Empirical evidence shows that in practice this rarely happens. More recently, interior-point methods [22], such as Karmarkar's, have been shown to have better worst-case time complexity. Still, the comparative advantages of such methods for an arbitrary problem remain unclear, partly because there has not been...

69 citations
Perceptrons (expanded edition)
- Minsky, Papert
- 1988

Citation Context: ...es z. In other words, the algorithm will continue to make weight changes indefinitely. In the cases where the input vectors are integer-valued, the weight vectors cycle through a finite set of values [12]. An observation of such cycling is an indication that the input is linearly nonseparable. However, there are no known time bounds for this to become observable. The projection learning rule [1] (mo...
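The cycling observation suggests a simple, if bound-free, nonseparability test: run the perceptron on integer-valued input and stop when a (weight vector, sweep position) state repeats. A sketch, with the update cap as our own safeguard:

```python
import numpy as np

def perceptron_with_cycle_check(Z, max_updates=10000):
    """Perceptron rule with cycle detection for integer-valued input.

    Presentation order is deterministic, so a repeated (weights, position)
    state implies the updates will cycle forever, i.e. the input is
    linearly nonseparable. No time bound for this to occur is known.
    """
    w = np.zeros(Z.shape[1], dtype=int)
    seen = set()
    updates = 0
    while updates < max_updates:
        changed = False
        for k, z in enumerate(Z):
            if z @ w <= 0:
                w = w + z
                updates += 1
                changed = True
                state = (tuple(w), k)
                if state in seen:
                    return None, "cycle"       # provably nonseparable
                seen.add(state)
        if not changed:
            return w, "separable"              # error-free sweep
    return None, "undecided"                   # cap reached first
```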

34 citations
Improved Linear Programming Models for Discriminant Analysis
- Glover
- 1990

Citation Context: ...ane. With a properly defined objective function, a separating hyperplane can be obtained by solving a linear programming problem. Several alternative formulations have been proposed in the past ([3], [5], [10], [15], [17]) employing different objective functions. An early survey of these methods is given in [7]. Here we mention a few representative formulations. In a very simple formulation described...

25 citations
An Algorithm for Linear Inequalities and Its Applications
- Ho, Kashyap
- 1965

Citation Context: ...vector it is possible that the resulting weight vector may not classify all vectors correctly even for a linearly separable problem. Group C: Constrained error minimization procedures. Ho and Kashyap [9] modified the Widrow-Hoff procedure to obtain a weight vector w as well as a margin vector b. They imposed the restriction that the m-dimensional margin vector must be positive-valued, i.e., b > 0 (b ...
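The Ho-Kashyap idea can be sketched directly from this description: alternate a least-squares solve for w with a margin update that only ever increases b, so b stays positive. The step size and iteration count below are illustrative assumptions.

```python
import numpy as np

def ho_kashyap(Z, rho=0.9, iters=500, tol=1e-8):
    """Ho-Kashyap sketch: find w and a positive margin vector b with Z w ~ b.

    b starts positive and is only ever increased (by e + |e|, which zeroes
    the negative error components), so the restriction b > 0 is maintained.
    For separable input the error e = Z w - b tends to zero and the final
    w satisfies Z @ w = b > 0.
    """
    m, d = Z.shape
    Zp = np.linalg.pinv(Z)              # fixed pseudoinverse of Z
    b = np.ones(m)
    w = Zp @ b
    for _ in range(iters):
        e = Z @ w - b
        if np.max(np.abs(e)) < tol:
            break
        b = b + rho * (e + np.abs(e))   # only positive errors enlarge b
        w = Zp @ b                      # least-squares w for the new b
    return w, b
```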

22 citations
Pattern Classifier Design by Linear Programming
- Smith
- 1968

Citation Context: ...rly defined objective function, a separating hyperplane can be obtained by solving a linear programming problem. Several alternative formulations have been proposed in the past ([3], [5], [10], [15], [17]) employing different objective functions. An early survey of these methods is given in [7]. Here we mention a few representative formulations. In a very simple formulation described in [15], the obje...

11 citations
Classification of Linearly Non-Separable Patterns by Linear Threshold Elements
- Roychowdhury, Siu, et al.
- 1995

Citation Context: ...properly defined objective function, a separating hyperplane can be obtained by solving a linear programming problem. Several alternative formulations have been proposed in the past ([3], [5], [10], [15], [17]) employing different objective functions. An early survey of these methods is given in [7]. Here we mention a few representative formulations. In a very simple formulation described in [15], th...

8 citations
Discrete Neural Computation
- Siu, Roychowdhury, et al.
- 1995

Citation Context: ...rs claim that it can identify and discard nonseparable samples to make the data linearly separable with increased convergence speed. Again we notice similarity with the heuristic approach proposed in [16]. However, the authors do not provide any theoretical basis for their claim. The algorithm is only tested on simple toy problems. We have no knowledge of any extensive testing on real-world problems. ...

6 citations
Adaptive Threshold Logic
- Mays
- 1963

Citation Context: ...are of the same length in both linearly separable as well as linearly nonseparable cases [20]. It is known that in some cases this rule may fail to separate training vectors that are linearly separable [11]. This is not surprising, since the mean square error (MSE) solution ... [Footnote 1: The rule aims at making the minimum possible change in the weight vector during the update process such that the output for as many o...]

4 citations
Large-scale Linearly Constrained
- Murtagh, Saunders
- 1978

Citation Context: ...implementational difficulties cast doubt on the practicality of these adaptive algorithms. The only reservation we have about this conclusion is that for linear programming we used the MINOS solver [13] through the AMPL interface [4], which was highly optimized commercial code, whereas the adaptive procedures were run under the simplest implementation by ourselves in C, so affordable (elapsed) time ...

3 citations
Linear and Nonlinear Separation of Patterns by Linear Programming
- Mangasarian
- 1965

Citation Context: ...With a properly defined objective function, a separating hyperplane can be obtained by solving a linear programming problem. Several alternative formulations have been proposed in the past ([3], [5], [10], [15], [17]) employing different objective functions. An early survey of these methods is given in [7]. Here we mention a few representative formulations. In a very simple formulation described in [1...

2 citations
Adaptive Ho-Kashyap Rules for Perceptron Training
- Hassoun, Song
- 1992

Citation Context: ...$= w_j + \alpha (Z^t)^\dagger (|\epsilon_j| + \epsilon_j)$ (6), where $(Z^t)^\dagger = (Z Z^t)^{-1} Z$ is the pseudoinverse of $Z^t$. Computation of the pseudoinverse may be avoided by using the following alternate procedure [8]:

$\epsilon_j^k = z_k^t w_j - b_j^k$
$b_{j+1}^k = b_j^k + (\rho_1 / 2)(|\epsilon_j^k| + \epsilon_j^k)$
$w_{j+1} = w_j - \rho_2 z_k (z_k^t w_j - b_{j+1}^k)$ (7)

This algorithm yields a solution vector in case of linearl...
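Read this way, the pseudoinverse-free procedure of (7) is a short sample-wise loop. A sketch, with the two step sizes and sweep count chosen by us for illustration ($\rho_2$ must be small relative to $2/\|z_k\|^2$ for the relaxation step to be stable):

```python
import numpy as np

def adaptive_ho_kashyap(Z, rho1=0.5, rho2=0.05, sweeps=500):
    """Sample-wise adaptive Ho-Kashyap sketch following (7):
    eps = z_k.w - b_k; raise b_k by (rho1/2)(|eps| + eps) when eps > 0;
    then relax w toward z_k.w = b_k. Margins b start at 1 and never
    decrease, so a converged w has Z @ w close to b > 0.
    """
    m, d = Z.shape
    w = np.zeros(d)
    b = np.ones(m)
    for _ in range(sweeps):
        for k in range(m):
            eps = Z[k] @ w - b[k]
            b[k] += 0.5 * rho1 * (abs(eps) + eps)     # b_k only grows
            w -= rho2 * Z[k] * (Z[k] @ w - b[k])      # relax toward new b_k
    return w, b
```

Compared with the batch rule (6), no pseudoinverse is formed; each update touches only one sample, at the cost of slower, step-size-dependent convergence.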

1 citation
The Fractional Correction Rule: A New Perspective
- Basu, Liang
- 1998

Citation Context: ...es [12]. An observation of such cycling is an indication that the input is linearly nonseparable. However, there are no known time bounds for this to become observable. The projection learning rule [1] (more commonly known as the fractional correction rule [18]), a variation of the perceptron rule, is also based on the error-correcting principle. However, its behavior with nonlinear input is quite diff...

1 citation
Comment on "Pattern Classification Design by Linear Programming"
- Grinold
- 1969

Citation Context: ...ny given iteration. This formulation gives only a test for linear separability but does not lead to any useful solution if the data are not linearly separable. Another formulation suggested by Smith ([6], [17]) minimizes an error function: minimize $a^t t$ subject to $Z^t w + t \geq b$, $t \geq 0$ (9), where Z is the augmented data matrix as before, a is a positive vector of weights, b is a positive margin vector chos...