### Table 1: Several discrete-time gradient-based algorithms: EGU (Unnormalized Exponentiated Gradient) [KW97b], EG (Exponentiated Gradient) [KW97b], and BEG (Bounded Exponentiated Gradient) [Byl97]. Here $\nabla_{t,i}$ is shorthand for $\partial L_t(\omega_t) / \partial \omega_t[i]$.

1997

"... In PAGE 3: ... The Euler discretization of the dual update (2) gives $\omega_{t+h} := \omega_t - h \nabla L_t(\omega_t)$, or $\theta_{t+h} := f(f^{-1}(\theta_t) - h \nabla L_t(\omega_t))$. (4) For example, if $f$ is the identity function then both the main update and its dual collapse to the conventional gradient descent update. However, if $f(x) = \ln(x)$ then the discretized version (3) of the main update with $h = 1$ gives the Unnormalized Exponentiated Gradient Update (EGU) of [KW97b] (see Table 1 for more examples). In the next section we discuss the purpose and desired properties of link functions.... ..."

Cited by 5
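The multiplicative update described in the excerpt above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the quadratic loss, step size, and variable names are all assumptions made for the example.

```python
import math

def egu_step(w, grad, h):
    # Unnormalized Exponentiated Gradient: with link f(x) = ln(x),
    # the discretized main update becomes a multiplicative step.
    return [wi * math.exp(-h * gi) for wi, gi in zip(w, grad)]

def gd_step(w, grad, h):
    # With f = identity, the same update collapses to plain gradient descent.
    return [wi - h * gi for wi, gi in zip(w, grad)]

# Toy example (assumed, not from the paper): minimize
# L(w) = 0.5 * sum((w - target)^2), whose gradient is w - target.
target = [1.0, 2.0]
w = [0.5, 0.5]
for _ in range(200):
    grad = [wi - ti for wi, ti in zip(w, target)]
    w = egu_step(w, grad, h=0.1)
print([round(x, 3) for x in w])  # → [1.0, 2.0]
```

Because EGU is multiplicative, the iterates stay positive as long as they start positive, which is the usual motivation for the logarithmic link.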

### Table 1 A summary of the gradient-based methods and their time complexities. The variable n denotes the number of processing nodes.

1999

Cited by 3

### Table 1. Complexity of gradient-based direct and indirect methods. Columns: gradient-based estimators, 2-D, 1-D.

2003

Cited by 1

### Table 1 shows example parallel speedup and efficiency results for DAKOTA optimizations conducted on a network-connected workstation cluster. The OPT++ quasi-Newton vendor optimization algorithm [7] was used for this analysis to explore value-based load-imbalanced, value-based speculative, and gradient-based line searches. The results were generated using an algebraic nonlinear unconstrained Rosenbrock function [8], which was combined with a time delay of five seconds to simulate expensive function evaluations. The first two rows show results for the value-based line search in load-imbalanced and speculative parallel modes, respectively. The value-based load-imbalanced approach requires one fewer processor than do the value-based speculative and gradient-based approaches because only a gradient computation is performed following the line search (the function value is already available). However, the performance of the value-based load-imbalanced line search is much lower, as measured by speedup.

"... In PAGE 3: ... A communicator defines the context of processors over which a message-passing communication occurs. By providing mechanisms for subdividing existing communicators into new partitions and for sending messages between the new partitions, each level of parallelism can be Table 1: DAKOTA/OPT++ speedup and efficiency results. Line Search Type p il Ts (sec.) Tp (sec.... ..."
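The speedup and efficiency figures discussed in this entry follow the standard definitions $S = T_s/T_p$ and $E = S/p$. A quick sketch with illustrative numbers (not taken from the cited Table 1); the five-second evaluation cost mirrors the delay described in the caption:

```python
def speedup(t_serial, t_parallel):
    # Parallel speedup: ratio of serial to parallel wall-clock time.
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, p):
    # Parallel efficiency: speedup normalized by processor count.
    return speedup(t_serial, t_parallel) / p

# Illustrative numbers (assumed, not from the paper): 100 evaluations
# at five seconds each, run serially vs. spread over 5 processors
# with some coordination overhead.
t_s = 100 * 5.0
t_p = 110.0
print(round(speedup(t_s, t_p), 3), round(efficiency(t_s, t_p, 5), 3))
# → 4.545 0.909
```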

### Table 1: Results for the $a^n b^n c^n$ language. The table compares Pseudoinverse-based Evolino (PI-Evolino) with Gradient-based LSTM (G-LSTM) on the $a^n b^n c^n$ language task. Standard refers to Evolino with the parameter settings used for both discrete and continuous domains ($a^n b^n c^n$ and superimposed sine waves). The Tuned version is biased to the language task: we additionally squash the cell input with the tanh function. The leftmost column shows the set of strings used for training in each of the experiments. The other three columns show the set of legal strings to which each method could generalize after 50 generations (3000 evaluations), averaged over 20 runs. The upper training sets contain all strings up to the indicated length. The lower training sets contain only a single pair. PI-Evolino generalizes better than G-LSTM, most notably when trained on only two examples of correct behavior. The G-LSTM results are taken from [1].

2007

"... In PAGE 14: ...n $a^n b^n c^n$ are those in which the number of a's, b's, and c's is equal, e.g. ST, SabcT, SaabbccT, SaaabbbcccT, and so forth. So, for n = 3, the set of input and target values would be: Input: S a a a b b b c c c; Target: a/T a/b a/b a/b b b c c c T. Evolino-based LSTM networks were evolved using 8 different training sets, each containing legal strings with values for n as shown in the first column of Table 1. In the first four sets, n ranges from 1 to k, where k = 10, 20, 30, 40.... In PAGE 14: ... Evolution was terminated after 50 generations, after which the best network in each simulation was tested. Table 1 compares the results of Evolino-based LSTM, using the pseudoinverse as the supervised learning module (PI-Evolino), with those of G-LSTM from [1]; Standard PI-Evolino uses parameter settings that are a compromise between discrete and continuous domains. If we set h to the tanh function, we obtain Tuned PI-Evolino.... ..."

Cited by 4
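The $a^n b^n c^n$ strings used in this experiment are easy to generate and check mechanically. A small sketch using the S/T delimiters shown in the excerpt; the function names are illustrative, not from the paper:

```python
def anbncn(n):
    # Legal string of the a^n b^n c^n language, delimited by S ... T,
    # as in the excerpt (ST, SabcT, SaabbccT, ...).
    return "S" + "a" * n + "b" * n + "c" * n + "T"

def is_legal(s):
    # Membership test: equal-length runs of a, b, c in that order,
    # between the S and T delimiters.
    if not (s.startswith("S") and s.endswith("T")):
        return False
    body = s[1:-1]
    n = body.count("a")
    return body == "a" * n + "b" * n + "c" * n

print(anbncn(3))             # → SaaabbbcccT
print(is_legal("SaabbccT"))  # → True
print(is_legal("SaabbcT"))   # → False
```

Note that `is_legal("ST")` is `True` for n = 0, matching the excerpt's listing of ST as a legal string.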

### Table 5. Performance of neural networks

2005

"... In PAGE 4: ... The training set was split into 80% training and 20% cross-validation. Table 5 reveals the performance of the backpropagation and conjugate gradient algorithms for the directional prediction of Microsoft stock for different numbers of hidden neurons. Performance of the Mamdani Fuzzy Inference System (FIS) is illustrated in Table 6.... ..."

Cited by 1

### Table 6: Optimization Results, Weight Merit Function, 6-40-3(80) Network. This was not the same design as indicated by the non-parametric representation of the design space which occurred for material combination 11211. A comparison of the non-parametric representations of the two design spaces and their associated neural network approximations is provided in Figure 14.

1994

"... In PAGE 10: ... The results in this section are presented in two forms. Table 6 shows the optimum weight design as determined by each of the optimization strategies: 1) exhaustive material space/gradient-based continuous-variable search [EX], 2) simulated annealing [SA], or 3) successive simulated annealing [SSA]. The EX solution also indicates which of the three initial conditions for the gradient search on the x2 parameter resulted in the least-weight design.... In PAGE 10: ... The Design Space Optimum also indicates the constraint which was active for each of the structural elements: n-no constraint, g-minimum gage, b-local buckling, or y-yield stress. Table 6 presents the results for the 6-40-3(80) network. All three optimization methods converged to the same material combination (11111) and to the same Node 2 x-location (consistent with the convergence criteria).... ..."

Cited by 7

### Table 8 Translation from neural network into system identification.

in IN

"... In PAGE 9: ...able 7 Parameter estimation results after pruning ... Table 8... ..."

### Table 1: Fitted models for the translation example of Berger et al. [4].

"... In PAGE 3: ... This is slightly silly, but the example is easy to understand and verify, and its small sample space allows a comparison of the routines using estimators with those using exact expressions. Table 1 shows trials of how well the constraints are satisfied and how many iterations are required to reach convergence, as a function of sample size, for the naive and gradient-based methods and their exact versions. All trials in Table 1 are based on a uniform instrumental sampling distribu-... ..."
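The kind of gradient-based fit this excerpt compares can be illustrated on a toy problem: adjust a single-parameter exponential-family model on a small sample space until a feature expectation matches its constraint, using the full sample space (the "exact" setting) rather than sampled estimates. Everything here — the sample space, feature, target, and step size — is an invented example, not data from the paper:

```python
import math

# Tiny discrete sample space and a single feature f(x) = x.
space = [0, 1, 2]
f = lambda x: float(x)
target = 1.5  # desired model expectation E_p[f]

# Fit p(x) ∝ exp(lam * f(x)) by gradient ascent on the dual:
# the gradient with respect to lam is target - E_p[f].
lam = 0.0
model_exp = 0.0
for _ in range(500):
    weights = [math.exp(lam * f(x)) for x in space]
    z = sum(weights)
    model_exp = sum(w * f(x) for w, x in zip(weights, space)) / z
    lam += 0.5 * (target - model_exp)

print(round(model_exp, 3))  # → 1.5
```

With the exact expectation over the whole sample space, the constraint is met to numerical precision; the sampled estimators compared in the excerpt would instead satisfy it only approximately, with error shrinking as the sample size grows.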