Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path (2006)
by András Antos, Csaba Szepesvári, Rémi Munos
Venue: In COLT-19
Citations: 52 (15 self-citations)