H-learning: A reinforcement learning method to optimize undiscounted average reward (1994)

by P Tadepalli, D Ok