The goal of the CartPole task is to move the cart so that the pole stays balanced and does not fall over. It is one of the most widely used environments in OpenAI Gym. My implementation of Q-learning solved CartPole in 1598 training steps. I am happy that it worked even though I haven't tuned the hyperparameters much :)
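For readers new to Q-learning, here is a minimal sketch of the core update rule in tabular form. It is not the implementation from this post: the function name `q_update`, the discretized tuple states, and the values of `alpha` and `gamma` are all assumptions made for illustration.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, done,
             alpha=0.1, gamma=0.99, n_actions=2):
    """Apply one tabular Q-learning step and return the new Q(s, a).

    alpha (learning rate) and gamma (discount) are illustrative values,
    not the hyperparameters used in the post.
    """
    # Terminal transitions have no future value to bootstrap from.
    best_next = 0.0 if done else max(q[(next_state, a)] for a in range(n_actions))
    target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical discretized states; CartPole has two actions (push left, push right).
q = defaultdict(float)
value = q_update(q, state=(0, 1), action=1, reward=1.0, next_state=(0, 2), done=False)
```

CartPole observations are continuous, so a table like this only works after binning them into discrete states. A neural network approximating Q is the other common route.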
The plot below shows the 100-episode average reward obtained during on-policy training. The x-axis shows the training episode and the y-axis shows the reward.
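A 100-episode average like the one plotted can be computed with a sliding window over the per-episode rewards. The sketch below is one way to do it, not necessarily how the plot was produced. The name `running_average` and the sample rewards are made up, and it assumes that early episodes are averaged over however many episodes are available so far.

```python
from collections import deque


def running_average(rewards, window=100):
    """Return the mean of the last `window` rewards after each episode.

    Assumption: before `window` episodes have been seen, the average is
    taken over the episodes seen so far.
    """
    recent = deque(maxlen=window)  # drops the oldest reward automatically
    averages = []
    for r in rewards:
        recent.append(r)
        averages.append(sum(recent) / len(recent))
    return averages


# Made-up rewards and a small window, just to show the behavior.
episode_rewards = [10.0, 20.0, 30.0, 40.0]
averages = running_average(episode_rewards, window=2)
```

With `window=100`, each point on the curve summarizes the most recent 100 episodes, which smooths out the episode-to-episode noise in the raw rewards.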
See my post, Learning RL by Coding.