Comparing n-step TD methods for policy evaluation.
This simulation evaluates a fixed Blackjack strategy ("stick on 20 or 21, hit otherwise") using n-step Temporal-Difference (TD) methods. The goal is to estimate the **value function**, which gives the probability of winning from any given game state.
Controls n, the number of future steps of actual experience used to compute the return before bootstrapping from the current value estimate. n = 1 is one-step TD(0); a larger n behaves more like a Monte Carlo update, which waits for the full episode return.
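The update above can be sketched in code. This is a minimal, self-contained illustration, not the simulation's actual implementation: it assumes an infinite deck, a dealer who hits below 17, a reward of +1/0/-1 only at the end of the hand, and gamma = 1, and it applies the n-step updates offline after each episode. All function names are hypothetical.

```python
import random
from collections import defaultdict

def draw_card(rng):
    # Infinite deck: ranks 1-13, face cards count as 10, ace as 1.
    return min(rng.randint(1, 13), 10)

def hand_value(cards):
    # Best total, plus whether an ace is currently usable as 11.
    total = sum(cards)
    if 1 in cards and total + 10 <= 21:
        return total + 10, True
    return total, False

def play_episode(rng):
    # One hand under the fixed policy "stick on 20 or 21, hit otherwise".
    # Returns the visited states and the terminal reward (+1 win, 0 draw, -1 loss).
    player = [draw_card(rng), draw_card(rng)]
    dealer = [draw_card(rng), draw_card(rng)]
    states = []
    while True:
        total, usable = hand_value(player)
        if total > 21:
            return states, -1.0           # player busts
        states.append((total, dealer[0], usable))
        if total >= 20:
            break                         # policy: stick on 20 or 21
        player.append(draw_card(rng))     # policy: hit otherwise
    while hand_value(dealer)[0] < 17:     # assumed rule: dealer hits below 17
        dealer.append(draw_card(rng))
    d_total = hand_value(dealer)[0]
    p_total = hand_value(player)[0]
    if d_total > 21 or p_total > d_total:
        return states, 1.0
    return states, (0.0 if p_total == d_total else -1.0)

def nstep_td(num_episodes, n, alpha=0.05, seed=0):
    # n-step TD prediction with gamma = 1. Because the only reward
    # arrives at the terminal step, the n-step return is the final
    # reward when the lookahead reaches the end of the episode, and
    # the bootstrapped estimate V(S_{t+n}) otherwise.
    rng = random.Random(seed)
    V = defaultdict(float)
    for _ in range(num_episodes):
        states, reward = play_episode(rng)
        T = len(states)
        for t in range(T):
            G = reward if t + n >= T else V[states[t + n]]
            V[states[t]] += alpha * (G - V[states[t]])
    return V
```

With n large enough that `t + n >= T` always holds, every update target is the episode's final reward, which is exactly the Monte Carlo case the slider approaches.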
Shows the probability of winning when the player holds a "usable" Ace, i.e. one that can count as 11 without busting. Green is high probability, red is low.
Shows the probability of winning when the player has a "hard" hand (no Ace, or Ace must count as 1).
Compares the learning speed of the two algorithms: a faster-rising curve indicates a more sample-efficient algorithm.