Comparing n-step TD methods for policy evaluation.
This simulation evaluates a fixed Blackjack strategy ("stick on 20 or 21, hit otherwise") using n-step Temporal-Difference (TD) methods. The goal is to estimate the **value function**, which gives the probability of winning from any given game state.
Controls n, the number of future steps of actual experience used to compute the return before bootstrapping from the current value estimate. n = 1 is one-step TD(0); a larger n behaves more like a Monte Carlo update, which waits for the full episode return.
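The update above can be sketched in code. This is a minimal, self-contained illustration, not the simulation's actual implementation: it assumes an infinite deck, a dealer who hits below 17, a reward of +1/0/-1 only at the end of the hand, and gamma = 1, and it applies the n-step updates offline after each episode. All function names are hypothetical.

```python
import random
from collections import defaultdict

def draw_card(rng):
    # Infinite deck: ranks 1-13, face cards count as 10, ace as 1.
    return min(rng.randint(1, 13), 10)

def hand_value(cards):
    # Best total, plus whether an ace is currently usable as 11.
    total = sum(cards)
    if 1 in cards and total + 10 <= 21:
        return total + 10, True
    return total, False

def play_episode(rng):
    # One hand under the fixed policy "stick on 20 or 21, hit otherwise".
    # Returns the visited states and the terminal reward (+1 win, 0 draw, -1 loss).
    player = [draw_card(rng), draw_card(rng)]
    dealer = [draw_card(rng), draw_card(rng)]
    states = []
    while True:
        total, usable = hand_value(player)
        if total > 21:
            return states, -1.0           # player busts
        states.append((total, dealer[0], usable))
        if total >= 20:
            break                         # policy: stick on 20 or 21
        player.append(draw_card(rng))     # policy: hit otherwise
    while hand_value(dealer)[0] < 17:     # assumed rule: dealer hits below 17
        dealer.append(draw_card(rng))
    d_total = hand_value(dealer)[0]
    p_total = hand_value(player)[0]
    if d_total > 21 or p_total > d_total:
        return states, 1.0
    return states, (0.0 if p_total == d_total else -1.0)

def nstep_td(num_episodes, n, alpha=0.05, seed=0):
    # n-step TD prediction with gamma = 1. Because the only reward
    # arrives at the terminal step, the n-step return is the final
    # reward when the lookahead reaches the end of the episode, and
    # the bootstrapped estimate V(S_{t+n}) otherwise.
    rng = random.Random(seed)
    V = defaultdict(float)
    for _ in range(num_episodes):
        states, reward = play_episode(rng)
        T = len(states)
        for t in range(T):
            G = reward if t + n >= T else V[states[t + n]]
            V[states[t]] += alpha * (G - V[states[t]])
    return V
```

With n large enough that `t + n >= T` always holds, every update target is the episode's final reward, which is exactly the Monte Carlo case the slider approaches.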
Shows the probability of winning when the player holds a "usable" Ace, i.e. one that can count as 11 without busting. Green is high probability, red is low.
Shows the probability of winning when the player has a "hard" hand (no Ace, or Ace must count as 1).
Compares the learning speed of the two algorithms: a faster-rising curve indicates a more sample-efficient algorithm.