$\epsilon$-soft Random Action

  • A policy is $\epsilon$-soft if every action $a$ is chosen with probability
$$P(a) \geq \frac{\epsilon}{\vert A \vert}$$

where $\vert A \vert$ is the number of possible actions.

  • Example ($\epsilon$-greedy policy)

    • Let $a$ be the action chosen by policy $\pi$
    • Select a random number $p$ between 0 and 1
      • if $p < 1 - \epsilon$: (condition 1)
        choose action $a$
      • else: (condition 2)
        choose an action uniformly at random among all possible actions
  • Proof:

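    • Let $a_k$ be the action chosen by $\pi$. Probability of selecting $a_k$ through condition 1 : $1 - \epsilon$
    • Probability of selecting any given action (including $a_k$) through condition 2 : $\epsilon \cdot \frac{1}{\vert A \vert} = \frac{\epsilon}{\vert A \vert}$
    • Probability of selecting action $a_k$ :
      $P(a_k) = (1 - \epsilon) + \frac{\epsilon}{\vert A \vert}$
    • Probability of selecting any other action $a_i$, $i \neq k$ :
      $P(a_i) = \frac{\epsilon}{\vert A \vert}$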
    • Sum of the probabilities of all actions :
      $$\begin{aligned} P(a_k) + \sum_{i \neq k} P(a_i) &= (1-\epsilon) + \frac{\epsilon}{\vert A \vert} + \sum_{i \neq k} \frac{\epsilon}{\vert A \vert} \\ &= (1 - \epsilon) + \frac{\epsilon}{\vert A \vert} + \frac{\epsilon (\vert A \vert - 1)}{\vert A \vert} \\ &= (1 - \epsilon) + \epsilon \\ &= 1 \end{aligned}$$
    • Thus, the minimum probability of selecting any action $a$ is $\frac{\epsilon}{\vert A \vert}$, so the $\epsilon$-greedy policy is $\epsilon$-soft
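
The selection rule and the probabilities derived in the proof can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `epsilon_greedy` and `action_probabilities`, the action labels, and the value $\epsilon = 0.2$ are assumptions made for this example and do not come from the notes above.

```python
import random
from collections import Counter


def epsilon_greedy(policy_action, actions, epsilon, rng=random):
    """Pick policy_action with probability 1 - epsilon, else a uniform random action."""
    p = rng.random()              # random number p in [0, 1)
    if p < 1 - epsilon:           # condition 1
        return policy_action
    return rng.choice(actions)    # condition 2: uniform over all actions


def action_probabilities(policy_action, actions, epsilon):
    """Exact selection probabilities implied by conditions 1 and 2."""
    base = epsilon / len(actions)  # contribution of condition 2 to every action
    return {
        a: base + ((1 - epsilon) if a == policy_action else 0.0)
        for a in actions
    }


# Hypothetical setup: four actions, the policy prefers "up".
actions = ["left", "right", "up", "down"]
epsilon = 0.2
probs = action_probabilities("up", actions, epsilon)

# Empirical check with a seeded generator so the run is repeatable.
rng = random.Random(0)
n_samples = 20_000
counts = Counter(epsilon_greedy("up", actions, epsilon, rng) for _ in range(n_samples))
freqs = {a: counts[a] / n_samples for a in actions}

for a in actions:
    print(f"{a:>5}: exact {probs[a]:.3f}, sampled {freqs[a]:.3f}")
```

With these assumed values, the exact probability is about 0.85 for the preferred action and about 0.05 for each of the others, which matches the lower bound $\frac{\epsilon}{\vert A \vert} = \frac{0.2}{4}$; the sampled frequencies should land close to the exact ones.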