$\epsilon$-soft Random Action

  • A policy is $\epsilon$-soft if every action $a$ has a probability of being chosen of at least
$$P(a) \geq \frac{\epsilon}{\vert A \vert}$$

where $\vert A \vert$ is the number of possible actions.

  • Example ($\epsilon$-greedy policy)

    • Let $a$ be the action selected by the (greedy) policy $\pi$
    • Draw a random number $p$ uniformly between 0 and 1
      • if $p < 1 - \epsilon$: (condition 1)
        choose action $a$
      • else: (condition 2)
        choose an action uniformly at random among all $\vert A \vert$ possible actions (including $a$)
  • Proof that the $\epsilon$-greedy policy is $\epsilon$-soft:

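    • Let $a_k = a$ be the greedy action, and let $a_i$ ($i \neq k$) be any other action.
    • Probability of selecting $a_k$ through condition 1: $1 - \epsilon$
    • Probability of selecting any particular action through condition 2: $\epsilon \cdot \frac{1}{\vert A \vert} = \frac{\epsilon}{\vert A \vert}$ (condition 2 occurs with probability $\epsilon$, and then each of the $\vert A \vert$ actions is equally likely)
    • Probability of selecting action $a_k$ (condition 1 or condition 2):
      $P(a_k) = (1 - \epsilon) + \frac{\epsilon}{\vert A \vert}$
    • Probability of selecting any other action $a_i$, $i \neq k$ (condition 2 only):
      $P(a_i) = \frac{\epsilon}{\vert A \vert}$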
    • Sum of the probabilities of all actions (each $a_i$ with $i \neq k$ is reached only through condition 2):
      $$\begin{aligned} P(a_k) + \sum_{i \neq k} P(a_i) &= (1-\epsilon) + \frac{\epsilon}{\vert A \vert} + \sum_{i \neq k} \frac{\epsilon}{\vert A \vert} \\ &= (1 - \epsilon) + \frac{\epsilon}{\vert A \vert} + \frac{\epsilon (\vert A \vert - 1)}{\vert A \vert} \\ &= (1 - \epsilon) + \epsilon \\ &= 1 \end{aligned}$$
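    • Thus every action is selected with probability at least $\frac{\epsilon}{\vert A \vert}$ (the greedy action with $1 - \epsilon + \frac{\epsilon}{\vert A \vert}$, every other action with exactly $\frac{\epsilon}{\vert A \vert}$), so the $\epsilon$-greedy policy is $\epsilon$-soft.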
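The $\epsilon$-greedy rule and the probabilities derived above can be checked with a short Python sketch. It assumes the policy's greedy action is the arg-max of a list of action-value estimates `q_values`; that list and the helper names `epsilon_greedy_action`, `epsilon_greedy_probs` and `is_epsilon_soft` are hypothetical illustrations, not part of the original notes.

```python
import random
from typing import Sequence


def epsilon_greedy_action(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Pick an action index with the epsilon-greedy rule (hypothetical helper).

    With probability 1 - epsilon take the greedy action (condition 1);
    otherwise pick uniformly among ALL actions, greedy one included (condition 2).
    """
    n_actions = len(q_values)
    # Greedy action = arg-max of the (assumed) value estimates; ties go to the lowest index.
    greedy = max(range(n_actions), key=lambda i: q_values[i])
    if rng.random() < 1 - epsilon:  # condition 1: p < 1 - epsilon
        return greedy
    return rng.randrange(n_actions)  # condition 2: uniform over all |A| actions


def epsilon_greedy_probs(n_actions: int, greedy: int, epsilon: float) -> list[float]:
    """Exact action probabilities implied by the rule above."""
    base = epsilon / n_actions       # every action gets eps/|A| from condition 2
    probs = [base] * n_actions
    probs[greedy] += 1 - epsilon     # the greedy action also gets 1 - eps from condition 1
    return probs


def is_epsilon_soft(probs: Sequence[float], epsilon: float, tol: float = 1e-12) -> bool:
    """Check the epsilon-soft condition P(a) >= eps/|A| for every action."""
    bound = epsilon / len(probs)
    return all(p >= bound - tol for p in probs)


if __name__ == "__main__":
    eps = 0.1
    q = [0.2, 1.5, -0.3, 0.7]  # hypothetical value estimates; action 1 is greedy
    probs = epsilon_greedy_probs(len(q), greedy=1, epsilon=eps)
    print("exact probabilities:", probs)
    print("sum:", round(sum(probs), 12))
    print("epsilon-soft:", is_epsilon_soft(probs, eps))

    # Empirical frequencies from sampling should be close to the exact probabilities.
    rng = random.Random(0)
    n = 100_000
    counts = [0] * len(q)
    for _ in range(n):
        counts[epsilon_greedy_action(q, eps, rng)] += 1
    print("sampled frequencies:", [c / n for c in counts])
```

With $\epsilon = 0.1$ and four actions, the exact probabilities are $1 - 0.1 + 0.1/4 = 0.925$ for the greedy action and $0.1/4 = 0.025$ for each of the other three, which sum to 1; the sampled frequencies should land close to these values. Sampling over all $\vert A \vert$ actions in condition 2 (rather than only the non-greedy ones) is what gives the greedy action the probability $1 - \epsilon + \frac{\epsilon}{\vert A \vert}$ used in the derivation.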