epsilon-Soft Random Action
A summary of "Understanding Deep Reinforcement Learning"
$\epsilon$-soft Random Action
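- A policy $\pi$ is $\epsilon$-soft if every action $a$ has at least the following probability of being chosen in every state $s$:
  $\pi(a \vert s) \geq \frac{\epsilon}{\vert A \vert}$
  where $\vert A \vert$ is the number of possible actions.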
Example ($\epsilon$-greedy policy)
- Let $a$ be the action chosen by policy $\pi$
- Select a random number $p$ between 0 and 1
- if $p < 1 - \epsilon$: (condition 1)
  - choose action $a$
- else: (condition 2)
  - choose a random action uniformly among all possible actions
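The selection steps above can be sketched in Python as follows. This is a minimal illustration, not code from the summarized text: the function name `epsilon_greedy` and its parameters (`greedy_action`, `actions`, `epsilon`, `rng`) are assumptions introduced here.

```python
import random


def epsilon_greedy(greedy_action, actions, epsilon, rng=random):
    """Pick greedy_action with probability 1 - epsilon, otherwise a uniform random action.

    greedy_action: the action a proposed by the policy (hypothetical name)
    actions: list of all possible actions
    epsilon: exploration rate in [0, 1]
    rng: source of randomness, the random module or a random.Random instance
    """
    p = rng.random()  # uniform random number in [0, 1)
    if p < 1 - epsilon:
        # condition 1: keep the action chosen by the policy
        return greedy_action
    # condition 2: uniform choice over ALL actions, which may pick greedy_action again
    return rng.choice(actions)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded for repeatable runs
    picks = [epsilon_greedy(2, [0, 1, 2, 3], 0.2, rng) for _ in range(10)]
    print(picks)
```

Because condition 2 draws from all actions, including the policy's own choice, every action keeps a nonzero chance of being selected whenever $\epsilon > 0$.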
Proof:
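- Probability of selecting the policy's action $a$ through condition 1: $1 - \epsilon$
- Probability of selecting any particular action $a_k$ through condition 2: $\epsilon \cdot \frac{1}{\vert A \vert} = \frac{\epsilon}{\vert A \vert}$
- Probability of selecting the policy's action $a$ (reachable through both conditions):
  $P(a) = (1 - \epsilon) + \frac{\epsilon}{\vert A \vert}$
- Probability of selecting any other action $a_k \neq a$ (reachable only through condition 2):
  $P(a_k) = \frac{\epsilon}{\vert A \vert}$
- Sum of the probabilities of all actions:
  $(1 - \epsilon) + \frac{\epsilon}{\vert A \vert} + (\vert A \vert - 1) \frac{\epsilon}{\vert A \vert} = 1$
- Thus, the minimum probability of selecting any action is $\frac{\epsilon}{\vert A \vert}$, so the $\epsilon$-greedy policy is $\epsilon$-soft.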
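
The proof can also be checked numerically. The sketch below assumes actions are indexed `0 .. n_actions - 1`; the helper names `epsilon_greedy_probs` and `is_epsilon_soft` are hypothetical and introduced only for this illustration.

```python
import math


def epsilon_greedy_probs(greedy_index, n_actions, epsilon):
    """Exact selection probabilities of an epsilon-greedy policy."""
    base = epsilon / n_actions           # share each action gets from condition 2
    probs = [base] * n_actions
    probs[greedy_index] += 1 - epsilon   # extra mass from condition 1
    return probs


def is_epsilon_soft(probs, epsilon, tol=1e-12):
    """True if every action has probability at least epsilon / |A|."""
    bound = epsilon / len(probs)
    return all(p >= bound - tol for p in probs)


if __name__ == "__main__":
    probs = epsilon_greedy_probs(greedy_index=2, n_actions=4, epsilon=0.2)
    print(probs)                                   # one entry per action
    print(math.isclose(sum(probs), 1.0))           # probabilities sum to 1
    print(is_epsilon_soft(probs, 0.2))             # the minimum meets epsilon / |A|
    print(is_epsilon_soft([0.0, 0.0, 1.0, 0.0], 0.2))  # a purely greedy policy does not
```

With 4 actions and $\epsilon = 0.2$, each non-greedy action has probability $0.2 / 4 = 0.05$, which is exactly the $\epsilon$-soft lower bound.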