epsilon-Soft Random Action
A summary of "Understanding Deep Reinforcement Learning"
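$\epsilon$-soft Random Action
- A policy is $\epsilon$-soft if all actions have probability of being chosen of at least $\frac{\epsilon}{|\mathcal{A}|}$,
where $|\mathcal{A}|$ is the number of possible actions.
- Formally, $\pi(a \mid s) \geq \frac{\epsilon}{|\mathcal{A}|}$ for every state $s$ and every action $a$.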
Example ($\epsilon$-greedy policy)
- Let $a^*$ be the greedy action given by the policy
- Select a random number $r$ between 0 and 1
- if $r > \epsilon$: (condition 1)
  choose action $a^*$
- else: (condition 2)
  choose a random action uniformly among all possible actions
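The selection steps above can be sketched in Python. This is a minimal illustration, not the source's implementation: the `q_values` list and the choice of the greedy action as its argmax are assumptions, and condition 1 is taken to be $r > \epsilon$, so the greedy action is chosen directly with probability $1 - \epsilon$.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index chosen epsilon-greedily.

    q_values is a hypothetical list of action-value estimates; the
    greedy action a* is assumed to be the index of its largest entry.
    """
    n_actions = len(q_values)
    greedy_action = max(range(n_actions), key=lambda a: q_values[a])

    r = rng.random()  # uniform random number in [0, 1)
    if r > epsilon:
        # Condition 1: exploit, happens with probability 1 - epsilon.
        return greedy_action
    # Condition 2: explore, uniform over ALL actions (a* included).
    return rng.randrange(n_actions)


action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.1)
```

Because the exploration branch samples from every action, including the greedy one, each action keeps a non-zero chance of being selected whenever `epsilon` is positive.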
Proof:
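- Probability of selecting the greedy action $a^*$ from condition 1: $1 - \epsilon$
- Probability of selecting action $a^*$ from condition 2 (one of $|\mathcal{A}|$ equally likely actions): $\epsilon \cdot \frac{1}{|\mathcal{A}|} = \frac{\epsilon}{|\mathcal{A}|}$
- Probability of selecting action $a^*$: $1 - \epsilon + \frac{\epsilon}{|\mathcal{A}|}$
- Sum of the probabilities of all actions, where each of the other $|\mathcal{A}| - 1$ actions is reachable only through condition 2: $\left(1 - \epsilon + \frac{\epsilon}{|\mathcal{A}|}\right) + (|\mathcal{A}| - 1)\frac{\epsilon}{|\mathcal{A}|} = 1$
- Thus, the minimum probability of selecting any action is $\frac{\epsilon}{|\mathcal{A}|}$, so the $\epsilon$-greedy policy is $\epsilon$-soft.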
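The proof can be checked with exact arithmetic for a concrete case. The sketch below is illustrative only: the function name `epsilon_greedy_probs`, the choice of 4 actions, and $\epsilon = 1/10$ are assumptions made for the example.

```python
from fractions import Fraction


def epsilon_greedy_probs(n_actions, epsilon, greedy_action=0):
    """Exact selection probabilities under an epsilon-greedy policy."""
    epsilon = Fraction(epsilon)
    # Every action can be picked in the exploration branch: epsilon / |A|.
    probs = [epsilon / n_actions] * n_actions
    # The greedy action additionally receives the exploitation mass 1 - epsilon.
    probs[greedy_action] += 1 - epsilon
    return probs


probs = epsilon_greedy_probs(n_actions=4, epsilon=Fraction(1, 10))
print(probs[0])    # greedy action: 37/40
print(sum(probs))  # 1
print(min(probs))  # 1/40, which equals epsilon / |A|
```

With 4 actions and $\epsilon = 1/10$, the greedy action has probability $37/40$ and every other action has $1/40 = \epsilon / |\mathcal{A}|$, matching the bound.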