Abstract base class for Python Policies.
tf_agents.policies.py_policy.PyPolicy(
    time_step_spec: tf_agents.trajectories.TimeStep,
    action_spec: tf_agents.typing.types.NestedArraySpec,
    policy_state_spec: tf_agents.typing.types.NestedArraySpec = (),
    info_spec: tf_agents.typing.types.NestedArraySpec = (),
    observation_and_action_constraint_splitter: Optional[types.Splitter] = None
)
The action(time_step, policy_state) method returns a PolicyStep named tuple containing the following nested arrays:
action : The action to be applied on the environment.
state : The state of the policy (e.g. RNN state) to be fed into the next call to action.
info : Optional side information such as action log probabilities.
For stateful policies, e.g. those containing RNNs, an initial policy state can be obtained through a call to get_initial_state().
Example of simple use in Python:

    py_env = PyEnvironment()
    policy = PyPolicy()

    time_step = py_env.reset()
    policy_state = policy.get_initial_state()

    acc_reward = 0
    while not time_step.is_last():
        action_step = policy.action(time_step, policy_state)
        policy_state = action_step.state
        time_step = py_env.step(action_step.action)
        acc_reward += time_step.reward
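The example above uses the abstract classes directly, so it shows the calling pattern rather than something that runs as written. The following is a minimal, self-contained sketch of the same loop. It deliberately avoids importing tf_agents: `TimeStep`, `PolicyStep`, `CountdownEnv`, `CountingPolicy` and `run_episode` are hypothetical stand-ins defined here for illustration, and only mirror the method names used in the example.

```python
import collections

# Stand-ins for illustration only; these are NOT the tf_agents classes.
PolicyStep = collections.namedtuple("PolicyStep", ["action", "state", "info"])


class TimeStep(collections.namedtuple("TimeStep", ["reward", "observation", "last"])):
    """Hypothetical time step exposing the is_last() check used by the loop."""

    def is_last(self):
        return self.last


class CountdownEnv:
    """Hypothetical environment: an episode lasts `length` steps, reward 1.0 each."""

    def __init__(self, length=3):
        self._length = length
        self._remaining = length

    def reset(self):
        self._remaining = self._length
        return TimeStep(reward=0.0, observation=self._remaining, last=False)

    def step(self, action):
        self._remaining -= 1
        return TimeStep(
            reward=1.0,
            observation=self._remaining,
            last=self._remaining == 0,
        )


class CountingPolicy:
    """Hypothetical stateful policy: its state counts the calls to action()."""

    def get_initial_state(self, batch_size=None):
        return 0

    def action(self, time_step, policy_state=()):
        # Always emit action 0 and pass the incremented counter back as state.
        return PolicyStep(action=0, state=policy_state + 1, info=())


def run_episode(env, policy):
    """Runs one episode, threading the policy state through every call."""
    time_step = env.reset()
    policy_state = policy.get_initial_state()
    acc_reward = 0.0
    while not time_step.is_last():
        action_step = policy.action(time_step, policy_state)
        policy_state = action_step.state
        time_step = env.step(action_step.action)
        acc_reward += time_step.reward
    return acc_reward, policy_state
```

The point of the sketch is the state threading: the `state` field returned by one call to `action` is what gets passed as `policy_state` to the next call.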
Methods
action
action(
    time_step: tf_agents.trajectories.TimeStep,
    policy_state: tf_agents.typing.types.NestedArray = (),
    seed: Optional[types.Seed] = None
) -> tf_agents.trajectories.PolicyStep
Generates next action given the time_step and policy_state.
Args | |
---|---|
time_step | A TimeStep tuple corresponding to time_step_spec(). |
policy_state | An optional previous policy_state. |
seed | Seed to use if action uses sampling (optional). |
Returns | |
---|---|
A PolicyStep named tuple containing: action, a nest of action arrays matching the action_spec(); state, a nest of policy states to be fed into the next call to action; info, optional side information such as action log probabilities. |
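To make the roles of `seed` and the three PolicyStep fields concrete, here is a small sketch of a sampling policy. It is an illustration under stated assumptions, not the library's implementation: `UniformDiscretePolicy` and the `"log_probability"` key are hypothetical, `PolicyStep` is a local stand-in, and Python's `random.Random` stands in for whatever sampler a real policy would use.

```python
import collections
import math
import random

# Stand-in for illustration only; this is NOT tf_agents.trajectories.PolicyStep.
PolicyStep = collections.namedtuple("PolicyStep", ["action", "state", "info"])


class UniformDiscretePolicy:
    """Hypothetical stateless policy sampling uniformly from num_actions choices."""

    def __init__(self, num_actions):
        self._num_actions = num_actions

    def action(self, time_step, policy_state=(), seed=None):
        # With seed=None the generator is seeded from system entropy, so the
        # draw varies between calls; a fixed seed makes it reproducible.
        rng = random.Random(seed)
        chosen = rng.randrange(self._num_actions)
        # Every action has probability 1 / num_actions.
        log_prob = -math.log(self._num_actions)
        return PolicyStep(
            action=chosen,
            state=policy_state,  # stateless: pass the state through unchanged
            info={"log_probability": log_prob},
        )
```

Under these assumptions, calling `action` twice with the same seed yields the same PolicyStep, which is useful when reproducing a rollout.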
get_initial_state
get_initial_state(
    batch_size: Optional[int] = None
) -> tf_agents.typing.types.NestedArray
Returns an initial state usable by the policy.
Args | |
---|---|
batch_size | An optional batch size. |
Returns | |
---|---|
An initial policy state. |
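The text above does not spell out how `batch_size` affects the returned state. The sketch below shows one plausible convention, assumed here rather than taken from the source: the initial state is zero-filled, and a leading batch dimension is added only when `batch_size` is given. `RecurrentPolicySketch` and `zeros` are hypothetical names, and nested Python lists stand in for arrays.

```python
def zeros(shape):
    """Builds a nested list of 0.0 values with the given shape tuple."""
    if not shape:
        return 0.0
    return [zeros(shape[1:]) for _ in range(shape[0])]


class RecurrentPolicySketch:
    """Hypothetical policy whose state is a single zero-initialised array."""

    def __init__(self, state_shape):
        self._state_shape = tuple(state_shape)

    def get_initial_state(self, batch_size=None):
        # Assumption: no batch dimension unless batch_size is provided.
        if batch_size is None:
            shape = self._state_shape
        else:
            shape = (batch_size,) + self._state_shape
        return zeros(shape)
```

For example, with `state_shape=(2,)` the unbatched state has two entries, while `batch_size=3` produces three such rows.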