Returns random samples of the given action_spec.
Inherits From: PyPolicy
tf_agents.policies.random_py_policy.RandomPyPolicy(
    time_step_spec: tf_agents.trajectories.TimeStep,
    action_spec: tf_agents.typing.types.NestedArraySpec,
    policy_state_spec: tf_agents.typing.types.NestedArraySpec = (),
    info_spec: tf_agents.typing.types.NestedArraySpec = (),
    seed: Optional[types.Seed] = None,
    outer_dims: Optional[Sequence[int]] = None,
    observation_and_action_constraint_splitter: Optional[types.Splitter] = None
)
Args

time_step_spec: Reference time_step_spec. If not None and outer_dims is not
  provided, this is used to infer the outer_dims required for the given
  time_step when action is called.
action_spec: A nest of BoundedArraySpec representing the actions to sample
  from.
policy_state_spec: Nest of tf.TypeSpec representing the data in the policy
  state field.
info_spec: Nest of tf.TypeSpec representing the data in the policy info field.
seed: Optional seed used to instantiate a random number generator.
outer_dims: An optional list/tuple specifying outer dimensions to add to the
  spec shape before sampling. If unspecified, the outer_dims are derived from
  the outer_dims in the given observation when action is called.
observation_and_action_constraint_splitter: A function used to process
  observations with action constraints. These constraints can indicate, for
  example, a mask of valid/invalid actions for a given state of the
  environment. The function takes in a full observation and returns a tuple
  consisting of 1) the part of the observation intended as input to the
  network and 2) the constraint. An example
  observation_and_action_constraint_splitter could be as simple as:

    def observation_and_action_constraint_splitter(observation):
      return observation['network_input'], observation['constraint']

  Note: when using observation_and_action_constraint_splitter, make sure the
  provided q_network is compatible with the network-specific half of the
  output of the observation_and_action_constraint_splitter. In particular,
  observation_and_action_constraint_splitter will be called on the observation
  before passing to the network. If observation_and_action_constraint_splitter
  is None, action constraints are not applied.
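A minimal usage sketch follows; the observation spec, action spec, and seed
value are illustrative assumptions, not part of this reference. It constructs
the policy from array specs and samples a single random action.

import numpy as np

from tf_agents.policies import random_py_policy
from tf_agents.specs import array_spec
from tf_agents.trajectories import time_step as ts

# Illustrative specs (assumptions): a 2-element float observation and a
# scalar integer action in [0, 3].
observation_spec = array_spec.ArraySpec(shape=(2,), dtype=np.float32)
action_spec = array_spec.BoundedArraySpec(
    shape=(), dtype=np.int32, minimum=0, maximum=3)

policy = random_py_policy.RandomPyPolicy(
    time_step_spec=ts.time_step_spec(observation_spec),
    action_spec=action_spec,
    seed=0)

# Sample one random action for a single (unbatched) time step.
time_step = ts.restart(np.zeros((2,), dtype=np.float32))
policy_step = policy.action(time_step)
print(policy_step.action)  # an int32 scalar drawn uniformly from [0, 3]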
Attributes

action_spec: Describes the ArraySpecs of the np.Array returned by action().
  action can be a single np.Array, or a nested dict, list or tuple of
  np.Array.
collect_data_spec: Describes the data collected when using this policy with an
  environment.
info_spec: Describes the Arrays emitted as info by action().
observation_and_action_constraint_splitter
policy_state_spec: Describes the arrays expected by functions with
  policy_state as input.
policy_step_spec: Describes the output of action().
time_step_spec: Describes the TimeStep np.Arrays expected by
  action(time_step).
trajectory_spec: Describes the data collected when using this policy with an
  environment.
Methods
action
action(
    time_step: tf_agents.trajectories.TimeStep,
    policy_state: tf_agents.typing.types.NestedArray = (),
    seed: Optional[types.Seed] = None
) -> tf_agents.trajectories.PolicyStep
Generates next action given the time_step and policy_state.
Args

time_step: A TimeStep tuple corresponding to time_step_spec().
policy_state: An optional previous policy_state.
seed: Seed to use if action uses sampling (optional).

Returns

A PolicyStep named tuple containing:
  action: A nest of action Arrays matching the action_spec().
  state: A nest of policy states to be fed into the next call to action.
  info: Optional side information such as action log probabilities.
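A hedged sketch of calling action on a batch, continuing the construction
sketch above; the batch size of 4 is an illustrative assumption. Because
outer_dims was not given at construction time, the batch dimension is inferred
by comparing the observation against time_step_spec.observation.

# Continues the earlier sketch; batch size 4 is an assumption.
batched_time_step = ts.restart(
    np.zeros((4, 2), dtype=np.float32), batch_size=4)
policy_step = policy.action(batched_time_step)

policy_step.action  # np.int32 array of shape (4,), values in [0, 3]
policy_step.state   # () because policy_state_spec defaults to ()
policy_step.info    # () because info_spec defaults to ()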
get_initial_state
get_initial_state(
    batch_size: Optional[int] = None
) -> tf_agents.typing.types.NestedArray
Returns an initial state usable by the policy.
Args

batch_size: An optional batch size.

Returns

An initial policy state.
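A brief sketch, continuing the example above. With the default
policy_state_spec of (), the initial policy state is expected to be an empty
nest regardless of the batch_size passed (an assumption based on the default
spec, not stated on this page).

# With policy_state_spec=() (the default), the initial state is an empty nest.
initial_state = policy.get_initial_state(batch_size=4)
print(initial_state)  # ()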