Returns random samples of the given action_spec.
Inherits From: PyPolicy
tf_agents.policies.random_py_policy.RandomPyPolicy(
    time_step_spec: tf_agents.trajectories.TimeStep,
    action_spec: tf_agents.typing.types.NestedArraySpec,
    policy_state_spec: tf_agents.typing.types.NestedArraySpec = (),
    info_spec: tf_agents.typing.types.NestedArraySpec = (),
    seed: Optional[types.Seed] = None,
    outer_dims: Optional[Sequence[int]] = None,
    observation_and_action_constraint_splitter: Optional[types.Splitter] = None
)
Args

time_step_spec: Reference time_step_spec. If not None and outer_dims is not
  provided, this is used to infer the outer_dims required for the given
  time_step when action is called.
action_spec: A nest of BoundedArraySpec representing the actions to sample
  from.
policy_state_spec: Nest of tf.TypeSpec representing the data in the policy
  state field.
info_spec: Nest of tf.TypeSpec representing the data in the policy info field.
seed: Optional seed used to instantiate a random number generator.
outer_dims: An optional list/tuple specifying outer dimensions to add to the
  spec shape before sampling. If unspecified, the outer_dims are derived from
  the outer_dims in the given observation when action is called.
observation_and_action_constraint_splitter: A function used to process
  observations with action constraints. These constraints can indicate, for
  example, a mask of valid/invalid actions for a given state of the
  environment. The function takes in a full observation and returns a tuple
  consisting of 1) the part of the observation intended as input to the
  network and 2) the constraint. An example
  observation_and_action_constraint_splitter could be as simple as:

    def observation_and_action_constraint_splitter(observation):
      return observation['network_input'], observation['constraint']

  Note: when using observation_and_action_constraint_splitter, make sure the
  provided q_network is compatible with the network-specific half of the
  output of the observation_and_action_constraint_splitter. In particular,
  observation_and_action_constraint_splitter will be called on the observation
  before passing to the network. If observation_and_action_constraint_splitter
  is None, action constraints are not applied.
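A minimal usage sketch follows; the observation spec, action spec, and seed
value are illustrative assumptions, not part of this reference. It constructs
the policy from array specs and samples a single random action.

import numpy as np

from tf_agents.policies import random_py_policy
from tf_agents.specs import array_spec
from tf_agents.trajectories import time_step as ts

# Illustrative specs (assumptions): a 2-element float observation and a
# scalar integer action in [0, 3].
observation_spec = array_spec.ArraySpec(shape=(2,), dtype=np.float32)
action_spec = array_spec.BoundedArraySpec(
    shape=(), dtype=np.int32, minimum=0, maximum=3)

policy = random_py_policy.RandomPyPolicy(
    time_step_spec=ts.time_step_spec(observation_spec),
    action_spec=action_spec,
    seed=0)

# Sample one random action for a single (unbatched) time step.
time_step = ts.restart(np.zeros((2,), dtype=np.float32))
policy_step = policy.action(time_step)
print(policy_step.action)  # an int32 scalar drawn uniformly from [0, 3]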
Attributes

action_spec: Describes the ArraySpecs of the np.Array returned by action().
  action can be a single np.Array, or a nested dict, list or tuple of
  np.Array.
collect_data_spec: Describes the data collected when using this policy with an
  environment.
info_spec: Describes the Arrays emitted as info by action().
observation_and_action_constraint_splitter
policy_state_spec: Describes the arrays expected by functions with
  policy_state as input.
policy_step_spec: Describes the output of action().
time_step_spec: Describes the TimeStep np.Arrays expected by
  action(time_step).
trajectory_spec: Describes the data collected when using this policy with an
  environment.
Methods
action
action(
    time_step: tf_agents.trajectories.TimeStep,
    policy_state: tf_agents.typing.types.NestedArray = (),
    seed: Optional[types.Seed] = None
) -> tf_agents.trajectories.PolicyStep
Generates next action given the time_step and policy_state.
Args

time_step: A TimeStep tuple corresponding to time_step_spec().
policy_state: An optional previous policy_state.
seed: Seed to use if action uses sampling (optional).

Returns

A PolicyStep named tuple containing:
  action: A nest of action Arrays matching the action_spec().
  state: A nest of policy states to be fed into the next call to action.
  info: Optional side information such as action log probabilities.
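A hedged sketch of calling action on a batch, continuing the construction
sketch above; the batch size of 4 is an illustrative assumption. Because
outer_dims was not given at construction time, the batch dimension is inferred
by comparing the observation against time_step_spec.observation.

# Continues the earlier sketch; batch size 4 is an assumption.
batched_time_step = ts.restart(
    np.zeros((4, 2), dtype=np.float32), batch_size=4)
policy_step = policy.action(batched_time_step)

policy_step.action  # np.int32 array of shape (4,), values in [0, 3]
policy_step.state   # () because policy_state_spec defaults to ()
policy_step.info    # () because info_spec defaults to ()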
get_initial_state
get_initial_state(
    batch_size: Optional[int] = None
) -> tf_agents.typing.types.NestedArray
Returns an initial state usable by the policy.
Args

batch_size: An optional batch size.

Returns

An initial policy state.
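A brief sketch, continuing the example above. With the default
policy_state_spec of (), the initial policy state is expected to be an empty
nest regardless of the batch_size passed (an assumption based on the default
spec, not stated on this page).

# With policy_state_spec=() (the default), the initial state is an empty nest.
initial_state = policy.get_initial_state(batch_size=4)
print(initial_state)  # ()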