Exposes a Python environment as an in-graph TF environment.
Inherits From: TFEnvironment
```python
tf_agents.environments.TFPyEnvironment(
    environment: tf_agents.environments.PyEnvironment,
    check_dims: bool = False,
    isolation: bool = False
)
```
This class supports Python environments that return nests of arrays as observations and accept nests of arrays as actions. The nest structure is reflected in the in-graph environment's observation and action structure.
Implementation notes:

* Since `tf.py_func` deals in lists of tensors, this class makes some additional `tf.nest.flatten` and `tf.nest.pack_sequence_as` calls.
* This class currently casts rewards and discounts to float32.
| Args | |
|---|---|
| `environment` | Environment to interact with, implementing `py_environment.PyEnvironment`, or a callable that returns an environment of this form. If a callable is provided and `isolation` is provided, the callable is executed in the dedicated thread. |
| `check_dims` | Whether to check the batch dimensions of actions in `step`. |
| `isolation` | If this value is `False` (the default), interactions with the environment occur within whatever thread the methods of the `TFPyEnvironment` are run from. For example, in TF graph mode, methods like `step` are called from multiple threads created by the TensorFlow engine; calls to step the environment are guaranteed to be sequential, but not from the same thread. This creates problems for environments that are not thread-safe. Using isolation ensures not only that a dedicated thread (or thread pool) is used to interact with the environment, but also that interaction with the environment happens in a serialized manner. If `isolation == True`, a dedicated thread is created for interactions with the environment. If `isolation` is an instance of `multiprocessing.pool.Pool` (this includes instances of `multiprocessing.pool.ThreadPool`, née `multiprocessing.dummy.Pool`, and `multiprocessing.Pool`), then this pool is used to interact with the environment. NOTE: If using isolation with a `BatchedPyEnvironment`, ensure you create the `BatchedPyEnvironment` with `multithreading=False`, since otherwise the multithreading in that wrapper reverses the effects of this one. |
| Raises | |
|---|---|
| `TypeError` | If `environment` is not an instance of `py_environment.PyEnvironment` or its subclasses, or is a callable that does not return an instance of `PyEnvironment`. |
| `TypeError` | If `isolation` is not `True`, `False`, or an instance of `multiprocessing.pool.Pool`. |
| Attributes | |
|---|---|
| `batch_size` | The batch size of the environment. |
| `batched` | Whether the environment is batched. |
| `pyenv` | Returns the underlying Python environment. |
Methods
action_spec
```python
action_spec()
```

Describes the specs of the Tensors expected by `step(action)`.

`action` can be a single Tensor, or a nested dict, list or tuple of Tensors.

Returns

A single `TensorSpec`, or a nested dict, list or tuple of `TensorSpec` objects, which describe the shape and dtype of each Tensor expected by `step()`.
close
```python
close() -> None
```

Sends `close` to the wrapped environment and, if an isolation pool was provided at init time, also closes and joins that pool.
current_time_step
```python
current_time_step()
```

Returns the current `TimeStep`.

Returns

A `TimeStep` namedtuple containing:

* `step_type`: a `StepType` value.
* `reward`: the reward at this time step.
* `discount`: a discount in the range `[0, 1]`.
* `observation`: a Tensor, or a nested dict, list or tuple of Tensors corresponding to `observation_spec()`.
observation_spec
```python
observation_spec()
```

Defines the `TensorSpec` of observations provided by the environment.

Returns

A `TensorSpec`, or a nested dict, list or tuple of `TensorSpec` objects, which describe the observation.
render
```python
render(
    mode: Text = 'rgb_array'
) -> Optional[types.NestedTensor]
```

Renders the environment.

Note: for compatibility, this will convert the image to uint8.

| Args | |
|---|---|
| `mode` | One of `['rgb_array', 'human']`. Renders to a numpy array, or brings up a window where the environment can be visualized. |

Returns

A Tensor of shape `[width, height, 3]` denoting an RGB image if `mode` is `rgb_array`. Otherwise returns nothing and renders directly to a display window.

| Raises | |
|---|---|
| `NotImplementedError` | If the environment does not support rendering. |
reset
```python
reset()
```

Resets the environment and returns the current time step.

Returns

A `TimeStep` namedtuple containing:

* `step_type`: a `StepType` value.
* `reward`: the reward at this time step.
* `discount`: a discount in the range `[0, 1]`.
* `observation`: a Tensor, or a nested dict, list or tuple of Tensors corresponding to `observation_spec()`.
reward_spec
```python
reward_spec()
```

Defines the `TensorSpec` of rewards provided by the environment.

Returns

A `TensorSpec`, or a nested dict, list or tuple of `TensorSpec` objects, which describe the reward.
step
```python
step(
    action
)
```

Steps the environment according to the action.

If the environment returned a `TimeStep` with `StepType.LAST` at the previous step, this call to `step` will reset the environment (implementations of this method are expected to call `reset` in this case), start a new sequence, and ignore `action`.

This method will also start a new sequence if called after the environment has been constructed and `reset()` has not been called. In this case `action` will be ignored.

Expected sequences look like:

`time_step -> action -> next_time_step`

The action should depend on the previous time step for correctness.
| Args | |
|---|---|
| `action` | A Tensor, or a nested dict, list or tuple of Tensors corresponding to `action_spec()`. |

Returns

A `TimeStep` namedtuple containing:

* `step_type`: a `StepType` value.
* `reward`: the reward at this time step.
* `discount`: a discount in the range `[0, 1]`.
* `observation`: a Tensor, or a nested dict, list or tuple of Tensors corresponding to `observation_spec()`.
time_step_spec
```python
time_step_spec()
```

Describes the `TimeStep` specs of Tensors returned by `step()`.

Returns

A `TimeStep` namedtuple containing `TensorSpec` objects defining the Tensors returned by `step()`, i.e. `(step_type, reward, discount, observation)`.