Exposes a Python environment as an in-graph TF environment.
Inherits From: TFEnvironment
```python
tf_agents.environments.TFPyEnvironment(
    environment: tf_agents.environments.PyEnvironment,
    check_dims: bool = False,
    isolation: bool = False
)
```
This class supports Python environments that return nests of arrays as observations and accept nests of arrays as actions. The nest structure is reflected in the in-graph environment's observation and action structure.
Implementation notes:

* Since `tf.py_func` deals in lists of tensors, this class makes some additional `tf.nest.flatten` and `tf.nest.pack_sequence_as` calls.
* This class currently casts rewards and discounts to float32.
| Args | |
|---|---|
| `environment` | Environment to interact with, implementing `py_environment.PyEnvironment`, or a callable that returns an environment of this form. If a callable is provided and `isolation` is provided, the callable is executed in the dedicated thread. |
| `check_dims` | Whether to check the batch dimensions of actions in `step`. |
| `isolation` | If this value is `False` (the default), interactions with the environment occur within whatever thread the methods of the `TFPyEnvironment` are run from. For example, in TF graph mode, methods like `step` are called from multiple threads created by the TensorFlow engine; calls to step the environment are guaranteed to be sequential, but not from the same thread. This creates problems for environments that are not thread-safe. Using isolation ensures not only that a dedicated thread (or thread pool) is used to interact with the environment, but also that interaction with the environment happens in a serialized manner. If `isolation == True`, a dedicated thread is created for interactions with the environment. If `isolation` is an instance of `multiprocessing.pool.Pool` (this includes instances of `multiprocessing.pool.ThreadPool`, née `multiprocessing.dummy.Pool`, and `multiprocessing.Pool`), then this pool is used to interact with the environment. NOTE: If using isolation with a `BatchedPyEnvironment`, ensure you create the `BatchedPyEnvironment` with `multithreading=False`, since otherwise the multithreading in that wrapper reverses the effects of this one. |
| Raises | |
|---|---|
| `TypeError` | If `environment` is not an instance of `py_environment.PyEnvironment` or its subclasses, or is a callable that does not return an instance of `PyEnvironment`. |
| `TypeError` | If `isolation` is not `True`, `False`, or an instance of `multiprocessing.pool.Pool`. |
| Attributes | |
|---|---|
| `batch_size` | The batch size of the environment. |
| `batched` | Whether the environment is batched. |
| `pyenv` | Returns the underlying Python environment. |
Methods
action_spec
```python
action_spec()
```

Describes the specs of the Tensors expected by `step(action)`.

`action` can be a single Tensor, or a nested dict, list or tuple of Tensors.

Returns

A single `TensorSpec`, or a nested dict, list or tuple of `TensorSpec` objects, which describe the shape and dtype of each Tensor expected by `step()`.
close
```python
close() -> None
```

Sends `close` to the wrapped environment and, if an isolation pool was provided at init time, also closes and joins that pool.
current_time_step
```python
current_time_step()
```

Returns the current `TimeStep`.

Returns

A `TimeStep` namedtuple containing:

* `step_type`: a `StepType` value.
* `reward`: the reward at this time step.
* `discount`: a discount in the range `[0, 1]`.
* `observation`: a Tensor, or a nested dict, list or tuple of Tensors corresponding to `observation_spec()`.
observation_spec
```python
observation_spec()
```

Defines the `TensorSpec` of observations provided by the environment.

Returns

A `TensorSpec`, or a nested dict, list or tuple of `TensorSpec` objects, which describe the observation.
render
```python
render(
    mode: Text = 'rgb_array'
) -> Optional[types.NestedTensor]
```

Renders the environment.

Note: for compatibility, this will convert the image to uint8.

| Args | |
|---|---|
| `mode` | One of `['rgb_array', 'human']`. Renders to a numpy array, or brings up a window where the environment can be visualized. |

Returns

A Tensor of shape `[width, height, 3]` denoting an RGB image if `mode` is `rgb_array`. Otherwise returns nothing and renders directly to a display window.

| Raises | |
|---|---|
| `NotImplementedError` | If the environment does not support rendering. |
reset
```python
reset()
```

Resets the environment and returns the current time step.

Returns

A `TimeStep` namedtuple containing:

* `step_type`: a `StepType` value.
* `reward`: the reward at this time step.
* `discount`: a discount in the range `[0, 1]`.
* `observation`: a Tensor, or a nested dict, list or tuple of Tensors corresponding to `observation_spec()`.
reward_spec
```python
reward_spec()
```

Defines the `TensorSpec` of rewards provided by the environment.

Returns

A `TensorSpec`, or a nested dict, list or tuple of `TensorSpec` objects, which describe the reward.
step
```python
step(
    action
)
```

Steps the environment according to the action.

If the environment returned a `TimeStep` with `StepType.LAST` at the previous step, this call to `step` will reset the environment (implementations of this method are expected to call `reset` in this case), start a new sequence, and ignore `action`.

This method will also start a new sequence if called after the environment has been constructed and `reset()` has not been called. In this case `action` will be ignored.

Expected sequences look like:

`time_step -> action -> next_time_step`

The action should depend on the previous time step for correctness.
| Args | |
|---|---|
| `action` | A Tensor, or a nested dict, list or tuple of Tensors corresponding to `action_spec()`. |

Returns

A `TimeStep` namedtuple containing:

* `step_type`: a `StepType` value.
* `reward`: the reward at this time step.
* `discount`: a discount in the range `[0, 1]`.
* `observation`: a Tensor, or a nested dict, list or tuple of Tensors corresponding to `observation_spec()`.
time_step_spec
```python
time_step_spec()
```

Describes the `TimeStep` specs of Tensors returned by `step()`.

Returns

A `TimeStep` namedtuple containing `TensorSpec` objects defining the Tensors returned by `step()`, i.e. `(step_type, reward, discount, observation)`.