Create a Trajectory from tensors representing a single episode.
```python
tf_agents.trajectories.trajectory.from_episode(
    observation: tf_agents.typing.types.NestedSpecTensorOrArray,
    action: tf_agents.typing.types.NestedSpecTensorOrArray,
    policy_info: tf_agents.typing.types.NestedSpecTensorOrArray,
    reward: tf_agents.typing.types.NestedSpecTensorOrArray,
    discount: Optional[types.SpecTensorOrArray] = None
) -> tf_agents.trajectories.Trajectory
```
If none of the inputs are tensors, then numpy arrays are generated instead.
If `discount` is not provided, the first entry in `reward` is used to estimate the episode length `T`:

```python
reward_0 = tf.nest.flatten(reward)[0]
T = shape(reward_0)[0]
```

In this case, a `discount` of all ones with dtype `float32` is generated.
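A minimal usage sketch, assuming a 3-step episode of scalar observations and actions; the values below are illustrative, not from the TF-Agents docs. With numpy inputs, the resulting `Trajectory` fields are numpy arrays:

```python
import numpy as np
from tf_agents.trajectories import trajectory

# Build a Trajectory for one 3-step episode; discount is omitted,
# so it defaults to ones of shape [3] with dtype float32.
traj = trajectory.from_episode(
    observation=np.array([0.0, 1.0, 2.0], dtype=np.float32),
    action=np.array([0, 1, 0], dtype=np.int32),
    policy_info=(),  # no policy side information for this episode
    reward=np.array([1.0, 0.0, 1.0], dtype=np.float32),
)

print(traj.step_type)  # episode boundaries: FIRST, MID, LAST
print(traj.discount)   # generated all-ones discount: [1., 1., 1.]
```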
| Returns |
| :--- |
| An instance of `Trajectory`. |