View source on GitHub |
A tuple that represents a transition.
tf_agents.trajectories.Transition(
time_step, action_step, next_time_step
)
A Transition
represents a S, A, S'
sequence of operations. Tensors
within a Transition
are typically shaped [B, ...]
where B
is the
batch size.
In some cases Transition objects are used to store time-shifted intermediate
values for RNN computations, in which case the stored tensors are
shaped [B, T, ...]
.
In other cases, Transition
objects store n-step transitions
S_t, A_t, S_{t+N}
where the associated reward and discount in
next_time_step
are calculated as:
next_time_step.reward = r_t +
g^{1} * d_t * r_{t+1} +
g^{2} * d_t * d_{t+1} * r_{t+2} +
g^{3} * d_t * d_{t+1} * d_{t+2} * r_{t+3} +
...
g^{N-1} * d_t * ... * d_{t+N-2} * r_{t+N-1}
next_time_step.discount = g^{N-1} * d_t * d_{t+1} * ... * d_{t+N-1}.
See to_n_step_transition
for an example that converts Trajectory
objects
to this format.
Methods
replace
replace(
**kwargs
) -> 'Transition'
Exposes as namedtuple._replace.
Usage:
new_transition = transition.replace(action_step=())
This returns a new transition with an empty action_step
.
Args | |
---|---|
**kwargs
|
key/value pairs of fields in the transition. |
Returns | |
---|---|
A new Transition .
|