Note that the reward and discount fields of time_steps are undefined and are
therefore filled with zeros.
Args
trajectory
An instance of Trajectory. The tensors in Trajectory must have
shape [B, T, ...] when next_trajectory is None. Each per-step discount is
assumed to be a scalar float, so trajectory.discount must have shape
[B, T].
next_trajectory
(optional) An instance of Trajectory.
Returns
A tuple (time_steps, policy_steps, next_time_steps). The reward and
discount fields of time_steps are filled with zeros because these
cannot be deduced (please do not use them).
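The single-trajectory case can be illustrated with a small NumPy sketch. It is not the library's implementation: Traj, Step, and split_trajectory are hypothetical stand-ins, only a subset of fields is modelled, and it assumes that the reward and discount stored at frame t belong to the step from t to t+1.

```python
from collections import namedtuple

import numpy as np

# Hypothetical stand-ins for the library's containers (subset of fields only).
Traj = namedtuple("Traj", ["step_type", "observation", "action", "reward", "discount"])
Step = namedtuple("Step", ["step_type", "observation", "reward", "discount"])


def split_trajectory(traj):
    """Split a [B, T, ...] trajectory into T-1 adjacent (step, action, next_step) triples."""
    head = Traj(*(x[:, :-1] for x in traj))  # frames 0 .. T-2
    tail = Traj(*(x[:, 1:] for x in traj))   # frames 1 .. T-1
    time_steps = Step(
        head.step_type,
        head.observation,
        np.zeros_like(head.reward),    # cannot be deduced, so zero-filled
        np.zeros_like(head.discount),  # cannot be deduced, so zero-filled
    )
    # Assumption: reward/discount stored at frame t describe the step t -> t+1.
    next_time_steps = Step(tail.step_type, tail.observation, head.reward, head.discount)
    return time_steps, head.action, next_time_steps


B, T = 2, 4
traj = Traj(
    step_type=np.ones((B, T), dtype=np.int32),
    observation=np.arange(B * T * 3, dtype=np.float32).reshape(B, T, 3),
    action=np.zeros((B, T), dtype=np.int32),
    reward=np.ones((B, T), dtype=np.float32),
    discount=np.full((B, T), 0.9, dtype=np.float32),  # scalar per step, so [B, T]
)
time_steps, actions, next_time_steps = split_trajectory(traj)
```

With B = 2 and T = 4, every returned array has a time dimension of T - 1 = 3, and time_steps.reward and time_steps.discount contain only zeros, which is why they should not be used.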
Last updated 2024-04-26 UTC.