Computes the weighted cross-entropy loss for a sequence of logits.
```python
tfa.seq2seq.sequence_loss(
    logits: tfa.types.TensorLike,
    targets: tfa.types.TensorLike,
    weights: tfa.types.TensorLike,
    average_across_timesteps: bool = True,
    average_across_batch: bool = True,
    sum_over_timesteps: bool = False,
    sum_over_batch: bool = False,
    softmax_loss_function: Optional[Callable] = None,
    name: Optional[str] = None
) -> tf.Tensor
```
Depending on the values of `average_across_timesteps` / `sum_over_timesteps` and `average_across_batch` / `sum_over_batch`, the returned Tensor will have rank 0, 1, or 2, as these arguments reduce the cross-entropy at each target, which has shape `[batch_size, sequence_length]`, over their respective dimensions. For example, if `average_across_timesteps` is `True` and `average_across_batch` is `False`, then the returned Tensor will have shape `[batch_size]`.
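A minimal sketch of that shape behavior (the tensor sizes below are made up for illustration):

```python
import tensorflow as tf
import tensorflow_addons as tfa

batch_size, sequence_length, num_decoder_symbols = 4, 7, 10  # illustrative sizes

logits = tf.random.normal([batch_size, sequence_length, num_decoder_symbols])
targets = tf.random.uniform(
    [batch_size, sequence_length], maxval=num_decoder_symbols, dtype=tf.int32)
weights = tf.ones([batch_size, sequence_length])

# Averaging over timesteps but not over the batch leaves one loss value
# per batch element.
loss = tfa.seq2seq.sequence_loss(
    logits, targets, weights,
    average_across_timesteps=True,
    average_across_batch=False)
print(loss.shape)  # (4,)
```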
Note that `average_across_timesteps` and `sum_over_timesteps` cannot both be `True` at the same time; the same holds for `average_across_batch` and `sum_over_batch`.

The recommended loss reduction in TF 2.0 is sum-over reduction instead of the weighted average. Users are recommended to use `sum_over_timesteps` and `sum_over_batch` for reduction, as sketched below.
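Following that recommendation, a sketch of the sum-over style reduction, reusing the `logits`, `targets`, and `weights` from the sketch above:

```python
# TF 2.0 style reduction: the averaging flags must be disabled explicitly
# since they default to True, and each average/sum pair cannot both be set.
loss = tfa.seq2seq.sequence_loss(
    logits, targets, weights,
    average_across_timesteps=False,
    average_across_batch=False,
    sum_over_timesteps=True,
    sum_over_batch=True)
print(loss.shape)  # () -> scalar loss
```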
Args | |
---|---|
`logits` | A Tensor of shape `[batch_size, sequence_length, num_decoder_symbols]` and dtype float. The logits correspond to the prediction across all classes at each timestep. |
`targets` | A Tensor of shape `[batch_size, sequence_length]` and dtype int. The target represents the true class at each timestep. |
`weights` | A Tensor of shape `[batch_size, sequence_length]` and dtype float. `weights` constitutes the weighting of each prediction in the sequence. When using `weights` as masking, set all valid timesteps to 1 and all padded timesteps to 0, e.g. a mask returned by `tf.sequence_mask` (see the sketch after this table). |
`average_across_timesteps` | If set, sum the cost across the sequence dimension and divide the cost by the total label weight across timesteps. |
`average_across_batch` | If set, sum the cost across the batch dimension and divide the returned cost by the batch size. |
`sum_over_timesteps` | If set, sum the cost across the sequence dimension and divide by the size of the sequence. Note that any element with 0 weights will be excluded from the size calculation. |
`sum_over_batch` | If set, sum the cost across the batch dimension and divide the total cost by the batch size. Note that any element with 0 weights will be excluded from the size calculation. |
`softmax_loss_function` | Function (labels, logits) -> loss-batch to be used instead of the standard softmax (the default if this is None). Note that to avoid confusion, the function is required to accept named arguments. |
`name` | Optional name for this operation; defaults to "sequence_loss". |
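As mentioned in the `weights` entry above, a minimal masking sketch built with `tf.sequence_mask` (the per-example lengths are hypothetical):

```python
import tensorflow as tf

sequence_length = 7
lengths = tf.constant([7, 5, 3, 6])  # hypothetical valid lengths per example

# 1.0 at valid timesteps, 0.0 at padded ones; shape [batch_size, sequence_length].
weights = tf.sequence_mask(lengths, maxlen=sequence_length, dtype=tf.float32)
```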
Returns | |
---|---|
A float Tensor of rank 0, 1, or 2 depending on the `average_across_timesteps` and `average_across_batch` arguments. By default, it has rank 0 (scalar) and is the weighted average cross-entropy (log-perplexity) per symbol. |
Raises | |
---|---|
`ValueError` | If `logits` does not have 3 dimensions, `targets` does not have 2 dimensions, or `weights` does not have 2 dimensions. |