Computes CTC (Connectionist Temporal Classification) loss.
```python
tf.nn.ctc_loss(
    labels,
    logits,
    label_length,
    logit_length,
    logits_time_major=True,
    unique=None,
    blank_index=None,
    name=None
)
```
This op implements the CTC loss as presented in Graves et al., 2006
Connectionist temporal classification (CTC) is a type of neural network output and associated scoring function, for training recurrent neural networks (RNNs) such as LSTM networks to tackle sequence problems where the timing is variable. It can be used for tasks like on-line handwriting recognition or recognizing phones in speech audio. CTC refers to the outputs and scoring, and is independent of the underlying neural network structure.
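To make the scoring concrete, here is a minimal NumPy sketch (an illustration under simplified assumptions, not the TF implementation) that computes the CTC loss for a single one-label target by brute-force enumeration of every frame-level path; `tf.nn.ctc_loss` computes the same quantity efficiently with the forward-backward algorithm:

```python
import numpy as np
from itertools import product

def collapse(path, blank=0):
    # CTC mapping: merge adjacent repeats, then drop blanks
    out, prev = [], None
    for s in path:
        if s != prev and s != blank:
            out.append(s)
        prev = s
    return out

T, C = 3, 2                                 # 3 frames, classes {0: blank, 1}
rng = np.random.default_rng(0)
probs = np.exp(rng.normal(size=(T, C)))
probs /= probs.sum(axis=1, keepdims=True)   # per-frame softmax

target = [1]
# total probability of every length-T path that collapses to the target
p = sum(np.prod([probs[t, s] for t, s in enumerate(path)])
        for path in product(range(C), repeat=T)
        if collapse(path) == target)
loss = -np.log(p)                           # the negative log probability CTC reports
```

With `T=3` and one non-blank class, the only excluded paths are the all-blank path and `(1, 0, 1)` (which collapses to `[1, 1]`), so `p` is one minus the probability of those two paths.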
Notes:

- This class performs the softmax operation for you, so `logits` should be e.g. linear projections of the outputs of an LSTM.
- Outputs true repeated classes with blanks in between, and can also output repeated classes with no blanks in between that need to be collapsed by the decoder.
- `labels` may be supplied as either a dense, zero-padded `Tensor` with a vector of label sequence lengths OR as a `SparseTensor`.
- On TPU: Only dense padded `labels` are supported.
- On CPU and GPU: Caller may use a `SparseTensor` or dense padded `labels`, but calling with a `SparseTensor` will be significantly faster.
- The default blank label is `0` rather than `num_labels - 1` (where `num_labels` is the innermost dimension size of `logits`), unless overridden by `blank_index`.
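"Collapsed by the decoder" above refers to the standard CTC collapse rule: merge adjacent repeats, then remove blanks. A minimal pure-Python sketch (illustrative only; in TF the decoders such as `tf.nn.ctc_greedy_decoder` apply this rule for you):

```python
def ctc_collapse(path, blank=0):
    # merge adjacent repeats, then drop blank symbols
    out, prev = [], None
    for s in path:
        if s != prev and s != blank:
            out.append(s)
        prev = s
    return out

ctc_collapse([1, 1, 0, 1, 2, 2])  # -> [1, 1, 2]: a blank separates true repeats
ctc_collapse([1, 1, 1, 2, 2, 0])  # -> [1, 2]: adjacent repeats are merged
```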
```python
tf.random.set_seed(50)
batch_size = 8
num_labels = 6
max_label_length = 5
num_frames = 12

labels = tf.random.uniform([batch_size, max_label_length],
                           minval=1, maxval=num_labels, dtype=tf.int64)
logits = tf.random.uniform([num_frames, batch_size, num_labels])

label_length = tf.random.uniform([batch_size], minval=2,
                                 maxval=max_label_length, dtype=tf.int64)
label_mask = tf.sequence_mask(label_length, maxlen=max_label_length,
                              dtype=label_length.dtype)
labels *= label_mask

logit_length = [num_frames] * batch_size

with tf.GradientTape() as t:
    t.watch(logits)
    ref_loss = tf.nn.ctc_loss(
        labels=labels,
        logits=logits,
        label_length=label_length,
        logit_length=logit_length,
        blank_index=0)
ref_grad = t.gradient(ref_loss, logits)
```
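The masking step in the example zeroes out each label row past its true length; because `blank_index=0` here, the padded positions become the blank label. A NumPy sketch of the same `tf.sequence_mask` logic (illustrative only, not the TF implementation):

```python
import numpy as np

# NumPy equivalent of the tf.sequence_mask step above (illustrative only)
def sequence_mask(lengths, maxlen):
    # mask[i, j] is 1 while j < lengths[i], else 0
    return (np.arange(maxlen)[None, :] < np.asarray(lengths)[:, None]).astype(np.int64)

labels = np.array([[3, 1, 4, 2, 5],
                   [2, 2, 1, 3, 4]])
label_length = [3, 2]
masked = labels * sequence_mask(label_length, maxlen=5)
# masked -> [[3, 1, 4, 0, 0],
#            [2, 2, 0, 0, 0]]
```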
Returns | |
---|---|
`loss` | A 1-D float `Tensor` of shape `[batch_size]`, containing negative log probabilities. |
Raises | |
---|---|
`ValueError` | Argument `blank_index` must be provided when `labels` is a `SparseTensor`. |
References | |
---|---|
Connectionist Temporal Classification - Labelling Unsegmented Sequence Data with Recurrent Neural Networks: Graves et al., 2006 (pdf) |
https://en.wikipedia.org/wiki/Connectionist_temporal_classification |