Pad model input and generate corresponding input masks.
text.pad_model_inputs(
input, max_seq_length, pad_value=0
)
pad_model_inputs performs the final packaging of a model's inputs commonly
found in text models. This includes padding out (or simply truncating) to a
fixed-size Tensor of at most 2 dimensions and generating mask Tensors (of the
same shape) with values of 0 if the corresponding item is a pad value and 1 if
it is part of the original input.
Note that a simple truncation strategy (drop everything after max sequence
length) is used to force the inputs to the specified shape. This may discard
content the model needs, so users should instead apply a Trimmer upstream to
safely truncate large inputs.
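To make the risk concrete, here is a minimal pure-Python sketch (no TensorFlow, and not the library's implementation) of what a drop-everything-after-the-limit strategy discards. The helper name `naive_truncate` and the sample token IDs are hypothetical; 101 and 102 are assumed to play the role of start and separator tokens.

```python
def naive_truncate(tokens, max_seq_length):
    """Keep the first max_seq_length tokens; return (kept, dropped)."""
    return tokens[:max_seq_length], tokens[max_seq_length:]

# Hypothetical packed input: start token 101, two segments, each closed by 102.
packed = [101, 3, 4, 5, 102, 30, 40, 50, 102]

kept, dropped = naive_truncate(packed, max_seq_length=7)
print(kept)     # [101, 3, 4, 5, 102, 30, 40]
print(dropped)  # [50, 102]
```

In this sketch the second segment loses its closing separator without any warning, which is the kind of damage that trimming the segments before they are combined is meant to avoid.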
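import numpy as np
import tensorflow as tf
import tensorflow_text as text

# Three ragged rows of token IDs with lengths 7, 10, and 8.
input_data = tf.ragged.constant([
    [101, 1, 2, 102, 10, 20, 102],
    [101, 3, 4, 102, 30, 40, 50, 60, 70, 80],
    [101, 5, 6, 7, 8, 9, 102, 70],
], np.int32)
# Pad or truncate every row to exactly 9 tokens and build the matching mask.
data, mask = text.pad_model_inputs(input=input_data, max_seq_length=9)
print("data: %s, mask: %s" % (data, mask))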
data: tf.Tensor(
[[101 1 2 102 10 20 102 0 0]
[101 3 4 102 30 40 50 60 70]
[101 5 6 7 8 9 102 70 0]], shape=(3, 9), dtype=int32),
mask: tf.Tensor(
[[1 1 1 1 1 1 1 0 0]
[1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 0]], shape=(3, 9), dtype=int32)
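The semantics shown in the output above can be approximated with plain Python lists. This is a sketch for intuition only, not how the library is implemented; the helper name `pad_and_mask` is hypothetical. It also assumes the mask is decided by position (anything beyond the original row length is padding), so an original token that happens to equal `pad_value` would still get a mask value of 1.

```python
def pad_and_mask(rows, max_seq_length, pad_value=0):
    """Pad or truncate each row to max_seq_length and build a 0/1 mask."""
    padded, mask = [], []
    for row in rows:
        kept = list(row[:max_seq_length])   # simple truncation
        n_pad = max_seq_length - len(kept)  # number of pad slots to add
        padded.append(kept + [pad_value] * n_pad)
        mask.append([1] * len(kept) + [0] * n_pad)
    return padded, mask

rows = [
    [101, 1, 2, 102, 10, 20, 102],
    [101, 3, 4, 102, 30, 40, 50, 60, 70, 80],
    [101, 5, 6, 7, 8, 9, 102, 70],
]
data, mask = pad_and_mask(rows, max_seq_length=9)
for d, m in zip(data, mask):
    print(d, m)
```

Run on the same three rows, the sketch yields the same values as the data and mask tensors printed above: the first row gains two pad slots, the second loses its final token (80), and the third gains one pad slot.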