tf.keras.preprocessing.sequence.pad_sequences

TensorFlow 1 version View source on GitHub

Pads sequences to the same length.

This function transforms a list (of length num_samples) of sequences (lists of integers) into a 2D Numpy array of shape (num_samples, num_timesteps). num_timesteps is either the maxlen argument if provided, or the length of the longest sequence in the list.

Sequences that are shorter than num_timesteps are padded with value until they are num_timesteps long.

Sequences longer than num_timesteps are truncated so that they fit the desired length.

The position where padding or truncation happens is determined by the arguments padding and truncating, respectively. Pre-padding or removing values from the beginning of the sequence is the default.

sequence = [[1], [2, 3], [4, 5, 6]]
tf.keras.preprocessing.sequence.pad_sequences(sequence)
array([[0, 0, 1],
       [0, 2, 3],
       [4, 5, 6]], dtype=int32)
tf.keras.preprocessing.sequence.pad_sequences(sequence, value=-1)
array([[-1, -1,  1],
       [-1,  2,  3],
       [ 4,  5,  6]], dtype=int32)
tf.keras.preprocessing.sequence.pad_sequences(sequence, padding='post')
array([[1, 0, 0],
       [2, 3, 0],
       [4, 5, 6]], dtype=int32)
tf.keras.preprocessing.sequence.pad_sequences(sequence, maxlen=2)
array([[0, 1],
       [2, 3],
       [5, 6]], dtype=int32)

sequences List of sequences (each sequence is a list of integers).
maxlen Optional Int, maximum length of all sequences. If not provided, sequences will be padded to the length of the longest individual sequence.
dtype (Optional, defaults to int32). Type of the output sequences. To pad sequences with variable length strings, you can use object.
padding String, 'pre' or 'post' (optional, defaults to 'pre'): pad either before or after each sequence.
truncating String, 'pre' or 'post' (optional, defaults to 'pre'): remove values from sequences larger than maxlen, either at the beginning or at the end of the sequences.
value Float or String, padding value. (Optional, defaults to 0.)

Numpy array with shape (len(sequences), maxlen)

ValueError In case of invalid values for truncating or padding, or in case of invalid shape for a sequences entry.