Overview
This notebook demonstrates how to use the LazyAdam optimizer from the TensorFlow Addons package.
LazyAdam
LazyAdam is a variant of the Adam optimizer that handles sparse updates more efficiently. The original Adam algorithm maintains two moving-average accumulators for each trainable variable; the accumulators are updated at every step. This class provides lazier handling of gradient updates for sparse variables. It only updates moving-average accumulators for sparse variable indices that appear in the current batch, rather than updating the accumulators for all indices. Compared with the original Adam optimizer, it can provide large improvements in model training throughput for some applications. However, it provides slightly different semantics than the original Adam algorithm, and may lead to different empirical results.
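The difference is easiest to see on a variable that receives sparse (IndexedSlices) gradients, such as an embedding table. The short sketch below is not part of the original notebook: it takes two optimization steps on a toy 5x3 variable, where each step touches different rows, and reports which rows move during the second step. With LazyAdam only the rows indexed in that step should move; standard Adam will typically also keep moving rows touched in the first step, because their momentum accumulators are already non-zero.
import numpy as np
import tensorflow as tf
import tensorflow_addons as tfa

def rows_moved_by_second_step(optimizer):
    # Small embedding-like table; each step's loss only involves some rows,
    # so the gradient w.r.t. `table` is a sparse IndexedSlices gradient.
    table = tf.Variable(tf.ones([5, 3]))
    for step_indices in ([0, 2], [1]):   # step 1 touches rows 0 and 2, step 2 touches row 1
        before = table.numpy().copy()    # snapshot taken just before each step
        with tf.GradientTape() as tape:
            gathered = tf.gather(table, step_indices)
            loss = tf.reduce_sum(gathered ** 2)
        grads = tape.gradient(loss, [table])
        optimizer.apply_gradients(zip(grads, [table]))
    # Indices of rows whose values changed during the *second* step
    return np.where(np.any(table.numpy() != before, axis=1))[0]

print('Rows moved in step 2 with LazyAdam:', rows_moved_by_second_step(tfa.optimizers.LazyAdam(0.1)))
print('Rows moved in step 2 with Adam:    ', rows_moved_by_second_step(tf.keras.optimizers.Adam(0.1)))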
Setup
pip install -U tensorflow-addons
import tensorflow as tf
import tensorflow_addons as tfa
# Hyperparameters
batch_size = 64
epochs = 10
Build the Model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, input_shape=(784,), activation='relu', name='dense_1'),
    tf.keras.layers.Dense(64, activation='relu', name='dense_2'),
    tf.keras.layers.Dense(10, activation='softmax', name='predictions'),
])
Prepare the Data
# Load MNIST dataset as NumPy arrays
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Number of examples to hold out if you want a validation split
num_validation = 10000

# Preprocess the data: flatten the 28x28 images and scale pixel values to [0, 1]
x_train = x_train.reshape(-1, 784).astype('float32') / 255
x_test = x_test.reshape(-1, 784).astype('float32') / 255
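The `num_validation` constant is not used by the training run below. If you do want to monitor validation metrics, one possible approach (a sketch, not part of the original notebook) is to hold out the last `num_validation` training examples and pass them to `Model.fit`:
# Optional (sketch): hold out the last `num_validation` examples for validation
x_val, y_val = x_train[-num_validation:], y_train[-num_validation:]
x_train_rest, y_train_rest = x_train[:-num_validation], y_train[:-num_validation]

# A fit() call using the split could then look like:
# model.fit(x_train_rest, y_train_rest,
#           batch_size=batch_size, epochs=epochs,
#           validation_data=(x_val, y_val))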
Train and Evaluate
Simply replace a typical Keras optimizer with the new TFA optimizer.
# Compile the model
model.compile(
    optimizer=tfa.optimizers.LazyAdam(0.001),  # Utilize TFA optimizer
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=['accuracy'])
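LazyAdam subclasses the Keras Adam optimizer, so the familiar Adam hyperparameters can be set explicitly instead of relying on defaults. A sketch (the values shown are just Adam's usual defaults):
# Same keyword arguments as tf.keras.optimizers.Adam
optimizer = tfa.optimizers.LazyAdam(
    learning_rate=0.001,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-7)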
# Train the network
history = model.fit(
    x_train,
    y_train,
    batch_size=batch_size,
    epochs=epochs)
Epoch 1/10
938/938 [==============================] - 2s 2ms/step - loss: 0.3141 - accuracy: 0.9086
Epoch 2/10
938/938 [==============================] - 2s 2ms/step - loss: 0.1447 - accuracy: 0.9574
Epoch 3/10
938/938 [==============================] - 2s 2ms/step - loss: 0.1064 - accuracy: 0.9681
Epoch 4/10
938/938 [==============================] - 2s 2ms/step - loss: 0.0835 - accuracy: 0.9751
Epoch 5/10
938/938 [==============================] - 2s 2ms/step - loss: 0.0665 - accuracy: 0.9798
Epoch 6/10
938/938 [==============================] - 2s 2ms/step - loss: 0.0566 - accuracy: 0.9827
Epoch 7/10
938/938 [==============================] - 2s 2ms/step - loss: 0.0472 - accuracy: 0.9852
Epoch 8/10
938/938 [==============================] - 2s 2ms/step - loss: 0.0412 - accuracy: 0.9869
Epoch 9/10
938/938 [==============================] - 2s 2ms/step - loss: 0.0353 - accuracy: 0.9890
Epoch 10/10
938/938 [==============================] - 2s 2ms/step - loss: 0.0320 - accuracy: 0.9901
# Evaluate the network
print('Evaluate on test data:')
results = model.evaluate(x_test, y_test, batch_size=128, verbose=2)
print('Test loss = {0}, Test acc: {1}'.format(results[0], results[1]))
Evaluate on test data:
79/79 - 0s - loss: 0.0990 - accuracy: 0.9741 - 236ms/epoch - 3ms/step
Test loss = 0.09902079403400421, Test acc: 0.9740999937057495