View on TensorFlow.org | Run in Google Colab | View source on GitHub | Download notebook
Overview
This notebook demonstrates how to use the Moving Average optimizer along with the Model Average Checkpoint from the TensorFlow Addons package.
Moving Averaging
The advantage of Moving Averaging is that the averaged weights are less prone to sudden loss spikes or to irregular data in the most recent batch; they give a smoothed, more general picture of the model's training up to that point.
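Conceptually, the wrapper keeps an exponential moving average (EMA) of the trained weights alongside the raw weights. A minimal sketch of the update rule (plain Python; the decay value is illustrative, and tfa.optimizers.MovingAverage exposes it as average_decay):

def ema_update(avg_weights, new_weights, decay=0.99):
    #Exponential moving average: the running average takes only a small
    #step toward the latest weights, damping batch-to-batch noise.
    return [decay * a + (1.0 - decay) * w
            for a, w in zip(avg_weights, new_weights)]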
Stochastic Averaging
Stochastic Weight Averaging (SWA) converges to wider optima, and in doing so it resembles a geometric ensemble. SWA is a simple method to improve model performance when used as a wrapper around other optimizers: it averages the weights from different points along the inner optimizer's trajectory.
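Unlike the exponential average above, SWA keeps a uniform running mean of the weights sampled at different points of the trajectory. A minimal sketch (plain Python; swa_weights and n_averaged are illustrative names, not the tfa API):

def swa_update(swa_weights, new_weights, n_averaged):
    #Uniform running mean: every sampled point of the trajectory
    #contributes equally to the final averaged weights.
    return [(a * n_averaged + w) / (n_averaged + 1)
            for a, w in zip(swa_weights, new_weights)]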
Model Average Checkpoint
callbacks.ModelCheckpoint doesn't give you the option to save the moving average weights in the middle of training, which is why model averaging optimizers require a custom callback. Using the update_weights parameter, AverageModelCheckpoint lets you either (see the sketch after this list):
- Assign the moving average weights to the model and save them, or
- Keep the old non-averaged weights in the model, while the saved checkpoint uses the averaged weights.
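A minimal sketch of the two settings (the file paths are hypothetical; tfa is tensorflow_addons, imported in the Setup section below):

#update_weights=True: the averaged weights are assigned to the model
#before saving, so the in-memory model ends up averaged as well.
avg_ckpt = tfa.callbacks.AverageModelCheckpoint(filepath='./avg_ckpt',
                                                update_weights=True)
#update_weights=False: the model keeps its raw weights, but the
#checkpoint written to disk contains the averaged weights.
raw_ckpt = tfa.callbacks.AverageModelCheckpoint(filepath='./raw_ckpt',
                                                update_weights=False)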
Setup
pip install -U tensorflow-addons
import tensorflow as tf
import tensorflow_addons as tfa
import numpy as np
import os
Build Model
def create_model(opt):
    #A small fully connected classifier for Fashion MNIST.
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
    model.compile(optimizer=opt,
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model
Prepare Dataset
#Load Fashion MNIST dataset
train, test = tf.keras.datasets.fashion_mnist.load_data()
images, labels = train
images = images/255.0
labels = labels.astype(np.int32)
fmnist_train_ds = tf.data.Dataset.from_tensor_slices((images, labels))
fmnist_train_ds = fmnist_train_ds.shuffle(5000).batch(32)
test_images, test_labels = test
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
40960/29515 [=========================================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
26435584/26421880 [==============================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
16384/5148 [==============================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz
4431872/4422102 [==============================] - 0s 0us/step
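Note that only the training images are scaled to [0, 1] above; the test images stay in the 0-255 range, which is why the evaluation loss reported later is so large. If you want matching preprocessing, a one-line fix (this would change the evaluation numbers shown below):

test_images = test_images / 255.0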
We will compare three optimizers here:
- Unwrapped SGD
- SGD with Moving Average
- SGD with Stochastic Weight Averaging
and see how each performs with the same model.
#Optimizers
sgd = tf.keras.optimizers.SGD(0.01)
moving_avg_sgd = tfa.optimizers.MovingAverage(sgd)
stochastic_avg_sgd = tfa.optimizers.SWA(sgd)
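Both wrappers also accept tuning parameters; the values below are illustrative, not the settings used in this notebook:

#Decay rate of the exponential moving average of the weights.
moving_avg_sgd = tfa.optimizers.MovingAverage(sgd, average_decay=0.99)
#Start averaging at step 0 and sample the weights every 10 steps.
stochastic_avg_sgd = tfa.optimizers.SWA(sgd, start_averaging=0,
                                        average_period=10)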
Both the MovingAverage and SWA optimizers use AverageModelCheckpoint.
#Callback
checkpoint_path = "./training/cp-{epoch:04d}.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)

cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_dir,
                                                 save_weights_only=True,
                                                 verbose=1)
avg_callback = tfa.callbacks.AverageModelCheckpoint(filepath=checkpoint_dir,
                                                    update_weights=True)
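As an alternative to the callback, the averaging wrappers expose assign_average_vars, which copies the averaged values into the model's variables so any standard Keras saving mechanism works. A minimal sketch, assuming a model compiled with one of the wrapped optimizers:

#After training: overwrite the model's variables with the averaged
#weights, then save through the usual Keras API.
moving_avg_sgd.assign_average_vars(model.variables)
model.save_weights('./training/averaged')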
Train Model
Vanilla SGD optimizer
#Build Model
model = create_model(sgd)
#Train the network
model.fit(fmnist_train_ds, epochs=5, callbacks=[cp_callback])
Epoch 1/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.8031 - accuracy: 0.7282
Epoch 00001: saving model to ./training
Epoch 2/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.5049 - accuracy: 0.8240
Epoch 00002: saving model to ./training
Epoch 3/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.4591 - accuracy: 0.8375
Epoch 00003: saving model to ./training
Epoch 4/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.4328 - accuracy: 0.8492
Epoch 00004: saving model to ./training
Epoch 5/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.4128 - accuracy: 0.8561
Epoch 00005: saving model to ./training
<keras.callbacks.History at 0x7fc9d0262250>
#Evaluate results
model.load_weights(checkpoint_dir)
loss, accuracy = model.evaluate(test_images, test_labels, batch_size=32, verbose=2)
print("Loss :", loss)
print("Accuracy :", accuracy)
313/313 - 0s - loss: 95.4645 - accuracy: 0.7796
Loss : 95.46446990966797
Accuracy : 0.7796000242233276
Moving Average SGD
#Build Model
model = create_model(moving_avg_sgd)
#Train the network
model.fit(fmnist_train_ds, epochs=5, callbacks=[avg_callback])
Epoch 1/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.8064 - accuracy: 0.7303
2021-09-02 00:35:29.787996: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
INFO:tensorflow:Assets written to: ./training/assets
Epoch 2/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.5114 - accuracy: 0.8223
INFO:tensorflow:Assets written to: ./training/assets
Epoch 3/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.4620 - accuracy: 0.8382
INFO:tensorflow:Assets written to: ./training/assets
Epoch 4/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.4345 - accuracy: 0.8470
INFO:tensorflow:Assets written to: ./training/assets
Epoch 5/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.4146 - accuracy: 0.8547
INFO:tensorflow:Assets written to: ./training/assets
<keras.callbacks.History at 0x7fc8e16f30d0>
#Evaluate results
model.load_weights(checkpoint_dir)
loss, accuracy = model.evaluate(test_images, test_labels, batch_size=32, verbose=2)
print("Loss :", loss)
print("Accuracy :", accuracy)
313/313 - 0s - loss: 95.4645 - accuracy: 0.7796
Loss : 95.46446990966797
Accuracy : 0.7796000242233276
Stochastic Weight Average SGD
#Build Model
model = create_model(stochastic_avg_sgd)
#Train the network
model.fit(fmnist_train_ds, epochs=5, callbacks=[avg_callback])
Epoch 1/5
1875/1875 [==============================] - 5s 2ms/step - loss: 0.7896 - accuracy: 0.7350
INFO:tensorflow:Assets written to: ./training/assets
Epoch 2/5
1875/1875 [==============================] - 5s 2ms/step - loss: 0.5670 - accuracy: 0.8065
INFO:tensorflow:Assets written to: ./training/assets
Epoch 3/5
1875/1875 [==============================] - 5s 2ms/step - loss: 0.5345 - accuracy: 0.8142
INFO:tensorflow:Assets written to: ./training/assets
Epoch 4/5
1875/1875 [==============================] - 5s 2ms/step - loss: 0.5194 - accuracy: 0.8188
INFO:tensorflow:Assets written to: ./training/assets
Epoch 5/5
1875/1875 [==============================] - 5s 2ms/step - loss: 0.5089 - accuracy: 0.8235
INFO:tensorflow:Assets written to: ./training/assets
<keras.callbacks.History at 0x7fc8e0538790>
#Evaluate results
model.load_weights(checkpoint_dir)
loss, accuracy = model.evaluate(test_images, test_labels, batch_size=32, verbose=2)
print("Loss :", loss)
print("Accuracy :", accuracy)
313/313 - 0s - loss: 95.4645 - accuracy: 0.7796
Loss : 95.46446990966797
Accuracy : 0.7796000242233276