Ver no TensorFlow.org | Executar no Google Colab | Ver fonte no GitHub | Baixar caderno |
Visão geral
Este bloco de notas demonstra como usar o Moving Average Optimizer junto com o Model Average Checkpoint do pacote de complementos tensorflow.
Média móvel
A vantagem de Moving Averaging é que eles são menos propensos a mudanças de perda excessivas ou representação irregular de dados no lote mais recente. Dá uma ideia mais suave e geral do treinamento do modelo até certo ponto.
Média Estocástica
A Média de Peso Estocástica converge para ótimos mais amplos. Ao fazer isso, ele se assemelha a um conjunto geométrico. SWA é um método simples para melhorar o desempenho do modelo quando usado como um invólucro em torno de outros otimizadores e calculando a média dos resultados de diferentes pontos da trajetória do otimizador interno.
Ponto de verificação médio do modelo
callbacks.ModelCheckpoint
não lhe dá a opção de salvar movendo pesos médios no meio da formação, razão pela qual modelo de média Optimizers exigiu um retorno de chamada personalizada. Usando oupdate_weights
parâmetro,ModelAverageCheckpoint
permite-lhe:
- Atribua os pesos médios móveis ao modelo e salve-os.
- Mantenha os pesos não médios antigos, mas o modelo salvo usa os pesos médios.
Configurar
pip install -U tensorflow-addons
import tensorflow as tf
import tensorflow_addons as tfa
import numpy as np
import os
Modelo de construção
def create_model(opt):
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(64, activation='relu'),
tf.keras.layers.Dense(64, activation='relu'),
tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer=opt,
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
return model
Prepare o conjunto de dados
#Load Fashion MNIST dataset
train, test = tf.keras.datasets.fashion_mnist.load_data()
images, labels = train
images = images/255.0
labels = labels.astype(np.int32)
fmnist_train_ds = tf.data.Dataset.from_tensor_slices((images, labels))
fmnist_train_ds = fmnist_train_ds.shuffle(5000).batch(32)
test_images, test_labels = test
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz 32768/29515 [=================================] - 0s 0us/step 40960/29515 [=========================================] - 0s 0us/step Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz 26427392/26421880 [==============================] - 0s 0us/step 26435584/26421880 [==============================] - 0s 0us/step Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz 16384/5148 [===============================================================================================] - 0s 0us/step Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz 4423680/4422102 [==============================] - 0s 0us/step 4431872/4422102 [==============================] - 0s 0us/step
Estaremos comparando três otimizadores aqui:
- SGD desembrulhado
- SGD com média móvel
- SGD com Média de Peso Estocástico
E veja como eles funcionam com o mesmo modelo.
#Optimizers
sgd = tf.keras.optimizers.SGD(0.01)
moving_avg_sgd = tfa.optimizers.MovingAverage(sgd)
stocastic_avg_sgd = tfa.optimizers.SWA(sgd)
Ambos os MovingAverage
e StocasticAverage
optimers usar ModelAverageCheckpoint
.
#Callback
checkpoint_path = "./training/cp-{epoch:04d}.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_dir,
save_weights_only=True,
verbose=1)
avg_callback = tfa.callbacks.AverageModelCheckpoint(filepath=checkpoint_dir,
update_weights=True)
Modelo de trem
Vanilla SGD Optimizer
#Build Model
model = create_model(sgd)
#Train the network
model.fit(fmnist_train_ds, epochs=5, callbacks=[cp_callback])
Epoch 1/5 1875/1875 [==============================] - 4s 2ms/step - loss: 0.8031 - accuracy: 0.7282 Epoch 00001: saving model to ./training Epoch 2/5 1875/1875 [==============================] - 3s 2ms/step - loss: 0.5049 - accuracy: 0.8240 Epoch 00002: saving model to ./training Epoch 3/5 1875/1875 [==============================] - 3s 2ms/step - loss: 0.4591 - accuracy: 0.8375 Epoch 00003: saving model to ./training Epoch 4/5 1875/1875 [==============================] - 3s 2ms/step - loss: 0.4328 - accuracy: 0.8492 Epoch 00004: saving model to ./training Epoch 5/5 1875/1875 [==============================] - 3s 2ms/step - loss: 0.4128 - accuracy: 0.8561 Epoch 00005: saving model to ./training <keras.callbacks.History at 0x7fc9d0262250>
#Evalute results
model.load_weights(checkpoint_dir)
loss, accuracy = model.evaluate(test_images, test_labels, batch_size=32, verbose=2)
print("Loss :", loss)
print("Accuracy :", accuracy)
313/313 - 0s - loss: 95.4645 - accuracy: 0.7796 Loss : 95.46446990966797 Accuracy : 0.7796000242233276
Média Móvel SGD
#Build Model
model = create_model(moving_avg_sgd)
#Train the network
model.fit(fmnist_train_ds, epochs=5, callbacks=[avg_callback])
Epoch 1/5 1875/1875 [==============================] - 4s 2ms/step - loss: 0.8064 - accuracy: 0.7303 2021-09-02 00:35:29.787996: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them. INFO:tensorflow:Assets written to: ./training/assets Epoch 2/5 1875/1875 [==============================] - 4s 2ms/step - loss: 0.5114 - accuracy: 0.8223 INFO:tensorflow:Assets written to: ./training/assets Epoch 3/5 1875/1875 [==============================] - 4s 2ms/step - loss: 0.4620 - accuracy: 0.8382 INFO:tensorflow:Assets written to: ./training/assets Epoch 4/5 1875/1875 [==============================] - 4s 2ms/step - loss: 0.4345 - accuracy: 0.8470 INFO:tensorflow:Assets written to: ./training/assets Epoch 5/5 1875/1875 [==============================] - 4s 2ms/step - loss: 0.4146 - accuracy: 0.8547 INFO:tensorflow:Assets written to: ./training/assets <keras.callbacks.History at 0x7fc8e16f30d0>
#Evalute results
model.load_weights(checkpoint_dir)
loss, accuracy = model.evaluate(test_images, test_labels, batch_size=32, verbose=2)
print("Loss :", loss)
print("Accuracy :", accuracy)
313/313 - 0s - loss: 95.4645 - accuracy: 0.7796 Loss : 95.46446990966797 Accuracy : 0.7796000242233276
Peso estocástico SGD médio
#Build Model
model = create_model(stocastic_avg_sgd)
#Train the network
model.fit(fmnist_train_ds, epochs=5, callbacks=[avg_callback])
Epoch 1/5 1875/1875 [==============================] - 5s 2ms/step - loss: 0.7896 - accuracy: 0.7350 INFO:tensorflow:Assets written to: ./training/assets Epoch 2/5 1875/1875 [==============================] - 5s 2ms/step - loss: 0.5670 - accuracy: 0.8065 INFO:tensorflow:Assets written to: ./training/assets Epoch 3/5 1875/1875 [==============================] - 5s 2ms/step - loss: 0.5345 - accuracy: 0.8142 INFO:tensorflow:Assets written to: ./training/assets Epoch 4/5 1875/1875 [==============================] - 5s 2ms/step - loss: 0.5194 - accuracy: 0.8188 INFO:tensorflow:Assets written to: ./training/assets Epoch 5/5 1875/1875 [==============================] - 5s 2ms/step - loss: 0.5089 - accuracy: 0.8235 INFO:tensorflow:Assets written to: ./training/assets <keras.callbacks.History at 0x7fc8e0538790>
#Evalute results
model.load_weights(checkpoint_dir)
loss, accuracy = model.evaluate(test_images, test_labels, batch_size=32, verbose=2)
print("Loss :", loss)
print("Accuracy :", accuracy)
313/313 - 0s - loss: 95.4645 - accuracy: 0.7796 Loss : 95.46446990966797 Accuracy : 0.7796000242233276