Uśrednianie modelu

Zobacz na TensorFlow.org Uruchom w Google Colab Wyświetl źródło na GitHub Pobierz notatnik

Przegląd

Ten notatnik pokazuje, jak używać Moving Average Optimizer wraz z pakietem Model Average Checkpoint z pakietu dodatków tensorflow.

Średnia krocząca

Zaletą uśredniania kroczącego jest to, że są one mniej podatne na gwałtowne zmiany strat lub nieregularną reprezentację danych w najnowszej partii. Daje to wygładzone i bardziej ogólne wyobrażenie o treningu modelu aż do pewnego momentu.

Uśrednianie stochastyczne

Stochastyczne uśrednianie wagi zbiega się do szerszego optima. W ten sposób przypomina układanie geometryczne. SWA to prosta metoda poprawy wydajności modelu, gdy jest używana jako opakowanie wokół innych optymalizatorów i uśrednianie wyników z różnych punktów trajektorii wewnętrznego optymalizatora.

Model Średni Punkt Kontrolny

callbacks.ModelCheckpoint nie daje możliwość zapisywania ruchomych średnie ciężary w środku treningu, dlatego model Średnie optymalizacja wymaga niestandardowego zwrotnego. Korzystanie z update_weights parametru ModelAverageCheckpoint pozwala na:

  1. Przypisz do modelu wagi średniej ruchomej i zapisz je.
  2. Zachowaj stare nieśrednie wagi, ale zapisany model używa wag średnich.

Ustawiać

pip install -U tensorflow-addons
import tensorflow as tf
import tensorflow_addons as tfa
import numpy as np
import os

Zbuduj model

def create_model(opt):
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(),                         
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax')
    ])

    model.compile(optimizer=opt,
                    loss='sparse_categorical_crossentropy',
                    metrics=['accuracy'])

    return model

Przygotuj zbiór danych

#Load Fashion MNIST dataset
train, test = tf.keras.datasets.fashion_mnist.load_data()

images, labels = train
images = images/255.0
labels = labels.astype(np.int32)

fmnist_train_ds = tf.data.Dataset.from_tensor_slices((images, labels))
fmnist_train_ds = fmnist_train_ds.shuffle(5000).batch(32)

test_images, test_labels = test
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
32768/29515 [=================================] - 0s 0us/step
40960/29515 [=========================================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
26427392/26421880 [==============================] - 0s 0us/step
26435584/26421880 [==============================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
16384/5148 [===============================================================================================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz
4423680/4422102 [==============================] - 0s 0us/step
4431872/4422102 [==============================] - 0s 0us/step

Porównamy tutaj trzy optymalizatory:

  • Nieopakowane SGD
  • SGD ze średnią kroczącą
  • SGD ze stochastycznym uśrednianiem wagi

I zobacz, jak radzą sobie z tym samym modelem.

#Optimizers 
sgd = tf.keras.optimizers.SGD(0.01)
moving_avg_sgd = tfa.optimizers.MovingAverage(sgd)
stocastic_avg_sgd = tfa.optimizers.SWA(sgd)

Oba MovingAverage i StocasticAverage optimers używać ModelAverageCheckpoint .

#Callback 
checkpoint_path = "./training/cp-{epoch:04d}.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)

cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_dir,
                                                 save_weights_only=True,
                                                 verbose=1)
avg_callback = tfa.callbacks.AverageModelCheckpoint(filepath=checkpoint_dir, 
                                                    update_weights=True)

Model pociągu

Optymalizator waniliowy SGD

#Build Model
model = create_model(sgd)

#Train the network
model.fit(fmnist_train_ds, epochs=5, callbacks=[cp_callback])
Epoch 1/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.8031 - accuracy: 0.7282

Epoch 00001: saving model to ./training
Epoch 2/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.5049 - accuracy: 0.8240

Epoch 00002: saving model to ./training
Epoch 3/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.4591 - accuracy: 0.8375

Epoch 00003: saving model to ./training
Epoch 4/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.4328 - accuracy: 0.8492

Epoch 00004: saving model to ./training
Epoch 5/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.4128 - accuracy: 0.8561

Epoch 00005: saving model to ./training
<keras.callbacks.History at 0x7fc9d0262250>
#Evalute results
model.load_weights(checkpoint_dir)
loss, accuracy = model.evaluate(test_images, test_labels, batch_size=32, verbose=2)
print("Loss :", loss)
print("Accuracy :", accuracy)
313/313 - 0s - loss: 95.4645 - accuracy: 0.7796
Loss : 95.46446990966797
Accuracy : 0.7796000242233276

Średnia krocząca SGD

#Build Model
model = create_model(moving_avg_sgd)

#Train the network
model.fit(fmnist_train_ds, epochs=5, callbacks=[avg_callback])
Epoch 1/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.8064 - accuracy: 0.7303
2021-09-02 00:35:29.787996: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
INFO:tensorflow:Assets written to: ./training/assets
Epoch 2/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.5114 - accuracy: 0.8223
INFO:tensorflow:Assets written to: ./training/assets
Epoch 3/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.4620 - accuracy: 0.8382
INFO:tensorflow:Assets written to: ./training/assets
Epoch 4/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.4345 - accuracy: 0.8470
INFO:tensorflow:Assets written to: ./training/assets
Epoch 5/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.4146 - accuracy: 0.8547
INFO:tensorflow:Assets written to: ./training/assets
<keras.callbacks.History at 0x7fc8e16f30d0>
#Evalute results
model.load_weights(checkpoint_dir)
loss, accuracy = model.evaluate(test_images, test_labels, batch_size=32, verbose=2)
print("Loss :", loss)
print("Accuracy :", accuracy)
313/313 - 0s - loss: 95.4645 - accuracy: 0.7796
Loss : 95.46446990966797
Accuracy : 0.7796000242233276

Średnia waga Stokastyczna SGD

#Build Model
model = create_model(stocastic_avg_sgd)

#Train the network
model.fit(fmnist_train_ds, epochs=5, callbacks=[avg_callback])
Epoch 1/5
1875/1875 [==============================] - 5s 2ms/step - loss: 0.7896 - accuracy: 0.7350
INFO:tensorflow:Assets written to: ./training/assets
Epoch 2/5
1875/1875 [==============================] - 5s 2ms/step - loss: 0.5670 - accuracy: 0.8065
INFO:tensorflow:Assets written to: ./training/assets
Epoch 3/5
1875/1875 [==============================] - 5s 2ms/step - loss: 0.5345 - accuracy: 0.8142
INFO:tensorflow:Assets written to: ./training/assets
Epoch 4/5
1875/1875 [==============================] - 5s 2ms/step - loss: 0.5194 - accuracy: 0.8188
INFO:tensorflow:Assets written to: ./training/assets
Epoch 5/5
1875/1875 [==============================] - 5s 2ms/step - loss: 0.5089 - accuracy: 0.8235
INFO:tensorflow:Assets written to: ./training/assets
<keras.callbacks.History at 0x7fc8e0538790>
#Evalute results
model.load_weights(checkpoint_dir)
loss, accuracy = model.evaluate(test_images, test_labels, batch_size=32, verbose=2)
print("Loss :", loss)
print("Accuracy :", accuracy)
313/313 - 0s - loss: 95.4645 - accuracy: 0.7796
Loss : 95.46446990966797
Accuracy : 0.7796000242233276