View on TensorFlow.org | Run in Google Colab | View source on GitHub | Download notebook
Overview
This notebook demonstrates how to use a Moving Average Optimizer together with the Model Average Checkpoint from the TensorFlow Addons package.
Moving Averaging
The advantage of moving averaging is that the averaged weights are less prone to abrupt loss shifts or an irregular data representation in the latest batch. Up to a point, it gives a smoother, more general picture of the model's training.
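As a minimal sketch of the idea (plain NumPy, not the tfa implementation), the averaged copy of a weight follows an exponential decay rule, so it lags and smooths the raw trajectory:

```python
import numpy as np

def ema_update(avg, w, decay=0.5):
    # Exponential moving average of a weight:
    # avg <- decay * avg + (1 - decay) * w
    return decay * avg + (1.0 - decay) * w

# Pretend these are one weight's values after three training steps.
steps = [np.array([1.0]), np.array([2.0]), np.array([3.0])]
avg = steps[0].copy()
for w in steps[1:]:
    avg = ema_update(avg, w)

print(avg)  # -> [2.25]; the average lags the raw weights, smoothing jumps
```

A larger `decay` (tfa defaults to values near 0.99) makes the average change more slowly and therefore smooths more aggressively.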
Stochastic Averaging
Stochastic Weight Averaging (SWA) converges to wider optima; in doing so, it resembles geometric ensembling. SWA is a simple method to improve model performance when used as a wrapper around another optimizer, averaging results from different points of the inner optimizer's trajectory.
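A hedged sketch of the idea (plain NumPy, not tfa.optimizers.SWA itself): the averaged model is the running arithmetic mean of weight snapshots taken along the optimizer's trajectory:

```python
import numpy as np

def swa_update(avg, w, n_models):
    # Running arithmetic mean over the n_models snapshots seen so far.
    return (avg * n_models + w) / (n_models + 1)

# Hypothetical weight snapshots from different points of the trajectory.
snapshots = [np.array([1.0, 3.0]), np.array([2.0, 5.0]), np.array([3.0, 7.0])]
avg = snapshots[0].copy()
for i, w in enumerate(snapshots[1:], start=1):
    avg = swa_update(avg, w, i)

print(avg)  # -> [2. 5.], the element-wise mean of the three snapshots
```

Unlike the exponential moving average, every snapshot contributes equally, which is what lets SWA settle in flatter, wider regions of the loss surface.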
Model Average Checkpoint
callbacks.ModelCheckpoint doesn't give you the option to save the moving average weights in the middle of training, which is why Moving Average Optimizers require a custom callback. Using the update_weights parameter, ModelAverageCheckpoint allows you to:
- Assign the moving average weights to the model, and save them.
- Keep the old non-averaged weights, while the saved model uses the averaged weights.
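To make the two update_weights behaviours concrete, here is a toy sketch (plain Python with made-up weight values; save_checkpoint is a hypothetical stand-in for the callback, not the tfa API):

```python
raw_w = 3.0   # current (non-averaged) training weights
avg_w = 2.0   # moving-average copy maintained by the optimizer

def save_checkpoint(update_weights):
    # Returns (weights the model keeps, weights written to the checkpoint).
    if update_weights:
        # update_weights=True: averaged weights are assigned to the
        # model before saving, so model and checkpoint agree.
        return avg_w, avg_w
    # update_weights=False: the model keeps its raw weights;
    # only the checkpoint receives the averaged copy.
    return raw_w, avg_w

model_w, saved_w = save_checkpoint(update_weights=False)
```

With update_weights=False, training continues from the raw weights while the artifact on disk holds the smoother averaged weights.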
Setup
pip install -q -U tensorflow-addons
import tensorflow as tf
import tensorflow_addons as tfa
import numpy as np
import os
Build Model
def create_model(opt):
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
    model.compile(optimizer=opt,
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model
Prepare Dataset
#Load Fashion MNIST dataset
train, test = tf.keras.datasets.fashion_mnist.load_data()
images, labels = train
images = images/255.0
labels = labels.astype(np.int32)
fmnist_train_ds = tf.data.Dataset.from_tensor_slices((images, labels))
fmnist_train_ds = fmnist_train_ds.shuffle(5000).batch(32)
test_images, test_labels = test
Here we will compare three optimizers:
- Vanilla SGD (unwrapped)
- SGD with Moving Average
- SGD with Stochastic Weight Averaging
and see how they perform with the same model.
#Optimizers
sgd = tf.keras.optimizers.SGD(0.01)
moving_avg_sgd = tfa.optimizers.MovingAverage(sgd)
stocastic_avg_sgd = tfa.optimizers.SWA(sgd)
Both the MovingAverage and StochasticAverage optimizers use ModelAverageCheckpoint.
#Callback
checkpoint_path = "./training/cp-{epoch:04d}.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_dir,
                                                 save_weights_only=True,
                                                 verbose=1)
avg_callback = tfa.callbacks.AverageModelCheckpoint(filepath=checkpoint_dir,
                                                    update_weights=True)
Train Model
Vanilla SGD Optimizer
#Build Model
model = create_model(sgd)
#Train the network
model.fit(fmnist_train_ds, epochs=5, callbacks=[cp_callback])
Epoch 1/5
1875/1875 [==============================] - 4s 2ms/step - loss: 1.0748 - accuracy: 0.6571
Epoch 00001: saving model to ./training
Epoch 2/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.5133 - accuracy: 0.8224
Epoch 00002: saving model to ./training
Epoch 3/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.4605 - accuracy: 0.8380
Epoch 00003: saving model to ./training
Epoch 4/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.4315 - accuracy: 0.8469
Epoch 00004: saving model to ./training
Epoch 5/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.4078 - accuracy: 0.8563
Epoch 00005: saving model to ./training
<tensorflow.python.keras.callbacks.History at 0x7fb839e9fd68>
#Evaluate results
model.load_weights(checkpoint_dir)
loss, accuracy = model.evaluate(test_images, test_labels, batch_size=32, verbose=2)
print("Loss :", loss)
print("Accuracy :", accuracy)
313/313 - 0s - loss: 79.6013 - accuracy: 0.8019
Loss : 79.60128021240234
Accuracy : 0.8019000291824341
Moving Average SGD
#Build Model
model = create_model(moving_avg_sgd)
#Train the network
model.fit(fmnist_train_ds, epochs=5, callbacks=[avg_callback])
Epoch 1/5
1875/1875 [==============================] - 5s 2ms/step - loss: 1.1034 - accuracy: 0.6502
INFO:tensorflow:Assets written to: ./training/assets
Epoch 2/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.5254 - accuracy: 0.8154
INFO:tensorflow:Assets written to: ./training/assets
Epoch 3/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.4718 - accuracy: 0.8335
INFO:tensorflow:Assets written to: ./training/assets
Epoch 4/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.4436 - accuracy: 0.8423
INFO:tensorflow:Assets written to: ./training/assets
Epoch 5/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.4221 - accuracy: 0.8531
INFO:tensorflow:Assets written to: ./training/assets
<tensorflow.python.keras.callbacks.History at 0x7fb839f0d630>
#Evaluate results
model.load_weights(checkpoint_dir)
loss, accuracy = model.evaluate(test_images, test_labels, batch_size=32, verbose=2)
print("Loss :", loss)
print("Accuracy :", accuracy)
313/313 - 0s - loss: 79.6013 - accuracy: 0.8019
Loss : 79.60128021240234
Accuracy : 0.8019000291824341
Stochastic Weight Average SGD
#Build Model
model = create_model(stocastic_avg_sgd)
#Train the network
model.fit(fmnist_train_ds, epochs=5, callbacks=[avg_callback])
Epoch 1/5
1875/1875 [==============================] - 6s 3ms/step - loss: 1.1160 - accuracy: 0.6463
INFO:tensorflow:Assets written to: ./training/assets
Epoch 2/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.6035 - accuracy: 0.7968
INFO:tensorflow:Assets written to: ./training/assets
Epoch 3/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.5594 - accuracy: 0.8102
INFO:tensorflow:Assets written to: ./training/assets
Epoch 4/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.5365 - accuracy: 0.8170
INFO:tensorflow:Assets written to: ./training/assets
Epoch 5/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.5239 - accuracy: 0.8199
INFO:tensorflow:Assets written to: ./training/assets
<tensorflow.python.keras.callbacks.History at 0x7fb7ac51dac8>
#Evaluate results
model.load_weights(checkpoint_dir)
loss, accuracy = model.evaluate(test_images, test_labels, batch_size=32, verbose=2)
print("Loss :", loss)
print("Accuracy :", accuracy)
313/313 - 0s - loss: 79.6013 - accuracy: 0.8019
Loss : 79.60128021240234
Accuracy : 0.8019000291824341