Overview
This notebook demonstrates how to use the Conditional Gradient optimizer from the Addons package.
ConditionalGradient
Constraining the parameters of a neural network has been shown to be beneficial in training because of the underlying regularization effects. Often, parameters are constrained via a soft penalty (which never guarantees constraint satisfaction) or via a projection operation (which is computationally expensive). The Conditional Gradient (CG) optimizer, on the other hand, enforces the constraints strictly without the need for an expensive projection step. It works by minimizing a linear approximation of the objective within the constraint set. This notebook demonstrates the application of a Frobenius norm constraint via the CG optimizer on the MNIST dataset. CG is available as a TensorFlow Addons API. More details of the optimizer are available at https://arxiv.org/pdf/1803.06453.pdf
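For intuition, the sketch below illustrates a Frank-Wolfe style step for a Frobenius-norm ball of radius lambda_: the new iterate is a convex combination of the current weights and the constraint-set vertex -lambda_ * g / ||g||_F. This is a rough sketch only, not the exact tfa.optimizers.ConditionalGradient implementation; the function name cg_update_sketch and the epsilon term are illustrative assumptions.

import tensorflow as tf

def cg_update_sketch(w, g, learning_rate, lambda_, epsilon=1e-7):
    # Frobenius norm of the gradient tensor.
    g_norm = tf.norm(g)
    # Convex combination of the current weights and the minimizer of the
    # linear approximation over the Frobenius-norm ball of radius lambda_:
    #   w <- learning_rate * w - (1 - learning_rate) * lambda_ * g / ||g||_F
    return learning_rate * w - (1.0 - learning_rate) * lambda_ * g / (g_norm + epsilon)

With learning_rate close to 1 (as in this tutorial), each step moves the weights only slightly toward the constraint-set vertex, which keeps them inside the Frobenius-norm ball.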
Setup
pip install -q -U tensorflow-addons
import tensorflow as tf
import tensorflow_addons as tfa
from matplotlib import pyplot as plt
# Hyperparameters
batch_size=64
epochs=10
Build the model
model_1 = tf.keras.Sequential([
    tf.keras.layers.Dense(64, input_shape=(784,), activation='relu', name='dense_1'),
    tf.keras.layers.Dense(64, activation='relu', name='dense_2'),
    tf.keras.layers.Dense(10, activation='softmax', name='predictions'),
])
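Optionally, you can inspect the architecture; with two 64-unit hidden layers the model has 784*64+64 + 64*64+64 + 64*10+10 = 55,050 trainable parameters.

# Print layer shapes and parameter counts (expected total: 55,050).
model_1.summary()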
Prepare the data
# Load MNIST dataset as NumPy arrays
dataset = {}
num_validation = 10000
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# Preprocess the data
x_train = x_train.reshape(-1, 784).astype('float32') / 255
x_test = x_test.reshape(-1, 784).astype('float32') / 255
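A quick shape check (illustrative; the standard MNIST split has 60,000 training and 10,000 test examples, each flattened to a 784-dimensional vector):

print(x_train.shape, y_train.shape)  # (60000, 784) (60000,)
print(x_test.shape, y_test.shape)    # (10000, 784) (10000,)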
Define a custom callback function
def frobenius_norm(m):
    """Calculate the Frobenius norm over a list of weight tensors.

    Args:
        m: a list of weight tensors, one per layer.
    """
    total_reduce_sum = 0
    for i in range(len(m)):
        total_reduce_sum = total_reduce_sum + tf.math.reduce_sum(m[i]**2)
    norm = total_reduce_sum**0.5
    return norm
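As a sanity check, tf.linalg.global_norm computes the same quantity (the square root of the sum of squared L2 norms over a list of tensors), so the two should agree; the toy tensors below are just an example:

weights = [tf.ones((3, 4)), tf.ones((4, 2))]
print(frobenius_norm(weights).numpy())         # sqrt(12 + 8) = 4.472136
print(tf.linalg.global_norm(weights).numpy())  # same value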
# Record the Frobenius norm of all trainable weights at the end of each epoch.
CG_frobenius_norm_of_weight = []
CG_get_weight_norm = tf.keras.callbacks.LambdaCallback(
    on_epoch_end=lambda epoch, logs: CG_frobenius_norm_of_weight.append(
        frobenius_norm(model_1.trainable_weights).numpy()))
Train and evaluate: using CG as the optimizer
Simply replace typical Keras optimizers with the new tfa optimizer.
# Compile the model
model_1.compile(
    optimizer=tfa.optimizers.ConditionalGradient(
        learning_rate=0.99949, lambda_=203),  # Utilize TFA optimizer
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=['accuracy'])

history_cg = model_1.fit(
    x_train,
    y_train,
    batch_size=batch_size,
    validation_data=(x_test, y_test),
    epochs=epochs,
    callbacks=[CG_get_weight_norm])
Epoch 1/10 938/938 [==============================] - 3s 3ms/step - loss: 0.3775 - accuracy: 0.8859 - val_loss: 0.2121 - val_accuracy: 0.9358
Epoch 2/10 938/938 [==============================] - 2s 3ms/step - loss: 0.1916 - accuracy: 0.9423 - val_loss: 0.1583 - val_accuracy: 0.9516
Epoch 3/10 938/938 [==============================] - 2s 3ms/step - loss: 0.1533 - accuracy: 0.9540 - val_loss: 0.1763 - val_accuracy: 0.9428
Epoch 4/10 938/938 [==============================] - 2s 3ms/step - loss: 0.1347 - accuracy: 0.9595 - val_loss: 0.1292 - val_accuracy: 0.9601
Epoch 5/10 938/938 [==============================] - 2s 3ms/step - loss: 0.1226 - accuracy: 0.9627 - val_loss: 0.1129 - val_accuracy: 0.9661
Epoch 6/10 938/938 [==============================] - 2s 3ms/step - loss: 0.1164 - accuracy: 0.9639 - val_loss: 0.1418 - val_accuracy: 0.9586
Epoch 7/10 938/938 [==============================] - 2s 3ms/step - loss: 0.1112 - accuracy: 0.9659 - val_loss: 0.1108 - val_accuracy: 0.9643
Epoch 8/10 938/938 [==============================] - 2s 3ms/step - loss: 0.1089 - accuracy: 0.9666 - val_loss: 0.1114 - val_accuracy: 0.9675
Epoch 9/10 938/938 [==============================] - 2s 3ms/step - loss: 0.1057 - accuracy: 0.9677 - val_loss: 0.1072 - val_accuracy: 0.9654
Epoch 10/10 938/938 [==============================] - 2s 3ms/step - loss: 0.1039 - accuracy: 0.9683 - val_loss: 0.1197 - val_accuracy: 0.9627
Train and evaluate: using SGD as the optimizer
model_2 = tf.keras.Sequential([
    tf.keras.layers.Dense(64, input_shape=(784,), activation='relu', name='dense_1'),
    tf.keras.layers.Dense(64, activation='relu', name='dense_2'),
    tf.keras.layers.Dense(10, activation='softmax', name='predictions'),
])
SGD_frobenius_norm_of_weight = []
SGD_get_weight_norm = tf.keras.callbacks.LambdaCallback(
    on_epoch_end=lambda epoch, logs: SGD_frobenius_norm_of_weight.append(
        frobenius_norm(model_2.trainable_weights).numpy()))
# Compile the model
model_2.compile(
    optimizer=tf.keras.optimizers.SGD(0.01),  # Utilize SGD optimizer
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=['accuracy'])

history_sgd = model_2.fit(
    x_train,
    y_train,
    batch_size=batch_size,
    validation_data=(x_test, y_test),
    epochs=epochs,
    callbacks=[SGD_get_weight_norm])
Epoch 1/10 938/938 [==============================] - 2s 2ms/step - loss: 1.0198 - accuracy: 0.7126 - val_loss: 0.4461 - val_accuracy: 0.8761
Epoch 2/10 938/938 [==============================] - 2s 2ms/step - loss: 0.3951 - accuracy: 0.8888 - val_loss: 0.3319 - val_accuracy: 0.9061
Epoch 3/10 938/938 [==============================] - 2s 2ms/step - loss: 0.3221 - accuracy: 0.9070 - val_loss: 0.2851 - val_accuracy: 0.9182
Epoch 4/10 938/938 [==============================] - 2s 2ms/step - loss: 0.2855 - accuracy: 0.9170 - val_loss: 0.2595 - val_accuracy: 0.9255
Epoch 5/10 938/938 [==============================] - 2s 2ms/step - loss: 0.2603 - accuracy: 0.9239 - val_loss: 0.2371 - val_accuracy: 0.9304
Epoch 6/10 938/938 [==============================] - 2s 2ms/step - loss: 0.2408 - accuracy: 0.9301 - val_loss: 0.2235 - val_accuracy: 0.9335
Epoch 7/10 938/938 [==============================] - 2s 2ms/step - loss: 0.2243 - accuracy: 0.9349 - val_loss: 0.2084 - val_accuracy: 0.9390
Epoch 8/10 938/938 [==============================] - 2s 2ms/step - loss: 0.2107 - accuracy: 0.9384 - val_loss: 0.1987 - val_accuracy: 0.9413
Epoch 9/10 938/938 [==============================] - 2s 2ms/step - loss: 0.1986 - accuracy: 0.9426 - val_loss: 0.1870 - val_accuracy: 0.9453
Epoch 10/10 938/938 [==============================] - 2s 2ms/step - loss: 0.1877 - accuracy: 0.9457 - val_loss: 0.1786 - val_accuracy: 0.9473
Frobenius norm of weights: CG vs SGD
The current implementation of the CG optimizer is based on the Frobenius norm, treating the Frobenius norm as a regularizer on the target function. Therefore, CG's regularization effect is compared against the SGD optimizer, which imposes no Frobenius norm regularizer.
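For a quick numeric check before plotting, you can compare the final recorded norms directly (values depend on the run; the callbacks above appended one value per epoch):

print('Final CG weight norm: ', CG_frobenius_norm_of_weight[-1])
print('Final SGD weight norm:', SGD_frobenius_norm_of_weight[-1])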
plt.plot(
    CG_frobenius_norm_of_weight,
    color='r',
    label='CG_frobenius_norm_of_weights')
plt.plot(
    SGD_frobenius_norm_of_weight,
    color='b',
    label='SGD_frobenius_norm_of_weights')
plt.xlabel('Epoch')
plt.ylabel('Frobenius norm of weights')
plt.legend(loc=1)
Train and validation accuracy: CG vs SGD
plt.plot(history_cg.history['accuracy'], color='r', label='CG_train')
plt.plot(history_cg.history['val_accuracy'], color='g', label='CG_test')
plt.plot(history_sgd.history['accuracy'], color='pink', label='SGD_train')
plt.plot(history_sgd.history['val_accuracy'], color='b', label='SGD_test')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(loc=4)