In TensorFlow 1, you customize training behavior by using tf.estimator.SessionRunHook with tf.estimator.Estimator. This guide shows how to migrate from SessionRunHook to TensorFlow 2 custom callbacks built on the tf.keras.callbacks.Callback API, which works with Keras Model.fit for training (and also with Model.evaluate and Model.predict). To demonstrate the migration, you will implement the same task twice, once as a SessionRunHook and once as a Callback: measuring examples per second during training.
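Examples of callbacks include checkpoint saving (tf.keras.callbacks.ModelCheckpoint) and TensorBoard summary writing. Keras callbacks are objects that are called at different points during training/evaluation/prediction in the built-in Keras Model.fit/Model.evaluate/Model.predict APIs. To learn more about callbacks, refer to the tf.keras.callbacks.Callback API docs, as well as the Writing your own callbacks and Training and evaluation with the built-in methods (the Using callbacks section) guides.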
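To make "called at different points" concrete, here is a minimal sketch, assuming a TensorFlow 2 installation. The EventRecorder class, its event names, and the tiny dataset are hypothetical and exist only for illustration; the sketch records the order in which Model.fit invokes a few of the hooks defined by tf.keras.callbacks.Callback.

```python
import tensorflow as tf


class EventRecorder(tf.keras.callbacks.Callback):
  """Hypothetical callback: records the order in which Keras invokes hooks."""

  def __init__(self):
    super().__init__()
    self.events = []

  def on_train_begin(self, logs=None):
    self.events.append("train_begin")

  def on_epoch_begin(self, epoch, logs=None):
    self.events.append("epoch_begin")

  def on_train_batch_end(self, batch, logs=None):
    self.events.append("train_batch_end")

  def on_epoch_end(self, epoch, logs=None):
    self.events.append("epoch_end")

  def on_train_end(self, logs=None):
    self.events.append("train_end")


# Illustrative data only: three examples, served as three batches of size 1.
demo_dataset = tf.data.Dataset.from_tensor_slices(
    ([[1., 1.5], [2., 2.5], [3., 3.5]], [[0.3], [0.5], [0.7]])).batch(1)

demo_model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
demo_model.compile(optimizer="sgd", loss="mse")

recorder = EventRecorder()
demo_model.fit(demo_dataset, epochs=1, callbacks=[recorder], verbose=0)
print(recorder.events)
```

With one epoch over three batches, the recorded list should start with train_begin, contain one train_batch_end entry per batch, and finish with epoch_end followed by train_end.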
Setup
Start with the imports and a simple dataset for demonstration purposes:
import tensorflow as tf
import tensorflow.compat.v1 as tf1
import time
from datetime import datetime
from absl import flags
features = [[1., 1.5], [2., 2.5], [3., 3.5]]
labels = [[0.3], [0.5], [0.7]]
eval_features = [[4., 4.5], [5., 5.5], [6., 6.5]]
eval_labels = [[0.8], [0.9], [1.]]
TensorFlow 1: Create a custom SessionRunHook with tf.estimator APIs
The following TensorFlow 1 example shows how to set up a custom SessionRunHook that measures examples per second during training. After creating the hook (LoggerHook), pass it to the hooks parameter of tf.estimator.Estimator.train.
def _input_fn():
  return tf1.data.Dataset.from_tensor_slices(
      (features, labels)).batch(1).repeat(100)

def _model_fn(features, labels, mode):
  logits = tf1.layers.Dense(1)(features)
  loss = tf1.losses.mean_squared_error(labels=labels, predictions=logits)
  optimizer = tf1.train.AdagradOptimizer(0.05)
  train_op = optimizer.minimize(loss, global_step=tf1.train.get_global_step())
  return tf1.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

class LoggerHook(tf1.train.SessionRunHook):
  """Logs examples per second every `log_frequency` steps."""

  def begin(self):
    self._step = -1
    self._start_time = time.time()
    self.log_frequency = 10

  def before_run(self, run_context):
    self._step += 1

  def after_run(self, run_context, run_values):
    if self._step % self.log_frequency == 0:
      current_time = time.time()
      duration = current_time - self._start_time
      self._start_time = current_time
      # The batch size is 1, so steps per second equals examples per second.
      examples_per_sec = self.log_frequency / duration
      print('Time:', datetime.now(), ', Step #:', self._step,
            ', Examples per second:', examples_per_sec)

estimator = tf1.estimator.Estimator(model_fn=_model_fn)

# Begin training.
estimator.train(_input_fn, hooks=[LoggerHook()])
INFO:tensorflow:Using default config.
WARNING:tensorflow:Using temporary folder as model directory: /tmpfs/tmp/tmp26pzfs8q
INFO:tensorflow:Using config: {'_model_dir': '/tmpfs/tmp/tmp26pzfs8q', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true graph_options { rewrite_options { meta_optimizer_iterations: ONE } } , '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/training/training_util.py:396: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
INFO:tensorflow:Calling model_fn.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/training/adagrad.py:138: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into /tmpfs/tmp/tmp26pzfs8q/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
Time: 2022-12-14 22:33:16.456983 , Step #: 0 , Examples per second: 2.627821170997297
INFO:tensorflow:loss = 1.4186838, step = 0
Time: 2022-12-14 22:33:16.490084 , Step #: 10 , Examples per second: 302.08099562828147
Time: 2022-12-14 22:33:16.496783 , Step #: 20 , Examples per second: 1492.63487544484
Time: 2022-12-14 22:33:16.503836 , Step #: 30 , Examples per second: 1417.6651118772393
Time: 2022-12-14 22:33:16.510666 , Step #: 40 , Examples per second: 1464.1848774698037
Time: 2022-12-14 22:33:16.517181 , Step #: 50 , Examples per second: 1534.8571010356059
Time: 2022-12-14 22:33:16.524040 , Step #: 60 , Examples per second: 1457.8235028327135
Time: 2022-12-14 22:33:16.530655 , Step #: 70 , Examples per second: 1511.7877739331027
Time: 2022-12-14 22:33:16.537104 , Step #: 80 , Examples per second: 1550.7464783524974
Time: 2022-12-14 22:33:16.543900 , Step #: 90 , Examples per second: 1471.324236152524
INFO:tensorflow:global_step/sec: 1059.29
Time: 2022-12-14 22:33:16.552185 , Step #: 100 , Examples per second: 1207.0634281109703
INFO:tensorflow:loss = 0.00024032235, step = 100 (0.095 sec)
Time: 2022-12-14 22:33:16.559951 , Step #: 110 , Examples per second: 1287.5442043222004
Time: 2022-12-14 22:33:16.566704 , Step #: 120 , Examples per second: 1480.9349622201821
Time: 2022-12-14 22:33:16.573589 , Step #: 130 , Examples per second: 1452.3213296398892
Time: 2022-12-14 22:33:16.580484 , Step #: 140 , Examples per second: 1450.4128916245936
Time: 2022-12-14 22:33:16.587762 , Step #: 150 , Examples per second: 1374.0103518312258
Time: 2022-12-14 22:33:16.594642 , Step #: 160 , Examples per second: 1453.529248683116
Time: 2022-12-14 22:33:16.601982 , Step #: 170 , Examples per second: 1362.4505440961507
Time: 2022-12-14 22:33:16.609029 , Step #: 180 , Examples per second: 1418.864043841548
Time: 2022-12-14 22:33:16.616247 , Step #: 190 , Examples per second: 1385.4475787804718
INFO:tensorflow:global_step/sec: 1373.04
Time: 2022-12-14 22:33:16.624774 , Step #: 200 , Examples per second: 1172.8054134161005
INFO:tensorflow:loss = 0.0048819412, step = 200 (0.073 sec)
Time: 2022-12-14 22:33:16.632616 , Step #: 210 , Examples per second: 1275.213280228634
Time: 2022-12-14 22:33:16.639508 , Step #: 220 , Examples per second: 1450.964818210122
Time: 2022-12-14 22:33:16.646278 , Step #: 230 , Examples per second: 1477.0236292566117
Time: 2022-12-14 22:33:16.653058 , Step #: 240 , Examples per second: 1474.8422940328421
Time: 2022-12-14 22:33:16.660016 , Step #: 250 , Examples per second: 1437.2422300654491
Time: 2022-12-14 22:33:16.666939 , Step #: 260 , Examples per second: 1444.5682796624762
Time: 2022-12-14 22:33:16.673454 , Step #: 270 , Examples per second: 1534.80093676815
Time: 2022-12-14 22:33:16.680179 , Step #: 280 , Examples per second: 1487.1309034179549
Time: 2022-12-14 22:33:16.687003 , Step #: 290 , Examples per second: 1465.25903930131
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 300...
INFO:tensorflow:Saving checkpoints for 300 into /tmpfs/tmp/tmp26pzfs8q/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 300...
INFO:tensorflow:Loss for final step: 0.004006439.
<tensorflow_estimator.python.estimator.estimator.Estimator at 0x7f6db86372b0>
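The hook above divides log_frequency by the elapsed time, which equals examples per second only because each batch holds a single example. The following framework-independent sketch shows the same windowed calculation with the batch size made explicit. ThroughputMeter, its parameters, and the fake clock are hypothetical names introduced for illustration, not part of TensorFlow. Unlike the hook, which logs on the first step, this sketch reports only after each complete window of log_frequency steps.

```python
import time


class ThroughputMeter:
  """Hypothetical helper: examples per second over fixed windows of steps."""

  def __init__(self, batch_size, log_frequency=10, clock=time.perf_counter):
    self.batch_size = batch_size
    self.log_frequency = log_frequency
    self._clock = clock  # Injectable so the calculation can be checked.
    self._steps = 0
    self._window_start = clock()

  def step(self):
    """Counts one finished step; returns a rate when a window closes."""
    self._steps += 1
    if self._steps % self.log_frequency != 0:
      return None
    now = self._clock()
    duration = now - self._window_start
    self._window_start = now
    if duration <= 0:
      return None  # Clock resolution too coarse to compute a rate.
    # Examples processed in the window divided by the window's duration.
    return self.log_frequency * self.batch_size / duration


# Deterministic check with a fake clock that advances 0.5 s per reading.
fake_times = iter(i * 0.5 for i in range(100))
meter = ThroughputMeter(batch_size=4, log_frequency=2,
                        clock=lambda: next(fake_times))
rates = [meter.step() for _ in range(4)]
print(rates)  # [None, 16.0, None, 16.0]
```

Each window here covers 2 steps of 4 examples in 0.5 seconds, giving 16 examples per second. Calling step() from after_run in a SessionRunHook, or from on_train_batch_end in a Keras callback, would be one way to reuse the calculation.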
TensorFlow 2: Create a custom Keras callback for Model.fit
In TensorFlow 2, when you use the built-in Keras Model.fit (or Model.evaluate) for training/evaluation, you can configure a custom tf.keras.callbacks.Callback and then pass it to the callbacks parameter of Model.fit (or Model.evaluate). (Learn more in the Writing your own callbacks guide.)
In the example below, you will write a custom tf.keras.callbacks.Callback that measures examples per second during training. Its output should be comparable to the output of the SessionRunHook example above.
class CustomCallback(tf.keras.callbacks.Callback):

  def on_train_begin(self, logs=None):
    self._step = -1
    self._start_time = time.time()
    self.log_frequency = 10

  def on_train_batch_begin(self, batch, logs=None):
    self._step += 1

  def on_train_batch_end(self, batch, logs=None):
    if self._step % self.log_frequency == 0:
      current_time = time.time()
      duration = current_time - self._start_time
      self._start_time = current_time
      # The batch size is 1, so steps per second equals examples per second.
      examples_per_sec = self.log_frequency / duration
      print('Time:', datetime.now(), ', Step #:', self._step,
            ', Examples per second:', examples_per_sec)

callback = CustomCallback()

dataset = tf.data.Dataset.from_tensor_slices(
    (features, labels)).batch(1).repeat(100)

model = tf.keras.models.Sequential([tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.Adagrad(learning_rate=0.05)

model.compile(optimizer, "mse")

# Begin training.
result = model.fit(dataset, callbacks=[callback], verbose=0)
# Provide the results of training metrics.
result.history
Time: 2022-12-14 22:33:17.621502 , Step #: 0 , Examples per second: 20.297883975561067
Time: 2022-12-14 22:33:17.639965 , Step #: 10 , Examples per second: 541.4939709261794
Time: 2022-12-14 22:33:17.656627 , Step #: 20 , Examples per second: 600.1379330080556
Time: 2022-12-14 22:33:17.673115 , Step #: 30 , Examples per second: 606.498929955463
Time: 2022-12-14 22:33:17.690284 , Step #: 40 , Examples per second: 582.4613248159977
Time: 2022-12-14 22:33:17.707047 , Step #: 50 , Examples per second: 596.527477528729
Time: 2022-12-14 22:33:17.723359 , Step #: 60 , Examples per second: 613.0589335827876
Time: 2022-12-14 22:33:17.739348 , Step #: 70 , Examples per second: 625.4367600131222
Time: 2022-12-14 22:33:17.756666 , Step #: 80 , Examples per second: 577.4255899116165
Time: 2022-12-14 22:33:17.772840 , Step #: 90 , Examples per second: 618.273264641283
Time: 2022-12-14 22:33:17.789366 , Step #: 100 , Examples per second: 605.1338873499539
Time: 2022-12-14 22:33:17.806987 , Step #: 110 , Examples per second: 567.4880259775402
Time: 2022-12-14 22:33:17.824667 , Step #: 120 , Examples per second: 565.6207352266904
Time: 2022-12-14 22:33:17.841617 , Step #: 130 , Examples per second: 589.9825578124121
Time: 2022-12-14 22:33:17.858799 , Step #: 140 , Examples per second: 582.00063829492
Time: 2022-12-14 22:33:17.875624 , Step #: 150 , Examples per second: 594.3466062066034
Time: 2022-12-14 22:33:17.892795 , Step #: 160 , Examples per second: 582.3561917720729
Time: 2022-12-14 22:33:17.909760 , Step #: 170 , Examples per second: 589.468476824915
Time: 2022-12-14 22:33:17.926615 , Step #: 180 , Examples per second: 593.2957069099654
Time: 2022-12-14 22:33:17.943153 , Step #: 190 , Examples per second: 604.6453696228808
Time: 2022-12-14 22:33:17.959402 , Step #: 200 , Examples per second: 615.4247061758103
Time: 2022-12-14 22:33:17.975720 , Step #: 210 , Examples per second: 612.8260424885304
Time: 2022-12-14 22:33:17.991463 , Step #: 220 , Examples per second: 635.2118733908829
Time: 2022-12-14 22:33:18.007972 , Step #: 230 , Examples per second: 605.7106547670624
Time: 2022-12-14 22:33:18.024608 , Step #: 240 , Examples per second: 601.1184521676819
Time: 2022-12-14 22:33:18.041785 , Step #: 250 , Examples per second: 582.1864416190106
Time: 2022-12-14 22:33:18.059335 , Step #: 260 , Examples per second: 569.793101574493
Time: 2022-12-14 22:33:18.075667 , Step #: 270 , Examples per second: 612.3250313877777
Time: 2022-12-14 22:33:18.092859 , Step #: 280 , Examples per second: 581.6455187142045
Time: 2022-12-14 22:33:18.110883 , Step #: 290 , Examples per second: 554.8241332328002
{'loss': [1.8442449569702148]}
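The same Callback API also covers Model.evaluate and Model.predict through the on_test_batch_* and on_predict_batch_* hooks. Here is a minimal sketch, assuming a TensorFlow 2 installation. BatchTimer and its attributes are hypothetical names for illustration; the callback counts batches and accumulates wall-clock time separately for each mode.

```python
import time

import tensorflow as tf


class BatchTimer(tf.keras.callbacks.Callback):
  """Hypothetical callback: counts batches and wall time per Keras mode."""

  def __init__(self):
    super().__init__()
    self.batches = {"train": 0, "test": 0, "predict": 0}
    self.seconds = {"train": 0.0, "test": 0.0, "predict": 0.0}
    self._start = 0.0

  def _begin(self):
    self._start = time.perf_counter()

  def _end(self, mode):
    self.seconds[mode] += time.perf_counter() - self._start
    self.batches[mode] += 1

  # Model.fit
  def on_train_batch_begin(self, batch, logs=None):
    self._begin()

  def on_train_batch_end(self, batch, logs=None):
    self._end("train")

  # Model.evaluate
  def on_test_batch_begin(self, batch, logs=None):
    self._begin()

  def on_test_batch_end(self, batch, logs=None):
    self._end("test")

  # Model.predict
  def on_predict_batch_begin(self, batch, logs=None):
    self._begin()

  def on_predict_batch_end(self, batch, logs=None):
    self._end("predict")


# Illustrative data: the same three examples used in the Setup section.
features = [[1., 1.5], [2., 2.5], [3., 3.5]]
labels = [[0.3], [0.5], [0.7]]
dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(1)

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer="adagrad", loss="mse")

timer = BatchTimer()
model.fit(dataset, callbacks=[timer], verbose=0)
model.evaluate(dataset, callbacks=[timer], verbose=0)
model.predict(tf.constant(features), batch_size=1, callbacks=[timer], verbose=0)
print(timer.batches)
```

With three single-example batches per call, each counter should end at 3. Dividing a mode's example count by its accumulated seconds gives a per-mode throughput estimate that excludes time spent between batches.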
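Next steps
To learn more about callbacks, refer to:
- API docs: tf.keras.callbacks.Callback
- Guide: Writing your own callbacks
- Guide: Training and evaluation with the built-in methods (the Using callbacks section)
You may also find the following migration-related resources useful:
- The Early stopping migration guide: tf.keras.callbacks.EarlyStopping is a built-in early stopping callback
- The TensorBoard migration guide: TensorBoard enables tracking and displaying metrics
- The LoggingTensorHook and StopAtStepHook to Keras callbacks migration guide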