View source on GitHub |
Session-like object that handles initialization, recovery and hooks.
tf.compat.v1.train.MonitoredSession(
session_creator=None, hooks=None, stop_grace_period_secs=120
)
Example usage:
saver_hook = CheckpointSaverHook(...)
summary_hook = SummarySaverHook(...)
with MonitoredSession(session_creator=ChiefSessionCreator(...),
hooks=[saver_hook, summary_hook]) as sess:
while not sess.should_stop():
sess.run(train_op)
Initialization: At creation time the monitored session does following things in given order:
- calls
hook.begin()
for each given hook - finalizes the graph via
scaffold.finalize()
- create session
- initializes the model via initialization ops provided by
Scaffold
- restores variables if a checkpoint exists
- launches queue runners
- calls
hook.after_create_session()
Run: When run()
is called, the monitored session does following things:
- calls
hook.before_run()
- calls TensorFlow
session.run()
with merged fetches and feed_dict - calls
hook.after_run()
- returns result of
session.run()
asked by user - if
AbortedError
orUnavailableError
occurs, it recovers or reinitializes the session before executing the run() call again
Exit: At the close()
, the monitored session does following things in order:
- calls
hook.end()
- closes the queue runners and the session
- suppresses
OutOfRange
error which indicates that all inputs have been processed if the monitored_session is used as a context
How to set tf.compat.v1.Session
arguments:
- In most cases you can set session arguments as follows:
MonitoredSession(
session_creator=ChiefSessionCreator(master=..., config=...))
- In distributed setting for a non-chief worker, you can use following:
MonitoredSession(
session_creator=WorkerSessionCreator(master=..., config=...))
See MonitoredTrainingSession
for an example usage based on chief or worker.
- it cannot be set as default session.
- it cannot be sent to saver.save.
- it cannot be sent to tf.train.start_queue_runners.
Args | |
---|---|
session_creator
|
A factory object to create session. Typically a
ChiefSessionCreator which is the default one.
|
hooks
|
An iterable of `SessionRunHook' objects. |
Returns | |
---|---|
A MonitoredSession object. |
Args | |
---|---|
session_creator
|
A factory object to create session. Typically a
ChiefSessionCreator or a WorkerSessionCreator .
|
hooks
|
An iterable of SessionRunHook' objects.
</td>
</tr><tr>
<td> should_recover</td>
<td>
A bool. Indicates whether to recover from AbortedErrorand UnavailableErroror not.
</td>
</tr><tr>
<td> stop_grace_period_secs</td>
<td>
Number of seconds given to threads to stop after close()` has been called.
|
Attributes | |
---|---|
graph
|
The graph that was launched in this session. |
Child Classes
Methods
close
close()
run
run(
fetches, feed_dict=None, options=None, run_metadata=None
)
Run ops in the monitored session.
This method is completely compatible with the tf.Session.run()
method.
Args | |
---|---|
fetches
|
Same as tf.Session.run() .
|
feed_dict
|
Same as tf.Session.run() .
|
options
|
Same as tf.Session.run() .
|
run_metadata
|
Same as tf.Session.run() .
|
Returns | |
---|---|
Same as tf.Session.run() .
|
run_step_fn
run_step_fn(
step_fn
)
Run ops using a step function.
Args | |
---|---|
step_fn
|
A function or a method with a single argument of type
StepContext . The function may use methods of the argument to perform
computations with access to a raw session. The returned value of the
step_fn will be returned from run_step_fn , unless a stop is
requested. In that case, the next should_stop call will return True.
Example usage:
Hooks interact with the |
Returns | |
---|---|
Returns the returned value of step_fn .
|
Raises | |
---|---|
StopIteration
|
if step_fn has called request_stop() . It may be
caught by with tf.MonitoredSession() to close the session.
|
ValueError
|
if step_fn doesn't have a single argument called
step_context . It may also optionally have self for cases when it
belongs to an object.
|
should_stop
should_stop()
__enter__
__enter__()
__exit__
__exit__(
exception_type, exception_value, traceback
)