Actor.
tf_agents.train.Actor(
env,
policy,
train_step,
steps_per_run=None,
episodes_per_run=None,
observers=None,
transition_observers=None,
info_observers=None,
metrics=None,
reference_metrics=None,
image_metrics=None,
summary_dir=None,
summary_interval=1000,
end_episode_on_boundary=True,
name=''
)
The actor manages interactions between a policy and an environment. Users should configure the metrics and summaries for a specific task like evaluation or data collection.
The main point of access for users is the run method. Each call iterates over either steps_per_run steps or episodes_per_run episodes. At least one of steps_per_run or episodes_per_run must be provided.
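The stopping rule described above can be sketched in plain Python. This is a simplified stand-in, not the TF-Agents implementation: `CountdownEnv` and `run_once` are hypothetical names, and the sketch assumes the loop stops as soon as either limit is reached when both are given.

```python
import math


class CountdownEnv:
    """Hypothetical toy environment: every episode lasts `episode_len` steps."""

    def __init__(self, episode_len=3):
        self.episode_len = episode_len
        self._t = 0

    def step(self):
        """Advances one step and returns True if that step ended an episode."""
        self._t += 1
        if self._t == self.episode_len:
            self._t = 0
            return True
        return False


def run_once(env, steps_per_run=None, episodes_per_run=None):
    """Simplified stand-in for a single run() call (an assumption, not library code)."""
    if not steps_per_run and not episodes_per_run:
        raise ValueError("Provide steps_per_run or episodes_per_run.")
    # A missing limit is treated as unbounded.
    max_steps = steps_per_run or math.inf
    max_episodes = episodes_per_run or math.inf
    steps = episodes = 0
    while steps < max_steps and episodes < max_episodes:
        episodes += int(env.step())
        steps += 1
    return steps, episodes
```

With three-step episodes, `run_once(CountdownEnv(3), steps_per_run=7)` stops mid-episode after seven steps, while `run_once(CountdownEnv(3), episodes_per_run=2)` stops on an episode boundary after six.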
Args | |
---|---|
env
|
An instance of either a tf or py environment. Note that the policy and observers should match the type (tf or py) of the env. |
policy
|
An instance of a policy used to interact with the environment. |
train_step
|
A scalar tf.int64 tf.Variable which will keep track of the
number of train steps. This is used for generated artifacts such as
summaries.
|
steps_per_run
|
Number of steps to evaluate per run call. At least one of steps_per_run or episodes_per_run must be provided. |
episodes_per_run
|
Number of episodes to evaluate per run call. |
observers
|
A list of observers that are notified after every step in the environment. Each observer is a callable(trajectory.Trajectory). |
transition_observers
|
A list of observers that are updated after every step in the environment. Each observer is a callable((TimeStep, PolicyStep, NextTimeStep)). The transition is shaped just as trajectories are for regular observers. |
info_observers
|
A list of observers that are notified after every step in the environment. Each observer is a callable(info). |
metrics
|
A list of metric observers that output a scalar. |
reference_metrics
|
Optional list of metrics against which other metrics are plotted. For example, passing in a metric that tracks the number of environment episodes produces summaries of all other metrics over this value. Note that summaries against the train_step are generated by default. If you want reference_metrics to be updated, make sure they are also added to the metrics list. |
image_metrics
|
A list of metric observers that output an image. |
summary_dir
|
Path used for summaries. If no path is provided, no summaries are written. |
summary_interval
|
How often summaries are written. |
end_episode_on_boundary
|
Should be False when using transition observers and True when using trajectory observers. It is used in py_driver. |
name
|
Name for the actor used as a prefix to generated summaries. |
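
The three observer arguments above differ only in what each callable receives. The following is a minimal sketch of those call shapes in plain Python. The namedtuples are hypothetical stand-ins for trajectory.Trajectory, TimeStep and PolicyStep, the dict passed to the info observer is an assumed payload, and `demo_rollout` is an illustrative loop rather than the library's driver.

```python
from collections import namedtuple

# Hypothetical, simplified stand-ins for the library's types.
Trajectory = namedtuple("Trajectory", ["observation", "action", "reward"])
TimeStep = namedtuple("TimeStep", ["observation", "reward"])
PolicyStep = namedtuple("PolicyStep", ["action", "info"])


class RewardSum:
    """Trajectory observer: a callable(trajectory) that accumulates reward."""

    def __init__(self):
        self.total = 0.0

    def __call__(self, traj):
        self.total += traj.reward


class TransitionLog:
    """Transition observer: a callable((time_step, policy_step, next_time_step))."""

    def __init__(self):
        self.pairs = []

    def __call__(self, transition):
        time_step, _policy_step, next_time_step = transition
        self.pairs.append((time_step.observation, next_time_step.observation))


class InfoLog:
    """Info observer: a callable(info); the info payload here is an assumed dict."""

    def __init__(self):
        self.infos = []

    def __call__(self, info):
        self.infos.append(info)


def demo_rollout(num_steps=3):
    """Toy loop that notifies every observer once per environment step."""
    reward_sum, transition_log, info_log = RewardSum(), TransitionLog(), InfoLog()
    time_step = TimeStep(observation=0, reward=0.0)
    for i in range(num_steps):
        policy_step = PolicyStep(action=1, info={})
        next_time_step = TimeStep(time_step.observation + policy_step.action, 1.0)
        reward_sum(Trajectory(time_step.observation, policy_step.action,
                              next_time_step.reward))
        transition_log((time_step, policy_step, next_time_step))
        info_log({"step": i})
        time_step = next_time_step
    return reward_sum, transition_log, info_log
```

Any object with a matching `__call__` signature, including a plain function, fits the same pattern.
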
Attributes | |
---|---|
image_metrics
|
|
metrics
|
|
policy
|
|
summary_writer
|
|
train_step
|
Methods
log_metrics
log_metrics()
Logs metric results to stdout.
reset
reset()
Resets the environment to its start state and resets the policy state.
run
run()
run_and_log
run_and_log()
write_metric_summaries
write_metric_summaries()
Generates scalar summaries for the actor metrics.