Optimization parameters for Adam with TPU embeddings.
tf.compat.v1.tpu.experimental.AdamParameters(
    learning_rate: float,
    beta1: float = 0.9,
    beta2: float = 0.999,
    epsilon: float = 1e-08,
    lazy_adam: bool = True,
    sum_inside_sqrt: bool = True,
    use_gradient_accumulation: bool = True,
    clip_weight_min: Optional[float] = None,
    clip_weight_max: Optional[float] = None,
    weight_decay_factor: Optional[float] = None,
    multiply_weight_decay_factor_by_learning_rate: Optional[bool] = None,
    clip_gradient_min: Optional[float] = None,
    clip_gradient_max: Optional[float] = None
)
Pass this to tf.estimator.tpu.experimental.EmbeddingConfigSpec via the optimization_parameters argument to set the optimizer and its parameters. See the documentation for tf.estimator.tpu.experimental.EmbeddingConfigSpec for more details.
estimator = tf.estimator.tpu.TPUEstimator(
    ...
    embedding_config_spec=tf.estimator.tpu.experimental.EmbeddingConfigSpec(
        ...
        optimization_parameters=tf.tpu.experimental.AdamParameters(0.1),
        ...))
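For context, a slightly more complete sketch of the same wiring is shown below. The feature column setup is an assumption for illustration only: the 'video_id' feature, my_model_fn, and my_run_config are hypothetical placeholders, not part of this API.

# Minimal sketch: feeding AdamParameters to a TPUEstimator through an
# embedding feature column. 'video_id', my_model_fn, and my_run_config
# are hypothetical placeholders.
video_id = tf.feature_column.categorical_column_with_identity(
    key='video_id', num_buckets=100000)
video_embedding = tf.tpu.experimental.embedding_column(
    video_id, dimension=32)

estimator = tf.estimator.tpu.TPUEstimator(
    model_fn=my_model_fn,   # hypothetical model function
    config=my_run_config,   # hypothetical tf.estimator.tpu.RunConfig
    train_batch_size=1024,
    embedding_config_spec=tf.estimator.tpu.experimental.EmbeddingConfigSpec(
        feature_columns=[video_embedding],
        optimization_parameters=tf.tpu.experimental.AdamParameters(
            learning_rate=0.1)))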
Args

learning_rate: A floating point value. The learning rate.
beta1: A float value. The exponential decay rate for the 1st moment estimates.
beta2: A float value. The exponential decay rate for the 2nd moment estimates.
epsilon: A small constant for numerical stability.
lazy_adam: Use lazy Adam instead of Adam, which updates only the embedding rows that receive gradients in the current batch. Lazy Adam trains faster. See optimization_parameters.proto for details.
sum_inside_sqrt: When True, epsilon is folded inside the square root of the update denominator, changing m / (sqrt(v) + epsilon) to m / sqrt(v + epsilon**2). This improves training speed. Please see optimization_parameters.proto for details.
use_gradient_accumulation: Setting this to False makes embedding gradient calculation less accurate but faster. Please see optimization_parameters.proto for details.
clip_weight_min: The minimum value to clip the weights by; None means -infinity.
clip_weight_max: The maximum value to clip the weights by; None means +infinity.
weight_decay_factor: Amount of weight decay to apply; None means that the weights are not decayed.
multiply_weight_decay_factor_by_learning_rate: If True, weight_decay_factor is multiplied by the current learning rate.
clip_gradient_min: The minimum value to clip the gradients by; None means -infinity. use_gradient_accumulation must be True if this is set.
clip_gradient_max: The maximum value to clip the gradients by; None means +infinity. use_gradient_accumulation must be True if this is set.
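As a sketch of how the optional arguments combine, the following constructs Adam parameters with gradient clipping and weight decay enabled. The hyperparameter values are illustrative, not recommendations.

# Sketch: Adam parameters with gradient clipping and weight decay.
params = tf.compat.v1.tpu.experimental.AdamParameters(
    learning_rate=0.001,
    beta1=0.9,
    beta2=0.999,
    epsilon=1e-08,
    lazy_adam=True,
    use_gradient_accumulation=True,  # must stay True when clip_gradient_* is set
    clip_gradient_min=-1.0,
    clip_gradient_max=1.0,
    weight_decay_factor=1e-4,
    multiply_weight_decay_factor_by_learning_rate=True)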