Note that in the dense implementation of this algorithm, ms and mom are
updated even if the grad is zero, but in this sparse implementation, ms
and mom are not updated in iterations during which the grad is zero.
ms <- rho * ms_{t-1} + (1 - rho) * grad * grad
mom <- momentum * mom_{t-1} + lr * grad / sqrt(ms + epsilon)
var <- var - mom
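The three update rules above can be sketched in plain Python as follows. This is a minimal illustration of the dense, element-wise arithmetic only, not the TensorFlow kernel: the function name `rmsprop_step` and the use of Python lists in place of tensors are assumptions made for the example, and the function returns new values instead of updating variables in place.

```python
import math


def rmsprop_step(var, ms, mom, grad, lr, rho, momentum, epsilon):
    """Hypothetical helper: one dense RMSProp step over lists of floats.

    Returns new (var, ms, mom) lists; the inputs are left unchanged.
    """
    new_var, new_ms, new_mom = [], [], []
    for v, s, m, g in zip(var, ms, mom, grad):
        # ms <- rho * ms_{t-1} + (1 - rho) * grad * grad
        s = rho * s + (1.0 - rho) * g * g
        # mom <- momentum * mom_{t-1} + lr * grad / sqrt(ms + epsilon)
        m = momentum * m + lr * g / math.sqrt(s + epsilon)
        # var <- var - mom
        v = v - m
        new_var.append(v)
        new_ms.append(s)
        new_mom.append(m)
    return new_var, new_ms, new_mom


# Example values chosen for illustration only.
var, ms, mom = rmsprop_step(
    var=[1.0], ms=[0.0], mom=[0.0], grad=[2.0],
    lr=0.1, rho=0.9, momentum=0.5, epsilon=1e-10,
)

# A second step with a zero gradient: in this dense sketch ms still decays
# and mom is still scaled by momentum, matching the note above.
var2, ms2, mom2 = rmsprop_step(
    var=var, ms=ms, mom=mom, grad=[0.0],
    lr=0.1, rho=0.9, momentum=0.5, epsilon=1e-10,
)
```

In the second call the gradient is zero, yet `ms2` and `mom2` differ from `ms` and `mom`, which is the dense behavior the note describes.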
Args
var
A Tensor of type resource. Should be from a Variable().
ms
A Tensor of type resource. Should be from a Variable().
mom
A Tensor of type resource. Should be from a Variable().
lr
A Tensor. Must be one of the following types: float32, float64, int32, uint8, int16, int8, complex64, int64, qint8, quint8, qint32, bfloat16, qint16, quint16, uint16, complex128, half, uint32, uint64.
Scaling factor. Must be a scalar.
rho
A Tensor. Must have the same type as lr.
Decay rate. Must be a scalar.
momentum
A Tensor. Must have the same type as lr.
Momentum factor applied to the previous mom. Must be a scalar.
epsilon
A Tensor. Must have the same type as lr.
Ridge term. Must be a scalar.
grad
A Tensor. Must have the same type as lr. The gradient.
use_locking
An optional bool. Defaults to False.
If True, updating of the var, ms, and mom tensors is protected
by a lock; otherwise the behavior is undefined, but may exhibit less
contention.
Last updated 2024-01-23 UTC.