NCCL all-reduce implementation of CrossDeviceOps.
Inherits From: CrossDeviceOps
tf.distribute.NcclAllReduce(
    num_packs=1
)
It uses Nvidia NCCL for all-reduce. For the batch API, tensors will be repacked or aggregated for more efficient cross-device transportation. For reductions that are not all-reduce, it falls back to tf.distribute.ReductionToOneDevice.

Here is how you can use NcclAllReduce in tf.distribute.MirroredStrategy:
strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.NcclAllReduce())
| Args | |
| --- | --- |
| num_packs | a non-negative integer. The number of packs to split values into. If zero, no packing will be done. |

| Raises | |
| --- | --- |
| ValueError | if num_packs is negative. |
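As a hedged sketch (not part of the original page; the device names assume a machine with two GPUs), num_packs can be tuned when constructing the op so that the batch all-reduce packs values into fewer, larger buffers:

import tensorflow as tf

# Pack values into 2 buffers on the batch all-reduce path; num_packs=0 would
# disable packing, and a negative value raises ValueError.
nccl = tf.distribute.NcclAllReduce(num_packs=2)
strategy = tf.distribute.MirroredStrategy(
    devices=["/gpu:0", "/gpu:1"], cross_device_ops=nccl)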
Methods
batch_reduce
batch_reduce(
    reduce_op, value_destination_pairs, options=None
)
Reduce values to destinations in batches.
See tf.distribute.StrategyExtended.batch_reduce_to. This can only be called in the cross-replica context.
| Args | |
| --- | --- |
| reduce_op | a tf.distribute.ReduceOp specifying how values should be combined. |
| value_destination_pairs | a sequence of (value, destinations) pairs. See tf.distribute.CrossDeviceOps.reduce for descriptions. |
| options | a tf.distribute.experimental.CommunicationOptions. See tf.distribute.experimental.CommunicationOptions for details. |

| Returns |
| --- |
| A list of tf.Tensor or tf.distribute.DistributedValues, one per pair in value_destination_pairs. |

| Raises | |
| --- | --- |
| ValueError | if value_destination_pairs is not an iterable of tuples of tf.distribute.DistributedValues and destinations. |
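The following is an illustrative sketch rather than an excerpt from the original page; it assumes two visible GPUs and calls batch_reduce directly on a NcclAllReduce instance. Passing each value as its own destinations makes the batched call an all-reduce:

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy(devices=["/gpu:0", "/gpu:1"])
nccl = tf.distribute.NcclAllReduce()

def value_fn(ctx):
    # One scalar per replica, equal to the replica id (0.0 and 1.0).
    return tf.constant(float(ctx.replica_id_in_sync_group))

value_a = strategy.experimental_distribute_values_from_function(value_fn)
value_b = strategy.experimental_distribute_values_from_function(value_fn)

# Both pairs are reduced in one batched call; each result is a mirrored
# value holding the SUM (1.0) on every device.
summed_a, summed_b = nccl.batch_reduce(
    tf.distribute.ReduceOp.SUM,
    [(value_a, value_a), (value_b, value_b)])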
broadcast
broadcast(
    tensor, destinations
)

Broadcast tensor to destinations.

This can only be called in the cross-replica context.
| Args | |
| --- | --- |
| tensor | a tf.Tensor like object. The value to broadcast. |
| destinations | a tf.distribute.DistributedValues, a tf.Variable, a tf.Tensor alike object, or a device string. It specifies the devices to broadcast to. Note that if it's a tf.Variable, the value is broadcast to the devices of that variable; this method doesn't update the variable. |

| Returns |
| --- |
| A tf.Tensor or tf.distribute.DistributedValues. |
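As another hedged sketch (an assumed two-GPU setup, not from the original page), broadcast copies one tensor onto every device named by destinations:

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy(devices=["/gpu:0", "/gpu:1"])
nccl = tf.distribute.NcclAllReduce()

# A per-replica value is used here only to name the destination devices.
destinations = strategy.experimental_distribute_values_from_function(
    lambda ctx: tf.zeros([]))

# The same scalar ends up on both GPUs; no variable is updated.
mirrored = nccl.broadcast(tf.constant(3.0), destinations)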
reduce
reduce(
    reduce_op, per_replica_value, destinations, options=None
)

Reduce per_replica_value to destinations.

See tf.distribute.StrategyExtended.reduce_to. This can only be called in the cross-replica context.
| Args | |
| --- | --- |
| reduce_op | a tf.distribute.ReduceOp specifying how values should be combined. |
| per_replica_value | a tf.distribute.DistributedValues, or a tf.Tensor like object. |
| destinations | a tf.distribute.DistributedValues, a tf.Variable, a tf.Tensor alike object, or a device string. It specifies the devices to reduce to. To perform an all-reduce, pass the same value as both per_replica_value and destinations. Note that if it's a tf.Variable, the value is reduced to the devices of that variable, and this method doesn't update the variable. |
| options | a tf.distribute.experimental.CommunicationOptions. See tf.distribute.experimental.CommunicationOptions for details. |

| Returns |
| --- |
| A tf.Tensor or tf.distribute.DistributedValues. |

| Raises | |
| --- | --- |
| ValueError | if per_replica_value can't be converted to a tf.distribute.DistributedValues or if destinations is not a string, tf.Variable or tf.distribute.DistributedValues. |
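A minimal sketch, assuming two visible GPUs (the values and device names are illustrative, not from the original page): passing the per-replica value as its own destinations performs an NCCL all-reduce, while a single device string takes the non-all-reduce path:

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy(devices=["/gpu:0", "/gpu:1"])
nccl = tf.distribute.NcclAllReduce()

per_replica = strategy.experimental_distribute_values_from_function(
    lambda ctx: tf.constant(float(ctx.replica_id_in_sync_group + 1)))

# All-reduce: every GPU ends up with the SUM of the per-replica values
# (1.0 + 2.0 = 3.0).
summed = nccl.reduce(tf.distribute.ReduceOp.SUM, per_replica,
                     destinations=per_replica)

# Reducing to one device is not an all-reduce, so it uses the
# ReductionToOneDevice fallback (MEAN here is 1.5 on /cpu:0).
mean_on_cpu = nccl.reduce(tf.distribute.ReduceOp.MEAN, per_replica,
                          destinations="/cpu:0")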