Quantizes the 'input' tensor of type float to an 'output' tensor of type 'T'.
[min_range, max_range] are scalar floats that specify the range for the 'input' data. The 'mode' attribute controls exactly which calculations are used to convert the float values to their quantized equivalents. The 'round_mode' attribute controls which rounding tie-breaking algorithm is used when rounding float values to their quantized equivalents.
In 'MIN_COMBINED' mode, each value of the tensor will undergo the following:
out[i] = (in[i] - min_range) * range(T) / (max_range - min_range)
if T == qint8: out[i] -= (range(T) + 1) / 2.0
Here range(T) = numeric_limits<T>::max() - numeric_limits<T>::min().
MIN_COMBINED Mode Example
Assume the input is type float and has a possible range of [0.0, 6.0] and the output type is quint8 ([0, 255]). The min_range and max_range values should be specified as 0.0 and 6.0. Quantizing from float to quint8 will multiply each value of the input by 255/6 and cast to quint8.
If the output type were qint8 ([-128, 127]), the operation would additionally subtract 128 from each value prior to casting, so that the range of values aligns with the range of qint8.
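The MIN_COMBINED arithmetic above can be sketched in plain Java for 8-bit types. This is only an illustration of the formula, not the TensorFlow kernel: the class and method names are hypothetical, and the truncating cast followed by clamping is an assumption, since the text only says the value is "cast".

```java
// Illustrative sketch of MIN_COMBINED quantization for 8-bit types.
// MinCombinedSketch and quantize(...) are hypothetical names, not part of the TensorFlow API.
class MinCombinedSketch {
    // signed = true models qint8 ([-128, 127]); signed = false models quint8 ([0, 255]).
    static int quantize(float in, float minRange, float maxRange, boolean signed) {
        // range(T) = numeric_limits<T>::max() - numeric_limits<T>::min() = 255 for 8-bit T
        final float rangeT = 255.0f;
        float out = (in - minRange) * rangeT / (maxRange - minRange);
        if (signed) {
            out -= (rangeT + 1.0f) / 2.0f; // subtract 128 to align with the qint8 range
        }
        // Assumption: truncating cast, then clamp to the representable range.
        int lo = signed ? -128 : 0;
        int hi = signed ? 127 : 255;
        return Math.max(lo, Math.min(hi, (int) out));
    }

    public static void main(String[] args) {
        System.out.println(quantize(6.0f, 0.0f, 6.0f, false)); // 255
        System.out.println(quantize(0.0f, 0.0f, 6.0f, true));  // -128
    }
}
```

With the range [0.0, 6.0] from the example, the endpoints map to the endpoints of the quantized type; intermediate values depend on the rounding behaviour, which this sketch does not attempt to reproduce exactly.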
If the mode is 'MIN_FIRST', then this approach is used:
num_discrete_values = 1 << (# of bits in T)
range_adjust = num_discrete_values / (num_discrete_values - 1)
range = (range_max - range_min) * range_adjust
range_scale = num_discrete_values / range
quantized = round(input * range_scale) - round(range_min * range_scale) + numeric_limits<T>::min()
quantized = max(quantized, numeric_limits<T>::min())
quantized = min(quantized, numeric_limits<T>::max())
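The MIN_FIRST steps above translate almost line for line into Java. The sketch below is a hedged illustration with hypothetical names. It assumes floating-point division for range_adjust (integer division would yield 1) and uses `Math.round` as a stand-in for whichever rounding the op applies.

```java
// Illustrative sketch of the MIN_FIRST formula; names are hypothetical.
class MinFirstSketch {
    // bits is the bit width of T; minT and maxT are numeric_limits<T>::min() and ::max().
    static int quantize(double input, double rangeMin, double rangeMax,
                        int bits, int minT, int maxT) {
        double numDiscreteValues = (double) (1L << bits);
        // Assumption: floating-point division, so the adjustment is slightly above 1.
        double rangeAdjust = numDiscreteValues / (numDiscreteValues - 1.0);
        double range = (rangeMax - rangeMin) * rangeAdjust;
        double rangeScale = numDiscreteValues / range;
        // Round the input and the range minimum separately, then offset by min(T).
        long quantized = Math.round(input * rangeScale)
                - Math.round(rangeMin * rangeScale) + minT;
        // Clamp to the representable range of T.
        quantized = Math.max(quantized, (long) minT);
        quantized = Math.min(quantized, (long) maxT);
        return (int) quantized;
    }

    public static void main(String[] args) {
        // quint8 over [0.0, 6.0]: the upper end of the range maps to 255.
        System.out.println(quantize(6.0, 0.0, 6.0, 8, 0, 255));
    }
}
```

Values outside [range_min, range_max] are handled by the final clamp rather than by clipping the input first.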
SCALED mode Example
`SCALED` mode matches the quantization approach used in `QuantizeAndDequantize{V2|V3}`.
If the mode is `SCALED`, the quantization is performed by multiplying each input value by a scaling_factor. The scaling_factor is determined from `min_range` and `max_range` to be as large as possible such that the range from `min_range` to `max_range` is representable within values of type T.
const int min_T = std::numeric_limits<T>::min();
const int max_T = std::numeric_limits<T>::max();
const float max_float = std::numeric_limits<float>::max();
const float scale_factor_from_min_side =
    (min_T * min_range > 0) ? min_T / min_range : max_float;
const float scale_factor_from_max_side =
    (max_T * max_range > 0) ? max_T / max_range : max_float;
const float scale_factor = std::min(scale_factor_from_min_side,
                                    scale_factor_from_max_side);
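We next use the scale_factor to adjust min_range and max_range as follows:
min_range = min_T / scale_factor;
max_range = max_T / scale_factor;
For example, if T = qint8 and initially min_range = -10 and max_range = 9, we compare -128 / -10.0 = 12.8 to 127 / 9.0 = 14.11 and set scale_factor = 12.8. In this case min_range remains -10, but max_range is adjusted to 127 / 12.8 = 9.921875. So we will quantize input values in the range (-10, 9.921875) to (-128, 127).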
The input tensor can now be quantized by clipping values to the range `min_range` to `max_range`, then multiplying by scale_factor as follows:
result = round(min(max_range, max(min_range, input)) * scale_factor)
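The SCALED computation can be followed end to end in a short Java sketch. The names are hypothetical, and `double` and `Math.round` are used in place of the float arithmetic and configurable rounding of the real op. The demo uses an initial range of [-10, 9]. That starting range is an assumption chosen here because it reproduces the adjusted range (-10, 9.921875) quoted above.

```java
// Illustrative sketch of SCALED-mode quantization; names are hypothetical.
class ScaledSketch {
    // Returns {scaleFactor, adjustedMinRange, adjustedMaxRange} for an integer type [minT, maxT].
    static double[] adjustRange(double minRange, double maxRange, int minT, int maxT) {
        double fromMinSide = (minT * minRange > 0) ? minT / minRange : Double.MAX_VALUE;
        double fromMaxSide = (maxT * maxRange > 0) ? maxT / maxRange : Double.MAX_VALUE;
        double scaleFactor = Math.min(fromMinSide, fromMaxSide);
        return new double[] {scaleFactor, minT / scaleFactor, maxT / scaleFactor};
    }

    // Clips the input to the adjusted range, then scales and rounds it.
    static int quantize(double input, double minRange, double maxRange, int minT, int maxT) {
        double[] r = adjustRange(minRange, maxRange, minT, maxT);
        double clipped = Math.min(r[2], Math.max(r[1], input));
        return (int) Math.round(clipped * r[0]); // Math.round is an assumed rounding mode
    }

    public static void main(String[] args) {
        double[] r = adjustRange(-10.0, 9.0, -128, 127);
        System.out.printf("scale=%.4f min=%.6f max=%.6f%n", r[0], r[1], r[2]);
        System.out.println(quantize(1.0, -10.0, 9.0, -128, 127));
    }
}
```

Because the smaller of the two candidate scale factors is chosen, both ends of the requested range stay representable, and one end of the range is usually widened as a side effect.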
narrow_range (bool) attribute
If true, the minimum quantized value is not used. For example, for int8 the quantized output is restricted to the range -127..127 instead of the full -128..127 range. This is provided for compatibility with certain inference backends. (Only applies to SCALED mode.)
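One way to picture the effect of narrow_range is to repeat the SCALED calculation with the lower limit of T raised by one. The Java sketch below does this for int8. Treating narrow_range as "use -127 instead of -128 as min_T" is an assumption made for illustration, and all names are hypothetical.

```java
// Illustrative sketch of narrow_range in SCALED mode for int8; names are hypothetical.
class NarrowRangeSketch {
    static double scaleFactor(double minRange, double maxRange, boolean narrowRange) {
        // Assumption: narrow_range simply drops the minimum quantized value (-128).
        int minT = narrowRange ? -127 : -128;
        int maxT = 127;
        double fromMinSide = (minT * minRange > 0) ? minT / minRange : Double.MAX_VALUE;
        double fromMaxSide = (maxT * maxRange > 0) ? maxT / maxRange : Double.MAX_VALUE;
        return Math.min(fromMinSide, fromMaxSide);
    }

    static int quantize(double input, double minRange, double maxRange, boolean narrowRange) {
        int minT = narrowRange ? -127 : -128;
        double scale = scaleFactor(minRange, maxRange, narrowRange);
        // Clip to the adjusted range, then scale and round.
        double clipped = Math.min(127 / scale, Math.max(minT / scale, input));
        return (int) Math.round(clipped * scale);
    }

    public static void main(String[] args) {
        System.out.println(quantize(-10.0, -10.0, 9.0, false)); // -128
        System.out.println(quantize(-10.0, -10.0, 9.0, true));  // -127
    }
}
```

Under this assumption the quantized range becomes symmetric around zero, which is the property some inference backends rely on.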
axis (int) attribute
An optional `axis` attribute can specify a dimension index of the input tensor, such that quantization ranges will be calculated and applied separately for each slice of the tensor along that dimension. This is useful for per-channel quantization.
If `axis` is specified, min_range and max_range must be 1-D tensors whose size matches the `axis` dimension of the input and output tensors.
If `axis`=None, per-tensor quantization is performed as normal.
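Per-channel quantization can be sketched by applying a separate range to each slice along the chosen axis. The Java example below assumes a 2-D input laid out as [rows][channels] with axis = 1, int8 output, and the SCALED formula. The names and the choice of mode are illustrative assumptions rather than the op's implementation.

```java
// Illustrative sketch of per-channel (axis = 1) quantization; names are hypothetical.
class PerAxisSketch {
    // minRange[c] and maxRange[c] hold the range for channel c.
    static int[][] quantizePerChannel(double[][] input, double[] minRange, double[] maxRange) {
        final int minT = -128;
        final int maxT = 127;
        int[][] out = new int[input.length][];
        for (int i = 0; i < input.length; i++) {
            out[i] = new int[input[i].length];
            for (int c = 0; c < input[i].length; c++) {
                // Each channel gets its own scale factor, as in SCALED mode.
                double fromMin = (minT * minRange[c] > 0) ? minT / minRange[c] : Double.MAX_VALUE;
                double fromMax = (maxT * maxRange[c] > 0) ? maxT / maxRange[c] : Double.MAX_VALUE;
                double scale = Math.min(fromMin, fromMax);
                double clipped = Math.min(maxT / scale, Math.max(minT / scale, input[i][c]));
                out[i][c] = (int) Math.round(clipped * scale);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] input = {{1.0, 10.0}, {-1.0, -10.0}};
        int[][] q = quantizePerChannel(input, new double[] {-1.0, -10.0}, new double[] {1.0, 10.0});
        System.out.println(java.util.Arrays.deepToString(q));
    }
}
```

In this example the two channels differ in magnitude by a factor of ten, yet both use the full quantized range because each has its own scale.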
ensure_minimum_range (float) attribute
Ensures the minimum quantization range is at least this value. The legacy default value for this is 0.01, but it is strongly suggested to set it to 0 for new uses.
Nested Classes
class | Quantize.Options | Optional attributes for Quantize |
Constants
String | OP_NAME | The name of this op, as known by TensorFlow core engine |
Public Methods
Type | Method | Description |
---|---|---|
static Quantize.Options | axis(Long axis) | |
static <T extends TType> Quantize<T> | create(Scope scope, Operand<TFloat32> input, Operand<TFloat32> minRange, Operand<TFloat32> maxRange, Class<T> T, Options... options) | Factory method to create a class wrapping a new Quantize operation. |
static Quantize.Options | ensureMinimumRange(Float ensureMinimumRange) | |
static Quantize.Options | mode(String mode) | |
static Quantize.Options | narrowRange(Boolean narrowRange) | |
Output<T> | output() | The quantized data produced from the float input. |
Output<TFloat32> | outputMax() | The final quantization range maximum, used to clip input values before scaling and rounding them to quantized values. |
Output<TFloat32> | outputMin() | The final quantization range minimum, used to clip input values before scaling and rounding them to quantized values. |
static Quantize.Options | roundMode(String roundMode) | |
Constants
public static final String OP_NAME
The name of this op, as known by TensorFlow core engine
Public Methods
public static Quantize<T> create (Scope scope, Operand<TFloat32> input, Operand<TFloat32> minRange, Operand<TFloat32> maxRange, Class<T> T, Options... options)
Factory method to create a class wrapping a new Quantize operation.
Parameters
Parameter | Description |
---|---|
scope | current scope |
minRange | The minimum value of the quantization range. This value may be adjusted by the op depending on other parameters. The adjusted value is written to `output_min`. If the `axis` attribute is specified, this must be a 1-D tensor whose size matches the `axis` dimension of the input and output tensors. |
maxRange | The maximum value of the quantization range. This value may be adjusted by the op depending on other parameters. The adjusted value is written to `output_max`. If the `axis` attribute is specified, this must be a 1-D tensor whose size matches the `axis` dimension of the input and output tensors. |
options | carries optional attribute values |
Returns
- a new instance of Quantize
public Output<TFloat32> outputMax ()
The final quantization range maximum, used to clip input values before scaling and rounding them to quantized values. If the `axis` attribute is specified, this will be a 1-D tensor whose size matches the `axis` dimension of the input and output tensors.
public Output<TFloat32> outputMin ()
The final quantization range minimum, used to clip input values before scaling and rounding them to quantized values. If the `axis` attribute is specified, this will be a 1-D tensor whose size matches the `axis` dimension of the input and output tensors.