QuantizeDownAndShrinkRange

public final class QuantizeDownAndShrinkRange

Convert the quantized 'input' tensor into a lower-precision 'output', using the actual distribution of the values to maximize the usage of the lower bit depth and adjusting the output min and max ranges accordingly.

[input_min, input_max] are scalar floats that specify the range for the float interpretation of the 'input' data. For example, if input_min is -1.0f and input_max is 1.0f, and we are dealing with quint16 quantized data, then a 0 value in the 16-bit data should be interpreted as -1.0f, and a 65535 means 1.0f.
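
The interpretation described above can be sketched in plain Java. This is a minimal illustration that assumes a simple linear mapping from quantized codes to floats; the class name QuantizedRangeSketch and the helper toFloat are hypothetical and are not part of the TensorFlow API, and the real kernel may round differently.

```java
final class QuantizedRangeSketch {
    // Hypothetical helper: maps a quantized code in [0, maxCode]
    // linearly onto the float range [min, max].
    public static float toFloat(int code, int maxCode, float min, float max) {
        return min + (max - min) * ((float) code / maxCode);
    }

    public static void main(String[] args) {
        int maxCode = 65535; // largest code a quint16 value can hold
        System.out.println(toFloat(0, maxCode, -1.0f, 1.0f));     // prints -1.0
        System.out.println(toFloat(65535, maxCode, -1.0f, 1.0f)); // prints 1.0
    }
}
```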

This operator tries to squeeze as much precision as possible into an output with a lower bit depth by calculating the actual min and max values found in the data. For example, suppose the quint16 input has no values lower than 16,384 and none higher than 49,152. Then only half the range is actually needed, and all the float interpretations lie between approximately -0.5f and 0.5f. If we want to compress the data into a quint8 output, we can use that range rather than the theoretical -1.0f to 1.0f suggested by the input min and max.
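
The range-shrinking idea in this example can be sketched as follows. This is an illustrative approximation only, assuming a linear mapping and round-to-nearest; the class ShrinkRangeSketch and its helpers toFloat, actualRange and requantizeTo8Bit are hypothetical names, not the operator's actual implementation.

```java
final class ShrinkRangeSketch {
    // Hypothetical helper: linear mapping of a code in [0, maxCode] onto [min, max].
    public static float toFloat(int code, int maxCode, float min, float max) {
        return min + (max - min) * ((float) code / maxCode);
    }

    // Scans the data for its smallest and largest codes and returns the
    // float range {actualMin, actualMax} they cover. Assumes codes is non-empty.
    public static float[] actualRange(int[] codes, int maxCode, float min, float max) {
        int lo = Integer.MAX_VALUE;
        int hi = Integer.MIN_VALUE;
        for (int c : codes) {
            lo = Math.min(lo, c);
            hi = Math.max(hi, c);
        }
        return new float[] {toFloat(lo, maxCode, min, max), toFloat(hi, maxCode, min, max)};
    }

    // Re-expresses one code as an 8-bit code (0..255) within the shrunken range.
    public static int requantizeTo8Bit(int code, int maxCode, float min, float max,
                                       float newMin, float newMax) {
        if (newMax == newMin) {
            return 0; // degenerate range: every value is identical
        }
        float value = toFloat(code, maxCode, min, max);
        float scaled = (value - newMin) / (newMax - newMin) * 255.0f;
        return Math.max(0, Math.min(255, Math.round(scaled)));
    }

    public static void main(String[] args) {
        int[] codes = {16384, 32768, 49152}; // quint16 data using only half the range
        float[] range = actualRange(codes, 65535, -1.0f, 1.0f);
        System.out.println("shrunken range: " + range[0] + " to " + range[1]);
        for (int c : codes) {
            System.out.println(c + " -> " + requantizeTo8Bit(c, 65535, -1.0f, 1.0f, range[0], range[1]));
        }
    }
}
```

Because the 8-bit codes are spread over the shrunken range of roughly -0.5f to 0.5f instead of the full -1.0f to 1.0f, each output step represents about half as large a float interval as it otherwise would.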

This is most useful for taking output from operations like QuantizedMatMul, which can produce higher bit-depth outputs than their inputs and may have large potential output ranges, but whose values in practice use only a small fraction of the possible range. By feeding that output into this operator, we can reduce it from 32 bits down to 8 with minimal loss of accuracy.

Constants

String OP_NAME The name of this op, as known by the TensorFlow core engine

Public Methods

static <U extends TType> QuantizeDownAndShrinkRange<U>	create(Scope scope, Operand<? extends TType> input, Operand<TFloat32> inputMin, Operand<TFloat32> inputMax, Class<U> outType) Factory method to create a class wrapping a new QuantizeDownAndShrinkRange operation.
Output<U>	output()
Output<TFloat32>	outputMax() The float value that the maximum quantized output value represents.
Output<TFloat32>	outputMin() The float value that the minimum quantized output value represents.

Inherited Methods

From class org.tensorflow.op.RawOp

final boolean	equals(Object obj)
final int	hashCode()
Operation	op() Return this unit of computation as a single `Operation`.
final String	toString()

From class java.lang.Object

boolean	equals(Object arg0)
final Class<?>	getClass()
int	hashCode()
final void	notify()
final void	notifyAll()
String	toString()
final void	wait(long arg0, int arg1)
final void	wait(long arg0)
final void	wait()

From interface org.tensorflow.op.Op

abstract ExecutionEnvironment	env() Return the execution environment this op was created in.
abstract Operation	op() Return this unit of computation as a single `Operation`.

Constants

public static final String OP_NAME

The name of this op, as known by the TensorFlow core engine

Constant Value: "QuantizeDownAndShrinkRange"

Public Methods

public static <U extends TType> QuantizeDownAndShrinkRange<U> create (Scope scope, Operand<? extends TType> input, Operand<TFloat32> inputMin, Operand<TFloat32> inputMax, Class<U> outType)

Factory method to create a class wrapping a new QuantizeDownAndShrinkRange operation.

Parameters

scope	current scope
inputMin	The float value that the minimum quantized input value represents.
inputMax	The float value that the maximum quantized input value represents.
outType	The type of the output. Should be a lower bit depth than the type of 'input' (Tinput).

Returns

a new instance of QuantizeDownAndShrinkRange

public Output<U> output ()

public Output<TFloat32> outputMax ()

The float value that the maximum quantized output value represents.

public Output<TFloat32> outputMin ()

The float value that the minimum quantized output value represents.
