View source on GitHub |
An offline converter for TF-TRT transformation for TF 2.0 SavedModels.
tf.experimental.tensorrt.Converter(
input_saved_model_dir=None,
input_saved_model_tags=None,
input_saved_model_signature_key=None,
use_dynamic_shape=None,
dynamic_shape_profile_strategy=None,
conversion_params=None
)
Currently this is not available on Windows platform.
There are several ways to run the conversion:
FP32/FP16 precision
params = tf.experimental.tensorrt.ConversionParams( precision_mode='FP16') converter = tf.experimental.tensorrt.Converter( input_saved_model_dir="my_dir", conversion_params=params) converter.convert() converter.save(output_saved_model_dir)
In this case, no TRT engines will be built or saved in the converted SavedModel. But if input data is available during conversion, we can still build and save the TRT engines to reduce the cost during inference (see option 2 below).
FP32/FP16 precision with pre-built engines
params = tf.experimental.tensorrt.ConversionParams( precision_mode='FP16', # Set this to a large enough number so it can cache all the engines. maximum_cached_engines=16) converter = tf.experimental.tensorrt.Converter( input_saved_model_dir="my_dir", conversion_params=params) converter.convert() # Define a generator function that yields input data, and use it to execute # the graph to build TRT engines. def my_input_fn(): for _ in range(num_runs): inp1, inp2 = ... yield inp1, inp2 converter.build(input_fn=my_input_fn) # Generate corresponding TRT engines converter.save(output_saved_model_dir) # Generated engines will be saved.
In this way, one engine will be built/saved for each unique input shapes of the TRTEngineOp. This is good for applications that cannot afford building engines during inference but have access to input data that is similar to the one used in production (for example, that has the same input shapes). Also, the generated TRT engines is platform dependent, so we need to run
build()
in an environment that is similar to production (e.g. with same type of GPU).INT8 precision and calibration with pre-built engines
params = tf.experimental.tensorrt.ConversionParams( precision_mode='INT8', # Currently only one INT8 engine is supported in this mode. maximum_cached_engines=1, use_calibration=True) converter = tf.experimental.tensorrt.Converter( input_saved_model_dir="my_dir", conversion_params=params) # Define a generator function that yields input data, and run INT8 # calibration with the data. All input data should have the same shape. # At the end of convert(), the calibration stats (e.g. range information) # will be saved and can be used to generate more TRT engines with different # shapes. Also, one TRT engine will be generated (with the same shape as # the calibration data) for save later. def my_calibration_input_fn(): for _ in range(num_runs): inp1, inp2 = ... yield inp1, inp2 converter.convert(calibration_input_fn=my_calibration_input_fn) # (Optional) Generate more TRT engines offline (same as the previous # option), to avoid the cost of generating them during inference. def my_input_fn(): for _ in range(num_runs): inp1, inp2 = ... yield inp1, inp2 converter.build(input_fn=my_input_fn) # Save the TRT engine and the engines. converter.save(output_saved_model_dir)
To use dynamic shape, we need to call the build method with an input function to generate profiles. This step is similar to the INT8 calibration step described above. The converter also needs to be created with use_dynamic_shape=True and one of the following profile_strategies for creating profiles based on the inputs produced by the input function:
Range
: create one profile that works for inputs with dimension values in the range of [min_dims, max_dims] where min_dims and max_dims are derived from the provided inputs.Optimal
: create one profile for each input. The profile only works for inputs with the same dimensions as the input it is created for. The GPU engine will be run with optimal performance with such inputs.Range+Optimal
: create the profiles for bothRange
andOptimal
.ImplicitBatchModeCompatible
: create the profiles that will produce the same GPU engines as the implicit_batch_mode would produce.
Raises | |
---|---|
ValueError
|
if the combination of the parameters is invalid. |
Methods
build
build(
input_fn
)
Run inference with converted graph in order to build TensorRT engines.
Args | |
---|---|
input_fn
|
a generator function that yields input data as a list or tuple,
which will be used to execute the converted signature to generate TRT
engines. Example:
def input_fn(): # Let's assume a network with 2 input tensors. We
generate 3 sets
# of dummy input data: input_shapes = [[(1, 16), (2, 16)], # 1st
input list [(2, 32), (4, 32)], # 2nd list of two tensors [(4,
32), (8, 32)]] # 3rd input list
for shapes in input_shapes: # return a list of input tensors yield
[np.zeros(x).astype(np.float32) for x in shapes]
|
Raises | |
---|---|
NotImplementedError
|
build() is already called. |
RuntimeError
|
the input_fx is None. |
convert
convert(
calibration_input_fn=None
)
Convert the input SavedModel in 2.0 format.
Args | |
---|---|
calibration_input_fn
|
a generator function that yields input data as a
list or tuple, which will be used to execute the converted signature for
calibration. All the returned input data should have the same shape.
Example: def input_fn(): yield input1, input2, input3
|
Raises | |
---|---|
ValueError
|
if the input combination is invalid. |
Returns | |
---|---|
The TF-TRT converted Function. |
save
save(
output_saved_model_dir
)
Save the converted SavedModel.
Args | |
---|---|
output_saved_model_dir
|
directory to saved the converted SavedModel. |