tfma.ExtractEvaluateAndWriteResults

PTransform for performing extraction, evaluation, and writing results.

Users who want to construct their own Beam pipelines instead of using the lightweight run_model_analysis functions should use this PTransform.

Example usage:

import apache_beam as beam
import tensorflow_model_analysis as tfma
from tfx_bsl.tfxio import tf_example_record

eval_config = tfma.EvalConfig(model_specs=[...], metrics_specs=[...],
                              slicing_specs=[...])
eval_shared_model = tfma.default_eval_shared_model(
    eval_saved_model_path=model_location, eval_config=eval_config)
tfx_io = tf_example_record.TFExampleRecord(
    file_pattern=data_location,
    raw_record_column_name=tfma.ARROW_INPUT_COLUMN)
with beam.Pipeline(runner=...) as p:
  _ = (p
       | 'ReadData' >> tfx_io.BeamSource()
       | 'ExtractEvaluateAndWriteResults' >>
       tfma.ExtractEvaluateAndWriteResults(
           eval_shared_model=eval_shared_model,
           eval_config=eval_config,
           ...))
result = tfma.load_eval_result(output_path=output_path)
tfma.view.render_slicing_metrics(result)

Note: If running with an EvalSavedModel (i.e. the ModelSpec has signature_name
"eval"), then instead of reading the data with tfxio.BeamSource() as in the
example above, use beam.io.ReadFromTFRecord(data_location), as shown in the
sketch below.
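
For illustration, only the read step of the pipeline above changes. This is a
minimal sketch mirroring the example, assuming data_location points at
TFRecord files of serialized examples:

with beam.Pipeline(runner=...) as p:
  _ = (p
       # Read raw serialized records directly; the EvalSavedModel ("eval"
       # signature) path does not use Arrow RecordBatch decoding.
       | 'ReadData' >> beam.io.ReadFromTFRecord(data_location)
       | 'ExtractEvaluateAndWriteResults' >>
       tfma.ExtractEvaluateAndWriteResults(
           eval_shared_model=eval_shared_model,
           eval_config=eval_config,
           ...))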

Note that the exact serialization format is an internal implementation detail and subject to change. Users should only use the TFMA functions to write and read the results.

Args:

examples: PCollection of input examples or Arrow RecordBatches. Examples can be in any format the model accepts (e.g. a string containing a CSV row, TensorFlow.Example, etc). If the examples are in the form of a dict, the input is assumed to already be in the form of tfma.Extracts with the examples stored under tfma.INPUT_KEY (any other keys are passed along unchanged to downstream extractors and evaluators).
eval_shared_model: Optional shared model (single-model evaluation) or list of shared models (multi-model evaluation). Only required if needed by the default extractors, evaluators, or writers, or to display the model path.
eval_config: Eval config.
extractors: Optional list of Extractors to apply to Extracts. Typically these will be added by calling the default_extractors function. If no extractors are provided, default_extractors (non-materialized) will be used. A sketch of passing these components explicitly appears after this argument list.
evaluators: Optional list of Evaluators for evaluating Extracts. Typically these will be added by calling the default_evaluators function. If no evaluators are provided, default_evaluators will be used.
writers: Optional list of Writers for writing Evaluation output. Typically these will be added by calling the default_writers function. If no writers are provided, default_writers will be used.
output_path: Path to write results to (config file, metrics, plots, etc).
display_only_data_location: Optional path indicating where the examples were read from. This is used only for display purposes; data will not actually be read from this path.
display_only_file_format: Optional format of the examples. This is used only for display purposes.
slice_spec: Deprecated (use EvalConfig).
write_config: Deprecated (use EvalConfig).
compute_confidence_intervals: Deprecated (use EvalConfig).
min_slice_size: Deprecated (use EvalConfig).
random_seed_for_testing: Provide for deterministic tests only.
tensor_adapter_config: Tensor adapter config which specifies how to obtain tensors from the Arrow RecordBatch. If None, an attempt will be made to create the tensors using default TensorRepresentations. A sketch of constructing this config appears after this argument list.
schema: A schema to use for customizing evaluators.
config_version: Optional config version for this evaluation. This should not be explicitly set by users. It is only intended to be used in cases where the provided eval_config was generated internally, and thus is not a reliable indicator of user intent.
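
For extractors, evaluators, and writers, the following is a minimal sketch of
passing the default components explicitly (equivalent to omitting these
arguments); the keyword arguments shown are assumptions based on the defaults
described above, with examples being the input PCollection:

# Hypothetical sketch: build the default components and pass them in.
extractors = tfma.default_extractors(
    eval_config=eval_config, eval_shared_model=eval_shared_model)
evaluators = tfma.default_evaluators(
    eval_config=eval_config, eval_shared_model=eval_shared_model)
writers = tfma.default_writers(
    output_path=output_path,
    eval_config=eval_config,
    eval_shared_model=eval_shared_model)

_ = (examples
     | 'ExtractEvaluateAndWriteResults' >>
     tfma.ExtractEvaluateAndWriteResults(
         eval_shared_model=eval_shared_model,
         eval_config=eval_config,
         extractors=extractors,
         evaluators=evaluators,
         writers=writers,
         output_path=output_path))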
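
For tensor_adapter_config, this is a sketch of deriving the config from the
tfx_io source in the example above, assuming tfx_bsl's tensor_adapter module
and a TFXIO built with TensorRepresentations:

from tfx_bsl.tfxio import tensor_adapter

# Tell extractors how to convert Arrow RecordBatch columns back into the
# tensors the model expects.
tensor_adapter_config = tensor_adapter.TensorAdapterConfig(
    arrow_schema=tfx_io.ArrowSchema(),
    tensor_representations=tfx_io.TensorRepresentations())

The resulting config would then be passed as tensor_adapter_config to
tfma.ExtractEvaluateAndWriteResults.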

Raises:

ValueError: If the EvalConfig is invalid or a matching Extractor is not found for an Evaluator.

Returns:

A dict of writer results keyed by the writer stage name.