API for detecting feature skew between training and serving examples.
tfdv.DetectFeatureSkew(
    identifier_features: List[types.FeatureName],
    features_to_ignore: Optional[List[types.FeatureName]] = None,
    sample_size: int = 0,
    float_round_ndigits: Optional[int] = None,
    allow_duplicate_identifiers: bool = False
) -> None
Example:
import apache_beam as beam
import tensorflow as tf
import tensorflow_data_validation as tfdv

with beam.Pipeline(runner=...) as p:
    # Read training and serving examples as tf.train.Example protos.
    training_examples = (
        p
        | 'ReadTrainingData' >> beam.io.ReadFromTFRecord(
            training_filepaths,
            coder=beam.coders.ProtoCoder(tf.train.Example)))
    serving_examples = (
        p
        | 'ReadServingData' >> beam.io.ReadFromTFRecord(
            serving_filepaths,
            coder=beam.coders.ProtoCoder(tf.train.Example)))
    # Compare the two inputs, using 'id1' as the identifier feature.
    _ = (
        (training_examples, serving_examples)
        | 'DetectFeatureSkew' >> tfdv.DetectFeatureSkew(
            identifier_features=['id1'], sample_size=5)
        | 'WriteFeatureSkewResultsOutput' >>
        tfdv.WriteFeatureSkewResultsToTFRecord(output_path)
        | 'WriteFeatureSkewPairsOutput' >>
        tfdv.WriteFeatureSkewPairsToTFRecord(output_path))
See the documentation for DetectFeatureSkewImpl for more detail about feature skew detection.
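To make the parameters more concrete, here is a minimal pure-Python sketch of the general idea. It is not TFDV's implementation. It assumes, from the parameter names alone, that `identifier_features` act as the join key between a training example and its serving counterpart, that `features_to_ignore` are excluded from comparison, and that `float_round_ndigits` rounds float values before they are compared. The function `detect_skew_sketch`, the `Example` alias, and the plain-dict representation of examples are hypothetical and exist only for illustration. `sample_size` and `allow_duplicate_identifiers` are left out, and identifiers are assumed to be unique.

```python
from typing import Any, Dict, List, Optional

# Hypothetical stand-in for a tf.train.Example: feature name -> value.
Example = Dict[str, Any]


def _normalize(value: Any, ndigits: Optional[int]) -> Any:
    """Round floats so tiny numeric differences do not count as skew."""
    if ndigits is not None and isinstance(value, float):
        return round(value, ndigits)
    return value


def detect_skew_sketch(
    training: List[Example],
    serving: List[Example],
    identifier_features: List[str],
    features_to_ignore: Optional[List[str]] = None,
    float_round_ndigits: Optional[int] = None,
) -> Dict[str, Dict[str, int]]:
    """Count, per feature, how many joined example pairs match or mismatch."""
    skipped = set(identifier_features) | set(features_to_ignore or [])

    def key(example: Example) -> tuple:
        return tuple(example.get(name) for name in identifier_features)

    # Index serving examples by identifier (assumes identifiers are unique).
    serving_by_key = {key(ex): ex for ex in serving}

    counts: Dict[str, Dict[str, int]] = {}
    for train_ex in training:
        serve_ex = serving_by_key.get(key(train_ex))
        if serve_ex is None:
            continue  # No serving counterpart to compare against.
        for name in (set(train_ex) | set(serve_ex)) - skipped:
            stats = counts.setdefault(name, {"match": 0, "mismatch": 0})
            same = (
                name in train_ex
                and name in serve_ex
                and _normalize(train_ex[name], float_round_ndigits)
                == _normalize(serve_ex[name], float_round_ndigits)
            )
            stats["match" if same else "mismatch"] += 1
    return counts


training = [
    {"id1": "a", "price": 1.0001, "color": "red"},
    {"id1": "b", "price": 2.0, "color": "blue"},
]
serving = [
    {"id1": "a", "price": 1.0002, "color": "red"},
    {"id1": "b", "price": 2.0, "color": "green"},
]

result = detect_skew_sketch(
    training, serving, identifier_features=["id1"], float_round_ndigits=3
)
```

With rounding to three digits, the two `price` values for `id1 == "a"` compare equal, so only `color` reports a mismatch. Without rounding, the same pair would count as a `price` mismatch.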
Class Variables | |
---|---|
pipeline | None |