Infers schema from the input statistics.
tfdv.infer_schema(
    statistics: statistics_pb2.DatasetFeatureStatisticsList,
    infer_feature_shape: bool = True,
    max_string_domain_size: int = 100,
    schema_transformations: Optional[List[Callable[[schema_pb2.Schema,
        statistics_pb2.DatasetFeatureStatistics], schema_pb2.Schema]]] = None
) -> schema_pb2.Schema
Args |
statistics
|
A DatasetFeatureStatisticsList protocol buffer. Schema inference
is currently supported only for lists with a single
DatasetFeatureStatistics proto or lists with multiple
DatasetFeatureStatistics protos corresponding to data slices that include
the default slice (i.e., the slice with all examples). If a list with
multiple DatasetFeatureStatistics protos is used, this function will infer
the schema from the statistics corresponding to the default slice.
|
infer_feature_shape
|
A boolean indicating whether the shapes of the features should
be inferred from the statistics.
|
max_string_domain_size
|
Maximum number of unique values a string feature may have
and still be interpreted as a categorical feature.
|
schema_transformations
|
List of transformation functions to apply to the
auto-inferred schema. Each transformation function should take the
schema and statistics as input and should return the transformed schema.
The transformations are applied in the order provided in the list.
|
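As a rough illustration of the ordering rule described for schema_transformations, the sketch below applies a list of callables in sequence, passing each result to the next. It is not TFDV's implementation: plain dictionaries stand in for the Schema and DatasetFeatureStatistics protos, and `apply_transformations`, `drop_constant_features`, and `mark_all_required` are hypothetical names introduced only for this example.

```python
from typing import Any, Callable, Dict, List, Optional

# Plain dicts stand in for schema_pb2.Schema and
# statistics_pb2.DatasetFeatureStatistics (an assumption for illustration).
Schema = Dict[str, Any]
Stats = Dict[str, Any]
Transformation = Callable[[Schema, Stats], Schema]


def apply_transformations(
    schema: Schema,
    statistics: Stats,
    transformations: Optional[List[Transformation]] = None,
) -> Schema:
    """Applies each transformation in list order, feeding each result to the next."""
    for transform in transformations or []:
        schema = transform(schema, statistics)
    return schema


def drop_constant_features(schema: Schema, statistics: Stats) -> Schema:
    """Hypothetical transformation: drops features with a single unique value."""
    return {
        name: spec
        for name, spec in schema.items()
        if statistics[name]["unique"] > 1
    }


def mark_all_required(schema: Schema, statistics: Stats) -> Schema:
    """Hypothetical transformation: flags every remaining feature as required."""
    return {name: {**spec, "required": True} for name, spec in schema.items()}


# Toy inputs: two string features, one of which never varies.
schema = {"country": {"type": "BYTES"}, "version": {"type": "BYTES"}}
stats = {"country": {"unique": 12}, "version": {"unique": 1}}

result = apply_transformations(
    schema, stats, [drop_constant_features, mark_all_required]
)
```

With real protos, each callable would receive and return a schema_pb2.Schema instead of a dictionary; the sequencing idea is the same.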
Returns |
A Schema protocol buffer.
|
Raises |
TypeError
|
If the input argument is not of the expected type.
|
ValueError
|
If the input statistics proto contains multiple datasets, none
of which corresponds to the default slice.
|
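The slice-selection behaviour described for the statistics argument and the ValueError case can be sketched as follows. This is a simplified model under stated assumptions, not the library's code: `DatasetStats` is a stand-in for a DatasetFeatureStatistics proto, `pick_default_slice` is a hypothetical helper, and the default slice is assumed to be identified by the name "All Examples".

```python
from dataclasses import dataclass
from typing import List

# Assumed name of the default slice; verify against your TFDV version.
DEFAULT_SLICE_NAME = "All Examples"


@dataclass
class DatasetStats:
    """Stand-in for a DatasetFeatureStatistics proto (illustration only)."""
    name: str
    num_examples: int


def pick_default_slice(datasets: List[DatasetStats]) -> DatasetStats:
    """Returns the dataset that schema inference would use in this model."""
    # A single dataset is used as-is, whatever its name.
    if len(datasets) == 1:
        return datasets[0]
    # Otherwise, look for the dataset that covers all examples.
    for dataset in datasets:
        if dataset.name == DEFAULT_SLICE_NAME:
            return dataset
    raise ValueError("No dataset corresponds to the default slice.")


sliced = [DatasetStats("country_US", 40), DatasetStats("All Examples", 100)]
chosen = pick_default_slice(sliced)
```

Here `chosen` is the 100-example dataset, while a list containing only non-default slices would raise ValueError, mirroring the error condition documented above.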