tfdv.infer_schema
Infers schema from the input statistics.
tfdv.infer_schema(
    statistics: statistics_pb2.DatasetFeatureStatisticsList,
    infer_feature_shape: bool = True,
    max_string_domain_size: int = 100,
    schema_transformations: Optional[List[Callable[[schema_pb2.Schema, statistics_pb2.DatasetFeatureStatistics], schema_pb2.Schema]]] = None
) -> schema_pb2.Schema
Args |
statistics | A DatasetFeatureStatisticsList protocol buffer. Schema inference is currently supported only for lists with a single DatasetFeatureStatistics proto, or for lists with multiple DatasetFeatureStatistics protos corresponding to data slices that include the default slice (i.e., the slice with all examples). If a list with multiple DatasetFeatureStatistics protos is used, this function infers the schema from the statistics corresponding to the default slice. |
infer_feature_shape | A boolean indicating whether the shape of the features needs to be inferred from the statistics. |
max_string_domain_size | Maximum size of the domain of a string feature for it to be interpreted as a categorical feature. |
schema_transformations | List of transformation functions to apply to the auto-inferred schema. Each transformation function should take the schema and statistics as input and return the transformed schema. The transformations are applied in the order provided in the list. |
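The ordering contract described for schema_transformations can be sketched without TFDV installed. This is only an illustration of "each function takes the schema and statistics and returns a schema, applied in list order"; it is not TFDV's implementation. The plain dicts standing in for schema_pb2.Schema and statistics_pb2.DatasetFeatureStatistics, the helper apply_transformations, and both example transformations are hypothetical names introduced here.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-ins: plain dicts replace schema_pb2.Schema and
# statistics_pb2.DatasetFeatureStatistics so the sketch runs without TFDV.
Schema = Dict[str, str]        # feature name -> type label
Statistics = Dict[str, int]    # feature name -> number of observed values
Transformation = Callable[[Schema, Statistics], Schema]


def apply_transformations(
    schema: Schema,
    statistics: Statistics,
    transformations: Optional[List[Transformation]] = None,
) -> Schema:
    """Applies each transformation in list order, feeding each result forward."""
    for transform in transformations or []:
        schema = transform(schema, statistics)
    return schema


def mark_ids_as_bytes(schema: Schema, statistics: Statistics) -> Schema:
    # Hypothetical transformation: retype features whose name ends in "_id".
    return {
        name: ("BYTES" if name.endswith("_id") else ftype)
        for name, ftype in schema.items()
    }


def drop_empty_features(schema: Schema, statistics: Statistics) -> Schema:
    # Hypothetical transformation: drop features with no observed values.
    return {name: ftype for name, ftype in schema.items()
            if statistics.get(name, 0) > 0}


inferred = {"user_id": "INT", "age": "INT", "notes": "BYTES"}
stats = {"user_id": 10, "age": 10, "notes": 0}

result = apply_transformations(
    inferred, stats, [mark_ids_as_bytes, drop_empty_features])
print(result)
```

With real protos, each function would receive a schema_pb2.Schema and a statistics_pb2.DatasetFeatureStatistics instead of dicts, as the signature above indicates.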
Returns |
A Schema protocol buffer. |
Raises |
TypeError | If the input argument is not of the expected type. |
ValueError | If the input statistics proto contains multiple datasets, none of which corresponds to the default slice. |
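The dataset-selection rule described for the statistics argument, together with the ValueError condition, can be sketched as follows. This is a minimal illustration under stated assumptions, not TFDV's code: DatasetStats is a hypothetical stand-in for DatasetFeatureStatistics, select_dataset_for_inference is a hypothetical helper, and the sketch assumes the default slice is identified by the dataset name "All Examples", which this page does not state.

```python
from dataclasses import dataclass
from typing import List

# Assumption: the default slice is recognised by this dataset name.
DEFAULT_SLICE_NAME = "All Examples"


@dataclass
class DatasetStats:
    """Hypothetical stand-in for a DatasetFeatureStatistics proto."""
    name: str
    num_examples: int


def select_dataset_for_inference(datasets: List[DatasetStats]) -> DatasetStats:
    """Picks the dataset whose statistics would drive schema inference."""
    # A single dataset is used as-is, whatever its name.
    if len(datasets) == 1:
        return datasets[0]
    # With several sliced datasets, only the default slice is used.
    for dataset in datasets:
        if dataset.name == DEFAULT_SLICE_NAME:
            return dataset
    raise ValueError("No dataset corresponds to the default slice.")


sliced = [
    DatasetStats(name="country_US", num_examples=40),
    DatasetStats(name="All Examples", num_examples=100),
    DatasetStats(name="country_CA", num_examples=60),
]
chosen = select_dataset_for_inference(sliced)
print(chosen.name, chosen.num_examples)
```

Passing only the two country slices from this list would raise ValueError, mirroring the condition in the Raises table.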
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2024-10-18 UTC.