Compute data statistics for the input pandas DataFrame.
tfdv.generate_statistics_from_dataframe(
    dataframe: DataFrame,
    stats_options: tfdv.StatsOptions = options.StatsOptions(),
    n_jobs: int = 1
) -> statistics_pb2.DatasetFeatureStatisticsList
This is a utility function for users with in-memory data represented as a pandas DataFrame.
This function supports only DataFrames with columns of primitive string or numeric types. DataFrames with multivalent features or holding non-string object types are not supported.
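The type restriction above can be checked before calling the function. The sketch below is only an approximation of that restriction, not TFDV's own validation logic: `unsupported_columns` is a hypothetical helper name, and it assumes a column is acceptable if its dtype is numeric or if every non-null value is a Python `str`.

```python
import pandas as pd
from pandas.api import types as pdt


def unsupported_columns(df: pd.DataFrame) -> list:
    """Return names of columns that are neither numeric nor all-string.

    Hypothetical pre-check; it approximates the documented restriction
    and is not part of the TFDV API.
    """
    bad = []
    for name, col in df.items():
        # Numeric dtypes (ints, floats, and so on) are treated as supported.
        if pdt.is_numeric_dtype(col.dtype):
            continue
        # Otherwise accept the column only if every non-null value is a str.
        if all(isinstance(value, str) for value in col.dropna()):
            continue
        bad.append(name)
    return bad


df = pd.DataFrame(
    {
        "age": [31, 45, 27],              # numeric column
        "city": ["Oslo", "Lima", None],   # string column with a missing value
        "tags": [["a", "b"], ["c"], []],  # multivalent (list-valued) column
    }
)

print(unsupported_columns(df))
```

Here the list-valued `tags` column is the only one reported, so it would need to be dropped or flattened before computing statistics.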
Args | |
---|---|
dataframe | Input pandas DataFrame. |
stats_options | tfdv.StatsOptions for generating data statistics. |
n_jobs | Number of processes to run (defaults to 1). If -1 is provided, uses the same number of processes as the number of CPU cores. |
Returns | |
---|---|
A DatasetFeatureStatisticsList proto. |
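A minimal usage sketch, assuming `tensorflow_data_validation` is installed and importable as `tfdv`. The wrapper name `compute_statistics` and the sample data are illustrative only; the keyword arguments follow the signature documented above.

```python
import pandas as pd


def compute_statistics(df: pd.DataFrame, n_jobs: int = 1):
    """Hypothetical wrapper around tfdv.generate_statistics_from_dataframe."""
    # Imported inside the function so the sample DataFrame below can be
    # built and inspected even where TFDV is not installed.
    import tensorflow_data_validation as tfdv

    return tfdv.generate_statistics_from_dataframe(
        dataframe=df,
        stats_options=tfdv.StatsOptions(),  # default statistics options
        n_jobs=n_jobs,                      # -1 uses one process per CPU core
    )


# Only primitive numeric and string columns, as the function requires.
df = pd.DataFrame(
    {
        "age": [31, 45, 27],
        "city": ["Oslo", "Lima", "Kyiv"],
    }
)
```

Calling `stats = compute_statistics(df)` should return the `DatasetFeatureStatisticsList` proto; its `datasets` field holds one entry per dataset, so `stats.datasets[0].num_examples` gives the number of rows that were processed.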