TFDV checks for anomalies by comparing a schema against one or more statistics protos. The following chart lists the anomaly types that TFDV can detect, the schema and statistics fields that are used to detect each anomaly type, and the condition(s) under which each anomaly type is detected.
BOOL_TYPE_BIG_INT
- Schema Fields:
feature.bool_domain
- Statistics Fields:
features.num_stats.max
features.type
- Detection Condition:
feature.bool_domain is specified and
features.type == INT and
features.num_stats.max > 1
BOOL_TYPE_BYTES_NOT_INT
- Anomaly type not detected in TFDV
BOOL_TYPE_BYTES_NOT_STRING
- Anomaly type not detected in TFDV
BOOL_TYPE_FLOAT_NOT_INT
- Anomaly type not detected in TFDV
BOOL_TYPE_FLOAT_NOT_STRING
- Anomaly type not detected in TFDV
BOOL_TYPE_INT_NOT_STRING
- Anomaly type not detected in TFDV
BOOL_TYPE_SMALL_INT
- Schema Fields:
feature.bool_domain
- Statistics Fields:
features.num_stats.min
features.type
- Detection Condition:
features.type == INT and
feature.bool_domain is specified and
features.num_stats.min < 0
BOOL_TYPE_STRING_NOT_INT
- Anomaly type not detected in TFDV
BOOL_TYPE_UNEXPECTED_STRING
- Schema Fields:
feature.bool_domain
- Statistics Fields:
features.string_stats.rank_histogram*
- Detection Condition:
features.type == STRING and
feature.bool_domain is specified and
at least one value in rank_histogram* is not feature.bool_domain.true_value or feature.bool_domain.false_value
BOOL_TYPE_UNEXPECTED_FLOAT
- Schema Fields:
feature.bool_domain
- Statistics Fields:
features.num_stats.min
features.num_stats.max
features.num_stats.histograms.num_nan
features.num_stats.histograms.buckets.low_value
features.num_stats.histograms.buckets.high_value
features.type
- Detection Condition:
features.type == FLOAT and
feature.bool_domain is specified and either
(features.num_stats.min != 0 and features.num_stats.min != 1) or
(features.num_stats.max != 0 and features.num_stats.max != 1) or
features.num_stats.histograms.num_nan > 0 or
(features.num_stats.histograms.buckets.low_value != 0 or features.num_stats.histograms.buckets.high_value != 1) and features.num_stats.histograms.buckets.sample_count > 0
BOOL_TYPE_INVALID_CONFIG
- Schema Fields:
feature.bool_domain
- Statistics Fields:
features.type
- Detection Condition:
If features.type == INT or FLOAT, feature.bool_domain is specified and feature.bool_domain.true_value or feature.bool_domain.false_value is specified; or
if features.type == STRING, feature.bool_domain is specified and feature.bool_domain.true_value and feature.bool_domain.false_value are not specified
ENUM_TYPE_BYTES_NOT_STRING
- Anomaly type not detected in TFDV
ENUM_TYPE_FLOAT_NOT_STRING
- Anomaly type not detected in TFDV
ENUM_TYPE_INT_NOT_STRING
- Anomaly type not detected in TFDV
ENUM_TYPE_INVALID_UTF8
- Statistics Fields:
features.string_stats.invalid_utf8_count
- Detection Condition:
invalid_utf8_count > 0
ENUM_TYPE_UNEXPECTED_STRING_VALUES
- Schema Fields:
string_domain and feature.domain; or feature.string_domain
feature.distribution_constraints.min_domain_mass
- Statistics Fields:
features.string_stats.rank_histogram*
- Detection Condition:
Either (number of values in rank_histogram* that are not in domain / total number of values) > (1 - feature.distribution_constraints.min_domain_mass) or
feature.distribution_constraints.min_domain_mass == 1.0 and there are values in the histogram that are not in the domain
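The ENUM_TYPE_UNEXPECTED_STRING_VALUES condition can be sketched in plain Python. This is only an illustration under assumptions: the function name `has_unexpected_string_values` is hypothetical, and a dict of value counts stands in for rank_histogram; it does not call TFDV or reproduce its implementation.

```python
def has_unexpected_string_values(value_counts, domain, min_domain_mass=1.0):
    """Return True if the counts would trip the condition described above.

    value_counts: hypothetical stand-in for rank_histogram, {value: count}.
    domain: the allowed string values from the schema.
    min_domain_mass: stand-in for distribution_constraints.min_domain_mass.
    """
    total = sum(value_counts.values())
    if total == 0:
        return False  # assumption of this sketch: nothing observed, nothing flagged
    allowed = set(domain)
    out_of_domain = sum(c for v, c in value_counts.items() if v not in allowed)
    if min_domain_mass == 1.0:
        # Any value outside the domain is enough.
        return out_of_domain > 0
    return out_of_domain / total > 1 - min_domain_mass


counts = {"cat": 90, "dog": 8, "emu": 2}
strict = has_unexpected_string_values(counts, ["cat", "dog"])
lenient = has_unexpected_string_values(counts, ["cat", "dog"], min_domain_mass=0.95)
```

With the default mass of 1.0 the two "emu" values are flagged; with a mass of 0.95 the 2% of out-of-domain values stays under the 5% allowance.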
FEATURE_TYPE_HIGH_NUMBER_VALUES
- Schema Fields:
feature.value_count.max
feature.value_counts.value_count.max
- Statistics Fields:
features.common_stats.max_num_values
features.common_stats.presence_and_valency_stats.max_num_values
- Detection Condition:
If feature.value_count.max is specified, features.common_stats.max_num_values > feature.value_count.max; or
if feature.value_counts is specified, feature.value_counts.value_count.max < features.common_stats.presence_and_valency_stats.max_num_values at a given nestedness level
FEATURE_TYPE_LOW_FRACTION_PRESENT
- Schema Fields:
feature.presence.min_fraction
- Statistics Fields:
features.common_stats.num_non_missing*
num_examples*
- Detection Condition:
feature.presence.min_fraction is specified and (features.common_stats.num_non_missing* / num_examples*) < feature.presence.min_fraction or
feature.presence.min_fraction == 1.0 and common_stats.num_missing != 0
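A minimal sketch of the FEATURE_TYPE_LOW_FRACTION_PRESENT condition, using plain numbers in place of the statistics proto. The function name `low_fraction_present` and its arguments are hypothetical, and the handling of an empty dataset is an assumption of the sketch rather than documented TFDV behavior.

```python
def low_fraction_present(num_non_missing, num_missing, num_examples, min_fraction):
    """Mirror the two branches of the detection condition."""
    if min_fraction == 1.0:
        # When every example must carry the feature, check the missing count
        # directly instead of comparing a ratio.
        return num_missing != 0
    if num_examples == 0:
        return False  # assumption: no examples means no fraction to compare
    return num_non_missing / num_examples < min_fraction


mostly_present = low_fraction_present(99, 1, 100, min_fraction=0.95)
required_everywhere = low_fraction_present(99, 1, 100, min_fraction=1.0)
```

The same statistics pass a 0.95 requirement but fail a 1.0 requirement, because a single missing example is enough for the second branch.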
FEATURE_TYPE_LOW_NUMBER_PRESENT
- Schema Fields:
feature.presence.min_count
- Statistics Fields:
features.common_stats.num_non_missing*
- Detection Condition:
feature.presence.min_count is specified and either
features.common_stats.num_non_missing* == 0 or
features.common_stats.num_non_missing* < feature.presence.min_count
FEATURE_TYPE_LOW_NUMBER_VALUES
- Schema Fields:
feature.value_count.min
feature.value_counts.value_count.min
- Statistics Fields:
features.common_stats.min_num_values
features.common_stats.presence_and_valency_stats.min_num_values
- Detection Condition:
If feature.value_count.min is specified, features.common_stats.min_num_values < feature.value_count.min; or
if feature.value_counts is specified, features.common_stats.presence_and_valency_stats.min_num_values < feature.value_counts.value_count.min at a given nestedness level
FEATURE_TYPE_NOT_PRESENT
- Schema Fields:
feature.in_environment or feature.not_in_environment or schema.default_environment
feature.lifecycle_stage
feature.presence.min_count or feature.presence.min_fraction
- Statistics Fields:
features.common_stats.num_non_missing*
- Detection Condition:
feature.lifecycle_stage not in [PLANNED, ALPHA, DEBUG, DEPRECATED] and
common_stats.num_non_missing* == 0 and
(feature.presence.min_count > 0 or feature.presence.min_fraction > 0) and either
feature.in_environment == current environment or
feature.not_in_environment != current environment or
schema.default_environment != current environment
FEATURE_TYPE_NO_VALUES
- Anomaly type not detected in TFDV
FEATURE_TYPE_UNEXPECTED_REPEATED
- Anomaly type not detected in TFDV
FEATURE_TYPE_HIGH_UNIQUE
- Schema Fields:
feature.unique_constraints.max
- Statistics Fields:
features.string_stats.unique
- Detection Condition:
features.string_stats.unique > feature.unique_constraints.max
FEATURE_TYPE_LOW_UNIQUE
- Schema Fields:
feature.unique_constraints.min
- Statistics Fields:
features.string_stats.unique
- Detection Condition:
features.string_stats.unique < feature.unique_constraints.min
FEATURE_TYPE_NO_UNIQUE
- Schema Fields:
feature.unique_constraints
- Statistics Fields:
features.string_stats.unique
- Detection Condition:
feature.unique_constraints is specified but no features.string_stats.unique is present (as is the case where the feature is not a string or categorical)
FLOAT_TYPE_BIG_FLOAT
- Schema Fields:
feature.float_domain.max
- Statistics Fields:
features.type
features.num_stats.max or features.string_stats.rank_histogram
- Detection Condition:
If features.type == FLOAT, features.num_stats.max > feature.float_domain.max; or
if features.type == BYTES or STRING, maximum value in features.string_stats.rank_histogram (when converted to float) > feature.float_domain.max
FLOAT_TYPE_NOT_FLOAT
- Anomaly type not detected in TFDV
FLOAT_TYPE_SMALL_FLOAT
- Schema Fields:
feature.float_domain.min
- Statistics Fields:
features.type
features.num_stats.min or features.string_stats.rank_histogram
- Detection Condition:
If features.type == FLOAT, features.num_stats.min < feature.float_domain.min; or
if features.type == BYTES or STRING, minimum value in features.string_stats.rank_histogram (when converted to float) < feature.float_domain.min
FLOAT_TYPE_STRING_NOT_FLOAT
- Schema Fields:
feature.float_domain
- Statistics Fields:
features.type
features.string_stats.rank_histogram
- Detection Condition:
features.type == BYTES or STRING and
features.string_stats.rank_histogram has at least one value that cannot be converted to a float
FLOAT_TYPE_NON_STRING
- Anomaly type not detected in TFDV
FLOAT_TYPE_UNKNOWN_TYPE_NUMBER
- Anomaly type not detected in TFDV
FLOAT_TYPE_HAS_NAN
- Schema Fields:
feature.float_domain.disallow_nan
- Statistics Fields:
features.type
features.num_stats.histograms.num_nan
- Detection Condition:
float_domain.disallow_nan is true and
features.num_stats.histograms.num_nan > 0
FLOAT_TYPE_HAS_INF
- Schema Fields:
feature.float_domain.disallow_inf
- Statistics Fields:
features.type
features.num_stats.min
features.num_stats.max
- Detection Condition:
features.type == FLOAT and
float_domain.disallow_inf is true and either
features.num_stats.min == inf/-inf or
features.num_stats.max == inf/-inf
INT_TYPE_BIG_INT
- Schema Fields:
feature.int_domain.max
- Statistics Fields:
features.type
features.num_stats.max
features.string_stats.rank_histogram
- Detection Condition:
If features.type == INT, features.num_stats.max > feature.int_domain.max; or
if features.type == BYTES or STRING, maximum value in features.string_stats.rank_histogram (when converted to int) > feature.int_domain.max
INT_TYPE_INT_EXPECTED
- Anomaly type not detected in TFDV
INT_TYPE_NOT_INT_STRING
- Schema Fields:
feature.int_domain
- Statistics Fields:
features.type
features.string_stats.rank_histogram
- Detection Condition:
features.type == BYTES or STRING and
features.string_stats.rank_histogram has at least one value that cannot be converted to an int
INT_TYPE_NOT_STRING
- Anomaly type not detected in TFDV
INT_TYPE_SMALL_INT
- Schema Fields:
feature.int_domain.min
- Statistics Fields:
features.type
features.num_stats.min
features.string_stats.rank_histogram
- Detection Condition:
If features.type == INT, features.num_stats.min < feature.int_domain.min; or
if features.type == BYTES or STRING, minimum value in features.string_stats.rank_histogram (when converted to int) < feature.int_domain.min
INT_TYPE_STRING_EXPECTED
- Anomaly type not detected in TFDV
INT_TYPE_UNKNOWN_TYPE_NUMBER
- Anomaly type not detected in TFDV
LOW_SUPPORTED_IMAGE_FRACTION
- Schema Fields:
feature.image_domain.minimum_supported_image_fraction
- Statistics Fields:
features.custom_stats.rank_histogram for the custom_stats with name image_format_histogram. Note that semantic domain stats must be enabled for the image_format_histogram to be generated and for this validation to be performed. Semantic domain stats are not generated by default.
- Detection Condition:
The fraction of values that are supported TensorFlow image types to all image types is less than feature.image_domain.minimum_supported_image_fraction.
SCHEMA_MISSING_COLUMN
- Schema Fields:
feature.in_environment or feature.not_in_environment or schema.default_environment
feature.lifecycle_stage
feature.presence.min_count or feature.presence.min_fraction
- Detection Condition:
feature.lifecycle_stage != PLANNED, ALPHA, DEBUG, or DEPRECATED and
(feature.presence.min_count > 0 or feature.presence.min_fraction > 0) and
(feature.in_environment == current environment or feature.not_in_environment != current environment or schema.default_environment != current environment) and
no feature with the specified name/path is found in the statistics proto
SCHEMA_NEW_COLUMN
- Detection Condition:
There is a feature in the statistics proto but no feature with its name/path in the schema proto
SCHEMA_TRAINING_SERVING_SKEW
- Anomaly type not detected in TFDV
STRING_TYPE_NOW_FLOAT
- Anomaly type not detected in TFDV
STRING_TYPE_NOW_INT
- Anomaly type not detected in TFDV
COMPARATOR_CONTROL_DATA_MISSING
- Schema Fields:
feature.skew_comparator.infinity_norm.threshold
feature.drift_comparator.infinity_norm.threshold
- Detection Condition:
The control statistics proto (i.e., serving statistics for skew or previous statistics for drift) is available but does not contain the specified feature
COMPARATOR_TREATMENT_DATA_MISSING
- Anomaly type not detected in TFDV
COMPARATOR_L_INFTY_HIGH
- Schema Fields:
feature.skew_comparator.infinity_norm.threshold
feature.drift_comparator.infinity_norm.threshold
- Statistics Fields:
features.string_stats.rank_histogram*
- Detection Condition:
L-infinity norm of the vector that represents the difference between the normalized counts from the features.string_stats.rank_histogram* in the control statistics (i.e., serving statistics for skew or previous statistics for drift) and the treatment statistics (i.e., training statistics for skew or current statistics for drift) > feature.skew_comparator.infinity_norm.threshold or feature.drift_comparator.infinity_norm.threshold
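The L-infinity comparison for COMPARATOR_L_INFTY_HIGH can be illustrated with a small Python sketch. The function name `l_infinity_distance` and the dict-based histograms are hypothetical stand-ins for rank_histogram, and the threshold value is made up for the example; this is not TFDV's own code.

```python
def l_infinity_distance(control_counts, treatment_counts):
    """Largest absolute gap between the two normalized count vectors."""
    control_total = sum(control_counts.values())
    treatment_total = sum(treatment_counts.values())
    if control_total == 0 or treatment_total == 0:
        raise ValueError("both histograms need at least one counted value")
    # A value missing from one histogram counts as zero there.
    values = set(control_counts) | set(treatment_counts)
    return max(
        abs(control_counts.get(v, 0) / control_total
            - treatment_counts.get(v, 0) / treatment_total)
        for v in values
    )


serving = {"a": 50, "b": 50}              # control statistics (skew case)
training = {"a": 80, "b": 10, "c": 10}    # treatment statistics
distance = l_infinity_distance(serving, training)
threshold = 0.3                           # illustrative threshold only
is_anomaly = distance > threshold
```

Here the largest gap comes from value "b" (0.5 in serving against 0.1 in training), so the distance is about 0.4 and exceeds the illustrative threshold.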
COMPARATOR_NORMALIZED_ABSOLUTE_DIFFERENCE_HIGH
- Schema Fields:
feature.skew_comparator.normalized_abs_difference.threshold
feature.drift_comparator.normalized_abs_difference.threshold
- Statistics Fields:
features.string_stats.rank_histogram
- Detection Condition:
The normalized absolute count difference of value counts from the features.string_stats.rank_histogram in the control statistics (i.e., serving statistics for skew or previous statistics for drift) and the treatment statistics (i.e., training statistics for skew or current statistics for drift) exceeds feature.skew_comparator.normalized_abs_difference.threshold or feature.drift_comparator.normalized_abs_difference.threshold. Count differences are normalized by the total count across both conditions.
COMPARATOR_JENSEN_SHANNON_DIVERGENCE_HIGH
- Schema Fields:
feature.skew_comparator.jensen_shannon_divergence.threshold
feature.drift_comparator.jensen_shannon_divergence.threshold
- Statistics Fields:
features.num_stats.histograms of type STANDARD
features.string_stats.rank_histogram*
- Detection Condition:
Approximate Jensen-Shannon divergence computed between the control statistics (i.e., serving statistics for skew or previous statistics for drift) and the treatment statistics (i.e., training statistics for skew or current statistics for drift) > feature.skew_comparator.jensen_shannon_divergence.threshold or feature.drift_comparator.jensen_shannon_divergence.threshold. The approximate Jensen-Shannon divergence is computed based on the normalized sample counts in both features.num_stats.histograms standard histogram and features.string_stats.rank_histogram*.
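For readers unfamiliar with the measure, the following sketch computes a Jensen-Shannon divergence from two lists of bucket counts. It assumes the two histograms already share the same bins and uses a base-2 logarithm so the result falls in [0, 1]; how TFDV aligns buckets and which approximation it applies are not described here, so treat the function `jensen_shannon_divergence` as a hypothetical illustration only.

```python
import math


def jensen_shannon_divergence(control_counts, treatment_counts):
    """JSD (log base 2) between two count lists defined over the same bins."""
    if len(control_counts) != len(treatment_counts):
        raise ValueError("histograms must share the same bins")
    control_total = sum(control_counts)
    treatment_total = sum(treatment_counts)
    if control_total == 0 or treatment_total == 0:
        raise ValueError("both histograms need at least one sample")
    # Normalize sample counts into probability vectors.
    p = [c / control_total for c in control_counts]
    q = [c / treatment_total for c in treatment_counts]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Bins where x is zero contribute nothing; y > 0 wherever x > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


same_shape = jensen_shannon_divergence([10, 20, 30], [1, 2, 3])
disjoint = jensen_shannon_divergence([5, 0], [0, 7])
```

Two histograms with the same proportions give a divergence of 0, while two histograms with no overlapping mass give the maximum of 1 under the base-2 convention.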
NO_DATA_IN_SPAN
- Anomaly type not detected in TFDV
SPARSE_FEATURE_MISSING_VALUE
- Schema Fields:
sparse_feature.value_feature
- Statistics Fields:
features.custom_stats
- Detection Condition:
features.custom_stats with "missing_value" as name and
missing_value custom stat != 0
SPARSE_FEATURE_MISSING_INDEX
- Schema Fields:
sparse_feature.index_feature
- Statistics Fields:
features.custom_stats
- Detection Condition:
features.custom_stats with "missing_index" as name and
missing_index custom stat contains any value != 0
SPARSE_FEATURE_LENGTH_MISMATCH
- Schema Fields:
sparse_feature.value_feature
sparse_feature.index_feature
- Statistics Fields:
features.custom_stats
- Detection Condition:
features.custom_stats with "min_length_diff" or "max_length_diff" as name and
min_length_diff or max_length_diff custom stat contains any value != 0
SPARSE_FEATURE_NAME_COLLISION
- Schema Fields:
sparse_feature.name
sparse_feature.lifecycle_stage
feature.name
feature.lifecycle_stage
- Detection Condition:
sparse_feature.lifecycle_stage != PLANNED, ALPHA, DEBUG, or DEPRECATED, and
feature.lifecycle_stage != PLANNED, ALPHA, DEBUG, or DEPRECATED, and
sparse_feature.name == feature.name
SEMANTIC_DOMAIN_UPDATE
- Schema Fields:
feature.domain_info
- Statistics Fields:
features.custom_stats
- Detection Condition:
features.custom_stats with "domain_info" as name and
feature.domain_info is not already set in the schema and
there is a single domain_info custom stat for the feature
COMPARATOR_LOW_NUM_EXAMPLES
- Schema Fields:
schema.dataset_constraints.num_examples_drift_comparator.min_fraction_threshold
schema.dataset_constraints.num_examples_version_comparator.min_fraction_threshold
- Statistics Fields:
num_examples*
- Detection Condition:
num_examples* > 0 and
previous statistics proto is available and
num_examples* / previous statistics num_examples* < comparator min_fraction_threshold
COMPARATOR_HIGH_NUM_EXAMPLES
- Schema Fields:
schema.dataset_constraints.num_examples_drift_comparator.max_fraction_threshold
schema.dataset_constraints.num_examples_version_comparator.max_fraction_threshold
- Statistics Fields:
num_examples*
- Detection Condition:
num_examples* > 0 and
previous statistics proto is available and
num_examples* / previous statistics num_examples* > comparator max_fraction_threshold
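The two num_examples comparator conditions above (COMPARATOR_LOW_NUM_EXAMPLES and COMPARATOR_HIGH_NUM_EXAMPLES) share the same ratio, which the sketch below makes explicit. The function `num_examples_anomalies` is hypothetical, `previous=None` stands for "no previous statistics proto available", and skipping the check when the previous count is not positive is an assumption made here to avoid dividing by zero.

```python
def num_examples_anomalies(current, previous, min_fraction=None, max_fraction=None):
    """Return the anomaly names the current/previous ratio would raise."""
    found = []
    if previous is None or current <= 0:
        return found  # no previous statistics, or no current examples
    if previous <= 0:
        return found  # assumption of this sketch: ratio undefined, skip
    ratio = current / previous
    if min_fraction is not None and ratio < min_fraction:
        found.append("COMPARATOR_LOW_NUM_EXAMPLES")
    if max_fraction is not None and ratio > max_fraction:
        found.append("COMPARATOR_HIGH_NUM_EXAMPLES")
    return found


shrunk = num_examples_anomalies(50, 100, min_fraction=0.8, max_fraction=1.2)
grown = num_examples_anomalies(150, 100, min_fraction=0.8, max_fraction=1.2)
stable = num_examples_anomalies(100, 100, min_fraction=0.8, max_fraction=1.2)
```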
DATASET_LOW_NUM_EXAMPLES
- Schema Fields:
schema.dataset_constraints.min_examples_count
- Statistics Fields:
num_examples*
- Detection Condition:
num_examples* < dataset_constraints.min_examples_count
DATASET_HIGH_NUM_EXAMPLES
- Schema Fields:
schema.dataset_constraints.max_examples_count
- Statistics Fields:
num_examples*
- Detection Condition:
num_examples* > dataset_constraints.max_examples_count
WEIGHTED_FEATURE_NAME_COLLISION
- Schema Fields:
weighted_feature.name
weighted_feature.lifecycle_stage
sparse_feature.name
sparse_feature.lifecycle_stage
feature.name
feature.lifecycle_stage
- Detection Condition:
weighted_feature.lifecycle_stage != PLANNED, ALPHA, DEBUG, or DEPRECATED and either
if feature.lifecycle_stage != PLANNED, ALPHA, DEBUG, or DEPRECATED, weighted_feature.name == feature.name; or
if sparse_feature.lifecycle_stage != PLANNED, ALPHA, DEBUG, or DEPRECATED, weighted_feature.name == sparse_feature.name
WEIGHTED_FEATURE_MISSING_VALUE
- Schema Fields:
weighted_feature.feature
- Statistics Fields:
features.custom_stats
- Detection Condition:
features.custom_stats with "missing_value" as name and
missing_value custom stat != 0
WEIGHTED_FEATURE_MISSING_WEIGHT
- Schema Fields:
weighted_feature.weight_feature
- Statistics Fields:
features.custom_stats
- Detection Condition:
features.custom_stats with "missing_weight" as name and
missing_weight custom stat != 0
WEIGHTED_FEATURE_LENGTH_MISMATCH
- Schema Fields:
weighted_feature.feature
weighted_feature.weight_feature
- Statistics Fields:
features.custom_stats
- Detection Condition:
features.custom_stats with "min_weight_length_diff" or "max_weight_length_diff" as name, and
min_weight_length_diff or max_weight_length_diff custom stat != 0
VALUE_NESTEDNESS_MISMATCH
- Schema Fields:
feature.value_count
feature.value_counts
- Statistics Fields:
features.common_stats.presence_and_valency_stats
- Detection Condition:
feature.value_count is specified, and there is a repeated presence_and_valency_stats of the feature (which indicates a nestedness level that is greater than one) and
feature.value_counts is specified, and the number of times the presence_and_valency_stats of the feature is repeated does not match the number of times value_count is repeated within feature.value_counts
DOMAIN_INVALID_FOR_TYPE
- Schema Fields:
feature.type
feature.domain_info
- Statistics Fields:
features.type
- Detection Condition:
If features.type == BYTES, feature.domain_info is of an incompatible type; or
if features.type != BYTES, feature.domain_info does not match feature.type (e.g., int_domain is specified, but feature's type is FLOAT)
FEATURE_MISSING_NAME
- Schema Fields:
feature.name
- Detection Condition:
feature.name is not specified
FEATURE_MISSING_TYPE
- Schema Fields:
feature.type
- Detection Condition:
feature.type is not specified
INVALID_SCHEMA_SPECIFICATION
- Schema Fields:
feature.domain_info
feature.presence.min_fraction
feature.value_count.min
feature.value_count.max
feature.distribution_constraints
- Detection Condition:
feature.presence.min_fraction < 0.0 or > 1.0, or
feature.value_count.min < 0 or > feature.value_count.max, or
a bool, int, float, struct, or semantic domain is specified for a feature and feature.distribution_constraints is also specified for that feature, or
feature.distribution_constraints is specified for a feature, but neither a schema-level domain nor feature.string_domain is specified for that feature
INVALID_DOMAIN_SPECIFICATION
- Schema Fields:
feature.domain_info
feature.bool_domain
feature.string_domain
- Detection Condition:
Unknown feature.domain_info type is specified, or
feature.domain is specified, but there is no matching domain specified at the schema level, or
if feature.bool_domain, feature.bool_domain.true_value, and feature.bool_domain.false_value are specified, feature.bool_domain.true_value == feature.bool_domain.false_value, or
if feature.string_domain is specified, it has duplicated feature.string_domain.values or feature.string_domain exceeds the maximum size
UNEXPECTED_DATA_TYPE
- Schema Fields:
feature.type
- Statistics Fields:
features.type
- Detection Condition:
features.type is not of the type specified in feature.type
SEQUENCE_VALUE_TOO_FEW_OCCURRENCES
- Schema Fields:
feature.natural_language_domain.token_constraints.min_per_sequence
- Statistics Fields:
features.custom_stats.nl_statistics.token_statistics.per_sequence_min_frequency
- Detection Condition:
min_per_sequence > per_sequence_min_frequency
SEQUENCE_VALUE_TOO_MANY_OCCURRENCES
- Schema Fields:
feature.natural_language_domain.token_constraints.max_per_sequence
- Statistics Fields:
features.custom_stats.nl_statistics.token_statistics.per_sequence_max_frequency
- Detection Condition:
max_per_sequence < per_sequence_max_frequency
SEQUENCE_VALUE_TOO_SMALL_FRACTION
- Schema Fields:
feature.natural_language_domain.token_constraints.min_fraction_of_sequences
- Statistics Fields:
features.custom_stats.nl_statistics.token_statistics.fraction_of_sequences
- Detection Condition:
min_fraction_of_sequences > fraction_of_sequences
SEQUENCE_VALUE_TOO_LARGE_FRACTION
- Schema Fields:
feature.natural_language_domain.token_constraints.max_fraction_of_sequences
- Statistics Fields:
features.custom_stats.nl_statistics.token_statistics.fraction_of_sequences
- Detection Condition:
max_fraction_of_sequences < fraction_of_sequences
FEATURE_COVERAGE_TOO_LOW
- Schema Fields:
feature.natural_language_domain.coverage.min_coverage
- Statistics Fields:
features.custom_stats.nl_statistics.feature_coverage
- Detection Condition:
feature_coverage < coverage.min_coverage
FEATURE_COVERAGE_TOO_SHORT_AVG_TOKEN_LENGTH
- Schema Fields:
feature.natural_language_domain.coverage.min_avg_token_length
- Statistics Fields:
features.custom_stats.nl_statistics.avg_token_length
- Detection Condition:
avg_token_length < min_avg_token_length
NLP_WRONG_LOCATION
- Anomaly type not detected in TFDV
EMBEDDING_SHAPE_INVALID
- Anomaly type not detected in TFDV
MAX_IMAGE_BYTE_SIZE_EXCEEDED
- Schema Fields:
feature.image_domain.max_image_byte_size
- Statistics Fields:
features.bytes_stats.max_num_bytes_int
- Detection Condition:
max_num_bytes_int > max_image_byte_size
INVALID_FEATURE_SHAPE
- Schema Fields:
feature.shape
- Statistics Fields:
features.common_stats.num_missing
features.common_stats.min_num_values
features.common_stats.max_num_values
features.common_stats.presence_and_valency_stats.num_missing
features.common_stats.presence_and_valency_stats.min_num_values
features.common_stats.presence_and_valency_stats.max_num_values
features.common_stats.weighted_presence_and_valency_stats
- Detection Condition:
feature.shape is specified, and either
the feature may be missing (num_missing != 0) at some nest level or
the feature may have variable number of values (min_num_values != max_num_values) at some nest level or
the specified shape is not compatible with the feature's value count stats. For example, shape [16] is compatible with (min_num_values == max_num_values == [2, 2, 4] (for a 3-nested feature))
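The three INVALID_FEATURE_SHAPE checks can be sketched as follows. The function `shape_is_invalid` is hypothetical, and each statistics argument is a plain list with one entry per nest level. The compatibility rule used here (the product of the shape dimensions must equal the product of the per-level value counts) is an assumption inferred from the [16] versus [2, 2, 4] example, not a statement of how TFDV implements the check.

```python
import math


def shape_is_invalid(shape, num_missing, min_num_values, max_num_values):
    """Apply the three checks to per-nest-level statistics lists."""
    if any(n != 0 for n in num_missing):
        return True  # the feature may be missing at some nest level
    if min_num_values != max_num_values:
        return True  # variable number of values at some nest level
    # Assumed rule: total element counts must agree (e.g. 16 == 2 * 2 * 4).
    return math.prod(shape) != math.prod(min_num_values)


compatible = shape_is_invalid([16], [0, 0, 0], [2, 2, 4], [2, 2, 4])
sometimes_missing = shape_is_invalid([16], [0, 1, 0], [2, 2, 4], [2, 2, 4])
wrong_size = shape_is_invalid([15], [0, 0, 0], [2, 2, 4], [2, 2, 4])
```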
STATS_NOT_AVAILABLE
- Anomaly occurs when stats needed to validate constraints are not present.
DERIVED_FEATURE_BAD_LIFECYCLE
- Schema Fields:
feature.lifecycle_stage
- Statistics Fields:
features.validation_derived_source
- Detection Condition:
feature.lifecycle_stage is not one of DERIVED or DISABLED, and features.validation_derived_source is present, indicating that this is a derived feature.
DERIVED_FEATURE_INVALID_SOURCE
- Schema Fields:
feature.validation_derived_source
- Statistics Fields:
features.validation_derived_source
- Detection Condition:
features.validation_derived_source is present for a feature, but the corresponding feature.validation_derived_source is not.
* If a weighted statistic is available for this field, it will be used instead of the non-weighted statistic.