Semantic and hyper-parameters for a single feature.
tfdf.keras.FeatureUsage(
    name: str,
    semantic: Optional[tfdf.keras.FeatureSemantic] = None,
    num_discretized_numerical_bins: Optional[int] = None,
    max_vocab_count: Optional[int] = None,
    min_vocab_frequency: Optional[int] = None,
    override_global_imputation_value: Optional[str] = None,
    monotonic: tfdf.keras.core.MonotonicConstraint = None
)
This class allows you to:
- Limit the input features of the model.
- Manually set the semantic of a feature.
- Specify feature-specific hyper-parameters.
Note that the model's "features" argument is optional. If it is not specified,
all available features will be used. See the "CoreModel" class
documentation for more details.
Usage example:
# A feature named "A". The semantic will be detected automatically. The
# global hyper-parameters of the model will be used.
feature_a = FeatureUsage(name="A")
# A feature named "B" representing a CATEGORICAL value.
# Specifying the semantic ensures the feature is correctly detected.
# In this case, the feature might be stored as an integer, and would
# otherwise be detected as NUMERICAL.
feature_b = FeatureUsage(name="B", semantic=Semantic.CATEGORICAL)
# A feature with a specific maximum dictionary size.
feature_c = FeatureUsage(name="C",
                         semantic=Semantic.CATEGORICAL,
                         max_vocab_count=32)
model = CoreModel(features=[feature_a, feature_b, feature_c])
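To illustrate why the semantic matters for a feature stored as an integer, the following minimal sketch compares the groups of values that a single tree condition can isolate under each interpretation. It assumes a simplified picture of tree conditions (a threshold test for NUMERICAL features, a set-membership test for CATEGORICAL features). The helper names are hypothetical and are not part of the library.

```python
from itertools import combinations


def numerical_partitions(codes):
    """Groups selectable by a threshold condition `code >= t` (simplified)."""
    ordered = sorted(set(codes))
    # Each threshold keeps a suffix of the sorted codes.
    return {frozenset(ordered[i:]) for i in range(1, len(ordered))}


def categorical_partitions(codes):
    """Groups selectable by a set-membership condition `code in S` (simplified)."""
    unique = sorted(set(codes))
    return {
        frozenset(subset)
        for size in range(1, len(unique))
        for subset in combinations(unique, size)
    }


# Integer codes that stand for unordered categories.
codes = [0, 1, 2, 3]
numerical = numerical_partitions(codes)
categorical = categorical_partitions(codes)

print(len(numerical), len(categorical))  # 3 14
# Isolating categories 0 and 3 together needs a categorical condition.
print(frozenset({0, 3}) in numerical)    # False
print(frozenset({0, 3}) in categorical)  # True
```

Under these assumptions, a numerical interpretation can only separate contiguous ranges of codes, which is one reason a wrong semantic can hurt the model.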
Attributes |
name
|
The name of the feature. Used as an identifier if the dataset is a
dictionary of tensors.
|
semantic
|
Semantic of the feature. If None, the semantic is automatically
determined. The semantic controls how a feature is interpreted by a model.
Using the wrong semantic (e.g. numerical instead of categorical) will hurt
your model. See "FeatureSemantic" and "Semantic" for the definitions of the
available semantics.
|
num_discretized_numerical_bins
|
For DISCRETIZED_NUMERICAL features only.
Number of bins used to discretize DISCRETIZED_NUMERICAL features.
|
max_vocab_count
|
For CATEGORICAL and CATEGORICAL_SET features only. Maximum number
of unique categorical values stored as strings. If more categorical values
are present, the least frequent values are grouped into an
out-of-vocabulary item. Reducing the value can improve or hurt the model.
|
min_vocab_frequency
|
For CATEGORICAL and CATEGORICAL_SET features only.
Minimum number of occurrences of a categorical value. Values present fewer
than "min_vocab_frequency" times in the training dataset are treated as
"Out-of-vocabulary".
|
override_global_imputation_value
|
For CATEGORICAL and CATEGORICAL_SET
features only. If set, replaces the global imputation value used to handle
missing values. That is, at inference time, missing values will be treated
as "override_global_imputation_value". "override_global_imputation_value"
can only be used on categorical features and on columns not containing
missing values in the training dataset. If the algorithm used to handle
missing values is not "GLOBAL_IMPUTATION" (default algorithm), this value
is ignored.
|
monotonic
|
Monotonic constraint between the feature and the model output.
Use None (default) for a feature without a monotonic constraint.
Monotonic.INCREASING ensures the model output is monotonically increasing
with respect to the feature. Monotonic.DECREASING ensures the model output
is monotonically decreasing with respect to the feature. Alternatively, you
can use 0, +1 and -1 to respectively define a non-constrained, monotonically
increasing, and monotonically decreasing feature.
|
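The effect of "max_vocab_count" and "min_vocab_frequency" described above can be sketched in plain Python. This is an illustration only, not the library's implementation: the helper names, the "<OOV>" placeholder, the order in which the two filters are applied, the tie-breaking rule, and whether the out-of-vocabulary item counts toward the limit are all assumptions.

```python
from collections import Counter

OOV = "<OOV>"  # Hypothetical placeholder for the out-of-vocabulary item.


def build_vocabulary(values, max_vocab_count=None, min_vocab_frequency=None):
    """Returns the values kept in the dictionary, most frequent first."""
    counts = Counter(values)
    # Assumption: ties between equally frequent values are broken alphabetically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    if min_vocab_frequency is not None:
        # Values seen fewer than min_vocab_frequency times are dropped.
        ranked = [(v, c) for v, c in ranked if c >= min_vocab_frequency]
    if max_vocab_count is not None:
        # Only the most frequent values are kept.
        ranked = ranked[:max_vocab_count]
    return [v for v, _ in ranked]


def encode(values, vocabulary):
    """Maps every value outside the vocabulary to the OOV item."""
    known = set(vocabulary)
    return [v if v in known else OOV for v in values]


colors = ["red"] * 5 + ["blue"] * 3 + ["green"] * 2 + ["teal"]
vocab = build_vocabulary(colors, max_vocab_count=2, min_vocab_frequency=2)
encoded = encode(["red", "green", "teal"], vocab)

print(vocab)    # ['red', 'blue']
print(encoded)  # ['red', '<OOV>', '<OOV>']
```

In this sketch "teal" is removed by the frequency threshold and "green" by the size limit, so both end up in the same out-of-vocabulary group.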
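As a rough illustration of what a monotonic constraint guarantees, the sketch below checks a prediction function over a grid of values of one feature. It reuses the 0, +1 and -1 encoding mentioned above; the function name and the grid-based check are assumptions made for illustration, and the check only inspects the sampled points rather than proving monotonicity.

```python
def is_monotonic(predict, grid, direction):
    """Checks `predict` on the sorted `grid` values of a single feature.

    direction: +1 for increasing, -1 for decreasing, 0 for no constraint.
    """
    if direction == 0:
        return True  # Nothing to verify for a non-constrained feature.
    outputs = [predict(x) for x in sorted(grid)]
    # Every consecutive change must have the sign of `direction` (or be zero).
    return all(direction * (b - a) >= 0 for a, b in zip(outputs, outputs[1:]))


grid = range(5)
print(is_monotonic(lambda x: 2 * x, grid, +1))  # True
print(is_monotonic(lambda x: -x, grid, +1))     # False
print(is_monotonic(lambda x: -x, grid, -1))     # True
```

A check like this can serve as a quick sanity test on a trained model's predictions when all other features are held fixed.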