tft.apply_vocabulary
Maps x to a vocabulary specified by the deferred tensor.
tft.apply_vocabulary(
x: common_types.ConsistentTensorType,
deferred_vocab_filename_tensor: common_types.TemporaryAnalyzerOutputType,
*,
default_value: Any = -1,
num_oov_buckets: int = 0,
lookup_fn: Optional[Callable[[common_types.TensorType, tf.Tensor], Tuple[tf.Tensor, tf.Tensor]]] = None,
file_format: common_types.VocabularyFileFormatType = analyzers.DEFAULT_VOCABULARY_FILE_FORMAT,
name: Optional[str] = None
) -> common_types.ConsistentTensorType
This function also writes domain statistics about the vocabulary min and max
values. Note that the min and max are inclusive, and depend on the vocab size,
num_oov_buckets and default_value.
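As a rough illustration of how those inclusive bounds could depend on the three quantities, here is a minimal pure-Python sketch. The helper name vocab_domain is hypothetical, and the rule it encodes is an assumption inferred from the description above (in-vocabulary IDs start at zero, OOV buckets follow them, and default_value is a possible output only when there are no OOV buckets); it is not taken from the library source.

```python
from typing import Tuple


def vocab_domain(vocab_size: int,
                 num_oov_buckets: int = 0,
                 default_value: int = -1) -> Tuple[int, int]:
    """Return an inclusive (min, max) range of IDs a lookup could emit.

    Hypothetical helper: the rule below is an assumed reading of the docs.
    """
    if num_oov_buckets > 0:
        # In-vocabulary IDs occupy [0, vocab_size); OOV bucket IDs follow them.
        return 0, vocab_size + num_oov_buckets - 1
    # Without OOV buckets, default_value is also a possible output.
    return min(default_value, 0), max(default_value, vocab_size - 1)


print(vocab_domain(3))                      # (-1, 2)
print(vocab_domain(3, num_oov_buckets=2))   # (0, 4)
```

Under this reading, a three-entry vocabulary with the default settings yields IDs in [-1, 2], while adding two OOV buckets shifts the range to [0, 4].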
Args |
x
|
A categorical Tensor, SparseTensor, or RaggedTensor of type
tf.string or tf.int[8|16|32|64] to which the vocabulary transformation
should be applied. The column names are those intended for the transformed
tensors.
|
deferred_vocab_filename_tensor
|
The deferred vocab filename tensor as
returned by tft.vocabulary, as long as the frequencies were not stored.
|
default_value
|
The value to use for out-of-vocabulary values, unless
'num_oov_buckets' is greater than zero.
|
num_oov_buckets
|
If num_oov_buckets is greater than zero, any lookup of an
out-of-vocabulary token returns a bucket ID based on its hash.
Otherwise the token is assigned the default_value.
|
lookup_fn
|
Optional lookup function. If specified, it should take a tensor
and a deferred vocab filename as input and return a lookup op along
with the table size. By default, apply_vocabulary constructs a
StaticHashTable for the table lookup.
|
file_format
|
(Optional) A str. The format of the given vocabulary. Accepted
formats are: 'tfrecord_gzip', 'text'. The default value is 'text'.
|
name
|
(Optional) A name for this operation.
|
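The arguments above can be combined as in the following minimal sketch of a preprocessing function. The feature name "color", the output key "color_id", and the vocab_filename value are hypothetical; the vocabulary is computed with tft.vocabulary using its default of not storing frequencies, as deferred_vocab_filename_tensor requires. The import sits inside the function only so that the snippet can be loaded without TensorFlow Transform installed.

```python
def preprocessing_fn(inputs):
    """Map a string feature to integer IDs (illustrative sketch only)."""
    # Imported lazily so this module loads even where TFT is not installed.
    import tensorflow_transform as tft

    # Analyzer pass: compute the vocabulary and get the deferred filename.
    # "color" and "color_vocab" are hypothetical names for this example.
    vocab_file = tft.vocabulary(inputs["color"], vocab_filename="color_vocab")

    # Lookup pass: tokens missing from the vocabulary go to one hash bucket
    # instead of receiving default_value.
    color_id = tft.apply_vocabulary(
        inputs["color"], vocab_file, num_oov_buckets=1)

    return {"color_id": color_id}
```

Splitting the analyzer call from the lookup in this way makes it possible to apply one vocabulary to several features.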
Returns |
A Tensor, SparseTensor, or RaggedTensor where each string value is
mapped to an integer. Each unique string value that appears in the
vocabulary is mapped to a different integer, and the integers are consecutive
starting from zero. Any string value not in the vocabulary is
assigned default_value, or a hash-based bucket ID if num_oov_buckets is
greater than zero.
|
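The mapping described in Returns can be emulated in plain Python to make the semantics concrete. This is a sketch under stated assumptions, not the library's implementation: apply_vocab is a hypothetical helper, zlib.crc32 is only a stand-in for whatever hash the real lookup table uses, and placing OOV bucket IDs directly after the in-vocabulary IDs is an assumed layout.

```python
import zlib
from typing import List, Sequence


def apply_vocab(tokens: Sequence[str],
                vocab: Sequence[str],
                default_value: int = -1,
                num_oov_buckets: int = 0) -> List[int]:
    """Emulate the vocabulary lookup on a flat list of string tokens."""
    # Consecutive IDs starting from zero, in vocabulary order.
    index = {token: i for i, token in enumerate(vocab)}
    ids = []
    for token in tokens:
        if token in index:
            ids.append(index[token])
        elif num_oov_buckets > 0:
            # Stand-in hash (assumption): the real table uses its own hash.
            bucket = zlib.crc32(token.encode("utf-8")) % num_oov_buckets
            ids.append(len(vocab) + bucket)
        else:
            ids.append(default_value)
    return ids


print(apply_vocab(["a", "b", "z"], ["a", "b", "c"]))                     # [0, 1, -1]
print(apply_vocab(["a", "b", "z"], ["a", "b", "c"], num_oov_buckets=1))  # [0, 1, 3]
```

With a single OOV bucket, every unknown token lands on the same ID (here 3), so default_value is never emitted.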
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2024-11-01 UTC.