tf.data.experimental.table_from_dataset
Returns a lookup table based on the given dataset.
tf.data.experimental.table_from_dataset(
    dataset=None,
    num_oov_buckets=0,
    vocab_size=None,
    default_value=None,
    hasher_spec=lookup_ops.FastHashSpec,
    key_dtype=tf.dtypes.string,
    name=None
)
This operation constructs a lookup table based on the given dataset of (key, value) pairs.
Any lookup of an out-of-vocabulary token returns a bucket ID based on its hash if num_oov_buckets is greater than zero. Otherwise, it is assigned the default_value.
The bucket ID range is [vocabulary size, vocabulary size + num_oov_buckets - 1].
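The bucket ID range can be illustrated without TensorFlow. The sketch below assumes a hash-modulo scheme (hash the token, take it modulo num_oov_buckets, and offset by the vocabulary size). The helper name oov_bucket_id is hypothetical, and zlib.crc32 is only a stand-in for whatever hash the configured hasher_spec selects, so the concrete IDs will differ from those TensorFlow produces.

```python
import zlib


def oov_bucket_id(token: str, vocab_size: int, num_oov_buckets: int) -> int:
    """Map an out-of-vocabulary token to an ID in
    [vocab_size, vocab_size + num_oov_buckets - 1]."""
    if num_oov_buckets <= 0:
        raise ValueError("num_oov_buckets must be positive to assign a bucket")
    # zlib.crc32 is a stand-in hash, NOT the hash TensorFlow uses.
    bucket = zlib.crc32(token.encode("utf-8")) % num_oov_buckets
    return vocab_size + bucket


vocab_size = 100
num_oov_buckets = 5
ids = [oov_bucket_id(t, vocab_size, num_oov_buckets) for t in ("foo", "bar", "baz")]

# Every OOV ID falls inside the documented range.
low, high = vocab_size, vocab_size + num_oov_buckets - 1
print(low, high)                             # 100 104
print(all(low <= i <= high for i in ids))    # True
```

Because the offset is the vocabulary size, OOV bucket IDs never collide with the IDs of in-vocabulary keys, provided those IDs run from 0 to vocabulary size - 1.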
Sample Usages:
>>> import tensorflow as tf
>>> keys = tf.data.Dataset.range(100)
>>> values = tf.data.Dataset.range(100).map(
...     lambda x: tf.strings.as_string(x * 2))
>>> ds = tf.data.Dataset.zip((keys, values))
>>> table = tf.data.experimental.table_from_dataset(
...     ds, default_value='n/a', key_dtype=tf.int64)
>>> table.lookup(tf.constant([0, 1, 2], dtype=tf.int64)).numpy()
array([b'0', b'2', b'4'], dtype=object)
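When num_oov_buckets is left at zero, keys that are absent from the dataset fall back to default_value. The following is a minimal sketch of that behavior, assuming TensorFlow 2.x with eager execution. It rebuilds the same table as the sample, and the missing key 1000 is an arbitrary choice made for illustration.

```python
import tensorflow as tf

# Same (key, value) dataset as in the sample: int64 keys 0..99, string values.
keys = tf.data.Dataset.range(100)
values = tf.data.Dataset.range(100).map(lambda x: tf.strings.as_string(x * 2))
ds = tf.data.Dataset.zip((keys, values))

# num_oov_buckets defaults to 0, so unknown keys map to default_value.
table = tf.data.experimental.table_from_dataset(
    ds, default_value="n/a", key_dtype=tf.int64)

# 0 and 99 are in the vocabulary; 1000 is not.
result = table.lookup(tf.constant([0, 99, 1000], dtype=tf.int64)).numpy()
print(result.tolist())  # [b'0', b'198', b'n/a']
```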
Args
dataset | A dataset containing (key, value) pairs.
num_oov_buckets | The number of out-of-vocabulary buckets.
vocab_size | Number of elements in the vocabulary, if known.
default_value | The value to use for out-of-vocabulary feature values. If left as None, defaults to -1.
hasher_spec | A HasherSpec specifying the hash function to use for assigning out-of-vocabulary buckets.
key_dtype | The key data type.
name | A name for this op (optional).
Returns
The lookup table based on the given dataset.
Raises
ValueError | If:
- dataset does not contain pairs
- The 2nd item in the dataset pairs has a dtype which is incompatible with default_value
- num_oov_buckets is negative
- vocab_size is not greater than zero
- The key_dtype is not integer or string
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates. Some content is licensed under the numpy license.
Last updated 2023-10-06 UTC.