Gets the top unique word multiset from the input dataset.
tff.analytics.data_processing.get_top_multi_elements(
dataset: tf.data.Dataset,
max_user_contribution: int,
string_max_bytes: Optional[int] = None
)
This method returns the max_user_contribution most common unique elements from the dataset, but as a multiset. That is, a word appears in the output as many times as it did in the dataset, but each unique word counts only once toward the max_user_contribution limit.
This differs from get_top_elements in that it returns a multiset rather than a set.
The input dataset must yield batched rank-1 tensors. This function reads each coordinate of the tensor as an individual element and caps the number of unique elements to return. Note that the returned multiset of top elements is not necessarily sorted.
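The multiset behavior described above can be illustrated with a small pure-Python model. This is only a sketch of the documented semantics, not the TFF implementation: the name top_multi_elements_sketch is hypothetical, plain lists of bytes stand in for batched rank-1 string tensors, and both the tie-breaking rule (taken from Counter.most_common) and the choice to truncate strings before counting are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Optional


def top_multi_elements_sketch(
    batches: Iterable[List[bytes]],
    max_user_contribution: int,
    string_max_bytes: Optional[int] = None,
) -> List[bytes]:
    """Pure-Python model of the documented behavior (not the TFF code)."""
    if max_user_contribution < 1:
        raise ValueError("max_user_contribution must be at least 1.")
    if string_max_bytes is not None and string_max_bytes < 1:
        raise ValueError("string_max_bytes must be at least 1 when set.")

    # Flatten the batched rank-1 inputs into individual elements.
    # Assumption: truncation to string_max_bytes happens before counting.
    elements = [
        word if string_max_bytes is None else word[:string_max_bytes]
        for batch in batches
        for word in batch
    ]

    # Pick the most frequent unique elements. Tie-breaking follows
    # Counter.most_common and is an assumption of this sketch.
    counts = Counter(elements)
    kept = {word for word, _ in counts.most_common(max_user_contribution)}

    # Return every occurrence of the kept elements (a multiset).
    return [word for word in elements if word in kept]


batches = [[b"a", b"b", b"a"], [b"c", b"b", b"a"]]
result = top_multi_elements_sketch(batches, max_user_contribution=2)
print(sorted(result))  # [b'a', b'a', b'a', b'b', b'b']
```

Here only two unique words count toward the limit, yet five elements come back, because every occurrence of a kept word is preserved. A set-returning variant such as get_top_elements would instead yield each kept word once.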
Args | |
---|---|
dataset | A tf.data.Dataset to extract top elements from. Element type must be tf.string. |
max_user_contribution | The maximum number of unique elements to keep. |
string_max_bytes | The maximum length (in bytes) of strings in the dataset. Strings longer than string_max_bytes will be truncated. Defaults to None, which means there is no limit on the string length. |
Returns | |
---|---|
A rank-1 Tensor containing the top max_user_contribution unique elements of the input dataset. If the total number of unique elements is less than or equal to max_user_contribution, returns the list of all unique elements. |
Raises | |
---|---|
ValueError | If the shape of elements in dataset is not rank 1; if max_user_contribution is less than 1; or if string_max_bytes is not None and is less than 1. |
TypeError | If dataset.element_spec.dtype is not tf.string. |