TextEncoder backed by a list of tokens.
Inherits From: TextEncoder
tfds.deprecated.text.TokenTextEncoder(
    vocab_list,
    oov_buckets=1,
    oov_token='UNK',
    lowercase=False,
    tokenizer=None,
    strip_vocab=True,
    decode_token_separator=' '
)
Tokenization splits on (and drops) non-alphanumeric characters with regex "\W+".
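As a rough illustration of that splitting rule, the sketch below uses Python's standard `re` module rather than the library's own tokenizer. The helper name `split_tokens` is hypothetical, and filtering out empty strings is an assumption about how leading and trailing separators are handled.

```python
import re

# The pattern named in the description: one or more non-word characters.
NON_WORD = re.compile(r"\W+")


def split_tokens(text):
    """Split on runs of non-word characters and drop the separators."""
    # re.split yields empty strings when the text starts or ends with a
    # separator, so those are filtered out here (an assumption of this sketch).
    return [tok for tok in NON_WORD.split(text) if tok]


print(split_tokens("Hello, world! It's 2024."))
# ['Hello', 'world', 'It', 's', '2024']
```

In Python's `re`, `\W` treats the underscore as a word character, so a string such as `snake_case` stays a single token under this pattern.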
Attributes | |
---|---|
lowercase | |
oov_token | |
tokenizer | |
tokens | |
vocab_size | Size of the vocabulary. Encoding produces ints in the range [1, vocab_size). |
Methods
decode
decode(ids)
Decodes a list of integers into text.
encode
encode(s)
Encodes text into a list of integers.
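To make the relationship between `encode`, `decode`, `oov_buckets`, and `vocab_size` concrete, here is a minimal standard-library sketch. It is not the tfds implementation: the class name `ToyTokenEncoder` is hypothetical, and the reserved id 0, the CRC32 bucket hashing, and the handling of id 0 in `decode` are all assumptions chosen to match the documented id range [1, vocab_size).

```python
import re
import zlib


class ToyTokenEncoder:
    """Illustrative stand-in for a token-list encoder (not the tfds class)."""

    def __init__(self, vocab_list, oov_buckets=1, oov_token="UNK", lowercase=False):
        self.tokens = list(vocab_list)
        self.oov_buckets = oov_buckets
        self.oov_token = oov_token
        self.lowercase = lowercase
        # Assumption: id 0 is reserved (e.g. for padding), so ids start at 1.
        self._token_to_id = {tok: i + 1 for i, tok in enumerate(self.tokens)}

    @property
    def vocab_size(self):
        # Reserved id 0 + vocabulary tokens + out-of-vocabulary buckets.
        return 1 + len(self.tokens) + self.oov_buckets

    def encode(self, s):
        if self.lowercase:
            s = s.lower()
        ids = []
        for tok in re.split(r"\W+", s):
            if not tok:
                continue  # skip empty strings left by leading/trailing separators
            if tok in self._token_to_id:
                ids.append(self._token_to_id[tok])
            elif self.oov_buckets > 0:
                # Assumption: unknown tokens are hashed into one of the OOV buckets,
                # whose ids follow directly after the vocabulary ids.
                bucket = zlib.crc32(tok.encode("utf-8")) % self.oov_buckets
                ids.append(len(self.tokens) + 1 + bucket)
            else:
                raise ValueError(f"Out-of-vocabulary token: {tok!r}")
        return ids

    def decode(self, ids, separator=" "):
        out = []
        for i in ids:
            if i == 0:
                continue  # assumption: the reserved id carries no text
            in_vocab = 1 <= i <= len(self.tokens)
            out.append(self.tokens[i - 1] if in_vocab else self.oov_token)
        return separator.join(out)


enc = ToyTokenEncoder(["hello", "world"])
ids = enc.encode("hello brave world")
print(ids)              # [1, 3, 2]
print(enc.decode(ids))  # hello UNK world
print(enc.vocab_size)   # 4
```

Because out-of-vocabulary tokens collapse to `oov_token` on decoding, the round trip is lossy for any text containing tokens outside `vocab_list`.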
load_from_file
@classmethod
load_from_file(filename_prefix)
Load from file. Inverse of save_to_file.
save_to_file
save_to_file(filename_prefix)
Store to file. Inverse of load_from_file.