Given a path to new and old vocabulary files, returns a remapping Tensor of
length `num_new_vocab`, where `remapping[i]` contains the row number in the old vocabulary that corresponds to row `i` in the new vocabulary (starting at line `new_vocab_offset` and up to `num_new_vocab` entities), or `-1` if entry `i` in the new vocabulary is not in the old vocabulary. The old vocabulary is constrained to the first `old_vocab_size` entries if `old_vocab_size` is not the default value of -1.
`new_vocab_offset` enables use in the partitioned variable case, and should generally be set by examining partitioning info. Each vocabulary file should be a text file in which every line contains a single entity of the vocabulary.
For example, with `new_vocab_file` a text file containing the elements `[f0, f1, f2, f3]`, one per line, `old_vocab_file = [f1, f0, f3]`, `num_new_vocab = 3`, and `new_vocab_offset = 1`, the returned remapping would be `[0, -1, 2]`.
The op also returns a count of how many entries in the new vocabulary were present in the old vocabulary, which is used to calculate the number of values to initialize in a weight matrix remapping.
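The semantics described above can be sketched in plain Java. This is only an illustration of the documented behavior, not the TensorFlow kernel: the class `VocabRemapSketch`, its `Result` holder, and the use of in-memory lists instead of vocabulary files are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java illustration of the remapping semantics (hypothetical helper, not the real op). */
final class VocabRemapSketch {

    /** Holder mirroring the op's two outputs: remapping and numPresent. */
    static final class Result {
        final long[] remapping;
        final int numPresent;

        Result(long[] remapping, int numPresent) {
            this.remapping = remapping;
            this.numPresent = numPresent;
        }
    }

    /**
     * Each list element stands for one line of a vocabulary file.
     * oldVocabSize == -1 means "use the entire old vocabulary".
     */
    static Result remap(List<String> newVocab, List<String> oldVocab,
                        int newVocabOffset, int numNewVocab, int oldVocabSize) {
        // Assumption: a size larger than the old vocabulary is clamped here;
        // the real op may treat that case differently.
        int oldLimit = (oldVocabSize == -1)
                ? oldVocab.size()
                : Math.min(oldVocabSize, oldVocab.size());

        // Exact lookup from entity to its row number in the old vocabulary.
        Map<String, Long> oldRowByEntity = new HashMap<>();
        for (int row = 0; row < oldLimit; row++) {
            oldRowByEntity.put(oldVocab.get(row), (long) row);
        }

        long[] remapping = new long[numNewVocab];
        int numPresent = 0;
        for (int i = 0; i < numNewVocab; i++) {
            // Row i of the output corresponds to line (newVocabOffset + i) of the new vocabulary.
            Long oldRow = oldRowByEntity.get(newVocab.get(newVocabOffset + i));
            if (oldRow == null) {
                remapping[i] = -1L; // entity is not in the old vocabulary
            } else {
                remapping[i] = oldRow;
                numPresent++;
            }
        }
        return new Result(remapping, numPresent);
    }
}
```

Calling `VocabRemapSketch.remap(List.of("f0", "f1", "f2", "f3"), List.of("f1", "f0", "f3"), 1, 3, -1)` reproduces the example: the remapping is `[0, -1, 2]` and the present count is 2.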
This functionality can be used to remap both row vocabularies (typically, features) and column vocabularies (typically, classes) from TensorFlow checkpoints. Note that the partitioning logic relies on contiguous vocabularies corresponding to div-partitioned variables. Moreover, the underlying remapping uses an IndexTable (as opposed to an inexact CuckooTable), so client code should use the corresponding index_table_from_file() as the FeatureColumn framework does (as opposed to tf.feature_to_id(), which uses a CuckooTable).
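To illustrate what "contiguous vocabularies corresponding to div-partitioned variables" could mean for the offset arguments, the following sketch derives a `(newVocabOffset, numNewVocab)` pair per shard. It assumes the first `vocabSize % numShards` shards receive one extra row; the class `PartitionOffsetsSketch` and its method are hypothetical, and in practice these values should come from the variable's actual partitioning info.

```java
/** Hypothetical helper: contiguous row ranges for a vocabulary split across shards. */
final class PartitionOffsetsSketch {

    /**
     * Returns one {offset, count} pair per shard.
     * Assumption: shards are contiguous, and the first (vocabSize % numShards)
     * shards hold one extra row each.
     */
    static long[][] shardRanges(long vocabSize, int numShards) {
        long base = vocabSize / numShards;
        long extra = vocabSize % numShards;
        long[][] ranges = new long[numShards][2];
        long offset = 0;
        for (int shard = 0; shard < numShards; shard++) {
            long count = base + (shard < extra ? 1 : 0);
            ranges[shard][0] = offset; // candidate value for newVocabOffset
            ranges[shard][1] = count;  // candidate value for numNewVocab
            offset += count;
        }
        return ranges;
    }
}
```

For a vocabulary of 10 entries over 3 shards, this yields the ranges `{0, 4}`, `{4, 3}`, and `{7, 3}`; each pair would then be passed as the offset and count for the remapping of the matching shard.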
Nested Classes
class | GenerateVocabRemapping.Options | Optional attributes for GenerateVocabRemapping |
Constants
String | OP_NAME | The name of this op, as known by the TensorFlow core engine |
Public Methods
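static GenerateVocabRemapping | create(Scope scope, Operand<TString> newVocabFile, Operand<TString> oldVocabFile, Long newVocabOffset, Long numNewVocab, Options... options) | Factory method to create a class wrapping a new GenerateVocabRemapping operation. |
Output<TInt32> | numPresent() | Number of new vocab entries found in old vocab. |
static GenerateVocabRemapping.Options | oldVocabSize(Long oldVocabSize) | |
Output<TInt64> | remapping() | A Tensor of length num_new_vocab where the element at index i is equal to the old ID that maps to the new ID i. |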
Constants
public static final String OP_NAME
The name of this op, as known by the TensorFlow core engine
Public Methods
public static GenerateVocabRemapping create (Scope scope, Operand<TString> newVocabFile, Operand<TString> oldVocabFile, Long newVocabOffset, Long numNewVocab, Options... options)
Factory method to create a class wrapping a new GenerateVocabRemapping operation.
Parameters
scope | Current scope. |
newVocabFile | Path to the new vocab file. |
oldVocabFile | Path to the old vocab file. |
newVocabOffset | How many entries into the new vocab file to start reading. |
numNewVocab | Number of entries in the new vocab file to remap. |
options | Carries optional attribute values. |
Returns
- a new instance of GenerateVocabRemapping
public static GenerateVocabRemapping.Options oldVocabSize (Long oldVocabSize)
Parameters
oldVocabSize | Number of entries in the old vocab file to consider. If -1, use the entire old vocabulary. |