FixedUnigramCandidateSampler.Options

public static class FixedUnigramCandidateSampler.Options

Optional attributes for FixedUnigramCandidateSampler

Public Methods

FixedUnigramCandidateSampler.Options	distortion(Float distortion)
FixedUnigramCandidateSampler.Options	numReservedIds(Long numReservedIds)
FixedUnigramCandidateSampler.Options	numShards(Long numShards)
FixedUnigramCandidateSampler.Options	seed(Long seed)
FixedUnigramCandidateSampler.Options	seed2(Long seed2)
FixedUnigramCandidateSampler.Options	shard(Long shard)
FixedUnigramCandidateSampler.Options	unigrams(List<Float> unigrams)
FixedUnigramCandidateSampler.Options	vocabFile(String vocabFile)

Inherited Methods

From class java.lang.Object

boolean	equals(Object arg0)
final Class<?>	getClass()
int	hashCode()
final void	notify()
final void	notifyAll()
String	toString()
final void	wait(long arg0, int arg1)
final void	wait(long arg0)
final void	wait()

Public Methods

public FixedUnigramCandidateSampler.Options distortion (Float distortion)

Parameters

distortion	Skews the unigram probability distribution. Each weight is raised to the power of distortion before being added to the internal unigram distribution. As a result, distortion = 1.0 gives regular unigram sampling (as defined by the vocab file), and distortion = 0.0 gives a uniform distribution.
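
The effect of distortion can be illustrated outside of TensorFlow with a minimal sketch. The class and method names below (DistortionSketch, distort) are hypothetical and not part of the API, and the final normalization step is an assumption made so that the results read as probabilities.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of the TensorFlow Java API.
class DistortionSketch {
    // Raises each raw weight to the given power, then normalizes so the
    // results sum to 1 (normalization is an assumption of this sketch).
    static double[] distort(double[] weights, double distortion) {
        double[] out = new double[weights.length];
        double total = 0.0;
        for (int i = 0; i < weights.length; i++) {
            out[i] = Math.pow(weights[i], distortion);
            total += out[i];
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= total;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] counts = {8.0, 1.0, 1.0};
        // distortion = 1.0 keeps the original proportions.
        System.out.println(Arrays.toString(distort(counts, 1.0)));
        // distortion = 0.0 maps every weight to 1, giving equal shares.
        System.out.println(Arrays.toString(distort(counts, 0.0)));
    }
}
```

Values between 0.0 and 1.0 flatten the distribution only partially, so frequent IDs are still sampled more often than rare ones, but by a smaller margin.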

public FixedUnigramCandidateSampler.Options numReservedIds (Long numReservedIds)

Parameters

numReservedIds	Optionally, users can reserve IDs in the range [0, num_reserved_ids). One use case is a special unknown-word token that is assigned ID 0. Reserved IDs have a sampling probability of 0.
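
As a rough sketch of what reserving IDs means for the weight table, the hypothetical helper below (ReservedIdsSketch.withReservedIds, not part of the API) prepends zero-weight slots for the reserved range. It assumes that supplied weights are assigned to the IDs that follow the reserved range, as the vocabFile description states for file-based weights.

```java
// Hypothetical helper for illustration only; not part of the TensorFlow Java API.
class ReservedIdsSketch {
    // Builds a per-ID weight table: IDs in [0, numReservedIds) get weight 0,
    // and the supplied weights follow in sequential order (an assumption).
    static double[] withReservedIds(double[] unigrams, int numReservedIds) {
        double[] table = new double[numReservedIds + unigrams.length];
        // New arrays are zero-filled, so only the non-reserved part is copied.
        System.arraycopy(unigrams, 0, table, numReservedIds, unigrams.length);
        return table;
    }

    public static void main(String[] args) {
        double[] table = withReservedIds(new double[]{5.0, 3.0}, 1);
        for (int id = 0; id < table.length; id++) {
            System.out.println("id " + id + " weight " + table[id]);
        }
    }
}
```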

public FixedUnigramCandidateSampler.Options numShards (Long numShards)

Parameters

numShards	A sampler can be used to sample from a subset of the original range in order to speed up the whole computation through parallelism. This parameter (together with 'shard') indicates the number of partitions that are being used in the overall computation.

public FixedUnigramCandidateSampler.Options seed (Long seed)

Parameters

seed	If either seed or seed2 is set to a non-zero value, the random number generator is seeded by the given seed. Otherwise, it is seeded by a random seed.

public FixedUnigramCandidateSampler.Options seed2 (Long seed2)

Parameters

seed2	A second seed to avoid seed collision.

public FixedUnigramCandidateSampler.Options shard (Long shard)

Parameters

shard	A sampler can be used to sample from a subset of the original range in order to speed up the whole computation through parallelism. This parameter (together with 'num_shards') indicates the particular partition number of a sampler op, when partitioning is being used.
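
The text does not say how the ID range is divided between partitions. The sketch below assumes a simple modulo scheme, in which an ID belongs to a partition when id % numShards == shard. Both that scheme and the names ShardSketch and idsForShard are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the TensorFlow Java API.
class ShardSketch {
    // Returns the IDs in [0, rangeMax) that fall into the given partition,
    // assuming IDs are assigned to partitions by id % numShards.
    static List<Integer> idsForShard(int rangeMax, int numShards, int shard) {
        List<Integer> ids = new ArrayList<>();
        for (int id = 0; id < rangeMax; id++) {
            if (id % numShards == shard) {
                ids.add(id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Three samplers, each responsible for one partition of ten IDs.
        for (int shard = 0; shard < 3; shard++) {
            System.out.println("shard " + shard + ": " + idsForShard(10, 3, shard));
        }
    }
}
```

Under this assumption every ID belongs to exactly one partition, so samplers configured with the same numShards and distinct shard values together cover the whole range.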

public FixedUnigramCandidateSampler.Options unigrams (List<Float> unigrams)

Parameters

unigrams	A list of unigram counts or probabilities, one per ID in sequential order. Exactly one of vocab_file and unigrams must be passed to this op.

public FixedUnigramCandidateSampler.Options vocabFile (String vocabFile)

Parameters

vocabFile	Each valid line in this file (which should have a CSV-like format) corresponds to a valid word ID. IDs are in sequential order, starting from num_reserved_ids. The last entry in each line is expected to be a value corresponding to the count or relative probability. Exactly one of vocab_file and unigrams must be passed to this op.
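
The file layout described above can be sketched as a small parser. The names VocabFileSketch and parse are hypothetical. The sketch also assumes that fields are comma-separated and that blank lines, or lines whose last field is not a number, are the invalid ones to skip; the text does not define validity precisely.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the TensorFlow Java API.
class VocabFileSketch {
    // Maps each valid line to the next sequential ID, starting at
    // numReservedIds, and reads the weight from the line's last field.
    static Map<Long, Float> parse(List<String> lines, long numReservedIds) {
        Map<Long, Float> weights = new LinkedHashMap<>();
        long nextId = numReservedIds;
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // assumption: blank lines are not valid entries
            }
            // Limit -1 keeps trailing empty fields, so the array is never empty.
            String[] fields = trimmed.split(",", -1);
            try {
                float weight = Float.parseFloat(fields[fields.length - 1].trim());
                weights.put(nextId++, weight);
            } catch (NumberFormatException e) {
                // assumption: a non-numeric last field makes the line invalid
            }
        }
        return weights;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("the,120", "", "cat,30");
        // With one reserved ID, "the" gets ID 1 and "cat" gets ID 2.
        System.out.println(parse(lines, 1));
    }
}
```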


Last updated 2021-11-29 UTC.