Returns a float Tensor as the de-noised counts.
tfr.utils.de_noise(
counts, noise, ratio=0.9
)
The implementation is based on the paper by Zhang and Xu: "Fast Exact
Maximum Likelihood Estimation for Mixture of Language Models." It assumes that
the observed counts are generated from a mixture of the noise distribution and
the true distribution:
ratio * noise_distribution + (1 - ratio) * true_distribution,
where ratio controls the contribution of noise. This method returns the
estimated true distribution.
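The mixture assumption above can be illustrated with a minimal NumPy sketch. It uses plain EM iterations as a stand-in, not the exact algorithm from the paper, and it is not the library's implementation. The names `em_step` and `de_noise_em_sketch`, the iteration count, and the example values are all assumptions made for illustration.

```python
import numpy as np


def em_step(counts, noise, true_dist, ratio):
    """One EM update of the true distribution; noise and ratio stay fixed."""
    # Mixture implied by the current estimate, one row per observation set.
    mixture = ratio * noise + (1.0 - ratio) * true_dist
    # E-step: probability that a count came from the true component.
    responsibility = (1.0 - ratio) * true_dist / mixture
    # M-step: renormalize the counts attributed to the true component.
    weighted = counts * responsibility
    return weighted / weighted.sum(axis=1, keepdims=True)


def de_noise_em_sketch(counts, noise, ratio=0.9, num_iters=100):
    """Hypothetical EM-based estimate of the true distribution per row.

    Assumes every row of counts has at least one positive entry and
    every entry of noise is positive.
    """
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must be in (0, 1)")
    counts = np.asarray(counts, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    # Normalize each noise row so that it sums to 1.
    noise = noise / noise.sum(axis=1, keepdims=True)
    # Start from a uniform guess for the true distribution.
    true_dist = np.full_like(counts, 1.0 / counts.shape[1])
    for _ in range(num_iters):
        true_dist = em_step(counts, noise, true_dist, ratio)
    return true_dist


# Build counts that follow the mixture exactly, then estimate from them.
noise = np.array([[0.25, 0.25, 0.25, 0.25]])
true_dist = np.array([[0.7, 0.1, 0.1, 0.1]])
ratio = 0.5
counts = 1000.0 * (ratio * noise + (1.0 - ratio) * true_dist)
estimate = de_noise_em_sketch(counts, noise, ratio=ratio)
```

When the counts follow the mixture exactly, the true distribution is a fixed point of `em_step`, which is why the iterations can settle on it. Each row of `estimate` is nonnegative and sums to 1, matching the documented return shape.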
Args |
counts | A 2-D Tensor representing the observations. All values should be nonnegative. |
noise | A 2-D Tensor representing the noise distribution. This should be the same shape as counts. All values should be positive and are normalized to a simplex per row. |
ratio | A float in (0, 1) representing the contribution from noise. |
Returns |
A 2-D float Tensor in which each row is a point on the probability simplex. |
Raises |
ValueError | If ratio is not in (0, 1). |
InvalidArgumentError | If any value in counts is negative or any value in noise is not positive. |