Computes an approximate probability density at each x, given the bins.

    tft.estimated_probability_density(
        x: tf.Tensor,
        boundaries: Optional[Union[tf.Tensor, int]] = None,
        categorical: bool = False,
        name: Optional[str] = None
    ) -> tf.Tensor

Using this type of fixed-interval method has several benefits compared to quantile-based bucketization, although it may not always be preferred.
- Quantiles do not work on categorical data.
- The quantiles algorithm does not currently operate on multiple features jointly, only independently.
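
To illustrate the first point: for categorical data, a density estimate reduces to a relative frequency per category. The following is a minimal NumPy sketch of that idea, not the library's implementation; the helper name `categorical_mass` is hypothetical.

```python
import numpy as np

def categorical_mass(x):
    """Return, for each element of x, the fraction of elements sharing its value."""
    x = np.asarray(x)
    # `inverse` maps each element of x to the index of its category in `values`.
    values, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    return counts[inverse.ravel()].reshape(x.shape) / x.size

# 'a' appears 3 times out of 6, 'b' twice, 'c' once.
mass = categorical_mass(["a", "b", "a", "c", "a", "b"])
```

The result has the same shape as the input, and each entry is a probability mass rather than a density.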
Example: outlier detection in a multi-modal or arbitrary distribution. Imagine a value x where a simple model is highly predictive of a target y within certain densely populated ranges. Outside these ranges, we may want to treat the data differently, but there are too few samples for the model to learn this case by case. One option is to use the density estimate for this purpose:

    outputs['x_density'] = tft.estimated_probability_density(inputs['x'], boundaries=100)
    outputs['outlier_x'] = tf.where(outputs['x_density'] < OUTLIER_THRESHOLD,
                                    tf.constant([1]), tf.constant([0]))

This example uses a single variable for illustration, but a direct density metric becomes more useful in higher dimensions.
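
The outlier-flagging idea can be tried outside a pipeline with plain NumPy. This is a rough sketch under assumptions, not how the library computes its estimate: the synthetic bimodal data, `NUM_BINS`, and the value of `OUTLIER_THRESHOLD` are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two dense modes plus one isolated value in the sparse gap between them.
x = np.concatenate([
    rng.normal(-5.0, 0.5, 5000),
    rng.normal(5.0, 0.5, 5000),
    [0.0],
])

NUM_BINS = 100            # hypothetical number of fixed-width bins
OUTLIER_THRESHOLD = 0.01  # hypothetical density cutoff

# density=True divides each count by (total count * bin width).
density, edges = np.histogram(x, bins=NUM_BINS, density=True)

# Map each value to its bin; clip so the maximum lands in the last bin.
bin_idx = np.clip(np.digitize(x, edges) - 1, 0, NUM_BINS - 1)
x_density = density[bin_idx]

# 1 marks values that fall in sparsely populated regions.
outlier_x = np.where(x_density < OUTLIER_THRESHOLD, 1, 0)
```

The isolated value at 0.0 is flagged, while values near either mode are not. A few points in the far tails of each mode are also flagged, which is the expected behavior of a density threshold.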
Note that we normalize by the average bin_width to arrive at a probability density estimate. The result resembles a probability density function (PDF), not the probability that a value falls in the bucket (except in the categorical case).
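
The difference between a per-bucket probability and a density can be seen in a small NumPy sketch. It assumes equal-width bins and illustrates the normalization described above; the sample values and bin edges are hypothetical, and the library's internal computation may differ in detail.

```python
import numpy as np

x = np.array([0.05, 0.10, 0.15, 0.20, 0.90])
edges = np.linspace(0.0, 1.0, 5)         # 4 bins, each of width 0.25
counts, _ = np.histogram(x, bins=edges)  # counts per bin: [4, 0, 0, 1]

bin_width = np.mean(np.diff(edges))      # average bin width: 0.25
mass = counts / counts.sum()             # probability per bin: [0.8, 0, 0, 0.2]
density = mass / bin_width               # density per bin:     [3.2, 0, 0, 0.8]

# The masses sum to 1, while the density integrates to 1 over the binned range.
total_mass = mass.sum()
integral = (density * bin_width).sum()
```

A density value can exceed 1 (here 3.2) when bins are narrower than one unit, which is why the output should not be read as a probability in the non-categorical case.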
Returns | |
---|---|
A Tensor the same shape as x, the probability density estimate at x (or probability mass estimate if categorical is True). |
Raises | |
---|---|
NotImplementedError | If x is CompositeTensor. |