tf.keras.layers.experimental.RandomFourierFeatures

Layer that projects its inputs into a random feature space.

Inherits From: Layer, Module

This layer implements a mapping from input space to a space with output_dim dimensions, which approximates shift-invariant kernels. A kernel function K(x, y) is shift-invariant if K(x, y) == k(x - y) for some function k. Many popular Radial Basis Functions (RBF), including Gaussian and Laplacian kernels, are shift-invariant.

The implementation of this layer is based on the following paper: "Random Features for Large-Scale Kernel Machines" by Ali Rahimi and Ben Recht.

The distribution from which the parameters of the random features map (layer) are sampled determines which shift-invariant kernel the layer approximates (see paper for more details). You can use the distribution of your choice. The layer supports out-of-the-box approximations of the following two RBF kernels:

  • Gaussian: K(x, y) == exp(- square(x - y) / (2 * square(scale)))
  • Laplacian: K(x, y) = exp(-abs(x - y) / scale))

Usage: Typically, this layer is used to "kernelize" linear models by applying a non-linear transformation (this layer) to the input features and then training a linear model on top of the transformed features. Depending on the loss function of the linear model, the composition of this layer and the linear model results to models that are equivalent (up to approximation) to kernel SVMs (for hinge loss), kernel logistic regression (for logistic loss), kernel linear regression (for squared loss), etc.

Examples:

A kernel multinomial logistic regression model with Gaussian kernel for MNIST:

model = keras.Sequential([
  keras.Input(shape=(784,)),
  RandomFourierFeatures(
      output_dim=4096,
      scale=10.,
      kernel_initializer='gaussian'),
  layers.Dense(units=10, activation='softmax'),
])
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['categorical_accuracy']
)

A quasi-SVM classifier for MNIST:

model = keras.Sequential([
  keras.Input(shape=(784,)),
  RandomFourierFeatures(
      output_dim=4096,
      scale=10.,
      kernel_initializer='gaussian'),
  layers.Dense(units=10),
])
model.compile(
    optimizer='adam',
    loss='hinge',
    metrics=['categorical_accuracy']
)

To use another kernel, just replace the layer creation line with:

random_features_layer = RandomFourierFeatures(
    output_dim=500,
    kernel_initializer=<my_initializer>,
    scale=...,
    ...)

output_dim Positive integer, the dimension of the layer's output, i.e., the number of random features used to approximate the kernel.
kernel_initializer Determines the distribution of the parameters of the random features map (and therefore the kernel approximated by the layer). It can be either a string identifier or a Keras Initializer instance. Currently only 'gaussian' and 'laplacian' are supported string identifiers (case insensitive). Note that the kernel matrix is not trainable.
scale For Gaussian and Laplacian kernels, this corresponds to a scaling factor of the corresponding kernel approximated by the layer (see concrete definitions above). When provided, it should be a positive float. If None, a default value is used: if the kernel initializer is set to "gaussian", scale becomes sqrt(input_dim / 2), otherwise, it becomes 1.0. Both the approximation error of the kernel and the classification quality are sensitive to this parameter. If trainable is set to True, this parameter is learned end-to-end during training and the provided value serves as the initial value. Note: When features from this layer are fed to a linear model, by making scale trainable, the resulting optimization problem is no longer convex (even if the loss function used by the linear model is convex). Defaults to None.
trainable Whether the scaling parameter of the layer should be trainable. Defaults to False.
name String, name to use for this layer.