Layers are the fundamental building blocks for NLP models. They can be used to assemble new tf.keras layers or models.
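As a brief illustration of how these layers compose, the sketch below assembles a tiny encoder from a few of the layers listed on this page. It assumes the TensorFlow Model Garden package is importable as tensorflow_models; the vocabulary size, sequence length, and other hyperparameters are arbitrary placeholder values, not recommendations.

import tensorflow as tf
import tensorflow_models as tfm  # exposes the layers below as tfm.nlp.layers

# Token ids and a 0/1 padding mask for a variable-length batch.
word_ids = tf.keras.Input(shape=(None,), dtype=tf.int32, name="input_word_ids")
input_mask = tf.keras.Input(shape=(None,), dtype=tf.int32, name="input_mask")

# Embed tokens and add positional information.
embeddings = tfm.nlp.layers.OnDeviceEmbedding(
    vocab_size=30522, embedding_width=128)(word_ids)
embeddings = embeddings + tfm.nlp.layers.PositionEmbedding(max_length=512)(embeddings)

# Expand the 2D padding mask into a 3D self-attention mask.
attention_mask = tfm.nlp.layers.SelfAttentionMask()(embeddings, input_mask)

# One Transformer encoder block over the embedded sequence.
encoded = tfm.nlp.layers.TransformerEncoderBlock(
    num_attention_heads=2, inner_dim=512, inner_activation="gelu")(
        [embeddings, attention_mask])

model = tf.keras.Model(inputs=[word_ids, input_mask], outputs=encoded)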
Modules
util
module: Keras-based transformer block layer.
Classes
class BertPackInputs
: Packs tokens into model inputs for BERT (see the usage sketch after this class list).
class BertTokenizer
: Wraps TF.Text's BertTokenizer with pre-defined vocab as a Keras Layer.
class BigBirdAttention
: BigBird, a sparse attention mechanism.
class BigBirdMasks
: Creates BigBird attention masks.
class BlockDiagFeedforward
: Block diagonal feedforward layer.
class CachedAttention
: Attention layer with cache used for autoregressive decoding.
class ClassificationHead
: Pooling head for sentence-level classification tasks.
class ExpertsChooseMaskedRouter
: Masked matmul router using experts choose tokens assignment.
class FactorizedEmbedding
: A factorized embeddings layer for supporting larger embeddings.
class FastWordpieceBertTokenizer
: A BERT tokenizer Keras layer using text.FastWordpieceTokenizer.
class FeedForwardExperts
: Feed-forward layer with multiple experts.
class FourierTransformLayer
: Fourier Transform layer.
class GatedFeedforward
: Gated linear feedforward layer.
class GaussianProcessClassificationHead
: Gaussian process-based pooling head for sentence classification.
class HartleyTransformLayer
: Hartley Transform layer.
class KernelAttention
: A variant of efficient transformers which replaces softmax with kernels.
class KernelMask
: Creates kernel attention mask.
class LinearTransformLayer
: Dense, linear transformation layer.
class MaskedLM
: Masked language model network head for BERT modeling.
class MaskedSoftmax
: Performs a softmax with optional masking on a tensor.
class MatMulWithMargin
: Computes a dot-product matrix given two encoded inputs.
class MixingMechanism
: Determines the type of mixing layer.
class MobileBertEmbedding
: Performs an embedding lookup for MobileBERT.
class MobileBertMaskedLM
: Masked language model network head for BERT modeling.
class MobileBertTransformer
: Transformer block for MobileBERT.
class MoeLayer
: Sparse MoE layer with per-token routing.
class MoeLayerWithBackbone
: Sparse MoE layer plus a FeedForward layer evaluated for all tokens.
class MultiChannelAttention
: Multi-channel Attention layer.
class MultiClsHeads
: Pooling heads sharing the same pooling stem.
class MultiHeadRelativeAttention
: A multi-head attention layer with relative attention + position encoding.
class OnDeviceEmbedding
: Performs an embedding lookup suitable for accelerator devices.
class PackBertEmbeddings
: Performs packing tricks for BERT inputs to improve TPU utilization.
class PerDimScaleAttention
: Learns scales for individual dimensions.
class PerQueryDenseHead
: Pooling head used for EncT5 style models.
class PositionEmbedding
: Creates a positional embedding.
class RandomFeatureGaussianProcess
: Gaussian process layer with random feature approximation [1].
class ReZeroTransformer
: Transformer layer with ReZero.
class RelativePositionBias
: Relative position embedding via per-head bias in T5 style.
class RelativePositionEmbedding
: Creates a positional embedding.
class ReuseMultiHeadAttention
: MultiHeadAttention layer.
class ReuseTransformer
: Transformer layer.
class SelectTopK
: Selects top-k + random-k tokens according to importance.
class SelfAttentionMask
: Creates a 3D attention mask from a 2D tensor mask.
class SentencepieceTokenizer
: Wraps tf_text.SentencepieceTokenizer as a Keras layer.
class SpectralNormalization
: Implements spectral normalization for a Dense layer.
class SpectralNormalizationConv2D
: Implements spectral normalization for a Conv2D layer based on [3].
class StridedTransformerEncoderBlock
: Transformer layer for packing optimization to stride over inputs.
class StridedTransformerScaffold
: TransformerScaffold for packing optimization to stride over inputs.
class TNTransformerExpandCondense
: Transformer layer using tensor network Expand-Condense layer.
class TalkingHeadsAttention
: Implements Talking-Heads Attention.
class TokenImportanceWithMovingAvg
: Routing based on per-token importance value.
class Transformer
: Transformer layer.
class TransformerDecoderBlock
: Single transformer layer for decoder.
class TransformerEncoderBlock
: TransformerEncoderBlock layer.
class TransformerScaffold
: Transformer scaffold layer.
class TransformerXL
: Transformer XL.
class TransformerXLBlock
: Transformer XL block.
class TwoStreamRelativeAttention
: Two-stream relative self-attention for XLNet.
class VotingAttention
: Voting Attention layer.
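As a usage sketch for the preprocessing layers listed above (BertTokenizer and BertPackInputs), the snippet below tokenizes raw strings and packs them into fixed-length BERT inputs. The vocabulary path is a hypothetical placeholder and the sequence length is arbitrary; the layer and method names follow this module's API, but treat the exact arguments as an illustrative sketch rather than a canonical recipe.

import tensorflow as tf
import tensorflow_models as tfm

# Hypothetical path to a BERT WordPiece vocabulary file.
vocab_file = "/path/to/vocab.txt"

tokenizer = tfm.nlp.layers.BertTokenizer(vocab_file=vocab_file, lower_case=True)
packer = tfm.nlp.layers.BertPackInputs(
    seq_length=128,
    special_tokens_dict=tokenizer.get_special_tokens_dict())

sentences = tf.constant(["hello world", "layers compose into models"])
token_ids = tokenizer(sentences)   # ragged wordpiece ids per sentence
packed = packer([token_ids])       # dict of input_word_ids, input_mask, input_type_ids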
Functions
extract_gp_layer_kwargs(...)
: Extracts Gaussian process layer configs from a given kwarg.
extract_spec_norm_kwargs(...)
: Extracts spectral normalization configs from a given kwarg.
get_mask(...)
: Gets a 3D self-attention mask.