Produces a privacy report summarizing the DP guarantee.
tf_privacy.compute_dp_sgd_privacy_statement(
number_of_examples: int,
batch_size: int,
num_epochs: float,
noise_multiplier: float,
delta: float,
used_microbatching: bool = True,
max_examples_per_user: Optional[int] = None,
accountant_type: AccountantType = AccountantType.RDP
) -> str
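
A minimal usage sketch is shown below. It assumes the package is importable as tensorflow_privacy and exposes this function at the top level as in the signature above; the numeric values are illustrative only, not recommendations.

import tensorflow_privacy as tf_privacy

# Example-level DP statement for a hypothetical training run.
statement = tf_privacy.compute_dp_sgd_privacy_statement(
    number_of_examples=60_000,   # size of the training set
    batch_size=256,
    num_epochs=30.0,
    noise_multiplier=1.1,
    delta=1e-5,
)
print(statement)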
Args
  number_of_examples: Total number of examples in the dataset. For DP-SGD, an
    "example" corresponds to one row in a minibatch; e.g., for sequence models
    this would be one sequence of maximum length.
  batch_size: The number of examples in a batch, regardless of whether or how
    they are grouped into microbatches.
  num_epochs: The number of epochs of training. May be fractional.
  noise_multiplier: The ratio of the Gaussian noise stddev to the l2 clip norm
    at each round. The noise_multiplier is assumed to be constant, although the
    clip norm may be variable if, for example, adaptive clipping is used.
  delta: The target delta.
  used_microbatching: Whether microbatching was used with a microbatch size
    greater than one. Microbatching inflates sensitivity by a factor of two
    under add-or-remove-one adjacency. (See "How to DP-fy ML: A Practical Guide
    to Machine Learning with Differential Privacy",
    https://arxiv.org/abs/2303.00654, Sec. 5.6.)
  max_examples_per_user: If the dataset is constructed to cap the maximum
    number of examples each user contributes, provide this argument to also
    print a user-level DP guarantee.
  accountant_type: The privacy accountant used to compute epsilon. Because the
    current approach to computing user-level privacy with the PLD accountant
    can be overly pessimistic, this method does not provide a user-level
    privacy guarantee when accountant_type is PLD. This remains to be
    investigated and fixed (b/271341062).
Returns
  A str precisely articulating the privacy guarantee.
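
The sketch below, again with hypothetical numbers, illustrates two of the points above: num_epochs may be fractional (here derived from the number of optimizer steps), and passing max_examples_per_user additionally prints a user-level guarantee under the default RDP accountant.

import tensorflow_privacy as tf_privacy

number_of_examples = 1_000_000
batch_size = 512
training_steps = 25_000
# Fractional epochs derived from the number of optimizer steps taken.
num_epochs = training_steps * batch_size / number_of_examples  # 12.8

statement = tf_privacy.compute_dp_sgd_privacy_statement(
    number_of_examples=number_of_examples,
    batch_size=batch_size,
    num_epochs=num_epochs,
    noise_multiplier=0.9,
    delta=1e-6,
    used_microbatching=False,   # per-example gradients, no microbatches
    max_examples_per_user=4,    # dataset capped at 4 examples per user
)
print(statement)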