Build MinDiff dataset from sensitive and nonsensitive datasets.
model_remediation.min_diff.keras.utils.build_min_diff_dataset(
sensitive_group_dataset, nonsensitive_group_dataset
) -> tf.data.Dataset
| Arguments | |
|---|---|
| sensitive_group_dataset | tf.data.Dataset or valid MinDiff structure (unnested dict) of tf.data.Datasets containing only examples that belong to the sensitive group. |
| nonsensitive_group_dataset | tf.data.Dataset or valid MinDiff structure (unnested dict) of tf.data.Datasets containing only examples that do not belong to the sensitive group. |
This function builds a tf.data.Dataset containing examples that are meant to
be used only when calculating a min_diff_loss. The resulting dataset must then
be packed with the original dataset used for the model's original task, which
can be done by calling utils.pack_min_diff_data.
Each input dataset must output a tuple in the format used in
tf.keras.Model.fit. Specifically, the output must be a tuple of
length 1, 2, or 3 in the form (x, y, sample_weight).
This output will be parsed internally in the following way:
batch = ... # Batch from any of the input datasets.
x, y, sample_weight = tf.keras.utils.unpack_x_y_sample_weight(batch)
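To make the accepted tuple lengths concrete, here is a minimal pure-Python sketch of how a batch of length 1, 2, or 3 might be split into its three parts. The helper name unpack_fit_tuple is hypothetical and only mimics the documented behavior; it is not the library's implementation.

```python
def unpack_fit_tuple(batch):
    """Hypothetical stand-in that splits a fit-style batch into (x, y, sample_weight)."""
    if not isinstance(batch, tuple):
        # A bare value is treated as x alone (assumption for this sketch).
        return batch, None, None
    if len(batch) == 1:
        return batch[0], None, None
    if len(batch) == 2:
        return batch[0], batch[1], None
    if len(batch) == 3:
        return batch[0], batch[1], batch[2]
    raise ValueError("Expected a tuple of length 1, 2 or 3, got length %d" % len(batch))


# Missing components come back as None.
x, y, sample_weight = unpack_fit_tuple(([1.0, 2.0], [0, 1]))
```

In this sketch a batch that omits y or sample_weight is still valid; the missing components are simply None.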
Every batch from the returned tf.data.Dataset will contain one batch from
each of the input datasets. Each returned batch will be a tuple or structure
(matching the structure of the inputs) of (min_diff_x, min_diff_membership,
min_diff_sample_weight) where, for each pair of input datasets:
- min_diff_x: is formed by concatenating the x components of the paired datasets. The structure of these must match. If they don't the dataset will raise an error at the first batch.
- min_diff_membership: is a tensor of size [min_diff_batch_size, 1] indicating which dataset each example comes from (1.0 for sensitive_group_dataset and 0.0 for nonsensitive_group_dataset).
- min_diff_sample_weight: is formed by concatenating the sample_weight components of the paired datasets. If both are None, then this will be set to None. If only one is None, it is replaced with a Tensor of ones of the appropriate shape.
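The three components described above can be illustrated with a small pure-Python sketch that uses plain lists in place of tensors. The helper build_min_diff_batch is hypothetical, and two details are assumptions rather than documented behavior: sensitive examples are placed first in the concatenation, and a missing sample weight is filled with ones of shape [batch_size, 1].

```python
def build_min_diff_batch(sensitive, nonsensitive):
    """Combine one batch per group into (min_diff_x, membership, sample_weight).

    Each argument is a hypothetical (x, sample_weight) pair, where x is a list
    of examples and sample_weight is a list of [weight] rows or None.
    """
    s_x, s_w = sensitive
    n_x, n_w = nonsensitive

    # Concatenate the x components (sensitive first is an assumption).
    min_diff_x = s_x + n_x

    # Shape [min_diff_batch_size, 1]: 1.0 marks sensitive rows, 0.0 the rest.
    min_diff_membership = [[1.0] for _ in s_x] + [[0.0] for _ in n_x]

    if s_w is None and n_w is None:
        min_diff_sample_weight = None
    else:
        # A single missing weight is replaced by ones before concatenating.
        s_w = s_w if s_w is not None else [[1.0] for _ in s_x]
        n_w = n_w if n_w is not None else [[1.0] for _ in n_x]
        min_diff_sample_weight = s_w + n_w

    return min_diff_x, min_diff_membership, min_diff_sample_weight


sensitive_batch = ([[0.1], [0.2]], None)
nonsensitive_batch = ([[0.3], [0.4], [0.5]], [[2.0], [2.0], [0.5]])
x, membership, weight = build_min_diff_batch(sensitive_batch, nonsensitive_batch)
```

Here only the nonsensitive batch carries weights, so the two sensitive examples receive a weight of one each, and the membership column records which group every row came from.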
| Returns | |
|---|---|
| A tf.data.Dataset whose output is a tuple or structure (matching the structure of the inputs) of (min_diff_x, min_diff_membership, min_diff_sample_weight). | |