View on TensorFlow.org | Run in Google Colab | View on GitHub | Download notebook | See TF Hub model
Overview
Fairness Indicators is a suite of tools built on top of TensorFlow Model Analysis (TFMA) that enable regular evaluation of fairness metrics in product pipelines. TFMA is a library for evaluating both TensorFlow and non-TensorFlow machine learning models. It allows you to evaluate your models on large amounts of data in a distributed manner, compute in-graph and other metrics over different slices of data, and visualize them in notebooks.
Fairness Indicators is packaged with TensorFlow Data Validation (TFDV) and the What-If Tool. Using Fairness Indicators allows you to:
- Evaluate model performance, sliced across defined groups of users
- Gain confidence in your results with confidence intervals and evaluations at multiple thresholds
- Evaluate the distribution of datasets
- Dive deep into individual slices to explore root causes and opportunities for improvement
In this notebook, you will use Fairness Indicators to fix fairness issues in a model you train using the Civil Comments dataset. Watch this video for more details and context on the real-world scenario this is based on, which is also one of the primary motivations for creating Fairness Indicators.
Dataset
In this notebook, you will work with the Civil Comments dataset, approximately 2 million public comments made public by the Civil Comments platform in 2017 for ongoing research. This effort was sponsored by Jigsaw, who have hosted competitions on Kaggle to help classify toxic comments as well as minimize unintended model bias.
Each individual text comment in the dataset has a toxicity label, with the label being 1 if the comment is toxic and 0 if the comment is non-toxic. Within the data, a subset of comments is labeled with a variety of identity attributes, including categories for gender, sexual orientation, religion, and race or ethnicity.
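To make the structure concrete, a single processed example can be pictured as the following Python dictionary (the values here are invented for illustration; the actual records are serialized tf.train.Example protos that you will parse later with a feature map):

# Hypothetical record, shown only to illustrate the label and identity fields.
illustrative_example = {
    'comment_text': 'This is a perfectly civil comment.',
    'toxicity': 0.0,                  # 1.0 = toxic, 0.0 = non-toxic
    'gender': ['female'],             # identity terms associated with the text
    'sexual_orientation': [],         # may be empty for many comments
    'religion': [],
    'race': [],
    'disability': [],
}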
Setup
Install fairness-indicators and witwidget.
pip install -q -U pip==20.2
pip install -q fairness-indicators
pip install -q witwidget
After installing, you must restart the Colab runtime. Select Runtime > Restart runtime from the Colab menu.
Do not proceed with the rest of this tutorial without first restarting the runtime.
Import all other required libraries.
import os
import tempfile
import apache_beam as beam
import numpy as np
import pandas as pd
from datetime import datetime
import pprint
from google.protobuf import text_format
import tensorflow_hub as hub
import tensorflow as tf
import tensorflow_model_analysis as tfma
import tensorflow_data_validation as tfdv
from tfx_bsl.tfxio import tensor_adapter
from tfx_bsl.tfxio import tf_example_record
from tensorflow_model_analysis.addons.fairness.post_export_metrics import fairness_indicators
from tensorflow_model_analysis.addons.fairness.view import widget_view
from fairness_indicators.tutorial_utils import util
from witwidget.notebook.visualization import WitConfigBuilder
from witwidget.notebook.visualization import WitWidget
from tensorflow_metadata.proto.v0 import schema_pb2
Download and analyze the data
By default, this notebook downloads a preprocessed version of this dataset, but you may use the original dataset and re-run the processing steps if desired. In the original dataset, each comment is labeled with the percentage of raters who believed that the comment corresponds to a particular identity. For example, a comment might be labeled with the following: { male: 0.3, female: 1.0, transgender: 0.0, heterosexual: 0.8, homosexual_gay_or_lesbian: 1.0 }. The processing step groups identities by category (gender, sexual_orientation, etc.) and removes identities with a score below 0.5. So the example above would be converted to the following: { gender: [female], sexual_orientation: [heterosexual, homosexual_gay_or_lesbian] }.
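The grouping and thresholding performed by util.convert_comments_data is roughly equivalent to the following sketch (the category-to-term mapping shown here is an assumed, simplified subset, not the exact IDENTITY_COLUMNS used by the helper):

# Minimal sketch of the identity grouping step, assuming a simplified mapping.
IDENTITY_CATEGORIES = {
    'gender': ['male', 'female', 'transgender'],
    'sexual_orientation': ['heterosexual', 'homosexual_gay_or_lesbian', 'bisexual'],
}

def group_identities(raw_scores, threshold=0.5):
  # Keep only the terms whose rater score reaches the threshold, grouped by category.
  return {category: [term for term in terms if raw_scores.get(term, 0.0) >= threshold]
          for category, terms in IDENTITY_CATEGORIES.items()}

print(group_identities({'male': 0.3, 'female': 1.0, 'transgender': 0.0,
                        'heterosexual': 0.8, 'homosexual_gay_or_lesbian': 1.0}))
# {'gender': ['female'], 'sexual_orientation': ['heterosexual', 'homosexual_gay_or_lesbian']}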
download_original_data = False

if download_original_data:
  train_tf_file = tf.keras.utils.get_file('train_tf.tfrecord',
      'https://storage.googleapis.com/civil_comments_dataset/train_tf.tfrecord')
  validate_tf_file = tf.keras.utils.get_file('validate_tf.tfrecord',
      'https://storage.googleapis.com/civil_comments_dataset/validate_tf.tfrecord')

  # The identity terms list will be grouped together by their categories
  # (see 'IDENTITY_COLUMNS') on threshold 0.5. Only the identity term column,
  # text column and label column will be kept after processing.
  train_tf_file = util.convert_comments_data(train_tf_file)
  validate_tf_file = util.convert_comments_data(validate_tf_file)

else:
  train_tf_file = tf.keras.utils.get_file('train_tf_processed.tfrecord',
      'https://storage.googleapis.com/civil_comments_dataset/train_tf_processed.tfrecord')
  validate_tf_file = tf.keras.utils.get_file('validate_tf_processed.tfrecord',
      'https://storage.googleapis.com/civil_comments_dataset/validate_tf_processed.tfrecord')
Use TFDV to analyze the data and find potential problems in it, such as missing values and data imbalances, that can lead to fairness disparities.
stats = tfdv.generate_statistics_from_tfrecord(data_location=train_tf_file)
tfdv.visualize_statistics(stats)
WARNING:apache_beam.runners.interactive.interactive_environment:Dependencies required for Interactive Beam PCollection visualization are not available, please use: `pip install apache-beam[interactive]` to install necessary dependencies to enable all data visualization features. WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter. WARNING:apache_beam.io.tfrecordio:Couldn't find python-snappy so the implementation of _TFRecordUtil._masked_crc32c is not as fast as it could be. WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_data_validation/utils/stats_util.py:247: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version. Instructions for updating: Use eager execution and: `tf.data.TFRecordDataset(path)` WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_data_validation/utils/stats_util.py:247: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version. Instructions for updating: Use eager execution and: `tf.data.TFRecordDataset(path)`
TFDV shows that there are some significant imbalances in the data which could lead to biased model outcomes.
The toxicity label (the value predicted by the model) is unbalanced. Only 8% of the examples in the training set are toxic, which means that a classifier could get 92% accuracy by predicting that all comments are non-toxic.
In the fields relating to identity terms, only 6.6k out of the 1.08 million (0.61%) training examples deal with homosexuality, and those related to bisexuality are even more rare. This indicates that performance on these slices may suffer due to lack of training data.
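If you want to double-check the label imbalance outside of TFDV, a quick scan of the training TFRecord like the one below (a small sketch that is not part of the original notebook; it samples a fixed number of records because scanning the full file can take a while) reports the approximate fraction of toxic examples:

# Rough estimate of the toxic/non-toxic balance by parsing only the label.
def toxic_fraction(tf_file, max_examples=100000):
  label_spec = {'toxicity': tf.io.FixedLenFeature([], tf.float32)}
  toxic = total = 0
  for record in tf.data.TFRecordDataset([tf_file]).take(max_examples):
    label = tf.io.parse_single_example(record, label_spec)['toxicity'].numpy()
    toxic += int(label >= 0.5)
    total += 1
  return toxic / total

print(f'Approximate toxic fraction: {toxic_fraction(train_tf_file):.1%}')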
Prepare the data
Define a feature map to parse the data. Each example will have a label, the comment text, and the identity features sexual_orientation, gender, religion, race, and disability that are associated with the text.
BASE_DIR = tempfile.gettempdir()

TEXT_FEATURE = 'comment_text'
LABEL = 'toxicity'
FEATURE_MAP = {
    # Label:
    LABEL: tf.io.FixedLenFeature([], tf.float32),
    # Text:
    TEXT_FEATURE: tf.io.FixedLenFeature([], tf.string),

    # Identities:
    'sexual_orientation': tf.io.VarLenFeature(tf.string),
    'gender': tf.io.VarLenFeature(tf.string),
    'religion': tf.io.VarLenFeature(tf.string),
    'race': tf.io.VarLenFeature(tf.string),
    'disability': tf.io.VarLenFeature(tf.string),
}
Next, set up an input function to feed data into the model. Add a weight column to each example and upweight the toxic examples to account for the class imbalance identified by TFDV. Use only the identity features during the evaluation phase, as only the comments are fed into the model during training.
def train_input_fn():
  def parse_function(serialized):
    parsed_example = tf.io.parse_single_example(
        serialized=serialized, features=FEATURE_MAP)
    # Adds a weight column to deal with unbalanced classes.
    parsed_example['weight'] = tf.add(parsed_example[LABEL], 0.1)
    return (parsed_example,
            parsed_example[LABEL])
  train_dataset = tf.data.TFRecordDataset(
      filenames=[train_tf_file]).map(parse_function).batch(512)
  return train_dataset
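As a quick sanity check (not part of the original tutorial), you can pull a single batch from the input function and confirm that toxic examples receive a weight of 1.1 while non-toxic ones receive 0.1:

# Inspect one batch to verify that weight == label + 0.1.
features_batch, labels_batch = next(iter(train_input_fn()))
print('labels: ', labels_batch.numpy()[:5])
print('weights:', features_batch['weight'].numpy()[:5])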
Train the model
Create and train a deep learning model on the data.
model_dir = os.path.join(BASE_DIR, 'train', datetime.now().strftime(
    "%Y%m%d-%H%M%S"))

embedded_text_feature_column = hub.text_embedding_column(
    key=TEXT_FEATURE,
    module_spec='https://tfhub.dev/google/nnlm-en-dim128/1')

classifier = tf.estimator.DNNClassifier(
    hidden_units=[500, 100],
    weight_column='weight',
    feature_columns=[embedded_text_feature_column],
    optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.003),
    loss_reduction=tf.losses.Reduction.SUM,
    n_classes=2,
    model_dir=model_dir)

classifier.train(input_fn=train_input_fn, steps=1000)
INFO:tensorflow:Using default config. INFO:tensorflow:Using default config. INFO:tensorflow:Using config: {'_model_dir': '/tmp/train/20210923-205025', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true graph_options { rewrite_options { meta_optimizer_iterations: ONE } } , '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1} INFO:tensorflow:Using config: {'_model_dir': '/tmp/train/20210923-205025', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true graph_options { rewrite_options { meta_optimizer_iterations: ONE } } , '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1} WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version. Instructions for updating: Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts. WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version. Instructions for updating: Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts. INFO:tensorflow:Calling model_fn. INFO:tensorflow:Calling model_fn. INFO:tensorflow:Saver not created because there are no variables in the graph to restore 2021-09-23 20:50:26.540914: W tensorflow/core/common_runtime/graph_constructor.cc:1511] Importing a graph with a lower producer version 26 into an existing graph with producer version 808. Shape inference will have run different parts of the graph with different producer versions. INFO:tensorflow:Saver not created because there are no variables in the graph to restore WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/head/base_head.py:512: NumericColumn._get_dense_tensor (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version. Instructions for updating: The old _FeatureColumn APIs are being deprecated. 
Please use the new FeatureColumn APIs instead. WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/head/base_head.py:512: NumericColumn._get_dense_tensor (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version. Instructions for updating: The old _FeatureColumn APIs are being deprecated. Please use the new FeatureColumn APIs instead. WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/feature_column/feature_column.py:2192: NumericColumn._transform_feature (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version. Instructions for updating: The old _FeatureColumn APIs are being deprecated. Please use the new FeatureColumn APIs instead. WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/feature_column/feature_column.py:2192: NumericColumn._transform_feature (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version. Instructions for updating: The old _FeatureColumn APIs are being deprecated. Please use the new FeatureColumn APIs instead. WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/keras/optimizer_v2/adagrad.py:84: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version. Instructions for updating: Call initializer instance with the dtype argument instead of passing it to the constructor WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/keras/optimizer_v2/adagrad.py:84: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version. Instructions for updating: Call initializer instance with the dtype argument instead of passing it to the constructor INFO:tensorflow:Done calling model_fn. INFO:tensorflow:Done calling model_fn. INFO:tensorflow:Create CheckpointSaverHook. INFO:tensorflow:Create CheckpointSaverHook. INFO:tensorflow:Graph was finalized. INFO:tensorflow:Graph was finalized. INFO:tensorflow:Running local_init_op. INFO:tensorflow:Running local_init_op. INFO:tensorflow:Done running local_init_op. INFO:tensorflow:Done running local_init_op. INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0... INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0... INFO:tensorflow:Saving checkpoints for 0 into /tmp/train/20210923-205025/model.ckpt. INFO:tensorflow:Saving checkpoints for 0 into /tmp/train/20210923-205025/model.ckpt. INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0... INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0... 
INFO:tensorflow:loss = 59.34932, step = 0 INFO:tensorflow:loss = 59.34932, step = 0 INFO:tensorflow:global_step/sec: 108.435 INFO:tensorflow:global_step/sec: 108.435 INFO:tensorflow:loss = 56.416668, step = 100 (0.924 sec) INFO:tensorflow:loss = 56.416668, step = 100 (0.924 sec) INFO:tensorflow:global_step/sec: 116.367 INFO:tensorflow:global_step/sec: 116.367 INFO:tensorflow:loss = 47.250374, step = 200 (0.859 sec) INFO:tensorflow:loss = 47.250374, step = 200 (0.859 sec) INFO:tensorflow:global_step/sec: 116.333 INFO:tensorflow:global_step/sec: 116.333 INFO:tensorflow:loss = 55.81682, step = 300 (0.860 sec) INFO:tensorflow:loss = 55.81682, step = 300 (0.860 sec) INFO:tensorflow:global_step/sec: 116.844 INFO:tensorflow:global_step/sec: 116.844 INFO:tensorflow:loss = 55.814293, step = 400 (0.856 sec) INFO:tensorflow:loss = 55.814293, step = 400 (0.856 sec) INFO:tensorflow:global_step/sec: 114.434 INFO:tensorflow:global_step/sec: 114.434 INFO:tensorflow:loss = 41.805046, step = 500 (0.874 sec) INFO:tensorflow:loss = 41.805046, step = 500 (0.874 sec) INFO:tensorflow:global_step/sec: 115.693 INFO:tensorflow:global_step/sec: 115.693 INFO:tensorflow:loss = 45.53726, step = 600 (0.864 sec) INFO:tensorflow:loss = 45.53726, step = 600 (0.864 sec) INFO:tensorflow:global_step/sec: 115.772 INFO:tensorflow:global_step/sec: 115.772 INFO:tensorflow:loss = 51.17028, step = 700 (0.864 sec) INFO:tensorflow:loss = 51.17028, step = 700 (0.864 sec) INFO:tensorflow:global_step/sec: 116.131 INFO:tensorflow:global_step/sec: 116.131 INFO:tensorflow:loss = 47.696205, step = 800 (0.861 sec) INFO:tensorflow:loss = 47.696205, step = 800 (0.861 sec) INFO:tensorflow:global_step/sec: 115.609 INFO:tensorflow:global_step/sec: 115.609 INFO:tensorflow:loss = 47.800926, step = 900 (0.865 sec) INFO:tensorflow:loss = 47.800926, step = 900 (0.865 sec) INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 1000... INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 1000... INFO:tensorflow:Saving checkpoints for 1000 into /tmp/train/20210923-205025/model.ckpt. INFO:tensorflow:Saving checkpoints for 1000 into /tmp/train/20210923-205025/model.ckpt. INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 1000... INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 1000... INFO:tensorflow:Loss for final step: 50.67367. INFO:tensorflow:Loss for final step: 50.67367. <tensorflow_estimator.python.estimator.canned.dnn.DNNClassifierV2 at 0x7f113351ebd0>
Analyze the model
Once you have the trained model, analyze it to compute fairness metrics using TFMA and Fairness Indicators. Begin by exporting the model as a SavedModel.
Export SavedModel
def eval_input_receiver_fn():
  serialized_tf_example = tf.compat.v1.placeholder(
      dtype=tf.string, shape=[None], name='input_example_placeholder')

  # This *must* be a dictionary containing a single key 'examples', which
  # points to the input placeholder.
  receiver_tensors = {'examples': serialized_tf_example}

  features = tf.io.parse_example(serialized_tf_example, FEATURE_MAP)
  features['weight'] = tf.ones_like(features[LABEL])

  return tfma.export.EvalInputReceiver(
      features=features,
      receiver_tensors=receiver_tensors,
      labels=features[LABEL])

tfma_export_dir = tfma.export.export_eval_savedmodel(
    estimator=classifier,
    export_dir_base=os.path.join(BASE_DIR, 'tfma_eval_model'),
    eval_input_receiver_fn=eval_input_receiver_fn)
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_model_analysis/eval_saved_model/encoding.py:141: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version. Instructions for updating: This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info. WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_model_analysis/eval_saved_model/encoding.py:141: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version. Instructions for updating: This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info. INFO:tensorflow:Calling model_fn. INFO:tensorflow:Calling model_fn. INFO:tensorflow:Saver not created because there are no variables in the graph to restore 2021-09-23 20:50:39.359797: W tensorflow/core/common_runtime/graph_constructor.cc:1511] Importing a graph with a lower producer version 26 into an existing graph with producer version 808. Shape inference will have run different parts of the graph with different producer versions. INFO:tensorflow:Saver not created because there are no variables in the graph to restore INFO:tensorflow:Done calling model_fn. INFO:tensorflow:Done calling model_fn. INFO:tensorflow:Signatures INCLUDED in export for Classify: None INFO:tensorflow:Signatures INCLUDED in export for Classify: None INFO:tensorflow:Signatures INCLUDED in export for Regress: None INFO:tensorflow:Signatures INCLUDED in export for Regress: None INFO:tensorflow:Signatures INCLUDED in export for Predict: None INFO:tensorflow:Signatures INCLUDED in export for Predict: None INFO:tensorflow:Signatures INCLUDED in export for Train: None INFO:tensorflow:Signatures INCLUDED in export for Train: None INFO:tensorflow:Signatures INCLUDED in export for Eval: ['eval'] INFO:tensorflow:Signatures INCLUDED in export for Eval: ['eval'] WARNING:tensorflow:Export includes no default signature! WARNING:tensorflow:Export includes no default signature! INFO:tensorflow:Restoring parameters from /tmp/train/20210923-205025/model.ckpt-1000 INFO:tensorflow:Restoring parameters from /tmp/train/20210923-205025/model.ckpt-1000 INFO:tensorflow:Assets added to graph. INFO:tensorflow:Assets added to graph. INFO:tensorflow:Assets written to: /tmp/tfma_eval_model/temp-1632430239/assets INFO:tensorflow:Assets written to: /tmp/tfma_eval_model/temp-1632430239/assets INFO:tensorflow:SavedModel written to: /tmp/tfma_eval_model/temp-1632430239/saved_model.pb INFO:tensorflow:SavedModel written to: /tmp/tfma_eval_model/temp-1632430239/saved_model.pb
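The export call writes a timestamped EvalSavedModel under BASE_DIR/tfma_eval_model. If you want to confirm what was written, a small listing such as the one below (an optional check, assuming tfma_export_dir is returned as a bytes path, as Estimator exports usually are) shows the saved_model.pb along with its variables and assets:

# List the contents of the exported EvalSavedModel directory.
export_dir = (tfma_export_dir.decode('utf-8')
              if isinstance(tfma_export_dir, bytes) else tfma_export_dir)
for dirname, _, filenames in tf.io.gfile.walk(export_dir):
  print(dirname, filenames)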
Compute fairness metrics
Select the identity to compute metrics for, and whether to run with confidence intervals, using the dropdown menus in the panel on the right.
Fairness Indicators Computation Options
tfma_eval_result_path = os.path.join(BASE_DIR, 'tfma_eval_result')

slice_selection = 'sexual_orientation'
print(f'Slice selection: {slice_selection}')
compute_confidence_intervals = False
print(f'Compute confidence intervals: {compute_confidence_intervals}')

# Define slices that you want the evaluation to run on.
eval_config_pbtxt = """
    model_specs {
      label_key: "%s"
    }
    metrics_specs {
      metrics {
        class_name: "FairnessIndicators"
        config: '{ "thresholds": [0.1, 0.3, 0.5, 0.7, 0.9] }'
      }
    }
    slicing_specs {}  # overall slice
    slicing_specs {
      feature_keys: ["%s"]
    }
    options {
      compute_confidence_intervals { value: %s }
      disabled_outputs { values: "analysis" }
    }
""" % (LABEL, slice_selection, compute_confidence_intervals)

eval_config = text_format.Parse(eval_config_pbtxt, tfma.EvalConfig())

eval_shared_model = tfma.default_eval_shared_model(
    eval_saved_model_path=tfma_export_dir)
schema = text_format.Parse(
    """
    tensor_representation_group {
      key: ""
      value {
        tensor_representation {
          key: "comment_text"
          value {
            dense_tensor {
              column_name: "comment_text"
              shape {}
            }
          }
        }
      }
    }
    feature {
      name: "comment_text"
      type: BYTES
    }
    feature {
      name: "toxicity"
      type: FLOAT
    }
    feature {
      name: "sexual_orientation"
      type: BYTES
    }
    feature {
      name: "gender"
      type: BYTES
    }
    feature {
      name: "religion"
      type: BYTES
    }
    feature {
      name: "race"
      type: BYTES
    }
    feature {
      name: "disability"
      type: BYTES
    }
    """, schema_pb2.Schema())

tfxio = tf_example_record.TFExampleRecord(
    file_pattern=validate_tf_file,
    schema=schema,
    raw_record_column_name=tfma.ARROW_INPUT_COLUMN)
tensor_adapter_config = tensor_adapter.TensorAdapterConfig(
    arrow_schema=tfxio.ArrowSchema(),
    tensor_representations=tfxio.TensorRepresentations())
with beam.Pipeline() as pipeline:
  (pipeline
   | 'ReadFromTFRecordToArrow' >> tfxio.BeamSource()
   | 'ExtractEvaluateAndWriteResults' >> tfma.ExtractEvaluateAndWriteResults(
       eval_config=eval_config,
       eval_shared_model=eval_shared_model,
       output_path=tfma_eval_result_path,
       tensor_adapter_config=tensor_adapter_config))

eval_result = tfma.load_eval_result(output_path=tfma_eval_result_path)
Slice selection: sexual_orientation Compute confidence intervals: False WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter. WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_model_analysis/eval_saved_model/load.py:169: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version. Instructions for updating: This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0. WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_model_analysis/eval_saved_model/load.py:169: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version. Instructions for updating: This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0. INFO:tensorflow:Restoring parameters from /tmp/tfma_eval_model/1632430239/variables/variables INFO:tensorflow:Restoring parameters from /tmp/tfma_eval_model/1632430239/variables/variables WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_model_analysis/eval_saved_model/graph_ref.py:189: get_tensor_from_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version. Instructions for updating: This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.get_tensor_from_tensor_info or tf.compat.v1.saved_model.get_tensor_from_tensor_info. WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_model_analysis/eval_saved_model/graph_ref.py:189: get_tensor_from_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version. Instructions for updating: This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.get_tensor_from_tensor_info or tf.compat.v1.saved_model.get_tensor_from_tensor_info. WARNING:apache_beam.io.filebasedsink:Deleting 1 existing files in target path matching: WARNING:apache_beam.io.filebasedsink:Deleting 1 existing files in target path matching: WARNING:apache_beam.io.filebasedsink:Deleting 1 existing files in target path matching:
Visualize the data with the What-If Tool
In this section you'll use the What-If Tool's interactive visual interface to explore and manipulate the data at a micro level.
Each point on the scatter plot in the right-hand panel represents one of the examples in the subset loaded into the tool. Click on one of the points to see details about this particular example in the left-hand panel. The comment text, ground-truth toxicity, and applicable identities are shown. At the bottom of this left-hand panel, you can see the inference results from the model you just trained.
Modify the text of the example, then click the Run inference button to see how your changes caused the perceived toxicity prediction to change.
DEFAULT_MAX_EXAMPLES = 1000

# Load 100000 examples in memory. When first rendered,
# What-If Tool should only display 1000 of these due to browser constraints.
def wit_dataset(file, num_examples=100000):
  dataset = tf.data.TFRecordDataset(
      filenames=[file]).take(num_examples)
  return [tf.train.Example.FromString(d.numpy()) for d in dataset]

wit_data = wit_dataset(train_tf_file)
config_builder = WitConfigBuilder(wit_data[:DEFAULT_MAX_EXAMPLES]).set_estimator_and_feature_spec(
    classifier, FEATURE_MAP).set_label_vocab(['non-toxicity', LABEL]).set_target_feature(LABEL)
wit = WitWidget(config_builder)
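If you would like the What-If Tool to focus on a particular identity group, you can filter the in-memory examples before building the config; the sketch below (an optional step, not part of the original notebook) keeps only the comments tagged with a given sexual_orientation value:

# Filter loaded tf.train.Example protos by an identity term.
def filter_by_identity(examples, feature_name, value):
  encoded = value.encode('utf-8')
  return [ex for ex in examples
          if encoded in ex.features.feature[feature_name].bytes_list.value]

gay_or_lesbian_examples = filter_by_identity(
    wit_data, 'sexual_orientation', 'homosexual_gay_or_lesbian')
print(f'{len(gay_or_lesbian_examples)} matching examples loaded')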
Render Fairness Indicators
Render the Fairness Indicators widget with the exported evaluation results.
Below you will see bar charts displaying the performance of each slice of the data on the selected metrics. You can adjust the baseline comparison slice as well as the displayed threshold(s) using the dropdown menus at the top of the visualization.
The Fairness Indicators widget is integrated with the What-If Tool rendered above. If you select one slice of the data in the bar chart, the What-If Tool will update to show examples from the selected slice. When the data loads in the What-If Tool above, try setting Color By to toxicity. This can give you a visual understanding of the toxicity balance of examples by slice.
event_handlers={'slice-selected':
                wit.create_selection_callback(wit_data, DEFAULT_MAX_EXAMPLES)}
widget_view.render_fairness_indicator(eval_result=eval_result,
                                      slicing_column=slice_selection,
                                      event_handlers=event_handlers)
FairnessIndicatorViewer(slicingMetrics=[{'sliceValue': 'Overall', 'slice': 'Overall', 'metrics': {'prediction/…
For this particular dataset and task, systematically higher false positive and false negative rates for certain identities can lead to negative consequences. For example, in a content moderation system, a higher-than-overall false positive rate for a certain group can lead to those voices being silenced. Thus, it is important to regularly evaluate these types of criteria as you develop and improve models, and to utilize tools such as Fairness Indicators, TFDV, and WIT to help illuminate potential problems. Once you've identified fairness issues, you can experiment with new data sources, data balancing, or other techniques to improve performance on the underperforming groups.
See here for more information and guidance on how Fairness Indicators can be used.
Use fairness evaluation results
The eval_result object, rendered above in render_fairness_indicator(), has its own API that you can use to read TFMA results into your programs.
Get evaluated slices and metrics
Use get_slice_names() and get_metric_names() to get the evaluated slices and metrics, respectively.
pp = pprint.PrettyPrinter()
print("Slices:")
pp.pprint(eval_result.get_slice_names())
print("\nMetrics:")
pp.pprint(eval_result.get_metric_names())
Slices: [(), (('sexual_orientation', 'homosexual_gay_or_lesbian'),), (('sexual_orientation', 'heterosexual'),), (('sexual_orientation', 'bisexual'),), (('sexual_orientation', 'other_sexual_orientation'),)] Metrics: ['fairness_indicators_metrics/negative_rate@0.1', 'fairness_indicators_metrics/positive_rate@0.7', 'fairness_indicators_metrics/false_discovery_rate@0.9', 'fairness_indicators_metrics/false_negative_rate@0.3', 'fairness_indicators_metrics/false_omission_rate@0.1', 'accuracy', 'fairness_indicators_metrics/false_discovery_rate@0.7', 'fairness_indicators_metrics/false_negative_rate@0.7', 'label/mean', 'fairness_indicators_metrics/true_positive_rate@0.5', 'fairness_indicators_metrics/false_positive_rate@0.1', 'recall', 'fairness_indicators_metrics/false_omission_rate@0.7', 'fairness_indicators_metrics/false_positive_rate@0.7', 'auc_precision_recall', 'fairness_indicators_metrics/negative_rate@0.7', 'fairness_indicators_metrics/negative_rate@0.3', 'fairness_indicators_metrics/false_discovery_rate@0.3', 'fairness_indicators_metrics/true_negative_rate@0.9', 'fairness_indicators_metrics/false_omission_rate@0.3', 'fairness_indicators_metrics/false_negative_rate@0.1', 'fairness_indicators_metrics/true_negative_rate@0.3', 'fairness_indicators_metrics/true_positive_rate@0.7', 'fairness_indicators_metrics/false_positive_rate@0.3', 'fairness_indicators_metrics/true_positive_rate@0.1', 'fairness_indicators_metrics/true_positive_rate@0.9', 'fairness_indicators_metrics/false_negative_rate@0.9', 'fairness_indicators_metrics/positive_rate@0.5', 'fairness_indicators_metrics/positive_rate@0.9', 'fairness_indicators_metrics/negative_rate@0.9', 'fairness_indicators_metrics/true_negative_rate@0.1', 'fairness_indicators_metrics/false_omission_rate@0.5', 'post_export_metrics/example_count', 'fairness_indicators_metrics/false_omission_rate@0.9', 'fairness_indicators_metrics/negative_rate@0.5', 'fairness_indicators_metrics/false_positive_rate@0.5', 'fairness_indicators_metrics/positive_rate@0.3', 'prediction/mean', 'accuracy_baseline', 'fairness_indicators_metrics/true_negative_rate@0.5', 'fairness_indicators_metrics/false_discovery_rate@0.5', 'fairness_indicators_metrics/false_discovery_rate@0.1', 'precision', 'fairness_indicators_metrics/false_positive_rate@0.9', 'fairness_indicators_metrics/true_positive_rate@0.3', 'auc', 'average_loss', 'fairness_indicators_metrics/positive_rate@0.1', 'fairness_indicators_metrics/false_negative_rate@0.5', 'fairness_indicators_metrics/true_negative_rate@0.7']
Use get_metrics_for_slice() to get the metrics for a particular slice as a dictionary mapping metric names to metric values.
baseline_slice = ()
heterosexual_slice = (('sexual_orientation', 'heterosexual'),)
print("Baseline metric values:")
pp.pprint(eval_result.get_metrics_for_slice(baseline_slice))
print("\nHeterosexual metric values:")
pp.pprint(eval_result.get_metrics_for_slice(heterosexual_slice))
Baseline metric values: {'accuracy': {'doubleValue': 0.7174859642982483}, 'accuracy_baseline': {'doubleValue': 0.9198060631752014}, 'auc': {'doubleValue': 0.796409547328949}, 'auc_precision_recall': {'doubleValue': 0.3000231087207794}, 'average_loss': {'doubleValue': 0.5615971088409424}, 'fairness_indicators_metrics/false_discovery_rate@0.1': {'doubleValue': 0.9139404145348933}, 'fairness_indicators_metrics/false_discovery_rate@0.3': {'doubleValue': 0.8796606156634021}, 'fairness_indicators_metrics/false_discovery_rate@0.5': {'doubleValue': 0.816806708107944}, 'fairness_indicators_metrics/false_discovery_rate@0.7': {'doubleValue': 0.7090802784427505}, 'fairness_indicators_metrics/false_discovery_rate@0.9': {'doubleValue': 0.4814937210839392}, 'fairness_indicators_metrics/false_negative_rate@0.1': {'doubleValue': 0.006079867348348763}, 'fairness_indicators_metrics/false_negative_rate@0.3': {'doubleValue': 0.08696628437197734}, 'fairness_indicators_metrics/false_negative_rate@0.5': {'doubleValue': 0.2705713693519414}, 'fairness_indicators_metrics/false_negative_rate@0.7': {'doubleValue': 0.5445108470360647}, 'fairness_indicators_metrics/false_negative_rate@0.9': {'doubleValue': 0.891598728755009}, 'fairness_indicators_metrics/false_omission_rate@0.1': {'doubleValue': 0.006604499315158452}, 'fairness_indicators_metrics/false_omission_rate@0.3': {'doubleValue': 0.017811407791031682}, 'fairness_indicators_metrics/false_omission_rate@0.5': {'doubleValue': 0.03187681488249431}, 'fairness_indicators_metrics/false_omission_rate@0.7': {'doubleValue': 0.04993640137936933}, 'fairness_indicators_metrics/false_omission_rate@0.9': {'doubleValue': 0.07271999842219298}, 'fairness_indicators_metrics/false_positive_rate@0.1': {'doubleValue': 0.9202700382800194}, 'fairness_indicators_metrics/false_positive_rate@0.3': {'doubleValue': 0.5818879187535954}, 'fairness_indicators_metrics/false_positive_rate@0.5': {'doubleValue': 0.28355525303665063}, 'fairness_indicators_metrics/false_positive_rate@0.7': {'doubleValue': 0.09679333307231039}, 'fairness_indicators_metrics/false_positive_rate@0.9': {'doubleValue': 0.00877639469079322}, 'fairness_indicators_metrics/negative_rate@0.1': {'doubleValue': 0.07382367199944595}, 'fairness_indicators_metrics/negative_rate@0.3': {'doubleValue': 0.39155620195304386}, 'fairness_indicators_metrics/negative_rate@0.5': {'doubleValue': 0.6806884133250225}, 'fairness_indicators_metrics/negative_rate@0.7': {'doubleValue': 0.8744414433132488}, 'fairness_indicators_metrics/negative_rate@0.9': {'doubleValue': 0.9832342960038783}, 'fairness_indicators_metrics/positive_rate@0.1': {'doubleValue': 0.926176328000554}, 'fairness_indicators_metrics/positive_rate@0.3': {'doubleValue': 0.6084437980469561}, 'fairness_indicators_metrics/positive_rate@0.5': {'doubleValue': 0.3193115866749775}, 'fairness_indicators_metrics/positive_rate@0.7': {'doubleValue': 0.12555855668675117}, 'fairness_indicators_metrics/positive_rate@0.9': {'doubleValue': 0.016765703996121616}, 'fairness_indicators_metrics/true_negative_rate@0.1': {'doubleValue': 0.0797299617199806}, 'fairness_indicators_metrics/true_negative_rate@0.3': {'doubleValue': 0.41811208124640464}, 'fairness_indicators_metrics/true_negative_rate@0.5': {'doubleValue': 0.7164447469633494}, 'fairness_indicators_metrics/true_negative_rate@0.7': {'doubleValue': 0.9032066669276896}, 'fairness_indicators_metrics/true_negative_rate@0.9': {'doubleValue': 0.9912236053092068}, 'fairness_indicators_metrics/true_positive_rate@0.1': {'doubleValue': 
0.9939201326516512}, 'fairness_indicators_metrics/true_positive_rate@0.3': {'doubleValue': 0.9130337156280227}, 'fairness_indicators_metrics/true_positive_rate@0.5': {'doubleValue': 0.7294286306480586}, 'fairness_indicators_metrics/true_positive_rate@0.7': {'doubleValue': 0.45548915296393533}, 'fairness_indicators_metrics/true_positive_rate@0.9': {'doubleValue': 0.10840127124499102}, 'label/mean': {'doubleValue': 0.08019392192363739}, 'post_export_metrics/example_count': {'doubleValue': 721950.0}, 'precision': {'doubleValue': 0.18319329619407654}, 'prediction/mean': {'doubleValue': 0.3998037576675415}, 'recall': {'doubleValue': 0.7294286489486694} } Heterosexual metric values: {'accuracy': {'doubleValue': 0.5203251838684082}, 'accuracy_baseline': {'doubleValue': 0.7601625919342041}, 'auc': {'doubleValue': 0.6672822833061218}, 'auc_precision_recall': {'doubleValue': 0.4065391719341278}, 'average_loss': {'doubleValue': 0.8273133039474487}, 'fairness_indicators_metrics/false_discovery_rate@0.1': {'doubleValue': 0.7541666666666667}, 'fairness_indicators_metrics/false_discovery_rate@0.3': {'doubleValue': 0.7272727272727273}, 'fairness_indicators_metrics/false_discovery_rate@0.5': {'doubleValue': 0.7062937062937062}, 'fairness_indicators_metrics/false_discovery_rate@0.7': {'doubleValue': 0.655367231638418}, 'fairness_indicators_metrics/false_discovery_rate@0.9': {'doubleValue': 0.4473684210526316}, 'fairness_indicators_metrics/false_negative_rate@0.1': {'doubleValue': 0.0}, 'fairness_indicators_metrics/false_negative_rate@0.3': {'doubleValue': 0.0847457627118644}, 'fairness_indicators_metrics/false_negative_rate@0.5': {'doubleValue': 0.288135593220339}, 'fairness_indicators_metrics/false_negative_rate@0.7': {'doubleValue': 0.4830508474576271}, 'fairness_indicators_metrics/false_negative_rate@0.9': {'doubleValue': 0.8220338983050848}, 'fairness_indicators_metrics/false_omission_rate@0.1': {'doubleValue': 0.0}, 'fairness_indicators_metrics/false_omission_rate@0.3': {'doubleValue': 0.10416666666666667}, 'fairness_indicators_metrics/false_omission_rate@0.5': {'doubleValue': 0.1650485436893204}, 'fairness_indicators_metrics/false_omission_rate@0.7': {'doubleValue': 0.18095238095238095}, 'fairness_indicators_metrics/false_omission_rate@0.9': {'doubleValue': 0.21365638766519823}, 'fairness_indicators_metrics/false_positive_rate@0.1': {'doubleValue': 0.9679144385026738}, 'fairness_indicators_metrics/false_positive_rate@0.3': {'doubleValue': 0.7700534759358288}, 'fairness_indicators_metrics/false_positive_rate@0.5': {'doubleValue': 0.5401069518716578}, 'fairness_indicators_metrics/false_positive_rate@0.7': {'doubleValue': 0.31016042780748665}, 'fairness_indicators_metrics/false_positive_rate@0.9': {'doubleValue': 0.045454545454545456}, 'fairness_indicators_metrics/negative_rate@0.1': {'doubleValue': 0.024390243902439025}, 'fairness_indicators_metrics/negative_rate@0.3': {'doubleValue': 0.1951219512195122}, 'fairness_indicators_metrics/negative_rate@0.5': {'doubleValue': 0.4186991869918699}, 'fairness_indicators_metrics/negative_rate@0.7': {'doubleValue': 0.6402439024390244}, 'fairness_indicators_metrics/negative_rate@0.9': {'doubleValue': 0.9227642276422764}, 'fairness_indicators_metrics/positive_rate@0.1': {'doubleValue': 0.975609756097561}, 'fairness_indicators_metrics/positive_rate@0.3': {'doubleValue': 0.8048780487804879}, 'fairness_indicators_metrics/positive_rate@0.5': {'doubleValue': 0.5813008130081301}, 'fairness_indicators_metrics/positive_rate@0.7': {'doubleValue': 0.3597560975609756}, 
'fairness_indicators_metrics/positive_rate@0.9': {'doubleValue': 0.07723577235772358}, 'fairness_indicators_metrics/true_negative_rate@0.1': {'doubleValue': 0.03208556149732621}, 'fairness_indicators_metrics/true_negative_rate@0.3': {'doubleValue': 0.22994652406417113}, 'fairness_indicators_metrics/true_negative_rate@0.5': {'doubleValue': 0.45989304812834225}, 'fairness_indicators_metrics/true_negative_rate@0.7': {'doubleValue': 0.6898395721925134}, 'fairness_indicators_metrics/true_negative_rate@0.9': {'doubleValue': 0.9545454545454546}, 'fairness_indicators_metrics/true_positive_rate@0.1': {'doubleValue': 1.0}, 'fairness_indicators_metrics/true_positive_rate@0.3': {'doubleValue': 0.9152542372881356}, 'fairness_indicators_metrics/true_positive_rate@0.5': {'doubleValue': 0.711864406779661}, 'fairness_indicators_metrics/true_positive_rate@0.7': {'doubleValue': 0.5169491525423728}, 'fairness_indicators_metrics/true_positive_rate@0.9': {'doubleValue': 0.17796610169491525}, 'label/mean': {'doubleValue': 0.2398373931646347}, 'post_export_metrics/example_count': {'doubleValue': 492.0}, 'precision': {'doubleValue': 0.2937062978744507}, 'prediction/mean': {'doubleValue': 0.5578703880310059}, 'recall': {'doubleValue': 0.7118644118309021} }
Use get_metrics_for_all_slices() to get the metrics for all slices as a dictionary mapping each slice to the corresponding metrics dictionary you would obtain from running get_metrics_for_slice() on it.
pp.pprint(eval_result.get_metrics_for_all_slices())
{(): {'accuracy': {'doubleValue': 0.7174859642982483}, 'accuracy_baseline': {'doubleValue': 0.9198060631752014}, 'auc': {'doubleValue': 0.796409547328949}, 'auc_precision_recall': {'doubleValue': 0.3000231087207794}, 'average_loss': {'doubleValue': 0.5615971088409424}, 'fairness_indicators_metrics/false_discovery_rate@0.1': {'doubleValue': 0.9139404145348933}, 'fairness_indicators_metrics/false_discovery_rate@0.3': {'doubleValue': 0.8796606156634021}, 'fairness_indicators_metrics/false_discovery_rate@0.5': {'doubleValue': 0.816806708107944}, 'fairness_indicators_metrics/false_discovery_rate@0.7': {'doubleValue': 0.7090802784427505}, 'fairness_indicators_metrics/false_discovery_rate@0.9': {'doubleValue': 0.4814937210839392}, 'fairness_indicators_metrics/false_negative_rate@0.1': {'doubleValue': 0.006079867348348763}, 'fairness_indicators_metrics/false_negative_rate@0.3': {'doubleValue': 0.08696628437197734}, 'fairness_indicators_metrics/false_negative_rate@0.5': {'doubleValue': 0.2705713693519414}, 'fairness_indicators_metrics/false_negative_rate@0.7': {'doubleValue': 0.5445108470360647}, 'fairness_indicators_metrics/false_negative_rate@0.9': {'doubleValue': 0.891598728755009}, 'fairness_indicators_metrics/false_omission_rate@0.1': {'doubleValue': 0.006604499315158452}, 'fairness_indicators_metrics/false_omission_rate@0.3': {'doubleValue': 0.017811407791031682}, 'fairness_indicators_metrics/false_omission_rate@0.5': {'doubleValue': 0.03187681488249431}, 'fairness_indicators_metrics/false_omission_rate@0.7': {'doubleValue': 0.04993640137936933}, 'fairness_indicators_metrics/false_omission_rate@0.9': {'doubleValue': 0.07271999842219298}, 'fairness_indicators_metrics/false_positive_rate@0.1': {'doubleValue': 0.9202700382800194}, 'fairness_indicators_metrics/false_positive_rate@0.3': {'doubleValue': 0.5818879187535954}, 'fairness_indicators_metrics/false_positive_rate@0.5': {'doubleValue': 0.28355525303665063}, 'fairness_indicators_metrics/false_positive_rate@0.7': {'doubleValue': 0.09679333307231039}, 'fairness_indicators_metrics/false_positive_rate@0.9': {'doubleValue': 0.00877639469079322}, 'fairness_indicators_metrics/negative_rate@0.1': {'doubleValue': 0.07382367199944595}, 'fairness_indicators_metrics/negative_rate@0.3': {'doubleValue': 0.39155620195304386}, 'fairness_indicators_metrics/negative_rate@0.5': {'doubleValue': 0.6806884133250225}, 'fairness_indicators_metrics/negative_rate@0.7': {'doubleValue': 0.8744414433132488}, 'fairness_indicators_metrics/negative_rate@0.9': {'doubleValue': 0.9832342960038783}, 'fairness_indicators_metrics/positive_rate@0.1': {'doubleValue': 0.926176328000554}, 'fairness_indicators_metrics/positive_rate@0.3': {'doubleValue': 0.6084437980469561}, 'fairness_indicators_metrics/positive_rate@0.5': {'doubleValue': 0.3193115866749775}, 'fairness_indicators_metrics/positive_rate@0.7': {'doubleValue': 0.12555855668675117}, 'fairness_indicators_metrics/positive_rate@0.9': {'doubleValue': 0.016765703996121616}, 'fairness_indicators_metrics/true_negative_rate@0.1': {'doubleValue': 0.0797299617199806}, 'fairness_indicators_metrics/true_negative_rate@0.3': {'doubleValue': 0.41811208124640464}, 'fairness_indicators_metrics/true_negative_rate@0.5': {'doubleValue': 0.7164447469633494}, 'fairness_indicators_metrics/true_negative_rate@0.7': {'doubleValue': 0.9032066669276896}, 'fairness_indicators_metrics/true_negative_rate@0.9': {'doubleValue': 0.9912236053092068}, 'fairness_indicators_metrics/true_positive_rate@0.1': {'doubleValue': 0.9939201326516512}, 
'fairness_indicators_metrics/true_positive_rate@0.3': {'doubleValue': 0.9130337156280227}, 'fairness_indicators_metrics/true_positive_rate@0.5': {'doubleValue': 0.7294286306480586}, 'fairness_indicators_metrics/true_positive_rate@0.7': {'doubleValue': 0.45548915296393533}, 'fairness_indicators_metrics/true_positive_rate@0.9': {'doubleValue': 0.10840127124499102}, 'label/mean': {'doubleValue': 0.08019392192363739}, 'post_export_metrics/example_count': {'doubleValue': 721950.0}, 'precision': {'doubleValue': 0.18319329619407654}, 'prediction/mean': {'doubleValue': 0.3998037576675415}, 'recall': {'doubleValue': 0.7294286489486694} }, (('sexual_orientation', 'bisexual'),): {'accuracy': {'doubleValue': 0.5258620977401733}, 'accuracy_baseline': {'doubleValue': 0.8017241358757019}, 'auc': {'doubleValue': 0.6252922415733337}, 'auc_precision_recall': {'doubleValue': 0.3546649217605591}, 'average_loss': {'doubleValue': 0.7461641430854797}, 'fairness_indicators_metrics/false_discovery_rate@0.1': {'doubleValue': 0.7870370370370371}, 'fairness_indicators_metrics/false_discovery_rate@0.3': {'doubleValue': 0.7816091954022989}, 'fairness_indicators_metrics/false_discovery_rate@0.5': {'doubleValue': 0.7666666666666667}, 'fairness_indicators_metrics/false_discovery_rate@0.7': {'doubleValue': 0.7037037037037037}, 'fairness_indicators_metrics/false_discovery_rate@0.9': {'doubleValue': 0.0}, 'fairness_indicators_metrics/false_negative_rate@0.1': {'doubleValue': 0.0}, 'fairness_indicators_metrics/false_negative_rate@0.3': {'doubleValue': 0.17391304347826086}, 'fairness_indicators_metrics/false_negative_rate@0.5': {'doubleValue': 0.391304347826087}, 'fairness_indicators_metrics/false_negative_rate@0.7': {'doubleValue': 0.6521739130434783}, 'fairness_indicators_metrics/false_negative_rate@0.9': {'doubleValue': 0.9130434782608695}, 'fairness_indicators_metrics/false_omission_rate@0.1': {'doubleValue': 0.0}, 'fairness_indicators_metrics/false_omission_rate@0.3': {'doubleValue': 0.13793103448275862}, 'fairness_indicators_metrics/false_omission_rate@0.5': {'doubleValue': 0.16071428571428573}, 'fairness_indicators_metrics/false_omission_rate@0.7': {'doubleValue': 0.16853932584269662}, 'fairness_indicators_metrics/false_omission_rate@0.9': {'doubleValue': 0.18421052631578946}, 'fairness_indicators_metrics/false_positive_rate@0.1': {'doubleValue': 0.9139784946236559}, 'fairness_indicators_metrics/false_positive_rate@0.3': {'doubleValue': 0.7311827956989247}, 'fairness_indicators_metrics/false_positive_rate@0.5': {'doubleValue': 0.4946236559139785}, 'fairness_indicators_metrics/false_positive_rate@0.7': {'doubleValue': 0.20430107526881722}, 'fairness_indicators_metrics/false_positive_rate@0.9': {'doubleValue': 0.0}, 'fairness_indicators_metrics/negative_rate@0.1': {'doubleValue': 0.06896551724137931}, 'fairness_indicators_metrics/negative_rate@0.3': {'doubleValue': 0.25}, 'fairness_indicators_metrics/negative_rate@0.5': {'doubleValue': 0.4827586206896552}, 'fairness_indicators_metrics/negative_rate@0.7': {'doubleValue': 0.7672413793103449}, 'fairness_indicators_metrics/negative_rate@0.9': {'doubleValue': 0.9827586206896551}, 'fairness_indicators_metrics/positive_rate@0.1': {'doubleValue': 0.9310344827586207}, 'fairness_indicators_metrics/positive_rate@0.3': {'doubleValue': 0.75}, 'fairness_indicators_metrics/positive_rate@0.5': {'doubleValue': 0.5172413793103449}, 'fairness_indicators_metrics/positive_rate@0.7': {'doubleValue': 0.23275862068965517}, 'fairness_indicators_metrics/positive_rate@0.9': {'doubleValue': 
0.017241379310344827}, 'fairness_indicators_metrics/true_negative_rate@0.1': {'doubleValue': 0.08602150537634409}, 'fairness_indicators_metrics/true_negative_rate@0.3': {'doubleValue': 0.26881720430107525}, 'fairness_indicators_metrics/true_negative_rate@0.5': {'doubleValue': 0.5053763440860215}, 'fairness_indicators_metrics/true_negative_rate@0.7': {'doubleValue': 0.7956989247311828}, 'fairness_indicators_metrics/true_negative_rate@0.9': {'doubleValue': 1.0}, 'fairness_indicators_metrics/true_positive_rate@0.1': {'doubleValue': 1.0}, 'fairness_indicators_metrics/true_positive_rate@0.3': {'doubleValue': 0.8260869565217391}, 'fairness_indicators_metrics/true_positive_rate@0.5': {'doubleValue': 0.6086956521739131}, 'fairness_indicators_metrics/true_positive_rate@0.7': {'doubleValue': 0.34782608695652173}, 'fairness_indicators_metrics/true_positive_rate@0.9': {'doubleValue': 0.08695652173913043}, 'label/mean': {'doubleValue': 0.1982758641242981}, 'post_export_metrics/example_count': {'doubleValue': 116.0}, 'precision': {'doubleValue': 0.23333333432674408}, 'prediction/mean': {'doubleValue': 0.4908219575881958}, 'recall': {'doubleValue': 0.6086956262588501} }, (('sexual_orientation', 'heterosexual'),): {'accuracy': {'doubleValue': 0.5203251838684082}, 'accuracy_baseline': {'doubleValue': 0.7601625919342041}, 'auc': {'doubleValue': 0.6672822833061218}, 'auc_precision_recall': {'doubleValue': 0.4065391719341278}, 'average_loss': {'doubleValue': 0.8273133039474487}, 'fairness_indicators_metrics/false_discovery_rate@0.1': {'doubleValue': 0.7541666666666667}, 'fairness_indicators_metrics/false_discovery_rate@0.3': {'doubleValue': 0.7272727272727273}, 'fairness_indicators_metrics/false_discovery_rate@0.5': {'doubleValue': 0.7062937062937062}, 'fairness_indicators_metrics/false_discovery_rate@0.7': {'doubleValue': 0.655367231638418}, 'fairness_indicators_metrics/false_discovery_rate@0.9': {'doubleValue': 0.4473684210526316}, 'fairness_indicators_metrics/false_negative_rate@0.1': {'doubleValue': 0.0}, 'fairness_indicators_metrics/false_negative_rate@0.3': {'doubleValue': 0.0847457627118644}, 'fairness_indicators_metrics/false_negative_rate@0.5': {'doubleValue': 0.288135593220339}, 'fairness_indicators_metrics/false_negative_rate@0.7': {'doubleValue': 0.4830508474576271}, 'fairness_indicators_metrics/false_negative_rate@0.9': {'doubleValue': 0.8220338983050848}, 'fairness_indicators_metrics/false_omission_rate@0.1': {'doubleValue': 0.0}, 'fairness_indicators_metrics/false_omission_rate@0.3': {'doubleValue': 0.10416666666666667}, 'fairness_indicators_metrics/false_omission_rate@0.5': {'doubleValue': 0.1650485436893204}, 'fairness_indicators_metrics/false_omission_rate@0.7': {'doubleValue': 0.18095238095238095}, 'fairness_indicators_metrics/false_omission_rate@0.9': {'doubleValue': 0.21365638766519823}, 'fairness_indicators_metrics/false_positive_rate@0.1': {'doubleValue': 0.9679144385026738}, 'fairness_indicators_metrics/false_positive_rate@0.3': {'doubleValue': 0.7700534759358288}, 'fairness_indicators_metrics/false_positive_rate@0.5': {'doubleValue': 0.5401069518716578}, 'fairness_indicators_metrics/false_positive_rate@0.7': {'doubleValue': 0.31016042780748665}, 'fairness_indicators_metrics/false_positive_rate@0.9': {'doubleValue': 0.045454545454545456}, 'fairness_indicators_metrics/negative_rate@0.1': {'doubleValue': 0.024390243902439025}, 'fairness_indicators_metrics/negative_rate@0.3': {'doubleValue': 0.1951219512195122}, 'fairness_indicators_metrics/negative_rate@0.5': {'doubleValue': 
0.4186991869918699}, 'fairness_indicators_metrics/negative_rate@0.7': {'doubleValue': 0.6402439024390244}, 'fairness_indicators_metrics/negative_rate@0.9': {'doubleValue': 0.9227642276422764}, 'fairness_indicators_metrics/positive_rate@0.1': {'doubleValue': 0.975609756097561}, 'fairness_indicators_metrics/positive_rate@0.3': {'doubleValue': 0.8048780487804879}, 'fairness_indicators_metrics/positive_rate@0.5': {'doubleValue': 0.5813008130081301}, 'fairness_indicators_metrics/positive_rate@0.7': {'doubleValue': 0.3597560975609756}, 'fairness_indicators_metrics/positive_rate@0.9': {'doubleValue': 0.07723577235772358}, 'fairness_indicators_metrics/true_negative_rate@0.1': {'doubleValue': 0.03208556149732621}, 'fairness_indicators_metrics/true_negative_rate@0.3': {'doubleValue': 0.22994652406417113}, 'fairness_indicators_metrics/true_negative_rate@0.5': {'doubleValue': 0.45989304812834225}, 'fairness_indicators_metrics/true_negative_rate@0.7': {'doubleValue': 0.6898395721925134}, 'fairness_indicators_metrics/true_negative_rate@0.9': {'doubleValue': 0.9545454545454546}, 'fairness_indicators_metrics/true_positive_rate@0.1': {'doubleValue': 1.0}, 'fairness_indicators_metrics/true_positive_rate@0.3': {'doubleValue': 0.9152542372881356}, 'fairness_indicators_metrics/true_positive_rate@0.5': {'doubleValue': 0.711864406779661}, 'fairness_indicators_metrics/true_positive_rate@0.7': {'doubleValue': 0.5169491525423728}, 'fairness_indicators_metrics/true_positive_rate@0.9': {'doubleValue': 0.17796610169491525}, 'label/mean': {'doubleValue': 0.2398373931646347}, 'post_export_metrics/example_count': {'doubleValue': 492.0}, 'precision': {'doubleValue': 0.2937062978744507}, 'prediction/mean': {'doubleValue': 0.5578703880310059}, 'recall': {'doubleValue': 0.7118644118309021} }, (('sexual_orientation', 'homosexual_gay_or_lesbian'),): {'accuracy': {'doubleValue': 0.5851936340332031}, 'accuracy_baseline': {'doubleValue': 0.7182232141494751}, 'auc': {'doubleValue': 0.7057511806488037}, 'auc_precision_recall': {'doubleValue': 0.469566285610199}, 'average_loss': {'doubleValue': 0.7369641661643982}, 'fairness_indicators_metrics/false_discovery_rate@0.1': {'doubleValue': 0.7107050831576481}, 'fairness_indicators_metrics/false_discovery_rate@0.3': {'doubleValue': 0.6717557251908397}, 'fairness_indicators_metrics/false_discovery_rate@0.5': {'doubleValue': 0.6172690763052209}, 'fairness_indicators_metrics/false_discovery_rate@0.7': {'doubleValue': 0.5427319211102994}, 'fairness_indicators_metrics/false_discovery_rate@0.9': {'doubleValue': 0.4092664092664093}, 'fairness_indicators_metrics/false_negative_rate@0.1': {'doubleValue': 0.0016168148746968471}, 'fairness_indicators_metrics/false_negative_rate@0.3': {'doubleValue': 0.06143896523848019}, 'fairness_indicators_metrics/false_negative_rate@0.5': {'doubleValue': 0.22958771220695232}, 'fairness_indicators_metrics/false_negative_rate@0.7': {'doubleValue': 0.4939369442198868}, 'fairness_indicators_metrics/false_negative_rate@0.9': {'doubleValue': 0.8763136620856912}, 'fairness_indicators_metrics/false_omission_rate@0.1': {'doubleValue': 0.01652892561983471}, 'fairness_indicators_metrics/false_omission_rate@0.3': {'doubleValue': 0.08909730363423213}, 'fairness_indicators_metrics/false_omission_rate@0.5': {'doubleValue': 0.14947368421052631}, 'fairness_indicators_metrics/false_omission_rate@0.7': {'doubleValue': 0.20225091029460443}, 'fairness_indicators_metrics/false_omission_rate@0.9': {'doubleValue': 0.2624061970467199}, 
'fairness_indicators_metrics/false_positive_rate@0.1': {'doubleValue': 0.9622581668252458}, 'fairness_indicators_metrics/false_positive_rate@0.3': {'doubleValue': 0.7535680304471931}, 'fairness_indicators_metrics/false_positive_rate@0.5': {'doubleValue': 0.4874722486520774}, 'fairness_indicators_metrics/false_positive_rate@0.7': {'doubleValue': 0.2356485886457342}, 'fairness_indicators_metrics/false_positive_rate@0.9': {'doubleValue': 0.03361877576910879}, 'fairness_indicators_metrics/negative_rate@0.1': {'doubleValue': 0.0275626423690205}, 'fairness_indicators_metrics/negative_rate@0.3': {'doubleValue': 0.19430523917995443}, 'fairness_indicators_metrics/negative_rate@0.5': {'doubleValue': 0.4328018223234624}, 'fairness_indicators_metrics/negative_rate@0.7': {'doubleValue': 0.6881548974943053}, 'fairness_indicators_metrics/negative_rate@0.9': {'doubleValue': 0.941002277904328}, 'fairness_indicators_metrics/positive_rate@0.1': {'doubleValue': 0.9724373576309795}, 'fairness_indicators_metrics/positive_rate@0.3': {'doubleValue': 0.8056947608200455}, 'fairness_indicators_metrics/positive_rate@0.5': {'doubleValue': 0.5671981776765376}, 'fairness_indicators_metrics/positive_rate@0.7': {'doubleValue': 0.31184510250569475}, 'fairness_indicators_metrics/positive_rate@0.9': {'doubleValue': 0.05899772209567198}, 'fairness_indicators_metrics/true_negative_rate@0.1': {'doubleValue': 0.0377418331747542}, 'fairness_indicators_metrics/true_negative_rate@0.3': {'doubleValue': 0.24643196955280686}, 'fairness_indicators_metrics/true_negative_rate@0.5': {'doubleValue': 0.5125277513479226}, 'fairness_indicators_metrics/true_negative_rate@0.7': {'doubleValue': 0.7643514113542658}, 'fairness_indicators_metrics/true_negative_rate@0.9': {'doubleValue': 0.9663812242308912}, 'fairness_indicators_metrics/true_positive_rate@0.1': {'doubleValue': 0.9983831851253031}, 'fairness_indicators_metrics/true_positive_rate@0.3': {'doubleValue': 0.9385610347615198}, 'fairness_indicators_metrics/true_positive_rate@0.5': {'doubleValue': 0.7704122877930477}, 'fairness_indicators_metrics/true_positive_rate@0.7': {'doubleValue': 0.5060630557801131}, 'fairness_indicators_metrics/true_positive_rate@0.9': {'doubleValue': 0.12368633791430882}, 'label/mean': {'doubleValue': 0.2817767560482025}, 'post_export_metrics/example_count': {'doubleValue': 4390.0}, 'precision': {'doubleValue': 0.3827309310436249}, 'prediction/mean': {'doubleValue': 0.5428739786148071}, 'recall': {'doubleValue': 0.770412266254425} }, (('sexual_orientation', 'other_sexual_orientation'),): {'accuracy': {'doubleValue': 0.6000000238418579}, 'accuracy_baseline': {'doubleValue': 0.800000011920929}, 'auc': {'doubleValue': 1.0}, 'auc_precision_recall': {'doubleValue': 1.0}, 'average_loss': {'doubleValue': 0.7521011829376221}, 'fairness_indicators_metrics/false_discovery_rate@0.1': {'doubleValue': 0.8}, 'fairness_indicators_metrics/false_discovery_rate@0.3': {'doubleValue': 0.75}, 'fairness_indicators_metrics/false_discovery_rate@0.5': {'doubleValue': 0.6666666666666666}, 'fairness_indicators_metrics/false_discovery_rate@0.7': {'doubleValue': 0.5}, 'fairness_indicators_metrics/false_discovery_rate@0.9': {'doubleValue': 0.0}, 'fairness_indicators_metrics/false_negative_rate@0.1': {'doubleValue': 0.0}, 'fairness_indicators_metrics/false_negative_rate@0.3': {'doubleValue': 0.0}, 'fairness_indicators_metrics/false_negative_rate@0.5': {'doubleValue': 0.0}, 'fairness_indicators_metrics/false_negative_rate@0.7': {'doubleValue': 0.0}, 
'fairness_indicators_metrics/false_negative_rate@0.9': {'doubleValue': 0.0}, 'fairness_indicators_metrics/false_omission_rate@0.1': {'doubleValue': 'NaN'}, 'fairness_indicators_metrics/false_omission_rate@0.3': {'doubleValue': 0.0}, 'fairness_indicators_metrics/false_omission_rate@0.5': {'doubleValue': 0.0}, 'fairness_indicators_metrics/false_omission_rate@0.7': {'doubleValue': 0.0}, 'fairness_indicators_metrics/false_omission_rate@0.9': {'doubleValue': 0.0}, 'fairness_indicators_metrics/false_positive_rate@0.1': {'doubleValue': 1.0}, 'fairness_indicators_metrics/false_positive_rate@0.3': {'doubleValue': 0.75}, 'fairness_indicators_metrics/false_positive_rate@0.5': {'doubleValue': 0.5}, 'fairness_indicators_metrics/false_positive_rate@0.7': {'doubleValue': 0.25}, 'fairness_indicators_metrics/false_positive_rate@0.9': {'doubleValue': 0.0}, 'fairness_indicators_metrics/negative_rate@0.1': {'doubleValue': 0.0}, 'fairness_indicators_metrics/negative_rate@0.3': {'doubleValue': 0.2}, 'fairness_indicators_metrics/negative_rate@0.5': {'doubleValue': 0.4}, 'fairness_indicators_metrics/negative_rate@0.7': {'doubleValue': 0.6}, 'fairness_indicators_metrics/negative_rate@0.9': {'doubleValue': 0.8}, 'fairness_indicators_metrics/positive_rate@0.1': {'doubleValue': 1.0}, 'fairness_indicators_metrics/positive_rate@0.3': {'doubleValue': 0.8}, 'fairness_indicators_metrics/positive_rate@0.5': {'doubleValue': 0.6}, 'fairness_indicators_metrics/positive_rate@0.7': {'doubleValue': 0.4}, 'fairness_indicators_metrics/positive_rate@0.9': {'doubleValue': 0.2}, 'fairness_indicators_metrics/true_negative_rate@0.1': {'doubleValue': 0.0}, 'fairness_indicators_metrics/true_negative_rate@0.3': {'doubleValue': 0.25}, 'fairness_indicators_metrics/true_negative_rate@0.5': {'doubleValue': 0.5}, 'fairness_indicators_metrics/true_negative_rate@0.7': {'doubleValue': 0.75}, 'fairness_indicators_metrics/true_negative_rate@0.9': {'doubleValue': 1.0}, 'fairness_indicators_metrics/true_positive_rate@0.1': {'doubleValue': 1.0}, 'fairness_indicators_metrics/true_positive_rate@0.3': {'doubleValue': 1.0}, 'fairness_indicators_metrics/true_positive_rate@0.5': {'doubleValue': 1.0}, 'fairness_indicators_metrics/true_positive_rate@0.7': {'doubleValue': 1.0}, 'fairness_indicators_metrics/true_positive_rate@0.9': {'doubleValue': 1.0}, 'label/mean': {'doubleValue': 0.20000000298023224}, 'post_export_metrics/example_count': {'doubleValue': 5.0}, 'precision': {'doubleValue': 0.3333333432674408}, 'prediction/mean': {'doubleValue': 0.6101843118667603}, 'recall': {'doubleValue': 1.0} } }
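For example, a short loop over get_metrics_for_all_slices() (a sketch that reuses the metric names shown above) makes it easy to compare the false positive rate at the 0.5 threshold for every slice against the overall baseline:

# Compare false_positive_rate@0.5 on each slice against the overall slice.
metric_name = 'fairness_indicators_metrics/false_positive_rate@0.5'
all_metrics = eval_result.get_metrics_for_all_slices()
baseline_fpr = all_metrics[()][metric_name]['doubleValue']

for slice_key, metrics in all_metrics.items():
  slice_name = slice_key[0][1] if slice_key else 'Overall'
  fpr = metrics[metric_name]['doubleValue']
  print(f'{slice_name:35s} FPR@0.5 = {fpr:.3f} '
        f'(vs baseline: {fpr - baseline_fpr:+.3f})')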