Background
This notebook demonstrates how to generate a model card using the Model Card Toolkit with a scikit-learn model in a Jupyter/Colab environment. You can learn more about model cards at https://modelcards.withgoogle.com/about.
Setup
First, we need to install and import the necessary packages.
Upgrade to pip 20.2 and install packages
pip install -q --upgrade pip==20.2
pip install -q -U seaborn scikit-learn model-card-toolkit
Did you restart the runtime?
If you are using Google Colab, you must restart the runtime the first time you run the cell above (Runtime > Restart runtime ...).
Import packages
We import the necessary packages, including scikit-learn.
from datetime import date
from io import BytesIO
from IPython import display
from model_card_toolkit import ModelCardToolkit
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import plot_roc_curve, plot_confusion_matrix
import base64
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import uuid
Load data
This example uses the Breast Cancer Wisconsin (Diagnostic) dataset, which scikit-learn can load with the load_breast_cancer() function.
cancer = load_breast_cancer()
X = pd.DataFrame(cancer.data, columns=cancer.feature_names)
y = pd.Series(cancer.target)
X_train, X_test, y_train, y_test = train_test_split(X, y)
X_train.head()
y_train.head()
28     0
157    1
381    1
436    1
71     1
dtype: int64
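The split above is random, so the row indices printed by y_train.head() change on every run. As a minimal sketch of a repeatable variant, the snippet below fixes random_state and stratifies on the label so both splits keep a similar class balance. The value random_state=0 and the use of stratify are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()
X = pd.DataFrame(cancer.data, columns=cancer.feature_names)
y = pd.Series(cancer.target)  # 0 = malignant, 1 = benign

# Assumption: random_state=0 is an arbitrary seed chosen for this sketch.
# stratify=y keeps the benign/malignant ratio similar in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# Fraction of each class in the train and test sets
train_balance = y_train.value_counts(normalize=True).sort_index()
test_balance = y_test.value_counts(normalize=True).sort_index()
print(train_balance)
print(test_balance)
```

With a fixed seed, the plots and metrics recorded in the model card describe the same rows each time the notebook is rerun.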
Plot data
We will create several plots from the data to include in the model card.
# Utility function that will export a plot to a base-64 encoded string that the model card will accept.
def plot_to_str():
  img = BytesIO()
  plt.savefig(img, format='png')
  return base64.encodebytes(img.getvalue()).decode('utf-8')
# Plot the mean radius feature for both the train and test sets
sns.displot(x=X_train['mean radius'], hue=y_train)
mean_radius_train = plot_to_str()
sns.displot(x=X_test['mean radius'], hue=y_test)
mean_radius_test = plot_to_str()
# Plot the mean texture feature for both the train and test sets
sns.displot(x=X_train['mean texture'], hue=y_train)
mean_texture_train = plot_to_str()
sns.displot(x=X_test['mean texture'], hue=y_test)
mean_texture_test = plot_to_str()
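The plot_to_str helper above returns a base-64 string, so a failed or empty export is easy to miss. As a small sanity check, the sketch below decodes such a string and tests for the standard 8-byte PNG signature. The name looks_like_png is hypothetical and not part of the Model Card Toolkit.

```python
import base64

# Every valid PNG file starts with these eight bytes.
PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def looks_like_png(encoded: str) -> bool:
    """Return True if a base-64 string decodes to bytes that begin with the PNG signature."""
    try:
        # decodebytes mirrors encodebytes and tolerates the embedded newlines.
        raw = base64.decodebytes(encoded.encode('ascii'))
    except ValueError:
        # Covers malformed base-64 (binascii.Error) and non-ASCII input.
        return False
    return raw.startswith(PNG_SIGNATURE)
```

For example, looks_like_png(mean_radius_train) could be checked before the string is attached to the model card.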
Train model
# Create a classifier and fit the training data
clf = GradientBoostingClassifier().fit(X_train, y_train)
Evaluate model
# Plot a ROC curve
plot_roc_curve(clf, X_test, y_test)
roc_curve = plot_to_str()
# Plot a confusion matrix
plot_confusion_matrix(clf, X_test, y_test)
confusion_matrix = plot_to_str()
Create a model card
Initialize toolkit and model card
mct = ModelCardToolkit()
model_card = mct.scaffold_assets()
Annotate information into model card
model_card.model_details.name = 'Breast Cancer Wisconsin (Diagnostic) Dataset'
model_card.model_details.overview = (
    'This model predicts whether breast cancer is benign or malignant based on '
    'image measurements.')
model_card.model_details.owners = [
    {'name': 'Model Cards Team', 'contact': 'model-cards@google.com'}
]
model_card.model_details.references = [
    'https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)',
    'https://minds.wisconsin.edu/bitstream/handle/1793/59692/TR1131.pdf'
]
model_card.model_details.version.name = str(uuid.uuid4())
model_card.model_details.version.date = str(date.today())
model_card.considerations.ethical_considerations = [{
    'name': ('Manual selection of image sections to digitize could create '
             'selection bias'),
    'mitigation_strategy': 'Automate the selection process'
}]
model_card.considerations.limitations = ['Breast cancer diagnosis']
model_card.considerations.use_cases = ['Breast cancer diagnosis']
model_card.considerations.users = ['Medical professionals', 'ML researchers']
model_card.model_parameters.data.train.graphics.description = (
    f'{len(X_train)} rows with {len(X_train.columns)} features')
model_card.model_parameters.data.train.graphics.collection = [
    {'image': mean_radius_train},
    {'image': mean_texture_train}
]
model_card.model_parameters.data.eval.graphics.description = (
    f'{len(X_test)} rows with {len(X_test.columns)} features')
model_card.model_parameters.data.eval.graphics.collection = [
    {'image': mean_radius_test},
    {'image': mean_texture_test}
]
model_card.quantitative_analysis.graphics.description = (
    'ROC curve and confusion matrix')
model_card.quantitative_analysis.graphics.collection = [
    {'image': roc_curve},
    {'image': confusion_matrix}
]
mct.update_model_card_json(model_card)
Generate model card
# Return the model card document as an HTML page
html = mct.export_format()
display.display(display.HTML(html))
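The cell above only renders the HTML inside the notebook. To keep a copy on disk, the string can be written out with the standard library. The sketch below assumes nothing about the toolkit beyond the fact that the model card is available as an HTML string; the helper name save_model_card_html, the output path, and the stand-in HTML are all hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Union


def save_model_card_html(html: str, path: Union[str, Path]) -> Path:
  """Write an HTML string to `path`, creating parent directories if needed."""
  out = Path(path)
  out.parent.mkdir(parents=True, exist_ok=True)
  out.write_text(html, encoding='utf-8')
  return out


# Stand-in for the string returned by mct.export_format() in the notebook.
html = '<html><body><h1>Model Card</h1></body></html>'

# Assumption: a temporary directory is used here only for demonstration.
target = Path(tempfile.gettempdir()) / 'model_card_demo' / 'model_card.html'
saved = save_model_card_html(html, target)
print(saved)
```

In the notebook, the html variable from the previous cell would be passed in place of the stand-in string.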