Better ML Engineering with ML Metadata

Assume a scenario where you set up a production ML pipeline to classify penguins. The pipeline ingests your training data, trains and evaluates a model, and pushes it to production.

However, when you later try using this model with a larger dataset that contains different kinds of penguins, you observe that your model does not behave as expected and starts classifying the species incorrectly.

At this point, you are interested in knowing:

What is the most efficient way to debug the model when the only available artifact is the model in production?
Which training dataset was used to train the model?
Which training run led to this erroneous model?
Where are the model evaluation results?
Where to begin debugging?

ML Metadata (MLMD) is a library that leverages the metadata associated with ML models to help you answer these questions and more. A helpful analogy is to think of this metadata as the equivalent of logging in software development. MLMD enables you to reliably track the artifacts and lineage associated with the various components of your ML pipeline.

In this tutorial, you set up a TFX Pipeline to create a model that classifies penguins into three species based on their body mass, the length and depth of their culmens, and the length of their flippers. You then use MLMD to track the lineage of pipeline components.

TFX Pipelines in Colab

Colab is a lightweight development environment that differs significantly from a production environment. In production, you may have various pipeline components such as data ingestion, transformation, model training, and run histories spread across multiple distributed systems. For this tutorial, be aware that there are significant differences in Orchestration and Metadata storage - both are handled locally within Colab. To learn more, see the documentation on TFX in Colab.

Setup

First, we install and import the necessary packages, set up paths, and download data.

Upgrade Pip

To avoid upgrading Pip on a system when running locally, check to make sure that we're running in Colab. Local systems can of course be upgraded separately.

try:
  import colab
  !pip install --upgrade pip
except ImportError:
  pass
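The cell above relies on an import failing outside Colab. An alternative check, shown here only as a sketch, looks for the `google.colab` module among the modules that are already loaded. It assumes that a Colab kernel loads `google.colab` at startup; `running_in_colab` is a hypothetical helper name, not part of TFX or Colab.

```python
import sys


def running_in_colab() -> bool:
    """Return True when the google.colab module is already loaded."""
    # Assumption: a Colab kernel imports google.colab at startup, so the
    # module appears in sys.modules without importing anything here.
    return "google.colab" in sys.modules


if running_in_colab():
    print("Colab detected: upgrading pip only affects this runtime.")
else:
    print("Not in Colab: leaving the local pip installation untouched.")
```

Because the check never imports anything, it has no side effects on a local system.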

Install and import TFX

pip install -q -U tfx

Import packages

Did you restart the runtime?

If you are using Google Colab, the first time that you run the cell above, you must restart the runtime by clicking the "RESTART RUNTIME" button above or by using the "Runtime > Restart runtime ..." menu. This is because of the way that Colab loads packages.

import os
import tempfile
import urllib.request
import pandas as pd

import tensorflow_model_analysis as tfma
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext

Check the TFX and MLMD versions.

from tfx import v1 as tfx
print('TFX version: {}'.format(tfx.__version__))
import ml_metadata as mlmd
print('MLMD version: {}'.format(mlmd.__version__))

TFX version: 1.4.0
MLMD version: 1.4.0

Download the dataset

In this colab, we use the Palmer Penguins dataset, which can be found on GitHub. The dataset was preprocessed by leaving out any incomplete records, dropping the island and sex columns, and converting the labels to int32. It contains 334 records of the body mass, the culmen length and depth, and the flipper length of penguins. You use this data to classify penguins into one of three species.
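The preprocessing described above can be sketched with the standard library. This is an illustration under stated assumptions, not the script that produced the hosted file: the sample rows, the raw column layout, the `NA` marker for missing values, and the species-to-integer mapping are all hypothetical, and a Python `int` stands in for int32.

```python
import csv
import io

# Hypothetical raw rows for illustration; the real file is much larger.
RAW_CSV = """\
species,island,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_g,sex
Adelie,Torgersen,39.1,18.7,181,3750,MALE
Adelie,Torgersen,NA,NA,NA,NA,NA
Chinstrap,Dream,46.5,17.9,192,3500,FEMALE
Gentoo,Biscoe,46.1,13.2,211,4500,FEMALE
"""

# Assumed label encoding; the text only states that labels become int32.
SPECIES_TO_ID = {"Adelie": 0, "Chinstrap": 1, "Gentoo": 2}
DROPPED_COLUMNS = ("island", "sex")
MISSING = ("", "NA")


def preprocess(raw_text):
    """Drop incomplete records and unused columns, and encode the label."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_text)):
        # Leave out any record that has a missing value.
        if any(value in MISSING for value in record.values()):
            continue
        row = {"species": SPECIES_TO_ID[record["species"]]}
        for column, value in record.items():
            if column != "species" and column not in DROPPED_COLUMNS:
                row[column] = float(value)
        rows.append(row)
    return rows


rows = preprocess(RAW_CSV)
print(len(rows), rows[0])
```

The second sample row is dropped because of its missing measurements, so three of the four records survive.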

DATA_PATH = 'https://raw.githubusercontent.com/tensorflow/tfx/master/tfx/examples/penguin/data/labelled/penguins_processed.csv'
_data_root = tempfile.mkdtemp(prefix='tfx-data')
_data_filepath = os.path.join(_data_root, "penguins_processed.csv")
urllib.request.urlretrieve(DATA_PATH, _data_filepath)

('/tmp/tfx-datal9104odr/penguins_processed.csv',
 <http.client.HTTPMessage at 0x7f9c6d8d2290>)
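A quick sanity check after the download can catch a truncated or unexpected file early. The sketch below uses only the standard library; `summarize_csv` is a hypothetical helper, and the demonstration runs on a tiny stand-in file rather than on the real download.

```python
import csv
import os
import tempfile


def summarize_csv(path):
    """Return (header, number_of_data_rows) for a CSV file."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        return header, sum(1 for _ in reader)


# Demonstration on a small stand-in file with made-up values.
demo_dir = tempfile.mkdtemp(prefix="tfx-data")
demo_path = os.path.join(demo_dir, "demo.csv")
with open(demo_path, "w", newline="") as f:
    f.write("species,culmen_length_mm\n0,0.25\n2,0.61\n")

header, n_rows = summarize_csv(demo_path)
print(header, n_rows)
```

Calling `summarize_csv(_data_filepath)` on the real file should report the 334 records mentioned above, provided the hosted file still matches that description.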

Create an InteractiveContext

To run TFX components interactively in this notebook, create an InteractiveContext. The InteractiveContext uses a temporary directory with an ephemeral MLMD database instance. Note that calls to InteractiveContext are no-ops outside the Colab environment.

In general, it is a good practice to group similar pipeline runs under a Context.

interactive_context = InteractiveContext()

WARNING:absl:InteractiveContext pipeline_root argument not provided: using temporary directory /tmp/tfx-interactive-2021-12-05T11_15_56.285625-5hcexlo8 as root for pipeline outputs.
WARNING:absl:InteractiveContext metadata_connection_config not provided: using SQLite ML Metadata database at /tmp/tfx-interactive-2021-12-05T11_15_56.285625-5hcexlo8/metadata.sqlite.

Construct the TFX Pipeline

A TFX pipeline consists of several components that perform different aspects of the ML workflow. In this notebook, you create and run the ExampleGen, StatisticsGen, SchemaGen, and Trainer components, and use the Evaluator and Pusher components to evaluate and push the trained model.

Refer to the components tutorial for more information on TFX pipeline components.

Instantiate and run the ExampleGen Component

example_gen = tfx.components.CsvExampleGen(input_base=_data_root)
interactive_context.run(example_gen)

WARNING:apache_beam.runners.interactive.interactive_environment:Dependencies required for Interactive Beam PCollection visualization are not available, please use: `pip install apache-beam[interactive]` to install necessary dependencies to enable all data visualization features.
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.
WARNING:apache_beam.io.tfrecordio:Couldn't find python-snappy so the implementation of _TFRecordUtil._masked_crc32c is not as fast as it could be.

Instantiate and run the StatisticsGen Component

statistics_gen = tfx.components.StatisticsGen(
    examples=example_gen.outputs['examples'])
interactive_context.run(statistics_gen)

WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.

Instantiate and run the SchemaGen Component

infer_schema = tfx.components.SchemaGen(
    statistics=statistics_gen.outputs['statistics'], infer_feature_shape=True)
interactive_context.run(infer_schema)

WARNING: Logging before InitGoogleLogging() is written to STDERR
I1205 11:16:00.941947  6108 rdbms_metadata_access_object.cc:686] No property is defined for the Type

Instantiate and run the Trainer Component

# Define the module file for the Trainer component
trainer_module_file = 'penguin_trainer.py'

%%writefile {trainer_module_file}

# Define the training algorithm for the Trainer module file
import os
from typing import List, Text

import tensorflow as tf
from tensorflow import keras

from tfx import v1 as tfx
from tfx_bsl.public import tfxio

from tensorflow_metadata.proto.v0 import schema_pb2

# Features used for classification - culmen length and depth, flipper length,
# body mass, and species.

_LABEL_KEY = 'species'

_FEATURE_KEYS = [
    'culmen_length_mm', 'culmen_depth_mm', 'flipper_length_mm', 'body_mass_g'
]


def _input_fn(file_pattern: List[Text],
              data_accessor: tfx.components.DataAccessor,
              schema: schema_pb2.Schema, batch_size: int) -> tf.data.Dataset:
  return data_accessor.tf_dataset_factory(
      file_pattern,
      tfxio.TensorFlowDatasetOptions(
          batch_size=batch_size, label_key=_LABEL_KEY), schema).repeat()


def _build_keras_model():
  inputs = [keras.layers.Input(shape=(1,), name=f) for f in _FEATURE_KEYS]
  d = keras.layers.concatenate(inputs)
  d = keras.layers.Dense(8, activation='relu')(d)
  d = keras.layers.Dense(8, activation='relu')(d)
  outputs = keras.layers.Dense(3)(d)
  model = keras.Model(inputs=inputs, outputs=outputs)
  model.compile(
      optimizer=keras.optimizers.Adam(1e-2),
      loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
      metrics=[keras.metrics.SparseCategoricalAccuracy()])
  return model


def run_fn(fn_args: tfx.components.FnArgs):
  schema = schema_pb2.Schema()
  tfx.utils.parse_pbtxt_file(fn_args.schema_path, schema)
  train_dataset = _input_fn(
      fn_args.train_files, fn_args.data_accessor, schema, batch_size=10)
  eval_dataset = _input_fn(
      fn_args.eval_files, fn_args.data_accessor, schema, batch_size=10)
  model = _build_keras_model()
  model.fit(
      train_dataset,
      epochs=int(fn_args.train_steps / 20),
      steps_per_epoch=20,
      validation_data=eval_dataset,
      validation_steps=fn_args.eval_steps)
  model.save(fn_args.serving_model_dir, save_format='tf')

Writing penguin_trainer.py

Run the Trainer component.

trainer = tfx.components.Trainer(
    module_file=os.path.abspath(trainer_module_file),
    examples=example_gen.outputs['examples'],
    schema=infer_schema.outputs['schema'],
    train_args=tfx.proto.TrainArgs(num_steps=100),
    eval_args=tfx.proto.EvalArgs(num_steps=50))
interactive_context.run(trainer)

running bdist_wheel
running build
running build_py
creating build
creating build/lib
copying penguin_trainer.py -> build/lib
installing to /tmp/tmpum1crtxy
running install
running install_lib
copying build/lib/penguin_trainer.py -> /tmp/tmpum1crtxy
running install_egg_info
running egg_info
creating tfx_user_code_Trainer.egg-info
writing tfx_user_code_Trainer.egg-info/PKG-INFO
writing dependency_links to tfx_user_code_Trainer.egg-info/dependency_links.txt
writing top-level names to tfx_user_code_Trainer.egg-info/top_level.txt
writing manifest file 'tfx_user_code_Trainer.egg-info/SOURCES.txt'
reading manifest file 'tfx_user_code_Trainer.egg-info/SOURCES.txt'
writing manifest file 'tfx_user_code_Trainer.egg-info/SOURCES.txt'
Copying tfx_user_code_Trainer.egg-info to /tmp/tmpum1crtxy/tfx_user_code_Trainer-0.0+fef7c4ed90dc336ca26daee59d65660cf8da5fa988b2ca0c89df2f558fda10f4-py3.7.egg-info
running install_scripts
creating /tmp/tmpum1crtxy/tfx_user_code_Trainer-0.0+fef7c4ed90dc336ca26daee59d65660cf8da5fa988b2ca0c89df2f558fda10f4.dist-info/WHEEL
creating '/tmp/tmpo87nn6ey/tfx_user_code_Trainer-0.0+fef7c4ed90dc336ca26daee59d65660cf8da5fa988b2ca0c89df2f558fda10f4-py3-none-any.whl' and adding '/tmp/tmpum1crtxy' to it
adding 'penguin_trainer.py'
adding 'tfx_user_code_Trainer-0.0+fef7c4ed90dc336ca26daee59d65660cf8da5fa988b2ca0c89df2f558fda10f4.dist-info/METADATA'
adding 'tfx_user_code_Trainer-0.0+fef7c4ed90dc336ca26daee59d65660cf8da5fa988b2ca0c89df2f558fda10f4.dist-info/WHEEL'
adding 'tfx_user_code_Trainer-0.0+fef7c4ed90dc336ca26daee59d65660cf8da5fa988b2ca0c89df2f558fda10f4.dist-info/top_level.txt'
adding 'tfx_user_code_Trainer-0.0+fef7c4ed90dc336ca26daee59d65660cf8da5fa988b2ca0c89df2f558fda10f4.dist-info/RECORD'
removing /tmp/tmpum1crtxy
/tmpfs/src/tf_docs_env/lib/python3.7/site-packages/setuptools/command/install.py:37: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  setuptools.SetuptoolsDeprecationWarning,
listing git files failed - pretending there aren't any
I1205 11:16:01.389324  6108 rdbms_metadata_access_object.cc:686] No property is defined for the Type
I1205 11:16:01.392832  6108 rdbms_metadata_access_object.cc:686] No property is defined for the Type
Processing /tmp/tfx-interactive-2021-12-05T11_15_56.285625-5hcexlo8/_wheels/tfx_user_code_Trainer-0.0+fef7c4ed90dc336ca26daee59d65660cf8da5fa988b2ca0c89df2f558fda10f4-py3-none-any.whl
Installing collected packages: tfx-user-code-Trainer
Successfully installed tfx-user-code-Trainer-0.0+fef7c4ed90dc336ca26daee59d65660cf8da5fa988b2ca0c89df2f558fda10f4
Epoch 1/5
20/20 [==============================] - 1s 11ms/step - loss: 0.9891 - sparse_categorical_accuracy: 0.4300 - val_loss: 0.9594 - val_sparse_categorical_accuracy: 0.4800
Epoch 2/5
20/20 [==============================] - 0s 6ms/step - loss: 0.8369 - sparse_categorical_accuracy: 0.6350 - val_loss: 0.7484 - val_sparse_categorical_accuracy: 0.8200
Epoch 3/5
20/20 [==============================] - 0s 6ms/step - loss: 0.5289 - sparse_categorical_accuracy: 0.8350 - val_loss: 0.5068 - val_sparse_categorical_accuracy: 0.7800
Epoch 4/5
20/20 [==============================] - 0s 6ms/step - loss: 0.4481 - sparse_categorical_accuracy: 0.7800 - val_loss: 0.4125 - val_sparse_categorical_accuracy: 0.8600
Epoch 5/5
20/20 [==============================] - 0s 6ms/step - loss: 0.3068 - sparse_categorical_accuracy: 0.8650 - val_loss: 0.3279 - val_sparse_categorical_accuracy: 0.8300
2021-12-05 11:16:06.493168: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
INFO:tensorflow:Assets written to: /tmp/tfx-interactive-2021-12-05T11_15_56.285625-5hcexlo8/Trainer/model/4/Format-Serving/assets
INFO:tensorflow:Assets written to: /tmp/tfx-interactive-2021-12-05T11_15_56.285625-5hcexlo8/Trainer/model/4/Format-Serving/assets
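The five epochs in the log above follow directly from the Trainer configuration. Because `_input_fn` ends with `.repeat()`, the dataset never runs out, so `run_fn` tells Keras how long an epoch is through `steps_per_epoch` and derives the number of epochs from `train_steps`. The arithmetic, using only values that appear in the code above:

```python
# Values taken from the Trainer configuration and run_fn above.
train_steps = 100      # tfx.proto.TrainArgs(num_steps=100)
eval_steps = 50        # tfx.proto.EvalArgs(num_steps=50)
steps_per_epoch = 20   # hard-coded in run_fn
batch_size = 10        # hard-coded in run_fn

epochs = int(train_steps / steps_per_epoch)
train_examples_per_epoch = steps_per_epoch * batch_size
eval_examples_per_epoch = eval_steps * batch_size

print(epochs, train_examples_per_epoch, eval_examples_per_epoch)
```

This gives 5 epochs, each drawing 200 training examples and validating on 500 examples from the repeated evaluation split.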

Evaluate and push the model

Use the Evaluator component to evaluate and 'bless' the model before using the Pusher component to push the model to a serving directory.

_serving_model_dir = os.path.join(tempfile.mkdtemp(),
                                  'serving_model/penguins_classification')

eval_config = tfma.EvalConfig(
    model_specs=[
        tfma.ModelSpec(label_key='species', signature_name='serving_default')
    ],
    metrics_specs=[
        tfma.MetricsSpec(metrics=[
            tfma.MetricConfig(
                class_name='SparseCategoricalAccuracy',
                threshold=tfma.MetricThreshold(
                    value_threshold=tfma.GenericValueThreshold(
                        lower_bound={'value': 0.6})))
        ])
    ],
    slicing_specs=[tfma.SlicingSpec()])

evaluator = tfx.components.Evaluator(
    examples=example_gen.outputs['examples'],
    model=trainer.outputs['model'],
    schema=infer_schema.outputs['schema'],
    eval_config=eval_config)
interactive_context.run(evaluator)

I1205 11:16:07.075275  6108 rdbms_metadata_access_object.cc:686] No property is defined for the Type
I1205 11:16:07.078761  6108 rdbms_metadata_access_object.cc:686] No property is defined for the Type
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_model_analysis/writers/metrics_plots_and_validations_writer.py:114: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow_model_analysis/writers/metrics_plots_and_validations_writer.py:114: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`
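The `value_threshold` in `eval_config` above sets the bar that the Evaluator applies before it blesses the model, and the Pusher only pushes a blessed model. Conceptually this is a simple gate on the metric value. The sketch below is plain Python, not the TFMA implementation: `is_blessed` is a hypothetical helper, treating the bound as inclusive is an assumption, and the sample accuracies are borrowed from the first and last epochs of the training log purely as example inputs (the Evaluator computes its own metric).

```python
def is_blessed(metric_value: float, lower_bound: float = 0.6) -> bool:
    """Hypothetical stand-in for a value-threshold check (not the TFMA API)."""
    # Assumption: a metric exactly at the bound passes.
    return metric_value >= lower_bound


# Example inputs only; the Evaluator computes its own accuracy.
for accuracy in (0.48, 0.83):
    status = "blessed" if is_blessed(accuracy) else "not blessed"
    print(accuracy, status)
```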

pusher = tfx.components.Pusher(
    model=trainer.outputs['model'],
    model_blessing=evaluator.outputs['blessing'],
    push_destination=tfx.proto.PushDestination(
        filesystem=tfx.proto.PushDestination.Filesystem(
            base_directory=_serving_model_dir)))
interactive_context.run(pusher)

I1205 11:16:11.935312  6108 rdbms_metadata_access_object.cc:686] No property is defined for the Type

Running the TFX pipeline populates the MLMD Database. In the next section, you use the MLMD API to query this database for metadata information.

Query the MLMD Database

The MLMD database stores three types of metadata:

Metadata about the pipeline and the lineage information associated with the pipeline components
Metadata about artifacts that were generated during the pipeline run
Metadata about the executions of the pipeline

A typical production pipeline serves multiple models as new data arrives. When you encounter erroneous results in served models, you can query the MLMD database to isolate the erroneous models. You can then trace the lineage of the pipeline components that correspond to these models to debug them.
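To make the three kinds of metadata concrete, here is a toy in-memory model of them. It is only a sketch: the class names mirror MLMD concepts, but the classes, fields, and ids are simplified stand-ins invented for illustration, and the graph covers just a three-step slice of the pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    id: int
    type_name: str


@dataclass(frozen=True)
class Execution:
    id: int
    type_name: str


@dataclass(frozen=True)
class Event:
    artifact_id: int
    execution_id: int
    kind: str  # "INPUT" or "OUTPUT"


# A simplified three-step slice of the pipeline, with made-up ids.
ARTIFACTS = [Artifact(1, "Examples"), Artifact(2, "Model"),
             Artifact(3, "PushedModel")]
EXECUTIONS = [Execution(1, "CsvExampleGen"), Execution(2, "Trainer"),
              Execution(3, "Pusher")]
EVENTS = [
    Event(1, 1, "OUTPUT"),  # CsvExampleGen produced Examples
    Event(1, 2, "INPUT"),   # Trainer consumed Examples
    Event(2, 2, "OUTPUT"),  # Trainer produced Model
    Event(2, 3, "INPUT"),   # Pusher consumed Model
    Event(3, 3, "OUTPUT"),  # Pusher produced PushedModel
]


def describe(execution):
    """Return (inputs, outputs) of an execution as lists of type names."""
    names = {a.id: a.type_name for a in ARTIFACTS}
    linked = [e for e in EVENTS if e.execution_id == execution.id]
    inputs = [names[e.artifact_id] for e in linked if e.kind == "INPUT"]
    outputs = [names[e.artifact_id] for e in linked if e.kind == "OUTPUT"]
    return inputs, outputs


for execution in EXECUTIONS:
    inputs, outputs = describe(execution)
    print(f"{execution.type_name}: {inputs} -> {outputs}")
```

Artifacts and executions are the nodes, and the events that link them carry the lineage: following them backwards from a pushed model leads to the data it was trained on.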

Set up the metadata (MD) store with the InteractiveContext defined previously to query the MLMD database.

connection_config = interactive_context.metadata_connection_config
store = mlmd.MetadataStore(connection_config)

# All TFX artifacts are stored in the base directory
base_dir = connection_config.sqlite.filename_uri.split('metadata.sqlite')[0]
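The `split` call above strips the database file name from the SQLite URI, leaving the directory under which this run stored its artifacts. The sketch below replays that step on the paths printed in the log output earlier in this tutorial, and shows how a helper can later shorten artifact URIs with it.

```python
# Paths copied from the log output earlier in this tutorial.
filename_uri = ('/tmp/tfx-interactive-2021-12-05T11_15_56.285625-5hcexlo8/'
                'metadata.sqlite')
model_uri = ('/tmp/tfx-interactive-2021-12-05T11_15_56.285625-5hcexlo8/'
             'Trainer/model/4')

# Everything before the database file name is the pipeline root.
base_dir = filename_uri.split('metadata.sqlite')[0]

# Replacing the root with './' yields a short, readable relative URI.
relative_uri = model_uri.replace(base_dir, './')

print(base_dir)
print(relative_uri)
```

The temporary directory name changes on every run, so the exact strings will differ in your own session.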

Create some helper functions to view the data from the MD store.

def display_types(types):
  # Helper function to render dataframes for the artifact and execution types
  table = {'id': [], 'name': []}
  for a_type in types:
    table['id'].append(a_type.id)
    table['name'].append(a_type.name)
  return pd.DataFrame(data=table)

def display_artifacts(store, artifacts):
  # Helper function to render dataframes for the input artifacts
  table = {'artifact id': [], 'type': [], 'uri': []}
  for a in artifacts:
    table['artifact id'].append(a.id)
    artifact_type = store.get_artifact_types_by_id([a.type_id])[0]
    table['type'].append(artifact_type.name)
    table['uri'].append(a.uri.replace(base_dir, './'))
  return pd.DataFrame(data=table)

def display_properties(store, node):
  # Helper function to render dataframes for artifact and execution properties
  table = {'property': [], 'value': []}
  for k, v in node.properties.items():
    table['property'].append(k)
    table['value'].append(
        v.string_value if v.HasField('string_value') else v.int_value)
  for k, v in node.custom_properties.items():
    table['property'].append(k)
    table['value'].append(
        v.string_value if v.HasField('string_value') else v.int_value)
  return pd.DataFrame(data=table)

First, query the MD store for a list of all its stored ArtifactTypes.

display_types(store.get_artifact_types())

Next, query all PushedModel artifacts.

pushed_models = store.get_artifacts_by_type("PushedModel")
display_artifacts(store, pushed_models)

Query the MD store for the latest pushed model. This tutorial has only one pushed model.

pushed_model = pushed_models[-1]
display_properties(store, pushed_model)
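Indexing with `[-1]` assumes the store returns artifacts oldest first. With a single pushed model that makes no difference, but with several it may be safer to choose explicitly. The sketch below selects by creation time, assuming each artifact exposes `create_time_since_epoch` as the MLMD Artifact proto does; the objects and their values are hypothetical stand-ins.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for MLMD artifacts; only the fields used here.
candidate_models = [
    SimpleNamespace(id=12, create_time_since_epoch=1638702971000),
    SimpleNamespace(id=7, create_time_since_epoch=1638616571000),
]

# Pick the most recently created artifact regardless of list order.
latest = max(candidate_models, key=lambda a: a.create_time_since_epoch)
print(latest.id)
```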

One of the first steps in debugging a pushed model is to look at which trained model was pushed and which training data was used to train that model.

MLMD provides traversal APIs to walk through the provenance graph, which you can use to analyze the model provenance.

def get_one_hop_parent_artifacts(store, artifacts):
  # Get a list of artifacts within a 1-hop of the artifacts of interest
  artifact_ids = [artifact.id for artifact in artifacts]
  executions_ids = set(
      event.execution_id
      for event in store.get_events_by_artifact_ids(artifact_ids)
      if event.type == mlmd.proto.Event.OUTPUT)
  artifacts_ids = set(
      event.artifact_id
      for event in store.get_events_by_execution_ids(executions_ids)
      if event.type == mlmd.proto.Event.INPUT)
  return [artifact for artifact in store.get_artifacts_by_id(artifacts_ids)]
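The function above takes one step upstream in two moves: it finds the executions that produced the given artifacts (OUTPUT events), then collects everything those executions consumed (INPUT events). The sketch below replays the same logic on a toy event log so the result can be checked by hand. The tuples, ids, and the `one_hop_parents` name are hypothetical and do not use the MLMD API.

```python
# Toy event log: (artifact_id, execution_id, event_type), made-up ids.
# Artifacts: 1 Examples, 2 ExampleStatistics, 3 Schema, 4 Model,
#            5 ModelBlessing, 6 PushedModel
# Executions: 1 CsvExampleGen, 2 StatisticsGen, 3 SchemaGen, 4 Trainer,
#             5 Evaluator, 6 Pusher
EVENTS = [
    (1, 1, "OUTPUT"),
    (1, 2, "INPUT"), (2, 2, "OUTPUT"),
    (2, 3, "INPUT"), (3, 3, "OUTPUT"),
    (1, 4, "INPUT"), (3, 4, "INPUT"), (4, 4, "OUTPUT"),
    (1, 5, "INPUT"), (4, 5, "INPUT"), (5, 5, "OUTPUT"),
    (4, 6, "INPUT"), (5, 6, "INPUT"), (6, 6, "OUTPUT"),
]


def one_hop_parents(events, artifact_ids):
    """Return ids of artifacts consumed by the producers of artifact_ids."""
    # Move 1: executions that produced any of the artifacts of interest.
    producers = {execution for artifact, execution, kind in events
                 if artifact in artifact_ids and kind == "OUTPUT"}
    # Move 2: artifacts that those executions took as input.
    return {artifact for artifact, execution, kind in events
            if execution in producers and kind == "INPUT"}


print(sorted(one_hop_parents(EVENTS, {6})))  # parents of the pushed model
print(sorted(one_hop_parents(EVENTS, {4})))  # parents of the trained model
```

In this toy graph the pushed model's parents are the trained model and its blessing, and the trained model's parents are the examples and the schema.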

Query the parent artifacts for the pushed model.

parent_artifacts = get_one_hop_parent_artifacts(store, [pushed_model])
display_artifacts(store, parent_artifacts)

Query the properties for the model.

exported_model = parent_artifacts[0]
display_properties(store, exported_model)

Query the upstream artifacts for the model.

model_parents = get_one_hop_parent_artifacts(store, [exported_model])
display_artifacts(store, model_parents)

Get the training data the model was trained with.

used_data = model_parents[0]
display_properties(store, used_data)
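The steps above walk upstream one hop at a time by hand. Repeating the hop until no new parents appear collects the full ancestry of an artifact. The sketch below shows that loop over a hypothetical lookup table; `PARENTS` stands in for calls to a one-hop function such as `get_one_hop_parent_artifacts`, and all ids are made up.

```python
# Hypothetical one-hop results: artifact id -> ids of its direct parents.
# 1 Examples, 2 ExampleStatistics, 3 Schema, 4 Model, 5 ModelBlessing,
# 6 PushedModel
PARENTS = {
    6: {4, 5},
    5: {1, 4},
    4: {1, 3},
    3: {2},
    2: {1},
    1: set(),
}


def all_ancestors(parents, artifact_id):
    """Collect every upstream artifact id by repeating one-hop lookups."""
    seen = set()
    frontier = {artifact_id}
    while frontier:
        next_frontier = set()
        for current in frontier:
            next_frontier |= parents.get(current, set())
        # Only expand artifacts that have not been visited yet.
        frontier = next_frontier - seen
        seen |= frontier
    return seen


print(sorted(all_ancestors(PARENTS, 6)))
print(sorted(all_ancestors(PARENTS, 3)))
```

Tracking visited ids keeps the walk from revisiting shared inputs such as the examples, which feed several components.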

Now that you have the training data that the model was trained with, query the database again to find the training step (execution). Query the MD store for a list of the registered execution types.

display_types(store.get_execution_types())

The training step is the ExecutionType named tfx.components.trainer.component.Trainer. Traverse the MD store to get the trainer run that corresponds to the pushed model.

def find_producer_execution(store, artifact):
  executions_ids = set(
      event.execution_id
      for event in store.get_events_by_artifact_ids([artifact.id])
      if event.type == mlmd.proto.Event.OUTPUT)
  return store.get_executions_by_id(executions_ids)[0]

# Use a new name so the Trainer component defined earlier is not overwritten.
trainer_execution = find_producer_execution(store, exported_model)
display_properties(store, trainer_execution)

Summary

In this tutorial, you learned how to leverage MLMD to trace the lineage of your TFX pipeline components and resolve issues.

To learn more about how to use MLMD, check out these additional resources: