Solve GLUE tasks using BERT on TPU

BERT can be used to solve many problems in natural language processing. You will learn how to fine-tune BERT for many tasks from the GLUE benchmark:

CoLA (Corpus of Linguistic Acceptability): Is the sentence grammatically correct?
SST-2 (Stanford Sentiment Treebank): The task is to predict the sentiment of a given sentence.
MRPC (Microsoft Research Paraphrase Corpus): Determine whether a pair of sentences is semantically equivalent.
QQP (Quora Question Pairs2): Determine whether a pair of questions is semantically equivalent.
MNLI (Multi-Genre Natural Language Inference): Given a premise sentence and a hypothesis sentence, the task is to predict whether the premise entails the hypothesis (entailment), contradicts the hypothesis (contradiction), or neither (neutral).
QNLI (Question-answering Natural Language Inference): The task is to determine whether the context sentence contains the answer to the question.
RTE (Recognizing Textual Entailment): Determine whether a sentence entails a given hypothesis.
WNLI (Winograd Natural Language Inference): The task is to predict whether the sentence with the pronoun substituted is entailed by the original sentence.

This tutorial contains complete end-to-end code to train these models on a TPU. You can also run this notebook on a GPU by changing one line (described below).

In this notebook, you will:

Load a BERT model from TensorFlow Hub
Choose one of the GLUE tasks and download the dataset
Preprocess the text
Fine-tune BERT (examples are given for single-sentence and multi-sentence datasets)
Save the trained model and use it

Setup

You will use a separate model to preprocess text before using it to fine-tune BERT. This model depends on tensorflow/text, which you will install below.

pip install -q -U tensorflow-text

You will use the AdamW optimizer from tensorflow/models to fine-tune BERT, so you will install that package as well.

pip install -q -U tf-models-official

pip install -U tfds-nightly

import os
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds
import tensorflow_text as text  # A dependency of the preprocessing model
import tensorflow_addons as tfa
from official.nlp import optimization
import numpy as np

tf.get_logger().setLevel('ERROR')

Next, configure TFHub to read checkpoints directly from TFHub's Cloud Storage buckets. This is only recommended when running TFHub models on TPU.

Without this setting, TFHub would download the compressed file and extract the checkpoint locally. Attempting to load from these local files will fail with the following error:

InvalidArgumentError: Unimplemented: File system scheme '[local]' not implemented

This is because the TPU can only read directly from Cloud Storage buckets.

os.environ["TFHUB_MODEL_LOAD_FORMAT"]="UNCOMPRESSED"

Connect to the TPU worker

The following code connects to the TPU worker and changes TensorFlow's default device to the CPU device on the TPU worker. It also defines a TPU distribution strategy that you will use to distribute model training onto the 8 separate TPU cores available on this one TPU worker. See TensorFlow's TPU guide for more information.

import os

# .get() returns None when the variable is unset, so the GPU branch stays reachable.
if os.environ.get('COLAB_TPU_ADDR'):
  cluster_resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')
  tf.config.experimental_connect_to_cluster(cluster_resolver)
  tf.tpu.experimental.initialize_tpu_system(cluster_resolver)
  strategy = tf.distribute.TPUStrategy(cluster_resolver)
  print('Using TPU')
elif tf.config.list_physical_devices('GPU'):
  strategy = tf.distribute.MirroredStrategy()
  print('Using GPU')
else:
  raise ValueError('Running on CPU is not recommended.')

Using TPU
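
The branch order in the cell above (TPU first, then GPU, otherwise an error) can be sketched as a small pure function. This is only an illustration: `choose_accelerator` is a hypothetical helper, and the environment is passed in as a plain dict so the logic can be checked without a Colab runtime. It looks the variable up with `.get`, because indexing a missing key in a mapping raises `KeyError` before any later branch can run.

```python
def choose_accelerator(env, gpu_available):
    """Returns 'TPU' or 'GPU', or raises ValueError when neither is usable.

    env: a mapping such as os.environ (hypothetical parameter name).
    gpu_available: whether at least one GPU device was detected.
    """
    # .get() yields None for a missing key instead of raising KeyError.
    if env.get('COLAB_TPU_ADDR'):
        return 'TPU'
    if gpu_available:
        return 'GPU'
    raise ValueError('Running on CPU is not recommended.')


print(choose_accelerator({'COLAB_TPU_ADDR': '10.0.0.2:8470'}, False))
print(choose_accelerator({}, True))
```

In a real notebook the first argument would be `os.environ` and the second would come from `tf.config.list_physical_devices('GPU')`; the address used here is a made-up placeholder.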

Loading models from TensorFlow Hub

Here you can choose which BERT model you will load from TensorFlow Hub and fine-tune. There are multiple BERT models available to choose from.

BERT-Base, Uncased and seven more models with trained weights released by the original BERT authors.
Small BERTs have the same general architecture but fewer and/or smaller Transformer blocks, which lets you explore tradeoffs between speed, size and quality.
ALBERT: four different sizes of "A Lite BERT" that reduces model size (but not computation time) by sharing parameters between layers.
BERT Experts: eight models that all have the BERT-base architecture but offer a choice between different pre-training domains, to align more closely with the target task.
Electra has the same architecture as BERT (in three different sizes), but gets pre-trained as a discriminator in a set-up that resembles a Generative Adversarial Network (GAN).
BERT with Talking-Heads Attention and Gated GELU [base, large] has two improvements to the core of the Transformer architecture.

See the model documentation linked above for more details.

In this tutorial, you will start with BERT-base. You can use larger and more recent models for higher accuracy, or smaller models for faster training times. To change the model, you only need to switch a single line of code (shown below). All the differences are encapsulated in the SavedModel you will download from TensorFlow Hub.

Choose a BERT model to fine-tune

bert_model_name = 'bert_en_uncased_L-12_H-768_A-12' 

map_name_to_handle = {
    'bert_en_uncased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/3',
    'bert_en_uncased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_uncased_L-24_H-1024_A-16/3',
    'bert_en_wwm_uncased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_wwm_uncased_L-24_H-1024_A-16/3',
    'bert_en_cased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_cased_L-12_H-768_A-12/3',
    'bert_en_cased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_cased_L-24_H-1024_A-16/3',
    'bert_en_wwm_cased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_wwm_cased_L-24_H-1024_A-16/3',
    'bert_multi_cased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_multi_cased_L-12_H-768_A-12/3',
    'small_bert/bert_en_uncased_L-2_H-128_A-2':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-2_H-128_A-2/1',
    'small_bert/bert_en_uncased_L-2_H-256_A-4':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-2_H-256_A-4/1',
    'small_bert/bert_en_uncased_L-2_H-512_A-8':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-2_H-512_A-8/1',
    'small_bert/bert_en_uncased_L-2_H-768_A-12':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-2_H-768_A-12/1',
    'small_bert/bert_en_uncased_L-4_H-128_A-2':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-128_A-2/1',
    'small_bert/bert_en_uncased_L-4_H-256_A-4':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-256_A-4/1',
    'small_bert/bert_en_uncased_L-4_H-512_A-8':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-512_A-8/1',
    'small_bert/bert_en_uncased_L-4_H-768_A-12':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-768_A-12/1',
    'small_bert/bert_en_uncased_L-6_H-128_A-2':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-6_H-128_A-2/1',
    'small_bert/bert_en_uncased_L-6_H-256_A-4':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-6_H-256_A-4/1',
    'small_bert/bert_en_uncased_L-6_H-512_A-8':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-6_H-512_A-8/1',
    'small_bert/bert_en_uncased_L-6_H-768_A-12':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-6_H-768_A-12/1',
    'small_bert/bert_en_uncased_L-8_H-128_A-2':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-8_H-128_A-2/1',
    'small_bert/bert_en_uncased_L-8_H-256_A-4':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-8_H-256_A-4/1',
    'small_bert/bert_en_uncased_L-8_H-512_A-8':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-8_H-512_A-8/1',
    'small_bert/bert_en_uncased_L-8_H-768_A-12':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-8_H-768_A-12/1',
    'small_bert/bert_en_uncased_L-10_H-128_A-2':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-10_H-128_A-2/1',
    'small_bert/bert_en_uncased_L-10_H-256_A-4':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-10_H-256_A-4/1',
    'small_bert/bert_en_uncased_L-10_H-512_A-8':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-10_H-512_A-8/1',
    'small_bert/bert_en_uncased_L-10_H-768_A-12':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-10_H-768_A-12/1',
    'small_bert/bert_en_uncased_L-12_H-128_A-2':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-12_H-128_A-2/1',
    'small_bert/bert_en_uncased_L-12_H-256_A-4':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-12_H-256_A-4/1',
    'small_bert/bert_en_uncased_L-12_H-512_A-8':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-12_H-512_A-8/1',
    'small_bert/bert_en_uncased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-12_H-768_A-12/1',
    'albert_en_base':
        'https://tfhub.dev/tensorflow/albert_en_base/2',
    'albert_en_large':
        'https://tfhub.dev/tensorflow/albert_en_large/2',
    'albert_en_xlarge':
        'https://tfhub.dev/tensorflow/albert_en_xlarge/2',
    'albert_en_xxlarge':
        'https://tfhub.dev/tensorflow/albert_en_xxlarge/2',
    'electra_small':
        'https://tfhub.dev/google/electra_small/2',
    'electra_base':
        'https://tfhub.dev/google/electra_base/2',
    'experts_pubmed':
        'https://tfhub.dev/google/experts/bert/pubmed/2',
    'experts_wiki_books':
        'https://tfhub.dev/google/experts/bert/wiki_books/2',
    'talking-heads_base':
        'https://tfhub.dev/tensorflow/talkheads_ggelu_bert_en_base/1',
    'talking-heads_large':
        'https://tfhub.dev/tensorflow/talkheads_ggelu_bert_en_large/1',
}

map_model_to_preprocess = {
    'bert_en_uncased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'bert_en_uncased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'bert_en_wwm_cased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_cased_preprocess/3',
    'bert_en_cased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_cased_preprocess/3',
    'bert_en_cased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_cased_preprocess/3',
    'bert_en_wwm_uncased_L-24_H-1024_A-16':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-2_H-128_A-2':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-2_H-256_A-4':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-2_H-512_A-8':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-2_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-4_H-128_A-2':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-4_H-256_A-4':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-4_H-512_A-8':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-4_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-6_H-128_A-2':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-6_H-256_A-4':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-6_H-512_A-8':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-6_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-8_H-128_A-2':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-8_H-256_A-4':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-8_H-512_A-8':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-8_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-10_H-128_A-2':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-10_H-256_A-4':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-10_H-512_A-8':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-10_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-12_H-128_A-2':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-12_H-256_A-4':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-12_H-512_A-8':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'small_bert/bert_en_uncased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'bert_multi_cased_L-12_H-768_A-12':
        'https://tfhub.dev/tensorflow/bert_multi_cased_preprocess/3',
    'albert_en_base':
        'https://tfhub.dev/tensorflow/albert_en_preprocess/3',
    'albert_en_large':
        'https://tfhub.dev/tensorflow/albert_en_preprocess/3',
    'albert_en_xlarge':
        'https://tfhub.dev/tensorflow/albert_en_preprocess/3',
    'albert_en_xxlarge':
        'https://tfhub.dev/tensorflow/albert_en_preprocess/3',
    'electra_small':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'electra_base':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'experts_pubmed':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'experts_wiki_books':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'talking-heads_base':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
    'talking-heads_large':
        'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',
}

tfhub_handle_encoder = map_name_to_handle[bert_model_name]
tfhub_handle_preprocess = map_model_to_preprocess[bert_model_name]

print('BERT model selected           :', tfhub_handle_encoder)
print('Preprocessing model auto-selected:', tfhub_handle_preprocess)

BERT model selected           : https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/3
Preprocessing model auto-selected: https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3

Preprocess the text

In the Classify text with BERT colab, the preprocessing model is embedded directly with the BERT encoder.

This tutorial demonstrates how to do preprocessing as part of your input pipeline for training, using Dataset.map, and then merge it into the model that gets exported for inference. That way, both training and inference can work from raw text inputs, although the TPU itself requires numeric inputs.

TPU requirements aside, it can help performance to have preprocessing done asynchronously in an input pipeline (you can learn more in the tf.data performance guide).

This tutorial also demonstrates how to build multi-input models, and how to adjust the sequence length of the inputs to BERT.

Let's demonstrate the preprocessing model.

bert_preprocess = hub.load(tfhub_handle_preprocess)
tok = bert_preprocess.tokenize(tf.constant(['Hello TensorFlow!']))
print(tok)

<tf.RaggedTensor [[[7592], [23435, 12314], [999]]]>

Each preprocessing model also provides a method, .bert_pack_inputs(tensors, seq_length), which takes a list of tokens (like tok above) and a sequence length argument. It packs the inputs into a dictionary of tensors in the format expected by the BERT model.

text_preprocessed = bert_preprocess.bert_pack_inputs([tok, tok], tf.constant(20))

print('Shape Word Ids : ', text_preprocessed['input_word_ids'].shape)
print('Word Ids       : ', text_preprocessed['input_word_ids'][0, :16])
print('Shape Mask     : ', text_preprocessed['input_mask'].shape)
print('Input Mask     : ', text_preprocessed['input_mask'][0, :16])
print('Shape Type Ids : ', text_preprocessed['input_type_ids'].shape)
print('Type Ids       : ', text_preprocessed['input_type_ids'][0, :16])

Shape Word Ids :  (1, 20)
Word Ids       :  tf.Tensor(
[  101  7592 23435 12314   999   102  7592 23435 12314   999   102     0
     0     0     0     0], shape=(16,), dtype=int32)
Shape Mask     :  (1, 20)
Input Mask     :  tf.Tensor([1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0], shape=(16,), dtype=int32)
Shape Type Ids :  (1, 20)
Type Ids       :  tf.Tensor([0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0], shape=(16,), dtype=int32)

Here are some details to pay attention to:

input_mask: the mask allows the model to cleanly differentiate between the content and the padding. The mask has the same shape as input_word_ids, and contains a 1 anywhere input_word_ids is not padding.
input_type_ids has the same shape as input_mask, but inside the non-padded region it contains a 0 or a 1 indicating which sentence the token is a part of.
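
The packing described above can be sketched in plain Python. This is an illustration, not the code inside the preprocessing SavedModel: `pack_inputs` is a hypothetical helper, the ids 101, 102 and 0 for the start token, the separator and padding are read off the printed output above, and the trimming rule (drop tokens from the longest segment) is an assumption made only so the sketch handles over-long inputs.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # ids as seen in the printed output above


def pack_inputs(segments, seq_length):
    """Packs lists of token ids into fixed-length BERT-style inputs."""
    segments = [list(s) for s in segments]
    # One start token plus one separator per segment must always fit.
    budget = seq_length - 1 - len(segments)
    if budget < 0:
        raise ValueError('seq_length is too small for the special tokens')
    # Assumed trimming rule: shorten the longest segment until everything fits.
    while sum(len(s) for s in segments) > budget:
        max(segments, key=len).pop()

    word_ids, type_ids = [CLS_ID], [0]
    for index, segment in enumerate(segments):
        word_ids += segment + [SEP_ID]
        type_ids += [index] * (len(segment) + 1)  # separator shares the segment id

    used = len(word_ids)
    padding = seq_length - used
    return {
        'input_word_ids': word_ids + [PAD_ID] * padding,
        'input_mask': [1] * used + [0] * padding,
        'input_type_ids': type_ids + [0] * padding,
    }


tok = [7592, 23435, 12314, 999]  # flattened tokens for 'Hello TensorFlow!'
packed = pack_inputs([tok, tok], seq_length=20)
print(packed['input_word_ids'][:16])
print(packed['input_mask'][:16])
print(packed['input_type_ids'][:16])
```

With the same tokens and sequence length as the cell above, the first 16 positions of all three lists match the printed tensors: 11 real positions followed by padding, with type id 1 covering the second segment and its separator.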

Next, you will create a preprocessing model that encapsulates all this logic. Your model will take strings as input, and return appropriately formatted objects which can be passed to BERT.

Each BERT model has a specific preprocessing model; make sure to use the one described in the BERT model's documentation.

def make_bert_preprocess_model(sentence_features, seq_length=128):
  """Returns Model mapping string features to BERT inputs.

  Args:
    sentence_features: a list with the names of string-valued features.
    seq_length: an integer that defines the sequence length of BERT inputs.

  Returns:
    A Keras Model that can be called on a list or dict of string Tensors
    (with the order or names, resp., given by sentence_features) and
    returns a dict of tensors for input to BERT.
  """

  input_segments = [
      tf.keras.layers.Input(shape=(), dtype=tf.string, name=ft)
      for ft in sentence_features]

  # Tokenize the text to word pieces.
  bert_preprocess = hub.load(tfhub_handle_preprocess)
  tokenizer = hub.KerasLayer(bert_preprocess.tokenize, name='tokenizer')
  segments = [tokenizer(s) for s in input_segments]

  # Optional: Trim segments in a smart way to fit seq_length.
  # Simple cases (like this example) can skip this step and let
  # the next step apply a default truncation to approximately equal lengths.
  truncated_segments = segments

  # Pack inputs. The details (start/end token ids, dict of output tensors)
  # are model-dependent, so this gets loaded from the SavedModel.
  packer = hub.KerasLayer(bert_preprocess.bert_pack_inputs,
                          arguments=dict(seq_length=seq_length),
                          name='packer')
  model_inputs = packer(truncated_segments)
  return tf.keras.Model(input_segments, model_inputs)

Let's demonstrate the preprocessing model. You will create a test with two sentence inputs (my_input1 and my_input2). The output is what a BERT model would expect as input: input_word_ids, input_mask and input_type_ids.

test_preprocess_model = make_bert_preprocess_model(['my_input1', 'my_input2'])
test_text = [np.array(['some random test sentence']),
             np.array(['another sentence'])]
text_preprocessed = test_preprocess_model(test_text)

print('Keys           : ', list(text_preprocessed.keys()))
print('Shape Word Ids : ', text_preprocessed['input_word_ids'].shape)
print('Word Ids       : ', text_preprocessed['input_word_ids'][0, :16])
print('Shape Mask     : ', text_preprocessed['input_mask'].shape)
print('Input Mask     : ', text_preprocessed['input_mask'][0, :16])
print('Shape Type Ids : ', text_preprocessed['input_type_ids'].shape)
print('Type Ids       : ', text_preprocessed['input_type_ids'][0, :16])

Keys           :  ['input_word_ids', 'input_mask', 'input_type_ids']
Shape Word Ids :  (1, 128)
Word Ids       :  tf.Tensor(
[ 101 2070 6721 3231 6251  102 2178 6251  102    0    0    0    0    0
    0    0], shape=(16,), dtype=int32)
Shape Mask     :  (1, 128)
Input Mask     :  tf.Tensor([1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0], shape=(16,), dtype=int32)
Shape Type Ids :  (1, 128)
Type Ids       :  tf.Tensor([0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0], shape=(16,), dtype=int32)

Let's take a look at the model's structure, paying attention to the two inputs you just defined.

tf.keras.utils.plot_model(test_preprocess_model, show_shapes=True, show_dtype=True)

('You must install pydot (`pip install pydot`) and install graphviz (see instructions at https://graphviz.gitlab.io/download/) ', 'for plot_model/model_to_dot to work.')

To apply the preprocessing to all the inputs from the dataset, you will use the dataset's map function. The result is then cached for performance.

AUTOTUNE = tf.data.AUTOTUNE


def load_dataset_from_tfds(in_memory_ds, info, split, batch_size,
                           bert_preprocess_model):
  is_training = split.startswith('train')
  dataset = tf.data.Dataset.from_tensor_slices(in_memory_ds[split])
  num_examples = info.splits[split].num_examples

  if is_training:
    dataset = dataset.shuffle(num_examples)
    dataset = dataset.repeat()
  dataset = dataset.batch(batch_size)
  dataset = dataset.map(lambda ex: (bert_preprocess_model(ex), ex['label']))
  dataset = dataset.cache().prefetch(buffer_size=AUTOTUNE)
  return dataset, num_examples
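
The order of operations in this function (batch first, then map the preprocessing over whole batches, returning an inputs/labels pair) can be mimicked in plain Python. This is an analogy rather than tf.data code: `batch_examples`, `to_features_and_labels` and `fake_preprocess` are hypothetical names, and the stand-in preprocessing counts words instead of producing token ids.

```python
def batch_examples(examples, batch_size):
    """Groups per-example dicts into per-batch dicts of lists."""
    for start in range(0, len(examples), batch_size):
        chunk = examples[start:start + batch_size]
        yield {key: [ex[key] for ex in chunk] for key in chunk[0]}


def to_features_and_labels(batch, preprocess):
    """Plays the role of the lambda passed to map: (model inputs, labels)."""
    return preprocess(batch), batch['label']


def fake_preprocess(batch):
    # Stand-in for the BERT preprocessing model (assumption for illustration).
    return {'num_words': [len(s.split()) for s in batch['sentence']]}


examples = [
    {'idx': 0, 'sentence': 'the cat sat', 'label': 1},
    {'idx': 1, 'sentence': 'sat cat', 'label': 0},
    {'idx': 2, 'sentence': 'cats sit on mats', 'label': 1},
]
pipeline = [to_features_and_labels(b, fake_preprocess)
            for b in batch_examples(examples, batch_size=2)]
for features, labels in pipeline:
    print(features, labels)
```

Because batching happens before the map step, the preprocessing function is called once per batch on a list of sentences rather than once per sentence, which is the same reason the tutorial's preprocessing model receives batched string tensors.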

Define your model

You are now ready to define your model for sentence or sentence-pair classification. It feeds the preprocessed inputs through the BERT encoder, puts a linear classifier on top (or any other arrangement of layers you prefer), and uses dropout for regularization.

def build_classifier_model(num_classes):

  class Classifier(tf.keras.Model):
    def __init__(self, num_classes):
      super(Classifier, self).__init__(name="prediction")
      self.encoder = hub.KerasLayer(tfhub_handle_encoder, trainable=True)
      self.dropout = tf.keras.layers.Dropout(0.1)
      self.dense = tf.keras.layers.Dense(num_classes)

    def call(self, preprocessed_text):
      encoder_outputs = self.encoder(preprocessed_text)
      pooled_output = encoder_outputs["pooled_output"]
      x = self.dropout(pooled_output)
      x = self.dense(x)
      return x

  model = Classifier(num_classes)
  return model

Let's try running the model on some preprocessed inputs.

test_classifier_model = build_classifier_model(2)
bert_raw_result = test_classifier_model(text_preprocessed)
print(tf.sigmoid(bert_raw_result))

tf.Tensor([[0.29329836 0.44367802]], shape=(1, 2), dtype=float32)
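
The classifier returns one unnormalized score (logit) per class, and the cell above squashes each score independently with a sigmoid, so the two values need not sum to 1. The training loss used later in this tutorial is SparseCategoricalCrossentropy(from_logits=True), which corresponds to a softmax over the classes. A minimal plain-Python sketch of that relationship follows; the logit values are made up for illustration and are not the model's actual outputs.

```python
import math


def softmax(logits):
    """Turns a list of logits into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(logits, label):
    """Negative log-probability of the integer label under softmax(logits)."""
    return -math.log(softmax(logits)[label])


logits = [-0.5, 1.5]  # hypothetical scores for classes 0 and 1
probs = softmax(logits)
print(probs, sum(probs))
print(sparse_categorical_crossentropy(logits, label=1))
```

The predicted class is the index of the largest logit either way, since both sigmoid and softmax preserve the ordering of the scores.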

Choose a task from GLUE

You are going to use a TensorFlow Datasets dataset from the GLUE benchmark suite.

Colab lets you download these small datasets to the local file system, and the code below reads them entirely into memory, because the separate TPU worker host cannot access the local file system of the colab runtime.

For bigger datasets, you'll need to create your own Google Cloud Storage bucket and have the TPU worker read the data from there. You can learn more in the TPU guide.

It's recommended to start with the CoLA dataset (for single sentence) or MRPC (for multi sentence), since these are small and don't take long to fine-tune.
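
The single-sentence versus sentence-pair distinction shows up as the number of text features each task provides. The mapping below is a sketch written from memory of the TFDS `glue/*` feature names, not output of this notebook; `GLUE_TEXT_FEATURES` and `is_pair_task` are hypothetical names, and `tfds.builder(name).info.features` remains the authoritative source (the order TFDS returns may also differ from the order shown here).

```python
# Assumed text feature names per task; verify against the TFDS catalog.
GLUE_TEXT_FEATURES = {
    'glue/cola': ['sentence'],
    'glue/sst2': ['sentence'],
    'glue/mrpc': ['sentence1', 'sentence2'],
    'glue/qqp': ['question1', 'question2'],
    'glue/mnli': ['premise', 'hypothesis'],
    'glue/qnli': ['question', 'sentence'],
    'glue/rte': ['sentence1', 'sentence2'],
    'glue/wnli': ['sentence1', 'sentence2'],
}


def is_pair_task(tfds_name):
    """True when the task feeds two text segments to BERT."""
    return len(GLUE_TEXT_FEATURES[tfds_name]) == 2


single = sorted(name for name in GLUE_TEXT_FEATURES if not is_pair_task(name))
print(single)
```

Only the number of features matters to the preprocessing model defined earlier: it creates one string input per feature name and packs however many segments it is given.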

tfds_name = 'glue/cola' 

tfds_info = tfds.builder(tfds_name).info

sentence_features = list(tfds_info.features.keys())
sentence_features.remove('idx')
sentence_features.remove('label')

available_splits = list(tfds_info.splits.keys())
train_split = 'train'
validation_split = 'validation'
test_split = 'test'
if tfds_name == 'glue/mnli':
  validation_split = 'validation_matched'
  test_split = 'test_matched'

num_classes = tfds_info.features['label'].num_classes
num_examples = tfds_info.splits.total_num_examples

print(f'Using {tfds_name} from TFDS')
print(f'This dataset has {num_examples} examples')
print(f'Number of classes: {num_classes}')
print(f'Features {sentence_features}')
print(f'Splits {available_splits}')

with tf.device('/job:localhost'):
  # batch_size=-1 is a way to load the dataset into memory
  in_memory_ds = tfds.load(tfds_name, batch_size=-1, shuffle_files=True)

# The code below is just to show some samples from the selected dataset
print(f'Here are some sample rows from {tfds_name} dataset')
sample_dataset = tf.data.Dataset.from_tensor_slices(in_memory_ds[train_split])

labels_names = tfds_info.features['label'].names
print(labels_names)
print()

sample_i = 1
for sample_row in sample_dataset.take(5):
  samples = [sample_row[feature] for feature in sentence_features]
  print(f'sample row {sample_i}')
  for sample in samples:
    print(sample.numpy())
  sample_label = sample_row['label']

  print(f'label: {sample_label} ({labels_names[sample_label]})')
  print()
  sample_i += 1

Using glue/cola from TFDS
This dataset has 10657 examples
Number of classes: 2
Features ['sentence']
Splits ['train', 'validation', 'test']
Here are some sample rows from glue/cola dataset
['unacceptable', 'acceptable']

sample row 1
b'It is this hat that it is certain that he was wearing.'
label: 1 (acceptable)

sample row 2
b'Her efficient looking up of the answer pleased the boss.'
label: 1 (acceptable)

sample row 3
b'Both the workers will wear carnations.'
label: 1 (acceptable)

sample row 4
b'John enjoyed drawing trees for his syntax homework.'
label: 1 (acceptable)

sample row 5
b'We consider Leslie rather foolish, and Lou a complete idiot.'
label: 1 (acceptable)

The dataset also determines the problem type (classification or regression) and the appropriate loss function for training.

def get_configuration(glue_task):

  loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

  if glue_task == 'glue/cola':
    metrics = tfa.metrics.MatthewsCorrelationCoefficient(num_classes=2)
  else:
    metrics = tf.keras.metrics.SparseCategoricalAccuracy(
        'accuracy', dtype=tf.float32)

  return metrics, loss
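
For CoLA the function above selects the Matthews correlation coefficient instead of accuracy. A minimal sketch of the binary form of that metric, computed from confusion-matrix counts, is shown below. `matthews_corrcoef` here is a hand-written illustration, not the tensorflow_addons implementation, and returning 0.0 when the denominator is zero is a common convention adopted as an assumption.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Binary MCC in [-1, 1]; 0.0 when any confusion-matrix margin is empty."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention for the degenerate case
    return (tp * tn - fp * fn) / denom


labels = [1, 1, 0, 0]
print(matthews_corrcoef(labels, [1, 1, 0, 0]))  # perfect agreement
print(matthews_corrcoef(labels, [1, 1, 1, 1]))  # always predicts class 1
print(matthews_corrcoef(labels, [0, 0, 1, 1]))  # every prediction inverted
```

The middle case is the one that motivates the metric: a model that always predicts the same class gets 50% accuracy on these labels but an MCC of 0.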

Train your model

Finally, you can train the model end-to-end on the dataset you chose.

Distribution

Recall the set-up code at the top, which connected the colab runtime to a TPU worker with multiple TPU devices. To distribute training onto them, you will create and compile your main Keras model within the scope of the TPU distribution strategy. (For details, see Distributed training with Keras.)

Preprocessing, on the other hand, runs on the CPU of the worker host, not the TPUs, so the Keras model for preprocessing as well as the training and validation datasets mapped with it are built outside the distribution strategy scope. The call to Model.fit() will take care of distributing the passed-in dataset to the model replicas.

Optimizer

Fine-tuning follows the optimizer set-up from BERT pre-training (as in Classify text with BERT): it uses the AdamW optimizer with a linear decay of a notional initial learning rate, prefixed with a linear warm-up phase over the first 10% of training steps (num_warmup_steps). In line with the BERT paper, the initial learning rate is smaller for fine-tuning (best of 5e-5, 3e-5, 2e-5).
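
The shape of that schedule (linear warm-up, then linear decay) can be sketched in plain Python. This shows the shape only, not the code in official.nlp.optimization: `learning_rate` is a hypothetical helper, and letting the decay start from the peak value at the end of warm-up is a simplifying assumption. The step counts use 267 steps per epoch, the figure shown in the training log for CoLA with a batch size of 32.

```python
def learning_rate(step, init_lr, num_train_steps, num_warmup_steps):
    """Linear warm-up to init_lr, then linear decay to zero (assumed shape)."""
    if step < num_warmup_steps:
        return init_lr * step / num_warmup_steps
    remaining = max(num_train_steps - step, 0)
    return init_lr * remaining / (num_train_steps - num_warmup_steps)


steps_per_epoch = 267                      # as reported in the training log
epochs = 3
init_lr = 2e-5
num_train_steps = steps_per_epoch * epochs
num_warmup_steps = num_train_steps // 10   # first 10% of training

for step in (0, num_warmup_steps // 2, num_warmup_steps, num_train_steps):
    lr = learning_rate(step, init_lr, num_train_steps, num_warmup_steps)
    print(step, lr)
```

With these numbers there are 801 training steps and 80 warm-up steps, so the rate climbs from zero to its peak during the first 80 steps and then falls back to zero by the last step.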

epochs = 3
batch_size = 32
init_lr = 2e-5

print(f'Fine tuning {tfhub_handle_encoder} model')
bert_preprocess_model = make_bert_preprocess_model(sentence_features)

with strategy.scope():

  # Metrics have to be created inside the strategy scope.
  metrics, loss = get_configuration(tfds_name)

  train_dataset, train_data_size = load_dataset_from_tfds(
      in_memory_ds, tfds_info, train_split, batch_size, bert_preprocess_model)
  steps_per_epoch = train_data_size // batch_size
  num_train_steps = steps_per_epoch * epochs
  num_warmup_steps = num_train_steps // 10

  validation_dataset, validation_data_size = load_dataset_from_tfds(
      in_memory_ds, tfds_info, validation_split, batch_size,
      bert_preprocess_model)
  validation_steps = validation_data_size // batch_size

  classifier_model = build_classifier_model(num_classes)

  optimizer = optimization.create_optimizer(
      init_lr=init_lr,
      num_train_steps=num_train_steps,
      num_warmup_steps=num_warmup_steps,
      optimizer_type='adamw')

  classifier_model.compile(optimizer=optimizer, loss=loss, metrics=[metrics])

  classifier_model.fit(
      x=train_dataset,
      validation_data=validation_dataset,
      steps_per_epoch=steps_per_epoch,
      epochs=epochs,
      validation_steps=validation_steps)

Fine tuning https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/3 model
/tmpfs/src/tf_docs_env/lib/python3.6/site-packages/keras/engine/functional.py:585: UserWarning: Input dict contained keys ['idx', 'label'] which did not match any model input. They will be ignored by the model.
  [n for n in tensors.keys() if n not in ref_input_names])
Epoch 1/3
/tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/framework/indexed_slices.py:449: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("AdamWeightDecay/gradients/StatefulPartitionedCall:1", shape=(None,), dtype=int32), values=Tensor("clip_by_global_norm/clip_by_global_norm/_0:0", dtype=float32), dense_shape=Tensor("AdamWeightDecay/gradients/StatefulPartitionedCall:2", shape=(None,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "shape. This may consume a large amount of memory." % value)
267/267 [==============================] - 86s 81ms/step - loss: 0.6092 - MatthewsCorrelationCoefficient: 0.0000e+00 - val_loss: 0.4846 - val_MatthewsCorrelationCoefficient: 0.0000e+00
Epoch 2/3
267/267 [==============================] - 14s 53ms/step - loss: 0.3774 - MatthewsCorrelationCoefficient: 0.0000e+00 - val_loss: 0.5322 - val_MatthewsCorrelationCoefficient: 0.0000e+00
Epoch 3/3
267/267 [==============================] - 14s 53ms/step - loss: 0.2623 - MatthewsCorrelationCoefficient: 0.0000e+00 - val_loss: 0.6469 - val_MatthewsCorrelationCoefficient: 0.0000e+00

Export for inference

You will create a final model that has the preprocessing part and the fine-tuned BERT you've just created.

At inference time, preprocessing needs to be part of the model (because there is no longer a separate input queue, as there is for training data, that does it). Preprocessing is not just computation; it has its own resources (the vocab table) that must be attached to the Keras Model that is saved for export. This final assembly is what will be saved.

You are going to save the model on colab, and later you can download it to keep it for the future (View -> Table of contents -> Files).

main_save_path = './my_models'
bert_type = tfhub_handle_encoder.split('/')[-2]
saved_model_name = f'{tfds_name.replace("/", "_")}_{bert_type}'

saved_model_path = os.path.join(main_save_path, saved_model_name)

preprocess_inputs = bert_preprocess_model.inputs
bert_encoder_inputs = bert_preprocess_model(preprocess_inputs)
bert_outputs = classifier_model(bert_encoder_inputs)
model_for_export = tf.keras.Model(preprocess_inputs, bert_outputs)

print('Saving', saved_model_path)

# Save everything on the Colab host (even the variables from TPU memory)
save_options = tf.saved_model.SaveOptions(experimental_io_device='/job:localhost')
model_for_export.save(saved_model_path, include_optimizer=False,
                      options=save_options)

Saving ./my_models/glue_cola_bert_en_uncased_L-12_H-768_A-12
WARNING:absl:Found untraced functions such as restored_function_body, restored_function_body, restored_function_body, restored_function_body, restored_function_body while saving (showing 5 of 910). These functions will not be directly callable after loading.
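
The directory name printed above is derived from the dataset name and the TF Hub handle of the encoder. The sketch below replays that string manipulation on its own. The handle value is an assumption chosen to be consistent with the printed path; the notebook sets `tfhub_handle_encoder` and `tfds_name` in earlier cells.

```python
import os

# Assumed values for illustration; the notebook defines these earlier.
tfhub_handle_encoder = 'https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/3'
tfds_name = 'glue/cola'

# The second-to-last path segment of the handle is the model name;
# the last segment is the version number.
bert_type = tfhub_handle_encoder.split('/')[-2]

# Slashes in the dataset name are replaced so the result is a single directory name.
saved_model_name = f'{tfds_name.replace("/", "_")}_{bert_type}'
saved_model_path = os.path.join('./my_models', saved_model_name)
print(saved_model_path)
```

Encoding both the task and the encoder in the name keeps models from different runs from overwriting each other.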

Test the model

The final step is testing the results of your exported model.

As a quick comparison, reload the model and test it on a few inputs from the dataset's test split.

with tf.device('/job:localhost'):
  reloaded_model = tf.saved_model.load(saved_model_path)

Utility methods

def prepare(record):
  model_inputs = [[record[ft]] for ft in sentence_features]
  return model_inputs


def prepare_serving(record):
  model_inputs = {ft: record[ft] for ft in sentence_features}
  return model_inputs


def print_bert_results(test, bert_result, dataset_name):

  bert_result_class = tf.argmax(bert_result, axis=1)[0]

  if dataset_name == 'glue/cola':
    print('sentence:', test[0].numpy())
    if bert_result_class == 1:
      print('This sentence is acceptable')
    else:
      print('This sentence is unacceptable')

  elif dataset_name == 'glue/sst2':
    print('sentence:', test[0])
    if bert_result_class == 1:
      print('This sentence has POSITIVE sentiment')
    else:
      print('This sentence has NEGATIVE sentiment')

  elif dataset_name == 'glue/mrpc':
    print('sentence1:', test[0])
    print('sentence2:', test[1])
    if bert_result_class == 1:
      print('Are a paraphrase')
    else:
      print('Are NOT a paraphrase')

  elif dataset_name == 'glue/qqp':
    print('question1:', test[0])
    print('question2:', test[1])
    if bert_result_class == 1:
      print('Questions are similar')
    else:
      print('Questions are NOT similar')

  elif dataset_name == 'glue/mnli':
    print('premise   :', test[0])
    print('hypothesis:', test[1])
    if bert_result_class == 1:
      print('This premise is NEUTRAL to the hypothesis')
    elif bert_result_class == 2:
      print('This premise CONTRADICTS the hypothesis')
    else:
      print('This premise ENTAILS the hypothesis')

  elif dataset_name == 'glue/qnli':
    print('question:', test[0])
    print('sentence:', test[1])
    if bert_result_class == 1:
      print('The question is NOT answerable by the sentence')
    else:
      print('The question is answerable by the sentence')

  elif dataset_name == 'glue/rte':
    print('sentence1:', test[0])
    print('sentence2:', test[1])
    if bert_result_class == 1:
      print('Sentence1 does NOT entail sentence2')
    else:
      print('Sentence1 entails sentence2')

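  elif dataset_name == 'glue/wnli':
    print('sentence1:', test[0])
    print('sentence2:', test[1])
    # In TFDS glue/wnli the label order is (not_entailment, entailment),
    # so class 1 means entailment (the reverse of glue/rte).
    if bert_result_class == 1:
      print('Sentence1 entails sentence2')
    else:
      print('Sentence1 does NOT entail sentence2')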

  print('BERT raw results:', bert_result[0])
  print()
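
The two helpers `prepare` and `prepare_serving` differ only in the container they return: a list of single-element batches for calling the reloaded model directly, and a dict keyed by feature name for calling a named signature. The sketch below shows both shapes on a plain Python record. The feature names and the record are made-up stand-ins for a sentence-pair task; in the notebook the records hold tensors from the dataset.

```python
# Assumed feature names for a sentence-pair task such as glue/mrpc.
sentence_features = ['sentence1', 'sentence2']

# Hypothetical record; real records come from the dataset and hold tensors.
record = {'sentence1': 'A man is eating.', 'sentence2': 'Someone eats.', 'label': 1}


def prepare(record):
    # One single-element batch per feature, in feature order.
    return [[record[ft]] for ft in sentence_features]


def prepare_serving(record):
    # Keyword-style inputs, keyed by feature name; the label is dropped.
    return {ft: record[ft] for ft in sentence_features}


direct_inputs = prepare(record)
serving_inputs = prepare_serving(record)
print(direct_inputs)
print(serving_inputs)
```

The extra list nesting in `prepare` is what supplies the batch dimension that the directly called model expects.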

Test

with tf.device('/job:localhost'):
  test_dataset = tf.data.Dataset.from_tensor_slices(in_memory_ds[test_split])
  for test_row in test_dataset.shuffle(1000).map(prepare).take(5):
    if len(sentence_features) == 1:
      result = reloaded_model(test_row[0])
    else:
      result = reloaded_model(list(test_row))

    print_bert_results(test_row, result, tfds_name)

sentence: [b'An old woman languished in the forest.']
This sentence is acceptable
BERT raw results: tf.Tensor([-1.7032353  3.3714833], shape=(2,), dtype=float32)

sentence: [b"I went to the movies and didn't pick up the shirts."]
This sentence is acceptable
BERT raw results: tf.Tensor([-0.73970896  1.0806316 ], shape=(2,), dtype=float32)

sentence: [b"Every essay that she's written and which I've read is on that pile."]
This sentence is acceptable
BERT raw results: tf.Tensor([-0.7034159  0.6236454], shape=(2,), dtype=float32)

sentence: [b'Either Bill ate the peaches, or Harry.']
This sentence is unacceptable
BERT raw results: tf.Tensor([ 0.05972151 -0.08620442], shape=(2,), dtype=float32)

sentence: [b'I ran into the baker from whom I bought these bagels.']
This sentence is acceptable
BERT raw results: tf.Tensor([-1.6862067  3.285925 ], shape=(2,), dtype=float32)
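
The "BERT raw results" printed above are easier to read as probabilities. Assuming they are unnormalized logits (one score per class), a softmax turns them into values that sum to 1. This is a minimal pure-Python sketch; the `softmax` helper is illustrative and not part of the notebook.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats (illustrative helper)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Logits copied from the first test sentence above.
logits = [-1.7032353, 3.3714833]
probs = softmax(logits)
predicted_class = probs.index(max(probs))
print(probs, predicted_class)
```

Softmax does not change which class scores highest, so the predicted class matches what `tf.argmax` picks in `print_bert_results`.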

If you want to use your model on TF Serving, remember that it will call your SavedModel through one of its named signatures. Notice that there are some small differences in the input. In Python, you can test them as follows:

with tf.device('/job:localhost'):
  serving_model = reloaded_model.signatures['serving_default']
  for test_row in test_dataset.shuffle(1000).map(prepare_serving).take(5):
    result = serving_model(**test_row)
    # The 'prediction' key is the classifier's defined model name.
    print_bert_results(list(test_row.values()), result['prediction'], tfds_name)

sentence: b'Everyone attended more than two seminars.'
This sentence is acceptable
BERT raw results: tf.Tensor([-1.5594155  2.862155 ], shape=(2,), dtype=float32)

sentence: b'Most columnists claim that a senior White House official has been briefing them.'
This sentence is acceptable
BERT raw results: tf.Tensor([-1.6298996  3.3155093], shape=(2,), dtype=float32)

sentence: b"That my father, he's lived here all his life is well known to those cops."
This sentence is acceptable
BERT raw results: tf.Tensor([-1.2048947  1.8589772], shape=(2,), dtype=float32)

sentence: b'Ourselves like us.'
This sentence is acceptable
BERT raw results: tf.Tensor([-1.2723312  2.0494034], shape=(2,), dtype=float32)

sentence: b'John is clever.'
This sentence is acceptable
BERT raw results: tf.Tensor([-1.6516167  3.3147635], shape=(2,), dtype=float32)
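
When the model is actually deployed behind TF Serving, the same keyword inputs are sent as a JSON request instead of Python arguments. The sketch below only builds such a request body in the row ("instances") format of the TF Serving REST API; it does not contact a server. The helper name, the model name `my_bert_cola`, and the host and port are hypothetical, and the input key `sentence` assumes the single-sentence CoLA setup shown above.

```python
import json


def build_predict_request(sentences, feature_name='sentence'):
    """Build a JSON body for a TF Serving REST predict call (illustrative helper)."""
    instances = [{feature_name: s} for s in sentences]
    return json.dumps({'signature_name': 'serving_default', 'instances': instances})


request_body = build_predict_request(['John is clever.', 'Ourselves like us.'])

# Hypothetical endpoint; model name, host and port depend on your deployment.
predict_url = 'http://localhost:8501/v1/models/my_bert_cola:predict'
print(predict_url)
print(request_body)
```

For a sentence-pair task, each instance would carry one key per sentence feature instead of the single `sentence` key.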

You did it! Your saved model can be used for serving or for simple in-process inference, through a simpler API that needs less code and is easier to maintain.

Next steps

Now that you have tried one of the base BERT models, you can try other models to achieve higher accuracy, or try smaller model versions.

You can also try other datasets.