tatoeba

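Description:

This data is extracted from the Tatoeba corpus, dated Saturday, 17 November 2018.

For each language, 1,000 English sentences and their translations were selected, where available. For a description of the languages, their families and scripts, and the baseline results, see the paper listed under Citation (arXiv:1812.10464).

Note that the English sentences are not identical across language pairs. Results are therefore not directly comparable across languages.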

Homepage: http://opus.nlpl.eu/Tatoeba.php
Source code: tfds.datasets.tatoeba.Builder
Versions:
- 1.0.0 (default): Initial release.
Auto-cached (documentation): Yes
Feature structure:

FeaturesDict({
    'source_language': Text(shape=(), dtype=string),
    'source_sentence': Text(shape=(), dtype=string),
    'target_language': Text(shape=(), dtype=string),
    'target_sentence': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
source_language	Text	string
source_sentence	Text	string
target_language	Text	string
target_sentence	Text	string
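
As a minimal sketch of how the four text features above might be read, the snippet below loads one language config through `tfds.load` and decodes the sentence pairs. The helper names `config_name` and `load_sentence_pairs` are illustrative and not part of TFDS, and the `tatoeba/tatoeba_<code>` naming is assumed from the config names listed on this page.

```python
def config_name(lang_code: str) -> str:
    """Build the TFDS name for one language config, e.g. 'tatoeba/tatoeba_ja'."""
    return f"tatoeba/tatoeba_{lang_code}"


def load_sentence_pairs(lang_code: str, limit: int = 3):
    """Return up to `limit` (source, target) sentence pairs as Python strings."""
    # Imported lazily so that config_name() stays usable without TFDS installed.
    import tensorflow_datasets as tfds

    ds = tfds.load(config_name(lang_code), split="train")
    pairs = []
    for example in tfds.as_numpy(ds.take(limit)):
        # Text features arrive as bytes; decode them to str.
        pairs.append((
            example["source_sentence"].decode("utf-8"),
            example["target_sentence"].decode("utf-8"),
        ))
    return pairs
```

Calling `load_sentence_pairs("ja")` downloads the config on first use, so it needs network access and a working TFDS installation.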

Supervised keys (see as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Citation:

@article{tatoeba,
          title={Massively Multilingual Sentence Embeddings for Zero-Shot
                   Cross-Lingual Transfer and Beyond},
          author={Artetxe, Mikel and Schwenk, Holger},
          journal={arXiv:1812.10464v2},
          year={2018}
}

@InProceedings{TIEDEMANN12.463,
  author = {J{\"o}rg Tiedemann},
  title = {Parallel Data, Tools and Interfaces in OPUS},
  booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)},
  year = {2012},
  month = {may},
  date = {23-25},
  address = {Istanbul, Turkey},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Ugur Dogan and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-7-7},
  language = {english}
}
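
Because this dataset defines no supervised keys, `as_supervised=True` is not expected to yield `(input, label)` tuples. One possible workaround is to map each feature dictionary to a pair explicitly. `to_pair` below is a hypothetical helper, the sample record is made up for illustration, and the `tfds.load` call in the trailing comment is not executed here.

```python
def to_pair(example):
    """Map one feature dictionary to a (source_sentence, target_sentence) tuple."""
    return example["source_sentence"], example["target_sentence"]


# Made-up record with the same keys as the feature structure;
# real values come from the dataset itself.
sample = {
    "source_language": "source language tag",
    "source_sentence": "an English sentence",
    "target_language": "target language tag",
    "target_sentence": "its translation",
}
pair = to_pair(sample)

# With TFDS installed, the same function can be passed to Dataset.map:
#   import tensorflow_datasets as tfds
#   ds = tfds.load("tatoeba/tatoeba_ja", split="train").map(to_pair)
```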

tatoeba/tatoeba_af (default config)

Download size: 58.24 KiB
Dataset size: 162.74 KiB
Splits:

Split	Examples
`'train'`	1,000

Examples (tfds.as_dataframe):
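
The `tfds.as_dataframe` preview referenced for each config can be reproduced locally. The following is a minimal sketch, assuming TFDS and pandas are installed; `preview_config` is a hypothetical helper name and its defaults are chosen only for illustration.

```python
def preview_config(name: str = "tatoeba/tatoeba_af", num_rows: int = 5):
    """Return a DataFrame holding the first `num_rows` examples of a config."""
    # Imported lazily so this module can be imported without TFDS installed.
    import tensorflow_datasets as tfds

    ds, info = tfds.load(name, split="train", with_info=True)
    # as_dataframe uses the DatasetInfo to decode and label the columns.
    return tfds.as_dataframe(ds.take(num_rows), info)
```

Passing another config name from this page, such as `"tatoeba/tatoeba_ar"`, should preview that language pair instead.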

tatoeba/tatoeba_ar

Download size: 70.95 KiB
Dataset size: 175.46 KiB
Splits:

Split	Examples
`'train'`	1,000

Examples (tfds.as_dataframe):

tatoeba/tatoeba_bg

Download size: 99.88 KiB
Dataset size: 204.64 KiB
Splits:

Split	Examples
`'train'`	1,000

Examples (tfds.as_dataframe):

tatoeba/tatoeba_bn

Download size: 89.55 KiB
Dataset size: 194.24 KiB
Splits:

Split	Examples
`'train'`	1,000

Examples (tfds.as_dataframe):

tatoeba/tatoeba_de

Download size: 103.09 KiB
Dataset size: 207.93 KiB
Splits:

Split	Examples
`'train'`	1,000

Examples (tfds.as_dataframe):

tatoeba/tatoeba_el

Download size: 77.11 KiB
Dataset size: 181.65 KiB
Splits:

Split	Examples
`'train'`	1,000

Examples (tfds.as_dataframe):

tatoeba/tatoeba_es

Download size: 70.57 KiB
Dataset size: 175.12 KiB
Splits:

Split	Examples
`'train'`	1,000

Examples (tfds.as_dataframe):

tatoeba/tatoeba_et

Download size: 58.33 KiB
Dataset size: 162.85 KiB
Splits:

Split	Examples
`'train'`	1,000

Examples (tfds.as_dataframe):

tatoeba/tatoeba_eu

Download size: 64.52 KiB
Dataset size: 169.02 KiB
Splits:

Split	Examples
`'train'`	1,000

Examples (tfds.as_dataframe):

tatoeba/tatoeba_fa

Download size: 91.52 KiB
Dataset size: 196.15 KiB
Splits:

Split	Examples
`'train'`	1,000

Examples (tfds.as_dataframe):

tatoeba/tatoeba_fi

Download size: 73.90 KiB
Dataset size: 178.47 KiB
Splits:

Split	Examples
`'train'`	1,000

Examples (tfds.as_dataframe):

tatoeba/tatoeba_fr

Download size: 78.14 KiB
Dataset size: 182.68 KiB
Splits:

Split	Examples
`'train'`	1,000

Examples (tfds.as_dataframe):

tatoeba/tatoeba_he

Download size: 81.54 KiB
Dataset size: 186.15 KiB
Splits:

Split	Examples
`'train'`	1,000

Examples (tfds.as_dataframe):

tatoeba/tatoeba_hi

Download size: 119.69 KiB
Dataset size: 224.89 KiB
Splits:

Split	Examples
`'train'`	1,000

Examples (tfds.as_dataframe):

tatoeba/tatoeba_hu

Download size: 67.27 KiB
Dataset size: 171.78 KiB
Splits:

Split	Examples
`'train'`	1,000

Examples (tfds.as_dataframe):

tatoeba/tatoeba_id

Download size: 73.09 KiB
Dataset size: 177.61 KiB
Splits:

Split	Examples
`'train'`	1,000

Examples (tfds.as_dataframe):

tatoeba/tatoeba_it

Download size: 64.29 KiB
Dataset size: 168.81 KiB
Splits:

Split	Examples
`'train'`	1,000

Examples (tfds.as_dataframe):

tatoeba/tatoeba_ja

Download size: 90.90 KiB
Dataset size: 195.53 KiB
Splits:

Split	Examples
`'train'`	1,000

Examples (tfds.as_dataframe):

tatoeba/tatoeba_jv

Download size: 13.59 KiB
Dataset size: 35.01 KiB
Splits:

Split	Examples
`'train'`	205

Examples (tfds.as_dataframe):

tatoeba/tatoeba_ka

Download size: 70.47 KiB
Dataset size: 148.67 KiB
Splits:

Split	Examples
`'train'`	746

Examples (tfds.as_dataframe):

tatoeba/tatoeba_kk

Download size: 46.07 KiB
Dataset size: 106.25 KiB
Splits:

Split	Examples
`'train'`	575

Examples (tfds.as_dataframe):

tatoeba/tatoeba_ko

Download size: 77.28 KiB
Dataset size: 181.88 KiB
Splits:

Split	Examples
`'train'`	1,000

Examples (tfds.as_dataframe):

tatoeba/tatoeba_ml

Download size: 92.50 KiB
Dataset size: 165.14 KiB
Splits:

Split	Examples
`'train'`	687

Examples (tfds.as_dataframe):

tatoeba/tatoeba_mr

Download size: 98.19 KiB
Dataset size: 202.96 KiB
Splits:

Split	Examples
`'train'`	1,000

Examples (tfds.as_dataframe):

tatoeba/tatoeba_nl

Download size: 71.55 KiB
Dataset size: 176.10 KiB
Splits:

Split	Examples
`'train'`	1,000

Examples (tfds.as_dataframe):

tatoeba/tatoeba_pt

Download size: 73.42 KiB
Dataset size: 177.95 KiB
Splits:

Split	Examples
`'train'`	1,000

Examples (tfds.as_dataframe):

tatoeba/tatoeba_ru

Download size: 90.30 KiB
Dataset size: 194.92 KiB
Splits:

Split	Examples
`'train'`	1,000

Examples (tfds.as_dataframe):

tatoeba/tatoeba_sw

Download size: 19.99 KiB
Dataset size: 60.75 KiB
Splits:

Split	Examples
`'train'`	390

Examples (tfds.as_dataframe):

tatoeba/tatoeba_ta

Download size: 38.52 KiB
Dataset size: 70.93 KiB
Splits:

Split	Examples
`'train'`	307

Examples (tfds.as_dataframe):

tatoeba/tatoeba_te

Download size: 24.55 KiB
Dataset size: 49.07 KiB
Splits:

Split	Examples
`'train'`	234

Examples (tfds.as_dataframe):

tatoeba/tatoeba_th

Download size: 61.72 KiB
Dataset size: 119.32 KiB
Splits:

Split	Examples
`'train'`	548

Examples (tfds.as_dataframe):

tatoeba/tatoeba_tl

Download size: 66.54 KiB
Dataset size: 171.04 KiB
Splits:

Split	Examples
`'train'`	1,000

Examples (tfds.as_dataframe):

tatoeba/tatoeba_tr

Download size: 70.20 KiB
Dataset size: 174.70 KiB
Splits:

Split	Examples
`'train'`	1,000

Examples (tfds.as_dataframe):

tatoeba/tatoeba_ur

Download size: 86.63 KiB
Dataset size: 191.20 KiB
Splits:

Split	Examples
`'train'`	1,000

Examples (tfds.as_dataframe):

tatoeba/tatoeba_vi

Download size: 89.26 KiB
Dataset size: 193.89 KiB
Splits:

Split	Examples
`'train'`	1,000

Examples (tfds.as_dataframe):

tatoeba/tatoeba_zh

Download size: 67.32 KiB
Dataset size: 171.85 KiB
Splits:

Split	Examples
`'train'`	1,000

Examples (tfds.as_dataframe):
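
Several configs above contain fewer than 1,000 pairs, which is easy to miss when iterating over all languages. As a small sketch, the `'train'` sizes listed on this page can be collected into a lookup. The names `LANG_CODES`, `SMALLER_CONFIGS`, and `train_size` are illustrative, and the numbers are copied from the split tables above rather than queried from TFDS.

```python
# Language codes of all configs listed on this page.
LANG_CODES = [
    "af", "ar", "bg", "bn", "de", "el", "es", "et", "eu", "fa", "fi", "fr",
    "he", "hi", "hu", "id", "it", "ja", "jv", "ka", "kk", "ko", "ml", "mr",
    "nl", "pt", "ru", "sw", "ta", "te", "th", "tl", "tr", "ur", "vi", "zh",
]

# 'train' sizes for the configs with fewer than 1,000 pairs.
SMALLER_CONFIGS = {
    "jv": 205, "ka": 746, "kk": 575, "ml": 687,
    "sw": 390, "ta": 307, "te": 234, "th": 548,
}


def train_size(lang_code: str) -> int:
    """Return the number of 'train' examples listed for one language config."""
    if lang_code not in LANG_CODES:
        raise KeyError(f"unknown tatoeba config: {lang_code}")
    # Every config not listed in SMALLER_CONFIGS has exactly 1,000 pairs.
    return SMALLER_CONFIGS.get(lang_code, 1000)


total_pairs = sum(train_size(code) for code in LANG_CODES)
```

A lookup like this can serve as a sanity check against the length of a loaded split.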
