wiki_atomic_edits

german_insertions

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:wiki_atomic_edits/german_insertions')

Description:

A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.
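
As a rough illustration of the definition above, the sketch below checks whether a record describes a contiguous insertion: it looks for every token position at which inserting `phrase` into `base_sentence` reproduces `edited_sentence`. The function name `insertion_positions`, the whitespace tokenization, and the example sentence are assumptions made for this sketch and are not part of the dataset or of TFDS.

```python
from typing import List


def insertion_positions(base_sentence: str, phrase: str, edited_sentence: str) -> List[int]:
    """Return every token index i such that inserting `phrase` before
    token i of `base_sentence` yields `edited_sentence`.

    Assumption: sentences are compared as whitespace-separated tokens.
    An empty result means the record is not a contiguous insertion
    under that assumption.
    """
    base = base_sentence.split()
    inserted = phrase.split()
    edited = edited_sentence.split()

    # A contiguous insertion adds exactly len(inserted) tokens.
    if not inserted or len(edited) != len(base) + len(inserted):
        return []

    return [
        i
        for i in range(len(base) + 1)
        if base[:i] + inserted + base[i:] == edited
    ]


# Hypothetical record, written by hand for illustration only.
example = {
    "base_sentence": "She died there in 1970 .",
    "phrase": "after a long illness",
    "edited_sentence": "She died there after a long illness in 1970 .",
}
positions = insertion_positions(
    example["base_sentence"], example["phrase"], example["edited_sentence"]
)
print(positions)
```

More than one position can be returned when the inserted phrase repeats tokens that already surround it, so the insertion point is not always unique.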

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	3343403

Features:

{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
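
The feature spec above can be mirrored by a small typed container in plain Python, which is sometimes convenient when iterating over examples outside of TensorFlow. This is only a sketch: the class name `AtomicEdit` and the helper `from_record` are hypothetical, and the byte decoding assumes that string features may arrive as UTF-8 `bytes` (as they do when examples are converted to NumPy).

```python
from dataclasses import dataclass
from typing import Any, Mapping

# Bounds of the "int32" dtype declared for the `id` feature.
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def _as_text(value: Any) -> str:
    """Decode UTF-8 bytes to str; leave str values unchanged."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return str(value)


@dataclass(frozen=True)
class AtomicEdit:
    """Hypothetical container with one field per feature in the spec."""

    id: int
    base_sentence: str
    phrase: str
    edited_sentence: str

    @classmethod
    def from_record(cls, record: Mapping[str, Any]) -> "AtomicEdit":
        edit_id = int(record["id"])
        if not INT32_MIN <= edit_id <= INT32_MAX:
            raise ValueError(f"id {edit_id} does not fit in int32")
        return cls(
            id=edit_id,
            base_sentence=_as_text(record["base_sentence"]),
            phrase=_as_text(record["phrase"]),
            edited_sentence=_as_text(record["edited_sentence"]),
        )


# Hand-written record for illustration; not taken from the dataset.
edit = AtomicEdit.from_record(
    {
        "id": 7,
        "base_sentence": b"Er kam .",
        "phrase": "gestern",
        "edited_sentence": "Er kam gestern .",
    }
)
print(edit)
```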

german_deletions

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:wiki_atomic_edits/german_deletions')

Description:

A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.
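
For a deletion config, the same definition can be checked in the opposite direction. The sketch below assumes that `base_sentence` still contains `phrase` and that `edited_sentence` is what remains after removing it; this reading of the fields, the function name `deletion_positions`, the whitespace tokenization, and the example sentence are all assumptions of this sketch rather than documented behavior.

```python
from typing import List


def deletion_positions(base_sentence: str, phrase: str, edited_sentence: str) -> List[int]:
    """Return every token index i such that removing `phrase`, starting at
    token i of `base_sentence`, yields `edited_sentence`.

    Assumption: sentences are compared as whitespace-separated tokens.
    """
    base = base_sentence.split()
    removed = phrase.split()
    edited = edited_sentence.split()
    n = len(removed)

    # A contiguous deletion removes exactly n tokens.
    if n == 0 or len(base) != len(edited) + n:
        return []

    return [
        i
        for i in range(len(edited) + 1)
        if base[i:i + n] == removed and base[:i] + base[i + n:] == edited
    ]


# Hypothetical record, written by hand for illustration only.
deleted_at = deletion_positions(
    "He lived in Berlin for many years .",
    "for many years",
    "He lived in Berlin .",
)
print(deleted_at)
```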

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	1994329

Features:

{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

english_insertions

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:wiki_atomic_edits/english_insertions')

Description:

A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	13737796

Features:

{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

english_deletions

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:wiki_atomic_edits/english_deletions')

Description:

A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	9352389

Features:

{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

spanish_insertions

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:wiki_atomic_edits/spanish_insertions')

Description:

A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	1380934

Features:

{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

spanish_deletions

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:wiki_atomic_edits/spanish_deletions')

Description:

A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	908276

Features:

{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

french_insertions

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:wiki_atomic_edits/french_insertions')

Description:

A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	2038305

Features:

{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

french_deletions

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:wiki_atomic_edits/french_deletions')

Description:

A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	2060242

Features:

{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

italian_insertions

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:wiki_atomic_edits/italian_insertions')

Description:

A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	1078814

Features:

{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

italian_deletions

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:wiki_atomic_edits/italian_deletions')

Description:

A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	583316

Features:

{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

japanese_insertions

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:wiki_atomic_edits/japanese_insertions')

Description:

A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	2249527

Features:

{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

japanese_deletions

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:wiki_atomic_edits/japanese_deletions')

Description:

A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	1352162

Features:

{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

russian_insertions

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:wiki_atomic_edits/russian_insertions')

Description:

A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	1471638

Features:

{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

russian_deletions

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:wiki_atomic_edits/russian_deletions')

Description:

A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	960976

Features:

{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

chinese_insertions

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:wiki_atomic_edits/chinese_insertions')

Description:

A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	746509

Features:

{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

chinese_deletions

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:wiki_atomic_edits/chinese_deletions')

Description:

A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	467271

Features:

{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
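
The `'train'` counts listed for the sixteen configs on this page can be tallied to check them against the "~43 million edits across 8 languages" figure in the description. The sketch below copies the counts from the split tables; the names `TRAIN_EXAMPLES` and `totals_by` are introduced here for illustration only.

```python
from collections import defaultdict
from typing import Dict

# 'train' example counts copied from the split tables on this page.
TRAIN_EXAMPLES: Dict[str, int] = {
    "german_insertions": 3343403,
    "german_deletions": 1994329,
    "english_insertions": 13737796,
    "english_deletions": 9352389,
    "spanish_insertions": 1380934,
    "spanish_deletions": 908276,
    "french_insertions": 2038305,
    "french_deletions": 2060242,
    "italian_insertions": 1078814,
    "italian_deletions": 583316,
    "japanese_insertions": 2249527,
    "japanese_deletions": 1352162,
    "russian_insertions": 1471638,
    "russian_deletions": 960976,
    "chinese_insertions": 746509,
    "chinese_deletions": 467271,
}


def totals_by(counts: Dict[str, int], part: int) -> Dict[str, int]:
    """Sum counts grouped by one half of the '<language>_<edit type>' name.

    part=0 groups by language, part=1 groups by edit type.
    """
    totals: Dict[str, int] = defaultdict(int)
    for config, n in counts.items():
        key = config.rsplit("_", 1)[part]
        totals[key] += n
    return dict(totals)


total_edits = sum(TRAIN_EXAMPLES.values())
by_type = totals_by(TRAIN_EXAMPLES, part=1)
by_language = totals_by(TRAIN_EXAMPLES, part=0)

print(f"total: {total_edits:,}")
for edit_type, n in by_type.items():
    print(f"{edit_type}: {n:,}")
```

The total comes to roughly 43.7 million examples, which is consistent with the approximate figure quoted in the description.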