wiki_atomic_edits

german_insertions

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/german_insertions')
  • Description:
A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.
  • License: No known license
  • Version: 1.0.0
  • Splits:
Split      Examples
'train'    3343403
  • Features:
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
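
Every configuration on this page follows the same naming pattern, `<language>_<edit type>`, under the `huggingface:wiki_atomic_edits/` prefix. The sketch below builds that name and wraps the `tfds.load` call shown above. The helper names `tfds_name` and `load_config` are hypothetical, introduced only for illustration, and the `split="train"` default assumes the single split listed for each configuration.

```python
# Languages and edit types as they appear in the configuration names on this page.
LANGUAGES = ("german", "english", "spanish", "french",
             "italian", "japanese", "russian", "chinese")
EDIT_TYPES = ("insertions", "deletions")


def tfds_name(language: str, edit_type: str) -> str:
    """Build the TFDS name for one configuration (hypothetical helper)."""
    if language not in LANGUAGES:
        raise ValueError(f"unknown language: {language!r}")
    if edit_type not in EDIT_TYPES:
        raise ValueError(f"unknown edit type: {edit_type!r}")
    return f"huggingface:wiki_atomic_edits/{language}_{edit_type}"


def load_config(language: str, edit_type: str, split: str = "train"):
    """Load one configuration; tensorflow_datasets is imported lazily."""
    import tensorflow_datasets as tfds  # only needed when data is actually loaded
    return tfds.load(tfds_name(language, edit_type), split=split)


print(tfds_name("german", "insertions"))
# huggingface:wiki_atomic_edits/german_insertions
```

Calling `load_config("german", "insertions")` would then be equivalent to the command above with an explicit split, provided `tensorflow_datasets` is installed.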

german_deletions

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/german_deletions')
  • Description:
A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.
  • License: No known license
  • Version: 1.0.0
  • Splits:
Split      Examples
'train'    1994329
  • Features:
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
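
The three string features can be related as follows. This is a minimal sketch that assumes, based on the description, that in an insertions configuration `edited_sentence` is `base_sentence` with `phrase` inserted as one contiguous chunk, and that in a deletions configuration `phrase` is the chunk removed from `base_sentence`. It also assumes whitespace-separated tokens. The function names and the example sentences are invented for illustration and are not taken from the corpus.

```python
def is_atomic_insertion(base_sentence: str, phrase: str, edited_sentence: str) -> bool:
    """True if inserting `phrase` at one position of `base_sentence`
    (token-wise) yields `edited_sentence`."""
    base = base_sentence.split()
    chunk = phrase.split()
    edited = edited_sentence.split()
    # The edited sentence must be longer by exactly the inserted tokens.
    if not chunk or len(edited) != len(base) + len(chunk):
        return False
    # Try every insertion point, including the start and the end.
    return any(base[:i] + chunk + base[i:] == edited
               for i in range(len(base) + 1))


def is_atomic_deletion(base_sentence: str, phrase: str, edited_sentence: str) -> bool:
    """A deletion is the mirror image: re-inserting `phrase` into the
    edited sentence must give back the base sentence."""
    return is_atomic_insertion(edited_sentence, phrase, base_sentence)


# Invented example rows, not taken from the dataset.
print(is_atomic_insertion("The cat sat .", "black", "The black cat sat ."))  # True
print(is_atomic_deletion("The black cat sat .", "black", "The cat sat ."))   # True
```

A check like this could serve as a sanity filter when iterating over examples, under the assumptions stated above.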

english_insertions

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/english_insertions')
  • Description:
A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.
  • License: No known license
  • Version: 1.0.0
  • Splits:
Split      Examples
'train'    13737796
  • Features:
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

english_deletions

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/english_deletions')
  • Description:
A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.
  • License: No known license
  • Version: 1.0.0
  • Splits:
Split      Examples
'train'    9352389
  • Features:
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

spanish_insertions

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/spanish_insertions')
  • Description:
A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.
  • License: No known license
  • Version: 1.0.0
  • Splits:
Split      Examples
'train'    1380934
  • Features:
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

spanish_deletions

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/spanish_deletions')
  • Description:
A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.
  • License: No known license
  • Version: 1.0.0
  • Splits:
Split      Examples
'train'    908276
  • Features:
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

french_insertions

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/french_insertions')
  • Description:
A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.
  • License: No known license
  • Version: 1.0.0
  • Splits:
Split      Examples
'train'    2038305
  • Features:
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

french_deletions

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/french_deletions')
  • Description:
A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.
  • License: No known license
  • Version: 1.0.0
  • Splits:
Split      Examples
'train'    2060242
  • Features:
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

italian_insertions

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/italian_insertions')
  • Description:
A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.
  • License: No known license
  • Version: 1.0.0
  • Splits:
Split      Examples
'train'    1078814
  • Features:
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

italian_deletions

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/italian_deletions')
  • Description:
A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.
  • License: No known license
  • Version: 1.0.0
  • Splits:
Split      Examples
'train'    583316
  • Features:
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

japanese_insertions

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/japanese_insertions')
  • Description:
A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.
  • License: No known license
  • Version: 1.0.0
  • Splits:
Split      Examples
'train'    2249527
  • Features:
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

japanese_deletions

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/japanese_deletions')
  • Description:
A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.
  • License: No known license
  • Version: 1.0.0
  • Splits:
Split      Examples
'train'    1352162
  • Features:
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

russian_insertions

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/russian_insertions')
  • Description:
A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.
  • License: No known license
  • Version: 1.0.0
  • Splits:
Split      Examples
'train'    1471638
  • Features:
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

russian_deletions

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/russian_deletions')
  • Description:
A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.
  • License: No known license
  • Version: 1.0.0
  • Splits:
Split      Examples
'train'    960976
  • Features:
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

chinese_insertions

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/chinese_insertions')
  • Description:
A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.
  • License: No known license
  • Version: 1.0.0
  • Splits:
Split      Examples
'train'    746509
  • Features:
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

chinese_deletions

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_atomic_edits/chinese_deletions')
  • Description:
A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contains ~43 million edits across 8 languages.

An atomic edit is defined as an edit e applied to a natural language expression S as the insertion, deletion, or substitution of a sub-expression P such that both the original expression S and the resulting expression e(S) are well-formed semantic constituents (MacCartney, 2009). In this corpus, we release such atomic insertions and deletions made to sentences in wikipedia.
  • License: No known license
  • Version: 1.0.0
  • Splits:
Split      Examples
'train'    467271
  • Features:
{
    "id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "base_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "phrase": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "edited_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
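
As a cross-check of the "~43 million edits across 8 languages" figure in the description, the 'train' split sizes listed on this page can be tallied. The sketch below only copies those counts into a dictionary; the variable names are illustrative.

```python
# 'train' example counts copied from the split tables on this page.
TRAIN_EXAMPLES = {
    "german_insertions": 3343403,
    "german_deletions": 1994329,
    "english_insertions": 13737796,
    "english_deletions": 9352389,
    "spanish_insertions": 1380934,
    "spanish_deletions": 908276,
    "french_insertions": 2038305,
    "french_deletions": 2060242,
    "italian_insertions": 1078814,
    "italian_deletions": 583316,
    "japanese_insertions": 2249527,
    "japanese_deletions": 1352162,
    "russian_insertions": 1471638,
    "russian_deletions": 960976,
    "chinese_insertions": 746509,
    "chinese_deletions": 467271,
}

total = sum(TRAIN_EXAMPLES.values())

# Split each configuration name into language and edit type to group the counts.
by_type = {"insertions": 0, "deletions": 0}
languages = set()
for name, count in TRAIN_EXAMPLES.items():
    language, edit_type = name.rsplit("_", 1)
    languages.add(language)
    by_type[edit_type] += count

print(f"{total:,} edits across {len(languages)} languages")
# 43,725,887 edits across 8 languages
print(f"{by_type['insertions']:,} insertions, {by_type['deletions']:,} deletions")
# 26,046,926 insertions, 17,678,961 deletions
```

The total of roughly 43.7 million is consistent with the figure quoted in the description.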