TFDS ora supporta il formato Croissant 🥐 ! Leggi la documentazione per saperne di più.

Questa pagina è stata tradotta dall'API Cloud Translation.

para_pat

Riferimenti:

el-en

Utilizzare il comando seguente per caricare questo set di dati in TFDS:

ds = tfds.load('huggingface:para_pat/el-en')

Descrizione :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

Licenza : CC BY 4.0
Versione : 1.1.0
Divide :

Diviso	Esempi
`'train'`	10855

Caratteristiche :

{
    "index": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "family_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "el",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    }
}

cs-en

Utilizzare il comando seguente per caricare questo set di dati in TFDS:

ds = tfds.load('huggingface:para_pat/cs-en')

Descrizione :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

Licenza : CC BY 4.0
Versione : 1.1.0
Divide :

Diviso	Esempi
`'train'`	78977

Caratteristiche :

{
    "index": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "family_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "cs",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-hu

Utilizzare il comando seguente per caricare questo set di dati in TFDS:

ds = tfds.load('huggingface:para_pat/en-hu')

Descrizione :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

Licenza : CC BY 4.0
Versione : 1.1.0
Divide :

Diviso	Esempi
`'train'`	42629

Caratteristiche :

{
    "index": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "family_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "en",
            "hu"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-ro

Utilizzare il comando seguente per caricare questo set di dati in TFDS:

ds = tfds.load('huggingface:para_pat/en-ro')

Descrizione :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

Licenza : CC BY 4.0
Versione : 1.1.0
Divide :

Diviso	Esempi
`'train'`	48789

Caratteristiche :

{
    "index": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "family_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "en",
            "ro"
        ],
        "id": null,
        "_type": "Translation"
    }
}

it-sk

Utilizzare il comando seguente per caricare questo set di dati in TFDS:

ds = tfds.load('huggingface:para_pat/en-sk')

Descrizione :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

Licenza : CC BY 4.0
Versione : 1.1.0
Divide :

Diviso	Esempi
`'train'`	23410

Caratteristiche :

{
    "index": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "family_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "en",
            "sk"
        ],
        "id": null,
        "_type": "Translation"
    }
}

it-uk

Utilizzare il comando seguente per caricare questo set di dati in TFDS:

ds = tfds.load('huggingface:para_pat/en-uk')

Descrizione :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

Licenza : CC BY 4.0
Versione : 1.1.0
Divide :

Diviso	Esempi
`'train'`	89226

Caratteristiche :

{
    "index": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "family_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "en",
            "uk"
        ],
        "id": null,
        "_type": "Translation"
    }
}

es-fr

Utilizzare il comando seguente per caricare questo set di dati in TFDS:

ds = tfds.load('huggingface:para_pat/es-fr')

Descrizione :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

Licenza : CC BY 4.0
Versione : 1.1.0
Divide :

Diviso	Esempi
`'train'`	32553

Caratteristiche :

{
    "index": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "family_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "es",
            "fr"
        ],
        "id": null,
        "_type": "Translation"
    }
}

fr-ru

Utilizzare il comando seguente per caricare questo set di dati in TFDS:

ds = tfds.load('huggingface:para_pat/fr-ru')

Descrizione :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

Licenza : CC BY 4.0
Versione : 1.1.0
Divide :

Diviso	Esempi
`'train'`	10889

Caratteristiche :

{
    "index": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "family_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "fr",
            "ru"
        ],
        "id": null,
        "_type": "Translation"
    }
}

de-fr

Utilizzare il comando seguente per caricare questo set di dati in TFDS:

ds = tfds.load('huggingface:para_pat/de-fr')

Descrizione :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

Licenza : CC BY 4.0
Versione : 1.1.0
Divide :

Diviso	Esempi
`'train'`	1167988

Caratteristiche :

{
    "translation": {
        "languages": [
            "de",
            "fr"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-ja

Utilizzare il comando seguente per caricare questo set di dati in TFDS:

ds = tfds.load('huggingface:para_pat/en-ja')

Descrizione :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

Licenza : CC BY 4.0
Versione : 1.1.0
Divide :

Diviso	Esempi
`'train'`	6170339

Caratteristiche :

{
    "translation": {
        "languages": [
            "en",
            "ja"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-es

Utilizzare il comando seguente per caricare questo set di dati in TFDS:

ds = tfds.load('huggingface:para_pat/en-es')

Descrizione :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

Licenza : CC BY 4.0
Versione : 1.1.0
Divide :

Diviso	Esempi
`'train'`	649396

Caratteristiche :

{
    "translation": {
        "languages": [
            "en",
            "es"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-fr

Utilizzare il comando seguente per caricare questo set di dati in TFDS:

ds = tfds.load('huggingface:para_pat/en-fr')

Descrizione :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

Licenza : CC BY 4.0
Versione : 1.1.0
Divide :

Diviso	Esempi
`'train'`	12223525

Caratteristiche :

{
    "translation": {
        "languages": [
            "en",
            "fr"
        ],
        "id": null,
        "_type": "Translation"
    }
}

de-en

Utilizzare il comando seguente per caricare questo set di dati in TFDS:

ds = tfds.load('huggingface:para_pat/de-en')

Descrizione :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

Licenza : CC BY 4.0
Versione : 1.1.0
Divide :

Diviso	Esempi
`'train'`	2165054

Caratteristiche :

{
    "translation": {
        "languages": [
            "de",
            "en"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-ko

Utilizzare il comando seguente per caricare questo set di dati in TFDS:

ds = tfds.load('huggingface:para_pat/en-ko')

Descrizione :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

Licenza : CC BY 4.0
Versione : 1.1.0
Divide :

Diviso	Esempi
`'train'`	2324357

Caratteristiche :

{
    "translation": {
        "languages": [
            "en",
            "ko"
        ],
        "id": null,
        "_type": "Translation"
    }
}

fr-ja

Utilizzare il comando seguente per caricare questo set di dati in TFDS:

ds = tfds.load('huggingface:para_pat/fr-ja')

Descrizione :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

Licenza : CC BY 4.0
Versione : 1.1.0
Divide :

Diviso	Esempi
`'train'`	313422

Caratteristiche :

{
    "translation": {
        "languages": [
            "fr",
            "ja"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-zh

Utilizzare il comando seguente per caricare questo set di dati in TFDS:

ds = tfds.load('huggingface:para_pat/en-zh')

Descrizione :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

Licenza : CC BY 4.0
Versione : 1.1.0
Divide :

Diviso	Esempi
`'train'`	4897841

Caratteristiche :

{
    "translation": {
        "languages": [
            "en",
            "zh"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-ru

Utilizzare il comando seguente per caricare questo set di dati in TFDS:

ds = tfds.load('huggingface:para_pat/en-ru')

Descrizione :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

Licenza : CC BY 4.0
Versione : 1.1.0
Divide :

Diviso	Esempi
`'train'`	4296399

Caratteristiche :

{
    "translation": {
        "languages": [
            "en",
            "ru"
        ],
        "id": null,
        "_type": "Translation"
    }
}

fr-ko

Utilizzare il comando seguente per caricare questo set di dati in TFDS:

ds = tfds.load('huggingface:para_pat/fr-ko')

Descrizione :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

Licenza : CC BY 4.0
Versione : 1.1.0
Divide :

Diviso	Esempi
`'train'`	120607

Caratteristiche :

{
    "index": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "family_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "fr",
            "ko"
        ],
        "id": null,
        "_type": "Translation"
    }
}

ru-uk

Utilizzare il comando seguente per caricare questo set di dati in TFDS:

ds = tfds.load('huggingface:para_pat/ru-uk')

Descrizione :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

Licenza : CC BY 4.0
Versione : 1.1.0
Divide :

Diviso	Esempi
`'train'`	85963

Caratteristiche :

{
    "index": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "family_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "ru",
            "uk"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-pt

Utilizzare il comando seguente per caricare questo set di dati in TFDS:

ds = tfds.load('huggingface:para_pat/en-pt')

Descrizione :

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

This dataset contains the developed parallel corpus from the open access Google
Patents dataset in 74 language pairs, comprising more than 68 million sentences
and 800 million tokens. Sentences were automatically aligned using the Hunalign algorithm
for the largest 22 language pairs, while the others were abstract (i.e. paragraph) aligned.

We demonstrate the capabilities of our corpus by training Neural Machine Translation
(NMT) models for the main 9 language pairs, with a total of 18 models.

Licenza : CC BY 4.0
Versione : 1.1.0
Divide :

Diviso	Esempi
`'train'`	23121

Caratteristiche :

{
    "index": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "family_id": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "languages": [
            "en",
            "pt"
        ],
        "id": null,
        "_type": "Translation"
    }
}