open_subtitles

References:

bs-eo

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:open_subtitles/bs-eo')

Description:

This is a new collection of translated movie subtitles from http://www.opensubtitles.org/.

Important: If you use the OpenSubtitle corpus: Please, add a link to http://www.opensubtitles.org/ to your website and to your reports and publications produced with the data!

This is a slightly cleaner version of the subtitle collection using improved sentence alignment and better language checking.

62 languages, 1,782 bitexts
total number of files: 3,735,070
total number of tokens: 22.10G
total number of sentence fragments: 3.35G

License: No known license
Version: 2018.0.0
Splits:

Split	Examples
`'train'`	10989

Features:

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "meta": {
        "year": {
            "dtype": "uint32",
            "id": null,
            "_type": "Value"
        },
        "imdbId": {
            "dtype": "uint32",
            "id": null,
            "_type": "Value"
        },
        "subtitleId": {
            "bs": {
                "dtype": "uint32",
                "id": null,
                "_type": "Value"
            },
            "eo": {
                "dtype": "uint32",
                "id": null,
                "_type": "Value"
            }
        },
        "sentenceIds": {
            "bs": {
                "feature": {
                    "dtype": "uint32",
                    "id": null,
                    "_type": "Value"
                },
                "length": -1,
                "id": null,
                "_type": "Sequence"
            },
            "eo": {
                "feature": {
                    "dtype": "uint32",
                    "id": null,
                    "_type": "Value"
                },
                "length": -1,
                "id": null,
                "_type": "Sequence"
            }
        }
    },
    "translation": {
        "languages": [
            "bs",
            "eo"
        ],
        "id": null,
        "_type": "Translation"
    }
}
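
To make the feature spec above concrete, here is a minimal sketch of what one record might look like and how the sentence pair could be pulled out of it. It assumes the record arrives as a plain Python dict in which the `translation` field maps each language code to a string; through TFDS the values would typically arrive as tensors or bytes instead. Every value in `example` is a made-up placeholder, and `to_pair` is a hypothetical helper, not part of TFDS or the dataset.

```python
from typing import Any, Dict, Tuple

# Hypothetical record shaped like the feature spec above.
# All values are placeholders, not real data from the corpus.
example: Dict[str, Any] = {
    "id": "0",
    "meta": {
        "year": 2000,
        "imdbId": 123456,
        "subtitleId": {"bs": 1, "eo": 2},
        # "length": -1 in the spec means these lists are variable-length.
        "sentenceIds": {"bs": [1], "eo": [1, 2]},
    },
    "translation": {"bs": "<bs text>", "eo": "<eo text>"},
}


def to_pair(record: Dict[str, Any], src: str, tgt: str) -> Tuple[str, str]:
    """Return the (source, target) strings from one record (hypothetical helper)."""
    translation = record["translation"]
    return translation[src], translation[tgt]


print(to_pair(example, "bs", "eo"))
```

The same helper would apply to the other configurations by swapping the language codes, since they share this layout.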

fr-hy

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:open_subtitles/fr-hy')

Description:

This is a new collection of translated movie subtitles from http://www.opensubtitles.org/.

Important: If you use the OpenSubtitle corpus: Please, add a link to http://www.opensubtitles.org/ to your website and to your reports and publications produced with the data!

This is a slightly cleaner version of the subtitle collection using improved sentence alignment and better language checking.

62 languages, 1,782 bitexts
total number of files: 3,735,070
total number of tokens: 22.10G
total number of sentence fragments: 3.35G

License: No known license
Version: 2018.0.0
Splits:

Split	Examples
`'train'`	668

Features:

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "meta": {
        "year": {
            "dtype": "uint32",
            "id": null,
            "_type": "Value"
        },
        "imdbId": {
            "dtype": "uint32",
            "id": null,
            "_type": "Value"
        },
        "subtitleId": {
            "fr": {
                "dtype": "uint32",
                "id": null,
                "_type": "Value"
            },
            "hy": {
                "dtype": "uint32",
                "id": null,
                "_type": "Value"
            }
        },
        "sentenceIds": {
            "fr": {
                "feature": {
                    "dtype": "uint32",
                    "id": null,
                    "_type": "Value"
                },
                "length": -1,
                "id": null,
                "_type": "Sequence"
            },
            "hy": {
                "feature": {
                    "dtype": "uint32",
                    "id": null,
                    "_type": "Value"
                },
                "length": -1,
                "id": null,
                "_type": "Sequence"
            }
        }
    },
    "translation": {
        "languages": [
            "fr",
            "hy"
        ],
        "id": null,
        "_type": "Translation"
    }
}

da-ru

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:open_subtitles/da-ru')

Description:

This is a new collection of translated movie subtitles from http://www.opensubtitles.org/.

Important: If you use the OpenSubtitle corpus: Please, add a link to http://www.opensubtitles.org/ to your website and to your reports and publications produced with the data!

This is a slightly cleaner version of the subtitle collection using improved sentence alignment and better language checking.

62 languages, 1,782 bitexts
total number of files: 3,735,070
total number of tokens: 22.10G
total number of sentence fragments: 3.35G

License: No known license
Version: 2018.0.0
Splits:

Split	Examples
`'train'`	7543012

Features:

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "meta": {
        "year": {
            "dtype": "uint32",
            "id": null,
            "_type": "Value"
        },
        "imdbId": {
            "dtype": "uint32",
            "id": null,
            "_type": "Value"
        },
        "subtitleId": {
            "da": {
                "dtype": "uint32",
                "id": null,
                "_type": "Value"
            },
            "ru": {
                "dtype": "uint32",
                "id": null,
                "_type": "Value"
            }
        },
        "sentenceIds": {
            "da": {
                "feature": {
                    "dtype": "uint32",
                    "id": null,
                    "_type": "Value"
                },
                "length": -1,
                "id": null,
                "_type": "Sequence"
            },
            "ru": {
                "feature": {
                    "dtype": "uint32",
                    "id": null,
                    "_type": "Value"
                },
                "length": -1,
                "id": null,
                "_type": "Sequence"
            }
        }
    },
    "translation": {
        "languages": [
            "da",
            "ru"
        ],
        "id": null,
        "_type": "Translation"
    }
}

en-hi

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:open_subtitles/en-hi')

Description:

This is a new collection of translated movie subtitles from http://www.opensubtitles.org/.

Important: If you use the OpenSubtitle corpus: Please, add a link to http://www.opensubtitles.org/ to your website and to your reports and publications produced with the data!

This is a slightly cleaner version of the subtitle collection using improved sentence alignment and better language checking.

62 languages, 1,782 bitexts
total number of files: 3,735,070
total number of tokens: 22.10G
total number of sentence fragments: 3.35G

License: No known license
Version: 2018.0.0
Splits:

Split	Examples
`'train'`	93016

Features:

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "meta": {
        "year": {
            "dtype": "uint32",
            "id": null,
            "_type": "Value"
        },
        "imdbId": {
            "dtype": "uint32",
            "id": null,
            "_type": "Value"
        },
        "subtitleId": {
            "en": {
                "dtype": "uint32",
                "id": null,
                "_type": "Value"
            },
            "hi": {
                "dtype": "uint32",
                "id": null,
                "_type": "Value"
            }
        },
        "sentenceIds": {
            "en": {
                "feature": {
                    "dtype": "uint32",
                    "id": null,
                    "_type": "Value"
                },
                "length": -1,
                "id": null,
                "_type": "Sequence"
            },
            "hi": {
                "feature": {
                    "dtype": "uint32",
                    "id": null,
                    "_type": "Value"
                },
                "length": -1,
                "id": null,
                "_type": "Sequence"
            }
        }
    },
    "translation": {
        "languages": [
            "en",
            "hi"
        ],
        "id": null,
        "_type": "Translation"
    }
}

bn-is

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:open_subtitles/bn-is')

Description:

This is a new collection of translated movie subtitles from http://www.opensubtitles.org/.

Important: If you use the OpenSubtitle corpus: Please, add a link to http://www.opensubtitles.org/ to your website and to your reports and publications produced with the data!

This is a slightly cleaner version of the subtitle collection using improved sentence alignment and better language checking.

62 languages, 1,782 bitexts
total number of files: 3,735,070
total number of tokens: 22.10G
total number of sentence fragments: 3.35G

License: No known license
Version: 2018.0.0
Splits:

Split	Examples
`'train'`	38272

Features:

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "meta": {
        "year": {
            "dtype": "uint32",
            "id": null,
            "_type": "Value"
        },
        "imdbId": {
            "dtype": "uint32",
            "id": null,
            "_type": "Value"
        },
        "subtitleId": {
            "bn": {
                "dtype": "uint32",
                "id": null,
                "_type": "Value"
            },
            "is": {
                "dtype": "uint32",
                "id": null,
                "_type": "Value"
            }
        },
        "sentenceIds": {
            "bn": {
                "feature": {
                    "dtype": "uint32",
                    "id": null,
                    "_type": "Value"
                },
                "length": -1,
                "id": null,
                "_type": "Sequence"
            },
            "is": {
                "feature": {
                    "dtype": "uint32",
                    "id": null,
                    "_type": "Value"
                },
                "length": -1,
                "id": null,
                "_type": "Sequence"
            }
        }
    },
    "translation": {
        "languages": [
            "bn",
            "is"
        ],
        "id": null,
        "_type": "Translation"
    }
}
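
Each configuration on this page is loaded with a name of the form `huggingface:open_subtitles/<pair>`, and in all five pairs shown (bs-eo, fr-hy, da-ru, en-hi, bn-is) the two language codes appear in alphabetical order. The sketch below builds such a name from two codes under the assumption that this ordering holds for other pairs as well, which this page does not confirm. `tfds_name` is a hypothetical helper, not a TFDS function.

```python
def tfds_name(lang_a: str, lang_b: str) -> str:
    """Build a TFDS name for an open_subtitles language pair (hypothetical helper).

    Assumes the two codes are joined in alphabetical order, as in the
    configurations listed on this page.
    """
    first, second = sorted((lang_a.lower(), lang_b.lower()))
    if first == second:
        raise ValueError("a language pair needs two different codes")
    return f"huggingface:open_subtitles/{first}-{second}"


print(tfds_name("is", "bn"))
```

The resulting string could then be passed to `tfds.load` in place of the literal names used above.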