References:
bs-eo
To load this dataset in TFDS, use the following command:
ds = tfds.load('huggingface:open_subtitles/bs-eo')
- Description:
This is a new collection of translated movie subtitles from http://www.opensubtitles.org/.
Important: If you use the OpenSubtitle corpus: Please, add a link to http://www.opensubtitles.org/ to your website and to your reports and publications produced with the data!
This is a slightly cleaner version of the subtitle collection using improved sentence alignment and better language checking.
62 languages, 1,782 bitexts
total number of files: 3,735,070
total number of tokens: 22.10G
total number of sentence fragments: 3.35G
- License: No known license
- Version: 2018.0.0
- Splits:
Split | Examples |
---|---|
'train' | 10989 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"meta": {
"year": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"imdbId": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"subtitleId": {
"bs": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"eo": {
"dtype": "uint32",
"id": null,
"_type": "Value"
}
},
"sentenceIds": {
"bs": {
"feature": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"eo": {
"feature": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
}
}
},
"translation": {
"languages": [
"bs",
"eo"
],
"id": null,
"_type": "Translation"
}
}
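Each record exposes its parallel text under the `translation` feature, keyed by language code. The following is a minimal sketch, assuming records have already been converted to plain Python dicts (for example with `tfds.as_numpy`, which yields text as `bytes`). The helper names `to_pair` and `_as_text` and the sample values are hypothetical and only mirror the schema above.

```python
from typing import Any, Mapping, Tuple


def _as_text(value: Any) -> str:
    """Decode bytes (as produced by tfds.as_numpy) to str; pass str through."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return str(value)


def to_pair(example: Mapping[str, Any], src: str = "bs", tgt: str = "eo") -> Tuple[str, str]:
    """Return (source, target) text from the 'translation' feature of one example."""
    translation = example["translation"]
    return _as_text(translation[src]), _as_text(translation[tgt])


# A hand-built record shaped like the feature schema above (all values are made up).
sample = {
    "id": b"0",
    "meta": {
        "year": 0,
        "imdbId": 0,
        "subtitleId": {"bs": 0, "eo": 0},
        "sentenceIds": {"bs": [1], "eo": [1]},
    },
    "translation": {"bs": b"Zdravo.", "eo": b"Saluton."},
}
print(to_pair(sample))  # ('Zdravo.', 'Saluton.')
```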
fr-hy
To load this dataset in TFDS, use the following command:
ds = tfds.load('huggingface:open_subtitles/fr-hy')
- Description:
This is a new collection of translated movie subtitles from http://www.opensubtitles.org/.
Important: If you use the OpenSubtitle corpus: Please, add a link to http://www.opensubtitles.org/ to your website and to your reports and publications produced with the data!
This is a slightly cleaner version of the subtitle collection using improved sentence alignment and better language checking.
62 languages, 1,782 bitexts
total number of files: 3,735,070
total number of tokens: 22.10G
total number of sentence fragments: 3.35G
- License: No known license
- Version: 2018.0.0
- Splits:
Split | Examples |
---|---|
'train' | 668 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"meta": {
"year": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"imdbId": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"subtitleId": {
"fr": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"hy": {
"dtype": "uint32",
"id": null,
"_type": "Value"
}
},
"sentenceIds": {
"fr": {
"feature": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"hy": {
"feature": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
}
}
},
"translation": {
"languages": [
"fr",
"hy"
],
"id": null,
"_type": "Translation"
}
}
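With only 668 examples in a single `train` split, any held-out set has to be carved out of `train`. TFDS accepts absolute-index split strings such as `train[:601]`; the hypothetical helper `holdout_split_strings` below only builds those strings. It makes no claim about how the slices are distributed across films, since slicing follows the stored order of the examples.

```python
from typing import Tuple


def holdout_split_strings(num_examples: int, val_fraction: float = 0.1) -> Tuple[str, str]:
    """Return (train_split, val_split) strings using TFDS absolute-index slicing."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    # Reserve at least one example for validation, rounding to the nearest count.
    val_size = max(1, round(num_examples * val_fraction))
    cut = num_examples - val_size
    return f"train[:{cut}]", f"train[{cut}:]"


train_split, val_split = holdout_split_strings(668, 0.1)
print(train_split, val_split)  # train[:601] train[601:]
```

The two strings could then be passed together, e.g. `tfds.load('huggingface:open_subtitles/fr-hy', split=[train_split, val_split])`, which returns one dataset per split string.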
da-ru
To load this dataset in TFDS, use the following command:
ds = tfds.load('huggingface:open_subtitles/da-ru')
- Description:
This is a new collection of translated movie subtitles from http://www.opensubtitles.org/.
Important: If you use the OpenSubtitle corpus: Please, add a link to http://www.opensubtitles.org/ to your website and to your reports and publications produced with the data!
This is a slightly cleaner version of the subtitle collection using improved sentence alignment and better language checking.
62 languages, 1,782 bitexts
total number of files: 3,735,070
total number of tokens: 22.10G
total number of sentence fragments: 3.35G
- License: No known license
- Version: 2018.0.0
- Splits:
Split | Examples |
---|---|
'train' | 7543012 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"meta": {
"year": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"imdbId": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"subtitleId": {
"da": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"ru": {
"dtype": "uint32",
"id": null,
"_type": "Value"
}
},
"sentenceIds": {
"da": {
"feature": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"ru": {
"feature": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
}
}
},
"translation": {
"languages": [
"da",
"ru"
],
"id": null,
"_type": "Translation"
}
}
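At 7,543,012 aligned examples, the da-ru pair is large enough that light filtering is usually affordable. One common heuristic for noisy subtitle alignments is a length-ratio filter; it is swapped in here for illustration and is not documented as part of this dataset's preparation. The sketch below assumes whitespace tokenization and an arbitrary `max_ratio` of 3.0; `length_ratio_ok` is a hypothetical name and the sample pairs are made up.

```python
def length_ratio_ok(src: str, tgt: str, max_ratio: float = 3.0) -> bool:
    """Return True if both sides are non-empty and their token counts differ by at most max_ratio."""
    n_src = len(src.split())  # whitespace tokens, a rough proxy for sentence length
    n_tgt = len(tgt.split())
    if n_src == 0 or n_tgt == 0:
        return False
    return max(n_src, n_tgt) / min(n_src, n_tgt) <= max_ratio


# Made-up (da, ru) pairs: a plausible one, an empty side, and a badly mismatched one.
pairs = [("Tak.", "Спасибо."), ("", "Да."), ("Det var en lang og kold nat.", "Да.")]
kept = [p for p in pairs if length_ratio_ok(*p)]
print(kept)  # [('Tak.', 'Спасибо.')]
```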
en-hi
To load this dataset in TFDS, use the following command:
ds = tfds.load('huggingface:open_subtitles/en-hi')
- Description:
This is a new collection of translated movie subtitles from http://www.opensubtitles.org/.
Important: If you use the OpenSubtitle corpus: Please, add a link to http://www.opensubtitles.org/ to your website and to your reports and publications produced with the data!
This is a slightly cleaner version of the subtitle collection using improved sentence alignment and better language checking.
62 languages, 1,782 bitexts
total number of files: 3,735,070
total number of tokens: 22.10G
total number of sentence fragments: 3.35G
- License: No known license
- Version: 2018.0.0
- Splits:
Split | Examples |
---|---|
'train' | 93016 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"meta": {
"year": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"imdbId": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"subtitleId": {
"en": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"hi": {
"dtype": "uint32",
"id": null,
"_type": "Value"
}
},
"sentenceIds": {
"en": {
"feature": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"hi": {
"feature": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
}
}
},
"translation": {
"languages": [
"en",
"hi"
],
"id": null,
"_type": "Translation"
}
}
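Since `meta.imdbId` presumably identifies the film each example comes from, one way to build an evaluation set without the same film appearing on both sides is to split by that id rather than by example. The sketch below hashes the id with `zlib.crc32` so the assignment is deterministic across runs. The function name `split_for_film`, the 5% default, and the split labels are assumptions for illustration.

```python
import zlib


def split_for_film(imdb_id: int, val_percent: int = 5) -> str:
    """Return 'validation' or 'train' for a film; all examples of one film share the result."""
    # crc32 gives the same bucket in every process, unlike Python's salted hash() for str.
    bucket = zlib.crc32(str(imdb_id).encode("ascii")) % 100
    return "validation" if bucket < val_percent else "train"


# Every example carrying the same imdbId lands in the same split.
example = {"meta": {"imdbId": 1234567}}  # made-up id
print(split_for_film(example["meta"]["imdbId"]))
```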
bn-is
To load this dataset in TFDS, use the following command:
ds = tfds.load('huggingface:open_subtitles/bn-is')
- Description:
This is a new collection of translated movie subtitles from http://www.opensubtitles.org/.
Important: If you use the OpenSubtitle corpus: Please, add a link to http://www.opensubtitles.org/ to your website and to your reports and publications produced with the data!
This is a slightly cleaner version of the subtitle collection using improved sentence alignment and better language checking.
62 languages, 1,782 bitexts
total number of files: 3,735,070
total number of tokens: 22.10G
total number of sentence fragments: 3.35G
- License: No known license
- Version: 2018.0.0
- Splits:
Split | Examples |
---|---|
'train' | 38272 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"meta": {
"year": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"imdbId": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"subtitleId": {
"bn": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"is": {
"dtype": "uint32",
"id": null,
"_type": "Value"
}
},
"sentenceIds": {
"bn": {
"feature": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"is": {
"feature": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
}
}
},
"translation": {
"languages": [
"bn",
"is"
],
"id": null,
"_type": "Translation"
}
}
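The `meta.sentenceIds` feature holds a variable-length sequence per language (`"length": -1`), which suggests a single example can link more than one sentence on either side. Assuming that reading, the sketch below keeps only 1:1 alignments. `alignment_shape` and `one_to_one` are hypothetical helpers that operate on records already converted to Python dicts; `len()` works the same on lists and on the NumPy arrays that `tfds.as_numpy` would produce.

```python
from typing import Any, Iterable, Iterator, Mapping, Tuple


def alignment_shape(example: Mapping[str, Any], src: str = "bn", tgt: str = "is") -> Tuple[int, int]:
    """Count the sentence ids on each side of one aligned example."""
    ids = example["meta"]["sentenceIds"]
    return len(ids[src]), len(ids[tgt])


def one_to_one(
    examples: Iterable[Mapping[str, Any]], src: str = "bn", tgt: str = "is"
) -> Iterator[Mapping[str, Any]]:
    """Yield only examples whose alignment links exactly one sentence on each side."""
    for ex in examples:
        if alignment_shape(ex, src, tgt) == (1, 1):
            yield ex


# Made-up records: a 1:1 link and a 2:1 link.
a = {"meta": {"sentenceIds": {"bn": [3], "is": [5]}}}
b = {"meta": {"sentenceIds": {"bn": [4, 5], "is": [6]}}}
print([alignment_shape(x) for x in (a, b)])  # [(1, 1), (2, 1)]
```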