Tài liệu tham khảo:
v-nl
Sử dụng lệnh sau để tải tập dữ liệu này trong TFDS:
ds = tfds.load('huggingface:php/fi-nl')
- Sự miêu tả :
A parallel corpus originally extracted from http://se.php.net/download-docs.php. The original documents are written in English and have been partly translated into 21 languages. The original manuals contain about 500,000 words. The amount of actually translated texts varies for different languages between 50,000 and 380,000 words. The corpus is rather noisy and may include parts from the English original in some of the translations. The corpus is tokenized and each language pair has been sentence aligned.
23 languages, 252 bitexts
total number of files: 71,414
total number of tokens: 3.28M
total number of sentence fragments: 1.38M
- Giấy phép : Không có giấy phép được biết đến
- Phiên bản : 1.0.0
- Chia tách :
Tách ra | Ví dụ |
---|---|
'train' | 27870 |
- Đặc trưng :
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"fi",
"nl"
],
"id": null,
"_type": "Translation"
}
}
nó-ro
Sử dụng lệnh sau để tải tập dữ liệu này trong TFDS:
ds = tfds.load('huggingface:php/it-ro')
- Sự miêu tả :
A parallel corpus originally extracted from http://se.php.net/download-docs.php. The original documents are written in English and have been partly translated into 21 languages. The original manuals contain about 500,000 words. The amount of actually translated texts varies for different languages between 50,000 and 380,000 words. The corpus is rather noisy and may include parts from the English original in some of the translations. The corpus is tokenized and each language pair has been sentence aligned.
23 languages, 252 bitexts
total number of files: 71,414
total number of tokens: 3.28M
total number of sentence fragments: 1.38M
- Giấy phép : Không có giấy phép được biết đến
- Phiên bản : 1.0.0
- Chia tách :
Tách ra | Ví dụ |
---|---|
'train' | 28507 |
- Đặc trưng :
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"it",
"ro"
],
"id": null,
"_type": "Translation"
}
}
nl-sv
Sử dụng lệnh sau để tải tập dữ liệu này trong TFDS:
ds = tfds.load('huggingface:php/nl-sv')
- Sự miêu tả :
A parallel corpus originally extracted from http://se.php.net/download-docs.php. The original documents are written in English and have been partly translated into 21 languages. The original manuals contain about 500,000 words. The amount of actually translated texts varies for different languages between 50,000 and 380,000 words. The corpus is rather noisy and may include parts from the English original in some of the translations. The corpus is tokenized and each language pair has been sentence aligned.
23 languages, 252 bitexts
total number of files: 71,414
total number of tokens: 3.28M
total number of sentence fragments: 1.38M
- Giấy phép : Không có giấy phép được biết đến
- Phiên bản : 1.0.0
- Chia tách :
Tách ra | Ví dụ |
---|---|
'train' | 28079 |
- Đặc trưng :
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"nl",
"sv"
],
"id": null,
"_type": "Translation"
}
}
en-nó
Sử dụng lệnh sau để tải tập dữ liệu này trong TFDS:
ds = tfds.load('huggingface:php/en-it')
- Sự miêu tả :
A parallel corpus originally extracted from http://se.php.net/download-docs.php. The original documents are written in English and have been partly translated into 21 languages. The original manuals contain about 500,000 words. The amount of actually translated texts varies for different languages between 50,000 and 380,000 words. The corpus is rather noisy and may include parts from the English original in some of the translations. The corpus is tokenized and each language pair has been sentence aligned.
23 languages, 252 bitexts
total number of files: 71,414
total number of tokens: 3.28M
total number of sentence fragments: 1.38M
- Giấy phép : Không có giấy phép được biết đến
- Phiên bản : 1.0.0
- Chia tách :
Tách ra | Ví dụ |
---|---|
'train' | 35538 |
- Đặc trưng :
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"en",
"it"
],
"id": null,
"_type": "Translation"
}
}
en-fr
Sử dụng lệnh sau để tải tập dữ liệu này trong TFDS:
ds = tfds.load('huggingface:php/en-fr')
- Sự miêu tả :
A parallel corpus originally extracted from http://se.php.net/download-docs.php. The original documents are written in English and have been partly translated into 21 languages. The original manuals contain about 500,000 words. The amount of actually translated texts varies for different languages between 50,000 and 380,000 words. The corpus is rather noisy and may include parts from the English original in some of the translations. The corpus is tokenized and each language pair has been sentence aligned.
23 languages, 252 bitexts
total number of files: 71,414
total number of tokens: 3.28M
total number of sentence fragments: 1.38M
- Giấy phép : Không có giấy phép được biết đến
- Phiên bản : 1.0.0
- Chia tách :
Tách ra | Ví dụ |
---|---|
'train' | 42222 |
- Đặc trưng :
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"en",
"fr"
],
"id": null,
"_type": "Translation"
}
}