References:
de
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:mlsum/de')
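The load command above can be wrapped in a small helper that checks the configuration and split names before calling TFDS. This is a sketch under stated assumptions. `mlsum_name` and `load_mlsum` are hypothetical helpers, not part of TFDS. The sketch also assumes that the `huggingface:` namespace needs the Hugging Face `datasets` package installed alongside `tensorflow_datasets`. The configuration names come from this page; the Turkish configuration is `tu`, not the ISO code `tr`.

```python
from typing import Tuple

# Configuration and split names as listed on this page (Turkish is "tu", not "tr").
MLSUM_CONFIGS: Tuple[str, ...] = ("de", "es", "fr", "ru", "tu")
MLSUM_SPLITS: Tuple[str, ...] = ("train", "validation", "test")


def mlsum_name(config: str) -> str:
    """Build the TFDS dataset name for one MLSUM configuration (hypothetical helper)."""
    if config not in MLSUM_CONFIGS:
        raise ValueError(f"unknown MLSUM config {config!r}; expected one of {MLSUM_CONFIGS}")
    return f"huggingface:mlsum/{config}"


def load_mlsum(config: str, split: str = "train"):
    """Load one split as a tf.data.Dataset (hypothetical helper; downloads data on first use)."""
    if split not in MLSUM_SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {MLSUM_SPLITS}")
    # Imported lazily so that mlsum_name() works without TFDS installed.
    # Assumption: the Hugging Face `datasets` package is installed as well.
    import tensorflow_datasets as tfds

    return tfds.load(mlsum_name(config), split=split)
```

For example, `load_mlsum('ru', 'validation')` would request the 750-example Russian validation split. A typo such as `'tr'` fails immediately instead of failing during a download.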
- Description:
We present MLSUM, the first large-scale MultiLingual SUMmarization dataset.
Obtained from online newspapers, it contains 1.5M+ article/summary pairs in five languages: French, German, Spanish, Russian, and Turkish.
Together with English newspapers from the popular CNN/Daily Mail dataset, the collected data form a large-scale multilingual dataset that can enable new research directions for the text summarization community.
We report cross-lingual comparative analyses based on state-of-the-art systems.
These highlight existing biases which motivate the use of a multilingual dataset.
- License: No known license.
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 10701 |
'train' | 220887 |
'validation' | 11394 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"summary": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"topic": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"url": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"title": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"date": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
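Every feature in the schema above has dtype `string`. When a split is iterated with `tfds.as_numpy`, each field is assumed to arrive as UTF-8 encoded `bytes`. The sketch below converts one such example into plain Python strings. `decode_example` is a hypothetical helper. The commented usage lines need network access to download the data.

```python
from typing import Dict, Mapping

# The six features declared in the schema above; each has dtype "string".
MLSUM_FEATURES = ("text", "summary", "topic", "url", "title", "date")


def decode_example(example: Mapping[str, bytes]) -> Dict[str, str]:
    """Turn one example from tfds.as_numpy() into a dict of Python str.

    Assumption: TFDS yields each string feature as UTF-8 encoded bytes.
    """
    return {key: example[key].decode("utf-8") for key in MLSUM_FEATURES}


# Usage sketch (downloads the dataset, so it is left commented out):
# import tensorflow_datasets as tfds
# ds = tfds.load('huggingface:mlsum/de', split='validation')
# for ex in tfds.as_numpy(ds.take(1)):
#     print(decode_example(ex)["title"])
```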
es
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:mlsum/es')
- Description:
We present MLSUM, the first large-scale MultiLingual SUMmarization dataset.
Obtained from online newspapers, it contains 1.5M+ article/summary pairs in five languages: French, German, Spanish, Russian, and Turkish.
Together with English newspapers from the popular CNN/Daily Mail dataset, the collected data form a large-scale multilingual dataset that can enable new research directions for the text summarization community.
We report cross-lingual comparative analyses based on state-of-the-art systems.
These highlight existing biases which motivate the use of a multilingual dataset.
- License: No known license.
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 13920 |
'train' | 266367 |
'validation' | 10358 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"summary": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"topic": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"url": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"title": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"date": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
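Every example carries a `topic` string, so the topic distribution of a split can be inspected with a plain counter. The sketch below is minimal and assumes examples arrive as dicts of `bytes`, as produced by `tfds.as_numpy`. `topic_counts` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, Mapping


def topic_counts(examples: Iterable[Mapping[str, bytes]]) -> Counter:
    """Count how many examples carry each value of the `topic` feature."""
    return Counter(ex["topic"].decode("utf-8") for ex in examples)


# Usage sketch (downloads the dataset):
# import tensorflow_datasets as tfds
# ds = tfds.load('huggingface:mlsum/es', split='validation')
# print(topic_counts(tfds.as_numpy(ds)).most_common(5))
```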
fr
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:mlsum/fr')
- Description:
We present MLSUM, the first large-scale MultiLingual SUMmarization dataset.
Obtained from online newspapers, it contains 1.5M+ article/summary pairs in five languages: French, German, Spanish, Russian, and Turkish.
Together with English newspapers from the popular CNN/Daily Mail dataset, the collected data form a large-scale multilingual dataset that can enable new research directions for the text summarization community.
We report cross-lingual comparative analyses based on state-of-the-art systems.
These highlight existing biases which motivate the use of a multilingual dataset.
- License: No known license.
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 15828 |
'train' | 392902 |
'validation' | 16059 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"summary": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"topic": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"url": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"title": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"date": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
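For training a summarizer, `text` is the natural input and `summary` the target. The sketch below turns a stream of examples into `(article, summary)` string pairs. `to_pairs` is a hypothetical helper. Dropping examples whose article or summary is empty after stripping whitespace is an assumed cleaning step, not something this page prescribes.

```python
from typing import Iterable, Iterator, Mapping, Tuple


def to_pairs(examples: Iterable[Mapping[str, bytes]]) -> Iterator[Tuple[str, str]]:
    """Yield (article, summary) pairs from examples shaped like tfds.as_numpy() output."""
    for ex in examples:
        article = ex["text"].decode("utf-8").strip()
        summary = ex["summary"].decode("utf-8").strip()
        # Assumed cleaning step: skip pairs where either side is empty.
        if article and summary:
            yield article, summary


# Usage sketch (downloads the dataset):
# import tensorflow_datasets as tfds
# ds = tfds.load('huggingface:mlsum/fr', split='train')
# pairs = to_pairs(tfds.as_numpy(ds))
```

Because `to_pairs` is a generator, it streams through large splits such as the 392,902-example French train split without holding them in memory.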
ru
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:mlsum/ru')
- Description:
We present MLSUM, the first large-scale MultiLingual SUMmarization dataset.
Obtained from online newspapers, it contains 1.5M+ article/summary pairs in five languages: French, German, Spanish, Russian, and Turkish.
Together with English newspapers from the popular CNN/Daily Mail dataset, the collected data form a large-scale multilingual dataset that can enable new research directions for the text summarization community.
We report cross-lingual comparative analyses based on state-of-the-art systems.
These highlight existing biases which motivate the use of a multilingual dataset.
- License: No known license.
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 757 |
'train' | 25556 |
'validation' | 750 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"summary": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"topic": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"url": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"title": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"date": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
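A quick per-example statistic for the `text` and `summary` features is how much shorter the summary is than the article. The sketch below counts whitespace-separated tokens. This is a simplifying assumption that ignores punctuation and proper tokenization. `compression_ratio` is a hypothetical helper.

```python
def compression_ratio(text: str, summary: str) -> float:
    """Article length divided by summary length, both in whitespace-separated tokens."""
    text_tokens = len(text.split())
    summary_tokens = len(summary.split())
    if summary_tokens == 0:
        raise ValueError("summary is empty")
    return text_tokens / summary_tokens
```

Applied to decoded examples, a ratio of 3.0 means the article has three times as many tokens as its summary.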
tu
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:mlsum/tu')
- Description:
We present MLSUM, the first large-scale MultiLingual SUMmarization dataset.
Obtained from online newspapers, it contains 1.5M+ article/summary pairs in five languages: French, German, Spanish, Russian, and Turkish.
Together with English newspapers from the popular CNN/Daily Mail dataset, the collected data form a large-scale multilingual dataset that can enable new research directions for the text summarization community.
We report cross-lingual comparative analyses based on state-of-the-art systems.
These highlight existing biases which motivate the use of a multilingual dataset.
- License: No known license.
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 12775 |
'train' | 249277 |
'validation' | 11565 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"summary": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"topic": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"url": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"title": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"date": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
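The split tables of the five configurations can be tallied to compare their sizes. Summing them gives 1,259,096 article/summary pairs, ranging from 27,063 (`ru`) to 424,789 (`fr`). This is below the 1.5M+ quoted in the description, which also refers to English CNN/Daily Mail data that is not among these configurations. The numbers below are copied from the tables on this page. `config_totals` and `train_share` are hypothetical helpers.

```python
from typing import Dict

# Split sizes copied from the tables on this page.
SPLIT_SIZES: Dict[str, Dict[str, int]] = {
    "de": {"train": 220887, "validation": 11394, "test": 10701},
    "es": {"train": 266367, "validation": 10358, "test": 13920},
    "fr": {"train": 392902, "validation": 16059, "test": 15828},
    "ru": {"train": 25556, "validation": 750, "test": 757},
    "tu": {"train": 249277, "validation": 11565, "test": 12775},
}


def config_totals(sizes: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Total number of examples per configuration, summed over its splits."""
    return {config: sum(splits.values()) for config, splits in sizes.items()}


def train_share(sizes: Dict[str, Dict[str, int]]) -> Dict[str, float]:
    """Fraction of each configuration's examples that fall in the train split."""
    return {config: splits["train"] / sum(splits.values()) for config, splits in sizes.items()}


totals = config_totals(SPLIT_SIZES)
grand_total = sum(totals.values())
```

Every configuration puts roughly 91 to 95 percent of its examples in `train`.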