References:
Use the following command to load this dataset in TFDS:
```python
import tensorflow_datasets as tfds
ds = tfds.load('huggingface:msr_text_compression')
```
- Description:
This dataset contains sentences and short paragraphs with corresponding shorter (compressed) versions. There are up to five compressions for each input text, together with quality judgements of their meaning preservation and grammaticality. The dataset is derived using source texts from the Open American National Corpus (www.anc.org) and crowd-sourcing.
- License: Microsoft Research Data License Agreement
- Version: 1.1.0
- Splits:
Split | Examples |
---|---|
'test' | 785 |
'train' | 4936 |
'validation' | 447 |
- Features:
```json
{
"source_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"domain": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"source_text": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"targets": {
"feature": {
"compressed_text": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"judge_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"num_ratings": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"ratings": {
"feature": {
"dtype": "int64",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
}
},
"length": -1,
"id": null,
"_type": "Sequence"
}
}
```
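The `targets` feature is a `Sequence` over a dict of per-compression fields. The sketch below assumes a decoded record arrives as plain Python values, with `targets` laid out as a dict of parallel lists (the usual decoding of a `Sequence` over a dict of features). It flattens those lists into one row per compression and computes a simple whitespace-token compression ratio. The helper names (`unpack_targets`, `compression_ratio`, `summarize`), the token-based ratio, and the sample record are hypothetical illustrations, not part of the dataset or its tooling. The card does not say how the integer `ratings` map to meaning preservation or grammaticality, so they are passed through unchanged.

```python
from typing import Any, Dict, List


def unpack_targets(targets: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Turn a column-oriented `targets` dict into one dict per compression.

    Assumes a dict-of-lists layout: each key maps to a list with one
    entry per compression of the source text.
    """
    keys = ["compressed_text", "judge_id", "num_ratings", "ratings"]
    n = len(targets["compressed_text"])
    return [{k: targets[k][i] for k in keys} for i in range(n)]


def compression_ratio(source: str, compressed: str) -> float:
    """Whitespace-token length of the compression divided by that of the source."""
    n_src = len(source.split())
    if n_src == 0:
        raise ValueError("source_text is empty")
    return len(compressed.split()) / n_src


def summarize(example: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Per-compression token ratio, with the raw rating fields passed through."""
    rows = []
    for t in unpack_targets(example["targets"]):
        rows.append({
            "judge_id": t["judge_id"],
            "ratio": compression_ratio(example["source_text"], t["compressed_text"]),
            "num_ratings": t["num_ratings"],
            "ratings": list(t["ratings"]),  # semantics of values not specified
        })
    return rows


# Hypothetical record invented for illustration (not taken from the dataset).
example = {
    "source_id": "hypothetical-0",
    "domain": "hypothetical",
    "source_text": "The quick brown fox jumps over the lazy dog today",
    "targets": {
        "compressed_text": ["The fox jumps over the dog", "A fox jumps"],
        "judge_id": ["j1", "j2"],
        "num_ratings": [2, 1],
        "ratings": [[3, 4], [2]],
    },
}

for row in summarize(example):
    print(row)
```

A token ratio is only one possible measure of compression; a character-based ratio would be an equally reasonable choice, and neither is prescribed by the dataset.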