msr_text_compression

References:

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:msr_text_compression')
  • Description:
This dataset contains sentences and short paragraphs with corresponding shorter (compressed) versions. There are up to five compressions for each input text, together with quality judgements of their meaning preservation and grammaticality. The dataset is derived using source texts from the Open American National Corpus (www.anc.org) and crowd-sourcing.
  • License: Microsoft Research Data License Agreement
  • Version: 1.1.0
  • Splits:
Split Examples
'test' 785
'train' 4936
'validation' 447
  • Features:
{
    "source_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "domain": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "source_text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "targets": {
        "feature": {
            "compressed_text": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "judge_id": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "num_ratings": {
                "dtype": "int64",
                "id": null,
                "_type": "Value"
            },
            "ratings": {
                "feature": {
                    "dtype": "int64",
                    "id": null,
                    "_type": "Value"
                },
                "length": -1,
                "id": null,
                "_type": "Sequence"
            }
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}
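
The schema above nests a variable-length list of compressions under "targets", each with its own variable-length "ratings" list. The sketch below shows one way to flatten such a record and summarize each compression. It assumes the record arrives as a plain Python dict in which "targets" is column-oriented (a dict of parallel lists), which is how a Sequence of a dict feature is commonly exposed. The helper names `iter_targets` and `summarize`, the sample record, and the choice to average the ratings are all illustrative assumptions, not part of the dataset or of TFDS.

```python
from statistics import mean


def iter_targets(record):
    """Yield one dict per compression from the column-oriented 'targets' field."""
    targets = record["targets"]
    keys = list(targets)
    # zip the parallel lists so each tuple holds the values of one compression
    for values in zip(*(targets[k] for k in keys)):
        yield dict(zip(keys, values))


def summarize(record):
    """Return the mean rating and character-level compression ratio per target."""
    source_len = len(record["source_text"])
    rows = []
    for target in iter_targets(record):
        ratings = list(target["ratings"])
        text = target["compressed_text"]
        rows.append({
            "compressed_text": text,
            # Averaging is an assumption; the schema only says ratings are int64.
            "mean_rating": mean(ratings) if ratings else None,
            "compression_ratio": len(text) / source_len if source_len else None,
        })
    return rows


# Hypothetical record shaped like the feature schema (values are made up).
example = {
    "source_id": "demo-1",
    "domain": "demo",
    "source_text": "The quick brown fox jumps over the lazy dog.",
    "targets": {
        "compressed_text": ["The fox jumps over the dog.", "Fox jumps."],
        "judge_id": ["judge-a", "judge-b"],
        "num_ratings": [2, 2],
        "ratings": [[6, 5], [3, 2]],
    },
}

rows = summarize(example)
# Pick the compression with the highest mean rating among rated targets.
best = max((r for r in rows if r["mean_rating"] is not None),
           key=lambda r: r["mean_rating"])
```

If the records are instead row-oriented (a list of dicts under "targets"), `iter_targets` can simply iterate over that list; the rest of the sketch is unchanged.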