qa_srl

Tài liệu tham khảo:

văn bản đơn giản

Sử dụng lệnh sau để tải tập dữ liệu này trong TFDS:

ds = tfds.load('huggingface:qa_srl/plain_text')
  • Sự miêu tả :
The dataset contains question-answer pairs to model verbal predicate-argument structure. The questions start with wh-words (Who, What, Where, What, etc.) and contain a verb predicate in the sentence; the answers are phrases in the sentence. 
There were 2 datsets used in the paper, newswire and wikipedia. Unfortunately the newswiredataset is built from CoNLL-2009 English training set that is covered under license
Thus, we are providing only Wikipedia training set here. Please check README.md for more details on newswire dataset.
For the Wikipedia domain, randomly sampled sentences from the English Wikipedia (excluding questions and sentences with fewer than 10 or more than 60 words) were taken.
This new dataset is designed to solve this great NLP task and is crafted with a lot of care.
  • Giấy phép : Không có giấy phép được biết đến
  • Phiên bản : 1.0.0
  • Chia tách :
Tách ra Ví dụ
'test' 2201
'train' 6414
'validation' 2183
  • Đặc trưng :
{
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sent_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "predicate_idx": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "predicate": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "question": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "answers": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}