References:
plain_text
Use the following command to load this dataset in TFDS:
import tensorflow_datasets as tfds

ds = tfds.load('huggingface:qa_srl/plain_text')
- Description:
The dataset contains question-answer pairs that model verbal predicate-argument structure. The questions start with wh-words (Who, What, Where, When, etc.) and contain a verb predicate from the sentence; the answers are phrases in the sentence.
There were 2 datasets used in the paper: newswire and Wikipedia. Unfortunately, the newswire dataset is built from the CoNLL-2009 English training set, which is covered under a license.
Thus, we are providing only the Wikipedia training set here. Please check README.md for more details on the newswire dataset.
For the Wikipedia domain, sentences were randomly sampled from the English Wikipedia, excluding questions and sentences with fewer than 10 or more than 60 words.
- License: Unknown license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 2201 |
'train' | 6414 |
'validation' | 2183 |
- Features:
{
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sent_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"predicate_idx": {
"dtype": "int32",
"id": null,
"_type": "Value"
},
"predicate": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"question": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"answers": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
}
}
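The feature dictionary above reads as follows: `sentence`, `sent_id`, and `predicate` are scalar strings, `predicate_idx` is an int32, and `question` and `answers` are variable-length sequences of strings (`"length": -1` means unbounded). A minimal sketch of validating a record against this schema in plain Python — the example record below is hypothetical and only illustrates the shape, it is not taken from the dataset:

```python
# Schema derived from the feature dict above: three scalar string fields,
# one int32 field, and two variable-length string sequences.
SCALAR_STR = ("sentence", "sent_id", "predicate")
SEQ_STR = ("question", "answers")

def check_record(rec: dict) -> bool:
    """Return True if rec matches the qa_srl feature schema sketched above."""
    if not all(isinstance(rec.get(k), str) for k in SCALAR_STR):
        return False
    if not isinstance(rec.get("predicate_idx"), int):
        return False
    for key in SEQ_STR:
        seq = rec.get(key)
        if not (isinstance(seq, list) and all(isinstance(x, str) for x in seq)):
            return False
    return True

# Hypothetical record shaped like the schema (not real dataset content).
record = {
    "sentence": "The committee approved the new budget on Friday .",
    "sent_id": "wiki1:0",
    "predicate_idx": 2,
    "predicate": "approved",
    "question": ["Who", "approved", "something", "?"],
    "answers": ["The committee"],
}
print(check_record(record))  # True
```

Note that `question` holds the tokens of one generated question and `answers` holds one or more answer phrases; the two sequences are independent lengths, which is why both are declared with `"length": -1`.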