References:
CDSC-e
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:cdsc/cdsc-e')
- Description:
Polish CDSCorpus consists of 10K Polish sentence pairs that are human-annotated for semantic relatedness and entailment. The dataset may be used for the evaluation of compositional distributional semantics models of Polish. The dataset was presented at ACL 2017. Please refer to Wróblewska and Krasnowska-Kieraś (2017) for a detailed description of the resource.
- License: CC BY-NC-SA 4.0
- Version: 1.1.0
- Splits:
Split | Examples |
---|---|
'test' | 1000 |
'train' | 8000 |
'validation' | 1000 |
- Features:
{
    "pair_ID": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "sentence_A": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence_B": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "entailment_judgment": {
        "num_classes": 3,
        "names": [
            "NEUTRAL",
            "CONTRADICTION",
            "ENTAILMENT"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    }
}
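Because `entailment_judgment` is a `ClassLabel`, loaded examples carry an integer class index, not the label string. The sketch below shows one way to convert between the two. It assumes that indices follow the order of the `names` list in the feature spec. The names `ENTAILMENT_NAMES`, `label_name`, and `label_index` are hypothetical helpers written for this illustration, not part of TFDS.

```python
# Label names copied from the "names" list of the CDSC-e feature spec.
# Assumption: the integer class index is the position in this list.
ENTAILMENT_NAMES = ["NEUTRAL", "CONTRADICTION", "ENTAILMENT"]


def label_name(index: int) -> str:
    """Map an integer class index to its label string."""
    if not 0 <= index < len(ENTAILMENT_NAMES):
        raise ValueError(f"class index out of range: {index}")
    return ENTAILMENT_NAMES[index]


def label_index(name: str) -> int:
    """Map a label string back to its integer class index."""
    return ENTAILMENT_NAMES.index(name)


if __name__ == "__main__":
    # Hypothetical record with placeholder sentences, for illustration only.
    example = {
        "pair_ID": 1,
        "sentence_A": "<sentence A>",
        "sentence_B": "<sentence B>",
        "entailment_judgment": 2,
    }
    print(label_name(example["entailment_judgment"]))
```

Keeping the mapping in a single list mirrors the feature spec, so a change to the label order only needs to be made in one place.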
CDSC-R
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:cdsc/cdsc-r')
- Description:
Polish CDSCorpus consists of 10K Polish sentence pairs that are human-annotated for semantic relatedness and entailment. The dataset may be used for the evaluation of compositional distributional semantics models of Polish. The dataset was presented at ACL 2017. Please refer to Wróblewska and Krasnowska-Kieraś (2017) for a detailed description of the resource.
- License: CC BY-NC-SA 4.0
- Version: 1.1.0
- Splits:
Split | Examples |
---|---|
'test' | 1000 |
'train' | 8000 |
'validation' | 1000 |
- Features:
{
    "pair_ID": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "sentence_A": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence_B": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "relatedness_score": {
        "dtype": "float32",
        "id": null,
        "_type": "Value"
    }
}
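The CDSC-R feature spec can be mirrored in a small typed record, which makes downstream code less dependent on raw dictionary keys. The sketch below is one possible shape, not an official API: `CdscRExample` and `parse_cdsc_r` are hypothetical names, and the byte-string handling assumes that string features may arrive as UTF-8 `bytes` (as can happen when examples are converted to NumPy values).

```python
from dataclasses import dataclass
from typing import Any, Mapping

# Feature keys as listed in the CDSC-R feature spec.
EXPECTED_KEYS = {"pair_ID", "sentence_A", "sentence_B", "relatedness_score"}


@dataclass(frozen=True)
class CdscRExample:
    """Typed view of one CDSC-R example (hypothetical helper class)."""
    pair_id: int
    sentence_a: str
    sentence_b: str
    relatedness_score: float


def _text(value: Any) -> str:
    # Assumption: string features may be delivered as UTF-8 encoded bytes.
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)


def parse_cdsc_r(record: Mapping[str, Any]) -> CdscRExample:
    """Convert a raw feature dict into a CdscRExample, checking for missing keys."""
    missing = EXPECTED_KEYS - set(record)
    if missing:
        raise KeyError(f"missing features: {sorted(missing)}")
    return CdscRExample(
        pair_id=int(record["pair_ID"]),
        sentence_a=_text(record["sentence_A"]),
        sentence_b=_text(record["sentence_B"]),
        relatedness_score=float(record["relatedness_score"]),
    )
```

A raw example dict can then be passed through `parse_cdsc_r` once, right after loading, so that type errors or missing features surface early.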