manual
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:wiki_auto/manual')
- Description:
WikiAuto provides a set of aligned sentences from English Wikipedia and Simple English Wikipedia
as a resource to train sentence simplification systems. The authors first crowd-sourced a set of manual alignments
between sentences in a subset of the Simple English Wikipedia and their corresponding versions in English Wikipedia
(this corresponds to the `manual` config), then trained a neural CRF system to predict these alignments.
The trained model was then applied to the other articles in Simple English Wikipedia with an English counterpart to
create a larger corpus of aligned sentences (corresponding to the `auto`, `auto_acl`, `auto_full_no_split`, and `auto_full_with_split` configs here).
- License: CC-BY-SA 3.0
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'dev' | 73249 |
'test' | 118074 |
'train' | 373801 |
- Features:
{
    "alignment_label": {
        "num_classes": 3,
        "names": [
            "notAligned",
            "aligned",
            "partialAligned"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    },
    "normal_sentence_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "simple_sentence_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "normal_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "simple_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gleu_score": {
        "dtype": "float32",
        "id": null,
        "_type": "Value"
    }
}
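The `alignment_label` feature is a `ClassLabel`, so loaded examples presumably carry an integer index into the `names` list above rather than the string itself. The following is a minimal sketch of decoding that index and keeping only fully aligned pairs. The helper names `label_name` and `aligned_pairs` and the sample records are hypothetical, and plain dicts stand in for loaded examples.

```python
# Class names in the order given by the `alignment_label` feature above.
ALIGNMENT_NAMES = ["notAligned", "aligned", "partialAligned"]


def label_name(index):
    """Map an integer class index to its name."""
    return ALIGNMENT_NAMES[index]


def aligned_pairs(records, keep=("aligned",)):
    """Return (normal_sentence, simple_sentence) pairs whose label is in `keep`."""
    return [
        (r["normal_sentence"], r["simple_sentence"])
        for r in records
        if label_name(r["alignment_label"]) in keep
    ]


# Hypothetical records shaped like the feature spec (not real dataset rows).
sample = [
    {"alignment_label": 1, "normal_sentence_id": "n-0", "simple_sentence_id": "s-0",
     "normal_sentence": "The feline rested upon the rug.",
     "simple_sentence": "The cat sat on the rug.", "gleu_score": 0.5},
    {"alignment_label": 0, "normal_sentence_id": "n-1", "simple_sentence_id": "s-0",
     "normal_sentence": "It rained heavily that evening.",
     "simple_sentence": "The cat sat on the rug.", "gleu_score": 0.0},
    {"alignment_label": 2, "normal_sentence_id": "n-2", "simple_sentence_id": "s-1",
     "normal_sentence": "The dog, a loyal companion, barked loudly.",
     "simple_sentence": "The dog barked.", "gleu_score": 0.25},
]

print(aligned_pairs(sample))
```

Passing `keep=("aligned", "partialAligned")` would also retain partial alignments, which may or may not be desirable depending on the training setup.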
auto_acl
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:wiki_auto/auto_acl')
- License: CC-BY-SA 3.0
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'full' | 488332 |
- Features:
{
    "normal_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "simple_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
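Each example in this config is a plain pair of strings. As one illustration of working with such pairs, the sketch below computes a whitespace-token length ratio between the simple and the normal sentence. The helper `length_ratio` and the sample pair are hypothetical, and whitespace splitting is an assumption made for brevity rather than the tokenization used to build the dataset.

```python
def length_ratio(example):
    """Ratio of simple-sentence length to normal-sentence length, in whitespace tokens.

    Returns None when the normal sentence is empty, to avoid dividing by zero.
    """
    normal_words = example["normal_sentence"].split()
    simple_words = example["simple_sentence"].split()
    if not normal_words:
        return None
    return len(simple_words) / len(normal_words)


# Hypothetical pair shaped like the feature spec (not a real dataset row).
pair = {
    "normal_sentence": "The committee reached a unanimous decision after deliberation",
    "simple_sentence": "The whole group agreed",
}

print(length_ratio(pair))
```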
auto
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:wiki_auto/auto')
- License: CC-BY-SA 3.0
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'part_1' | 125059 |
'part_2' | 13036 |
- Features:
{
    "example_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "normal": {
        "normal_article_id": {
            "dtype": "int32",
            "id": null,
            "_type": "Value"
        },
        "normal_article_title": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "normal_article_url": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "normal_article_content": {
            "feature": {
                "normal_sentence_id": {
                    "dtype": "string",
                    "id": null,
                    "_type": "Value"
                },
                "normal_sentence": {
                    "dtype": "string",
                    "id": null,
                    "_type": "Value"
                }
            },
            "length": -1,
            "id": null,
            "_type": "Sequence"
        }
    },
    "simple": {
        "simple_article_id": {
            "dtype": "int32",
            "id": null,
            "_type": "Value"
        },
        "simple_article_title": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "simple_article_url": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "simple_article_content": {
            "feature": {
                "simple_sentence_id": {
                    "dtype": "string",
                    "id": null,
                    "_type": "Value"
                },
                "simple_sentence": {
                    "dtype": "string",
                    "id": null,
                    "_type": "Value"
                }
            },
            "length": -1,
            "id": null,
            "_type": "Sequence"
        }
    },
    "paragraph_alignment": {
        "feature": {
            "normal_paragraph_id": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "simple_paragraph_id": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            }
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "sentence_alignment": {
        "feature": {
            "normal_sentence_id": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "simple_sentence_id": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            }
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}
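In this config the `sentence_alignment` entries hold only sentence IDs, so recovering aligned text means looking each ID up in the article contents. The sketch below shows one way to do that. It assumes each `Sequence` of named features arrives as a dict of parallel lists (for example `normal_article_content["normal_sentence_id"]` and `normal_article_content["normal_sentence"]` having equal length); the helper `resolve_alignments` and the sample example are hypothetical, and article metadata fields are omitted for brevity.

```python
def resolve_alignments(example):
    """Turn aligned sentence-ID pairs into (normal_sentence, simple_sentence) text pairs."""
    normal = example["normal"]["normal_article_content"]
    simple = example["simple"]["simple_article_content"]

    # Build ID -> sentence lookups from the parallel lists.
    normal_by_id = dict(zip(normal["normal_sentence_id"], normal["normal_sentence"]))
    simple_by_id = dict(zip(simple["simple_sentence_id"], simple["simple_sentence"]))

    align = example["sentence_alignment"]
    pairs = []
    for n_id, s_id in zip(align["normal_sentence_id"], align["simple_sentence_id"]):
        # Skip IDs with no matching sentence instead of raising.
        if n_id in normal_by_id and s_id in simple_by_id:
            pairs.append((normal_by_id[n_id], simple_by_id[s_id]))
    return pairs


# Hypothetical example shaped like the feature spec (not a real dataset row).
example = {
    "example_id": "demo-0",
    "normal": {
        "normal_article_content": {
            "normal_sentence_id": ["n-0", "n-1"],
            "normal_sentence": [
                "The cat is a small carnivorous mammal.",
                "It is often kept as a pet.",
            ],
        },
    },
    "simple": {
        "simple_article_content": {
            "simple_sentence_id": ["s-0"],
            "simple_sentence": ["People keep cats as pets."],
        },
    },
    "sentence_alignment": {
        "normal_sentence_id": ["n-1", "n-9"],  # "n-9" is deliberately unknown
        "simple_sentence_id": ["s-0", "s-0"],
    },
}

print(resolve_alignments(example))
```

Silently skipping unknown IDs is a design choice for this sketch; raising an error instead may be preferable when data integrity matters.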
auto_full_no_split
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:wiki_auto/auto_full_no_split')
- License: CC-BY-SA 3.0
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'full' | 591994 |
- Features:
{
    "normal_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "simple_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
auto_full_with_split
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:wiki_auto/auto_full_with_split')
- License: CC-BY-SA 3.0
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'full' | 483801 |
- Features:
{
    "normal_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "simple_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
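The five configs differ in their split names and sizes, so it can help to keep them in one lookup table. The sketch below collects the split counts listed on this page and builds the name string passed to `tfds.load` in the commands above. The names `CONFIG_SPLITS`, `tfds_name`, and `total_examples` are hypothetical helpers, not part of TFDS.

```python
# Split sizes copied from the split tables on this page.
CONFIG_SPLITS = {
    "manual": {"dev": 73249, "test": 118074, "train": 373801},
    "auto_acl": {"full": 488332},
    "auto": {"part_1": 125059, "part_2": 13036},
    "auto_full_no_split": {"full": 591994},
    "auto_full_with_split": {"full": 483801},
}


def tfds_name(config):
    """Build the dataset name used with tfds.load for a given config."""
    if config not in CONFIG_SPLITS:
        raise ValueError(f"unknown wiki_auto config: {config!r}")
    return f"huggingface:wiki_auto/{config}"


def total_examples(config):
    """Sum the example counts over all splits of a config."""
    if config not in CONFIG_SPLITS:
        raise ValueError(f"unknown wiki_auto config: {config!r}")
    return sum(CONFIG_SPLITS[config].values())


for name in CONFIG_SPLITS:
    print(tfds_name(name), total_examples(name))
```

Validating the config name up front gives a clearer error than a failed load would.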