wiki_auto

References:

manual

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_auto/manual')
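
Every config on this page follows the same `huggingface:wiki_auto/<config>` naming pattern. The following is a minimal sketch of that pattern, assuming `tensorflow_datasets` is installed with support for Hugging Face community datasets. The helper names `tfds_name` and `load_config` are hypothetical and are not part of TFDS.

```python
# Config names as listed on this page.
CONFIGS = (
    "manual",
    "auto_acl",
    "auto",
    "auto_full_no_split",
    "auto_full_with_split",
)


def tfds_name(config: str) -> str:
    """Build the TFDS community-dataset name for a wiki_auto config."""
    if config not in CONFIGS:
        raise ValueError(f"unknown wiki_auto config: {config!r}")
    return f"huggingface:wiki_auto/{config}"


def load_config(config: str, split=None):
    """Load one config (hypothetical helper).

    The import is deferred so that tfds_name() can be used without TFDS.
    """
    import tensorflow_datasets as tfds

    return tfds.load(tfds_name(config), split=split)
```

For example, `load_config("manual", split="train")` would request only the `'train'` split of the `manual` config.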

Description:

WikiAuto provides a set of aligned sentences from English Wikipedia and Simple English Wikipedia
as a resource to train sentence simplification systems. The authors first crowd-sourced a set of manual alignments
between sentences in a subset of the Simple English Wikipedia and their corresponding versions in English Wikipedia
(this corresponds to the `manual` config), then trained a neural CRF system to predict these alignments.
The trained model was then applied to the other articles in Simple English Wikipedia with an English counterpart to
create a larger corpus of aligned sentences (corresponding to the `auto`, `auto_acl`, `auto_full_no_split`, and `auto_full_with_split` configs here).

License: CC-BY-SA 3.0
Version: 1.0.0
Splits:

Split	Examples
`'dev'`	73249
`'test'`	118074
`'train'`	373801
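
To get a feel for how the `manual` config is divided, the counts in the table above can be turned into fractions of the total. This is only an arithmetic sketch; the names `MANUAL_SPLITS` and `split_shares` are introduced here for illustration.

```python
# Example counts copied from the split table for the `manual` config.
MANUAL_SPLITS = {"dev": 73249, "test": 118074, "train": 373801}


def split_shares(splits: dict) -> dict:
    """Return each split's fraction of the total example count."""
    total = sum(splits.values())
    return {name: count / total for name, count in splits.items()}


total_examples = sum(MANUAL_SPLITS.values())  # 565124
shares = split_shares(MANUAL_SPLITS)

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```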

Features:

{
    "alignment_label": {
        "num_classes": 3,
        "names": [
            "notAligned",
            "aligned",
            "partialAligned"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    },
    "normal_sentence_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "simple_sentence_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "normal_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "simple_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "gleu_score": {
        "dtype": "float32",
        "id": null,
        "_type": "Value"
    }
}
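
The `alignment_label` feature is a `ClassLabel`, so each record stores an integer index into the `names` list above. The sketch below maps indices back to names and keeps only aligned pairs. It assumes records have already been converted to plain Python dicts with integer labels; the sample records are made up to mimic the schema and are not taken from the dataset.

```python
# Label names in the order given by the ClassLabel feature above.
ALIGNMENT_NAMES = ["notAligned", "aligned", "partialAligned"]


def label_name(label_id: int) -> str:
    """Map a ClassLabel index to its name."""
    return ALIGNMENT_NAMES[label_id]


def keep_aligned(records, include_partial=False):
    """Return records labeled `aligned` (and optionally `partialAligned`)."""
    wanted = {"aligned", "partialAligned"} if include_partial else {"aligned"}
    return [r for r in records if label_name(r["alignment_label"]) in wanted]


# Made-up records for illustration only.
sample = [
    {"alignment_label": 0, "normal_sentence": "A.", "simple_sentence": "X."},
    {"alignment_label": 1, "normal_sentence": "B.", "simple_sentence": "Y."},
    {"alignment_label": 2, "normal_sentence": "C.", "simple_sentence": "Z."},
]

strict = keep_aligned(sample)
relaxed = keep_aligned(sample, include_partial=True)
```

Whether to include `partialAligned` pairs is a design choice that depends on how noisy a training signal the downstream model can tolerate.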

auto_acl

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_auto/auto_acl')

License: CC-BY-SA 3.0
Version: 1.0.0
Splits:

Split	Examples
`'full'`	488332

Features:

{
    "normal_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "simple_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
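
With this flat schema, each record is already a (complex, simple) sentence pair. The sketch below turns such records into `(source, target)` tuples for a sequence-to-sequence model. It assumes records are dicts and that string features may arrive as `bytes` (as is typical after `tfds.as_numpy`); skipping identical pairs is an illustrative preprocessing choice, not something the dataset prescribes.

```python
def to_text(value) -> str:
    """Decode bytes to str, or pass a str through unchanged."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


def training_pairs(records):
    """Yield (source, target) tuples, skipping empty or identical pairs."""
    for record in records:
        source = to_text(record["normal_sentence"]).strip()
        target = to_text(record["simple_sentence"]).strip()
        if source and target and source != target:
            yield source, target


# Made-up records for illustration only.
sample = [
    {"normal_sentence": b"The feline reposed.", "simple_sentence": b"The cat rested."},
    {"normal_sentence": "Same text.", "simple_sentence": "Same text."},
]

pairs = list(training_pairs(sample))
```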

auto

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_auto/auto')

License: CC-BY-SA 3.0
Version: 1.0.0
Splits:

Split	Examples
`'part_1'`	125059
`'part_2'`	13036

Features:

{
    "example_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "normal": {
        "normal_article_id": {
            "dtype": "int32",
            "id": null,
            "_type": "Value"
        },
        "normal_article_title": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "normal_article_url": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "normal_article_content": {
            "feature": {
                "normal_sentence_id": {
                    "dtype": "string",
                    "id": null,
                    "_type": "Value"
                },
                "normal_sentence": {
                    "dtype": "string",
                    "id": null,
                    "_type": "Value"
                }
            },
            "length": -1,
            "id": null,
            "_type": "Sequence"
        }
    },
    "simple": {
        "simple_article_id": {
            "dtype": "int32",
            "id": null,
            "_type": "Value"
        },
        "simple_article_title": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "simple_article_url": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "simple_article_content": {
            "feature": {
                "simple_sentence_id": {
                    "dtype": "string",
                    "id": null,
                    "_type": "Value"
                },
                "simple_sentence": {
                    "dtype": "string",
                    "id": null,
                    "_type": "Value"
                }
            },
            "length": -1,
            "id": null,
            "_type": "Sequence"
        }
    },
    "paragraph_alignment": {
        "feature": {
            "normal_paragraph_id": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "simple_paragraph_id": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            }
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "sentence_alignment": {
        "feature": {
            "normal_sentence_id": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "simple_sentence_id": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            }
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}
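
In this nested schema, `sentence_alignment` stores only sentence IDs, so the sentence text has to be looked up in the two article bodies. The sketch below resolves ID pairs into text pairs. It assumes each `Sequence` of named fields is materialized as a dict of parallel lists, which is an assumption about the in-memory layout; the example record and its IDs are made up.

```python
def aligned_sentence_pairs(example):
    """Resolve sentence_alignment ID pairs into (normal, simple) text pairs."""
    normal = example["normal"]["normal_article_content"]
    simple = example["simple"]["simple_article_content"]

    # Index each article's sentences by their IDs.
    normal_by_id = dict(zip(normal["normal_sentence_id"], normal["normal_sentence"]))
    simple_by_id = dict(zip(simple["simple_sentence_id"], simple["simple_sentence"]))

    alignment = example["sentence_alignment"]
    pairs = []
    for n_id, s_id in zip(alignment["normal_sentence_id"],
                          alignment["simple_sentence_id"]):
        # Skip alignments that point at IDs missing from either article.
        if n_id in normal_by_id and s_id in simple_by_id:
            pairs.append((normal_by_id[n_id], simple_by_id[s_id]))
    return pairs


# Made-up record for illustration only; real IDs will look different.
example = {
    "normal": {"normal_article_content": {
        "normal_sentence_id": ["n0", "n1"],
        "normal_sentence": ["A long sentence.", "Another long sentence."],
    }},
    "simple": {"simple_article_content": {
        "simple_sentence_id": ["s0"],
        "simple_sentence": ["A short one."],
    }},
    "sentence_alignment": {
        "normal_sentence_id": ["n1", "n9"],
        "simple_sentence_id": ["s0", "s0"],
    },
}

resolved = aligned_sentence_pairs(example)
```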

auto_full_no_split

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_auto/auto_full_no_split')

License: CC-BY-SA 3.0
Version: 1.0.0
Splits:

Split	Examples
`'full'`	591994

Features:

{
    "normal_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "simple_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

auto_full_with_split

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:wiki_auto/auto_full_with_split')

License: CC-BY-SA 3.0
Version: 1.0.0
Splits:

Split	Examples
`'full'`	483801

Features:

{
    "normal_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "simple_sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}