TFDS hiện hỗ trợ định dạng Croissant 🥐 ! Đọc tài liệu để biết thêm.

Trang này được dịch bởi Cloud Translation API.

wikipedia

Tài liệu tham khảo:

20220301.de

Sử dụng lệnh sau để tải tập dữ liệu này trong TFDS:

ds = tfds.load('huggingface:wikipedia/20220301.de')

Sự miêu tả :

Wikipedia dataset containing cleaned articles of all languages.
The datasets are built from the Wikipedia dump
(https://dumps.wikimedia.org/) with one split per language. Each example
contains the content of one full Wikipedia article with cleaning to strip
markdown and unwanted sections (references, etc.).

Giấy phép : Không có giấy phép được biết đến
Phiên bản : 2.0.0
Chia tách :

Tách ra	Ví dụ
`'train'`	2665357

Đặc trưng :

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

20220301.en

Sử dụng lệnh sau để tải tập dữ liệu này trong TFDS:

ds = tfds.load('huggingface:wikipedia/20220301.en')

Sự miêu tả :

Wikipedia dataset containing cleaned articles of all languages.
The datasets are built from the Wikipedia dump
(https://dumps.wikimedia.org/) with one split per language. Each example
contains the content of one full Wikipedia article with cleaning to strip
markdown and unwanted sections (references, etc.).

Giấy phép : Không có giấy phép được biết đến
Phiên bản : 2.0.0
Chia tách :

Tách ra	Ví dụ
`'train'`	6458670

Đặc trưng :

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

20220301.fr

Sử dụng lệnh sau để tải tập dữ liệu này trong TFDS:

ds = tfds.load('huggingface:wikipedia/20220301.fr')

Sự miêu tả :

Wikipedia dataset containing cleaned articles of all languages.
The datasets are built from the Wikipedia dump
(https://dumps.wikimedia.org/) with one split per language. Each example
contains the content of one full Wikipedia article with cleaning to strip
markdown and unwanted sections (references, etc.).

Giấy phép : Không có giấy phép được biết đến
Phiên bản : 2.0.0
Chia tách :

Tách ra	Ví dụ
`'train'`	2402095

Đặc trưng :

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

20220301.frr

Sử dụng lệnh sau để tải tập dữ liệu này trong TFDS:

ds = tfds.load('huggingface:wikipedia/20220301.frr')

Sự miêu tả :

Wikipedia dataset containing cleaned articles of all languages.
The datasets are built from the Wikipedia dump
(https://dumps.wikimedia.org/) with one split per language. Each example
contains the content of one full Wikipedia article with cleaning to strip
markdown and unwanted sections (references, etc.).

Giấy phép : Không có giấy phép được biết đến
Phiên bản : 2.0.0
Chia tách :

Tách ra	Ví dụ
`'train'`	15199

Đặc trưng :

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

20220301.it

Sử dụng lệnh sau để tải tập dữ liệu này trong TFDS:

ds = tfds.load('huggingface:wikipedia/20220301.it')

Sự miêu tả :

Wikipedia dataset containing cleaned articles of all languages.
The datasets are built from the Wikipedia dump
(https://dumps.wikimedia.org/) with one split per language. Each example
contains the content of one full Wikipedia article with cleaning to strip
markdown and unwanted sections (references, etc.).

Giấy phép : Không có giấy phép được biết đến
Phiên bản : 2.0.0
Chia tách :

Tách ra	Ví dụ
`'train'`	1743035

Đặc trưng :

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

20220301.đơn giản

Sử dụng lệnh sau để tải tập dữ liệu này trong TFDS:

ds = tfds.load('huggingface:wikipedia/20220301.simple')

Sự miêu tả :

Wikipedia dataset containing cleaned articles of all languages.
The datasets are built from the Wikipedia dump
(https://dumps.wikimedia.org/) with one split per language. Each example
contains the content of one full Wikipedia article with cleaning to strip
markdown and unwanted sections (references, etc.).

Giấy phép : Không có giấy phép được biết đến
Phiên bản : 2.0.0
Chia tách :

Tách ra	Ví dụ
`'train'`	205328

Đặc trưng :

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}