TFDS اکنون از فرمت Croissant 🥐 پشتیبانی می کند! برای دانستن بیشتر مستندات را بخوانید.

این صفحه به‌وسیله ‏Cloud Translation API‏ ترجمه شده است.

wiki_dpr

مراجع:

psgs_w100.nq.exact

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:wiki_dpr/psgs_w100.nq.exact')

توضیحات :

This is the wikipedia split used to evaluate the Dense Passage Retrieval (DPR) model.
It contains 21M passages from wikipedia along with their DPR embeddings.
The wikipedia articles were split into multiple, disjoint text blocks of 100 words as passages.

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 0.0.0
تقسیمات :

تقسیم کنید	نمونه ها
`'train'`	21015300

ویژگی ها :

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "embeddings": {
        "feature": {
            "dtype": "float32",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

psgs_w100.nq.compressed

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:wiki_dpr/psgs_w100.nq.compressed')

توضیحات :

This is the wikipedia split used to evaluate the Dense Passage Retrieval (DPR) model.
It contains 21M passages from wikipedia along with their DPR embeddings.
The wikipedia articles were split into multiple, disjoint text blocks of 100 words as passages.

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 0.0.0
تقسیمات :

تقسیم کنید	نمونه ها
`'train'`	21015300

ویژگی ها :

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "embeddings": {
        "feature": {
            "dtype": "float32",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

psgs_w100.nq.no_index

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:wiki_dpr/psgs_w100.nq.no_index')

توضیحات :

This is the wikipedia split used to evaluate the Dense Passage Retrieval (DPR) model.
It contains 21M passages from wikipedia along with their DPR embeddings.
The wikipedia articles were split into multiple, disjoint text blocks of 100 words as passages.

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 0.0.0
تقسیمات :

تقسیم کنید	نمونه ها
`'train'`	21015300

ویژگی ها :

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "embeddings": {
        "feature": {
            "dtype": "float32",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

psgs_w100.multiset.exact

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:wiki_dpr/psgs_w100.multiset.exact')

توضیحات :

This is the wikipedia split used to evaluate the Dense Passage Retrieval (DPR) model.
It contains 21M passages from wikipedia along with their DPR embeddings.
The wikipedia articles were split into multiple, disjoint text blocks of 100 words as passages.

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 0.0.0
تقسیمات :

تقسیم کنید	نمونه ها
`'train'`	21015300

ویژگی ها :

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "embeddings": {
        "feature": {
            "dtype": "float32",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

psgs_w100.multiset.compressed

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:wiki_dpr/psgs_w100.multiset.compressed')

توضیحات :

This is the wikipedia split used to evaluate the Dense Passage Retrieval (DPR) model.
It contains 21M passages from wikipedia along with their DPR embeddings.
The wikipedia articles were split into multiple, disjoint text blocks of 100 words as passages.

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 0.0.0
تقسیمات :

تقسیم کنید	نمونه ها
`'train'`	21015300

ویژگی ها :

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "embeddings": {
        "feature": {
            "dtype": "float32",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}

psgs_w100.multiset.no_index

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:wiki_dpr/psgs_w100.multiset.no_index')

توضیحات :

This is the wikipedia split used to evaluate the Dense Passage Retrieval (DPR) model.
It contains 21M passages from wikipedia along with their DPR embeddings.
The wikipedia articles were split into multiple, disjoint text blocks of 100 words as passages.

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 0.0.0
تقسیمات :

تقسیم کنید	نمونه ها
`'train'`	21015300

ویژگی ها :

{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "embeddings": {
        "feature": {
            "dtype": "float32",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}