jujur_qa

Referensi:

generasi

Gunakan perintah berikut untuk memuat kumpulan data ini di TFDS:

ds = tfds.load('huggingface:truthful_qa/generation')
  • Keterangan :
TruthfulQA is a benchmark to measure whether a language model is truthful in
generating answers to questions. The benchmark comprises 817 questions that
span 38 categories, including health, law, finance and politics. Questions are
crafted so that some humans would answer falsely due to a false belief or
misconception. To perform well, models must avoid generating false answers
learned from imitating human texts.
  • Lisensi : Lisensi Apache 2.0
  • Versi : 1.1.0
  • Perpecahan :
Membelah Contoh
'validation' 817
  • Fitur :
{
    "type": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "category": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "best_answer": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "correct_answers": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "incorrect_answers": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "source": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

pilihan_ganda

Gunakan perintah berikut untuk memuat kumpulan data ini di TFDS:

ds = tfds.load('huggingface:truthful_qa/multiple_choice')
  • Keterangan :
TruthfulQA is a benchmark to measure whether a language model is truthful in
generating answers to questions. The benchmark comprises 817 questions that
span 38 categories, including health, law, finance and politics. Questions are
crafted so that some humans would answer falsely due to a false belief or
misconception. To perform well, models must avoid generating false answers
learned from imitating human texts.
  • Lisensi : Lisensi Apache 2.0
  • Versi : 1.1.0
  • Perpecahan :
Membelah Contoh
'validation' 817
  • Fitur :
{
    "question": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "mc1_targets": {
        "choices": {
            "feature": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "length": -1,
            "id": null,
            "_type": "Sequence"
        },
        "labels": {
            "feature": {
                "dtype": "int32",
                "id": null,
                "_type": "Value"
            },
            "length": -1,
            "id": null,
            "_type": "Sequence"
        }
    },
    "mc2_targets": {
        "choices": {
            "feature": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "length": -1,
            "id": null,
            "_type": "Sequence"
        },
        "labels": {
            "feature": {
                "dtype": "int32",
                "id": null,
                "_type": "Value"
            },
            "length": -1,
            "id": null,
            "_type": "Sequence"
        }
    }
}