- Description:
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
RealToxicityPrompts is a corpus of English prompts (specifically sentence beginnings) of varying toxicity. These prompts are meant to be given as contexts to an autoregressive language model (e.g., GPT-2) and used to generate completions.
More details are presented in the original paper.
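Each example pairs a prompt (a sentence beginning) with its continuation (the rest of the sentence). As a rough illustration of that pairing, the sketch below splits a sentence in half by token count; the helper and the input sentence are hypothetical, not taken from the corpus or its construction code:

```python
def split_sentence(sentence: str) -> tuple[str, str]:
    """Split a sentence into a prompt (first half of its tokens) and a
    continuation (the remaining tokens), loosely mirroring how
    RealToxicityPrompts pairs sentence beginnings with completions.
    Illustrative only: the real corpus is built from web text with
    per-span toxicity scores attached."""
    tokens = sentence.split()
    half = len(tokens) // 2
    return " ".join(tokens[:half]), " ".join(tokens[half:])

prompt, continuation = split_sentence(
    "The quick brown fox jumps over the lazy dog today"
)
print(prompt)        # sentence beginning, fed to the language model
print(continuation)  # reference completion
```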
Source code:
tfds.datasets.real_toxicity_prompts.Builder
Versions:
1.0.0
(default): Initial release.
Download size:
3.46 GiB
Dataset size:
81.22 MiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'train' | 99,442 |
- Feature structure:
FeaturesDict({
'begin': int32,
'challenging': bool,
'continuation': FeaturesDict({
'flirtation': float32,
'identity_attack': float32,
'insult': float32,
'profanity': float32,
'severe_toxicity': float32,
'sexually_explicit': float32,
'text': Text(shape=(), dtype=string),
'threat': float32,
'toxicity': float32,
}),
'end': int32,
'filename': Text(shape=(), dtype=string),
'prompt': FeaturesDict({
'flirtation': float32,
'identity_attack': float32,
'insult': float32,
'profanity': float32,
'severe_toxicity': float32,
'sexually_explicit': float32,
'text': Text(shape=(), dtype=string),
'threat': float32,
'toxicity': float32,
}),
})
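Decoded examples follow this nested structure. A plain-Python sketch of filtering for challenging, high-toxicity prompts; the record below only mirrors the field layout, and its text, scores, filename, and the 0.5 threshold are all made up for illustration:

```python
# Illustrative record mirroring the FeaturesDict structure above.
# Every value here is invented, not drawn from the corpus.
example = {
    "begin": 0,
    "end": 64,
    "challenging": True,
    "filename": "example-shard.txt",  # hypothetical filename
    "prompt": {
        "text": "An example sentence beginning",
        "toxicity": 0.87,
        "severe_toxicity": 0.41,
        "insult": 0.55,
        "profanity": 0.62,
        "threat": 0.10,
        "identity_attack": 0.08,
        "sexually_explicit": 0.05,
        "flirtation": 0.12,
    },
    "continuation": {
        "text": "and its continuation.",
        "toxicity": 0.34,
        "severe_toxicity": 0.05,
        "insult": 0.09,
        "profanity": 0.11,
        "threat": 0.02,
        "identity_attack": 0.01,
        "sexually_explicit": 0.01,
        "flirtation": 0.04,
    },
}

def is_challenging_toxic(ex: dict, threshold: float = 0.5) -> bool:
    """Keep examples flagged as challenging whose prompt toxicity
    meets the (illustrative) threshold."""
    return ex["challenging"] and ex["prompt"]["toxicity"] >= threshold

print(is_challenging_toxic(example))  # True
```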
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
 | FeaturesDict | | | |
begin | Tensor | | int32 | |
challenging | Tensor | | bool | |
continuation | FeaturesDict | | | |
continuation/flirtation | Tensor | | float32 | |
continuation/identity_attack | Tensor | | float32 | |
continuation/insult | Tensor | | float32 | |
continuation/profanity | Tensor | | float32 | |
continuation/severe_toxicity | Tensor | | float32 | |
continuation/sexually_explicit | Tensor | | float32 | |
continuation/text | Text | | string | |
continuation/threat | Tensor | | float32 | |
continuation/toxicity | Tensor | | float32 | |
end | Tensor | | int32 | |
filename | Text | | string | |
prompt | FeaturesDict | | | |
prompt/flirtation | Tensor | | float32 | |
prompt/identity_attack | Tensor | | float32 | |
prompt/insult | Tensor | | float32 | |
prompt/profanity | Tensor | | float32 | |
prompt/severe_toxicity | Tensor | | float32 | |
prompt/sexually_explicit | Tensor | | float32 | |
prompt/text | Text | | string | |
prompt/threat | Tensor | | float32 | |
prompt/toxicity | Tensor | | float32 | |
Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
- Citation:
@article{gehman2020realtoxicityprompts,
title={Realtoxicityprompts: Evaluating neural toxic degeneration in language models},
author={Gehman, Samuel and Gururangan, Suchin and Sap, Maarten and Choi, Yejin and Smith, Noah A},
journal={arXiv preprint arXiv:2009.11462},
year={2020}
}