asimov_dilemmas_auto_val

  • Description:

Binary dilemma questions generated from counterfactual situations used to auto-amend generated constitutions (validation set).

  • Splits:

| Split | Examples |
|-------|----------|
| 'val' | 34       |
  • Feature structure:
FeaturesDict({
    'answer': Text(shape=(), dtype=string),
    'prompt_with_constitution': Text(shape=(), dtype=string),
    'prompt_with_constitution_antijailbreak': Text(shape=(), dtype=string),
    'prompt_with_constitution_antijailbreak_adversary': Text(shape=(), dtype=string),
    'prompt_with_constitution_antijailbreak_adversary_parts': Sequence(Text(shape=(), dtype=string)),
    'prompt_with_constitution_antijailbreak_parts': Sequence(Text(shape=(), dtype=string)),
    'prompt_with_constitution_parts': Sequence(Text(shape=(), dtype=string)),
    'prompt_without_constitution': Text(shape=(), dtype=string),
    'prompt_without_constitution_parts': Sequence(Text(shape=(), dtype=string)),
})
  • Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---------|-------|-------|-------|-------------|
| | FeaturesDict | | | |
| answer | Text | | string | |
| prompt_with_constitution | Text | | string | |
| prompt_with_constitution_antijailbreak | Text | | string | |
| prompt_with_constitution_antijailbreak_adversary | Text | | string | |
| prompt_with_constitution_antijailbreak_adversary_parts | Sequence(Text) | (None,) | string | |
| prompt_with_constitution_antijailbreak_parts | Sequence(Text) | (None,) | string | |
| prompt_with_constitution_parts | Sequence(Text) | (None,) | string | |
| prompt_without_constitution | Text | | string | |
| prompt_without_constitution_parts | Sequence(Text) | (None,) | string | |
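
The features above pair one `answer` with four prompt variants. A minimal sketch of how an evaluation script might pick one variant per example is shown below. The helper names (`select_prompt`, `load_val_pairs`, `PROMPT_VARIANTS`) are hypothetical and not part of the dataset or of TFDS; the only library calls used are `tfds.load` and `tfds.as_numpy`, and the sketch assumes the dataset is registered under the name given at the top of this page.

```python
from typing import Dict, List, Tuple

# Scalar prompt fields, copied from the feature structure on this page.
PROMPT_VARIANTS = (
    "prompt_without_constitution",
    "prompt_with_constitution",
    "prompt_with_constitution_antijailbreak",
    "prompt_with_constitution_antijailbreak_adversary",
)


def _to_str(value) -> str:
    """Decode bytes (as yielded by tfds.as_numpy) to str; pass str through."""
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)


def select_prompt(example: Dict, variant: str) -> Tuple[str, str]:
    """Return (prompt, answer) for one example and one prompt variant."""
    if variant not in PROMPT_VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    return _to_str(example[variant]), _to_str(example["answer"])


def load_val_pairs(variant: str) -> List[Tuple[str, str]]:
    """Load the 'val' split and return (prompt, answer) pairs.

    Assumes tensorflow_datasets is installed; it is imported lazily so
    the helpers above can be used without it.
    """
    import tensorflow_datasets as tfds

    ds = tfds.load("asimov_dilemmas_auto_val", split="val")
    return [select_prompt(ex, variant) for ex in tfds.as_numpy(ds)]
```

Calling `load_val_pairs("prompt_with_constitution")` would be expected to return one pair per example in the 'val' split. The `*_parts` sequence fields are left untouched here, since this page does not document how the parts are meant to be combined.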
  • Citation:
@article{sermanet2025asimov,
  author    = {Pierre Sermanet and Anirudha Majumdar and Alex Irpan and Dmitry Kalashnikov and Vikas Sindhwani},
  title     = {Generating Robot Constitutions \& Benchmarks for Semantic Safety},
  journal   = {arXiv preprint arXiv:2503.08663},
  url       = {https://arxiv.org/abs/2503.08663},
  year      = {2025},
}