dices

Description:

The Diversity in Conversational AI Evaluation for Safety (DICES) dataset

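Machine learning approaches are often trained and evaluated with datasets that require a clear separation between positive and negative examples. This approach oversimplifies the natural subjectivity present in many tasks and content items, and it obscures the inherent diversity in human perceptions and opinions. Tasks that try to preserve the variance in content and the diversity among humans are often expensive and laborious. To fill this gap and facilitate more in-depth analyses of model performance, we propose the DICES dataset, a unique dataset with diverse perspectives on the safety of AI-generated conversations. We focus on the task of safety evaluation of conversational AI systems. The DICES dataset contains detailed demographic information about each rater and an extremely high replication of unique ratings per conversation to ensure the statistical significance of further analyses, and it encodes rater votes as distributions across different demographics to allow in-depth exploration of different rating aggregation strategies.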

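This dataset is well suited to observing and measuring variance, ambiguity, and diversity in the context of conversational AI safety. It is accompanied by a paper describing a set of metrics that show how rater diversity influences the safety perception of raters from different geographic regions, ethnic groups, age groups, and genders. The goal of the DICES dataset is to serve as a shared benchmark for safety evaluation of conversational AI systems.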
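
As an illustration of encoding rater votes as distributions across demographics, the following is a minimal sketch, not the authors' implementation. It assumes each row is a plain dict carrying the `item_id`, `rater_gender`, and `Q_overall` features as raw integer class ids; the helper name `vote_distributions` and the toy rows are hypothetical and not taken from DICES.

```python
from collections import Counter, defaultdict


def vote_distributions(rows, group_key, label_key="Q_overall"):
    """Return {(item_id, group): {label: fraction}} for the given rows."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[(row["item_id"], row[group_key])][row[label_key]] += 1
    return {
        key: {label: n / sum(c.values()) for label, n in c.items()}
        for key, c in counts.items()
    }


# Synthetic toy rows (NOT real DICES data); values are raw integer class ids.
toy_rows = [
    {"item_id": 0, "rater_gender": 0, "Q_overall": 0},
    {"item_id": 0, "rater_gender": 0, "Q_overall": 1},
    {"item_id": 0, "rater_gender": 1, "Q_overall": 1},
    {"item_id": 0, "rater_gender": 1, "Q_overall": 1},
]

dist = vote_distributions(toy_rows, group_key="rater_gender")
print(dist)
```

Keeping the per-group distribution, rather than a single majority label, lets different aggregation strategies be compared on the same votes.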

Content warning: This dataset contains adversarial examples of conversations that may be offensive.

Homepage: https://github.com/google-research-datasets/dices-dataset
Source code: tfds.datasets.dices.Builder
Versions:
- 1.0.0 (default): Initial release.
Supervised keys (see as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Citation:

@article{aroyo2024dices,
  title={ {DICES} dataset: Diversity in conversational {AI} evaluation for safety},
  author={Aroyo, Lora and Taylor, Alex and Diaz, Mark and Homan, Christopher and Parrish, Alicia and Serapio-Garc{\'\i}a, Gregory and Prabhakaran, Vinodkumar and Wang, Ding},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  year={2024}
}

dices/350 (default config)

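Config description: Dataset 350 contains 350 conversations rated by a diverse pool of 123 unique raters. Each conversation is rated on five top-level safety categories and one question about overall comprehension of the conversation. Raters were recruited to be balanced by gender (man or woman) and race/ethnicity (White, Black, Latine, Asian, Multiracial), and each rater rated all conversations. Each conversation therefore has 123 unique ratings. The total number of rows in this dataset is 43,050.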
Download size: 29.70 MiB
Dataset size: 74.43 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'train'`	43,050
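
The row count follows from the fully crossed design described for this config: 123 raters times 350 conversations. The sketch below checks that arithmetic and shows one way to test whether a set of (rater, item) pairs is fully crossed. The helper `is_fully_crossed` is a hypothetical name introduced here for illustration and is not part of TFDS or the DICES release.

```python
def is_fully_crossed(pairs):
    """True if every rater rated every item exactly once.

    `pairs` is an iterable of (rater_id, item_id) tuples.
    """
    pairs = list(pairs)
    raters = {rater for rater, _ in pairs}
    items = {item for _, item in pairs}
    # No duplicate pairs, and the number of pairs equals |raters| * |items|.
    return len(pairs) == len(set(pairs)) == len(raters) * len(items)


# Row count implied by the config description: 123 raters x 350 conversations.
expected_rows = 123 * 350
print(expected_rows)  # 43050

# Synthetic toy design with 3 raters and 2 items (NOT real DICES ids).
toy_pairs = [(rater, item) for rater in range(3) for item in range(2)]
print(is_fully_crossed(toy_pairs))       # True
print(is_fully_crossed(toy_pairs[:-1]))  # False: one rating is missing
```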

Feature structure:

FeaturesDict({
    'Q1_whole_conversation_evaluation': ClassLabel(shape=(), dtype=int64, num_classes=4),
    'Q2_harmful_content_dangerous_content': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_derogation_of_boat': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_insensitive': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_medical_legal_financial_or_relationship_advice': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_obscene_and_profane': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_other': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_overall': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_personal_information': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_regulated_goods': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_sexually_suggestive_content': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_other': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_overall': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_targeting_beliefs': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_targeting_gender_sexual_orientation': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_targeting_inherited_attributes': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_targeting_status': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q4_misinformation': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q5_political_affiliation': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q6_policy_guidelines_making_endorsement': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q6_policy_guidelines_other': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q6_policy_guidelines_overall': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q6_policy_guidelines_polarizing_topics': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q_overall': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'answer_time_ms': Scalar(shape=(), dtype=int64, description=Amount of time spent by each rater on each safety annotation question.),
    'answer_timestamp': Scalar(shape=(), dtype=int64, description=Time when each conversation was rated by each rater.),
    'context': Text(shape=(), dtype=string),
    'degree_of_harm': ClassLabel(shape=(), dtype=int64, num_classes=5),
    'harm_type': Sequence(ClassLabel(shape=(), dtype=int64, num_classes=32)),
    'id': Scalar(shape=(), dtype=int64, description=Numerical identifier for each row, representing all ratings by a single rater to a single conversation.),
    'item_id': Scalar(shape=(), dtype=int64, description=Numerical identifier for each conversation.),
    'phase': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'rater_age': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'rater_education': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'rater_gender': ClassLabel(shape=(), dtype=int64, num_classes=4),
    'rater_id': Scalar(shape=(), dtype=int64, description=Numerical identifier for each rater.),
    'rater_race': ClassLabel(shape=(), dtype=int64, num_classes=6),
    'rater_raw_race': Text(shape=(), dtype=string),
    'response': Text(shape=(), dtype=string),
    'safety_gold': ClassLabel(shape=(), dtype=int64, num_classes=2),
    'safety_gold_reason': Text(shape=(), dtype=string),
})
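
The `Q2_` through `Q6_` prefixes in the structure above look like they correspond to the five top-level safety categories mentioned in the config description; the source does not state this mapping, so treat it as an assumption. A minimal sketch that groups the question features by prefix (the helper name `group_by_question` is hypothetical):

```python
import re
from collections import defaultdict

# Q2-Q6 feature names copied from the dices/350 feature structure.
Q_FEATURES_350 = [
    "Q2_harmful_content_dangerous_content",
    "Q2_harmful_content_derogation_of_boat",
    "Q2_harmful_content_insensitive",
    "Q2_harmful_content_medical_legal_financial_or_relationship_advice",
    "Q2_harmful_content_obscene_and_profane",
    "Q2_harmful_content_other",
    "Q2_harmful_content_overall",
    "Q2_harmful_content_personal_information",
    "Q2_harmful_content_regulated_goods",
    "Q2_harmful_content_sexually_suggestive_content",
    "Q3_bias_other",
    "Q3_bias_overall",
    "Q3_bias_targeting_beliefs",
    "Q3_bias_targeting_gender_sexual_orientation",
    "Q3_bias_targeting_inherited_attributes",
    "Q3_bias_targeting_status",
    "Q4_misinformation",
    "Q5_political_affiliation",
    "Q6_policy_guidelines_making_endorsement",
    "Q6_policy_guidelines_other",
    "Q6_policy_guidelines_overall",
    "Q6_policy_guidelines_polarizing_topics",
]


def group_by_question(names):
    """Group feature names by their leading 'Q<number>' prefix."""
    groups = defaultdict(list)
    for name in names:
        match = re.match(r"(Q\d+)_", name)
        if match:
            groups[match.group(1)].append(name)
    return dict(groups)


groups = group_by_question(Q_FEATURES_350)
print({question: len(cols) for question, cols in groups.items()})
```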

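Feature documentation:

Feature	Class	Shape	Dtype	Description
	FeaturesDict
Q1_whole_conversation_evaluation	ClassLabel		int64	Rating of the understandability of a conversation.
Q2_harmful_content_dangerous_content	ClassLabel		int64
Q2_harmful_content_derogation_of_boat	ClassLabel		int64
Q2_harmful_content_insensitive	ClassLabel		int64
Q2_harmful_content_medical_legal_financial_or_relationship_advice	ClassLabel		int64
Q2_harmful_content_obscene_and_profane	ClassLabel		int64
Q2_harmful_content_other	ClassLabel		int64
Q2_harmful_content_overall	ClassLabel		int64
Q2_harmful_content_personal_information	ClassLabel		int64
Q2_harmful_content_regulated_goods	ClassLabel		int64
Q2_harmful_content_sexually_suggestive_content	ClassLabel		int64
Q3_bias_other	ClassLabel		int64
Q3_bias_overall	ClassLabel		int64
Q3_bias_targeting_beliefs	ClassLabel		int64
Q3_bias_targeting_gender_sexual_orientation	ClassLabel		int64
Q3_bias_targeting_inherited_attributes	ClassLabel		int64
Q3_bias_targeting_status	ClassLabel		int64
Q4_misinformation	ClassLabel		int64
Q5_political_affiliation	ClassLabel		int64
Q6_policy_guidelines_making_endorsement	ClassLabel		int64
Q6_policy_guidelines_other	ClassLabel		int64
Q6_policy_guidelines_overall	ClassLabel		int64
Q6_policy_guidelines_polarizing_topics	ClassLabel		int64
Q_overall	ClassLabel		int64
answer_time_ms	Scalar		int64	Amount of time spent by each rater on each safety annotation question.
answer_timestamp	Scalar		int64	Time when each conversation was rated by each rater.
context	Text		string	The conversation turns before the final chatbot response.
degree_of_harm	ClassLabel		int64	Hand-annotated rating of the severity of the safety risk.
harm_type	Sequence(ClassLabel)	(None,)	int64	Hand-annotated harm topic(s) of the conversation.
id	Scalar		int64	Numerical identifier for each row, representing all ratings by a single rater to a single conversation.
item_id	Scalar		int64	Numerical identifier for each conversation.
phase	ClassLabel		int64	One of three distinct time periods.
rater_age	ClassLabel		int64	The age group of the rater.
rater_education	ClassLabel		int64	The education of the rater.
rater_gender	ClassLabel		int64	The gender of the rater.
rater_id	Scalar		int64	Numerical identifier for each rater.
rater_race	ClassLabel		int64	The race/ethnicity of the rater.
rater_raw_race	Text		string	The self-reported raw race/ethnicity of the rater, before simplification to five categories.
response	Text		string	The final chatbot response in the conversation.
safety_gold	ClassLabel		int64	The gold-standard safety label provided by experts.
safety_gold_reason	Text		string	The reason(s), if given, for the gold safety label provided by experts.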

Examples (tfds.as_dataframe):
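
A minimal sketch of how such a preview could be produced locally, assuming the `tensorflow-datasets` package is installed and the data can be downloaded. `tfds.load` and `tfds.as_dataframe` are existing TFDS calls; the helper names `dices_name` and `preview_dataframe` are hypothetical and introduced only for this example.

```python
def dices_name(config="350"):
    """Build the TFDS name for a DICES config, e.g. 'dices/350'."""
    if config not in ("350", "990"):
        raise ValueError(f"unknown DICES config: {config!r}")
    return f"dices/{config}"


def preview_dataframe(config="350", n=5):
    """Load the first n 'train' examples as a pandas DataFrame.

    Calling this downloads and prepares the dataset on first use.
    """
    # Imported lazily so the module can be imported without TFDS installed.
    import tensorflow_datasets as tfds

    ds, info = tfds.load(dices_name(config), split="train", with_info=True)
    return tfds.as_dataframe(ds.take(n), info)


print(dices_name())       # dices/350
print(dices_name("990"))  # dices/990
```

Calling `preview_dataframe("350")` would then return the first five rows, with one column per feature listed above.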

dices/990

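Config description: Dataset 990 contains 990 conversations rated by a diverse pool of 173 unique raters. Each conversation is rated on three top-level safety categories and one question about overall comprehension of the conversation. Raters were recruited so that the number of raters for each conversation was balanced by gender (man, woman) and locale (US, India). Each rater rated only a sample of the conversations. Each conversation has 60-70 unique ratings. The total number of rows in this dataset is 72,103.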
Download size: 48.06 MiB
Dataset size: 150.38 MiB
Auto-cached (documentation): Only when shuffle_files=False (train)
Splits:

Split	Examples
`'train'`	72,103
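
Because raters in this config rated only a sample of the conversations, the number of ratings varies per conversation. The sketch below shows one way to count ratings per conversation and to inspect the gender-by-locale balance for a single conversation. It assumes rows are plain dicts with the `item_id`, `rater_gender`, and `rater_locale` features as raw integer class ids; the helper names and the toy rows are hypothetical.

```python
from collections import Counter


def ratings_per_item(rows):
    """Count how many rows (ratings) each conversation received."""
    return Counter(row["item_id"] for row in rows)


def group_counts(rows, item_id, keys=("rater_gender", "rater_locale")):
    """Count ratings of one conversation per (gender, locale) combination."""
    return Counter(
        tuple(row[key] for key in keys)
        for row in rows
        if row["item_id"] == item_id
    )


# Synthetic toy rows (NOT real DICES data); values are raw integer class ids.
toy_rows = [
    {"item_id": 7, "rater_gender": 0, "rater_locale": 0},
    {"item_id": 7, "rater_gender": 0, "rater_locale": 1},
    {"item_id": 7, "rater_gender": 1, "rater_locale": 0},
    {"item_id": 7, "rater_gender": 1, "rater_locale": 1},
    {"item_id": 8, "rater_gender": 0, "rater_locale": 0},
    {"item_id": 8, "rater_gender": 1, "rater_locale": 1},
]

per_item = ratings_per_item(toy_rows)
print(min(per_item.values()), max(per_item.values()))
print(group_counts(toy_rows, item_id=7))
```

Run against the real rows, the minimum and maximum of `per_item` give the observed range of ratings per conversation.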

Feature structure:

FeaturesDict({
    'Q1_whole_conversation_evaluation': ClassLabel(shape=(), dtype=int64, num_classes=4),
    'Q2_harmful_content_explicitly_sexual': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_financial_advice': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_harm_to_people_or_animals': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_health_and_safety_claims': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_how_tos_about_harmful_activities': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_obscenities_or_profanities': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_other': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_overall': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_personal_information': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_promotes_or_condones_use_of_harmful_regulated_goods': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_promotes_or_condones_violence': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q2_harmful_content_violent_or_gory': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_3_Racial_religious_or_other_slurs_and_stereotypes_that_incites_or_promote_hatred_based_on_any_of_the_attributes_noted_above': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_calling_individuals_or_groups_evil_corrupt_or_malicious': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_incites_hatred': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_insults_about_emotional_romantic_and_or_sexual_attraction': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_other': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_physically_or_mentally_inferior': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_subhuman_comparisons': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_bias_superiority_or_subjugation_of_a_group': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q3_unfair_bias_overall': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q4_misinformation_conspiracy_theories': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q4_misinformation_contradicts_expert_consensus': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q4_misinformation_false_theories': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q4_misinformation_other': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q4_misinformation_overall': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q4_misinformation_political': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'Q_overall': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'context': Text(shape=(), dtype=string),
    'degree_of_harm': ClassLabel(shape=(), dtype=int64, num_classes=5),
    'harm_type': Sequence(ClassLabel(shape=(), dtype=int64, num_classes=32)),
    'id': Scalar(shape=(), dtype=int64, description=Numerical identifier for each row, representing all ratings by a single rater to a single conversation.),
    'item_id': Scalar(shape=(), dtype=int64, description=Numerical identifier for each conversation.),
    'phase': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'rater_age': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'rater_education': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'rater_gender': ClassLabel(shape=(), dtype=int64, num_classes=4),
    'rater_id': Scalar(shape=(), dtype=int64, description=Numerical identifier for each rater.),
    'rater_locale': ClassLabel(shape=(), dtype=int64, num_classes=2),
    'rater_race': ClassLabel(shape=(), dtype=int64, num_classes=6),
    'rater_raw_race': Text(shape=(), dtype=string),
    'response': Text(shape=(), dtype=string),
})

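Feature documentation:

Feature	Class	Shape	Dtype	Description
	FeaturesDict
Q1_whole_conversation_evaluation	ClassLabel		int64	Rating of the understandability of a conversation.
Q2_harmful_content_explicitly_sexual	ClassLabel		int64
Q2_harmful_content_financial_advice	ClassLabel		int64
Q2_harmful_content_harm_to_people_or_animals	ClassLabel		int64
Q2_harmful_content_health_and_safety_claims	ClassLabel		int64
Q2_harmful_content_how_tos_about_harmful_activities	ClassLabel		int64
Q2_harmful_content_obscenities_or_profanities	ClassLabel		int64
Q2_harmful_content_other	ClassLabel		int64
Q2_harmful_content_overall	ClassLabel		int64
Q2_harmful_content_personal_information	ClassLabel		int64
Q2_harmful_content_promotes_or_condones_use_of_harmful_regulated_goods	ClassLabel		int64
Q2_harmful_content_promotes_or_condones_violence	ClassLabel		int64
Q2_harmful_content_violent_or_gory	ClassLabel		int64
Q3_3_Racial_religious_or_other_slurs_and_stereotypes_that_incites_or_promote_hatred_based_on_any_of_the_attributes_noted_above	ClassLabel		int64
Q3_bias_calling_individuals_or_groups_evil_corrupt_or_malicious	ClassLabel		int64
Q3_bias_incites_hatred	ClassLabel		int64
Q3_bias_insults_about_emotional_romantic_and_or_sexual_attraction	ClassLabel		int64
Q3_bias_other	ClassLabel		int64
Q3_bias_physically_or_mentally_inferior	ClassLabel		int64
Q3_bias_subhuman_comparisons	ClassLabel		int64
Q3_bias_superiority_or_subjugation_of_a_group	ClassLabel		int64
Q3_unfair_bias_overall	ClassLabel		int64
Q4_misinformation_conspiracy_theories	ClassLabel		int64
Q4_misinformation_contradicts_expert_consensus	ClassLabel		int64
Q4_misinformation_false_theories	ClassLabel		int64
Q4_misinformation_other	ClassLabel		int64
Q4_misinformation_overall	ClassLabel		int64
Q4_misinformation_political	ClassLabel		int64
Q_overall	ClassLabel		int64
context	Text		string	The conversation turns before the final chatbot response.
degree_of_harm	ClassLabel		int64	Hand-annotated rating of the severity of the safety risk.
harm_type	Sequence(ClassLabel)	(None,)	int64	Hand-annotated harm topic(s) of the conversation.
id	Scalar		int64	Numerical identifier for each row, representing all ratings by a single rater to a single conversation.
item_id	Scalar		int64	Numerical identifier for each conversation.
phase	ClassLabel		int64	One of three distinct time periods.
rater_age	ClassLabel		int64	The age group of the rater.
rater_education	ClassLabel		int64	The education of the rater.
rater_gender	ClassLabel		int64	The gender of the rater.
rater_id	Scalar		int64	Numerical identifier for each rater.
rater_locale	ClassLabel		int64	The locale of the rater.
rater_race	ClassLabel		int64	The race/ethnicity of the rater.
rater_raw_race	Text		string	The self-reported raw race/ethnicity of the rater, before simplification to five categories.
response	Text		string	The final chatbot response in the conversation.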

Examples (tfds.as_dataframe):