q_re_cc

Description:

A dataset containing 14,000 conversations with 81,000 question-answer pairs. QReCC is built on questions from TREC CAsT, QuAC and Google Natural Questions.

Homepage: https://github.com/apple/ml-qrecc
Source code: tfds.text.qrecc.QReCC
Versions:
- 1.0.0 (default): Initial release.
Download size: 7.60 MiB
Dataset size: 69.29 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	16,451
`'train'`	63,501
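Each example in this dataset is a single conversation turn, so the split table can be summarized with simple arithmetic. The sketch below only restates the counts from the table and computes each split's share of the total; nothing beyond those two numbers is assumed.

```python
# Example counts per split, copied from the split table.
SPLITS = {"train": 63_501, "test": 16_451}

# Total number of examples (turns) across both splits.
total = sum(SPLITS.values())

# Fraction of all examples that falls into each split.
fractions = {name: count / total for name, count in SPLITS.items()}

print(total)  # 79952
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")  # train: 79.4%, then test: 20.6%
```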

Feature structure:

FeaturesDict({
    'answer': Text(shape=(), dtype=string),
    'answer_url': Text(shape=(), dtype=string),
    'context': Sequence(Text(shape=(), dtype=string)),
    'conversation_id': Scalar(shape=(), dtype=int32, description=The id of the conversation.),
    'question': Text(shape=(), dtype=string),
    'question_rewrite': Text(shape=(), dtype=string),
    'source': Text(shape=(), dtype=string),
    'turn_id': Scalar(shape=(), dtype=int32, description=The id of the conversation turn, within a conversation.),
})
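Iterating with `tfds.as_numpy` (for example over `tfds.load('q_re_cc', split='train')`) yields NumPy values, so the `Text` fields typically arrive as `bytes`, `context` as an array of `bytes`, and the two ids as NumPy integers. The sketch below converts one such record into plain Python types, assuming UTF-8 text; `decode_example` and the toy record are hypothetical illustrations, not part of TFDS.

```python
from typing import Any, Dict, Union

# Field names taken from the FeaturesDict above.
TEXT_KEYS = ("answer", "answer_url", "question", "question_rewrite", "source")
INT_KEYS = ("conversation_id", "turn_id")


def _to_str(value: Union[bytes, str]) -> str:
    """Decode UTF-8 bytes to str; pass str through unchanged."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


def decode_example(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: convert one record to plain Python types.

    Assumes Text fields are bytes or str, `context` is an iterable of
    bytes/str, and the id fields are integer-like (e.g. numpy.int32).
    """
    example: Dict[str, Any] = {key: _to_str(raw[key]) for key in TEXT_KEYS}
    example["context"] = [_to_str(turn) for turn in raw["context"]]
    for key in INT_KEYS:
        example[key] = int(raw[key])
    return example


# Toy record with made-up values, shaped like the FeaturesDict above.
raw = {
    "answer": b"an answer",
    "answer_url": b"https://example.com/page",
    "context": [b"first question", b"first answer"],
    "conversation_id": 1,
    "question": b"a follow-up question",
    "question_rewrite": b"a self-contained follow-up question",
    "source": b"QuAC",
    "turn_id": 2,
}
print(decode_example(raw)["context"])  # ['first question', 'first answer']
```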

Feature documentation:

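Feature	Class	Shape	Dtype	Description
	FeaturesDict
answer	Text		string
answer_url	Text		string
context	Sequence(Text)	(None,)	string
conversation_id	Scalar		int32	The id of the conversation.
question	Text		string
question_rewrite	Text		string
source	Text		string	The original source of the data: one of QuAC, CAsT or Natural Questions.
turn_id	Scalar		int32	The id of the conversation turn, within a conversation.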
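Because `conversation_id` identifies the conversation and `turn_id` gives the turn's position within it, the flat examples of a split can be regrouped into ordered dialogues. A minimal sketch, assuming the examples are already decoded to Python dicts with integer ids and that ids are only compared within a single split; `group_conversations` and the toy turns are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def group_conversations(
    examples: Iterable[Dict[str, Any]],
) -> Dict[int, List[Dict[str, Any]]]:
    """Group decoded turns by conversation_id, each list ordered by turn_id."""
    conversations: Dict[int, List[Dict[str, Any]]] = defaultdict(list)
    for ex in examples:
        conversations[ex["conversation_id"]].append(ex)
    # Sort each conversation's turns in place by their position.
    for turns in conversations.values():
        turns.sort(key=lambda ex: ex["turn_id"])
    return dict(conversations)


# Toy turns with made-up values, deliberately out of order.
turns = [
    {"conversation_id": 7, "turn_id": 2, "question": "q2"},
    {"conversation_id": 3, "turn_id": 1, "question": "a"},
    {"conversation_id": 7, "turn_id": 1, "question": "q1"},
]
grouped = group_conversations(turns)
print([t["question"] for t in grouped[7]])  # ['q1', 'q2']
```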

Supervised keys (see as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Examples (tfds.as_dataframe):
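Since the supervised keys are None, loading with `as_supervised=True` has no (input, target) pair to return, so a question-rewriting setup has to build its own pairs. A minimal sketch, assuming the task is mapping a conversational `question` plus its `context` to `question_rewrite`, and assuming `context` holds the preceding turns as a list of already-decoded strings; the separator `SEP` and the helper `to_rewrite_pair` are hypothetical choices, not part of TFDS or QReCC.

```python
from typing import Any, Dict, Tuple

# Hypothetical separator placed between history entries and the question.
SEP = " ||| "


def to_rewrite_pair(example: Dict[str, Any]) -> Tuple[str, str]:
    """Build an (input, target) pair for question rewriting from one decoded turn."""
    # Input: full conversation history followed by the current question.
    source_text = SEP.join(list(example["context"]) + [example["question"]])
    # Target: the self-contained rewrite of the current question.
    return source_text, example["question_rewrite"]


# Toy turn with made-up values.
example = {
    "context": ["first question", "first answer"],
    "question": "a follow-up question",
    "question_rewrite": "a self-contained follow-up question",
}
inputs, target = to_rewrite_pair(example)
print(inputs)  # first question ||| first answer ||| a follow-up question
print(target)  # a self-contained follow-up question
```

Whether to prepend the full history or only the last few turns is a modeling choice; this sketch keeps all of it.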

Citation:

@article{qrecc,
  title={Open-Domain Question Answering Goes Conversational via Question Rewriting},
  author={Anantha, Raviteja and Vakulenko, Svitlana and Tu, Zhucheng and Longpre, Shayne and Pulman, Stephen and Chappidi, Srinivas},
  journal={Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
  year={2021}
}