- विवरण :
जीईएम मानव एनोटेशन और स्वचालित मेट्रिक्स दोनों के माध्यम से इसके मूल्यांकन पर ध्यान देने के साथ प्राकृतिक भाषा निर्माण के लिए एक बेंचमार्क वातावरण है।
GEM का लक्ष्य है: (1) कई NLG कार्यों और भाषाओं में फैले 13 डेटासेट में NLG की प्रगति को मापना। (2) डेटा स्टेटमेंट और चैलेंज सेट के माध्यम से प्रस्तुत डेटा और मॉडल का गहन विश्लेषण प्रदान करें। (3) स्वचालित और मानव मैट्रिक्स दोनों का उपयोग करके उत्पन्न पाठ के मूल्यांकन के लिए मानक विकसित करना।
अधिक जानकारी https://gem-benchmark.com पर मिल सकती है।
होमपेज : https://gem-benchmark.com
स्रोत कोड :
tfds.text.gem.Gem
संस्करण :
-
1.0.0
: प्रारंभिक संस्करण -
1.0.1
: MLSum के लिए ख़राब लिंक फ़िल्टर अपडेट करें -
1.1.0
(डिफ़ॉल्ट): चैलेंज सेट की रिलीज़
-
पर्यवेक्षित कुंजियाँ (
as_supervised
doc देखें):None
चित्र ( tfds.show_examples ): समर्थित नहीं है।
मणि/common_gen (डिफ़ॉल्ट कॉन्फ़िगरेशन)
कॉन्फिग विवरण : कॉमनजेन एक विवश पाठ निर्माण कार्य है, जो बेंचमार्क डेटासेट से जुड़ा है, स्पष्ट रूप से जनरेटिव कॉमन्सेंस रीजनिंग की क्षमता के लिए मशीनों का परीक्षण करता है। सामान्य अवधारणाओं के एक सेट को देखते हुए; कार्य इन अवधारणाओं का उपयोग करके रोजमर्रा के परिदृश्य का वर्णन करने वाला एक सुसंगत वाक्य उत्पन्न करना है।
डाउनलोड आकार :
1.84 MiB
डेटासेट का आकार :
16.84 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): हाँ
विभाजन :
विभाजित करना | उदाहरण |
---|---|
'challenge_test_scramble' | 500 |
'challenge_train_sample' | 500 |
'challenge_validation_sample' | 500 |
'test' | 1,497 |
'train' | 67,389 |
'validation' | 993 |
- फ़ीचर संरचना :
FeaturesDict({
'concept_set_id': int32,
'concepts': Sequence(string),
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'target': string,
})
- फ़ीचर दस्तावेज़ीकरण :
विशेषता | कक्षा | आकार | डीटाइप | विवरण |
---|---|---|---|---|
विशेषताएं डिक्ट | ||||
अवधारणा_सेट_आईडी | टेन्सर | int32 | ||
अवधारणाओं | अनुक्रम (टेंसर) | (कोई भी नहीं,) | डोरी | |
Gem_id | टेन्सर | डोरी | ||
Gem_parent_id | टेन्सर | डोरी | ||
संदर्भ | अनुक्रम (टेंसर) | (कोई भी नहीं,) | डोरी | |
लक्ष्य | टेन्सर | डोरी |
- उदाहरण ( tfds.as_dataframe ):
- उद्धरण :
@inproceedings{lin2020commongen,
title = "CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning",
author = "Lin, Bill Yuchen and
Zhou, Wangchunshu and
Shen, Ming and
Zhou, Pei and
Bhagavatula, Chandra and
Choi, Yejin and
Ren, Xiang",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.findings-emnlp.165",
pages = "1823--1840",
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
जेम/cs_restaurants
कॉन्फिग विवरण : कार्य एक (काल्पनिक) संवाद प्रणाली के संदर्भ में प्रतिक्रिया उत्पन्न कर रहा है जो रेस्तरां के बारे में जानकारी प्रदान करता है। इनपुट एक मूल आशय/संवाद अधिनियम प्रकार और स्लॉट्स (विशेषताओं) और उनके मूल्यों की एक सूची है। आउटपुट एक प्राकृतिक भाषा वाक्य है।
डाउनलोड आकार :
1.46 MiB
डेटासेट का आकार :
2.71 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): हाँ
विभाजन :
विभाजित करना | उदाहरण |
---|---|
'challenge_test_scramble' | 500 |
'challenge_train_sample' | 500 |
'challenge_validation_sample' | 500 |
'test' | 842 |
'train' | 3,569 |
'validation' | 781 |
- फ़ीचर संरचना :
FeaturesDict({
'dialog_act': string,
'dialog_act_delexicalized': string,
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'target': string,
'target_delexicalized': string,
})
- फ़ीचर दस्तावेज़ीकरण :
विशेषता | कक्षा | आकार | डीटाइप | विवरण |
---|---|---|---|---|
विशेषताएं डिक्ट | ||||
डायलॉग_एक्ट | टेन्सर | डोरी | ||
dialo_act_delexicalized | टेन्सर | डोरी | ||
Gem_id | टेन्सर | डोरी | ||
Gem_parent_id | टेन्सर | डोरी | ||
संदर्भ | अनुक्रम (टेंसर) | (कोई भी नहीं,) | डोरी | |
लक्ष्य | टेन्सर | डोरी | ||
target_delexicalized | टेन्सर | डोरी |
- उदाहरण ( tfds.as_dataframe ):
- उद्धरण :
@inproceedings{cs_restaurants,
address = {Tokyo, Japan},
title = {Neural {Generation} for {Czech}: {Data} and {Baselines} },
shorttitle = {Neural {Generation} for {Czech} },
url = {https://www.aclweb.org/anthology/W19-8670/},
urldate = {2019-10-18},
booktitle = {Proceedings of the 12th {International} {Conference} on {Natural} {Language} {Generation} ({INLG} 2019)},
author = {Dušek, Ondřej and Jurčíček, Filip},
month = oct,
year = {2019},
pages = {563--574}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
मणि / डार्ट
कॉन्फिग विवरण : DART उच्च गुणवत्ता वाले वाक्य एनोटेशन के साथ एक बड़ा और ओपन-डोमेन संरचित डेटा रिकॉर्ड टू टेक्स्ट जनरेशन कॉर्पस है, जिसमें प्रत्येक इनपुट ट्री-स्ट्रक्चर्ड ऑन्कोलॉजी के बाद एंटिटी-रिलेशन ट्रिपल का एक सेट है।
डाउनलोड आकार :
28.01 MiB
डेटासेट का आकार :
33.78 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): हाँ
विभाजन :
विभाजित करना | उदाहरण |
---|---|
'test' | 6,959 |
'train' | 62,659 |
'validation' | 2,768 |
- फ़ीचर संरचना :
FeaturesDict({
'dart_id': int32,
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'subtree_was_extended': bool,
'target': string,
'target_sources': Sequence(string),
'tripleset': Sequence(string),
})
- फ़ीचर दस्तावेज़ीकरण :
विशेषता | कक्षा | आकार | डीटाइप | विवरण |
---|---|---|---|---|
विशेषताएं डिक्ट | ||||
dart_id | टेन्सर | int32 | ||
Gem_id | टेन्सर | डोरी | ||
Gem_parent_id | टेन्सर | डोरी | ||
संदर्भ | अनुक्रम (टेंसर) | (कोई भी नहीं,) | डोरी | |
subtree_was_extended | टेन्सर | बूल | ||
लक्ष्य | टेन्सर | डोरी | ||
target_sources | अनुक्रम (टेंसर) | (कोई भी नहीं,) | डोरी | |
tripleset | अनुक्रम (टेंसर) | (कोई भी नहीं,) | डोरी |
- उदाहरण ( tfds.as_dataframe ):
- उद्धरण :
@article{radev2020dart,
title=Dart: Open-domain structured data record to text generation,
author={Radev, Dragomir and Zhang, Rui and Rau, Amrit and Sivaprasad, Abhinand and Hsieh, Chiachun and Rajani, Nazneen Fatema and Tang, Xiangru and Vyas, Aadit and Verma, Neha and Krishna, Pranav and others},
journal={arXiv preprint arXiv:2007.02871},
year={2020}
}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
मणि/e2e_nlg
कॉन्फ़िगरेशन विवरण : E2E डेटासेट को एक सीमित-डोमेन डेटा-टू-टेक्स्ट कार्य के लिए डिज़ाइन किया गया है - 8 अलग-अलग विशेषताओं (नाम, क्षेत्र, मूल्य सीमा आदि) के आधार पर रेस्तरां विवरण/सिफारिशें तैयार करना।
डाउनलोड आकार :
13.99 MiB
डेटासेट का आकार :
16.92 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): हाँ
विभाजन :
विभाजित करना | उदाहरण |
---|---|
'challenge_test_scramble' | 500 |
'challenge_train_sample' | 500 |
'challenge_validation_sample' | 500 |
'test' | 4,693 |
'train' | 33,525 |
'validation' | 4,299 |
- फ़ीचर संरचना :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'meaning_representation': string,
'references': Sequence(string),
'target': string,
})
- फ़ीचर दस्तावेज़ीकरण :
विशेषता | कक्षा | आकार | डीटाइप | विवरण |
---|---|---|---|---|
विशेषताएं डिक्ट | ||||
Gem_id | टेन्सर | डोरी | ||
Gem_parent_id | टेन्सर | डोरी | ||
meaning_representation | टेन्सर | डोरी | ||
संदर्भ | अनुक्रम (टेंसर) | (कोई भी नहीं,) | डोरी | |
लक्ष्य | टेन्सर | डोरी |
- उदाहरण ( tfds.as_dataframe ):
- उद्धरण :
@inproceedings{e2e_cleaned,
address = {Tokyo, Japan},
title = {Semantic {Noise} {Matters} for {Neural} {Natural} {Language} {Generation} },
url = {https://www.aclweb.org/anthology/W19-8652/},
booktitle = {Proceedings of the 12th {International} {Conference} on {Natural} {Language} {Generation} ({INLG} 2019)},
author = {Dušek, Ondřej and Howcroft, David M and Rieser, Verena},
year = {2019},
pages = {421--426},
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
मणि/mlsum_de
कॉन्फ़िग विवरण : MLSum एक बड़े पैमाने पर बहुभाषी सारांश डेटासेट है। यह ऑनलाइन समाचार आउटलेट्स से निर्मित है, यह विभाजन जर्मन पर केंद्रित है।
डाउनलोड आकार :
345.98 MiB
डेटासेट का आकार :
963.60 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): नहीं
विभाजन :
विभाजित करना | उदाहरण |
---|---|
'challenge_test_covid' | 5,058 |
'challenge_train_sample' | 500 |
'challenge_validation_sample' | 500 |
'test' | 10,695 |
'train' | 220,748 |
'validation' | 11,392 |
- फ़ीचर संरचना :
FeaturesDict({
'date': string,
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'target': string,
'text': string,
'title': string,
'topic': string,
'url': string,
})
- फ़ीचर दस्तावेज़ीकरण :
विशेषता | कक्षा | आकार | डीटाइप | विवरण |
---|---|---|---|---|
विशेषताएं डिक्ट | ||||
दिनांक | टेन्सर | डोरी | ||
Gem_id | टेन्सर | डोरी | ||
Gem_parent_id | टेन्सर | डोरी | ||
संदर्भ | अनुक्रम (टेंसर) | (कोई भी नहीं,) | डोरी | |
लक्ष्य | टेन्सर | डोरी | ||
मूलपाठ | टेन्सर | डोरी | ||
शीर्षक | टेन्सर | डोरी | ||
विषय | टेन्सर | डोरी | ||
यूआरएल | टेन्सर | डोरी |
- उदाहरण ( tfds.as_dataframe ):
- उद्धरण :
@inproceedings{scialom-etal-2020-mlsum,
title = "{MLSUM}: The Multilingual Summarization Corpus",
author = {Scialom, Thomas and Dray, Paul-Alexis and Lamprier, Sylvain and Piwowarski, Benjamin and Staiano, Jacopo},
booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
year = {2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
मणि/mlsum_es
कॉन्फ़िग विवरण : MLSum एक बड़े पैमाने पर बहुभाषी सारांश डेटासेट है। यह ऑनलाइन समाचार आउटलेट्स से निर्मित है, यह विभाजन स्पेनिश पर केंद्रित है।
डाउनलोड आकार :
501.27 MiB
डेटासेट का आकार :
1.29 GiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): नहीं
विभाजन :
विभाजित करना | उदाहरण |
---|---|
'challenge_test_covid' | 1,938 |
'challenge_train_sample' | 500 |
'challenge_validation_sample' | 500 |
'test' | 13,366 |
'train' | 259,888 |
'validation' | 9,977 |
- फ़ीचर संरचना :
FeaturesDict({
'date': string,
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'target': string,
'text': string,
'title': string,
'topic': string,
'url': string,
})
- फ़ीचर दस्तावेज़ीकरण :
विशेषता | कक्षा | आकार | डीटाइप | विवरण |
---|---|---|---|---|
विशेषताएं डिक्ट | ||||
दिनांक | टेन्सर | डोरी | ||
Gem_id | टेन्सर | डोरी | ||
Gem_parent_id | टेन्सर | डोरी | ||
संदर्भ | अनुक्रम (टेंसर) | (कोई भी नहीं,) | डोरी | |
लक्ष्य | टेन्सर | डोरी | ||
मूलपाठ | टेन्सर | डोरी | ||
शीर्षक | टेन्सर | डोरी | ||
विषय | टेन्सर | डोरी | ||
यूआरएल | टेन्सर | डोरी |
- उदाहरण ( tfds.as_dataframe ):
- उद्धरण :
@inproceedings{scialom-etal-2020-mlsum,
title = "{MLSUM}: The Multilingual Summarization Corpus",
author = {Scialom, Thomas and Dray, Paul-Alexis and Lamprier, Sylvain and Piwowarski, Benjamin and Staiano, Jacopo},
booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
year = {2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
मणि/स्कीमा_गाइडेड_डायलॉग
कॉन्फ़िगरेशन विवरण : स्कीमा-गाइडेड डायलॉग (SGD) डेटासेट में एक मानव और एक आभासी सहायक के बीच 18K बहु-डोमेन कार्य-उन्मुख संवाद होते हैं, जिसमें बैंकों और घटनाओं से लेकर मीडिया, कैलेंडर, यात्रा और मौसम तक के 17 डोमेन शामिल होते हैं।
डाउनलोड आकार :
17.00 MiB
डेटासेट का आकार :
201.19 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): हां (challenge_test_backtranslation, Challenge_test_bfp02, Challenge_test_bfp05, Challenge_test_nopunc, Challenge_test_scramble, Challenge_train_sample, Challenge_validation_sample, परीक्षण, सत्यापन), केवल जब
shuffle_files=False
(ट्रेन)विभाजन :
विभाजित करना | उदाहरण |
---|---|
'challenge_test_backtranslation' | 500 |
'challenge_test_bfp02' | 500 |
'challenge_test_bfp05' | 500 |
'challenge_test_nopunc' | 500 |
'challenge_test_scramble' | 500 |
'challenge_train_sample' | 500 |
'challenge_validation_sample' | 500 |
'test' | 10,000 |
'train' | 164,982 |
'validation' | 10,000 |
- फ़ीचर संरचना :
FeaturesDict({
'context': Sequence(string),
'dialog_acts': Sequence({
'act': ClassLabel(shape=(), dtype=int64, num_classes=18),
'slot': string,
'values': Sequence(string),
}),
'dialog_id': string,
'gem_id': string,
'gem_parent_id': string,
'prompt': string,
'references': Sequence(string),
'service': string,
'target': string,
'turn_id': int32,
})
- फ़ीचर दस्तावेज़ीकरण :
विशेषता | कक्षा | आकार | डीटाइप | विवरण |
---|---|---|---|---|
विशेषताएं डिक्ट | ||||
संदर्भ | अनुक्रम (टेंसर) | (कोई भी नहीं,) | डोरी | |
dial_acts | क्रम | |||
डायलॉग_एक्ट्स/एक्ट | क्लासलेबल | int64 | ||
डायलॉग_एक्ट्स/स्लॉट | टेन्सर | डोरी | ||
डायलॉग_एक्ट्स/वैल्यू | अनुक्रम (टेंसर) | (कोई भी नहीं,) | डोरी | |
डायलॉग_आईडी | टेन्सर | डोरी | ||
Gem_id | टेन्सर | डोरी | ||
Gem_parent_id | टेन्सर | डोरी | ||
तत्पर | टेन्सर | डोरी | ||
संदर्भ | अनुक्रम (टेंसर) | (कोई भी नहीं,) | डोरी | |
सर्विस | टेन्सर | डोरी | ||
लक्ष्य | टेन्सर | डोरी | ||
टर्न_आईडी | टेन्सर | int32 |
- उदाहरण ( tfds.as_dataframe ):
- उद्धरण :
@article{rastogi2019towards,
title={Towards Scalable Multi-domain Conversational Agents: The Schema-Guided Dialogue Dataset},
author={Rastogi, Abhinav and Zang, Xiaoxue and Sunkara, Srinivas and Gupta, Raghav and Khaitan, Pranav},
journal={arXiv preprint arXiv:1909.05855},
year={2019}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
रत्न / टोटो
कॉन्फ़िग विवरण : ToTTo एक टेबल-टू-टेक्स्ट NLG कार्य है। कार्य इस प्रकार है: पंक्ति नाम, स्तंभ नाम और तालिका कक्षों के साथ एक विकिपीडिया तालिका दी गई है, जिसमें कोशिकाओं का एक सबसेट हाइलाइट किया गया है, तालिका के हाइलाइट किए गए भाग के लिए एक प्राकृतिक भाषा विवरण तैयार करें।
डाउनलोड आकार :
180.75 MiB
डेटासेट का आकार :
645.86 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): नहीं
विभाजन :
विभाजित करना | उदाहरण |
---|---|
'challenge_test_scramble' | 500 |
'challenge_train_sample' | 500 |
'challenge_validation_sample' | 500 |
'test' | 7,700 |
'train' | 121,153 |
'validation' | 7,700 |
- फ़ीचर संरचना :
FeaturesDict({
'example_id': string,
'gem_id': string,
'gem_parent_id': string,
'highlighted_cells': Sequence(Sequence(int32)),
'overlap_subset': string,
'references': Sequence(string),
'sentence_annotations': Sequence({
'final_sentence': string,
'original_sentence': string,
'sentence_after_ambiguity': string,
'sentence_after_deletion': string,
}),
'table': Sequence(Sequence({
'column_span': int32,
'is_header': bool,
'row_span': int32,
'value': string,
})),
'table_page_title': string,
'table_section_text': string,
'table_section_title': string,
'table_webpage_url': string,
'target': string,
'totto_id': int32,
})
- फ़ीचर दस्तावेज़ीकरण :
विशेषता | कक्षा | आकार | डीटाइप | विवरण |
---|---|---|---|---|
विशेषताएं डिक्ट | ||||
example_id | टेन्सर | डोरी | ||
Gem_id | टेन्सर | डोरी | ||
Gem_parent_id | टेन्सर | डोरी | ||
Highlight_cells | अनुक्रम (अनुक्रम (टेंसर)) | (कोई नहीं, कोई नहीं) | int32 | |
ओवरलैप_उपसेट | टेन्सर | डोरी | ||
संदर्भ | अनुक्रम (टेंसर) | (कोई भी नहीं,) | डोरी | |
वाक्य_टिप्पणियाँ | क्रम | |||
वाक्य_एनोटेशन/अंतिम_वाक्य | टेन्सर | डोरी | ||
वाक्य_टिप्पणियां/मूल_वाक्य | टेन्सर | डोरी | ||
वाक्य_टिप्पणियां/वाक्य_बाद_अस्पष्टता | टेन्सर | डोरी | ||
वाक्य_टिप्पणी/वाक्य_बाद_हटाना | टेन्सर | डोरी | ||
मेज़ | क्रम | |||
टेबल/कॉलम_स्पैन | टेन्सर | int32 | ||
टेबल/is_header | टेन्सर | बूल | ||
टेबल/row_span | टेन्सर | int32 | ||
तालिका / मूल्य | टेन्सर | डोरी | ||
टेबल_पेज_टाइटल | टेन्सर | डोरी | ||
टेबल_सेक्शन_टेक्स्ट | टेन्सर | डोरी | ||
टेबल_सेक्शन_टाइटल | टेन्सर | डोरी | ||
टेबल_वेबपेज_url | टेन्सर | डोरी | ||
लक्ष्य | टेन्सर | डोरी | ||
totto_id | टेन्सर | int32 |
- उदाहरण ( tfds.as_dataframe ):
- उद्धरण :
@inproceedings{parikh2020totto,
title=ToTTo: A Controlled Table-To-Text Generation Dataset,
author={Parikh, Ankur and Wang, Xuezhi and Gehrmann, Sebastian and Faruqui, Manaal and Dhingra, Bhuwan and Yang, Diyi and Das, Dipanjan},
booktitle={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
pages={1173--1186},
year={2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
रत्न/web_nlg_en
कॉन्फिग विवरण : वेबएनएलजी समानांतर डीबीपीडिया ट्रिपल सेट और छोटे टेक्स्ट का एक द्विभाषी डेटासेट (अंग्रेजी, रूसी) है जो लगभग 450 विभिन्न डीबीपीडिया गुणों को कवर करता है। वेबएनएलजी डेटा मूल रूप से लघु पाठ उत्पन्न करने और माइक्रो-प्लानिंग को संभालने में सक्षम आरडीएफ वर्बलाइजर्स के विकास को बढ़ावा देने के लिए बनाया गया था।
डाउनलोड आकार :
12.57 MiB
डेटासेट का आकार :
19.91 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): हाँ
विभाजन :
विभाजित करना | उदाहरण |
---|---|
'challenge_test_numbers' | 500 |
'challenge_test_scramble' | 500 |
'challenge_train_sample' | 502 |
'challenge_validation_sample' | 499 |
'test' | 1,779 |
'train' | 35,426 |
'validation' | 1,667 |
- फ़ीचर संरचना :
FeaturesDict({
'category': string,
'gem_id': string,
'gem_parent_id': string,
'input': Sequence(string),
'references': Sequence(string),
'target': string,
'webnlg_id': string,
})
- फ़ीचर दस्तावेज़ीकरण :
विशेषता | कक्षा | आकार | डीटाइप | विवरण |
---|---|---|---|---|
विशेषताएं डिक्ट | ||||
श्रेणी | टेन्सर | डोरी | ||
Gem_id | टेन्सर | डोरी | ||
Gem_parent_id | टेन्सर | डोरी | ||
इनपुट | अनुक्रम (टेंसर) | (कोई भी नहीं,) | डोरी | |
संदर्भ | अनुक्रम (टेंसर) | (कोई भी नहीं,) | डोरी | |
लक्ष्य | टेन्सर | डोरी | ||
webnlg_id | टेन्सर | डोरी |
- उदाहरण ( tfds.as_dataframe ):
- उद्धरण :
@inproceedings{gardent2017creating,
author = "Gardent, Claire
and Shimorina, Anastasia
and Narayan, Shashi
and Perez-Beltrachini, Laura",
title = "Creating Training Corpora for NLG Micro-Planners",
booktitle = "Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
year = "2017",
publisher = "Association for Computational Linguistics",
pages = "179--188",
location = "Vancouver, Canada",
doi = "10.18653/v1/P17-1017",
url = "http://www.aclweb.org/anthology/P17-1017"
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
मणि/web_nlg_ru
कॉन्फिग विवरण : वेबएनएलजी समानांतर डीबीपीडिया ट्रिपल सेट और छोटे टेक्स्ट का एक द्विभाषी डेटासेट (अंग्रेजी, रूसी) है जो लगभग 450 विभिन्न डीबीपीडिया गुणों को कवर करता है। वेबएनएलजी डेटा मूल रूप से लघु पाठ उत्पन्न करने और माइक्रो-प्लानिंग को संभालने में सक्षम आरडीएफ वर्बलाइजर्स के विकास को बढ़ावा देने के लिए बनाया गया था।
डाउनलोड आकार :
7.49 MiB
डेटासेट का आकार :
11.30 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): हाँ
विभाजन :
विभाजित करना | उदाहरण |
---|---|
'challenge_test_scramble' | 500 |
'challenge_train_sample' | 501 |
'challenge_validation_sample' | 500 |
'test' | 1,102 |
'train' | 14,630 |
'validation' | 790 |
- फ़ीचर संरचना :
FeaturesDict({
'category': string,
'gem_id': string,
'gem_parent_id': string,
'input': Sequence(string),
'references': Sequence(string),
'target': string,
'webnlg_id': string,
})
- फ़ीचर दस्तावेज़ीकरण :
विशेषता | कक्षा | आकार | डीटाइप | विवरण |
---|---|---|---|---|
विशेषताएं डिक्ट | ||||
श्रेणी | टेन्सर | डोरी | ||
Gem_id | टेन्सर | डोरी | ||
Gem_parent_id | टेन्सर | डोरी | ||
इनपुट | अनुक्रम (टेंसर) | (कोई भी नहीं,) | डोरी | |
संदर्भ | अनुक्रम (टेंसर) | (कोई भी नहीं,) | डोरी | |
लक्ष्य | टेन्सर | डोरी | ||
webnlg_id | टेन्सर | डोरी |
- उदाहरण ( tfds.as_dataframe ):
- उद्धरण :
@inproceedings{gardent2017creating,
author = "Gardent, Claire
and Shimorina, Anastasia
and Narayan, Shashi
and Perez-Beltrachini, Laura",
title = "Creating Training Corpora for NLG Micro-Planners",
booktitle = "Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
year = "2017",
publisher = "Association for Computational Linguistics",
pages = "179--188",
location = "Vancouver, Canada",
doi = "10.18653/v1/P17-1017",
url = "http://www.aclweb.org/anthology/P17-1017"
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
मणि/wiki_auto_asset_turk
कॉन्फिग विवरण : विकीऑटो वाक्य सरलीकरण प्रणालियों को प्रशिक्षित करने के लिए एक संसाधन के रूप में अंग्रेजी विकिपीडिया और सरल अंग्रेजी विकिपीडिया से संरेखित वाक्यों का एक सेट प्रदान करता है। ASSET और TURK उच्च गुणवत्ता वाले सरलीकरण डेटासेट हैं जिनका उपयोग परीक्षण के लिए किया जाता है।
डाउनलोड आकार :
121.01 MiB
डेटासेट का आकार :
202.40 MiB
Auto-cached ( documentation ): Yes (challenge_test_asset_backtranslation, challenge_test_asset_bfp02, challenge_test_asset_bfp05, challenge_test_asset_nopunc, challenge_test_turk_backtranslation, challenge_test_turk_bfp02, challenge_test_turk_bfp05, challenge_test_turk_nopunc, challenge_train_sample, challenge_validation_sample, test_asset, test_turk, validation), Only when
shuffle_files=False
(train)विभाजन :
विभाजित करना | उदाहरण |
---|---|
'challenge_test_asset_backtranslation' | 359 |
'challenge_test_asset_bfp02' | 359 |
'challenge_test_asset_bfp05' | 359 |
'challenge_test_asset_nopunc' | 359 |
'challenge_test_turk_backtranslation' | 359 |
'challenge_test_turk_bfp02' | 359 |
'challenge_test_turk_bfp05' | 359 |
'challenge_test_turk_nopunc' | 359 |
'challenge_train_sample' | 500 |
'challenge_validation_sample' | 500 |
'test_asset' | 359 |
'test_turk' | 359 |
'train' | 483,801 |
'validation' | 20,000 |
- फ़ीचर संरचना :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'source': string,
'target': string,
})
- फ़ीचर दस्तावेज़ीकरण :
विशेषता | कक्षा | आकार | डीटाइप | विवरण |
---|---|---|---|---|
विशेषताएं डिक्ट | ||||
Gem_id | टेन्सर | डोरी | ||
Gem_parent_id | टेन्सर | डोरी | ||
संदर्भ | अनुक्रम (टेंसर) | (कोई भी नहीं,) | डोरी | |
स्रोत | टेन्सर | डोरी | ||
लक्ष्य | टेन्सर | डोरी |
- उदाहरण ( tfds.as_dataframe ):
- उद्धरण :
@inproceedings{jiang-etal-2020-neural,
title = "Neural {CRF} Model for Sentence Alignment in Text Simplification",
author = "Jiang, Chao and
Maddela, Mounica and
Lan, Wuwei and
Zhong, Yang and
Xu, Wei",
booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
month = jul,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.acl-main.709",
doi = "10.18653/v1/2020.acl-main.709",
pages = "7943--7960",
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
मणि/xsum
कॉन्फ़िग विवरण : डेटासेट अपने चरम रूप में अमूर्त संक्षेपण के कार्य के लिए है, यह एक दस्तावेज़ को एक वाक्य में सारांशित करने के बारे में है।
डाउनलोड का आकार :
246.31 MiB
डेटासेट का आकार :
78.89 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): हाँ
विभाजन :
विभाजित करना | उदाहरण |
---|---|
'challenge_test_backtranslation' | 500 |
'challenge_test_bfp_02' | 500 |
'challenge_test_bfp_05' | 500 |
'challenge_test_covid' | 401 |
'challenge_test_nopunc' | 500 |
'challenge_train_sample' | 500 |
'challenge_validation_sample' | 500 |
'test' | 1,166 |
'train' | 23,206 |
'validation' | 1,117 |
- फ़ीचर संरचना :
FeaturesDict({
'document': string,
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'target': string,
'xsum_id': string,
})
- फ़ीचर दस्तावेज़ीकरण :
विशेषता | कक्षा | आकार | डीटाइप | विवरण |
---|---|---|---|---|
विशेषताएं डिक्ट | ||||
दस्तावेज़ | टेन्सर | डोरी | ||
Gem_id | टेन्सर | डोरी | ||
Gem_parent_id | टेन्सर | डोरी | ||
संदर्भ | अनुक्रम (टेंसर) | (कोई भी नहीं,) | डोरी | |
लक्ष्य | टेन्सर | डोरी | ||
xsum_id | टेन्सर | डोरी |
- उदाहरण ( tfds.as_dataframe ):
- उद्धरण :
@inproceedings{Narayan2018dont,
author = "Shashi Narayan and Shay B. Cohen and Mirella Lapata",
title = "Don't Give Me the Details, Just the Summary! {T}opic-Aware Convolutional Neural Networks for Extreme Summarization",
booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing ",
year = "2018",
address = "Brussels, Belgium",
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
मणि/wiki_lingua_arabic_ar
कॉन्फिग विवरण : विकिलिंगुआ क्रॉस-लिंगुअल एब्सट्रैक्टिव सारांश सिस्टम के मूल्यांकन के लिए एक बड़े पैमाने पर, बहुभाषी डेटासेट है।
डाउनलोड आकार :
56.25 MiB
डेटासेट का आकार :
291.42 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): नहीं
विभाजन :
विभाजित करना | उदाहरण |
---|---|
'test' | 5,841 |
'train' | 20,441 |
'validation' | 2,919 |
- फ़ीचर संरचना :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'source': string,
'source_aligned': Translation({
'ar': Text(shape=(), dtype=string),
'en': Text(shape=(), dtype=string),
}),
'target': string,
'target_aligned': Translation({
'ar': Text(shape=(), dtype=string),
'en': Text(shape=(), dtype=string),
}),
})
- फ़ीचर दस्तावेज़ीकरण :
विशेषता | कक्षा | आकार | डीटाइप | विवरण |
---|---|---|---|---|
विशेषताएं डिक्ट | ||||
Gem_id | टेन्सर | डोरी | ||
Gem_parent_id | टेन्सर | डोरी | ||
संदर्भ | अनुक्रम (टेंसर) | (कोई भी नहीं,) | डोरी | |
स्रोत | टेन्सर | डोरी | ||
स्रोत_गठबंधन | अनुवाद | |||
स्रोत_संरेखित/एआर | मूलपाठ | डोरी | ||
स्रोत_संरेखित/hi | मूलपाठ | डोरी | ||
लक्ष्य | टेन्सर | डोरी | ||
target_aligned | अनुवाद | |||
target_aligned/ar | मूलपाठ | डोरी | ||
लक्ष्य_संरेखित/hi | मूलपाठ | डोरी |
- उदाहरण ( tfds.as_dataframe ):
- उद्धरण :
@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
मणि/wiki_lingua_chinese_zh
कॉन्फिग विवरण : विकिलिंगुआ क्रॉस-लिंगुअल एब्सट्रैक्टिव सारांश सिस्टम के मूल्यांकन के लिए एक बड़े पैमाने पर, बहुभाषी डेटासेट है।
डाउनलोड आकार :
31.38 MiB
डेटासेट का आकार :
122.06 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): हाँ
विभाजन :
विभाजित करना | उदाहरण |
---|---|
'test' | 3,775 |
'train' | 13,211 |
'validation' | 1,886 |
- फ़ीचर संरचना :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'source': string,
'source_aligned': Translation({
'en': Text(shape=(), dtype=string),
'zh': Text(shape=(), dtype=string),
}),
'target': string,
'target_aligned': Translation({
'en': Text(shape=(), dtype=string),
'zh': Text(shape=(), dtype=string),
}),
})
- फ़ीचर दस्तावेज़ीकरण :
विशेषता | कक्षा | आकार | डीटाइप | विवरण |
---|---|---|---|---|
विशेषताएं डिक्ट | ||||
Gem_id | टेन्सर | डोरी | ||
Gem_parent_id | टेन्सर | डोरी | ||
संदर्भ | अनुक्रम (टेंसर) | (कोई भी नहीं,) | डोरी | |
स्रोत | टेन्सर | डोरी | ||
स्रोत_गठबंधन | अनुवाद | |||
स्रोत_संरेखित/hi | मूलपाठ | डोरी | ||
स्रोत_गठबंधन/zh | मूलपाठ | डोरी | ||
लक्ष्य | टेन्सर | डोरी | ||
target_aligned | अनुवाद | |||
लक्ष्य_संरेखित/hi | मूलपाठ | डोरी | ||
target_aligned/zh | मूलपाठ | डोरी |
- उदाहरण ( tfds.as_dataframe ):
- उद्धरण :
@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
मणि/wiki_lingua_czech_cs
कॉन्फिग विवरण : विकिलिंगुआ क्रॉस-लिंगुअल एब्सट्रैक्टिव सारांश सिस्टम के मूल्यांकन के लिए एक बड़े पैमाने पर, बहुभाषी डेटासेट है।
डाउनलोड आकार :
13.84 MiB
डेटासेट का आकार :
58.05 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): हाँ
विभाजन :
विभाजित करना | उदाहरण |
---|---|
'test' | 1,438 |
'train' | 5,033 |
'validation' | 718 |
- फ़ीचर संरचना :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'source': string,
'source_aligned': Translation({
'cs': Text(shape=(), dtype=string),
'en': Text(shape=(), dtype=string),
}),
'target': string,
'target_aligned': Translation({
'cs': Text(shape=(), dtype=string),
'en': Text(shape=(), dtype=string),
}),
})
- फ़ीचर दस्तावेज़ीकरण :
विशेषता | कक्षा | आकार | डीटाइप | विवरण |
---|---|---|---|---|
विशेषताएं डिक्ट | ||||
Gem_id | टेन्सर | डोरी | ||
Gem_parent_id | टेन्सर | डोरी | ||
संदर्भ | अनुक्रम (टेंसर) | (कोई भी नहीं,) | डोरी | |
स्रोत | टेन्सर | डोरी | ||
स्रोत_गठबंधन | अनुवाद | |||
स्रोत_संरेखित/cs | मूलपाठ | डोरी | ||
स्रोत_संरेखित/hi | मूलपाठ | डोरी | ||
लक्ष्य | टेन्सर | डोरी | ||
target_aligned | अनुवाद | |||
target_aligned/cs | मूलपाठ | डोरी | ||
लक्ष्य_संरेखित/hi | मूलपाठ | डोरी |
- उदाहरण ( tfds.as_dataframe ):
- उद्धरण :
@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
मणि/wiki_lingua_dutch_nl
कॉन्फिग विवरण : विकिलिंगुआ क्रॉस-लिंगुअल एब्सट्रैक्टिव सारांश सिस्टम के मूल्यांकन के लिए एक बड़े पैमाने पर, बहुभाषी डेटासेट है।
डाउनलोड आकार :
53.88 MiB
डेटासेट का आकार :
237.97 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): हाँ (परीक्षण, सत्यापन), केवल जब
shuffle_files=False
(ट्रेन)विभाजन :
विभाजित करना | उदाहरण |
---|---|
'test' | 6,248 |
'train' | 21,866 |
'validation' | 3,123 |
- फ़ीचर संरचना :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'source': string,
'source_aligned': Translation({
'en': Text(shape=(), dtype=string),
'nl': Text(shape=(), dtype=string),
}),
'target': string,
'target_aligned': Translation({
'en': Text(shape=(), dtype=string),
'nl': Text(shape=(), dtype=string),
}),
})
- फ़ीचर दस्तावेज़ीकरण :
विशेषता | कक्षा | आकार | डीटाइप | विवरण |
---|---|---|---|---|
विशेषताएं डिक्ट | ||||
Gem_id | टेन्सर | डोरी | ||
Gem_parent_id | टेन्सर | डोरी | ||
संदर्भ | अनुक्रम (टेंसर) | (कोई भी नहीं,) | डोरी | |
स्रोत | टेन्सर | डोरी | ||
स्रोत_गठबंधन | अनुवाद | |||
स्रोत_संरेखित/hi | मूलपाठ | डोरी | ||
स्रोत_संरेखित/nl | मूलपाठ | डोरी | ||
लक्ष्य | टेन्सर | डोरी | ||
target_aligned | अनुवाद | |||
लक्ष्य_संरेखित/hi | मूलपाठ | डोरी | ||
target_aligned/nl | मूलपाठ | डोरी |
- उदाहरण ( tfds.as_dataframe ):
- उद्धरण :
@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
मणि/wiki_lingua_english_en
कॉन्फिग विवरण : विकिलिंगुआ क्रॉस-लिंगुअल एब्सट्रैक्टिव सारांश सिस्टम के मूल्यांकन के लिए एक बड़े पैमाने पर, बहुभाषी डेटासेट है।
डाउनलोड आकार :
112.56 MiB
डेटासेट का आकार :
657.51 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): नहीं
विभाजन :
विभाजित करना | उदाहरण |
---|---|
'test' | 28,614 |
'train' | 99,020 |
'validation' | 13,823 |
- फ़ीचर संरचना :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'source': string,
'source_aligned': Translation({
'en': Text(shape=(), dtype=string),
}),
'target': string,
'target_aligned': Translation({
'en': Text(shape=(), dtype=string),
}),
})
- फ़ीचर दस्तावेज़ीकरण :
विशेषता | कक्षा | आकार | डीटाइप | विवरण |
---|---|---|---|---|
विशेषताएं डिक्ट | ||||
Gem_id | टेन्सर | डोरी | ||
Gem_parent_id | टेन्सर | डोरी | ||
संदर्भ | अनुक्रम (टेंसर) | (कोई भी नहीं,) | डोरी | |
स्रोत | टेन्सर | डोरी | ||
स्रोत_गठबंधन | अनुवाद | |||
स्रोत_संरेखित/hi | मूलपाठ | डोरी | ||
लक्ष्य | टेन्सर | डोरी | ||
target_aligned | अनुवाद | |||
लक्ष्य_संरेखित/hi | मूलपाठ | डोरी |
- उदाहरण ( tfds.as_dataframe ):
- उद्धरण :
@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
मणि/wiki_lingua_french_fr
कॉन्फिग विवरण : विकिलिंगुआ क्रॉस-लिंगुअल एब्सट्रैक्टिव सारांश सिस्टम के मूल्यांकन के लिए एक बड़े पैमाने पर, बहुभाषी डेटासेट है।
डाउनलोड आकार :
113.26 MiB
डेटासेट का आकार :
522.28 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): नहीं
विभाजन :
विभाजित करना | उदाहरण |
---|---|
'test' | 12,731 |
'train' | 44,556 |
'validation' | 6,364 |
- फ़ीचर संरचना :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'source': string,
'source_aligned': Translation({
'en': Text(shape=(), dtype=string),
'fr': Text(shape=(), dtype=string),
}),
'target': string,
'target_aligned': Translation({
'en': Text(shape=(), dtype=string),
'fr': Text(shape=(), dtype=string),
}),
})
- फ़ीचर दस्तावेज़ीकरण :
विशेषता | कक्षा | आकार | डीटाइप | विवरण |
---|---|---|---|---|
विशेषताएं डिक्ट | ||||
Gem_id | टेन्सर | डोरी | ||
Gem_parent_id | टेन्सर | डोरी | ||
संदर्भ | अनुक्रम (टेंसर) | (कोई भी नहीं,) | डोरी | |
स्रोत | टेन्सर | डोरी | ||
स्रोत_गठबंधन | अनुवाद | |||
स्रोत_संरेखित/hi | मूलपाठ | डोरी | ||
स्रोत_संरेखित/fr | मूलपाठ | डोरी | ||
लक्ष्य | टेन्सर | डोरी | ||
target_aligned | अनुवाद | |||
लक्ष्य_संरेखित/hi | मूलपाठ | डोरी | ||
target_aligned/fr | मूलपाठ | डोरी |
- उदाहरण ( tfds.as_dataframe ):
- उद्धरण :
@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
रत्न/wiki_lingua_german_de
कॉन्फिग विवरण : विकिलिंगुआ क्रॉस-लिंगुअल एब्सट्रैक्टिव सारांश सिस्टम के मूल्यांकन के लिए एक बड़े पैमाने पर, बहुभाषी डेटासेट है।
डाउनलोड का आकार :
102.65 MiB
डेटासेट का आकार :
452.46 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): नहीं
विभाजन :
विभाजित करना | उदाहरण |
---|---|
'test' | 11,669 |
'train' | 40,839 |
'validation' | 5,833 |
- फ़ीचर संरचना :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'source': string,
'source_aligned': Translation({
'de': Text(shape=(), dtype=string),
'en': Text(shape=(), dtype=string),
}),
'target': string,
'target_aligned': Translation({
'de': Text(shape=(), dtype=string),
'en': Text(shape=(), dtype=string),
}),
})
- फ़ीचर दस्तावेज़ीकरण :
विशेषता | कक्षा | आकार | डीटाइप | विवरण |
---|---|---|---|---|
विशेषताएं डिक्ट | ||||
Gem_id | टेन्सर | डोरी | ||
Gem_parent_id | टेन्सर | डोरी | ||
संदर्भ | अनुक्रम (टेंसर) | (कोई भी नहीं,) | डोरी | |
स्रोत | टेन्सर | डोरी | ||
स्रोत_गठबंधन | अनुवाद | |||
स्रोत_संरेखित/डी | मूलपाठ | डोरी | ||
स्रोत_संरेखित/hi | मूलपाठ | डोरी | ||
लक्ष्य | टेन्सर | डोरी | ||
target_aligned | अनुवाद | |||
target_aligned/de | मूलपाठ | डोरी | ||
लक्ष्य_संरेखित/hi | मूलपाठ | डोरी |
- उदाहरण ( tfds.as_dataframe ):
- उद्धरण :
@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
मणि/wiki_lingua_hindi_hi
कॉन्फिग विवरण : विकिलिंगुआ क्रॉस-लिंगुअल एब्सट्रैक्टिव सारांश सिस्टम के मूल्यांकन के लिए एक बड़े पैमाने पर, बहुभाषी डेटासेट है।
डाउनलोड आकार :
20.07 MiB
डेटासेट का आकार :
138.06 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): हाँ
विभाजन :
विभाजित करना | उदाहरण |
---|---|
'test' | 1,984 |
'train' | 6,942 |
'validation' | 991 |
- फ़ीचर संरचना :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'source': string,
'source_aligned': Translation({
'en': Text(shape=(), dtype=string),
'hi': Text(shape=(), dtype=string),
}),
'target': string,
'target_aligned': Translation({
'en': Text(shape=(), dtype=string),
'hi': Text(shape=(), dtype=string),
}),
})
- फ़ीचर दस्तावेज़ीकरण :
विशेषता | कक्षा | आकार | डीटाइप | विवरण |
---|---|---|---|---|
विशेषताएं डिक्ट | ||||
Gem_id | टेन्सर | डोरी | ||
Gem_parent_id | टेन्सर | डोरी | ||
संदर्भ | अनुक्रम (टेंसर) | (कोई भी नहीं,) | डोरी | |
स्रोत | टेन्सर | डोरी | ||
स्रोत_गठबंधन | अनुवाद | |||
स्रोत_संरेखित/hi | मूलपाठ | डोरी | ||
स्रोत_संरेखित/हाय | मूलपाठ | डोरी | ||
लक्ष्य | टेन्सर | डोरी | ||
target_aligned | अनुवाद | |||
लक्ष्य_संरेखित/hi | मूलपाठ | डोरी | ||
target_aligned/हाय | मूलपाठ | डोरी |
- उदाहरण ( tfds.as_dataframe ):
- उद्धरण :
@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
मणि/wiki_lingua_indonesian_id
कॉन्फिग विवरण : विकिलिंगुआ क्रॉस-लिंगुअल एब्सट्रैक्टिव सारांश सिस्टम के मूल्यांकन के लिए एक बड़े पैमाने पर, बहुभाषी डेटासेट है।
डाउनलोड आकार :
80.08 MiB
डेटासेट का आकार :
370.63 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): नहीं
विभाजन :
विभाजित करना | उदाहरण |
---|---|
'test' | 9,497 |
'train' | 33,237 |
'validation' | 4,747 |
- फ़ीचर संरचना :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'source': string,
'source_aligned': Translation({
'en': Text(shape=(), dtype=string),
'id': Text(shape=(), dtype=string),
}),
'target': string,
'target_aligned': Translation({
'en': Text(shape=(), dtype=string),
'id': Text(shape=(), dtype=string),
}),
})
- फ़ीचर दस्तावेज़ीकरण :
विशेषता | कक्षा | आकार | डीटाइप | विवरण |
---|---|---|---|---|
विशेषताएं डिक्ट | ||||
Gem_id | टेन्सर | डोरी | ||
Gem_parent_id | टेन्सर | डोरी | ||
संदर्भ | अनुक्रम (टेंसर) | (कोई भी नहीं,) | डोरी | |
स्रोत | टेन्सर | डोरी | ||
स्रोत_गठबंधन | अनुवाद | |||
स्रोत_संरेखित/hi | मूलपाठ | डोरी | ||
स्रोत_संरेखित/आईडी | मूलपाठ | डोरी | ||
लक्ष्य | टेन्सर | डोरी | ||
target_aligned | अनुवाद | |||
लक्ष्य_संरेखित/hi | मूलपाठ | डोरी | ||
target_aligned/id | मूलपाठ | डोरी |
- उदाहरण ( tfds.as_dataframe ):
- उद्धरण :
@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
मणि/wiki_lingua_italian_it
कॉन्फिग विवरण : विकिलिंगुआ क्रॉस-लिंगुअल एब्सट्रैक्टिव सारांश सिस्टम के मूल्यांकन के लिए एक बड़े पैमाने पर, बहुभाषी डेटासेट है।
डाउनलोड आकार :
84.80 MiB
डेटासेट का आकार :
374.40 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): नहीं
विभाजन :
विभाजित करना | उदाहरण |
---|---|
'test' | 10,189 |
'train' | 35,661 |
'validation' | 5,093 |
- फ़ीचर संरचना :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'source': string,
'source_aligned': Translation({
'en': Text(shape=(), dtype=string),
'it': Text(shape=(), dtype=string),
}),
'target': string,
'target_aligned': Translation({
'en': Text(shape=(), dtype=string),
'it': Text(shape=(), dtype=string),
}),
})
- फ़ीचर दस्तावेज़ीकरण :
विशेषता | कक्षा | आकार | डीटाइप | विवरण |
---|---|---|---|---|
विशेषताएं डिक्ट | ||||
Gem_id | टेन्सर | डोरी | ||
Gem_parent_id | टेन्सर | डोरी | ||
संदर्भ | अनुक्रम (टेंसर) | (कोई भी नहीं,) | डोरी | |
स्रोत | टेन्सर | डोरी | ||
स्रोत_गठबंधन | अनुवाद | |||
स्रोत_संरेखित/hi | मूलपाठ | डोरी | ||
स्रोत_संरेखित/it | मूलपाठ | डोरी | ||
लक्ष्य | टेन्सर | डोरी | ||
target_aligned | अनुवाद | |||
लक्ष्य_संरेखित/hi | मूलपाठ | डोरी | ||
target_aligned/it | मूलपाठ | डोरी |
- उदाहरण ( tfds.as_dataframe ):
- उद्धरण :
@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
मणि/wiki_lingua_japanese_ja
कॉन्फिग विवरण : विकिलिंगुआ क्रॉस-लिंगुअल एब्सट्रैक्टिव सारांश सिस्टम के मूल्यांकन के लिए एक बड़े पैमाने पर, बहुभाषी डेटासेट है।
डाउनलोड आकार :
21.75 MiB
डेटासेट का आकार :
103.19 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): हाँ
विभाजन :
विभाजित करना | उदाहरण |
---|---|
'test' | 2,530 |
'train' | 8,853 |
'validation' | 1,264 |
- फ़ीचर संरचना :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'source': string,
'source_aligned': Translation({
'en': Text(shape=(), dtype=string),
'ja': Text(shape=(), dtype=string),
}),
'target': string,
'target_aligned': Translation({
'en': Text(shape=(), dtype=string),
'ja': Text(shape=(), dtype=string),
}),
})
- फ़ीचर दस्तावेज़ीकरण :
विशेषता | कक्षा | आकार | डीटाइप | विवरण |
---|---|---|---|---|
विशेषताएं डिक्ट | ||||
Gem_id | टेन्सर | डोरी | ||
Gem_parent_id | टेन्सर | डोरी | ||
संदर्भ | अनुक्रम (टेंसर) | (कोई भी नहीं,) | डोरी | |
स्रोत | टेन्सर | डोरी | ||
स्रोत_गठबंधन | अनुवाद | |||
स्रोत_संरेखित/hi | मूलपाठ | डोरी | ||
स्रोत_संरेखित/जा | मूलपाठ | डोरी | ||
लक्ष्य | टेन्सर | डोरी | ||
target_aligned | अनुवाद | |||
लक्ष्य_संरेखित/hi | मूलपाठ | डोरी | ||
target_aligned/ja | मूलपाठ | डोरी |
- उदाहरण ( tfds.as_dataframe ):
- उद्धरण :
@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
मणि/wiki_lingua_korean_ko
कॉन्फिग विवरण : विकिलिंगुआ क्रॉस-लिंगुअल एब्सट्रैक्टिव सारांश सिस्टम के मूल्यांकन के लिए एक बड़े पैमाने पर, बहुभाषी डेटासेट है।
डाउनलोड आकार :
22.26 MiB
डेटासेट का आकार :
102.35 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): हाँ
विभाजन :
विभाजित करना | उदाहरण |
---|---|
'test' | 2,436 |
'train' | 8,524 |
'validation' | 1,216 |
- फ़ीचर संरचना :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'source': string,
'source_aligned': Translation({
'en': Text(shape=(), dtype=string),
'ko': Text(shape=(), dtype=string),
}),
'target': string,
'target_aligned': Translation({
'en': Text(shape=(), dtype=string),
'ko': Text(shape=(), dtype=string),
}),
})
- फ़ीचर दस्तावेज़ीकरण :
विशेषता | कक्षा | आकार | डीटाइप | विवरण |
---|---|---|---|---|
विशेषताएं डिक्ट | ||||
Gem_id | टेन्सर | डोरी | ||
Gem_parent_id | टेन्सर | डोरी | ||
संदर्भ | अनुक्रम (टेंसर) | (कोई भी नहीं,) | डोरी | |
स्रोत | टेन्सर | डोरी | ||
स्रोत_गठबंधन | अनुवाद | |||
स्रोत_संरेखित/hi | मूलपाठ | डोरी | ||
स्रोत_संरेखित/ko | मूलपाठ | डोरी | ||
लक्ष्य | टेन्सर | डोरी | ||
target_aligned | अनुवाद | |||
लक्ष्य_संरेखित/hi | मूलपाठ | डोरी | ||
target_aligned/ko | मूलपाठ | डोरी |
- उदाहरण ( tfds.as_dataframe ):
- उद्धरण :
@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
मणि/wiki_lingua_portuguese_pt
कॉन्फिग विवरण : विकिलिंगुआ क्रॉस-लिंगुअल एब्सट्रैक्टिव सारांश सिस्टम के मूल्यांकन के लिए एक बड़े पैमाने पर, बहुभाषी डेटासेट है।
डाउनलोड आकार :
131.17 MiB
डेटासेट का आकार :
570.46 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): नहीं
विभाजन :
विभाजित करना | उदाहरण |
---|---|
'test' | 16,331 |
'train' | 57,159 |
'validation' | 8,165 |
- फ़ीचर संरचना :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'source': string,
'source_aligned': Translation({
'en': Text(shape=(), dtype=string),
'pt': Text(shape=(), dtype=string),
}),
'target': string,
'target_aligned': Translation({
'en': Text(shape=(), dtype=string),
'pt': Text(shape=(), dtype=string),
}),
})
- फ़ीचर दस्तावेज़ीकरण :
विशेषता | कक्षा | आकार | डीटाइप | विवरण |
---|---|---|---|---|
विशेषताएं डिक्ट | ||||
Gem_id | टेन्सर | डोरी | ||
Gem_parent_id | टेन्सर | डोरी | ||
संदर्भ | अनुक्रम (टेंसर) | (कोई भी नहीं,) | डोरी | |
स्रोत | टेन्सर | डोरी | ||
स्रोत_गठबंधन | अनुवाद | |||
स्रोत_संरेखित/hi | मूलपाठ | डोरी | ||
स्रोत_संरेखित/पीटी | मूलपाठ | डोरी | ||
लक्ष्य | टेन्सर | डोरी | ||
target_aligned | अनुवाद | |||
लक्ष्य_संरेखित/hi | मूलपाठ | डोरी | ||
target_aligned/pt | मूलपाठ | डोरी |
- उदाहरण ( tfds.as_dataframe ):
- उद्धरण :
@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
मणि/wiki_lingua_russian_ru
कॉन्फिग विवरण : विकिलिंगुआ क्रॉस-लिंगुअल एब्सट्रैक्टिव सारांश सिस्टम के मूल्यांकन के लिए एक बड़े पैमाने पर, बहुभाषी डेटासेट है।
डाउनलोड आकार :
101.36 MiB
डेटासेट का आकार :
564.69 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): नहीं
विभाजन :
विभाजित करना | उदाहरण |
---|---|
'test' | 10,580 |
'train' | 37,028 |
'validation' | 5,288 |
- फ़ीचर संरचना :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'source': string,
'source_aligned': Translation({
'en': Text(shape=(), dtype=string),
'ru': Text(shape=(), dtype=string),
}),
'target': string,
'target_aligned': Translation({
'en': Text(shape=(), dtype=string),
'ru': Text(shape=(), dtype=string),
}),
})
- फ़ीचर दस्तावेज़ीकरण :
विशेषता | कक्षा | आकार | डीटाइप | विवरण |
---|---|---|---|---|
विशेषताएं डिक्ट | ||||
Gem_id | टेन्सर | डोरी | ||
Gem_parent_id | टेन्सर | डोरी | ||
संदर्भ | अनुक्रम (टेंसर) | (कोई भी नहीं,) | डोरी | |
स्रोत | टेन्सर | डोरी | ||
स्रोत_गठबंधन | अनुवाद | |||
स्रोत_संरेखित/hi | मूलपाठ | डोरी | ||
स्रोत_गठबंधन/आरयू | मूलपाठ | डोरी | ||
लक्ष्य | टेन्सर | डोरी | ||
target_aligned | अनुवाद | |||
लक्ष्य_संरेखित/hi | मूलपाठ | डोरी | ||
target_aligned/ru | मूलपाठ | डोरी |
- उदाहरण ( tfds.as_dataframe ):
- उद्धरण :
@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
मणि/wiki_lingua_spanish_es
कॉन्फिग विवरण : विकिलिंगुआ क्रॉस-लिंगुअल एब्सट्रैक्टिव सारांश सिस्टम के मूल्यांकन के लिए एक बड़े पैमाने पर, बहुभाषी डेटासेट है।
डाउनलोड आकार :
189.06 MiB
डेटासेट का आकार :
849.75 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): नहीं
विभाजन :
विभाजित करना | उदाहरण |
---|---|
'test' | 22,632 |
'train' | 79,212 |
'validation' | 11,316 |
- फ़ीचर संरचना :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'source': string,
'source_aligned': Translation({
'en': Text(shape=(), dtype=string),
'es': Text(shape=(), dtype=string),
}),
'target': string,
'target_aligned': Translation({
'en': Text(shape=(), dtype=string),
'es': Text(shape=(), dtype=string),
}),
})
- फ़ीचर दस्तावेज़ीकरण :
विशेषता | कक्षा | आकार | डीटाइप | विवरण |
---|---|---|---|---|
विशेषताएं डिक्ट | ||||
Gem_id | टेन्सर | डोरी | ||
Gem_parent_id | टेन्सर | डोरी | ||
संदर्भ | अनुक्रम (टेंसर) | (कोई भी नहीं,) | डोरी | |
स्रोत | टेन्सर | डोरी | ||
स्रोत_गठबंधन | अनुवाद | |||
स्रोत_संरेखित/hi | मूलपाठ | डोरी | ||
स्रोत_संरेखित/es | मूलपाठ | डोरी | ||
लक्ष्य | टेन्सर | डोरी | ||
target_aligned | अनुवाद | |||
लक्ष्य_संरेखित/hi | मूलपाठ | डोरी | ||
target_aligned/es | मूलपाठ | डोरी |
- उदाहरण ( tfds.as_dataframe ):
- उद्धरण :
@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
रत्न/wiki_lingua_thai_th
कॉन्फिग विवरण : विकिलिंगुआ क्रॉस-लिंगुअल एब्सट्रैक्टिव सारांश सिस्टम के मूल्यांकन के लिए एक बड़े पैमाने पर, बहुभाषी डेटासेट है।
डाउनलोड आकार :
28.60 MiB
डेटासेट का आकार :
193.77 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): हाँ (परीक्षण, सत्यापन), केवल जब
shuffle_files=False
(ट्रेन)विभाजन :
विभाजित करना | उदाहरण |
---|---|
'test' | 2,950 |
'train' | 10,325 |
'validation' | 1,475 |
- फ़ीचर संरचना :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'source': string,
'source_aligned': Translation({
'en': Text(shape=(), dtype=string),
'th': Text(shape=(), dtype=string),
}),
'target': string,
'target_aligned': Translation({
'en': Text(shape=(), dtype=string),
'th': Text(shape=(), dtype=string),
}),
})
- फ़ीचर दस्तावेज़ीकरण :
विशेषता | कक्षा | आकार | डीटाइप | विवरण |
---|---|---|---|---|
विशेषताएं डिक्ट | ||||
Gem_id | टेन्सर | डोरी | ||
Gem_parent_id | टेन्सर | डोरी | ||
संदर्भ | अनुक्रम (टेंसर) | (कोई भी नहीं,) | डोरी | |
स्रोत | टेन्सर | डोरी | ||
स्रोत_गठबंधन | अनुवाद | |||
स्रोत_संरेखित/hi | मूलपाठ | डोरी | ||
स्रोत_संरेखित/वें | मूलपाठ | डोरी | ||
लक्ष्य | टेन्सर | डोरी | ||
target_aligned | अनुवाद | |||
लक्ष्य_संरेखित/hi | मूलपाठ | डोरी | ||
target_aligned/th | मूलपाठ | डोरी |
- उदाहरण ( tfds.as_dataframe ):
- उद्धरण :
@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
मणि/wiki_lingua_turkish_tr
कॉन्फिग विवरण : विकिलिंगुआ क्रॉस-लिंगुअल एब्सट्रैक्टिव सारांश सिस्टम के मूल्यांकन के लिए एक बड़े पैमाने पर, बहुभाषी डेटासेट है।
डाउनलोड आकार :
6.73 MiB
डेटासेट का आकार :
30.75 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): हाँ
विभाजन :
विभाजित करना | उदाहरण |
---|---|
'test' | 900 |
'train' | 3,148 |
'validation' | 449 |
- फ़ीचर संरचना :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'source': string,
'source_aligned': Translation({
'en': Text(shape=(), dtype=string),
'tr': Text(shape=(), dtype=string),
}),
'target': string,
'target_aligned': Translation({
'en': Text(shape=(), dtype=string),
'tr': Text(shape=(), dtype=string),
}),
})
- फ़ीचर दस्तावेज़ीकरण :
विशेषता | कक्षा | आकार | डीटाइप | विवरण |
---|---|---|---|---|
विशेषताएं डिक्ट | ||||
Gem_id | टेन्सर | डोरी | ||
Gem_parent_id | टेन्सर | डोरी | ||
संदर्भ | अनुक्रम (टेंसर) | (कोई भी नहीं,) | डोरी | |
स्रोत | टेन्सर | डोरी | ||
स्रोत_गठबंधन | अनुवाद | |||
स्रोत_संरेखित/hi | मूलपाठ | डोरी | ||
स्रोत_संरेखित/tr | मूलपाठ | डोरी | ||
लक्ष्य | टेन्सर | डोरी | ||
target_aligned | अनुवाद | |||
लक्ष्य_संरेखित/hi | मूलपाठ | डोरी | ||
target_aligned/tr | मूलपाठ | डोरी |
- उदाहरण ( tfds.as_dataframe ):
- उद्धरण :
@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
मणि/wiki_lingua_vietnamese_vi
कॉन्फिग विवरण : विकिलिंगुआ क्रॉस-लिंगुअल एब्सट्रैक्टिव सारांश सिस्टम के मूल्यांकन के लिए एक बड़े पैमाने पर, बहुभाषी डेटासेट है।
डाउनलोड आकार :
36.27 MiB
डेटासेट का आकार :
179.77 MiB
ऑटो-कैश्ड ( दस्तावेज़ीकरण ): हाँ
विभाजन :
विभाजित करना | उदाहरण |
---|---|
'test' | 3,917 |
'train' | 13,707 |
'validation' | 1,957 |
- फ़ीचर संरचना :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'source': string,
'source_aligned': Translation({
'en': Text(shape=(), dtype=string),
'vi': Text(shape=(), dtype=string),
}),
'target': string,
'target_aligned': Translation({
'en': Text(shape=(), dtype=string),
'vi': Text(shape=(), dtype=string),
}),
})
- फ़ीचर दस्तावेज़ीकरण :
विशेषता | कक्षा | आकार | डीटाइप | विवरण |
---|---|---|---|---|
विशेषताएं डिक्ट | ||||
Gem_id | टेन्सर | डोरी | ||
Gem_parent_id | टेन्सर | डोरी | ||
संदर्भ | अनुक्रम (टेंसर) | (कोई भी नहीं,) | डोरी | |
स्रोत | टेन्सर | डोरी | ||
स्रोत_गठबंधन | अनुवाद | |||
स्रोत_संरेखित/hi | मूलपाठ | डोरी | ||
स्रोत_संरेखित/vi | मूलपाठ | डोरी | ||
लक्ष्य | टेन्सर | डोरी | ||
target_aligned | अनुवाद | |||
लक्ष्य_संरेखित/hi | मूलपाठ | डोरी | ||
target_aligned/vi | मूलपाठ | डोरी |
- उदाहरण ( tfds.as_dataframe ):
- उद्धरण :
@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."