- Açıklama :
GEM , hem insan ek açıklamaları hem de otomatik Metrikler yoluyla Değerlendirmeye odaklanan Doğal Dil Üretimi için bir kıyaslama ortamıdır.
GEM şunları amaçlamaktadır: (1) birçok NLG görevini ve dilini kapsayan 13 veri setinde NLG ilerlemesini ölçmek. (2) veri ifadeleri ve zorluk setleri aracılığıyla sunulan verilerin ve modellerin derinlemesine bir analizini sağlar. (3) hem otomatik hem de insan ölçümlerini kullanarak oluşturulan metnin değerlendirilmesi için standartlar geliştirmek.
Daha fazla bilgi https://gem-benchmark.com adresinde bulunabilir.
Ana Sayfa : https://gem-benchmark.com
Kaynak kodu :
tfds.text.gem.Gem
sürümler :
-
1.0.0
: İlk sürüm -
1.0.1
: MLSum için hatalı bağlantı filtresini güncelleyin -
1.1.0
(varsayılan): Mücadele Setlerinin Yayınlanması
-
Denetlenen anahtarlar (Bkz
as_supervised
doc ):None
Şekil ( tfds.show_examples ): Desteklenmiyor.
gem/common_gen (varsayılan yapılandırma)
Yapılandırma açıklaması : CommonGen, makineleri üretken sağduyulu muhakeme yeteneği açısından açık bir şekilde test etmek için bir kıyaslama veri kümesiyle ilişkilendirilmiş kısıtlı bir metin oluşturma görevidir. Bir dizi ortak kavram verildiğinde; görev, bu kavramları kullanarak günlük bir senaryoyu açıklayan tutarlı bir cümle oluşturmaktır.
İndirme boyutu :
1.84 MiB
Veri kümesi boyutu :
16.84 MiB
Otomatik önbelleğe alınmış ( belgeleme ): Evet
bölmeler :
Bölmek | örnekler |
---|---|
'challenge_test_scramble' | 500 |
'challenge_train_sample' | 500 |
'challenge_validation_sample' | 500 |
'test' | 1.497 |
'train' | 67.389 |
'validation' | 993 |
- Özellik yapısı :
FeaturesDict({
'concept_set_id': int32,
'concepts': Sequence(string),
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'target': string,
})
- Özellik belgeleri :
Özellik | Sınıf | Şekil | Dtipi | Tanım |
---|---|---|---|---|
ÖzelliklerDict | ||||
concept_set_id | tensör | int32 | ||
kavramlar | Sıra(Tensor) | (Hiçbiri,) | sicim | |
gem_id | tensör | sicim | ||
gem_parent_id | tensör | sicim | ||
Referanslar | Sıra(Tensor) | (Hiçbiri,) | sicim | |
hedef | tensör | sicim |
- Örnekler ( tfds.as_dataframe ):
- Alıntı :
@inproceedings{lin2020commongen,
title = "CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning",
author = "Lin, Bill Yuchen and
Zhou, Wangchunshu and
Shen, Ming and
Zhou, Pei and
Bhagavatula, Chandra and
Choi, Yejin and
Ren, Xiang",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.findings-emnlp.165",
pages = "1823--1840",
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
mücevher/cs_restoranlar
Yapılandırma açıklaması : Görev, restoranlar hakkında bilgi sağlayan (varsayımsal) bir diyalog sistemi bağlamında yanıtlar üretmektir. Girdi, temel bir niyet/diyalog eylemi türü ve bir yuvalar (öznitelikler) listesi ve değerleridir. Çıktı, bir doğal dil cümlesidir.
İndirme boyutu :
1.46 MiB
Veri kümesi boyutu :
2.71 MiB
Otomatik önbelleğe alınmış ( belgeleme ): Evet
bölmeler :
Bölmek | örnekler |
---|---|
'challenge_test_scramble' | 500 |
'challenge_train_sample' | 500 |
'challenge_validation_sample' | 500 |
'test' | 842 |
'train' | 3.569 |
'validation' | 781 |
- Özellik yapısı :
FeaturesDict({
'dialog_act': string,
'dialog_act_delexicalized': string,
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'target': string,
'target_delexicalized': string,
})
- Özellik belgeleri :
Özellik | Sınıf | Şekil | Dtipi | Tanım |
---|---|---|---|---|
ÖzelliklerDict | ||||
dialog_act | tensör | sicim | ||
dialog_act_delexicalized | tensör | sicim | ||
gem_id | tensör | sicim | ||
gem_parent_id | tensör | sicim | ||
Referanslar | Sıra(Tensor) | (Hiçbiri,) | sicim | |
hedef | tensör | sicim | ||
target_delexicalized | tensör | sicim |
- Örnekler ( tfds.as_dataframe ):
- Alıntı :
@inproceedings{cs_restaurants,
address = {Tokyo, Japan},
title = {Neural {Generation} for {Czech}: {Data} and {Baselines} },
shorttitle = {Neural {Generation} for {Czech} },
url = {https://www.aclweb.org/anthology/W19-8670/},
urldate = {2019-10-18},
booktitle = {Proceedings of the 12th {International} {Conference} on {Natural} {Language} {Generation} ({INLG} 2019)},
author = {Dušek, Ondřej and Jurčíček, Filip},
month = oct,
year = {2019},
pages = {563--574}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
mücevher/dart
Yapılandırma açıklaması : DART, her bir girdinin ağaç yapılı bir ontolojiyi izleyen bir varlık-ilişki üçlüsü kümesi olduğu, yüksek kaliteli cümle ek açıklamalarına sahip, büyük ve açık alan yapılı bir Veri Kaydından Metne oluşturma külliyatıdır.
İndirme boyutu :
28.01 MiB
Veri kümesi boyutu :
33.78 MiB
Otomatik önbelleğe alınmış ( belgeleme ): Evet
bölmeler :
Bölmek | örnekler |
---|---|
'test' | 6.959 |
'train' | 62.659 |
'validation' | 2.768 |
- Özellik yapısı :
FeaturesDict({
'dart_id': int32,
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'subtree_was_extended': bool,
'target': string,
'target_sources': Sequence(string),
'tripleset': Sequence(string),
})
- Özellik belgeleri :
Özellik | Sınıf | Şekil | Dtipi | Tanım |
---|---|---|---|---|
ÖzelliklerDict | ||||
dart_id | tensör | int32 | ||
gem_id | tensör | sicim | ||
gem_parent_id | tensör | sicim | ||
Referanslar | Sıra(Tensor) | (Hiçbiri,) | sicim | |
subtree_was_extended | tensör | bool | ||
hedef | tensör | sicim | ||
hedef_kaynaklar | Sıra(Tensor) | (Hiçbiri,) | sicim | |
üçlü takım | Sıra(Tensor) | (Hiçbiri,) | sicim |
- Örnekler ( tfds.as_dataframe ):
- Alıntı :
@article{radev2020dart,
title=Dart: Open-domain structured data record to text generation,
author={Radev, Dragomir and Zhang, Rui and Rau, Amrit and Sivaprasad, Abhinand and Hsieh, Chiachun and Rajani, Nazneen Fatema and Tang, Xiangru and Vyas, Aadit and Verma, Neha and Krishna, Pranav and others},
journal={arXiv preprint arXiv:2007.02871},
year={2020}
}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
mücevher/e2e_nlg
Yapılandırma açıklaması : E2E veri kümesi, sınırlı alanlı bir veriden metne dönüştürme görevi için tasarlanmıştır -- 8 adede kadar farklı özniteliğe (ad, bölge, fiyat aralığı vb.) dayalı restoran açıklamaları/önerileri oluşturma.
İndirme boyutu :
13.99 MiB
Veri kümesi boyutu :
16.92 MiB
Otomatik önbelleğe alınmış ( belgeleme ): Evet
bölmeler :
Bölmek | örnekler |
---|---|
'challenge_test_scramble' | 500 |
'challenge_train_sample' | 500 |
'challenge_validation_sample' | 500 |
'test' | 4.693 |
'train' | 33.525 |
'validation' | 4.299 |
- Özellik yapısı :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'meaning_representation': string,
'references': Sequence(string),
'target': string,
})
- Özellik belgeleri :
Özellik | Sınıf | Şekil | Dtipi | Tanım |
---|---|---|---|---|
ÖzelliklerDict | ||||
gem_id | tensör | sicim | ||
gem_parent_id | tensör | sicim | ||
anlam_temsil | tensör | sicim | ||
Referanslar | Sıra(Tensor) | (Hiçbiri,) | sicim | |
hedef | tensör | sicim |
- Örnekler ( tfds.as_dataframe ):
- Alıntı :
@inproceedings{e2e_cleaned,
address = {Tokyo, Japan},
title = {Semantic {Noise} {Matters} for {Neural} {Natural} {Language} {Generation} },
url = {https://www.aclweb.org/anthology/W19-8652/},
booktitle = {Proceedings of the 12th {International} {Conference} on {Natural} {Language} {Generation} ({INLG} 2019)},
author = {Dušek, Ondřej and Howcroft, David M and Rieser, Verena},
year = {2019},
pages = {421--426},
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
gem/mlsum_de
Yapılandırma açıklaması : MLSum, büyük ölçekli çok dilli bir özetleme veri kümesidir. Çevrimiçi haber kaynaklarından oluşturulmuştur, bu bölüm Almanca'ya odaklanmaktadır.
İndirme boyutu :
345.98 MiB
Veri kümesi boyutu :
963.60 MiB
Otomatik önbelleğe alınmış ( belgeleme ): Hayır
bölmeler :
Bölmek | örnekler |
---|---|
'challenge_test_covid' | 5.058 |
'challenge_train_sample' | 500 |
'challenge_validation_sample' | 500 |
'test' | 10.695 |
'train' | 220.748 |
'validation' | 11.392 |
- Özellik yapısı :
FeaturesDict({
'date': string,
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'target': string,
'text': string,
'title': string,
'topic': string,
'url': string,
})
- Özellik belgeleri :
Özellik | Sınıf | Şekil | Dtipi | Tanım |
---|---|---|---|---|
ÖzelliklerDict | ||||
tarih | tensör | sicim | ||
gem_id | tensör | sicim | ||
gem_parent_id | tensör | sicim | ||
Referanslar | Sıra(Tensor) | (Hiçbiri,) | sicim | |
hedef | tensör | sicim | ||
Metin | tensör | sicim | ||
Başlık | tensör | sicim | ||
başlık | tensör | sicim | ||
url | tensör | sicim |
- Örnekler ( tfds.as_dataframe ):
- Alıntı :
@inproceedings{scialom-etal-2020-mlsum,
title = "{MLSUM}: The Multilingual Summarization Corpus",
author = {Scialom, Thomas and Dray, Paul-Alexis and Lamprier, Sylvain and Piwowarski, Benjamin and Staiano, Jacopo},
booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
year = {2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
gem/mlsum_es
Yapılandırma açıklaması : MLSum, büyük ölçekli çok dilli bir özetleme veri kümesidir. Çevrimiçi haber kaynaklarından oluşturulmuştur, bu bölüm İspanyolca'ya odaklanmaktadır.
İndirme boyutu :
501.27 MiB
Veri kümesi boyutu :
1.29 GiB
Otomatik önbelleğe alınmış ( belgeleme ): Hayır
bölmeler :
Bölmek | örnekler |
---|---|
'challenge_test_covid' | 1.938 |
'challenge_train_sample' | 500 |
'challenge_validation_sample' | 500 |
'test' | 13.366 |
'train' | 259.888 |
'validation' | 9.977 |
- Özellik yapısı :
FeaturesDict({
'date': string,
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'target': string,
'text': string,
'title': string,
'topic': string,
'url': string,
})
- Özellik belgeleri :
Özellik | Sınıf | Şekil | Dtipi | Tanım |
---|---|---|---|---|
ÖzelliklerDict | ||||
tarih | tensör | sicim | ||
gem_id | tensör | sicim | ||
gem_parent_id | tensör | sicim | ||
Referanslar | Sıra(Tensor) | (Hiçbiri,) | sicim | |
hedef | tensör | sicim | ||
Metin | tensör | sicim | ||
Başlık | tensör | sicim | ||
başlık | tensör | sicim | ||
url | tensör | sicim |
- Örnekler ( tfds.as_dataframe ):
- Alıntı :
@inproceedings{scialom-etal-2020-mlsum,
title = "{MLSUM}: The Multilingual Summarization Corpus",
author = {Scialom, Thomas and Dray, Paul-Alexis and Lamprier, Sylvain and Piwowarski, Benjamin and Staiano, Jacopo},
booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
year = {2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
gem/schema_guided_dialog
Yapılandırma açıklaması : Şema Kılavuzlu Diyalog (SGD) veri kümesi, bankalardan etkinliklere, medyadan takvime, seyahate ve hava durumuna kadar 17 alanı kapsayan, bir insan ile bir sanal asistan arasında 18K çok alanlı, göreve yönelik diyaloglar içerir.
İndirme boyutu :
17.00 MiB
Veri kümesi boyutu :
201.19 MiB
Otomatik önbelleğe alma ( belgeler ): Evet (challenge_test_backtranslation, challenge_test_bfp02, challenge_test_bfp05, challenge_test_nopunc, challenge_test_scramble, challenge_train_sample, challenge_validation_sample, test, doğrulama), Yalnızca
shuffle_files=False
(tren) olduğundabölmeler :
Bölmek | örnekler |
---|---|
'challenge_test_backtranslation' | 500 |
'challenge_test_bfp02' | 500 |
'challenge_test_bfp05' | 500 |
'challenge_test_nopunc' | 500 |
'challenge_test_scramble' | 500 |
'challenge_train_sample' | 500 |
'challenge_validation_sample' | 500 |
'test' | 10.000 |
'train' | 164.982 |
'validation' | 10.000 |
- Özellik yapısı :
FeaturesDict({
'context': Sequence(string),
'dialog_acts': Sequence({
'act': ClassLabel(shape=(), dtype=int64, num_classes=18),
'slot': string,
'values': Sequence(string),
}),
'dialog_id': string,
'gem_id': string,
'gem_parent_id': string,
'prompt': string,
'references': Sequence(string),
'service': string,
'target': string,
'turn_id': int32,
})
- Özellik belgeleri :
Özellik | Sınıf | Şekil | Dtipi | Tanım |
---|---|---|---|---|
ÖzelliklerDict | ||||
bağlam | Sıra(Tensor) | (Hiçbiri,) | sicim | |
dialog_acts | Sekans | |||
dialog_acts/eylem | SınıfEtiketi | int64 | ||
dialog_acts/yuva | tensör | sicim | ||
dialog_acts/değerler | Sıra(Tensor) | (Hiçbiri,) | sicim | |
dialog_id | tensör | sicim | ||
gem_id | tensör | sicim | ||
gem_parent_id | tensör | sicim | ||
çabuk | tensör | sicim | ||
Referanslar | Sıra(Tensor) | (Hiçbiri,) | sicim | |
hizmet | tensör | sicim | ||
hedef | tensör | sicim | ||
turn_id | tensör | int32 |
- Örnekler ( tfds.as_dataframe ):
- Alıntı :
@article{rastogi2019towards,
title={Towards Scalable Multi-domain Conversational Agents: The Schema-Guided Dialogue Dataset},
author={Rastogi, Abhinav and Zang, Xiaoxue and Sunkara, Srinivas and Gupta, Raghav and Khaitan, Pranav},
journal={arXiv preprint arXiv:1909.05855},
year={2019}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
mücevher / toto
Yapılandırma açıklaması : ToTTo, Tablodan Metne NLG görevidir. Görev şu şekildedir: Bir hücre alt kümesi vurgulanmış olarak satır adları, sütun adları ve tablo hücreleri içeren bir Wikipedia tablosu verildiğinde, tablonun vurgulanan kısmı için bir doğal dil açıklaması oluşturun.
İndirme boyutu :
180.75 MiB
Veri kümesi boyutu :
645.86 MiB
Otomatik önbelleğe alınmış ( belgeleme ): Hayır
bölmeler :
Bölmek | örnekler |
---|---|
'challenge_test_scramble' | 500 |
'challenge_train_sample' | 500 |
'challenge_validation_sample' | 500 |
'test' | 7.700 |
'train' | 121.153 |
'validation' | 7.700 |
- Özellik yapısı :
FeaturesDict({
'example_id': string,
'gem_id': string,
'gem_parent_id': string,
'highlighted_cells': Sequence(Sequence(int32)),
'overlap_subset': string,
'references': Sequence(string),
'sentence_annotations': Sequence({
'final_sentence': string,
'original_sentence': string,
'sentence_after_ambiguity': string,
'sentence_after_deletion': string,
}),
'table': Sequence(Sequence({
'column_span': int32,
'is_header': bool,
'row_span': int32,
'value': string,
})),
'table_page_title': string,
'table_section_text': string,
'table_section_title': string,
'table_webpage_url': string,
'target': string,
'totto_id': int32,
})
- Özellik belgeleri :
Özellik | Sınıf | Şekil | Dtipi | Tanım |
---|---|---|---|---|
ÖzelliklerDict | ||||
örnek_id | tensör | sicim | ||
gem_id | tensör | sicim | ||
gem_parent_id | tensör | sicim | ||
vurgulanan_hücreler | Dizi(Dizi(Tensor)) | (Yok, Yok) | int32 | |
üst üste binme_altkümesi | tensör | sicim | ||
Referanslar | Sıra(Tensor) | (Hiçbiri,) | sicim | |
cümle_annotations | Sekans | |||
cümle_annotations/son_sentence | tensör | sicim | ||
cümle_annotations/original_sentence | tensör | sicim | ||
cümle_annotations/sentence_after_ambiguity | tensör | sicim | ||
cümle_annotations/sentence_after_deletion | tensör | sicim | ||
masa | Sekans | |||
tablo/sütun_span | tensör | int32 | ||
tablo/is_header | tensör | bool | ||
tablo/satır_span | tensör | int32 | ||
tablo/değer | tensör | sicim | ||
tablo_sayfası_başlığı | tensör | sicim | ||
tablo_bölümü_metni | tensör | sicim | ||
tablo_bölümü_başlığı | tensör | sicim | ||
tablo_web sayfası_url | tensör | sicim | ||
hedef | tensör | sicim | ||
totto_id | tensör | int32 |
- Örnekler ( tfds.as_dataframe ):
- Alıntı :
@inproceedings{parikh2020totto,
title=ToTTo: A Controlled Table-To-Text Generation Dataset,
author={Parikh, Ankur and Wang, Xuezhi and Gehrmann, Sebastian and Faruqui, Manaal and Dhingra, Bhuwan and Yang, Diyi and Das, Dipanjan},
booktitle={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
pages={1173--1186},
year={2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
gem/web_nlg_tr
Yapılandırma açıklaması : WebNLG, paralel DBpedia üçlü kümelerinden ve yaklaşık 450 farklı DBpedia özelliğini kapsayan kısa metinlerden oluşan iki dilli bir veri kümesidir (İngilizce, Rusça). WebNLG verileri başlangıçta, kısa metin oluşturabilen ve mikro planlamayı gerçekleştirebilen RDF sözelleştiricilerinin geliştirilmesini desteklemek için oluşturulmuştur.
İndirme boyutu :
12.57 MiB
Veri kümesi boyutu :
19.91 MiB
Otomatik önbelleğe alınmış ( belgeleme ): Evet
bölmeler :
Bölmek | örnekler |
---|---|
'challenge_test_numbers' | 500 |
'challenge_test_scramble' | 500 |
'challenge_train_sample' | 502 |
'challenge_validation_sample' | 499 |
'test' | 1.779 |
'train' | 35.426 |
'validation' | 1.667 |
- Özellik yapısı :
FeaturesDict({
'category': string,
'gem_id': string,
'gem_parent_id': string,
'input': Sequence(string),
'references': Sequence(string),
'target': string,
'webnlg_id': string,
})
- Özellik belgeleri :
Özellik | Sınıf | Şekil | Dtipi | Tanım |
---|---|---|---|---|
ÖzelliklerDict | ||||
kategori | tensör | sicim | ||
gem_id | tensör | sicim | ||
gem_parent_id | tensör | sicim | ||
giriş | Sıra(Tensor) | (Hiçbiri,) | sicim | |
Referanslar | Sıra(Tensor) | (Hiçbiri,) | sicim | |
hedef | tensör | sicim | ||
webnlg_id | tensör | sicim |
- Örnekler ( tfds.as_dataframe ):
- Alıntı :
@inproceedings{gardent2017creating,
author = "Gardent, Claire
and Shimorina, Anastasia
and Narayan, Shashi
and Perez-Beltrachini, Laura",
title = "Creating Training Corpora for NLG Micro-Planners",
booktitle = "Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
year = "2017",
publisher = "Association for Computational Linguistics",
pages = "179--188",
location = "Vancouver, Canada",
doi = "10.18653/v1/P17-1017",
url = "http://www.aclweb.org/anthology/P17-1017"
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
gem/web_nlg_ru
Yapılandırma açıklaması : WebNLG, paralel DBpedia üçlü kümelerinden ve yaklaşık 450 farklı DBpedia özelliğini kapsayan kısa metinlerden oluşan iki dilli bir veri kümesidir (İngilizce, Rusça). WebNLG verileri başlangıçta, kısa metin oluşturabilen ve mikro planlamayı gerçekleştirebilen RDF sözelleştiricilerinin geliştirilmesini desteklemek için oluşturulmuştur.
İndirme boyutu :
7.49 MiB
Veri kümesi boyutu :
11.30 MiB
Otomatik önbelleğe alınmış ( belgeleme ): Evet
bölmeler :
Bölmek | örnekler |
---|---|
'challenge_test_scramble' | 500 |
'challenge_train_sample' | 501 |
'challenge_validation_sample' | 500 |
'test' | 1.102 |
'train' | 14.630 |
'validation' | 790 |
- Özellik yapısı :
FeaturesDict({
'category': string,
'gem_id': string,
'gem_parent_id': string,
'input': Sequence(string),
'references': Sequence(string),
'target': string,
'webnlg_id': string,
})
- Özellik belgeleri :
Özellik | Sınıf | Şekil | Dtipi | Tanım |
---|---|---|---|---|
ÖzelliklerDict | ||||
kategori | tensör | sicim | ||
gem_id | tensör | sicim | ||
gem_parent_id | tensör | sicim | ||
giriş | Sıra(Tensor) | (Hiçbiri,) | sicim | |
Referanslar | Sıra(Tensor) | (Hiçbiri,) | sicim | |
hedef | tensör | sicim | ||
webnlg_id | tensör | sicim |
- Örnekler ( tfds.as_dataframe ):
- Alıntı :
@inproceedings{gardent2017creating,
author = "Gardent, Claire
and Shimorina, Anastasia
and Narayan, Shashi
and Perez-Beltrachini, Laura",
title = "Creating Training Corpora for NLG Micro-Planners",
booktitle = "Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
year = "2017",
publisher = "Association for Computational Linguistics",
pages = "179--188",
location = "Vancouver, Canada",
doi = "10.18653/v1/P17-1017",
url = "http://www.aclweb.org/anthology/P17-1017"
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
gem/wiki_auto_asset_turk
Yapılandırma açıklaması : WikiAuto, cümle basitleştirme sistemlerini eğitmek için bir kaynak olarak İngilizce Wikipedia ve Basit İngilizce Wikipedia'dan bir dizi hizalanmış cümle sağlar. ASSET ve TURK, test için kullanılan yüksek kaliteli basitleştirme veri kümeleridir.
İndirme boyutu :
121.01 MiB
Veri kümesi boyutu :
202.40 MiB
Auto-cached ( documentation ): Yes (challenge_test_asset_backtranslation, challenge_test_asset_bfp02, challenge_test_asset_bfp05, challenge_test_asset_nopunc, challenge_test_turk_backtranslation, challenge_test_turk_bfp02, challenge_test_turk_bfp05, challenge_test_turk_nopunc, challenge_train_sample, challenge_validation_sample, test_asset, test_turk, validation), Only when
shuffle_files=False
(train)bölmeler :
Bölmek | örnekler |
---|---|
'challenge_test_asset_backtranslation' | 359 |
'challenge_test_asset_bfp02' | 359 |
'challenge_test_asset_bfp05' | 359 |
'challenge_test_asset_nopunc' | 359 |
'challenge_test_turk_backtranslation' | 359 |
'challenge_test_turk_bfp02' | 359 |
'challenge_test_turk_bfp05' | 359 |
'challenge_test_turk_nopunc' | 359 |
'challenge_train_sample' | 500 |
'challenge_validation_sample' | 500 |
'test_asset' | 359 |
'test_turk' | 359 |
'train' | 483.801 |
'validation' | 20.000 |
- Özellik yapısı :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'source': string,
'target': string,
})
- Özellik belgeleri :
Özellik | Sınıf | Şekil | Dtipi | Tanım |
---|---|---|---|---|
ÖzelliklerDict | ||||
gem_id | tensör | sicim | ||
gem_parent_id | tensör | sicim | ||
Referanslar | Sıra(Tensor) | (Hiçbiri,) | sicim | |
kaynak | tensör | sicim | ||
hedef | tensör | sicim |
- Örnekler ( tfds.as_dataframe ):
- Alıntı :
@inproceedings{jiang-etal-2020-neural,
title = "Neural {CRF} Model for Sentence Alignment in Text Simplification",
author = "Jiang, Chao and
Maddela, Mounica and
Lan, Wuwei and
Zhong, Yang and
Xu, Wei",
booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
month = jul,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.acl-main.709",
doi = "10.18653/v1/2020.acl-main.709",
pages = "7943--7960",
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
mücevher/xsum
Yapılandırma açıklaması : Veri kümesi, en uç haliyle soyutlayıcı özetleme görevi içindir, bir belgeyi tek bir cümlede özetlemekle ilgilidir.
İndirme boyutu :
246.31 MiB
Veri kümesi boyutu :
78.89 MiB
Otomatik önbelleğe alınmış ( belgeleme ): Evet
bölmeler :
Bölmek | örnekler |
---|---|
'challenge_test_backtranslation' | 500 |
'challenge_test_bfp_02' | 500 |
'challenge_test_bfp_05' | 500 |
'challenge_test_covid' | 401 |
'challenge_test_nopunc' | 500 |
'challenge_train_sample' | 500 |
'challenge_validation_sample' | 500 |
'test' | 1.166 |
'train' | 23.206 |
'validation' | 1.117 |
- Özellik yapısı :
FeaturesDict({
'document': string,
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'target': string,
'xsum_id': string,
})
- Özellik belgeleri :
Özellik | Sınıf | Şekil | Dtipi | Tanım |
---|---|---|---|---|
ÖzelliklerDict | ||||
belge | tensör | sicim | ||
gem_id | tensör | sicim | ||
gem_parent_id | tensör | sicim | ||
Referanslar | Sıra(Tensor) | (Hiçbiri,) | sicim | |
hedef | tensör | sicim | ||
xsum_id | tensör | sicim |
- Örnekler ( tfds.as_dataframe ):
- Alıntı :
@inproceedings{Narayan2018dont,
author = "Shashi Narayan and Shay B. Cohen and Mirella Lapata",
title = "Don't Give Me the Details, Just the Summary! {T}opic-Aware Convolutional Neural Networks for Extreme Summarization",
booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing ",
year = "2018",
address = "Brussels, Belgium",
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
gem/wiki_lingua_arabic_ar
Yapılandırma açıklaması : Wikilingua, diller arası soyutlayıcı özetleme sistemlerinin değerlendirilmesi için büyük ölçekli, çok dilli bir veri kümesidir.
İndirme boyutu :
56.25 MiB
Veri kümesi boyutu :
291.42 MiB
Otomatik önbelleğe alınmış ( belgeleme ): Hayır
bölmeler :
Bölmek | örnekler |
---|---|
'test' | 5.841 |
'train' | 20.441 |
'validation' | 2.919 |
- Özellik yapısı :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'source': string,
'source_aligned': Translation({
'ar': Text(shape=(), dtype=string),
'en': Text(shape=(), dtype=string),
}),
'target': string,
'target_aligned': Translation({
'ar': Text(shape=(), dtype=string),
'en': Text(shape=(), dtype=string),
}),
})
- Özellik belgeleri :
Özellik | Sınıf | Şekil | Dtipi | Tanım |
---|---|---|---|---|
ÖzelliklerDict | ||||
gem_id | tensör | sicim | ||
gem_parent_id | tensör | sicim | ||
Referanslar | Sıra(Tensor) | (Hiçbiri,) | sicim | |
kaynak | tensör | sicim | ||
kaynak_hizalı | Tercüme | |||
kaynak_hizalı/ar | Metin | sicim | ||
source_aligned/tr | Metin | sicim | ||
hedef | tensör | sicim | ||
hedef_hizalı | Tercüme | |||
hedef_hizalı/ar | Metin | sicim | ||
hedef_aligned/tr | Metin | sicim |
- Örnekler ( tfds.as_dataframe ):
- Alıntı :
@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
gem/wiki_lingua_chinese_zh
Yapılandırma açıklaması : Wikilingua, diller arası soyutlayıcı özetleme sistemlerinin değerlendirilmesi için büyük ölçekli, çok dilli bir veri kümesidir.
İndirme boyutu :
31.38 MiB
Veri kümesi boyutu :
122.06 MiB
Otomatik önbelleğe alınmış ( belgeleme ): Evet
bölmeler :
Bölmek | örnekler |
---|---|
'test' | 3.775 |
'train' | 13.211 |
'validation' | 1.886 |
- Özellik yapısı :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'source': string,
'source_aligned': Translation({
'en': Text(shape=(), dtype=string),
'zh': Text(shape=(), dtype=string),
}),
'target': string,
'target_aligned': Translation({
'en': Text(shape=(), dtype=string),
'zh': Text(shape=(), dtype=string),
}),
})
- Özellik belgeleri :
Özellik | Sınıf | Şekil | Dtipi | Tanım |
---|---|---|---|---|
ÖzelliklerDict | ||||
gem_id | tensör | sicim | ||
gem_parent_id | tensör | sicim | ||
Referanslar | Sıra(Tensor) | (Hiçbiri,) | sicim | |
kaynak | tensör | sicim | ||
kaynak_hizalı | Tercüme | |||
source_aligned/tr | Metin | sicim | ||
kaynak_aligned/zh | Metin | sicim | ||
hedef | tensör | sicim | ||
hedef_hizalı | Tercüme | |||
hedef_aligned/tr | Metin | sicim | ||
hedef_aligned/zh | Metin | sicim |
- Örnekler ( tfds.as_dataframe ):
- Alıntı :
@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
gem/wiki_lingua_czech_cs
Yapılandırma açıklaması : Wikilingua, diller arası soyutlayıcı özetleme sistemlerinin değerlendirilmesi için büyük ölçekli, çok dilli bir veri kümesidir.
İndirme boyutu :
13.84 MiB
Veri kümesi boyutu :
58.05 MiB
Otomatik önbelleğe alınmış ( belgeleme ): Evet
bölmeler :
Bölmek | örnekler |
---|---|
'test' | 1.438 |
'train' | 5.033 |
'validation' | 718 |
- Özellik yapısı :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'source': string,
'source_aligned': Translation({
'cs': Text(shape=(), dtype=string),
'en': Text(shape=(), dtype=string),
}),
'target': string,
'target_aligned': Translation({
'cs': Text(shape=(), dtype=string),
'en': Text(shape=(), dtype=string),
}),
})
- Özellik belgeleri :
Özellik | Sınıf | Şekil | Dtipi | Tanım |
---|---|---|---|---|
ÖzelliklerDict | ||||
gem_id | tensör | sicim | ||
gem_parent_id | tensör | sicim | ||
Referanslar | Sıra(Tensor) | (Hiçbiri,) | sicim | |
kaynak | tensör | sicim | ||
kaynak_hizalı | Tercüme | |||
kaynak_aligned/cs | Metin | sicim | ||
source_aligned/tr | Metin | sicim | ||
hedef | tensör | sicim | ||
hedef_hizalı | Tercüme | |||
hedef_hizalı/cs | Metin | sicim | ||
hedef_aligned/tr | Metin | sicim |
- Örnekler ( tfds.as_dataframe ):
- Alıntı :
@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
gem/wiki_lingua_dutch_nl
Yapılandırma açıklaması : Wikilingua, diller arası soyutlayıcı özetleme sistemlerinin değerlendirilmesi için büyük ölçekli, çok dilli bir veri kümesidir.
İndirme boyutu :
53.88 MiB
Veri kümesi boyutu :
237.97 MiB
Otomatik önbelleğe alınmış ( belgeler ): Evet (test, doğrulama), Yalnızca
shuffle_files=False
(tren) olduğundabölmeler :
Bölmek | örnekler |
---|---|
'test' | 6.248 |
'train' | 21.866 |
'validation' | 3.123 |
- Özellik yapısı :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'source': string,
'source_aligned': Translation({
'en': Text(shape=(), dtype=string),
'nl': Text(shape=(), dtype=string),
}),
'target': string,
'target_aligned': Translation({
'en': Text(shape=(), dtype=string),
'nl': Text(shape=(), dtype=string),
}),
})
- Özellik belgeleri :
Özellik | Sınıf | Şekil | Dtipi | Tanım |
---|---|---|---|---|
ÖzelliklerDict | ||||
gem_id | tensör | sicim | ||
gem_parent_id | tensör | sicim | ||
Referanslar | Sıra(Tensor) | (Hiçbiri,) | sicim | |
kaynak | tensör | sicim | ||
kaynak_hizalı | Tercüme | |||
source_aligned/tr | Metin | sicim | ||
kaynak_aligned/nl | Metin | sicim | ||
hedef | tensör | sicim | ||
hedef_hizalı | Tercüme | |||
hedef_aligned/tr | Metin | sicim | ||
hedef_hizalı/nl | Metin | sicim |
- Örnekler ( tfds.as_dataframe ):
- Alıntı :
@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
gem/wiki_lingua_english_en
Yapılandırma açıklaması : Wikilingua, diller arası soyutlayıcı özetleme sistemlerinin değerlendirilmesi için büyük ölçekli, çok dilli bir veri kümesidir.
İndirme boyutu :
112.56 MiB
Veri kümesi boyutu :
657.51 MiB
Otomatik önbelleğe alınmış ( belgeleme ): Hayır
bölmeler :
Bölmek | örnekler |
---|---|
'test' | 28.614 |
'train' | 99.020 |
'validation' | 13.823 |
- Özellik yapısı :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'source': string,
'source_aligned': Translation({
'en': Text(shape=(), dtype=string),
}),
'target': string,
'target_aligned': Translation({
'en': Text(shape=(), dtype=string),
}),
})
- Özellik belgeleri :
Özellik | Sınıf | Şekil | Dtipi | Tanım |
---|---|---|---|---|
ÖzelliklerDict | ||||
gem_id | tensör | sicim | ||
gem_parent_id | tensör | sicim | ||
Referanslar | Sıra(Tensor) | (Hiçbiri,) | sicim | |
kaynak | tensör | sicim | ||
kaynak_hizalı | Tercüme | |||
source_aligned/tr | Metin | sicim | ||
hedef | tensör | sicim | ||
hedef_hizalı | Tercüme | |||
hedef_aligned/tr | Metin | sicim |
- Örnekler ( tfds.as_dataframe ):
- Alıntı :
@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
gem/wiki_lingua_french_fr
Yapılandırma açıklaması : Wikilingua, diller arası soyutlayıcı özetleme sistemlerinin değerlendirilmesi için büyük ölçekli, çok dilli bir veri kümesidir.
İndirme boyutu :
113.26 MiB
Veri kümesi boyutu :
522.28 MiB
Otomatik önbelleğe alınmış ( belgeleme ): Hayır
bölmeler :
Bölmek | örnekler |
---|---|
'test' | 12.731 |
'train' | 44.556 |
'validation' | 6.364 |
- Özellik yapısı :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'source': string,
'source_aligned': Translation({
'en': Text(shape=(), dtype=string),
'fr': Text(shape=(), dtype=string),
}),
'target': string,
'target_aligned': Translation({
'en': Text(shape=(), dtype=string),
'fr': Text(shape=(), dtype=string),
}),
})
- Özellik belgeleri :
Özellik | Sınıf | Şekil | Dtipi | Tanım |
---|---|---|---|---|
ÖzelliklerDict | ||||
gem_id | tensör | sicim | ||
gem_parent_id | tensör | sicim | ||
Referanslar | Sıra(Tensor) | (Hiçbiri,) | sicim | |
kaynak | tensör | sicim | ||
kaynak_hizalı | Tercüme | |||
source_aligned/tr | Metin | sicim | ||
kaynak_hizalanmış/fr | Metin | sicim | ||
hedef | tensör | sicim | ||
hedef_hizalı | Tercüme | |||
hedef_aligned/tr | Metin | sicim | ||
hedef_hizalı/fr | Metin | sicim |
- Örnekler ( tfds.as_dataframe ):
- Alıntı :
@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
gem/wiki_lingua_german_de
Yapılandırma açıklaması : Wikilingua, diller arası soyutlayıcı özetleme sistemlerinin değerlendirilmesi için büyük ölçekli, çok dilli bir veri kümesidir.
İndirme boyutu :
102.65 MiB
Veri kümesi boyutu :
452.46 MiB
Otomatik önbelleğe alınmış ( belgeleme ): Hayır
bölmeler :
Bölmek | örnekler |
---|---|
'test' | 11.669 |
'train' | 40.839 |
'validation' | 5.833 |
- Özellik yapısı :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'source': string,
'source_aligned': Translation({
'de': Text(shape=(), dtype=string),
'en': Text(shape=(), dtype=string),
}),
'target': string,
'target_aligned': Translation({
'de': Text(shape=(), dtype=string),
'en': Text(shape=(), dtype=string),
}),
})
- Özellik belgeleri :
Özellik | Sınıf | Şekil | Dtipi | Tanım |
---|---|---|---|---|
ÖzelliklerDict | ||||
gem_id | tensör | sicim | ||
gem_parent_id | tensör | sicim | ||
Referanslar | Sıra(Tensor) | (Hiçbiri,) | sicim | |
kaynak | tensör | sicim | ||
kaynak_hizalı | Tercüme | |||
kaynak_aligned/de | Metin | sicim | ||
source_aligned/tr | Metin | sicim | ||
hedef | tensör | sicim | ||
hedef_hizalı | Tercüme | |||
hedef_hizalı/de | Metin | sicim | ||
hedef_aligned/tr | Metin | sicim |
- Örnekler ( tfds.as_dataframe ):
- Alıntı :
@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
gem/wiki_lingua_hindi_hi
Yapılandırma açıklaması : Wikilingua, diller arası soyutlayıcı özetleme sistemlerinin değerlendirilmesi için büyük ölçekli, çok dilli bir veri kümesidir.
İndirme boyutu :
20.07 MiB
Veri kümesi boyutu :
138.06 MiB
Otomatik önbelleğe alınmış ( belgeleme ): Evet
bölmeler :
Bölmek | örnekler |
---|---|
'test' | 1.984 |
'train' | 6.942 |
'validation' | 991 |
- Özellik yapısı :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'source': string,
'source_aligned': Translation({
'en': Text(shape=(), dtype=string),
'hi': Text(shape=(), dtype=string),
}),
'target': string,
'target_aligned': Translation({
'en': Text(shape=(), dtype=string),
'hi': Text(shape=(), dtype=string),
}),
})
- Özellik belgeleri :
Özellik | Sınıf | Şekil | Dtipi | Tanım |
---|---|---|---|---|
ÖzelliklerDict | ||||
gem_id | tensör | sicim | ||
gem_parent_id | tensör | sicim | ||
Referanslar | Sıra(Tensor) | (Hiçbiri,) | sicim | |
kaynak | tensör | sicim | ||
kaynak_hizalı | Tercüme | |||
source_aligned/tr | Metin | sicim | ||
source_aligned/merhaba | Metin | sicim | ||
hedef | tensör | sicim | ||
hedef_hizalı | Tercüme | |||
hedef_aligned/tr | Metin | sicim | ||
target_aligned/merhaba | Metin | sicim |
- Örnekler ( tfds.as_dataframe ):
- Alıntı :
@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
gem/wiki_lingua_indonesian_id
Yapılandırma açıklaması : Wikilingua, diller arası soyutlayıcı özetleme sistemlerinin değerlendirilmesi için büyük ölçekli, çok dilli bir veri kümesidir.
İndirme boyutu :
80.08 MiB
Veri kümesi boyutu :
370.63 MiB
Otomatik önbelleğe alınmış ( belgeleme ): Hayır
bölmeler :
Bölmek | örnekler |
---|---|
'test' | 9.497 |
'train' | 33.237 |
'validation' | 4.747 |
- Özellik yapısı :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'source': string,
'source_aligned': Translation({
'en': Text(shape=(), dtype=string),
'id': Text(shape=(), dtype=string),
}),
'target': string,
'target_aligned': Translation({
'en': Text(shape=(), dtype=string),
'id': Text(shape=(), dtype=string),
}),
})
- Özellik belgeleri :
Özellik | Sınıf | Şekil | Dtipi | Tanım |
---|---|---|---|---|
ÖzelliklerDict | ||||
gem_id | tensör | sicim | ||
gem_parent_id | tensör | sicim | ||
Referanslar | Sıra(Tensor) | (Hiçbiri,) | sicim | |
kaynak | tensör | sicim | ||
kaynak_hizalı | Tercüme | |||
source_aligned/tr | Metin | sicim | ||
kaynak_hizalı/kimlik | Metin | sicim | ||
hedef | tensör | sicim | ||
hedef_hizalı | Tercüme | |||
hedef_aligned/tr | Metin | sicim | ||
hedef_hizalı/kimlik | Metin | sicim |
- Örnekler ( tfds.as_dataframe ):
- Alıntı :
@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
gem/wiki_lingua_italian_it
Yapılandırma açıklaması : Wikilingua, diller arası soyutlayıcı özetleme sistemlerinin değerlendirilmesi için büyük ölçekli, çok dilli bir veri kümesidir.
İndirme boyutu :
84.80 MiB
Veri kümesi boyutu :
374.40 MiB
Otomatik önbelleğe alınmış ( belgeleme ): Hayır
bölmeler :
Bölmek | örnekler |
---|---|
'test' | 10.189 |
'train' | 35.661 |
'validation' | 5.093 |
- Özellik yapısı :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'source': string,
'source_aligned': Translation({
'en': Text(shape=(), dtype=string),
'it': Text(shape=(), dtype=string),
}),
'target': string,
'target_aligned': Translation({
'en': Text(shape=(), dtype=string),
'it': Text(shape=(), dtype=string),
}),
})
- Özellik belgeleri :
Özellik | Sınıf | Şekil | Dtipi | Tanım |
---|---|---|---|---|
ÖzelliklerDict | ||||
gem_id | tensör | sicim | ||
gem_parent_id | tensör | sicim | ||
Referanslar | Sıra(Tensor) | (Hiçbiri,) | sicim | |
kaynak | tensör | sicim | ||
kaynak_hizalı | Tercüme | |||
source_aligned/tr | Metin | sicim | ||
kaynak_aligned/it | Metin | sicim | ||
hedef | tensör | sicim | ||
hedef_hizalı | Tercüme | |||
hedef_aligned/tr | Metin | sicim | ||
hedef_hizalı/it | Metin | sicim |
- Örnekler ( tfds.as_dataframe ):
- Alıntı :
@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
gem/wiki_lingua_japanese_ja
Yapılandırma açıklaması : Wikilingua, diller arası soyutlayıcı özetleme sistemlerinin değerlendirilmesi için büyük ölçekli, çok dilli bir veri kümesidir.
İndirme boyutu :
21.75 MiB
Veri kümesi boyutu :
103.19 MiB
Otomatik önbelleğe alınmış ( belgeleme ): Evet
bölmeler :
Bölmek | örnekler |
---|---|
'test' | 2.530 |
'train' | 8.853 |
'validation' | 1.264 |
- Özellik yapısı :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'source': string,
'source_aligned': Translation({
'en': Text(shape=(), dtype=string),
'ja': Text(shape=(), dtype=string),
}),
'target': string,
'target_aligned': Translation({
'en': Text(shape=(), dtype=string),
'ja': Text(shape=(), dtype=string),
}),
})
- Özellik belgeleri :
Özellik | Sınıf | Şekil | Dtipi | Tanım |
---|---|---|---|---|
ÖzelliklerDict | ||||
gem_id | tensör | sicim | ||
gem_parent_id | tensör | sicim | ||
Referanslar | Sıra(Tensor) | (Hiçbiri,) | sicim | |
kaynak | tensör | sicim | ||
kaynak_hizalı | Tercüme | |||
source_aligned/tr | Metin | sicim | ||
kaynak_hizalanmış/ja | Metin | sicim | ||
hedef | tensör | sicim | ||
hedef_hizalı | Tercüme | |||
hedef_aligned/tr | Metin | sicim | ||
hedef_hizalı/ja | Metin | sicim |
- Örnekler ( tfds.as_dataframe ):
- Alıntı :
@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
gem/wiki_lingua_korean_ko
Yapılandırma açıklaması : Wikilingua, diller arası soyutlayıcı özetleme sistemlerinin değerlendirilmesi için büyük ölçekli, çok dilli bir veri kümesidir.
İndirme boyutu :
22.26 MiB
Veri kümesi boyutu :
102.35 MiB
Otomatik önbelleğe alınmış ( belgeleme ): Evet
bölmeler :
Bölmek | örnekler |
---|---|
'test' | 2.436 |
'train' | 8.524 |
'validation' | 1.216 |
- Özellik yapısı :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'source': string,
'source_aligned': Translation({
'en': Text(shape=(), dtype=string),
'ko': Text(shape=(), dtype=string),
}),
'target': string,
'target_aligned': Translation({
'en': Text(shape=(), dtype=string),
'ko': Text(shape=(), dtype=string),
}),
})
- Özellik belgeleri :
Özellik | Sınıf | Şekil | Dtipi | Tanım |
---|---|---|---|---|
ÖzelliklerDict | ||||
gem_id | tensör | sicim | ||
gem_parent_id | tensör | sicim | ||
Referanslar | Sıra(Tensor) | (Hiçbiri,) | sicim | |
kaynak | tensör | sicim | ||
kaynak_hizalı | Tercüme | |||
source_aligned/tr | Metin | sicim | ||
source_aligned/ko | Metin | sicim | ||
hedef | tensör | sicim | ||
hedef_hizalı | Tercüme | |||
hedef_aligned/tr | Metin | sicim | ||
target_aligned/ko | Metin | sicim |
- Örnekler ( tfds.as_dataframe ):
- Alıntı :
@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
gem/wiki_lingua_portuguese_pt
Yapılandırma açıklaması : Wikilingua, diller arası soyutlayıcı özetleme sistemlerinin değerlendirilmesi için büyük ölçekli, çok dilli bir veri kümesidir.
İndirme boyutu :
131.17 MiB
Veri kümesi boyutu :
570.46 MiB
Otomatik önbelleğe alınmış ( belgeleme ): Hayır
bölmeler :
Bölmek | örnekler |
---|---|
'test' | 16.331 |
'train' | 57.159 |
'validation' | 8.165 |
- Özellik yapısı :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'source': string,
'source_aligned': Translation({
'en': Text(shape=(), dtype=string),
'pt': Text(shape=(), dtype=string),
}),
'target': string,
'target_aligned': Translation({
'en': Text(shape=(), dtype=string),
'pt': Text(shape=(), dtype=string),
}),
})
- Özellik belgeleri :
Özellik | Sınıf | Şekil | Dtipi | Tanım |
---|---|---|---|---|
ÖzelliklerDict | ||||
gem_id | tensör | sicim | ||
gem_parent_id | tensör | sicim | ||
Referanslar | Sıra(Tensor) | (Hiçbiri,) | sicim | |
kaynak | tensör | sicim | ||
kaynak_hizalı | Tercüme | |||
source_aligned/tr | Metin | sicim | ||
kaynak_hizalı/pt | Metin | sicim | ||
hedef | tensör | sicim | ||
hedef_hizalı | Tercüme | |||
hedef_aligned/tr | Metin | sicim | ||
hedef_hizalı/nokta | Metin | sicim |
- Örnekler ( tfds.as_dataframe ):
- Alıntı :
@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
mücevher/wiki_lingua_russian_ru
Yapılandırma açıklaması : Wikilingua, diller arası soyutlayıcı özetleme sistemlerinin değerlendirilmesi için büyük ölçekli, çok dilli bir veri kümesidir.
İndirme boyutu :
101.36 MiB
Veri kümesi boyutu :
564.69 MiB
Otomatik önbelleğe alınmış ( belgeleme ): Hayır
bölmeler :
Bölmek | örnekler |
---|---|
'test' | 10.580 |
'train' | 37.028 |
'validation' | 5.288 |
- Özellik yapısı :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'source': string,
'source_aligned': Translation({
'en': Text(shape=(), dtype=string),
'ru': Text(shape=(), dtype=string),
}),
'target': string,
'target_aligned': Translation({
'en': Text(shape=(), dtype=string),
'ru': Text(shape=(), dtype=string),
}),
})
- Özellik belgeleri :
Özellik | Sınıf | Şekil | Dtipi | Tanım |
---|---|---|---|---|
ÖzelliklerDict | ||||
gem_id | tensör | sicim | ||
gem_parent_id | tensör | sicim | ||
Referanslar | Sıra(Tensor) | (Hiçbiri,) | sicim | |
kaynak | tensör | sicim | ||
kaynak_hizalı | Tercüme | |||
source_aligned/tr | Metin | sicim | ||
kaynak_aligned/ru | Metin | sicim | ||
hedef | tensör | sicim | ||
hedef_hizalı | Tercüme | |||
hedef_aligned/tr | Metin | sicim | ||
hedef_hizalı/ru | Metin | sicim |
- Örnekler ( tfds.as_dataframe ):
- Alıntı :
@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
gem/wiki_lingua_spanish_es
Yapılandırma açıklaması : Wikilingua, diller arası soyutlayıcı özetleme sistemlerinin değerlendirilmesi için büyük ölçekli, çok dilli bir veri kümesidir.
İndirme boyutu :
189.06 MiB
Veri kümesi boyutu :
849.75 MiB
Otomatik önbelleğe alınmış ( belgeleme ): Hayır
bölmeler :
Bölmek | örnekler |
---|---|
'test' | 22.632 |
'train' | 79.212 |
'validation' | 11.316 |
- Özellik yapısı :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'source': string,
'source_aligned': Translation({
'en': Text(shape=(), dtype=string),
'es': Text(shape=(), dtype=string),
}),
'target': string,
'target_aligned': Translation({
'en': Text(shape=(), dtype=string),
'es': Text(shape=(), dtype=string),
}),
})
- Özellik belgeleri :
Özellik | Sınıf | Şekil | Dtipi | Tanım |
---|---|---|---|---|
ÖzelliklerDict | ||||
gem_id | tensör | sicim | ||
gem_parent_id | tensör | sicim | ||
Referanslar | Sıra(Tensor) | (Hiçbiri,) | sicim | |
kaynak | tensör | sicim | ||
kaynak_hizalı | Tercüme | |||
source_aligned/tr | Metin | sicim | ||
kaynak_aligned/es | Metin | sicim | ||
hedef | tensör | sicim | ||
hedef_hizalı | Tercüme | |||
hedef_aligned/tr | Metin | sicim | ||
hedef_aligned/es | Metin | sicim |
- Örnekler ( tfds.as_dataframe ):
- Alıntı :
@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
gem/wiki_lingua_thai_th
Yapılandırma açıklaması : Wikilingua, diller arası soyutlayıcı özetleme sistemlerinin değerlendirilmesi için büyük ölçekli, çok dilli bir veri kümesidir.
İndirme boyutu :
28.60 MiB
Veri kümesi boyutu :
193.77 MiB
Otomatik önbelleğe alınmış ( belgeler ): Evet (test, doğrulama), Yalnızca
shuffle_files=False
(tren) olduğundabölmeler :
Bölmek | örnekler |
---|---|
'test' | 2.950 |
'train' | 10.325 |
'validation' | 1.475 |
- Özellik yapısı :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'source': string,
'source_aligned': Translation({
'en': Text(shape=(), dtype=string),
'th': Text(shape=(), dtype=string),
}),
'target': string,
'target_aligned': Translation({
'en': Text(shape=(), dtype=string),
'th': Text(shape=(), dtype=string),
}),
})
- Özellik belgeleri :
Özellik | Sınıf | Şekil | Dtipi | Tanım |
---|---|---|---|---|
ÖzelliklerDict | ||||
gem_id | tensör | sicim | ||
gem_parent_id | tensör | sicim | ||
Referanslar | Sıra(Tensor) | (Hiçbiri,) | sicim | |
kaynak | tensör | sicim | ||
kaynak_hizalı | Tercüme | |||
source_aligned/tr | Metin | sicim | ||
kaynak_hizalı/th | Metin | sicim | ||
hedef | tensör | sicim | ||
hedef_hizalı | Tercüme | |||
hedef_aligned/tr | Metin | sicim | ||
hedef_hizalı/th | Metin | sicim |
- Örnekler ( tfds.as_dataframe ):
- Alıntı :
@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
gem/wiki_lingua_turkish_tr
Yapılandırma açıklaması : Wikilingua, diller arası soyutlayıcı özetleme sistemlerinin değerlendirilmesi için büyük ölçekli, çok dilli bir veri kümesidir.
İndirme boyutu :
6.73 MiB
Veri kümesi boyutu :
30.75 MiB
Otomatik önbelleğe alınmış ( belgeleme ): Evet
bölmeler :
Bölmek | örnekler |
---|---|
'test' | 900 |
'train' | 3.148 |
'validation' | 449 |
- Özellik yapısı :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'source': string,
'source_aligned': Translation({
'en': Text(shape=(), dtype=string),
'tr': Text(shape=(), dtype=string),
}),
'target': string,
'target_aligned': Translation({
'en': Text(shape=(), dtype=string),
'tr': Text(shape=(), dtype=string),
}),
})
- Özellik belgeleri :
Özellik | Sınıf | Şekil | Dtipi | Tanım |
---|---|---|---|---|
ÖzelliklerDict | ||||
gem_id | tensör | sicim | ||
gem_parent_id | tensör | sicim | ||
Referanslar | Sıra(Tensor) | (Hiçbiri,) | sicim | |
kaynak | tensör | sicim | ||
kaynak_hizalı | Tercüme | |||
source_aligned/tr | Metin | sicim | ||
kaynak_hizalı/tr | Metin | sicim | ||
hedef | tensör | sicim | ||
hedef_hizalı | Tercüme | |||
hedef_aligned/tr | Metin | sicim | ||
hedef_hizalı/tr | Metin | sicim |
- Örnekler ( tfds.as_dataframe ):
- Alıntı :
@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."
gem/wiki_lingua_vietnamese_vi
Yapılandırma açıklaması : Wikilingua, diller arası soyutlayıcı özetleme sistemlerinin değerlendirilmesi için büyük ölçekli, çok dilli bir veri kümesidir.
İndirme boyutu :
36.27 MiB
Veri kümesi boyutu :
179.77 MiB
Otomatik önbelleğe alınmış ( belgeleme ): Evet
bölmeler :
Bölmek | örnekler |
---|---|
'test' | 3.917 |
'train' | 13.707 |
'validation' | 1.957 |
- Özellik yapısı :
FeaturesDict({
'gem_id': string,
'gem_parent_id': string,
'references': Sequence(string),
'source': string,
'source_aligned': Translation({
'en': Text(shape=(), dtype=string),
'vi': Text(shape=(), dtype=string),
}),
'target': string,
'target_aligned': Translation({
'en': Text(shape=(), dtype=string),
'vi': Text(shape=(), dtype=string),
}),
})
- Özellik belgeleri :
Özellik | Sınıf | Şekil | Dtipi | Tanım |
---|---|---|---|---|
ÖzelliklerDict | ||||
gem_id | tensör | sicim | ||
gem_parent_id | tensör | sicim | ||
Referanslar | Sıra(Tensor) | (Hiçbiri,) | sicim | |
kaynak | tensör | sicim | ||
kaynak_hizalı | Tercüme | |||
source_aligned/tr | Metin | sicim | ||
kaynak_hizalı/vi | Metin | sicim | ||
hedef | tensör | sicim | ||
hedef_hizalı | Tercüme | |||
hedef_aligned/tr | Metin | sicim | ||
hedef_hizalı/vi | Metin | sicim |
- Örnekler ( tfds.as_dataframe ):
- Alıntı :
@inproceedings{ladhak-wiki-2020,
title=WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization,
author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
booktitle={Findings of EMNLP, 2020},
year={2020}
}
@article{gehrmann2021gem,
author = {Sebastian Gehrmann and
Tosin P. Adewumi and
Karmanya Aggarwal and
Pawan Sasanka Ammanamanchi and
Aremu Anuoluwapo and
Antoine Bosselut and
Khyathi Raghavi Chandu and
Miruna{-}Adriana Clinciu and
Dipanjan Das and
Kaustubh D. Dhole and
Wanyu Du and
Esin Durmus and
Ondrej Dusek and
Chris Emezue and
Varun Gangal and
Cristina Garbacea and
Tatsunori Hashimoto and
Yufang Hou and
Yacine Jernite and
Harsh Jhamtani and
Yangfeng Ji and
Shailza Jolly and
Dhruv Kumar and
Faisal Ladhak and
Aman Madaan and
Mounica Maddela and
Khyati Mahajan and
Saad Mahamood and
Bodhisattwa Prasad Majumder and
Pedro Henrique Martins and
Angelina McMillan{-}Major and
Simon Mille and
Emiel van Miltenburg and
Moin Nadeem and
Shashi Narayan and
Vitaly Nikolaev and
Rubungo Andre Niyongabo and
Salomey Osei and
Ankur P. Parikh and
Laura Perez{-}Beltrachini and
Niranjan Ramesh Rao and
Vikas Raunak and
Juan Diego Rodriguez and
Sashank Santhanam and
Jo{\~{a} }o Sedoc and
Thibault Sellam and
Samira Shaikh and
Anastasia Shimorina and
Marco Antonio Sobrevilla Cabezudo and
Hendrik Strobelt and
Nishant Subramani and
Wei Xu and
Diyi Yang and
Akhila Yerukola and
Jiawei Zhou},
title = {The {GEM} Benchmark: Natural Language Generation, its Evaluation and
Metrics},
journal = {CoRR},
volume = {abs/2102.01672},
year = {2021},
url = {https://arxiv.org/abs/2102.01672},
archivePrefix = {arXiv},
eprint = {2102.01672}
}
Note that each GEM dataset has its own citation. Please see the source to see
the correct citation for each contained dataset."