Huggingface has forked TFDS and provides many text datasets. See here for more documentation. Below is the list of all the datasets that can be used with TFDS.
- acronym_identification
- ade_corpus_v2
- adv_glue
- adversarial_qa
- aeslc
- afrikaans_ner_corpus
- ag_news
- ai2_arc
- air_dialogue
- ajgt_twitter_ar
- allegro_reviews
- allocine
- alt
- amazon_polarity
- amazon_reviews_multi
- amazon_us_reviews
- ambig_qa
- americas_nli
- ami
- amttl
- anli
- app_reviews
- aqua_rat
- aquamuse
- ar_cov19
- ar_res_reviews
- ar_sarcasm
- arabic_billion_words
- arabic_pos_dialect
- arabic_speech_corpus
- arcd
- arsentd_lev
- art
- arxiv_dataset
- ascent_kb
- aslg_pc12
- asnq
- asset
- assin
- assin2
- atomic
- autshumato
- babi_qa
- banking77
- bbaw_egyptian
- bbc_hindi_nli
- bc2gm_corpus
- beans
- best2009
- bianet
- bible_para
- big_patent
- bigbench
- billsum
- bing_coronavirus_query_set
- biomrc
- biosses
- biwi_kinect_head_pose
- blbooks
- blbooksgenre
- blended_skill_talk
- blimp
- blog_authorship_corpus
- bn_hate_speech
- bnl_newspapers
- bookcorpus
- bookcorpusopen
- boolq
- bprec
- break_data
- brwac
- bsd_ja_en
- bswac
- c3
- c4
- cail2018
- caner
- capes
- casino
- catalonia_independence
- cats_vs_dogs
- cawac
- cbt
- cc100
- cc_news
- ccaligned_multilingual
- cdsc
- cdt
- cedr
- cfq
- chr_en
- cifar10
- cifar100
- circa
- civil_comments
- clickbait_news_bg
- climate_fever
- clinc_oos
- clue
- cmrc2018
- cmu_hinglish_dog
- cnn_dailymail
- coached_conv_pref
- coarse_discourse
- codah
- code_search_net
- code_x_glue_cc_clone_detection_big_clone_bench
- code_x_glue_cc_clone_detection_poj104
- code_x_glue_cc_cloze_testing_all
- code_x_glue_cc_cloze_testing_maxmin
- code_x_glue_cc_code_completion_line
- code_x_glue_cc_code_completion_token
- code_x_glue_cc_code_refinement
- code_x_glue_cc_code_to_code_trans
- code_x_glue_cc_defect_detection
- code_x_glue_ct_code_to_text
- code_x_glue_tc_nl_code_search_adv
- code_x_glue_tc_text_to_code
- code_x_glue_tt_text_to_text
- com_qa
- common_gen
- common_language
- common_voice
- commonsense_qa
- competition_math
- compguesswhat
- conceptnet5
- conceptual_12m
- conceptual_captions
- conll2000
- conll2002
- conll2003
- conll2012_ontonotesv5
- conllpp
- consumer-finance-complaints
- conv_ai
- conv_ai_2
- conv_ai_3
- conv_questions
- coqa
- cord19
- cornell_movie_dialog
- cos_e
- cosmos_qa
- counter
- covid_qa_castorini
- covid_qa_deepset
- covid_qa_ucsd
- covid_tweets_japanese
- covost2
- cppe-5
- craigslist_bargains
- crawl_domain
- crd3
- crime_and_punish
- crows_pairs
- cryptonite
- cs_restaurants
- cuad
- curiosity_dialogs
- daily_dialog
- dane
- danish_political_comments
- dart
- datacommons_factcheck
- dbpedia_14
- dbrd
- deal_or_no_dialog
- definite_pronoun_resolution
- dengue_filipino
- dialog_re
- diplomacy_detection
- disaster_response_messages
- discofuse
- discovery
- disfl_qa
- doc2dial
- docred
- doqa
- dream
- drop
- duorc
- dutch_social
- dyk
- e2e_nlg
- e2e_nlg_cleaned
- ecb
- ecthr_cases
- eduge
- ehealth_kd
- eitb_parcc
- electricity_load_diagrams
- eli5
- eli5_category
- elkarhizketak
- emea
- emo
- emotion
- emotone_ar
- empathetic_dialogues
- enriched_web_nlg
- enwik8
- eraser_multi_rc
- esnli
- eth_py150_open
- ethos
- ett
- eu_regulatory_ir
- eurlex
- euronews
- europa_eac_tm
- europa_ecdc_tm
- europarl_bilingual
- event2Mind
- evidence_infer_treatment
- exams
- factckbr
- fake_news_english
- fake_news_filipino
- farsi_news
- fashion_mnist
- fever
- few_rel
- financial_phrasebank
- finer
- flores
- flue
- food101
- fquad
- freebase_qa
- gap
- gem
- generated_reviews_enth
- generics_kb
- german_legal_entity_recognition
- germaner
- germeval_14
- giga_fren
- gigaword
- glucose
- glue
- gnad10
- go_emotions
- gooaq
- google_wellformed_query
- grail_qa
- great_code
- greek_legal_code
- gsm8k
- guardian_authorship
- gutenberg_time
- hans
- hansards
- hard
- harem
- has_part
- hate_offensive
- hate_speech18
- hate_speech_filipino
- hate_speech_offensive
- hate_speech_pl
- hate_speech_portuguese
- hatexplain
- hausa_voa_ner
- hausa_voa_topics
- hda_nli_hindi
- head_qa
- health_fact
- hebrew_projectbenyehuda
- hebrew_sentiment
- hebrew_this_world
- hellaswag
- hendrycks_test
- hind_encorp
- hindi_discourse
- hippocorpus
- hkcancor
- hlgd
- hope_edi
- hotpot_qa
- hover
- hrenwac_para
- hrwac
- humicroedit
- hybrid_qa
- hyperpartisan_news_detection
- iapp_wiki_qa_squad
- id_clickbait
- id_liputan6
- id_nergrit_corpus
- id_newspapers_2018
- id_panl_bppt
- id_puisi
- igbo_english_machine_translation
- igbo_monolingual
- igbo_ner
- ilist
- imagenet-1k
- imagenet_sketch
- imdb
- imdb_urdu_reviews
- imppres
- indic_glue
- indonli
- indonlu
- inquisitive_qg
- interpress_news_category_tr
- interpress_news_category_tr_lite
- irc_disentangle
- isixhosa_ner_corpus
- isizulu_ner_corpus
- iwslt2017
- jeopardy
- jfleg
- jigsaw_toxicity_pred
- jigsaw_unintended_bias
- jnlpba
- journalists_questions
- kan_hope
- kannada_news
- kd_conv
- kde4
- kelm
- kilt_tasks
- kilt_wikipedia
- kinnews_kirnews
- klue
- kor_3i4k
- kor_hate
- kor_ner
- kor_nli
- kor_nlu
- kor_qpair
- kor_sae
- kor_sarcasm
- labr
- lama
- lambada
- large_spanish_corpus
- laroseda
- lc_quad
- lccc
- lener_br
- lex_glue
- liar
- librispeech_asr
- librispeech_lm
- limit
- lince
- linnaeus
- liveqa
- lj_speech
- lm1b
- lst20
- m_lama
- mac_morpho
- makhzan
- masakhaner
- math_dataset
- math_qa
- matinf
- mbpp
- mc4
- mc_taco
- md_gender_bias
- mdd
- med_hop
- medal
- medical_dialog
- medical_questions_pairs
- medmcqa
- menyo20k_mt
- meta_woz
- metashift
- metooma
- metrec
- miam
- mkb
- mkqa
- mlqa
- mlsum
- mnist
- mocha
- monash_tsf
- moroco
- movie_rationales
- mrqa
- ms_marco
- ms_terms
- msr_genomics_kbcomp
- msr_sqa
- msr_text_compression
- msr_zhen_translation_parity
- msra_ner
- mt_eng_vietnamese
- muchocine
- multi_booked
- multi_eurlex
- multi_news
- multi_nli
- multi_nli_mismatch
- multi_para_crawl
- multi_re_qa
- multi_woz_v22
- multi_x_science_sum
- multidoc2dial
- multilingual_librispeech
- mutual_friends
- mwsc
- myanmar_news
- narrativeqa
- narrativeqa_manual
- natural_questions
- ncbi_disease
- nchlt
- ncslgr
- nell
- neural_code_search
- news_commentary
- newsgroup
- newsph
- newsph_nli
- newspop
- newsqa
- newsroom
- nkjp-ner
- nli_tr
- nlu_evaluation_data
- norec
- norne
- norwegian_ner
- nq_open
- nsmc
- numer_sense
- numeric_fused_head
- oclar
- offcombr
- offenseval2020_tr
- offenseval_dravidian
- ofis_publik
- ohsumed
- ollie
- omp
- onestop_english
- onestop_qa
- open_subtitles
- openai_humaneval
- openbookqa
- openslr
- openwebtext
- opinosis
- opus100
- opus_books
- opus_dgt
- opus_dogc
- opus_elhuyar
- opus_euconst
- opus_finlex
- opus_fiskmo
- opus_gnome
- opus_infopankki
- opus_memat
- opus_montenegrinsubs
- opus_openoffice
- opus_paracrawl
- opus_rf
- opus_tedtalks
- opus_ubuntu
- opus_wikipedia
- opus_xhosanavy
- orange_sum
- oscar
- para_crawl
- para_pat
- parsinlu_reading_comprehension
- pass
- paws
- paws-x
- pec
- peer_read
- peoples_daily_ner
- per_sent
- persian_ner
- pg19
- php
- piaf
- pib
- piqa
- pn_summary
- poem_sentiment
- polemo2
- poleval2019_cyberbullying
- poleval2019_mt
- polsum
- polyglot_ner
- prachathai67k
- pragmeval
- proto_qa
- psc
- ptb_text_only
- pubmed
- pubmed_qa
- py_ast
- qa4mre
- qa_srl
- qa_zre
- qangaroo
- qanta
- qasc
- qasper
- qed
- qed_amara
- quac
- quail
- quarel
- quartz
- quickdraw
- quora
- quoref
- race
- re_dial
- reasoning_bg
- recipe_nlg
- reclor
- red_caps
- reddit_tifu
- refresd
- reuters21578
- riddle_sense
- ro_sent
- ro_sts
- ro_sts_parallel
- roman_urdu
- roman_urdu_hate_speech
- ronec
- ropes
- rotten_tomatoes
- russian_super_glue
- rvl_cdip
- s2orc
- samsum
- sanskrit_classic
- saudinewsnet
- sberquad
- sbu_captions
- scan
- scb_mt_enth_2020
- scene_parse_150
- schema_guided_dstc8
- scicite
- scielo
- scientific_papers
- scifact
- sciq
- scitail
- scitldr
- search_qa
- sede
- selqa
- sem_eval_2010_task_8
- sem_eval_2014_task_1
- sem_eval_2018_task_1
- sem_eval_2020_task_11
- sent_comp
- senti_lex
- senti_ws
- sentiment140
- sepedi_ner
- sesotho_ner_corpus
- setimes
- setswana_ner_corpus
- sharc
- sharc_modified
- sick
- silicone
- simple_questions_v2
- siswati_ner_corpus
- smartdata
- sms_spam
- snips_built_in_intents
- snli
- snow_simplified_japanese_corpus
- so_stacksample
- social_bias_frames
- social_i_qa
- sofc_materials_articles
- sogou_news
- spanish_billion_words
- spc
- species_800
- speech_commands
- spider
- squad
- squad_adversarial
- squad_es
- squad_it
- squad_kor_v1
- squad_kor_v2
- squad_v1_pt
- squad_v2
- squadshifts
- srwac
- sst
- stereoset
- story_cloze
- stsb_mt_sv
- stsb_multi_mt
- style_change_detection
- subjqa
- super_glue
- superb
- svhn
- swag
- swahili
- swahili_news
- swda
- swedish_medical_ner
- swedish_ner_corpus
- swedish_reviews
- swiss_judgment_prediction
- tab_fact
- tamilmixsentiment
- tanzil
- tapaco
- tashkeela
- taskmaster1
- taskmaster2
- taskmaster3
- tatoeba
- ted_hrlr
- ted_iwlst2013
- ted_multi
- ted_talks_iwslt
- telugu_books
- telugu_news
- tep_en_fa_para
- text2log
- textvqa
- thai_toxicity_tweet
- thainer
- thaiqa_squad
- thaisum
- the_pile
- the_pile_books3
- the_pile_openwebtext2
- the_pile_stack_exchange
- tilde_model
- time_dial
- times_of_india_news_headlines
- timit_asr
- tiny_shakespeare
- tlc
- tmu_gfm_dataset
- tne
- told-br
- totto
- trec
- trivia_qa
- truthful_qa
- tsac
- ttc4900
- tunizi
- tuple_ie
- turk
- turkic_xwmt
- turkish_movie_sentiment
- turkish_ner
- turkish_product_reviews
- turkish_shrinked_ner
- turku_ner_corpus
- tweet_eval
- tweet_qa
- tweets_ar_en_parallel
- tweets_hate_speech_detection
- twi_text_c3
- twi_wordsim353
- tydiqa
- ubuntu_dialogs_corpus
- udhr
- um005
- un_ga
- un_multi
- un_pc
- universal_dependencies
- universal_morphologies
- urdu_fake_news
- urdu_sentiment_corpus
- vctk
- visual_genome
- vivos
- web_nlg
- web_of_science
- web_questions
- weibo_ner
- wi_locness
- wider_face
- wiki40b
- wiki_asp
- wiki_atomic_edits
- wiki_auto
- wiki_bio
- wiki_dpr
- wiki_hop
- wiki_lingua
- wiki_movies
- wiki_qa
- wiki_qa_ar
- wiki_snippets
- wiki_source
- wiki_split
- wiki_summary
- wikiann
- wikicorpus
- wikihow
- wikipedia
- wikisql
- wikitablequestions
- wikitext
- wikitext_tl39
- wili_2018
- wino_bias
- winograd_wsc
- winogrande
- wiqa
- wisesight1000
- wisesight_sentiment
- wmt14
- wmt15
- wmt16
- wmt17
- wmt18
- wmt19
- wmt20_mlqe_task1
- wmt20_mlqe_task2
- wmt20_mlqe_task3
- wmt_t2t
- wnut_17
- wongnai_reviews
- woz_dialogue
- wrbsc
- x_stance
- xcopa
- xcsr
- xed_en_fi
- xglue
- xnli
- xor_tydi_qa
- xquad
- xquad_r
- xsum
- xsum_factuality
- xtreme
- yahoo_answers_qa
- yahoo_answers_topics
- yelp_polarity
- yelp_review_full
- yoruba_bbc_topics
- yoruba_gv_ner
- yoruba_text_c3
- yoruba_wordsim353
- youtube_caption_corrections
- zest
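Any name in the list above is meant to be passed to TFDS. The following is a minimal sketch of how that might look. It assumes that TFDS exposes these datasets under a `huggingface:` namespace prefix (for example `tfds.load("huggingface:ag_news")`) and that both the `tensorflow-datasets` and `datasets` packages are installed. The helpers `hf_tfds_name` and `load_hf_dataset` are hypothetical names used only for illustration.

```python
# Assumption: TFDS exposes Hugging Face datasets under the "huggingface:" namespace.
HF_NAMESPACE = "huggingface"


def hf_tfds_name(dataset: str) -> str:
    """Return the namespaced TFDS name for a bare dataset name from the list (hypothetical helper)."""
    if not dataset or ":" in dataset:
        # Reject empty names and names that already carry a namespace prefix.
        raise ValueError(f"expected a bare dataset name, got {dataset!r}")
    return f"{HF_NAMESPACE}:{dataset}"


def load_hf_dataset(dataset: str, split: str = "train"):
    """Load one split through TFDS (hypothetical helper; needs tensorflow-datasets and datasets)."""
    # Imported lazily so hf_tfds_name can be used without TensorFlow installed.
    import tensorflow_datasets as tfds

    return tfds.load(hf_tfds_name(dataset), split=split)


if __name__ == "__main__":
    print(hf_tfds_name("ag_news"))  # huggingface:ag_news
```

For example, `load_hf_dataset("imdb", split="test")` would return that split as a `tf.data.Dataset`, provided your TFDS version supports the `huggingface:` namespace. The import is deferred so that the name-building part has no heavy dependencies.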