xtreme_s

  • Description:

FLEURS is the speech version of the FLORES machine translation benchmark, covering 2000 n-way parallel sentences in n=102 languages. XTREME-S covers four task families: speech recognition, classification, speech-to-text translation, and retrieval. Spanning 102 languages from 10+ language families, 3 different domains, and 4 task families, XTREME-S aims to simplify the evaluation of multilingual speech representations and to catalyze research in “universal” speech representation learning.

In this version, only the FLEURS dataset is provided, which covers speech recognition and speech-to-text translation.
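As a minimal sketch, one language config can be loaded through `tfds.load`, assuming the `tensorflow_datasets` package is installed and the data for that config can be downloaded. The helper names `fleurs_config` and `load_fleurs` are illustrative only and are not part of the library.

```python
def fleurs_config(lang_code: str) -> str:
    """Build the full TFDS name for one FLEURS config, e.g. 'af_za' or 'all'."""
    return f"xtreme_s/fleurs.{lang_code}"


def load_fleurs(lang_code: str = "af_za", split: str = "train"):
    """Load one split of one FLEURS config as a tf.data.Dataset of feature dicts."""
    # Imported lazily so that fleurs_config() works without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.load(fleurs_config(lang_code), split=split)


# Example use (the first call downloads and prepares the config):
#   ds = load_fleurs("af_za", "validation")
#   for example in ds.take(1):
#       print(example["language"].numpy(), example["transcription"].numpy())
```

Loading a single-language config rather than `fleurs.all` keeps the download to the size listed for that config.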

  • Feature structure:

FeaturesDict({
    'audio': Audio(shape=(None,), dtype=int64),
    'gender': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'id': Scalar(shape=(), dtype=int32, description=Source text identifier, consistent across all languages to keep n-way parallelism of translations. Since each transcription may be spoken by multiple speakers, within each language multiple examples will also share the same id.),
    'lang_group_id': ClassLabel(shape=(), dtype=int64, num_classes=7),
    'lang_id': ClassLabel(shape=(), dtype=int64, num_classes=102),
    'language': Text(shape=(), dtype=string),
    'num_samples': Scalar(shape=(), dtype=int32, description=Total number of frames in the audio),
    'path': string,
    'raw_transcription': Text(shape=(), dtype=string),
    'transcription': Text(shape=(), dtype=string),
})
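Since `num_samples` is the total number of audio frames, a clip's duration follows from dividing it by the sampling rate. The sketch below assumes 16 kHz audio; that rate is not stated in the feature spec above and should be checked against the loaded data. The function name `duration_seconds` is hypothetical.

```python
# Assumption: 16 kHz audio. The feature spec above does not state the rate.
ASSUMED_SAMPLE_RATE_HZ = 16_000


def duration_seconds(num_samples: int,
                     sample_rate_hz: int = ASSUMED_SAMPLE_RATE_HZ) -> float:
    """Convert a frame count (the `num_samples` feature) into seconds."""
    if sample_rate_hz <= 0:
        raise ValueError("sample_rate_hz must be positive")
    return num_samples / sample_rate_hz
```

Passing `sample_rate_hz` explicitly makes the assumed rate easy to override if the actual audio differs.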
  • Feature documentation:
Feature            Class         Shape    Dtype   Description
                   FeaturesDict
audio              Audio         (None,)  int64
gender             ClassLabel             int64
id                 Scalar                 int32   Source text identifier, consistent across all languages to keep n-way parallelism of translations. Since each transcription may be spoken by multiple speakers, within each language multiple examples will also share the same id.
lang_group_id      ClassLabel             int64
lang_id            ClassLabel             int64
language           Text                   string  Language encoded as lowercase, underscore-separated version of a BCP-47 tag.
num_samples        Scalar                 int32   Total number of frames in the audio
path               Tensor                 string
raw_transcription  Text                   string  Raw transcription from FLORES.
transcription      Text                   string  Normalized transcription.
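Because `id` is shared across languages, and across speakers within a language, parallel utterances can be collected by grouping on it. The following is a minimal sketch over plain Python dicts; the function name `group_by_source_id` and the toy records are made up for illustration and are not taken from the dataset.

```python
from collections import defaultdict


def group_by_source_id(examples):
    """Map each source-text id to its (language, transcription) pairs."""
    groups = defaultdict(list)
    for ex in examples:
        groups[ex["id"]].append((ex["language"], ex["transcription"]))
    return dict(groups)


# Toy records for illustration only; real examples come from the dataset.
toy = [
    {"id": 7, "language": "en_us", "transcription": "hello"},
    {"id": 7, "language": "fr_fr", "transcription": "bonjour"},
    {"id": 7, "language": "en_us", "transcription": "hello"},  # second speaker
    {"id": 9, "language": "en_us", "transcription": "goodbye"},
]
parallel = group_by_source_id(toy)
```

Here `parallel[7]` holds three entries: two English recordings of the same sentence and one French recording, which mirrors how several examples within a language can share one id.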
  • Citation:
@article{fleurs2022arxiv,
  title = {FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech},
  author = {Conneau, Alexis and Ma, Min and Khanuja, Simran and Zhang, Yu and Axelrod, Vera and Dalmia, Siddharth and Riesa, Jason and Rivera, Clara and Bapna, Ankur},
  journal = {arXiv preprint arXiv:2205.12446},
  url = {https://arxiv.org/abs/2205.12446},
  year = {2022},
}
@article{conneau2022xtreme,
  title = {XTREME-S: Evaluating Cross-lingual Speech Representations},
  author = {Conneau, Alexis and Bapna, Ankur and Zhang, Yu and Ma, Min and von Platen, Patrick and Lozhkov, Anton and Cherry, Colin and Jia, Ye and Rivera, Clara and Kale, Mihir and others},
  journal = {arXiv preprint arXiv:2203.10752},
  year = {2022},
}

xtreme_s/fleurs.af_za (default config)

  • Download size: 877.09 MiB

  • Dataset size: 1.91 GiB

  • Splits:

Split Examples
'test' 264
'train' 1,032
'validation' 198

xtreme_s/fleurs.am_et

  • Download size: 2.18 GiB

  • Dataset size: 4.92 GiB

  • Splits:

Split Examples
'test' 516
'train' 3,163
'validation' 223

xtreme_s/fleurs.ar_eg

  • Download size: 1.42 GiB

  • Dataset size: 3.06 GiB

  • Splits:

Split Examples
'test' 428
'train' 2,104
'validation' 295

xtreme_s/fleurs.as_in

  • Download size: 2.67 GiB

  • Dataset size: 5.73 GiB

  • Splits:

Split Examples
'test' 984
'train' 2,812
'validation' 418

xtreme_s/fleurs.ast_es

  • Download size: 1.90 GiB

  • Dataset size: 4.03 GiB

  • Splits:

Split Examples
'test' 946
'train' 2,511
'validation' 398

xtreme_s/fleurs.az_az

  • Download size: 2.28 GiB

  • Dataset size: 5.08 GiB

  • Splits:

Split Examples
'test' 923
'train' 2,665
'validation' 400

xtreme_s/fleurs.be_by

  • Download size: 2.45 GiB

  • Dataset size: 5.53 GiB

  • Splits:

Split Examples
'test' 967
'train' 2,433
'validation' 408

xtreme_s/fleurs.bg_bg

  • Download size: 2.11 GiB

  • Dataset size: 4.61 GiB

  • Splits:

Split Examples
'test' 658
'train' 2,973
'validation' 395

xtreme_s/fleurs.bn_in

  • Download size: 2.77 GiB

  • Dataset size: 5.84 GiB

  • Splits:

Split Examples
'test' 920
'train' 3,006
'validation' 402

xtreme_s/fleurs.bs_ba

  • Download size: 2.32 GiB

  • Dataset size: 5.23 GiB

  • Splits:

Split Examples
'test' 925
'train' 3,091
'validation' 400

xtreme_s/fleurs.ca_es

  • Download size: 2.01 GiB

  • Dataset size: 4.32 GiB

  • Splits:

Split Examples
'test' 940
'train' 2,300
'validation' 404

xtreme_s/fleurs.ceb_ph

  • Download size: 2.63 GiB

  • Dataset size: 5.65 GiB

  • Splits:

Split Examples
'test' 541
'train' 3,261
'validation' 225

xtreme_s/fleurs.ckb_iq

  • Download size: 2.46 GiB

  • Dataset size: 5.34 GiB

  • Splits:

Split Examples
'test' 922
'train' 3,040
'validation' 386

xtreme_s/fleurs.cmn_hans_cn

  • Download size: 2.35 GiB

  • Dataset size: 5.12 GiB

  • Splits:

Split Examples
'test' 945
'train' 3,246
'validation' 409

xtreme_s/fleurs.cs_cz

  • Download size: 1.93 GiB

  • Dataset size: 4.32 GiB

  • Splits:

Split Examples
'test' 723
'train' 2,811
'validation' 305

xtreme_s/fleurs.cy_gb

  • Download size: 2.90 GiB

  • Dataset size: 6.62 GiB

  • Splits:

Split Examples
'test' 1,021
'train' 3,427
'validation' 447

xtreme_s/fleurs.da_dk

  • Download size: 1.82 GiB

  • Dataset size: 4.17 GiB

  • Splits:

Split Examples
'test' 930
'train' 2,465
'validation' 395

xtreme_s/fleurs.de_de

  • Download size: 2.25 GiB

  • Dataset size: 4.88 GiB

  • Splits:

Split Examples
'test' 862
'train' 2,987
'validation' 363

xtreme_s/fleurs.el_gr

  • Download size: 2.24 GiB

  • Dataset size: 4.73 GiB

  • Splits:

Split Examples
'test' 650
'train' 3,215
'validation' 271

xtreme_s/fleurs.en_us

  • Download size: 1.72 GiB

  • Dataset size: 3.76 GiB

  • Splits:

Split Examples
'test' 647
'train' 2,602
'validation' 394

xtreme_s/fleurs.es_419

  • Download size: 2.14 GiB

  • Dataset size: 4.80 GiB

  • Splits:

Split Examples
'test' 908
'train' 2,796
'validation' 408

xtreme_s/fleurs.et_ee

  • Download size: 1.88 GiB

  • Dataset size: 4.20 GiB

  • Splits:

Split Examples
'test' 893
'train' 2,501
'validation' 387

xtreme_s/fleurs.fa_ir

  • Download size: 2.87 GiB

  • Dataset size: 6.34 GiB

  • Splits:

Split Examples
'test' 871
'train' 3,101
'validation' 369

xtreme_s/fleurs.ff_sn

  • Download size: 2.93 GiB

  • Dataset size: 6.54 GiB

  • Splits:

Split Examples
'test' 660
'train' 3,235
'validation' 273

xtreme_s/fleurs.fi_fi

  • Download size: 2.21 GiB

  • Dataset size: 4.92 GiB

  • Splits:

Split Examples
'test' 918
'train' 2,704
'validation' 415

xtreme_s/fleurs.fil_ph

  • Download size: 2.47 GiB

  • Dataset size: 5.38 GiB

  • Splits:

Split Examples
'test' 964
'train' 1,884
'validation' 418

xtreme_s/fleurs.fr_fr

  • Download size: 2.08 GiB

  • Dataset size: 4.73 GiB

  • Splits:

Split Examples
'test' 676
'train' 3,193
'validation' 289

xtreme_s/fleurs.ga_ie

  • Download size: 2.78 GiB

  • Dataset size: 6.24 GiB

  • Splits:

Split Examples
'test' 842
'train' 2,845
'validation' 369

xtreme_s/fleurs.gl_es

  • Download size: 1.79 GiB

  • Dataset size: 3.87 GiB

  • Splits:

Split Examples
'test' 927
'train' 2,175
'validation' 395

xtreme_s/fleurs.gu_in

  • Download size: 2.33 GiB

  • Dataset size: 4.97 GiB

  • Splits:

Split Examples
'test' 1,000
'train' 3,145
'validation' 432

xtreme_s/fleurs.ha_ng

  • Download size: 3.14 GiB

  • Dataset size: 6.84 GiB

  • Splits:

Split Examples
'test' 621
'train' 3,259
'validation' 296

xtreme_s/fleurs.he_il

  • Download size: 2.22 GiB

  • Dataset size: 4.60 GiB

  • Splits:

Split Examples
'test' 792
'train' 3,242
'validation' 328

xtreme_s/fleurs.hi_in

  • Download size: 1.53 GiB

  • Dataset size: 3.27 GiB

  • Splits:

Split Examples
'test' 418
'train' 2,120
'validation' 239

xtreme_s/fleurs.hr_hr

  • Download size: 2.40 GiB

  • Dataset size: 5.54 GiB

  • Splits:

Split Examples
'test' 914
'train' 3,461
'validation' 377

xtreme_s/fleurs.hu_hu

  • Download size: 2.33 GiB

  • Dataset size: 5.07 GiB

  • Splits:

Split Examples
'test' 905
'train' 3,095
'validation' 407

xtreme_s/fleurs.hy_am

  • Download size: 2.30 GiB

  • Dataset size: 5.23 GiB

  • Splits:

Split Examples
'test' 932
'train' 3,053
'validation' 395

xtreme_s/fleurs.id_id

  • Download size: 2.21 GiB

  • Dataset size: 4.69 GiB

  • Splits:

Split Examples
'test' 687
'train' 2,579
'validation' 350

xtreme_s/fleurs.ig_ng

  • Download size: 3.32 GiB

  • Dataset size: 7.44 GiB

  • Splits:

Split Examples
'test' 969
'train' 2,839
'validation' 413

xtreme_s/fleurs.is_is

  • Download size: 559.65 MiB

  • Dataset size: 1.16 GiB

  • Splits:

Split Examples
'test' 46
'train' 926
'validation' 36

xtreme_s/fleurs.it_it

  • Download size: 2.41 GiB

  • Dataset size: 5.19 GiB

  • Splits:

Split Examples
'test' 865
'train' 3,030
'validation' 391

xtreme_s/fleurs.ja_jp

  • Download size: 1.83 GiB

  • Dataset size: 3.94 GiB

  • Splits:

Split Examples
'test' 650
'train' 2,292
'validation' 266

xtreme_s/fleurs.jv_id

  • Download size: 2.68 GiB

  • Dataset size: 5.62 GiB

  • Splits:

Split Examples
'test' 728
'train' 3,051
'validation' 295

xtreme_s/fleurs.ka_ge

  • Download size: 1.50 GiB

  • Dataset size: 3.37 GiB

  • Splits:

Split Examples
'test' 979
'train' 1,491
'validation' 409

xtreme_s/fleurs.kam_ke

  • Download size: 3.43 GiB

  • Dataset size: 7.37 GiB

  • Splits:

Split Examples
'test' 827
'train' 3,340
'validation' 338

xtreme_s/fleurs.kea_cv

  • Download size: 2.55 GiB

  • Dataset size: 5.57 GiB

  • Splits:

Split Examples
'test' 864
'train' 2,715
'validation' 366

xtreme_s/fleurs.kk_kz

  • Download size: 2.69 GiB

  • Dataset size: 6.24 GiB

  • Splits:

Split Examples
'test' 856
'train' 3,200
'validation' 369

xtreme_s/fleurs.km_kh

  • Download size: 1.88 GiB

  • Dataset size: 4.26 GiB

  • Splits:

Split Examples
'test' 771
'train' 1,675
'validation' 326

xtreme_s/fleurs.kn_in

  • Download size: 2.26 GiB

  • Dataset size: 4.81 GiB

  • Splits:

Split Examples
'test' 838
'train' 2,283
'validation' 368

xtreme_s/fleurs.ko_kr

  • Download size: 1.65 GiB

  • Dataset size: 3.67 GiB

  • Splits:

Split Examples
'test' 382
'train' 2,307
'validation' 226

xtreme_s/fleurs.ky_kg

  • Download size: 2.18 GiB

  • Dataset size: 5.13 GiB

  • Splits:

Split Examples
'test' 977
'train' 2,818
'validation' 422

xtreme_s/fleurs.lb_lu

  • Download size: 1.94 GiB

  • Dataset size: 4.39 GiB

  • Splits:

Split Examples
'test' 934
'train' 2,502
'validation' 408

xtreme_s/fleurs.lg_ug

  • Download size: 2.83 GiB

  • Dataset size: 6.43 GiB

  • Splits:

Split Examples
'test' 723
'train' 2,478
'validation' 306

xtreme_s/fleurs.ln_cd

  • Download size: 3.68 GiB

  • Dataset size: 8.13 GiB

  • Splits:

Split Examples
'test' 478
'train' 3,350
'validation' 209

xtreme_s/fleurs.lo_la

  • Download size: 1.61 GiB

  • Dataset size: 3.38 GiB

  • Splits:

Split Examples
'test' 405
'train' 1,809
'validation' 191

xtreme_s/fleurs.lt_lt

  • Download size: 2.24 GiB

  • Dataset size: 5.03 GiB

  • Splits:

Split Examples
'test' 986
'train' 2,937
'validation' 416

xtreme_s/fleurs.luo_ke

  • Download size: 1.87 GiB

  • Dataset size: 4.18 GiB

  • Splits:

Split Examples
'test' 256
'train' 2,384
'validation' 102

xtreme_s/fleurs.lv_lv

  • Download size: 1.74 GiB

  • Dataset size: 3.84 GiB

  • Splits:

Split Examples
'test' 851
'train' 2,110
'validation' 356

xtreme_s/fleurs.mi_nz

  • Download size: 4.45 GiB

  • Dataset size: 9.67 GiB

  • Splits:

Split Examples
'test' 1,008
'train' 3,249
'validation' 429

xtreme_s/fleurs.mk_mk

  • Download size: 1.91 GiB

  • Dataset size: 4.23 GiB

  • Splits:

Split Examples
'test' 973
'train' 2,337
'validation' 415

xtreme_s/fleurs.ml_in

  • Download size: 2.68 GiB

  • Dataset size: 5.80 GiB

  • Splits:

Split Examples
'test' 958
'train' 3,043
'validation' 418

xtreme_s/fleurs.mn_mn

  • Download size: 2.21 GiB

  • Dataset size: 5.47 GiB

  • Splits:

Split Examples
'test' 949
'train' 3,074
'validation' 405

xtreme_s/fleurs.mr_in

  • Download size: 3.00 GiB

  • Dataset size: 6.41 GiB

  • Splits:

Split Examples
'test' 1,015
'train' 3,269
'validation' 443

xtreme_s/fleurs.ms_my

  • Download size: 2.22 GiB

  • Dataset size: 4.74 GiB

  • Splits:

Split Examples
'test' 749
'train' 2,667
'validation' 324

xtreme_s/fleurs.mt_mt

  • Download size: 2.39 GiB

  • Dataset size: 5.37 GiB

  • Splits:

Split Examples
'test' 926
'train' 2,895
'validation' 404

xtreme_s/fleurs.my_mm

  • Download size: 2.85 GiB

  • Dataset size: 6.43 GiB

  • Splits:

Split Examples
'test' 880
'train' 3,058
'validation' 384

xtreme_s/fleurs.nb_no

  • Download size: 2.15 GiB

  • Dataset size: 4.68 GiB

  • Splits:

Split Examples
'test' 357
'train' 3,167
'validation' 163

xtreme_s/fleurs.ne_np

  • Download size: 2.47 GiB

  • Dataset size: 5.39 GiB

  • Splits:

Split Examples
'test' 726
'train' 3,332
'validation' 305

xtreme_s/fleurs.nl_nl

  • Download size: 1.56 GiB

  • Dataset size: 3.33 GiB

  • Splits:

Split Examples
'test' 364
'train' 2,918
'validation' 171

xtreme_s/fleurs.nso_za

  • Download size: 3.27 GiB

  • Dataset size: 7.11 GiB

  • Splits:

Split Examples
'test' 790
'train' 1,990
'validation' 363

xtreme_s/fleurs.ny_mw

  • Download size: 2.60 GiB

  • Dataset size: 5.76 GiB

  • Splits:

Split Examples
'test' 761
'train' 2,694
'validation' 311

xtreme_s/fleurs.oc_fr

  • Download size: 3.52 GiB

  • Dataset size: 7.43 GiB

  • Splits:

Split Examples
'test' 998
'train' 3,379
'validation' 427

xtreme_s/fleurs.om_et

  • Download size: 1.18 GiB

  • Dataset size: 2.52 GiB

  • Splits:

Split Examples
'test' 41
'train' 1,701
'validation' 19

xtreme_s/fleurs.or_in

  • Download size: 1.31 GiB

  • Dataset size: 2.79 GiB

  • Splits:

Split Examples
'test' 883
'train' 1,081
'validation' 392

xtreme_s/fleurs.pa_in

  • Download size: 1.58 GiB

  • Dataset size: 3.34 GiB

  • Splits:

Split Examples
'test' 574
'train' 1,923
'validation' 251

xtreme_s/fleurs.pl_pl

  • Download size: 1.95 GiB

  • Dataset size: 4.39 GiB

  • Splits:

Split Examples
'test' 758
'train' 2,841
'validation' 338

xtreme_s/fleurs.ps_af

  • Download size: 1.89 GiB

  • Dataset size: 4.20 GiB

  • Splits:

Split Examples
'test' 512
'train' 2,513
'validation' 217

xtreme_s/fleurs.pt_br

  • Download size: 2.50 GiB

  • Dataset size: 5.43 GiB

  • Splits:

Split Examples
'test' 919
'train' 2,793
'validation' 386

xtreme_s/fleurs.ro_ro

  • Download size: 2.39 GiB

  • Dataset size: 5.09 GiB

  • Splits:

Split Examples
'test' 883
'train' 2,891
'validation' 387

xtreme_s/fleurs.ru_ru

  • Download size: 1.92 GiB

  • Dataset size: 4.25 GiB

  • Splits:

Split Examples
'test' 775
'train' 2,562
'validation' 356

xtreme_s/fleurs.sd_in

  • Download size: 3.00 GiB

  • Dataset size: 6.35 GiB

  • Splits:

Split Examples
'test' 980
'train' 3,443
'validation' 426

xtreme_s/fleurs.sk_sk

  • Download size: 1.55 GiB

  • Dataset size: 3.46 GiB

  • Splits:

Split Examples
'test' 792
'train' 1,957
'validation' 352

xtreme_s/fleurs.sl_si

  • Download size: 1.74 GiB

  • Dataset size: 4.00 GiB

  • Splits:

Split Examples
'test' 834
'train' 2,512
'validation' 349

xtreme_s/fleurs.sn_zw

  • Download size: 2.51 GiB

  • Dataset size: 5.61 GiB

  • Splits:

Split Examples
'test' 925
'train' 2,463
'validation' 393

xtreme_s/fleurs.so_so

  • Download size: 3.13 GiB

  • Dataset size: 6.93 GiB

  • Splits:

Split Examples
'test' 1,019
'train' 3,149
'validation' 432

xtreme_s/fleurs.sr_rs

  • Download size: 2.36 GiB

  • Dataset size: 5.04 GiB

  • Splits:

Split Examples
'test' 700
'train' 2,944
'validation' 290

xtreme_s/fleurs.sv_se

  • Download size: 1.95 GiB

  • Dataset size: 4.21 GiB

  • Splits:

Split Examples
'test' 759
'train' 2,385
'validation' 330

xtreme_s/fleurs.sw_ke

  • Download size: 2.70 GiB

  • Dataset size: 5.98 GiB

  • Splits:

Split Examples
'test' 487
'train' 3,070
'validation' 211

xtreme_s/fleurs.ta_in

  • Download size: 2.13 GiB

  • Dataset size: 4.50 GiB

  • Splits:

Split Examples
'test' 591
'train' 2,367
'validation' 377

xtreme_s/fleurs.te_in

  • Download size: 1.81 GiB

  • Dataset size: 3.86 GiB

  • Splits:

Split Examples
'test' 472
'train' 2,302
'validation' 311

xtreme_s/fleurs.tg_tj

  • Download size: 1.79 GiB

  • Dataset size: 4.26 GiB

  • Splits:

Split Examples
'test' 600
'train' 2,298
'validation' 240

xtreme_s/fleurs.th_th

  • Download size: 2.24 GiB

  • Dataset size: 4.91 GiB

  • Splits:

Split Examples
'test' 1,021
'train' 2,602
'validation' 439

xtreme_s/fleurs.tr_tr

  • Download size: 1.99 GiB

  • Dataset size: 4.36 GiB

  • Splits:

Split Examples
'test' 743
'train' 2,526
'validation' 338

xtreme_s/fleurs.uk_ua

  • Download size: 2.01 GiB

  • Dataset size: 4.44 GiB

  • Splits:

Split Examples
'test' 750
'train' 2,810
'validation' 325

xtreme_s/fleurs.umb_ao

  • Download size: 2.58 GiB

  • Dataset size: 5.55 GiB

  • Splits:

Split Examples
'test' 379
'train' 1,597
'validation' 135

xtreme_s/fleurs.ur_pk

  • Download size: 1.42 GiB

  • Dataset size: 3.13 GiB

  • Splits:

Split Examples
'test' 299
'train' 2,109
'validation' 267

xtreme_s/fleurs.uz_uz

  • Download size: 2.33 GiB

  • Dataset size: 5.18 GiB

  • Splits:

Split Examples
'test' 862
'train' 2,943
'validation' 363

xtreme_s/fleurs.vi_vn

  • Download size: 2.21 GiB

  • Dataset size: 4.88 GiB

  • Splits:

Split Examples
'test' 857
'train' 2,994
'validation' 361

xtreme_s/fleurs.wo_sn

  • Download size: 1.71 GiB

  • Dataset size: 3.86 GiB

  • Splits:

Split Examples
'test' 371
'train' 2,279
'validation' 169

xtreme_s/fleurs.xh_za

  • Download size: 2.86 GiB

  • Dataset size: 6.80 GiB

  • Splits:

Split Examples
'test' 1,041
'train' 3,466
'validation' 446

xtreme_s/fleurs.yo_ng

  • Download size: 2.65 GiB

  • Dataset size: 5.78 GiB

  • Splits:

Split Examples
'test' 831
'train' 2,339
'validation' 378

xtreme_s/fleurs.yue_hant_hk

  • Download size: 1.86 GiB

  • Dataset size: 4.06 GiB

  • Splits:

Split Examples
'test' 819
'train' 1,939
'validation' 362

xtreme_s/fleurs.zu_za

  • Download size: 3.15 GiB

  • Dataset size: 7.27 GiB

  • Splits:

Split Examples
'test' 854
'train' 2,858
'validation' 354

xtreme_s/fleurs.all

  • Download size: 231.36 GiB

  • Dataset size: 509.51 GiB

  • Splits:

Split Examples
'test' 77,810
'train' 271,798
'validation' 34,452