

swahili

References:

swahili

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:swahili/swahili')
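As a minimal sketch of how the load command might be used in practice, the snippet below wraps it in a helper that loads one split and yields decoded text. The helper name `iter_text` and its `limit` parameter are illustrative and not part of TFDS. It assumes `tensorflow_datasets` and the Hugging Face `datasets` package are installed, and that each example exposes a string feature named `text`.

```python
from itertools import islice

VALID_SPLITS = ("train", "validation", "test")


def iter_text(split="train", limit=None):
    """Yield decoded `text` values from one split (hypothetical helper)."""
    if split not in VALID_SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {VALID_SPLITS}")
    # Imported lazily so this module can be imported without TFDS installed.
    import tensorflow_datasets as tfds

    ds = tfds.load("huggingface:swahili/swahili", split=split)
    # as_numpy yields dicts of NumPy values; string features arrive as bytes.
    for example in islice(tfds.as_numpy(ds), limit):
        yield example["text"].decode("utf-8")
```

For instance, `for line in iter_text("validation", limit=3): print(line)` would print the first three validation examples, downloading the data on first use.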

Description:

The Swahili dataset was developed specifically for the language modeling task.
The dataset contains 28,000 unique words, with 6.84M, 970k, and 2M words in the train,
validation, and test partitions respectively, which represent the ratio 80:10:10.
The entire dataset is lowercased and has no punctuation marks, and
start- and end-of-sentence markers have been incorporated to facilitate easy tokenization during language modeling.
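Because the text is already lowercased, free of punctuation, and delimited by sentence markers, plain whitespace splitting may be enough to tokenize it. The sketch below illustrates that idea. The marker strings `<s>` and `</s>` are an assumption (the description does not name them), the sample lines are invented for illustration rather than taken from the dataset, and `tokenize` and `build_vocab` are hypothetical helper names.

```python
from collections import Counter

# Assumed marker strings; the dataset description does not name them.
BOS, EOS = "<s>", "</s>"


def tokenize(line):
    """Split a preprocessed line on whitespace."""
    return line.split()


def build_vocab(lines):
    """Count token frequencies, ignoring the sentence markers."""
    counts = Counter()
    for line in lines:
        counts.update(tok for tok in tokenize(line) if tok not in (BOS, EOS))
    return counts


# Invented sample lines in the assumed format, not real dataset rows.
sample = ["<s> habari ya asubuhi </s>", "<s> habari za jioni </s>"]
vocab = build_vocab(sample)
```

Excluding the markers from the counts keeps the vocabulary size comparable to a unique-word figure, while the markers remain available in the token stream for a language model to learn sentence boundaries.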

License: Attribution 4.0 International
Version: 1.0.0
Splits:

Split	Examples
`'test'`	3371
`'train'`	42069
`'validation'`	3372
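The example counts in the table can be turned into per-split fractions with a few lines of arithmetic. The snippet below is only a sketch; the dictionary name `SPLIT_EXAMPLES` is illustrative, and the counts are copied from the table.

```python
# Example counts copied from the splits table.
SPLIT_EXAMPLES = {"test": 3371, "train": 42069, "validation": 3372}

total = sum(SPLIT_EXAMPLES.values())
fractions = {name: count / total for name, count in SPLIT_EXAMPLES.items()}

# Print one row per split: name, example count, share of all examples.
for name, frac in sorted(fractions.items()):
    print(f"{name:<10} {SPLIT_EXAMPLES[name]:>6} {frac:.1%}")
```

These fractions are measured in examples, not words, so they need not match a ratio quoted in terms of word counts.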

Features:

{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}