Coarse_Grained
Use the following command to load this dataset in TFDS:
import tensorflow_datasets as tfds
ds = tfds.load('huggingface:roman_urdu_hate_speech/Coarse_Grained')
- Description:
The Roman Urdu Hate-Speech and Offensive Language Detection (RUHSOLD) dataset is a Roman Urdu (RU) dataset of tweets annotated by experts in the language. The authors develop a gold standard for two sub-tasks. The first sub-task uses binary labels: Hate-Offensive content and Normal content (i.e., inoffensive language). The authors refer to this sub-task as coarse-grained classification. The second sub-task breaks Hate-Offensive content down into four labels at a more granular level. These labels are the most relevant for the demographic of users who converse in RU and are defined in related literature. The authors refer to this sub-task as fine-grained classification. The objective behind creating two gold standards is to enable researchers to evaluate hate speech detection approaches on both an easier (coarse-grained) and a more challenging (fine-grained) scenario.
- License: MIT License
- Version: 1.1.0
- Splits:
Split | Examples |
---|---|
'test' | 2002 |
'train' | 7208 |
'validation' | 800 |
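As a quick sanity check on the split sizes in the table above, the following minimal sketch computes each split's share of the total. The names `SPLIT_SIZES` and `split_fractions` are illustrative only and are not part of TFDS.

```python
# Split sizes copied from the Coarse_Grained table above.
SPLIT_SIZES = {"test": 2002, "train": 7208, "validation": 800}


def split_fractions(sizes):
    """Return each split's share of the total number of examples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


if __name__ == "__main__":
    print(f"total examples: {sum(SPLIT_SIZES.values())}")
    for name, fraction in split_fractions(SPLIT_SIZES).items():
        # Print each share as a percentage with one decimal place.
        print(f"{name}: {fraction:.1%}")
```

With these counts the splits come out at roughly 72% train, 20% test, and 8% validation.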
- Features:
{
"tweet": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"label": {
"num_classes": 2,
"names": [
"Abusive/Offensive",
"Normal"
],
"id": null,
"_type": "ClassLabel"
}
}
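The feature spec above stores `label` as an integer class id. The sketch below shows one way to convert between ids and names, assuming (as is conventional for `ClassLabel` features) that ids follow the order of the `names` list. The helpers `int2str` and `str2int` and the sample record are hypothetical and written here in plain Python, not taken from the dataset or a library.

```python
# Label names copied from the "label" feature above; list index == class id.
COARSE_LABEL_NAMES = ["Abusive/Offensive", "Normal"]


def int2str(label_id, names=COARSE_LABEL_NAMES):
    """Map an integer class id to its label name."""
    if not 0 <= label_id < len(names):
        raise ValueError(f"label id {label_id} is outside 0..{len(names) - 1}")
    return names[label_id]


def str2int(label_name, names=COARSE_LABEL_NAMES):
    """Map a label name back to its integer class id."""
    return names.index(label_name)


# Hypothetical record shaped like the feature spec (placeholder text, not real data).
example = {"tweet": "<tweet text>", "label": 1}
decoded = int2str(example["label"])
```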
Fine_Grained
Use the following command to load this dataset in TFDS:
import tensorflow_datasets as tfds
ds = tfds.load('huggingface:roman_urdu_hate_speech/Fine_Grained')
- Description:
The Roman Urdu Hate-Speech and Offensive Language Detection (RUHSOLD) dataset is a Roman Urdu (RU) dataset of tweets annotated by experts in the language. The authors develop a gold standard for two sub-tasks. The first sub-task uses binary labels: Hate-Offensive content and Normal content (i.e., inoffensive language). The authors refer to this sub-task as coarse-grained classification. The second sub-task breaks Hate-Offensive content down into four labels at a more granular level. These labels are the most relevant for the demographic of users who converse in RU and are defined in related literature. The authors refer to this sub-task as fine-grained classification. The objective behind creating two gold standards is to enable researchers to evaluate hate speech detection approaches on both an easier (coarse-grained) and a more challenging (fine-grained) scenario.
- License: MIT License
- Version: 1.1.0
- Splits:
Split | Examples |
---|---|
'test' | 2002 |
'train' | 7208 |
'validation' | 800 |
- Features:
{
"tweet": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"label": {
"num_classes": 5,
"names": [
"Abusive/Offensive",
"Normal",
"Religious Hate",
"Sexism",
"Profane/Untargeted"
],
"id": null,
"_type": "ClassLabel"
}
}
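Because the fine-grained labels refine the Hate-Offensive class, a fine-grained prediction can be collapsed to the binary scheme. The sketch below does this under a stated assumption: every fine-grained label other than "Normal" is treated as the coarse "Abusive/Offensive" class. This mapping and the helper `fine_to_coarse` are illustrative and are not defined by the dataset itself.

```python
# Label names copied from the Fine_Grained "label" feature; list index == class id.
FINE_LABEL_NAMES = [
    "Abusive/Offensive",
    "Normal",
    "Religious Hate",
    "Sexism",
    "Profane/Untargeted",
]
# Binary label names as used by the coarse-grained sub-task.
COARSE_LABEL_NAMES = ["Abusive/Offensive", "Normal"]


def fine_to_coarse(fine_id):
    """Collapse a fine-grained class id to a coarse-grained class id.

    Assumption: every fine label except "Normal" counts as hate/offensive.
    """
    if not 0 <= fine_id < len(FINE_LABEL_NAMES):
        raise ValueError(f"fine label id {fine_id} is out of range")
    fine_name = FINE_LABEL_NAMES[fine_id]
    coarse_name = "Normal" if fine_name == "Normal" else "Abusive/Offensive"
    return COARSE_LABEL_NAMES.index(coarse_name)


# Coarse ids for all five fine-grained classes, in id order.
collapsed = [fine_to_coarse(i) for i in range(len(FINE_LABEL_NAMES))]
```

A mapping like this makes it possible to score a fine-grained model on the coarse-grained task without retraining, provided the assumption above matches how the labels were assigned.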