참고자료:
거친_결
TFDS에 이 데이터세트를 로드하려면 다음 명령어를 사용하세요.
ds = tfds.load('huggingface:roman_urdu_hate_speech/Coarse_Grained')
- 설명 :
The Roman Urdu Hate-Speech and Offensive Language Detection (RUHSOLD) dataset is a Roman Urdu dataset of tweets annotated by experts in the relevant language. The authors develop the gold-standard for two sub-tasks. First sub-task is based on binary labels of Hate-Offensive content and Normal content (i.e., inoffensive language). These labels are self-explanatory. The authors refer to this sub-task as coarse-grained classification. Second sub-task defines Hate-Offensive content with four labels at a granular level. These labels are the most relevant for the demographic of users who converse in RU and are defined in related literature. The authors refer to this sub-task as fine-grained classification. The objective behind creating two gold-standards is to enable the researchers to evaluate the hate speech detection approaches on both easier (coarse-grained) and challenging (fine-grained) scenarios.
- 라이센스 : MIT 라이센스
- 버전 : 1.1.0
- 분할 :
나뉘다 | 예 |
---|---|
'test' | 2002년 |
'train' | 7208 |
'validation' | 800 |
- 특징 :
{
"tweet": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"label": {
"num_classes": 2,
"names": [
"Abusive/Offensive",
"Normal"
],
"id": null,
"_type": "ClassLabel"
}
}
미세한 입자
TFDS에 이 데이터세트를 로드하려면 다음 명령어를 사용하세요.
ds = tfds.load('huggingface:roman_urdu_hate_speech/Fine_Grained')
- 설명 :
The Roman Urdu Hate-Speech and Offensive Language Detection (RUHSOLD) dataset is a Roman Urdu dataset of tweets annotated by experts in the relevant language. The authors develop the gold-standard for two sub-tasks. First sub-task is based on binary labels of Hate-Offensive content and Normal content (i.e., inoffensive language). These labels are self-explanatory. The authors refer to this sub-task as coarse-grained classification. Second sub-task defines Hate-Offensive content with four labels at a granular level. These labels are the most relevant for the demographic of users who converse in RU and are defined in related literature. The authors refer to this sub-task as fine-grained classification. The objective behind creating two gold-standards is to enable the researchers to evaluate the hate speech detection approaches on both easier (coarse-grained) and challenging (fine-grained) scenarios.
- 라이센스 : MIT 라이센스
- 버전 : 1.1.0
- 분할 :
나뉘다 | 예 |
---|---|
'test' | 2002년 |
'train' | 7208 |
'validation' | 7208 |
- 특징 :
{
"tweet": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"label": {
"num_classes": 5,
"names": [
"Abusive/Offensive",
"Normal",
"Religious Hate",
"Sexism",
"Profane/Untargeted"
],
"id": null,
"_type": "ClassLabel"
}
}