참고자료:
현명한 시력_감정
TFDS에 이 데이터세트를 로드하려면 다음 명령어를 사용하세요.
ds = tfds.load('huggingface:wisesight_sentiment/wisesight_sentiment')
- 설명 :
Wisesight Sentiment Corpus: Social media messages in Thai language with sentiment category (positive, neutral, negative, question)
* Released to public domain under Creative Commons Zero v1.0 Universal license.
* Category (Labels): {"pos": 0, "neu": 1, "neg": 2, "q": 3}
* Size: 26,737 messages
* Language: Central Thai
* Style: Informal and conversational. With some news headlines and advertisement.
* Time period: Around 2016 to early 2019. With small amount from other period.
* Domains: Mixed. Majority are consumer products and services (restaurants, cosmetics, drinks, car, hotels), with some current affairs.
* Privacy:
* Only messages that made available to the public on the internet (websites, blogs, social network sites).
* For Facebook, this means the public comments (everyone can see) that made on a public page.
* Private/protected messages and messages in groups, chat, and inbox are not included.
* Alternations and modifications:
* Keep in mind that this corpus does not statistically represent anything in the language register.
* Large amount of messages are not in their original form. Personal data are removed or masked.
* Duplicated, leading, and trailing whitespaces are removed. Other punctuations, symbols, and emojis are kept intact.
(Mis)spellings are kept intact.
* Messages longer than 2,000 characters are removed.
* Long non-Thai messages are removed. Duplicated message (exact match) are removed.
* More characteristics of the data can be explore: https://github.com/PyThaiNLP/wisesight-sentiment/blob/master/exploration.ipynb
- 라이센스 : 알려진 라이센스 없음
- 버전 : 1.0.0
- 분할 :
나뉘다 | 예 |
---|---|
'test' | 2671 |
'train' | 21628 |
'validation' | 2404 |
- 특징 :
{
"texts": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"category": {
"num_classes": 4,
"names": [
"pos",
"neu",
"neg",
"q"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
}
}