References:
cross_topic_1
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_1')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g., cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g., cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 207 |
'train' | 112 |
'validation' | 62 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
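The 60-40 recipe in the description can be checked on paper before anything is downloaded. The sketch below is only illustrative: the helper names `per_split_spec` and `selected_count` are hypothetical, the rounding is an approximation of how percent slices resolve, and the split sizes are the ones listed for cross_topic_1.

```python
# Split sizes listed above for cross_topic_1.
SPLIT_SIZES = {"train": 112, "validation": 62, "test": 207}


def per_split_spec(percent, from_end=False):
    """Build a split string that slices every split by the same percentage."""
    piece = f"[-{percent}%:]" if from_end else f"[:{percent}%]"
    return "+".join(f"{name}{piece}" for name in SPLIT_SIZES)


def selected_count(sizes, percent):
    """Approximate how many examples a percent slice keeps from each split."""
    return {name: round(size * percent / 100) for name, size in sizes.items()}


train_spec = per_split_spec(60)
test_spec = per_split_spec(40, from_end=True)

# Slicing each split separately keeps roughly 60% of every split.
balanced = selected_count(SPLIT_SIZES, 60)

# In 'train+validation+test[:60%]' the slice applies only to the last split,
# so train and validation are taken whole and the result is not a 60-40 split.
unbalanced = {"train": 112, "validation": 62, "test": round(207 * 60 / 100)}

print(train_spec)
print(test_spec)
print(sum(balanced.values()), sum(unbalanced.values()))
```

Under these assumptions the per-split recipe keeps about 228 of the 381 examples, while the unsliced concatenation keeps 298, which is why the description warns against it.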
cross_genre_1
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_genre_1')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g., cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g., cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples
- License: No known license
- Version: 13.0.0
- Splits:
Split | Examples |
---|---|
'test' | 269 |
'train' | 63 |
'validation' | 112 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
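Because `author` and `topic` are `ClassLabel` features, each record stores them as integer indices into the `names` lists shown in the schema. The following is a minimal pure-Python sketch of that mapping; the helpers `int2str` and `str2int` are local stand-ins written for illustration, and the `example` record is hypothetical rather than taken from the dataset.

```python
# Name lists copied from the feature schema above.
AUTHOR_NAMES = [
    "catherinebennett", "georgemonbiot", "hugoyoung", "jonathanfreedland",
    "martinkettle", "maryriddell", "nickcohen", "peterpreston",
    "pollytoynbee", "royhattersley", "simonhoggart", "willhutton",
    "zoewilliams",
]
TOPIC_NAMES = ["Politics", "Society", "UK", "World", "Books"]


def int2str(names, index):
    """Map an integer class index to its label name."""
    return names[index]


def str2int(names, name):
    """Map a label name back to its integer class index."""
    return names.index(name)


# Hypothetical record with the three fields from the schema.
example = {"author": 1, "topic": 4, "article": "placeholder text"}

decoded = {
    "author": int2str(AUTHOR_NAMES, example["author"]),
    "topic": int2str(TOPIC_NAMES, example["topic"]),
}
print(decoded)
```

The list lengths match the `num_classes` values in the schema (13 authors, 5 topics), which is a quick sanity check when copying the names by hand.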
cross_topic_2
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_2')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g., cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g., cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples
- License: No known license
- Version: 2.0.0
- Splits:
Split | Examples |
---|---|
'test' | 179 |
'train' | 112 |
'validation' | 90 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
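The scenario rows quoted in the description (for example `P S U&W`) appear to list the training, validation and test topics in that order. The sketch below rests on that assumption, and on the reading that P, S, U, W and B stand for Politics, Society, UK, World and Books. The per-topic counts are not stated on this page; they are inferred from the split tables, and `split_sizes` is a hypothetical helper.

```python
# Per-topic article counts, inferred from the split tables on this page
# (an assumption, not a figure given by the dataset authors).
TOPIC_COUNTS = {"P": 112, "S": 62, "U": 90, "W": 117, "B": 63}


def split_sizes(row):
    """Turn a scenario row such as 'P S U&W' into expected split sizes.

    Assumes the row lists train, validation and test topics in that order,
    with '&' joining topics that are merged into one split.
    """
    train, validation, test = row.split()
    parts = {"train": train, "validation": validation, "test": test}
    return {
        name: sum(TOPIC_COUNTS[topic] for topic in topics.split("&"))
        for name, topics in parts.items()
    }


cross_topic_1 = split_sizes("P S U&W")
cross_genre_1 = split_sizes("B P S&U&W")
print(cross_topic_1)
print(cross_genre_1)
```

With these inferred counts, the two rows quoted in the description reproduce the split tables listed for cross_topic_1 and cross_genre_1.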
cross_topic_3
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_3')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g., cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g., cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples
- License: No known license
- Version: 3.0.0
- Splits:
Split | Examples |
---|---|
'test' | 152 |
'train' | 112 |
'validation' | 117 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_topic_4
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_4')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g., cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g., cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples
- License: No known license
- Version: 4.0.0
- Splits:
Split | Examples |
---|---|
'test' | 207 |
'train' | 62 |
'validation' | 112 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_topic_5
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_5')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g., cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g., cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples
- License: No known license
- Version: 5.0.0
- Splits:
Split | Examples |
---|---|
'test' | 229 |
'train' | 62 |
'validation' | 90 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_topic_6
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_6')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g., cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g., cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples
- License: No known license
- Version: 6.0.0
- Splits:
Split | Examples |
---|---|
'test' | 202 |
'train' | 62 |
'validation' | 117 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_topic_7
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_7')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g., cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g., cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples
- License: No known license
- Version: 7.0.0
- Splits:
Split | Examples |
---|---|
'test' | 179 |
'train' | 90 |
'validation' | 112 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_topic_8
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_8')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g., cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g., cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples
- License: No known license
- Version: 8.0.0
- Splits:
Split | Examples |
---|---|
'test' | 229 |
'train' | 90 |
'validation' | 62 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_topic_9
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_9')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g., cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g., cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples
- License: No known license
- Version: 9.0.0
- Splits:
Split | Examples |
---|---|
'test' | 174 |
'train' | 90 |
'validation' | 117 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_topic_10
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_10')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g., cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g., cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples
- License: No known license
- Version: 10.0.0
- Splits:
Split | Examples |
---|---|
'test' | 152 |
'train' | 117 |
'validation' | 112 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_topic_11
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_11')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g., cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g., cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'test' | 202 |
'train' | 117 |
'validation' | 62 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_topic_12
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_12')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g., cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g., cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples
- License: No known license
- Version: 12.0.0
- Splits:
Split | Examples |
---|---|
'test' | 174 |
'train' | 117 |
'validation' | 90 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_genre_2
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_genre_2')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g., cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g., cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples
- License: No known license
- Version: 14.0.0
- Splits:
Split | Examples |
---|---|
'test' | 319 |
'train' | 63 |
'validation' | 62 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_genre_3
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_genre_3')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g., cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g., cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples
- License: No known license
- Version: 15.0.0
- Splits:
Split | Examples |
---|---|
'test' | 291 |
'train' | 63 |
'validation' | 90 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_genre_4
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_genre_4')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g., cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g., cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples
- License: No known license
- Version: 16.0.0
- Splits:
Split | Examples |
---|---|
'test' | 264 |
'train' | 63 |
'validation' | 117 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
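Every configuration on this page redistributes the same articles across train, validation and test, so the three counts in each split table should add up to a constant within a scenario family. The sketch below simply re-enters the counts from the tables above and checks this; the dictionary name `SPLIT_COUNTS` is a hypothetical label, not part of any library.

```python
# (train, validation, test) counts copied from the split tables on this page.
SPLIT_COUNTS = {
    "cross_topic_1": (112, 62, 207),
    "cross_topic_2": (112, 90, 179),
    "cross_topic_3": (112, 117, 152),
    "cross_topic_4": (62, 112, 207),
    "cross_topic_5": (62, 90, 229),
    "cross_topic_6": (62, 117, 202),
    "cross_topic_7": (90, 112, 179),
    "cross_topic_8": (90, 62, 229),
    "cross_topic_9": (90, 117, 174),
    "cross_topic_10": (117, 112, 152),
    "cross_topic_11": (117, 62, 202),
    "cross_topic_12": (117, 90, 174),
    "cross_genre_1": (63, 112, 269),
    "cross_genre_2": (63, 62, 319),
    "cross_genre_3": (63, 90, 291),
    "cross_genre_4": (63, 117, 264),
}

# Total number of examples per configuration.
totals = {name: sum(counts) for name, counts in SPLIT_COUNTS.items()}

# Distinct totals within each scenario family.
cross_topic_totals = {t for n, t in totals.items() if n.startswith("cross_topic")}
cross_genre_totals = {t for n, t in totals.items() if n.startswith("cross_genre")}

print(cross_topic_totals, cross_genre_totals)
```

Each family collapses to a single total (381 for the cross-topic configurations and 444 for the cross-genre ones), so a mismatch after loading a configuration would suggest a truncated download or a wrong configuration name.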