References:
cross_topic_1
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_1')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g. cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g. cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 207 |
'train' | 112 |
'validation' | 62 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
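The warning about `train+validation+test[:60%]` can be checked with plain arithmetic on the cross_topic_1 split sizes listed above. The sketch below is illustrative only: the helper names are hypothetical, and Python's `round` stands in for whatever rounding the `datasets` library actually applies to percent slices.

```python
# Split sizes copied from the cross_topic_1 table.
SPLIT_SIZES = {"train": 112, "validation": 62, "test": 207}


def take(size, fraction):
    """Approximate size of a percent slice (rounding rule is an assumption)."""
    return round(size * fraction)


def recommended_train_size(sizes, fraction=0.6):
    """Size of 'train[:60%]+validation[:60%]+test[:60%]': every split is sliced."""
    return sum(take(n, fraction) for n in sizes.values())


def naive_train_size(sizes, fraction=0.6):
    """Size of 'train+validation+test[:60%]': only 'test' is sliced."""
    return sizes["train"] + sizes["validation"] + take(sizes["test"], fraction)


total = sum(SPLIT_SIZES.values())
good = recommended_train_size(SPLIT_SIZES)
bad = naive_train_size(SPLIT_SIZES)
print(f"total={total} recommended={good} ({good / total:.0%}) naive={bad} ({bad / total:.0%})")
```

Under these assumptions the recommended expression keeps roughly 60% of the grouped data, while the naive expression keeps all of `train` and `validation` plus 60% of `test`, which is well above 60%.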
cross_genre_1
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_genre_1')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g. cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g. cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 13.0.0
- Splits:
Split | Examples |
---|---|
'test' | 269 |
'train' | 63 |
'validation' | 112 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
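The scenario rows quoted in the description ("P S U&W", "B P S&U&W") appear to name the topics used for train, validation, and test, in that order. The sketch below tests that reading against the split tables. The per-topic counts and the letter-to-topic mapping (P=Politics, S=Society, U=UK, W=World, B=Books) are inferred from the split sizes on this page, not stated in the source, so treat them as assumptions.

```python
# Per-topic article counts inferred from the split tables (assumption).
TOPIC_SIZES = {"P": 112, "S": 62, "U": 90, "W": 117, "B": 63}

# Scenario rows as quoted in the dataset description.
SCENARIO_ROWS = {
    "cross_topic_1": "P S U&W",
    "cross_genre_1": "B P S&U&W",
}


def split_sizes(row):
    """Read a row as 'train validation test'; '&' joins topics within one split."""
    parts = row.split()
    sizes = [sum(TOPIC_SIZES[code] for code in part.split("&")) for part in parts]
    return dict(zip(("train", "validation", "test"), sizes))


for name, row in SCENARIO_ROWS.items():
    print(name, split_sizes(row))
```

If the inferred counts are right, the printed sizes match the cross_topic_1 and cross_genre_1 tables (112/62/207 and 63/112/269 for train/validation/test).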
cross_topic_2
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_2')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g. cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g. cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 2.0.0
- Splits:
Split | Examples |
---|---|
'test' | 179 |
'train' | 112 |
'validation' | 90 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
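Because `author` and `topic` are `ClassLabel` features, each record stores them as integer indices into the `names` lists shown in the schema. The following sketch decodes such indices with plain lists. The helper functions and the sample record are hypothetical; the `datasets` library's `ClassLabel` offers `int2str` and `str2int` methods for the same purpose.

```python
# Label names copied from the feature schema, in index order.
AUTHOR_NAMES = [
    "catherinebennett", "georgemonbiot", "hugoyoung", "jonathanfreedland",
    "martinkettle", "maryriddell", "nickcohen", "peterpreston",
    "pollytoynbee", "royhattersley", "simonhoggart", "willhutton",
    "zoewilliams",
]
TOPIC_NAMES = ["Politics", "Society", "UK", "World", "Books"]


def int2str(names, index):
    """Map an integer class id to its label name."""
    return names[index]


def str2int(names, name):
    """Map a label name back to its integer class id."""
    return names.index(name)


# Hypothetical record in the shape described by the schema.
record = {"author": 1, "topic": 4, "article": "placeholder text"}
decoded = {
    "author": int2str(AUTHOR_NAMES, record["author"]),
    "topic": int2str(TOPIC_NAMES, record["topic"]),
}
print(decoded)
```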
cross_topic_3
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_3')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g. cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g. cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 3.0.0
- Splits:
Split | Examples |
---|---|
'test' | 152 |
'train' | 112 |
'validation' | 117 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_topic_4
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_4')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g. cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g. cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 4.0.0
- Splits:
Split | Examples |
---|---|
'test' | 207 |
'train' | 62 |
'validation' | 112 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_topic_5
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_5')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g. cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g. cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 5.0.0
- Splits:
Split | Examples |
---|---|
'test' | 229 |
'train' | 62 |
'validation' | 90 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_topic_6
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_6')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g. cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g. cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 6.0.0
- Splits:
Split | Examples |
---|---|
'test' | 202 |
'train' | 62 |
'validation' | 117 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_topic_7
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_7')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g. cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g. cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 7.0.0
- Splits:
Split | Examples |
---|---|
'test' | 179 |
'train' | 90 |
'validation' | 112 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_topic_8
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_8')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g. cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g. cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 8.0.0
- Splits:
Split | Examples |
---|---|
'test' | 229 |
'train' | 90 |
'validation' | 62 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_topic_9
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_9')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g. cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g. cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 9.0.0
- Splits:
Split | Examples |
---|---|
'test' | 174 |
'train' | 90 |
'validation' | 117 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_topic_10
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_10')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g. cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g. cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 10.0.0
- Splits:
Split | Examples |
---|---|
'test' | 152 |
'train' | 117 |
'validation' | 112 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_topic_11
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_11')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g. cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g. cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'test' | 202 |
'train' | 117 |
'validation' | 62 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_topic_12
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_12')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g. cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g. cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 12.0.0
- Splits:
Split | Examples |
---|---|
'test' | 174 |
'train' | 117 |
'validation' | 90 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_genre_2
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_genre_2')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g. cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g. cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 14.0.0
- Splits:
Split | Examples |
---|---|
'test' | 319 |
'train' | 63 |
'validation' | 62 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_genre_3
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_genre_3')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g. cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g. cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 15.0.0
- Splits:
Split | Examples |
---|---|
'test' | 291 |
'train' | 63 |
'validation' | 90 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_genre_4
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_genre_4')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g. cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g. cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 16.0.0
- Splits:
Split | Examples |
---|---|
'test' | 264 |
'train' | 63 |
'validation' | 117 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
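The sixteen configurations on this page follow one naming pattern, so the TFDS names can be generated rather than typed out. A minimal sketch, assuming only the names shown in the listings (twelve cross_topic and four cross_genre configs); the helper `tfds_names` is hypothetical:

```python
PREFIX = "huggingface:guardian_authorship/"


def tfds_names(n_topic=12, n_genre=4):
    """Build the TFDS dataset names for every configuration listed on this page."""
    topic = [f"{PREFIX}cross_topic_{i}" for i in range(1, n_topic + 1)]
    genre = [f"{PREFIX}cross_genre_{i}" for i in range(1, n_genre + 1)]
    return topic + genre


names = tfds_names()
for name in names:
    # Each name can be passed to tfds.load(name), as in the listings;
    # that call needs tensorflow_datasets and network access, so it is
    # left out of this sketch.
    print(name)
```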