cross_topic_1
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_1')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g., cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g., cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 207 |
'train' | 112 |
'validation' | 62 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
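The warning in the description can be checked with simple arithmetic. The sketch below is not part of the dataset tooling. It uses the split sizes listed for cross_topic_1 and assumes that a percent slice applies only to the split it is attached to, and that slice boundaries are truncated to whole examples (the rounding actually used by the `datasets` library may differ by an example).

```python
# Split sizes for cross_topic_1, copied from the table above.
SPLIT_SIZES = {"train": 112, "validation": 62, "test": 207}


def take(size, percent):
    """Approximate number of examples in the first `percent` of a split."""
    return size * percent // 100


# Intended 60% portion: slice every split separately.
per_split = sum(take(n, 60) for n in SPLIT_SIZES.values())

# 'train+validation+test[:60%]': the slice is attached to 'test' only,
# so 'train' and 'validation' are taken in full.
test_only = (
    SPLIT_SIZES["train"]
    + SPLIT_SIZES["validation"]
    + take(SPLIT_SIZES["test"], 60)
)

total = sum(SPLIT_SIZES.values())
print(per_split, test_only, total)
```

Under these assumptions, slicing each split keeps 228 of 381 examples (close to 60%), while slicing only `test` keeps 298 (about 78%).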
cross_genre_1
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_genre_1')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g., cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g., cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 13.0.0
- Splits:
Split | Examples |
---|---|
'test' | 269 |
'train' | 63 |
'validation' | 112 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
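The `author` and `topic` features are class labels, so examples carry integers rather than strings. A minimal sketch of turning them back into names follows, assuming that each integer indexes the corresponding `names` list in the order shown above. The `decode` helper and the sample record are hypothetical and not part of the dataset.

```python
# Name lists copied from the feature description above.
AUTHOR_NAMES = [
    "catherinebennett", "georgemonbiot", "hugoyoung", "jonathanfreedland",
    "martinkettle", "maryriddell", "nickcohen", "peterpreston",
    "pollytoynbee", "royhattersley", "simonhoggart", "willhutton",
    "zoewilliams",
]
TOPIC_NAMES = ["Politics", "Society", "UK", "World", "Books"]


def decode(example):
    """Return a copy of `example` with integer labels replaced by names."""
    return {
        "author": AUTHOR_NAMES[example["author"]],
        "topic": TOPIC_NAMES[example["topic"]],
        "article": example["article"],
    }


# Hypothetical record; the article text is a placeholder, not real data.
sample = {"author": 1, "topic": 4, "article": "..."}
print(decode(sample))
```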
cross_topic_2
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_2')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g., cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g., cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 2.0.0
- Splits:
Split | Examples |
---|---|
'test' | 179 |
'train' | 112 |
'validation' | 90 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
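The row notation used in the description (for example `P S U&W`) can be read as three fields. The sketch below assumes the fields are the train, validation, and test topics in that order, and that each letter is the initial of a topic name. Both are inferences from the split sizes and the topic list on this page, not something the description states outright. `parse_scenario` is a hypothetical helper.

```python
# Assumed mapping from row initials to the topic names listed above.
TOPIC_BY_INITIAL = {
    "P": "Politics",
    "S": "Society",
    "U": "UK",
    "W": "World",
    "B": "Books",
}


def parse_scenario(row):
    """Split a row such as 'P S U&W' into train/validation/test topic lists."""
    fields = row.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {row!r}")
    keys = ("train", "validation", "test")
    return {
        key: [TOPIC_BY_INITIAL[initial] for initial in field.split("&")]
        for key, field in zip(keys, fields)
    }


print(parse_scenario("P S U&W"))
print(parse_scenario("B P S&U&W"))
```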
cross_topic_3
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_3')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g., cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g., cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 3.0.0
- Splits:
Split | Examples |
---|---|
'test' | 152 |
'train' | 112 |
'validation' | 117 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
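One way to avoid the split mistake flagged in the description is to generate the split expression instead of typing it by hand. The helper below is a hypothetical convenience, not part of any library. It attaches the same slice to every split name and joins the results with `+`, reproducing the two strings used in the description's example.

```python
def per_split_slice(split_names, slice_expr):
    """Attach the same slice to every split and join them with '+'."""
    return "+".join(f"{name}[{slice_expr}]" for name in split_names)


SPLITS = ["train", "validation", "test"]

# First 60% of each split for training, last 40% of each for testing.
train_split = per_split_slice(SPLITS, ":60%")
test_split = per_split_slice(SPLITS, "-40%:")

print(train_split)
print(test_split)
```

The resulting strings can be passed as the `split` argument in the `load_dataset` calls shown in the description.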
cross_topic_4
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_4')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g., cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g., cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 4.0.0
- Splits:
Split | Examples |
---|---|
'test' | 207 |
'train' | 62 |
'validation' | 112 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_topic_5
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_5')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g., cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g., cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 5.0.0
- Splits:
Split | Examples |
---|---|
'test' | 229 |
'train' | 62 |
'validation' | 90 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_topic_6
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_6')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g., cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g., cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 6.0.0
- Splits:
Split | Examples |
---|---|
'test' | 202 |
'train' | 62 |
'validation' | 117 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_topic_7
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_7')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g., cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g., cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 7.0.0
- Splits:
Split | Examples |
---|---|
'test' | 179 |
'train' | 90 |
'validation' | 112 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_topic_8
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_8')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g., cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g., cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 8.0.0
- Splits:
Split | Examples |
---|---|
'test' | 229 |
'train' | 90 |
'validation' | 62 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_topic_9
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_9')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g., cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g., cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 9.0.0
- Splits:
Split | Examples |
---|---|
'test' | 174 |
'train' | 90 |
'validation' | 117 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_topic_10
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_10')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g., cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g., cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 10.0.0
- Splits:
Split | Examples |
---|---|
'test' | 152 |
'train' | 117 |
'validation' | 112 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_topic_11
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_11')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g., cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g., cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'test' | 202 |
'train' | 117 |
'validation' | 62 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_topic_12
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_topic_12')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g., cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g., cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 12.0.0
- Splits:
Split | Examples |
---|---|
'test' | 174 |
'train' | 117 |
'validation' | 90 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_genre_2
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_genre_2')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g., cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g., cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 14.0.0
- Splits:
Split | Examples |
---|---|
'test' | 319 |
'train' | 63 |
'validation' | 62 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_genre_3
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_genre_3')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g., cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g., cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 15.0.0
- Splits:
Split | Examples |
---|---|
'test' | 291 |
'train' | 63 |
'validation' | 90 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cross_genre_4
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:guardian_authorship/cross_genre_4')
- Description:
A dataset for cross-topic authorship attribution. The dataset is provided by Stamatatos 2013.
1- The cross-topic scenarios are based on Table-4 in Stamatatos 2017 (e.g., cross_topic_1 => row 1: P S U&W).
2- The cross-genre scenarios are based on Table-5 in the same paper (e.g., cross_genre_1 => row 1: B P S&U&W).
3- The same-topic/genre scenario is created by grouping all the datasets as follows.
For example, to use same_topic and split the data 60-40, use:
train_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[:60%]+validation[:60%]+test[:60%]')
tests_ds = load_dataset('guardian_authorship', name="cross_topic_<<#>>",
split='train[-40%:]+validation[-40%:]+test[-40%:]')
Important: train+validation+test[:60%] will generate the wrong splits because the data is imbalanced.
* See https://huggingface.co/docs/datasets/splits.html for more detailed examples.
- License: No known license
- Version: 16.0.0
- Splits:
Split | Examples |
---|---|
'test' | 264 |
'train' | 63 |
'validation' | 117 |
- Features:
{
"author": {
"num_classes": 13,
"names": [
"catherinebennett",
"georgemonbiot",
"hugoyoung",
"jonathanfreedland",
"martinkettle",
"maryriddell",
"nickcohen",
"peterpreston",
"pollytoynbee",
"royhattersley",
"simonhoggart",
"willhutton",
"zoewilliams"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"topic": {
"num_classes": 5,
"names": [
"Politics",
"Society",
"UK",
"World",
"Books"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"article": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
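As a sanity check on the split tables, the sizes of the three splits should add up to the same total within each scenario family, since every configuration redistributes the same articles. The sketch below hard-codes the (train, validation, test) sizes from the tables on this page. The grouping into two families and the `totals` helper are illustrative assumptions.

```python
# (train, validation, test) sizes copied from the split tables on this page.
CROSS_TOPIC = {
    1: (112, 62, 207), 2: (112, 90, 179), 3: (112, 117, 152),
    4: (62, 112, 207), 5: (62, 90, 229), 6: (62, 117, 202),
    7: (90, 112, 179), 8: (90, 62, 229), 9: (90, 117, 174),
    10: (117, 112, 152), 11: (117, 62, 202), 12: (117, 90, 174),
}
CROSS_GENRE = {
    1: (63, 112, 269), 2: (63, 62, 319), 3: (63, 90, 291), 4: (63, 117, 264),
}


def totals(configs):
    """Return the set of distinct per-configuration totals."""
    return {sum(sizes) for sizes in configs.values()}


# A single value in each set means the tables are mutually consistent.
print(totals(CROSS_TOPIC))
print(totals(CROSS_GENRE))
```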