18828_alt.atheism
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/18828_alt.atheism')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version does not include cross-posts and includes only the "From" and "Subject" headers.
- License: No known license
- Version: 3.0.0
- Splits:
Split | Examples |
---|---|
'train' | 799 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
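Every configuration on this page follows the same naming pattern, `<variant>_<newsgroup>`, under the `huggingface:newsgroup/` prefix. The sketch below builds that name and wraps the load call shown above. The helper names `dataset_name` and `load_train` are illustrative only, and the loader assumes that `tensorflow_datasets` and the Hugging Face `datasets` package are installed and that network access is available.

```python
def dataset_name(variant: str, newsgroup: str) -> str:
    """Build the TFDS name for one configuration, e.g. '18828_alt.atheism'."""
    return f"huggingface:newsgroup/{variant}_{newsgroup}"


def load_train(variant: str, newsgroup: str):
    """Load the 'train' split of one configuration (hypothetical helper)."""
    # Imported lazily so dataset_name() stays usable without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.load(dataset_name(variant, newsgroup), split="train")


print(dataset_name("18828", "alt.atheism"))
```

Calling `load_train("18828", "alt.atheism")` is then equivalent to the command above with an explicit `split="train"` argument.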
18828_comp.graphics
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/18828_comp.graphics')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version does not include cross-posts and includes only the "From" and "Subject" headers.
- License: No known license
- Version: 3.0.0
- Splits:
Split | Examples |
---|---|
'train' | 973 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
18828_comp.os.ms-windows.misc
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/18828_comp.os.ms-windows.misc')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version does not include cross-posts and includes only the "From" and "Subject" headers.
- License: No known license
- Version: 3.0.0
- Splits:
Split | Examples |
---|---|
'train' | 985 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
18828_comp.sys.ibm.pc.hardware
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/18828_comp.sys.ibm.pc.hardware')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version does not include cross-posts and includes only the "From" and "Subject" headers.
- License: No known license
- Version: 3.0.0
- Splits:
Split | Examples |
---|---|
'train' | 982 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
18828_comp.sys.mac.hardware
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/18828_comp.sys.mac.hardware')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version does not include cross-posts and includes only the "From" and "Subject" headers.
- License: No known license
- Version: 3.0.0
- Splits:
Split | Examples |
---|---|
'train' | 961 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
18828_comp.windows.x
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/18828_comp.windows.x')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version does not include cross-posts and includes only the "From" and "Subject" headers.
- License: No known license
- Version: 3.0.0
- Splits:
Split | Examples |
---|---|
'train' | 980 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
18828_misc.forsale
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/18828_misc.forsale')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version does not include cross-posts and includes only the "From" and "Subject" headers.
- License: No known license
- Version: 3.0.0
- Splits:
Split | Examples |
---|---|
'train' | 972 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
18828_rec.autos
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/18828_rec.autos')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version does not include cross-posts and includes only the "From" and "Subject" headers.
- License: No known license
- Version: 3.0.0
- Splits:
Split | Examples |
---|---|
'train' | 990 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
18828_rec.motorcycles
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/18828_rec.motorcycles')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version does not include cross-posts and includes only the "From" and "Subject" headers.
- License: No known license
- Version: 3.0.0
- Splits:
Split | Examples |
---|---|
'train' | 994 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
18828_rec.sport.baseball
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/18828_rec.sport.baseball')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version does not include cross-posts and includes only the "From" and "Subject" headers.
- License: No known license
- Version: 3.0.0
- Splits:
Split | Examples |
---|---|
'train' | 994 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
18828_rec.sport.hockey
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/18828_rec.sport.hockey')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version does not include cross-posts and includes only the "From" and "Subject" headers.
- License: No known license
- Version: 3.0.0
- Splits:
Split | Examples |
---|---|
'train' | 999 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
18828_sci.crypt
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/18828_sci.crypt')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version does not include cross-posts and includes only the "From" and "Subject" headers.
- License: No known license
- Version: 3.0.0
- Splits:
Split | Examples |
---|---|
'train' | 991 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
18828_sci.electronics
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/18828_sci.electronics')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version does not include cross-posts and includes only the "From" and "Subject" headers.
- License: No known license
- Version: 3.0.0
- Splits:
Split | Examples |
---|---|
'train' | 981 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
18828_sci.med
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/18828_sci.med')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version does not include cross-posts and includes only the "From" and "Subject" headers.
- License: No known license
- Version: 3.0.0
- Splits:
Split | Examples |
---|---|
'train' | 990 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
18828_sci.space
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/18828_sci.space')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version does not include cross-posts and includes only the "From" and "Subject" headers.
- License: No known license
- Version: 3.0.0
- Splits:
Split | Examples |
---|---|
'train' | 987 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
18828_soc.religion.christian
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/18828_soc.religion.christian')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version does not include cross-posts and includes only the "From" and "Subject" headers.
- License: No known license
- Version: 3.0.0
- Splits:
Split | Examples |
---|---|
'train' | 997 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
18828_talk.politics.guns
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/18828_talk.politics.guns')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version does not include cross-posts and includes only the "From" and "Subject" headers.
- License: No known license
- Version: 3.0.0
- Splits:
Split | Examples |
---|---|
'train' | 910 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
18828_talk.politics.mideast
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/18828_talk.politics.mideast')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version does not include cross-posts and includes only the "From" and "Subject" headers.
- License: No known license
- Version: 3.0.0
- Splits:
Split | Examples |
---|---|
'train' | 940 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
18828_talk.politics.misc
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/18828_talk.politics.misc')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version does not include cross-posts and includes only the "From" and "Subject" headers.
- License: No known license
- Version: 3.0.0
- Splits:
Split | Examples |
---|---|
'train' | 775 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
18828_talk.religion.misc
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/18828_talk.religion.misc')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version does not include cross-posts and includes only the "From" and "Subject" headers.
- License: No known license
- Version: 3.0.0
- Splits:
Split | Examples |
---|---|
'train' | 628 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
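As a quick consistency check, the 'train' example counts listed for the twenty `18828_*` configurations above can be summed. The sketch below copies those counts from this page into a dictionary (the variable names are illustrative) and confirms that they add up to the 18,828 documents the variant is named after.

```python
# 'train' example counts copied from the 18828_* configurations above.
COUNTS_18828 = {
    "alt.atheism": 799,
    "comp.graphics": 973,
    "comp.os.ms-windows.misc": 985,
    "comp.sys.ibm.pc.hardware": 982,
    "comp.sys.mac.hardware": 961,
    "comp.windows.x": 980,
    "misc.forsale": 972,
    "rec.autos": 990,
    "rec.motorcycles": 994,
    "rec.sport.baseball": 994,
    "rec.sport.hockey": 999,
    "sci.crypt": 991,
    "sci.electronics": 981,
    "sci.med": 990,
    "sci.space": 987,
    "soc.religion.christian": 997,
    "talk.politics.guns": 910,
    "talk.politics.mideast": 940,
    "talk.politics.misc": 775,
    "talk.religion.misc": 628,
}

total = sum(COUNTS_18828.values())
print(len(COUNTS_18828), "newsgroups,", total, "documents")
```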
19997_alt.atheism
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/19997_alt.atheism')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This is the original, unmodified version.
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 1000 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
19997_comp.graphics
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/19997_comp.graphics')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This is the original, unmodified version.
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 1000 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
19997_comp.os.ms-windows.misc
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/19997_comp.os.ms-windows.misc')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This is the original, unmodified version.
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 1000 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
19997_comp.sys.ibm.pc.hardware
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/19997_comp.sys.ibm.pc.hardware')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This is the original, unmodified version.
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 1000 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
19997_comp.sys.mac.hardware
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/19997_comp.sys.mac.hardware')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This is the original, unmodified version.
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 1000 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
19997_comp.windows.x
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/19997_comp.windows.x')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This is the original, unmodified version.
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 1000 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
19997_misc.forsale
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/19997_misc.forsale')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This is the original, unmodified version.
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 1000 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
19997_rec.autos
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/19997_rec.autos')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This is the original, unmodified version.
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 1000 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
19997_rec.motorcycles
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/19997_rec.motorcycles')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This is the original, unmodified version.
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 1000 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
19997_rec.sport.baseball
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/19997_rec.sport.baseball')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This is the original, unmodified version.
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 1000 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
19997_rec.sport.hockey
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/19997_rec.sport.hockey')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This is the original, unmodified version.
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 1000 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
19997_sci.crypt
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/19997_sci.crypt')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This is the original, unmodified version.
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 1000 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
19997_sci.electronics
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/19997_sci.electronics')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This is the original, unmodified version.
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 1000 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
19997_sci.med
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/19997_sci.med')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This is the original, unmodified version.
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 1000 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
19997_sci.space
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/19997_sci.space')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This is the original, unmodified version.
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 1000 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
19997_soc.religion.christian
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/19997_soc.religion.christian')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This is the original, unmodified version.
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 997 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
19997_talk.politics.guns
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/19997_talk.politics.guns')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This is the original, unmodified version.
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 1000 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
19997_talk.politics.mideast
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/19997_talk.politics.mideast')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This is the original, unmodified version.
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 1000 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
19997_talk.politics.misc
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/19997_talk.politics.misc')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This is the original, unmodified version.
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 1000 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
19997_talk.religion.misc
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/19997_talk.religion.misc')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This is the original, unmodified version.
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 1000 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
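The variants on this page differ mainly in which message headers they keep, so it can help to separate headers from the body before comparing them. The sketch below is a minimal illustration using Python's standard `email` module. It assumes that each `text` value is a raw posting with RFC 822-style headers, a blank line, and then the body; this page does not document that layout, and `SAMPLE_TEXT` is a made-up message, not a record from the dataset.

```python
from email import message_from_string

# Hypothetical posting in the layout the unmodified messages are assumed to have.
SAMPLE_TEXT = (
    "Newsgroups: sci.space\n"
    "From: someone@example.com (Example Poster)\n"
    "Subject: Re: launch schedule\n"
    "\n"
    "Body of the post goes here.\n"
)


def split_headers(text: str):
    """Return (headers, body) for one raw message string."""
    msg = message_from_string(text)
    # If a header name repeats, the last occurrence wins in this dict.
    return dict(msg.items()), msg.get_payload()


headers, body = split_headers(SAMPLE_TEXT)
print(sorted(headers))
print(body.strip())
```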
bydate_alt.atheism
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/bydate_alt.atheism')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version is sorted by date into training (60%) and test (40%) sets, does not include cross-posts (duplicates), and does not include newsgroup-identifying headers (Xref, Newsgroups, Path, Followup-To, Date).
- License: No known license
- Version: 2.0.0
- Splits:
Split | Examples |
---|---|
'test' | 319 |
'train' | 480 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
bydate_comp.graphics
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/bydate_comp.graphics')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version is sorted by date into training (60%) and test (40%) sets, does not include cross-posts (duplicates), and does not include newsgroup-identifying headers (Xref, Newsgroups, Path, Followup-To, Date).
- License: No known license
- Version: 2.0.0
- Splits:
Split | Examples |
---|---|
'test' | 389 |
'train' | 584 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
bydate_comp.os.ms-windows.misc
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/bydate_comp.os.ms-windows.misc')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version is sorted by date into training (60%) and test (40%) sets, does not include cross-posts (duplicates), and does not include newsgroup-identifying headers (Xref, Newsgroups, Path, Followup-To, Date).
- License: No known license
- Version: 2.0.0
- Splits:
Split | Examples |
---|---|
'test' | 394 |
'train' | 591 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
bydate_comp.sys.ibm.pc.hardware
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/bydate_comp.sys.ibm.pc.hardware')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version is sorted by date into training (60%) and test (40%) sets, does not include cross-posts (duplicates), and does not include newsgroup-identifying headers (Xref, Newsgroups, Path, Followup-To, Date).
- License: No known license
- Version: 2.0.0
- Splits:
Split | Examples |
---|---|
'test' | 392 |
'train' | 590 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
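Each config exposes only a `text` feature, so the newsgroup label is carried by the config name rather than by the examples. The following is a rough sketch, not part of the dataset itself, of one way to attach integer labels when combining several configs for classification. The function `label_examples` and the toy strings are hypothetical placeholders, not real posts.

```python
def label_examples(texts_by_group):
    """Turn {newsgroup: [text, ...]} into a sorted label list and (text, label_id) pairs."""
    groups = sorted(texts_by_group)  # stable, reproducible label order
    label_of = {group: i for i, group in enumerate(groups)}
    pairs = [
        (text, label_of[group])
        for group in groups
        for text in texts_by_group[group]
    ]
    return groups, pairs


# Placeholder strings stand in for the `text` field of loaded examples.
toy = {
    "sci.med": ["placeholder post A"],
    "rec.autos": ["placeholder post B", "placeholder post C"],
}
groups, pairs = label_examples(toy)
print(groups)
print(pairs)
```

Sorting the newsgroup names is one simple way to keep label ids consistent between the 'train' and 'test' splits.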
bydate_comp.sys.mac.hardware
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/bydate_comp.sys.mac.hardware')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version is sorted by date into training (60%) and test (40%) sets, does not include cross-posts (duplicates), and does not include newsgroup-identifying headers (Xref, Newsgroups, Path, Followup-To, Date).
- License: No known license
- Version: 2.0.0
- Splits:
Split | Examples |
---|---|
'test' | 385 |
'train' | 578 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
bydate_comp.windows.x
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/bydate_comp.windows.x')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version is sorted by date into training (60%) and test (40%) sets, does not include cross-posts (duplicates), and does not include newsgroup-identifying headers (Xref, Newsgroups, Path, Followup-To, Date).
- License: No known license
- Version: 2.0.0
- Splits:
Split | Examples |
---|---|
'test' | 395 |
'train' | 593 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
bydate_misc.forsale
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/bydate_misc.forsale')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version is sorted by date into training (60%) and test (40%) sets, does not include cross-posts (duplicates), and does not include newsgroup-identifying headers (Xref, Newsgroups, Path, Followup-To, Date).
- License: No known license
- Version: 2.0.0
- Splits:
Split | Examples |
---|---|
'test' | 390 |
'train' | 585 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
bydate_rec.autos
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/bydate_rec.autos')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version is sorted by date into training (60%) and test (40%) sets, does not include cross-posts (duplicates), and does not include newsgroup-identifying headers (Xref, Newsgroups, Path, Followup-To, Date).
- License: No known license
- Version: 2.0.0
- Splits:
Split | Examples |
---|---|
'test' | 396 |
'train' | 594 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
bydate_rec.motorcycles
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/bydate_rec.motorcycles')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version is sorted by date into training (60%) and test (40%) sets, does not include cross-posts (duplicates), and does not include newsgroup-identifying headers (Xref, Newsgroups, Path, Followup-To, Date).
- License: No known license
- Version: 2.0.0
- Splits:
Split | Examples |
---|---|
'test' | 398 |
'train' | 598 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
bydate_rec.sport.baseball
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/bydate_rec.sport.baseball')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version is sorted by date into training (60%) and test (40%) sets, does not include cross-posts (duplicates), and does not include newsgroup-identifying headers (Xref, Newsgroups, Path, Followup-To, Date).
- License: No known license
- Version: 2.0.0
- Splits:
Split | Examples |
---|---|
'test' | 397 |
'train' | 597 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
bydate_rec.sport.hockey
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/bydate_rec.sport.hockey')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version is sorted by date into training (60%) and test (40%) sets, does not include cross-posts (duplicates), and does not include newsgroup-identifying headers (Xref, Newsgroups, Path, Followup-To, Date).
- License: No known license
- Version: 2.0.0
- Splits:
Split | Examples |
---|---|
'test' | 399 |
'train' | 600 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
bydate_sci.crypt
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/bydate_sci.crypt')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version is sorted by date into training (60%) and test (40%) sets, does not include cross-posts (duplicates), and does not include newsgroup-identifying headers (Xref, Newsgroups, Path, Followup-To, Date).
- License: No known license
- Version: 2.0.0
- Splits:
Split | Examples |
---|---|
'test' | 396 |
'train' | 595 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
bydate_sci.electronics
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/bydate_sci.electronics')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version is sorted by date into training (60%) and test (40%) sets, does not include cross-posts (duplicates), and does not include newsgroup-identifying headers (Xref, Newsgroups, Path, Followup-To, Date).
- License: No known license
- Version: 2.0.0
- Splits:
Split | Examples |
---|---|
'test' | 393 |
'train' | 591 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
bydate_sci.med
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/bydate_sci.med')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version is sorted by date into training (60%) and test (40%) sets, does not include cross-posts (duplicates), and does not include newsgroup-identifying headers (Xref, Newsgroups, Path, Followup-To, Date).
- License: No known license
- Version: 2.0.0
- Splits:
Split | Examples |
---|---|
'test' | 396 |
'train' | 594 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
bydate_sci.space
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/bydate_sci.space')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version is sorted by date into training (60%) and test (40%) sets, does not include cross-posts (duplicates), and does not include newsgroup-identifying headers (Xref, Newsgroups, Path, Followup-To, Date).
- License: No known license
- Version: 2.0.0
- Splits:
Split | Examples |
---|---|
'test' | 394 |
'train' | 593 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
bydate_soc.religion.christian
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/bydate_soc.religion.christian')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version is sorted by date into training (60%) and test (40%) sets, does not include cross-posts (duplicates), and does not include newsgroup-identifying headers (Xref, Newsgroups, Path, Followup-To, Date).
- License: No known license
- Version: 2.0.0
- Splits:
Split | Examples |
---|---|
'test' | 398 |
'train' | 599 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
bydate_talk.politics.guns
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/bydate_talk.politics.guns')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version is sorted by date into training (60%) and test (40%) sets, does not include cross-posts (duplicates), and does not include newsgroup-identifying headers (Xref, Newsgroups, Path, Followup-To, Date).
- License: No known license
- Version: 2.0.0
- Splits:
Split | Examples |
---|---|
'test' | 364 |
'train' | 546 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
bydate_talk.politics.mideast
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/bydate_talk.politics.mideast')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version is sorted by date into training (60%) and test (40%) sets, does not include cross-posts (duplicates), and does not include newsgroup-identifying headers (Xref, Newsgroups, Path, Followup-To, Date).
- License: No known license
- Version: 2.0.0
- Splits:
Split | Examples |
---|---|
'test' | 376 |
'train' | 564 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
bydate_talk.politics.misc
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/bydate_talk.politics.misc')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version is sorted by date into training (60%) and test (40%) sets, does not include cross-posts (duplicates), and does not include newsgroup-identifying headers (Xref, Newsgroups, Path, Followup-To, Date).
- License: No known license
- Version: 2.0.0
- Splits:
Split | Examples |
---|---|
'test' | 310 |
'train' | 465 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
bydate_talk.religion.misc
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:newsgroup/bydate_talk.religion.misc')
- Description:
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across
20 different newsgroups. The 20 newsgroups collection has become a popular data set for experiments in text applications of
machine learning techniques, such as text classification and text clustering.
This version is sorted by date into training (60%) and test (40%) sets, does not include cross-posts (duplicates), and does not include newsgroup-identifying headers (Xref, Newsgroups, Path, Followup-To, Date).
- License: No known license
- Version: 2.0.0
- Splits:
Split | Examples |
---|---|
'test' | 251 |
'train' | 377 |
- Features:
{
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
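The descriptions above state a 60%/40% train/test split. As a quick sanity check, the sketch below recomputes the train fraction for the `bydate` configs from `comp.graphics` through `talk.religion.misc`, using the split sizes listed in this section. The dictionary name `SPLIT_SIZES` and the function `train_fraction` are hypothetical; the counts are copied from the tables, not read from the dataset.

```python
# (train, test) example counts copied from the Splits tables in this section.
SPLIT_SIZES = {
    "comp.graphics": (584, 389),
    "comp.os.ms-windows.misc": (591, 394),
    "comp.sys.ibm.pc.hardware": (590, 392),
    "comp.sys.mac.hardware": (578, 385),
    "comp.windows.x": (593, 395),
    "misc.forsale": (585, 390),
    "rec.autos": (594, 396),
    "rec.motorcycles": (598, 398),
    "rec.sport.baseball": (597, 397),
    "rec.sport.hockey": (600, 399),
    "sci.crypt": (595, 396),
    "sci.electronics": (591, 393),
    "sci.med": (594, 396),
    "sci.space": (593, 394),
    "soc.religion.christian": (599, 398),
    "talk.politics.guns": (546, 364),
    "talk.politics.mideast": (564, 376),
    "talk.politics.misc": (465, 310),
    "talk.religion.misc": (377, 251),
}


def train_fraction(train: int, test: int) -> float:
    """Share of a config's examples that fall in the 'train' split."""
    return train / (train + test)


fractions = {
    group: train_fraction(train, test)
    for group, (train, test) in SPLIT_SIZES.items()
}
for group, fraction in fractions.items():
    print(f"{group}: {fraction:.3f}")
```

Each fraction comes out close to 0.60, consistent with the stated split. The smaller `talk.*` configs have fewer examples overall but keep the same proportion.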