all
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:blbooks/all')
- Description:
A dataset comprising text created by OCR from 49,455 digitised books, equating to 65,227 volumes (25+ million pages), published between c. 1510 and c. 1900.
The books cover a wide range of subject areas including philosophy, history, poetry and literature.
- License: No known license
- Version: 1.0.2
- Splits:
Split | Examples |
---|---|
'train' | 14011953 |
- Features:
{
"record_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"date": {
"dtype": "int32",
"id": null,
"_type": "Value"
},
"raw_date": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"title": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"place": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"empty_pg": {
"dtype": "bool",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"pg": {
"dtype": "int32",
"id": null,
"_type": "Value"
},
"mean_wc_ocr": {
"dtype": "float32",
"id": null,
"_type": "Value"
},
"std_wc_ocr": {
"dtype": "float64",
"id": null,
"_type": "Value"
},
"name": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"all_names": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Publisher": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Country of publication 1": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"all Countries of publication": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Physical description": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Language_1": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Language_2": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Language_3": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Language_4": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"multi_language": {
"dtype": "bool",
"id": null,
"_type": "Value"
}
}
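The feature listing above appears to follow the Hugging Face `datasets` serialization, in which each column maps to a `Value` entry with a `dtype`. As a minimal sketch, the snippet below groups column names by their declared dtype using only the standard library. `FEATURES_JSON` is an abbreviated copy of the listing (a subset of columns chosen for illustration), and `columns_by_dtype` is a hypothetical helper name, not part of TFDS or `datasets`.

```python
import json
from collections import defaultdict

# Abbreviated copy of the feature listing above (subset of columns, illustration only).
FEATURES_JSON = """
{
  "record_id": {"dtype": "string", "id": null, "_type": "Value"},
  "date": {"dtype": "int32", "id": null, "_type": "Value"},
  "empty_pg": {"dtype": "bool", "id": null, "_type": "Value"},
  "text": {"dtype": "string", "id": null, "_type": "Value"},
  "pg": {"dtype": "int32", "id": null, "_type": "Value"},
  "mean_wc_ocr": {"dtype": "float32", "id": null, "_type": "Value"},
  "std_wc_ocr": {"dtype": "float64", "id": null, "_type": "Value"},
  "multi_language": {"dtype": "bool", "id": null, "_type": "Value"}
}
"""


def columns_by_dtype(features_json: str) -> dict:
    """Group column names by declared dtype (hypothetical helper)."""
    grouped = defaultdict(list)
    for name, spec in json.loads(features_json).items():
        grouped[spec["dtype"]].append(name)
    return dict(grouped)


if __name__ == "__main__":
    for dtype, names in columns_by_dtype(FEATURES_JSON).items():
        print(f"{dtype}: {', '.join(names)}")
```

Grouping by dtype makes it easier to spot columns that may need casting before numeric processing, such as the mixed `float32` and `float64` word-count statistics.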
1800s
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:blbooks/1800s')
- Description:
A dataset comprising text created by OCR from 49,455 digitised books, equating to 65,227 volumes (25+ million pages), published between c. 1510 and c. 1900.
The books cover a wide range of subject areas including philosophy, history, poetry and literature.
- License: No known license
- Version: 1.0.2
- Splits:
Split | Examples |
---|---|
'train' | 13781747 |
- Features:
{
"record_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"date": {
"dtype": "int32",
"id": null,
"_type": "Value"
},
"raw_date": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"title": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"place": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"empty_pg": {
"dtype": "bool",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"pg": {
"dtype": "int32",
"id": null,
"_type": "Value"
},
"mean_wc_ocr": {
"dtype": "float32",
"id": null,
"_type": "Value"
},
"std_wc_ocr": {
"dtype": "float64",
"id": null,
"_type": "Value"
},
"name": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"all_names": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Publisher": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Country of publication 1": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"all Countries of publication": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Physical description": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Language_1": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Language_2": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Language_3": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Language_4": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"multi_language": {
"dtype": "bool",
"id": null,
"_type": "Value"
}
}
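The `pg`, `text` and `empty_pg` features suggest that each example describes one page of a book, so a common first step might be to skip empty pages. The sketch below shows that filter on plain Python dicts. The two sample records are invented for illustration and are not taken from the dataset, and `non_empty_pages` is a hypothetical helper name.

```python
from typing import Iterable, Iterator


def non_empty_pages(records: Iterable[dict]) -> Iterator[dict]:
    """Yield records whose `empty_pg` flag is False and whose text is not blank."""
    for record in records:
        if not record["empty_pg"] and record["text"].strip():
            yield record


# Invented sample records that reuse a few of the feature names listed above.
SAMPLE = [
    {"record_id": "r1", "pg": 1, "empty_pg": True, "text": ""},
    {"record_id": "r1", "pg": 2, "empty_pg": False, "text": "Chapter I."},
]

if __name__ == "__main__":
    for page in non_empty_pages(SAMPLE):
        print(page["record_id"], page["pg"], page["text"])
```

Checking both the flag and the stripped text is a defensive choice: it assumes the flag could disagree with the OCR output on some pages, which the listing itself does not state.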
1700s
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:blbooks/1700s')
- Description:
A dataset comprising text created by OCR from 49,455 digitised books, equating to 65,227 volumes (25+ million pages), published between c. 1510 and c. 1900.
The books cover a wide range of subject areas including philosophy, history, poetry and literature.
- License: No known license
- Version: 1.0.2
- Splits:
Split | Examples |
---|---|
'train' | 178224 |
- Features:
{
"record_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"date": {
"dtype": "int32",
"id": null,
"_type": "Value"
},
"raw_date": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"title": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"place": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"empty_pg": {
"dtype": "bool",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"pg": {
"dtype": "int32",
"id": null,
"_type": "Value"
},
"mean_wc_ocr": {
"dtype": "float32",
"id": null,
"_type": "Value"
},
"std_wc_ocr": {
"dtype": "float64",
"id": null,
"_type": "Value"
},
"name": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"all_names": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Publisher": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Country of publication 1": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"all Countries of publication": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Physical description": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Language_1": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Language_2": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Language_3": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Language_4": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"multi_language": {
"dtype": "bool",
"id": null,
"_type": "Value"
}
}
1510_1699
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:blbooks/1510_1699')
- Description:
A dataset comprising text created by OCR from 49,455 digitised books, equating to 65,227 volumes (25+ million pages), published between c. 1510 and c. 1900.
The books cover a wide range of subject areas including philosophy, history, poetry and literature.
- License: No known license
- Version: 1.0.2
- Splits:
Split | Examples |
---|---|
'train' | 51982 |
- Features:
{
"record_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"date": {
"dtype": "timestamp[s]",
"id": null,
"_type": "Value"
},
"raw_date": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"title": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"place": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"empty_pg": {
"dtype": "bool",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"pg": {
"dtype": "int32",
"id": null,
"_type": "Value"
},
"mean_wc_ocr": {
"dtype": "float32",
"id": null,
"_type": "Value"
},
"std_wc_ocr": {
"dtype": "float64",
"id": null,
"_type": "Value"
},
"name": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"all_names": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Publisher": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Country of publication 1": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"all Countries of publication": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Physical description": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Language_1": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Language_2": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Language_3": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Language_4": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"multi_language": {
"dtype": "bool",
"id": null,
"_type": "Value"
}
}
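In this configuration `date` is declared as `timestamp[s]`, whereas the `all`, `1800s` and `1700s` configurations declare it as `int32`. Code that handles several configurations may want a single representation. The sketch below normalizes both forms to a year. It assumes that the `int32` form holds a publication year and that the timestamp form arrives as a Python `datetime`; neither is stated in the listing, and `publication_year` is a hypothetical helper name.

```python
from datetime import datetime
from typing import Union


def publication_year(date: Union[int, datetime]) -> int:
    """Return a year from either an integer year or a datetime (assumed inputs)."""
    if isinstance(date, datetime):
        return date.year
    return int(date)


if __name__ == "__main__":
    print(publication_year(1650))
    print(publication_year(datetime(1650, 1, 1)))
```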
1500_1899
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:blbooks/1500_1899')
- Description:
A dataset comprising text created by OCR from 49,455 digitised books, equating to 65,227 volumes (25+ million pages), published between c. 1510 and c. 1900.
The books cover a wide range of subject areas including philosophy, history, poetry and literature.
- License: No known license
- Version: 1.0.2
- Splits:
Split | Examples |
---|---|
'train' | 14011953 |
- Features:
{
"record_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"date": {
"dtype": "timestamp[s]",
"id": null,
"_type": "Value"
},
"raw_date": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"title": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"place": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"empty_pg": {
"dtype": "bool",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"pg": {
"dtype": "int32",
"id": null,
"_type": "Value"
},
"mean_wc_ocr": {
"dtype": "float32",
"id": null,
"_type": "Value"
},
"std_wc_ocr": {
"dtype": "float64",
"id": null,
"_type": "Value"
},
"name": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"all_names": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Publisher": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Country of publication 1": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"all Countries of publication": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Physical description": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Language_1": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Language_2": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Language_3": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Language_4": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"multi_language": {
"dtype": "bool",
"id": null,
"_type": "Value"
}
}
1800_1899
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:blbooks/1800_1899')
- Description:
A dataset comprising text created by OCR from 49,455 digitised books, equating to 65,227 volumes (25+ million pages), published between c. 1510 and c. 1900.
The books cover a wide range of subject areas including philosophy, history, poetry and literature.
- License: No known license
- Version: 1.0.2
- Splits:
Split | Examples |
---|---|
'train' | 13781747 |
- Features:
{
"record_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"date": {
"dtype": "timestamp[s]",
"id": null,
"_type": "Value"
},
"raw_date": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"title": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"place": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"empty_pg": {
"dtype": "bool",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"pg": {
"dtype": "int32",
"id": null,
"_type": "Value"
},
"mean_wc_ocr": {
"dtype": "float32",
"id": null,
"_type": "Value"
},
"std_wc_ocr": {
"dtype": "float64",
"id": null,
"_type": "Value"
},
"name": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"all_names": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Publisher": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Country of publication 1": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"all Countries of publication": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Physical description": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Language_1": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Language_2": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Language_3": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Language_4": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"multi_language": {
"dtype": "bool",
"id": null,
"_type": "Value"
}
}
1700_1799
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:blbooks/1700_1799')
- Description:
A dataset comprising text created by OCR from 49,455 digitised books, equating to 65,227 volumes (25+ million pages), published between c. 1510 and c. 1900.
The books cover a wide range of subject areas including philosophy, history, poetry and literature.
- License: No known license
- Version: 1.0.2
- Splits:
Split | Examples |
---|---|
'train' | 178224 |
- Features:
{
"record_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"date": {
"dtype": "timestamp[s]",
"id": null,
"_type": "Value"
},
"raw_date": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"title": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"place": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"empty_pg": {
"dtype": "bool",
"id": null,
"_type": "Value"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"pg": {
"dtype": "int32",
"id": null,
"_type": "Value"
},
"mean_wc_ocr": {
"dtype": "float32",
"id": null,
"_type": "Value"
},
"std_wc_ocr": {
"dtype": "float64",
"id": null,
"_type": "Value"
},
"name": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"all_names": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Publisher": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Country of publication 1": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"all Countries of publication": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Physical description": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Language_1": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Language_2": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Language_3": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"Language_4": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"multi_language": {
"dtype": "bool",
"id": null,
"_type": "Value"
}
}
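The `train` counts listed for the configurations on this page can be cross-checked against one another. The sketch below copies the counts from the split tables and verifies that the three period configurations add up to the full configuration, and that each period appears with the same count under both of its names. The dictionary and function names are illustrative only.

```python
# Example counts copied from the split tables on this page.
SPLIT_COUNTS = {
    "all": 14011953,
    "1800s": 13781747,
    "1700s": 178224,
    "1510_1699": 51982,
    "1500_1899": 14011953,
    "1800_1899": 13781747,
    "1700_1799": 178224,
}


def periods_sum_to_all(counts: dict) -> bool:
    """Check that the three period configurations add up to the `all` count."""
    period_total = counts["1510_1699"] + counts["1700s"] + counts["1800s"]
    return period_total == counts["all"]


if __name__ == "__main__":
    print(periods_sum_to_all(SPLIT_COUNTS))
    print(SPLIT_COUNTS["1800s"] == SPLIT_COUNTS["1800_1899"])
    print(SPLIT_COUNTS["1700s"] == SPLIT_COUNTS["1700_1799"])
```

Matching counts suggest, but do not prove, that the paired configurations cover the same examples and differ only in how `date` is typed.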