Use the following command to load this dataset in TFDS:
import tensorflow_datasets as tfds
ds = tfds.load('huggingface:s2orc')
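Because the corpus is large, it can help to read only a slice of the `train` split while exploring. The sketch below is a minimal example, assuming the standard TFDS split-slicing syntax (for example `train[0%:1%]`) also applies to this `huggingface:` dataset. `percent_slice` and `load_sample` are hypothetical helper names, not part of TFDS. Slicing limits what is read; TFDS may still need to download and prepare the full dataset first.

```python
def percent_slice(split: str, start: int, stop: int) -> str:
    """Build a TFDS split string such as 'train[0%:1%]'."""
    if not 0 <= start < stop <= 100:
        raise ValueError("expected 0 <= start < stop <= 100")
    return f"{split}[{start}%:{stop}%]"


def load_sample(percent: int = 1):
    """Load the first `percent` percent of the 'train' split (hypothetical helper)."""
    # Imported lazily so percent_slice can be used without TensorFlow installed.
    import tensorflow_datasets as tfds

    return tfds.load("huggingface:s2orc", split=percent_slice("train", 0, percent))


print(percent_slice("train", 0, 1))  # train[0%:1%]
```

Calling `load_sample(1)` would then return a dataset covering roughly the first 1% of the records, under the assumptions stated above.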
- Description:
A large corpus of 81.1M English-language academic papers spanning many academic disciplines.
It provides rich metadata, paper abstracts, resolved bibliographic references, and structured full
text for 8.1M open access papers. The full text is annotated with automatically detected inline mentions of
citations, figures, and tables, each linked to its corresponding paper object. The corpus aggregates papers
from hundreds of academic publishers and digital archives into a unified source, creating the largest
publicly available collection of machine-readable academic text to date.
- License: Semantic Scholar Open Research Corpus is licensed under ODC-BY.
- Version: 1.1.0
- Splits:
Split | Examples |
---|---|
'train' | 189674763 |
- Features:
{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "paperAbstract": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "entities": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "s2Url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "pdfUrls": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "s2PdfUrl": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "authors": [
        {
            "name": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "ids": {
                "feature": {
                    "dtype": "string",
                    "id": null,
                    "_type": "Value"
                },
                "length": -1,
                "id": null,
                "_type": "Sequence"
            }
        }
    ],
    "inCitations": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "outCitations": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "fieldsOfStudy": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "year": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "venue": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "journalName": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "journalVolume": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "journalPages": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sources": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "doi": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "doiUrl": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "pmid": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "magId": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
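The feature spec above uses three shapes: a scalar `Value`, a variable-length `Sequence` (`"length": -1`), and a JSON list wrapping a record, as in `authors`. The sketch below is one possible way to condense such a spec into a short type summary. It works on a four-field excerpt copied from the spec, and `describe` is a hypothetical helper, not a TFDS or Hugging Face API.

```python
import json

# A four-field excerpt of the feature spec listed above, kept short for illustration.
SPEC_EXCERPT = json.loads("""
{
  "title": {"dtype": "string", "id": null, "_type": "Value"},
  "year": {"dtype": "int32", "id": null, "_type": "Value"},
  "outCitations": {
    "feature": {"dtype": "string", "id": null, "_type": "Value"},
    "length": -1, "id": null, "_type": "Sequence"
  },
  "authors": [
    {
      "name": {"dtype": "string", "id": null, "_type": "Value"},
      "ids": {
        "feature": {"dtype": "string", "id": null, "_type": "Value"},
        "length": -1, "id": null, "_type": "Sequence"
      }
    }
  ]
}
""")


def describe(node):
    """Return a compact type string for one node of the feature spec."""
    if isinstance(node, list):
        # A JSON list wraps a single element spec, meaning "list of that element".
        return f"list[{describe(node[0])}]"
    if node.get("_type") == "Value":
        return node["dtype"]
    if node.get("_type") == "Sequence":
        return f"list[{describe(node['feature'])}]"
    # No "_type" key: treat the dict as a record of named sub-features.
    inner = ", ".join(f"{key}: {describe(value)}" for key, value in node.items())
    return "{" + inner + "}"


summary = {name: describe(spec) for name, spec in SPEC_EXCERPT.items()}
for name, kind in summary.items():
    print(f"{name}: {kind}")
```

Running the same function over the full spec would give one line per field, which can be easier to scan than the nested JSON when deciding which columns to read.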