s2orc

References:

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:s2orc')
  • Description:
A large corpus of 81.1M English-language academic papers spanning many academic disciplines.
The corpus provides rich metadata, paper abstracts, and resolved bibliographic references, as well as structured full
text for 8.1M open access papers. The full text is annotated with automatically detected inline mentions of
citations, figures, and tables, each linked to its corresponding paper object. Papers are aggregated
from hundreds of academic publishers and digital archives into a unified source, creating the largest
publicly available collection of machine-readable academic text to date.
  • License: Semantic Scholar Open Research Corpus is licensed under ODC-BY.
  • Version: 1.1.0
  • Splits:
Split      Examples
'train'    189674763
  • Features:
{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "paperAbstract": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "entities": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "s2Url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "pdfUrls": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "s2PdfUrl": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "authors": [
        {
            "name": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "ids": {
                "feature": {
                    "dtype": "string",
                    "id": null,
                    "_type": "Value"
                },
                "length": -1,
                "id": null,
                "_type": "Sequence"
            }
        }
    ],
    "inCitations": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "outCitations": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "fieldsOfStudy": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "year": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "venue": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "journalName": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "journalVolume": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "journalPages": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sources": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "doi": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "doiUrl": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "pmid": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "magId": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
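
The feature spec above nests three shapes: scalar `Value` nodes, `Sequence` nodes that wrap another feature, and a plain JSON list (used for `authors`) that wraps a mapping of fields. As a rough illustration of how to read it, the sketch below walks an abbreviated excerpt of the spec and prints one compact type string per field. The `describe` helper and the `sequence<...>` / `list<...>` notation are made up for this example and are not part of TFDS or the Hugging Face `datasets` library; reading `"length": -1` as "variable length" is an assumption stated in the comments.

```python
import json

# Abbreviated excerpt of the feature spec above (four of the fields).
SCHEMA_JSON = """
{
    "id": {"dtype": "string", "id": null, "_type": "Value"},
    "entities": {
        "feature": {"dtype": "string", "id": null, "_type": "Value"},
        "length": -1, "id": null, "_type": "Sequence"
    },
    "authors": [
        {
            "name": {"dtype": "string", "id": null, "_type": "Value"},
            "ids": {
                "feature": {"dtype": "string", "id": null, "_type": "Value"},
                "length": -1, "id": null, "_type": "Sequence"
            }
        }
    ],
    "year": {"dtype": "int32", "id": null, "_type": "Value"}
}
"""


def describe(feature):
    """Return a compact type string for one node of the feature spec.

    Hypothetical helper written for this illustration only.
    """
    if isinstance(feature, list):
        # A JSON list wraps the spec of its elements, e.g. "authors".
        return "list<" + describe(feature[0]) + ">"
    kind = feature.get("_type")
    if kind == "Value":
        # Scalar leaf: report its dtype ("string", "int32", ...).
        return feature["dtype"]
    if kind == "Sequence":
        # "length": -1 is read here as "variable length" (assumption).
        return "sequence<" + describe(feature["feature"]) + ">"
    # No "_type" key: a plain mapping of field names to nested specs.
    inner = ", ".join(f"{k}: {describe(v)}" for k, v in feature.items())
    return "{" + inner + "}"


schema = json.loads(SCHEMA_JSON)
summary = {name: describe(spec) for name, spec in schema.items()}
for name, type_str in summary.items():
    print(f"{name}: {type_str}")
```

Pasting the full spec into `SCHEMA_JSON` should work the same way, since every field in it uses one of these three shapes.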