- Description:
databricks-dolly-15k is an open source dataset of instruction-following records used in training databricks/dolly-v2-12b. The records were generated by thousands of Databricks employees in several of the behavioral categories outlined in the InstructGPT paper, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization.
This dataset can be used for any purpose, whether academic or commercial, under the terms of the Creative Commons Attribution-ShareAlike 3.0 Unported License.
Homepage: https://github.com/databrickslabs/dolly
Source code: tfds.datasets.databricks_dolly.Builder
Versions:
1.0.0 (default): Initial release.
Download size: 12.60 MiB
Dataset size: 12.69 MiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'train' | 15,014 |
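Because only a 'train' split is published, a held-out set has to be carved out by the user. The sketch below shows one way to do that with the TFDS split-slicing syntax. The helper names `holdout_splits` and `load_dolly_with_holdout` and the 10% default are assumptions made for illustration, not part of the dataset or of TFDS.

```python
from typing import Tuple


def holdout_splits(holdout_percent: int) -> Tuple[str, str]:
    """Build two TFDS split specs that partition 'train' by percentage.

    Hypothetical helper: it only formats strings such as 'train[:90%]'.
    """
    if not 0 < holdout_percent < 100:
        raise ValueError("holdout_percent must be between 1 and 99")
    cut = 100 - holdout_percent
    return f"train[:{cut}%]", f"train[{cut}%:]"


def load_dolly_with_holdout(holdout_percent: int = 10):
    """Load the dataset as (train, validation) tf.data.Dataset objects.

    The import is done lazily so that the helper above can be used
    without TensorFlow Datasets installed. The first call downloads
    the data (about 12.60 MiB).
    """
    import tensorflow_datasets as tfds

    train_spec, val_spec = holdout_splits(holdout_percent)
    train_ds, val_ds = tfds.load(
        "databricks_dolly", split=[train_spec, val_spec]
    )
    return train_ds, val_ds
```

Calling `load_dolly_with_holdout()` returns two datasets whose elements are dictionaries keyed by feature name.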
- Feature structure:
FeaturesDict({
'category': Text(shape=(), dtype=string),
'context': Text(shape=(), dtype=string),
'instruction': Text(shape=(), dtype=string),
'response': Text(shape=(), dtype=string),
})
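Each record is a flat dictionary of four text fields, so turning it into a training pair is mostly string formatting. The following is a minimal sketch of that step. The prompt template and the `format_example` helper are hypothetical and are not the format used to train dolly-v2-12b.

```python
from typing import Mapping, Tuple, Union

TextLike = Union[str, bytes]


def _to_str(value: TextLike) -> str:
    # String features arrive as bytes when read through NumPy, so decode them.
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)


def format_example(record: Mapping[str, TextLike]) -> Tuple[str, str]:
    """Return (prompt, target) built from one record.

    The 'context' field is included only when it is non-empty.
    The template below is an illustrative assumption.
    """
    instruction = _to_str(record["instruction"]).strip()
    context = _to_str(record["context"]).strip()
    response = _to_str(record["response"]).strip()

    parts = [f"Instruction: {instruction}"]
    if context:
        parts.append(f"Context: {context}")
    prompt = "\n".join(parts) + "\nResponse:"
    return prompt, response
```

The helper accepts both `str` and `bytes` values, so it can be applied directly to records obtained with `tfds.as_numpy`.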
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
 | FeaturesDict | | | |
category | Text | | string | |
context | Text | | string | |
instruction | Text | | string | |
response | Text | | string | |
Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Examples (tfds.as_dataframe):
- Citation: