参考文献:
ollie_lemmagrep
次のコマンドを使用して、このデータセットを TFDS にロードします。
ds = tfds.load('huggingface:ollie/ollie_lemmagrep')
- 説明:
The Ollie dataset includes two configs for the data
used to train the Ollie informatation extraction algorithm, for 18M
sentences and 3M sentences respectively.
This data is for academic use only. From the authors:
Ollie is a program that automatically identifies and extracts binary
relationships from English sentences. Ollie is designed for Web-scale
information extraction, where target relations are not specified in
advance.
Ollie is our second-generation information extraction system . Whereas
ReVerb operates on flat sequences of tokens, Ollie works with the
tree-like (graph with only small cycles) representation using
Stanford's compression of the dependencies. This allows Ollie to
capture expression that ReVerb misses, such as long-range relations.
Ollie also captures context that modifies a binary relation. Presently
Ollie handles attribution (He said/she believes) and enabling
conditions (if X then).
More information is available at the Ollie homepage:
https://knowitall.github.io/ollie/
ライセンス: ワシントン大学学術ライセンス : https://raw.githubusercontent.com/knowitall/ollie/master/LICENSE
バージョン: 1.1.0
分割:
スプリット | 例 |
---|---|
'train' | 18674630 |
- 特徴:
{
"arg1": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"arg2": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"rel": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"search_query": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"words": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"pos": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"chunk": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_cnt": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
オーリーパターン
次のコマンドを使用して、このデータセットを TFDS にロードします。
ds = tfds.load('huggingface:ollie/ollie_patterned')
- 説明:
The Ollie dataset includes two configs for the data
used to train the Ollie informatation extraction algorithm, for 18M
sentences and 3M sentences respectively.
This data is for academic use only. From the authors:
Ollie is a program that automatically identifies and extracts binary
relationships from English sentences. Ollie is designed for Web-scale
information extraction, where target relations are not specified in
advance.
Ollie is our second-generation information extraction system . Whereas
ReVerb operates on flat sequences of tokens, Ollie works with the
tree-like (graph with only small cycles) representation using
Stanford's compression of the dependencies. This allows Ollie to
capture expression that ReVerb misses, such as long-range relations.
Ollie also captures context that modifies a binary relation. Presently
Ollie handles attribution (He said/she believes) and enabling
conditions (if X then).
More information is available at the Ollie homepage:
https://knowitall.github.io/ollie/
ライセンス: ワシントン大学学術ライセンス : https://raw.githubusercontent.com/knowitall/ollie/master/LICENSE
バージョン: 1.1.0
分割:
スプリット | 例 |
---|---|
'train' | 3048961 |
- 特徴:
{
"rel": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"arg1": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"arg2": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"slot0": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"search_query": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"pattern": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"parse": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}