multi_booked

References:

ca

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:multi_booked/ca')

Description:

MultiBooked is a corpus of Basque and Catalan Hotel Reviews Annotated for Aspect-level Sentiment Classification.

The corpora are compiled from hotel reviews taken mainly from booking.com. The corpora are in Kaf/Naf format, which is
an xml-style stand-off format that allows for multiple layers of annotation. Each review was sentence- and
word-tokenized and lemmatized using Freeling for Catalan and ixa-pipes for Basque. Finally, for each language two
annotators annotated opinion holders, opinion targets, and opinion expressions for each review, following the
guidelines set out in the OpeNER project.
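
Because the annotation is stand-off, an opinion does not carry its own text: it points at term ids, and each term points at word ids. The following is a minimal sketch of following those pointers. It assumes that `terms.target` holds `wid` values and that the opinion `*_target` lists hold `tid` values, and it uses a hand-written record in row form whose ids and tokens are invented for illustration rather than taken from the corpus.

```python
# Hypothetical record in row form; ids and tokens are made up for illustration.
record = {
    "text": [
        {"wid": "w1", "sent": "1", "para": "1", "word": "Hotel"},
        {"wid": "w2", "sent": "1", "para": "1", "word": "molt"},
        {"wid": "w3", "sent": "1", "para": "1", "word": "net"},
    ],
    "terms": [
        {"tid": "t1", "lemma": "hotel", "target": ["w1"]},
        {"tid": "t2", "lemma": "molt", "target": ["w2"]},
        {"tid": "t3", "lemma": "net", "target": ["w3"]},
    ],
    "opinions": [
        {
            "oid": "o1",
            "opinion_holder_target": [],
            "opinion_target_target": ["t1"],
            "opinion_expression_target": ["t2", "t3"],
        }
    ],
}


def resolve_span(term_ids, record):
    """Follow term ids -> word ids -> surface words and join them with spaces."""
    words_by_id = {w["wid"]: w["word"] for w in record["text"]}
    targets_by_tid = {t["tid"]: t["target"] for t in record["terms"]}
    words = []
    for tid in term_ids:
        for wid in targets_by_tid[tid]:
            words.append(words_by_id[wid])
    return " ".join(words)


opinion = record["opinions"][0]
target_text = resolve_span(opinion["opinion_target_target"], record)
expression_text = resolve_span(opinion["opinion_expression_target"], record)
holder_text = resolve_span(opinion["opinion_holder_target"], record)  # empty span
```

An opinion with no annotated holder resolves to an empty string here; real records may need a different convention for missing spans.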

License: CC-BY 3.0
Version: 0.0.0
Splits:

Split	Examples
`'train'`	567

Features:

{
    "text": {
        "feature": {
            "wid": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "sent": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "para": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "word": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            }
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "terms": {
        "feature": {
            "tid": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "lemma": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "morphofeat": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "pos": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "target": {
                "feature": {
                    "dtype": "string",
                    "id": null,
                    "_type": "Value"
                },
                "length": -1,
                "id": null,
                "_type": "Sequence"
            }
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "opinions": {
        "feature": {
            "oid": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "opinion_holder_target": {
                "feature": {
                    "dtype": "string",
                    "id": null,
                    "_type": "Value"
                },
                "length": -1,
                "id": null,
                "_type": "Sequence"
            },
            "opinion_target_target": {
                "feature": {
                    "dtype": "string",
                    "id": null,
                    "_type": "Value"
                },
                "length": -1,
                "id": null,
                "_type": "Sequence"
            },
            "opinion_expression_polarity": {
                "num_classes": 4,
                "names": [
                    "StrongNegative",
                    "Negative",
                    "Positive",
                    "StrongPositive"
                ],
                "names_file": null,
                "id": null,
                "_type": "ClassLabel"
            },
            "opinion_expression_target": {
                "feature": {
                    "dtype": "string",
                    "id": null,
                    "_type": "Value"
                },
                "length": -1,
                "id": null,
                "_type": "Sequence"
            }
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}
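
The `opinion_expression_polarity` feature is a `ClassLabel`, so it is stored as an integer index into the `names` list above. The sketch below decodes those integers back to names. It also assumes that a `Sequence` of named sub-features arrives as a dictionary of parallel lists, which is how the Hugging Face `datasets` library typically presents such fields; the layout seen through TFDS may differ. The helper names and the sample values are hypothetical.

```python
# Class names in the order given by the feature spec above.
POLARITY_NAMES = ["StrongNegative", "Negative", "Positive", "StrongPositive"]


def columns_to_rows(columns):
    """Turn {"a": [1, 2], "b": [3, 4]} into [{"a": 1, "b": 3}, {"a": 2, "b": 4}]."""
    keys = list(columns)
    return [dict(zip(keys, values)) for values in zip(*(columns[k] for k in keys))]


def polarity_name(index):
    """Map a ClassLabel integer to its class name."""
    return POLARITY_NAMES[index]


# Hypothetical columnar "opinions" field; values are invented for illustration.
opinions = {
    "oid": ["o1", "o2"],
    "opinion_expression_polarity": [3, 1],
}

rows = columns_to_rows(opinions)
labels = [polarity_name(r["opinion_expression_polarity"]) for r in rows]
```

Working with one dictionary per opinion tends to be easier to read than indexing several parallel lists, at the cost of an extra pass over the data.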

eu

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:multi_booked/eu')

Description:

MultiBooked is a corpus of Basque and Catalan Hotel Reviews Annotated for Aspect-level Sentiment Classification.

The corpora are compiled from hotel reviews taken mainly from booking.com. The corpora are in Kaf/Naf format, which is
an xml-style stand-off format that allows for multiple layers of annotation. Each review was sentence- and
word-tokenized and lemmatized using Freeling for Catalan and ixa-pipes for Basque. Finally, for each language two
annotators annotated opinion holders, opinion targets, and opinion expressions for each review, following the
guidelines set out in the OpeNER project.

License: CC-BY 3.0
Version: 0.0.0
Splits:

Split	Examples
`'train'`	343

Features:

{
    "text": {
        "feature": {
            "wid": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "sent": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "para": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "word": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            }
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "terms": {
        "feature": {
            "tid": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "lemma": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "morphofeat": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "pos": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "target": {
                "feature": {
                    "dtype": "string",
                    "id": null,
                    "_type": "Value"
                },
                "length": -1,
                "id": null,
                "_type": "Sequence"
            }
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "opinions": {
        "feature": {
            "oid": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "opinion_holder_target": {
                "feature": {
                    "dtype": "string",
                    "id": null,
                    "_type": "Value"
                },
                "length": -1,
                "id": null,
                "_type": "Sequence"
            },
            "opinion_target_target": {
                "feature": {
                    "dtype": "string",
                    "id": null,
                    "_type": "Value"
                },
                "length": -1,
                "id": null,
                "_type": "Sequence"
            },
            "opinion_expression_polarity": {
                "num_classes": 4,
                "names": [
                    "StrongNegative",
                    "Negative",
                    "Positive",
                    "StrongPositive"
                ],
                "names_file": null,
                "id": null,
                "_type": "ClassLabel"
            },
            "opinion_expression_target": {
                "feature": {
                    "dtype": "string",
                    "id": null,
                    "_type": "Value"
                },
                "length": -1,
                "id": null,
                "_type": "Sequence"
            }
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}