- Description:
Contextualization
ASSIN 2 is the second edition of the Avaliação de Similaridade Semântica e Inferência Textual (Evaluating Semantic Similarity and Textual Entailment), and was a workshop collocated with STIL 2019. It follows the first edition of ASSIN, proposing a new shared task with new data.
The workshop evaluated systems that assess two types of relations between two sentences: Semantic Textual Similarity and Textual Entailment.
Semantic Textual Similarity consists of quantifying the level of semantic equivalence between sentences, while Textual Entailment Recognition consists of classifying whether the first sentence entails the second.
Data
The corpus used in ASSIN 2 is composed of rather simple sentences. Following the procedures of SemEval 2014 Task 1, we tried to remove named entities and indirect speech from the corpus, and to keep all verbs in the present tense. The annotation instructions given to annotators are available (in Portuguese).
The training and validation data are composed, respectively, of 6,500 and 500 sentence pairs in Brazilian Portuguese, annotated for entailment and semantic similarity. Semantic similarity values range from 1 to 5, and text entailment classes are either entailment or none. The test data are composed of approximately 3,000 sentence pairs with the same annotation. All data were manually annotated.
Evaluation
Submissions to ASSIN 2 were evaluated with the same metrics as the first ASSIN: F1 (the harmonic mean of precision and recall) as the main metric for textual entailment, and Pearson correlation for semantic similarity. The evaluation scripts are the same as in the previous edition.
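The two metrics named above can be sketched in plain Python. This is an illustrative sketch, not the official ASSIN evaluation script: the function names are hypothetical, and averaging the per-class F1 over the classes (macro F1) is an assumption, since the description only says "F1".

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between gold and predicted similarity scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def f1_for_class(gold: Sequence[int], pred: Sequence[int], label: int) -> float:
    """F1 (harmonic mean of precision and recall) for a single class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 (assumption: macro averaging)."""
    labels = sorted(set(gold))
    return sum(f1_for_class(gold, pred, label) for label in labels) / len(labels)
```

For example, `pearson(gold_similarity, predicted_similarity)` scores the similarity task and `macro_f1(gold_entailment, predicted_entailment)` scores the entailment task, where each argument is a list aligned by sentence pair.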
Note: This description is taken from the official ASSIN 2 homepage.
Additional Documentation: Explore on Papers With Code
Source code:
tfds.datasets.assin2.Builder
Versions:
1.0.0 (default): Initial release.
Download size: 2.02 MiB
Dataset size: 1.82 MiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'test' | 2,448 |
'train' | 6,500 |
'validation' | 500 |
- Feature structure:
FeaturesDict({
'entailment': ClassLabel(shape=(), dtype=int64, num_classes=2),
'hypothesis': Text(shape=(), dtype=string),
'id': int32,
'similarity': float32,
'text': Text(shape=(), dtype=string),
})
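A minimal sketch of reading examples with this feature structure, assuming the `tensorflow_datasets` package is installed. The helper names `iter_assin2` and `mean_similarity_by_entailment` are hypothetical and only illustrate how the `entailment` and `similarity` fields might be consumed; `tfds.load` and `tfds.as_numpy` are the standard TFDS entry points.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Mapping


def iter_assin2(split: str = "validation") -> Iterator[Mapping]:
    """Yield the examples of one split as dicts of NumPy values.

    The first call downloads and prepares the dataset.
    """
    # Imported lazily so the helper below can be used without TFDS installed.
    import tensorflow_datasets as tfds

    ds = tfds.load("assin2", split=split)
    # Each example is a dict with the keys 'entailment', 'hypothesis',
    # 'id', 'similarity' and 'text'; the two text fields arrive as bytes.
    yield from tfds.as_numpy(ds)


def mean_similarity_by_entailment(examples: Iterable[Mapping]) -> Dict[int, float]:
    """Average the 'similarity' score separately for each 'entailment' label id."""
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for example in examples:
        label = int(example["entailment"])
        totals[label] += float(example["similarity"])
        counts[label] += 1
    return {label: totals[label] / counts[label] for label in counts}
```

Calling `mean_similarity_by_entailment(iter_assin2("validation"))` would then report the average similarity per entailment class id. The mapping from class ids to class names is not shown on this page; it can be read at runtime from the `names` attribute of the `entailment` feature in the dataset info.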
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
 | FeaturesDict | | | |
entailment | ClassLabel | | int64 | |
hypothesis | Text | | string | |
id | Tensor | | int32 | |
similarity | Tensor | | float32 | |
text | Text | | string | |
Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
- Citation:
@inproceedings{DBLP:conf/propor/RealFO20,
author = {Livy Real and
Erick Fonseca and
Hugo Gon{\c{c}}alo Oliveira},
editor = {Paulo Quaresma and
Renata Vieira and
Sandra M. Alu{\'{\i}}sio and
Helena Moniz and
Fernando Batista and
Teresa Gon{\c{c}}alves},
title = {The {ASSIN} 2 Shared Task: {A} Quick Overview},
booktitle = {Computational Processing of the Portuguese Language - 14th International
Conference, {PROPOR} 2020, Evora, Portugal, March 2-4, 2020, Proceedings},
series = {Lecture Notes in Computer Science},
volume = {12037},
pages = {406--412},
publisher = {Springer},
year = {2020},
url = {https://doi.org/10.1007/978-3-030-41505-1_39},
doi = {10.1007/978-3-030-41505-1_39},
timestamp = {Tue, 03 Mar 2020 09:40:18 +0100},
biburl = {https://dblp.org/rec/conf/propor/RealFO20.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}