References:
all_languages
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/all_languages')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 1926192 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
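The feature schema above suggests that records sharing a `paraphrase_set_id` are paraphrases of one another. The following is a minimal sketch, under that assumption, of how records could be grouped into sets and expanded into sentence pairs. The sample records are hypothetical and only mirror the field names in the schema; they are not taken from the corpus.

```python
from collections import defaultdict
from itertools import combinations


def group_paraphrases(records):
    """Map each paraphrase_set_id to the list of its paraphrase strings."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["paraphrase_set_id"]].append(rec["paraphrase"])
    return dict(groups)


def paraphrase_pairs(groups):
    """Return every unordered sentence pair inside each paraphrase set."""
    pairs = []
    for sentences in groups.values():
        pairs.extend(combinations(sentences, 2))
    return pairs


# Hypothetical records shaped like the schema above (not real corpus rows).
sample = [
    {"paraphrase_set_id": "1", "sentence_id": "10", "paraphrase": "Hello.",
     "lists": [], "tags": [], "language": "en"},
    {"paraphrase_set_id": "1", "sentence_id": "11", "paraphrase": "Hi.",
     "lists": [], "tags": [], "language": "en"},
    {"paraphrase_set_id": "1", "sentence_id": "12", "paraphrase": "Hey.",
     "lists": [], "tags": [], "language": "en"},
    {"paraphrase_set_id": "2", "sentence_id": "20", "paraphrase": "Goodbye.",
     "lists": [], "tags": [], "language": "en"},
]

groups = group_paraphrases(sample)
pairs = paraphrase_pairs(groups)
```

A set with a single sentence contributes no pairs, so downstream code that trains on pairs may want to check set sizes first.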
af
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/af')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 307 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
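Each config on this page follows the same naming pattern, `huggingface:tapaco/<config>`. As a small sketch, the helper below builds that name and wraps the `tfds.load` call shown above. The function names `tapaco_name` and `load_tapaco` are hypothetical, and the wrapper assumes `tensorflow_datasets` is installed and that the `huggingface:` namespace is available in your installation.

```python
def tapaco_name(config):
    """Build the TFDS dataset name for a TaPaCo config such as 'af'."""
    return f"huggingface:tapaco/{config}"


def load_tapaco(config, split="train"):
    """Load one TaPaCo config; every config listed here has a 'train' split."""
    # Imported lazily so that tapaco_name() works without TensorFlow installed.
    import tensorflow_datasets as tfds

    return tfds.load(tapaco_name(config), split=split)


# Names for the first few configs listed on this page.
names = [tapaco_name(code) for code in ("all_languages", "af", "ar")]
```

Calling `load_tapaco("af")` would download and prepare the data on first use, so it is left out of the snippet itself.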
ar
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/ar')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 6446 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
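The description says the corpus is built by populating a graph with sentences and equivalence links, then traversing it to extract paraphrase sets. The sketch below illustrates that general idea with a union-find over hypothetical sentence IDs: linked sentences fall into one component, and sentences of the same language within a component form a paraphrase set. It is not the TaPaCo authors' implementation, and it omits the filtering and pruning steps the description mentions.

```python
from collections import defaultdict


def find(parent, x):
    """Return the representative of x's component, compressing the path."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def paraphrase_sets(sentences, links):
    """Group same-language sentences that are connected by equivalence links.

    sentences: dict mapping sentence id -> language code
    links: iterable of (id_a, id_b) pairs meaning "same meaning"
    """
    parent = {sid: sid for sid in sentences}
    for a, b in links:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Bucket sentences by (component, language); keep buckets with 2+ members.
    buckets = defaultdict(list)
    for sid, lang in sentences.items():
        buckets[(find(parent, sid), lang)].append(sid)
    return sorted(sorted(ids) for ids in buckets.values() if len(ids) > 1)


# Hypothetical toy graph: two English sentences linked to one French sentence.
toy_sentences = {1: "en", 2: "en", 3: "fr", 4: "en", 5: "fr"}
toy_links = [(1, 3), (2, 3), (4, 5)]
toy_sets = paraphrase_sets(toy_sentences, toy_links)
```

In this toy graph, sentences 1 and 2 become paraphrases because both are linked to sentence 3, while 4 and 5 share a component but differ in language and so form no set.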
az
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/az')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 624 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
be
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/be')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 1512 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
ber
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/ber')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 67484 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
bg
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/bg')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 6324 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
bn
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/bn')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 1440 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
br
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/br')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 2536 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
ca
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/ca')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 518 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cbk
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/cbk')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 262 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cmn
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/cmn')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 12549 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cs
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/cs')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 6659 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
da
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/da')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 11220 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
de
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/de')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 125091 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
el
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/el')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 10072 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/en')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 158053 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
eo
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/eo')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 207105 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
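The `paraphrase_set_id` field ties together sentences that belong to the same paraphrase set. The following is a minimal sketch of regrouping flat records into sets. It assumes the records are already available as plain Python dicts with the fields listed above; the helper name `group_paraphrases` and the sample records are hypothetical and not taken from the corpus.

```python
from collections import defaultdict


def group_paraphrases(records):
    """Map each paraphrase_set_id to the paraphrase strings that share it."""
    groups = defaultdict(list)
    for record in records:
        groups[record["paraphrase_set_id"]].append(record["paraphrase"])
    return dict(groups)


# Hypothetical records shaped like the feature schema above (not real corpus data).
sample_records = [
    {"paraphrase_set_id": "1", "sentence_id": "101", "paraphrase": "Saluton!",
     "lists": [], "tags": [], "language": "eo"},
    {"paraphrase_set_id": "1", "sentence_id": "102", "paraphrase": "Bonan tagon!",
     "lists": [], "tags": [], "language": "eo"},
    {"paraphrase_set_id": "2", "sentence_id": "103", "paraphrase": "Dankon.",
     "lists": [], "tags": [], "language": "eo"},
]

paraphrase_sets = group_paraphrases(sample_records)
```

Grouping on the set identifier rather than on sentence text keeps sentences with identical wording but different set membership apart.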
es
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/es')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 85064 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
et
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/et')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 241 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
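The feature specification above declares four string fields and two variable-length sequences of strings (`"length": -1`). As a rough illustration, a record can be checked against that shape as follows. This sketch assumes records have been decoded into plain Python dicts with `str` values; the helper name `matches_schema` and the sample record are hypothetical.

```python
# Field names copied from the feature specification above.
STRING_FIELDS = ("paraphrase_set_id", "sentence_id", "paraphrase", "language")
SEQUENCE_FIELDS = ("lists", "tags")


def matches_schema(record):
    """Return True if the record has every field with the expected Python type."""
    for name in STRING_FIELDS:
        if not isinstance(record.get(name), str):
            return False
    for name in SEQUENCE_FIELDS:
        value = record.get(name)
        # Sequences have no fixed length, so an empty list is valid.
        if not isinstance(value, list):
            return False
        if not all(isinstance(item, str) for item in value):
            return False
    return True


# Hypothetical record (placeholder values, not real corpus data).
sample_record = {
    "paraphrase_set_id": "1",
    "sentence_id": "101",
    "paraphrase": "example sentence",
    "lists": [],
    "tags": ["example tag"],
    "language": "et",
}
is_valid = matches_schema(sample_record)
```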
eu
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/eu')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 573 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
fi
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/fi')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 31753 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
fr
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/fr')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 116733 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
gl
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/gl')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 351 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
gos
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/gos')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 279 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
he
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/he')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 68350 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
hi
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/hi')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 1913 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
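Because the corpus is organized as sets of paraphrases, one common way to use it is to expand each set into sentence pairs. A minimal sketch of that expansion follows, assuming the sentences of one set have already been collected into a list. The helper name `paraphrase_pairs` and the placeholder strings are hypothetical; a set of n sentences yields n(n-1)/2 unordered pairs.

```python
from itertools import combinations


def paraphrase_pairs(sentences):
    """Return every unordered pair of sentences from one paraphrase set."""
    return list(combinations(sentences, 2))


# Hypothetical paraphrase set (placeholder strings, not real corpus data).
example_set = ["sentence A", "sentence B", "sentence C"]
pairs = paraphrase_pairs(example_set)
```

Since the description notes that only a portion of inferred paraphrases were judged correct in the manual evaluation, pairs produced this way may deserve filtering before they are used as training data.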
hr
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/hr')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 505 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
hu
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/hu')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 67964 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
hy
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/hy')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 603 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
ia
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/ia')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 2548 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
id
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/id')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 1602 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
ie
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/ie')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 488 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
io
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/io')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 480 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
is
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/is')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 1641 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
it
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/it')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 198919 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
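The description states that a graph of sentences and equivalence links is traversed to extract sets of paraphrases. One simple way to read this is as a connected-components search; the sketch below illustrates that reading only. It is an assumption, not the corpus authors' pipeline, and it leaves out the filtering and pruning steps the description mentions. The function name and the sentence identifiers are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple


def paraphrase_sets(links: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Return connected components of an undirected equivalence graph."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for left, right in links:
        adjacency[left].add(right)
        adjacency[right].add(left)

    seen: Set[str] = set()
    components: List[List[str]] = []
    for start in list(adjacency):
        if start in seen:
            continue
        # Breadth-first traversal from an unvisited sentence.
        seen.add(start)
        queue = deque([start])
        component: List[str] = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(component))
    return components


# Hypothetical equivalence links between sentence identifiers.
example_links = [("s1", "s2"), ("s2", "s3"), ("s4", "s5")]
example_sets = paraphrase_sets(example_links)
```

Here `s1`, `s2` and `s3` end up in one set because the links chain them together, while `s4` and `s5` form a second set.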
ja
To load this dataset in TFDS, use the following command:
ds = tfds.load('huggingface:tapaco/ja')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 44267 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
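The feature listing declares four string fields and two variable-length sequences of strings (`"length": -1`). A minimal sketch of a check for plain Python records against that layout might look as follows; the helper name `matches_schema` and the sample records are hypothetical, and the check assumes values have already been converted to `str` and `list`.

```python
from typing import Mapping

# Field names taken from the feature listing above.
STRING_FIELDS = ("paraphrase_set_id", "sentence_id", "paraphrase", "language")
SEQUENCE_FIELDS = ("lists", "tags")


def matches_schema(record: Mapping[str, object]) -> bool:
    """Return True if the record has the expected fields with the expected types."""
    for name in STRING_FIELDS:
        if not isinstance(record.get(name), str):
            return False
    for name in SEQUENCE_FIELDS:
        value = record.get(name)
        if not isinstance(value, (list, tuple)):
            return False
        if not all(isinstance(item, str) for item in value):
            return False
    return True


# Hypothetical records (not real corpus data).
good_record = {"paraphrase_set_id": "7", "sentence_id": "42", "paraphrase": "example",
               "lists": ["a"], "tags": [], "language": "ja"}
bad_record = {"paraphrase_set_id": 7, "sentence_id": "42", "paraphrase": "example",
              "lists": [], "tags": [], "language": "ja"}

good_ok = matches_schema(good_record)
bad_ok = matches_schema(bad_record)
```

The second record fails because its `paraphrase_set_id` is an integer, whereas the schema declares it as a string.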
jbo
To load this dataset in TFDS, use the following command:
ds = tfds.load('huggingface:tapaco/jbo')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 2704 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
kab
To load this dataset in TFDS, use the following command:
ds = tfds.load('huggingface:tapaco/kab')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 15944 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
ko
To load this dataset in TFDS, use the following command:
ds = tfds.load('huggingface:tapaco/ko')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 503 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
kw
To load this dataset in TFDS, use the following command:
ds = tfds.load('huggingface:tapaco/kw')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 1328 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
la
To load this dataset in TFDS, use the following command:
ds = tfds.load('huggingface:tapaco/la')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 6889 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
lfn
To load this dataset in TFDS, use the following command:
ds = tfds.load('huggingface:tapaco/lfn')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 2313 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
lt
To load this dataset in TFDS, use the following command:
ds = tfds.load('huggingface:tapaco/lt')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 8042 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
mk
To load this dataset in TFDS, use the following command:
ds = tfds.load('huggingface:tapaco/mk')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 14678 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
mr
To load this dataset in TFDS, use the following command:
ds = tfds.load('huggingface:tapaco/mr')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 16413 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
nb
To load this dataset in TFDS, use the following command:
ds = tfds.load('huggingface:tapaco/nb')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 1094 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
nds
To load this dataset in TFDS, use the following command:
ds = tfds.load('huggingface:tapaco/nds')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 2633 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
nl
To load this dataset in TFDS, use the following command:
ds = tfds.load('huggingface:tapaco/nl')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 23561 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
orv
To load this dataset in TFDS, use the following command:
ds = tfds.load('huggingface:tapaco/orv')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 471 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
ota
To load this dataset in TFDS, use the following command:
ds = tfds.load('huggingface:tapaco/ota')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 486 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
pes
To load this dataset in TFDS, use the following command:
ds = tfds.load('huggingface:tapaco/pes')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 4285 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
pl
To load this dataset in TFDS, use the following command:
ds = tfds.load('huggingface:tapaco/pl')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 22391 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
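The feature schema above suggests that rows sharing a `paraphrase_set_id` belong to the same paraphrase set. A minimal sketch of regrouping rows into sets, assuming each row has already been converted to a plain Python dict; the sample rows and the helper name `group_by_set` are hypothetical and not taken from the corpus:

```python
from collections import defaultdict

# Hypothetical rows shaped like the feature schema; the values are invented
# placeholders, not sentences from the corpus.
records = [
    {"paraphrase_set_id": "1", "sentence_id": "10",
     "paraphrase": "placeholder sentence A",
     "lists": [], "tags": [], "language": "pl"},
    {"paraphrase_set_id": "1", "sentence_id": "11",
     "paraphrase": "placeholder sentence B",
     "lists": [], "tags": [], "language": "pl"},
    {"paraphrase_set_id": "2", "sentence_id": "12",
     "paraphrase": "placeholder sentence C",
     "lists": [], "tags": [], "language": "pl"},
]


def group_by_set(rows):
    """Map each paraphrase_set_id to the list of its paraphrase strings."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["paraphrase_set_id"]].append(row["paraphrase"])
    return dict(groups)


paraphrase_sets = group_by_set(records)
```

Grouping on the set identifier keeps the original row order within each set, which may help when inspecting a set by hand.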
pt
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/pt')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 78430 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
rn
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/rn')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 648 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
ro
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/ro')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 2092 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
ru
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/ru')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 251263 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
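Paraphrase corpora are often consumed as sentence pairs rather than as sets. A rough sketch of expanding one paraphrase set into all unordered pairs, assuming the set is already available as a list of strings; the sample strings and the helper name `paraphrase_pairs` are hypothetical:

```python
from itertools import combinations

# Hypothetical paraphrase set; invented placeholders, not corpus sentences.
paraphrase_set = [
    "placeholder sentence A",
    "placeholder sentence B",
    "placeholder sentence C",
    "placeholder sentence A",  # duplicate, dropped below
]


def paraphrase_pairs(sentences):
    """Return every unordered pair of distinct sentences in one set."""
    unique = sorted(set(sentences))  # deduplicate and fix a stable order
    return list(combinations(unique, 2))


pairs = paraphrase_pairs(paraphrase_set)
```

A set with n distinct sentences yields n(n-1)/2 pairs, so very large sets may need sampling rather than full expansion.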
sl
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/sl')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 706 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
sr
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/sr')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 8175 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
sv
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/sv')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 7005 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
tk
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/tk')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 1165 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
tl
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/tl')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 1017 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
tlh
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/tlh')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 2804 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
toki
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/toki')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 3738 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
tr
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/tr')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 142088 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
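The same feature schema is listed for every language configuration: four string fields plus two variable-length sequences of strings (`lists` and `tags`). A minimal sketch of checking a row against that shape, assuming rows are plain Python dicts with `str` values; the helper name `is_valid_record` and the sample rows are hypothetical:

```python
# Field names come from the feature schema; the grouping into two tuples is
# only a convenience for this sketch.
STRING_FIELDS = ("paraphrase_set_id", "sentence_id", "paraphrase", "language")
SEQUENCE_FIELDS = ("lists", "tags")


def is_valid_record(record):
    """Return True if the dict has exactly the schema's fields and types."""
    if set(record) != set(STRING_FIELDS) | set(SEQUENCE_FIELDS):
        return False
    if not all(isinstance(record[name], str) for name in STRING_FIELDS):
        return False
    return all(
        isinstance(record[name], list)
        and all(isinstance(item, str) for item in record[name])
        for name in SEQUENCE_FIELDS
    )


# Invented placeholder rows, not taken from the corpus.
good_row = {"paraphrase_set_id": "1", "sentence_id": "10",
            "paraphrase": "placeholder sentence", "lists": [],
            "tags": ["placeholder tag"], "language": "tr"}
bad_row = dict(good_row, tags="placeholder tag")  # tags must be a list
```

Rows loaded through a framework may arrive as bytes or tensors rather than `str`, in which case they would need decoding before a check like this applies.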
tt
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/tt')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 2398 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
ug
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/ug')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 1183 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
uk
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/uk')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 54431 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
ur
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/ur')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 252 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
vi
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/vi')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with 200 – 250 000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License: Creative Commons Attribution 2.0 Generic
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'train' | 962 |
- Features:
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
vo
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/vo')
- Description:
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with between 200 and 250,000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License : Creative Commons Attribution 2.0 Generic
- Version : 1.0.0
- Splits :
Split | Examples |
---|---|
'train' | 328 |
- Features :
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
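The description above says that paraphrase sets are extracted by traversing a graph of sentences and equivalence links. One common way to perform this kind of traversal is to compute connected components, and the sketch below does so with a union-find structure. This is an illustrative assumption, not the corpus authors' published procedure: the function name, the treatment of links as undirected, and the dropping of single-sentence components are all choices made here, and the filtering and pruning steps mentioned in the description are not modeled.

```python
def extract_paraphrase_sets(sentence_ids, links):
    """Group sentence IDs into sets connected by equivalence links.

    sentence_ids: iterable of hashable IDs (hypothetical input format).
    links: iterable of (id_a, id_b) pairs, treated as undirected.
    """
    sentence_ids = list(sentence_ids)
    parent = {s: s for s in sentence_ids}

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    components = {}
    for s in sentence_ids:
        components.setdefault(find(s), set()).add(s)

    # Assumption: a component with a single sentence holds no paraphrase.
    return [c for c in components.values() if len(c) > 1]


example_sets = extract_paraphrase_sets(
    ["a", "b", "c", "d"],
    [("a", "b"), ("b", "c")],
)
```

In this toy input, "a", "b" and "c" end up in one set because they are linked transitively, while "d" has no link and is dropped.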
war
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/war')
- Description :
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with between 200 and 250,000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License : Creative Commons Attribution 2.0 Generic
- Version : 1.0.0
- Splits :
Split | Examples |
---|---|
'train' | 327 |
- Features :
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
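Because every record carries a paraphrase_set_id, sentences that share an ID can be expanded into sentence pairs. The sketch below assumes the records have already been grouped into a dict mapping each set ID to a list of sentences; that intermediate structure and the function name paraphrase_pairs are hypothetical and are not something the dataset provides directly.

```python
from itertools import combinations


def paraphrase_pairs(paraphrase_sets):
    """Yield (set_id, sentence_a, sentence_b) for each unordered pair.

    paraphrase_sets: dict mapping a set ID to a list of sentences
    (a hypothetical intermediate structure built from the records).
    """
    for set_id, sentences in paraphrase_sets.items():
        # combinations() yields nothing for sets with fewer than two items.
        for a, b in combinations(sentences, 2):
            yield set_id, a, b


example_pairs = list(paraphrase_pairs({"1": ["x", "y", "z"], "2": ["w"]}))
```

A set of n sentences produces n * (n - 1) / 2 pairs, so large sets can dominate the output; capping or sampling pairs per set may be worth considering.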
wuu
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/wuu')
- Description :
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with between 200 and 250,000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License : Creative Commons Attribution 2.0 Generic
- Version : 1.0.0
- Splits :
Split | Examples |
---|---|
'train' | 408 |
- Features :
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
yue
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:tapaco/yue')
- Description :
A freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a
crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular
linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links
between sentences “meaning the same thing”. This graph is then traversed to extract sets of paraphrases. Several language-independent
filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows
that between half and three quarters of inferred paraphrases are correct and that most remaining ones are either correct but trivial, or
near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with between 200 and 250,000
sentences per language. It covers a range of languages for which, to our knowledge, no other paraphrase dataset exists.
- License : Creative Commons Attribution 2.0 Generic
- Version : 1.0.0
- Splits :
Split | Examples |
---|---|
'train' | 561 |
- Features :
{
"paraphrase_set_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"paraphrase": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"lists": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"tags": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"language": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
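The feature listing above declares four string fields and two variable-length sequences of strings. As a rough sanity check, the sketch below tests whether a plain Python dict matches that shape. The function and constant names are assumptions introduced for illustration, and the check presumes values have already been decoded to Python str and list objects rather than tensors or bytes.

```python
STRING_FIELDS = ("paraphrase_set_id", "sentence_id", "paraphrase", "language")
SEQUENCE_FIELDS = ("lists", "tags")


def matches_schema(record):
    """Return True if record has exactly the six fields with expected types."""
    if set(record) != set(STRING_FIELDS) | set(SEQUENCE_FIELDS):
        return False
    if not all(isinstance(record[name], str) for name in STRING_FIELDS):
        return False
    # Each sequence field must be a list or tuple containing only strings.
    return all(
        isinstance(record[name], (list, tuple))
        and all(isinstance(item, str) for item in record[name])
        for name in SEQUENCE_FIELDS
    )


# Hypothetical record with placeholder values, not a row from the corpus.
sample = {
    "paraphrase_set_id": "1",
    "sentence_id": "101",
    "paraphrase": "sentence A",
    "lists": [],
    "tags": ["placeholder tag"],
    "language": "yue",
}
sample_ok = matches_schema(sample)
```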