- Description:
Web-Scale Parallel Corpora for Official European Languages.
Additional Documentation: Explore on Papers With Code
Homepage: https://paracrawl.eu/releases.html
Source code: tfds.datasets.para_crawl.Builder
Versions:
1.2.0 (default): No release notes.
Figure (tfds.show_examples): Not supported.
Citation:
@misc {paracrawl,
title = "ParaCrawl",
year = "2018",
url = "http://paracrawl.eu/download.html."
}
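Every config on this page follows the naming pattern para_crawl/en<code>, where <code> is the target-language code. The following is a minimal sketch of loading one pair through tfds.load, assuming tensorflow_datasets is installed. The helper names config_name and load_pair are hypothetical and not part of TFDS; the import is deferred so the naming logic can be checked without the library.

```python
# Target-language codes of the configs documented on this page.
TARGET_LANGS = (
    "bg", "cs", "da", "de", "el", "es", "et", "fi", "fr", "ga", "hr", "hu",
    "it", "lt", "lv", "mt", "nl", "pl", "pt", "ro", "sk", "sl", "sv",
)


def config_name(target_lang: str) -> str:
    """Build the full dataset name, e.g. 'para_crawl/enbg' (hypothetical helper)."""
    if target_lang not in TARGET_LANGS:
        raise ValueError(f"no para_crawl config for target language {target_lang!r}")
    return f"para_crawl/en{target_lang}"


def load_pair(target_lang: str, split: str = "train"):
    """Load one language pair; downloads and prepares the data on first use."""
    import tensorflow_datasets as tfds  # deferred: only needed when loading

    return tfds.load(config_name(target_lang), split=split)
```

Calling load_pair("bg") would return a tf.data.Dataset of feature dicts with 'bg' and 'en' keys. Only the 'train' split is listed for these configs.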
para_crawl/enbg (default config)
Config description: Translation dataset from English to bg.
Download size: 98.94 MiB
Dataset size: 362.46 MiB
Auto-cached (documentation): No
Splits:
Split | Examples |
---|---|
'train' | 1,039,885 |
- Feature structure:
Translation({
'bg': Text(shape=(), dtype=string),
'en': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | Translation | | | |
| bg | Text | | string | |
| en | Text | | string | |

Supervised keys (See as_supervised doc): ('en', 'bg')
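The supervised keys give the order in which fields are returned when as_supervised=True is passed to tfds.load: each example becomes an (input, target) tuple instead of a feature dict. The plain-Python sketch below only mimics that reordering on a single record; to_supervised is a hypothetical helper and the sample sentence pair is made up for illustration.

```python
def to_supervised(example: dict, keys: tuple = ("en", "bg")) -> tuple:
    """Turn a feature dict into an (input, target) tuple in supervised-key order."""
    source_key, target_key = keys
    return example[source_key], example[target_key]


# Hypothetical record, not taken from the dataset.
record = {"bg": "Здравей, свят!", "en": "Hello, world!"}
pair = to_supervised(record)
```

Here pair holds the English text first and the Bulgarian text second, matching ('en', 'bg').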
Examples (tfds.as_dataframe):
para_crawl/encs
Config description: Translation dataset from English to cs.
Download size: 187.31 MiB
Dataset size: 666.34 MiB
Auto-cached (documentation): No
Splits:
Split | Examples |
---|---|
'train' | 2,981,949 |
- Feature structure:
Translation({
'cs': Text(shape=(), dtype=string),
'en': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | Translation | | | |
| cs | Text | | string | |
| en | Text | | string | |

Supervised keys (See as_supervised doc): ('en', 'cs')
Examples (tfds.as_dataframe):
para_crawl/enda
Config description: Translation dataset from English to da.
Download size: 174.34 MiB
Dataset size: 619.77 MiB
Auto-cached (documentation): No
Splits:
Split | Examples |
---|---|
'train' | 2,414,895 |
- Feature structure:
Translation({
'da': Text(shape=(), dtype=string),
'en': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | Translation | | | |
| da | Text | | string | |
| en | Text | | string | |

Supervised keys (See as_supervised doc): ('en', 'da')
Examples (tfds.as_dataframe):
para_crawl/ende
Config description: Translation dataset from English to de.
Download size: 1.22 GiB
Dataset size: 4.04 GiB
Auto-cached (documentation): No
Splits:
Split | Examples |
---|---|
'train' | 16,264,448 |
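The download size, dataset size, and example count can be combined into rough planning figures, such as how much the prepared data grows relative to the download and how many bytes an average example occupies on disk. The sketch below uses the en-de numbers listed above; parse_size is a hypothetical helper and the results are only as precise as the rounded sizes on this page.

```python
# Binary unit multipliers used by the sizes on this page.
UNITS = {"KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}


def parse_size(text: str) -> float:
    """Convert a string such as '4.04 GiB' into a byte count (hypothetical helper)."""
    value, unit = text.split()
    return float(value) * UNITS[unit]


dataset_bytes = parse_size("4.04 GiB")
download_bytes = parse_size("1.22 GiB")
train_examples = 16_264_448

# Ratio of prepared dataset size to download size (roughly 3.3 for en-de).
expansion = dataset_bytes / download_bytes
# Average on-disk bytes per sentence pair (a few hundred bytes for en-de).
bytes_per_example = dataset_bytes / train_examples
```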
- Feature structure:
Translation({
'de': Text(shape=(), dtype=string),
'en': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | Translation | | | |
| de | Text | | string | |
| en | Text | | string | |

Supervised keys (See as_supervised doc): ('en', 'de')
Examples (tfds.as_dataframe):
para_crawl/enel
Config description: Translation dataset from English to el.
Download size: 184.59 MiB
Dataset size: 698.75 MiB
Auto-cached (documentation): No
Splits:
Split | Examples |
---|---|
'train' | 1,985,233 |
- Feature structure:
Translation({
'el': Text(shape=(), dtype=string),
'en': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | Translation | | | |
| el | Text | | string | |
| en | Text | | string | |

Supervised keys (See as_supervised doc): ('en', 'el')
Examples (tfds.as_dataframe):
para_crawl/enes
Config description: Translation dataset from English to es.
Download size: 1.82 GiB
Dataset size: 6.23 GiB
Auto-cached (documentation): No
Splits:
Split | Examples |
---|---|
'train' | 21,987,267 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'es': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | Translation | | | |
| en | Text | | string | |
| es | Text | | string | |

Supervised keys (See as_supervised doc): ('en', 'es')
Examples (tfds.as_dataframe):
para_crawl/enet
Config description: Translation dataset from English to et.
Download size: 66.91 MiB
Dataset size: 209.16 MiB
Auto-cached (documentation): Only when shuffle_files=False (train)
Splits:
Split | Examples |
---|---|
'train' | 853,422 |
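The "Auto-cached" values on this page appear consistent with the rule described in the TFDS performance guide: a dataset is cached automatically when its total size is known and under 250 MiB, and either shuffle_files is disabled or only a single shard is read. The sketch below encodes that rule as an assumption; the threshold constant, the function name, and the shard counts in the usage lines are illustrative and not taken from this page.

```python
# Assumed auto-cache size limit in MiB, per the TFDS performance guide.
AUTO_CACHE_LIMIT_MIB = 250


def auto_cache_expected(dataset_size_mib: float, shuffle_files: bool, num_shards: int) -> bool:
    """Return True when the assumed auto-cache rule would apply (hypothetical helper)."""
    small_enough = dataset_size_mib < AUTO_CACHE_LIMIT_MIB
    # File order is stable if shuffling is off or there is only one shard to read.
    stable_order = (not shuffle_files) or num_shards == 1
    return small_enough and stable_order


# en-et is 209.16 MiB; the shard count of 2 is a made-up value for illustration.
enet_unshuffled = auto_cache_expected(209.16, shuffle_files=False, num_shards=2)
enet_shuffled = auto_cache_expected(209.16, shuffle_files=True, num_shards=2)
```

Under these assumptions the unshuffled case is cached and the shuffled multi-shard case is not, which matches the "Only when shuffle_files=False" note for this config.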
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'et': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | Translation | | | |
| en | Text | | string | |
| et | Text | | string | |

Supervised keys (See as_supervised doc): ('en', 'et')
Examples (tfds.as_dataframe):
para_crawl/enfi
Config description: Translation dataset from English to fi.
Download size: 151.83 MiB
Dataset size: 543.85 MiB
Auto-cached (documentation): No
Splits:
Split | Examples |
---|---|
'train' | 2,156,069 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'fi': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | Translation | | | |
| en | Text | | string | |
| fi | Text | | string | |

Supervised keys (See as_supervised doc): ('en', 'fi')
Examples (tfds.as_dataframe):
para_crawl/enfr
Config description: Translation dataset from English to fr.
Download size: 2.63 GiB
Dataset size: 9.04 GiB
Auto-cached (documentation): No
Splits:
Split | Examples |
---|---|
'train' | 31,374,161 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'fr': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | Translation | | | |
| en | Text | | string | |
| fr | Text | | string | |

Supervised keys (See as_supervised doc): ('en', 'fr')
Examples (tfds.as_dataframe):
para_crawl/enga
Config description: Translation dataset from English to ga.
Download size: 28.03 MiB
Dataset size: 107.09 MiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'train' | 357,399 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'ga': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | Translation | | | |
| en | Text | | string | |
| ga | Text | | string | |

Supervised keys (See as_supervised doc): ('en', 'ga')
Examples (tfds.as_dataframe):
para_crawl/enhr
Config description: Translation dataset from English to hr.
Download size: 80.97 MiB
Dataset size: 256.37 MiB
Auto-cached (documentation): No
Splits:
Split | Examples |
---|---|
'train' | 1,002,053 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'hr': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | Translation | | | |
| en | Text | | string | |
| hr | Text | | string | |

Supervised keys (See as_supervised doc): ('en', 'hr')
Examples (tfds.as_dataframe):
para_crawl/enhu
Config description: Translation dataset from English to hu.
Download size: 114.24 MiB
Dataset size: 421.40 MiB
Auto-cached (documentation): No
Splits:
Split | Examples |
---|---|
'train' | 1,901,342 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'hu': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | Translation | | | |
| en | Text | | string | |
| hu | Text | | string | |

Supervised keys (See as_supervised doc): ('en', 'hu')
Examples (tfds.as_dataframe):
para_crawl/enit
Config description: Translation dataset from English to it.
Download size: 1017.30 MiB
Dataset size: 3.36 GiB
Auto-cached (documentation): No
Splits:
Split | Examples |
---|---|
'train' | 12,162,239 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'it': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | Translation | | | |
| en | Text | | string | |
| it | Text | | string | |

Supervised keys (See as_supervised doc): ('en', 'it')
Examples (tfds.as_dataframe):
para_crawl/enlt
Config description: Translation dataset from English to lt.
Download size: 63.28 MiB
Dataset size: 204.70 MiB
Auto-cached (documentation): Only when shuffle_files=False (train)
Splits:
Split | Examples |
---|---|
'train' | 844,643 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'lt': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | Translation | | | |
| en | Text | | string | |
| lt | Text | | string | |

Supervised keys (See as_supervised doc): ('en', 'lt')
Examples (tfds.as_dataframe):
para_crawl/enlv
Config description: Translation dataset from English to lv.
Download size: 45.17 MiB
Dataset size: 147.09 MiB
Auto-cached (documentation): Only when shuffle_files=False (train)
Splits:
Split | Examples |
---|---|
'train' | 553,060 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'lv': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | Translation | | | |
| en | Text | | string | |
| lv | Text | | string | |

Supervised keys (See as_supervised doc): ('en', 'lv')
Examples (tfds.as_dataframe):
para_crawl/enmt
Config description: Translation dataset from English to mt.
Download size: 18.15 MiB
Dataset size: 54.36 MiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'train' | 195,502 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'mt': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | Translation | | | |
| en | Text | | string | |
| mt | Text | | string | |

Supervised keys (See as_supervised doc): ('en', 'mt')
Examples (tfds.as_dataframe):
para_crawl/ennl
Config description: Translation dataset from English to nl.
Download size: 400.63 MiB
Dataset size: 1.40 GiB
Auto-cached (documentation): No
Splits:
Split | Examples |
---|---|
'train' | 5,659,268 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'nl': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | Translation | | | |
| en | Text | | string | |
| nl | Text | | string | |

Supervised keys (See as_supervised doc): ('en', 'nl')
Examples (tfds.as_dataframe):
para_crawl/enpl
Config description: Translation dataset from English to pl.
Download size: 257.90 MiB
Dataset size: 885.63 MiB
Auto-cached (documentation): No
Splits:
Split | Examples |
---|---|
'train' | 3,503,276 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'pl': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | Translation | | | |
| en | Text | | string | |
| pl | Text | | string | |

Supervised keys (See as_supervised doc): ('en', 'pl')
Examples (tfds.as_dataframe):
para_crawl/enpt
Config description: Translation dataset from English to pt.
Download size: 608.62 MiB
Dataset size: 2.05 GiB
Auto-cached (documentation): No
Splits:
Split | Examples |
---|---|
'train' | 8,141,940 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'pt': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | Translation | | | |
| en | Text | | string | |
| pt | Text | | string | |

Supervised keys (See as_supervised doc): ('en', 'pt')
Examples (tfds.as_dataframe):
para_crawl/enro
Config description: Translation dataset from English to ro.
Download size: 153.24 MiB
Dataset size: 534.34 MiB
Auto-cached (documentation): No
Splits:
Split | Examples |
---|---|
'train' | 1,952,043 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'ro': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | Translation | | | |
| en | Text | | string | |
| ro | Text | | string | |

Supervised keys (See as_supervised doc): ('en', 'ro')
Examples (tfds.as_dataframe):
para_crawl/ensk
Config description: Translation dataset from English to sk.
Download size: 96.61 MiB
Dataset size: 352.91 MiB
Auto-cached (documentation): No
Splits:
Split | Examples |
---|---|
'train' | 1,591,831 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'sk': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | Translation | | | |
| en | Text | | string | |
| sk | Text | | string | |

Supervised keys (See as_supervised doc): ('en', 'sk')
Examples (tfds.as_dataframe):
para_crawl/ensl
Config description: Translation dataset from English to sl.
Download size: 62.02 MiB
Dataset size: 187.66 MiB
Auto-cached (documentation): Only when shuffle_files=False (train)
Splits:
Split | Examples |
---|---|
'train' | 660,161 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'sl': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | Translation | | | |
| en | Text | | string | |
| sl | Text | | string | |

Supervised keys (See as_supervised doc): ('en', 'sl')
Examples (tfds.as_dataframe):
para_crawl/ensv
Config description: Translation dataset from English to sv.
Download size: 262.76 MiB
Dataset size: 905.72 MiB
Auto-cached (documentation): No
Splits:
Split | Examples |
---|---|
'train' | 3,476,729 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'sv': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | Translation | | | |
| en | Text | | string | |
| sv | Text | | string | |

Supervised keys (See as_supervised doc): ('en', 'sv')
Examples (tfds.as_dataframe):