dominio_rastreo
Organízate con las colecciones
Guarda y clasifica el contenido según tus preferencias.
Referencias:
Utilice el siguiente comando para cargar este conjunto de datos en TFDS:
ds = tfds.load('huggingface:crawl_domain')
Corpus of domain names scraped from Common Crawl and manually annotated to add word boundaries (e.g. "commoncrawl" to "common crawl"). Breaking domain names such as "openresearch" into component words "open" and "research" is important for applications such as Text-to-Speech synthesis and web search. Common Crawl is an open repository of web crawl data that can be accessed and analyzed by anyone. Specifically, we scraped the plaintext (WET) extracts for domain names from URLs that contained diverse letter casing (e.g. "OpenBSD"). Although in the previous example, segmentation is trivial using letter casing, this was not always the case (e.g. "NASA"), so we had to manually annotate the data. The dataset is stored as plaintext file where each line is an example of space separated segments of a domain name. The examples are stored in their original letter casing, but harder and more interesting examples can be generated by lowercasing the input first.
- Licencia : Licencia MIT
- Versión : 1.0.0
- Divisiones :
Dividir | Ejemplos |
---|
'test' | 2170 |
'train' | 17572 |
'validation' | 1953 |
{
"example": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
A menos que se indique lo contrario, el contenido de esta página está sujeto a la licencia Reconocimiento 4.0 de Creative Commons y las muestras de código están sujetas a la licencia Apache 2.0. Para obtener más información, consulta las políticas del sitio web de Google Developers. Java es una marca registrada de Oracle o sus afiliados.
Última actualización: 2024-08-29 (UTC).
[{
"type": "thumb-down",
"id": "missingTheInformationINeed",
"label":"Me falta la información que necesito"
},{
"type": "thumb-down",
"id": "tooComplicatedTooManySteps",
"label":"Es demasiado complicado o hay demasiados pasos"
},{
"type": "thumb-down",
"id": "outOfDate",
"label":"Está obsoleto"
},{
"type": "thumb-down",
"id": "translationIssue",
"label":"Problema de traducción"
},{
"type": "thumb-down",
"id": "samplesCodeIssue",
"label":"Problema de muestras o código"
},{
"type": "thumb-down",
"id": "otherDown",
"label":"Otro"
}]
[{
"type": "thumb-up",
"id": "easyToUnderstand",
"label":"Es fácil de entender"
},{
"type": "thumb-up",
"id": "solvedMyProblem",
"label":"Me ofreció una solución al problema"
},{
"type": "thumb-up",
"id": "otherUp",
"label":"Otro"
}]
{"lastModified": "\u00daltima actualizaci\u00f3n: 2024-08-29 (UTC)."}
[[["Es fácil de entender","easyToUnderstand","thumb-up"],["Me ofreció una solución al problema","solvedMyProblem","thumb-up"],["Otro","otherUp","thumb-up"]],[["Me falta la información que necesito","missingTheInformationINeed","thumb-down"],["Es demasiado complicado o hay demasiados pasos","tooComplicatedTooManySteps","thumb-down"],["Está obsoleto","outOfDate","thumb-down"],["Problema de traducción","translationIssue","thumb-down"],["Problema de muestras o código","samplesCodeIssue","thumb-down"],["Otro","otherDown","thumb-down"]],["Última actualización: 2024-08-29 (UTC)."],[],[]]