amazon_reviews_multi

References:

all_languages

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:amazon_reviews_multi/all_languages')
  • Description :
We provide an Amazon product reviews dataset for multilingual text classification. The dataset contains reviews in English, Japanese, German, French, Chinese and Spanish, collected between November 1, 2015 and November 1, 2019. Each record in the dataset contains the review text, the review title, the star rating, an anonymized reviewer ID, an anonymized product ID and the coarse-grained product category (e.g. ‘books’, ‘appliances’, etc.) The corpus is balanced across stars, so each star rating constitutes 20% of the reviews in each language.

For each language, there are 200,000, 5,000 and 5,000 reviews in the training, development and test sets respectively. The maximum number of reviews per reviewer is 20 and the maximum number of reviews per product is 20. All reviews are truncated after 2,000 characters, and all reviews are at least 20 characters long.

Note that the language of a review does not necessarily match the language of its marketplace (e.g. reviews from amazon.de are primarily written in German, but could also be written in English, etc.). For this reason, we applied a language detection algorithm based on the work in Bojanowski et al. (2017) to determine the language of the review text and we removed reviews that were not written in the expected language.

In addition to the license rights granted under the Conditions of Use, Amazon or its content providers grant you a limited, non-exclusive, non-transferable, non-sublicensable, revocable license to access and use the Reviews Corpus for purposes of academic research. You may not resell, republish, or make any commercial use of the Reviews Corpus or its contents, including use of the Reviews Corpus for commercial research, such as research related to a funding or consultancy contract, internship, or other relationship in which the results are provided for a fee or delivered to a for-profit organization. You may not (a) link or associate content in the Reviews Corpus with any personal information (including Amazon customer accounts), or (b) attempt to determine the identity of the author of any content in the Reviews Corpus. If you violate any of the foregoing conditions, your license to access and use the Reviews Corpus will automatically terminate without prejudice to any of the other rights or remedies Amazon may have.

  • Version : 1.0.0
  • Splits :
Split Examples
'test' 30000
'train' 1200000
'validation' 30000
  • Features :
{
    "review_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "product_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "reviewer_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "stars": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "review_body": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "review_title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "language": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "product_category": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
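
As a rough illustration of the schema above, the sketch below mirrors the eight features in a plain Python dataclass and checks the length constraints stated in the description (reviews of at least 20 characters, truncated after 2,000). The `Review` class and `validate` helper are hypothetical names used only for illustration and are not part of TFDS or the dataset. The 1 to 5 star range is an assumption inferred from the five equally sized star classes.

```python
from dataclasses import dataclass
from typing import List

MIN_BODY_CHARS = 20     # "all reviews are at least 20 characters long"
MAX_BODY_CHARS = 2000   # "all reviews are truncated after 2,000 characters"


@dataclass
class Review:
    """Hypothetical container mirroring the eight documented features."""
    review_id: str
    product_id: str
    reviewer_id: str
    stars: int              # int32 in the schema; the 1..5 range is assumed
    review_body: str
    review_title: str
    language: str
    product_category: str


def validate(review: Review) -> List[str]:
    """Return a list of constraint violations (empty if there are none)."""
    problems = []
    if not 1 <= review.stars <= 5:
        problems.append("stars outside assumed 1..5 range")
    if len(review.review_body) < MIN_BODY_CHARS:
        problems.append("review_body shorter than 20 characters")
    if len(review.review_body) > MAX_BODY_CHARS:
        problems.append("review_body longer than 2,000 characters")
    return problems


# Made-up record for illustration; not taken from the dataset.
example = Review(
    review_id="r0", product_id="p0", reviewer_id="u0", stars=4,
    review_body="Works as described, would buy again.",
    review_title="Good", language="en", product_category="books",
)
print(validate(example))
```

A check like this could be run over a handful of loaded examples as a sanity test before training, since it only relies on the field names and limits documented here.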

de

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:amazon_reviews_multi/de')
  • Description :
We provide an Amazon product reviews dataset for multilingual text classification. The dataset contains reviews in English, Japanese, German, French, Chinese and Spanish, collected between November 1, 2015 and November 1, 2019. Each record in the dataset contains the review text, the review title, the star rating, an anonymized reviewer ID, an anonymized product ID and the coarse-grained product category (e.g. ‘books’, ‘appliances’, etc.) The corpus is balanced across stars, so each star rating constitutes 20% of the reviews in each language.

For each language, there are 200,000, 5,000 and 5,000 reviews in the training, development and test sets respectively. The maximum number of reviews per reviewer is 20 and the maximum number of reviews per product is 20. All reviews are truncated after 2,000 characters, and all reviews are at least 20 characters long.

Note that the language of a review does not necessarily match the language of its marketplace (e.g. reviews from amazon.de are primarily written in German, but could also be written in English, etc.). For this reason, we applied a language detection algorithm based on the work in Bojanowski et al. (2017) to determine the language of the review text and we removed reviews that were not written in the expected language.

In addition to the license rights granted under the Conditions of Use, Amazon or its content providers grant you a limited, non-exclusive, non-transferable, non-sublicensable, revocable license to access and use the Reviews Corpus for purposes of academic research. You may not resell, republish, or make any commercial use of the Reviews Corpus or its contents, including use of the Reviews Corpus for commercial research, such as research related to a funding or consultancy contract, internship, or other relationship in which the results are provided for a fee or delivered to a for-profit organization. You may not (a) link or associate content in the Reviews Corpus with any personal information (including Amazon customer accounts), or (b) attempt to determine the identity of the author of any content in the Reviews Corpus. If you violate any of the foregoing conditions, your license to access and use the Reviews Corpus will automatically terminate without prejudice to any of the other rights or remedies Amazon may have.

  • Version : 1.0.0
  • Splits :
Split Examples
'test' 5000
'train' 200000
'validation' 5000
  • Features :
{
    "review_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "product_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "reviewer_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "stars": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "review_body": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "review_title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "language": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "product_category": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
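
Because the description states that each star rating makes up 20% of the reviews in each language, the expected number of reviews per star rating in a single-language config follows from the split sizes. The sketch below works through that arithmetic. The variable names are illustrative, the five-way division assumes five distinct star values, and it further assumes the balance holds within each split, which the description does not state explicitly.

```python
# Split sizes for a single-language config, as listed in the splits table.
SPLIT_SIZES = {"train": 200_000, "validation": 5_000, "test": 5_000}

# Assumed: 20% per rating implies five distinct star values.
NUM_STAR_VALUES = 5

# Expected reviews per star rating in each split, if each split is balanced.
per_star = {split: size // NUM_STAR_VALUES for split, size in SPLIT_SIZES.items()}
print(per_star)
```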

en

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:amazon_reviews_multi/en')
  • Description :
We provide an Amazon product reviews dataset for multilingual text classification. The dataset contains reviews in English, Japanese, German, French, Chinese and Spanish, collected between November 1, 2015 and November 1, 2019. Each record in the dataset contains the review text, the review title, the star rating, an anonymized reviewer ID, an anonymized product ID and the coarse-grained product category (e.g. ‘books’, ‘appliances’, etc.) The corpus is balanced across stars, so each star rating constitutes 20% of the reviews in each language.

For each language, there are 200,000, 5,000 and 5,000 reviews in the training, development and test sets respectively. The maximum number of reviews per reviewer is 20 and the maximum number of reviews per product is 20. All reviews are truncated after 2,000 characters, and all reviews are at least 20 characters long.

Note that the language of a review does not necessarily match the language of its marketplace (e.g. reviews from amazon.de are primarily written in German, but could also be written in English, etc.). For this reason, we applied a language detection algorithm based on the work in Bojanowski et al. (2017) to determine the language of the review text and we removed reviews that were not written in the expected language.

In addition to the license rights granted under the Conditions of Use, Amazon or its content providers grant you a limited, non-exclusive, non-transferable, non-sublicensable, revocable license to access and use the Reviews Corpus for purposes of academic research. You may not resell, republish, or make any commercial use of the Reviews Corpus or its contents, including use of the Reviews Corpus for commercial research, such as research related to a funding or consultancy contract, internship, or other relationship in which the results are provided for a fee or delivered to a for-profit organization. You may not (a) link or associate content in the Reviews Corpus with any personal information (including Amazon customer accounts), or (b) attempt to determine the identity of the author of any content in the Reviews Corpus. If you violate any of the foregoing conditions, your license to access and use the Reviews Corpus will automatically terminate without prejudice to any of the other rights or remedies Amazon may have.

  • Version : 1.0.0
  • Splits :
Split Examples
'test' 5000
'train' 200000
'validation' 5000
  • Features :
{
    "review_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "product_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "reviewer_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "stars": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "review_body": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "review_title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "language": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "product_category": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

es

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:amazon_reviews_multi/es')
  • Description :
We provide an Amazon product reviews dataset for multilingual text classification. The dataset contains reviews in English, Japanese, German, French, Chinese and Spanish, collected between November 1, 2015 and November 1, 2019. Each record in the dataset contains the review text, the review title, the star rating, an anonymized reviewer ID, an anonymized product ID and the coarse-grained product category (e.g. ‘books’, ‘appliances’, etc.) The corpus is balanced across stars, so each star rating constitutes 20% of the reviews in each language.

For each language, there are 200,000, 5,000 and 5,000 reviews in the training, development and test sets respectively. The maximum number of reviews per reviewer is 20 and the maximum number of reviews per product is 20. All reviews are truncated after 2,000 characters, and all reviews are at least 20 characters long.

Note that the language of a review does not necessarily match the language of its marketplace (e.g. reviews from amazon.de are primarily written in German, but could also be written in English, etc.). For this reason, we applied a language detection algorithm based on the work in Bojanowski et al. (2017) to determine the language of the review text and we removed reviews that were not written in the expected language.

In addition to the license rights granted under the Conditions of Use, Amazon or its content providers grant you a limited, non-exclusive, non-transferable, non-sublicensable, revocable license to access and use the Reviews Corpus for purposes of academic research. You may not resell, republish, or make any commercial use of the Reviews Corpus or its contents, including use of the Reviews Corpus for commercial research, such as research related to a funding or consultancy contract, internship, or other relationship in which the results are provided for a fee or delivered to a for-profit organization. You may not (a) link or associate content in the Reviews Corpus with any personal information (including Amazon customer accounts), or (b) attempt to determine the identity of the author of any content in the Reviews Corpus. If you violate any of the foregoing conditions, your license to access and use the Reviews Corpus will automatically terminate without prejudice to any of the other rights or remedies Amazon may have.

  • Version : 1.0.0
  • Splits :
Split Examples
'test' 5000
'train' 200000
'validation' 5000
  • Features :
{
    "review_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "product_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "reviewer_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "stars": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "review_body": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "review_title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "language": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "product_category": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

fr

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:amazon_reviews_multi/fr')
  • Description :
We provide an Amazon product reviews dataset for multilingual text classification. The dataset contains reviews in English, Japanese, German, French, Chinese and Spanish, collected between November 1, 2015 and November 1, 2019. Each record in the dataset contains the review text, the review title, the star rating, an anonymized reviewer ID, an anonymized product ID and the coarse-grained product category (e.g. ‘books’, ‘appliances’, etc.) The corpus is balanced across stars, so each star rating constitutes 20% of the reviews in each language.

For each language, there are 200,000, 5,000 and 5,000 reviews in the training, development and test sets respectively. The maximum number of reviews per reviewer is 20 and the maximum number of reviews per product is 20. All reviews are truncated after 2,000 characters, and all reviews are at least 20 characters long.

Note that the language of a review does not necessarily match the language of its marketplace (e.g. reviews from amazon.de are primarily written in German, but could also be written in English, etc.). For this reason, we applied a language detection algorithm based on the work in Bojanowski et al. (2017) to determine the language of the review text and we removed reviews that were not written in the expected language.

In addition to the license rights granted under the Conditions of Use, Amazon or its content providers grant you a limited, non-exclusive, non-transferable, non-sublicensable, revocable license to access and use the Reviews Corpus for purposes of academic research. You may not resell, republish, or make any commercial use of the Reviews Corpus or its contents, including use of the Reviews Corpus for commercial research, such as research related to a funding or consultancy contract, internship, or other relationship in which the results are provided for a fee or delivered to a for-profit organization. You may not (a) link or associate content in the Reviews Corpus with any personal information (including Amazon customer accounts), or (b) attempt to determine the identity of the author of any content in the Reviews Corpus. If you violate any of the foregoing conditions, your license to access and use the Reviews Corpus will automatically terminate without prejudice to any of the other rights or remedies Amazon may have.

  • Version : 1.0.0
  • Splits :
Split Examples
'test' 5000
'train' 200000
'validation' 5000
  • Features :
{
    "review_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "product_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "reviewer_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "stars": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "review_body": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "review_title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "language": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "product_category": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

ja

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:amazon_reviews_multi/ja')
  • Description :
We provide an Amazon product reviews dataset for multilingual text classification. The dataset contains reviews in English, Japanese, German, French, Chinese and Spanish, collected between November 1, 2015 and November 1, 2019. Each record in the dataset contains the review text, the review title, the star rating, an anonymized reviewer ID, an anonymized product ID and the coarse-grained product category (e.g. ‘books’, ‘appliances’, etc.) The corpus is balanced across stars, so each star rating constitutes 20% of the reviews in each language.

For each language, there are 200,000, 5,000 and 5,000 reviews in the training, development and test sets respectively. The maximum number of reviews per reviewer is 20 and the maximum number of reviews per product is 20. All reviews are truncated after 2,000 characters, and all reviews are at least 20 characters long.

Note that the language of a review does not necessarily match the language of its marketplace (e.g. reviews from amazon.de are primarily written in German, but could also be written in English, etc.). For this reason, we applied a language detection algorithm based on the work in Bojanowski et al. (2017) to determine the language of the review text and we removed reviews that were not written in the expected language.

In addition to the license rights granted under the Conditions of Use, Amazon or its content providers grant you a limited, non-exclusive, non-transferable, non-sublicensable, revocable license to access and use the Reviews Corpus for purposes of academic research. You may not resell, republish, or make any commercial use of the Reviews Corpus or its contents, including use of the Reviews Corpus for commercial research, such as research related to a funding or consultancy contract, internship, or other relationship in which the results are provided for a fee or delivered to a for-profit organization. You may not (a) link or associate content in the Reviews Corpus with any personal information (including Amazon customer accounts), or (b) attempt to determine the identity of the author of any content in the Reviews Corpus. If you violate any of the foregoing conditions, your license to access and use the Reviews Corpus will automatically terminate without prejudice to any of the other rights or remedies Amazon may have.

  • Version : 1.0.0
  • Splits :
Split Examples
'test' 5000
'train' 200000
'validation' 5000
  • Features :
{
    "review_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "product_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "reviewer_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "stars": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "review_body": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "review_title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "language": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "product_category": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

zh

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:amazon_reviews_multi/zh')
  • Description :
We provide an Amazon product reviews dataset for multilingual text classification. The dataset contains reviews in English, Japanese, German, French, Chinese and Spanish, collected between November 1, 2015 and November 1, 2019. Each record in the dataset contains the review text, the review title, the star rating, an anonymized reviewer ID, an anonymized product ID and the coarse-grained product category (e.g. ‘books’, ‘appliances’, etc.) The corpus is balanced across stars, so each star rating constitutes 20% of the reviews in each language.

For each language, there are 200,000, 5,000 and 5,000 reviews in the training, development and test sets respectively. The maximum number of reviews per reviewer is 20 and the maximum number of reviews per product is 20. All reviews are truncated after 2,000 characters, and all reviews are at least 20 characters long.

Note that the language of a review does not necessarily match the language of its marketplace (e.g. reviews from amazon.de are primarily written in German, but could also be written in English, etc.). For this reason, we applied a language detection algorithm based on the work in Bojanowski et al. (2017) to determine the language of the review text and we removed reviews that were not written in the expected language.

In addition to the license rights granted under the Conditions of Use, Amazon or its content providers grant you a limited, non-exclusive, non-transferable, non-sublicensable, revocable license to access and use the Reviews Corpus for purposes of academic research. You may not resell, republish, or make any commercial use of the Reviews Corpus or its contents, including use of the Reviews Corpus for commercial research, such as research related to a funding or consultancy contract, internship, or other relationship in which the results are provided for a fee or delivered to a for-profit organization. You may not (a) link or associate content in the Reviews Corpus with any personal information (including Amazon customer accounts), or (b) attempt to determine the identity of the author of any content in the Reviews Corpus. If you violate any of the foregoing conditions, your license to access and use the Reviews Corpus will automatically terminate without prejudice to any of the other rights or remedies Amazon may have.

  • Version : 1.0.0
  • Splits :
Split Examples
'test' 5000
'train' 200000
'validation' 5000
  • Features :
{
    "review_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "product_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "reviewer_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "stars": {
        "dtype": "int32",
        "id": null,
        "_type": "Value"
    },
    "review_body": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "review_title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "language": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "product_category": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
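
The seven configs on this page differ only in the suffix of the name passed to `tfds.load`, and the combined config's split sizes are six times the per-language sizes. A minimal sketch of both observations follows; `tfds_name` is a hypothetical helper, and the config identifiers are taken from the load commands shown in each section.

```python
# Language configs, as they appear in the load commands on this page.
LANGUAGES = ["de", "en", "es", "fr", "ja", "zh"]

# Split sizes as listed in the splits tables.
PER_LANGUAGE = {"train": 200_000, "validation": 5_000, "test": 5_000}
ALL_LANGUAGES = {"train": 1_200_000, "validation": 30_000, "test": 30_000}


def tfds_name(config: str) -> str:
    """Build the name string passed to tfds.load for a given config."""
    return f"huggingface:amazon_reviews_multi/{config}"


# The combined config should hold exactly six per-language splits.
for split, total in ALL_LANGUAGES.items():
    assert total == len(LANGUAGES) * PER_LANGUAGE[split]

names = [tfds_name(config) for config in ["all_languages", *LANGUAGES]]
print(names[0])
```

Each entry in `names` could then be passed to `tfds.load` in place of the literal strings shown in the sections above.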