How to Scrape Shopee Product Reviews

Damasukma T

· 5 min read

I have tried several ways to retrieve product review data from Shopee, from driving a browser with Selenium to calling the API directly with Python's requests library. Unfortunately, none of these approaches worked. The reason is quite obvious: Shopee has a very strong anti-bot system.

Some reasons why scraping Shopee is not an easy matter:

  • Shopee uses API parameters that are dynamically generated in the browser engine.
  • Some important API parameters are even encrypted.
  • API responses are difficult to obtain outside the browser's native JavaScript environment.

In other words, even if we know the endpoint, we still can't access it directly from a normal Python script.

Solution: Use Browser DevTools

Finally, I found a simpler and more effective way: using the browser's DevTools (Developer Tools).

Shopee appears to fetch review data through an API call each time a user presses the "Next" button in the review section. We can therefore automate those clicks inside a real browser session and record the network traffic they generate.

Here are the steps:

Step 1: Simulate an Automatic Click on the Review "Next" Button

Open the Shopee product page that you want to capture data from. Then open the DevTools Console (usually Ctrl+Shift+J in Chrome on Windows or Linux) and run the following script:

let clickCount = 0;

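// Every 2 seconds, look for the review pager's "Next" button and click it.
// The selector below was copied from DevTools and depends on Shopee's
// generated class names, so it may need updating for your page.
const interval = setInterval(() => {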
  const nextBtn = document.querySelector('#sll2-normal-pdp-main > div > div > div > div.container > div.wAMdpk > div > div.page-product__content--left > div:nth-child(2) > div > div > div.CTDZTy.zfuffj > div > div.stardust-tabs-panels > section:nth-child(1) > div > div > div.product-ratings__list > nav > button.shopee-icon-button.shopee-icon-button--right');

  if (nextBtn && !nextBtn.disabled) {
    nextBtn.click();
    clickCount++;
    console.log(`Clicked ${clickCount} times`);
  } else {
    clearInterval(interval);
    console.log('Stopped clicking — button not found or disabled.');
  }
}, 2000);

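This script clicks the review section's Next button every 2 seconds and stops once the button is disabled or can no longer be found. That way, all reviews load automatically in the browser. DevTools only records requests made while it is open, so keep it open for the whole run. If the first page of reviews is missing from the capture, reload the page with DevTools open before running the script.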

Step 2: Save Network Traffic as HAR File

Once all the reviews are loaded:

  1. Still in DevTools, go to the Network tab.
  2. Right-click on the request list → select Save all as HAR with content.
  3. Save the .har file to your computer.
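
Before moving on, it can be worth checking that the saved file really contains the review requests. The sketch below is a minimal sanity check, assuming the standard HAR layout (log → entries → request/response) and that the review requests contain "get_ratings" in their URL, as in the extraction script in Step 3. The function name summarize_har is only illustrative.

```python
from collections import Counter
from urllib.parse import urlsplit


def _url(entry: dict) -> str:
    """Return the request URL of a HAR entry, or '' if it is missing."""
    return entry.get("request", {}).get("url", "")


def summarize_har(har: dict, keyword: str = "get_ratings") -> dict:
    """Count captured requests and how many matching responses kept a body."""
    entries = har.get("log", {}).get("entries", [])
    # Entries whose URL contains the keyword (assumed to be review requests)
    matching = [e for e in entries if keyword in _url(e)]
    # Of those, entries whose response body was actually saved in the HAR
    with_body = [
        e for e in matching
        if e.get("response", {}).get("content", {}).get("text")
    ]
    # Group matching requests by URL path, ignoring the query string
    paths = Counter(urlsplit(_url(e)).path for e in matching)
    return {
        "total_entries": len(entries),
        "matching": len(matching),
        "matching_with_body": len(with_body),
        "paths": dict(paths),
    }
```

To use it, load the file with json.load and pass the resulting dict to summarize_har. If "matching" is 0, or "matching_with_body" is much lower than "matching", the capture is probably incomplete and it is worth repeating Steps 1 and 2.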

Step 3: Extract Review Data from HAR with Python

After getting the HAR file, you can extract the reviews using Python:

import os
import json
import base64
import csv

# Folder containing all the .har files
folder = "."
csv_rows = []

# Loop over every .har file in the folder
for filename in os.listdir(folder):
    if filename.endswith(".har"):
        print(f"🔍 Processing: {filename}")
        try:
            with open(os.path.join(folder, filename), 'r', encoding='utf-8') as f:
                har = json.load(f)

            entries = har['log']['entries']
            for entry in entries:
                url = entry['request']['url']
                if "get_ratings" in url:
                    response = entry.get('response', {})
                    content = response.get('content', {})
                    text_base64 = content.get('text')
                    encoding = content.get('encoding', '')

                    if not text_base64:
                        continue

                    try:
                        if encoding == 'base64':
                            text = base64.b64decode(text_base64).decode('utf-8')
                        else:
                            text = text_base64
                        body = json.loads(text)
                    except Exception as e:
                        print(f"[!] Failed to decode a response in {filename}: {e}")
                        continue

                    if 'data' in body and 'ratings' in body['data']:
                        for rating in body['data']['ratings']:
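                            csv_rows.append({
                                "username": rating.get("author_username"),
                                "rating": rating.get("rating_star"),
                                "comment": rating.get("comment"),
                                # product_items can be missing or empty; fall back to None
                                # instead of raising and abandoning the rest of the file
                                "variation": (rating.get("product_items") or [{}])[0].get("model_name"),
                                "timestamp": rating.get("ctime"),
                            })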
                            # print(csv_rows)
        except Exception as e:
            print(f"[!] Error while processing {filename}: {e}")

# Save everything to one combined CSV
with open('shopee_ratings.csv', 'w', newline='', encoding='utf-8') as csvfile:
    fieldnames = ['username', 'rating', 'comment', 'variation', 'timestamp']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for row in csv_rows:
        writer.writerow(row)

print(f"\n✅ Total reviews saved: {len(csv_rows)} to shopee_ratings.csv")
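
If the same product was captured more than once, or several HAR files overlap, the collected rows may contain duplicates, and the raw timestamp is not very readable. The following is a minimal clean-up sketch, not part of the original script. It assumes the ctime value is a Unix timestamp in seconds, which I have not verified against any Shopee documentation, and the names clean_rows and reviewed_at_utc are only illustrative.

```python
from datetime import datetime, timezone


def clean_rows(rows: list[dict]) -> list[dict]:
    """Drop duplicate reviews and add a readable UTC time column."""
    seen = set()
    cleaned = []
    for row in rows:
        # Treat rows with the same user, time, and text as the same review
        key = (row.get("username"), row.get("timestamp"), row.get("comment"))
        if key in seen:
            continue
        seen.add(key)

        ts = row.get("timestamp")
        # Assumption: the timestamp is Unix time in seconds
        iso = (
            datetime.fromtimestamp(int(ts), tz=timezone.utc).isoformat()
            if ts not in (None, "")
            else None
        )
        cleaned.append({**row, "reviewed_at_utc": iso})
    return cleaned
```

Calling clean_rows(csv_rows) before writing the CSV would remove repeated reviews. To keep the extra column in the output, 'reviewed_at_utc' would also have to be added to fieldnames.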

Conclusion

With this approach, we don't need to bother with bypassing CAPTCHA, API encryption, or headless browser emulation. The clicks happen inside a normal browser session, just as a human user's would, and we simply collect the network data that the browser has recorded.

This method is very useful for light scraping, especially for research, sentiment analysis, or product quality mapping. But make sure you still respect Shopee's ToS, and only use this data for legitimate purposes.
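
As a small starting point for that kind of analysis, here is a hedged sketch that summarizes the star ratings in the collected rows. It assumes ratings are whole numbers from 1 to 5, and the function name rating_summary is only illustrative.

```python
from collections import Counter


def rating_summary(rows: list[dict]) -> dict:
    """Return the count, average, and per-star distribution of ratings."""
    # Skip rows without a rating; CSV values arrive as strings, so cast to int
    stars = [int(r["rating"]) for r in rows if r.get("rating") not in (None, "")]
    counts = Counter(stars)
    average = sum(stars) / len(stars) if stars else None
    return {
        "count": len(stars),
        "average": average,
        # Assumption: a 1-to-5 star scale
        "distribution": {star: counts.get(star, 0) for star in range(1, 6)},
    }
```

It accepts either the csv_rows list from the extraction script or rows read back from shopee_ratings.csv with csv.DictReader.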

About Damasukma T

I'm a lifelong learner driven by curiosity across disciplines—from blockchain, philosophy, and mathematics to biology, economics, and beyond. I enjoy exploring how systems work, breaking down complex ideas, and experimenting with emerging technologies through hands-on projects. For me, learning isn't just a phase—it's a habit and a way of life.
Copyright © 2025 . All rights reserved.