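Getting Started with the Amazon Scraping API: A 2025 Guide to Product Data Extraction

Extracting Amazon product data has become an essential capability for e-commerce businesses, market researchers, and data analysts. Whether you are monitoring competitor pricing, researching products, or building a price comparison tool, reliable access to Amazon's vast product catalog is critical. This guide walks you through what you need to extract product data efficiently and at scale with the Pangolin Amazon Scraping API.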

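Why Amazon Product Data Extraction Matters

Amazon lists more than 350 million products across its global marketplaces. For businesses operating in e-commerce, access to this data provides valuable insight: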

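Competitive intelligence: Track competitors' pricing strategies, product launches, and inventory levels in real time
Market research: Identify trending products, analyze customer sentiment through reviews, and spot gaps in the market
Dynamic pricing: Adjust your pricing strategy based on real-time market data to maximize profit
Product selection: Make data-driven decisions about what to sell based on demand, competition, and profitability metrics
Inventory management: Monitor stock levels and availability patterns to optimize your own inventory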

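However, extracting this data manually is impractical at scale. Amazon's site structure is complex, changes frequently, and is protected by sophisticated anti-bot measures. This is where the Pangolin Amazon Scraping API is useful.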

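Understanding the Pangolin Amazon Scraping API

The Pangolin Amazon Scraping API is a professional-grade solution designed specifically for Amazon data extraction. Unlike a basic web scraper, it handles the complexities of Amazon's infrastructure for you: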

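Key Features

99.9% success rate: Advanced anti-detection technology keeps data extraction reliable
Multi-marketplace support: Extract data from Amazon.com, Amazon.co.uk, Amazon.de, and 15+ other marketplaces
Comprehensive data fields: Access product details, pricing, reviews, ratings, images, variants, and more
Real-time data: Get up-to-date information with sub-second response times
Scalable infrastructure: Handle millions of requests with enterprise-grade reliability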

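Getting Started: Prerequisites

Before diving into the code, you will need:

Pangolin API account: Sign up at tool.pangolinfo.com to get your API credentials
API key: Get your authentication key from the console (you receive 1,000 free credits to start)
Development environment: Python 3.7+, Node.js 14+, or any language that can send HTTP requests
Basic programming knowledge: Familiarity with REST APIs and JSON data structures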

Authentication and API Basics

The Pangolin API uses Bearer token authentication. Every request must include your API key in the Authorization header. The basic structure looks like this:

curl -X POST "https://scrapeapi.pangolinfo.com/api/v1/scrape" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.amazon.com/dp/PRODUCT_ASIN",
    "parserName": "amzProductDetail",
    "format": "json",
    "bizContext": {
      "zipcode": "10041"
    }
  }'

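Security Best Practices

Never hard-code your API key in client-side code or commit it to version control. Use environment variables or a secrets management system instead.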
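
As a minimal sketch of that advice, the key can be read from an environment variable when the program starts. The variable name PANGOLIN_API_KEY and the helper names load_api_key and auth_headers are assumptions chosen for illustration; they are not defined by Pangolin.

```python
import os


def load_api_key(env_var: str = "PANGOLIN_API_KEY") -> str:
    """Return the API key from the environment, failing loudly if it is absent.

    The variable name is a hypothetical convention, not one mandated by Pangolin.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running")
    return key


def auth_headers(api_key: str) -> dict:
    """Build the Bearer-token headers described in the authentication section."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Failing at startup when the variable is missing tends to be easier to debug than a rejected request later on.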

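Extracting Product Data: A Step-by-Step Guide

1. Basic Product Information Extraction

Let's start by extracting basic product information. The most common use case is fetching data from a product detail page using its ASIN (Amazon Standard Identification Number).

Python example: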

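import json
import os

import requests

# Your Pangolin API credentials. Reading the key from an environment variable
# (the name PANGOLIN_API_KEY is an assumption) keeps it out of source control.
API_KEY = os.environ.get("PANGOLIN_API_KEY", "your_api_key_here")
API_ENDPOINT = "https://scrapeapi.pangolinfo.com/api/v1/scrape"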

# ASIN of the product you want to scrape
product_asin = "B0DYTF8L2W"
amazon_url = f"https://www.amazon.com/dp/{product_asin}"

# Request headers
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

# Request payload
payload = {
    "url": amazon_url,
    "parserName": "amzProductDetail",
    "format": "json",
    "bizContext": {
        "zipcode": "10041"  # US ZIP code (required by Amazon)
    }
}

# Send the API request; the timeout keeps a stalled connection from hanging forever
response = requests.post(API_ENDPOINT, headers=headers, json=payload, timeout=30)

# Check whether the request succeeded
if response.status_code == 200:
    result = response.json()
    
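    # Extract the product information from the response structure
    if result.get('code') == 0:
        data = result.get('data') or {}
        # "or [{}]" also covers an empty list, which would otherwise raise IndexError
        json_data = (data.get('json') or [{}])[0]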
        
        if json_data.get('code') == 0:
            product_results = json_data.get('data', {}).get('results', [])
            
            if product_results:
                product = product_results[0]
                
                print(f"Product Title: {product.get('title')}")
                print(f"Price: {product.get('price')}")
                print(f"Rating: {product.get('star')} stars")
                print(f"Number of Reviews: {product.get('rating')}")
                print(f"Brand: {product.get('brand')}")
                print(f"Sales: {product.get('sales')}")
                
                # Save to a file
                with open(f'product_{product_asin}.json', 'w', encoding='utf-8') as f:
                    json.dump(product, f, indent=2)
            else:
                print("No product data found")
        else:
            print(f"Parser error: {json_data.get('message')}")
    else:
        print(f"API error: {result.get('message')}")
else:
    print(f"HTTP Error: {response.status_code}")
    print(response.text)
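
When the same request is sent for many products, it can help to build the payload in one place and reject malformed identifiers before spending API credits. The sketch below is one way to do that. The helper name build_payload is hypothetical, and the check assumes ASINs are 10 uppercase alphanumeric characters, as in the examples in this guide.

```python
import re

# Assumption: an ASIN is 10 uppercase letters or digits, like the examples in this guide.
ASIN_PATTERN = re.compile(r"[A-Z0-9]{10}")


def build_payload(asin: str, zipcode: str = "10041", domain: str = "www.amazon.com") -> dict:
    """Build the request body used in the example above for one product page."""
    asin = asin.strip().upper()
    if not ASIN_PATTERN.fullmatch(asin):
        raise ValueError(f"Not a valid-looking ASIN: {asin!r}")
    return {
        "url": f"https://{domain}/dp/{asin}",
        "parserName": "amzProductDetail",
        "format": "json",
        "bizContext": {"zipcode": zipcode},
    }
```

The returned dictionary can be passed directly as the json argument of requests.post.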

2. Understanding the Response Structure

When you set format: "json", Pangolin returns structured JSON data shaped like this:

{
  "code": 0,
  "message": "ok",
  "data": {
    "json": [
      {
        "code": 0,
        "data": {
          "results": [
            {
              "asin": "B0DYTF8L2W",
              "title": "Sweetcrispy Convertible Sectional Sofa Couch...",
              "price": "$599.99",
              "star": "4.4",
              "rating": "22",
              "image": "https://m.media-amazon.com/images/I/...",
              "images": ["https://...", "..."],
              "brand": "Sweetcrispy",
              "description": "Product description...",
              "sales": "50+ bought in past month",
              "seller": "Amazon.com",
              "shipper": "Amazon",
              "merchant_id": "null",
              "color": "Beige",
              "size": "126.77\"W",
              "has_cart": false,
              "otherAsins": ["B0DYTF8XXX"],
              "coupon": "null",
              "category_id": "3733551",
              "category_name": "Sofas & Couches",
              "product_dims": "20.07\"D x 126.77\"W x 24.01\"H",
              "pkg_dims": "20.07\"D x 126.77\"W x 24.01\"H",
              "product_weight": "47.4 Pounds",
              "reviews": {...},
              "customerReviews": "...",
              "first_date": "2024-01-15",
              "deliveryTime": "Dec 15 - Dec 18",
              "additional_details": false
            }
          ]
        },
        "message": "ok"
      }
    ],
    "url": "https://www.amazon.com/dp/B0DYTF8L2W",
    "taskId": "45403c7fd7c148f280d0f4f7284bc9e9"
  }
}
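
Because the product sits several levels deep and values such as the price arrive as display strings, a small helper can make the response easier to consume. The following is a sketch based only on the sample response above. The function names are hypothetical, and the price parser assumes US-style formatting.

```python
import re
from typing import Optional


def extract_first_product(result: dict) -> Optional[dict]:
    """Return data.json[0].data.results[0], or None if a level is missing or reports an error."""
    if result.get("code") != 0:
        return None
    parsed_list = (result.get("data") or {}).get("json") or []
    if not parsed_list or parsed_list[0].get("code") != 0:
        return None
    results = (parsed_list[0].get("data") or {}).get("results") or []
    return results[0] if results else None


def parse_price(price: Optional[str]) -> Optional[float]:
    """Convert a display string such as "$599.99" to a float.

    Assumes a comma thousands separator and a dot decimal point.
    Strings without digits, such as "null", yield None.
    """
    if not price:
        return None
    match = re.search(r"\d[\d,]*(?:\.\d+)?", price)
    if match is None:
        return None
    return float(match.group(0).replace(",", ""))
```

Note that some fields in the sample, such as coupon and merchant_id, carry the string "null" rather than a JSON null, so callers may want to treat that string as a missing value.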

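3. Building a Price Monitoring System

Price monitoring is one of the most valuable applications of Amazon data extraction. Here is a complete example: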

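import os
import sqlite3

import requests

# Same endpoint as the earlier example; the environment variable name is an assumption
API_KEY = os.environ.get("PANGOLIN_API_KEY", "your_api_key_here")
API_ENDPOINT = "https://scrapeapi.pangolinfo.com/api/v1/scrape"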

class AmazonPriceTracker:
    def __init__(self, api_key, db_path='price_history.db'):
        self.api_key = api_key
        self.db_path = db_path
        self.setup_database()
    
    def setup_database(self):
        """Create database table for price history"""
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()
        cursor.execute('''
            CREATE TABLE IF NOT EXISTS price_history (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                asin TEXT NOT NULL,
                title TEXT,
                price TEXT,
                timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
            )
        ''')
        conn.commit()
        conn.close()
    
    def track_price(self, asin):
        """Fetch current price and save to database"""
        url = f"https://www.amazon.com/dp/{asin}"
        
        payload = {
            "url": url,
            "parserName": "amzProductDetail",
            "format": "json",
            "bizContext": {"zipcode": "10041"}
        }
        
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json"
        }
        
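        response = requests.post(API_ENDPOINT, headers=headers, json=payload, timeout=30)
        
        if response.status_code == 200:
            data = response.json()
            # Walk the nested response defensively; an empty list would raise IndexError
            parsed = ((data.get('data') or {}).get('json') or [{}])[0]
            results = (parsed.get('data') or {}).get('results') or []
            if not results:
                return None
            product = results[0]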
            
            # Save to database
            conn = sqlite3.connect(self.db_path)
            cursor = conn.cursor()
            cursor.execute('''
                INSERT INTO price_history (asin, title, price)
                VALUES (?, ?, ?)
            ''', (asin, product.get('title'), product.get('price')))
            conn.commit()
            conn.close()
            
            return product
        return None

# Usage
tracker = AmazonPriceTracker(API_KEY)
product = tracker.track_price('B08N5WRWNW')
if product:
    print(f"Tracked: {product.get('title')} - {product.get('price')}")
else:
    print("Price lookup failed")
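
Once a few rows have accumulated, the stored history can be queried to detect a change. The sketch below reads the price_history table created above and compares the two most recent rows for an ASIN. The function name latest_price_change is hypothetical, and prices are compared as the raw strings the tracker stores.

```python
import sqlite3
from typing import Optional, Tuple


def latest_price_change(db_path: str, asin: str) -> Optional[Tuple[str, str]]:
    """Return (previous_price, current_price) if the two newest rows differ, else None."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT price FROM price_history WHERE asin = ? ORDER BY id DESC LIMIT 2",
            (asin,),
        ).fetchall()
    finally:
        conn.close()
    if len(rows) < 2:
        return None  # Not enough history to compare yet
    current, previous = rows[0][0], rows[1][0]
    return (previous, current) if previous != current else None
```

Ordering by the autoincrement id rather than the timestamp avoids ties when two rows are written within the same second.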

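Best Practices and Optimization

Rate Limiting and Error Handling

Implementing proper rate limiting and error handling keeps long-running jobs reliable: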

import time
from functools import wraps

def rate_limit(calls_per_second=10):
    """Decorator to rate limit API calls"""
    min_interval = 1.0 / calls_per_second
    last_called = [0.0]
    
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            elapsed = time.time() - last_called[0]
            left_to_wait = min_interval - elapsed
            
            if left_to_wait > 0:
                time.sleep(left_to_wait)
            
            ret = func(*args, **kwargs)
            last_called[0] = time.time()
            return ret
        return wrapper
    return decorator

@rate_limit(calls_per_second=5)
def scrape_with_safety(asin):
    """Scrape with rate limiting"""
    # Your scraping code here
    pass
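
The decorator above covers rate limiting only. For the error-handling half, one common approach is to retry transient failures with exponential backoff. The following is a generic sketch, not part of the Pangolin API. The retry decorator and its parameters are illustrative names, and which exceptions count as transient is left to the caller.

```python
import time
from functools import wraps


def retry(max_attempts=3, base_delay=1.0, exceptions=(Exception,), sleep=time.sleep):
    """Retry the wrapped call, doubling the delay after each failed attempt."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # Out of attempts: surface the last error
                    # Waits base_delay, then 2x, then 4x, and so on
                    sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator
```

With the requests library, passing exceptions=(requests.RequestException,) limits retries to network-level failures. The sleep parameter exists so that tests can substitute a stub instead of actually waiting.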

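Conclusion

Amazon product data extraction is a powerful capability that can reshape your e-commerce strategy. With the Pangolin Amazon Scraping API, you get enterprise-grade infrastructure that handles the complexity of data extraction, so you can focus on gaining insight and making data-driven decisions.

Next Steps

Sign up for Pangolin: Get your API key for free at tool.pangolinfo.com
Explore the documentation: Visit docs.pangolinfo.com for the full API reference
Test in the Playground: Try the interactive API Playground
Join the community: Connect with other developers and share your use cases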
