是否有一个 Amazon.com API 来检索产品评论?

是否有任何 AWS API/服务提供访问 Amazon 销售的产品评论的权限?我对通过(ASIN,user _ id) tuple 查找评论很感兴趣。我可以看到 Product Advertising API 返回一个包含 URL 的页面的 URL (用于嵌入 IFRAME) ,但是如果可能的话,我对评论数据的机器可读格式感兴趣。

105384 次浏览

Update 2:

Please see @jpillora's comment. It's probably the most relevant regarding Update 1.

I just tried out the Product Advertising API (as of 2014-09-17), it seems that this API only returns a URL pointing to an iframe containing just the reviews. I guess you'd have to screen scrape - though I imagine that would break Amazon's TOS.

Update 1:

Maybe. I wrote the original answer below earlier. I don't have time to look into this right now because I'm no longer on a project concerned with Amazon reviews, but their webpage at Product Advertising API states "The Product Advertising API helps you advertise Amazon products using product search and look up capability, product information and features such as Customer Reviews..." as of 2011-12-08. So I hope someone looks into it and posts back here; feel free to edit this answer.

Original:

Nope.

Here is an intersting forum discussion about the fact including theories as to why: http://forums.digitalpoint.com/showthread.php?t=1932326

If I'm wrong, please post what you find. I'm interested in getting the reviews content, as well as allowing submitting reviews to Amazon, if possible.

You might want to check this link: http://reviewazon.com/. I just stumbled across it and haven't looked into it, but I'm surprised I don't see any mention on their site about the update concerning the drop of Reviews from the Amazon Products Advertising API posted at: https://affiliate-program.amazon.com/gp/advertising/api/detail/main.html

Unfortunately you can only get an iframe URL with the reviews, the content itself is not accessible.

Source: http://docs.amazonwebservices.com/AWSECommerceService/2011-08-01/DG/CHAP_MotivatingCustomerstoBuy.html#GettingCustomerReviews

Here's my quick take on it - you easily can retrieve the reviews themselves with a bit more work:

countries=['com','co.uk','ca','de']
books=[
'''http://www.amazon.%s/Glass-House-Climate-Millennium-ebook/dp/B005U3U69C''',
'''http://www.amazon.%s/The-Japanese-Observer-ebook/dp/B0078FMYD6''',
'''http://www.amazon.%s/Falling-Through-Water-ebook/dp/B009VJ1622''',
]
import urllib2;
for book in books:
print '-'*40
print book.split('%s/')[1]
for country in countries:
asin=book.split('/')[-1]; title=book.split('/')[3]
url='''http://www.amazon.%s/product-reviews/%s'''%(country,asin)
try: f = urllib2.urlopen(url)
except: page=""
page=f.read().lower(); print '%s=%s'%(country, page.count('member-review'))
print '-'*40

According to Amazon Product Advertizing API License Agreement (https://affiliate-program.amazon.com/gp/advertising/api/detail/agreement.html) and specifically it's point 4.b.iii:

You will use Product Advertising Content only ... to send end users to and drive sales on the Amazon Site.

which means it's prohibited for you to show Amazon products reviews taken trough their API to sale products at your site. It's only allowed to redirect your site visitors to Amazon and get the affiliate commissions.

I would use something like the answer of @mfs above. Unfortunately, his/her answer would only work for up to 10 reviews, since this is the maximum that can be displayed on one page.

You may consider the following code:

import requests


nreviews_re = {'com': re.compile('\d[\d,]+(?= customer review)'),
'co.uk':re.compile('\d[\d,]+(?= customer review)'),
'de': re.compile('\d[\d\.]+(?= Kundenrezens\w\w)')}
no_reviews_re = {'com': re.compile('no customer reviews'),
'co.uk':re.compile('no customer reviews'),
'de': re.compile('Noch keine Kundenrezensionen')}


def get_number_of_reviews(asin, country='com'):
url = 'http://www.amazon.{country}/product-reviews/{asin}'.format(country=country, asin=asin)
html = requests.get(url).text
try:
return int(re.compile('\D').sub('',nreviews_re[country].search(html).group(0)))
except:
if no_reviews_re[country].search(html):
return 0
else:
return None  # to distinguish from 0, and handle more cases if necessary

Running this with 1433524767 (which has significantly different number of reviews for the three countries of interest) I get:

>> print get_number_of_reviews('1433524767', 'com')
3185
>> print get_number_of_reviews('1433524767', 'co.uk')
378
>> print get_number_of_reviews('1433524767', 'de')
16

Hope it helps

You can use Amazon Product Advertising API. It has a Response Group 'Reviews' which you can use with operation 'ItemLookup'. You need to know ASIN i.e. unique item id of the product.

Once you set all the parameters and execute the signed URL, you will receive an XML which contains a link to customer reviews under "IFrameURL" tag.

Use this URL and use pattern searching in html returned from this url to extract the reviews. For each review in the html, there will be a unique review id and under that you can get all the data for that particular review.

As said by others above, amazon has discontinued providing reviews in its api. Howevever, i found this nice tutorial to do the same with python. Here is the code he gives, works for me! He uses python 2.7

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Written as part of https://www.scrapehero.com/how-to-scrape-amazon-product-reviews-using-python/
from lxml import html
import json
import requests
import json,re
from dateutil import parser as dateparser
from time import sleep


def ParseReviews(asin):
#This script has only been tested with Amazon.com
amazon_url  = 'http://www.amazon.com/dp/'+asin
# Add some recent user agent to prevent amazon from blocking the request
# Find some chrome user agent strings  here https://udger.com/resources/ua-list/browser-detail?browser=Chrome
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36'}
page = requests.get(amazon_url,headers = headers).text


parser = html.fromstring(page)
XPATH_AGGREGATE = '//span[@id="acrCustomerReviewText"]'
XPATH_REVIEW_SECTION = '//div[@id="revMHRL"]/div'
XPATH_AGGREGATE_RATING = '//table[@id="histogramTable"]//tr'
XPATH_PRODUCT_NAME = '//h1//span[@id="productTitle"]//text()'
XPATH_PRODUCT_PRICE  = '//span[@id="priceblock_ourprice"]/text()'


raw_product_price = parser.xpath(XPATH_PRODUCT_PRICE)
product_price = ''.join(raw_product_price).replace(',','')


raw_product_name = parser.xpath(XPATH_PRODUCT_NAME)
product_name = ''.join(raw_product_name).strip()
total_ratings  = parser.xpath(XPATH_AGGREGATE_RATING)
reviews = parser.xpath(XPATH_REVIEW_SECTION)


ratings_dict = {}
reviews_list = []


#grabing the rating  section in product page
for ratings in total_ratings:
extracted_rating = ratings.xpath('./td//a//text()')
if extracted_rating:
rating_key = extracted_rating[0]
raw_raing_value = extracted_rating[1]
rating_value = raw_raing_value
if rating_key:
ratings_dict.update({rating_key:rating_value})


#Parsing individual reviews
for review in reviews:
XPATH_RATING  ='./div//div//i//text()'
XPATH_REVIEW_HEADER = './div//div//span[contains(@class,"text-bold")]//text()'
XPATH_REVIEW_POSTED_DATE = './/a[contains(@href,"/profile/")]/parent::span/following-sibling::span/text()'
XPATH_REVIEW_TEXT_1 = './/div//span[@class="MHRHead"]//text()'
XPATH_REVIEW_TEXT_2 = './/div//span[@data-action="columnbalancing-showfullreview"]/@data-columnbalancing-showfullreview'
XPATH_REVIEW_COMMENTS = './/a[contains(@class,"commentStripe")]/text()'
XPATH_AUTHOR  = './/a[contains(@href,"/profile/")]/parent::span//text()'
XPATH_REVIEW_TEXT_3  = './/div[contains(@id,"dpReviews")]/div/text()'
raw_review_author = review.xpath(XPATH_AUTHOR)
raw_review_rating = review.xpath(XPATH_RATING)
raw_review_header = review.xpath(XPATH_REVIEW_HEADER)
raw_review_posted_date = review.xpath(XPATH_REVIEW_POSTED_DATE)
raw_review_text1 = review.xpath(XPATH_REVIEW_TEXT_1)
raw_review_text2 = review.xpath(XPATH_REVIEW_TEXT_2)
raw_review_text3 = review.xpath(XPATH_REVIEW_TEXT_3)


author = ' '.join(' '.join(raw_review_author).split()).strip('By')


#cleaning data
review_rating = ''.join(raw_review_rating).replace('out of 5 stars','')
review_header = ' '.join(' '.join(raw_review_header).split())
review_posted_date = dateparser.parse(''.join(raw_review_posted_date)).strftime('%d %b %Y')
review_text = ' '.join(' '.join(raw_review_text1).split())


#grabbing hidden comments if present
if raw_review_text2:
json_loaded_review_data = json.loads(raw_review_text2[0])
json_loaded_review_data_text = json_loaded_review_data['rest']
cleaned_json_loaded_review_data_text = re.sub('<.*?>','',json_loaded_review_data_text)
full_review_text = review_text+cleaned_json_loaded_review_data_text
else:
full_review_text = review_text
if not raw_review_text1:
full_review_text = ' '.join(' '.join(raw_review_text3).split())


raw_review_comments = review.xpath(XPATH_REVIEW_COMMENTS)
review_comments = ''.join(raw_review_comments)
review_comments = re.sub('[A-Za-z]','',review_comments).strip()
review_dict = {
'review_comment_count':review_comments,
'review_text':full_review_text,
'review_posted_date':review_posted_date,
'review_header':review_header,
'review_rating':review_rating,
'review_author':author


}
reviews_list.append(review_dict)


data = {
'ratings':ratings_dict,
'reviews':reviews_list,
'url':amazon_url,
'price':product_price,
'name':product_name
}
return data




def ReadAsin():
#Add your own ASINs here
AsinList = ['B01ETPUQ6E','B017HW9DEW']
extracted_data = []
for asin in AsinList:
print "Downloading and processing page http://www.amazon.com/dp/"+asin
extracted_data.append(ParseReviews(asin))
sleep(5)
f=open('data.json','w')
json.dump(extracted_data,f,indent=4)


if __name__ == '__main__':
ReadAsin()

Here, is the link to his website reviews scraping with python 2.7

Checkout RapidAPI: https://rapidapi.com/blog/amazon-product-reviews-api/ By using this API we can get Amazon product reviews.