从 Android 应用商店获取数据

我见过一些使用 Android Play 商店中的数据的应用程序和网站。例如,应用程序或排名靠前的网站等。但是你怎样才能得到数据呢?从我能解析的地方吗?

136271 次浏览

The Google Play Store doesn't provide this data, so the sites must just be scraping it.

There's an unofficial open-source API for the Android Market you may try to use to get the information you need. Hope this helps.

Here's a google chrome extension that'll allow you to download your reviews: https://chrome.google.com/webstore/detail/my-play-store-reviews/ldggikfajgoedghjnflfafiiheagngoa?hl=en

Disclaimer: I am from 42matters, who provides this data already on https://42matters.com/api , feel free to check it out or drop us a line.

As lenik mentioned there are open-source libraries that already help with obtaining some data from GPlay. If you want to build one yourself you can try to parse the Google Play App page, but you should pay attention to the following:

  • Make sure the URL you are trying to parse is not blocked in robots.txt - e.g. https://play.google.com/robots.txt
  • Make sure that you are not doing it too often, Google will throttle and potentially blacklist you if you are doing it too much.
  • Send a correct User-Agent header to actually show you are a bot
  • The page of an app is big - make sure you accept gzip and request the mobile version
  • GPlay website is not an API, it doesn't care that you parse it so it will change over time. Make sure you handle changes - e.g. by having test to make sure you get what you expected.

So that in mind getting one page metadata is a matter of fetching the page html and parsing it properly. With JSoup you can try:

      HttpClient httpClient = HttpClientBuilder.create().build();
HttpGet request = new HttpGet(crawlUrl);
HttpResponse rsp = httpClient.execute(request);


int statusCode = rsp.getStatusLine().getStatusCode();


if (statusCode == 200) {
String content = EntityUtils.toString(rsp.getEntity());
Document doc = Jsoup.parse(content);
//parse content, whatever you need
Element price = doc.select("[itemprop=price]").first();
}

For that very simple use case that should get you started. However, the moment you want to do more interesting stuff, things get complicated:

  • Search is forbidden in robots.
  • Keeping app metadata up-to-date is hard to do. There are more than 2.2m apps, if you want to refresh their metadata daily there are 2.2 requests/day, which will 1) get blocked immediately, 2) costs a lot of money - pessimistic 220gb data transfer per day if one app is 100k
  • How do you discover new apps
  • How do you get pricing in each country, translations of each language

The list goes on. If you don't want to do all this by yourself, you can consider 42matters API, which supports lookup and search, top google charts, advanced queries and filters. And this for 35 languages and more than 50 countries.

[2]:

I've coded a small Node.js module to scrape app and list data from Google Play: google-play-scraper

var gplay = require('google-play-scrapper');


gplay.List({
category: gplay.category.GAME_ACTION,
collection: gplay.collection.TOP_FREE,
num: 2
}).then(console.log);

Results:

 [ { url: 'https://play.google.com/store/apps/details?id=com.playappking.busrush',
appId: 'com.playappking.busrush',
title: 'Bus Rush',
developer: 'Play App King',
icon: 'https://lh3.googleusercontent.com/R6hmyJ6ls6wskk5hHFoW02yEyJpSG36il4JBkVf-Aojb1q4ZJ9nrGsx6lwsRtnTqfA=w340',
score: 3.9,
price: '0',
free: false },
{ url: 'https://play.google.com/store/apps/details?id=com.yodo1.crossyroad',
appId: 'com.yodo1.crossyroad',
title: 'Crossy Road',
developer: 'Yodo1 Games',
icon: 'https://lh3.googleusercontent.com/doHqbSPNekdR694M-4rAu9P2B3V6ivff76fqItheZGJiN4NBw6TrxhIxCEpqgO3jKVg=w340',
score: 4.5,
price: '0',
free: false } ]