Disclaimer: I am from 42matters, which already provides this data at https://42matters.com/api. Feel free to check it out or drop us a line.
As lenik mentioned, there are open-source libraries that already help with obtaining some data from GPlay. If you want to build one yourself, you can try to parse the Google Play app page, but you should pay attention to the following:
Make sure that you are not requesting pages too often. Google will throttle and potentially blacklist you if you send too many requests.
Send a correct User-Agent header that actually shows you are a bot.
The page of an app is big, so make sure you accept gzip and request the mobile version.
The GPlay website is not an API. It is not designed to be parsed, so its markup will change over time. Make sure you handle changes, e.g. by having tests that check you still get what you expected.
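The first three precautions can be sketched with the JDK alone. This is only an illustration, assuming Java 11+ and its built-in java.net.http client; the class name PoliteFetch, the bot identity string, and the two-second interval in the usage note are all hypothetical, and requesting the mobile version is not covered here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.zip.GZIPInputStream;

class PoliteFetch {
    // Hypothetical bot identity; replace with your own name and contact URL.
    static final String USER_AGENT = "example-bot/0.1 (+https://example.com/bot)";

    private final long minIntervalMillis;
    private long nextAllowedAt = 0L;

    PoliteFetch(Duration minInterval) {
        this.minIntervalMillis = minInterval.toMillis();
    }

    /**
     * Reserves the next request slot and returns how many milliseconds the
     * caller should sleep before sending, so requests stay at least
     * minInterval apart.
     */
    synchronized long reserveDelayMillis(long nowMillis) {
        long start = Math.max(nowMillis, nextAllowedAt);
        nextAllowedAt = start + minIntervalMillis;
        return start - nowMillis;
    }

    /** Builds a GET request that identifies the bot and asks for gzip. */
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", USER_AGENT)
                .header("Accept-Encoding", "gzip")
                .timeout(Duration.ofSeconds(20))
                .GET()
                .build();
    }

    /**
     * Decodes a response body as UTF-8, decompressing it first when the
     * Content-Encoding header says gzip. The java.net.http client does not
     * decompress automatically, so this has to be done by hand.
     */
    static String decodeBody(byte[] body, String contentEncoding) {
        if (!"gzip".equalsIgnoreCase(contentEncoding)) {
            return new String(body, StandardCharsets.UTF_8);
        }
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(body))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller would create one instance, e.g. new PoliteFetch(Duration.ofSeconds(2)), call Thread.sleep(fetcher.reserveDelayMillis(System.currentTimeMillis())) before each request, send the result of buildRequest(url) with HttpResponse.BodyHandlers.ofByteArray(), and pass the body plus the Content-Encoding response header to decodeBody.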
With that in mind, getting the metadata for one page is a matter of fetching the page HTML and parsing it properly. With jsoup and Apache HttpClient you can try:
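import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.util.EntityUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

HttpClient httpClient = HttpClientBuilder.create().build();
HttpGet request = new HttpGet(crawlUrl);
HttpResponse rsp = httpClient.execute(request);
int statusCode = rsp.getStatusLine().getStatusCode();
if (statusCode == 200) {
    String content = EntityUtils.toString(rsp.getEntity(), "UTF-8");
    Document doc = Jsoup.parse(content);
    // Parse whatever you need; first() returns null if nothing matches
    Element price = doc.select("[itemprop=price]").first();
}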
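Since the page markup can change without notice, it may help to check the extracted values before trusting them. The following is a minimal sketch using only the JDK; the class name ExtractionCheck and the field names "title" and "price" are hypothetical, and the map stands in for whatever your parser extracted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ExtractionCheck {
    /** Returns the names of required fields that are missing or blank. */
    static List<String> missingFields(Map<String, String> extracted, List<String> required) {
        List<String> missing = new ArrayList<>();
        for (String name : required) {
            String value = extracted.get(name);
            if (value == null || value.trim().isEmpty()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<String, String> extracted = new LinkedHashMap<>();
        extracted.put("title", "Some App");
        extracted.put("price", null); // e.g. the selector matched nothing

        List<String> missing = missingFields(extracted, List.of("title", "price"));
        if (!missing.isEmpty()) {
            // Fail loudly instead of silently storing incomplete metadata.
            System.err.println("Page layout may have changed; missing: " + missing);
        }
    }
}
```

Running the same check in a scheduled test against a few known app pages is one way to notice a layout change early.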
For that very simple use case that should get you started. However, the moment you want to do more interesting stuff, things get complicated:
Search is disallowed in robots.txt.
Keeping app metadata up to date is hard. There are more than 2.2 million apps; refreshing their metadata daily means 2.2 million requests per day, which will 1) get you blocked immediately and 2) cost a lot of money: a pessimistic estimate is 220 GB of data transfer per day if one app page is 100 KB.
How do you discover new apps?
How do you get pricing in each country and translations for each language?
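To make the scale of a daily refresh concrete, here is the arithmetic as a small sketch. It uses the figures from this answer (2.2 million apps, 100 KB per page, counted as 100,000 bytes); the class and method names are hypothetical.

```java
class RefreshCost {
    static final long SECONDS_PER_DAY = 86_400L;

    /** Average request rate needed to fetch every app page once per day. */
    static double requestsPerSecond(long apps) {
        return apps / (double) SECONDS_PER_DAY;
    }

    /** Daily transfer in gigabytes (1 GB = 10^9 bytes) for one fetch per app. */
    static double gigabytesPerDay(long apps, long bytesPerPage) {
        return apps * (double) bytesPerPage / 1e9;
    }

    public static void main(String[] args) {
        long apps = 2_200_000L;       // "more than 2.2 million apps"
        long bytesPerPage = 100_000L; // pessimistic 100 KB per app page

        System.out.printf("requests/second: %.1f%n", requestsPerSecond(apps));
        System.out.printf("GB/day: %.0f%n", gigabytesPerDay(apps, bytesPerPage));
    }
}
```

That works out to roughly 25 requests per second around the clock, and 220 GB per day before compression, which is why throttling and gzip matter so much at this scale.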
The list goes on. If you don't want to do all this yourself, you can consider the 42matters API, which supports lookup and search, top Google charts, and advanced queries and filters, for 35 languages and more than 50 countries.