What is the quickest way to HTTP GET in Python?

Best answer

Python 3:

import urllib.request
contents = urllib.request.urlopen("http://example.com/foo/bar").read()

Python 2:

import urllib2
contents = urllib2.urlopen("http://example.com/foo/bar").read()

Documentation for urllib.request and read.
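Note that in Python 3 read() returns bytes, not str. A minimal sketch of getting text instead, assuming the server declares its charset in the Content-Type header. The helper name fetch_text and the UTF-8 fallback are my own choices for illustration, not part of urllib:

```python
import urllib.request


def fetch_text(url, timeout=10, fallback_encoding="utf-8"):
    """GET a URL and return the body as str (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        raw = resp.read()  # bytes in Python 3
        # Use the charset from the Content-Type header when present.
        # Falling back to UTF-8 is an assumption, not a urllib default.
        charset = resp.headers.get_content_charset() or fallback_encoding
    return raw.decode(charset)


# Usage (needs network access):
# text = fetch_text("http://example.com/foo/bar")
```

Passing a timeout is optional, but without one a stalled server can block the call indefinitely.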

Answer

Take a look at httplib2, which, next to a lot of very useful features, provides exactly what you want.

import httplib2


resp, content = httplib2.Http().request("http://example.com/foo/bar")

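Where content would be the response body (as a string), and resp would contain the status and response headers.

It isn't included in a standard Python install (though it only requires standard Python), but it's definitely worth checking out.

Answer

If you want the httplib2 solution to be a one-liner, consider instantiating an anonymous Http object: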

import httplib2
resp, content = httplib2.Http().request("http://example.com/foo/bar")

Answer

Here is a wget script in Python:

# From python cookbook, 2nd edition, page 487
# Python 2 only (print statement, urllib.urlretrieve)
import sys, urllib


def reporthook(a, b, c):
    # a: blocks transferred so far, b: block size, c: total size in bytes
    print "% 3.1f%% of %d bytes\r" % (min(100, float(a * b) / c * 100), c),

for url in sys.argv[1:]:
    i = url.rfind("/")
    file = url[i+1:]
    print url, "->", file
    urllib.urlretrieve(url, file, reporthook)
print

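Answer

theller's wget solution is really useful; however, I found that it does not print the progress throughout the download. It works perfectly if you add one line after the print statement in reporthook.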

import sys, urllib


def reporthook(a, b, c):
    print "% 3.1f%% of %d bytes\r" % (min(100, float(a * b) / c * 100), c),
    sys.stdout.flush()

for url in sys.argv[1:]:
    i = url.rfind("/")
    file = url[i+1:]
    print url, "->", file
    urllib.urlretrieve(url, file, reporthook)
print

Answer

Use the Requests library:

import requests
r = requests.get("http://example.com/foo/bar")

Then you can do stuff like this:

>>> print(r.status_code)
>>> print(r.headers)
>>> print(r.content)  # bytes
>>> print(r.text)     # r.content as str

Install Requests by running:

pip install requests

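Answer

If you are working with HTTP APIs specifically, there are also more convenient choices such as Nap.

For example, here's how to get gists from Github created since May 1st, 2014: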

from nap.url import Url
api = Url('https://api.github.com')


gists = api.join('gists')
response = gists.get(params={'since': '2014-05-01T00:00:00Z'})
print(response.json())

Answer

Excellent solutions, Xuan and Theller.

For it to work with Python 3, make the following changes:

import sys, urllib.request


def reporthook(a, b, c):
    # end="" keeps the cursor on the same line so "\r" can overwrite the progress
    print("% 3.1f%% of %d bytes\r" % (min(100, float(a * b) / c * 100), c), end="")
    sys.stdout.flush()

for url in sys.argv[1:]:
    i = url.rfind("/")
    file = url[i+1:]
    print(url, "->", file)
    urllib.request.urlretrieve(url, file, reporthook)
print()

Also, the URL(s) you enter should be preceded by "http://", otherwise it returns an unknown url type error.
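The Python 3 docs describe urlretrieve as a legacy interface, so here is a rough sketch of the same idea built on urlopen, reading the response in chunks. The function name download, the chunk size and the progress format are my own assumptions, not taken from the cookbook script:

```python
import urllib.request


def download(url, path, chunk_size=64 * 1024, timeout=30):
    """Stream a URL to a file and return the number of bytes written."""
    written = 0
    with urllib.request.urlopen(url, timeout=timeout) as resp, \
            open(path, "wb") as out:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:  # empty bytes means end of the response body
                break
            out.write(chunk)
            written += len(chunk)
            # "\r" moves back to the line start so the counter overwrites itself
            print("\r%d bytes" % written, end="", flush=True)
    print()  # finish the progress line
    return written


# Usage (needs network access):
# download("http://example.com/foo/bar", "bar")
```

Reading in chunks keeps memory use flat for large files, unlike a single read().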

Answer

This solution works (for me) without any further imports, and it also works with https:

try:
    import urllib2 as urlreq  # Python 2.x
except ImportError:
    import urllib.request as urlreq  # Python 3.x
req = urlreq.Request("http://example.com/foo/bar")
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36')
urlreq.urlopen(req).read()

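I often had difficulty grabbing content when no "User-Agent" was specified in the header info. The request is then usually rejected with something like urllib2.HTTPError: HTTP Error 403: Forbidden or urllib.error.HTTPError: HTTP Error 403: Forbidden.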
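If you would rather inspect such a 403 than let it propagate, something along these lines might help. It is a Python 3 only sketch, and make_request, fetch and the "my-script/0.1" User-Agent value are placeholders I made up for illustration:

```python
import urllib.error
import urllib.request


def make_request(url, user_agent="my-script/0.1"):
    """Build a GET request with an explicit User-Agent (placeholder value)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, timeout=10):
    """Return (status, body). HTTP error responses are returned, not raised."""
    try:
        with urllib.request.urlopen(make_request(url), timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # HTTPError doubles as a response object: it carries a code and a body
        return err.code, err.read()


# Usage (needs network access):
# status, body = fetch("http://example.com/foo/bar")
```

Network-level failures (DNS errors, refused connections) raise urllib.error.URLError and are deliberately left uncaught here.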

Answer

How to also send headers

Python 3:

import urllib.request
contents = urllib.request.urlopen(urllib.request.Request(
    "https://api.github.com/repos/cirosantilli/linux-kernel-module-cheat/releases/latest",
    headers={"Accept": "application/vnd.github.full+json, text/html"}
)).read()
print(contents)

Python 2:

import urllib2
contents = urllib2.urlopen(urllib2.Request(
    "https://api.github.com",
    headers={"Accept": "application/vnd.github.full+json, text/html"}
)).read()
print(contents)

Answer

Using the powerful urllib3 library is simple.

Import it like this:

import urllib3


http = urllib3.PoolManager()

And make a request like this:

response = http.request('GET', 'https://example.com')


print(response.data) # Raw data.
print(response.data.decode('utf-8')) # Text.
print(response.status) # Status code.
print(response.headers['Content-Type']) # Content type.

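You can add headers too:

response = http.request('GET', 'https://example.com', headers={
    'key1': 'value1',
    'key2': 'value2'
})

More info can be found in the urllib3 documentation.

urllib3 is much safer and easier to use than the builtin urllib.request or http modules, and it is stable.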

Answer

Actually, in Python we can read from HTTP responses just like from files. Here is an example that reads JSON from an API.

import json
from urllib.request import urlopen


def get_some_key(url):
    # Wrapped in a function (name is illustrative) so the return is valid
    with urlopen(url) as f:
        resp = json.load(f)

    return resp['some_key']
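To try the file-like behaviour without network access, one option is a data: URL, which urllib.request can open on its own. This is only a sketch: it assumes Python 3.6 or later, where json.load accepts a binary file object, and the load_json name and the payload are invented for the demo:

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def load_json(url, timeout=10):
    """Open a URL and parse the response body as JSON (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as f:
        # The response object has read(), so json.load treats it like a file
        return json.load(f)


# A data: URL stands in for a real API endpoint so this runs offline
payload = {"some_key": 42}
url = "data:application/json," + quote(json.dumps(payload))
print(load_json(url)["some_key"])  # prints 42
```

Swapping the data: URL for an http or https one should work the same way, as long as the server returns valid JSON.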

Answer

For Python >= 3.6, you can use dload:

import dload
t = dload.text(url)

For JSON:

j = dload.json(url)

To install:
pip install dload

Answer

If you want a lower-level API:

import http.client


conn = http.client.HTTPSConnection('example.com')
conn.request('GET', '/')


resp = conn.getresponse()
content = resp.read()


conn.close()


text = content.decode('utf-8')


print(text)