Python urllib2、基本 HTTP 身份验证和 tr.im

我正在尝试编写一些代码来使用 < a href = “ http://www.Programableweb.com/api/tr.im”rel = “ norefrer”> tr.im 缩短 URL 的 API。

在阅读了 http://docs.python.org/library/urllib2.html之后,我试着:

   TRIM_API_URL = 'http://api.tr.im/api'
auth_handler = urllib2.HTTPBasicAuthHandler()
auth_handler.add_password(realm='tr.im',
uri=TRIM_API_URL,
user=USERNAME,
passwd=PASSWORD)
opener = urllib2.build_opener(auth_handler)
urllib2.install_opener(opener)
response = urllib2.urlopen('%s/trim_simple?url=%s'
% (TRIM_API_URL, url_to_trim))
url = response.read().strip()

Code 是200(我认为应该是202) . url 是有效的,但是 基本的 HTTP 身份验证似乎不起作用,因为 缩短的 URL 不在我的 URL 列表中(在 http://tr.im/?page=1)。

在阅读了 < a href = “ http://www.voidspace.org.uk/python/article/entication.shtml # doing-it-”rel = “ norefrer”> http://www.voidspace.org.uk/python/articles/authentication.shtml#doing-it-properly 之后 我也试过:

   TRIM_API_URL = 'api.tr.im/api'
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, TRIM_API_URL, USERNAME, PASSWORD)
auth_handler = urllib2.HTTPBasicAuthHandler(password_mgr)
opener = urllib2.build_opener(auth_handler)
urllib2.install_opener(opener)
response = urllib2.urlopen('http://%s/trim_simple?url=%s'
% (TRIM_API_URL, url_to_trim))
url = response.read().strip()

但是我得到了相同的结果。(response. code 是200,url 是有效的, 但没有记录在我在 http://tr.im/的帐户上。)

如果我使用查询字符串参数而不是基本的 HTTP 身份验证, 像这样:

   TRIM_API_URL = 'http://api.tr.im/api'
response = urllib2.urlopen('%s/trim_simple?url=%s&username=%s&password=%s'
% (TRIM_API_URL,
url_to_trim,
USERNAME,
PASSWORD))
url = response.read().strip()

... 那么不仅是网址有效,而且它记录在我的装修帐户。 (虽然 response. code 仍然是200。)

不过,我的代码(而不是 tr.im 的 API)肯定有问题,因为

$ curl -u yacitus:xxxx http://api.tr.im/api/trim_url.json?url=http://www.google.co.uk

返回:

{"trimpath":"hfhb","reference":"nH45bftZDWOX0QpVojeDbOvPDnaRaJ","trimmed":"11\/03\/2009","destination":"http:\/\/www.google.co.uk\/","trim_path":"hfhb","domain":"google.co.uk","url":"http:\/\/tr.im\/hfhb","visits":0,"status":{"result":"OK","code":"200","message":"tr.im URL Added."},"date_time":"2009-03-11T10:15:35-04:00"}

... 并且 URL 确实出现在 http://tr.im/?page=1上我的 URL 列表中。

如果我逃跑:

$ curl -u yacitus:xxxx http://api.tr.im/api/trim_url.json?url=http://www.google.co.uk

... 再一次,我得到:

{"trimpath":"hfhb","reference":"nH45bftZDWOX0QpVojeDbOvPDnaRaJ","trimmed":"11\/03\/2009","destination":"http:\/\/www.google.co.uk\/","trim_path":"hfhb","domain":"google.co.uk","url":"http:\/\/tr.im\/hfhb","visits":0,"status":{"result":"OK","code":"201","message":"tr.im URL Already Created [yacitus]."},"date_time":"2009-03-11T10:15:35-04:00"}

注意代码是201,消息是“ tr.im URL Already Created [ yacitus ]”

我肯定没有正确地执行基本的 HTTP 身份验证(无论哪种尝试)。你能发现我的问题吗?也许我应该看看电报里都传了些什么?我以前从没这么做过。有没有我可以使用的 Python API (可能在 pdb 中) ?或者我可以使用其他工具(最好是 Mac OS X) ?

130227 次浏览

Really cheap solution:

urllib.urlopen('http://user:xxxx@api.tr.im/api')

(which you may decide is not suitable for a number of reasons, like security of the url)

Github API example:

>>> import urllib, json
>>> result = urllib.urlopen('https://personal-access-token:x-oauth-basic@api.github.com/repos/:owner/:repo')
>>> r = json.load(result.fp)
>>> result.close()

This seems to work really well (taken from another thread)

import urllib2, base64


request = urllib2.Request("http://api.foursquare.com/v1/user")
base64string = base64.encodestring('%s:%s' % (username, password)).replace('\n', '')
request.add_header("Authorization", "Basic %s" % base64string)
result = urllib2.urlopen(request)

Take a look at this SO post answer and also look at this basic authentication tutorial from the urllib2 missing manual.

In order for urllib2 basic authentication to work, the http response must contain HTTP code 401 Unauthorized and a key "WWW-Authenticate" with the value "Basic" otherwise, Python won't send your login info, and you will need to either use Requests, or urllib.urlopen(url) with your login in the url, or add a the header like in @Flowpoke's answer.

You can view your error by putting your urlopen in a try block:

try:
urllib2.urlopen(urllib2.Request(url))
except urllib2.HTTPError, e:
print e.headers
print e.headers.has_key('WWW-Authenticate')

Same solutions as Python urllib2 Basic Auth Problem apply.

see https://stackoverflow.com/a/24048852/1733117; you can subclass urllib2.HTTPBasicAuthHandler to add the Authorization header to each request that matches the known url.

class PreemptiveBasicAuthHandler(urllib2.HTTPBasicAuthHandler):
'''Preemptive basic auth.


Instead of waiting for a 403 to then retry with the credentials,
send the credentials if the url is handled by the password manager.
Note: please use realm=None when calling add_password.'''
def http_request(self, req):
url = req.get_full_url()
realm = None
# this is very similar to the code from retry_http_basic_auth()
# but returns a request object.
user, pw = self.passwd.find_user_password(realm, url)
if pw:
raw = "%s:%s" % (user, pw)
auth = 'Basic %s' % base64.b64encode(raw).strip()
req.add_unredirected_header(self.auth_header, auth)
return req


https_request = http_request

I would suggest that the current solution is to use my package urllib2_prior_auth which solves this pretty nicely (I work on inclusion to the standard lib.

The recommended way is to use requests module:

#!/usr/bin/env python
import requests # $ python -m pip install requests
####from pip._vendor import requests # bundled with python


url = 'https://httpbin.org/hidden-basic-auth/user/passwd'
user, password = 'user', 'passwd'


r = requests.get(url, auth=(user, password)) # send auth unconditionally
r.raise_for_status() # raise an exception if the authentication fails

Here's a single source Python 2/3 compatible urllib2-based variant:

#!/usr/bin/env python
import base64
try:
from urllib.request import Request, urlopen
except ImportError: # Python 2
from urllib2 import Request, urlopen


credentials = '{user}:{password}'.format(**vars()).encode()
urlopen(Request(url, headers={'Authorization': # send auth unconditionally
b'Basic ' + base64.b64encode(credentials)})).close()

Python 3.5+ introduces HTTPPasswordMgrWithPriorAuth() that allows:

..to eliminate unnecessary 401 response handling, or to unconditionally send credentials on the first request in order to communicate with servers that return a 404 response instead of a 401 if the Authorization header is not sent..

#!/usr/bin/env python3
import urllib.request as urllib2


password_manager = urllib2.HTTPPasswordMgrWithPriorAuth()
password_manager.add_password(None, url, user, password,
is_authenticated=True) # to handle 404 variant
auth_manager = urllib2.HTTPBasicAuthHandler(password_manager)
opener = urllib2.build_opener(auth_manager)


opener.open(url).close()

It is easy to replace HTTPBasicAuthHandler() with ProxyBasicAuthHandler() if necessary in this case.