Add parameters to a given URL in Python

Suppose I am given a URL.
It might already have GET parameters (e.g. http://example.com/search?q=question) or it might not (e.g. http://example.com/).

Now I need to add some parameters to it, such as {'lang':'en','tag':'python'}. In the first case I should end up with http://example.com/search?q=question&lang=en&tag=python, and in the second with http://example.com/search?lang=en&tag=python

Is there a standard way to do this?


Yes: use urllib.

From the examples in the documentation:

>>> import urllib
>>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query?%s" % params)
>>> print f.geturl() # Prints the final URL with parameters.
>>> print f.read() # Prints the contents

Use the various urlparse functions to split the existing URL apart, call urllib.urlencode() on the combined dictionary, then use urlparse.urlunparse() to put it all back together.

Or just take the result of urllib.urlencode() and concatenate it to the URL appropriately.

If the strings can contain arbitrary data, you need URL encoding (for example, characters such as ampersands and slashes have to be encoded).
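To illustrate the point about encoding, here is a minimal Python 3 sketch. The sample values are made up for the example. It shows how urlencode() escapes an ampersand and a slash, and that parse_qsl() recovers the original values:

```python
from urllib.parse import urlencode, parse_qsl

# Hypothetical values containing characters that are special in URLs.
params = {'q': 'fish & chips', 'path': 'a/b'}

# urlencode() percent-encodes '&' and '/', and turns spaces into '+'.
query = urlencode(params)
print(query)  # q=fish+%26+chips&path=a%2Fb

# Decoding the query string gives back the original values.
decoded = dict(parse_qsl(query))
print(decoded == params)  # True
```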

Check out urllib.urlencode:

>>> import urllib
>>> urllib.urlencode({'lang':'en','tag':'python'})
'lang=en&tag=python'

In Python 3:

from urllib import parse
parse.urlencode({'lang':'en','tag':'python'})

The urllib and urlparse modules have a couple of quirks. Here is a working example:

try:
    import urlparse
    from urllib import urlencode
except ImportError:  # For Python 3
    import urllib.parse as urlparse
    from urllib.parse import urlencode


url = "http://stackoverflow.com/search?q=question"
params = {'lang':'en','tag':'python'}


url_parts = list(urlparse.urlparse(url))
query = dict(urlparse.parse_qsl(url_parts[4]))
query.update(params)


url_parts[4] = urlencode(query)


print(urlparse.urlunparse(url_parts))

ParseResult, the result of urlparse(), is read-only, so we need to convert it to a list before we can modify its data.
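A small Python 3 sketch of that read-only behaviour, using the same sample URL as above. It also shows _replace(), which named tuples provide for building a modified copy, as a possible alternative to the list conversion:

```python
from urllib.parse import urlparse

parsed = urlparse('http://stackoverflow.com/search?q=question')

# ParseResult is a named tuple, so its fields cannot be reassigned.
try:
    parsed.query = 'q=other'
    assignment_failed = False
except AttributeError:
    assignment_failed = True

# _replace() leaves the original untouched and returns a modified copy.
updated = parsed._replace(query='q=question&lang=en')
new_url = updated.geturl()

print(assignment_failed)  # True
print(new_url)            # http://stackoverflow.com/search?q=question&lang=en
```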

In Python 2.5:

import cgi
import urllib
import urlparse


def add_url_param(url, **params):
    n = 3  # index of the query component in the urlsplit() result
    parts = list(urlparse.urlsplit(url))
    d = dict(cgi.parse_qsl(parts[n]))  # use cgi.parse_qs for list values
    d.update(params)
    parts[n] = urllib.urlencode(d)
    return urlparse.urlunsplit(parts)


url = "http://stackoverflow.com/search?q=question"
add_url_param(url, lang='en') == "http://stackoverflow.com/search?q=question&lang=en"

I like Łukasz's version, but since the urllib and urlparse functions are somewhat awkward to use in this case, I think something like this is more straightforward:

params = urllib.urlencode(params)


if urlparse.urlparse(url)[4]:
    print url + '&' + params
else:
    print url + '?' + params
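One caveat that may matter with plain concatenation: if the URL carries a fragment (#...), the appended parameters end up inside the fragment rather than in the query. A hedged Python 3 sketch, using a made-up URL, compares concatenation with rebuilding the URL through urlparse():

```python
from urllib.parse import urlencode, urlparse

# Hypothetical URL with a fragment but no query string.
url = 'http://example.com/page#top'
params = urlencode({'lang': 'en'})

# Plain concatenation appends after the fragment.
naive = url + ('&' if urlparse(url).query else '?') + params
print(naive)                     # http://example.com/page#top?lang=en
print(urlparse(naive).query)     # (empty string)
print(urlparse(naive).fragment)  # top?lang=en

# Rebuilding from the parsed parts keeps the query before the fragment.
rebuilt = urlparse(url)._replace(query=params).geturl()
print(rebuilt)                   # http://example.com/page?lang=en#top
```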

Here is how I implemented it.

import urllib


params = urllib.urlencode({'lang':'en','tag':'python'})
url = ''
if request.GET:
    url = request.url + '&' + params
else:
    url = request.url + '?' + params

It works like a charm. However, I would have liked a cleaner way to implement this.

Another way of implementing the above is to put it in a function.

import urllib


def add_url_param(request, **params):
    new_url = ''
    _params = dict(**params)
    _params = urllib.urlencode(_params)

    if _params:
        if request.GET:
            new_url = request.url + '&' + _params
        else:
            new_url = request.url + '?' + _params
    else:
        new_url = request.url

    return new_url

Yet another answer:

import urllib
import urlparse


def addGetParameters(url, newParams):
    (scheme, netloc, path, params, query, fragment) = urlparse.urlparse(url)
    # Keep the query as a list of pairs so repeated keys and blank values survive
    queryList = urlparse.parse_qsl(query, keep_blank_values=True)
    for key in newParams:
        queryList.append((key, newParams[key]))
    return urlparse.urlunparse((scheme, netloc, path, params, urllib.urlencode(queryList), fragment))
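The function above keeps the query as a list of pairs instead of converting it to a dict. A short Python 3 sketch of why that can matter, assuming a made-up query string with a repeated key:

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical query string in which the key 'tag' appears twice.
query = 'tag=python&tag=url'

# As a list of pairs, both occurrences are kept.
pairs = parse_qsl(query, keep_blank_values=True)
pairs.append(('lang', 'en'))
from_pairs = urlencode(pairs)
print(from_pairs)  # tag=python&tag=url&lang=en

# Going through dict() keeps only the last value for each key.
as_dict = dict(parse_qsl(query, keep_blank_values=True))
as_dict['lang'] = 'en'
from_dict = urlencode(as_dict)
print(from_dict)   # tag=url&lang=en
```

Which behaviour is preferable depends on whether the target endpoint expects repeated keys.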

You can also use the furl module: https://github.com/gruns/furl

>>> from furl import furl
>>> print furl('http://example.com/search?q=question').add({'lang':'en','tag':'python'}).url
http://example.com/search?q=question&lang=en&tag=python

Why

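I was not satisfied with any of the solutions on this page (come on, where is our favorite copy-paste thing?), so I wrote my own based on the answers here. It tries to be more complete and more Pythonic. I have added a handler for dict and bool values in the arguments to be friendlier to the consumer side (JS), but they are optional and you can drop them.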

How it works

Test 1: adding new arguments, handling array and bool values:

url = 'http://stackoverflow.com/test'
new_params = {'answers': False, 'data': ['some','values']}


add_url_params(url, new_params) == \
'http://stackoverflow.com/test?data=some&data=values&answers=false'

Test 2: overwriting existing arguments, handling dict values:

url = 'http://stackoverflow.com/test/?question=false'
new_params = {'question': {'__X__':'__Y__'}}


add_url_params(url, new_params) == \
'http://stackoverflow.com/test/?question=%7B%22__X__%22%3A+%22__Y__%22%7D'

Talk is cheap. Show me the code.

The code itself. I have tried to describe it in detail:

from json import dumps


try:
    from urllib import urlencode, unquote
    from urlparse import urlparse, parse_qsl, ParseResult
except ImportError:
    # Python 3 fallback
    from urllib.parse import (
        urlencode, unquote, urlparse, parse_qsl, ParseResult
    )




def add_url_params(url, params):
    """ Add GET params to provided URL being aware of existing.

    :param url: string of target URL
    :param params: dict containing requested params to be added
    :return: string with updated URL

    >> url = 'http://stackoverflow.com/test?answers=true'
    >> new_params = {'answers': False, 'data': ['some','values']}
    >> add_url_params(url, new_params)
    'http://stackoverflow.com/test?data=some&data=values&answers=false'
    """
    # Unquoting URL first so we don't lose existing args
    url = unquote(url)
    # Extracting url info
    parsed_url = urlparse(url)
    # Extracting URL arguments from parsed URL
    get_args = parsed_url.query
    # Converting URL arguments to dict
    parsed_get_args = dict(parse_qsl(get_args))
    # Merging URL arguments dict with new params
    parsed_get_args.update(params)

    # Bool and Dict values should be converted to json-friendly values
    # you may throw this part away if you don't like it :)
    parsed_get_args.update(
        {k: dumps(v) for k, v in parsed_get_args.items()
         if isinstance(v, (bool, dict))}
    )

    # Converting URL argument to proper query string
    encoded_get_args = urlencode(parsed_get_args, doseq=True)
    # Creating new parsed result object based on provided with new
    # URL arguments. Same thing happens inside of urlparse.
    new_url = ParseResult(
        parsed_url.scheme, parsed_url.netloc, parsed_url.path,
        parsed_url.params, encoded_get_args, parsed_url.fragment
    ).geturl()

    return new_url

Please be aware that there may be some issues. If you find one, please let me know and we will make this thing better.
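One such issue may be the unquote() call at the start. The following Python 3 sketch, with a made-up URL whose value contains an encoded ampersand, suggests that unquoting the whole URL before parsing can split a value in two:

```python
from urllib.parse import parse_qsl, unquote, urlparse

# Hypothetical URL whose parameter value contains an encoded '&' (%26).
url = 'http://example.com/search?q=a%26b'

# Parsing the query directly decodes the value correctly.
direct = parse_qsl(urlparse(url).query)
print(direct)          # [('q', 'a&b')]

# Unquoting first turns %26 into a literal '&', which then acts as a
# separator; the leftover 'b' has no value and is dropped by default.
unquoted_first = parse_qsl(urlparse(unquote(url)).query)
print(unquoted_first)  # [('q', 'a')]
```

Since parse_qsl() already decodes percent-escapes, skipping the unquote() step is one possible way around this, assuming the input URL is properly encoded.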

Based on this answer, a one-liner for simple cases (Python 3 code):

from urllib.parse import urlparse, urlencode




url = "https://stackoverflow.com/search?q=question"
params = {'lang':'en','tag':'python'}


url += ('&' if urlparse(url).query else '?') + urlencode(params)

Or:

url += ('&', '?')[urlparse(url).query == ''] + urlencode(params)

If you are using the requests lib:

import requests
...
params = {'tag': 'python'}
requests.get(url, params=params)

Outsource it to the battle-tested requests library.

This is how I would do it:

from requests.models import PreparedRequest
url = 'http://example.com/search?q=question'
params = {'lang':'en','tag':'python'}
req = PreparedRequest()
req.prepare_url(url, params)
print(req.url)

I find this more elegant than the two top answers:

from urllib.parse import urlencode, urlparse, parse_qs


def merge_url_query_params(url: str, additional_params: dict) -> str:
    url_components = urlparse(url)
    original_params = parse_qs(url_components.query)
    # Before Python 3.5 you could update original_params with
    # additional_params, but here all the variables are immutable.
    merged_params = {**original_params, **additional_params}
    updated_query = urlencode(merged_params, doseq=True)
    # _replace() is how you can create a new NamedTuple with a changed field
    return url_components._replace(query=updated_query).geturl()


assert merge_url_query_params(
    'http://example.com/search?q=question',
    {'lang':'en','tag':'python'},
) == 'http://example.com/search?q=question&lang=en&tag=python'

The main things I don't like in the top answers (they are nevertheless good):

  • Łukasz: having to remember the index at which query sits in the URL components
  • Saphire64: the very verbose way of creating the updated ParseResult

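What is bad about my answer is the magic-looking dict merge using unpacking, but I prefer that to updating an already existing dictionary because of my prejudice against mutability.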
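For readers unfamiliar with the unpacking merge, here is a hedged Python 3 sketch with made-up parameters. It shows that parse_qs() returns list values, that keys from the second dict override the first, and that neither input is modified:

```python
from urllib.parse import parse_qs, urlencode

# parse_qs() maps each key to a list of values.
original = parse_qs('q=question&tag=python')
print(original)  # {'q': ['question'], 'tag': ['python']}

# Hypothetical new parameters; 'tag' overlaps with an existing key.
extra = {'tag': 'url', 'lang': 'en'}

# Unpacking builds a new dict; on overlap the right-hand value wins.
merged = {**original, **extra}
merged_query = urlencode(merged, doseq=True)
print(merged_query)  # q=question&tag=url&lang=en

# The inputs are left as they were.
print(original['tag'])  # ['python']
```

doseq=True is needed here so that the list values from parse_qs() are expanded into individual key=value pairs.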

Python 3, self-explanatory I guess:

from urllib.parse import urlparse, urlencode, parse_qsl


url = 'https://www.linkedin.com/jobs/search?keywords=engineer'


parsed = urlparse(url)
current_params = dict(parse_qsl(parsed.query))
new_params = {'location': 'United States'}
merged_params = urlencode({**current_params, **new_params})
parsed = parsed._replace(query=merged_params)


print(parsed.geturl())
# https://www.linkedin.com/jobs/search?keywords=engineer&location=United+States