记录来自 python-request 模块的所有请求

我使用的是 python 请求。我需要调试一些 OAuth活动,为此我希望它能记录所有正在执行的请求。我可以通过 ngrep获得这些信息,但不幸的是,不可能对 https 连接进行 grep (这是 OAuth所需要的)

如何激活 Requests访问的所有 URL (+ 参数)的日志记录?

163782 次浏览

底层的 urllib3库记录所有与 logging模块相关的新连接和 URL,但不记录 POST主体。对于 GET请求,这应该足够了:

import logging


logging.basicConfig(level=logging.DEBUG)

它提供了最详细的日志记录选项; 有关如何配置日志记录级别和目标的详细信息,请参阅 伐木

简短演示:

>>> import requests
>>> import logging
>>> logging.basicConfig(level=logging.DEBUG)
>>> r = requests.get('http://httpbin.org/get?foo=bar&baz=python')
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): httpbin.org:80
DEBUG:urllib3.connectionpool:http://httpbin.org:80 "GET /get?foo=bar&baz=python HTTP/1.1" 200 366

根据 urllib3的确切版本,会记录下列消息:

  • 重定向
  • WARN: 连接池已满(如果发生这种情况,通常会增加连接池的大小)
  • WARN: 未能解析报头(格式无效的响应报头)
  • 重试连接
  • 证书与预期的主机名不匹配
  • WARN: 在处理分块响应时,接收到同时具有 Content-Llength 和 Transfer-Encoding 的响应
  • DEBUG: 新连接(HTTP 或 HTTPS)
  • 连接中断
  • 连接细节: 方法、路径、 HTTP 版本、状态代码和响应长度
  • DEBUG: 重试计数增量

这不包括头部和身体。urllib3使用 http.client.HTTPConnection类来完成这些繁琐的工作,但是这个类不支持日志记录,它通常只能配置为 打印到 stdout。但是,您可以通过在模块中引入另一个 print名称来操纵它将所有调试信息发送到日志记录:

import logging
import http.client


httpclient_logger = logging.getLogger("http.client")


def httpclient_logging_patch(level=logging.DEBUG):
"""Enable HTTPConnection debug logging to the logging framework"""


def httpclient_log(*args):
httpclient_logger.log(level, " ".join(args))


# mask the print() built-in in the http.client module to use
# logging instead
http.client.print = httpclient_log
# enable debugging
http.client.HTTPConnection.debuglevel = 1

调用 httpclient_logging_patch()导致 http.client连接将所有调试信息输出到标准日志记录器,因此由 logging.basicConfig()接收:

>>> httpclient_logging_patch()
>>> r = requests.get('http://httpbin.org/get?foo=bar&baz=python')
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): httpbin.org:80
DEBUG:http.client:send: b'GET /get?foo=bar&baz=python HTTP/1.1\r\nHost: httpbin.org\r\nUser-Agent: python-requests/2.22.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\n\r\n'
DEBUG:http.client:reply: 'HTTP/1.1 200 OK\r\n'
DEBUG:http.client:header: Date: Tue, 04 Feb 2020 13:36:53 GMT
DEBUG:http.client:header: Content-Type: application/json
DEBUG:http.client:header: Content-Length: 366
DEBUG:http.client:header: Connection: keep-alive
DEBUG:http.client:header: Server: gunicorn/19.9.0
DEBUG:http.client:header: Access-Control-Allow-Origin: *
DEBUG:http.client:header: Access-Control-Allow-Credentials: true
DEBUG:urllib3.connectionpool:http://httpbin.org:80 "GET /get?foo=bar&baz=python HTTP/1.1" 200 366

您需要在 httplib级别(requests & rarr; urllib3 & rarr; httplib)启用调试。

下面是一些可以同时切换(..._on()..._off())或暂时打开的函数:

import logging
import contextlib
try:
from http.client import HTTPConnection # py3
except ImportError:
from httplib import HTTPConnection # py2


def debug_requests_on():
'''Switches on logging of the requests module.'''
HTTPConnection.debuglevel = 1


logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
requests_log = logging.getLogger("requests.packages.urllib3")
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True


def debug_requests_off():
'''Switches off logging of the requests module, might be some side-effects'''
HTTPConnection.debuglevel = 0


root_logger = logging.getLogger()
root_logger.setLevel(logging.WARNING)
root_logger.handlers = []
requests_log = logging.getLogger("requests.packages.urllib3")
requests_log.setLevel(logging.WARNING)
requests_log.propagate = False


@contextlib.contextmanager
def debug_requests():
'''Use with 'with'!'''
debug_requests_on()
yield
debug_requests_off()

演示用途:

>>> requests.get('http://httpbin.org/')
<Response [200]>


>>> debug_requests_on()
>>> requests.get('http://httpbin.org/')
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): httpbin.org
DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 12150
send: 'GET / HTTP/1.1\r\nHost: httpbin.org\r\nConnection: keep-alive\r\nAccept-
Encoding: gzip, deflate\r\nAccept: */*\r\nUser-Agent: python-requests/2.11.1\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Server: nginx
...
<Response [200]>


>>> debug_requests_off()
>>> requests.get('http://httpbin.org/')
<Response [200]>


>>> with debug_requests():
...     requests.get('http://httpbin.org/')
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): httpbin.org
...
<Response [200]>

您将看到请求,包括头和数据,以及带头但不带数据的响应。唯一缺少的是没有记录的 response. body。

来源

对于那些使用 python 3 + 的用户

import requests
import logging
import http.client


http.client.HTTPConnection.debuglevel = 1


logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
requests_log = logging.getLogger("requests.packages.urllib3")
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True

我使用的是 python 3.4,请求2.19.1:

‘ urllib3’是现在要获得的日志记录器(不再是‘ requests.packages.urllib3’)

当尝试让 Python 日志系统(import logging)发出低级别的调试日志消息时,我惊讶地发现:

requests --> urllib3 --> http.client.HTTPConnection

实际上只有 urllib3使用 Python logging系统:

  • requests 没有
  • http.client.HTTPConnection 没有
  • urllib3 是的

当然,您可以通过设置以下命令从 HTTPConnection提取调试消息:

HTTPConnection.debuglevel = 1

但是这些输出仅通过 print语句发出。要证明这一点,只需要抓取 Python 3.7 client.py源代码并自己查看 print 语句(多谢@Yohann) :

curl https://raw.githubusercontent.com/python/cpython/3.7/Lib/http/client.py |grep -A1 debuglevel`

可以推测,以某种方式重定向 stdout 可以将 stdout 放入日志系统,并可能捕获到日志文件。

选择‘ urllib3’日志记录器,而不是‘ requests.packages.urllib3

为了通过 Python 3 logging系统捕获 urllib3调试信息,与互联网上的许多建议相反,正如@MikeSmith 指出的那样,你不会有太多机会拦截:

log = logging.getLogger('requests.packages.urllib3')

相反,你需要:

log = logging.getLogger('urllib3')

urllib3调试为日志文件

下面是一些使用 Python logging系统将 urllib3工作记录到日志文件的代码:

import requests
import logging
from http.client import HTTPConnection  # py3


# log = logging.getLogger('requests.packages.urllib3')  # useless
log = logging.getLogger('urllib3')  # works


log.setLevel(logging.DEBUG)  # needed
fh = logging.FileHandler("requests.log")
log.addHandler(fh)


requests.get('http://httpbin.org/')

结果是:

Starting new HTTP connection (1): httpbin.org:80
http://httpbin.org:80 "GET / HTTP/1.1" 200 3168

启用 HTTPConnection.debuglevel print ()语句

如果你设置 HTTPConnection.debuglevel = 1

from http.client import HTTPConnection  # py3
HTTPConnection.debuglevel = 1
requests.get('http://httpbin.org/')

你会得到 打印语句输出的额外多汁的低级信息:

send: b'GET / HTTP/1.1\r\nHost: httpbin.org\r\nUser-Agent: python-
requests/2.22.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Access-Control-Allow-Credentials header: Access-Control-Allow-Origin
header: Content-Encoding header: Content-Type header: Date header: ...

请记住,此输出使用 print而不是 Python logging系统,因此不能使用传统的 logging流或文件处理程序 (尽管可以通过重定向 stdout 将输出捕获到文件)捕获。

将以上两者结合起来——最大限度地将所有可能的日志记录到控制台

为了最大限度地利用所有可能的日志记录,您必须使用以下命令满足控制台/标准输出的要求:

import requests
import logging
from http.client import HTTPConnection  # py3


log = logging.getLogger('urllib3')
log.setLevel(logging.DEBUG)


# logging from urllib3 to console
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)
log.addHandler(ch)


# print statements from `http.client.HTTPConnection` to console/stdout
HTTPConnection.debuglevel = 1


requests.get('http://httpbin.org/')

提供全面的产出:

Starting new HTTP connection (1): httpbin.org:80
send: b'GET / HTTP/1.1\r\nHost: httpbin.org\r\nUser-Agent: python-requests/2.22.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
http://httpbin.org:80 "GET / HTTP/1.1" 200 3168
header: Access-Control-Allow-Credentials header: Access-Control-Allow-Origin
header: Content-Encoding header: ...

通过应用程序的脚本甚至子系统进行网络协议调试,我们希望了解请求-响应对到底是什么,包括有效的 URL、头、有效负载和状态。通常情况下,将个人请求分布在各个地方是不切实际的。同时,还有一些性能方面的考虑,建议使用单个(或少数几个专门的) requests.Session,因此下面假设遵循 建议

requests支持所谓的 活动挂钩(截至2.23,实际上只有 response挂钩)。它基本上是一个事件侦听器,在从 requests.request返回控制之前发出事件。此时,请求和响应都已完全定义,因此可以进行日志记录。

import logging


import requests




logger = logging.getLogger('httplogger')


def logRoundtrip(response, *args, **kwargs):
extra = {'req': response.request, 'res': response}
logger.debug('HTTP roundtrip', extra=extra)


session = requests.Session()
session.hooks['response'].append(logRoundtrip)

这基本上就是如何记录会话的所有 HTTP 往返过程。

格式化 HTTP 往返日志记录

为了使上面的日志记录有用,可以有专门的 日志格式化程序,它可以理解日志记录上的 reqres附加值。它可以是这样的:

import textwrap


class HttpFormatter(logging.Formatter):


def _formatHeaders(self, d):
return '\n'.join(f'{k}: {v}' for k, v in d.items())


def formatMessage(self, record):
result = super().formatMessage(record)
if record.name == 'httplogger':
result += textwrap.dedent('''
---------------- request ----------------
{req.method} {req.url}
{reqhdrs}


{req.body}
---------------- response ----------------
{res.status_code} {res.reason} {res.url}
{reshdrs}


{res.text}
''').format(
req=record.req,
res=record.res,
reqhdrs=self._formatHeaders(record.req.headers),
reshdrs=self._formatHeaders(record.res.headers),
)


return result


formatter = HttpFormatter('{asctime} {levelname} {name} {message}', style='{')
handler = logging.StreamHandler()
handler.setFormatter(formatter)
logging.basicConfig(level=logging.DEBUG, handlers=[handler])

现在,如果您使用 session执行一些请求,比如:

session.get('https://httpbin.org/user-agent')
session.get('https://httpbin.org/status/200')

stderr的输出如下所示。

2020-05-14 22:10:13,224 DEBUG urllib3.connectionpool Starting new HTTPS connection (1): httpbin.org:443
2020-05-14 22:10:13,695 DEBUG urllib3.connectionpool https://httpbin.org:443 "GET /user-agent HTTP/1.1" 200 45
2020-05-14 22:10:13,698 DEBUG httplogger HTTP roundtrip
---------------- request ----------------
GET https://httpbin.org/user-agent
User-Agent: python-requests/2.23.0
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive


None
---------------- response ----------------
200 OK https://httpbin.org/user-agent
Date: Thu, 14 May 2020 20:10:13 GMT
Content-Type: application/json
Content-Length: 45
Connection: keep-alive
Server: gunicorn/19.9.0
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true


{
"user-agent": "python-requests/2.23.0"
}




2020-05-14 22:10:13,814 DEBUG urllib3.connectionpool https://httpbin.org:443 "GET /status/200 HTTP/1.1" 200 0
2020-05-14 22:10:13,818 DEBUG httplogger HTTP roundtrip
---------------- request ----------------
GET https://httpbin.org/status/200
User-Agent: python-requests/2.23.0
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive


None
---------------- response ----------------
200 OK https://httpbin.org/status/200
Date: Thu, 14 May 2020 20:10:13 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 0
Connection: keep-alive
Server: gunicorn/19.9.0
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true

图形用户界面

当您有很多查询时,拥有一个简单的 UI 和一种过滤记录的方法将非常方便。我将展示如何使用 计时员(我是该文章的作者)。

首先,钩子已被重写,以产生记录,logging可以序列化时,发送电线。它可以是这样的:

def logRoundtrip(response, *args, **kwargs):
extra = {
'req': {
'method': response.request.method,
'url': response.request.url,
'headers': response.request.headers,
'body': response.request.body,
},
'res': {
'code': response.status_code,
'reason': response.reason,
'url': response.url,
'headers': response.headers,
'body': response.text
},
}
logger.debug('HTTP roundtrip', extra=extra)


session = requests.Session()
session.hooks['response'].append(logRoundtrip)

其次,必须调整日志配置以使用 logging.handlers.HTTPHandler(Chronologer 能够理解)。

import logging.handlers


chrono = logging.handlers.HTTPHandler(
'localhost:8080', '/api/v1/record', 'POST', credentials=('logger', ''))
handlers = [logging.StreamHandler(), chrono]
logging.basicConfig(level=logging.DEBUG, handlers=handlers)

最后,运行 Chronologer 实例,例如使用 Docker:

docker run --rm -it -p 8080:8080 -v /tmp/db \
-e CHRONOLOGER_STORAGE_DSN=sqlite:////tmp/db/chrono.sqlite \
-e CHRONOLOGER_SECRET=example \
-e CHRONOLOGER_ROLES="basic-reader query-reader writer" \
saaj/chronologer \
python -m chronologer -e production serve -u www-data -g www-data -m

再次运行请求:

session.get('https://httpbin.org/user-agent')
session.get('https://httpbin.org/status/200')

流处理程序将产生:

DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): httpbin.org:443
DEBUG:urllib3.connectionpool:https://httpbin.org:443 "GET /user-agent HTTP/1.1" 200 45
DEBUG:httplogger:HTTP roundtrip
DEBUG:urllib3.connectionpool:https://httpbin.org:443 "GET /status/200 HTTP/1.1" 200 0
DEBUG:httplogger:HTTP roundtrip

现在,如果你打开 http://localhost:8080/(对于用户名使用“ logger”,对于基本的 auth 弹出窗口使用空密码) ,然后点击“ Open”按钮,你应该看到这样的东西:

Screenshot of Chronologer

我使用一个 logger_config.yaml文件来配置我的日志,要显示这些日志,我所需要做的就是在它的末尾添加一个 disable_existing_loggers: False

我的日志设置是相当广泛和混乱的,所以我甚至不知道在这里解释它的好方法,但如果有人也使用 YAML 文件来配置他们的日志,这可能会有帮助。

Https://docs.python.org/3/howto/logging.html#configuring-logging

只是改进了 这个的答案

对我来说是这样的:

import logging
import sys
import requests
import textwrap
    

root = logging.getLogger('httplogger')




def logRoundtrip(response, *args, **kwargs):
extra = {'req': response.request, 'res': response}
root.debug('HTTP roundtrip', extra=extra)
    



class HttpFormatter(logging.Formatter):


def _formatHeaders(self, d):
return '\n'.join(f'{k}: {v}' for k, v in d.items())


def formatMessage(self, record):
result = super().formatMessage(record)
if record.name == 'httplogger':
result += textwrap.dedent('''
---------------- request ----------------
{req.method} {req.url}
{reqhdrs}


{req.body}
---------------- response ----------------
{res.status_code} {res.reason} {res.url}
{reshdrs}


{res.text}
''').format(
req=record.req,
res=record.res,
reqhdrs=self._formatHeaders(record.req.headers),
reshdrs=self._formatHeaders(record.res.headers),
)


return result


formatter = HttpFormatter('{asctime} {levelname} {name} {message}', style='{')
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(formatter)
root.addHandler(handler)
root.setLevel(logging.DEBUG)




session = requests.Session()
session.hooks['response'].append(logRoundtrip)
session.get('http://httpbin.org')