What should I use to open a URL instead of urlopen in urllib3?

小开

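You don't have to install urllib3. You can use any HTTP library that suits your needs and pass the response to BeautifulSoup. The usual choice is requests, because of its rich feature set and convenient API. You can install it by running pip install requests on the command line. Here is a basic example: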

from bs4 import BeautifulSoup
import requests


url = "url"
response = requests.get(url)


soup = BeautifulSoup(response.content, "html.parser")
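If you want something a little more defensive, the example above could be wrapped roughly like this. This is only a sketch: the names parse_html and fetch_soup and the 10-second timeout are my own illustrative choices, not part of requests or BeautifulSoup.

```python
from bs4 import BeautifulSoup
import requests


def parse_html(content):
    """Parse raw HTML (bytes or str) with the built-in html.parser backend."""
    return BeautifulSoup(content, "html.parser")


def fetch_soup(url, timeout=10):
    """Download a page and return it as a BeautifulSoup object.

    The function name and the default timeout are assumptions made
    for this sketch; adjust them to your needs.
    """
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx instead of parsing an error page.
    response.raise_for_status()
    return parse_html(response.content)
```

You would then call fetch_soup with the address of the page you want and query the result as usual, for example with soup.find("div"). Without a timeout, requests.get can wait indefinitely on a server that never answers, which is why one is passed here.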

小开

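urllib3 is a different library from urllib and urllib2. It offers many features beyond the standard library's urllib, such as connection reuse, should you need them. The documentation is here: https://urllib3.readthedocs.org/

If you want to use urllib3, install it with pip install urllib3. A basic example: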

from bs4 import BeautifulSoup
import urllib3


http = urllib3.PoolManager()


url = 'http://www.thefamouspeople.com/singers.php'
response = http.request('GET', url)
soup = BeautifulSoup(response.data, "html.parser")  # name the parser explicitly to avoid a warning

小开

The new urllib3 library has good documentation.
To get the result you want, do the following:

import urllib3
from bs4 import BeautifulSoup


url = 'http://www.thefamouspeople.com/singers.php'


http = urllib3.PoolManager()
response = http.request('GET', url)
soup = BeautifulSoup(response.data.decode('utf-8'), "html.parser")

The decode('utf-8') part is optional. It worked without it when I tried, but I posted the option anyway.
Source: User Guide, http://urllib3.readthedocs.io/en/update/User-Guide.html

小开

With gazpacho you can pipe the page straight into a parsable Soup object:

from gazpacho import Soup
url = "http://www.thefamouspeople.com/singers.php"
soup = Soup.get(url)

Then run find on it:

soup.find("div")

小开

There is no .urlopen in urllib3; try this:

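import requests
url = 'http://www.thefamouspeople.com/singers.php'  # the page used in the other answers
html = requests.get(url)  # a Response object; the markup itself is in html.text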

小开

You should use urllib.request, not urllib3.

import urllib.request   # not urllib - important!
urllib.request.urlopen('https://...')
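To turn that call into a string you can hand to a parser, something along these lines should work. It is a minimal sketch using only the standard library; the helper name fetch_html, the 10-second timeout, and the UTF-8 fallback are assumptions of mine rather than anything urllib prescribes.

```python
import urllib.request


def fetch_html(url, timeout=10):
    """Return the body at `url` decoded to str.

    fetch_html is a hypothetical helper written for this sketch.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Prefer the charset declared in the Content-Type header;
        # fall back to UTF-8 when the server does not declare one.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

The returned string can then be passed to whichever parser you prefer, for example BeautifulSoup(html, "html.parser") as in the other answers.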