Given the URL of a text file, what is the simplest way to read its contents?

Answer

import urllib2

f = urllib2.urlopen(target_url)  # target_url holds the URL of the text file
for l in f.readlines():
    print l

Answer

import urllib2
for line in urllib2.urlopen("http://www.myhost.com/SomeFile.txt"):
    print line

Accepted answer

Edit 09/2016: in Python 3, use urllib.request instead of urllib2.

Actually, the simplest way is:

import urllib2  # the lib that handles the url stuff

data = urllib2.urlopen(target_url) # it's a file like object and works just like a file
for line in data: # files are iterable
    print line

As Will suggested, you don't even need readlines(); you can shorten it even further: *

import urllib2

for line in urllib2.urlopen(target_url):
    print line

But remember that in Python, readability counts.

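This is the simplest way, but not the safest: in network programming you usually cannot be sure the server will send only the amount of data you expect. So you are generally better off reading a fixed, reasonable amount of data, enough for what you expect but small enough to keep your script from being flooded: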

import urllib2

data = urllib2.urlopen("http://www.google.com").read(20000) # read at most 20,000 bytes
data = data.split("\n") # then split it into lines

for line in data:
    print line
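A Python 3 version of the same bounded read might look like the sketch below. The helper name read_lines_limited and the 20,000-byte default are assumptions for illustration. read() counts bytes, not characters, so errors="replace" is used in case the limit cuts a multi-byte UTF-8 character in half.

```python
import urllib.request


def read_lines_limited(url, max_bytes=20000, encoding="utf-8"):
    """Read at most max_bytes from url and return the decoded lines.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    with urllib.request.urlopen(url) as response:
        raw = response.read(max_bytes)  # never reads more than max_bytes
    # errors="replace" keeps decoding from failing if the cut falls
    # in the middle of a multi-byte character
    return raw.decode(encoding, errors="replace").splitlines()
```

Usage: for line in read_lines_limited(target_url): print(line). If the file is larger than the limit, the last returned line may be incomplete.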

* The second example in Python 3:

import urllib.request  # the lib that handles the url stuff

for line in urllib.request.urlopen(target_url):
    print(line.decode('utf-8')) #utf-8 or iso8859-1 or whatever the page encoding scheme is
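Instead of hard-coding 'utf-8', a Python 3 sketch can use the charset the server declares in its Content-Type header and fall back to UTF-8 when none is declared. The read_text name and the UTF-8 fallback are assumptions for illustration; get_content_charset() is a standard method of the header object returned by response.info().

```python
import urllib.request


def read_text(url, encoding=None, fallback="utf-8"):
    """Return the decoded body of url (hypothetical helper).

    Encoding priority: the explicit argument, then the charset declared
    in the Content-Type header, then the fallback.
    """
    with urllib.request.urlopen(url) as response:
        raw = response.read()
        declared = response.info().get_content_charset()  # None if absent
    return raw.decode(encoding or declared or fallback)
```

Usage: for line in read_text(target_url).splitlines(): print(line). splitlines() also drops the line endings, so print() does not emit an extra blank line after each line.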

Answer

There is actually no need to read line by line; you can read the whole content at once like this:

import urllib
txt = urllib.urlopen(target_url).read()

Answer

I'm new to Python, and the offhand remark about Python 3 in the accepted answer was confusing. For posterity, the code to do this in Python 3 is:

import urllib.request
data = urllib.request.urlopen(target_url)

for line in data:
    ...

Or alternatively:

from urllib.request import urlopen
data = urlopen(target_url)

Note that a bare import urllib is not enough: urllib.request is a submodule and has to be imported explicitly.

Answer

Another way in Python 3 is to use the urllib3 package.

import urllib3


http = urllib3.PoolManager()
response = http.request('GET', target_url)
data = response.data.decode('utf-8')

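This can be a better choice than urllib, because urllib3 advertises:

- Thread safety.
- Connection pooling.
- Client-side SSL/TLS verification.
- File uploads with multipart encoding.
- Helpers for retrying requests and dealing with HTTP redirects.
- Support for gzip and deflate encoding.
- Proxy support for HTTP and SOCKS.
- 100% test coverage.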

Answer

The requests library has a simpler interface and works with both Python 2 and 3.

import requests


response = requests.get(target_url)
data = response.text

Answer

None of the answers above worked for me directly; instead I had to do the following (Python 3):

from urllib.request import urlopen


data = urlopen("[your url goes here]").read().decode('utf-8')


# Do what you need to do with the data.
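When a snippet like this does not work directly, catching the exceptions urllib raises makes the cause visible. Below is a Python 3 sketch; the helper name fetch_text_or_none and its choice to report on stderr and return None are assumptions for illustration, while URLError, HTTPError and the ValueError raised for unknown URL types are standard library behavior.

```python
import sys
import urllib.error
import urllib.request


def fetch_text_or_none(url, encoding="utf-8"):
    """Return the decoded body of url, or None if it cannot be read.

    Hypothetical helper for illustration; it reports the reason on stderr.
    """
    try:
        with urllib.request.urlopen(url) as response:
            return response.read().decode(encoding)
    except urllib.error.URLError as exc:
        # Network failures, missing local files, and HTTP error statuses
        # (HTTPError is a subclass of URLError) end up here.
        print(f"could not fetch {url!r}: {exc}", file=sys.stderr)
    except ValueError as exc:
        # Malformed URLs (e.g. an unreplaced placeholder) and bodies that
        # are not valid in the chosen encoding (UnicodeDecodeError).
        print(f"bad URL or undecodable body for {url!r}: {exc}", file=sys.stderr)
    return None
```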

Answer

This just updates the Python 2 solution proposed by @ken-kinder so that it works with Python 3:

import urllib.request  # a bare "import urllib" does not load the request submodule
urllib.request.urlopen(target_url).read()

Answer

The requests package works really well as a simple interface, as @Andrew Mao suggested:

import requests
response = requests.get('http://lib.stat.cmu.edu/datasets/boston')
data = response.text
for i, line in enumerate(data.split('\n')):
    print(f'{i}   {line}')

Output:

0    The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic
1    prices and the demand for clean air', J. Environ. Economics & Management,
2    vol.5, 81-102, 1978.   Used in Belsley, Kuh & Welsch, 'Regression diagnostics
3    ...', Wiley, 1980.   N.B. Various transformations are used in the table on
4    pages 244-261 of the latter.
5
6    Variables in order:

Check the Kaggle notebook on how to extract a dataset/dataframe from a URL.

Answer

I do think requests is the best option. Also note that you can set the encoding manually:

import requests
response = requests.get("http://www.gutenberg.org/files/10/10-0.txt")
# response.encoding = "utf-8"
hehe = response.text

Answer

You can also use this as a simple approach; it saves the file to disk:

import requests
url_res = requests.get(url="http://www.myhost.com/SomeFile.txt")
with open(filename + ".txt", "wb") as file:  # filename: base name for the local copy
    file.write(url_res.content)
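The snippet above holds the entire body in memory (url_res.content) before writing it. For large files, a standard-library alternative in Python 3 is to stream the response to disk in chunks with shutil.copyfileobj. The download helper and its 64 KiB chunk size are illustrative assumptions, not part of the original answer.

```python
import shutil
import urllib.request


def download(url, path, chunk_size=64 * 1024):
    """Stream url to path chunk by chunk (hypothetical helper).

    Only about chunk_size bytes are held in memory at a time.
    """
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return path
```

Usage: download(target_url, "SomeFile.txt").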