How do I read the contents of a URL with Python?

The following works when I paste it into a browser:

http://www.somesite.com/details.pl?urn=2344

But when I try to read the URL with Python, nothing happens:

link = 'http://www.somesite.com/details.pl?urn=2344'
f = urllib.urlopen(link)
myfile = f.readline()
print myfile

Do I need to encode the URL, or is there something I'm not seeing?


The URL should be a string:

import urllib


link = "http://www.somesite.com/details.pl?urn=2344"
f = urllib.urlopen(link)
myfile = f.readline()
print myfile
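On the encoding part of the question: the query string in this URL (urn=2344) contains only characters that are safe in a URL, so it probably needs no extra encoding. If the parameters could contain spaces or `&`, a minimal Python 3 sketch using the standard `urllib.parse.urlencode` might look like this (the `build_url` helper is a hypothetical name, not part of any library):

```python
from urllib.parse import urlencode


def build_url(base, params):
    """Append URL-encoded query parameters to a base URL (hypothetical helper)."""
    return base + "?" + urlencode(params)


# The URL from the question comes out unchanged.
print(build_url("http://www.somesite.com/details.pl", {"urn": 2344}))
# Spaces and '&' inside values are escaped so they cannot break the query string.
print(build_url("http://www.somesite.com/details.pl", {"q": "a b&c"}))
```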

To answer your question:

import urllib


link = "http://www.somesite.com/details.pl?urn=2344"
f = urllib.urlopen(link)
myfile = f.read()
print(myfile)

You need read(), not readline().
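To see the difference without touching the network, here is a small Python 3 sketch. It uses `io.BytesIO` as a stand-in for the file-like object that `urlopen()` returns; the sample body, which begins with an empty line, is an assumption chosen for illustration.

```python
import io

# Assumed sample body: a page whose first line is blank.
body = b"\n<html>\n<body>Hello</body>\n</html>\n"

first_line = io.BytesIO(body).readline()  # stops at the first newline
everything = io.BytesIO(body).read()      # reads until the end of the stream

print(repr(first_line))
print(repr(everything))
```

If the first line of a response is empty, `readline()` returns only a newline, which may be why nothing seemed to be printed in the question.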

EDIT (2018-06-25): Since Python 3, the legacy urllib.urlopen() has been replaced by urllib.request.urlopen() (see the notes at https://docs.python.org/3/library/urllib.request.html#urllib.request.urlopen for details).

If you are using Python 3, see the answers by Martin Thoma or i.n.n.m in this question: https://stackoverflow.com/a/28040508/158111 (Python 2/3 compat) https://stackoverflow.com/a/45886824/158111 (Python 3)

Or just get this library: http://docs.python-requests.org/en/latest/ and seriously use it :)

import requests


link = "http://www.somesite.com/details.pl?urn=2344"
f = requests.get(link)
print(f.text)

A solution that works with both Python 2.X and Python 3.X uses the Python 2 and 3 compatibility library six:

from six.moves.urllib.request import urlopen
link = "http://www.somesite.com/details.pl?urn=2344"
response = urlopen(link)
content = response.read()
print(content)
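If adding six as a dependency is not an option, a common alternative is to try the Python 3 import first and fall back to the Python 2 location. This is a sketch of that pattern rather than something the answer above prescribes, and `fetch` is a hypothetical helper name:

```python
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib import urlopen  # Python 2 fallback


def fetch(link):
    """Return the raw response body for `link` (bytes on Python 3)."""
    response = urlopen(link)
    try:
        return response.read()
    finally:
        # Explicit close() works on both versions; Python 2's urlopen()
        # result cannot be used in a `with` block.
        response.close()
```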

I used the following code:

import urllib


def read_text():
    # Python 2 only: urllib.urlopen() and the print statement
    quotes = urllib.urlopen("https://s3.amazonaws.com/udacity-hosted-downloads/ud036/movie_quotes.txt")
    contents_file = quotes.read()
    print contents_file


read_text()

For Python 3 users, to save time, use the following code:

from urllib.request import urlopen


link = "https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html"


f = urlopen(link)
myfile = f.read()
print(myfile)

I know there are separate threads for the error "NameError: urlopen is not defined", but I thought this might save time.

We can read a website's HTML content as below:

from urllib.request import urlopen
response = urlopen('http://google.com/')
html = response.read()
print(html)

None of these answers are very good for Python 3 (tested on the latest version at the time of this post).

This is how you do it...

import urllib.error
import urllib.request


try:
    with urllib.request.urlopen('http://www.python.org/') as f:
        print(f.read().decode('utf-8'))
except urllib.error.URLError as e:
    print(e.reason)

The above is for content encoded as UTF-8. If you remove .decode('utf-8'), read() returns the raw bytes; Python does not guess the encoding for you.
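On Python 3, `read()` returns `bytes`, so the text has to be decoded with some encoding. Rather than hard-coding UTF-8, one option is to use the charset the server declares in its `Content-Type` header and fall back to UTF-8 when none is given. A minimal sketch, where `decode_body` and `fetch_text` are hypothetical helper names and the UTF-8 fallback is an assumption:

```python
import urllib.request


def decode_body(raw, charset=None):
    """Decode response bytes, falling back to UTF-8 when no charset is declared."""
    # errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
    return raw.decode(charset or "utf-8", errors="replace")


def fetch_text(url):
    """Fetch `url` and decode it using the charset from the Content-Type header."""
    with urllib.request.urlopen(url) as response:
        charset = response.headers.get_content_charset()  # None if not declared
        return decode_body(response.read(), charset)
```

`decode_body` can be tried without any network access, for example on `b"caf\xe9"` with the charset `"latin-1"`.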

Documentation: https://docs.python.org/3/library/urllib.request.html#module-urllib.request
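The `except` clause in the answer above catches `urllib.error.URLError`. Its subclass `urllib.error.HTTPError` additionally carries the HTTP status code, so the two cases can be reported separately as long as the more specific class is checked first. A sketch, with `describe_error` and `fetch` as hypothetical helper names and the 10-second timeout as an assumed value:

```python
import urllib.error
import urllib.request


def describe_error(exc):
    """Turn a urllib exception into a short message (hypothetical helper)."""
    # HTTPError is a subclass of URLError, so it must be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        return "HTTP error {}: {}".format(exc.code, exc.reason)
    if isinstance(exc, urllib.error.URLError):
        return "Could not reach server: {}".format(exc.reason)
    return "Unexpected error: {}".format(exc)


def fetch(url):
    """Return the decoded page, or None after printing why the request failed."""
    try:
        # timeout=10 is an assumed value, not taken from the answer above
        with urllib.request.urlopen(url, timeout=10) as f:
            return f.read().decode("utf-8")
    except urllib.error.URLError as e:
        print(describe_error(e))
        return None
```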

#!/usr/bin/python
# -*- coding: utf-8 -*-
# Works on python 3 and python 2.
# when server knows where the request is coming from.


import sys
from contextlib import closing


if sys.version_info[0] == 3:
    from urllib.request import urlopen
else:
    from urllib import urlopen

# closing() is used because Python 2's urlopen() result is not a context manager
with closing(urlopen('https://www.facebook.com/')) as url:
    data = url.read()


print(data)


# When the server does not know where the request is coming from.
# Works on python 3.


import urllib.request


user_agent = \
'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'


url = 'https://www.facebook.com/'
headers = {'User-Agent': user_agent}


request = urllib.request.Request(url, None, headers)
response = urllib.request.urlopen(request)
data = response.read()
print(data)
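To check which headers will actually go out, a `Request` object can be built and inspected before calling `urlopen()`. The sketch below makes no network request; the `build_request` helper and the shortened User-Agent string are assumptions for illustration. Note that `Request` stores header names in capitalized form, so the key is looked up as `'User-agent'`.

```python
import urllib.request


def build_request(url, user_agent):
    """Create a Request carrying a custom User-Agent (hypothetical helper)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


request = build_request("https://www.facebook.com/", "Mozilla/5.0 (example)")

# Request normalizes header names with str.capitalize(): 'User-Agent' -> 'User-agent'.
print(request.get_header("User-agent"))
print(request.full_url)
```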
# retrieving data from url
# only for python 3


import urllib.request


def main():
    url = "http://docs.python.org"

    # retrieving data from URL
    webUrl = urllib.request.urlopen(url)
    print("Result code: " + str(webUrl.getcode()))

    # print data from URL
    print("Returned data: -----------------")
    data = webUrl.read().decode("utf-8")
    print(data)


if __name__ == "__main__":
    main()


from urllib.request import urlopen


# decode the bytes so that non-ASCII text (such as Chinese) displays correctly
html = urlopen("https://blog.csdn.net/qq_39591494/article/details/83934260").read().decode('utf-8')
print(html)