How to download an image using requests

I'm trying to download and save an image from the web using Python's requests module.

Here is the (working) code I used:

img = urllib2.urlopen(settings.STATICMAP_URL.format(**data))
with open(path, 'w') as f:
    f.write(img.read())

Here is the new (non-working) code using requests:

r = requests.get(settings.STATICMAP_URL.format(**data))
if r.status_code == 200:
    img = r.raw.read()
    with open(path, 'w') as f:
        f.write(img)

Can you help me figure out which attribute of the response I should use from requests?


You can either use the response.raw file object, or iterate over the response.

Using the response.raw file-like object will not, by default, decode compressed responses (with GZIP or deflate). You can force it to decompress for you anyway by setting the decode_content attribute to True (requests sets it to False to control decoding itself). You can then use shutil.copyfileobj() to have Python stream the data to a file object:

import requests
import shutil


r = requests.get(settings.STATICMAP_URL.format(**data), stream=True)
if r.status_code == 200:
    with open(path, 'wb') as f:
        r.raw.decode_content = True
        shutil.copyfileobj(r.raw, f)

To iterate over the response, use a loop; iterating like this ensures that the data is decompressed at this stage:

r = requests.get(settings.STATICMAP_URL.format(**data), stream=True)
if r.status_code == 200:
    with open(path, 'wb') as f:
        for chunk in r:
            f.write(chunk)

This will read the data in 128-byte chunks; if you feel another chunk size works better, use the Response.iter_content() method with a custom chunk size:

r = requests.get(settings.STATICMAP_URL.format(**data), stream=True)
if r.status_code == 200:
    with open(path, 'wb') as f:
        for chunk in r.iter_content(1024):
            f.write(chunk)

Note that you need to open the destination file in binary mode to ensure Python doesn't try to translate newlines for you. We also set stream=True so that requests doesn't download the whole image into memory first.
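For what it's worth, the copy step itself can be exercised without a network connection; a minimal sketch, using io.BytesIO to stand in for r.raw (the save_stream helper is my own name, not part of requests):

```python
import io
import os
import shutil
import tempfile


def save_stream(fileobj, path):
    """Copy a binary file-like object to disk in chunks, as in the pattern above."""
    with open(path, 'wb') as f:
        shutil.copyfileobj(fileobj, f)


# io.BytesIO stands in for r.raw so the copy step runs without a network.
fake_raw = io.BytesIO(b'\x89PNG fake image bytes')
path = os.path.join(tempfile.mkdtemp(), 'img.png')
save_stream(fake_raw, path)
```

With a real response you would pass r.raw (with decode_content set to True) instead of the BytesIO object.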

Get a file-like object from the request and copy it to a file. This will also avoid reading the whole thing into memory at once.

import requests
import shutil


url = 'http://example.com/img.png'
response = requests.get(url, stream=True)
with open('img.png', 'wb') as out_file:
    shutil.copyfileobj(response.raw, out_file)
del response

I also needed to download images using requests. I first tried Martijn Pieters' answer and it worked well. But when I profiled this simple function, I found that it uses very many function calls compared to urllib and urllib2.

I then tried the way recommended by the author of the requests module:

import requests
from PIL import Image
# For Python 2.x, use this instead:
# from StringIO import StringIO
# For Python 3.x, the bytes payload needs BytesIO:
from io import BytesIO


r = requests.get('https://example.com/image.jpg')
i = Image.open(BytesIO(r.content))
This greatly reduced the number of function calls, and thus sped up my application. Here is the code for my profiler and the result:

#!/usr/bin/python
import requests
from StringIO import StringIO
from PIL import Image
import profile


def testRequest():
    image_name = 'test1.jpg'
    url = 'http://example.com/image.jpg'

    r = requests.get(url, stream=True)
    with open(image_name, 'wb') as f:
        for chunk in r.iter_content():
            f.write(chunk)


def testRequest2():
    image_name = 'test2.jpg'
    url = 'http://example.com/image.jpg'

    r = requests.get(url)
    i = Image.open(StringIO(r.content))
    i.save(image_name)


if __name__ == '__main__':
    profile.run('testRequest()')
    profile.run('testRequest2()')

Result of testRequest:

343080 function calls (343068 primitive calls) in 2.580 seconds

Result of testRequest2:

3129 function calls (3105 primitive calls) in 0.024 seconds

How about this? A quick solution:

import requests


url = "http://craphound.com/images/1006884_2adf8fc7.jpg"
response = requests.get(url)
if response.status_code == 200:
    with open("/Users/apple/Desktop/sample.jpg", 'wb') as f:
        f.write(response.content)

Here is a more user-friendly answer that still uses streaming.

Just define these functions and call getImage(). By default it will use the same file name as the url and write to the current directory, but both can be changed.

import requests
from StringIO import StringIO
from PIL import Image


def createFilename(url, name, folder):
    dotSplit = url.split('.')
    if name is None:
        # use the same name as the url
        slashSplit = dotSplit[-2].split('/')
        name = slashSplit[-1]
    ext = dotSplit[-1]
    file = '{}{}.{}'.format(folder, name, ext)
    return file


def getImage(url, name=None, folder='./'):
    file = createFilename(url, name, folder)
    with open(file, 'wb') as f:
        r = requests.get(url, stream=True)
        for block in r.iter_content(1024):
            if not block:
                break
            f.write(block)


def getImageFast(url, name=None, folder='./'):
    file = createFilename(url, name, folder)
    r = requests.get(url)
    i = Image.open(StringIO(r.content))
    i.save(file)


if __name__ == '__main__':
    # Uses less memory
    getImage('http://www.example.com/image.jpg')
    # Faster
    getImageFast('http://www.example.com/image.jpg')

getImage() is based on the answer here, while getImageFast() is based on the answer above.

This may be easier than using requests. This is the only time I will ever suggest not using requests to do HTTP stuff.

Two-liner using urllib:

>>> import urllib.request
>>> urllib.request.urlretrieve("http://www.example.com/songs/mp3.mp3", "mp3.mp3")

There is also a nice Python module named wget that is pretty easy to use. Found here.

This demonstrates how simple the design is:

>>> import wget
>>> url = 'http://www.futurecrew.com/skaven/song_files/mp3/razorback.mp3'
>>> filename = wget.download(url)
100% [................................................] 3841532 / 3841532
>>> filename
'razorback.mp3'

Enjoy.

You can also add an out parameter to specify the path.

>>> out_filepath = <output_filepath>
>>> filename = wget.download(url, out=out_filepath)

I'm posting an answer since I don't have enough rep to make a comment, but with wget, as posted by Blairg23, you can also provide an out parameter for the path.

wget.download(url, out=path)

There are mainly two ways:

  1. Using .content (simplest/official) (see Zhenyi Zhang's answer):

    import io  # Note: io.BytesIO is StringIO.StringIO on Python 2.
    import requests
    from PIL import Image


    r = requests.get('http://lorempixel.com/400/200')
    r.raise_for_status()
    with io.BytesIO(r.content) as f:
        with Image.open(f) as img:
            img.show()
    
  2. Using .raw (see Martijn Pieters's answer):

    import requests
    import PIL.Image


    r = requests.get('http://lorempixel.com/400/200', stream=True)
    r.raise_for_status()
    r.raw.decode_content = True  # Required to decompress gzip/deflate compressed responses.
    with PIL.Image.open(r.raw) as img:
        img.show()
    r.close()  # Safety when stream=True: ensures the connection is released.
    

Timing both shows no noticeable difference.
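The comparison can also be reproduced offline; a rough sketch, with io.BytesIO standing in for the HTTP response (the 4 MiB payload size is arbitrary):

```python
import io
import shutil
import time

payload = b'\x00' * (4 * 1024 * 1024)  # 4 MiB of dummy image data

# Strategy 1: stream through shutil.copyfileobj (the .raw approach)
src = io.BytesIO(payload)
streamed = io.BytesIO()
t0 = time.perf_counter()
shutil.copyfileobj(src, streamed)
stream_time = time.perf_counter() - t0

# Strategy 2: one-shot write of the whole payload (the .content approach)
oneshot = io.BytesIO()
t1 = time.perf_counter()
oneshot.write(payload)
oneshot_time = time.perf_counter() - t1

print(f'copyfileobj: {stream_time:.6f}s  one-shot: {oneshot_time:.6f}s')
```

Real downloads are dominated by network time anyway, which is why the two approaches time out about the same.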

The code snippet below downloads a file.

The file is saved under the file name taken from the specified url.

import requests


url = "http://example.com/image.jpg"
filename = url.split("/")[-1]
r = requests.get(url, timeout=0.5)


if r.status_code == 200:
    with open(filename, 'wb') as f:
        f.write(r.content)
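Note that url.split("/")[-1] keeps any query string in the name; a slightly more robust sketch using the standard urllib.parse module (filename_from_url is a name I made up):

```python
import os
from urllib.parse import urlparse


def filename_from_url(url):
    """Take the last path segment, ignoring any query string or fragment."""
    return os.path.basename(urlparse(url).path)


print(filename_from_url("http://example.com/image.jpg?size=large"))  # image.jpg
```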

As simple as importing Image and requests:

from PIL import Image
import requests


img = Image.open(requests.get(url, stream=True).raw)
img.save('img1.jpg')

You can do something like this:

import requests
import random


url = "https://images.pexels.com/photos/1308881/pexels-photo-1308881.jpeg?auto=compress&cs=tinysrgb&dpr=1&w=500"
name = random.randrange(1, 1000)
filename = str(name) + ".jpg"
response = requests.get(url)
if response.ok:
    with open(filename, 'wb') as f:
        f.write(response.content)

This is the first result that comes up when you google how to download a binary file with requests. If you need to download an arbitrary file with requests, you can use:

import requests
url = 'https://s3.amazonaws.com/lab-data-collections/GoogleNews-vectors-negative300.bin.gz'
open('GoogleNews-vectors-negative300.bin.gz', 'wb').write(requests.get(url, allow_redirects=True).content)

This is how I did it:

import requests
from PIL import Image
from io import BytesIO


url = 'your_url'
files = {'file': ("C:/Users/shadow/Downloads/black.jpeg", open('C:/Users/shadow/Downloads/black.jpeg', 'rb'), 'image/jpeg')}
response = requests.post(url, files=files)


img = Image.open(BytesIO(response.content))
img.show()

My approach was to use response.content (the blob) and save it to a file in binary mode:

img_blob = requests.get(url, timeout=5).content
with open(destination + '/' + title, 'wb') as img_file:
    img_file.write(img_blob)
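If you prefer, os.path.join builds the destination path portably instead of concatenating with '/'; a tiny sketch (destination and title are the placeholder names from the snippet above):

```python
import os

destination = 'downloads'  # hypothetical folder name
title = 'picture.jpg'      # hypothetical file name

path = os.path.join(destination, title)  # inserts the right separator for the OS
print(path)
```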

Check out my python project that downloads images from unsplash.com based on keywords.

Download the image:

import requests

picture_request = requests.get(url)
with open('picture.jpg', 'wb') as f:  # the file name is just an example
    f.write(picture_request.content)

I agree with Blairg23 that using urllib.request.urlretrieve is one of the easiest solutions.

One note I'd like to point out here: sometimes it won't download anything, because the request was sent via a script (bot). If you want to parse images from Google Images or other search engines, you need to pass a user-agent in the request headers first and only then download the image; otherwise the request will be blocked and will throw an error.

Pass a user-agent and download the image:

import urllib.request

opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582')]
urllib.request.install_opener(opener)


urllib.request.urlretrieve(URL, 'image_name.jpg')
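Since the rest of this page uses requests, the same header trick works there too; a sketch that only shows the header being attached (preparing the request sends nothing over the network; a real download would be requests.get(url, headers=headers)):

```python
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582'
}

# Preparing the request shows the header is attached without hitting the network.
prepared = requests.Request('GET', 'https://example.com/image.jpg', headers=headers).prepare()
print(prepared.headers['User-Agent'][:11])
```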

Code in the online IDE that scrapes and downloads images from Google Images using requests, bs4, and urllib.request.


Alternatively, if your goal is to scrape images from search engines like Google, Bing, Yahoo!, or DuckDuckGo (and other search engines), then you can use SerpApi. It's a paid API with a free plan.

The biggest difference is that there's no need to figure out how to bypass blocks from search engines or how to extract certain parts from the HTML or JavaScript, since that's already done for the end user.

Example code to integrate:

import os, json, urllib.request
from serpapi import GoogleSearch


params = {
    "api_key": os.getenv("API_KEY"),
    "engine": "google",
    "q": "pexels cat",
    "tbm": "isch"
}


search = GoogleSearch(params)
results = search.get_dict()


print(json.dumps(results['images_results'], indent=2, ensure_ascii=False))


# download images
for index, image in enumerate(results['images_results']):
    # print(f'Downloading {index} image...')

    opener = urllib.request.build_opener()
    opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582')]
    urllib.request.install_opener(opener)

    # saves the original-resolution image to the SerpApi_Images folder, adding the index to the end of the file name
    urllib.request.urlretrieve(image['original'], f'SerpApi_Images/original_size_img_{index}.jpg')


Part of the JSON output:

[
  ...
  # other images
  {
    "position": 100,  # the 100th image
    "thumbnail": "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQK62dIkDjNCvEgmGU6GGFZcpVWwX-p3FsYSg&usqp=CAU",
    "source": "homewardboundnj.org",
    "title": "pexels-helena-lopes-1931367 - Homeward Bound Pet Adoption Center",
    "link": "https://homewardboundnj.org/upcoming-event/black-cat-appreciation-day/pexels-helena-lopes-1931367/",
    "original": "https://homewardboundnj.org/wp-content/uploads/2020/07/pexels-helena-lopes-1931367.jpg",
    "is_product": false
  }
]

Disclaimer: I work for SerpApi.

This is a very simple piece of code:

import requests


response = requests.get("https://i.imgur.com/ExdKOOz.png")  # fetch the image

file = open("sample_image.png", "wb")  # create the file for the image
file.write(response.content)  # save the file content
file.close()