Python: Download a file from Google Drive using a URL

小开

PyDrive allows you to download a file with the function GetContentFile(). You can find the function's documentation here.

See example below:

# Initialize GoogleDriveFile instance with file id.
file_obj = drive.CreateFile({'id': '<your file ID here>'})
file_obj.GetContentFile('cats.png') # Download file as 'cats.png'.

This code assumes that you have an authenticated drive object, the docs on this can be found here and here.

In the general case this is done like so:

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

gauth = GoogleAuth()
# Create local webserver which automatically handles authentication.
gauth.LocalWebserverAuth()

# Create GoogleDrive instance with authenticated GoogleAuth instance.
drive = GoogleDrive(gauth)

Info on silent authentication on a server can be found here and involves writing a settings.yaml (example: here) in which you save the authentication details.

小开

Best answer

If by "drive's url" you mean the shareable link of a file on Google Drive, then the following might help:

import requests


def download_file_from_google_drive(id, destination):
    URL = "https://docs.google.com/uc?export=download"

    session = requests.Session()

    response = session.get(URL, params={'id': id}, stream=True)
    token = get_confirm_token(response)

    if token:
        params = {'id': id, 'confirm': token}
        response = session.get(URL, params=params, stream=True)

    save_response_content(response, destination)


def get_confirm_token(response):
    for key, value in response.cookies.items():
        if key.startswith('download_warning'):
            return value

    return None


def save_response_content(response, destination):
    CHUNK_SIZE = 32768

    with open(destination, "wb") as f:
        for chunk in response.iter_content(CHUNK_SIZE):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)


if __name__ == "__main__":
    file_id = 'TAKE ID FROM SHAREABLE LINK'
    destination = 'DESTINATION FILE ON YOUR DISK'
    download_file_from_google_drive(file_id, destination)

The snippet does not use pydrive or the Google Drive SDK, though. It uses the requests module (which is, in some ways, an alternative to urllib2).

When downloading large files from Google Drive, a single GET request is not sufficient. A second one is needed - see wget/curl large file from google drive.
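The two-request flow can be sketched with the HTTP call passed in as a parameter, so the decision logic is visible (and testable) without touching the network. This is a minimal sketch that mirrors the cookie check in the snippet above; the names `find_confirm_token`, `download_with_confirm` and `fetch` are hypothetical, and Google may change how it delivers the warning token.

```python
from typing import Any, Callable, Dict, Mapping, Optional


def find_confirm_token(cookies: Mapping[str, str]) -> Optional[str]:
    """Return the value of the first cookie whose name starts with 'download_warning'."""
    for name, value in cookies.items():
        if name.startswith("download_warning"):
            return value
    return None


def download_with_confirm(fetch: Callable[[Dict[str, str]], Any], file_id: str) -> Any:
    """Send the first request and, if Drive asked for confirmation, a second one.

    `fetch` is a hypothetical callable that takes query parameters and returns an
    object with a `cookies` mapping (e.g. a wrapper around requests.Session.get).
    """
    first = fetch({"id": file_id})
    token = find_confirm_token(first.cookies)
    if token is None:
        # Small files: the first response already carries the file content.
        return first
    # Large files: repeat the request, this time including the confirmation token.
    return fetch({"id": file_id, "confirm": token})
```

With requests, `fetch` could be `lambda params: session.get(URL, params=params, stream=True)`, where `session` and `URL` are as in the answer above.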

小开

This has also been described above:

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

gauth = GoogleAuth()
gauth.LocalWebserverAuth()
drive = GoogleDrive(gauth)

This creates its own local server to do the dirty work of authenticating.

file_obj = drive.CreateFile({'id': '<Put the file ID here>'})
file_obj.GetContentFile('Demo.txt')

This downloads the file.

小开

Having had similar needs many times, I made an extra simple class, GoogleDriveDownloader, starting from the snippet by @user115202 above. You can find the source code here.

You can also install it through pip:

pip install googledrivedownloader

Then usage is as simple as:

from google_drive_downloader import GoogleDriveDownloader as gdd

gdd.download_file_from_google_drive(file_id='1iytA1n2z4go3uVCwE__vIKouTKyIDjEq',
                                    dest_path='./data/mnist.zip',
                                    unzip=True)

This snippet downloads an archive shared on Google Drive. Here, 1iytA1n2z4go3uVCwE__vIKouTKyIDjEq is the file ID taken from the shareable link.

小开

# Importing PyDrive OAuth
import logging
import zipfile

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

logger = logging.getLogger(__name__)
UNZIP_DIR = 'unzipped/'  # placeholder: directory to extract the archive into


def download_tracking_file_by_id(file_id, download_dir):
    gauth = GoogleAuth(settings_file='../settings.yaml')
    # Try to load saved client credentials
    gauth.LoadCredentialsFile("../credentials.json")
    if gauth.credentials is None:
        # Authenticate if they're not there
        gauth.LocalWebserverAuth()
    elif gauth.access_token_expired:
        # Refresh them if expired
        gauth.Refresh()
    else:
        # Initialize the saved creds
        gauth.Authorize()
    # Save the current credentials to a file
    gauth.SaveCredentialsFile("../credentials.json")

    drive = GoogleDrive(gauth)

    logger.debug("Trying to download file_id " + str(file_id))
    zip_path = download_dir + 'mapmob.zip'
    file6 = drive.CreateFile({'id': file_id})
    file6.GetContentFile(zip_path)
    # Extract the archive that was just downloaded
    zipfile.ZipFile(zip_path).extractall(UNZIP_DIR)
    tracking_data_location = download_dir + 'test.json'
    return tracking_data_location

The above function downloads the file with the given file_id to the specified download folder. The remaining question is how to get the file_id. For a URL containing id=, split the URL on id= (and drop any parameters that follow):

file_id = url.split("id=")[1].split("&")[0]
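A plain `split("id=")` also matches inside other parameter names such as `ouid=`, which appears in some Docs links. A slightly sturdier sketch parses the query string with the standard library; the function name `file_id_from_query` is hypothetical, and it only handles links that carry the ID as an `id` query parameter.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def file_id_from_query(url: str) -> Optional[str]:
    """Return the value of the `id` query parameter, or None if it is absent."""
    values = parse_qs(urlparse(url).query).get("id")
    return values[0] if values else None
```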

小开

You can install googleDriveFileDownloader (https://pypi.org/project/googleDriveFileDownloader/):

pip install googleDriveFileDownloader

Then download the file like this:

from googleDriveFileDownloader import googleDriveFileDownloader
a = googleDriveFileDownloader()
a.downloadFile("https://drive.google.com/uc?id=1O4x8rwGJAh8gRo8sjm0kuKFf6vCEm93G&export=download")

小开

I recommend the gdown package.

pip install gdown

Take your share link

https://drive.google.com/file/d/0B9P1L--7Wd2vNm9zMTJWOGxobkU/view?usp=sharing

and grab the ID, here 0B9P1L--7Wd2vNm9zMTJWOGxobkU (the part after /d/; it also appears in the link behind the download button), then put it after id= in the URL below.

import gdown


url = 'https://drive.google.com/uc?id=0B9P1L--7Wd2vNm9zMTJWOGxobkU'
output = '20150428_collected_images.tgz'
gdown.download(url, output, quiet=False)

小开

The Google Drive API docs include a function that downloads a file given its ID:

from __future__ import print_function

import io

import google.auth
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from googleapiclient.http import MediaIoBaseDownload


def download_file(real_file_id):
    """Downloads a file
    Args:
        real_file_id: ID of the file to download
    Returns: the file contents as bytes, or None if an error occurred.

    Load pre-authorized user credentials from the environment.
    TODO(developer) - See https://developers.google.com/identity
    for guides on implementing OAuth2 for the application.
    """
    creds, _ = google.auth.default()

    try:
        # create drive api client
        service = build('drive', 'v3', credentials=creds)

        file_id = real_file_id

        # pylint: disable=maybe-no-member
        request = service.files().get_media(fileId=file_id)
        file = io.BytesIO()
        downloader = MediaIoBaseDownload(file, request)
        done = False
        while done is False:
            status, done = downloader.next_chunk()
            print(F'Download {int(status.progress() * 100)}%.')

    except HttpError as error:
        print(F'An error occurred: {error}')
        # Nothing was downloaded, so there is no buffer to read from
        return None

    return file.getvalue()


if __name__ == '__main__':
    download_file(real_file_id='1KuPmvGq8yoYgbfW74OENMCB5H0n_2Jm9')

This raises the question:

How do we get the file ID to download the file?

Generally speaking, the URL of a file shared from Google Drive looks like this:

https://drive.google.com/file/d/1HV6vf8pB-EYnjcJcH65eGZVMa2v2tcMh/view?usp=sharing

where 1HV6vf8pB-EYnjcJcH65eGZVMa2v2tcMh is the file ID.

You can simply copy it from the URL or, if you prefer, it's also possible to create a function to get the fileID from the URL.

For instance, given url = https://drive.google.com/file/d/1HV6vf8pB-EYnjcJcH65eGZVMa2v2tcMh/view?usp=sharing:

def url_to_id(url):
    x = url.split("/")
    return x[5]

Inside the function, x is

['https:', '', 'drive.google.com', 'file', 'd', '1HV6vf8pB-EYnjcJcH65eGZVMa2v2tcMh', 'view?usp=sharing']

Since we want the sixth element of the list, we use x[5].
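`split("/")[5]` depends on the ID sitting at a fixed position, so it breaks for links such as `docs.google.com/spreadsheets/d/<ID>/edit`, where an extra segment shifts the index. A minimal sketch that instead looks for the path segment right after `d` could look like this; the name `file_id_from_path` is hypothetical, and it assumes the ID always follows a `/d/` segment, as in the link formats shown in this thread.

```python
from typing import Optional
from urllib.parse import urlparse


def file_id_from_path(url: str) -> Optional[str]:
    """Return the path segment that follows '/d/' in a Drive or Docs URL, or None."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    # Stop one short of the end so there is always a segment after "d".
    for i, segment in enumerate(segments[:-1]):
        if segment == "d":
            return segments[i + 1]
    return None
```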

小开

Here's an easy way to do it with no third-party libraries and a service account.

pip install google-api-core google-api-python-client

from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload
from google.oauth2 import service_account
import io

credz = {}  # put the JSON credentials from a service account (or similar) here
# More info: https://cloud.google.com/docs/authentication

credentials = service_account.Credentials.from_service_account_info(credz)
drive_service = build('drive', 'v3', credentials=credentials)

file_id = '0BwwA4oUTeiV1UVNwOHItT0xfa2M'
request = drive_service.files().get_media(fileId=file_id)
#fh = io.BytesIO()  # this can be used to keep in memory
fh = io.FileIO('file.tar.gz', 'wb')  # this can be used to write to disk
downloader = MediaIoBaseDownload(fh, request)
done = False
while done is False:
    status, done = downloader.next_chunk()
    print("Download %d%%." % int(status.progress() * 100))
fh.close()  # flush and close the file on disk

小开

This example is similar to RayB's, but it keeps the file in memory, is a little simpler, and works if you paste it into Colab.

import googleapiclient.discovery
import oauth2client.client
from google.colab import auth
auth.authenticate_user()


def download_gdrive(id):
    creds = oauth2client.client.GoogleCredentials.get_application_default()
    service = googleapiclient.discovery.build('drive', 'v3', credentials=creds)
    return service.files().get_media(fileId=id).execute()


a = download_gdrive("1F-yaQB8fdsfsdafm2l8WFjhEiYSHZrCcr")

小开

I tried this in Google Colaboratory: https://colab.research.google.com/

Suppose your shareable link is https://docs.google.com/spreadsheets/d/12hiI0NK7M0KEfscMfyBaLT9gxcZMleeu/edit?usp=sharing&ouid=102608702203033509854&rtpof=true&sd=true

All you need is the ID, which is 12hiI0NK7M0KEfscMfyBaLT9gxcZMleeu.

Run this command in a cell:

!gdown 12hiI0NK7M0KEfscMfyBaLT9gxcZMleeu

After the cell runs, the file is downloaded to /content/Amazon_Reviews.xlsx.

Note: this assumes you know how to use Google Colab.

小开

import requests


def download_file_from_google_drive(id, destination):
    URL = "https://docs.google.com/uc?export=download"

    session = requests.Session()

    response = session.get(URL, params={'id': id, 'confirm': 1}, stream=True)
    token = get_confirm_token(response)

    if token:
        params = {'id': id, 'confirm': token}
        response = session.get(URL, params=params, stream=True)

    save_response_content(response, destination)


def get_confirm_token(response):
    for key, value in response.cookies.items():
        if key.startswith('download_warning'):
            return value

    return None


def save_response_content(response, destination):
    CHUNK_SIZE = 32768

    with open(destination, "wb") as f:
        for chunk in response.iter_content(CHUNK_SIZE):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)


if __name__ == "__main__":
    file_id = 'TAKE ID FROM SHAREABLE LINK'
    destination = 'DESTINATION FILE ON YOUR DISK'
    download_file_from_google_drive(file_id, destination)

This repeats the accepted answer but adds the confirm=1 parameter to the first request, so the download goes through even when the file is large enough that Google Drive would otherwise ask for confirmation.
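To see what that first request actually sends, the query string can be rebuilt with the standard library. This is only an illustration: `first_request_url` is a hypothetical helper, and it assumes requests appends `params` to the existing `?export=download` query with `&`, which is how requests merges parameters into a URL that already has a query.

```python
from urllib.parse import urlencode

URL = "https://docs.google.com/uc?export=download"  # same base URL as the snippet above


def first_request_url(file_id: str) -> str:
    """Rebuild the URL of the first session.get(...) call, including confirm=1."""
    # urlencode keeps the dict's insertion order: id first, then confirm.
    return URL + "&" + urlencode({"id": file_id, "confirm": 1})
```

For example, `first_request_url("ABC")` gives `https://docs.google.com/uc?export=download&id=ABC&confirm=1`.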