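How can I download all emails with attachments from Gmail?

How do I connect to Gmail and determine which messages have attachments? I then want to download each attachment, printing out the Subject: and From: for each message as I process it.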

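I'm not an expert on Perl, but what I do know is that GMail supports IMAP and POP3, two protocols that are completely standard and allow you to do just that.

Maybe that helps you to get started.

In Gmail, you can filter on "has:attachment"; use it to identify the messages you should be getting when testing. Note that this appears to return both messages with attached files (paperclip icon shown) and messages with inline attached images (no paperclip shown).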
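
The same distinction can be made on the client once a message has been downloaded. The following is a minimal Python 3 sketch, not taken from any answer here: `classify_parts` and `build_sample` are hypothetical helper names, and the sample message is built locally rather than fetched from Gmail.

```python
from email.message import EmailMessage


def classify_parts(msg):
    """Return (attachments, inline) filename lists for a parsed message."""
    attachments, inline = [], []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        # 'attachment', 'inline', or None when the header is absent
        disposition = part.get_content_disposition()
        name = part.get_filename() or "(unnamed)"
        if disposition == "attachment":
            attachments.append(name)
        elif disposition == "inline":
            inline.append(name)
    return attachments, inline


def build_sample():
    """Build a local test message with one attached file and one inline image."""
    msg = EmailMessage()
    msg["From"] = "alice@example.com"
    msg["Subject"] = "Report"
    msg.set_content("See attached.")
    msg.add_attachment(b"%PDF-1.4", maintype="application",
                       subtype="pdf", filename="report.pdf")
    msg.add_attachment(b"\x89PNG", maintype="image", subtype="png",
                       filename="logo.png", disposition="inline")
    return msg


if __name__ == "__main__":
    print(classify_parts(build_sample()))
```

Whether inline images should be saved along with real attachments is a choice for the script; the sketch only separates the two groups.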

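There is no Gmail API, so IMAP or POP are your only real options. The JavaMail API may be of some assistance, as well as this very terse article on downloading attachments from IMAP using Perl. Some previous questions here on SO may also help.

This PHP example may help too. Unfortunately, from what I can see, there is no attachment information contained within the imap_header, so downloading the body is required to be able to see the X-Attachment-Id field. (Someone please prove me wrong.)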

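Since Gmail supports the standard protocols POP and IMAP, any platform, tool, application, component, or API that provides the client side of either protocol should work.

I suggest doing a Google search for your favorite language/platform (e.g., "python"), plus "pop", plus "imap", plus perhaps "open source", plus perhaps "download" or "review", and see what you get for options.

There are numerous free applications and components; pick a few that seem worthy, check for reviews, then download and enjoy.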

You should be aware that you need SSL to connect to GMail (both for POP3 and IMAP; this is of course also true for their SMTP servers, apart from port 25, but that's another story).
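
As a rough illustration of what "you need SSL" means in practice, here is a Python 3 sketch using only the standard library. The host names and ports are the ones that appear in the code elsewhere on this page; `open_imap` and `open_pop3` are hypothetical helper names, and whether a plain password login is accepted depends on the account's settings.

```python
import imaplib
import poplib
import ssl

# SSL endpoints as used by the other answers on this page.
GMAIL_IMAP = ("imap.gmail.com", 993)
GMAIL_POP3 = ("pop.gmail.com", 995)


def open_imap(user, password, endpoint=GMAIL_IMAP):
    """Open an SSL-wrapped IMAP session and log in."""
    host, port = endpoint
    context = ssl.create_default_context()  # verifies the server certificate
    conn = imaplib.IMAP4_SSL(host, port, ssl_context=context)
    conn.login(user, password)
    return conn


def open_pop3(user, password, endpoint=GMAIL_POP3):
    """Open an SSL-wrapped POP3 session and authenticate."""
    host, port = endpoint
    context = ssl.create_default_context()
    conn = poplib.POP3_SSL(host, port, context=context)
    conn.user(user)
    conn.pass_(password)
    return conn
```

Nothing connects until one of the functions is called, for example `open_imap("user@gmail.com", "password")`.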

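Have you taken a look at the GMail 3rd party add-ons at Wikipedia?

In particular, PhpGmailDrive is an open source add-on that you may be able to use as-is, or perhaps study for inspiration?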

#!/usr/bin/env python
"""Save all attachments for given gmail account."""
import os, sys
from libgmail import GmailAccount


ga = GmailAccount("your.account@gmail.com", "pA$$w0Rd_")
ga.login()


# folders: inbox, starred, all, drafts, sent, spam
for thread in ga.getMessagesByFolder('all', allPages=True):
    for msg in thread:
        sys.stdout.write('.')
        if msg.attachments:
            print "\n", msg.id, msg.number, msg.subject, msg.sender
            for att in msg.attachments:
                if att.filename and att.content:
                    attdir = os.path.join(thread.id, msg.id)
                    if not os.path.isdir(attdir):
                        os.makedirs(attdir)
                    with open(os.path.join(attdir, att.filename), 'wb') as f:
                        f.write(att.content)

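untested

  1. Make sure that the TOS allows such scripts; otherwise your account will be suspended
  2. There might be better options: GMail offline mode, Thunderbird + ExtratExtended, GmailFS, Gmail Drive, etc.

For Java, you will find G4J of use. It's a set of APIs for communicating with Google Mail via Java (the screenshot on the homepage is a demonstration email client built around it).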

Hard one :-)

import email, getpass, imaplib, os


detach_dir = '.' # directory where to save attachments (default: current)
user = raw_input("Enter your GMail username:")
pwd = getpass.getpass("Enter your password: ")


# connecting to the gmail imap server
m = imaplib.IMAP4_SSL("imap.gmail.com")
m.login(user,pwd)
m.select("[Gmail]/All Mail") # here you can choose a mailbox like INBOX instead
# use m.list() to get all the mailboxes


resp, items = m.search(None, "ALL") # you could filter using the IMAP rules here (check http://www.example-code.com/csharp/imap-search-critera.asp)
items = items[0].split() # getting the mails id


counter = 1 # used to name attachments that arrive without a filename

for emailid in items:
    resp, data = m.fetch(emailid, "(RFC822)") # fetching the mail, "(RFC822)" means "get the whole stuff", but you can ask for headers only, etc
    email_body = data[0][1] # getting the mail content
    mail = email.message_from_string(email_body) # parsing the mail content to get a mail object

    # Check if any attachments at all
    if mail.get_content_maintype() != 'multipart':
        continue

    print "["+mail["From"]+"] :" + mail["Subject"]

    # we use walk to create a generator so we can iterate on the parts and forget about the recursive headache
    for part in mail.walk():
        # multipart are just containers, so we skip them
        if part.get_content_maintype() == 'multipart':
            continue

        # is this part an attachment ?
        if part.get('Content-Disposition') is None:
            continue

        filename = part.get_filename()

        # if there is no filename, we create one with a counter to avoid duplicates
        if not filename:
            filename = 'part-%03d%s' % (counter, '.bin')
            counter += 1

        att_path = os.path.join(detach_dir, filename)

        # Check if it's already there
        if not os.path.isfile(att_path):
            # finally write the stuff
            fp = open(att_path, 'wb')
            fp.write(part.get_payload(decode=True))
            fp.close()

Wow! That was something. But try the same in Java, just for fun!

By the way, I tested that in a shell, so some errors likely remain.

Enjoy
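
The script above prints mail["From"] and mail["Subject"] as they arrive, so non-ASCII headers may show up in their RFC 2047 encoded form (something like =?UTF-8?B?...?=). A small Python 3 sketch for turning such a header into readable text follows; `readable` is a hypothetical helper name, not part of the script.

```python
from email.header import decode_header, make_header


def readable(header_value):
    """Decode an RFC 2047 encoded-word header into plain text."""
    if header_value is None:
        return ""  # the header is missing from the message
    return str(make_header(decode_header(header_value)))


if __name__ == "__main__":
    print(readable("=?UTF-8?B?w4lwcmV1dmU=?="))
    print(readable("Plain ASCII subject"))
```

It could be applied as `readable(mail["Subject"])` just before printing.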

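EDIT:

Because mailbox names can change from one country to another, I recommend calling m.list() and picking an item from it before m.select("the mailbox name"), to avoid this error:

imaplib.error: command SEARCH illegal in state AUTH, only allowed in states SELECTED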
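
One way to pick the mailbox without hard-coding a localized name is to look at the flags in each m.list() line. The Python 3 sketch below assumes the server reports special-use flags such as \All (RFC 6154); the sample lines are hand-written to show the expected shape and are not real server output, and `find_mailbox` is a hypothetical helper name.

```python
import re

# Matches one LIST response line of the form: (flags) "delimiter" "name"
_LIST_LINE = re.compile(
    rb'\((?P<flags>[^)]*)\) "(?P<delim>[^"]*)" "?(?P<name>[^"]*)"?')

# Hand-written sample in the shape imaplib returns from m.list().
SAMPLE_LIST_LINES = [
    b'(\\HasNoChildren) "/" "INBOX"',
    b'(\\HasChildren \\Noselect) "/" "[Gmail]"',
    b'(\\All \\HasNoChildren) "/" "[Google Mail]/Alle Nachrichten"',
]


def find_mailbox(list_lines, flag=b"\\All"):
    """Return the name of the first mailbox carrying the given flag, or None."""
    for line in list_lines:
        match = _LIST_LINE.match(line)
        if match and flag in match.group("flags").split():
            return match.group("name").decode("ascii")
    return None


if __name__ == "__main__":
    print(find_mailbox(SAMPLE_LIST_LINES))
```

With a live session this might be used as `typ, lines = m.list()` followed by `m.select('"%s"' % find_mailbox(lines))`; the extra quotes are there because the name can contain spaces.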

Take a look at Mail::Webmail::Gmail:

GETTING ATTACHMENTS

There are two ways to get an attachment:

By sending a reference to a specific attachment returned by get_indv_email

# Creates an array of references to every attachment in your account
my $messages = $gmail->get_messages();
my @attachments;


foreach ( @{ $messages } ) {
    my $email = $gmail->get_indv_email( msg => $_ );
    if ( defined( $email->{ $_->{ 'id' } }->{ 'attachments' } ) ) {
        foreach ( @{ $email->{ $_->{ 'id' } }->{ 'attachments' } } ) {
            push( @attachments, $gmail->get_attachment( attachment => $_ ) );
            if ( $gmail->error() ) {
                print $gmail->error_msg();
            }
        }
    }
}

Or by sending the attachment ID and message ID

#retrieve specific attachment
my $msgid = 'F000000000';
my $attachid = '0.1';
my $attach_ref = $gmail->get_attachment( attid => $attachid, msgid => $msgid );

(Returns a reference to a scalar that holds the data from the attachment.)

Here is something I wrote in Groovy (a dynamic language for the Java platform) to download my bank statements.

import javax.mail.*
import java.util.Properties


String  gmailServer
int gmailPort
def user, password, LIMIT
def inboxFolder, root, StartDate, EndDate




//    Downloads all attachments from a gmail mail box as per some criteria
//    to a specific folder
//    Based on code from
//    http://agileice.blogspot.com/2008/10/using-groovy-to-connect-to-gmail.html
//    http://stackoverflow.com/questions/155504/download-mail-attachment-with-java
//
//    Requires:
//        java mail jars in the class path (mail.jar and activation.jar)
//        openssl, with gmail certificate added to java keystore (see agileice blog)
//
//    further improvement: maybe findAll could be used to filter messages
//    subject could be added as another criteria
////////////////////// <CONFIGURATION> //////////////////////
// Maximum number of emails to access in case parameter range is too high
LIMIT = 10000


// gmail credentials
gmailServer = "imap.gmail.com"
gmailPort = 993


user = "gmailuser@gmail.com"
password = "gmailpassword"


// gmail label, or "INBOX" for inbox
inboxFolder = "finance"


// local file system where the attachment files need to be stored
root = "D:\\AttachmentStore"


// date range dd-mm-yyyy
StartDate= "31-12-2009"
EndDate = "1-6-2010"
////////////////////// </CONFIGURATION> //////////////////////


StartDate = Date.parse("dd-MM-yyyy", StartDate)
EndDate = Date.parse("dd-MM-yyyy", EndDate)


Properties props = new Properties();
props.setProperty("mail.store.protocol", "imaps");
props.setProperty("mail.imaps.host", gmailServer);
props.setProperty("mail.imaps.port", gmailPort.toString());
props.setProperty("mail.imaps.partialfetch", "false");


def session = javax.mail.Session.getDefaultInstance(props,null)
def store = session.getStore("imaps")


store.connect(gmailServer, user, password)


int i = 0;
def folder = store.getFolder(inboxFolder)


folder.open(Folder.READ_ONLY)


for (def msg : folder.messages) {

    //if (msg.subject?.contains("bank Statement"))
    println "[$i] From: ${msg.from} Subject: ${msg.subject} -- Received: ${msg.receivedDate}"

    if (msg.receivedDate < StartDate || msg.receivedDate > EndDate) {
        println "Ignoring due to date range"
        continue
    }

    if (msg.content instanceof Multipart) {
        Multipart mp = (Multipart) msg.content;

        for (int j = 0; j < mp.count; j++) {
            Part part = mp.getBodyPart(j);

            println " ---- ${part.fileName} ---- ${part.disposition}"

            if (part.disposition?.equalsIgnoreCase(Part.ATTACHMENT)) {
                if (part.content) {
                    def name = msg.receivedDate.format("yyyy_MM_dd") + " " + part.fileName
                    println "Saving file to $name"

                    def f = new File(root, name)

                    //f << part.content
                    try {
                        if (!f.exists())
                            f << part.content
                    }
                    catch (Exception e) {
                        println "*** Error *** $e"
                    }
                }
                else {
                    println "NO Content Found!!"
                }
            }
        }
    }

    if (i++ > LIMIT)
        break;
}

If any of you have updated to Python 3.3, I took the 2.7 script from here and updated it to 3.3. I also fixed some issues with the way Gmail was returning the information.

# Something in lines of http://stackoverflow.com/questions/348630/how-can-i-download-all-emails-with-attachments-from-gmail
# Make sure you have IMAP enabled in your gmail settings.
# Right now it won't download same file name twice even if their contents are different.
# Gmail as of now returns in bytes but just in case they go back to string this line is left here.


import email
import getpass, imaplib
import os
import sys
import time


detach_dir = '.'
# Create the target folder inside detach_dir if it is not there yet
os.makedirs(os.path.join(detach_dir, 'attachments'), exist_ok=True)


userName = input('Enter your GMail username:\n')
passwd = getpass.getpass('Enter your password:\n')




try:
    imapSession = imaplib.IMAP4_SSL('imap.gmail.com', 993)
    typ, accountDetails = imapSession.login(userName, passwd)
    if typ != 'OK':
        print('Not able to sign in!')
        raise RuntimeError('login failed')

    imapSession.select('Inbox')
    typ, data = imapSession.search(None, 'ALL')
    if typ != 'OK':
        print('Error searching Inbox.')
        raise RuntimeError('search failed')

    # Iterating over all emails
    for msgId in data[0].split():
        typ, messageParts = imapSession.fetch(msgId, '(RFC822)')
        if typ != 'OK':
            print('Error fetching mail.')
            raise RuntimeError('fetch failed')

        emailBody = messageParts[0][1]
        # mail = email.message_from_string(emailBody)  # kept in case Gmail returns str again
        mail = email.message_from_bytes(emailBody)

        for part in mail.walk():
            # multipart parts are only containers
            if part.get_content_maintype() == 'multipart':
                continue
            # parts without a Content-Disposition header are not attachments
            if part.get('Content-Disposition') is None:
                continue

            fileName = part.get_filename()

            if bool(fileName):
                filePath = os.path.join(detach_dir, 'attachments', fileName)
                if not os.path.isfile(filePath):
                    print(fileName)
                    with open(filePath, 'wb') as fp:
                        fp.write(part.get_payload(decode=True))

    imapSession.close()
    imapSession.logout()

except Exception:
    print('Not able to download all attachments.')
    time.sleep(3)
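
As the comment at the top of that script notes, a second attachment with the same file name is skipped even if its contents differ. One possible way around that, sketched in Python 3 under the assumption that a numeric suffix is acceptable, is to derive a free path before writing; `unique_path` is a hypothetical helper and not part of the script above.

```python
import os
import re


def unique_path(directory, filename):
    """Return a path inside directory that does not collide with an existing file."""
    # Keep only the last path component, then replace awkward characters.
    base = os.path.basename(filename.replace("\\", "/"))
    base = re.sub(r"[^\w.\- ]", "_", base) or "attachment"
    stem, ext = os.path.splitext(base)
    candidate = os.path.join(directory, base)
    counter = 1
    # Append -1, -2, ... until the name is free.
    while os.path.exists(candidate):
        candidate = os.path.join(directory, "%s-%d%s" % (stem, counter, ext))
        counter += 1
    return candidate
```

In the script, `filePath = unique_path(os.path.join(detach_dir, 'attachments'), fileName)` could then replace the `os.path.isfile` check, at the cost of saving duplicates when the script is run twice.
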

/* based on http://www.codejava.net/java-ee/javamail/using-javamail-for-searching-e-mail-messages */
package getMailsWithAtt;


import java.io.File;
import java.io.IOException;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Properties;


import javax.mail.Address;
import javax.mail.Folder;
import javax.mail.Message;
import javax.mail.MessagingException;
import javax.mail.Multipart;
import javax.mail.NoSuchProviderException;
import javax.mail.Part;
import javax.mail.Session;
import javax.mail.Store;
import javax.mail.internet.MimeBodyPart;
import javax.mail.search.AndTerm;
import javax.mail.search.SearchTerm;
import javax.mail.search.ReceivedDateTerm;
import javax.mail.search.ComparisonTerm;


public class EmailReader {
    private String saveDirectory;

    /**
     * Sets the directory where attached files will be stored.
     *
     * @param dir
     *            absolute path of the directory
     */
    public void setSaveDirectory(String dir) {
        this.saveDirectory = dir;
    }

    /**
     * Downloads messages received within a date range and saves
     * attachments to disk if any.
     *
     * @param host IMAP server host name
     * @param port IMAP over SSL port
     * @param userName
     * @param password
     * @param startDate only messages received after this date are searched
     * @param endDate only messages received before this date are searched
     */
    public void downloadEmailAttachments(String host, String port,
            String userName, String password, Date startDate, Date endDate) {
        Properties props = System.getProperties();
        props.setProperty("mail.store.protocol", "imaps");
        try {
            Session session = Session.getDefaultInstance(props, null);
            Store store = session.getStore("imaps");
            store.connect(host, Integer.parseInt(port), userName, password);
            // ...
            Folder inbox = store.getFolder("INBOX");
            inbox.open(Folder.READ_ONLY);
            SearchTerm newerThan = new ReceivedDateTerm(ComparisonTerm.GT, startDate);
            SearchTerm olderThan = new ReceivedDateTerm(ComparisonTerm.LT, endDate);
            SearchTerm andTerm = new AndTerm(newerThan, olderThan);
            //Message[] arrayMessages = inbox.getMessages(); <--get all messages
            Message[] arrayMessages = inbox.search(andTerm);
            for (int i = arrayMessages.length; i > 0; i--) { // from newer to older
                Message msg = arrayMessages[i - 1];
                Address[] fromAddress = msg.getFrom();
                String from = fromAddress[0].toString();
                String subject = msg.getSubject();
                String sentDate = msg.getSentDate().toString();
                String receivedDate = msg.getReceivedDate().toString();

                String contentType = msg.getContentType();
                String messageContent = "";

                // store attachment file name, separated by comma
                String attachFiles = "";

                if (contentType.contains("multipart")) {
                    // content may contain attachments
                    Multipart multiPart = (Multipart) msg.getContent();
                    int numberOfParts = multiPart.getCount();
                    for (int partCount = 0; partCount < numberOfParts; partCount++) {
                        MimeBodyPart part = (MimeBodyPart) multiPart
                                .getBodyPart(partCount);
                        if (Part.ATTACHMENT.equalsIgnoreCase(part
                                .getDisposition())) {
                            // this part is attachment
                            String fileName = part.getFileName();
                            attachFiles += fileName + ", ";
                            part.saveFile(saveDirectory + File.separator + fileName);
                        } else {
                            // this part may be the message content
                            messageContent = part.getContent().toString();
                        }
                    }
                    if (attachFiles.length() > 1) {
                        attachFiles = attachFiles.substring(0,
                                attachFiles.length() - 2);
                    }
                } else if (contentType.contains("text/plain")
                        || contentType.contains("text/html")) {
                    Object content = msg.getContent();
                    if (content != null) {
                        messageContent = content.toString();
                    }
                }

                // print out details of each message
                System.out.println("Message #" + i + ":");
                System.out.println("\t From: " + from);
                System.out.println("\t Subject: " + subject);
                System.out.println("\t Sent: " + sentDate);
                System.out.println("\t Received: " + receivedDate);
                System.out.println("\t Message: " + messageContent);
                System.out.println("\t Attachments: " + attachFiles);
            }

            // disconnect
            inbox.close(false);
            store.close();

        } catch (NoSuchProviderException e) {
            e.printStackTrace();
            System.exit(1);
        } catch (MessagingException e) {
            e.printStackTrace();
            System.exit(2);
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }

    /**
     * Runs this program against Gmail's IMAP server
     * @throws ParseException
     */
    public static void main(String[] args) throws ParseException {
        String host = "imap.gmail.com";
        String port = "993";
        String userName = "user@gmail.com";
        String password = "pass";
        Date startDate = new SimpleDateFormat("yyyy-MM-dd").parse("2014-06-01");
        Date endDate = new SimpleDateFormat("yyyy-MM-dd").parse("2014-06-30");
        String saveDirectory = "C:\\Temp";

        EmailReader receiver = new EmailReader();
        receiver.setSaveDirectory(saveDirectory);
        receiver.downloadEmailAttachments(host, port, userName, password, startDate, endDate);
    }
}

Dependency:

<dependency>
    <groupId>com.sun.mail</groupId>
    <artifactId>javax.mail</artifactId>
    <version>1.5.1</version>
</dependency>
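
For readers using the Python imaplib scripts earlier on this page, the date-range search in the Java code above corresponds roughly to IMAP's SINCE and BEFORE search keys. The sketch below only builds the criteria string; `imap_date` and `date_range_criteria` are hypothetical helper names, and note that SINCE includes the start date itself.

```python
from datetime import date

# IMAP dates use English month abbreviations regardless of locale.
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def imap_date(d):
    """Format a date the way IMAP SEARCH expects it (day-Mon-year)."""
    return "%d-%s-%d" % (d.day, _MONTHS[d.month - 1], d.year)


def date_range_criteria(start, end):
    """Build SEARCH criteria for messages dated on/after start and before end."""
    return '(SINCE "%s" BEFORE "%s")' % (imap_date(start), imap_date(end))


if __name__ == "__main__":
    print(date_range_criteria(date(2014, 6, 1), date(2014, 6, 30)))
```

The result can be passed in place of "ALL", as in `m.search(None, date_range_criteria(start, end))`.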

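The question is quite old, and at that time the Gmail API was not available. But Google now provides the Gmail API, which can be used to access mail in place of IMAP. See Google's Gmail API here. Also see google-api-python-client on PyPI.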
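
As a rough sketch of what attachment handling looks like on the Gmail API side: a message resource carries a nested `payload` whose parts have a `filename` and, for attachments, a `body.attachmentId`; attachment data comes back base64url-encoded. The Python 3 helpers below work on such a dictionary offline. The helper names and the sample payload are illustrative assumptions, hand-written rather than captured from the API.

```python
import base64

# Hand-written sample in the shape of a Gmail API message payload.
SAMPLE_PAYLOAD = {
    "mimeType": "multipart/mixed",
    "filename": "",
    "parts": [
        {"mimeType": "text/plain", "filename": "",
         "body": {"size": 5, "data": "aGVsbG8="}},
        {"mimeType": "application/pdf", "filename": "statement.pdf",
         "body": {"attachmentId": "ATTACHMENT_ID_PLACEHOLDER", "size": 1234}},
    ],
}


def iter_attachment_parts(payload):
    """Yield (filename, attachment_id) for every part stored as an attachment."""
    filename = payload.get("filename")
    body = payload.get("body", {})
    if filename and body.get("attachmentId"):
        yield filename, body["attachmentId"]
    for child in payload.get("parts", []):
        yield from iter_attachment_parts(child)


def decode_attachment_data(data):
    """Decode a base64url string, restoring padding if it was stripped."""
    padded = data + "=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(padded.encode("ascii"))


if __name__ == "__main__":
    print(list(iter_attachment_parts(SAMPLE_PAYLOAD)))
```

With google-api-python-client, the payload would come from `service.users().messages().get(userId="me", id=message_id)`, the candidate messages from `messages().list(userId="me", q="has:attachment")`, and the data from `messages().attachments().get(...)`; building `service` requires OAuth credentials, which this sketch leaves out.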