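Deleting multiple files matching a pattern

I've built an online gallery using Python and Django. I've just started adding editing functionality, starting with rotation. I use sorl.thumbnail to auto-generate thumbnails on demand.

When I edit an original file, I need to clean up all of its thumbnails so that new ones are generated. There are three or four of them per image (I use different sizes for different occasions).

I could hard-code the file variants... but that's messy, and if I change the way I do things, I'll need to revisit the code.

Ideally I'd like to do a regex delete. In regex terms, all my originals are named like so: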

^(?P<photo_id>\d+)\.jpg$

So I want to delete:

^(?P<photo_id>\d+)[^\d].*jpg$

(Where I replace photo_id with the ID I want to clean.)


Try something like this:

import os, re


def purge(dir, pattern):
    for f in os.listdir(dir):
        if re.search(pattern, f):
            os.remove(os.path.join(dir, f))

Then you would pass the directory containing the files and the pattern you wish to match.
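As a quick illustration of how such a call might look, here is a minimal sketch run against a temporary directory. The file names are hypothetical, and it assumes thumbnails are named with an underscore after the ID. That assumption matters: the pattern from the question also matches the original itself, because the "." in 42.jpg is a non-digit, so the sketch uses a slightly stricter pattern.

```python
import os
import re
import tempfile


def purge(directory, pattern):
    """Delete every file in `directory` whose name matches `pattern`."""
    for name in os.listdir(directory):
        if re.search(pattern, name):
            os.remove(os.path.join(directory, name))


def demo():
    # Hypothetical layout: one original, two of its thumbnails, one unrelated thumbnail.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("42.jpg", "42_100x100.jpg", "42_thumb.jpg", "7_thumb.jpg"):
            open(os.path.join(tmp, name), "w").close()
        # Assumed naming scheme: "<id>_<anything>.jpg" marks a thumbnail.
        purge(tmp, r"^42_.*\.jpg$")
        return sorted(os.listdir(tmp))


print(demo())  # ['42.jpg', '7_thumb.jpg']
```

It is worth running the pattern against a copy of the directory first, since os.remove cannot be undone.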

If you need recursion into several subdirectories, you can use this method:

import os, re, os.path
# Raw string so that \d reaches the regex engine unchanged
pattern = r"^(?P<photo_id>\d+)[^\d].*jpg$"
mypath = "Photos"
for root, dirs, files in os.walk(mypath):
    for file in filter(lambda x: re.match(pattern, x), files):
        os.remove(os.path.join(root, file))

With the default top-down traversal, you can safely remove subdirectories from dirs on the fly (by modifying the list in place); it holds the subdirectories still to be visited at each node.
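Pruning dirs while walking could look roughly like the sketch below. The function name purge_tree, the skip_dirs parameter, and the directory names are made up for the example; the only part taken from the os.walk documentation is that in-place changes to dirs affect which subdirectories are visited when walking top-down.

```python
import os
import re
import tempfile


def purge_tree(top, pattern, skip_dirs=()):
    """Delete matching files under `top`, never descending into `skip_dirs`."""
    regex = re.compile(pattern)
    removed = []
    for root, dirs, files in os.walk(top):  # topdown=True is the default
        # Slice assignment mutates the very list os.walk iterates over,
        # so the pruned directories are never visited.
        dirs[:] = [d for d in dirs if d not in skip_dirs]
        for name in files:
            if regex.match(name):
                path = os.path.join(root, name)
                os.remove(path)
                removed.append(os.path.relpath(path, top))
    return sorted(removed)


def demo():
    # Hypothetical layout: the same file name in a pruned and an unpruned folder.
    with tempfile.TemporaryDirectory() as tmp:
        for sub in ("cache", "originals"):
            os.mkdir(os.path.join(tmp, sub))
            open(os.path.join(tmp, sub, "1_thumb.jpg"), "w").close()
        removed = purge_tree(tmp, r"^\d+_.*\.jpg$", skip_dirs=("originals",))
        kept = os.path.exists(os.path.join(tmp, "originals", "1_thumb.jpg"))
        return removed, kept
```

Rebinding the name (dirs = [...]) would not work here, because os.walk keeps its own reference to the original list.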

Note that within a single directory, you can also get the files matching a simple wildcard pattern with glob.glob(pattern). In this case you would have to subtract the set of files to keep from the whole set, so the code above is more efficient.
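For comparison, the glob-plus-subtraction idea might be sketched as follows. The helper name thumbnails_via_glob and the sample file names are assumptions for illustration; the set to keep is the original itself plus anything belonging to a longer ID that merely shares the prefix.

```python
import glob
import os
import tempfile


def thumbnails_via_glob(directory, photo_id):
    """Return the thumbnail file names for `photo_id` using wildcards only."""
    # glob.escape keeps special characters in the directory name literal.
    prefix = os.path.join(glob.escape(directory), str(photo_id))
    candidates = set(glob.glob(prefix + "*.jpg"))      # everything starting with the ID
    keep = set(glob.glob(prefix + "[0-9]*.jpg"))       # longer IDs, e.g. 123 when asked for 12
    keep.add(os.path.join(directory, "%s.jpg" % photo_id))  # the original itself
    return sorted(os.path.basename(p) for p in candidates - keep)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("12.jpg", "12_a.jpg", "12_b.jpg", "123.jpg", "123_a.jpg"):
            open(os.path.join(tmp, name), "w").close()
        return thumbnails_via_glob(tmp, 12)


print(demo())  # ['12_a.jpg', '12_b.jpg']
```

This version only lists the files; deleting them would be a loop over the result with os.remove.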

It's not clear to me that you actually want to do any named-group matching -- in the use you describe, the photoid is an input to the deletion function, and named groups' purpose is "output", i.e., extracting certain substrings from the matched string (and accessing them by name in the match object). So, I would recommend a simpler approach:

import re
import os


def delete_thumbnails(photoid, photodirroot):
    # The ID, then one non-digit, then anything ending in "jpg" (the question's pattern).
    matcher = re.compile(r'^%s\D.*jpg$' % photoid)
    numdeleted = 0
    for rootdir, subdirs, filenames in os.walk(photodirroot):
        for name in filenames:
            if not matcher.match(name):
                continue
            path = os.path.join(rootdir, name)
            os.remove(path)
            numdeleted += 1
    return "Deleted %d thumbnails for %r" % (numdeleted, photoid)

You can pass the photoid as a normal string, or as a RE pattern piece if you need to remove several matchable IDs at once (e.g., r'abc[def]' to remove abcd, abce, and abcf in a single call) -- that's the reason I'm inserting it literally in the RE pattern, rather than inserting the string re.escape(photoid) as would be normal practice. Certain parts such as counting the number of deletions and returning an informative message at the end are obviously frills which you should remove if they give you no added value in your use case.

Others, such as the "if not ... // continue" pattern, are highly recommended practice in Python (flat is better than nested: bailing out to the next leg of the loop as soon as you determine there is nothing to do on this one is better than nesting the actions to be done within an if), although of course other arrangements of the code would work too.
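To make the trade-off between literal insertion and re.escape concrete, here is a small sketch. The helper thumbnail_matcher and its literal flag are hypothetical, and the non-capturing group around the pattern piece is an addition of this sketch (it keeps an alternation such as a|b from leaking into the rest of the pattern).

```python
import re


def thumbnail_matcher(photo_id, literal=True):
    """Compile a matcher for thumbnail names belonging to `photo_id`.

    literal=True  -> the ID is escaped and matched character for character.
    literal=False -> the ID is treated as a regex fragment.
    """
    if literal:
        fragment = re.escape(str(photo_id))
    else:
        fragment = "(?:%s)" % photo_id
    return re.compile(r"^%s\D.*jpg$" % fragment)


# Escaped: "." means a dot, so "abc_t.jpg" is not a match for the ID "a.c".
escaped = thumbnail_matcher("a.c")
print(bool(escaped.match("a.c_t.jpg")), bool(escaped.match("abc_t.jpg")))  # True False

# Pattern piece: one call covers abcd, abce and abcf.
several = thumbnail_matcher("abc[def]", literal=False)
print([bool(several.match(n)) for n in ("abcd_t.jpg", "abcf_t.jpg", "abcg_t.jpg")])
# [True, True, False]
```

If the IDs ever come from user input, the escaped form is the safer default.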

My recommendation:

import os, re


def purge(dir, pattern, inclusive=True):
    regexObj = re.compile(pattern)
    # Bottom-up, so a directory's children are handled before the directory itself
    for root, dirs, files in os.walk(dir, topdown=False):
        for name in files:
            path = os.path.join(root, name)
            # Note: the pattern is searched in the full path, not just the file name
            if bool(regexObj.search(path)) == bool(inclusive):
                os.remove(path)
        for name in dirs:
            path = os.path.join(root, name)
            if len(os.listdir(path)) == 0:
                os.rmdir(path)

This will recursively remove every file that matches the pattern by default, and every file that doesn't if inclusive is false. It will also remove any empty folders from the directory tree as it goes.
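The selection rule can be tried out without touching the disk. The sketch below isolates the comparison used in purge; the function name select_for_removal and the sample paths are invented for the example.

```python
import re


def select_for_removal(paths, pattern, inclusive=True):
    """Return the paths purge() would delete for the given flag."""
    regex = re.compile(pattern)
    # A path is selected when "it matches" equals "inclusive":
    # inclusive=True picks matches, inclusive=False picks everything else.
    return [p for p in paths if bool(regex.search(p)) == bool(inclusive)]


paths = ["Photos/12.jpg", "Photos/12_thumb.jpg", "Photos/notes.txt"]
print(select_for_removal(paths, r"_thumb\.jpg$"))
# ['Photos/12_thumb.jpg']
print(select_for_removal(paths, r"_thumb\.jpg$", inclusive=False))
# ['Photos/12.jpg', 'Photos/notes.txt']
```

Because the pattern is searched in the whole path, a pattern such as Photos would select every file under that folder, so anchoring on the file name (for example with $) is usually what you want.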

I find Popen(["rm " + file_name + "*.ext"], shell=True, stdout=PIPE).communicate() (with Popen and PIPE imported from subprocess) to be a much simpler solution to this problem. It is prone to shell injection if file_name can come from untrusted input, but I don't see any issues if your program only uses it internally.
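If the injection risk is a concern, the same prefix-plus-extension deletion can be done without a shell. This is only a sketch under assumptions: remove_by_prefix is a made-up helper, and ".ext" stands in for whatever extension you actually use. glob.escape keeps any special characters in the inputs literal.

```python
import glob
import os
import tempfile


def remove_by_prefix(directory, file_name, ext=".ext"):
    """Delete files in `directory` named '<file_name>*<ext>' without invoking a shell."""
    pattern = os.path.join(
        glob.escape(directory),
        glob.escape(file_name) + "*" + glob.escape(ext),
    )
    removed = []
    for path in glob.glob(pattern):
        os.remove(path)
        removed.append(os.path.basename(path))
    return sorted(removed)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("report1.ext", "report2.ext", "other.ext", "report.txt"):
            open(os.path.join(tmp, name), "w").close()
        # A hostile-looking value is just a file name prefix here; nothing is executed.
        nothing = remove_by_prefix(tmp, "x; echo injected")
        removed = remove_by_prefix(tmp, "report")
        return nothing, removed, sorted(os.listdir(tmp))
```

Since no command line is ever built, there is nothing for a crafted file_name to inject into.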

How about this?

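import glob, os, multiprocessing

# The guard is needed on platforms that start workers by spawning (e.g. Windows)
if __name__ == "__main__":
    with multiprocessing.Pool(4) as p:
        p.map(os.remove, glob.glob("P*.jpg"))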

Mind you this does not do recursion and uses wildcards (not regex).

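UPDATE: In Python 3 the built-in map() function returns an iterator, not a list, and that iterator is lazy: a bare map(os.remove, ...) deletes nothing until it is consumed. The laziness is useful when you want to process the items one at a time, since an iterator does not hold all the results in memory. Pool.map, as used above, is not affected: it runs eagerly and returns a list.

If you use the built-in map() and need it to run right away (or really need a list), just do this:

...
list(map(os.remove, glob.glob("P*.jpg")))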

I agree it's not the most functional way, but it's concise and does the job.
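The laziness of the built-in map() in Python 3 is easy to see with two throwaway files. This is only a sketch: the file names and the temporary directory are made up for the demonstration.

```python
import os
import tempfile


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, name) for name in ("P1.jpg", "P2.jpg")]
        for path in paths:
            open(path, "w").close()

        pending = map(os.remove, paths)   # builds the iterator; nothing is deleted yet
        before = len(os.listdir(tmp))
        list(pending)                     # consuming the iterator runs os.remove
        after = len(os.listdir(tmp))
        return before, after


print(demo())  # (2, 0)
```

A plain for loop does the same job and makes the side effect more obvious.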

Using the glob module:

import glob, os
for f in glob.glob("P*.jpg"):
    os.remove(f)

Alternatively, using pathlib:

from pathlib import Path
for p in Path(".").glob("P*.jpg"):
    p.unlink()

import os, re


def recursive_purge(dir, pattern):
    for f in os.listdir(dir):
        path = os.path.join(dir, f)
        if os.path.isdir(path):
            # Descend into subdirectories first
            recursive_purge(path, pattern)
        elif re.search(pattern, path):
            os.remove(path)

import os, sys, glob, re


def main():
    mypath = "<Path to Root Folder to work within>"
    for root, dirs, files in os.walk(mypath):
        for file in files:
            p = os.path.join(root, file)
            if os.path.isfile(p):
                if p[-4:] == ".jpg":  # Or any pattern you want
                    os.remove(p)


if __name__ == "__main__":
    main()