How to check the type of files without extensions?

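I have a folder full of files, and none of them has an extension. How can I check the file types? I want to check each file's type and rename the file accordingly. Let's assume a function filetype(x) returns a file type such as png. I want to do this: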

files = os.listdir(".")
for f in files:
    os.rename(f, f + filetype(f))

How do I do this?


On Unix and Linux there is the file command, which guesses file types. There's even a Windows port.

From the man page:

File tests each argument in an attempt to classify it. There are three sets of tests, performed in this order: filesystem tests, magic number tests, and language tests. The first test that succeeds causes the file type to be printed.

You would need to run the file command with the subprocess module and then parse the results to figure out an extension.
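A minimal sketch of that approach, assuming the file command is on the PATH and accepts --brief and --mime-type (as the common libmagic-based implementation does). The helper names mime_type_of and extension_for are made up for illustration, and using the standard library's mimetypes module for the MIME-to-extension lookup is a choice made here, not something the answer prescribes.

```python
import mimetypes
import subprocess


def mime_type_of(path):
    """Ask the external `file` command for the MIME type of `path`."""
    # The arguments are passed as a list, so no shell is involved and
    # filenames containing spaces need no quoting.
    output = subprocess.check_output(["file", "--brief", "--mime-type", path])
    return output.decode().strip()


def extension_for(mime_type):
    """Map a MIME type to an extension such as '.png' (None if unknown)."""
    return mimetypes.guess_extension(mime_type)
```

Calling extension_for(mime_type_of(path)) then yields the suffix to append to the filename. It returns None when mimetypes has no entry for the reported type, so that case has to be handled before renaming.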

edit: Ignore my answer. Use Chris Johnson's answer instead.

import subprocess as sub

# Pass the command as a list; a single string would need shell=True on Unix.
p = sub.Popen(['file', 'yourfile.txt'], stdout=sub.PIPE, stderr=sub.PIPE)
output, errors = p.communicate()
print(output)

As Steven pointed out, subprocess is the way to go. You can capture the command's output as shown above, as this post describes.

There are Python libraries that can recognize files based on their content (usually a header / magic number) and that don't rely on the file name or extension.

If you're addressing many different file types, you can use python-magic. That's just a Python binding for the well-established magic library. This has a good reputation and (small endorsement) in the limited use I've made of it, it has been solid.

There are also libraries for more specialized file types. For example, the Python standard library has the imghdr module, which does the same thing just for image file types (note that imghdr was deprecated in Python 3.11 and removed in Python 3.13).

If you need dependency-free (pure Python) file type checking, see filetype.

With the newer subprocess API, you can use the following code (a *nix-only solution):

import subprocess


filename = 'your_file'
# Pass the arguments as a list so that filenames containing spaces work.
cmd = ['file', '--brief', '--mime-type', filename]
result = subprocess.check_output(cmd)
# check_output returns bytes on Python 3; decode before printing.
mime_type = result.decode().strip()
print(mime_type)

The Python Magic library provides the functionality you need.

You can install the library with pip install python-magic and use it as follows:

>>> import magic


>>> magic.from_file('iceland.jpg')
'JPEG image data, JFIF standard 1.01'


>>> magic.from_file('iceland.jpg', mime=True)
'image/jpeg'


>>> magic.from_file('greenland.png')
'PNG image data, 600 x 1000, 8-bit colormap, non-interlaced'


>>> magic.from_file('greenland.png', mime=True)
'image/png'

The Python code in this case is calling to libmagic beneath the hood, which is the same library used by the *NIX file command. Thus, this does the same thing as the subprocess/shell-based answers, but without that overhead.
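Applied to the original question, the renaming loop might look like the sketch below. It assumes python-magic is installed; the function names detect_mime and add_extensions are hypothetical, and converting the MIME type to an extension with the standard library's mimetypes module is an assumption of this sketch rather than part of python-magic. The detector is passed in as a parameter so it can be swapped for another one.

```python
import mimetypes
import os


def detect_mime(path):
    """Return the MIME type that python-magic reports for `path`."""
    import magic  # imported lazily: only needed when this detector is used
    return magic.from_file(path, mime=True)


def add_extensions(directory, detect=detect_mime):
    """Append a guessed extension to every extension-less file in `directory`."""
    renamed = []
    for name in sorted(os.listdir(directory)):
        old_path = os.path.join(directory, name)
        # Skip subdirectories and files that already have an extension.
        if not os.path.isfile(old_path) or os.path.splitext(name)[1]:
            continue
        extension = mimetypes.guess_extension(detect(old_path))
        if extension is None:
            continue  # unknown type: leave the file untouched
        new_path = old_path + extension
        if os.path.exists(new_path):
            continue  # never overwrite an existing file
        os.rename(old_path, new_path)
        renamed.append((name, name + extension))
    return renamed
```

add_extensions(".") would process the current directory and return the list of (old name, new name) pairs, which makes it easy to review what was changed.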

In the case of images, you can use the imghdr module.

>>> import imghdr
>>> imghdr.what('8e5d7e9d873e2a9db0e31f9dfc11cf47')  # You can pass a file name or a file object as first param. See doc for optional 2nd param.
'png'

Python 2 imghdr doc
Python 3 imghdr doc

You can also install the official file binding for Python, a library called file-magic (it does not use ctypes, like python-magic).

It's available on PyPI as file-magic and on Debian as python-magic. For me this library is the best to use since it's available on PyPI and on Debian (and probably other distributions), making the process of deploying your software easier. I've blogged about how to use it, also.

This only works on Linux, but with the "sh" Python module you can simply call any shell command:

https://pypi.org/project/sh/

pip install sh

import sh

sh.file("/root/file")

Output: /root/file: ASCII text

You can also use this code (pure Python, based on the first 3 bytes of the file header):

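import os


def header_mime_type(pathfile):
    # Wrapped in a function (the name is arbitrary) so that `return` is valid.
    # MEDIA_ROOT is defined by the surrounding application (not shown).
    full_path = os.path.join(MEDIA_ROOT, pathfile)

    try:
        with open(full_path, "rb") as f:
            header = f.read(3)  # only the first 3 bytes are compared
    except IOError:
        return "Incorrect Request :( !!!"

    # bytes.hex() is the Python 3 replacement for .encode("hex"); it is lowercase.
    header_byte = header.hex()

    if header_byte == '474946':
        return "image/gif"
    elif header_byte == '89504e':
        return "image/png"
    elif header_byte == 'ffd8ff':
        return "image/jpeg"
    else:
        return "binary file"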

No package needs to be installed [updated version].
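Three bytes are enough to tell these formats apart, but the formats define longer signatures: a PNG file starts with the 8 bytes 89 50 4E 47 0D 0A 1A 0A, and a GIF file with the ASCII text GIF87a or GIF89a. A slightly stricter variant of the same idea, as a sketch, could compare against the full signatures. The names SIGNATURES and sniff_mime_type are hypothetical, and the table covers only the three formats from the answer above.

```python
# Known leading byte sequences ("magic numbers") and the MIME type they indicate.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
]


def sniff_mime_type(path, default="application/octet-stream"):
    """Return a MIME type based on the first bytes of the file at `path`."""
    # Read only as many bytes as the longest signature needs.
    longest = max(len(signature) for signature, _ in SIGNATURES)
    with open(path, "rb") as f:
        header = f.read(longest)
    for signature, mime_type in SIGNATURES:
        if header.startswith(signature):
            return mime_type
    return default
```

Keeping the signatures in a list makes it straightforward to add further formats without touching the lookup logic.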

This code recursively lists all files in a given folder whose detected MIME type contains a given string (here, sqlite):

import magic
import glob
from os.path import isfile


ROOT_DIR = 'backup'
WANTED_EXTENSION = 'sqlite'


for filename in glob.iglob(ROOT_DIR + '/**', recursive=True):
    if isfile(filename):
        extension = magic.from_file(filename, mime=True)
        if WANTED_EXTENSION in extension:
            print(filename)

https://gist.github.com/izmcm/6a5d6fa8d4ec65fd9851a1c06c8946ac