Better way to convert file sizes in Python

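I am using a library that reads a file and returns its size in bytes.

This file size is then displayed to the end user; to make it easier for them to understand, I explicitly convert it to MB by dividing by 1024.0 * 1024.0. Of course this works, but I am wondering whether there is a better way to do this in Python.

By better, I mean perhaps a stdlib function that can manipulate sizes according to the unit I want. For example, if I specify MB, it automatically divides by 1024.0 * 1024.0. Something along these lines.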

There is hurry.filesize, which takes a size in bytes and turns it into a nicely formatted string.

>>> from hurry.filesize import size
>>> size(11000)
'10K'
>>> size(198283722)
'189M'

Or if you want 1K == 1000 (which is what most users assume):

>>> from hurry.filesize import size, si
>>> size(11000, system=si)
'11K'
>>> size(198283722, system=si)
'198M'

It has IEC support as well (but that is not documented):

>>> from hurry.filesize import size, iec
>>> size(11000, system=iec)
'10Ki'
>>> size(198283722, system=iec)
'189Mi'

Because it is written by the awesome Martijn Faassen, the code is small, clear and extensible. Writing your own systems is dead easy.

Here is one:

mysystem = [
    (1024 ** 5, ' Megamanys'),
    (1024 ** 4, ' Lotses'),
    (1024 ** 3, ' Tons'),
    (1024 ** 2, ' Heaps'),
    (1024 ** 1, ' Bunches'),
    (1024 ** 0, ' Thingies'),
]

Used like so:

>>> from hurry.filesize import size
>>> size(11000, system=mysystem)
'10 Bunches'
>>> size(198283722, system=mysystem)
'189 Heaps'

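You can use the << bitwise shift operator instead of a size divisor of 1024 * 1024, i.e. 1<<20 to get megabytes, 1<<30 to get gigabytes, etc.

In the simplest scenario you can have a constant such as MBFACTOR = float(1<<20), which can then be applied to a byte count, i.e.: megas = size_in_bytes/MBFACTOR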
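As a minimal sketch of the shift-based constant described above (the names MB_FACTOR and bytes_to_megabytes are illustrative assumptions, not part of any library). In Python 3 the float() call is optional, because / already performs true division:

```python
# 1 << 20 shifts the bit pattern of 1 left by 20 places, giving 2**20.
MB_FACTOR = 1 << 20


def bytes_to_megabytes(size_in_bytes):
    """Convert a byte count to binary megabytes, returned as a float."""
    # True division returns a float in Python 3, so no float() is needed.
    return size_in_bytes / MB_FACTOR


print(MB_FACTOR == 1024 * 1024)
print(bytes_to_megabytes(5 * 1024 * 1024))
```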

Megabytes are usually all that you need; otherwise something like this can be used:

# bytes pretty-printing
UNITS_MAPPING = [
    (1<<50, ' PB'),
    (1<<40, ' TB'),
    (1<<30, ' GB'),
    (1<<20, ' MB'),
    (1<<10, ' KB'),
    (1, (' byte', ' bytes')),
]


def pretty_size(bytes, units=UNITS_MAPPING):
    """Get human-readable file sizes.
    simplified version of https://pypi.python.org/pypi/hurry.filesize/
    """
    for factor, suffix in units:
        if bytes >= factor:
            break
    amount = int(bytes / factor)

    if isinstance(suffix, tuple):
        singular, multiple = suffix
        if amount == 1:
            suffix = singular
        else:
            suffix = multiple
    return str(amount) + suffix


print(pretty_size(1))
print(pretty_size(42))
print(pretty_size(4096))
print(pretty_size(238048577))
print(pretty_size(334073741824))
print(pretty_size(96995116277763))
print(pretty_size(3125899904842624))


## [Out] ###########################
1 byte
42 bytes
4 KB
227 MB
311 GB
88 TB
2 PB

Here is what I use:

import math


def convert_size(size_bytes):
    if size_bytes == 0:
        return "0B"
    size_name = ("B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB")
    i = int(math.floor(math.log(size_bytes, 1024)))
    p = math.pow(1024, i)
    s = round(size_bytes / p, 2)
    return "%s %s" % (s, size_name[i])

Note: the size should be given in bytes.
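Floating-point logarithms can in principle land slightly off near exact powers of the base, so here is a hedged variant of convert_size that picks the unit with integer arithmetic instead. This swaps math.log for int.bit_length and is not the answer author's method; the name convert_size_int is an assumption, and the input is assumed to be a non-negative int:

```python
SIZE_NAMES = ("B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB")


def convert_size_int(size_bytes):
    """Format a non-negative integer byte count using 1024-based units."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes == 0:
        return "0B"
    # bit_length() - 1 is floor(log2(n)); each unit step is 10 bits wide.
    i = min((size_bytes.bit_length() - 1) // 10, len(SIZE_NAMES) - 1)
    value = round(size_bytes / (1 << (10 * i)), 2)
    return "%s %s" % (value, SIZE_NAMES[i])


print(convert_size_int(1536))
```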

Here is a compact function to calculate the size:

def GetHumanReadable(size,precision=2):
    suffixes=['B','KB','MB','GB','TB']
    suffixIndex = 0
    while size > 1024 and suffixIndex < 4:
        suffixIndex += 1 #increment the index of the suffix
        size = size/1024.0 #apply the division
    return "%.*f%s"%(precision,size,suffixes[suffixIndex])

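For more detailed output and the reverse operation, please refer to: http://code.activestate.com/recipes/578019-bytes-to-human-human-to-bytes-converter/

Just in case anyone is searching for the reverse of this problem (as I did), here is what works for me: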

def get_bytes(size, suffix):
    size = int(float(size))
    suffix = suffix.lower()

    if suffix == 'kb' or suffix == 'kib':
        return size << 10
    elif suffix == 'mb' or suffix == 'mib':
        return size << 20
    elif suffix == 'gb' or suffix == 'gib':
        return size << 30

    return False
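If the number and the unit arrive as a single string such as "10 MB", the same idea can be sketched with a small regular expression. The name parse_size, the lookup table and the choice to raise ValueError (rather than return False) are assumptions made for illustration. Unlike get_bytes above, this sketch multiplies before truncating, so "1.5 KB" yields 1536 rather than 1024:

```python
import re

# Binary multipliers, as in the answer above: KB and KiB both mean 1 << 10.
_SHIFTS = {"b": 0, "kb": 10, "kib": 10, "mb": 20, "mib": 20, "gb": 30, "gib": 30}
_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]+)\s*$")


def parse_size(text):
    """Parse a string such as '10 MB' into a whole number of bytes."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError("cannot parse size: %r" % text)
    number, suffix = match.groups()
    try:
        shift = _SHIFTS[suffix.lower()]
    except KeyError:
        raise ValueError("unknown unit: %r" % suffix) from None
    # Multiply before truncating so fractional inputs keep their fraction.
    return int(float(number) * (1 << shift))


print(parse_size("10 MB"))
```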

Here are my two cents, which allows converting up and down and adds customizable precision:

def convertFloatToDecimal(f=0.0, precision=2):
    '''
    Convert a float to string of decimal.
    precision: by default 2.
    If no arg provided, return "0.00".
    '''
    return ("%." + str(precision) + "f") % f


def formatFileSize(size, sizeIn, sizeOut, precision=0):
    '''
    Convert file size to a string representing its value in B, KB, MB and GB.
    The convention is based on sizeIn as original unit and sizeOut
    as final unit.
    '''
    assert sizeIn.upper() in {"B", "KB", "MB", "GB"}, "sizeIn type error"
    assert sizeOut.upper() in {"B", "KB", "MB", "GB"}, "sizeOut type error"
    if sizeIn == "B":
        if sizeOut == "KB":
            return convertFloatToDecimal((size/1024.0), precision)
        elif sizeOut == "MB":
            return convertFloatToDecimal((size/1024.0**2), precision)
        elif sizeOut == "GB":
            return convertFloatToDecimal((size/1024.0**3), precision)
    elif sizeIn == "KB":
        if sizeOut == "B":
            return convertFloatToDecimal((size*1024.0), precision)
        elif sizeOut == "MB":
            return convertFloatToDecimal((size/1024.0), precision)
        elif sizeOut == "GB":
            return convertFloatToDecimal((size/1024.0**2), precision)
    elif sizeIn == "MB":
        if sizeOut == "B":
            return convertFloatToDecimal((size*1024.0**2), precision)
        elif sizeOut == "KB":
            return convertFloatToDecimal((size*1024.0), precision)
        elif sizeOut == "GB":
            return convertFloatToDecimal((size/1024.0), precision)
    elif sizeIn == "GB":
        if sizeOut == "B":
            return convertFloatToDecimal((size*1024.0**3), precision)
        elif sizeOut == "KB":
            return convertFloatToDecimal((size*1024.0**2), precision)
        elif sizeOut == "MB":
            return convertFloatToDecimal((size*1024.0), precision)

Add TB and so on as you wish.
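Adding units to the nested if/elif chain grows quickly, so one possible alternative is a table of exponents. This is a sketch under assumptions, not the answer author's code: the names format_file_size and _EXPONENTS are hypothetical. Unit names are accepted case-insensitively, and converting a unit to itself returns the formatted input instead of None:

```python
# Power of 1024 for each unit; extend with "TB": 4 and so on as needed.
_EXPONENTS = {"B": 0, "KB": 1, "MB": 2, "GB": 3}


def format_file_size(size, size_in, size_out, precision=0):
    """Convert size from size_in to size_out and format it as a string."""
    try:
        exp_in = _EXPONENTS[size_in.upper()]
        exp_out = _EXPONENTS[size_out.upper()]
    except KeyError as exc:
        raise ValueError("unknown unit: %s" % exc) from None
    # One multiplication covers both directions: a negative exponent divides.
    value = size * 1024.0 ** (exp_in - exp_out)
    return "%.*f" % (precision, value)


print(format_file_size(1536, "B", "KB", precision=1))
```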

Here are some easy-to-copy one-liners to use if you already know which unit you want (each assumes import os and a filepath variable). If you are looking for a more generic function with a few nice options, see my FEB 2021 update further down.

Bytes

print(f"{os.path.getsize(filepath):,} B")

Kilobits

print(f"{os.path.getsize(filepath)/(1<<7):,.0f} kb")

Kilobytes

print(f"{os.path.getsize(filepath)/(1<<10):,.0f} KB")

Megabits

print(f"{os.path.getsize(filepath)/(1<<17):,.0f} mb")

Megabytes

print(f"{os.path.getsize(filepath)/(1<<20):,.0f} MB")

Gigabits

print(f"{os.path.getsize(filepath)/(1<<27):,.0f} gb")

Gigabytes

print(f"{os.path.getsize(filepath)/(1<<30):,.0f} GB")

Terabytes

print(f"{os.path.getsize(filepath)/(1<<40):,.0f} TB")
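The bit-based one-liners above shift by 7, 17 and 27 rather than 10, 20 and 30 because one byte is 8 bits (2**3): 1024 bits is 128 bytes. A small sketch that checks the arithmetic; the helper name to_kilobits is an assumption for illustration:

```python
BITS_PER_BYTE = 8

# One binary kilobit is 1024 bits, i.e. 1024 / 8 = 128 = 1 << 7 bytes.
assert (1 << 10) // BITS_PER_BYTE == 1 << 7
# The same offset of 3 bits holds for the larger units.
assert (1 << 20) // BITS_PER_BYTE == 1 << 17
assert (1 << 30) // BITS_PER_BYTE == 1 << 27


def to_kilobits(size_in_bytes):
    """Convert a byte count to binary kilobits (1024 bits each)."""
    return size_in_bytes / (1 << 7)


print(f"{to_kilobits(125 * 1024):,.0f} kb")
```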

UPDATE FEB 2021: Here are my updated and fleshed-out functions to a) get the file/folder size, b) convert it into the desired units:

from pathlib import Path


def get_path_size(path = Path('.'), recursive=False):
    """
    Gets file size, or total directory size

    Parameters
    ----------
    path: str | pathlib.Path
        File path or directory/folder path

    recursive: bool
        True -> use .rglob i.e. include nested files and directories
        False -> use .glob i.e. only process current directory/folder

    Returns
    -------
    int:
        File size or recursive directory size in bytes
        Use cleverutils.format_bytes to convert to other units e.g. MB
    """
    path = Path(path)
    if path.is_file():
        size = path.stat().st_size
    elif path.is_dir():
        path_glob = path.rglob('*.*') if recursive else path.glob('*.*')
        size = sum(file.stat().st_size for file in path_glob)
    return size




def format_bytes(bytes, unit, SI=False):
    """
    Converts bytes to common units such as kb, kib, KB, mb, mib, MB

    Parameters
    ---------
    bytes: int
        Number of bytes to be converted

    unit: str
        Desired unit of measure for output

    SI: bool
        True -> Use SI standard e.g. KB = 1000 bytes
        False -> Use JEDEC standard e.g. KB = 1024 bytes

    Returns
    -------
    str:
        E.g. "7 MiB" where MiB is the original unit abbreviation supplied
    """
    if unit.lower() in "b bit bits".split():
        return f"{bytes*8} {unit}"
    unitN = unit[0].upper()+unit[1:].replace("s","")  # Normalised
    reference = {"Kb Kib Kibibit Kilobit": (7, 1),
                 "KB KiB Kibibyte Kilobyte": (10, 1),
                 "Mb Mib Mebibit Megabit": (17, 2),
                 "MB MiB Mebibyte Megabyte": (20, 2),
                 "Gb Gib Gibibit Gigabit": (27, 3),
                 "GB GiB Gibibyte Gigabyte": (30, 3),
                 "Tb Tib Tebibit Terabit": (37, 4),
                 "TB TiB Tebibyte Terabyte": (40, 4),
                 "Pb Pib Pebibit Petabit": (47, 5),
                 "PB PiB Pebibyte Petabyte": (50, 5),
                 "Eb Eib Exbibit Exabit": (57, 6),
                 "EB EiB Exbibyte Exabyte": (60, 6),
                 "Zb Zib Zebibit Zettabit": (67, 7),
                 "ZB ZiB Zebibyte Zettabyte": (70, 7),
                 "Yb Yib Yobibit Yottabit": (77, 8),
                 "YB YiB Yobibyte Yottabyte": (80, 8),
                 }
    key_list = '\n'.join(["     b Bit"] + [x for x in reference.keys()]) +"\n"
    if unitN not in key_list:
        raise IndexError(f"\n\nConversion unit must be one of:\n\n{key_list}")
    units, divisors = [(k,v) for k,v in reference.items() if unitN in k][0]
    if SI:
        divisor = 1000**divisors[1]/8 if "bit" in units else 1000**divisors[1]
    else:
        divisor = float(1 << divisors[0])
    value = bytes / divisor
    return f"{value:,.0f} {unitN}{(value != 1 and len(unitN) > 3)*'s'}"




# Tests
>>> assert format_bytes(1,"b") == '8 b'
>>> assert format_bytes(1,"bits") == '8 bits'
>>> assert format_bytes(1024, "kilobyte") == "1 Kilobyte"
>>> assert format_bytes(1024, "kB") == "1 KB"
>>> assert format_bytes(7141000, "mb") == '54 Mb'
>>> assert format_bytes(7141000, "mib") == '54 Mib'
>>> assert format_bytes(7141000, "Mb") == '54 Mb'
>>> assert format_bytes(7141000, "MB") == '7 MB'
>>> assert format_bytes(7141000, "mebibytes") == '7 Mebibytes'
>>> assert format_bytes(7141000, "gb") == '0 Gb'
>>> assert format_bytes(1000000, "kB") == '977 KB'
>>> assert format_bytes(1000000, "kB", SI=True) == '1,000 KB'
>>> assert format_bytes(1000000, "kb") == '7,812 Kb'
>>> assert format_bytes(1000000, "kb", SI=True) == '8,000 Kb'
>>> assert format_bytes(125000, "kb") == '977 Kb'
>>> assert format_bytes(125000, "kb", SI=True) == '1,000 Kb'
>>> assert format_bytes(125*1024, "kb") == '1,000 Kb'
>>> assert format_bytes(125*1024, "kb", SI=True) == '1,024 Kb'
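One thing to be aware of in get_path_size above: the pattern '*.*' only matches names that contain a dot, so files without an extension are skipped, while directories whose names contain a dot are matched. The function also fails with an unbound variable when the path is neither a file nor a directory. A hedged alternative sketch, assuming you want every regular file counted (the name total_file_size is hypothetical):

```python
from pathlib import Path


def total_file_size(path=".", recursive=False):
    """Return the size in bytes of a file, or of the files in a directory."""
    path = Path(path)
    if path.is_file():
        return path.stat().st_size
    if not path.is_dir():
        raise FileNotFoundError(f"no such file or directory: {path}")
    # '*' matches every entry, including names without a dot.
    entries = path.rglob("*") if recursive else path.glob("*")
    # Count regular files only, so directory entries are not added in.
    return sum(entry.stat().st_size for entry in entries if entry.is_file())
```

The returned byte count can then be passed to a formatter such as format_bytes above.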

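UPDATE OCT 2022

My answer to a recent comment was too long, so here is some further explanation of the 1<<20 magic! I also noticed that float is not needed, so I have removed it from the examples above.

As stated in another reply (above), "<<" is a bitwise operator, specifically the left shift. It converts the left-hand side to binary and moves the binary digits 20 places to the left (in this case). When we count normally in decimal, the total number of digits dictates whether we have reached the tens, hundreds, thousands, millions and so on. Binary is similar, except that the number of digits dictates whether we are talking about bits, bytes, kilobytes, megabytes and so on. So 1<<20 is actually the same as (binary) 1 with 20 (binary) zeros after it, or, if you remember how to convert from binary to decimal, 2 to the power of 20 (2**20), which equals 1048576. In the snippets above, os.path.getsize returns a value in BYTES, and 1048576 bytes are strictly speaking a Mebibyte (MiB) and casually speaking a Megabyte (MB).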
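The explanation above can be checked directly in an interpreter. A minimal sketch (the variable name megabyte is only illustrative):

```python
# Shifting 1 left by 20 places appends 20 binary zeros.
megabyte = 1 << 20

print(bin(megabyte))        # '0b1' followed by 20 zeros
print(megabyte == 2 ** 20)  # the shift and the power are the same number
print(megabyte)             # 1048576

# Every further 10 bits moves up one unit.
for name, shift in (("KiB", 10), ("MiB", 20), ("GiB", 30)):
    print(name, 1 << shift)
```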

Here is a version that matches the output of [...]:

def human_size(num: int) -> str:
    base = 1
    for unit in ['B', 'K', 'M', 'G', 'T', 'P', 'E', 'Z', 'Y']:
        n = num / base
        if n < 9.95 and unit != 'B':
            # Less than 10 then keep 1 decimal place
            value = "{:.1f}{}".format(n, unit)
            return value
        if round(n) < 1000:
            # Less than 4 digits so use this
            value = "{}{}".format(round(n), unit)
            return value
        base *= 1024
    value = "{}{}".format(round(n), unit)
    return value

Here is my implementation:

from bisect import bisect


def to_filesize(bytes_num, si=True):
    decade = 1000 if si else 1024
    partitions = tuple(decade ** n for n in range(1, 6))
    suffixes = tuple('BKMGTP')

    i = bisect(partitions, bytes_num)
    s = suffixes[i]

    for n in range(i):
        bytes_num /= decade

    f = '{:.3f}'.format(bytes_num)

    return '{}{}'.format(f.rstrip('0').rstrip('.'), s)

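It prints up to three decimals and strips trailing zeros and a trailing period. The boolean parameter si toggles between base-10 and base-2 size magnitudes.

This is its counterpart. It allows writing clean configuration files such as {'maximum_filesize': from_filesize('10M')}. It returns an integer that approximates the intended file size. I am not using bit shifting because the source value is a floating-point number (it accepts from_filesize('2.15M') just fine). Converting it to an integer/decimal would work, but would make the code more complicated, and it already works as it is.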

def from_filesize(spec, si=True):
    decade = 1000 if si else 1024
    suffixes = tuple('BKMGTP')

    num = float(spec[:-1])
    s = spec[-1]
    i = suffixes.index(s)

    for n in range(i):
        num *= decade

    return int(num)
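For comparison, the decimal route mentioned above might look like the following sketch. It is not the answer author's code: the name from_filesize_exact is an assumption, and decimal.Decimal is swapped in for float so that a spec like '2.15M' is multiplied exactly before truncation:

```python
from decimal import Decimal

_SUFFIXES = "BKMGTP"


def from_filesize_exact(spec, si=True):
    """Parse a spec such as '2.15M' into a whole number of bytes."""
    base = 1000 if si else 1024
    number, suffix = spec[:-1], spec[-1]
    exponent = _SUFFIXES.index(suffix)  # raises ValueError if unknown
    # Decimal keeps '2.15' exact, so the product is not subject to
    # binary floating-point rounding before it is truncated.
    return int(Decimal(number) * base ** exponent)


print(from_filesize_exact("2.15M"))
```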

Here it is:

def convert_bytes(size):
    for x in ['bytes', 'KB', 'MB', 'GB', 'TB']:
        if size < 1024.0:
            return "%3.1f %s" % (size, x)
        size /= 1024.0

    return size

Output

>>> convert_bytes(1024)
'1.0 KB'
>>> convert_bytes(102400)
'100.0 KB'

UNITS = {1000: ['KB', 'MB', 'GB'],
         1024: ['KiB', 'MiB', 'GiB']}


def approximate_size(size, flag_1024_or_1000=True):
    mult = 1024 if flag_1024_or_1000 else 1000
    for unit in UNITS[mult]:
        size = size / mult
        if size < mult:
            return '{0:.3f} {1}'.format(size, unit)


approximate_size(2123, False)

I wanted two-way conversion, and I wanted to use Python 3 format() support to be as Pythonic as possible. Maybe try the datasize library module? https://pypi.org/project/datasize/

$ pip install -qqq datasize
$ python
...
>>> from datasize import DataSize
>>> 'My new {:GB} SSD really only stores {:.2GiB} of data.'.format(DataSize('750GB'),DataSize(DataSize('750GB') * 0.8))
'My new 750GB SSD really only stores 558.79GiB of data.'