How do I check the size of an HDFS directory?


Best answer

Prior to 0.20.203, and officially deprecated in 2.6.0:

hadoop fs -dus [directory]

Since 0.20.203 (dead link: http://hadoop.apache.org/docs/r0.20.203.0/file_system_shell.html#du) / 1.0.4, and still compatible through 2.6.0:

hdfs dfs -du [-s] [-h] URI [URI …]

You can also run hadoop fs -help for more information and details.


Percentage of space used on the Hadoop cluster:
sudo -u hdfs hadoop fs -df

Space used under a specific folder:
sudo -u hdfs hadoop fs -du -h /user


hadoop fs -du -s -h /path/to/dir displays the size of the directory in human-readable form.


With this you get the size in GB:

hdfs dfs -du PATHTODIRECTORY | awk '/^[0-9]+/ { print int($1/(1024**3)) " [GB]\t" $2 }'
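
For anyone who prefers to post-process the output in a script, here is a minimal Python sketch of the same conversion. It assumes the `-du` output was captured as text (without `-h`), that the size is the first field, and that the path is the last field and contains no spaces. The function name `du_to_gb` and the sample lines are made up for illustration.

```python
import re

GIB = 1024 ** 3  # bytes per GB, the same divisor the awk command uses


def du_to_gb(du_output):
    """Turn raw `hdfs dfs -du` output into '<n> [GB]<TAB><path>' strings."""
    rows = []
    for line in du_output.splitlines():
        # Mirror the awk pattern /^[0-9]+/: skip lines not starting with a digit.
        if not re.match(r"[0-9]+", line):
            continue
        fields = line.split()
        size_bytes = int(fields[0])
        path = fields[-1]  # assumption: path is the last field, no spaces in it
        # Integer division truncates, like awk's int().
        rows.append(f"{size_bytes // GIB} [GB]\t{path}")
    return rows


# Hypothetical sample output, for illustration only.
sample = "5368709120  /data/logs\n1073741824  /data/tmp\n"
for row in du_to_gb(sample):
    print(row)
```

Taking the last field as the path, rather than the second one, is a deliberate choice so the sketch also copes with releases where `-du` prints an extra disk-space column between the size and the path.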


To get the size of a directory, you can use hdfs dfs -du -s -h /$yourDirectoryName. hdfs dfsadmin -report can be used to view a quick cluster-level storage report.


Extending Matt D's and others' answers: up to Apache Hadoop 3.0.0, the command can be

hadoop fs -du [-s] [-h] [-v] [-x] URI [URI ...]

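It displays the sizes of the files and directories contained in the given directory, or the length of a file in case it is just a file.

Options:

The -s option will result in an aggregate summary of file lengths being displayed, rather than the individual files. Without the -s option, the calculation is done by going 1-level deep from the given path.

The -h option will format file sizes in a human-readable fashion (e.g. 64.0m instead of 67108864).

The -v option will display the names of columns as a header line.

The -x option will exclude snapshots from the result calculation. Without the -x option (default), the result is always calculated from all INodes, including all snapshots under the given path.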

`du` returns three columns with the following format:

+-------------------------------------------------------------------+
| size  |  disk_space_consumed_with_all_replicas  |  full_path_name |
+-------------------------------------------------------------------+
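
If you want to work with those three columns programmatically, a small parser along these lines might help. It is only a sketch: it assumes raw byte counts (no `-h`), whitespace-separated columns, and it simply skips anything that does not look like a data row, such as the header printed by `-v`. `DuEntry` and `parse_du_line` are hypothetical names, and the sample values are invented.

```python
from typing import NamedTuple, Optional


class DuEntry(NamedTuple):
    size: int                 # logical size in bytes
    disk_space_consumed: int  # bytes consumed across all replicas
    path: str                 # full path name


def parse_du_line(line: str) -> Optional[DuEntry]:
    """Parse one three-column `du` row; return None for non-data lines."""
    # Split into at most three parts so a path containing spaces stays whole.
    parts = line.split(None, 2)
    if len(parts) != 3 or not (parts[0].isdigit() and parts[1].isdigit()):
        return None
    return DuEntry(int(parts[0]), int(parts[1]), parts[2].strip())


# Hypothetical row: 1 KB of data stored with three replicas.
entry = parse_du_line("1024  3072  /user/hadoop/dir1")
print(entry)
```

Dividing `disk_space_consumed` by `size` gives a rough idea of the effective replication for that path, assuming `size` is non-zero.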

Example command:

hadoop fs -du /user/hadoop/dir1 \
/user/hadoop/file1 \
hdfs://nn.example.com/user/hadoop/dir1

Exit code: returns 0 on success and -1 on error.

Source: Apache doc


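The command should be hadoop fs -du -s -h /dirPath

-du [-s] [-h] ... : Show the amount of space, in bytes, used by the files that match the specified file pattern.
-s : Rather than showing the size of each individual file that matches the pattern, shows the total (summary) size.
-h : Formats the sizes of files in a human-readable fashion rather than a number of bytes. (Ex MB/GB/TB etc)

Note that, even without the -s option, this only shows size summaries one level deep into a directory.

The output is in the form size name (full path).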


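When trying to calculate the total for a particular group of files within a directory, the -s option does not work (in Hadoop 2.7.1). For example:

Directory structure: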

some_dir
├abc.txt
├count1.txt
├count2.txt
└def.txt

Assuming each file is 1 KB in size, you can summarize the entire directory with:

hdfs dfs -du -s some_dir
4096 some_dir

However, if I want the sum of all files containing "count", the command falls short.

hdfs dfs -du -s some_dir/count*
1024 some_dir/count1.txt
1024 some_dir/count2.txt

To get around this, I usually pass the output through awk.

hdfs dfs -du some_dir/count* | awk '{ total+=$1 } END { print total }'
2048
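
The same summation can be done in Python if awk is not at hand. This is a minimal sketch that assumes the `-du` output has already been captured as text and that the size in bytes is the first field on each line; `total_size` is a made-up helper name, and the sample is the output shown above.

```python
def total_size(du_output: str) -> int:
    """Sum the first column of `hdfs dfs -du` output, like the awk one-liner."""
    total = 0
    for line in du_output.splitlines():
        fields = line.split()
        # Ignore blank lines and anything whose first field is not a byte count.
        if fields and fields[0].isdigit():
            total += int(fields[0])
    return total


# Output of `hdfs dfs -du some_dir/count*` from the example above.
sample = """1024 some_dir/count1.txt
1024 some_dir/count2.txt
"""
print(total_size(sample))  # prints 2048
```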


Hadoop version 2.3.33:

hadoop fs -dus  /path/to/dir  |   awk '{print $2/1024**3 " G"}'


hdfs dfs -count <dir>

info from man page:

-count [-q] [-h] [-v] [-t [<storage type>]] [-u] <path> ... :
Count the number of directories, files and bytes under the paths
that match the specified file pattern.  The output columns are:
DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
or, with the -q option:
QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA
DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
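
To make the column layout concrete, here is a rough Python sketch that maps one line of plain `-count` output (no `-q`, no `-h`, no `-v` header) onto the four column names listed above. The helper name `parse_count_line` and the sample line are hypothetical, and the exact spacing of real output may differ.

```python
COUNT_COLUMNS = ("DIR_COUNT", "FILE_COUNT", "CONTENT_SIZE", "PATHNAME")


def parse_count_line(line: str) -> dict:
    """Map one `hdfs dfs -count` output line onto its column names."""
    # Split into at most four parts so a path containing spaces stays whole.
    parts = line.split(None, 3)
    if len(parts) != len(COUNT_COLUMNS):
        raise ValueError(f"unexpected -count output: {line!r}")
    dir_count, file_count, content_size, pathname = parts
    return {
        "DIR_COUNT": int(dir_count),
        "FILE_COUNT": int(file_count),
        "CONTENT_SIZE": int(content_size),  # bytes under the path
        "PATHNAME": pathname.strip(),
    }


# Hypothetical line for a directory holding four 1 KB files.
info = parse_count_line("           1            4               4096 some_dir")
print(info["CONTENT_SIZE"])
```

With `-q`, four quota columns come first, so the split and the column tuple would need to be extended accordingly.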


The easiest way to get the folder size in a human-readable format is

hdfs dfs -du -h /folderpath

where -s can be added to get the total sum.


In case someone needs to do it the Python way :)

Install the hdfs Python package:

pip install hdfs

Code:

from hdfs import InsecureClient

# WebHDFS address of the NameNode; 50070 is the Hadoop 2.x default port (Hadoop 3.x defaults to 9870)
client = InsecureClient('http://hdfs_ip_or_nameservice:50070', user='hdfs')
folder_info = client.content("/tmp/my/hdfs/path")

# Print the folder/directory size in bytes
print(folder_info['length'])
