If you want the 'apparent size' (that is the number of bytes in each file), not size taken up by files on the disk, use the -b or --bytes option (if you got a Linux system with GNU coreutils):
If you use busybox's "du" in emebedded system, you can not get a exact bytes with du, only Kbytes you can get.
BusyBox v1.4.1 (2007-11-30 20:37:49 EST) multi-call binary
Usage: du [-aHLdclsxhmk] [FILE]...
Summarize disk space used for each FILE and/or directory.
Disk space is printed in units of 1024 bytes.
Options:
-a Show sizes of files in addition to directories
-H Follow symbolic links that are FILE command line args
-L Follow all symbolic links encountered
-d N Limit output to directories (and files with -a) of depth < N
-c Output a grand total
-l Count sizes many times if hard linked
-s Display only a total for each argument
-x Skip directories on different filesystems
-h Print sizes in human readable format (e.g., 1K 243M 2G )
-m Print sizes in megabytes
-k Print sizes in kilobytes(default)
The '-c' gives you grand total data which is extracted using the 'grep total' portion of the command, and the count in Kbytes is extracted with the awk command.
The only caveat here is if you have a subdirectory containing the text "total" it will get spit out as well.
du is handy, but find is useful in case if you want to calculate the size of some files only (for example, using filter by extension). Also note that find themselves can print the size of each file in bytes. To calculate a total size we can connect dc command in the following manner:
find . -type f -printf "%s + " | dc -e0 -f- -ep
Here find generates sequence of commands for dc like 123 + 456 + 11 +.
Although, the completed program should be like 0 123 + 456 + 11 + p (remember postfix notation).
So, to get the completed program we need to put 0 on the stack before executing the sequence from stdin, and print the top number after executing (the p command at the end).
We achieve it via dc options:
-e0 is just shortcut for -e '0' that puts 0 on the stack,
-f- is for read and execute commands from stdin (that generated by find here),
-ep is for print the result (-e 'p').
To print the size in MiB like 284.06 MiB we can use -e '2 k 1024 / 1024 / n [ MiB] p' in point 3 instead (most spaces are optional).
When a folder is created, many Linux filesystems allocate 4096 bytes to store some metadata about the directory itself.
This space is increased by a multiple of 4096 bytes as the directory grows.
du command (with or without -b option) take in count this space, as you can see typing:
mkdir test && du -b test
you will have a result of 4096 bytes for an empty dir.
So, if you put 2 files of 10000 bytes inside the dir, the total amount given by du -sb would be 24096 bytes.
If you read carefully the question, this is not what asked. The questioner asked:
the sum total of all the data in files and subdirectories I would get if I opened each file and counted the bytes
that in the example above should be 20000 bytes, not 24096.
So, the correct answer IMHO could be a blend of Nelson answer and hlovdal suggestion to handle filenames containing spaces:
find . -type f -print0 | xargs -0 stat --format=%s | awk '{s+=$1} END {print s}'
There are at least three ways to get the "sum total of all the data in files and subdirectories" in bytes that work in both Linux/Unix and Git Bash for Windows, listed below in order from fastest to slowest on average. For your reference, they were executed at the root of a fairly deep file system (docroot in a Magento 2 Enterprise installation comprising 71,158 files in 30,027 directories).
1.
$ time find -type f -printf '%s\n' | awk '{ total += $1 }; END { print total" bytes" }'
748660546 bytes
real 0m0.221s
user 0m0.068s
sys 0m0.160s
2.
$ time echo `find -type f -print0 | xargs -0 stat --format=%s | awk '{total+=$1} END {print total}'` bytes
748660546 bytes
real 0m0.256s
user 0m0.164s
sys 0m0.196s
3.
$ time echo `find -type f -exec du -bc {} + | grep -P "\ttotal$" | cut -f1 | awk '{ total += $1 }; END { print total }'` bytes
748660546 bytes
real 0m0.553s
user 0m0.308s
sys 0m0.416s
These two also work, but they rely on commands that don't exist on Git Bash for Windows:
1.
$ time echo `find -type f -printf "%s + " | dc -e0 -f- -ep` bytes
748660546 bytes
real 0m0.233s
user 0m0.116s
sys 0m0.176s
2.
$ time echo `find -type f -printf '%s\n' | paste -sd+ | bc` bytes
748660546 bytes
real 0m0.242s
user 0m0.104s
sys 0m0.152s
If you only want the total for the current directory, then add -maxdepth 1 to find.
Note that some of the suggested solutions don't return accurate results, so I would stick with the solutions above instead.
$ du -sbh
832M .
$ ls -lR | grep -v '^d' | awk '{total += $5} END {print "Total:", total}'
Total: 583772525
$ find . -type f | xargs stat --format=%s | awk '{s+=$1} END {print s}'
xargs: unmatched single quote; by default quotes are special to xargs unless you use the -0 option
4390471
$ ls -l| grep -v '^d'| awk '{total = total + $5} END {print "Total" , total}'
Total 968133