
head -c 10 does the right thing here.


You can use the dd command to copy an arbitrary number of bytes from a binary file.

dd if=infile of=outfile1 bs=10 count=1
dd if=infile of=outfile2 bs=10 skip=1
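As a rough sketch of the two dd calls above on a throwaway file (the file names and sample contents are made up for illustration): the second call here swaps `bs=10 skip=1` for GNU dd's `iflag=skip_bytes`, which as far as I know is GNU-specific, so that the remainder is not copied in 10-byte blocks.

```shell
tmpdir=$(mktemp -d)
printf 'HEADER1234the rest of the data\n' >"$tmpdir/infile"

# First 10 bytes: one block of 10 bytes.
dd if="$tmpdir/infile" of="$tmpdir/outfile1" bs=10 count=1 2>/dev/null

# Everything after byte 10. Assumption: GNU dd, where iflag=skip_bytes
# makes skip= count bytes instead of blocks, so bs can stay large.
dd if="$tmpdir/infile" of="$tmpdir/outfile2" bs=65536 iflag=skip_bytes skip=10 2>/dev/null

first=$(cat "$tmpdir/outfile1")
rest=$(cat "$tmpdir/outfile2")
printf '%s|%s\n' "$first" "$rest"
rm -rf "$tmpdir"
```

On a dd without `iflag=skip_bytes`, the original `bs=10 skip=1` form gives the same result, just with more read and write calls.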


Best answer

To get the first 10 bytes, as noted already:

head -c 10

To get all but the first 10 bytes (at least with GNU tail):

tail -c+11
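A quick way to convince yourself that the two commands are complementary is to split a small sample file and join the parts again. This is only a sketch; the sample contents and file names are invented for the demonstration.

```shell
tmpdir=$(mktemp -d)
# Sample "binary" data: a 10-byte header, then a NUL and a newline in the body.
printf 'HEADER1234payload\000with\nbinary bytes' >"$tmpdir/infile"

head -c 10 "$tmpdir/infile" >"$tmpdir/header"   # first 10 bytes
tail -c +11 "$tmpdir/infile" >"$tmpdir/body"    # byte 11 to the end

# Joining the two parts should reproduce the original byte for byte.
if cat "$tmpdir/header" "$tmpdir/body" | cmp -s - "$tmpdir/infile"; then
    result="parts match"
else
    result="parts differ"
fi
echo "$result"
header_text=$(cat "$tmpdir/header")
rm -rf "$tmpdir"
```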


How to split a stream (or a file) under bash

Two answers here!

Reading SO request:

get the header (first 10 bytes) of a file and then in another section get everything except the first 10 bytes.

I understand:

How to split a file at specific point

All the other answers here access the same file twice, instead of just splitting it!

Here is my two cents:

The interesting thing about Un*x is that every job can be treated as a filter, so it's easy to split a stream using unbuffered I/O. Most standard un*x tools (cat, grep, awk, sed, python, perl ...) work as filters.

1. Using `head` or `dd` but in a single pass

{ head -c 10 >head_part; cat >tail_part;} <file

This is more efficient, as your file is read only once: the first 10 bytes go to head_part and the rest goes to tail_part.

Note: the second redirection, >tail_part, could just as well be placed outside the whole list ({ ...;})...

You could do the same using `dd`:

{ dd count=1 bs=10 of=head_part; cat;} <file >tail_part

This is still more efficient than running two dd processes that open the same file twice.

...and cat still uses a standard block size for the rest of the file.
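Here is a small self-contained run of the single-pass split, as a sketch only. It assumes a regular file as input and a head that consumes exactly the requested bytes (GNU coreutils head does, as far as I know); the sample file name and contents are made up.

```shell
tmpdir=$(mktemp -d)
printf 'HEADER1234rest of the data\n' >"$tmpdir/file"

# The input is opened once: head consumes 10 bytes, cat takes what is left.
{ head -c 10 >"$tmpdir/head_part"; cat >"$tmpdir/tail_part"; } <"$tmpdir/file"

head_part=$(cat "$tmpdir/head_part")
tail_part=$(cat "$tmpdir/tail_part")
printf 'head: %s\ntail: %s\n' "$head_part" "$tail_part"
rm -rf "$tmpdir"
```

Both commands share the same open file description, so cat starts reading at the offset where head stopped.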

Another example, based on reading line by line:

Split an HTTP (or mail) stream at the first nearly empty line (a line containing only a carriage return, \r):

nc google.com 80 <<<$'GET / HTTP/1.0\r\nHost: google.com\r\n\r' |
{ sed -u '/^\r$/q' >/tmp/so_head.raw; cat;} >/tmp/so_body.raw

or, to drop the empty last line of the header:

nc google.com 80 <<<$'GET / HTTP/1.0\r\nHost: google.com\r\n\r' |
{ sed -nu '/^\r$/q;p' >/tmp/so_head.raw; cat;} >/tmp/so_body.raw

This will produce two files:

ls -l so_*.raw
-rw-r--r-- 1 root    root           307 Apr 25 11:40  so_head.raw
-rw-r--r-- 1 root    root           219 Apr 25 11:40  so_body.raw


grep www so_*.raw
so_body.raw:<A HREF="http://www.google.com/">here</A>.
so_head.raw:Location: http://www.google.com/
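The same header/body split can be tried without a network connection by feeding a canned response through a pipe. This is a sketch that assumes GNU sed (for `-u` and for `\r` in the pattern); the response text and file names are invented.

```shell
tmpdir=$(mktemp -d)

# Stand-in for the network stream: CRLF header lines, a CR-only line, then the body.
printf 'HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\nbody line 1\nbody line 2\n' |
{ sed -nu '/^\r$/q;p' >"$tmpdir/head.raw"; cat >"$tmpdir/body.raw"; }

head_lines=$(wc -l <"$tmpdir/head.raw")
body=$(cat "$tmpdir/body.raw")
echo "$head_lines header lines"
printf '%s\n' "$body"
rm -rf "$tmpdir"
```

Without `-u`, sed would typically read ahead into the body and cat would get little or nothing.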

2. Pure bash way:

If the goal is to get the values of the first 10 bytes into a usable bash variable, here is a nice and efficient way:

Because ten bytes is so few, forking to head can be avoided. From Read a file by bytes in BASH:

read8() {
local _r8_var=${1:-OUTBIN} _r8_car LANG=C IFS=
read -r -d '' -n 1 _r8_car || { printf -v $_r8_var '';return 1;}
printf -v $_r8_var %02X "'"$_r8_car
}
{
first10=()
for i in {0..9};do
read8 first10[i] || break
done
cat
} < "$infile" >"$outfile"

This creates an array ${first10[@]} containing the hexadecimal values of the first ten bytes of $infile and stores the rest of the data in $outfile.

declare -p first10


declare -a first10=([0]="25" [1]="50" [2]="44" [3]="46" [4]="2D" [5]="31" [6]="2E"
[7]="34" [8]="0A" [9]="25")

This was a PDF (%PDF -> 25 50 44 46)... Here's another sample:

{
first10=()
for i in {0..9};do
read8 first10[i] || break
done
cat
} <<<"Hello world!"
d!

As I didn't redirect the output, the string d! is printed on the terminal.

echo ${first10[@]}
48 65 6C 6C 6F 20 77 6F 72 6C


printf '%b%b%b%b%b%b%b%b%b%b\n' ${first10[@]/#/\\x}
Hello worl

About binary

You said:

These are binary files and will likely have \0's and \n's throughout the first 10 bytes.

{
first10=()
for i in {0..9};do
read8 first10[i] || break
done
cat
} < <(gzip <<<"Hello world!") >/dev/null


echo ${first10[@]}
1F 8B 08 00 00 00 00 00 00 03

(A sample with a \n is at the bottom of this answer ;)

As a function

read8() { local _r8_var=${1:-OUTBIN} _r8_car LANG=C IFS=
read -r -d '' -n 1 _r8_car || { printf -v $_r8_var '';return 1;}
printf -v $_r8_var %02X "'"$_r8_car ;}
get10() {
local -n result=${1:-first10}     # 1st arg is array name
local -i _i
result=()
for ((_i=0;_i<${2:-10};_i++));do  # 2nd arg is number of bytes
read8 result[_i] || { unset result[_i] ; return 1 ;}
done
cat
}

Then (here I use the special character ⛶ to mean that there was no trailing newline):

get10 pdf 4 <$infile >$outfile
printf %b ${pdf[@]/#/\\x}
%PDF⛶


echo $(( $(stat -c %s $infile) - $(stat -c %s $outfile) ))
4


get10 test 8 <<<'Hello World!'
rld!


printf %b ${test[@]/#/\\x}
Hello Wo⛶


get10 test 24 <<<'Hello World!'
printf %b ${test[@]/#/\\x}
Hello World!

( And the last character printed is a \n! ;)

Final binary demo:

get10 test 256 < <(gzip <<<'Hello world!')


printf '%b' ${test[@]/#/\\x} | gunzip
Hello world!


printf "  %s %s %s %s  %s %s %s %s    %s %s %s %s  %s %s %s %s\n" ${test[@]}
1F 8B 08 00  00 00 00 00    00 03 F3 48  CD C9 C9 57
28 CF 2F CA  49 51 E4 02    00 41 E4 A9  B2 0D 00 00
00

Note!! This works fine and is very quick as long as the number of bytes to read stays low, even when processing large files. This could be used for file recognition, for example. But for splitting files into larger parts, you have to use split, head, tail and/or dd.
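To illustrate the file-recognition idea in a portable way, here is a sketch that uses od rather than the read8 function above, so it also runs in a plain POSIX shell. The helper name magic_hex and the sample file are hypothetical; only the PDF and gzip signatures come from the examples in this answer.

```shell
# Hypothetical helper: print the first N bytes of a file as uppercase hex digits.
magic_hex() {
    # $1 = file, $2 = number of bytes (default 4)
    od -An -tx1 -N "${2:-4}" "$1" | tr -d ' \n' | tr 'a-f' 'A-F'
}

tmpdir=$(mktemp -d)
printf '%%PDF-1.4\n%%rest of file\n' >"$tmpdir/sample"

# Compare the leading bytes against the signatures seen earlier.
case $(magic_hex "$tmpdir/sample" 4) in
    25504446) kind=pdf ;;      # %PDF
    1F8B*)    kind=gzip ;;     # gzip magic number
    *)        kind=unknown ;;
esac
echo "$kind"
rm -rf "$tmpdir"
```

Unlike the pure bash version, this forks od and tr, which is the trade-off for not depending on bash-only features.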

How to get only the first ten bytes of a binary file