How do I disassemble raw 16-bit x86 machine code?

I'd like to disassemble the MBR (the first 512 bytes) of a bootable x86 disk that I have. I have copied the MBR to a file using

dd if=/dev/my-device of=mbr bs=512 count=1

Any suggestions for a Linux utility that can disassemble the file mbr?

You can use objdump. According to this article the syntax is:

objdump -D -b binary -mi386 -Maddr16,data16 mbr

I like ndisasm for this purpose. It comes with the NASM assembler, which is free and open source and is included in the package repositories of most Linux distributions.

The GNU tool is called objdump, for example:

objdump -D -b binary -m i8086 <file>

Try this command:

sudo dd if=/dev/sda bs=512 count=1 | ndisasm -b16 -o7c00h -
ndisasm -b16 -o7c00h -a -s7c3eh mbr

Explanation - from ndisasm manpage

  • -b = Specifies 16-, 32- or 64-bit mode. The default is 16-bit mode.
  • -o = Specifies the notional load address for the file. This option causes ndisasm to get the addresses it lists down the left hand margin, and the target addresses of PC-relative jumps and calls, right.
  • -a = Enables automatic (or intelligent) sync mode, in which ndisasm will attempt to guess where synchronisation should be performed, by means of examining the target addresses of the relative jumps and calls it disassembles.
  • -s = Manually specifies a synchronisation address, such that ndisasm will not output any machine instruction which encompasses bytes on both sides of the address. Hence the instruction which starts at that address will be correctly disassembled.
  • mbr = The file to be disassembled.
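As a rough illustration of why the load address given to -o matters for jump targets, here is a minimal Python sketch. It assumes, purely for the example, a sector that starts with the bytes EB 3C (a short jump); the function name short_jump_target is hypothetical and not part of ndisasm.

```python
def short_jump_target(code, offset, origin):
    """Absolute target of a `jmp short` (opcode 0xEB) found at `offset`.

    `origin` plays the role of the value passed to ndisasm's -o option.
    """
    if code[offset] != 0xEB:
        raise ValueError("not a short jump")
    rel8 = code[offset + 1]
    if rel8 >= 0x80:                 # the displacement is a signed 8-bit value
        rel8 -= 0x100
    next_ip = origin + offset + 2    # a short jump is 2 bytes long
    return (next_ip + rel8) & 0xFFFF  # IP wraps at 16 bits


# Assumed example bytes: jmp short +0x3c, followed by a nop.
sample = bytes([0xEB, 0x3C, 0x90])
print(hex(short_jump_target(sample, 0, 0x0000)))  # 0x3e
print(hex(short_jump_target(sample, 0, 0x7C00)))  # 0x7c3e
```

With these assumed bytes and a load address of 7c00h, the jump lands on 7c3eh, which is the kind of address one would pass to -s so that decoding resumes exactly at the jump target.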

starblue and hlovdal both have parts of the canonical answer. If you want to disassemble raw i8086 code, you usually want Intel syntax, not AT&T syntax, too, so use:

objdump -D -Mintel,i8086 -b binary -m i386 mbr.bin
objdump -D -Mintel,i386 -b binary -m i386 foo.bin    # for 32-bit code
objdump -D -Mintel,x86-64 -b binary -m i386 foo.bin  # for 64-bit code

If your code is ELF (or a.out (or (E)COFF)), you can use the short form:

objdump -D -Mintel,i8086 a.out  # disassembles the entire file
objdump -d -Mintel,i8086 a.out  # disassembles only code sections

For 32-bit or 64-bit code, omit the ,i8086; the ELF header already includes this information.

ndisasm, as suggested by jameslin, is also a good choice, but objdump usually comes with the OS and can deal with all architectures supported by GNU binutils (superset of those supported by GCC), and its output can usually be fed into GNU as (ndisasm’s can usually be fed into nasm though, of course).

Peter Cordes suggests that “Agner Fog's objconv is very nice. It puts labels on branch targets, making [it] a lot easier to figure out what the code does. It can disassemble into NASM, YASM, MASM, or AT&T (GNU) syntax.”

Multimedia Mike already found out about --adjust-vma; the ndisasm equivalent is the -o option.

To disassemble, say, sh4 code (I used one binary from Debian to test), use this with GNU binutils (almost all other disassemblers are limited to one platform, such as x86 with ndisasm and objconv):

objdump -D -b binary -m sh -EL x

The -m is the machine, and -EL means Little Endian (for sh4eb use -EB instead), which is relevant for architectures that exist in either endianness.

If you're just looking to use a disassembler, then objdump is one choice. The disassembler that comes with the nasm assembler is ndisasm. You can also run "debug.exe" in DOSBox on Linux, provided you get hold of a copy of the program. It also does disassembly, as well as controlled execution, i.e. simulation of the CPU itself, which is also important, even when doing disassembly, for reasons I'm about to describe.

Fake86 has a CPU emulator. You may be able to hack it into doing disassembly by:

  (a) having it show the instruction instead of simulating it,
  (b) having it not take conditional jumps or invoke calls, but (instead) stacking the address as a new entry point to do disassembly from (i.e., in effect, taking both branches and encapsulating subroutines),
  (c) having it stop the current disassembly at an unconditional jump or return,
  (d) having it accept one, two or more entry points to start with, and ideally
  (e) having it also accept base addresses for data segments, and
  (f) getting it to do a hex dump of all the areas left unprocessed as data or code segments (as these are usually where indirect jumps or calls or indirectly-accessed data segments land).
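To make items (a) to (d) and (f) concrete, here is a minimal Python sketch of that kind of control-flow-following disassembly. It is not taken from Fake86 or any other tool: the decoder covers only a handful of 8086 opcodes chosen for illustration, the names decode and trace are hypothetical, and queuing the target of an unconditional jump is my own assumption about what one would want.

```python
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
JCC = ["jo", "jno", "jb", "jnb", "jz", "jnz", "jbe", "ja",
       "js", "jns", "jp", "jnp", "jl", "jge", "jle", "jg"]


def decode(code, off, pc):
    """Decode one instruction from a tiny illustrative opcode subset.

    Returns (length, text, branch_target, stops_path), or None if unknown.
    """
    op = code[off]
    rest = code[off + 1:off + 3]
    if op == 0x90:
        return 1, "nop", None, False
    if op == 0xC3:
        return 1, "ret", None, True
    if op == 0xCD and len(rest) >= 1:
        return 2, "int 0x%02x" % rest[0], None, False
    if 0xB8 <= op <= 0xBF and len(rest) == 2:
        imm = rest[0] | rest[1] << 8
        return 3, "mov %s, 0x%04x" % (REG16[op & 7], imm), None, False
    if (op == 0xEB or 0x70 <= op <= 0x7F) and len(rest) >= 1:
        rel = rest[0] - 0x100 if rest[0] >= 0x80 else rest[0]
        target = (pc + 2 + rel) & 0xFFFF
        name = "jmp short" if op == 0xEB else JCC[op & 0x0F]
        return 2, "%s 0x%04x" % (name, target), target, op == 0xEB
    if op in (0xE8, 0xE9) and len(rest) == 2:
        target = (pc + 3 + (rest[0] | rest[1] << 8)) & 0xFFFF
        name = "call" if op == 0xE8 else "jmp"
        return 3, "%s 0x%04x" % (name, target), target, op == 0xE9
    return None


def trace(code, origin, entries):
    """Follow control flow from the entry points; return (listing, leftover)."""
    listing = {}                 # address -> (length, text)
    work = list(entries)         # (d) one or more entry points
    end = origin + len(code)
    while work:
        pc = work.pop()
        while origin <= pc < end and pc not in listing:
            insn = decode(code, pc - origin, pc)
            if insn is None:     # unknown opcode: abandon this path
                break
            length, text, target, stops = insn
            listing[pc] = (length, text)   # (a) show, do not simulate
            if target is not None:
                work.append(target)        # (b) queue the target as a new entry
            if stops:                      # (c) unconditional jump or return
                break
            pc += length
    covered = {a for pc, (n, _) in listing.items() for a in range(pc, pc + n)}
    leftover = [a for a in range(origin, end) if a not in covered]  # (f)
    return listing, leftover


# Made-up bytes: a jump over two data bytes, then a few instructions.
sample = bytes.fromhex("EB02 AA55 B83412 7401 90 C3")
listing, leftover = trace(sample, 0x7C00, [0x7C00])
for addr in sorted(listing):
    print("%04x  %s" % (addr, listing[addr][1]))
print("unprocessed:", [hex(a) for a in leftover])
```

In this made-up sample the two bytes that the initial jump skips are never decoded as instructions and end up in the unprocessed list, which is the behavior item (f) asks for.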

This gets to the other sense of your query: "I want to make a disassembler". The source for ndisasm is available, and it handles many of the descendants of the 8086, not just the 8086 itself (which seriously clutters it, if all you want is an 8086 or even 80386 disassembler), but it is not self-contained and depends heavily on the rest of the distribution.

Its main talking point is that it uses octal digits for the opcodes - which better fits the 80x86 - as I pointed out on the USENET in 1995 in comp.lang.asm ... and (in fact) nasm's creation was a direct response to that. So, it's potentially more transparent and you may want to keep the source handy as a check and comparison, if you're making your own disassembler.
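To illustrate why octal fits the 80x86 encoding, here is a small Python sketch: many opcode bytes, and the ModRM byte, are laid out as 2-bit, 3-bit and 3-bit fields, which are exactly the three octal digits of the byte. The helper name octal_fields is hypothetical, and the two bytes shown are just examples I picked.

```python
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]


def octal_fields(byte):
    """Split a byte into its 2-bit, 3-bit and 3-bit fields (its octal digits)."""
    return (byte >> 6) & 0o3, (byte >> 3) & 0o7, byte & 0o7


# Opcode 0xBB is 0o273: the 0o27x family is "mov r16, imm16",
# and the last octal digit selects the register.
_, _, reg = octal_fields(0xBB)
print(oct(0xBB), "-> mov", REG16[reg] + ", imm16")  # 0o273 -> mov bx, imm16

# ModRM byte 0xD8 is 0o330: mod=3 (register operand), reg=3, rm=0.
# Together with opcode 0x89 this encodes "mov ax, bx".
mod, reg, rm = octal_fields(0xD8)
print(oct(0xD8), mod, REG16[reg], REG16[rm])        # 0o330 3 bx ax
```

In hexadecimal the register numbers are smeared across two digits; in octal they can be read off directly.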

You can also run the debug.exe program on itself.

You could also try running ndisasm on debug.exe. First extract the entry point address CS:IP and the stack pointer address SS:SP from the .EXE header (80x86 stacks grow down, so the stack segment is nominally SS:0 to SS:(SP-1)), then strip off the 0x200-byte header to turn the file into a raw binary. The EXE for debug.exe has no relocations, so treating the code as raw binary is fine.
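Reading those header fields can be sketched in Python as follows. This assumes the standard MZ header layout (16-bit little-endian words, with the header size stored in 16-byte paragraphs at offset 0x08); the function name parse_mz_header and the sample header bytes are made up for illustration and are not taken from debug.exe.

```python
import struct


def parse_mz_header(data):
    """Pull the fields needed for raw disassembly out of an MZ (.EXE) header."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # Twelve 16-bit words at offsets 0x02..0x19 of the header.
    (_last_page, _pages, relocs, hdr_paras, _min_alloc, _max_alloc,
     ss, sp, _checksum, ip, cs, _reloc_off) = struct.unpack_from("<12H", data, 2)
    return {
        "relocations": relocs,
        "header_size": hdr_paras * 16,   # paragraphs are 16 bytes each
        "ss": ss, "sp": sp,
        "cs": cs, "ip": ip,
        # Entry point as an offset into the image that follows the header.
        "entry_offset": (cs * 16 + ip) & 0xFFFFF,
    }


# Hypothetical header: 0x20 paragraphs (0x200 bytes), SS:SP = 0100:0200,
# CS:IP = 0000:0010, no relocations.
fake_header = bytes.fromhex(
    "4D5A 0000 0100 0000 2000 0000 FFFF 0001 0002 0000 1000 0000 1C00 0000")
info = parse_mz_header(fake_header)
print(hex(info["header_size"]), info["relocations"], hex(info["entry_offset"]))
```

For a real file, the raw image would then be data[info["header_size"]:], written to a new file and passed to ndisasm.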

But you won't get anything that's clearly recognizable, since the program is self-modifying - more precisely: self-extracting. You'll get a (barely) compressed code image (about 5/6 compression ratio) followed by a loader routine.

You have to run emulation on it, e.g. by running debug.exe on debug.exe to emulate its unpacking routine, to get it to extract itself, and then you dump the unpacked program image and disassemble that. There is a "relocation table" at the end of the loader routine, so it does actually have relocations in it - it's just that they're applied when the program unpacks itself, rather than by the OS when the EXE file is loaded.

And then you've just disassembled a disassembler that also happens to do CPU emulation, like Fake86 does - but only for the 8086. You'll have to make the absolute addresses relative (using the original relocation table as a guide) to make it re-assemblable. Once you do that, you can work on the source. The opcode table is in clear view (if you display it as text) in both the packed and unpacked versions of debug.exe.

There's also DosDebug up on GitHub. It handles everything up to the "80586" (or "Pentium") and "80686": it flags a generation "6" for some instructions; e.g. the conditional "cmov" operations are handled by it, as well as their "fcmov" floating-point versions. DosDebug is written in 8086 assembly and is best suited to being assembled with jwasm. You might be able to run nasm on it; I don't know, I never tried.

I might port the DAS disassembler to the x86, since items (a)-(f) are already incorporated into DAS's design. I've only ever ported it to the 8051, 6800, 6809 and 8080/8085 (and Z80) up to now, but the transition from 8085 to 8086 is relatively small. To that end, I might hack something out of Fake86. That's mostly abandonware now, since the author replaced it with XTulator, as Fake86 was written when the programmer was relatively new to C. You might also be able to hack something directly out of DosDebug's opcode tables (their "instr.*" files).