Assembly Language Step by Step Programming with DOS and Linux 2nd Ed 2000.pdf

Reading and Using an Assembly Language Reference

The MOV instruction is a good start. Like a medium-sized screwdriver, you'll end up using it for normal tasks and maybe some abnormal ones, just as I use screwdrivers to pry nails out of boards, club black widow spiders in the garage bathroom, discharge large electrolytic capacitors, and other intriguing things over and above workaday screw turning. (Not all of these are a good idea ... but then again, many have said that assembly language programming isn't a good idea ...) The x86 instruction set contains dozens of instructions, however, and over the course of the rest of this book, I mix in descriptions of various other instructions with further discussions of memory addressing and program logic and design.

Remembering a host of tiny, tangled details involving dozens of different instructions is brutal and unnecessary. Even the Big Guys don't try to keep it all between their ears at all times. Most keep a blue card or some other sort of reference document handy to jog their memories about machine instruction details.

Blue Cards

A blue card is a reference summary printed on a piece of colored card stock. It folds up like a road map and fits in your pocket. The original blue card may actually have been blue, but knowing the perversity of programmers in general, it was probably bright orange.

Blue cards aren't always cards anymore. One of the best is a full sheet of very stiff shiny plastic, sold by Micro Logic Corporation of Hackensack, New Jersey. The one sold with Microsoft's MASM is actually published by Intel and has grown to a pocket-sized booklet stapled on the spine.

Blue cards contain very terse summaries of what an instruction does, which operands are legal, which flags it affects, and how many machine cycles it takes to execute. This information, while helpful in the extreme, is often so tersely put that newcomers might not quite fathom which edge of the card is up.

An Assembly Language Reference for Beginners

In deference to people just starting out in assembly language, I have put together a beginner's reference to the most common x86 instructions and called it Appendix A. It contains at least a page on every instruction I cover in this book, plus a few additional instructions that everyone ought to know. It does not describe every instruction, only the most common and most useful. Once you've gotten skillful enough to use the more arcane instructions, you should be able to read the NASM documentation (or that of some other assembler) and run with it.

A sample entry from Appendix A (the page for the NEG instruction) is reproduced in this section. Refer to it during the following discussion.

The instruction's mnemonic is at the top of the page, highlighted in a box to make it easy to spot while flipping quickly through the appendix. To the mnemonic's right is the name of the instruction, which is a little more descriptive than the naked mnemonic.

Flags

Immediately beneath the mnemonic is a minichart of machine flags in the Flags register. I haven't spoken in detail of flags yet, but the Flags register is a collection of 1-bit values that retain certain essential information about the state of the machine for short periods of time. Many (but by no means all) x86 instructions change the values of one or more flags. The flags may then be individually tested by one of the conditional jump instructions, which change the course of the program depending on the state of the flags.

We'll get into this business of tests and jumps in Chapter 10. For now, simply understand that each of the flags has a name, and that for each flag is a symbol in the flags minichart. You'll come to know the flags by their two-character symbols in time, but until then, the full names of the flags are shown to the right of the minichart. The majority of the flags are not used frequently in beginning assembly language work. Most of what you'll be paying attention to, flags-wise, is the Carry flag (CF). It's used, as you might imagine, for keeping track of binary arithmetic when an arithmetic operation carries out of a single byte or word.

There will be an asterisk (*) beneath the symbol of any flag affected by the instruction. How the flag is affected depends on what the instruction does. You'll have to divine that from the Notes section. When an instruction affects no flags at all, the word <none> will appear in the minichart.

In the example page, the minichart indicates that the NEG instruction affects the Overflow flag, the Sign flag, the Zero flag, the Auxiliary carry flag, the Parity flag, and the Carry flag. The ways that the flags are affected depend on the results of the negation operation on the operand specified. These ways are summarized in the second paragraph of the Notes section.
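As a rough illustration of the flag rules summarized in the Notes section, here is a minimal sketch of NEG applied to three 8-bit values. It assumes NASM syntax and a DOS .COM target (hence the ORG 100h and the INT 21h exit), and the file name negdemo.asm is made up for the example. The flag values in the comments are derived from the rules in the Notes section; single-stepping the program in a debugger is the way to confirm them.

```nasm
; negdemo.asm (hypothetical name)
; Assumed build command: nasm -f bin negdemo.asm -o negdemo.com
        org 100h            ; DOS .COM programs load at offset 100h

        mov al, 1           ; AL = 01h
        neg al              ; AL = 0FFh (-1): CF=1, ZF=0, SF=1, OF=0

        mov al, 0           ; AL = 00h
        neg al              ; AL = 00h: CF=0, ZF=1, SF=0, OF=0

        mov al, 80h         ; AL = 80h (-128), the maximum negative 8-bit value
        neg al              ; AL = 80h, unchanged: CF=1, OF=1, SF=1, ZF=0

        mov ax, 4C00h       ; DOS function 4Ch: terminate with return code 0
        int 21h
```

The third case is the one worth watching: there is no +128 in a signed byte, so the operand comes back unchanged and OF reports the problem.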

NEG     Negate (Two's Complement; That Is, Multiply by -1)

Flags affected:

O D I T S Z A P C     OF: Overflow flag     TF: Trap flag     AF: Aux carry
F F F F F F F F F     DF: Direction flag    SF: Sign flag     PF: Parity flag
*       * * * * *     IF: Interrupt flag    ZF: Zero flag     CF: Carry flag

Legal forms:    8086/8    286    386    486    Pentium

NEG r8            X        X      X      X        X
NEG m8            X        X      X      X        X
NEG r16           X        X      X      X        X
NEG m16           X        X      X      X        X
NEG r32                           X      X        X
NEG m32                           X      X        X

Examples:

NEG AL
NEG ECX
NEG BYTE [BX]    ; Negates byte quantity at DS:BX
NEG WORD [DI]    ; Negates word quantity at DS:DI

Notes:

This is the assembly language equivalent of multiplying a value by -1. Keep in mind that negation is not the same as simply inverting each bit in the operand. (Another instruction, NOT, does that.) The process is also known as generating the two's complement of a value. The two's complement of a value added to that value yields zero. For 8-bit values, -1 = 0FFH; -2 = 0FEH; -3 = 0FDH; and so forth.

If the operand is 0, CF is cleared and ZF is set; otherwise, CF is set and ZF is cleared. If the operand contains the maximum negative value (-128 for 8-bit or -32768 for 16-bit), the operand does not change, but OF and CF are set. SF is set if the result is negative, else SF is cleared. PF is set if the low-order 8 bits of the result contain an even number of set (1) bits; otherwise, PF is cleared.

Note: You must use a type override specifier (BYTE or WORD) with memory data.

r8  = AL AH BL BH CL CH DL DH
r16 = AX BX CX DX BP SP SI DI
sr  = CS DS SS ES
m8  = 8-bit memory data
m16 = 16-bit memory data
i8  = 8-bit immediate data
i16 = 16-bit immediate data
d8  = 8-bit signed displacement
d16 = 16-bit signed displacement

Legal Forms

A given mnemonic represents a single x86 instruction, but each instruction may include more than one legal form. The form of an instruction varies by the type and order of the operands passed to it.

What the individual forms actually represent are different binary number opcodes. For example, beneath the surface, the POP AX instruction is the number 58H, whereas the POP SI instruction is the number 5EH.

Sometimes there will be special cases of an instruction and its operands that are shorter than the more general cases. For example, the XCHG instruction, which exchanges the contents of the two operands, has a special case when one of the operands is register AX. Any XCHG instruction with AX as one of the operands is represented by a single-byte opcode. The general forms of XCHG (for example, XCHG r16,r16) are always 2 bytes long instead. This implies that there are actually two different opcodes that will do the job for a given combination of operands; for example, XCHG AX,DX. True enough—and some assembler programs are smart enough to choose the shortest form possible in any given situation. If you are hand-assembling a sequence of raw opcode bytes, say, for use in a higher-level language INLINE statement, you need to be aware of the special cases, and all special cases will be marked as such in the Legal forms section.
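To make the idea of different forms having different opcodes concrete, here is a small sketch, assuming NASM syntax and a DOS .COM target. The opcode bytes in the comments are the standard 16-bit encodings. The DB line hand-encodes one general-form XCHG so the two encodings can be compared side by side in a listing file or a debugger. Which form an assembler emits for XCHG AX,DX is up to the assembler, so check the listing rather than taking the comment on faith.

```nasm
; Assumed build command: nasm -f bin forms.asm -o forms.com -l forms.lst
        org 100h

        push ax             ; 50h
        pop  ax             ; 58h
        push si             ; 56h
        pop  si             ; 5Eh   same mnemonic, different register, different opcode

        xchg ax, dx         ; short form with AX as an operand: the single byte 92h
        db 87h, 0D0h        ; general form, hand-encoded: also exchanges AX and DX

        mov ax, 4C00h       ; DOS function 4Ch: terminate with return code 0
        int 21h
```

The two exchanges cancel each other out, so AX and DX end up where they started. The point is only that one operation can be reached through two different byte sequences.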

When you want to use an instruction with a certain set of operands, check the Legal forms section of the reference guide for that instruction to make sure that the combination is legal. The MOV instruction, for example, cannot move one segment register directly into another, nor can it move immediate data directly into a segment register. Neither combination of operands is a legal form of the MOV instruction, though they make sense and would be nice to have.
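The usual way around both restrictions is to route the value through a general-purpose register. The following is a minimal sketch, assuming NASM syntax and a real-mode DOS .COM target. The value 1234h is an arbitrary placeholder and not a meaningful segment address.

```nasm
        org 100h

;       mov es, ds          ; not a legal form: segment register to segment register
;       mov es, 1234h       ; not a legal form: immediate data to segment register

        mov ax, ds          ; legal: segment register to general-purpose register
        mov es, ax          ; legal: general-purpose register to segment register

        mov ax, 1234h       ; legal: immediate data to general-purpose register
        mov es, ax          ; ES now holds the placeholder value 1234h

        mov ax, 4C00h       ; DOS function 4Ch: terminate with return code 0
        int 21h
```

Uncommenting either of the two illegal lines should make the assembler reject the file, which is a quick way to check a doubtful form.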

In the example reference page on the NEG instruction, you see that a segment register cannot be an operand to NEG. (If it could, there would be a NEG sr item in the Legal forms list.) If you want to negate the value in a segment register, you'll first have to use MOV to move the value from the segment register into one of the general-purpose registers before using NEG on the general-purpose register, and finally moving the negated value back into the segment register. (Note well that using NEG on a segment register is an almighty peculiar thing to do, and for that reason, that form of NEG was not given any transistor budget in the real mode portion of the x86 CPUs.)
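The three steps just described can be sketched as follows, assuming NASM syntax and a real-mode DOS .COM target. ES and AX are arbitrary choices for the illustration, and the result is no more useful here than the text suggests.

```nasm
        org 100h

        mov ax, es          ; step 1: copy the segment register into AX
        neg ax              ; step 2: NEG r16 is a legal form, so negate the copy
        mov es, ax          ; step 3: copy the negated value back into ES

        mov ax, 4C00h       ; DOS function 4Ch: terminate with return code 0
        int 21h
```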

Operand Symbols

The symbols used to indicate the nature of the operands in the Legal forms section are summarized at the bottom of every page in the reference appendix. They're close to self-explanatory, but I'll take a moment to expand upon them slightly here:

r8: An 8-bit register half, one of AH, AL, BH, BL, CH, CL, DH, or DL.

r16: A 16-bit general-purpose register, one of AX, BX, CX, DX, BP, SP, SI, or DI.

sr: One of the four segment registers, CS, DS, SS, or ES.

m8: An 8-bit byte of memory data.

m16: A 16-bit word of memory data.

m32: A 32-bit double word of memory data.

i8: An 8-bit byte of immediate data.

i16: A 16-bit word of immediate data.

i32: A 32-bit double word of immediate data.

d8: An 8-bit signed displacement. We haven't covered these yet, but a displacement is a distance between the current location in the code and another place in the code to which we want to jump. It's signed (that is, either negative or positive) because a positive displacement jumps you higher (forward) in memory, whereas a negative displacement jumps you lower (back) in memory. We examine this notion in detail in Chapter 10.

d16: A 16-bit signed displacement. Again, for use with jump and call instructions. See Chapter 10.

d32: A 32-bit signed displacement.
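As a preview of how a signed displacement looks in practice, here is a minimal sketch assuming NASM syntax and a DOS .COM target. The labels are invented for the example. The displacement bytes in the comments are counted from the address of the instruction that follows the jump, which is how the CPU measures them.

```nasm
        org 100h

start:  jmp short ahead     ; EBh 02h: d8 = +2, jumps forward over the two NOPs
        nop                 ; 90h (skipped)
        nop                 ; 90h (skipped)

ahead:  mov cx, 3           ; loop counter
again:  dec cx              ; 49h, one byte long
        jnz again           ; 75h 0FDh: d8 = -3, jumps back to "again" until CX is 0

        mov ax, 4C00h       ; DOS function 4Ch: terminate with return code 0
        int 21h
```

The backward jump has to cover its own two bytes plus the one-byte DEC CX, which is where the -3 (0FDH as a two's complement byte) comes from.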

Examples

Whereas the Legal forms section shows which combinations of operands are legal for a given instruction, the Examples section shows examples of the instruction in actual use, just as it would be coded in an assembly language program. I've tried to put a good sampling of examples for each instruction, demonstrating the range of different possibilities with the instruction. This includes situations that require type override specifiers, which I cover in the next section.

Notes

The Notes section of the reference page describes the instruction's action briefly and provides information on how it affects the flags, how it may be limited in use, and any other detail that needs to be remembered, especially things that beginners would overlook or misconstrue.

What's Not Here ...

Appendix A differs from most detailed assembly language references in that it does not have the binary opcode encoding information, nor indications of how many machine cycles are used by each form of the instruction.

The binary encoding of an instruction is the actual sequence of binary bytes that the CPU digests and recognizes as the machine instruction. What we would call POP AX, the machine sees as the binary number 58H. What we call ADD SI,07733H, the machine sees as the 4-byte sequence 81H 0C6H 33H 77H. Machine instructions are encoded into anywhere from one to four (rarely more) binary bytes depending on what instruction they are and what their operands are. Laying out the system for determining what the encoding will be for any given instruction is extremely complicated, in that its component bytes must be set up bit by bit from several large tables. I've decided that this book is not the place for that particular discussion and have left encoding information out of the reference appendix.
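The ADD SI,07733H example can be checked by writing the instruction once as a mnemonic and once as raw bytes. This is a minimal sketch, assuming NASM syntax and a DOS .COM target. The DB line simply spells out the 4-byte sequence quoted above, so the CPU executes the same instruction twice.

```nasm
        org 100h

        xor si, si              ; SI = 0000h
        add si, 7733h           ; assembles to 81h 0C6h 33h 77h; SI = 7733h
        db 81h, 0C6h, 33h, 77h  ; the same instruction, hand-encoded; SI = 0EE66h

        mov ax, 4C00h           ; DOS function 4Ch: terminate with return code 0
        int 21h
```

Note that the immediate value 7733H appears in the byte stream low byte first (33H, then 77H). The byte 0C6H is the one that selects both the ADD operation and the SI register, and it is the kind of detail that has to be assembled bit by bit from the encoding tables.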

Finally, I've included nothing anywhere in this book that indicates how many machine cycles are expended by any given machine instruction. A machine cycle is one pulse of the master clock that makes the PC perform its magic. Each instruction uses some number of those cycles to do its work, and the number varies all over the map depending on criteria that I won't be explaining in this book.

Furthermore, as Michael Abrash explains in his immense book Michael Abrash's Graphics Programming Black Book (Coriolis Group Books, 1997), knowing the cycle requirements for individual instructions is rarely sufficient to allow even an expert assembly language programmer to calculate how much time a given series of instructions will take. He and I both agree that it is no fit subject for beginners, and I will let him take it up in his far more advanced volume.