Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:
Assembly Language Step by Step Programming with DOS and Linux 2nd Ed 2000.pdf
Скачиваний:
156
Добавлен:
17.08.2013
Размер:
4.44 Mб
Скачать

The Semiautomatic Weapon: STOSW without REP

I chose to show you REP STOSW first because it's dramatic in the extreme. But even more, it's actually simpler to use REP than not to use REP.REP simplifies string processing from the programmer's perspective, because it brings the instruction loop inside the CPU. You can use the STOSW instruction without REP, but it's a little more work. The work involves setting up the instruction loop outside the CPU and making sure it's correct.

Why bother? Simply this: With REP STOSW, you can only store the same value into the destination string. Whatever you put into AX before executing REP STOSW is the value that gets fired into memory CX times. STOSW can be used to store different values into the destination string by firing it semiautomatically and changing the value in AX between each squeeze of the trigger.

Also, by firing each character individually, you can change the value in DI periodically to break up the data transfer into separated regions of memory instead of one contiguous area as you must with REP STOSW. This may be hard to picture until you see it in action. The SHOWCHAR program listing I present a little later will give you a for instance that will make it instantly clear what I mean.

You lose a little time in handling the loop yourself, outside the CPU. This is because there is a certain amount of time spent in fetching the loop's instruction bytes from memory. Still, if you keep your loop as tight as you can, you don't lose an objectionable amount of speed, especially on the newer processors like the Pentium.

Who Decrements CX?

Early in my experience with assembly language, I recall being massively confused about where and when the CX register was decremented when using string instructions. It's a key issue, especially when you don't use the REP prefix.

When you use REP STOSW (or REP with any of the string instructions), CX is decremented automatically, by 1, for each memory access the instruction makes. And once CX gets itself decremented down to 0, REP STOSW detects that CX is now 0 and stops firing into memory. Control then passes down to the next instruction in line. But take away REP, and the automatic decrementing of CX stops. So, also, does the automatic detection of when CX has been counted down to 0.

Obviously, something has to decrement CX, since CX governs how many times the string instruction accesses memory. If STOSW doesn't do it-you guessed it-you have to do it somewhere else, with another instruction.

The obvious way to decrement CX is to use DEC CX. And the obvious way to determine if CX has been decremented to 0 is to follow the DEC CX instruction with a JNZ (Jump if Not Zero) instruction. JNZ tests the Zero flag ZF and jumps back to the STOSW instruction until ZF becomes true. And ZF becomes true when a DEC instruction causes its operand (here, CX) to become 0.

The LOOP Instructions

With all that in mind, consider the following assembly language instruction loop. Note that I've split it into three parts by inserting blank lines:

DoChar: stosw

 

; Note that there's no REP prefix!

add

AL,'1'

; Bump the character value in AL up by 1

aaa

AL,'0'

; Adjust AX to make this a BCD addition

add

; Basically, put binary 3 in AL's high nybble

mov

AH,07

; Make sure our attribute is still 7

dec

CX

; Decrement the count by 1..

jnz

DoChar

; ..and loop again if CX > 0

Ignore the block of instructions in the middle for the time being. What it does is what I suggested could be done a little earlier: change AX inbetween each store of AX into memory. I'll explain in detail shortly.

Look instead (for now) to see how the loop runs. STOSW fires, AX is modified, and then CX is decremented. The JNZ instruction tests to see if the DEC instruction has forced CX to zero. If so, the Zero flag ZF is set, and the loop will terminate. But until ZF is set, the jump is made to the label

DoChar, where STOSW fires yet again.

There is a simpler way, using an instruction I haven't discussed until now: LOOP. The LOOP instruction combines the decrementing of CX with a test and jump based on ZF. It looks like this:

DoChar: stosw

 

; Note that there's no REP prefix!

add

AL,'1'

; Bump the character value in AL up by 1

aaa

AL,'0'

; Adjust AX to make this a BCD addition

add

; Basically, put binary 3 in AL's high nybble

mov

AH,07

; Make sure our attribute is still 7

loop

DoChar

; Go back & do another char until CX goes to 0

The LOOP instruction first decrements CX by 1. It then checks the Zero flag to see if the decrement operation forced CX to zero. If so, it falls through to the next instruction. If not (that is, if ZF remains 0, indicating that CX was still greater than 0), LOOP branches to the label specified as its operand.

So, the loop keeps looping the LOOP until CX counts down to 0. At that point, the loop is finished, and execution falls through and continues with the next instruction following the loop.

Displaying a Ruler on the Screen

As a useful demonstration of when it makes sense to use STOSW without REP (but with LOOP) let me offer you another item for your video toolkit.

The Ruler macro which follows displays a repeating sequence of ascending digits from 1 at some selectable location on your screen. In other words, you can display a string of digits like this at the top of a window:

123456789012345678901234567890123456789012345678901234567890

This might allow you to determine where in the horizontal dimension of the window a line begins or some character falls. The macro allows you to specify how long the ruler is, in digits, and where on the screen it will be displayed.

A call to Ruler would look like this:

Ruler VidOrigin,20,80,15,5

This invocation (assuming you had defined VidOrigin to be the address of the start of the video refresh buffer in your machine) places a 20-character long ruler at position 15,5. The "80" argument indicates to Ruler that your screen is 80 characters wide. If you had a wider or narrower text screen, you would have to change the "80" to reflect the true width of your screen in text mode.

Don't just read the code inside Ruler! Load it up into a copy of EAT5.ASM, and display some rulers on the screen. You don't learn half as much by just reading assembly code as you do by loading and using it!

;---------------------------------------------------------------

; RULER -- Displays a "1234567890"-style ruler on-screen

;Last update 9/16/99

;Caller must pass:

;In VidAddress: The address of the start of the video buffer

;

In

Length:

The

length of the ruler to be displayed

;

In

ScreenW:

The width of the current screen (usually 80)

;

In

ScreenY:

The

line of the screen where the ruler is

;

In ScreenX:

to be displayed (0-24)

;

The row of the screen where the ruler should

;

Action:

start (0-79)

;

Displays an ASCII ruler at ScreenX,ScreenY.

;

---------------------------------------------------------------

Ruler

5 ;VidAddress,Length,ScreenW,ScreenX,ScreenY

%macro

 

 

les

DI,[%1] ; Load video address to ES:DI

 

 

mov

AL,%5

; Move Y position to AL

 

 

mov

AH,%3

; Move screen width to AH

 

 

imul

AH

; Do 8-bit multiply AL*AH to AX

 

 

add

DI,AX

; Add Y offset into vidbuff to DI

 

 

add

DI,%4

; Add X offset into vidbuf to DI

 

 

shl

DI,1

; Multiply by two for final address

 

 

mov

CX,%2

; CX monitors the ruler length

 

 

mov

AH,07

; Attribute 7 is "normal" text

 

 

mov

AL,'1'

; Start with digit "1"

%%DoChar: stosw

AL,'1'

; Note that there's no REP prefix!

 

 

add

; Bump the character value in AL up by 1

 

 

aaa

AL,'0'

; Adjust AX to make this a BCD addition

 

 

add

; Basically, put binary 3 in AL's high nybble

 

 

mov

AH,07

; Make sure our attribute is still 7

%endmacro

loop

%%DoChar ; Go back & do another char until BL goes to 0

 

 

 

Over and above the LOOP instruction, there's a fair amount of new assembly technology at work here that could stand explaining. Let's detour from the string instructions for a bit and take a closer look.

Simple Multiplies with IMUL

Ruler can put its ruler anywhere on the screen at all, to a position passed as ScreenX and ScreenY. It's not using GotoXY, either. It's actually calculating a position in the video refresh buffer where the ruler characters must be placed-and then using STOSW to place them there.

Locations in the text video refresh buffer are always expressed as offsets from a single segment address that is either B000H or B800H. The algorithm for determining the offset in bytes for any given

X and Y values looks like this:

Offset = ((Y * width in characters of a screen line) + X) * 2

Pretty obviously, you have to move Y lines down in the screen buffer, and then move X bytes over from the left margin of the screen to reach your X,Y position.

The trickiest part of implementing the algorithm lies in multiplying the Y value by the screen width. There is an instruction to do the job, IMUL, but it's a little quirky and (as assembly instructions go) not very fast.

It is, however, fast enough for what we're doing here, which is just positioning the ruler somewhere on the screen. The positioning only needs to be done once, not many times within a tight loop. So, even if

IMUL is slow as instructions go (and it's much faster than it used to be on ancient machines like the 8086 and 8088), when you only need to use it to set up something else, it's certainly fast enough.

IMUL always operates in conjunction with the AX register. In every case, the destination for the product value is AX, or else AX and DX for products larger than 32,767.

On the simpler CPUs operating in real mode, there are basically two variations on IMUL, and the difference turns on the size of the operands. If you are multiplying two 8-bit quantities, you can put one in AL and the other in some 8-bit register or memory location. The product will be placed in AX. If you are multiplying two 16-bit quantities, one can be placed in AX and one in a 16-bit register or memory location. The product from multiplying two 16-bit quantities is too large to fit in a single 16-bit register, so the low-order 16 bits are placed in AX, and the high-order 16 bits are placed in DX. You have no

control over the destination; it's either AX or AX:DX. Also, one of the operands must be in AL (for 8-bit multiples) or AX (for 16-bit multiples). You have no control over that; it's impossible to multiply (for example) CX x BX, or DX x DS:[BX].

One very common bug you may commit when using IMUL is simply forgetting that when given 16-bit operands, IMUL changes the value in DX. The easiest way to avoid this problem is to use IMUL in its 8-bit mode whenever possible, which is when both multiplier and multiplicand are less than 256. If either operand is 16 bits in size, DX will be altered.

Here are some examples of some legal forms of IMUL:

imul

byte [bx] ;

multiplies

AL

× byte at DS:[BX]

imul

bh

; multiplies AL × BH

imul

word [bx] ;

multiplies

AX

×

word at DS:[BX]

imul

bx

; multiplies AX ×

BX

In the first two lines, the destination for the product is AX. In the second two lines, the destination for the product is DX:AX

IMUL sets two flags in those cases where the product is larger than the two operands. The flags involved are the Carry flag CF and the Overflow flag OF. For example, if you're multiplying two 8-bit operands and the product is larger than 8 bits, both CF and OF will be set. Otherwise, the two flags will be cleared.

Now, why the final multiplication by 2? Keep in mind that every character position in the screen buffer is represented by 2 bytes: One character byte and one attribute byte. So, moving X characters from the left margin actually moves X × 2 bytes into the screen buffer. You might think of an 80-character line on the screen as being 80 characters long, but it's actually 160 characters long in the screen buffer, to account for the invisible attribute bytes.

This multiplication by 2 is done by using the SHL instruction to shift DI to the left by one bit. As I explained in Chapter 10, this is exactly the same as multiplying DI by 2.

The Limitations of Macro Arguments

There's another problem you will eventually run into if you're like most people. Given the macro header for Ruler:

%macro Ruler 5 ;VidAddress,Length,ScreenW,ScreenX,ScreenY

you might be tempted to write something like this:

mov al,%5

imul %3

No go! The assembler will call you on it, complaining of an illegal immediate. What went wrong? You can freely use constructions like these:

mov al,%5

add di,%4

cmp al,%2

All of these use arguments from the macro header. So what's that assembler complaining about? The problem here is that on the simpler CPUs, the IMUL instruction cannot work with immediate operands. (Starting with the 386, a separate form of IMUL can take an immediate operand.) And this isn't just a problem with IMUL; all instructions that cannot work with immediate operands will reject a macro argument under these circumstances.

And "these circumstances" involve the way that the macro is invoked. In an early test version of the Ruler macro that used the IMUL %3 instruction shown previously, I tried to use the following line, which

invokes the macro to display a ruler:

Ruler VidOrigin,20,80,50,10 ; Draw ruler

It didn't work! Except for the video origin address argument, all of these arguments are numeric literals. A numeric literal, when used in an assembly language instruction, is called immediate data. When the macro is expanded, the argument you pass to the macro is substituted into the actual instruction that uses a macro argument, just as you passed it to the macro.

In other words, if you pass the value 10 in the %5 argument (ScreenY), the instruction MOV AL,%5 becomes MOV AL,10 once the macro is expanded by the assembler. Now, MOV AL,10 is a completely legal instruction. But if you pass the literal value 80 in the %3 argument (ScreenW), you cannot use

IMUL %3, because after expansion this becomes IMUL 80, which is not a legal instruction on anything older than the 386.

The problem is not that you're using macro arguments with IMUL. The problem is that you're passing a numeric literal in a macro argument to an instruction that (in the form we're using it) cannot accept immediate data.

The version of Ruler given in MYLIB.MAC loads the %3 (ScreenW) argument into AH using a MOV instruction. This means that you can use numeric literals when invoking Ruler. Using literals saves memory by making memory variables unnecessary, and if you'd prefer to define a meaningful name for the screen width rather than hard coding the value 80 in the source (which is unwise), you can define a symbol called ScreenWidth as an equate. I explained equates at the end of Chapter 10. The SHOWCHAR.ASM routine defines a ScreenWidth value as an equate.

Adding ASCII Digits

Once the correct offset into the buffer for the ruler's beginning is calculated in DI (and once we set up initial values for CX and AX), we're ready to start making rulers.

Immediately before the STOSW instruction, we load the ASCII digit '1' into AL. Note that the instruction

MOV AL,'1' does not move the numeric value 01 into AL! The '1' is an ASCII character, and the character '1' (the "one" digit) has a numeric value of 31H, or 49 decimal.

This becomes a problem immediately after we store the digit '1' into video memory with STOSW. After digit '1,' we need to display digit '2,' and to do that we need to change the value stored in AL from '1' to '2.'

Ordinarily, you can't just add '1' to '1' and get '2'; 31H + 31H will give you 62H, which (when seen as an ASCII character) is lowercase letter b, not '2'! However, in this case the x86 instruction set comes to the rescue, in the form of a somewhat peculiar instruction called AAA, Adjust AL after BCD Addition.

What AAA does is allow us, in fact, to add ASCII character digits rather than numeric values. AAA is one of a group of instructions called the BCD instructions, so called because they support arithmetic with Binary Coded Decimal (BCD) values. BCD is just another way of expressing a numeric value, somewhere between a pure binary value like 01 and an ASCII digit like '1.' A BCD value is a 4-bit value, occupying the low nybble of a byte. It expresses values between 0 and 9 only. (That's what the "decimal" part of "Binary Coded Decimal" indicates.) It's possible to express values greater than 9 (from 9 to 15, actually) in 4 bits, but those additional values are not valid BCD values. See Figure 11.1.

Figure 11.1: Unpacked BCD digits.

The value 31H is a valid BCD value, because the low nybble contains 1. BCD is a 4-bit numbering system, and the high nybble (which in the case of 31H contains a 3) is ignored. In fact, all of the ASCII digits from '0' through '9' may be considered legal BCD values, because in each case the characters' low 4 bits contain a valid BCD value. The 3 stored in the high four bits of each digit is ignored.

So, if there were a way to perform BCD addition on the 86-family CPUs, adding '1' and '1' would indeed give us '2' because '1' and '2' can be manipulated as legal BCD values.

AAA (and several other instructions I don't have room to discuss here) gives us that ability to perform BCD math. The actual technique may seem a little odd, but it does work. AAA is in fact a sort of a fudge factor, in that you execute AAA after performing an addition using the normal addition instruction

ADD. AAA takes the results of the ADD instruction and forces them to come out right in terms of BCD

math.

AAA basically does these two things:

It forces the value in the low 4 bits of AL (which could be any value from 0 to F) to a value between 0 and 9 if they were greater than 9. This is done by adding 6 to AL and then forcing the high nybble of AL to 0. Obviously, if the low nybble of AL contains a valid BCD digit, the digit in the low nybble is left alone.

If the value in AL had to be adjusted, it indicates that there was a carry in the addition, and thus AH is incremented. Also, the Carry flag CF is set to 1, as is the Auxiliary carry flag AF. Again, if the low nybble of AL contained a valid BCD digit when AAA was executed, AH is not incremented, and the two Carry flags are cleared (forced to 0) rather than set.

AAA thus facilitates base 10 (decimal) addition on the low nybble of AL. After AL is adjusted by AAA, the low nybble contains a valid BCD digit and the high nybble is 0. (But note well that this will be true only if the addition that preceded AAA was executed on two valid BCD operands! And ensuring that those operands are valid is your responsibility, not the CPU's!)

This allows us to add ASCII digits such as '1' and '2' using the ADD instruction. Ruler does this

immediately after the STOSW instruction:

add

AL,'1'

;

Bump the character value in AL up by 1

aaa

 

;

Adjust AX to make this a BCD addition

If prior to the addition the contents of AL's low nybble were 9, adding '1' would make the value 0AH, which is not legal BCD. AAA would then adjust AL by adding 6 to AL and clearing the high nybble. Adding 6 to 0A would give 10, so once the high nybble is cleared the new value in AL would be 00. Also, AH would have been incremented by 1.

In Ruler we're not adding multiple columns but instead are simply rolling over a count in a single column and displaying the number in that column to the screen. Therefore, we just ignore the incremented value in AH and use AL alone.

Adjusting AAA's Adjustments

There is one problem: AAA clears the high nybble to 0. This means that adding '1' and '1' doesn't quite equal '2,' the displayable digit. Instead, AL becomes 02, which in ASCII is the dark "smiley face" character. To make AL a displayable ASCII digit again, we have to add 30H to AL. This is easy to do: Just add '0' to AL, which has a numeric value of 30H. So, adding '0' takes 02H back up to 32H, which is the numeric equivalent of the ASCII digit character '2.' This is the reason for the ADD AL,'0' instruction that immediately follows AAA.

There's a lot more to BCD math than what I've explained here. When you want to perform multiplecolumn BCD math, you have to take carries into account, which involves a new flag called the Auxiliary carry flag AF. There are also the AAD, AAM, and AAS instructions for adjusting AL after BCD divides, multiplications, and subtractions, respectively. The same general idea applies: All the BCD adjustment instructions force the standard binary arithmetic instructions to come out right for BCD operands.

And yet another problem: AAA increments AH whenever it finds a value in the low nybble of AL greater than 9. In Ruler, AH contains the text attribute we're using to display our ruler, and if AH is incremented, the attribute will change and we'll end up displaying parts of the ruler in different colors. This is why we have to do one last adjustment to AAA's adjustments: We reassert our desired text attribute in AH each time we change the ASCII digit in AL.

An interesting thing to do is comment out the ADD AL,'0' instruction in the Ruler macro and then run the RULER.ASM test program. Another interesting thing to do (especially if you work on a color screen, as you almost certainly do) is to comment out the MOV AH,07 instruction in Ruler and then run RULER.ASM. Details count, big time!

Ruler's Lessons

The Ruler macro is a good example of using STOSW without the REP prefix. We have to change the value in AX every time we store AX to memory, and thus can't use REP STOSW. Note that nothing is done to ES:DI or CX while changing the digit to be displayed, and thus the values stored in those registers are held over for the next execution of STOSW. Ruler is a good example of how LOOP works with STOSW to adjust CX downward and return control to the top of the loop. LOOP, in a sense, does outside the CPU what REP does inside the CPU: adjust CX and close the loop. Try to keep that straight in your head when using any of the string instructions!