Assembly Language Step by Step Programming with DOS and Linux 2nd Ed 2000.pdf

The Perks of Protected Mode

I've said plenty of times that x86 protected mode is a wonderful thing, but I've never actually come out and said what it gives you. It's a long list, and I can't cover it all in detail, but in truth, while you're just starting out, most of it will be under the covers inside the operating system and not something you can build into your own programs.

In short, from the perspective of beginning assembly programmers, it comes down to more instructions, more versatile registers, a more stable and predictable environment, and no segments! (Can you tell which part I like best?)

You Know You Have a 386 or Better

An excellent and underappreciated thing about protected mode is simply this: You know you're running on a 386 or a more advanced Intel processor. There's much less to worry about in terms of whether you can use certain instructions than there is when you're running DOS. Nearly every new processor family that Intel has released has added some instructions to the x86 instruction set, but the really big gulf is between the 386 and the CPUs that came before it. Thirty-two-bit protected mode is not present in the 8088, 8086, or 286, so you can simply forget whatever limitations are attached to those processors.

No Segments!

I explained the nature of 32-bit flat model in earlier chapters and won't recap too thoroughly here. Segments still exist in 32-bit protected mode, but as each segment can be as large as 4 GB, all the segments are basically in the same memory space, and thus factor out. (This is why we call it "flat.") The 32-bit offset address can be considered the sole address for an item, and it may be contained in a single 32-bit register.

This means that we need no longer be concerned about such things as segment overrides, or recalling whether a data item is addressed relative to DS or ES. This banishes a good deal of complexity from programs, and you'll find that flat model coding is remarkably simple compared to the segment wrestling DOS programmers suffered through starting in 1981.

More Versatile Registers and Addressing

One of the more aggravating limitations of ancient Intel CPUs such as the 8086 and 8088 is that the general-purpose registers weren't exactly general. Addressing memory, for example, was limited to BX, BP, SI, and DI, which meant a lot of fancy footwork when several separate items had to be addressed at the same time.

This restriction has pretty much gone away. You can address memory with any of the general-purpose registers. You can even address memory directly with ESP, something that its predecessor SP could not do. (You shouldn't change the value in ESP without considerable care, but ESP can now take part in addressing modes from which the stack pointer was excluded in 16-bit land.)

There's now a general-purpose memory-addressing scheme in which all the GP registers can participate equally, and I've sketched it out in Figure 13.2.

Figure 13.2: Protected mode memory addressing.

When I first saw this, wounds still bleeding from 16-bit 8088-class segmented memory addressing, it looked too good to be true. But it is! Here are the rules:

The base register may be any of the eight 32-bit general-purpose registers, including ESP. The index register may be any of them except ESP.

The displacement may be any 32-bit constant. Obviously, 0, while legal, isn't useful.

The scale must be one of the values 1, 2, 4, or 8. That's it! A scale of 1 leaves the index unchanged; it's what you get implicitly when you write a base plus an index with no scale at all, as in [ebx + esi].

The index register is multiplied by the scale before the additions are done. In other words, it's not (base + index) × scale. Only the index register is multiplied by the scale.

All of the elements are optional and may be used in almost any combination.

This last point is worth enlarging upon. There are several different ways you can address memory, by gathering the components in the figure in different combinations. Examples are shown in Table 13.1.

Table 13.1: Protected Mode Memory-Addressing Schemes

SCHEME                          EXAMPLE               DESCRIPTION
[BASE]                          [edx]                 Base only
[DISP.]                         [0x4044d72a]          Displacement (constant address) only
[BASE + DISP.]                  [ecx + 17]            Base plus displacement
[INDEX × SCALE]                 [ebx * 4]             Index times scale
[INDEX × SCALE + DISP.]         [eax * 8 + 65]        Index times scale plus displacement
[BASE + INDEX × SCALE]          [esp + edi * 2]       Base plus index times scale
[BASE + INDEX × SCALE + DISP.]  [esi + ebp * 4 + 9]   Base plus index times scale plus displacement

Note here that the displacement term in an address can be any constant value from 0 to 0xffffffff. (Hey, all those little fs look funny to me, too, but we're in Unixland now, where Capital Letters Are For Engraving In Stone, sheesh.) So, although 0x4044d72a may seem like a different beast than the number 17, they're both legal 32-bit quantities. The numbers are probably used for different things: 0x4044d72a is most likely a full 32-bit address, whereas 17 is probably an offset into a table. However, both are legal and may be considered valid displacement components in a protected mode memory address.
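To make the rules concrete, here is a small C model of the effective-address arithmetic, written as a sketch rather than anything the CPU or NASM literally exposes. The names effective_address(), struct record, and record_count_address() are hypothetical. The second function shows the "offset into a table" idea: with the table's start address in the base register, an element number in the index register, the element size as the scale, and a field's offset as the displacement, an operand such as [esi + edi * 8 + 4] lands on one field of one element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the 32-bit effective-address calculation:
   EA = base + index * scale + displacement.
   Any component may be left out; pass 0 for a missing base, index,
   or displacement. uint32_t arithmetic wraps modulo 2^32, as 32-bit
   addresses do. */
uint32_t effective_address(uint32_t base, uint32_t index,
                           uint32_t scale, uint32_t disp)
{
    /* Only these four scale factors can be encoded. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    /* Only the index is scaled; base and displacement are added as-is. */
    return (uint32_t)(base + index * scale + disp);
}

/* Hypothetical table element: two 32-bit fields, 8 bytes in all. */
struct record {
    uint32_t id;     /* offset 0 */
    uint32_t count;  /* offset 4 */
};

/* Address of table[i].count, computed the way an operand such as
   [esi + edi * 8 + 4] computes it: table start as the base, i as the
   index, element size as the scale, field offset as the displacement. */
uint32_t record_count_address(uint32_t table_base, uint32_t i)
{
    return effective_address(table_base, i,
                             (uint32_t)sizeof(struct record),
                             (uint32_t)offsetof(struct record, count));
}
```

Because the sum wraps at 32 bits, effective_address(0x1000, 0, 1, 0xfffffffc) yields 0xffc: a displacement of 0xfffffffc behaves exactly like -4.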

There's a slightly dark flip side to this new and expanded register picture:

Using the 16-bit general-purpose registers AX, BX, CX, DX, SP, BP, SI, and DI can slow you down. Now that 32-bit registers rule, using the 16-bit registers is a special case: each such instruction needs an extra operand-size prefix byte, which makes the generated code larger and can make it slower. Now, note well that by "use" I mean explicitly referencing them in your source code. The AX register, for example, is still there inside the silicon of the CPU (as the lower half of the larger EAX register), and placing data there won't slow you down. You just can't place data in AX by using "AX" as an operand in an instruction without paying that penalty. This syntax generates a slower instruction:

mov ax,542

You can do the same thing this way, and the instruction NASM generates will generally execute more quickly:

mov eax,542

It's time to kiss those old 16-bit register names good-bye.

More Instructions

Most beginners probably think that the "new" instructions available with the 386 and later processors are the best part of working in 32-bit protected mode, but that's a pretty naïve view. I think of those new instructions as the least of the advantages of protected mode. There are two major reasons for this opinion:

The majority of the new instructions are way-down-deep items of use almost exclusively by those who write system software, that is, device drivers and especially operating systems. These new instructions are in fact the machinery by which protected mode is configured and managed. In most cases the operating system won't let you use them—not that they're especially useful in writing simple applications and utilities.

The really useful new instructions aren't new at all, but are simply more powerful ways of using the old familiar instructions such as PUSH, SHL, and SHR, coupled with the more versatile memory addressing I just finished explaining. Even these are relatively few.

All that being said, there are some useful new instructions that were introduced with the 386, and I'll take a little time to highlight the most useful of them. One thing I won't be covering here is the set of instructions introduced with the 486 and Pentium families. Why? To use them, you have to be sure you're running on a 486, a Pentium, or whichever CPU first implemented them (or something later), and that's generally more trouble than it's worth, especially when you're first starting out. (My favorite of those gotchas is this: The Pentium introduced an instruction called CPUID, which tells you what CPU you're using . . . but you have to be sure you have at least a Pentium under the sheet metal before you dare use it!)

More Versatile Pushes and Pops

First of all, you can now push immediate values onto the stack with the PUSH instruction. This is most useful when calling C library functions that expect certain values to be placed on the stack before the call, as we'll see later in this chapter. Here's an example:

push 0x4044d72a

The immediate operand can be any value that fits in 32 bits.

The 386 introduced the ability to push and pop all 32-bit GP registers at once. The PUSHAD instruction pushes EAX, ECX, EDX, EBX, ESP, EBP, ESI, and EDI onto the stack, in that order. The POPAD instruction pops values off the stack into these same registers, in reverse order. (Sixteen-bit equivalents, PUSHA and POPA, were introduced with the 186 and 286, but are not particularly useful in a 32-bit memory model like the one Linux uses.) It's possible to use PUSHAD and POPAD to save and restore the registers coming into and going out of the main programs you write under Linux. However, in creating BOILER.ASM, I stuck with the more limited C calling conventions, which require saving only EBX, EBP, ESI, and EDI, plus ESP, which is preserved by copying it into EBP. Note that the value pushed onto the stack for ESP (the value ESP held before PUSHAD began) is not popped back into ESP by POPAD, but is simply discarded.
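The push order and the discarded ESP slot are easy to lose track of, so here is a toy C model of what PUSHAD and POPAD do to a stack. It is only a sketch: struct cpu, its 256-byte stack area, and the R_ register names are hypothetical stand-ins, not anything Linux or NASM provides.

```c
#include <stdint.h>

/* Register slots, listed in the order PUSHAD pushes them. */
enum { R_EAX, R_ECX, R_EDX, R_EBX, R_ESP, R_EBP, R_ESI, R_EDI, NREGS };

/* Hypothetical toy machine: eight registers and a 256-byte stack area
   addressed in bytes (through ESP) but stored as 32-bit words. */
struct cpu {
    uint32_t reg[NREGS];
    uint32_t stack[64];
};

static void push32(struct cpu *c, uint32_t value)
{
    c->reg[R_ESP] -= 4;                   /* the stack grows downward */
    c->stack[c->reg[R_ESP] / 4] = value;
}

static uint32_t pop32(struct cpu *c)
{
    uint32_t value = c->stack[c->reg[R_ESP] / 4];
    c->reg[R_ESP] += 4;
    return value;
}

void pushad(struct cpu *c)
{
    uint32_t original_esp = c->reg[R_ESP]; /* ESP as it was before PUSHAD */
    for (int r = R_EAX; r <= R_EDI; r++)
        push32(c, r == R_ESP ? original_esp : c->reg[r]);
}

void popad(struct cpu *c)
{
    for (int r = R_EDI; r >= R_EAX; r--) {
        uint32_t value = pop32(c);  /* ESP still moves past every slot... */
        if (r != R_ESP)
            c->reg[r] = value;      /* ...but the saved ESP is discarded */
    }
}
```

If ESP starts at 256, pushad() leaves it at 224 with EAX's value in the highest slot and EDI's in the lowest; popad() restores every register, and ESP ends up back at 256 no matter what was stored in the saved-ESP slot.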

The related instructions PUSHFD and POPFD push and pop the EFLAGS register to and from the stack. They are the 32-bit equivalents of PUSHF and POPF, which were available on the 8086/8088. Pushing EFLAGS onto the stack with PUSHFD and then popping the pushed value off the stack into a 32-bit register is one way to get a copy of the EFLAGS register that can be examined at your leisure.
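Once a copy of EFLAGS sits in a register (for example, after PUSHFD followed by POP EAX), examining it is plain bit testing. The C sketch below lists the bit positions of the commonly used flags, which are fixed by the x86 architecture; the FLAG_ names and flag_set() are hypothetical helpers for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions of commonly used EFLAGS bits. */
enum {
    FLAG_CF = 0,   /* Carry */
    FLAG_PF = 2,   /* Parity */
    FLAG_AF = 4,   /* Auxiliary carry */
    FLAG_ZF = 6,   /* Zero */
    FLAG_SF = 7,   /* Sign */
    FLAG_TF = 8,   /* Trap */
    FLAG_IF = 9,   /* Interrupt enable */
    FLAG_DF = 10,  /* Direction */
    FLAG_OF = 11   /* Overflow */
};

/* Return true if the given flag bit is set in a saved EFLAGS value. */
bool flag_set(uint32_t eflags, int bit)
{
    return (eflags >> bit) & 1u;
}
```

For instance, a saved value of 0x246 has PF, ZF, and IF set (bit 1 is reserved and always reads as 1), while CF is clear.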

More Versatile Shifts and Rotates

As I said earlier, the best of the new instructions are simply enhancements to instructions you encountered on the 8086/8088. Among the best of these are enhancements to the shift and rotate instructions: SHL, SHR, ROL, ROR, RCL, and RCR. (SAL is just a duplicate name for SHL. SAR is not a duplicate of SHR, however: it shifts right while copying the sign bit into the vacated high bits, and it shares the same enhancements.) I dealt with the shift instructions in Chapter 10, as they exist on the 8088 and 8086. On those ancient CPUs, you can express the number of bits by which to shift in only one of two ways:

shl AX,1 ; Shift left by 1

shl AX,CL ; Shift left by number in CL

(Note that this discussion applies to any of the shift/rotate instructions, and not just SHL.) To shift an operand by 1 bit, you could specify the literal value 1. To shift by more than 1 bit, you had to first load a count value into the CL register, and then use CL as the second operand. Well, that was the 8086/8088 world. In 32-bit protected mode you can drop the use of CL and use an immediate value for any legal shift count, from 1 up to 31. It becomes legal to use instructions that look like this:

shl eax,17

Note that the shift count is limited to 31. If you could shift a 32-bit operand by 32 or more bits with SHL or SHR, you'd be left with nothing but zeros in the operand, because all significant bits would be shifted completely out of the operand into nothingness. So, for the shift instructions, at least, shifting by more than 31 bits is meaningless.

It's less obviously true for the rotate instructions, but here, too, there's no advantage to rotating a value by more than 31 bits. The rotate instructions, if you recall, rotate bits off one end of the operand and then feed them back into the opposite end of the operand, to begin the trip again. If you mentally follow a single bit through the rotation process, you'll realize that after 32 rotations, any given bit is back where it was when you started rotating the value. What's true of one bit is true of them all, so 31 rotations is as much as will be useful on a 32-bit value. This is why, on the 386 and later (and on the 286 as well), the CPU truncates the shift count to 5 bits: The largest value expressible in 5 bits is . . . 31!
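The count truncation is easy to model in C, which is a useful exercise because C itself leaves a shift by 32 or more undefined, so the 5-bit mask has to be written out explicitly. This is a sketch of SHL, SHR, and ROL on 32-bit operands as the 286 and later behave; the function names are hypothetical, and the 8086, which did not mask the count, would behave differently.

```c
#include <stdint.h>

/* Models of 32-bit SHL, SHR, and ROL: the count is masked to its low
   5 bits (0..31) before the operation, as on the 286 and later. */
uint32_t shl32(uint32_t x, unsigned count)
{
    count &= 31;
    return (uint32_t)(x << count);
}

uint32_t shr32(uint32_t x, unsigned count)
{
    count &= 31;
    return x >> count;
}

uint32_t rol32(uint32_t x, unsigned count)
{
    count &= 31;
    if (count == 0)
        return x;  /* also avoids the undefined C shift by 32 below */
    /* Bits shifted off the top re-enter at the bottom. */
    return (uint32_t)((x << count) | (x >> (32 - count)));
}
```

One consequence worth noticing: with a count of 32 in CL, SHL leaves its operand unchanged rather than zeroing it, because the masked count is 0. Likewise, shl32(1, 33) returns 2, and rol32(x, 32) returns x.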

Looking for 0 Bits with BT

Back in Chapter 10 I introduced the TEST instruction, which allows you to determine whether any given bit in a byte or word is set to 1. As I explained, TEST has its limits: It's not cut out for determining when a bit is set to 0.

The 386 and newer processors have an instruction that allows you to test for either 0 bits or 1 bits. BT (Bit Test) performs a very simple task: It copies the specified bit from the first operand into the Carry flag CF. In other words, if the selected bit was a 1 bit, the Carry flag becomes set. If the selected bit was a 0 bit, the Carry flag is cleared. You can then use any of the conditional jump instructions that examine and act on the state of CF.

BT is easy to use. It takes two operands: The first one is the value containing the bit in question. The second operand is the ordinal number of the bit you want to test, starting from 0:

bt <value containing bit>,<bit number>

Once you execute a BT instruction, you should immediately test the value in the Carry flag and branch based on its value. Here's an example:

bt eax,4      ; Test bit 4 of EAX
jnc quit      ; We're all done if bit 4 = 0

Note that we're branching if CF is not set; that's what JNC (Jump if Not Carry) does.
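In C terms, BT with a 32-bit register operand amounts to extracting one bit; the CPU takes the bit number modulo 32 when the first operand is a register. The sketch below models that, with the returned 0 or 1 standing in for the Carry flag; bt32() is a hypothetical name, and the memory-operand form of BT (which can reach bits beyond the first dword) is not modeled.

```c
#include <stdint.h>

/* Model of BT with a 32-bit register operand: the bit number is taken
   modulo 32, and the selected bit is "copied into CF", returned here
   as 0 or 1. */
unsigned bt32(uint32_t value, unsigned bit)
{
    return (value >> (bit & 31)) & 1u;
}
```

The bt/jnc pair in the text corresponds roughly to `if (bt32(eax, 4) == 0) goto quit;`.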

I hate to discuss code efficiency too much in a beginners' book, but there is a caution here: The BT instruction is pretty slow as instructions go—and bit-banging is often something you do a great many times inside tight loops, where instruction speed can be crucial. Using it here and there is fine, but if you're inside a loop, consider whether there might be a better way to test bits. Creaky old TEST is much faster . . . but TEST only tests for 1 bits. Depending on your application, you may be able to test for 0 bits more quickly another way, perhaps shifting a value into the Carry flag with SHL or SHR, using NOT to invert a value . . . There are no hard and fast rules, and everything depends on the dynamics of what you're doing. (That's why I'm not teaching optimization in this book!)

Crash Protection

This sounds wonderful, but you have to understand: The protection in "protected mode" is for the operating system. Programs that you write will crash right and left, trust me. However, no matter what idiotic things your program might do, either accidentally or on purpose, its chances of bringing down Linux in flames are close to nil. In all the time I've been using Linux, I have never crashed the operating system. Not even once. It is far and away the most robust OS I've ever touched, and that includes Windows NT, which I use every day and have for five years.

On the other hand, this benefit cuts both ways. Linux is a multitasking operating system, and many programs can be executing at the same time. The features of protected mode also serve to protect the other programs from your program—and your programs from the other programs. Bullying is prohibited.

You will encounter the protection mechanism sooner or later, most likely when you try to address a portion of memory for which your program does not have permission. You must keep in mind that although a 32-bit memory address can theoretically run from 0 to 0xffffffff, your program does not have permission to access all of those addresses. And by access I mean write or read! You can't just start from address 0 and inspect every memory location your computer has. Snooping is prohibited too—except for your own little corner of Linux's world.
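On Linux you can look at your own little corner directly: the pseudo-file /proc/self/maps lists, one per line in hex, each address range the kernel has mapped for the current process. The C sketch below assumes a Linux system with /proc mounted; is_mapped() is a hypothetical helper that reports whether an address lies inside any of those ranges. Touching an address for which it returns 0 is exactly what produces the error message described next.

```c
#include <stdio.h>

/* Hypothetical helper: return 1 if addr falls inside one of the memory
   regions Linux has mapped for this process, 0 if it does not, and -1
   if /proc/self/maps cannot be read. */
int is_mapped(unsigned long long addr)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL)
        return -1;

    char line[512];
    int found = 0;
    while (!found && fgets(line, sizeof line, f) != NULL) {
        unsigned long long start, end;
        /* Each line begins "start-end perms ...", both addresses in hex;
           the end address is exclusive. */
        if (sscanf(line, "%llx-%llx", &start, &end) == 2 &&
            addr >= start && addr < end)
            found = 1;
    }
    fclose(f);
    return found;
}
```

A local variable on the stack is always inside a mapped range, while address 0 normally is not, which is why dereferencing a null pointer ends in a segmentation fault.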

The message that comes up under Red Hat 6 for protection errors is this:

Segmentation fault (core dumped)

Not very helpful in and of itself, huh? This is why gdb is so crucial, and why I spent so much of Chapter 12 on it. If you're single-stepping through gdb, you will (in most cases) know precisely which instruction causes the problem, because the fault will be thrown as soon as you single-step that instruction. If that instruction references memory, you can probably assume that it references a region of memory for which you don't have permission. You may also discover that a protection fault occurs during a C library call, which almost always means that you passed a bad value of some sort to the library, such as an invalid pointer. This is less common, and you simply have to take a much closer look at what you're passing to the library code.

What does the "core dumped" part of the message mean? When a segmentation fault occurs, Linux creates a kind of postmortem file containing a description of the machine's state when the fault happened, including a snapshot of your program's binary code. This file's name defaults to "core" and it could be useful in debugging except that NASM does not currently embed the same information in its .o files that gcc embeds in its .o files generated from C programs. The core file is therefore much more difficult to interpret for NASM programs than for C programs. The NASM team indicates that this is on its to-do list for the assembler, and with some luck we'll see that feature added soon. All the more reason to watch the NASM Web site for new releases!