
I sometimes think back with some wonderment at the fact that I replaced the carburetor of my first car (a 1968 Chevelle I called "Shakespeare") in front of my mother's house on a freezing, windy day in January of 1974. All this without shelter of any kind, with marginal tools, and with no light but the light from the sky. I had never done it before, but it worked right the first time, and I saved a bundle of money that I didn't have anyway.

One reason that I call my 1984 Plymouth Voyager the "Magic Van" is that, having looked carefully under the hood, I can only conclude that the damned thing runs by magic. I don't think I could replace the carburetor on the Magic Van. If pressed, I'm not even sure I could open the hood and show you where it was. (I'm not, in fact, quite certain that cars even have carburetors anymore!)

This is one reason that I bought a restorable 1969 Chevelle this past winter. I'm not an auto mechanic and have no desire to be, but I enjoyed repairing Shakespeare and tuning him up, because it was simple and straightforward and required no greater skill than I cared to learn. The point I want to make here is that the game of repairing cars has changed drastically since 1968. What was once a simple matter of aligning a timing mark on a pulley with a scratch on the engine block has now become a coordinated effort of getting a half dozen embedded microcontrollers to send signals to complicated electromechanical components at all the correct times. It would take me years to learn how to do all that, and I'd really rather be programming or building radios.

Similarly, the game of programming has begun to change drastically since the end of the 1980s. What I've described in this book so far has been necessary groundwork that everyone should learn in becoming an effective PC programmer. However, until fairly recently, the situation I described in this book has been pretty much the whole story. There was the 8088/8086 CPU and its instruction set, segmented memory, and DOS. If you learned only that (and learned it completely and well, of course), you could write significant software in assembly that was the equal of what you could buy on the open market.

Times change. As with my poor Chevelle, the programs I wrote in the middle 1980s now seem modest to the point of being quaint. Big things have been happening to the PC since 1989 or so, and those changes are by no means complete. They involve both hardware and software, and extend to the core of the assumptions we make when we place machine instructions together on the screen.

This is the final chapter in this book, and I did not want to leave you with a false impression of having "learned it all." There is more, much more to be done. The topics I'll mention here could be addressed in whole volumes. At best, I can give you your bearings. Hold on to your chair—but let's go.

11.1 A Short History of the CPU Wars

I wrote most of the first edition of this book in 1988, which (as I've suggested above) was a much simpler time. The 80286 microprocessor was the standard CPU, but almost nobody used its special features. It was (and is) used almost exclusively as a fast 8088. The 80386 was around (I jumped in quick and have had one since the end of 1986), but it was still considered a little exotic and was usually pretty expensive. Like the 286, the 386 sat on most people's desks as an even faster 8088.

All this changed in 1990. The most significant event of that year was the appearance of a second source for Intel's 80386 CPU chip. Advanced Micro Devices (AMD) announced a 386 clone, under a contract with Intel that allowed them to second-source Intel CPUs. Intel claimed the contract didn't apply to the 386. They sued—and lost. AMD's 386 didn't hit the market in quantity until early 1991, but its effect on Intel was immediate: they started cutting prices on the 386 to make AMD's clone less profitable.

Free Fall

Suddenly, prices on 386 machines went into free fall. Intel's low-cost 386SX chip appeared in quantity (it was designed as a "286 killer" to take the profit out of AMD's 286 product line), accelerating the plunge in prices. CPU speeds, which had initially been stuck at 16 or 20 MHz, suddenly started creeping up, first to 25 MHz, and then to 33 MHz. RAM prices, which had been high at the end of the 1980s, started to plunge as well. By the beginning of 1991, the standard business desktop machine was a 25 MHz 386 with four megabytes of RAM—often more. The somewhat slower 386SX machines muscled into the "home and personal use" niche previously held by the 80286, and the 80286 came to be seen as a "kiddie" machine—probably because America's dads gave their 286s to Junior when their Taiwan 386SX boxes arrived.

What happened to the 8088s? I'm not sure. I suspect a lot of them are in closets, up there on the second shelf with a busted VCR atop them, and the ratty guest quilt thrown over the pile until Uncle Mack pays another visit.

It may be true that you still have one, and are still using one—but this is getting less likely all the time. I've found that most people who have the will to try programming have long since become impatient with the 8088 and moved on to something faster—especially now that you can buy a complete 386SX machine at Price Club for less than $1000.

Meet the New Boss

In late 1990, Intel finally turned loose their long-in-coming 80486 CPU, which was even faster than a fast 80386—and the newcomer initiated yet another shuffle down in prices and status. The 486 is now "The Boss" on corporate desktops, and more and more programmers are picking them up as well. The 386 and the 286 have taken a bump down in status, and the 8088—well, when you're on the bottom, how much farther down can you go? I've seen genuine IBM PC systems on sale for as little as $200 on the used market. The no-name, 8088-based XT clones are considered by most used office equipment dealers to have little if any value at all. How long will the 486 stay on top? That depends on how quickly Intel perfects and releases their 80586 CPU chip. The process will proceed as it has proceeded since the early 1980s—only with less and less time between cycles.

So where are we today, in the early 1990s? Published figures indicate that there are about 70,000,000 "countable" PC-type machines in the world. By countable they mean manufactured by firms who are well-known in the industry and release figures on sales. There is, however, another component to the world PC marketplace: the uncounted and uncountable clone boxes assembled here in the US and elsewhere by small, often family firms and sold in small shops and through the mail. Anybody who scopes out the import process can boat in a container lot of motherboards, clone cabinets, and other parts, and be selling completed and tested systems at the next neighborhood "computer swap meet."

How many of these are there? Maybe 25,000,000 worldwide. Maybe more. Nobody has any way to be sure. Those who talk about the battle between the PC and other machines like the Macintosh or Amiga are thinking most wishfully. The battle is over. The PC won by at least 80,000,000 votes.

11.2 Opening Up the Far Horizon

I've gone through this exercise to point up a fact few people ponder much: the 8088 is now the minority player in the PC world. Absolutely no more than a third of the world's PCs sport 8088 CPUs, and the proportion is probably closer to 20% or 25%. Again, because of the nature of the PC business, nobody has any way to be sure. (And the proportion of active 8088 PCs is even smaller—don't forget what's under the busted VCR on the closet shelf!)

DOS Extenders

Something else that came into its own in 1990 was the DOS extender. DOS extenders are extremely clever programs that place the "extra" features of the 286, 386, and 486 at the disposal of DOS programs. By necessity, DOS extenders exclude machines based on the 8086 and 8088. What the more advanced processors bring to the table is more memory (lots more) and something called protected mode, which radically alters the programmer's view of the memory system. It won't be possible for me to explain in detail the mechanics of extended memory or protected mode in this book. The important thing to understand now is that with DOS extenders, the 286, 386, and 486 CPUs are no longer just faster 8088s. Once you understand what they have to offer, you can do some amazing things at an assembly-language level.

Most amazing is release from the tyranny of the 64K segment. A segment in 386 protected mode can be as large as 4 gigabytes—now that's a far horizon! This greatly simplifies dealing with really big data items, and also (because all of the code from a substantial program can exist in a single moose of a code segment) simplifies program design and structure.

In short, when a DOS extender is in control of the machine, an application program can be much larger than the customary 640K of DOS memory, and can manipulate individual data items much larger than 64K.

Windows 3.0

1990's third and final blow to the past came in the form of Microsoft Windows 3.0 (Windows). Microsoft finally got both the big picture and the details right, and launched a graphics-oriented DOS shell that everyone seems to be able to agree on.

Windows is more than just a menuing replacement for the DOS prompt. Windows contains its own limited DOS extender technology, and programs written to make use of Windows' features can be much larger than ordinary DOS programs. Windows can also use the hardware multitasking features of the 386 to allow more than one program to run at once.

In a great many ways, Windows has changed the methodology of PC programming forever. Windows has an enormous influence over the shape of programs that run under it and use its services. This is in part because Windows defines literally hundreds of system calls to do all sorts of things, including graphics drawing, some file I/O, and nearly everything you would want to do to interface with the underlying machine.

This is good, because on the flip side, Windows demands that you use its services and not just go out to the hardware and grab whatever you want, whenever you want it. Nor is Windows just being snotty. Whenever you put two programs in a single machine (somewhat like two tomcats in a closet), there is the potential for some bloody fights. Two programs cannot write blithely to the same place in memory at the same time, and Windows, as reluctant referee, demands that both programs submit to its set of rules for peaceful global coexistence.

Event-Driven Programming

But probably the most significant effect Windows has on the nature of programming is that it lays out a whole new conceptual model for how a program should work. This new model is called event-driven programming, and while Windows certainly placed it most brightly in the spotlight, other programming systems (like Turbo Vision and Smalltalk) have been using it for some time.

Event-driven programming is a complicated subject, and I'm not going to be able to cover it in detail in this chapter. I would like to give you a flavor for it so that you can plan your future explorations as a programmer accordingly.

Event-driven programming is a consequence of our operating system getting smarter. DOS and Windows are gradually fusing into a new and more powerful operating system with far more capabilities than DOS's simple list of passive services that you call through a software interrupt. Most tellingly, Windows is now an active partner rather than a passive helper.

In the old world, your program was in the driver's seat, asking for assistance from DOS when in need. DOS remained passive but ready, not speaking until spoken to. In the old world, your program would go out and ask DOS, "Has the user pressed a key yet?" If a key had been pressed, DOS would meekly hand the key value up to your program and wait for further orders.

Windows, on the other hand, takes a far more active role. Although your program is still nominally calling the shots, Windows governs a lot more of the system, especially those parts of the system that interact with the user. Today, what Windows does is tap your program on the shoulder and say, "Hey boss, the user just pressed a key. What are you going to do about it?" That press of a key or click of a mouse button is called an event, and the flow of control of the programs that run under Windows is dictated by the stream of events that the user sends from the keyboard and mouse to the program.

The user has a lot more power under an event-driven system. No more is the user necessarily confined by a rigid menu structure within a single program. Now, with a single mouse click, the user can pre-emptively send the current program into the background and start up another one at will—and still return to the first program whenever he or she chooses.

In an event-driven program, the program and the platform (which is the new term for an operating system combined with a particular screen and keyboard management system like Windows) become nearly equal partners. The program calls on the platform for services, just as programs have been calling on DOS for years. But the platform also calls on the program to respond intelligently to things that happen within the platform, things like user-initiated events and critical errors. Program and platform thus speak back and forth continually, by way of a data-handling protocol called message passing.

It sounds complicated, and it is. On the other hand, event-driven programming makes things possible that simply can't be done using older programming models. With Windows acting as an intelligent proctor, multiple programs can operate at once within the same machine, some in the foreground, some in the background, freely passing data back and forth among them. Windows standardizes the protocol for this data transfer, so that the process (while tricky) becomes one that every program can understand if it was built along the Windows model.

Windows and Assembly Language

Can Windows programs be written in assembly language? Of course. Never forget: assembly language is the language of the underlying machine, and any program that can execute on the machine may be written in assembly language. The more important question is how much trouble that writing will be, and how much time it will take.

And the answer to that question is: a lot, and a long time. Higher-level languages like Pascal, Smalltalk, and C become a lot more compelling when you have to write complex code like that which speaks to platforms like Windows. An ambitious program like Word for Windows or Excel might take years to perfect in assembly language, even with a crack team of programmers sweating blood day and night over the project. And you just can't take years to write a program anymore. If you do, by the time your program is complete, the rules that you followed when you designed the program will no longer be valid when the program is ready to send to market. Your program will be obsolete before it's even finished.

That's the bad news. The good news is that parts of a Windows program can be written in assembly language, and the improved speed and compactness of the assembly portions may be able to give the program as a whole (which might have been written in Pascal or C) a serious competitive edge.

Windows includes support for a very handy feature called a dynamic link library (DLL), which is simply a collection of subroutines gathered into a file and loaded whenever they're needed. DLLs are vaguely similar to the overlays of times past, which were chunks of code left on disk because the whole program was too large to fit into memory at once. Just as the application would then load chunks of itself into a common area as it needed them (overwriting chunks in that area that it no longer needed), Windows loads a DLL into memory when the code inside the DLL is called. But unlike overlays, DLL code can be used by any Windows program that knows the standard Windows DLL calling conventions.

DLLs can be written in assembly language much more easily than entire Windows programs, and if you want to work under Windows but write assembly code, DLLs are a natural place to begin. Again, I can't explain how to write DLLs in this book (that's a fairly advanced topic), but I want to point out right now that it's certainly possible, and may be one way to make money programming in the Windows market. If you write a fast "engine" that accomplishes only one thing (say, data communications or database management) but accomplishes it very well, other Windows programmers may license the DLL containing your engine and use it to enhance their own Windows applications.

I do have some advice about Windows, however: learn to program it first from a higher-level language like Borland's Turbo Pascal for Windows. Learning assembly language is hard enough. Windows presents a lot of new concepts that are confusing enough without having to learn them at the very lowest level. Once you're fluent at creating Windows applications, use your assembly skills to replace time-critical portions of the code with optimized assembly-language DLLs.

11.3 Using the "New" Instructions in the 80286

This probably all sounds pretty grim from where you sit now, a novice assembly programmer with a desire to go further. Don't despair, though. It's not all bad news. Particularly, there are new registers and instructions in the 286, 386, and 486 that you can learn and use right now, without even going into protected mode. In the next several sections, I'll describe some of these new features and explain how you can use them.

Still 16 Bits

The 286 is a 16-bit processor. Inside of itself, it handles data in 16-bit chunks, and all of its registers are 16 bits wide, just like the 8086 and 8088. Furthermore, it can read and write data from and to the memory system 16 bits at a time. (To get 32-bit registers and 32-bit data transfers, you'll need to get a 386 or 486 machine.)

Now, people sometimes get confused about "how many bits" a processor "is." We call this value the data width of a CPU. Although I took up this issue briefly in Section 2.3, now might not be a bad time to expand on the question, because it will come up again with regard to the 386 and the 386SX. The answer is ... well, it depends on your point of view.

The 286 is a 16-bit processor, both inside and out. Inside the CPU, data can be processed 16 bits at a time. This is made possible by virtue of the 286's general-purpose registers (AX, BX, CX, and DX), which are all 16 bits wide. You can access the general-purpose registers by 8-bit halves (that is, by using CL and CH rather than CX), but the most you can put in any one register is 16 bits.

There are people who define the data width of a CPU in terms of its general-purpose registers. In truth, however, this is a false indicator. What you really need to look for is the width of the data path that leads from inside the CPU out to the physical memory system.

The original CPU in the IBM PC was the 8088. Its general-purpose registers are all 16 bits wide. However, the 8088 can only move one byte at a time out to the memory system. The 8086 (which was never much of a player in the PC world) can move 16 bits out to the memory system in a single operation.

This is a lot more important, functionally. It's a little like building a big boat in your basement. It's nice to have a big boat, sure—but if you have to dismantle it into several pieces every time you want to take it out to sail, you'll eventually conclude that its bigness is more of a bother than an advantage. Sooner or later, you're going to get a canoe and enjoy it a lot more.

Moving memory into and out of the CPU is one of the most time-consuming things the CPU can do. If at all possible, you want to minimize the number of "fetches" that the CPU must perform. The best way to do this is to choose a CPU with the greatest available data width. The 8086 is inherently faster than the 8088 because it can move twice the data into or out of the CPU chip in one operation.

So why don't the CPU manufacturers just all make 128-bit CPUs (or wider!) and be done with it? Unfortunately, it's harder to manufacture a "wide" CPU chip. Each of those bits has to go out to the outside world on a pin (along with a great many other signals), and once you're into the 32-bit world, you're talking a lot of pins. The 8088 fits comfortably on a 40-pin IC (integrated circuit) package, but the 80386 has so many pins it looks like a bed of nails—a little ceramic square whose lower surface is nearly covered by gold pins. Inside the IC package, each pin has to be connected to the physical silicon chip by a minuscule gold wire, which is difficult enough to do once, let alone literally a hundred or more times.

Wide CPUs cost more to make than narrow ones, because they're physically more difficult to manufacture. There's also more complication on the computer motherboard to support wide CPU chips, which further adds to the cost of the computer.

386DX vs. 386SX

In the late '80s, Intel released its 80386SX chip. Internally, the 386SX was a genuine 386—it had all the registers and instructions supported by the original 386 CPU. However, the 386SX moved data into and out of itself only 16 bits at a time, just like the 8086 and 286. This lowered the cost of the 386SX, which made it cheaper to incorporate into an actual computer. (Intel then renamed the "big" 386 the 386DX to make sure no one got them mixed up.)

So while 386-specific software will run perfectly well on the 386SX, it runs more slowly, because the CPU can only move 16 bits at a time, rather than 32. The 386DX is a 32-bit CPU that moves 32 bits to or from memory in one crack.

At this writing, Intel has released very little information about its as-yet-unannounced 586 CPU. Will it be a 64-bit CPU? We don't know yet, but it's unlikely. People whose opinions I respect believe that 32 bits is the optimum data width for a practical CPU. I suspect they may be wrong—but we'll know soon enough.

Pushing and Popping All the Registers at Once

The 286 added a pair of new instructions to its repertoire: PUSHA and POPA. These instructions move all the general-purpose registers to or from the stack in one blistering operation. The registers affected are AX, CX, DX, BX, SP, BP, SI, and DI.

PUSHA pushes these registers on the stack. You should keep in mind that the registers go onto the stack in the order listed above.

DI is the last register pushed onto the stack, and therefore will be the first popped off the stack when you go back to pop what you pushed.
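
For a quick visual fix on this, here's a sketch of the stack once PUSHA has done its work, with higher addresses toward the top of the list:

AX   <-- pushed first
CX
DX
BX
SP   <-- a copy of SP; more on this in a moment
BP
SI
DI   <-- pushed last; SP now points here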

Something else to keep in mind: the value of SP that is pushed onto the stack is the value that SP held before the PUSHA instruction began pushing everything onto the stack. Don't forget this if you intend to pop registers from the stack piecemeal after pushing the whole crew with PUSHA. If you do something like this, you'll be in for a surprise:

POP DI
POP SI
POP BP
POP SP          ; surprise: this loads the old, pre-PUSHA value of SP

Why? The last instruction pops the saved value of SP back into SP. That value, remember, was the value SP had before PUSHA started to work. Once you use an individual POP instruction to pop the SP value off the stack, you'll no longer be able to pop AX, BX, CX, and DX. The SP value pushed onto the stack points above the AX value pushed by PUSHA.

Most of the time, if you use PUSHA to push all the registers onto the stack, you'll use POPA to pop them off, again as one operation. POPA reverses what PUSHA did, and takes the values off the stack and plugs them into the registers in reverse order:

DI, SI, BP, SP, BX, DX, CX, AX

POPA does something interesting: it simply pops and discards the value pushed onto the stack for SP. This prevents the problem I mentioned above with popping registers piecemeal after using PUSHA. So why push SP at all? In the very peculiar way CPU chips operate internally, it was probably easier to push SP on the stack and ignore the popped value that might have gone into SP than to leave SP out of the process entirely. It's just that PUSHA and POPA "step through" the registers, and it's easier to step through them all than to try and skip one.

So what are PUSHA and POPA good for? You might use them to "frame" a subroutine that makes heavy use of registers. If you push all the registers on entry to a subroutine, you can use all of the registers from inside the subroutine, and not worry about trashing something that the caller will need after you return from the subroutine. By pushing all of the general-purpose registers, you needn't worry about forgetting to save one or another before using it within that subroutine. It's only one instruction, so it adds very little bulk to your code, and it's excellent bug insurance.
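
Here's a minimal sketch of that kind of framing. The subroutine name and the work done inside it are made up for illustration; the important part is the PUSHA/POPA pair wrapped around the body:

CrunchStuff:            ; hypothetical subroutine, reached by CALL CrunchStuff
        PUSHA           ; save AX, CX, DX, BX, SP, BP, SI, and DI in one shot
                        ; ...do the subroutine's real work here, using any of
                        ; the general-purpose registers without a care...
        POPA            ; restore everything just as the caller left it
        RET

One thing to keep in mind: because POPA restores AX along with everything else, a subroutine framed this way can't hand a result back to its caller in a register; any results have to be parked in a memory variable instead.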

PUSHA and POPA are also useful when writing interrupt service routines.

More Versatile Shifts and Rotates

PUSHA and POPA are entirely new instructions, present in the 286 and newer CPUs, but not present at all in the 8086 and 8088. However, not everything that's new with the 286 is a whole new instruction. Some of the 286's enhancements are improvements to existing instructions.

For my money, the best of these are enhancements to the shift and rotate instructions. There are six such instructions: SHL, SHR, ROL, ROR, RCL, and RCR. (SAL is just a duplicate name for SHL, and SAR differs from SHR only in how it treats the high bit.) I dealt with the shift instructions in Chapter 9, as they exist on the 8088 and 8086. If you'll recall from that chapter, you can express the number of bits by which to shift in one of two ways:

SHL AX,1        ; Shift left by 1
SHL AX,CL       ; Shift left by number in CL

(The AX register is just for example's sake; obviously, you can replace AX here with any legal operand. Furthermore, this discussion applies to any of the six shift/rotate instructions.) To shift an operand by 1 bit, you specify the literal value 1. To shift by any number of bits greater than 1, you must first load a count value into the CL register, and then use CL as the second operand. Well, this is how it is on the 8086/8088. Starting with the 286, you can drop the use of CL and use an immediate value (that is, a digit like 4 or 7) for shift values greater than 1. It becomes legal to use instructions that look like this:

SHL AX,4
SHL BX,7

It's more than just convenience. Having to load CL with a shift value not only takes time, but it eats up code space as well (the MOV CL,4 or MOV CL,7 instructions have to go somewhere).
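
To make the comparison concrete, here's a sketch of both ways of shifting AX left by four bits; the first form works on any CPU in the family, the second only on the 286 and newer:

MOV CL,4        ; 8086/8088 style: the count must travel through CL...
SHL AX,CL       ; ...before the shift itself can happen

SHL AX,4        ; 286 style: the count rides along as an immediate operand

The immediate form is not only shorter; it also leaves CL free for whatever else you might have wanted to keep there.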

Limiting the Shift Count

The 286 and newer CPUs put another, much subtler twist on the shift and rotate instructions: they limit the shift count to 31. This will take a little explaining; I recall having trouble with it when I first encountered the 86-family instruction set.

When you specify the shift count in CL, the assembler will permit you to use any value that will physically fit in CL. This means you can theoretically shift an operand by up to 255 bits, since the largest value you can load into 8-bit CL is 255, aka 0FFH.

But think about that for a moment. What does it actually mean to shift an operand by 255 bits? The largest operand you can ever shift with any x86 CPU is only 32 bits wide. If you shift a 32-bit operand by 32 or more bits in either direction, you're left with nothing but 0's in the operand, because all significant bits will be shifted completely out of the operand into nothingness. So for the shift instructions, at least, shifting by more than 31 bits is meaningless.
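
Here's a small sketch of what the limit means in practice. On the 286 and newer CPUs, only the lowest five bits of the count are actually used, so an oversized count simply wraps around:

MOV CL,36       ; 36 is 00100100B; its low five bits are 00100B, or 4
SHL AX,CL       ; on a 286 or later, AX is shifted left by only 4 bits;
                ; on an 8086/8088, all 36 shifts happen and AX ends up as 0

The same two instructions thus behave differently on different members of the CPU family, which is one more reason not to get cute with oversized shift counts.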

It's a little trickier for the rotate instructions. The rotate instructions, if you recall, rotate bits off one end of the operand and then feed them back into the opposite end of the operand, to begin the trip again. Therefore, you could rotate a bit pattern in an operand by 255 and still have bits in the operand, because the bits never really leave the operand. They simply go out the front door and come back in immediately through the back door.

So rotating an operand by 255 could still be meaningful. The question is, is it uniquely