Assembly Language Step by Step Programming with DOS and Linux 2nd Ed 2000.pdf

Reading and Changing Registers with DEBUG

Much or most of what defines your assembly language programs lies in your use of registers. Machine instructions act on registers, and registers define how memory is addressed and what is read from or placed there. While you're developing and debugging your programs, a lot of what you'll be looking at is the contents of your registers.

The DOS DEBUG utility provides a handy window into the CPU's hidden world of registers. How DEBUG does this is the blackest of all black arts and I can't begin to explain it in an introductory text. For now, just consider DEBUG a magic box. One thing to keep in mind is that DEBUG is a real mode creature. It doesn't work in protected mode. You can only use it while debugging real mode programs, whether segmented or flat model. Protected mode debuggers do exist, but DEBUG isn't one of them.

Looking at the registers from DEBUG doesn't even require that you load a program into DEBUG. Simply run DEBUG, and at the dash prompt type the single-letter command R. The display will look something very close to this:

- R

AX=0000 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000

DS=1980 ES=1980 SS=1980 CS=1980 IP=0100 NV UP EI PL NZ NA PO NC

1980:0100 389A5409 CMP [BP+SI+0954],BL SS:0954=8A

I say "something very close" because details of the display will vary depending on what resident programs you have loaded in memory, which version of DOS you're using, and so on. What will vary will be the values listed as present in the various registers, and the machine instruction shown in the third line of the display (here, CMP [BP+SI+0954],BL).

What will not vary is the fact that every CPU register has its place in the display, along with its current value shown to the right of an equals sign. The characters "NV UP EI PL NZ NA PO NC" are a summary of the current values of the flags in the flags register.
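If you'd like to see what that flag summary is saying, here is a minimal Python sketch that turns it into named flag values. The table of two-letter codes is the standard set DEBUG uses; it is background knowledge rather than something spelled out in this section, and the names FLAG_CODES and decode_flags are hypothetical helpers of my own.

```python
# Two-letter codes DEBUG prints for each flag:
# (flag name, code shown when the flag is clear, code shown when it is set)
FLAG_CODES = [
    ("overflow",        "NV", "OV"),
    ("direction",       "UP", "DN"),
    ("interrupt",       "DI", "EI"),
    ("sign",            "PL", "NG"),
    ("zero",            "NZ", "ZR"),
    ("auxiliary carry", "NA", "AC"),
    ("parity",          "PO", "PE"),
    ("carry",           "NC", "CY"),
]

def decode_flags(summary):
    """Map a summary such as 'NV UP EI PL NZ NA PO NC' to {flag name: 0 or 1}."""
    lookup = {}
    for name, clear, set_ in FLAG_CODES:
        lookup[clear] = (name, 0)
        lookup[set_] = (name, 1)
    return dict(lookup[code] for code in summary.split())

flags = decode_flags("NV UP EI PL NZ NA PO NC")
print(flags)
```

In the display shown earlier, only the interrupt flag is set (EI); every other flag is clear.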

The preceding display is that of the registers when no program has been loaded. All of the general-purpose registers except for SP have been set to 0, and all of the segment registers have been set to the value 1980H. These are the default conditions set up by DEBUG in the CPU when no program has been loaded. (The 1980H value will probably be different for you. It represents the first available segment in memory above DOS, and where that segment falls depends on what else exists in memory both above and below DOS.)

Changing a register is done very simply, again using DEBUG's R command. To change the value of AX, type R AX and press Enter:

-R AX

AX 0000

:0A7B

-

DEBUG will respond by displaying the current value of AX (here, "0000") and then, on the following line, a colon prompt. It will wait for you to either enter a new numeric value for AX, or else for you to press Enter. If you press Enter, the current value of the register will not be changed. In the preceding example, I typed "0A7B" (you needn't type the H indicating hex) followed by Enter.

Once you do enter a new value and then press Enter, DEBUG does nothing to verify that the change has been made. To see the change to register AX, you must display all the registers again using the R command:

- R

AX=0A7B BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000

DS=1980 ES=1980 SS=1980 CS=1980 IP=0100 NV UP EI PL NZ NA PO NC

1980:0100 389A5409 CMP [BP+SI+0954],BL SS:0954=8A

Take a few minutes to practice entering new values for the general-purpose registers, then display the registers as a group to verify that the changes were made. While exploring, you might find that the IP register can be changed, even though I said earlier that it can't be changed directly. The key word is directly; DEBUG knows all the dirty tricks.

Inspecting the Video Refresh Buffer with DEBUG

One good way to help your knowledge of memory addressing sink in is to use DEBUG to take a look at some interesting places in the PC's memory space.

One easy thing to do is look at the PC's video display adapter's text screen video refresh buffer. A video refresh buffer is a region of memory with a difference: Any characters written to buffer memory are instantly displayed on the computer's screen. This is accomplished electrically through special use of the information that comes out of the memory data pins. Precisely how it is done is outside the scope of this book. For now, simply understand that writing a character to your text mode display screen (which is not the Windows graphical UI screen!) can be done by writing the ASCII code for that character into the correct address in the video refresh buffer portion of memory.

The text mode display buffer is the screen that appears when you're running DOS or else working in a DOS window (or "DOS box") from within MS Windows. It consists not of icons or graphical images or figures but simple textual characters, arranged in a matrix typically 25 high and 80 wide. This used to be the mainstay of all computing; now, text screens seem downright quaint to most people.

As with any memory location anywhere within the PC, the video refresh buffer has a segment address. What that segment address is depends on the kind of display installed in the PC. There are two separate possibilities, and which is present is easy enough to determine: If your PC has a color screen, the segment address of the video refresh buffer is 0B800H. If you have a monochrome screen (a situation now becoming vanishingly rare), the segment address is 0B000H instead.

It takes 2 bytes in the buffer to display a character. The first of the two (that is, first in memory) is the ASCII code of the character itself. For example, an A would require the ASCII code 41H; a B would require the ASCII code 42H, and so on. (The full ASCII code set is shown in Appendix D.) The second of the two bytes is the character's attribute. Think of it this way: In the display of a character on the screen, the ASCII code says what and the attribute says how. The attribute dictates the color of a character and its background cell on a color screen. On a monochrome screen, the attribute specifies whether a character is underlined or displayed in reverse video. (Reverse video is a character display mode in which a dark character is shown on a light background, rather than the traditional light character on a dark or black background.) Every character byte has an attribute byte and every attribute byte has its character byte; neither can ever exist alone.

The very first character/attribute pair in the video refresh buffer corresponds to the character you see in the upper-left-hand corner of the text screen. The next character/attribute pair in the buffer is the character on the second position on the top line of the screen, and so on. I've drawn a diagram of the relationship between characters on the screen and byte values in the video refresh buffer in Figure 6.11.

Figure 6.11: The PC's video refresh buffer.

In Figure 6.11, the three letters "ABC" are displayed in the upper-left corner of the screen. Notice that the "C" is underlined. The screen shown in Figure 6.11 is a monochrome screen. The video refresh buffer therefore begins at 0B000:0. The byte located at address 0B000:0 is ASCII code 41H, corresponding to the letter "A." The byte at address 0B000:0001 is the corresponding attribute value of 07H. The 07H value as an attribute dictates normal text in both color and monochrome displays, in which normal means white characters on a black background.

The byte at 0B000:0005 is also an attribute byte, but its value is 01H. On a monochrome screen, 01H makes the corresponding character underlined. On a color display, 01H makes the character blue on a black background.

There is nothing about the video refresh buffer to divide it into the lines you see on the display. The first 160 bytes (80 ASCII codes plus their 80 attribute bytes) are shown as the first line, and the subsequent 160 bytes are shown on the next line down the screen.

You might rightfully ask what ASCII code is in the video refresh buffer for locations on the screen that show no character at all. The answer, of course, is that there is a character there in every empty space: the space character, whose ASCII code is 20H.
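The buffer layout just described boils down to a little arithmetic, sketched here in Python. This is only a model of the buffer in an ordinary byte array, assuming an 80-column, 25-line screen; it does not touch real video memory, and the names cell_offset and split_cell are hypothetical helpers of my own.

```python
COLUMNS = 80          # characters per text line
ROWS = 25             # text lines on the screen
BYTES_PER_CELL = 2    # one character byte followed by one attribute byte

def cell_offset(row, col):
    """Offset of the character byte for screen position (row, col), both counted from 0."""
    return (row * COLUMNS + col) * BYTES_PER_CELL

def split_cell(buffer, row, col):
    """Return (character, attribute) for one screen position."""
    offset = cell_offset(row, col)
    return chr(buffer[offset]), buffer[offset + 1]

# Model a cleared screen: every cell is a space (20H) with attribute 07H.
screen = bytearray([0x20, 0x07] * (COLUMNS * ROWS))

# Place "ABC" in the upper-left corner with the C underlined (attribute 01H),
# as in Figure 6.11.
for i, (ch, attr) in enumerate([("A", 0x07), ("B", 0x07), ("C", 0x01)]):
    screen[cell_offset(0, i)] = ord(ch)
    screen[cell_offset(0, i) + 1] = attr

print(hex(cell_offset(1, 0)))      # offset where the second screen line begins
print(split_cell(screen, 0, 2))    # the underlined C and its attribute
```

The attribute byte for the "C" lands at offset 5, which agrees with the 0B000:0005 address given for it, and the second screen line begins 160 bytes (0A0H) into the buffer.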

You can inspect the memory within the video refresh buffer directly, through DEBUG. Take the following steps:

1. Clear the screen by entering CLS at the DOS prompt and pressing Enter.

2. Invoke DEBUG.

3. Decide where your video refresh buffer is located, and enter the proper segment address into the ES register through the R command. Remember: Color screens use the 0B800H segment address, while monochrome screens use the 0B000H segment address. (In the year 2000, there's a 98 percent chance that your screen is color and not monochrome.) Note from the following session dump that 0B800H must be entered into DEBUG as "B800," without the leading zero. NASM (your assembler) must have that leading zero, and DEBUG cannot have it. Sadly, no one ever said that all parts of this business had to make perfect sense.

4. Dump the first 128 bytes of the video refresh buffer by entering D ES:0 and pressing Enter.

5. Dump the next 128 bytes of the video refresh buffer simply by entering the D command by itself a second time. (I won't say "press Enter" every time. It's assumed: You must follow a command by pressing Enter.)

What you'll see should look a lot like the following session dump:

C:\ASM>debug

-r es

ES 1980

:b800

-d es:0

B800:0000 20 07 20 07 20 07 20 07-20 07 20 07 20 07 20 07 . . . . . . . .

B800:0010 20 07 20 07 20 07 20 07-20 07 20 07 20 07 20 07 . . . . . . . .

B800:0020 20 07 20 07 20 07 20 07-20 07 20 07 20 07 20 07 . . . . . . . .

B800:0030 20 07 20 07 20 07 20 07-20 07 20 07 20 07 20 07 . . . . . . . .

B800:0040 20 07 20 07 20 07 20 07-20 07 20 07 20 07 20 07 . . . . . . . .

B800:0050 20 07 20 07 20 07 20 07-20 07 20 07 20 07 20 07 . . . . . . . .

B800:0060 20 07 20 07 20 07 20 07-20 07 20 07 20 07 20 07 . . . . . . . .

B800:0070 20 07 20 07 20 07 20 07-20 07 20 07 20 07 20 07 . . . . . . . .

- d

B800:0080 20 07 20 07 20 07 20 07-20 07 20 07 20 07 20 07 . . . . . . . .

B800:0090 20 07 20 07 20 07 20 07-20 07 20 07 20 07 20 07 . . . . . . . .

B800:00A0 43 07 3A 07 5C 07 41 07-53 07 4D 07 3E 07 64 07 C.:.\.A.S.M.>.d.

B800:00B0 65 07 62 07 75 07 67 07-20 07 20 07 20 07 20 07 e.b.u.g. . . . .

B800:00C0 20 07 20 07 20 07 20 07-20 07 20 07 20 07 20 07 . . . . . . . .

B800:00D0 20 07 20 07 20 07 20 07-20 07 20 07 20 07 20 07 . . . . . . . .

B800:00E0 20 07 20 07 20 07 20 07-20 07 20 07 20 07 20 07 . . . . . . . .

B800:00F0 20 07 20 07 20 07 20 07-20 07 20 07 20 07 20 07 . . . . . . . .

The first 80 character/attribute pairs are the same: 20H/07H, which display as plain, ordinary blank space. When you execute the CLS command on most machines, the screen is cleared, and the DOS prompt reappears on the second line from the top of the screen, not the top line. The top line is typically left blank, as is the case here.

You'll see in the second block of 128 dumped bytes the DOS prompt and the invocation of DEBUG in lowercase. Keep in mind when reading DEBUG hex dumps that any character not readily displayed as one of the standard ASCII letters, numbers, or punctuation marks is represented as a period character. This is why the 07H attribute character is shown on the right portion of DEBUG's display as a period character, since the ASCII code 07H has no displayable equivalent.
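To convince yourself that the dump really is the screen, you can pull the character bytes back out of a dump line. The following Python sketch does that for the line at B800:00A0 in the session above. The function parse_dump_line is a hypothetical helper of my own, and it assumes a full 16-byte line in the layout shown in these listings.

```python
def parse_dump_line(line):
    """Split one full DEBUG dump line into (segment, offset, data bytes)."""
    address, rest = line.split(maxsplit=1)
    segment, offset = (int(part, 16) for part in address.split(":"))
    # The hex area holds 16 two-digit values; the '-' only marks the midpoint.
    # Anything after the sixteenth value is the text column and is ignored.
    tokens = rest.replace("-", " ").split()
    data = bytes(int(token, 16) for token in tokens[:16])
    return segment, offset, data

line = r"B800:00A0 43 07 3A 07 5C 07 41 07-53 07 4D 07 3E 07 64 07 C.:.\.A.S.M.>.d."
segment, offset, data = parse_dump_line(line)

characters = data[0::2].decode("ascii")   # even positions: ASCII codes
attributes = set(data[1::2])              # odd positions: attribute bytes

print(characters)
print(attributes)
```

Every other byte is a character, and every attribute in between is 07H, which is why the right-hand column of the dump alternates between letters and periods.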

You can keep dumping further into the video refresh buffer by entering DEBUG's D command repeatedly.

Reading the Basic Input/Output System Revision Date

Another interesting item that's easy to locate in your PC is the revision date in the ROM BIOS. ROM (read-only memory) chips are special memory chips that retain their contents when power to the PC is turned off. The BIOS (Basic Input/Output System) is a collection of assembly language routines that perform basic services for the PC: disk handling, video handling, printer handling, and so forth. The BIOS is kept in ROM at the very top of the PC's megabyte of address space.

The BIOS contains a date, indicating when it was declared finished by its authors. This date is always at the same address and can be easily displayed using DEBUG's D command. The address of the date is 0FFFF:0005. The DEBUG session is shown in the following listing. Note again that the hex number 0FFFFH must be entered without its leading zero:

- d ffff:0005

FFFF:0000                30 34 2F-33 30 2F 39 37 00 FC B8      04/30/97...

FFFF:0010 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................

FFFF:0020 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................

FFFF:0030 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................

FFFF:0040 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................

FFFF:0050 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................

FFFF:0060 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................

FFFF:0070 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................

FFFF:0080 00 00 00 00 00                                  .....

One useful peculiarity of DEBUG illustrated here is that when you begin a hex dump of memory at an address that is not evenly divisible by 16, DEBUG spaces the first byte of the dump over to the right so that paragraph boundaries still fall at the left margin.
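The alignment rule is easier to see when you build a dump row yourself. Here is a rough Python sketch of the idea, using the BIOS date bytes from the listing above. The function format_dump_row is a hypothetical helper of my own, the single-space column spacing follows the listings as printed here rather than any particular DEBUG version, and the rule for when the midpoint dash appears is an assumption.

```python
def format_dump_row(segment, start, data):
    """Lay out one row of a DEBUG-style dump that begins at offset `start`."""
    skip = start % 16                 # byte positions left blank at the left
    row_base = start - skip           # paragraph boundary at or below start
    data = data[:16 - skip]           # a row never crosses the next boundary
    cells = ["  "] * skip + [f"{b:02X}" for b in data]
    cells += ["  "] * (16 - len(cells))
    # Assumption: the dash is shown only when bytes appear on both sides of it.
    middle = "-" if cells[7].strip() and cells[8].strip() else " "
    hex_area = " ".join(cells[:8]) + middle + " ".join(cells[8:])
    # Bytes with no displayable ASCII equivalent are shown as periods.
    text = " " * skip + "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in data)
    return f"{segment:04X}:{row_base:04X} {hex_area} {text}".rstrip()

date_bytes = bytes.fromhex("30342F33302F393700FCB8")   # 11 bytes starting at offset 5
partial = format_dump_row(0xFFFF, 0x0005, date_bytes)
full = format_dump_row(0xFFFF, 0x0010, bytes(16))

print(partial)
print(full)
```

The row is labeled with the paragraph boundary (FFFF:0000), not with the address you asked for, and the first byte is pushed five positions to the right so that each byte sits in the column that matches its offset.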

Another rather peculiar thing to keep in mind while looking at the particular dump shown in the preceding is that only the first line of memory shown in the dump really exists. The segment 0FFFFH begins only 16 bytes before the end of real mode's megabyte of memory space. (See Figure 6.4 for a good illustration of this.) The byte at 0FFFF:000F is the last byte in real mode memory, and DEBUG is a real mode creature. Addresses from 0FFFF:0010 to 0FFFF:0FFFF would require more than 20 address bits to express, so in real mode they might as well not exist. (They do exist, but DEBUG can't see them!)

DEBUG won't tell you that; it'll just give you endless pages of zeros for memory beyond the real mode megabyte pale. (Several readers have told me that certain versions of DEBUG take a different approach, and wrap their display around to the bottom of memory instead, and begin displaying bytes at 0000:0000 once they run out of high memory. It's something to watch out for, and if memory beyond the FFFF:000F point is not zeros, you're in fact seeing such a wrap to low memory.)
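The arithmetic behind that 20-bit limit is simple enough to check for yourself. This Python sketch applies the real mode address calculation (the segment shifted left by 4 bits, plus the offset) to the addresses in question. The function names are hypothetical, and the wrapped version merely models the wraparound behavior readers reported, by discarding everything above 20 bits.

```python
def linear_address(segment, offset):
    """Real mode address calculation: segment shifted left 4 bits, plus offset."""
    return (segment << 4) + offset

def wrapped_address(segment, offset):
    """The same calculation truncated to 20 address bits."""
    return linear_address(segment, offset) & 0xFFFFF

last = linear_address(0xFFFF, 0x000F)      # last byte of the real mode megabyte
beyond = linear_address(0xFFFF, 0x0010)    # one byte past it

print(hex(last), last.bit_length())
print(hex(beyond), beyond.bit_length())
print(hex(wrapped_address(0xFFFF, 0x0010)))
```

The address 0FFFF:000F fits in exactly 20 bits, 0FFFF:0010 needs a twenty-first, and throwing that extra bit away lands you at the very bottom of memory.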

Transferring Control to Machine Instructions in Read-Only Memory

So far we've looked at locations in memory as containers for data. All well and good, but memory contains machine instructions as well. A very effective illustration of a machine instruction at a particular address is also provided by the ROM BIOS, and right next door to the BIOS revision date, at that.

The machine instruction in question is located at address 0FFFF:0. Recall that, by convention, the next machine instruction to be executed is the one whose address is stored in CS:IP. Run DEBUG. Load the value 0FFFFH into code segment register CS, and 0 into instruction pointer IP. Then dump memory at 0FFFF:0.

- r cs

CS 1980

:ffff

- r ip

IP 0100

:0

- r

AX=0000 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000

DS=1980 ES=1980 SS=1980 CS=FFFF IP=0000 NV UP EI PL NZ NA PO NC

FFFF:0000 EA5BE000F0 JMP F000:E05B

- d cs:0

FFFF:0000 EA 5B E0 00 F0 30 34 2F-33 30 2F 38 37 00 FC B8 .[...04/30/87...

FFFF:0010 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................

FFFF:0020 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................

FFFF:0030 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................

FFFF:0040 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................

FFFF:0050 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................

FFFF:0060 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................

FFFF:0070 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................

Look at the third line of the register display, which we've been ignoring up until now. To the right of the address display FFFF:0000 is this series of five bytes: EA5BE000F0.

These five bytes make up the machine instruction we want. Notice that the first line of the memory dump begins with the same address, and, sure enough, shows us the same five bytes.

Trying to remember what machine instruction EA5BE000F0 is would try anyone's intellect, so DEBUG is a good sport and translates the five bytes into a more readable representation of the machine instruction. This translation is placed to the right of the binary machine code EA5BE000F0. We call this process of translating binary machine codes back into human-readable assembly language mnemonics unassembly or, more commonly, disassembly:

JMP F000:E05B.

What this instruction does, quite simply, is tell the CPU to jump to the address 0F000:0E05B and begin executing the machine instructions located there. If we execute the machine instruction at CS:IP, that's what will happen: The CPU will jump to the address 0F000:0E05B and begin executing whatever machine instructions it finds there.

All IBM-compatible PCs have a JMP instruction at address 0FFFF:0. The address to which that JMP instruction jumps will be different for different makes and models of PC. This is why on your machine you won't necessarily see the exact five bytes EA5BE000F0, but whatever five bytes you find at 0FFFF:0, they will always begin with 0EAH. The 0EAH byte specifies that this instruction will be a JMP instruction. The remainder of the machine instruction is the address to which the CPU must jump. If that address as given in the machine instruction looks a little scrambled, well, it is...but that's the way the x86 CPUs do things. We return to the issue of funny-looking addresses a little later.
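If you want to unscramble that address yourself, here is a short Python sketch that does by hand what DEBUG's disassembly did for us. It assumes the five-byte form seen here: the 0EAH byte, then the offset, then the segment, each stored low byte first. The function decode_far_jmp is a hypothetical helper of my own.

```python
def decode_far_jmp(code):
    """Decode a 5-byte far JMP (first byte EAH) into (segment, offset)."""
    if len(code) != 5 or code[0] != 0xEA:
        raise ValueError("not a 5-byte far JMP starting with EAH")
    # Both 16-bit values are stored with the low-order byte first.
    offset = int.from_bytes(code[1:3], "little")
    segment = int.from_bytes(code[3:5], "little")
    return segment, offset

segment, offset = decode_far_jmp(bytes.fromhex("EA5BE000F0"))
print(f"JMP {segment:04X}:{offset:04X}")
```

The bytes 5B E0 become the offset 0E05BH and the bytes 00 F0 become the segment 0F000H, which matches the JMP F000:E05B that DEBUG displayed.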

DEBUG has a command, G (for Go), that begins execution at the address stored in CS:IP. If you enter the G command and press Enter, the CPU will jump to the address built into the JMP instruction and begin executing machine instructions. What happens then?

If you're running under DOS, your machine will go into a cold boot, just as it would if you powered down and powered up again. (So make sure you're ready for a reboot before you try it!)

This may seem odd. But consider: The CPU chip has to begin execution somewhere. When the CPU "wakes up" after being off all night with the power removed, it must get its first machine instruction from somewhere and start executing. Built into the silicon of the x86 CPU chips is the assumption that a legal machine instruction will exist at address 0FFFF:0. When power is applied to the CPU chip, the first thing it does is place 0FFFFH in CS, and 0 in IP. Then it starts fetching instructions from the address in CS:IP and executing them, one at a time, in the manner that CPUs must.

This is why all PCs have a JMP instruction at 0FFFF:0, and why this JMP instruction always jumps to the routines that bring the PC up from stone cold dead to fully operational.

Unfortunately, if you're running in a DOS window under Windows 9x or NT, jumping to 0FFFF:0 won't initiate a cold boot. Under Windows 9x, the JMP will close your DOS window. Under NT, it won't even do that...It'll just exit DEBUG. You see, Windows lives in protected mode, and it's...um...protected from little tricks like idle jumps to 0FFFF:0.

But if you're running DOS, what the heck, go ahead: Load 0FFFFH into CS and 0 into IP, and enter the G command. Feel good?

It's what we call the feeling of power.

Chapter 7: Following Your Instructions

Meeting Machine Instructions up Close and Personal

Overview

The most visible part of any assembly language program is its machine instructions, those atoms of action that are the steps a program must take to get its work done. The collection of instructions supported by a given CPU is that CPU's instruction set. For example, the 8086 and 8088 CPUs share the same instruction set, which is why most people consider them the same CPU.

This cannot be said for the later CPUs in the family, all of which offer additional instructions not found in the original 8086/8088. I can't cover all the x86 machine instructions in this book, even the original set introduced with the 8086. Those that I will describe are the most common and the most useful, and the easiest for newcomers to understand. It's not just a space issue, either. Some of the instructions (and for the most recent CPUs, such as the Pentium, a good many of them) are dedicated to way-down-deep functions that support the workings of protected mode operating systems and virtual memory. I could spend a whole book the size of this one just explaining the concepts that go into such operating systems, and I would have to before I could explain the instructions from which one builds them.

Nor will I abandon the discussion of memory addressing begun in the last chapter. As I've said before, understanding how the CPU and its instructions address memory is more difficult but probably more important than understanding the instructions themselves. In and around the descriptions of the machine instructions I'll present from this point on there will be discussions and elaboration on memory addressing. Pay attention! If you don't learn that, memorizing the entire instruction set will do you no good at all.