Assembly Language Step by Step 1992


habit of using an initial zero on any hex number beginning with the hex digits A through F.)

The addresses in a megabyte of memory, then, run from 00000H to 0FFFFFH. In binary notation, that is equivalent to the range of 00000000000000000000B to 11111111111111111111B. That's a lot of bits: 20, to be exact. If you look back to Figure 2.3 in Chapter 2, you'll see that a megabyte memory bank has 20 address lines. One of those 20 bits is routed to each of those 20 address lines, so that any address expressed as 20 bits will identify one and only one of the 1,048,576 bytes contained in the memory bank.

That's what a megabyte of memory is: some arrangement of memory chips within the computer, connected by an address bus of 20 lines. A 20-bit address is fed to those 20 address lines to identify one byte out of the megabyte.
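If you'd like to check that arithmetic yourself, here is a minimal sketch in Python, used purely as a calculator and not as anything an 8088 would run. The helper name asm_hex is my own invention for illustration; it applies the leading-zero habit for hex numbers that begin with A through F.

```python
ADDRESS_LINES = 20                      # address lines on a megabyte memory bank

byte_count = 2 ** ADDRESS_LINES         # one byte per distinct 20-bit address
highest_address = byte_count - 1


def asm_hex(value, digits):
    """Format a number assembler-style: hex digits, an H suffix, and a
    leading zero if the first digit would otherwise be A through F."""
    text = f"{value:0{digits}X}"
    if text[0] in "ABCDEF":
        text = "0" + text
    return text + "H"


print(byte_count)                                         # bytes in the megabyte
print(asm_hex(0, 5), "to", asm_hex(highest_address, 5))   # the address range
print(f"{highest_address:b}B")                            # highest address in binary
```

The binary line prints one digit for each address line, which is the whole point: 20 lines, 20 bits.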

16-Bit Blinders

The 8088 and 8086 can "see" a full megabyte. That is, the CPU chips have 20 address pins, and can pass a full 20-bit address to the memory system. From that perspective, it seems pretty simple and straightforward. However...the bulk of all the trouble you're ever likely to have in understanding the 86-family CPUs stems from this fact: although the CPUs can see a full megabyte of memory, they are constrained to look at that megabyte through 16-bit blinders.

You may call this peculiar. (Later on, you'll probably call it much worse.) But you must understand it, and understand it thoroughly.

The blinders metaphor is closer to literal than you might think. Look at Figure 5.1. The long rectangle represents the megabyte of memory that the 8088 can address. The CPU is off to the right. In the middle is a piece of metaphorical cardboard with a slot cut in it. The slot is one byte wide and 65,536 bytes long. The CPU can slide that piece of cardboard up and down the full length of its memory system. However, at any one time, it can only access 65,536 bytes.

The CPU's view of memory is peculiar. It is constrained to look at memory in chunks, where no chunk can be larger than 65,536 bytes in length.

The number 64K is important, just as 1Mb is. (We call 65,536 "64K" for the same reason that we call 1,048,576 "1Mb": it's just shorthand for what is actually a binary number that "comes out even.") In fact, 64K is more important in assembly language programming than 1Mb; this is the number that circumscribes almost everything that an assembly-language programmer needs to do with the 86-family CPUs. It is, for one thing, the number of distinct values that the CPU can actually count and remember as an integral whole. You'll encounter it again and again and again.

Remember: 65,536 in binary is 10000000000000000B; in hex it's 10000H. The important characteristic of 64K is that it is exactly the number of different values that can be expressed in 16 bits, 0 through 65,535 (0FFFFH). As a multiple of one byte, 16 bits carries with it some of the magic quality of the byte as data atom in our computer universe. The 8088 and 8086 are often called 16-bit computers, because they typically and most efficiently process 16 bits at one crunch. As we begin to discuss CPU registers, you'll come to fully understand just why the magical number 65,536 is as important and all-pervasive as it is.
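A small sketch may help pin down what "16 bits" buys you. It is written in Python only for the arithmetic; the names are mine, and the masking step is simply one way to imitate a 16-bit counter, not a description of the CPU's circuitry.

```python
BITS = 16

distinct_values = 2 ** BITS             # how many different 16-bit patterns exist
largest_value = distinct_values - 1     # the biggest number 16 bits can hold

print(f"{distinct_values:,} values, written {distinct_values:X}H in hex")
print(f"largest 16-bit value: {largest_value:,} ({largest_value:X}H)")

# Imitate a 16-bit counter: keep only the low 16 bits after adding one.
counter = largest_value
counter = (counter + 1) & 0xFFFF
print("one past the largest value wraps to", counter)
```

Note that 65,536 itself needs a seventeenth bit to write down; what fits in 16 bits is the range 0 through 65,535, which makes 65,536 values in all.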

5.2 "They're Diggin' It up in Choonks!"

That's what Ray Walston shouted jubilantly in the marvelous film version of Paint Your Wagon. He was referring to gold being mined somewhere else (of course), but the metaphor to 86-family memory manipulation is apt. As we pointed out in the last section, the 8088 and its brothers only dig memory in chunks—that's how they're made. Furthermore, it may not be as bad an idea as most programmers think.

To cement my point, let's talk about another type of nugget: native copper. The better part of a mile under the Mesabe range in upper Michigan is an enormous nugget of native copper the size of a freight locomotive. It may even be larger; the mining company that discovered it isn't entirely sure how large it is. This super nugget was discovered before World War II and is still down there at the end of a long tunnel, basically forgotten.

Why leave a fortune in copper sitting where it was found, you ask? OK, wise guy, how do you get it out? Pure copper is a notoriously intractable metal. While not horribly hard, it is tough in ways that make cutting tools become dull and cause them to get stuck in their holes. The truth is that cutting the giant nugget up into manageable pieces would literally cost more than the copper would be worth at today's prices. Hauling out easily crushed copper ore in fist-sized chunks is enormously easier on men and equipment, so the supernugget remains in its hole, a curiosity and nothing more.

The lesson here is twofold: first of all, just as most mining companies do not encounter locomotive-sized nuggets every day (or even every century), most jobs a computer has to do do not involve enormous quantities of memory at one time. Second, even on computers that don't have a set of 64K blinders, playing with a megabyte all at once is hard work, and costly in machine performance.

It may be that the 86-family's blinders enable it to work more quickly and efficiently within its megabyte of memory. Whether true or not, this notion of seeing memory as a number of chunks, called segments, is key to understanding the 86-family CPUs as well.

The Nature of Segments

In 86-parlance, a segment is a region of memory that begins on a paragraph boundary and extends for some number of bytes less than or equal to 64K (65,536). We've spoken of the number 64K before. But paragraphs?

Time out for a lesson in 86-family trivia. A paragraph is a measure of memory equal to 16 bytes. It is one of numerous technical terms used to describe various quantities of memory. We've spoken of some of them before, and all of them are even multiples of one byte. Bytes are data atoms, remember; loose memory bits never exist in the absence of a byte of memory to contain them. Table 5.1 lists the terms you should be aware of. Table 5.1 lists two names for each term. One is the technical term that you and I and all the rest of the humans use in speaking. However, the assembler has its own names for these terms, which you will have to use when writing assembly-language programs. Some of these terms, like ten byte, occur very rarely, and others, like page, occur almost never. The term paragraph is almost never used, except in connection with the places where segments may begin.

Table 5.1. Collective terms for memory

NAME                        SIZE
Technical     Assembler     Decimal    Hex
Byte          BYTE          1          01H
Word          WORD          2          02H
Double word   DWORD         4          04H
Quad word     QWORD         8          08H
Ten byte      TBYTE         10         0AH
Paragraph     PARA          16         10H
Page          PAGE          256        100H
Segment       SEGMENT       65,536     10000H
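For readers who like to double-check tables, the sketch below holds the assembler names and decimal sizes from Table 5.1 in a Python dictionary and regenerates the hex column from them. The dictionary name MEMORY_UNITS is my own; nothing here is assembler syntax.

```python
# Assembler name -> size in bytes, as listed in Table 5.1.
MEMORY_UNITS = {
    "BYTE": 1,
    "WORD": 2,
    "DWORD": 4,
    "QWORD": 8,
    "TBYTE": 10,
    "PARA": 16,
    "PAGE": 256,
    "SEGMENT": 65536,
}

for name, size in MEMORY_UNITS.items():
    # At least two hex digits, followed by the H suffix.
    print(f"{name:<8} {size:>6,}  {size:02X}H")
```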

Any memory address evenly divisible by 16 is called a paragraph boundary. The first paragraph boundary is address 0. The second is address 10H; the third address 20H, and so on. (Remember that 10H is equal to decimal 16.) Any paragraph boundary may be considered the start of a segment.
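The divisible-by-16 rule is easy to state in code. Here is a minimal sketch, in Python rather than assembly, with a function name of my own choosing.

```python
PARAGRAPH = 16                          # bytes in one paragraph


def is_paragraph_boundary(address):
    """True if a segment could begin at this byte address."""
    return address % PARAGRAPH == 0


# The first three paragraph boundaries: 0, 10H, and 20H.
first_three = [number * PARAGRAPH for number in range(3)]
print([f"{address:02X}H" for address in first_three])

print(is_paragraph_boundary(0x20))      # a boundary
print(is_paragraph_boundary(0x2D))      # not a boundary
```

In hex the test is even simpler than the arithmetic suggests: an address sits on a paragraph boundary exactly when its last hex digit is 0.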

This doesn't mean that a segment actually starts every 16 bytes up and down throughout that megabyte of memory. A segment is like a shelf in one of those modern adjustable bookcases. On the back face of the bookcase are a great many little slots spaced one-half inch apart. A shelf bracket can be inserted into any of the little slots. However, there aren't hundreds of shelves, but only four or five. Most of the slots are empty. They exist so that a much smaller number of shelves may be adjusted up and down the height of the bookcase as needed.

In a very similar manner, paragraph boundaries are little slots at which a segment may start. An assembly-language program may make use of only four or five segments, but each of those segments may begin at any of the 65,536 paragraph boundaries existing in the 8088's megabyte of memory.

There's that number again: 65,536; our beloved 64K. There are 64K different paragraph boundaries where a segment may begin. Each paragraph boundary has a number. As always, the numbers begin from 0, and go to 64K minus one; in decimal 65,535, or in hex 0FFFFH. Because a segment may begin at any paragraph boundary, the number of the paragraph boundary at which a segment begins is called the segment address of that particular segment. We rarely, in fact, speak of paragraphs or paragraph boundaries at all. When you see the term "segment address," keep in mind that each segment address is 16 bytes (one paragraph) farther along in memory than the segment address before it. See Figure 5.2.

In short, segments may begin at any segment address. There are 65,536 segment addresses evenly distributed across the 8088's full megabyte of memory, 16 bytes apart. A segment address is more a permission than a compulsion; for all the 64K possible segment addresses, only five or six are ever actually used to begin segments at any one time. Think of segment addresses as slots where segments may be placed.
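To see where the 65,536 comes from, and where in memory a given segment address actually lands, consider this short Python sketch. The function name segment_start is an assumption of mine for illustration.

```python
PARAGRAPH = 16                          # bytes between neighboring segment addresses
MEGABYTE = 2 ** 20                      # bytes the 8088 can address

segment_address_count = MEGABYTE // PARAGRAPH


def segment_start(segment_address):
    """Byte address at which the segment with this segment address begins."""
    return segment_address * PARAGRAPH


print(segment_address_count)                    # number of possible segment addresses
print(f"{segment_start(0x0001):05X}H")          # second slot in the bookcase
print(f"{segment_start(0xFFFF):05X}H")          # the very last slot
```

Multiplying by 16 is the same as tacking one hex zero onto the end of the segment address, which is why segment address 0FFFFH begins at byte 0FFFF0H.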

So much for segment addresses; now, what of segments themselves? A segment may be up to 64K bytes in size, but it doesn't have to be. A segment may be only 1 byte long, or 256 bytes long, or 21,378 bytes long, or any length at all short of 64K bytes.

A Horizon, Not a Place

You define a segment primarily by stating where it begins. What, then, defines how long a segment is? Nothing, really, and we get into some really tricky semantics here. A segment is more a horizon than a place. Once you define where a segment begins, that segment can encompass any location in memory between that starting place and the horizon, which is 65,536 bytes down the line.

Nothing says, of course, that a segment must use all of that memory. In most cases, when you define a segment to exist at some segment address, you only end up considering the next few hundred bytes as part of that segment, until you get into some truly world-class programs. Most beginners read about segments and think of them as some kind of memory allocation, a protected region of memory with walls on both sides, reserved for some specific use.

This is about as far from true as you can get. Nothing is protected within a segment, and segments are not reserved for any specific register or access method. Segments can overlap. Segments don't really exist, in a very real sense, except as horizons beyond which a certain type of reference cannot go. It comes back to that set of 64K blinders the CPU wears, as I drew in Figure 5.1. I think of it this way: a segment is the location in memory at which the CPU's 64K blinders are positioned. In looking at memory through the blinders, you can see bytes starting at the segment address, and going on until the blinders cut you off, 64K bytes down the way.
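The horizon idea can be sketched as a pair of numbers: where the blinders start and the last byte they let you see. The Python below is only an illustration with names I made up, and it deliberately ignores what happens when a segment placed near the very top of the megabyte would reach past its end.

```python
def visible_range(segment_address):
    """First and last byte addresses visible from a segment address.
    Assumption: the segment sits low enough that it does not run past
    the end of the megabyte."""
    start = segment_address * 16
    horizon = start + 0xFFFF            # 65,536 bytes, counting the first one
    return start, horizon


def shared_bytes(first_segment, second_segment):
    """How many bytes two segments have in common."""
    first_start, first_end = visible_range(first_segment)
    second_start, second_end = visible_range(second_segment)
    low = max(first_start, second_start)
    high = min(first_end, second_end)
    return max(0, high - low + 1)


start, horizon = visible_range(0x0001)
print(f"{start:05X}H through {horizon:05X}H")
print(shared_bytes(0x0001, 0x0002))     # neighbors overlap almost completely
```

Two neighboring segment addresses share all but 16 bytes of their view, which is about as far from "walls on both sides" as you can get.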

The key to understanding this admittedly metaphysical definition of a segment is knowing how segments are used. And coming to understand that finally brings us to the subject of registers.

Making 20-Bit Addresses out of 16-Bit Registers

The 8088 and 8086 are often called 16-bit CPUs because their internal registers are almost all 16 bits in size. A register, as I've hinted before, is a memory location inside the CPU chip rather than outside in a memory bank. The 86 family has a fair number of registers, and they are an interesting crew indeed.

Registers do many jobs, but one of their more important jobs is holding addresses of important locations in memory. If you'll recall, the 8088 has 20 address pins, and its megabyte of memory requires addresses 20 bits in size.

How do you put a 20-bit memory address in a 16-bit register? Easy. You don't.

You put a 20-bit address in two 16-bit registers.

What happens is this: all locations within the 8088's megabyte of memory have not one address but two. Every byte in memory is assumed to reside in a segment. A byte's complete address, then, consists of the address of its segment, along with the distance of the byte from the start of that segment. The address of the segment is (as we said before) the byte's segment address. The byte's distance from the start of the segment is the byte's offset address. Both addresses must be specified to completely describe any single byte's location within the full megabyte of memory. When written, the segment address comes first, followed by the offset address. The two are separated with a colon. Segment:offset addresses are always written in hexadecimal. Make sure the colon is there so that people know you're specifying an address and not just a couple of numbers!
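Since segment addresses sit 16 bytes apart and the offset is a distance from the start of the segment, the two halves combine with one multiplication and one addition. Here is a minimal Python sketch of that combination; the function names and the sample address 00B2:0004 are simply chosen for illustration.

```python
def physical_address(segment, offset):
    """Combine a 16-bit segment address and a 16-bit offset address
    into a single byte address within the megabyte."""
    if not 0 <= segment <= 0xFFFF:
        raise ValueError("segment address must fit in 16 bits")
    if not 0 <= offset <= 0xFFFF:
        raise ValueError("offset address must fit in 16 bits")
    return segment * 16 + offset


def format_address(segment, offset):
    """Write an address segment-first, colon in the middle, in hex."""
    return f"{segment:04X}:{offset:04X}"


location = physical_address(0x00B2, 0x0004)
print(format_address(0x00B2, 0x0004), "is byte", f"{location:05X}H")
```

The range checks are there because each half must fit in a 16-bit register; a "segment address" of 10000H would not be an address at all.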

I've drawn Figure 5.3 to help make this a little clearer. A byte of data we'll call "MyByte" exists in memory at the location marked. Its address is given as 0001:001D. This means that MyByte falls within segment 0001H, and is located 001DH bytes from the start of that segment. Note that when two numbers are used to specify an address with a colon between them, you do not end each of the two numbers with the hexadecimal suffix.

You can omit leading zeroes if you like; however, remember the assembly-language policy of never allowing a hex number to begin with the hex digits A through F. For example, the address 00B2:0004 could be written 0B2:4. As a good rule of thumb, however, I recommend using all four hex digits in both components of the address except when all four digits are zero. In other words, you can abbreviate 0000:0061 to 0:0061 or 0B00:0000 to 0B00:0.

The universe is perverse, however, and clever eyes will perceive that MyByte can have two other perfectly legal addresses: 0:002D and 0002:000D. How so? Keep in mind that a segment may start every 16 bytes throughout the full megabyte of real memory. A segment, once begun, embraces all bytes from its origin to 65,535 bytes further up in memory. There's nothing wrong with segments overlapping, and in Figure 5.3 we have three overlapping segments. MyByte is 2DH bytes into the first segment, which begins at segment address 0000H. MyByte is 1DH bytes into the second segment, which begins at segment address 0001H. And MyByte is 0DH bytes into the third segment, which begins at segment address 0002H. It's not that MyByte is in two or three places at once. It's in only one place, but that one place may be described in any of three ways.

It's a little like Chicago's street number system. Howard Street is 76 blocks from Chicago's "origin," Madison Street. Howard Street is, however, only 4 blocks from Touhy Avenue. You can describe Howard Street's location relative to either Madison Street or Touhy Avenue, depending on what you want to do.

An arbitrary byte somewhere in the middle of the 8086's megabyte of memory may fall within literally thousands of different segments: one for each of the 4,096 paragraph boundaries in the 64K bytes leading up to it. Which segment the byte is actually in is strictly a matter of convention.
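You can count a byte's possible addresses directly. The Python sketch below lists every segment:offset pair that lands on a given byte, assuming segment addresses 16 bytes apart and offsets no larger than 0FFFFH. The function name aliases is mine, and the sketch ignores any wrap-around past the top of the megabyte.

```python
def aliases(byte_address):
    """Every (segment, offset) pair that names the given byte.
    Assumption: no wrap-around at the end of the megabyte."""
    pairs = []
    highest_segment = min(byte_address // 16, 0xFFFF)
    for segment in range(highest_segment, -1, -1):
        offset = byte_address - segment * 16
        if offset > 0xFFFF:             # beyond this segment's horizon
            break
        pairs.append((segment, offset))
    return pairs


# MyByte from Figure 5.3 sits so low in memory that only three segments reach it.
for segment, offset in aliases(0x2D):
    print(f"{segment:04X}:{offset:04X}")

# A byte in the middle of the megabyte has far more names.
print(len(aliases(0x80000)))
```

Under these assumptions the count tops out at 4,096, since that is how many paragraph boundaries fit in the 64K bytes below any one byte.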

This problem appears in real life to confront programmers of the IBM PC. The PC keeps its time and date information in a series of memory bytes that starts at address 0040:006C. There is also a series of memory bytes containing PC timer information located at 0000:046C. You guessed it—we're talking about exactly the same starting byte. Different writers speaking of that same byte may give its address in either of those two ways, and they'll all be completely correct.
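When two writers hand you two different-looking addresses, a few lines of code will tell you whether they mean the same byte. This Python sketch parses the written segment:offset form; the function names are hypothetical and exist only for this illustration.

```python
def parse_address(text):
    """Split a written address such as '0040:006C' into two numbers."""
    segment_text, offset_text = text.split(":")
    return int(segment_text, 16), int(offset_text, 16)


def same_byte(first, second):
    """True if two written segment:offset addresses name the same byte."""
    first_segment, first_offset = parse_address(first)
    second_segment, second_offset = parse_address(second)
    return (first_segment * 16 + first_offset
            == second_segment * 16 + second_offset)


print(same_byte("0040:006C", "0000:046C"))
print(same_byte("0040:006C", "0040:006D"))
```

Because the parser reads plain hex digits, abbreviated forms such as 40:6C work just as well as the fully written-out ones.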

The way, then, to express a 20-bit address in two 16-bit registers is to put the segment address into one 16-bit register, and the offset address into another 16-bit register. The two registers taken together identify one byte among all 1,048,576 bytes in a megabyte.
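Going the other direction, from one 20-bit address to a pair of 16-bit values, can be done in many ways, since any byte has many legal addresses. The Python sketch below picks just one convention of my own choosing: the segment address of the nearest paragraph boundary at or below the byte, with whatever is left over as the offset.

```python
def split_address(byte_address):
    """Split a 20-bit byte address into (segment, offset) values that
    each fit in a 16-bit register. One convention among many: use the
    highest segment address that still reaches the byte."""
    if not 0 <= byte_address <= 0xFFFFF:
        raise ValueError("address must fit in 20 bits")
    segment = byte_address >> 4         # the upper 16 of the 20 bits
    offset = byte_address & 0xF         # the lowest 4 bits
    return segment, offset


segment, offset = split_address(0x0046C)
print(f"{segment:04X}:{offset:04X}")

# Putting the two halves back together recovers the original address.
print(f"{segment * 16 + offset:05X}H")
```

With this convention the offset is never larger than 0FH, which is handy when you need a single agreed-upon way to write an address.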

5.3 Registers and Memory Addresses

Think of the segment address as the starting position of the 8086/8088's 64K blinders. Typically, you'll move the blinders to encompass the location where you wish to work, and then leave the blinders in one place while moving around within their 64K limits. This is exactly how registers tend to be used in 8086/8088 assembly language. The 8088, 8086, and 80286 have exactly four segment registers specifically designated as holders of segment addresses. (The 386 and 486 have two more, but we'll return to that in Chapter 11.) Each segment register is a 16-bit memory location existing within the CPU chip itself. No matter what the CPU is doing, if it's addressing some location in memory, the segment address of that location is present in one of the four segment registers.

The segment registers have names that reflect their general functions: CS, DS, SS, and ES.

• CS stands for Code Segment. Machine instructions exist at some offset into a code segment. The segment address of the code segment of the currently executing instruction is contained in CS.

• DS stands for Data Segment. Variables and other data exist at some offset into a data segment. There may be many data segments, but the CPU may only use one at a time, by placing the segment address of that segment in register DS.

• SS stands for Stack Segment. The stack is a very important component of the CPU used for temporary storage of data and addresses. I'll explain how the stack works a little later; for now simply understand that, like everything else within the 8086/8088's megabyte of memory, the stack has a segment address, which is contained in SS.

• ES stands for Extra Segment. The extra segment is exactly that: a spare segment that may be used for specifying a location in memory.
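As a way of picturing the four segment registers, here is a toy model in Python. It is a sketch under my own assumptions, not a description of how the chip is built: the class name, the locate method, and the sample values are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class SegmentRegisters:
    """Four 16-bit holders of segment addresses."""
    cs: int = 0     # code segment
    ds: int = 0     # data segment
    ss: int = 0     # stack segment
    es: int = 0     # extra segment

    def locate(self, register, offset):
        """Byte address reached by pairing one segment register
        with an offset address."""
        if register not in ("cs", "ds", "ss", "es"):
            raise ValueError("unknown segment register: " + register)
        segment = getattr(self, register) & 0xFFFF   # keep it to 16 bits
        return segment * 16 + (offset & 0xFFFF)


# Park the data segment's blinders at segment address 0040H,
# then look 006CH bytes into that segment.
registers = SegmentRegisters(ds=0x0040)
timer_byte = registers.locate("ds", 0x006C)
print(f"{timer_byte:05X}H")
```

Changing which byte you reach within the segment means changing only the offset; moving the blinders means loading a new value into the segment register.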

General-Purpose Registers

The segment registers exist only to hold segment addresses. They can be forced to do a few other things, but by and large segment registers should be considered specialists in "segment address containing." The 8086/8088 CPU has a crew of generalist registers to do the rest of the work of assembly-language computing. Among many other things, these general-purpose registers are used to hold the offset addresses that must be paired with segment addresses to pin down a single location in memory.

Like the segment registers, the general-purpose registers are memory locations existing inside the CPU chip itself. They all have names rather than numeric addresses: AX, BX, CX, DX, SP, BP, SI, and DI. The general-purpose registers really are generalists in that all of them share a large suite of capabilities. However, each of the general-purpose registers also has what I call its "hidden agenda": a task or set of tasks that only it can perform.

I'll explain all these hidden agendas as I go. For now, we'll concentrate on the role of the general-purpose registers in addressing memory.