Assembly Language Step by Step Programming with DOS and Linux 2nd Ed 2000.pdf

The Nature of Segments

So far we've spoken informally of segments as chunks of memory within the larger megabyte of memory that the CPU can see and use in real mode segmented model. More formally, a segment is a region of memory that begins on a paragraph boundary and extends for some number of bytes. In real mode segmented model this number is less than or equal to 64K (65,536). We've spoken of the number 64K before. But paragraphs?

Time out for a lesson in 86-family trivia. A paragraph is a measure of memory equal to 16 bytes. It is one of numerous technical terms used to describe various quantities of memory. We've spoken of some of them before, and all of them are whole multiples of 1 byte. Bytes are data atoms, remember; loose memory bits never exist in the absence of a byte of memory to contain them. These terms are of uneven usefulness, but you should be aware of all of them. They are given in Table 6.1.

Table 6.1: Collective Terms for Memory

NAME          VALUE (DECIMAL)   VALUE (HEX)
Byte          1                 01H
Word          2                 02H
Double word   4                 04H
Quad word     8                 08H
Ten byte      10                0AH
Paragraph     16                10H
Page          256               100H
Segment       65,536            10000H

Table 6.1 gives each term's value in both decimal and hex. Some of these terms, such as ten byte, occur very rarely, and others, such as page, occur almost never. The term paragraph is almost never used, except in connection with the places where segments may begin.

Any memory address evenly divisible by 16 is called a paragraph boundary. The first paragraph boundary is address 0, the second is address 10H, the third is address 20H, and so on. (Remember that 10H is equal to decimal 16.) Any paragraph boundary may be considered the start of a segment.
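To make the divisibility rule concrete, here is a small Python sketch. The names `PARAGRAPH` and `is_paragraph_boundary` are illustrative inventions, not anything from DOS or the CPU; the test is simply "address mod 16 equals 0", which is the same as asking whether the lowest hex digit of the address is 0.

```python
PARAGRAPH = 16  # bytes per paragraph (Table 6.1)


def is_paragraph_boundary(address: int) -> bool:
    """True if the address is evenly divisible by 16 (its lowest hex digit is 0)."""
    return address % PARAGRAPH == 0


# Collect the paragraph boundaries among the first 40H bytes of memory.
first_boundaries = [a for a in range(0x40) if is_paragraph_boundary(a)]
print([hex(a) for a in first_boundaries])  # ['0x0', '0x10', '0x20', '0x30']
```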

This doesn't mean that a segment actually starts every 16 bytes up and down throughout that megabyte of memory. A segment is like a shelf in one of those modern adjustable bookcases. On the back face of the bookcase are a great many little slots spaced one-half inch apart. A shelf bracket can be inserted into any of the little slots. However, there aren't hundreds of shelves, but only four or five. Nearly all of the slots are empty and unused. They exist so that a much smaller number of shelves may be adjusted up and down the height of the bookcase as needed.

In a very similar manner, paragraph boundaries are little slots at which a segment may begin. An assembly language program may make use of only four or five segments, but each of those segments may begin at any of the 65,536 paragraph boundaries existing in the megabyte of memory available in the real mode segmented model.

There's that number again: 65,536, our beloved 64K. There are 64K different paragraph boundaries where a segment may begin. Each paragraph boundary has a number. As always, the numbers begin at 0 and go to 64K minus one: 65,535 in decimal, or 0FFFFH in hex. Because a segment may begin at any paragraph boundary, the number of the paragraph boundary at which a segment begins is called the segment address of that particular segment. In fact, we rarely speak of paragraphs or paragraph boundaries at all. When you see the term segment address, keep in mind that each segment address is 16 bytes (one paragraph) farther along in memory than the segment address before it. See Figure 6.4.
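Because a segment address simply numbers paragraphs, turning it into the memory address where that segment starts is a single multiplication by 16. A minimal Python sketch, using a hypothetical helper named `segment_start`:

```python
PARAGRAPH = 16  # bytes per paragraph


def segment_start(segment_address: int) -> int:
    """Memory address at which a given segment address (paragraph number) begins."""
    if not 0 <= segment_address <= 0xFFFF:
        raise ValueError("segment addresses are 16-bit values, 0 through 0FFFFH")
    return segment_address * PARAGRAPH


segment_addresses = range(0x10000)   # every possible segment address
print(len(segment_addresses))        # 65536
print(hex(segment_start(0x0001)))    # 0x10
print(hex(segment_start(0xFFFF)))    # 0xffff0, the last paragraph boundary in the megabyte
```

The last segment address, 0FFFFH, starts 16 bytes short of the 1MB mark, which is why 65,536 slots spaced 16 bytes apart exactly cover the megabyte.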

Figure 6.4: Memory addresses versus segment addresses.

In short, segments may begin at any segment address. There are 65,536 segment addresses evenly distributed across real mode's full megabyte of memory, 16 bytes apart. A segment address is more a permission than a compulsion; for all the 64K possible segment addresses, only five or six are ever actually used to begin segments at any one time. Think of segment addresses as slots where segments may be placed.

So much for segment addresses; now, what of segments themselves? The most important thing to understand is that a segment may be up to 64K bytes in size, but it doesn't have to be. A segment may be only 1 byte long, or 256 bytes long, or 21,378 bytes long, or any length at all up to 64K bytes.

A Horizon, Not a Place

You define a segment primarily by stating where it begins. What, then, defines how long a segment is? Nothing, really, and here we get into some tricky semantics. A segment is more a horizon than a place. Once you define where a segment begins, that segment can encompass any location in memory between that starting place and the horizon, which is 65,536 bytes down the line.
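The "horizon" idea amounts to a range check. This sketch uses hypothetical helpers, `segment_horizon` and `in_segment`, and assumes only what the text says: a segment starts at its segment address times 16 and can reach 65,536 bytes, so its last reachable byte lies 0FFFFH bytes beyond its start. It deliberately ignores segments that begin so close to the top of the megabyte that their horizon would fall past it.

```python
SEGMENT_SPAN = 0x10000  # 65,536 bytes: everything a 16-bit offset can reach


def segment_horizon(segment_address: int) -> tuple[int, int]:
    """Return the first and last memory address reachable through one segment address."""
    start = segment_address * 16
    return start, start + SEGMENT_SPAN - 1


def in_segment(address: int, segment_address: int) -> bool:
    """True if the address lies between the segment's start and its horizon."""
    first, last = segment_horizon(segment_address)
    return first <= address <= last


first, last = segment_horizon(0x0B00)
print(hex(first), hex(last))        # 0xb000 0x1afff
print(in_segment(0x1B000, 0x0B00))  # False: just past the horizon
```

Nothing in this check marks bytes as belonging exclusively to one segment: `in_segment(0x1AFFF, 0x0B01)` is also True, which is exactly the overlap described in the next paragraph.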

Nothing says, of course, that a segment must use all of that memory. In most cases, when you define a segment to exist at some segment address, you end up treating only the next few hundred bytes as part of that segment, at least until you start writing some truly world-class programs. Most beginners read about segments and think of them as some kind of memory allocation: a protected region of memory with walls on both sides, reserved for some specific use.

This is about as far from true as you can get. In real mode nothing is protected within a segment, and segments are not reserved for any specific register or access method. Segments can overlap. (People often don't think about or realize this.) In a very real sense, segments don't really exist, except as horizons beyond which a certain type of memory reference cannot go. It comes back to that set of 64K blinders that the CPU wears, as I drew in Figure 6.3. I think of it this way: A segment is the location in memory at which the CPU's 64K blinders are positioned. In looking at memory through the blinders, you can see bytes starting at the segment address and going on until the blinders cut you off, 64K bytes down the way.

The key to understanding this admittedly metaphysical definition of a segment is knowing how segments are used. And coming to understand that finally brings us to the subject of registers.

Making 20-Bit Addresses out of 16-Bit Registers

A register, as I've hinted before, is a memory location inside the CPU chip rather than outside the CPU in a memory bank somewhere. The 8088, 8086, and 80286 are often called 16-bit CPUs because their internal registers are almost all 16 bits in size. The 80386 and its successors are called 32-bit CPUs because most of their internal registers are 32 bits in size. The x86 CPUs have a fair number of registers, and they are an interesting crew indeed.

Registers do many jobs, but one of their more important jobs is holding addresses of important locations in memory. If you'll recall, the 8086 and 8088 have 20 address pins, and their megabyte of memory (which is the real mode segmented memory we're talking about) requires addresses 20 bits in size.
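The size mismatch is quick to check with a little arithmetic. A minimal sketch (the constant names are ours, chosen for illustration):

```python
ADDRESS_PINS = 20   # the 8086 and 8088 have 20 address pins
REGISTER_BITS = 16  # most of their registers are 16 bits wide

addressable_bytes = 2 ** ADDRESS_PINS            # bytes in the real mode megabyte
largest_address = addressable_bytes - 1          # 0FFFFFH
largest_register_value = 2 ** REGISTER_BITS - 1  # 0FFFFH

print(f"{addressable_bytes:,}")                  # 1,048,576
print(largest_address.bit_length())              # 20
print(largest_address > largest_register_value)  # True: one register is too small
```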

How do you put a 20-bit memory address in a 16-bit register?

Easy. You don't.

You put a 20-bit address in two 16-bit registers.

What happens is this: All memory locations in real mode's megabyte of memory have not one address but two. Every byte in memory is assumed to reside in a segment. A byte's complete address, then, consists of the address of its segment, along with the distance of the byte from the start of that segment. The address of the segment is (as we said before) the byte's segment address. The byte's distance from the start of the segment is the byte's offset address. Both addresses must be specified to completely describe any single byte's location within the full megabyte of real mode memory. When written out, the segment address comes first, followed by the offset address. The two are separated with a colon. Segment:offset addresses are always written in hexadecimal. Make sure that the colon is there so that people know you're specifying an address and not just a couple of numbers!

I've drawn Figure 6.5 to help make this a little clearer. A byte of data we'll call MyByte exists in memory at the location marked. Its address is given as 0001:001D. This means that MyByte falls within segment 0001H and is located 001DH bytes from the start of that segment. Note that when two numbers are used to specify an address with a colon between them, you do not end each of the two numbers with an H for hexadecimal.
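The rule just described (segment address times 16, plus offset) can be sketched in Python as follows. `parse_seg_off` and `linear_address` are hypothetical helpers for illustration. The final mask to 20 bits reflects the fact that the 8086 and 8088 have only 20 address lines, so a sum that runs past the megabyte wraps around to the bottom of memory; that detail goes slightly beyond the text and is included only so the function never returns an address the CPU could not put on its pins.

```python
def parse_seg_off(text: str) -> tuple[int, int]:
    """Split a 'segment:offset' string such as '0001:001D' into two integers.
    Both halves are read as hex; no trailing H is expected."""
    seg_text, off_text = text.split(":")
    return int(seg_text, 16), int(off_text, 16)


def linear_address(segment: int, offset: int) -> int:
    """Combine a 16-bit segment address and a 16-bit offset into a 20-bit address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # Shifting left 4 bits multiplies the paragraph number by 16.
    # Only 20 address lines exist, so keep just the low 20 bits of the sum.
    return ((segment << 4) + offset) & 0xFFFFF


seg, off = parse_seg_off("0001:001D")  # MyByte
print(hex(linear_address(seg, off)))   # 0x2d
```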

Figure 6.5: Segments and offsets.

You can omit leading zeros if you like; that is, instead of saying 00B2:0004 you could write 0B2:4. (The leading zero is retained in front of the B in keeping with assembly language policy of never allowing a hex number to begin with the hex digits A through F.) As a good rule of thumb, however, I recommend using all four hex digits in both components of the address except when all four digits are zeros. In other words, you can abbreviate 0000:0061 to 0:0061 or 0B00:0000 to 0B00:0.
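The rule of thumb for writing addresses can be captured in a small formatting function. `format_seg_off` is a hypothetical name; the function pads each half to four hex digits and shortens a half to a lone 0 only when that half is all zeros.

```python
def format_seg_off(segment: int, offset: int) -> str:
    """Format a segment:offset pair per the rule of thumb above: four hex digits
    per half, shortened to a single 0 only when that half is all zeros."""

    def half(value: int) -> str:
        return "0" if value == 0 else f"{value:04X}"

    return f"{half(segment)}:{half(offset)}"


print(format_seg_off(0x0000, 0x0061))  # 0:0061
print(format_seg_off(0x0B00, 0x0000))  # 0B00:0
print(format_seg_off(0x00B2, 0x0004))  # 00B2:0004
```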

The universe is perverse, however, and clever eyes will perceive that MyByte can have two other perfectly legal addresses: 0:002D and 0002:000D. How so? Keep in mind that a segment may start every 16 bytes throughout the full megabyte of real memory. A segment, once begun, embraces all bytes from its origin to 65,535 bytes further up in memory. There's nothing wrong with segments overlapping, and in Figure 6.5 we have three overlapping segments. MyByte is 2DH bytes into the first segment, which begins at segment address 0000H. MyByte is 1DH bytes into the second segment, which begins at segment address 0001H. And MyByte is 0DH bytes into the third segment, which begins at segment address 0002H. It's not that MyByte is in two or three places at once. It's in only one place, but that one place may be described in any of three ways.
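Listing every segment:offset pair that reaches a given byte shows MyByte's three addresses directly. In this sketch, `aliases` is a hypothetical helper: it tries each segment address at or below the byte and keeps those whose offset still fits in 16 bits. It ignores any extra aliases that wrap-around at the top of the megabyte could produce.

```python
def aliases(linear: int) -> list[tuple[int, int]]:
    """Every (segment, offset) pair with segment*16 + offset == linear and both halves 16-bit.
    Wrap-around past the top of the megabyte is ignored."""
    pairs = []
    for seg in range(min(linear // 16, 0xFFFF) + 1):
        off = linear - seg * 16  # never negative, since seg <= linear // 16
        if off <= 0xFFFF:        # the offset must fit in a 16-bit register
            pairs.append((seg, off))
    return pairs


MY_BYTE = 0x2D  # MyByte's location, from 0001:001D
for seg, off in aliases(MY_BYTE):
    print(f"{seg:04X}:{off:04X}")
# 0000:002D
# 0001:001D
# 0002:000D
```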

It's a little like Chicago's street-numbering system. Howard Street is 76 blocks north of Chicago's "origin," Madison Street. Howard Street is, however, only 4 blocks north of Touhy Avenue. You can describe Howard Street's location relative to either Madison Street or Touhy Avenue, depending on what you want to do.

An arbitrary byte somewhere in the middle of real mode's megabyte of memory may fall within literally thousands of different segments: as many as 4,096, because a 64K segment spans 4,096 paragraphs. Which segment the byte is actually in is strictly a matter of convention.
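The ceiling on that number follows from the sizes involved: a segment spans 65,536 bytes, which is 4,096 paragraphs, so at most 4,096 segment addresses can reach any one byte. A brute-force Python check, using a hypothetical helper `containing_segment_count` and again ignoring wrap-around at the top of the megabyte:

```python
def containing_segment_count(linear: int) -> int:
    """Count segment addresses s with s*16 <= linear <= s*16 + 0FFFFH."""
    return sum(1 for seg in range(0x10000) if 0 <= linear - seg * 16 <= 0xFFFF)


print(containing_segment_count(0x0002D))  # 3: MyByte, near the bottom of memory
print(containing_segment_count(0x80000))  # 4096: the middle of the megabyte
```

Bytes within the first 64K fall into fewer segments simply because there are fewer segment addresses below them.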

This is a real-life problem for programmers of the IBM PC. The PC keeps its time and date information in a series of memory bytes that starts at address 0040:006C. There is also a series of memory bytes containing PC timer information located at 0000:046C. You guessed it: we're talking about exactly the same starting byte. Different writers speaking of that same byte may give its address in either of those two ways, and they'll all be completely correct.
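Both spellings of that address reduce to the same 20-bit location, 46CH. A quick Python check, where `linear_address` is our own illustrative helper applying the segment-times-16-plus-offset rule:

```python
def linear_address(segment: int, offset: int) -> int:
    """Segment address times 16, plus offset."""
    return segment * 16 + offset


as_0040 = linear_address(0x0040, 0x006C)  # 0040:006C
as_0000 = linear_address(0x0000, 0x046C)  # 0000:046C
print(hex(as_0040), hex(as_0000))         # 0x46c 0x46c
print(as_0040 == as_0000)                 # True
```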

The way, then, to express a 20-bit address in two 16-bit registers is to put the segment address into one 16-bit register, and the offset address into another 16-bit register. The two registers taken together identify 1 byte among all 1,048,576 bytes in real mode's megabyte of memory.
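Going the other way, from a 20-bit address to the two 16-bit register values, requires choosing a segment, since many would do. The sketch below (hypothetical `split_address`) either uses a segment you supply or, by default, picks the largest segment address at or below the byte, which leaves an offset between 0 and 15. That default is simply one convenient convention assumed here, not a rule from the text.

```python
from typing import Optional, Tuple


def split_address(linear: int, segment: Optional[int] = None) -> Tuple[int, int]:
    """Return (segment, offset) values that together identify the given 20-bit address."""
    if not 0 <= linear <= 0xFFFFF:
        raise ValueError("real mode addresses are 20 bits (0 through 0FFFFFH)")
    if segment is None:
        # Default convention: the top 16 bits become the segment, the low 4 bits the offset.
        return linear >> 4, linear & 0xF
    if not 0 <= segment <= 0xFFFF:
        raise ValueError("segment addresses are 16-bit values")
    offset = linear - segment * 16
    if not 0 <= offset <= 0xFFFF:
        raise ValueError("that byte lies outside the given segment")
    return segment, offset


print([hex(v) for v in split_address(0x46C)])          # ['0x46', '0xc']
print([hex(v) for v in split_address(0x46C, 0x0040)])  # ['0x40', '0x6c']
```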

Is this awkward? You bet. But it was the best we could do for a good many years.