Assembly Language Step by Step Programming with DOS and Linux 2nd Ed 2000.pdf

Assembling and Running EAT.ASM

To assemble and run EAT.ASM, we can load it into NASM-IDE, and then let NASM-IDE invoke NASM. That's how we're going to do it here. You should understand, however, that NASM-IDE is simply a "place to stand." NASM is what actually does the work of assembling the file.

Here's the sequence:

1. Run NASM-IDE.

2. Select the Open item from the File menu. (We would say this, in shorthand form, "Select File|Open.")

3. Highlight the name of file EAT.ASM, and click on the OK button. EAT.ASM will load and be displayed in a window. If EAT.ASM isn't in the same directory as NASM-IDE, you may have to navigate to the directory where EAT.ASM lives by clicking on directory names in the dialog box.

4. Select Assemble|Assemble. The Error window will appear in the lower half of the display, even if only to tell you, "No errors occurred."

5. Assuming no errors occurred, select Assemble|Run. The display will clear, and EAT's message will be displayed in the upper-left corner of the display. Beneath it you'll see DOS's message, "Press any key to continue…" Press any key, and the display will return to NASM-IDE, showing EAT.ASM.

Assembler and Interactive Development Environment

There it is: You've assembled and run an assembly language program. It's important at this point to ponder who's doing what on your system. If you read Chapter 5, you know that NASM-IDE is an interactive development environment (IDE) containing a source code editor and a few other tools. NASM-IDE is not the assembler. The assembler is called NASM, and NASM is a separate program that does not actually require NASM-IDE for its use. When you select Assemble | Assemble in NASM-IDE, the NASM-IDE program invokes the NASM assembler behind the scenes and passes the name of the program you're working on to NASM, which assembles it and writes the file EAT.COM to disk. Later, when you select Assemble | Run in NASM-IDE, the NASM-IDE program runs EAT.COM for you.

Technically, you don't need NASM-IDE. You can invoke the assembler yourself from the DOS command line, and you can of course run the generated EAT.COM file by naming it on the command line as well. NASM-IDE is there to save you time and let you make changes and reassemble your program quickly and with less keyboarding.

You should, however, understand what NASM-IDE is doing. One major thing it's doing for you is constructing a proper command line by which to invoke NASM. To treat EAT.ASM as a program written for real mode flat model and to generate EAT.COM from EAT.ASM, the following command line has to be used to invoke NASM:

C:\>NASM16 EAT.ASM -f bin -o EAT.COM

It's certainly easier just selecting Assemble | Assemble, no? Still, over time you should study the various command-line options that NASM supports so that you can begin to do more advanced things than NASM-IDE is capable of doing. They're all described in detail in NASM's documentation, which is present on the CD-ROM for this book.

This particular command line is fairly easy to explain:

1. NASM16 is the name of the version of NASM intended for use with real mode programs under DOS. On your disk it will be NASM16.EXE.

2. EAT.ASM is the name of the source code file you wish to assemble.

3. -f bin indicates the output format. There are many different types of files that NASM is capable of producing. The one we want is the .COM file, which is generated as a simple binary image of the generated machine-code bytes and any initialized data. The "bin" indicates "binary image." The other key thing about .COM files is the 0100H code origin, but that's handled in the source code, as I explained earlier.

4. -o EAT.COM is the name of the output file. You can call the generated code file anything you want. If you don't specify the name of the output file, NASM will just lop off the ".ASM" and call the file EAT. Unfortunately, the name "EAT" doesn't indicate to DOS that it's a runnable program, so DOS won't know what to do with it. That's why you have to specify the full output file name "EAT.COM" on the command line.

Later on in this book, we're going to invoke NASM from the command line to produce a type of file that NASM-IDE won't be able to tell NASM how to produce. Therefore, we'll have to do it ourselves.

What Happens When a .COM File Runs

It's often useful to know just what happens when you run a program of your own creation. DOS treats its two kinds of executable programs a little differently when it runs them. .COM files are the simpler of the two. (I speak of .EXE files a little later in this chapter.) .COM files are a simple image of the instructions and data assembled out of the source code file. When you execute a .COM program from the DOS command line, here's what happens:

1. The .COM file is loaded into memory at a location of DOS's choosing. DOS doesn't change the file when it loads it; the file is loaded exactly as it was saved to disk by the assembler.

2. AX, BX, DX, BP, SI, and DI are set to 0.

3. The instruction pointer IP is set to 0100H.

4. The number of bytes loaded from disk and stored into memory is placed in the CX register.

5. The stack pointer is set to the highest address in the program's segment, minus one.

6. All four segment registers CS, SS, DS, and ES are set to the same value: the segment address of the single segment in which the .COM program must run. DOS chooses this value.

7. DOS transfers control to the instruction at CS:IP, and your program is off and running!

You can see this very plainly by loading EAT.COM with DEBUG. Here's a dump of the registers immediately after loading EAT.COM into memory:

-r
AX=0000  BX=0000  CX=001C  DX=0000  SP=FFFE  BP=0000  SI=0000  DI=0000
DS=1470  ES=1470  SS=1470  CS=1470  IP=0100   NV UP EI PL NZ NA PO NC
1470:0100 BA0C01        MOV     DX,010C
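For comparison with that dump, here is a minimal sketch of a flat-model source file that would produce the same picture. It is reconstructed from the numbers DEBUG shows (a 001CH-byte file whose first instruction is MOV DX,010C) and is not the book's own EAT.ASM listing; the layout and the label name eatmsg are assumptions.

```nasm
; Sketch only: a real mode flat model program consistent with the dump above.
[BITS 16]                   ; Set 16 bit code generation
[ORG 0100H]                 ; .COM code origin: DOS sets IP to 0100H

        mov  dx,eatmsg      ; 3 bytes (BA 0C 01): offset of the message
        mov  ah,9           ; 2 bytes: DOS function 9 displays text
        int  21H            ; 2 bytes: make the call into DOS
        mov  ax,04C00H      ; 3 bytes: DOS function 4CH exits the program
        int  21H            ; 2 bytes: 12 code bytes so far, so data starts at 010CH

eatmsg  db "Eat at Joe's!", 13, 10, "$"   ; 16 bytes of message data
```

Twelve bytes of code plus sixteen bytes of message make 28 bytes, which is the 001CH that DOS leaves in CX, and the message lands at offset 010CH, which is the value the first instruction loads into DX.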

You'll sometimes hear the real mode flat model referred to as the Tiny model. This is a term that originated in the C programming community, which has separate names for several different arrangements of code and data, depending on whether there is a single segment for code and data or multiple segments.

The real mode flat model is simplicity itself—so simple, in fact, that it doesn't teach you much about segments. Maybe you don't need to know that much about segments to craft useful programs these days (especially in protected mode flat model), but I've found it very useful to know just how our CPUs evolved, and segments are a big part of that. So, that said, let's look at EAT.ASM crafted for the real mode segmented model.

One Program, Three Segments

The main problem with real mode flat model is that everything you do must fit into 64K of memory. This isn't much of a pinch for learning assembly language and just playing around writing small utilities, but once you try to create something ambitious (say, a word processor or a database-driven e-mail client) you find that code and data begin to crowd one another in a big hurry. So, for all its trouble, real mode segmented model was the only way to make full use of real mode's megabyte of memory.

Today, of course, you'd either create a Windows application (which you would probably not attempt in assembly) or you'd work in protected mode flat model under an implementation of Unix for the Intel x86 CPUs. Nonetheless, if you understand segments, you have it in yourself to understand every other aspect of assembly work.

Let's do the Land Shark HyperBike trick again, this time with a version of EAT.ASM specifically written to use the real mode segmented model. Here's the bike, and then we'll take it apart just like last time:

; Source name     : EATSEG.ASM
; Executable name : EATSEG.EXE
; Code model      : Real mode segmented model
; Version         : 1.0
; Created date    : 9/10/1999
; Last update     : 9/10/1999
; Author          : Jeff Duntemann
; Description     : A simple example of a DOS .EXE file programmed for
;                   real mode segmented model, using NASM-IDE 1.1,
;                   NASM 0.98, and ALINK. This program demonstrates
;                   how segments are defined and initialized using NASM.

[BITS 16]                   ; Set 16 bit code generation

SEGMENT code                ; Segment containing code

..start:                    ; The two dots tell the linker to Start Here.
                            ; Note that this is a special symbol and MUST
                            ; be in lower case! "..start:" is not "..START:"

; SEGMENT SETUP
;
; In real mode segmented model, a program uses three segments, and it must
; set up the addresses in the three corresponding segment registers. This
; is what the ASSUME directive does in MASM; we ASSUME nothing in NASM!
; Each of the three segments has a name (here, code, data, and stack) and
; these names are identifiers indicating segment addresses. It is the
; appropriate segment address that is moved into each segment register.
; Note that you can't move an address directly into a segment register;
; you must first move the address into a general purpose register. Also
; note that we don't do anything with CS; the ..start: label tells the
; linker where the code segment begins.

        mov  ax,data        ; Move segment address of data segment into AX
        mov  ds,ax          ; Copy address from AX into DS
        mov  ax,stack       ; Move segment address of stack segment into AX
        mov  ss,ax          ; Copy address from AX into SS
        mov  sp,stacktop    ; Point SP to the top of the stack

        mov  dx,eatmsg      ; Mem data ref without [] loads the ADDRESS!
        mov  ah,9           ; Function 9 displays text to standard output.
        int  21H            ; INT 21H makes the call into DOS.

        mov  ax,04C00H      ; This DOS function exits the program
        int  21H            ; and returns control to DOS.

SEGMENT data                ; Segment containing initialized data

eatmsg  db "Eat at Joe's!", 13, 10, "$"   ; Here's our message

SEGMENT stack stack         ; This means a segment of *type* "stack"
                            ; that is also *named* "stack"! Some
                            ; linkers demand that a stack segment
                            ; have the explicit type "stack"

        resb 64             ; Reserve 64 bytes for the program stack
stacktop:                   ; It's significant that this label points to
                            ; the *end* of the reserved 64 bytes (just
                            ; past the last one), and not the first!

Three Segments

Assembly language programs written for real mode segmented model must contain at least three segments: One for code, one for data, and one for the stack. Larger programs may contain more than one code segment and more than one data segment, but real mode programs may contain only one stack segment at a time.

EATSEG.ASM has those three necessary segments. Each segment has a name: stack, data, and code, which indicate pretty clearly what the segment is for. The code segment, pretty obviously, contains the machine instructions that do the program's work. The data segment contains initialized variables.

The stack segment contains the program's stack. I haven't explained stacks just yet, and because you don't really need to understand stacks in order to understand how EATSEG.ASM works, I'm going to hold off just a little while longer. In short, a stack is simply an ordered place to stash things for the short term, and that will have to do until we cover the concept in depth in the next section.

Each of the three segments is declared using the SEGMENT directive, which is a command that tells NASM that a segment begins here. The SEGMENT directive must be followed by the segment's name. You can name the segments whatever you like, but custom suggests that when you have only three segments, they be called stack, data, and code. Why obscure the meaning of what you're writing?

The segment containing the stack has some special considerations attached to it, especially regarding the linking of several files together into one executable program. One of these considerations is that the stack have the type "stack" attached to it. This tells the linker (as I explain later) that this particular segment is special: it's a stack segment and not just a data segment. Hence the line:

SEGMENT stack stack

Nobody's stuttering here. The SEGMENT directive is creating a segment named "stack" that is of the type "stack." The first identifier is the name; the second is the type. You could change the name of the segment to MyStack or GreasedPig if you like, but it's important to let the type of the stack segment be precisely stack. More on this after we explain something else.

Don't ASSUME…

If you remember, in the real mode flat model, the operating system sets all four segment registers to the same value (one that it selects) when the program is loaded into memory and run. In the real mode segmented model, the different segments are indeed different and distinct regions of memory and are not all the same place. When the program begins running, you can't count on DOS having pointed DS at your data segment. Your program must set up its segment registers on its own. (DOS does, of course, set CS to the start of the code segment before giving control to your program. DS and ES arrive pointing at DOS's program segment prefix rather than at any segment you defined.)

This is what the first part of EATSEG.ASM does: It takes the addresses represented by the segment names for the data and stack segments and loads them into DS and SS, the segment registers governing those segments:

        mov  ax,data        ; Move segment address of data segment into AX
        mov  ds,ax          ; Copy address from AX into DS
        mov  ax,stack       ; Move segment address of stack segment into AX
        mov  ss,ax          ; Copy address from AX into SS

Keep in mind that you can't load a segment register with immediate data. The value has to come from a general-purpose register (or from a memory location or the stack), and a segment name such as data is an immediate value as far as the MOV instruction is concerned. This is why the segment addresses have to pass through AX to get into DS and SS. (Because we're not using ES to govern a segment defined at assembly time right there in our program, we don't need to load ES with anything right off the bat.)
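The rule about immediate values can be seen in a small sketch. It assumes the same obj-format setup as EATSEG.ASM. The commented-out line shows the form that NASM will refuse to assemble, and loading ES here is purely illustrative, since EATSEG.ASM itself never uses ES. The label filler is a hypothetical placeholder.

```nasm
[BITS 16]                   ; Set 16 bit code generation

SEGMENT code

..start:
        mov  ax,data        ; A segment name is an immediate value, so it goes into AX first
        mov  ds,ax          ; ...and is then copied from AX into DS
        mov  es,ax          ; The same value can be copied into ES too (illustrative only)
;       mov  ds,data        ; Will not assemble: there is no immediate-to-segment-register MOV

        mov  ax,stack       ; Same two-step dance for the stack segment
        mov  ss,ax
        mov  sp,stacktop    ; Point SP to the top of the stack

        mov  ax,04C00H      ; Exit to DOS
        int  21H

SEGMENT data

filler  db 0                ; Hypothetical placeholder so the data segment isn't empty

SEGMENT stack stack

        resb 64             ; Reserve 64 bytes for the program stack
stacktop:
```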

This is a good place to point out a crucial difference between NASM (the assembler that we're using in this book) and Microsoft's extremely popular MASM, which is probably the most-used assembler in history: MASM attempts to associate segment names with segment types. NASM does not.

With one small exception done as a courtesy to the linker, NASM does not know which segment is the code segment, nor which segment is the data segment, nor which segment is the stack segment. You define a segment by name:

SEGMENT data

; Segment containing initialized data

The name "data," however, tells only you, the programmer, that it's the data segment. The assembler doesn't look for the string "data" and note somewhere that the segment named data is the data segment. This is why you could change the preceding line to this:

SEGMENT GreasedPig ; Segment containing initialized data

Nothing would change. GreasedPig is an odd name for a segment, but a completely legal one.

In MASM, Microsoft defines the ASSUME directive, which associates segment names with segment registers. This allows MASM to generate segment prefixes automatically when it creates the opcodes called out by a particular mnemonic in your source code. This is a tricky and subtle business, so to make this clearer, imagine a memory variable defined in a segment that is addressed via ES:

SEGMENT JunkSegment

JunkChunk DW 0FFA7H

At the beginning of the program, you have to make sure ES is loaded with the segment address of JunkSegment:

MOV AX, JunkSegment ; Load segment address of JunkSegment into ES via AX

MOV ES, AX

Ordinarily, using NASM, you have to specify when a piece of memory data is located relative to the ES register, because the default is DS:

MOV AX,[ES:JunkChunk] ; Move word variable JunkChunk from JunkSegment (ES) into AX

That's the NASM way. Using Microsoft's MASM, you can associate a segment name with ES using the ASSUME directive:

ASSUME ES:JunkSegment

Having associated ES and JunkSegment this way, you could now write the MOV instruction without explicitly including the ES: segment prefix:

MOV AX,[JunkChunk] ; Move word variable JunkChunk from JunkSegment (ES) into AX

Thanks to ASSUME, MASM knows that the variable JunkChunk is located in extra segment ES, so it inserts the ES: prefix behind the scenes as it generates the opcode for this mnemonic. Many of us (NASM's authors included) don't think this is a particularly good idea. It makes the source code less specific and hence less readable: a person not familiar with the program might assume (heh-heh) that JunkChunk is in the data segment associated with DS because there's no ES: prefix and DS is the default for memory variable references like that.

So, NASM has nothing like ASSUME. When you move away from the default addressing of memory variables relative to DS, you must include the segment register prefix inside the square brackets of all memory variable references!
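Pulling the JunkSegment fragments together, a complete program in the style of EATSEG.ASM might look like the following sketch. The variable Counter and the overall arrangement are assumptions made for illustration; only JunkSegment and JunkChunk come from the discussion above.

```nasm
[BITS 16]                   ; Set 16 bit code generation

SEGMENT code

..start:
        mov  ax,data        ; DS governs the ordinary data segment
        mov  ds,ax
        mov  ax,JunkSegment ; Load segment address of JunkSegment into ES via AX
        mov  es,ax
        mov  ax,stack
        mov  ss,ax
        mov  sp,stacktop    ; Point SP to the top of the stack

        mov  bx,[Counter]        ; No prefix: NASM addresses this relative to DS
        mov  ax,[ES:JunkChunk]   ; Explicit prefix: this one is read relative to ES

        mov  ax,04C00H      ; Exit to DOS
        int  21H

SEGMENT data

Counter   dw 0              ; Hypothetical variable in the DS-governed segment

SEGMENT JunkSegment

JunkChunk dw 0FFA7H         ; The variable reached through ES

SEGMENT stack stack

        resb 64             ; Reserve 64 bytes for the program stack
stacktop:
```

Reading the two MOV instructions side by side makes the point: nothing outside the square brackets decides which segment register is used, so the source says exactly what the CPU will do.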

Naming the Stack Segment

The exception I noted earlier is that NASM allows you to say which segment is the stack segment:

SEGMENT MyStack stack

Here, MyStack is the name of the segment (which can be any legal identifier) and stack is the type. This is not for NASM's benefit; it will not take any action of its own based on knowing that the segment named MyStack is in fact the stack segment. But some linkers need to know that there is a stack segment defined in the program. Stack segments are special as segments go, at least in part because (kind of like Tigger) there can be only one, but there must be one! Some linkers check to see whether there is a segment in a program designated as the stack segment, and to keep such linkers quiet NASM allows you to give the stack type to a segment defined with SEGMENT.

This is a good idea and I recommend that you do it.

Choosing a Starting Point

There are no jumps, loops, or subroutines in EATSEG.ASM. If you've a smattering of assembly language smarts you may wonder if the ..start: label at the beginning of the code segment is unnecessary except for readability purposes. After all, ..start is not referenced anywhere within the program.

On the other hand, code execution has to begin somewhere, and you need to tell the assembler (and especially the linker) where code execution must begin. This is the purpose of the ..start: label.

The issue is this: DOS needs to know at what address to begin execution when it loads and runs the program. (DOS sets code segment register CS when it loads your program into memory prior to executing it.) You might think DOS could assume that execution would begin at the start of the code segment, but there may be more than one code segment, and under most circumstances the programmer does not specify the order of multiple code segments within a single program. (The linker has the power to rearrange multiple code segments for reasons that I can't explain in this book.) Better to have no doubt about it, and for that reason you the programmer should pick a starting point and tell the assembler what it is.

You may notice that leaving out ..start: won't keep NASM from assembling a program, and while the linker will complain about the lack of a starting point, the linker will default to starting execution at the beginning of the code segment, which in our case is the only code segment, so there's no ambiguity there.

Nonetheless, it's bad practice to leave out the starting point label.
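To see why the label earns its keep, consider a sketch in which the first thing in the code segment is not the place where execution should begin. The showmsg subroutine is hypothetical and gets ahead of the book slightly (subroutines and the stack are covered later); the rest follows EATSEG.ASM.

```nasm
[BITS 16]                   ; Set 16 bit code generation

SEGMENT code

showmsg:                    ; Hypothetical helper sitting at offset 0 of the segment
        mov  ah,9           ; Function 9 displays the text at DS:DX
        int  21H
        ret                 ; Return to whoever called us

..start:                    ; Execution must begin HERE, not at showmsg
        mov  ax,data
        mov  ds,ax
        mov  ax,stack
        mov  ss,ax
        mov  sp,stacktop    ; Point SP to the top of the stack

        mov  dx,eatmsg      ; Offset of the message
        call showmsg        ; Display it via the helper

        mov  ax,04C00H      ; Exit to DOS
        int  21H

SEGMENT data

eatmsg  db "Eat at Joe's!", 13, 10, "$"

SEGMENT stack stack

        resb 64             ; Reserve 64 bytes for the program stack
stacktop:
```

If execution began at the start of the code segment instead, the program would run showmsg before DS and DX had been set up, and its RET would have no return address placed on the stack by a CALL.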

Assembling and Linking EATSEG.ASM

Although NASM can generate a .COM file (for a real mode flat model program) directly, it can't generate a .EXE file for a real mode segmented model program in the same way. Once you move away from a single segment in real mode flat model, NASM needs the help of a linker to generate the final .EXE file.

I've obtained permission to distribute an excellent free linker with this book's CD-ROM. The linker is ALINK, written by Anthony Williams. It's on the CD-ROM, and if you copied the executable file ALINK.EXE to your hard drive along with everything else, you can invoke it simply by naming it.

NASM-IDE was intended for writing programs in real mode flat model, so it relies exclusively on NASM and does not have any machinery for invoking a linker. That means that NASM-IDE won't be able to do the assemble and link tasks for us. It's time to face the fiendish command line.

If you're working from DOS you can simply assemble and link from the DOS command line. If you're working with NASM-IDE in a DOS box under Windows, it's probably easier to "shell out" to the DOS command line from inside NASM-IDE. This is done by selecting the menu item File|DOS Shell. You will see NASM-IDE vanish and be replaced by a blank screen with the DOS prompt. When you're done with the DOS shell, type EXIT followed by Enter to return to NASM-IDE.

Assembling EATSEG.ASM is done with the following command line:

C:\>NASM16 EATSEG.ASM -f obj -o EATSEG.OBJ

This command line will assemble EATSEG.ASM to the file EATSEG.OBJ, in the standard .OBJ linkable file format. Linking is even easier:

C:\>ALINK EATSEG.OBJ

Here, ALINK will convert EATSEG.OBJ into EATSEG.EXE. I explain more about linkers and what they do in the next chapter. Here, ALINK is acting more as a file format converter than anything else, since there's only one file to be linked. Later on, we'll see how ALINK can connect multiple .OBJ files into a single executable file.

After ALINK runs, you'll have the file EATSEG.EXE on your hard disk. That's the file that you can name at the DOS command line to run EATSEG.