Assembly Language Step by Step Programming with DOS and Linux 2nd Ed 2000.pdf

The Assembly Language Development Process

As you can see, there are a lot of different file types and a fair number of programs involved in the cycle of writing, assembling, and testing an assembly language program. The cycle itself sounds more complex than it is. I've drawn you a map to help you keep your bearings during the discussions in this chapter. Figure 4.5 shows the most complex form of the assembly language development process in a "view from a height." At first glance it may look like a map of the LA freeway system, but in reality the flow is fairly straightforward. And NASM allows you to remove a certain amount of the complexity (the separate linker operation) for simple, single-source-file programs like those you'll write while learning your way around the instruction set. Finally, NASM-IDE helps even further by invoking some of the utilities automatically, so you're not constantly hammering on the keyboard.

Figure 4.5: The assembly language development process.

Nonetheless, if you pursue professional-level assembly language programming, this is the map you'll need to follow. Let's take a quick tour.

Assembling the Source Code File

The text editor first creates a new text file, and later changes that same text file, as you extend, modify, and perfect your assembly language program. As a convention, most assembly language source code files are given a file extension of .ASM. In other words, for the program named FOO, the assembly language source code file would be named FOO.ASM.

It is possible to use file extensions other than .ASM, but I feel that using the .ASM extension can eliminate some confusion by allowing you to tell at a glance what a file is for, just by looking at its name. All told, about nine different kinds of files can be involved during assembly language development, and more if you take the horrendous leap into Windows software development. (We're only going to speak of four or five in this book.) Each type of file will have its own standard file extension. Anything that will help you keep all that complexity in line will be worth the (admittedly) rigid confines of a standard naming convention.

As you can see from the flow in Figure 4.5, the editor produces a source code text file, which we show as having the .ASM extension. This file is then passed to the assembler program itself, for translation to a relocatable object module file with an extension of .OBJ.

When you invoke the assembler, DOS will load the assembler from disk and run it. The assembler will open the source code file you named on the command line (after the name of the assembler itself) and begin processing the file. Almost immediately afterward, it will create an object file with the same name as the source file, but with an .OBJ extension.

As the assembler reads lines from the source code file, it will examine them, construct the binary machine instructions the source code lines represent, and then write those machine instructions to the object code file.

When the assembler comes to the end of the source code file, it will close both source code file and object code file and return control to DOS.

Assembler Errors

Note well: The previous paragraphs describe what happens if the .ASM file is correct. By correct, I mean that the file is completely comprehensible to the assembler and can be translated into machine instructions without the assembler getting confused. If the assembler encounters something it doesn't understand when it reads a line from the source code file, we call the misunderstood text an error, and the assembler displays an error message.

For example, the following line of assembly language will confuse the assembler and summon an error message:

MOV AX,VX

The reason is simple: There's no such thing as a "VX." What came out as "VX" was actually intended to be "BX," which is the name of a register. (The V key is right next to the B key and can be struck by mistake without your fingers necessarily knowing that they done wrong.)

Typos like this are by far the easiest kind of error to spot. Others that take some study to find involve transgressions of the assembler's many rules. For example:

MOV ES,0FF00H

This looks like it should be correct, since ES is a real register and 0FF00H is a real 16-bit quantity that will fit into ES. However, among the multitude of rules in the fine print of the 86-family assemblers is one that states you cannot move an immediate value (any number like 0FF00H) directly into a segment register like ES, DS, SS, or CS. It simply isn't part of the CPU's machinery to do that. Instead, you must first move the immediate value into a register like AX, and then move AX into ES.
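The two-step workaround just described can be sketched as follows. This is a minimal illustration of my own, not a listing from the book; AX is simply the general-purpose register used as the go-between.

```nasm
; Load ES with 0FF00h by way of a general-purpose register.
        mov     ax, 0FF00h      ; the immediate value goes into AX first
        mov     es, ax          ; then AX is copied into the segment register
```

Any 16-bit general-purpose register would serve as the intermediary; AX is only the conventional choice.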

You don't have to remember the details here; we'll go into the rules later on when we discuss the individual instructions. For now, simply understand that some things that look reasonable are simply against the rules for technical reasons and are considered an error.

There are much, much more difficult errors that involve inconsistencies between two otherwise legitimate lines of source code. I won't offer any examples here, but I wanted to point out that errors can be truly ugly, hidden things that can take a lot of study and torn hair to find. Toto, we are definitely not in Basic anymore...

The error messages vary from assembler to assembler, and they may not always be as helpful as you might hope. The error NASM displays upon encountering the "VX" typo follows:

testerr.asm:20: symbol 'vx' undefined

This is pretty plain, assuming you know what a "symbol" is. The error message NASM will present when you try to load an immediate value into ES is far less helpful:

testerr.asm:20: invalid combination of opcode and operands

It'll let you know you're guilty of performing illegal acts with an opcode and its operands, but that's it.

You have to know what's legal and what's illegal to really understand what you did wrong. As in running a stop sign, ignorance of the law is no excuse.

Assembler error messages do not absolve you from understanding the CPU's or the assembler's rules.

I hope I don't frighten you too terribly by warning you that for more abstruse errors, the error messages may be almost no help at all.

You may make (or will make; let's get real) more than one error in writing your source code files. The assembler will display more than one error message in such cases, but it may not necessarily display an error for every error present in the source code file. At some point, multiple errors confuse the assembler so thoroughly that it cannot necessarily tell right from wrong anymore. While it's true that the assembler reads and translates source code files line by line, there is a cumulative picture of the final assembly language program that is built up over the course of the whole assembly process. If this picture is shot too full of errors, in time the whole picture collapses.

The assembler will stop and return to DOS, having printed numerous error messages. Start at the first one and keep going. If the following ones don't make sense, fix the first one or two and assemble again.

Back to the Editor

The way to fix errors is to load the .ASM file back into your text editor and start hunting up the error. This loopback is shown in Figure 4.5.

The error message will almost always contain a line number. Move the cursor to that line number and start looking for the false and the fanciful. If you find the error immediately, fix it and start looking for the next.

Here's a little logistical snag: How do you make a list of the error messages on paper so that you don't have to memorize them or scribble them down on paper with a pencil? You may or may not be aware that you can redirect the assembler's error message displays to a DOS text file on disk.

It works like this: You invoke the assembler just as you normally would, but add the redirection operator ">" and the name of the text file to which you want the error messages sent. If you were assembling FOO.ASM with NASM and wanted your error messages written out to a disk file named ERRORS.TXT, you would invoke NASM this way:

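C:\ASM>NASM -s FOO.ASM > ERRORS.TXT

(I've omitted certain other command-line parameters for simplicity's sake. NASM does not assume an .ASM extension, so the full file name is given. The -s switch tells NASM to send its error messages to standard output, which is what the ">" operator redirects; without it, NASM writes them to standard error, and DOS redirection won't catch them.) Here, error messages will be sent to ERRORS.TXT in the current DOS directory C:\ASM. When you use redirection, the output does not display on the screen. The stream of text from NASM that you would ordinarily see is quite literally steered in its entirety to another place, the file ERRORS.TXT.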

Once the assembly process is done, the DOS prompt will appear again. You can then print the ERRORS.TXT file on your printer and have a handy summary of all that the assembler discovered was wrong with your source code file.

Note well that if you're using an interactive development environment like NASM-IDE (which is provided on this book's CD-ROM and described in detail in the next chapter), you won't have to bother with redirection to a file or to the printer. NASM-IDE and other development environments accumulate error messages in a separate window that you can keep on display while you go back and edit your .ASM file. This is a much more convenient way to work, and I powerfully recommend it.

Assembler Warnings

As taciturn a creature as an assembler may appear to be, it genuinely tries to help you any way it can. One way it tries to help is by displaying warning messages during the assembly process. These warning messages are a monumental puzzle to beginning assembly language programmers: Are they errors or aren't they? Can I ignore them or should I fool with the source code until they go away?

Alas, there's no clean answer. Sorry about that.

Warnings are the assembler acting as experienced consultant, and hinting that something in your source code is a little dicey. Now, in the nature of assembly language, you may fully intend that the source code be dicey. In an 86-family CPU, dicey code may be the only way to do something fast enough, or just to do it at all. The critical factor is that you had better know what you're doing. (And if you're reading this book, my guess is that you probably don't.)

The most common generator of warning messages is doing something that goes against the assembler's default conditions and assumptions. If you're a beginner doing ordinary, 100-percent-by-the-book sorts of things, you should crack your assembler reference manual and figure out why the assembler is tut-tutting you. Ignoring a warning may cause peculiar bugs to occur later on during program testing. Or, ignoring a warning message may have no undesirable consequences at all. I feel, however, that it's always better to know what's going on. Follow this rule:

Ignore an assembler warning message only if you know exactly what it means.

In other words, until you understand why you're getting a warning message, treat it as though it were an error message. Only once you fully understand why it's there and what it means should you try to make the decision whether to ignore it or not.

In summary: The first part of the assembly language development process (as shown in Figure 4.5) is a loop. You must edit your source code file, assemble it, and return to the editor to fix errors until the assembler spots no further errors. You cannot continue until the assembler gives your source code file a clean bill of health.

When no further errors are found, the assembler will write an .OBJ file to disk, and you will be ready to go on to the next step.

Linking

As I explain shortly, there's nothing to prevent an assembler from generating an executable program file (that is, an .EXE or .COM file) directly from your source code file. NASM can do this, and we'll take advantage of that shortcut while we're getting started. However, in traditional assembly language work, what actually happens is that the assembler writes an intermediate object code file with an .OBJ extension to disk. You can't run this .OBJ file, even though it generally contains all the machine instructions that your assembly language source code file specified. The .OBJ file needs to be processed by another translator program, the linker.

The linker performs a number of operations on the .OBJ file, most of which would be meaningless to you at this point. The most obvious task the linker does is to weave several .OBJ files into a single .EXE executable program file. Creating an assembly language program from multiple .ASM files is called modular assembly, and I explain how to do it (with an example) in Chapter 9.

Why create multiple .OBJ files when writing a single executable program? One of two major reasons is size. A middling assembly language application might be 50,000 lines long. Cutting that single monolithic .ASM file up into multiple 8,000-line .ASM files would make the individual .ASM files smaller and much easier to understand.

The other reason is to avoid assembling completed portions of the program every time any part of the program is assembled. One thing you'll be doing is writing assembly language procedures, which are small detours from the main run of steps and tests that can be taken from anywhere within the assembly language program. Once you write and perfect a procedure, you can tuck it away in an .ASM file with other completed procedures, assemble it, and then simply link the resulting .OBJ file with the .OBJ file produced from the working .ASM file. The alternative is to waste time by reassembling perfected source code over and over again every time you assemble the main portion of the program.
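As a rough sketch of what such a tucked-away module might look like in NASM (the book's own worked example comes in Chapter 9, and this is not it), the file below holds one finished procedure and exports its name with NASM's GLOBAL directive. The file name BAR.ASM, the procedure name PrintMsg, the segment name, and the command line in the comment are all assumptions made for illustration.

```nasm
; BAR.ASM - hypothetical library module holding one finished procedure.
; Assumed command line to produce BAR.OBJ: nasm -f obj bar.asm

        global  PrintMsg        ; make the name visible to the linker

segment code                    ; assumed to match the main module's code segment

; PrintMsg: display the '$'-terminated string at DS:DX using DOS.
PrintMsg:
        mov     ah, 09h         ; DOS function 09h: display string
        int     21h
        ret                     ; near return to the caller
```

The in-progress module would then declare the name with `extern PrintMsg` and use `call PrintMsg`; the linker connects the two when both .OBJ files are handed to it. The near call assumes both modules place their code in a segment of the same name, so that the linker combines them.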

This is shown in Figure 4.5. In the upper-right corner is a row of .OBJ files. These .OBJ files were assembled earlier from correct .ASM files, yielding binary disk files containing ready-to-go machine instructions. When the linker links the .OBJ file produced from your in-progress .ASM file, it adds in the previously assembled .OBJ files, which are called modules. The single .EXE file that the linker writes to disk contains the machine instructions from all of the .OBJ files handed to the linker when the linker is invoked.

Once the in-progress .ASM file is completed and made correct, its .OBJ module can be put up on the rack with the others and added to the next in-progress .ASM source code file. Little by little you construct your application program out of the modules you build one at a time.

A very important bonus is that some of the procedures in an .OBJ module may be used in a future assembly language program that hasn't even been begun yet. Creating such libraries of "toolkit" procedures can be an extraordinarily effective way to save time by reusing code over and over again, without even passing it through the assembler again!

Many traditional assemblers, such as MASM and TASM, require that you use a linker to process even a single small .OBJ file into an executable .COM or .EXE file. Connecting multiple modules is only one of several things the linker is capable of doing. More recent assemblers such as NASM can often take up some of the work that a linker normally does and allow you to create simple executable programs from single .ASM files. But keep in mind that to produce an executable .EXE file from multiple assembly language source code files, you must invoke the linker.
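To give a feel for the single-file shortcut, here is a minimal sketch of a complete program that NASM can turn straight into a .COM file with no linker involved. It is my own illustration rather than a listing from the book; the file name HELLO.ASM and the command line shown in the comment are assumptions.

```nasm
; HELLO.ASM - hypothetical single-file program, assembled directly to .COM.
; Assumed command line: nasm -f bin hello.asm -o hello.com

        org     100h            ; DOS loads .COM programs at offset 100h

start:
        mov     dx, msg         ; DS:DX -> '$'-terminated string
        mov     ah, 09h         ; DOS function 09h: display string
        int     21h
        mov     ax, 4C00h       ; DOS function 4Ch: terminate, return code 0
        int     21h

msg:    db      'Hello from a .COM file!', 0Dh, 0Ah, '$'
```

Because a .COM file is just a raw memory image, NASM's flat binary output format is enough here; nothing needs to be relocated or connected to other modules, so there is no work left for a linker to do.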

The linker I discuss in this book is called ALINK, and like NASM and NASM-IDE, it's a free utility created by a dedicated assembly programmer. Anthony Williams is its creator, and he was kind enough to allow me to redistribute ALINK on the CD-ROM for this book.

Invoking the linker is again done from the DOS command line. Linking multiple files involves naming each file on the command line. With ALINK, you simply name each .OBJ file on the command line after the word ALINK, with a space between each file name. You do not have to include the .OBJ extension; ALINK assumes that all modules to be linked end in .OBJ:

C:\ASM>ALINK FOO BAR BAS

There are many different options and commands that may be entered along with the file names, to do things slightly (or considerably) fancier. We use ALINK a lot more later on in this book, and at that point I'll explain its command syntax in greater detail.

Linker Errors

As with the assembler, the linker may discover problems as it weaves multiple .OBJ files together into a single .EXE file. Linker errors are subtler than assembler errors and are usually harder to find. Fortunately, they are rarer and not as easy to make.

As with assembler errors, when you are presented with a linker error you have to return to the editor and figure out what the problem is. Once you've identified the problem (or think you have) and changed something in the source code file to fix the problem, you must reassemble and relink the program to see if the linker error went away. Until it does, you have to loop back to the editor, try something else, and assemble/link once more.

If possible, avoid doing this by trial and error. Read your assembler and linker documentation. Understand what you're doing. The more you understand about what's going on within the assembler and the linker, the easier it will be to determine who or what is giving the linker fits.

Testing the .EXE File

If you receive no linker errors, the linker will create and fill a single .EXE file with the machine instructions present in all of the .OBJ files named on the linker command line. The .EXE file is your executable program. You can run it by simply naming it on the DOS command line and pressing Enter:

C:\ASM>FOO

When you invoke your program in this way, one of two things will happen: The program will work as you intended it to, or you'll be confronted with the effects of one or more program bugs. A bug is anything in a program that doesn't work the way you want it to. This makes a bug somewhat more subjective than an error. One person might think red characters displayed on a blue background is a bug, while another might consider it a clever New Age feature and be quite pleased. Settling bug versus feature conflicts like this is up to you. Consensus is called for here, with fistfights only as a last resort.

There are bugs and there are bugs. When working in assembly language, it's quite common for a bug to completely "blow the machine away," which is less violent than some think. A system crash is what you call it when the machine sits there mutely and will not respond to the keyboard. You may have to press Ctrl+Alt+Delete to reboot the system, or (worse) have to press the Reset button, or even power down and then power up again (that is, flip the power switch off, wait 10 seconds, and switch it on again). Be ready for this; it will happen to you, sooner and oftener than you will care for.

Figure 4.5 announces the exit of the assembly language development process as happening when your program works perfectly. A very serious question is this: How do you know when it works perfectly? Simple programs assembled while learning the language may be easy enough to test in a minute or two. But any program that accomplishes anything useful at all will take hours of testing at minimum. A serious and ambitious application could take weeks, or months, to test thoroughly. A program that takes various kinds of input values and produces various kinds of output should be tested with as many different combinations of input values as possible, and you should examine every possible output every time.

Even so, finding every last bug is considered by some to be an impossible ideal. Perhaps, but you should strive to come as close as possible, in as efficient a fashion as you can manage. I have a lot more to say about bugs and debugging throughout the rest of this book.

Errors versus Bugs

In the interest of keeping the Babel effect at bay, I think it's important to carefully draw the distinction between errors and bugs. An error is something wrong with your source code file that either the assembler or the linker kicks out as unacceptable. An error prevents the assembly or link process from going to completion and will thus prevent a final .EXE file from being produced.

A bug, by contrast, is a problem discovered during execution of a program under DOS. Bugs are not detected by either the assembler or the linker. Bugs can be benign, such as a misspelled word in a screen message or a line positioned on the wrong screen row; or a bug can make your DOS session run off into the bushes and not come back.

Both errors and bugs require that you go back to the text editor and change something in your source code file. The difference here is that most errors are reported with a line number telling you where to go in your source code file to fix the problem. Bugs, on the other hand, are left as an exercise for the student. You have to hunt them down, and neither the assembler nor the linker will give you much in the line of clues.

Debuggers and Debugging

The final, and almost certainly the most painful, part of the assembly language development process is debugging. Debugging is simply the systematic process by which bugs are located and corrected. A debugger is a utility program designed specifically to help you locate and identify bugs.

Debugger programs are among the most mysterious and difficult to understand of all programs. Debuggers are part X-ray machine and part magnifying glass. A debugger loads into memory with your program and remains in memory, side by side with your program. The debugger then puts tendrils down into both the operating system (for our purposes, DOS, and later Linux) and into your program and enables some truly peculiar things to be done.

One of the problems with debugging computer programs is that they operate so quickly. Thousands of machine instructions can be executed in a single second, and if one of those instructions isn't quite right, it's past and gone long before you can identify which one it is by staring at the screen. A debugger allows you to execute the machine instructions in a program one at a time, allowing you to pause indefinitely between each one to examine the effects of the last instruction on the screen. The debugger also lets you look at the contents of any location in memory, and the values stored in any register, during that pause between instructions.
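To make the idea of single-stepping concrete, here is a tiny sketch of my own (not a listing from the book) of the sort of program you might walk through one instruction at a time. The comments give the register contents you should expect to see in the debugger after each step; the layout as a .COM program is an assumption.

```nasm
; STEP.ASM - hypothetical program for practicing single-stepping.
        org     100h            ; .COM layout, assumed for this sketch

        mov     ax, 1234h       ; after this step: AX = 1234h
        mov     bx, ax          ; after this step: BX = 1234h
        add     ax, bx          ; after this step: AX = 2468h
        mov     ax, 4C00h       ; set up DOS function 4Ch (terminate)
        int     21h             ; return to DOS
```

Run at full speed, these five instructions are over before you could blink. Stepped through under a debugger, each one leaves its effect on AX and BX sitting still for inspection.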

Commercial assemblers such as MASM are generally packaged with their own advanced debuggers. MASM's CodeView is a brutally powerful (and hellishly complicated) creature that I don't recommend to beginners. For this reason, I won't try to explain how to use CodeView in this book.

Besides, CodeView is bundled with expensive Microsoft development tools and thus costs a fair amount of money. Very fortunately, every copy of DOS and Windows, irrespective of version, is shipped with a more limited but perfectly good debugger called DEBUG. DEBUG can do nearly anything that a beginner would want from a debugger, and in this book we'll do all our DOS debugging with DEBUG.

Because DEBUG is included with your operating system, it's not one of the provided tools on the CD-ROM included with this book. And because its location has changed from version to version of DOS and Windows, I recommend looking around in your system directories until you locate it. In older versions of DOS it's called DEBUG.COM. In newer versions of DOS and all versions of Windows, it's DEBUG.EXE.

I've found that on most systems, DEBUG is already on your path, and you can invoke it from any directory you happen to be in. Try invoking DEBUG before you suffer too much looking for it. If a dash character ("-") prompt appears, DEBUG is on your path and you don't need to know precisely where it is. (Type a Q to quit DEBUG.)