Assembly Language Step by Step 1992

file may be divided (using your text editor) into numerous smaller source-code files to keep them manageable in size and complexity. The assembler assembles the various component fragments separately, and the several resulting .OBJ files are woven together into a single, executable program file. This process is shown in Figure 3.4.

When you're first starting out, it's unlikely that you will be writing large programs spread out across several source-code files. Even if you only have a small source-code file that produces a single .OBJ file, you must still use the linker to change the single .OBJ file into an executable program file, as I'll explain a little later.

The larger your programs become, however, the more time can be saved by cutting them up into components. There are two reasons for this:

1. You can move tested, proven routines into separate libraries and link them into any program you write that might need them. This way, you can reuse code over and over again and not build the same old wheels every time you begin a new programming project in assembly language.

2. Once portions of a program are tested and found to be correct, there's no need to waste time assembling them over and over again along with newer, untested portions of a program. Once a major program gets into the tens of thousands of lines of code (and you'll get there sooner than you might think), you can save an enormous amount of time by only assembling the portion of a program that you are currently working on, and linking the finished portions into the final program without reassembling the whole thing every time.

Executable Program Files

The linker program may be seen as a kind of translator program, but its major role is in combining multiple object code files into a single executable program file. This executable file is sometimes called an .EXE file from the file extension that the linker appends to the file's name. For example, a source code file named FOO.ASM would be assembled to an object code file named FOO.OBJ, which would then be processed by the linker to an executable program file called FOO.EXE.

The executable file can be run by typing its name (without the .EXE extension) at the DOS prompt (for example, C:\>FOO) and then pressing Enter.

Real Assembler Products: MASM and TASM

For quite a few years there was only one assembler product in general use for the PC: Microsoft's Macro Assembler, better known as MASM. MASM is still an enormously popular program, and has established a standard for assembler operation on the PC. The source code in this book is all designed to be assembled by MASM.

MASM is by no means perfect, however, and in 1988 Borland International released their answer to MASM in the form of Turbo Assembler, which was quickly christened TASM by syllable-conserving programmers. TASM is a great deal faster than MASM, and has numerous advanced features that I won't be able to utilize in this book. However, at the level we're describing in this book, MASM and TASM are totally compatible in that they will assemble identical source code files identically. MASM and TASM are the two most popular assemblers for Intel's 86-family of CPUs, and the information in this book can be applied to either assembler.

I won't, however, attempt to describe the two assemblers' operation in detail. There are many differences in the ways the two assemblers function, and you'll have to delve into the manuals to get the full story. Very fortunately, when you're first starting out, there isn't a whole lot to using either TASM or MASM, and I'll describe the simple commands for invoking each assembler where appropriate.

The most recent releases of MASM now come with their own text editor, but for years MASM was "editorless" and you had to supply your own editor. Currently, TASM does not come with a text editor, so if you're a TASM user, you'll have to come up with a text editor on your own. I recommend using something simple, like my JED editor described in Chapter 4.

Both MASM and TASM come with their own special debugging tools, called debuggers. MASM's debugger is called CodeView, and TASM's debugger is called Turbo Debugger. Both are enormously sophisticated programs, and I won't be discussing either in this book, due in part to their complexity but mostly because there is a debugger shipped with every copy of PC DOS. This debugger, simply named DEBUG, is more than enough debugger to cut your teeth on, and will get you familiar enough with debugger theory to move up to CodeView or Turbo Debugger later on.

I'll be describing DEBUG much more fully in Section 3.5.

Setting Up a Working Subdirectory

The process of creating and perfecting assembly-language programs involves a lot of different kinds of DOS files and numerous separate software tools. Unlike the tidy, fully integrated environments offered by the Turbo and Quick languages, assembly language development comes in a great many pieces with some assembly required.

I recommend setting up a development subdirectory on your hard disk and putting all of the various pieces in that subdirectory. Create, then change to a subdirectory called ASM by using these DOS commands:

MD ASM

CD ASM

Then, from within the ASM subdirectory, copy the following files or groups of files into the subdirectory:

• Your text editor. If you're using JED (see Chapter 4), you need only copy the file JED.EXE. If you're using a memory-resident editor like Sidekick's notepad, you may not need to copy any editor program into your development subdirectory, because it will be memory resident when you boot. For other editors like Brief, you'll need to consult the documentation.

• Your assembler. Again, consult the documentation to see what files are necessary to assemble a source code file. Usually, there is a single executable file like MASM.EXE or TASM.EXE and perhaps some help files or configuration files. The older versions of MASM stood alone as MASM.EXE and needed nothing else in the subdirectory to operate. Similarly, the first release of TASM allows the file TASM.EXE to work alone.

• Your linker. Both MASM and TASM include their own linkers. MASM's linker program is LINK.EXE. TASM's linker is TLINK.EXE. Copy the appropriate file. Both linkers stand alone and do not require any support files.

• DEBUG. On your DOS distribution disk (or in your DOS subdirectory, if you have a DOS subdirectory) there should be a file called DEBUG.COM. Files with a .COM extension are, like .EXE files, executable programs. .COM programs are slightly old-fashioned and not much used anymore since Turbo Pascal 3.0 was supplanted by version 4.0 in 1987. Copy DEBUG.COM into your development subdirectory.

• Odds and ends. A source code listing program, while not essential, can be very helpful—such programs print out neatly formatted listings on your printer. (I have written a useful one called JLIST10 that I have placed on the listings diskette for this book—but it only operates with the LaserJet laser printers.) Add anything else that may be helpful, keeping in mind that a lot of files are generated during assembly language development, and you should strive to keep unnecessary clutter to a minimum.

3.4 The Assembly-Language Development Process

As you can see, there are a lot of different file types and a fair number of programs involved in the cycle of writing, assembling, and testing an assembly-language program. The cycle itself sounds more complex than it is. I've drawn you a map to help you keep your bearings during the discussions in this chapter. Figure 3.5 shows the assembly-language development process in a "view from a height." At first glance it may look like a map of the L.A. freeway system, but in reality the flow is fairly straightforward. Follow along on a quick tour.

Assembling the Source-Code File

You use the text editor to first create a new text file and then to edit that same text file, as you perfect your assembly language program. As a convention, most assembly language source code files are given a file extension of .ASM. In other words, for the program named FOO, the assembly language source code file would be named FOO.ASM.

It is possible to use file extensions other than .ASM, but I feel that using the .ASM extension can eliminate some confusion by allowing you to tell at a glance what a file is for, just by looking at its name. All told, about nine different kinds of files can be involved during assembly language development. (We're only going to speak of four or five in this book.) Each type of file will have its own standard file extension. Anything that will help you keep all that complexity in line will be worth the (admittedly) rigid confines of a standard naming convention.

As you can see from the flow in Figure 3.5, the editor produces a source code text file, which we show as having the .ASM extension. This file is then passed to the assembler program itself, for translation to a relocatable object module file with an extension of .OBJ.

Invoking the assembler is very simple. For small standalone assembly-language programs in Turbo Assembler, it's nothing more than the name of the assembler followed by the name of the program to be assembled (for example, C:\ASM>TASM FOO).

For Microsoft's MASM, you need to put a semicolon on the end of the command. This tells MASM that no further prompts are necessary (for example, C:\ASM>MASM FOO;). If you omit the semicolon, nothing bad will happen, but MASM will ask you for the names of several other files, and you will have to press Enter several times to select the defaults.

DOS will load the assembler from disk and run it. The assembler will open the source code file you named after the name of the assembler, and begin processing the file. Almost immediately afterward, it will create an object file with the same name as the source file, but with the .OBJ extension.

As the assembler reads lines from the source code file, it will examine them, construct the binary machine instructions the source code lines represent, and then write those machine instructions to the object code file.

When the assembler detects the EOF marker signaling the end of the source code file, it will close both source code file and object code file and return control to DOS.

Assembler Errors

The previous three paragraphs describe what happens if the .ASM file is correct. By correct, I mean the file is completely comprehensible to the assembler, and can be translated into machine instructions without the assembler getting confused. If the assembler encounters something it doesn't understand when it reads a line from the source code file, we call the misunderstood text an error, and the assembler displays an error message.

For example, the following line of assembly language will confuse the assembler and summon an error message:

MOV AX,VX

The reason is simple: there's no such thing as a "VX." What came out as "VX" was actually intended to be "BX," which is the name of a register. (The V key is right next to the B key and can be struck by mistake without your fingers necessarily knowing that they done wrong.)
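For comparison, here is the line as it was presumably meant to be typed, with the register name BX in place of the nonexistent VX. The comment is an addition for illustration.

```asm
MOV AX,BX       ; copy the contents of register BX into register AX
```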

Typos are by far the easiest kind of error to spot. Others that take some study to find usually involve transgressions of the assembler's rules. Take for example the line:

MOV ES,0FF00H

This looks like it should be correct, since ES is a real register and 0FF00H is a real, 16-bit quantity that will fit into ES. However, among the multitude of rules in the fine print of the 86-family of assemblers is the rule that you cannot move an immediate value (any number like 0FF00H) directly into a segment register like ES, DS, SS, or CS. It simply isn't part of the CPU's machinery to do that.

Instead, you must first move the immediate value into a register like AX, and then move AX into ES.
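The two-step move can be sketched as follows. This is only an illustration: AX is the go-between named in the text, though another general-purpose 16-bit register would presumably serve as well, and 0FF00H is simply the sample value used above. Note that whatever was in AX beforehand is overwritten.

```asm
; Sketch: load a segment register by way of a general-purpose register.
MOV AX,0FF00H   ; first, place the immediate value in AX
MOV ES,AX       ; then copy AX into the segment register ES
```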

You don't have to remember the details here; we'll go into the rules later on. For now, simply understand that some things that look reasonable are simply "against the rules" and are considered an error.

There are much, much more difficult errors that involve inconsistencies between two otherwise legitimate lines of source code. I won't offer any examples here, but I wanted to point out that errors can be truly ugly, hidden things that can take a lot of study and torn hair to find. Toto, we are definitely not in BASIC anymore...

The error messages vary from assembler to assembler, but they may not always be as helpful as you might hope. The error TASM displays upon encountering the VX typo follows:

Turbo Assembler Version 1.0 Copyright (c) 1988 by Borland International
Assembling file: FOO.ASM
**Error** FOO.ASM(74) Undefined symbol: VX
Error messages: 1
Warning messages: None
Remaining memory: 395k

This is pretty plain, assuming you know what a "symbol" is. The error message TASM will present when you try to load an immediate value into ES is less helpful:

Turbo Assembler Version 1.0 Copyright (c) 1988 by Borland International
Assembling file: IBYTECPY.ASM
**Error** IBYTECPY.ASM(74) Illegal use of segment register
Error messages: 1
Warning messages: None
Remaining memory: 395k

It'll let you know you're guilty of performing illegal acts with a segment register, but that's it. You have to know what's legal and what's illegal to really understand what you did wrong. As in running a stop sign, ignorance of the law is no excuse.

Assembler error messages do not absolve you from understanding the CPU's or the assembler's rules.

I hope I don't frighten you too terribly by warning you that for more complex errors, the error messages may be almost no help at all.

You may make (or will make; let's get real) more than one error in writing your source code files. The assembler will display more than one error message in such cases, but it may not necessarily display an error for every error present in the source code file. At some point, multiple errors confuse the assembler so thoroughly that it cannot necessarily tell right from wrong anymore. While it's true that the assembler reads and translates source code files line by line, there is a cumulative picture of the final assembly language program that is built up over the course of the whole assembly process. If this picture is shot too full of errors, in time the whole picture collapses.

The assembler will stop and return to DOS, having printed numerous error messages. Start at the first one and keep going. If the following errors don't make sense, fix the first one or two and assemble again.

Back to the Editor

The way to fix errors is to load the .ASM file back into your text editor and start hunting up the error. This "loopback" is shown in Figure 3.5.

The error message will almost always contain a line number. Move the cursor to that line number and start looking for the false and the fanciful. If you find the error immediately, fix it and start looking for the next.

Here's a little logistical snag: how do you make a list of the error messages on paper so that you don't have to memorize them or scribble them down on paper with a pencil? You may or may not be aware that you can redirect the assembler's error message displays to a DOS text file on disk.

It works like this: you invoke the assembler just as you normally would, but add the redirection operator > and the name of the text file to which you want the error messages sent. If you were assembling FOO.ASM with TASM and wanted your error messages written out to a disk file named ERRORS.TXT, you would invoke TASM by entering C:\ASM>TASM FOO > ERRORS.TXT.

Here, error messages will be sent to ERRORS.TXT in the current DOS directory C:\ASM. When you use redirection, the output does not display on the screen. The stream of text from TASM that you would ordinarily see is quite literally steered in its entirety to another place, the file ERRORS.TXT.

Once the assembly process is done, the DOS prompt will appear again. You can then print the ERRORS.TXT file on your printer and have a handy summary of all that the assembler discovered was wrong with your source code file.

Assembler Warnings

As taciturn a creature as an assembler may appear to be, it genuinely tries to help you any way it can. One way it tries to help is by displaying warning messages during the assembly process. These warning messages are a monumental puzzle to beginning assembly language programmers: are they errors or aren't they? Can I ignore them or should I fool with the source code until they go away?

There is no clean answer. Sorry about that.

Warnings are the assembler acting as experienced consultant, and hinting that something in your source code is a little dicey. Now, in the nature of assembly language, you may fully intend that the source code be dicey. In an 86-family CPU, dicey code may be the only way to do something fast enough or just to do it at all. The critical factor is that you had better know what you're doing.

The most common generator of warning messages is doing something that goes against the assembler's default conditions and assumptions. If you're a beginner doing ordinary, 100%-by-the-book sorts of things, you should crack your assembler reference manual and figure out why the assembler is tut-tutting you. Ignoring a warning may cause peculiar bugs to occur later on during program testing. Or, ignoring a warning message may have no undesirable consequences at all. I feel, however, that it's always better to know what's going on. Follow this rule:

Ignore a warning message only if you know exactly what it means.

In other words, until you understand why you're getting a warning message, treat it as though it were an error message. Only when you fully understand why it's there and what it means should you try to make the decision whether or not to ignore the message.

In summary, the first part of the assembly language development process (as shown in Figure 3.5) is a loop. You must edit your source code file, assemble it, and return to the editor to fix errors until the assembler spots no further errors. You cannot continue until the assembler gives your source code file a clean bill of health.

When no further errors are found, the assembler will write an .OBJ file to disk, and you will be ready to go on to the next step.

Linking

Theoretically, an assembler could generate an .EXE (executable) program file directly from your source code .ASM file. Some obscure assemblers have been able to do this, but it's not a common assembler feature.

What actually happens is that the assembler writes an intermediate object code file with an .OBJ extension to disk. You can't run this .OBJ file, even though it contains all the machine instructions that your assembly language source code file specified. The .OBJ file needs to be processed by another translator program, the linker.

The linker performs a number of operations on the .OBJ file, most of which would be meaningless to you at this point. The most obvious task the linker does is to weave several .OBJ files into a single .EXE program file. Creating an assembly language program from multiple .ASM files is called modular assembly.

Why create multiple .OBJ files when writing a single executable program? One of two major reasons is size. A middling assembly-language application might be 50,000 lines long. Cutting that single monolithic .ASM file up into multiple 8,000 line .ASM files