Assembly Language Step by Step Programming with DOS and Linux 2nd Ed 2000.pdf

DOS and DOS files

In the previous chapter, I defined what a computer program is, from the computer's perspective. It is, metaphorically, a long journey in very small steps. A long list of binary codes directs the CPU to do what it must to accomplish the job at hand. These codes are, even in their hexadecimal shorthand form, gobbledygook to us here in meatspace:

FE FF A2 37 4C 0A 29 00 91 CB 60 61 E8 E3 20 00 A8 00 B8 29 1F FF 69 55 7B F4 F8 5B 31

Is this a real program or isn't it? You'd probably have to ask the CPU, unless you were a machine-code maniac of the kind that hasn't been seen since 1977. (It isn't.)

But the CPU has no trouble with programs presented in this form. In fact, the CPU can't handle programs any other way. The CPU simply isn't equipped to understand a string of characters such as

LET X = 42

or even something that we out here would call assembly language:

MOV AX,42

To the CPU, it's binary only and hold the text, please.
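To make the gap between text and binary concrete, here is a minimal sketch of the translation an assembler performs on MOV AX,42. It is written in Python purely for illustration (Python is not used elsewhere in this book), and the function name encode_mov_ax is hypothetical. It assumes 16-bit code, where this instruction is encoded as the opcode byte B8 followed by the 16-bit value, low byte first.

```python
import struct

def encode_mov_ax(value: int) -> bytes:
    """Return the 16-bit machine code for MOV AX,value."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value must fit in 16 bits")
    # 0xB8 is the opcode for "MOV AX, imm16"; "<H" packs the operand
    # as an unsigned 16-bit integer, low byte first.
    return bytes([0xB8]) + struct.pack("<H", value)

machine_code = encode_mov_ax(42)
print(" ".join(f"{b:02X}" for b in machine_code))  # B8 2A 00
```

Three bytes are all the CPU ever sees of that line of source code.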

So, while it is possible to write computer programs in pure binary (I have done it, but not since 1977), it's unpleasant work and will take you until the next Ice Age to accomplish anything useful.

The process of developing assembly language programs is a path that runs from what we call source code that you can read, to something called machine code that the CPU can execute. In the middle is a resting point called object code that we'll take up a little later.

The process of creating true machine-code programs is one of translation. You must start with something that you and the rest of us can read and understand, and then somehow convert that to something the CPU can understand and execute. Before examining either end of that road, however, we need to understand a little more about the land on which the road is built.

The God Above, the Troll Below

Most of all, we need to understand DOS, both for its own sake and as a sort of idiot younger brother of Linux. Some people look upon DOS as a god; others as a kind of troll. In fact, DOS is a little of both. Mostly what you must put behind you is the common notion that DOS is a part of the machine itself and somehow resides in the same sort of silicon as the CPU. Not so! DOS is a computer program of an only slightly special nature, called an operating system.

In part, an operating system is a collection of routines that do nothing but serve the hardware components of the computer itself. By hardware components I mean such things as disk drives, printers, scanners, and so on. DOS acts something like a troll living under the bridge to your disk drive. You tell the troll what you want to do with the disk drive, and the troll does it, his way, and at some cost (in machine cycles) to you.

You could write a program that handled every little aspect of disk operation itself (many game programmers have done exactly that) but it would be more trouble than it was worth, since every program that runs on a computer needs to access the disk drives. And regardless of how grumpy the troll is, he does get the job done, and (assuming your disk drives aren't falling-down damaged) does it right every time. Can you guarantee that you know all there is to know about running a disk drive? Forgive me if I have my doubts. That is, in my opinion, what trolls are for.

The other (and more interesting) thing that operating systems do is run programs. It is here that DOS seems more godlike than troll-like. When you want to run a program on your computer, you type its name at the DOS command line. DOS goes out and searches one or more disk drives for the named program, loads it into memory at a convenient spot, sets the instruction pointer to the start of the program, and boots the CPU in the rear to get it going.

DOS then patiently waits for the program to run its course and stop. When the program stops, it hands the CPU obediently back to DOS, which again tilts a hand to its ear and listens for your next command from the command line.

So, as programmers, we use DOS two ways: One is as a sort of toolkit, an army of trolls if you will, each of which can perform some service for your program, thereby saving your program that effort. The other is as a means of loading a program into memory and getting it going, and then catching the machine gracefully on the rebound when your program is through.

I mention DOS again and again in this book. Everywhere you look in 16-bit assembly language, you're going to see the old troll's face. Get used to it.

DOS Files: Magnetic Memory

Very simply, DOS files are memory banks stored on a magnetic coating rather than inside silicon chips. A DOS file contains some number of bytes, stored in a specific order. One major difference from RAM memory is that DOS files stored on disk are sequential-access memory banks.

A disk (be it floppy or hard) is a circular platform coated with magnetic plastic of some sort. (Here, magnetic plastic is simply a polymer in which iron oxide particles or something similar is embedded.) In a floppy disk drive, the platform is a flexible disk of tough plastic; in a hard disk, the platform is a rigid platter of aluminum metal. Data is stored as little magnetic disturbances on the plastic coating in a fashion similar to that used in audio cassettes and VCRs. A sensor called a read/write head sits very close beside the rotating platform and waits for the data to pass by.

A simplified illustration of a rotating disk device is shown in Figure 4.1. The area of the disk is divided into concentric circles called tracks. The tracks are further divided radially into sectors. A sector (typically containing 512 bytes) is the smallest unit of storage that can be read or written at one time. A DOS disk file consists of one or more sectors containing the file's data.

Figure 4.1: Rotating disk storage.

The read/write head is mounted on a sliding shaft that is controlled by a solenoid mechanism. The solenoid can move the head horizontally to position the head over a specific track. (In Figure 4.1, the head is positioned over track 2—counting from 0, remember!) However, once the head is over a particular track, it has to count sectors until the sector it needs passes beneath it. The tracks can be accessed at random, just like bytes in the computer's memory banks, but the sectors within a track must be accessed sequentially.

Perhaps the single most valuable service DOS provides is handling the headaches of distributing data onto empty sectors on a disk. Programs can hand sectors of data to DOS, one at a time, and let DOS worry about where on the disk they can be placed. Each sector has a number, and DOS keeps track of what sectors belong together as a file. The first sector in a file might be stored on track 3, sector 9; the second sector might be stored on track 0, sector 4, and so on. You don't have to worry about that. When you ask for sector 0 of your file, DOS looks up its location in its private tables and goes directly to track 3, sector 9 and brings the sector's data back to you.
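The lookup just described can be pictured as a small table. The following Python sketch is a toy model only: the names file_sector_map and locate are hypothetical, and the real bookkeeping DOS does in its private tables is more involved than a simple list.

```python
# Hypothetical table for one file: the index is the sector number within
# the file, the value is the (track, sector) where that piece physically
# lives on the disk. The two entries are the examples from the text.
file_sector_map = [
    (3, 9),  # file sector 0
    (0, 4),  # file sector 1
]

def locate(file_sector: int) -> tuple:
    """Translate a sector number within the file to a (track, sector) pair."""
    return file_sector_map[file_sector]

print(locate(0))  # (3, 9)
print(locate(1))  # (0, 4)
```

The program asking for the data only ever uses the numbers 0, 1, 2, and so on; the physical locations stay hidden behind the table.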

Binary Files

The data stored in a file are just binary bytes and can be anything at all. Files like this, where there are no restrictions on the contents of a file, are called binary files, since they can legally contain any binary code. Like all files, a binary file consists of some whole number of sectors, with each sector (typically) containing 512 bytes. The least space any file on your disk occupies is 512 bytes; when you see the DOS DIR command tell you a file has 17 bytes in it, that's the count of how many bytes were stored in that file. But like a walk-in closet with only one pair of shoes in it, the rest of the sector is still there, empty but occupying space on the disk.
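The arithmetic behind the walk-in closet is a rounding-up to whole sectors. Here is a minimal Python sketch of it, assuming the typical 512-byte sector from the text. The function names are hypothetical, and the model is simplified: real DOS disks hand out space in clusters of one or more sectors, so the actual figure can be larger.

```python
SECTOR_SIZE = 512  # bytes per sector, the typical figure used in the text

def sectors_needed(byte_count: int) -> int:
    """Whole sectors required to hold byte_count bytes."""
    return -(-byte_count // SECTOR_SIZE)  # ceiling division

def space_on_disk(byte_count: int) -> int:
    """Bytes of disk space taken up, counting the unused tail of the last sector."""
    return sectors_needed(byte_count) * SECTOR_SIZE

print(space_on_disk(17))   # 512
print(space_on_disk(513))  # 1024
```

A 17-byte file fills one sector; a file just one byte over 512 needs a second sector all to itself.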

A binary file has no structure, but is simply a long series of binary codes divided into numbered groups of 512 and stored out on disk in a scheme that for now is best left to DOS to understand. Later on, you can study up on it, especially once you learn more about Linux, in which entire file systems can be loaded as though they were just more programs—which, of course, they are.

Text Files

If you've ever tried to use the DOS TYPE command to display a binary file (like an .EXE or .COM file) to the screen, you've seen some odd things indeed. There's no reason for such files to be intelligible on the screen; they're intended for other "eyes," typically the CPU's.

There is a separate class of files that is specifically restricted to containing human-readable information. These are text files, because they contain the letters, digits, and symbols of which printed human information (text) is composed.

Unlike binary files, text files have a certain structure to them. The characters in text files are divided into lines. A line in a text file is defined not so much by what it contains as by how it ends. A special series of invisible characters called an end-of-line (EOL) marker tags the end of a line. The first line in a text file runs from the first byte in the file to the first EOL marker; the second line starts immediately after the first EOL marker and runs to the second EOL marker, and so on. The text characters falling between two sequential EOL markers are considered a single line.

This scheme is the same for both DOS and Linux. What differs is the exact nature of the EOL marker. The EOL marker for DOS is not one character but two: the carriage return character (called CR by those who know and love it) followed by the linefeed character (similarly called LF). You don't see these characters on the screen as separate symbols, but you see what they do: They end the line. Anywhere a line ends in an ordinary DOS text file, you'll find a mostly invisible partnership of one CR character and one LF character hanging out. With Linux things are different: a single LF, without a partner CR.
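A short Python sketch may make the difference visible. It assumes two made-up two-line files, one with DOS line endings and one with Linux line endings; the helper split_lines is hypothetical and simply cuts the bytes at each EOL marker.

```python
dos_text = b"MOV AX,42\r\nMOV BX,AX\r\n"   # CR LF ends each line (DOS)
linux_text = b"MOV AX,42\nMOV BX,AX\n"     # LF alone ends each line (Linux)

def split_lines(data: bytes, eol: bytes) -> list:
    """Split data into lines, each one ending at an EOL marker."""
    lines = data.split(eol)
    # When the last line ends with an EOL marker, split() leaves one
    # empty piece at the end; drop it.
    if lines and lines[-1] == b"":
        lines.pop()
    return lines

print(split_lines(dos_text, b"\r\n"))    # [b'MOV AX,42', b'MOV BX,AX']
print(split_lines(linux_text, b"\n"))    # [b'MOV AX,42', b'MOV BX,AX']
# Reading a DOS file with the Linux convention leaves a stray CR on each line:
print(split_lines(dos_text, b"\n"))      # [b'MOV AX,42\r', b'MOV BX,AX\r']
```

The last call shows why tools built for one convention can stumble on files written under the other.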

Why two characters to end a line in a DOS text file? Long ago, there was (and still is, at hamfests) an incredible mechanical nightmare called a Teletype machine. These were invented during World War II as robot typewriters that could send written messages over long distances through electrical signals that could pass over wires. It was a separate mechanical operation to return the typing carriage to the left margin of the paper (carriage return) and another to feed the paper up one line to expose the next clean line of paper to the typing carriage (line feed). A separate electrical signal was required to do each of these operations, and while I don't know why that was necessary, it has carried over into the dawn of the twenty-first century in the form of those two characters, CR and LF. Not only is this a case of the tail wagging the dog, it's a case of the tail walking around 30 years after the poor dog rolled over and died.

Figure 4.2 shows how CR and LF divide what might otherwise be a single meaningless string of characters into a structured sequence of lines. It's important to understand the structure of a text file because that structure dictates how some important software tools operate, as I explain a little later.

Figure 4.2: The structure of a DOS text file.

The CR character is actually character 13 in the ASCII character set summarized in Appendix D. The LF character is character 10. They are two of a set of several invisible characters called whitespace, indicating their role in positioning visible text characters ('a', '*', etc.) within the white space of a text page. The other whitespace characters include the space character itself (character 32), the tab character (character 9), and the form feed character (character 12), which can optionally divide a text file further into pages.
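If you would rather check those character numbers than take them on faith, a few lines of Python will do it. This is only an illustrative sketch; the dictionary names are arbitrary.

```python
# The whitespace characters named in the text, written with Python's
# escape sequences for each one.
whitespace = {
    "CR": "\r",
    "LF": "\n",
    "space": " ",
    "tab": "\t",
    "form feed": "\f",
}

# ord() returns the character's number in the ASCII character set.
codes = {name: ord(ch) for name, ch in whitespace.items()}
for name, code in codes.items():
    print(f"{name}: {code}")
```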

Living Fossils

Another character, the bell character (BEL), falls in between binary and text characters. When displayed or printed, it signals that a tone should be sounded. Back in the old Teletype days, the BEL character caused the teletype machine to ring its bell—which was literally a mechanical bell struck by a little hammer. BEL characters are allowed in text files, but are little used these days and considered sloppy practice. Many modern printers and most displays don't handle them correctly anyway; like the CR/LF pair, they are a barely surviving remnant of an increasingly fossilized past.

Another one of these fossilized characters will eventually cause you some trouble: the end-of-file (EOF) marker character. Unlike EOL, EOF is a single character, ASCII character 26, sometimes written as Ctrl+Z because you will generate the EOF character by holding the control key down and pressing the Z key.

The EOF character, properly, is not a DOS convention at all. DOS inherited EOF from the even older days of CP/M-80, which reigned between 1976 and 1982. In CP/M's archaic file system, there was no precise count of how many bytes were present in a text file. The operating system counted how many disk sectors were allocated to a text file, but within the last sector CP/M could not simply count its way to the final byte. Instead, CP/M insisted on there being an end-of-file marker at the very end of the significant data and would ignore anything after that marker.

DOS and Windows, by contrast, keep a precise count of how many characters are present in a text file, and therefore do not require any sort of EOF marker at all. However, some older DOS utilities recognize EOF, as a nod to older CP/M text files that were sometimes carried forward into the DOS world. As character 26 (Ctrl+Z) is not a displayable character and not true white space, this ordinarily did no harm. However, some editors and other utilities will not display or manipulate text past an embedded Ctrl+Z.

Some DOS utilities recognize EOF, and some do not. If you find a text file that seems to end prematurely, use a binary viewer such as DEBUG (more on which shortly) to see if a Ctrl+Z character has found its way into the interior of the file. Ctrl+Z is not otherwise useful in any text files I'm aware of, so removing it will not damage the file.
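The same hunt can be sketched in a few lines of Python. The sample bytes and the function names find_ctrl_z and strip_ctrl_z are made up for illustration, and the sketch is meant for text files only; it should never be turned loose on a binary file.

```python
EOF_MARKER = 0x1A  # ASCII character 26, Ctrl+Z

def find_ctrl_z(data: bytes) -> list:
    """Return the offset of every Ctrl+Z byte in data."""
    return [offset for offset, byte in enumerate(data) if byte == EOF_MARKER]

def strip_ctrl_z(data: bytes) -> bytes:
    """Return a copy of data with every Ctrl+Z byte removed."""
    return data.replace(bytes([EOF_MARKER]), b"")

# A made-up text file with a Ctrl+Z lurking at the start of its second line.
sample = b"line one\r\n\x1aline two\r\n"
print(find_ctrl_z(sample))    # [10]
print(strip_ctrl_z(sample))   # b'line one\r\nline two\r\n'
```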

Keep in mind that this only applies to text files. Binary files may contain any character values at all, and thus may be shot full of Ctrl+Z characters, any or all of which may be vital to the file's usefulness. We return to the issue of inspecting and changing the contents of binary files in a little while.

Text Editors

Manipulating a text file is done with a program called a text editor. A text editor is a word processor for program source code files. In its simplest form, a text editor works like this: You type characters at the keyboard, and they appear on the screen. When you press the Enter key, an EOL marker (for DOS, the two characters CR and LF) is placed at the end of a line, and the cursor moves down to the next line.

The editor also allows you to move the cursor back up into text you've already entered, in order to change it. You can delete words and whole lines and replace them with newly entered text.

Ultimately, when you decide that you're finished, you press some key like F2, or some combination of keys like Ctrl+K+D, and the text editor saves the text you entered from the keyboard as a text file. This text file is the source code file you'll eventually present to the assembler for processing. Later on, you can load that same text file back into the editor to make repairs on faulty lines that cause errors during assembly or bugs during execution.

It's possible to use a word processor as a program text editor. In older times, many programmers used WordStar, WordPerfect, Microsoft Word, and other available word processors to edit their program text. This works—as long as you remember to write your text file to disk in "non-document mode" or "ASCII text mode." Most true word processors embed countless strange little codes in their text files, to control such things as margin settings, font selections, headers and footers, and soft page and line breaks. These codes are not recognized ASCII characters but binary values and actually turn the document file from a text file to a binary file. The codes will give the assembler fits. If you write a program source code file to disk as a document file, it will not assemble. See the word processor documentation for details on how to export a document file as a pure ASCII text file.
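One way to tell whether a file is safe to hand to the assembler is to check that every byte is either a printable ASCII character or one of the whitespace characters. The Python sketch below does that under those assumptions; the function name is hypothetical, and the second sample contains an arbitrary non-ASCII byte standing in for a formatting code, not the actual code of any particular word processor.

```python
ALLOWED_CONTROL = {9, 10, 12, 13}  # tab, LF, form feed, CR

def is_plain_ascii_text(data: bytes) -> bool:
    """True if every byte is printable ASCII or an allowed whitespace character."""
    return all(32 <= b <= 126 or b in ALLOWED_CONTROL for b in data)

print(is_plain_ascii_text(b"MOV AX,42\r\n"))       # True
print(is_plain_ascii_text(b"MOV AX,42\x8d\r\n"))   # False
```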

Software was expensive in years past, and programmers (who tend to be cheap, yours truly not excluded) understandably wanted to get the most bang for their software budget and used word processors for everything they could. These days, software has become cheap or (increasingly) even free, and there are a multitude of plain ASCII text editors available freely for download from the Internet.

I'll even go you better than that. On the CD-ROM associated with this book I've arranged to distribute a programming text editor specifically designed for assembly language programmers; in fact, specifically designed to work seamlessly with the assembler that I teach in this book (which is also on the CD-ROM; what a deal!). NASM-IDE was written in Turbo Pascal, and its editor works a great deal like the editors you may have used in Borland's DOS-based programming products. I explain how to use NASM-IDE in great detail in the next chapter.

In earlier editions of this book I spoke of something called JED, which was a simple assembly-programming editor that I had written for my own use, also in Turbo Pascal. JED is history, and while you can still use it if you have it, it doesn't interface well with NASM, the assembler I teach throughout this book. NASM-IDE is a great deal like JED but much more sophisticated, and obviously, it was created to work with NASM.

If for some reason the CD-ROM didn't come to you with the book, both NASM-IDE and NASM itself can be downloaded from the Internet without charge, along with the listing files. See Appendix C, "Web URLS for Assembly Programmers."

If you have a text editor that you've used for some time and prefer, there's no reason not to use it. It just won't make following along with the text quite as easy.