Assembly Language Step by Step: Programming with DOS and Linux, 2nd Edition (2000)

Your Work Strategy

There are smart ways to work and dumb ways to work. The dumb ways often get the same things done, but for twice the expended time. (Maybe more. How much is your time worth?) It pays to have an organized approach to any kind of programming work, and in this section I'm going to suggest a way of setting up your working environment so that you will waste as little time as possible.

Put Only One Project in One Directory

Traditional practice in the Unix world has long been "one makefile, one directory." What this means is that you should create a separate directory for every project whose end result is a single executable program file. Don't just create one directory for assembly language work and then fill it with umptyseveral different projects. This invites confusion, and it makes certain things (such as using the make facility) trickier and more error-prone.

If you confine each project to its own directory, you can keep the default makefile name ("makefile") and not worry about typing the name of the makefile into EMACS each time you want to rebuild the project. (And with only one makefile in the directory, you won't have to worry about accidentally invoking make on the wrong makefile. I've done this. Do it a few times and you'll soon be doubting your sanity.)

This also allows you to have standard names for test files, log files, and so on, that will be identical irrespective of which project you happen to be working on at any given time. If all the files were glommed together in one huge directory, you'd have to remember a whole set of unique names, one set for each project. Why bother? Directories cost little in disk space and do an enormous amount to manage complexity.

Consider EMACS Home

All of the various steps required for programming can be done right from inside EMACS. You can edit source code files and make files. You can assemble files and link them to generate executable files. You can run the executable program files to test them. You can invoke the GNU debugger. You can execute nearly any Unix command that can be issued from inside a Unix shell such as bash. Why waste time ducking in and out of EMACS as though it were nothing more than a text editor?

More than one book has been written about EMACS. I recommend the book Learning GNU EMACS by Debra Cameron, Bill Rosenblatt, and Eric S. Raymond (O'Reilly, 1996). My one gripe is that it doesn't cover the X Window version of EMACS specifically, but all the key commands are the same. I don't want to duplicate a lot of that book's excellent material here, and EMACS is relatively intuitive on the editing side.

The important big-picture thing to understand about EMACS is that it is buffer-based, and those buffers may either be tied to disk files or simply contain text that did not come from a disk file. When you open a file, EMACS opens a buffer and loads text from the opened file into that buffer. You can also open a buffer as a scratch buffer, type something in it, cut or copy portions of that buffer into another buffer, and then just kill the scratch buffer (delete it) without saving it to disk. (There is a separate EMACS menu item for killing buffers.)

When EMACS runs the make facility, it pipes output from make and from the tools that make invokes into a new buffer. That buffer is the same as any other EMACS buffer, and if you want, you can give the buffer a name and save it to a disk file as a record of the make session. It does the same when you invoke the GNU debugger from inside EMACS: gdb's output is piped into a buffer, which you can save to disk if you choose for later reference.

Most usefully, you can invoke a Unix shell (I use bash) from inside EMACS, and EMACS will pipe its output into a new buffer, which like any buffer can be saved to disk. Especially while you're learning, there's very little that you'll need to do that can't be done either from the EMACS menus or from a shell opened from within EMACS.

Opening a Shell from inside EMACS

This last is worth explaining, because it is less obvious than most of the editing commands. There is currently no EMACS menu item that opens a shell in a window. (There should be!) To open a shell, the command is "Esc x shell." You press the Esc key followed by the lowercase x key (don't press both at once!) and, in its command line at the bottom of its window, EMACS will display the unhelpful string "M-x." This is its way of expressing the sequence Esc x on a PC. (The M stands for "Meta," which was the name of a control key on some ancient and mercifully forgotten minicomputer dumb terminal.) On other computers or terminals that may lack an Esc key, there may be other ways of initiating the command. EMACS was written to be portable. After the string "M-x" you must type "shell" and then press Enter.

EMACS will open a new buffer in a window and will begin piping output from the default shell into that window. At the top of the window will be your familiar shell prompt, waiting for you to type shell commands just as you did before you invoked EMACS. You can run the executables you build with make by naming them (usually prefixed with "./") just as you would from the shell.

Note that you can exit the shell by typing "exit," but the window and buffer that EMACS opened for the shell will not go away by themselves. You have to kill the buffer as a separate operation, using the Files | Kill Current Buffer menu item.

I mentioned it earlier, but keep in mind that you can launch the GNU Debugger by selecting the EMACS menu item Tools | Debugger.

Chapter 13: Coding for Linux

Applying What You've Learned to a True Protected Mode Operating System

Overview

I can see the "fan" mail now: "How can you claim your book is about Linux assembly language when you don't present any Linux code until the very last chapter?" (I get notes like this every time the book I wrote isn't exactly the book that a reader has hoped to find.) The answer here, of course, is that this book isn't about Linux assembly language. It's about assembly language for Intel's x86 family of processors. Most people still start fooling around with x86 assembly under DOS, so that's where I started. Many who started with assembly under DOS would like to move on to something more powerful and more pertinent to real computing today, and more and more people see that destination as Linux.

So, whereas I began this book against a DOS backdrop, I'm finishing it against a Linux backdrop. The book, however, is about neither DOS nor Linux. Nearly everything that I've taught you so far applies to Linux as truly as to DOS: addressing modes, machine instructions, and one- and two-level data tables, to name just a few. In truth, some things don't apply: primarily, the real mode segmented model and DOS calls. The rest is as good under Linux as it is under DOS.

That being the case, you now have most of what you need to write assembly language programs for x86 processors under Linux. This chapter fills in the essentials of how Linux work differs from DOS work at the code level. If in fact there is a third edition of this book someday (and I hope there will be), I am considering rewriting it almost completely so that DOS at last vanishes into the mists of history, and we begin with Linux and stay with Linux throughout. You may be surprised at how little of what I've taught you will have to change. Stay tuned.

Genuflecting to the C Culture

I made it plain in the previous chapter that Linux was a C world from top to bottom. Some people think that by this I mean most of the programs written for Linux are written in C, that the people who created Linux were C people, and so on. True enough—but not enough truth. C was created for Unix, and Unix was created in C. The two evolved together and left indelible marks on one another. Even if Linux or some other species of Unix were reimplemented in Pascal (a very good idea, in my view), the C flavor would still be there, and would have to be there, or what we would have would not be Unix at all.

The Primacy of Libraries

Not all of this C culture is pertinent to assembly language work, but a good part of it is. The part that most affects assembly work, ironically, is the primacy of the standard C libraries. Linux and the standard C libraries are inseparable. The libraries are the way that applications and utilities communicate with the Linux kernel. They stand in place of the DOS INT 21H interface I explained in early chapters.

There are basically three reasons for this:

Portability. This is less important than it used to be, and for those of us who feel that the CPU wars were won by Intel long ago, it may not be important at all. But it's a fact that the standard C libraries were created to make the porting of Unix to other processors easier.

Complexity management. Linux is an order of magnitude (at least) more complex than DOS. It can do more, and can do it (thanks to some of that complexity) with far greater robustness and flexibility. Much of that complexity can be hidden from typical end-user utilities and applications, and the C library is the most important means by which that hiding is done.

Kernel evolution. Linux—like Unix itself—is a work in progress. One reason Unix has had such staying power is that it has been able to evolve to meet the needs of modern users on modern machines, irrespective of its origins on creaking ancient minicomputers with less processor power than a Wal-Mart video game. One reason that this has been possible is that the kernel is not much burdened by layers of "legacy obligations" like those that have made the DOS/Windows 9x chimera such an unholy and crash-prone muddle. The main reason it remains thus unburdened is that the kernel is off limits and not accessed directly by utility and application code. Any legacy burden is borne by the standard C library. The kernel is free to move in the directions that it must, and the standard C libraries are rewritten as necessary so that the same face is presented to utilities and applications.

The INT 80H Kernel Function Interface

This last item brings up a subject I'm asked about a lot: the Linux INT 80H kernel function call interface. Just as there is a software interrupt-based function call interface to DOS, there is a way to call the Linux kernel through software interrupts. Instead of INT 21H it uses INT 80H, but the basic idea is almost identical: You set up parameters in registers and then call INT 80H. There are over 200 kernel primitives that may be called this way. If you keep to these primitives, you don't need the C library.

The INT 80H interface seems to pull at the imaginations of people who have an aversion to C. Many of these are Europeans, on whose continent Pascal still thrives; and being a Pascal guy myself, I can well understand it. That being said, I advise against it, and I won't explain the INT 80H mechanism further in this book. Some information can be found at the Web site of Konstantin Boldyshev at http://lightning.voshod.com/asm. This is a marvelous (and humbling) site, and worth digesting for the context even if you never intend to try some of the tricks he describes.

The INT 80H interface is what the C library uses to communicate with the kernel, and the authors of Linux make it clear that they reserve the right to change the parameters and semantics (that is, what the calls do) of kernel primitives as necessary without notice or apology. If you make use of kernel primitives through INT 80H, your Linux programs will become version-specific. This is not a good thing and will not endear you to users of your software.

If you intend to do any kind of programming at all under Linux, you will have to cut a personal karmic truce with the C language. If you intend to work in assembly, you will have to move beyond an uneasy truce (hey, is there ever an easy truce?) to active and willing collaboration. It can be done. I do it all the time.

Get used to it.

C Calling Conventions

One of the most peculiar things I learned early about Linux programs (peculiar to me, at least) is that the main portion of a Linux program is a subroutine, called from startup code that is linked in at the link stage. That is, when Linux executes a program, it loads that program into memory and runs it, but before your code runs, some standard library code runs and then executes a CALL instruction to the main: label in the program. (Yes, ye purists and gurus, there is some other grimbling involved.) This is the reason that the main program portion of a C program is called the main function. It really is a function: the standard C library code calls it, and it returns control to the standard C library code by executing a RET instruction. I diagrammed this in Figure 12.2 in the previous chapter, and it might be useful to take another look at the figure if this still isn't clear to you.
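To make the point concrete in C, here is a minimal sketch assuming a greatly simplified startup routine. The names `demo_main` and `model_startup` are hypothetical. Real startup code does considerably more; in glibc, for instance, `_start` hands control to `__libc_start_main`, which calls `main` and passes its return value to `exit()`.

```c
/* Hypothetical stand-in for a program's main(). In a real program this
   function would be named main and be called by the C library's startup
   code rather than by model_startup below. */
static int demo_main(int argc, char **argv)
{
    (void)argv;                /* unused in this sketch */
    return argc > 1 ? 1 : 0;   /* returned to the caller in EAX */
}

/* Hypothetical, greatly simplified model of startup code: it calls the
   program's main function exactly like any other function and receives
   its return value. Real startup code also sets up the environment and
   hands the result to exit(). */
static int model_startup(int (*entry)(int, char **), int argc, char **argv)
{
    int status = entry(argc, argv);  /* conceptually a CALL; entry ends in RET */
    return status;                   /* real startup code would call exit(status) */
}
```

Nothing about main is special to the CPU: it is entered by CALL and left by RET like any other procedure, and its return value travels back to the library code in EAX.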

The way the main program obtains control is therefore the first example you'll see of a set of rules we call the C calling conventions. The C library is nothing if not consistent, and that is its greatest virtue. All C library functions implemented on x86 processors follow these rules. Bake them into your synapses early, and you'll lose a lot less hair than I did trying to figure them out by beating your head against them.

Perforce:

A procedure (which is the more generic term for what C calls a function) must preserve the values of the EBX, ESP, EBP, ESI, and EDI 32-bit registers. That is, although it may use those registers, when it returns control to its caller, the values those registers have must be the same values they had before the function was called. The contents of the remaining general-purpose registers (EAX, ECX, and EDX) may be altered at will. (Because Linux is a protected mode operating system, this pointedly does not include the segment registers, which are off limits and should not be altered for any reason.)
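One way to internalize the preservation rule is to express it as a check against before-and-after register snapshots, such as you might capture in a debugger around a call. The `Regs` struct and the `preserved_callee_saved` function below are hypothetical illustrations, not part of any library.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the eight 32-bit general-purpose registers. */
typedef struct {
    uint32_t eax, ebx, ecx, edx, esi, edi, ebp, esp;
} Regs;

/* Returns true if a procedure honored the C calling conventions:
   EBX, ESI, EDI, EBP, and ESP must match their values at entry.
   EAX, ECX, and EDX are scratch registers and are deliberately
   not compared. */
static bool preserved_callee_saved(const Regs *before, const Regs *after)
{
    return before->ebx == after->ebx &&
           before->esi == after->esi &&
           before->edi == after->edi &&
           before->ebp == after->ebp &&
           before->esp == after->esp;
}
```

A procedure that changes EAX, ECX, or EDX passes this check; one that returns with a different EBX does not.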

A procedure's return value is returned in EAX if it is a value 32 bits in size or smaller. Sixty-four-bit integer values are returned in EDX and EAX, with the low 32 bits in EAX and the high 32 bits in EDX. Floating-point return values are returned at the top of the floating-point stack. (I won't be covering floating-point numerics work in this book.) Strings, structures, and other items larger than 32 bits in size are returned by reference; that is, the procedure returns a pointer to them in EAX.
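The 64-bit case is just arithmetic on the two halves. This sketch, with the hypothetical `Ret64` type standing in for the EDX:EAX register pair, shows how a 64-bit integer result is divided between the two registers and put back together.

```c
#include <stdint.h>

/* Hypothetical model of the register pair that carries a 64-bit integer
   return value under the 32-bit x86 C calling conventions. */
typedef struct {
    uint32_t eax;  /* low 32 bits  */
    uint32_t edx;  /* high 32 bits */
} Ret64;

/* Split a 64-bit value the way it would be returned: low half in EAX,
   high half in EDX. */
static Ret64 split_return(uint64_t value)
{
    Ret64 r;
    r.eax = (uint32_t)(value & 0xFFFFFFFFu);
    r.edx = (uint32_t)(value >> 32);
    return r;
}

/* What the caller does to reassemble the 64-bit result. */
static uint64_t join_return(Ret64 r)
{
    return ((uint64_t)r.edx << 32) | r.eax;
}
```

For example, a function returning 0x0000000100000002 (4,294,967,298) leaves 2 in EAX and 1 in EDX.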

Parameters passed to procedures are pushed onto the stack in reverse order. That is, given the C function MyFunc(foo, bar, bas), bas is pushed onto the stack first, bar second, and foo last. More on this later.
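The reverse push order is easier to see with a simulated stack. In this sketch the `Stack` type and the `push`, `call_myfunc`, and `param` functions are hypothetical; each array slot stands in for one 32-bit stack location, and the index `esp` moves downward on each push, the way the real ESP drops by 4 bytes per PUSH.

```c
#include <stdint.h>

#define STACK_WORDS 16

/* Hypothetical model of a 32-bit stack: mem[] holds dwords, and esp is
   an index that, like the real ESP, moves toward lower addresses. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    int esp;
} Stack;

static void push(Stack *s, uint32_t v)
{
    s->esp -= 1;          /* real PUSH subtracts 4 bytes from ESP... */
    s->mem[s->esp] = v;   /* ...then stores the value at [ESP]       */
}

/* Simulate the caller's side of MyFunc(foo, bar, bas): parameters are
   pushed right to left, then CALL pushes a return address. */
static void call_myfunc(Stack *s, uint32_t foo, uint32_t bar, uint32_t bas,
                        uint32_t ret_addr)
{
    push(s, bas);       /* last parameter goes first */
    push(s, bar);
    push(s, foo);       /* first parameter goes last */
    push(s, ret_addr);  /* what CALL itself pushes */
}

/* On entry to the callee, parameter n (0 = foo) sits at [ESP + 4 + 4*n]. */
static uint32_t param(const Stack *s, int n)
{
    return s->mem[s->esp + 1 + n];
}
```

Because foo is pushed last, it always lands just above the return address, at [ESP+4] on entry, no matter how many parameters follow it. That fixed position is what lets functions such as printf, which take a variable number of parameters, find their first parameter.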

Procedures do not remove parameters from the stack. The caller must do that after the procedure returns, either by popping the parameters off or (more commonly, since it is usually faster) by adding an offset to the stack pointer ESP. (Again, I'll explain what this means in detail later on, when we actually do it.)
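The caller-cleans-up rule is what keeps the stack balanced. This sketch tracks only ESP (as a byte address) using hypothetical helpers; it pushes the parameters, accounts for what CALL and RET do, and then applies the equivalent of an ADD ESP with 4 bytes per parameter.

```c
#include <stdint.h>

/* Hypothetical model: only ESP is tracked, as a byte address. */
typedef struct { uint32_t esp; } Cpu;

static void push_dword(Cpu *c) { c->esp -= 4; }  /* the pushed value is not modeled */
static void pop_dword(Cpu *c)  { c->esp += 4; }

/* Bytes the caller must remove after calling a procedure that takes
   nargs 32-bit parameters. */
static uint32_t cleanup_bytes(int nargs) { return 4u * (uint32_t)nargs; }

/* Push nargs parameters, CALL (pushes the return address), let the callee
   RET (pops only the return address), then clean up with the equivalent
   of ADD ESP, cleanup_bytes(nargs). Returns ESP after the cleanup. */
static uint32_t call_and_clean_up(Cpu *c, int nargs)
{
    for (int i = 0; i < nargs; i++)
        push_dword(c);               /* PUSH each parameter */
    push_dword(c);                   /* CALL pushes the return address */
    pop_dword(c);                    /* RET pops it; the parameters remain */
    c->esp += cleanup_bytes(nargs);  /* caller: ADD ESP, 4*nargs */
    return c->esp;
}
```

For MyFunc(foo, bar, bas), the cleanup amounts to ADD ESP,12, after which ESP is back where it was before the first PUSH.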

Understanding these rules thoroughly will allow you to make calls to the multitude of functions in the standard C library, as well as other extremely useful libraries such as ncurses, all of which are written in C (either currently or originally) and follow the conventions as I've described them. Much of what I have to teach you about Linux assembly language work involves how to call library functions. Most of the rest of it is no different from DOS—and that you already know!
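As a sketch of how the rules come together when calling a real library function, here is C code calling the standard `snprintf`, with comments showing roughly what the call looks like in 32-bit NASM-style assembly. The wrapper name `format_apples` and the `fmt` label are hypothetical, and compilers are free to set up arguments differently (for example, storing them with MOV into stack space reserved once, rather than using PUSH).

```c
#include <stdio.h>

/* Calls a standard C library function. On 32-bit x86 under the C calling
   conventions, snprintf(buf, size, "%d apples", count) is conceptually:
       push count        ; parameters pushed right to left
       push fmt          ; address of the format string
       push size
       push buf
       call snprintf
       add  esp, 16      ; caller removes the four dword parameters
   with the int result left in EAX. */
static int format_apples(char *buf, size_t size, int count)
{
    return snprintf(buf, size, "%d apples", count);
}
```

The return value (the number of characters written, here 8 for "7 apples") comes back in EAX, and EBX, ESI, EDI, and EBP are guaranteed to be intact when snprintf returns.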