Assembly Language Step by Step Programming with DOS and Linux 2nd Ed 2000.pdf

NASM for Linux

Another (minor) reason that I chose NASM as the focus assembler for this book is that a very good implementation, still free, exists for Linux. I've included NASM for Linux, version 0.98, on the CD-ROM for this book. That's the version with which I wrote all the code examples published here. However, there's no saying how long this book will remain in use, and if it's for more than a year or so (and the first edition lasted over seven years), you might check the NASM Web site at www.web-sites.co.uk/nasm/ to see if a newer release is available.

This is its home page in early 2000. If it moves in subsequent years, you may have to hunt with a Web search engine. My hunch is that it will always exist somewhere. Free software never dies, though it sometimes gets a little dusty.

You can download NASM in either source code form or in assembled binary form, as an RPM (Red Hat Package Manager) archive. Installing the RPM file might seem to be easier, but there's a catch: You must choose one of two different RPM archives, depending on whether you're using libc5 or libc6. If you know your Linux system well, you probably know which version of the C library it uses; on the other hand, if you're relatively new to Linux, you might not. That's why the CD-ROM contains not the RPM version but NASM's full source code in C, which you rebuild in the process of installing it.

Installing NASM's Source Code

Don't faint, newcomers. It's not that hard, and rebuilding tools is a fact of Linux life. Installing the source code and rebuilding it from scratch avoids the libc version problem, as gcc (the Linux C compiler) knows what C library it has, and it uses it to build the NASM assembler binary correctly. That's why you'll find the file nasm-0.98.tar on the CD-ROM for this book. A tar file is an archive file, like a .ZIP file in the DOS world, only without compression. It's simply a way to combine multiple files into one file for easy transport over a network.

Your Linux system probably has a directory /usr/local/src on it. That's a good place to start. (If it doesn't, consider creating a directory with that pathname.) Copy the nasm-0.98.tar file from the CD-ROM into /usr/local/src, and then use tar to extract all the files from it. The tar utility is one of my least favorite Unix utilities, because it has a whole different mindset for dealing with command-line parameters, and if you type something it doesn't understand or like, it will just sit there mute until you Ctrl-C out of it.

So, use this command line, and make sure you get it precisely as shown here:

tar xvf nasm-0.98.tar
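
If tar's conventions are new to you, it can help to rehearse on a throwaway archive first. The sketch below is not from the CD-ROM; the directory and file names (tar-demo, sample-0.1) are made up for illustration. It builds a small archive, lists its contents, and then extracts it with the same x, v, and f letters used above. The letters mean: c creates, t lists, x extracts, v reports each file by name, and f says that the archive's file name follows.

```shell
# Rehearsal in a scratch directory; every name here is hypothetical.
mkdir -p tar-demo/sample-0.1
echo "hello" > tar-demo/sample-0.1/README
cd tar-demo

# c = create, f = the archive file name follows
tar cf sample-0.1.tar sample-0.1
rm -r sample-0.1

# t = list the contents without extracting anything
tar tvf sample-0.1.tar

# x = extract, v = name each file as it is processed
tar xvf sample-0.1.tar
cat sample-0.1/README
cd ..
```

Listing with t before extracting with x is a cheap way to see which directory an archive will create.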

Rebuilding NASM

Once you get tar to extract all the files from the archive, you'll notice that tar has created a new directory on your hard drive. Use cd to move to this directory:

cd nasm-0.98

There will be a fair number of files in this directory. The next step configures NASM's make files for rebuilding. You execute this step with the following command:

./configure

The configure step looks at your system, sees what C compilers you have installed, and tests those it finds for suitability. It looks to see what C library your system is using, checks a few other things, and finally creates the make files it will need to recreate the NASM binaries. Once configure has completed its job, you need to execute one very simple command:

make

This will do a lot, though it won't take a great deal of time, especially if you have a reasonably fast machine and a fast hard drive. (Mine is a 400-MHz Pentium II and the whole build took about 15 seconds.) A great many obscure messages will flow by on your screen. Many of them will be warnings, but you don't need to be concerned about those; the compiler is simply complaining about things in the NASM source code that aren't simon-pure by its own reckoning. A warning is not an indication that the compiler can't understand something or generate correct code.

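Building alone leaves the new binaries in the source directory; running make install (as root) copies them to the default install location. Once NASM is installed, it makes sense to add the bin directory where it lives to your search path. This command will do it: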

PATH=$PATH:/usr/local/bin
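
An assignment typed at the prompt lasts only for the current shell session. Here is a minimal sketch, assuming a Bourne-style shell such as bash. It appends the directory only if it is not already on the path, exports the result so that child processes see it, and then asks the shell where it finds nasm. The variable name NASM_BIN is made up for this example.

```shell
# NASM_BIN is a hypothetical variable name; adjust the directory if needed.
NASM_BIN=/usr/local/bin

# Append only if the directory is not already in the search path.
case ":$PATH:" in
  *":$NASM_BIN:"*) ;;
  *) PATH="$PATH:$NASM_BIN" ;;
esac
export PATH

# Report where (or whether) the shell finds nasm.
command -v nasm || echo "nasm not found on the search path"
```

To make the change permanent, you would put similar lines in your shell's startup file.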

Obviously, if you installed NASM somewhere else (and the preceding path is simply where the NASM make process installs it by default), enter the full path after the colon. At this point, NASM is there, installed as a brand-new binary, and ready to go to work.

But there's a lot to talk about first. NASM, like a lot of things in the Linux world, does not work alone, nor in a vacuum.

What's GNU?

Way back in the early 1980s, a wild-eyed Unix hacker named Richard Stallman wanted his own copy of Unix. He didn't want to pay for it, however, so he did the obvious thing: He began writing his own version. (If it's not obvious to you, well, you don't understand Unix culture.) However, he was unsatisfied with all the programming tools currently available and objected to their priciness as well. So, as a prerequisite to writing his own version of Unix, Stallman set out to write his own compiler, assembler, and debugger. (He had already written his own editor, the legendary EMACS.)

Stallman had named his version of Unix GNU, a recursive acronym meaning GNU's Not Unix. This was a good chuckle, and one way of getting past AT&T's trademark lawyers, who were fussy in those days about who used the word Unix and how. As time went on, the GNU tools (the C compiler and its other Swiss army knife go-alongs) took on a life of their own, and as it happened, Stallman never actually finished GNU itself. Other free versions of Unix appeared, and there was some soap opera for a few years regarding who actually owned what parts of which. This so disgusted Stallman that he created the Free Software Foundation as the home base for GNU tools development and created a radical sort of software license called the GNU General Public License (GPL), which is sometimes informally called "copyleft." Stallman released the GNU tools under the GPL, which not only required that the software be free (including all source code), but prevented people from making minor mods to the software and claiming the derivative work as their own. Changes and improvements had to be given back to the GNU community.

This seemed to be major nuttiness at the time, but over the years since then it has taken on a peculiar logic and life of its own. The GPL has allowed software released under the GPL to evolve tremendously quickly, because large numbers of people were using it and improving it and giving back the improvements without charge or restriction. Out of this bubbling open source pot eventually arose Linux, the premier GPL operating system. Linux was built with and is maintained with the GNU tool set. If you're going to program under Linux, regardless of what language you're using, you will eventually use one or more of the GNU tools.

The Swiss Army Compiler

The copy of EMACS that you will find on modern distributions of Linux doesn't have a whole lot of Richard Stallman left in it; it's been rewritten umpteen times by many other people over the past 20-odd years. Where the Stallman legacy persists most strongly is in the GNU compilers. There are a number of them, but the one that you must understand as thoroughly as possible is the GNU C Compiler, gcc. (Lowercase letters are something of an obsession in the Unix world, a fetish not well understood by a lot of people, myself included.)

Why use a C compiler for working in assembly? Two reasons:

Most of Linux and all of the standard C library for Linux are written in C for gcc. The C library is the only reasonable way to communicate with Linux from an assembly program. Gcc has a great deal of intimate knowledge of the standard C library that you'll need to learn if you choose not to use it. Love Linux, love gcc. There's no way around it.

More interestingly, gcc does much more than simply compile C code. It's a sort of Swiss army knife development tool. In fact, I might better characterize what it does as building software rather than simply compiling it. In addition to compiling C code to object code, gcc governs both the assembly step and the link step.

Assembly step? Yes, indeedy. There is a GNU assembler, gas. And a GNU linker, ld. What gcc does is control them like puppets on strings. If you use gcc (especially at the beginner level), you don't have to do much messing around with gas and ld.

Let's talk more about this.

Building Code the GNU Way

Assembly language work is a departure from C work, and gcc is first and foremost a C compiler. So, we need to look first at the process of building C code. On the surface, building a C program for Linux using the GNU tools is pretty simple. Behind the scenes, however, it's a seriously hairy business. While it looks like gcc does all the work, what gcc really does is act as master controller for several GNU tools, supervising a code assembly line that you don't need to see unless you specifically want to.

Theoretically, this is all you need to do to generate an executable binary file from C source code:

gcc eatc.c -o eatc

Here, gcc takes the file eatc.c (which is a C source code file) and crunches it to produce the file eatc. (The -o option tells gcc what to name the executable output file.) Note well that in the Linux world, executable files typically do not have file extensions, as they do under DOS and Windows. What might be eatc.com or eatc.exe under DOS is simply eatc under Linux.

However, there's more going on here than meets the eye. Take a look at Figure 12.1 as we go through it. In the figure, shaded arrows indicate movement of information. Blank arrows indicate program control.

Figure 12.1: How gcc builds Linux executables.

The programmer invokes gcc from the shell command line. gcc takes control of the system and immediately invokes a utility called the C preprocessor, cpp. The preprocessor takes the original C source code file and handles certain items like #includes and #defines. It can be thought of as a sort of macro expansion pass on the source code file, if "macro expansion pass" means anything to you. If not, don't fret it—it's a C thing and not germane to assembly work.

When cpp is finished with its work, gcc takes over in earnest. From the preprocessed source code file, gcc generates an assembly language source code file with a .s file extension. This is literally the assembly code equivalent of the C statements in the original .c file, in human-readable form. If you develop any skill in reading AT&T assembly syntax and mnemonics, you can learn a lot from inspecting the .s files produced by gcc.

When gcc has completed generating the assembly language equivalent of the C source code file, it invokes the GNU assembler, gas, to assemble the .s file into object code. This object code is written out in a file with a .o extension.

The final step involves the GNU linker, ld. The .o file contains binary code, but it's only the binary code generated from statements in the original .c file. The .o file does not contain the code from the standard C libraries that are so important in C programming. Those libraries have already been compiled and simply need to be linked into your application. The linker ld does this work at gcc's direction. The good part is that gcc knows precisely which of the standard C libraries need to be linked to your application to make it work, and it always includes the right libraries in their right versions. So, although gcc doesn't actually do the linking, it knows what needs to be linked—and that is valuable knowledge indeed, as you will learn if you ever try to invoke ld manually.

At the end of the line, ld spits out the fully linked and executable program file. At that point, the build is done, and gcc returns control to the Linux shell. Note that all of this is typically done with one simple command to gcc!

How We Use gcc in Assembly Work

The process I just described, and drew out for you in Figure 12.1, is how a C program is built under Linux using the GNU tools. I went into some detail here because we're going to use part—though only part—of this process to make our assembly programming easier. It's true that we don't need to convert C source code to assembly code—and in fact, we don't need gas to convert gas assembly source code to object code. But we need gcc's expertise at linking. Linking a Linux program is much more complex than linking a simple DOS program. So we're going to tap in to the GNU code-building process at the link stage, so that gcc can coordinate the link step for us.

When we assemble a Linux program using NASM, NASM generates a .o file containing binary object code. Invoking NASM under Linux is typically done this way:

nasm -f elf eatlinux.asm

This command will direct NASM to assemble the file eatlinux.asm and generate a file called eatlinux.o. The "-f elf" part of it tells NASM to generate object code in the ELF format (the acronym means Executable and Linking Format, so saying "ELF format" is redundant even though everyone does it) rather than one of the numerous other object code formats that NASM is capable of producing. The eatlinux.o file is not by itself executable. It needs to be linked. So, we call gcc and instruct it to link the program for us:

gcc eatlinux.o -o eatlinux

What in this command tells gcc to link and not compile? The only input file called out in the command is a .o file containing object code. This fact alone tells gcc that all that needs to be done is to link the file with the C library to produce the final executable. The "-o eatlinux" part tells gcc that the name of the final executable file is to be eatlinux. (Remember that Linux does not use file extensions on executable program files.)

Including the -o specifier is important. If you don't tell gcc precisely what to name the final executable file, it will name that file a.out. Yes, a.out, every time, irrespective of what your object file or source files are called.

Why Not gas?

You might be wondering why, if there's a perfectly good assembler installed automatically with every copy of Linux, I'm bothering to show you how to install and use another one. First of all, there is no gas lookalike for DOS as best I know, so you can't take your first steps in gas assembly while working with DOS. But more important, gas uses a peculiar syntax that is utterly unlike that of all the other familiar assemblers used in the x86 world (MASM and TASM as well as NASM) and a whole set of instruction mnemonics unique to itself. I find them ugly, nonintuitive, and hard to read. This is the AT&T syntax, so called because it was created by AT&T as a portable assembly notation to make Unix easier to port from one underlying CPU to another. It's ugly because it was designed to be generic, and it can be recast for any reasonable CPU you could come up with. (Don't forget that Unix significantly predates the x86, and gas's predecessor is older than the x86.)

If it were this simple, I wouldn't mention gas at all, since you don't need to use it to write Linux code in NASM. However, one of the major ways you'll end up learning many of the standard C library calls is by using them in short C programs and then inspecting the assembly output gcc generates. (I have more to say about this later on.) What gcc generates first when it compiles a C program is a file (with a .s extension) of assembly language source code using the AT&T syntax and mnemonics. It may not be necessary to learn the AT&T syntax thoroughly enough to write it, but it will be very helpful if you can pick it up well enough to read it. I'll show you an example later on, and when I do I'll summarize the important differences between AT&T and the NASM syntax and mnemonics, which are more properly called the Intel syntax and mnemonics.