Assembly Language Step by Step Programming with DOS and Linux 2nd Ed 2000.pdf

A Framework to Build On

We've been through some pretty substantial programs at the end of our DOS sojourn, so rather than start again with the most primitive "eat at Joe's" one-liner, I'll present a sort of boilerplate assembly program that provides some useful mechanisms that nearly all programs will find handy. The beginning and end are set up for you; when you want to create a new assembly language program for Linux, you just load the boilerplate program and fill in the middle with your own code.

So let's get started. Here it is. Read it over carefully:

; Source name     : BOILER.ASM
; Executable name : BOILER -- though this isn't intended to be run!
; Version         : 1.0
; Created date    : 10/1/1999
; Last update     : 10/18/1999
; Author          : Jeff Duntemann
; Description     : A "skeleton" program in assembly for Linux, using NASM 0.
;
; Build using these commands:
;    nasm -f elf boiler.asm
;    gcc boiler.o -o boiler
;
; HOWEVER, the program as given here is "boilerplate" and has nothing "useful"
; to do. The idea is to give you a head start on new projects, by providing
; the things that every (or nearly every) simple Linux assembly program must
; have.

[SECTION .text]         ; Section containing code

global main             ; Required so linker can find entry point

main:
        push ebp        ; Set up stack frame
        mov ebp,esp     ; ebp is our "thumb" in the stack
        push ebx        ; Program must preserve ebp, ebx, esi, & edi
        push esi
        push edi

;;; Everything before this is boilerplate; use it for all ordinary apps!

;;; This is where you put your own code!

;;; Everything after this is boilerplate; use it for all ordinary apps!

        pop edi         ; Restore saved registers
        pop esi
        pop ebx
        mov esp,ebp     ; Destroy stack frame before returning
        pop ebp
        ret             ; Return control to Linux

[SECTION .data]         ; Section containing initialized data

[SECTION .bss]          ; Section containing uninitialized data

Saving and Restoring Registers

One of the odder provisions of the C calling conventions that I described earlier is that a program may not arbitrarily change all general-purpose registers. To me this is dumb; if the operating system doesn't want an application to change certain registers, it should save those register values before handing control to the application. However, we must deal with what is, as they say, and the best way to do that is to just save the registers that must be saved before we begin, and restore them again before we pack it up and go home.

The registers that cannot be changed by a Linux application are EBX, ESP, EBP, ESI, and EDI. You'll notice that BOILER.ASM saves these registers when the program begins (EBX, EBP, ESI, and EDI on the stack, and ESP in EBP), and then restores them before control returns to Linux.

One very important but extremely nonobvious conclusion you must draw from this requirement to save EBX, ESP, EBP, ESI, and EDI is that the other general-purpose registers may be trashed. Yes, trashed, and not only by you. When you call procedures written by other people (primarily in the standard C libraries and in utility libraries such as ncurses), those procedures may alter the values in EAX, ECX, and EDX. (The stack pointer ESP is a special case and needs special care of a sort not applicable to other registers.)

What this means for you is that you cannot assume that (for example) a counter value you're tracking in ECX will be left untouched when you call a C library function such as printf. If you're using ECX to count passes through a loop that calls a library function, or any function that you yourself didn't write, you must save your value of ECX on the stack before you call the library function and restore it after the library function returns. The same applies to EAX and EDX. (EAX is often used to return values from library functions, so it's not a good idea to use it to store counters and addresses and such when you're making library function calls.) If you need to keep their values intact across a call to a library function, you must save them to the stack before the library function is called.
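Here is a minimal sketch of that save-and-restore discipline, built on the BOILER.ASM skeleton. The label names (fmt, .again) and the message text are mine, invented for illustration. I'm assuming the same 32-bit build commands given in the BOILER.ASM header; on a current 64-bit distribution you would probably need extra gcc options such as -m32 and -no-pie, which are not part of the original build instructions.

```nasm
; Sketch: keeping a loop counter in ECX alive across calls to printf.
; Assumes 32-bit Linux and the C calling conventions described above.

[SECTION .text]

extern printf           ; printf lives in the standard C library
global main

main:
        push ebp        ; Set up stack frame
        mov ebp,esp
        push ebx        ; Preserve the sacred registers
        push esi
        push edi

        mov ecx,3       ; ecx counts passes through the loop
.again:
        push ecx        ; Save the counter; printf may trash ecx
        push ecx        ; Argument 2: the value to display
        push dword fmt  ; Argument 1: address of the format string
        call printf
        add esp,8       ; Remove the two arguments from the stack
        pop ecx         ; Restore the counter we saved
        dec ecx
        jnz .again      ; Loop until the counter reaches zero

        pop edi         ; Restore saved registers
        pop esi
        pop ebx
        mov esp,ebp     ; Destroy stack frame before returning
        pop ebp
        ret

[SECTION .data]

fmt:    db "Pass %d",10,0   ; Hypothetical message; 10 is newline
```

Note that ECX is pushed twice: once to save it, and once more as an argument to printf. Only the argument copy is discarded by the add esp,8; the saved copy is what the pop ecx recovers.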

On the other hand, the sacred nature of EBX, EBP, ESI, and EDI means that these registers will keep their values when you make C library calls. What is binding on you is binding on the C library as well. Library functions that must use these registers save and restore them without any attention from you.
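The flip side can be sketched the same way. If the counter lives in EBX instead of ECX, no saving is needed around the call, because the library function is bound by the same rules you are. This is only an illustration under the same 32-bit build assumptions as BOILER.ASM; the label names and message text are hypothetical.

```nasm
; Sketch: a loop counter kept in EBX survives C library calls untouched.

[SECTION .text]

extern puts             ; puts lives in the standard C library
global main

main:
        push ebp        ; Set up stack frame
        mov ebp,esp
        push ebx        ; We save the caller's ebx here, so we are
        push esi        ;  free to use ebx ourselves in the middle
        push edi

        mov ebx,3       ; Counter in a sacred register
.again:
        push dword msg  ; Argument: address of the string to display
        call puts       ; puts must hand ebx back unchanged
        add esp,4       ; Remove the argument from the stack
        dec ebx
        jnz .again      ; Loop until the counter reaches zero

        pop edi         ; Restore saved registers
        pop esi
        pop ebx
        mov esp,ebp     ; Destroy stack frame before returning
        pop ebp
        ret

[SECTION .data]

msg:    db "Still counting",0   ; Hypothetical message text
```

The price of this convenience is already paid in the boilerplate: you may use EBX freely only because the caller's value is pushed at the top of the program and popped at the bottom.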

Setting Up a Stack Frame

The stack is extremely important in assembly language work, and this is doubly true in Linux work, because Linux is a C world, and in C (as in most high-level languages including Pascal) the stack has a central role. The reason for this is simple: Compilers are machines that write assembly language code, and they are not human and clever like you. (Although I've met some people who appear less intelligent than some of your better compilers . . .) This means a compiler has to use what might seem brute force methods to create its code, and most of those methods depend heavily on the use of the stack.

Compiler code generation is doctoral thesis stuff and I won't have much more to say about it in this book. One compiler mechanism that bears on Linux assembly work is that of the stack frame. Compilers depend on stack frames to create local variables in functions (in Pascal we call them procedures), and while stack frames are less useful in assembly work, you must understand them, because they provide an easy way to access command-line arguments and environment variables.

A stack frame is a location on the stack marked as belonging to a particular function. It is basically the region between the addresses contained in two registers: base pointer EBP, and stack pointer ESP. This draws better than it explains; see Figure 13.1.

Figure 13.1: A stack frame.

A stack frame is created by pushing the caller's copy of EBP on the stack to save it, and then copying the caller's stack pointer ESP into register EBP. The first two instructions in any assembly program that honors the C calling conventions must be these:

push ebp

mov ebp,esp

After this, you must either leave EBP alone or, if you must use it in a serious pinch, make sure you restore it before the change violates any C library assumptions. (I recommend leaving it alone!) EBP is considered the anchor of your new stack frame, which is the main reason it shouldn't be changed. There are things stored on the stack above (that is, at higher addresses than) your stack frame that often need to be referenced in your code, and EBP is the only safe way to reference them. (These things aren't shown in Figure 13.1, but I return to them later in this chapter.)

Less obvious is the fact that EBP is also the hidey-hole in which you stash the caller's stack pointer value, ESP. This is yet another reason not to change EBP once you create your stack frame. Returning control at the end of your program with a random value in ESP is the shortest path to trouble I could name.

Once EBP is safely anchored as one end of your stack frame, the stack pointer ESP is free to move up and down the stack as required. The first things you need to put on the stack, however, are the caller's values for EBX, ESI, and EDI, as shown in Figure 13.1. The order in which these three are saved isn't crucial, but the order I show in Figure 13.1 is customary. They will be popped back off the stack when the stack frame is destroyed at the end of your program, handing back to the caller (which in our case is the startup/shutdown code from the C library) the same values those registers had when the startup code called your program as the function main.

But once EBX, ESI, and EDI are there, you can push and pop whatever you need to for temporary storage. Calling C library functions requires a fair amount of pushing and popping, as we see shortly.
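To make the anchoring role of EBP concrete, here is a sketch that spells out where things sit relative to EBP once the boilerplate has run, and then reaches above the frame for one value. The offsets are those of the standard 32-bit C calling convention for main (the argument count at [ebp+8]), which I'm assuming here; the label fmt and the message text are hypothetical.

```nasm
; Sketch: what lies at fixed offsets from ebp after the boilerplate.
;
;   [ebp+12]  argv (address of the table of argument strings)
;   [ebp+8]   argc (count of command-line arguments)
;   [ebp+4]   return address into the startup/shutdown code
;   [ebp]     caller's ebp
;   [ebp-4]   caller's ebx
;   [ebp-8]   caller's esi
;   [ebp-12]  caller's edi   <-- esp points here after the last push

[SECTION .text]

extern printf
global main

main:
        push ebp        ; Set up stack frame
        mov ebp,esp
        push ebx        ; Preserve the sacred registers
        push esi
        push edi

        push dword [ebp+8]  ; Argument 2: argc, fetched through ebp
        push dword fmt      ; Argument 1: address of the format string
        call printf
        add esp,8           ; Remove the two arguments from the stack

        pop edi         ; Restore saved registers
        pop esi
        pop ebx
        mov esp,ebp     ; Destroy stack frame before returning
        pop ebp
        ret

[SECTION .data]

fmt:    db "argc = %d",10,0
```

Because ESP moves with every push and pop, an offset from ESP would change as the program runs. The offsets from EBP stay put for as long as you leave EBP alone.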

Destroying a Stack Frame

Before your program ends its execution by returning control to the startup/shutdown code (refer back to Figure 12.2 if this relationship isn't clear), its stack frame must be destroyed. This sounds to many people like something wrong is happening, but not so: The stack frame must be destroyed, or your program will crash. "Put away" might be a better term than "destroyed" . . . but let it pass. What we must do is leave the stack and the sacred registers in the same state they had when your program received control from the startup code.

Your stack must be clean before you destroy the stack frame and return control. This simply means that any temporary values that you may have pushed onto the stack during the program's run must be gone. All that is left on the stack should be the caller's EBP, EBX, ESI, and EDI values. Basically, if EDI was the last of the caller's values that you saved on the stack, ESP (the stack pointer) had better be pointing to that saved EDI value, or there will be trouble.
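A clean stack can be checked with simple arithmetic. With the three registers saved as in BOILER.ASM, ESP should sit exactly 12 bytes below EBP when you reach the closing boilerplate. The following sketch pushes two temporary values, removes them in two different ways, and then hands back the difference as the return value of main, so the program's exit status is 0 when the stack is clean. The temporary values are arbitrary, and the self-check is my own illustration rather than something the boilerplate requires.

```nasm
; Sketch: cleaning temporaries off the stack and verifying the result.

[SECTION .text]

global main

main:
        push ebp        ; Set up stack frame
        mov ebp,esp
        push ebx        ; Three saved registers: esp = ebp-12
        push esi
        push edi

        push dword 111  ; Temporary value 1
        push dword 222  ; Temporary value 2
        pop eax         ; Remove 222 by popping it into a register
        add esp,4       ; Discard 111 without popping it anywhere

        mov eax,ebp     ; Check: is esp back at ebp-12?
        sub eax,esp     ; eax = ebp - esp
        sub eax,12      ; eax = 0 if the stack is clean

        pop edi         ; Restore saved registers
        pop esi
        pop ebx
        mov esp,ebp     ; Destroy stack frame before returning
        pop ebp
        ret             ; eax is handed back as main's return value
```

After running the program from the shell, echo $? displays the exit status. Remove the add esp,4 line and the status becomes 4 instead of 0, and the pops that follow hand the wrong values back to the caller.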

Once your stack is clean, to destroy the stack frame you must first pop the caller's register values back into their registers, making sure your pops are in the correct order. Handing back the caller's EBX value in EDI will still crash your program! With that done, we undo the logic we followed in creating the stack frame: We restore the caller's ESP by moving the value from EBP into ESP, and finally pop the caller's EBP value off the stack:

mov esp,ebp

pop ebp

That's it! The stack frame is gone, and the stack and sacred registers are now in the same state they were in when the startup code handed control to our program. It's now safe to execute the RET instruction that sends control to the shutdown code from the C library.
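As an aside that goes slightly beyond BOILER.ASM: the x86 instruction set includes a single instruction, LEAVE, that performs exactly the mov esp,ebp and pop ebp pair. The boilerplate spells the two steps out, which is easier to follow; the sketch below shows the shorter form only so you'll recognize it in other people's code. It assumes the same 32-bit build as BOILER.ASM.

```nasm
; Sketch: the boilerplate ending with LEAVE in place of the two-step form.

[SECTION .text]

global main

main:
        push ebp        ; Set up stack frame
        mov ebp,esp
        push ebx        ; Preserve the sacred registers
        push esi
        push edi

        xor eax,eax     ; Return value of 0 from main

        pop edi         ; Restore saved registers
        pop esi
        pop ebx
        leave           ; Same effect as: mov esp,ebp / pop ebp
        ret             ; Return control to the startup/shutdown code
```

Either form leaves ESP pointing at the return address, which is the only state in which RET is safe to execute.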

The file BOILER.ASM I showed earlier (it's on the CD-ROM for this book) is a boilerplate Linux assembly language program. It has a comment header, the three sections [.text], [.data], and [.bss], and all the code necessary to create and then destroy a stack frame. In between, you place the code for your own programs. All of the programs we create in the rest of this chapter will be built on this common framework.