Assembly Language Step by Step Programming with DOS and Linux 2nd Ed 2000.pdf

Accessing Command-Line Arguments

One of the most useful things to be able to do when writing simple utilities is to pass them parameters—C people call them arguments—on the command line. If you're working in C or Pascal, these are set up as predefined arrays or functions and are a snap to use. In assembly, there's no such convenience. (Surprise!) You have to know where and how they're stored, which is (alas) nontrivial.

On the other hand, getting at command-line arguments is a wonderful exercise both in using pointers and in accessing the stack up-memory from EBP, where a number of interesting things live. EBP is your marker driven into the stack, and it anchors your access to both your own items (stored down-memory from EBP) and those owned by the runtime library, which are up-memory.

Because we're talking about a pointer to a pointer to a table of pointers to the actual argument strings, the best way to begin is to draw a picture of the lay of the land. Figure 13.3 shows the pointer relationships and stack structures we have to understand to identify and read the command-line arguments.

Figure 13.3: Linux command-line arguments.

As I explained earlier in this chapter, when Linux passes control to your program, the C library's startup code gets control first, and it sets up a number of things for you, before the code that you wrote ever begins executing. One of the things the startup code does is set up your program's access to the command-line arguments. It does this by building a table of pointers to the arguments and placing a pointer to that table of pointers on the stack, up-memory from EBP.

The startup code places other things on the stack as well. The doubleword at EBP itself holds the caller's EBP value, which your own code pushes when it sets up its stack frame. Immediately above that, at EBP+4, is the return address for your portion of the code. When your code is done, it executes a RET instruction, which uses that return address to take execution back into the runtime library code's shutdown sequence. The shutdown sequence has its own return address, which eventually takes it back into Linux. You don't need to access the return address for anything, and you certainly shouldn't change it!

Immediately above the return address, at offset 8 from EBP (as the literature would say, at EBP+8) is an integer count of the number of arguments. There will always be at least one, because the name of the program is the first command-line argument. (After all, you typed the name on the command line, no?)

Immediately above the argument count, at EBP+12, is a pointer to the argument table. Immediately above that, at EBP+16, is a pointer to the table of environment variable pairs. Reading environment variable pairs is done pretty much the same way as reading command-line arguments, so if you understand one, you won't have much trouble with the other.

Addressing the Stack Relative to EBP

Our crooked trail to the command-line arguments begins at the address stored in EBP. EBP is your anchor point in the stack. It allows you to access the stack using more than just the PUSH and POP instructions. The stack is just memory, after all, and can be addressed through registers just as any area of memory can (assuming you have permissions in that area, one always has to say when speaking of Unix . . .).

Such addressing is done via offsets from the address stored in EBP. Here's a simple example:

    mov ecx,[ebp+8]     ; Load argument count into ecx
    mov ebx,[ebp+12]    ; Load pointer to argument table into ebx

The first instruction copies the argument count from the stack into ECX. If you refer to Figure 13.3, you'll see that the argument count is stored on the stack 8 bytes up-memory from EBP. So, by adding the displacement 8 to the address in EBP, you go right to it. Similarly, the second instruction copies the pointer to the argument table from its spot on the stack, 12 bytes up-memory from EBP, into EBX.

Once you have these two items in registers, you're most of the way there. With the pointer to the argument table in EBX (as the preceding code snippet shows), you now have a pointer to the first element in the argument table, which is always a pointer to the program name as you typed it on the command line. (You're following all this on Figure 13.3, aren't you?)
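As a minimal sketch of the two loads just described, the following program prints the argument count and the program name. The file name ARGNAME.ASM, the labels countmsg and namemsg, and the equates ARGC and ARGV are hypothetical names chosen for illustration. The build commands assume a 32-bit system; on a 64-bit system you would probably need something like gcc -m32 -no-pie.

```nasm
; ARGNAME.ASM (hypothetical name): print the argument count and program name.
; Assumed build:
;   nasm -f elf argname.asm
;   gcc argname.o -o argname

ARGC    equ 8                   ; Offset of argument count from ebp
ARGV    equ 12                  ; Offset of pointer to argument table from ebp

[SECTION .text]

global main
extern printf

main:
        push ebp                ; Set up stack frame
        mov ebp,esp
        push ebx                ; ebx must be preserved for the caller

        mov ecx,[ebp+ARGC]      ; Load argument count into ecx
        mov ebx,[ebp+ARGV]      ; Load pointer to argument table into ebx

        push ecx                ; Push the count for %d
        push dword countmsg     ; Push address of the format string
        call printf             ; (printf may trash ecx; we no longer need it)
        add esp,8               ; Clean up the two pushed dwords

        push dword [ebx]        ; First table entry: pointer to program name
        push dword namemsg      ; Push address of the format string
        call printf
        add esp,8               ; Clean up the two pushed dwords

        xor eax,eax             ; Return 0 from main
        pop ebx                 ; Restore saved register
        mov esp,ebp             ; Destroy stack frame
        pop ebp
        ret

[SECTION .data]

countmsg db "Argument count: %d",10,0
namemsg  db "Program name: %s",10,0
```

Run as ./argname foo bar, this should print an argument count of 3 followed by the program name ./argname.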

Scaled Addressing

One of the marvelous new features introduced on the x86 architecture with the 386 is scaled addressing. I described this earlier in this chapter, but it's worth recapping as I explain how to use it to access the rest of the arguments through the table of argument pointers.

We now have a pointer to the beginning of the argument pointer table, stored in EBX. Obtaining the other pointers in the table requires that we somehow index into that table. Scaled addressing is the best way to do it. With scaled addressing, we can multiply a register value by 2, 4, or 8, and add it to the base register to generate the final address.

Consider the argument pointer I marked as Arg(1) in Figure 13.3. It's a pointer, and like all pointers in protected mode flat model, it's 32 bits—4 bytes—in size. With another 4-byte pointer beneath it, Arg(1) is 4 bytes from the beginning of the pointer table.

So, let's do some pointer math. We start with the address of pointer Arg(0), lying at the very beginning of the table. We need to add 4 to it to reach Arg(1). Here's the algorithm:

<BASE POINTER> + (<ARGUMENT INDEX> X 4)

The base pointer we already have in EBX. The argument index for Arg(0) is 0, for Arg(1) is 1, and so on. The address for Arg(1) would thus be the base pointer plus 1 times 4—which points at the pointer to Arg(1). The address for Arg(3) would be the base pointer plus 3 times 4, or 12 bytes from the start of the table. As you can see in Figure 13.3, that's exactly where Arg(3) is. The way this encodes in NASM syntax is this:

    push dword [ebx+esi*4]    ; Push address of an arg on the stack

We're pushing the pointer onto the stack here, but scaled addressing can of course be used anywhere you can use a memory address in assembly work. The important part of the notation is [EBX + ESI * 4]. This is the implementation of our addressing algorithm, and it's baked right into the silicon of the CPU!
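To see the base-plus-scaled-index form pick out one specific table entry, here is a small sketch that prints only the last argument. It computes the index as the argument count minus 1 and lets the CPU do the multiply-by-4. The file name LASTARG.ASM and the label lastmsg are hypothetical, and the build commands assume a 32-bit system (a 64-bit system would probably need gcc -m32 -no-pie).

```nasm
; LASTARG.ASM (hypothetical name): print the last command-line argument.
; Assumed build:
;   nasm -f elf lastarg.asm
;   gcc lastarg.o -o lastarg

[SECTION .text]

global main
extern printf

main:
        push ebp                ; Set up stack frame
        mov ebp,esp
        push ebx                ; Preserve the registers we use
        push esi

        mov esi,[ebp+8]         ; esi = argument count (the text says >= 1)
        mov ebx,[ebp+12]        ; ebx = pointer to argument table
        dec esi                 ; Index of last argument = count - 1

        push dword [ebx+esi*4]  ; Push pointer at table base + index * 4
        push esi                ; Push the index for %d
        push dword lastmsg      ; Push address of the format string
        call printf
        add esp,12              ; Clean up the three pushed dwords

        xor eax,eax             ; Return 0 from main
        pop esi                 ; Restore saved registers
        pop ebx
        mov esp,ebp             ; Destroy stack frame
        pop ebp
        ret

[SECTION .data]

lastmsg db "Last argument is number %d: %s",10,0
```

Run as ./lastarg foo bar bas, the count is 4, so the index is 3 and the program should report argument number 3 as bas.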

I've written a short program that displays all the command-line arguments, and in doing so demonstrates how to use scaled addressing to get the address of any given argument. Read it carefully:

;  Source name     : SHOWARGS.ASM
;  Executable name : SHOWARGS
;  Version         : 1.0
;  Created date    : 10/1/1999
;  Last update     : 12/3/1999
;  Author          : Jeff Duntemann
;  Description     : A demo that shows how to access command line arguments
;                    stored on the stack by addressing them relative to ebp.
;
;  Build using these commands:
;    nasm -f elf showargs.asm
;    gcc showargs.o -o showargs
;
;  To test, execute with some command-line arguments:
;    ./showargs foo bar bas bat

[SECTION .text]                 ; Section containing code

global main                     ; Required so linker can find entry point
extern printf                   ; Notify linker that we're calling printf

main:
        push ebp                ; Set up stack frame for debugger
        mov ebp,esp
        push ebx                ; Program must preserve ebp, ebx, esi, & edi
        push esi
        push edi
;;; Everything before this is boilerplate; use it for all ordinary apps!

        mov edi,[ebp+8]         ; Load argument count into edi
        mov ebx,[ebp+12]        ; Load pointer to argument table into ebx
        xor esi,esi             ; Clear esi to 0
.showit:
        push dword [ebx+esi*4]  ; Push address of an arg on the stack
        push esi                ; Push arg number on the stack
        push dword argmsg       ; Push address of display string on the stack
        call printf             ; Display the arg number and arg
        add esp, byte 12        ; Clean up stack after printf call
        inc esi                 ; Bump arg number to next arg
        dec edi                 ; Decrement arg counter by 1
        jnz .showit             ; If arg count is 0, we're done

;;; Everything after this is boilerplate; use it for all ordinary apps!
        pop edi                 ; Restore saved registers
        pop esi
        pop ebx
        mov esp,ebp             ; Destroy stack frame before returning
        pop ebp
        ret                     ; Return control to Linux

[SECTION .data]                 ; Section containing initialized data

argmsg  db "Argument %d: %s",10,0

[SECTION .bss]                  ; Section containing uninitialized data

The logic I followed is this: We begin by copying the argument count into EDI and the pointer to the start of the argument pointer table into EBX. We clear ESI to 0 by XORing it against itself. With that accomplished, we go into a loop that pushes the argument pointer, the argument number, and the address of a format string onto the stack and calls printf to display them. After printing each argument, we increment the argument number in ESI and decrement the argument count in EDI. When EDI goes to 0, we've displayed all the arguments, and we're done.

One final note on this program, which I've said before but must emphasize: If you're calling a C library function in a loop, you must either hold the counters that govern the loop in the sacred registers, or push them onto the stack before making a library call. The library trashes the nonsacred registers such as EAX, ECX, and EDX. If you had tried to store the argument count in ECX, the count would have been destroyed the first time you called printf. The sacred nature of EBX, ESI, and EDI makes them ideal for this use. (EBP is reserved for use in addressing data on the stack, so don't try to use it for anything like counters unless you very carefully save its value on the stack!)
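The second option, pushing a nonsacred counter before the library call, might look like the sketch below. It is a variation on SHOWARGS.ASM that keeps the argument count in ECX and saves it around each call to printf. The file name SHOWARG2.ASM is hypothetical, and the build commands assume a 32-bit system (a 64-bit system would probably need gcc -m32 -no-pie).

```nasm
; SHOWARG2.ASM (hypothetical name): SHOWARGS with the loop counter kept in
; ecx and saved on the stack across each printf call.
; Assumed build:
;   nasm -f elf showarg2.asm
;   gcc showarg2.o -o showarg2

[SECTION .text]

global main
extern printf

main:
        push ebp                ; Set up stack frame
        mov ebp,esp
        push ebx                ; Preserve the sacred registers we use
        push esi

        mov ecx,[ebp+8]         ; Argument count in ecx (NOT safe across calls)
        mov ebx,[ebp+12]        ; Pointer to argument table in ebx
        xor esi,esi             ; Clear arg number to 0
.showit:
        push ecx                ; Save the counter before the library call
        push dword [ebx+esi*4]  ; Push address of an arg
        push esi                ; Push arg number
        push dword argmsg       ; Push address of display string
        call printf             ; printf is free to trash eax, ecx, and edx
        add esp,12              ; Clean up the three printf parameters
        pop ecx                 ; Restore the counter we saved
        inc esi                 ; Bump arg number to next arg
        dec ecx                 ; Decrement arg counter by 1
        jnz .showit             ; Loop until the counter reaches 0

        xor eax,eax             ; Return 0 from main
        pop esi                 ; Restore saved registers
        pop ebx
        mov esp,ebp             ; Destroy stack frame
        pop ebp
        ret

[SECTION .data]

argmsg  db "Argument %d: %s",10,0
```

The saved counter has to be pushed before the printf parameters and popped after the add esp,12 cleanup, so that it ends up back on top of the stack when it is restored.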

There is a pointer to a table of environment variables on the stack at EBP+16. It's set up pretty much the same way, so you could very easily create a program to print out all the environment strings in that table.

The major difference is this: There is no count of the number of environment variables stored anywhere. The end of the table of pointers to environment variables is marked by a null pointer; that is, a pointer whose value is 0. You have to fetch each pointer and test it against 0 before attempting to display data at the pointer address.

For tomorrow's assignment, modify SHOWARGS.ASM to display the environment variables as well. (I've written such a program, but it isn't printed here in the chapter. Find it on the CD-ROM to check your work: SHOWENV.ASM.)
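If you get stuck, here is one possible shape for the loop, offered as a sketch only. It is not the SHOWENV.ASM from the CD-ROM; the file name ENVSKETCH.ASM and the labels .nextenv, .done, and envmsg are hypothetical. It assumes, as described above, that the pointer to the environment table is at EBP+16 and that the table ends with a null pointer. The build commands assume a 32-bit system (a 64-bit system would probably need gcc -m32 -no-pie).

```nasm
; ENVSKETCH.ASM (hypothetical name): print each environment string.
; Assumed build:
;   nasm -f elf envsketch.asm
;   gcc envsketch.o -o envsketch

[SECTION .text]

global main
extern printf

main:
        push ebp                ; Set up stack frame
        mov ebp,esp
        push ebx                ; Preserve the sacred registers we use
        push esi

        mov ebx,[ebp+16]        ; Pointer to environment pointer table in ebx
        xor esi,esi             ; Clear table index to 0
.nextenv:
        mov eax,[ebx+esi*4]     ; Fetch the next pointer from the table
        test eax,eax            ; Is it the null pointer that ends the table?
        jz .done                ; If so, there is nothing more to display
        push eax                ; Push address of the environment string
        push dword envmsg       ; Push address of display string
        call printf
        add esp,8               ; Clean up the two pushed dwords
        inc esi                 ; Bump index to the next table entry
        jmp .nextenv

.done:
        xor eax,eax             ; Return 0 from main
        pop esi                 ; Restore saved registers
        pop ebx
        mov esp,ebp             ; Destroy stack frame
        pop ebp
        ret

[SECTION .data]

envmsg  db "%s",10,0
```

Because there is no count, the test against 0 comes before the pointer is used, so an empty environment table prints nothing.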