Accessing Command-Line Arguments
One of the most useful things to be able to do when writing simple utilities is to pass them parameters—C people call them arguments—on the command line. If you're working in C or Pascal, these are set up as predefined arrays or functions and are a snap to use. In assembly, there's no such convenience. (Surprise!) You have to know where and how they're stored, which is (alas) nontrivial.
On the other hand, getting at command-line arguments is a wonderful exercise both in the use of pointers and in accessing the stack up-memory from EBP, where a number of interesting things live. EBP is your marker driven into the stack, and it anchors your access to both your own items (stored down-memory from EBP) and those owned by the runtime library, which are up-memory.
Because we're talking about a pointer to a pointer to a table of pointers to the actual argument strings, the best way to begin is to draw a picture of the lay of the land. Figure 13.3 shows the pointer relationships and stack structures we have to understand to identify and read the command-line arguments.
Figure 13.3: Linux command-line arguments.
As I explained earlier in this chapter, when Linux passes control to your program, the C library's startup code gets control first, and it sets up a number of things for you before the code that you wrote ever begins executing. One of the things the startup code does is give your program access to the command-line arguments. The table of pointers to the argument strings is built in memory when Linux loads your program, and the startup code places a pointer to that table of pointers on the stack, up-memory from EBP.
The startup code places other things on the stack as well. The stack location EBP points to holds the caller's EBP value, saved there by the PUSH EBP that your code executes just before MOV EBP,ESP. Immediately above that, at EBP+4, is the return address for your portion of the code. When your code is done, it executes a RET instruction, which uses this return address to take execution back into the runtime library code's shutdown sequence. The shutdown sequence has its own return address, which eventually takes it back into Linux. You don't need to access the return address for anything, and you certainly shouldn't change it!
Immediately above the return address, at offset 8 from EBP (as the literature would say, at EBP+8), is an integer count of the number of arguments. There will always be at least one, because the name of the program is the first command-line argument. (After all, you typed the name on the command line, no?)
Immediately above the argument count, at EBP+12, is a pointer to the argument table. Immediately above that, at EBP+16, is a pointer to the table of environment variable pairs. Reading environment variable pairs is done pretty much the same way as reading command-line arguments, so if you understand one, you won't have much trouble with the other.
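For readers who know C, these three items are the parameters the C runtime hands to main(): the argument count, the pointer to the argument table, and the pointer to the environment table. The following C sketch models their positions relative to EBP, assuming the 32-bit C calling convention (4-byte stack slots) and a routine that opens with PUSH EBP / MOV EBP,ESP, as SHOWARGS does. The enum names and ebp_param_offset() are hypothetical names invented for illustration.

```c
#include <stddef.h>

/* Hypothetical model (illustrative names only) of the 32-bit stack frame
   main() sees after "push ebp / mov ebp,esp". Every slot is 4 bytes. */
enum {
    SLOT_SIZE       = 4,   /* one 32-bit stack slot */
    OFF_SAVED_EBP   = 0,   /* [ebp]    : caller's EBP, saved by push ebp */
    OFF_RETURN_ADDR = 4,   /* [ebp+4]  : return address into startup code */
    OFF_ARGC        = 8,   /* [ebp+8]  : argument count */
    OFF_ARGV        = 12,  /* [ebp+12] : pointer to argument pointer table */
    OFF_ENVP        = 16   /* [ebp+16] : pointer to environment pointer table */
};

/* Byte offset from EBP of main()'s Nth parameter (0 = count, 1 = argument
   table, 2 = environment table): skip the saved EBP and the return address,
   then move up one slot per parameter. */
size_t ebp_param_offset(size_t n)
{
    return 2 * SLOT_SIZE + n * SLOT_SIZE;
}
```

Each parameter sits one slot further up-memory, which is why the environment table pointer lands at EBP+16, right after the argument table pointer at EBP+12.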
Addressing the Stack Relative to EBP
Our crooked trail to the command-line arguments begins at the address stored in EBP. EBP is your anchor point in the stack. It allows you to access the stack using more than just the PUSH and POP instructions. The stack is just memory, after all, and can be addressed through registers just as any area of memory can (assuming you have permissions in that area, one always has to say when speaking of Unix . . .).
Such addressing is done via offsets from the address stored in EBP. Here's a simple example:
mov ecx,[ebp+8]     ; Load argument count into ecx
mov ebx,[ebp+12]    ; Load pointer to argument table into ebx
The first instruction copies the argument count from the stack into ECX. If you refer to Figure 13.3, you'll see that the argument count is stored on the stack 8 bytes up-memory from EBP. So, by adding the displacement 8 to the address in EBP, you go right to it. Similarly, the second instruction copies the pointer to the argument table from its spot on the stack, 12 bytes up-memory from EBP, into EBX.
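To make the displacement arithmetic concrete, here is a toy C model. It assumes a simulated byte array stands in for the real stack and an index into that array stands in for EBP; load_dword() and store_dword() are hypothetical helpers, not library calls. In this model, load_dword(mem, ebp, 8) plays the part of mov ecx,[ebp+8].

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read the 32-bit value at [base + disp] in a simulated, byte-addressed
   memory image, the way "mov ecx,[ebp+8]" reads the real stack.
   memcpy sidesteps alignment and aliasing problems. */
uint32_t load_dword(const unsigned char *mem, size_t base, int disp)
{
    uint32_t value;
    memcpy(&value, mem + base + disp, sizeof value);
    return value;
}

/* Write a 32-bit value at [base + disp]; used here to build the fake stack. */
void store_dword(unsigned char *mem, size_t base, int disp, uint32_t value)
{
    memcpy(mem + base + disp, &value, sizeof value);
}
```

The CPU does the same addition in hardware: it takes the address in the base register, adds the displacement encoded in the instruction, and reads 4 bytes from the result.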
Once you have these two items in registers, you're most of the way there. With the pointer to the argument table in EBX (as the preceding code snippet shows), you now have a pointer to the first element in the argument table, which is always a pointer to the program name as you typed it on the command line. (You're following all this on Figure 13.3, aren't you?)
Scaled Addressing
One of the marvelous new features introduced on the x86 architecture with the 386 is scaled addressing. I described this earlier in this chapter, but it's worth recapping as I explain how to use it to access the rest of the arguments through the table of argument pointers.
We now have a pointer to the beginning of the argument pointer table, stored in EBX. Obtaining the other pointers in the table requires that we somehow index into that table. Scaled addressing is the best way to do it. With scaled addressing, we can multiply a register value by 2, 4, or 8, and add it to the base register to generate the final address.
Consider the argument pointer I marked as Arg(1) in Figure 13.3. It's a pointer, and like all pointers in protected mode flat model, it's 32 bits—4 bytes—in size. With another 4-byte pointer beneath it, Arg(1) is 4 bytes from the beginning of the pointer table.
So, let's do some pointer math. We start with the address of pointer Arg(0), lying at the very beginning of the table. We need to add 4 to it to reach Arg(1). Here's the algorithm:
<BASE POINTER> + (<ARGUMENT INDEX> X 4)
The base pointer we already have in EBX. The argument index for Arg(0) is 0, for Arg(1) is 1, and so on. The address for Arg(1) would thus be the base pointer plus 1 times 4—which points at the pointer to Arg(1). The address for Arg(3) would be the base pointer plus 3 times 4, or 12 bytes from the start of the table. As you can see in Figure 13.3, that's exactly where Arg(3) is. The way this encodes in NASM syntax is this:
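The algorithm above can be written out as a small C function. This is a sketch assuming 32-bit addresses, so the sum wraps modulo 2^32 the way 32-bit address arithmetic does; scaled_address() is a hypothetical name, and it rejects any scale other than the four factors the 386 can encode (1, 2, 4, and 8).

```c
#include <stdbool.h>
#include <stdint.h>

/* Compute [base + index*scale] using 32-bit address arithmetic
   (unsigned overflow wraps modulo 2^32). Returns false, leaving *out
   untouched, for a scale factor the CPU cannot encode. */
bool scaled_address(uint32_t base, uint32_t index, uint32_t scale, uint32_t *out)
{
    if (scale != 1 && scale != 2 && scale != 4 && scale != 8)
        return false;                 /* not a legal x86 scale factor */
    *out = base + index * scale;      /* <BASE POINTER> + (<ARGUMENT INDEX> X 4) when scale is 4 */
    return true;
}
```

With a table starting at 1000H, index 3 and scale 4 give 100CH, 12 bytes into the table, which matches the Arg(3) example above.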
push dword [ebx+esi*4]    ; Push address of an arg on the stack
We're pushing the pointer onto the stack here, but scaled addressing can of course be used anywhere you can use a memory address in assembly work. The important part of the notation is [EBX + ESI * 4]. This is the implementation of our addressing algorithm, and it's baked right into the silicon of the CPU!
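A C compiler performs this same calculation every time you write argv[i]. The sketch below spells it out with explicit byte arithmetic so the correspondence with [EBX + ESI * 4] is visible. fetch_arg() is a hypothetical name; the scale is sizeof(char *), which is 4 on 32-bit x86 but 8 on x86-64, so the literal 4 in the assembly version holds only for 32-bit code.

```c
#include <stddef.h>

/* Fetch entry i of a table of string pointers by explicit byte arithmetic.
   table plays the role of EBX, i plays the role of ESI, and
   sizeof(char *) is the scale factor. */
const char *fetch_arg(char *const *table, size_t i)
{
    const char *bytes = (const char *)table;   /* base address, like EBX */
    size_t offset = i * sizeof(char *);        /* index * scale, like ESI*4 */
    return *(char *const *)(bytes + offset);   /* load the pointer stored there */
}
```

Calling fetch_arg(argv, 3) returns exactly the same pointer as argv[3].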
I've written a short program that displays all the command-line arguments, and in doing so demonstrates how to use scaled addressing to get the address of any given argument. Read it carefully:
; Source name     : SHOWARGS.ASM
; Executable name : SHOWARGS
; Version         : 1.0
; Created date    : 10/1/1999
; Last update     : 12/3/1999
; Author          : Jeff Duntemann
; Description     : A demo that shows how to access command line arguments
;                   stored on the stack by addressing them relative to ebp.
;
; Build using these commands:
;   nasm -f elf showargs.asm
;   gcc showargs.o -o showargs
;
; To test, execute with some command-line arguments:
;   ./showargs foo bar bas bat

[SECTION .text]                 ; Section containing code

global main                     ; Required so linker can find entry point
extern printf                   ; Notify linker that we're calling printf

main:                           ; Set up stack frame for debugger
        push ebp
        mov ebp,esp
        push ebx                ; Program must preserve ebp, ebx, esi, & edi
        push esi
        push edi
;;; Everything before this is boilerplate; use it for all ordinary apps!

        mov edi,[ebp+8]         ; Load argument count into edi
        mov ebx,[ebp+12]        ; Load pointer to argument table into ebx
        xor esi,esi             ; Clear esi to 0
.showit:
        push dword [ebx+esi*4]  ; Push address of an arg on the stack
        push esi                ; Push arg number on the stack
        push dword argmsg       ; Push address of display string on the stack
        call printf             ; Display the arg number and arg
        add esp,byte 12         ; Clean up stack after printf call
        inc esi                 ; Bump arg number to next arg
        dec edi                 ; Decrement arg counter by 1
        jnz .showit             ; If arg count is 0, we're done
        xor eax,eax             ; Return 0 as main's exit status (printf left its count in eax)

;;; Everything after this is boilerplate; use it for all ordinary apps!
        pop edi                 ; Restore saved registers
        pop esi
        pop ebx
        mov esp,ebp             ; Destroy stack frame before returning
        pop ebp
        ret                     ; Return control to the C runtime

[SECTION .data]                 ; Section containing initialized data

argmsg  db "Argument %d: %s",10,0

[SECTION .bss]                  ; Section containing uninitialized data
The logic I followed is this: We begin by copying the argument count into EDI and the pointer to the start of the argument pointer table into EBX. We clear ESI to 0 by XORing it against itself. With that accomplished, we go into a loop that pushes the argument pointer, the argument number, and the address of the display string onto the stack (in that order, because C functions expect their parameters pushed from right to left) and calls printf to display them. Adding 12 to ESP after the call removes those three 4-byte parameters from the stack. After printing each argument, we increment the argument number in ESI and decrement the argument count in EDI. When EDI goes to 0, we've displayed all the arguments, and we're done.
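Here is the same logic in C, offered as a sketch for comparison rather than a replacement for the assembly version. format_args() is a hypothetical function that writes into a caller-supplied buffer with snprintf() instead of calling printf(), so its output can be checked. It keeps the two counters and the bottom-tested loop of the assembly version, which depends on the argument count being at least 1.

```c
#include <stdio.h>
#include <string.h>

/* Format "Argument N: text" lines for every argument into buf.
   Returns the number of lines written, or -1 if buf is too small. */
int format_args(int argc, char *const *argv, char *buf, size_t size)
{
    int count = argc;   /* down-counter, the job EDI does */
    int index = 0;      /* argument number, the job ESI does */
    size_t used = 0;

    if (size == 0)
        return -1;
    buf[0] = '\0';
    if (count < 1)      /* the assembly version assumes at least one argument */
        return 0;
    do {
        int n = snprintf(buf + used, size - used, "Argument %d: %s\n",
                         index, argv[index]);
        if (n < 0 || (size_t)n >= size - used)
            return -1;                  /* this line would not fit */
        used += (size_t)n;
        index++;                        /* inc esi */
    } while (--count != 0);             /* dec edi / jnz .showit */
    return index;
}
```

Called with the argument table for ./showargs foo bar, it fills the buffer with three lines, from Argument 0: ./showargs through Argument 2: bar.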
One final note on this program, which I've said before but must emphasize: If you're calling a C library function in a loop, you must either use the sacred registers to hold your counters that govern the loop, or you must push them onto the stack before making a library call. The library trashes the nonsacred registers such as EAX, ECX, and EDX. If you had tried to store the argument count in ECX, the count would have been destroyed the first time you called printf. The sacred nature of EBX, ESI, and EDI makes them ideal for this use. (EBP is reserved for use in addressing data on the stack, so don't try to use it for anything like counters unless you very carefully save its value on the stack!)
There is a pointer to a table of environment variables on the stack at EBP+16. It's set up pretty much the same way, so you could very easily create a program to print out all the environment strings in that table.
The major difference is this: There is no count of the number of environment variables stored anywhere. The end of the table of pointers to environment variables is marked by a null pointer; that is, a pointer whose value is 0. You have to fetch each pointer and test it against 0 before attempting to display data at the pointer address.
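The null-pointer test is the only new ingredient. The following C sketch of the walk assumes envp points at the environment table (the pointer stored at EBP+16). format_env() is a hypothetical name, and the one-string-per-line output is an arbitrary choice for this sketch, not the format SHOWENV.ASM uses.

```c
#include <stddef.h>
#include <string.h>

/* Copy each "NAME=value" string from a null-terminated pointer table into
   buf, one per line. There is no count, so every pointer is tested against
   NULL (0) before anything is read through it. Returns the number of
   entries copied, or -1 if buf is too small. */
int format_env(char *const *envp, char *buf, size_t size)
{
    size_t used = 0;
    int count = 0;

    if (size == 0)
        return -1;
    for (char *const *p = envp; *p != NULL; p++) {   /* stop at the null pointer */
        size_t len = strlen(*p);
        if (used + len + 1 >= size) {                /* room for text, '\n', and final '\0'? */
            buf[used] = '\0';
            return -1;
        }
        memcpy(buf + used, *p, len);
        buf[used + len] = '\n';
        used += len + 1;
        count++;
    }
    buf[used] = '\0';
    return count;
}
```

On Linux with glibc, a C program can reach the same table as a third parameter to main() or through the POSIX environ variable.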
For tomorrow's assignment, modify SHOWARGS.ASM to display the environment variables as well. (I've written such a program, but it isn't printed here in the chapter. Find it on the CD-ROM to check your work: SHOWENV.ASM.)