Assembly Language Step by Step Programming with DOS and Linux 2nd Ed 2000.pdf

Characters Out

Enough warm-up; it's time to start writing programs! Actually, we've already been through a complete program that assembles cleanly, called BOILER.ASM. However, if you assemble BOILER.ASM and run it, you won't see anything: it takes no real action and displays no output. Making an "Eat at Joe's" program out of it requires a call to a C library function that displays text on the screen. This isn't particularly difficult, and it's good practice in learning the conventions for making C library calls from assembly. I explained the C calling conventions in some detail in an earlier section of this chapter, and now we'll actually put them to work. Consider the following assembly program, which is built on the BOILER.ASM foundation I showed you earlier:

;  Source name     : EATLINUX.ASM
;  Executable name : EATLINUX
;  Version         : 1.0
;  Created date    : 11/12/1999
;  Last update     : 11/22/1999
;  Author          : Jeff Duntemann
;  Description     : A simple program in assembly for Linux, using NASM 0.98,
;    demonstrating the use of the puts C library routine to display text.
;
;  Build using these commands:
;    nasm -f elf eatlinux.asm
;    gcc eatlinux.o -o eatlinux
;

[SECTION .text]         ; Section containing code

extern puts
global main             ; Required so linker can find entry point

main:
    push ebp            ; Set up stack frame for debugger
    mov ebp,esp
    push ebx            ; Program must preserve ebp, ebx, esi, & edi
    push esi
    push edi
;;; Everything before this is boilerplate; use it for all ordinary apps!

    push dword eatmsg   ; Push a 32-bit pointer to the message on the stack
    call puts           ; Call the clib function for displaying strings
    add esp, 4          ; Clean stack by adjusting esp back 4 bytes

;;; Everything after this is boilerplate; use it for all ordinary apps!
    pop edi             ; Restore saved registers
    pop esi
    pop ebx
    mov esp,ebp         ; Destroy stack frame before returning
    pop ebp
    ret                 ; Return control to Linux

[SECTION .data]         ; Section containing initialized data

eatmsg: db "Eat at Joe's!",10,0

[SECTION .bss]          ; Section containing uninitialized data

The C library has a number of routines for displaying text to the screen. The simplest of all of them to understand is puts, which, as its name implies, puts a string to standard output. (I explain what standard output is very shortly.) Here's the code required to call puts from within an assembly program:

    push dword eatmsg   ; Push a 32-bit pointer to the message on the stack
    call puts           ; Call the clib function for displaying strings
    add esp,4           ; Clean stack by adjusting esp back 4 bytes

This is a wonderful example, in miniature, of the process you'll use to call most any C library routine. All library routines take their parameters on the stack, which means you have to push either numeric values that fit in 32 bits, or else pointers to strings or other larger data objects. In this case, we push a 32-bit pointer to a text string on the stack. The string itself is defined in the [.data] section of the program, and by now it should be pretty familiar:

eatmsg: db "Eat at Joe's!",10,0

Note well that in the PUSH instruction we specify eatmsg and not [eatmsg]. What we need to push is the address of eatmsg, and not the data that eatmsg contains. As you should recall from earlier chapters, when you reference the name of a data item you're actually referencing its address. To reference its contents you must surround it by brackets. Here, we leave out the brackets and thus push the string's address on the stack instead.

The text to be displayed is followed by two numbers: a 10 and a 0. The 10 is the numeric code for what Unix people call newline, which is the character that, when sent to the screen or to a text file, moves the current position to the left margin of the next line. In the x86 Unix world, newline is equivalent to ASCII linefeed, Ctrl-J, which has a numeric value of 10. On other hardware systems, newline might be something else entirely, but as long as you're working on Linux for the x86 processors, the 10 will be interpreted by the system as newline.

In Unix jargon the 0 is called a null, and it is used almost everywhere in the standard C library to indicate the end of a string. The puts library function displays the text at the location passed to it in the pointer pushed on the stack, from the first character up to the first null that it encounters. The null is important. If you don't append a null to the end of the string, puts will keep stuffing bytes from memory to the screen until it encounters a null somewhere up-memory of the original string, which could mean that hundreds of random garbage characters will appear on your screen.

The Three Standard Files

This is a good place to explain that puts and the other character-output library functions don't send text explicitly to your screen display. They send it to a special Unix mechanism called standard output, which is a destination to which you can send text. Standard output defaults to the screen display. Unless you redirect standard output to some other place (such as a disk-based text file), characters written to standard output will appear on your screen.

Standard output is one of three standard text streams that Linux will open and make available to a running Linux application, no matter how small. A stream is a logical file intended for use with text information. These are the three standard streams:

Standard output (stdout), which defaults to the screen display. It can be redirected to a text file or some other text-oriented device such as a printer.

Standard error (stderr), which also defaults to the screen display. The availability of this standard file allows programs to write their error messages to something other than the screen display, for debugging or logging purposes. This "something other" is typically a text file, which then provides a persistent record of what errors occurred during the program's execution.

Standard input (stdin), which (in contrast to stdout and stderr) is a source of text. It defaults to the system keyboard, but it can be redirected to a text file, which can allow you to drive a program with "canned" inputs stored in a separate file.

If your program sends text to standard output (which is what happens by default), you can redirect its output to a text file when executing the program on the Unix command line:

# ./eatlinux > eattext.txt

Here, instead of appearing on your screen, the text displayed by the EATLINUX program is sent to the text file eattext.txt.

I don't have the room in this book to discuss how to programmatically redirect the standard streams to other sources or destinations, but any good book on Unix or Linux C programming will explain it in detail. Like most everything else, it's nothing more complex than a function call.

Formatted Text with printf

The puts library routine may seem pretty useful, but compared to a few of its more sophisticated siblings, it's kid stuff. With puts you can only send a simple text string to a stream, without any sort of formatting. Worse, puts always includes a newline at the end of its display, whether you include one in your displayed string or not. (Notice when you run the executable program EATLINUX that there is a blank line after its output. That's the second newline, inserted by the puts routine.) This prevents you from using multiple calls to puts to output several text strings all on a single line.

About the best you can say for puts is that it has the virtue of simplicity. For nearly all of your character output needs, you're way better off using a much more powerful library routine: printf. The printf routine allows you to do a number of truly useful things, all with one function call:

Output text without a newline

Convert numeric data to text in numerous formats by passing formatting codes along with the data

Output text to a stream that includes multiple strings stored separately

If you've worked with C for more than half an hour, printf will be perfectly obvious to you, but for people coming from other languages (such as Pascal, which has no direct equivalent), it may take a little explaining.

The printf routine will gladly display a simple string like "Eat at Joe's!", but you can also merge other text strings and converted numeric data with that base string as it travels toward standard output, and show it all seamlessly together. This is done by dropping formatting codes into the base string, and then passing a data item to printf for each of those formatting codes, along with the base string. A formatting code begins with a percent sign and includes information relating to the type and size of the data item being merged with the base string, as well as how that information should be presented.

Let's look at a very simple example to start out. Here's a base string containing one formatting code:

"The answer is %d, and don't you forget it!"

The %d formatting code simply tells printf to convert a signed integer value to text, and substitute that text for the formatting code in the base string. Of course, you must now pass an integer value to printf (and I show you how that's done shortly), but when you do, printf will convert the integer to text and merge it with the base string as it sends text to the stream. If the decimal value passed is 42, on your screen you'll see this:

The answer is 42, and don't you forget it!

A formatting code actually has a fair amount of structure, and the printf mechanism as a whole has more wrinkles than I have room here to describe. Any good C reference will explain the whole thing in detail, which is one more reason why it's useful to know C before you attempt Linux assembly work. Table 13.2 lists the most common and useful formatting codes.

Table 13.2: Common printf Formatting Codes

CODE    BASE    DESCRIPTION
%c      n/a     Displays a character as a character
%d      10      Converts an integer and displays it in decimal
%s      n/a     Displays a string as a string
%x      16      Converts an integer and displays it in hex
%%      n/a     Displays a percent sign

The most significant enhancement you can make to the formatting codes is to place an integer value between the % symbol and the code letter:

%5d

This code tells printf to display the value right-justified within a field 5 characters wide. If you don't put a field width value there, printf will simply give the value as much room as its digits require.

Passing Arguments to printf

The real challenge in working with printf, assuming you understand how it works logically, is knowing how to pass it all the arguments that it needs to pull off any particular display. Like the Writeln function in Pascal, printf has no set number of arguments. It can take as few arguments as one base string, or as many arguments as you need, including additional strings, character values, and numeric values of various sorts.

All arguments to C library functions are passed on the stack. This is done either directly, by pushing the argument value itself on the stack, or indirectly, by pushing a 32-bit pointer to the argument onto the stack. For 32-bit or 64-bit data values, you push the values themselves onto the stack. (A real instruction set convenience here is that you can push immediate values onto the stack, something the original 8086 and 8088 could not do.) For larger data items such as strings and arrays, you push a pointer to the items onto the stack.

When there are multiple arguments passed to printf, they all have to be pushed onto the stack, and in a very particular and nonintuitive order: from right to left as they would appear if you were to call printf() from C. The base string is considered the leftmost argument and is always pushed onto the stack last. A simple example will help here:

printf("%d + %d = %d ... for large values of %d.",2,2,5,2);

This is a C statement that calls printf(). The base string is enclosed in quotes and is the first argument. After the string are several numeric arguments. There must be one numeric value for each of the %d formatting codes embedded in the base string. The order that these items must go onto the stack is from the right reading toward the left: 2,5,2,2, and finally the base string. In assembly, you'd do it this way:

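    push dword 2        ; Rightmost arg is pushed first
    push dword 5
    push dword 2
    push dword 2
    push dword mathmsg  ; Base string is pushed last
    call printf
    add esp,20          ; Stack cleanup: 5 args x 4 bytes = 20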

The identifier mathmsg is the base string, and its address is pushed last of all the arguments. Remember that you don't push the string itself onto the stack. You push the string's address, and the C library code will follow the address and fetch the string's data using its own machinery.

The ADD instruction at the end of the sequence represents what you'll hear described as "cleaning up the stack." Each time you push something onto the stack with a PUSH instruction, the stack pointer ESP moves toward low memory by a number of bytes equal to the size of whatever was pushed. In our case here, all arguments are exactly 4 bytes in size. Five such arguments thus represent 20 bytes of change in ESP for the sake of making the call. After the call is done, ESP must be moved back to where it was before you started pushing arguments on the stack. By adding 20 to the value in ESP, the stack pointer moves back up by 20 bytes and will then be where it was before you began to set up the printf call.

If you forget to clean up the stack, or if you clean it up by the wrong number of bytes, your program will almost certainly throw a segmentation fault. Details (dare I call it neatness?) count!

Here's another example, in which three separate strings are merged at standard output by the call to printf:

    push dword dugongs  ; Rightmost arg is pushed first
    push dword mammals  ; Next arg to the left
    push dword setbase  ; Base string is pushed last
    call printf         ; Make the printf call
    add esp,12          ; Stack cleanup: 3 args x 4 bytes = 12

[SECTION .data]         ; Section containing initialized data

setbase db 'Does the set of %s contain the set of %s?',10,0
mammals db 'mammals',0
dugongs db 'dugongs',0

I haven't shown everything here for the sake of brevity (how often do you need to see the comment headers?), but by now you should be catching the sense of making calls to printf. The three crucial things to remember are these:

Arguments are pushed onto the stack from right to left, reading the function call as it would be written in C. The base string is pushed last. If you're doing anything even a little complex with printf, it helps to write the call out first in C form, and then translate it from there into assembly.

After the call to printf, you must add to ESP a value equal to the total size of all arguments pushed onto the stack. Don't forget that for strings you're pushing the address of the string and not the data contained in the string! For most arguments this will be 4 bytes.

The printf function call trashes everything but the sacred registers. Don't expect to keep values in other registers intact through a call to printf! (If you try to keep a counter value in ECX while executing a loop that calls printf, the call to printf will destroy the value in your counter. You must save ECX on the stack before each call to a library function and restore it after the library call returns, or else use a sacred register such as ESI, EDI, or EBX.)

If you can't get a printf call to work in assembly, write up a simple one-liner C program containing the call, and see if it works there. If it does, you're probably getting the order or number of the arguments wrong. Never forget that there must be one argument for each formatting code!