
Dr.Dobb's journal.2005.12


also specifies an algorithm used to handle its bidirectional display. In turn, Unicode uses the 16-bit Unicode Transformation Format (UTF-16), which encodes the characters as 16-bit values. (For the record, there are variations of UTF used to encode 8- and 32-bit values.)

UTF-16 uses three encoding schemes to describe the byte-ordering of the data: UTF-16BE, UTF-16LE, and UTF-16. In the first two schemes, the name describes the byte ordering, where BE represents Big-Endian and LE represents Little-Endian. For the third encoding, a two-byte sequence at the start of the file makes up a Byte Order Mark (BOM) that describes the Endian encoding scheme.

BBEdit 8.0, which was released in late 2004, uses the full Unicode conversion and rendering features of Mac OS X. These APIs automatically read a file’s encoding scheme and manage the data transfers and file I/O appropriately. By choosing to use Unicode early on, the Bare Bones Software engineers not only expanded the number of languages the editor could support, but also avoided what could have been a serious problem with reading and writing files when migrating to the x86-based Mac platform.

Another key revision made in BBEdit 8.0’s code design was that the application began using Mac OS X’s Preference Services API, rather than storing binary data in a custom resource. This modification also side-stepped the Endian problem.

The biggest impact of the Endian issues was in BBEdit’s UI, particularly its menus. The menu choices were stored as tabular data in a custom resource. A byte-swapping routine had to be written to massage the menu data so that it was properly organized in memory for the x86 platform.

Interestingly, the plug-in mechanism was unaffected by the Endian issue. Since this mechanism passes data structures between methods, and these methods access the structures at their native data sizes, the order of the bytes didn’t matter. However, older plug-in modules that used the PEF binary format had to be recompiled into the Mach-O format, as that is the executable binary format both Mac OS X platforms will use going forward.

The only code compatibility issue that the BBEdit engineers had to fix was that Apple made a data structure in the Carbon Alias Manager (which is used to locate files by name only on a local drive) opaque. That is, the data structure’s contents could no longer be accessed directly. Specifically, the userType field (used to specify the file’s type) in an AliasRecord was made opaque. Rather than access that structure directly, you now use getter and setter functions to manage this information.

Lessons Learned

It took the BBEdit engineers about 24 hours to make the changes described here so that BBEdit executed on the x86-based Mac platform. What took much longer was the regression testing that thoroughly stressed every feature of the application to ensure that subtle side-effects due to Endian issues or the APIs weren’t overlooked. This rigorous testing also had to verify that any changes to the code to facilitate the port didn’t adversely affect the execution of the PowerPC version of the application. In all, testing took several weeks. At the end of this testing, Bare Bones Software announced BBEdit 8.2.3, which is now a universal binary.

BBEdit is a sophisticated application that has survived several platform ports over the course of a decade. Despite its complex internal workings and equally complex history, the BBEdit port went smoothly. Solid code design was the key factor that made the code port a success. See the accompanying textbox entitled “Best Programming Practice Rules” for a list of good programming practices that BBEdit’s engineers followed to ensure the writing of quality code.

Siegel summarizes the situation: “There were no unexpected bumps during the migration process, particularly when you consider we were using prerelease tools on a prerelease OS.” Bare Bones Software’s porting experience augurs well for the migration of other PowerPC-based Mac applications to the x86-based Mac platform.

DDJ
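To make the Byte Order Mark discussion concrete, here is a minimal C sketch of how a program might detect the BOM and then read 16-bit code units without depending on the host processor's byte order. It is not BBEdit's code (the article notes that BBEdit leaves this work to the Mac OS X APIs). The names detect_utf16_bom and read_utf16_unit are hypothetical, and treating a file without a BOM as big-endian is an assumption based on the Unicode Standard's default.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte order announced by a UTF-16 byte order mark (U+FEFF). */
typedef enum { BOM_NONE, BOM_BIG_ENDIAN, BOM_LITTLE_ENDIAN } bom_kind;

/* Inspect the first two bytes of a buffer for a UTF-16 BOM. */
bom_kind detect_utf16_bom(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 2)
        return BOM_NONE;
    if (buf[0] == 0xFE && buf[1] == 0xFF)
        return BOM_BIG_ENDIAN;      /* FE FF: most significant byte first */
    if (buf[0] == 0xFF && buf[1] == 0xFE)
        return BOM_LITTLE_ENDIAN;   /* FF FE: least significant byte first */
    return BOM_NONE;
}

/* Assemble one 16-bit code unit from two bytes in the stated order.
   The result is the same on PowerPC and x86 because the bytes are
   combined arithmetically instead of being reinterpreted as a
   native integer. BOM_NONE is treated as big-endian (assumption). */
uint16_t read_utf16_unit(const unsigned char *p, bom_kind order)
{
    assert(p != NULL);
    if (order == BOM_LITTLE_ENDIAN)
        return (uint16_t)(p[0] | (p[1] << 8));
    return (uint16_t)((p[0] << 8) | p[1]);
}
```

The same shift-and-combine approach is one way to write a byte-swapping routine for tabular resource data, since it never asks which processor the code is running on.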

 

 

http://www.ddj.com

Dr. Dobb’s Journal, December 2005

59

SUMMER OF CODE

Google’s Summer of Code

Google’s Summer of Code was a unique and exciting program in which student programmers were provided stipends for creating new open-source projects or helping established ones. Over the summer of 2005, Google funded more than 400 projects to the tune of $5000 each, with $4500 going to the student and $500 to a mentoring organization. DDJ will be profiling some of the student participants over the coming months. Google’s open-source programs manager Chris DiBona and engineering manager Greg Stein led the Summer of Code project. DDJ recently talked to DiBona about the program.

DDJ: What was the original goal of the Summer of Code?

CD: The original impetus behind the program was to ensure that budding computer scientists wouldn’t let their programming skills diminish over the summer while working in a noncomputer-related job. We thought that if we could make it possible for these students to work with the open-source community then they would be exposed to a whole new, very real, class of problem. This would then lead to more open-source software developers, programs, and better developers overall.

DDJ: Did the final results meet your expectations?

CD: From the very beginning the students far exceeded Google’s and my personal expectations. The quality of the applications alone caused us to double the number of accepted students from 200 to 419, and I think that easily a thousand of them proposed acceptable applications. Now that the program is over, the early results are pretty terrific, showing around 80 percent of the students having succeeded to execute on their projects to their mentors’ satisfaction.

DDJ: What was the biggest surprise coming out of SoC?

CD: Just how advanced some of the projects ended up being. I remember thinking when I saw some of the projects that there was no way someone new to a project could pull them off. One, a CIL back end for GCC, which allows for the creation of CLR code from any GCC front-end language, should be preposterously difficult to do, but the student not only completed it, but did it in a way that amazed his mentor, Miguel de Icaza.

DDJ: What was the geographic distribution of participants?

CD: We had 419 students taking part in the program from 49 countries. In the U.S. alone, we had students from 38 states.

DDJ: Are they representative of open source as they are today, or are they signs of things to come?

CD: I think that they are a little bit of the present and a big part of the future. Open source can be a little intimidating for the newcomer, and I think the Summer of Code helped to mix things up a bit and keep things fresh. Happily, a good number of the students have indicated that they intend to continue working on their open-source projects.

DDJ: Where does SoC go from here?

CD: Into the Fall, of course! We’re going to examine the feedback and make sure that the program was successful; if so, we may do another one next year.

DDJ

Apache Axis2 JMX Front

Apache Axis2 is a highly extensible Java-based web-service engine. Its extensibility comes mainly from the handler chain-based architecture. Axis2 allows configuring these handlers and other features mainly using XML files. There was no proper way to configure these settings while Axis2 was running in servers. The goal of Axis2 JMX Front is to provide a JMX management interface for monitoring and configuring Axis2 at runtime.

Axis2 JMX Front consists of a management class (MBean) named Axis2Manager, which provides access to all configurable modules. It handles everything regarding configuring various modules and provides a simple interface. This MBean has the functionality to configure settings of handlers, transport protocol handlers, and deployed services. For example, administrators can use this interface to turn off selected operations from web services after they are deployed. This MBean is registered in an MBeanServer and published in a JMXConnectorServer. Remote management applications (JConsole, JManage, and so on) can access this MBean using RMI and call any function it provides. Therefore, administrators of Axis2 can log on to this interface to monitor and configure the system while it is running in servers.

They can also manage different Axis2 engines running in different servers as a collection (cluster) using this interface.

Axis2 JMX Front can be extended seamlessly with additional management functionality. Developers can add functions to the existing MBean or create separate MBeans without altering the rest of the code. After implementing a class with the required management functionality, they can call the methods of the JMXManager class to register and publish objects of those classes as MBeans. Example 1 illustrates the use of JMXManager for registering a normal Java object named “myObject” as an MBean.

// Create the object to be managed and choose its JMX object name
String myObjectName = "Axis2:type=management.MyObject";
MyObject myObject = new MyObject();

// Register myObject as an MBean using JMXManager
JMXManager jmxManager = JMXManager.getJMXManager();
jmxManager.registerMBean(myObject, myObjectName);

Example 1: Using JMXManager.

Axis2 JMX Front uses the Apache commons.modeler package for registering MBeans. Therefore, MBean developers are not required to provide a separate interface for their management objects. JMX Front loads all the JMX-specific classes at runtime to make the Axis2 build independent of JMX libraries. It also provides a separate class named “JMXAdmin” to handle all JMX-related features. The Axis2 engine can load this class at runtime to JMX-enable the system. This allows Axis2 JMX Front to be deployed as an optional package, which can be integrated into Axis2 at deploy time.

DDJ

Name: Chathura C. Ekanayake
Contact: ccekanayake@gmail.com
School: University of Moratuwa, Sri Lanka
Major: Computer Science and Engineering
Project: Axis2 JMX Front
Project Page: http://wiki.apache.org/ws/SummerOfCode/2005/JMXFront/
Mentors: Deepal Jayasinghe and Srinath Perera
Mentoring Organization: Apache Software Foundation (http://www.apache.org/)

CL-GODB: A Common Lisp GO Database Manipulation Library

CL-GODB is a new interface to the GO Database (http://www.geneontology.org/) written in Common Lisp. The Gene Ontology (GO) is a collection of terms organized in a taxonomy representing a controlled vocabulary used to describe genes, gene products, their functions, and the processes they are involved in for a variety of organisms. The GO Database (GODB) represents the ontological information and gene product annotations in a convenient relational database format (the GO database uses MySQL).

Until now, there have been no interfaces to the database that use Common Lisp. This is inconvenient as there are Bioinformatics and Systems Biology tools that employ the language (BioLingua, GOALIE, and the BioCYC suite, for instance).

GOALIE, developed by Marco Antoniotti and Bud Mishra in NYU’s Bioinformatics Group (http://bioinformatics.nyu.edu/~marcoxa/work/GOALIE/), analyzes time-course data from micro-array clustering experiments. The CL-GODB library will be integrated into GOALIE, improving the tool’s functionality and efficiency.

The library works by building an incremental, as-needed, internal image of the GO database contents in core. This improves the speed of queries and facilitates the construction of more complex predicates that may be needed in an application such as GOALIE.

Users start by creating a handle that identifies their session and is linked to several hash indexes used in the in-core caching. Once they have connected to their copy of the GO database, they have access to a variety of built-in SQL queries, which take advantage of the indexing and add to the stored data. The queries range from getting basic information about a term, to finding a term’s lineage using a choice of hierarchies.

As a testbed for the CL-GODB library, we built a GUI application that is available as a standalone executable. The CL-GODB Viewer lets users browse the hierarchy with a graphical tree view and provides information about each term and its associated genes, in a manner similar to that of several other GO viewer applications available online.

Figure 1: The CL-GODB user interface.

Creating the CL-GODB was challenging at times, as it was my first project in Common Lisp. The biggest hurdle was making sure that case-sensitivity vagaries were taken care of, as Common Lisp and MySQL behave differently under Windows and UNIX. In the end, it did work and I learned more about the intricacies of SQL syntax than I ever wanted to know.

DDJ

Name: Samantha Kleinberg
Contact: sjk267@nyu.edu
School: New York University
Major: Physics and Computer Science
Project: CL-GODB
Project Page: http://common-lisp.net/project/cl-godb/
Mentor: Marco Antoniotti
Mentoring Organization: LispNYC (http://www.lispnyc.org/)


Wide Character Support in NetBSD Curses Library

The current NetBSD curses library doesn’t support wide characters, which limits the use of NetBSD in countries with wide-character locales. The “Wide Character Support in curses” project adds wide-character support to the NetBSD curses library, complying with the X/Open Curses Reference to provide internationalization and localization.

The difficulty of adding wide-character support to NetBSD curses lies in its internal character storage data structure and related functions, which assume an 8-bit character in each display cell. Adding wide-character support means adding a new character storage data structure to hold wide-character information. This structure holds not only the character but also the attributes, including any nonspacing characters associated with the display cell.

The internal character storage data structure adds two linked lists for foreground/background nonspacing characters and uses spare bits in the attribute field for the character width, which are required for multicolumn characters. There is one storage cell per column, but the width fields are set differently for a multicolumn character. For an m-column-wide character, the first cell holds the width of the character, and the other m–1 cells hold the position information in their width fields. This offset is negative, making it easy to detect a cell belonging to a multicolumn character.
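The width-field convention can be sketched in a few lines of C. This is an illustration only, not the NetBSD data structure: the cell type and the put_wide and start_column functions are hypothetical, the width is kept in a separate int instead of spare attribute bits, and the choice to store -i in the i-th continuation cell is an assumption consistent with the negative offset described above.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical display cell, one per screen column. */
typedef struct {
    wchar_t ch;     /* spacing character shown in this cell */
    int     width;  /* > 0: column count of the character starting here;
                       < 0: negative offset back to the starting cell */
} cell;

/* Store an m-column character starting at column col of a row.
   Returns 0 on success, -1 if the character does not fit. */
int put_wide(cell *row, size_t ncols, size_t col, wchar_t ch, int m)
{
    assert(row != NULL);
    if (m < 1 || col >= ncols || (size_t)m > ncols - col)
        return -1;
    row[col].ch = ch;
    row[col].width = m;              /* first cell holds the width */
    for (int i = 1; i < m; i++) {
        row[col + i].ch = ch;
        row[col + i].width = -i;     /* other cells hold the position */
    }
    return 0;
}

/* Find the column where the character occupying col begins. */
size_t start_column(const cell *row, size_t col)
{
    assert(row != NULL);
    return row[col].width < 0 ? col - (size_t)(-row[col].width) : col;
}
```

With this layout, a routine that deletes or overwrites a cell can test for a negative width, call start_column, and then clear every cell of the multicolumn character.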

To read a wide character from a keyboard, a distinction must be made between a function key sequence and a wide-character sequence. The keymap routines for narrow character input are used to detect function keys, and the stateful wide-character conversion routine mbrtowc( ) is used to assemble input bytes into a valid wide character.
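As a rough sketch of the second half of that process, the following C fragment feeds input bytes to mbrtowc() one at a time and reports when a complete wide character has been assembled. The wc_assembler type and its functions are hypothetical names, and the function-key detection done by the keymap routines is left out entirely.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Accumulates input bytes until they form one complete wide character. */
typedef struct {
    mbstate_t state;   /* conversion state carried between bytes */
} wc_assembler;

void wc_assembler_init(wc_assembler *a)
{
    assert(a != NULL);
    memset(&a->state, 0, sizeof a->state);   /* initial conversion state */
}

/* Feed one byte. Returns 1 and stores the character in *out when a
   character is complete, 0 when more bytes are needed, and -1 when
   the bytes are not valid in the current locale. */
int wc_assembler_feed(wc_assembler *a, unsigned char byte, wchar_t *out)
{
    char c = (char)byte;
    size_t n;

    assert(a != NULL && out != NULL);
    n = mbrtowc(out, &c, 1, &a->state);
    if (n == (size_t)-2)
        return 0;                  /* incomplete: state keeps the prefix */
    if (n == (size_t)-1) {
        wc_assembler_init(a);      /* invalid sequence: reset the state */
        return -1;
    }
    return 1;                      /* n is 0 (null character) or 1 */
}
```

The conversion follows the LC_CTYPE category of the current locale, so a program would normally call setlocale(LC_CTYPE, "") first. In a UTF-8 locale, a three-byte character should produce two "need more" results followed by one completed character.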

Some existing narrow character routines have been modified to work with wide characters. The new storage data structure makes screen-refreshing code more complicated because the NetBSD curses library uses a hash function to determine if a screen needs to be refreshed. For wide-character support, the hash function must include the nonspacing characters as well to capture the changes in rendition. Another issue is that when a character is added or deleted, a check must be made to detect whether that character was part of a multicolumn character. All parts of the multicolumn character are removed in this case.

The modified curses library was tested with three wide-character locales: Simplified Chinese, Traditional Chinese, and Japanese. Test results show that twice the memory is generally required to support wide characters.

DDJ

Name: Ruibiao Qiu
Contact: ruibiao@arl.wustl.edu
School: Washington University
Major: Doctoral Candidate, Computer Science and Engineering
Project: Wide Character Support in Curses
Project Page: http://netbsd-soc.sourceforge.net/projects/wcurses/
Mentors: Julian Coleman and Brett Lymn
Mentoring Organization: The NetBSD Project (http://www.netbsd.org/)

gjournal: FreeBSD GEOM Journaling Layer

The aim of the gjournal project is to create a data journaling layer for FreeBSD’s GEOM storage device layer. The idea of gjournal was born from the observation that FreeBSD doesn’t currently have a journaling filesystem, but in an early phase the specification was extended to include copy-on-write (COW) functionality.

The GEOM subsystem is a modern kernel-based framework that manages pretty much all aspects of usage and control of storage devices. It’s based on the concept of classes. A GEOM class can be a source of data or it can implement data transformations in a completely transparent way. All classes can be arbitrarily combined in a hierarchy in the form of a directed acyclic graph. Examples of existing GEOM classes are gmirror, which consumes two or more underlying class instances (called “geoms”) and provides one that duplicates and distributes I/O requests to them (a RAID 1 layer); and geom_dev, which consumes all disk device geoms and creates entries in the /dev filesystem hierarchy for them.

Name: Ivan Voras
Contact: ivoras@gmail.com
School: University of Zagreb
Major: Electrical Engineering and Computing
Project: gjournal
Project Page: http://wikitest.freebsd.org/moin.cgi/gjournal/
Mentors: Pawel Jakub Dawidek and Poul-Henning Kamp
Mentoring Organization: The FreeBSD Project (http://www.freebsd.org/)

The gjournal is implemented as a GEOM class that consumes two geoms and produces one. The first of the two consumed geoms is designated as a “data device” and the second as a “journal device.” The basic idea is to transform write requests to the produced geom into sequential writes to the journal device.

The class implements two kernel threads: A main worker thread to which I/O requests are delegated, and a helper thread used to asynchronously commit data from the journal to the data device.

In regular mode, the journal device is divided into two areas, one of which is used to record data until it’s filled — at which point, it’s scheduled for asynchronous commit. A timed callout is scheduled that periodically triggers the swap/commit process. Two journal formats are implemented — one optimized for speed that emphasizes sequentiality of writes to the journal device, and another that conserves space by keeping metadata for the journal in one place.
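The two-area scheme can be pictured with a small user-space model in C. This is a toy sketch, not gjournal's kernel code: the journal type, the 64-byte AREA_SIZE, and all function names are hypothetical, and refusing a write while the previous area is still waiting for commit is an assumption made to keep the model simple.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define AREA_SIZE 64   /* hypothetical size of each journal area, in bytes */

typedef struct {
    unsigned char area[2][AREA_SIZE];
    size_t   used;            /* bytes recorded in the active area */
    int      active;          /* area currently receiving writes (0 or 1) */
    int      commit_pending;  /* area waiting for commit, or -1 */
    unsigned swaps;           /* number of swaps performed so far */
} journal;

void journal_init(journal *j)
{
    assert(j != NULL);
    memset(j, 0, sizeof *j);
    j->commit_pending = -1;
}

/* Hand the filled area to the committer and switch to the other one. */
static int journal_swap(journal *j)
{
    if (j->commit_pending != -1)
        return -1;                   /* previous commit has not finished */
    j->commit_pending = j->active;
    j->active = 1 - j->active;
    j->used = 0;
    j->swaps++;
    return 0;
}

/* Append a record sequentially to the active area, swapping areas
   when it is full. Returns 0 on success, -1 if the record cannot be
   accepted right now. */
int journal_write(journal *j, const void *data, size_t len)
{
    assert(j != NULL && data != NULL);
    if (len > AREA_SIZE)
        return -1;
    if (j->used + len > AREA_SIZE && journal_swap(j) != 0)
        return -1;
    memcpy(j->area[j->active] + j->used, data, len);
    j->used += len;
    return 0;
}

/* Stand-in for the helper thread finishing its commit to the data device. */
void journal_commit_done(journal *j)
{
    assert(j != NULL);
    j->commit_pending = -1;
}
```

In the real class, the swap is also triggered periodically by a timed callout and the commit runs asynchronously in the helper thread. The model only captures the bookkeeping of which area is being filled and which is being committed.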

Unfortunately, the most used FreeBSD filesystem — the UFS — cannot be used with gjournal because this layer doesn’t distinguish metadata (for example, information about deleted but still referenced files) and requires a fsck run to correct references. The COW facility is functional and can be used for experimentation with filesystems.

DDJ


Wide-Character Format String Vulnerabilities

Strategies for handling format string weaknesses

ROBERT C. SEACORD

The ISO/IEC C Language Specification (commonly referred to as “C99”) defines formatted output functions that operate on wide-character strings, as well as those functions that operate on multibyte-character strings. The wide-character formatted output functions include: fwprintf( ), wprintf( ), swprintf( ), vfwprintf( ), vswprintf( ), and vwprintf( ).

(There is no need for snwprintf() or vsnwprintf( ) functions because the swprintf( ) and vswprintf( ) include an output length argument.) These functions correspond functionally to the multibyte-character formatted output functions (that is, the similarly named functions with the “w” removed) except that they work on wide-character strings such as Unicode and not on multibyte-character strings such as ASCII. (A multibyte character is defined by the ISO/IEC 9899:1999 as a sequence of one or more bytes representing a member of the extended character set of either the source or the execution environment. ASCII strings are represented as multibyte-character strings, although all ASCII characters are represented as a single byte.)

Robert is a senior vulnerability analyst for CERT/CC and author of Secure Coding in C and C++ (Addison-Wesley, 2005). He can be reached at rcs@cert.org.

Formatted output functions are susceptible to a class of vulnerabilities known as “format string” vulnerabilities. Format string vulnerabilities can occur when a format string (or a portion of a format string) is supplied by a user or other untrusted source. Listing One, for example, is a common programming idiom, particularly for UNIX command-line programs. The program prints usage information for the command. However, because the executable may be renamed, the actual name of the program entered by users and specified in argv[0] is printed instead of a hardcoded name.
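One way to keep the idiom while keeping untrusted text out of the format string is sketched below. This is not one of the article's listings. The function names and the “<target>” usage text are hypothetical; the point is only that the program name is passed as an argument to a fixed "%s" conversion and the finished line is written with fputs( ), which does not interpret conversion specifiers.

```c
#include <assert.h>
#include <stdio.h>

/* Build the usage line with the program name passed as data, never
   as part of the format string. Returns the snprintf() result. */
int format_usage(char *out, size_t outlen, const char *pname)
{
    assert(out != NULL && pname != NULL);
    return snprintf(out, outlen, "Usage: %s <target>\n", pname);
}

/* Print the usage line. fputs() copies the text verbatim, so a
   hostile argv[0] such as "%x%x%n" is printed literally. */
void print_usage(const char *pname)
{
    char line[1024];

    format_usage(line, sizeof line, pname);
    fputs(line, stderr);
}
```

A caller would typically invoke print_usage(argv[0]) from main( ) when the argument count is wrong.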

By calling this program using execl( ), attackers can specify an arbitrary string as the name for argv[0], as in Listing Two. In this case, the specified string is likely to cause the program to crash, as the printf( ) function on line 6 of Listing One attempts to read many more arguments off the stack than are actually available. However, this could be much, much worse. In addition to crashing a program (and possibly causing a denial-of-service attack), attackers can also exploit this vulnerability to view arbitrary memory or execute arbitrary code with the permissions of the vulnerable program. For example, attackers can execute arbitrary code by providing a format string of the form:

address advance-argptr %widthu%n

The address field contains a Little-endian encoded string; for example, \xdc\xf5\x42\x01. The advance-argptr string consists of a series of format specifiers designed to advance the internal argument pointer within the formatted output function until it points to the address at the start of the format string. The %n conversion specifier at the end of the string writes out the number of characters output by the formatted output function. The %widthu conversion specifier advances the count to the required value. When processed by the formatted output function, this string writes an attacker-provided value (typically the address of some shellcode) to an attacker-specified address such as the return address on the stack. When the vulnerable function returns, control is transferred to the shellcode instead of the calling function, resulting in execution of arbitrary code with the permissions of the vulnerable program.

A detailed description of format string vulnerabilities and possible exploits with multibyte-character strings is presented in my book Secure Coding in C and C++ (Addison-Wesley, 2005; ISBN 0321335724) and by Scut/Team Teso (see “Exploiting Format String Vulnerabilities,” http://www.mindsec.com/files/formatstring-1.2.pdf). In this article, I focus on the vulnerabilities resulting from the incorrect use of wide-character formatted output functions.

Environment

Formatted output functions that operate on wide-character strings are also susceptible to format string vulnerabilities. To understand the effect of wide characters on format string vulnerabilities, you must understand the interactions between the program and environment. Here, I examine the mechanisms used to manage these interactions for Windows and Visual C++.

Visual C++ defines a wide-character version of the main( ) function called wmain( ) that adheres to the Unicode programming model. Formal parameters to wmain( ) are declared in a similar manner to main( ):

int wmain( int argc [, wchar_t *argv[ ] [, wchar_t *envp[ ] ] ] );

The argv and envp parameters to wmain( ) are of type wchar_t *. For programs declared using the wmain( ) function, Windows creates a wide-character environment at program startup that includes wide-character argument strings and optionally, a wide-character environment pointer to the program. When a program is declared using main( ), a multibyte-character environment is created by the operating system at program startup.


Typically, a program that uses wide characters internally uses wmain( ) to generate a wide-character environment, while a program that uses multibyte characters internally uses main( ) to generate a multibyte-character environment.

When a program specified for a multibyte-character environment calls a wide-character function that interacts with the environment (for example, the _wgetenv( ) or _wputenv( ) functions), a wide-character copy of the environment is created from the multibyte environment. Similarly, for a program declared using the wmain( ) function, a multibyte-character environment is created on the first call to _putenv( ) or getenv( ).

When an ASCII environment is converted to Unicode, alternate bytes in the resulting Unicode string are null. This is a result of the standard Unicode representation of ASCII characters. For example, the ASCII representation for a is \x61 while the Unicode representation is \x6100. Alternating null bytes create an interesting obstacle for shellcode writers.
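The effect of the conversion on the bytes in memory is easy to reproduce. The sketch below is an illustration, not the conversion routine Windows uses: ascii_to_utf16le is a hypothetical helper that widens 7-bit ASCII to 16-bit code units stored low byte first, as on an x86 processor.

```c
#include <assert.h>
#include <stddef.h>

/* Widen an ASCII string to UTF-16 code units stored least significant
   byte first. Writes two bytes per character (no terminator) and
   returns the number of bytes written, or 0 if out is too small or
   the input is not 7-bit ASCII. */
size_t ascii_to_utf16le(const char *in, unsigned char *out, size_t outlen)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        if (c > 0x7F || outlen - n < 2)
            return 0;
        out[n++] = c;      /* low byte: the ASCII code itself */
        out[n++] = 0x00;   /* high byte: always null for ASCII */
    }
    return n;
}
```

Every second byte of the output is zero, which is exactly the pattern that any data passed through such a conversion must conform to.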

Wide-Character Format String Vulnerabilities

Listing Three illustrates how the wide-character formatted output function wprintf( ) can be exploited by an attacker. This example was developed using Visual C++ .NET with Unicode defined and tested on a Windows 2000 Professional platform with an Intel Pentium 4 processor. Because the program was declared (on line 6) using wmain( ), the vulnerable program accepts wide-character strings directly from the environment that have not been explicitly or implicitly converted. The shellcode is declared on line 7 as a series of nop instructions instead of actual malicious code. Unfortunately, examples of malicious shellcode are not hard to locate on the Internet and elsewhere. The wchar_t array format_str is declared as an automatic (stack) variable on line 9. I’ll return to the mysterious float variable shortly.

The idea of the sample exploit is to create a wide-character format string such that the execution of this string by the wprintf( ) function call on line 31 results in the execution of the shellcode. This is typically accomplished by overwriting a return address on the stack with the address of the shellcode. However, any indirect address can also be used for this purpose. The modulo divisions are to ensure that each subsequent write can output a larger value while the remainder modulo 0x1000 preserves the required low-order byte.

Although most of the conversion specifications interpreted by the formatted output function are used to format output, there is one that writes to memory. The %n conversion specifier writes the number of characters successfully output by the formatted output function (an integer value) to an address passed as an argument to the function. By providing the address of the return address, attackers can trick the formatted output function into overwriting the return address on the stack.

The number of characters output by the formatted output function can be influenced using the width and precision fields of a conversion specifier. The width field is controlled to output the exact number of characters required to write the address of the shellcode. Because there are some practical limitations to the size of the width and precision fields, the exploit writes out the first word of the address (line 20) followed by the second word (line 27). The first word is written to 0x0012f1e0 and the second word is offset by two bytes at 0x0012f1e2. These addresses are specified as part of the format string on lines 29–30.
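The relationship between the width field and the value stored by %n can be observed in a harmless setting. The following sketch uses a fixed, literal format string and a local int as the target; count_reported_by_n is a hypothetical name. It assumes a C library that still honors %n, which is not universal: some modern libraries disable or restrict the specifier by default precisely because of the attacks described here.

```c
#include <assert.h>
#include <stdio.h>

/* Format one unsigned value padded to `width` columns and return the
   character count that %n reports. */
int count_reported_by_n(int width)
{
    char buf[512];
    int count = -1;

    assert(width >= 1 && width < (int)sizeof buf);
    /* %*u takes its field width from the argument list; %n then
       stores the number of characters produced so far in count. */
    snprintf(buf, sizeof buf, "%*u%n", width, 7u, &count);
    return count;
}
```

Because the padded field is the only output before %n, the stored count equals the requested width, which is the lever the paragraph above describes.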

The next trick is to get the argument pointer within the formatted output function to point to this address. Formatted output functions are variadic functions, typically implemented using ANSI C stdargs. These functions have no real way of knowing how many arguments have been passed, so they will continue to consume arguments as long as there are additional conversion specifiers in the format string. Once the formatted output function has consumed all the actual arguments, the function’s argument pointer starts to traverse through the local stack variables and up through the stack. This makes it possible for attackers to insert the address of the return address in a local stack variable or, as in this case, as part of the format string that is also located on the stack. The address is typically added at the start of the format string, which is then output by the formatted output function like any other character. The wide-character formatted output functions, however, are more likely to exit when an invalid Unicode character is detected. As a result, an address included at the beginning of a malicious format string is likely to cause the function to exit without accomplishing the attacker’s goal of executing the shellcode because the address is unlikely to map to valid Unicode. This is not a significant obstacle to determined attackers, however, because the address can be moved to the end of the format string as in lines 29–30 in a wide-character exploit. These addresses may still cause the function to exit, but not before the return address has been overwritten.
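The reason a variadic function cannot tell how many arguments it received is visible in even the smallest stdarg example. In this sketch (sum_ints is a hypothetical function), the callee simply trusts the count supplied by the caller, much as a formatted output function trusts the conversion specifiers in its format string.

```c
#include <assert.h>
#include <stdarg.h>

/* Sum `count` int arguments. The callee cannot verify that the
   caller really passed that many. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    assert(count >= 0);
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);   /* each call advances the argument pointer */
    va_end(ap);
    return total;
}
```

If count were larger than the number of values actually passed, the behavior would be undefined; in practice va_arg( ) would keep reading whatever follows the real arguments, which is the behavior the format string attack relies on.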

The greatest problem introduced by moving the address pairs to the end of the string is that attackers must now progress the argument pointer past the conversion specifiers used to advance the argument pointer to the start of the dummy integer/address pairs. This creates a race in that adding conversion specifiers increases the distance the argument pointer must be advanced to reach the dummy integer/address pairs. In ASCII, the conversion specifier %x requires two bytes to represent and, when processed, advances the argument pointer by four bytes. This means that each conversion specifier of this form narrows the gap between the argument pointer and the start of the dummy integer/address pairs by two bytes (that is, the four bytes the argument pointer is advanced minus the two bytes the start of the dummy integer/address pairs is advanced). The wide-character representation of the %x conversion specifier requires four bytes to represent but only advances the argument pointer by four bytes (the length of an integer). As a result, the %x conversion specifier can no longer be used to gain ground on the dummy integer/address pairs.

Is there a conversion specifier that can be used to gain ground on the dummy integer/address pairs? One possibility is the use of a length modifier to indicate that the conversion specifier applies to a long long int or unsigned long long int. Because these data types are represented in eight bytes and not four, each conversion specifier advances the argument pointer by eight bytes. Visual C++ does not support the C99 ll length modifier but instead provides the I64 length modifier. A conversion specifier using the I64 length modifier takes the form %I64x. This conversion specifier requires five wide characters or 10 bytes to represent but, as already noted, only advances the argument pointer by eight bytes. Now you are actually losing ground! Using a compiler that supports the standard ll length modifier (such as GCC) is not much better because the conversion specifier requires four characters or eight bytes.

Another possibility is using the a, A, e, E, f, F, g, or G conversion specifiers to output a 64-bit floating-point number and thereby increment the argument pointer by eight bytes. For example, the conversion specifier %f requires two wide characters, or four bytes, to represent but advances the argument pointer by eight bytes, which lets attackers gain four bytes on the address for each conversion specifier processed by the formatted output function. Lines 11–14 in the wide-character exploit (Listing Three) show how the %f conversion specifier can be used to advance the argument pointer. Line 11 also adds a single wide character (a) to properly align the argument pointer to the start of the dummy integer/address pairs. The only problem with the %f conversion specifier is that it can cause the abnormal termination of the program if the floating-point subsystem is not loaded, which is why line 10 declares an otherwise unnecessary float. In theory, this problem could limit the number of programs that could be attacked using this exploit. In practice, most nontrivial programs load the floating-point subsystem.
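The gain-or-lose arithmetic in the preceding paragraphs can be tabulated with a small helper. This is only a sketch under stated assumptions: wide characters are 16 bits (as on Windows), an int is four bytes, and long long and double are eight bytes. The function name net_gain and its parameters are hypothetical, not part of any library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: net number of bytes by which one conversion
 * specifier closes the gap to data placed after it in the format string.
 *   spec_chars     - characters in the specifier, e.g. 2 for "%x"
 *   bytes_per_char - 1 for an ASCII string, 2 for 16-bit wide characters
 *   arg_bytes      - bytes the argument pointer advances when the
 *                    specifier is processed (4 for int, 8 for double)
 * A positive result gains ground, zero breaks even, negative loses. */
int net_gain(size_t spec_chars, size_t bytes_per_char, size_t arg_bytes)
{
    assert(spec_chars > 0 && bytes_per_char > 0);
    return (int)arg_bytes - (int)(spec_chars * bytes_per_char);
}
```

Under these assumptions, net_gain(2, 1, 4) is 2 for ASCII %x, net_gain(2, 2, 4) is 0 for wide %x, net_gain(5, 2, 8) is -2 for wide %I64x, net_gain(4, 2, 8) is 0 for wide %llx, and net_gain(2, 2, 8) is 4 for wide %f, matching the figures given in the text.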

Venetian Shellcode

Again, when a Windows program is declared using main( ), an ASCII environment is created by the operating system for the program. If a wide-character representation of an environmental variable is required, it is generated on demand. Because the Unicode string is converted from ASCII, every other byte will be zero. For example, if the ASCII string “AAAA” is converted to Unicode, the result (in hexadecimal) is 00 41 00 41 00 41 00 41. This creates an interesting obstacle for exploit writers.
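The alternating zero bytes are easy to see by widening a string by hand. The sketch below assumes 7-bit ASCII input and writes UTF-16 in little-endian byte order, which is how 16-bit wide characters are laid out in memory on x86 Windows; the text above lists each 16-bit unit with its high byte first, so the same data appears there as 00 41. The function name widen_ascii_utf16le is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: widen a 7-bit ASCII string into UTF-16LE bytes.
 * Each input byte becomes two output bytes: the ASCII code followed by
 * 0x00. Input that does not fit (leaving room for the two-byte
 * terminator) is silently truncated. Returns the number of bytes
 * written, not counting the terminator. */
size_t widen_ascii_utf16le(const char *in, unsigned char *out, size_t out_size)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    while (*in != '\0' && n + 4 <= out_size) {
        out[n++] = (unsigned char)*in++; /* low byte: the ASCII code */
        out[n++] = 0x00;                 /* high byte: always zero   */
    }
    if (n + 2 <= out_size) {             /* wide null terminator     */
        out[n] = 0x00;
        out[n + 1] = 0x00;
    }
    return n;
}
```

Every second byte of the output is zero, so any payload that passes through this conversion has to be built from instructions that tolerate that pattern.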

Chris Anley has done some work (see “Creating Arbitrary Shellcode In Unicode Expanded Strings: The ‘Venetian’ Exploit”; http://www.ngssoftware.com/papers/unicodebo.pdf) in creating Venetian shellcode with alternating zero bytes (analogous to Venetian blinds). While creating these programs by hand is quite troublesome, Dave Aitel’s makeunicode2.py and Phenoelit’s “vense” generator are both capable of automatically generating Venetian shellcode.

Conclusion

Wide-character formatted output functions are susceptible to format string and buffer overflow vulnerabilities in much the same way as multibyte-character formatted output functions, even in the extraordinary case where Unicode strings are converted from ASCII.

Unicode actually has characteristics that make it easier to exploit functions that use these strings. For example, multibyte-character functions recognize a null byte as the end of a string, making it impossible to embed a null byte (\x00) in the middle of a string. The null character in Unicode, however, is represented by \x0000. Because Unicode characters can contain null bytes, it is easier to inject a broader range of addresses into a Unicode string.
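A short comparison makes the point about embedded null bytes concrete. The sketch below stores the four bytes of the value 0x0012f1e0 (the address pattern that appears in Listing Three), followed by filler, first in a char string and then in a wide string. The function names are hypothetical, and the wide-string result assumes only that wchar_t is at least 16 bits.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Bytes e0 f1 12 00 followed by "AAAA" in a narrow string.
 * strlen() stops at the embedded 0x00, so the filler is never seen. */
size_t narrow_visible_length(void)
{
    const char data[] = "\xe0\xf1\x12\x00" "AAAA";
    return strlen(data);
}

/* The same value as two wide characters (0xf1e0, 0x0012) followed by
 * L"AAAA". Neither wide character is the wide null, so wcslen()
 * counts the whole string. */
size_t wide_visible_length(void)
{
    const wchar_t data[] = L"\xf1e0\x0012" L"AAAA";
    return wcslen(data);
}
```

The narrow version reports a length of 3 because the zero byte ends the string, while the wide version reports 6: the zero byte inside the character 0x0012 does not terminate anything.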

There are a number of mitigation strategies for format string vulnerabilities. The simplest solution that works for both multibyte- and wide-character strings is to never allow (potentially malicious) users to control the contents of the format string.
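As a minimal sketch of that rule, the usage message from Listing One can be produced with fixed format strings only, so that user-supplied text is always treated as data. The function names format_usage and print_message are hypothetical; snprintf and fputs are the standard C library calls.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the usage message with a constant format string. pname is
 * passed as an argument, so any conversion specifiers it contains are
 * copied literally rather than interpreted. */
int format_usage(char *out, size_t out_size, const char *pname)
{
    assert(out != NULL && pname != NULL && out_size > 0);
    return snprintf(out, out_size, "Usage: %s <target>\n", pname);
}

/* Print a prepared message without using it as a format string.
 * printf("%s", msg) would be equivalent; printf(msg) would not. */
int print_message(const char *msg)
{
    assert(msg != NULL);
    return fputs(msg, stdout);
}
```

With this arrangement, a program name such as %s%s%n simply appears verbatim in the output instead of driving the formatted output function.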

DDJ

Listing One

1.  #include <stdio.h>
2.  #include <string.h>
3.  void usage(char *pname) {
4.    char usageStr[1024];
5.    snprintf(usageStr, 1024,
        "Usage: %s <target>\n", pname);
6.    printf(usageStr);
7.  }
8.  int main(int argc, char * argv[]) {
9.    if (argc < 2) {
10.     usage(argv[0]);
11.     exit(-1);
12.   }
13. }

Listing Two

1.  #include <unistd.h>
2.  #include <errno.h>
3.  int main(void) {
4.    execl("usage", "%s%s%s%s%s%s%s%s%s%s", NULL);
5.    return(-1);
6.  }

Listing Three

1.  #include <stdio.h>
2.  #include <string.h>
3.  static unsigned int already_written, width_field;
4.  static unsigned int write_word;
5.  static wchar_t convert_spec[256];
6.  int wmain(int argc, wchar_t *argv[], wchar_t *envp[]) {
7.    unsigned char exploit_code[1024] = "\x90\x90\x90\x90\x90";
8.    int i;
9.    wchar_t format_str[1024];
10.   float x = 5.3;
      //advance argument pointer 63 x 4 bytes
11.   wcscpy(format_str, L"a%f"); // 2 bytes filler
12.   for (i=0; i < 63; i++) {
13.     wcscat(format_str, L"%f");
14.   }
15.   already_written = 0x084d;
      //first word
16.   write_word = 0xfad8;
17.   already_written %= 0x10000;
18.   width_field = (write_word-already_written) % 0x10000;
19.   if (width_field < 10) width_field += 0x10000;
20.   swprintf(convert_spec, L"%%%du%%n", width_field);
21.   wcscat(format_str, convert_spec);
      //last word
22.   already_written += width_field;
23.   write_word = 0x0012;
24.   already_written %= 0x10000;
25.   width_field = (write_word-already_written) % 0x10000;
26.   if (width_field < 10) width_field += 0x10000;
27.   swprintf(convert_spec, L"%%%du%%n", width_field);
28.   wcscat(format_str, convert_spec);
      //two dummy int/address pairs
29.   wcscat(format_str, L"ab\xf1e0\x0012");
30.   wcscat(format_str, L"ab\xf1e2\x0012");
31.   wprintf(format_str);
32.   return 0;
33. }


Amazon Web Services

Small devices meet large databases

ASHISH MUNI AND JUSTIN HANSEN

ScanZoom is an application that lets you use mobile camera phones to launch services and applications simply by taking a photo of a barcode.

For example, say you’re shopping at Fry’s Electronics and want to compare iPod prices with other retailers. You could take a picture of the barcode on the iPod packaging (or enter the barcode value manually), and ScanZoom looks up and reports what that same iPod costs at competing retailers.

Developed at Scanbuy (where we work), ScanZoom can be installed by first downloading the application onto a PC from http://www.scanzoom.com/ and then installing it on mobile phones (via Bluetooth, infrared, WAP, or SMS), or through embeddable microchips.

In developing ScanZoom, we had two requirements that the data source needed to provide:

A large, extensive database.

Access to the Web without the need for monotonous manual entry or navigation.

Because of the size of Amazon’s product database and its vast amount of information, Amazon’s freely available web service API was the first resource we turned to. In particular, Amazon’s E-Commerce Service (ECS) provides access to all of the content on the product pages of the Amazon.com site in an XML format.

Ashish is the chief technology officer and Justin is a software engineer at Scanbuy. They can be reached at ashish@scanbuy.com and justin@scanbuy.com, respectively.

From our perspective, Amazon’s ECS (http://www.amazon.com/aws/) revolutionizes what a web service can do for developers. The service gives you direct, real-time access to all of Amazon’s features, including consumer reviews, new and used product listings from individuals and companies, and a list of similar products that can be purchased along with almost all category-specific details that you can imagine. This service let us take ScanZoom to the next level and link mobile phones with the power of Amazon.

Providing information that would not be readily available to users holding products in their hands was essential. If all we offered were those basic details, what good would the application be to the consumer? Users would be able to see most, if not all, of the basic information while scanning the barcode on the package. ECS provided us with extra information, which would be helpful to a consumer in making a decision while shopping. For example, many people read consumer reviews before selecting a product to purchase. Through ECS, we can offer shoppers quick access to this information so that they can read each and every review that Amazon has on a product. Shoppers can then make an informed decision without having to research the product before leaving home. The application is also able to show similar products that are offered so that consumers are able to see other options and read reviews on those items before making a decision.

Still, pricing is one of the main reasons that consumers shop the Internet. However, comparing prices requires research, and many purchases are impulse buys made while shopping for other items, when no research has been done. Through our application, users can instantly gain access not only to the price that Amazon offers, but also to the prices that retailers in the Amazon network provide. Because many of these retailers (Target and Circuit City, for instance) have large retail stores, users can go to another retailer to make their purchase at a discount. Users can also save more money by buying used items through Amazon. The used item prices are shown on the handset, along with the seller’s rating and a description of item quality and condition.

Specific product information is joined by the item’s packaging information, such as DVD movie run time, CD release date and track listing, and the like. Having all of this information combined into one data feed made it possible for us to pull up all of this information with a single search. The process of integrating the web service into our current web application was also fairly simple because Amazon offers a WSDL that we are able to connect to inside our C#.NET development environment. Creating a reference in the project to the URL of the ECS WSDL is all it took to have access to all the data types and functions necessary to gather our product information. Example 1 is a sample connection and query that returns an Amazon Item data type containing all the products from the search, along with the relevant item information. That’s all that’s involved in fetching Amazon’s product information.

This query is also very customizable. Because we are dealing with mobile devices with limited memory and bandwidth, we need to make sure that the page being retrieved by the mobile browser is as compact as possible. We do that efficiently by limiting searchRequest.ResponseGroup to an array of only the response groups that are necessary for what is being displayed on the phone. For example, if you only need to know the item’s price from Amazon along with some basic item details (such as category), you can simply set the response group to “Small” (also offered are “Medium” and “Large”). Those response groups contain an array of smaller response groups. Instead of picking “Medium” or “Large” when you need more than just the basic information, you call on only those response groups that are needed, such as the ones that are listed on line 6 in Example 1.

1  string subID = "<our subscription id>";
2  AWSECommerceService service = new AWSECommerceService();
3  ItemSearch itemSearch = new ItemSearch();
4  ItemSearchRequest searchRequest = new ItemSearchRequest();
5  searchRequest.Keywords = keywords;
6  string[] responseGroup = new string[]{"OfferFull","ItemAttributes","Reviews","Images"};
7  searchRequest.ResponseGroup = responseGroup;
8  searchRequest.SearchIndex = "Blended";
9
10 ItemSearchRequest[] searchRequests = new ItemSearchRequest[]{searchRequest};
11 itemSearch.SubscriptionId = subID;
12 itemSearch.Request = searchRequests;
13 ItemSearchResponse response = service.ItemSearch(itemSearch);
14 Items info = response.Items[0];
15 Item[] items = info.Item;
16 return items[0];

Example 1: Sample connection and query that returns an Amazon Item data type.

REST versus SOAP

The proliferation of web services has brought out an interesting focus on REST versus SOAP. Those in favor of REST argue for HTTP’s built-in security (SSL). On the other hand, SOAP offers flexibility in delivery through other modes of transport by storing address information in the SOAP envelope. While Amazon ECS offers both scenarios to developers, we opted for SOAP because it provides consistency across different applications.

This feature was especially important in developing ScanZoom since the screen of the cell phone is limited in both size and resolution compared to that of typical displays. Because of this constraint, we couldn’t simply redirect users to an HTML web page (such as Amazon’s product pages) because navigating it would be nearly impossible. In addition, many of the stock web browsers available on mobile devices do not support the full HTML collection and require a special XHTML Mobile Profile compliant page. Because Amazon provides access to its information through XML, we were free to format its content in a way that made it simple for the end user to understand and navigate, even on primitive cell-phone browsers, while still allowing the functionality that one would expect with more advanced PDA-type handsets. ECS also let us handle browser-compatibility issues, increasing the number of handsets that we can support.

-A.M. and J.H.

You can further reduce the time it takes to return data to the mobile handset by searching in exact categories with more exact search criteria. In Example 1 (line 8), we are using a Blended SearchIndex. However, you could specify Books or Movies instead. We don’t use this query when doing the initial search based on UPC or ISBN, but when we are requesting the “Reviews” or “Similar Products,” we use it to minimize the irrelevant results and make the data set smaller.

Most of the development we did to integrate ECS into our web application was as simple as Example 1. However, we did run into a snag here and there. For example, we needed to display products from different categories on the same page template. The item information that is returned in the Item data type from ECS has different names for item details depending on the item category. For instance, the artist of a CD would be reached from Item.ItemAttributes.Artist, while the writer of a book would be Item.ItemAttributes.Writer. To solve this issue, we find the Item.ItemAttributes.ProductCategory, then list the item details based on that category. This was the only issue that we faced and it was definitely not a major issue, just a small bump in the road.

Using ECS, our application offers users a mobile shopping and pricing guide that is simple and fast to use. Developing the solution was almost as easy and let us offer users something that they couldn’t previously get on their cell phone without a lot of keyboard pecking and squinting at a small screen.

DDJ



WINDOWS/.NET DEVELOPER

Enterprise Application Logging

Using the SQL Server 2005 Service Broker

JIM MANGIONE

While we routinely standardize project details such as naming conventions and abstraction layers of corporate architectures, standardized logging often gets overlooked. Consequently, each application ends up logging messages differently. Can we do better? Absolutely. For the enterprise to accurately gauge the health of its IT assets across applications and gather pertinent metrics to support those findings, we need true enterprise-level logging. In this article, I propose a common way to meet this objective, using Microsoft’s SQL Server 2005 as an example. In the process, I present a “Hello Broker” WinForm client using the Log4NET framework. The complete source code for the client-testing application (available electronically, see “Resource Center,” page 4) posts test messages to each of the log severities so you can trace them through the queues.

The requirements for a logging system include:

A common schema for log messages.

A common set of severities.

A centralized, loosely coupled logging service with a single point of entry and published API.

A scalable architecture.

The ability to dynamically set up and change application severities/destinations centrally (without touching each individual application configuration).

Jim is a senior consultant in Global Medical Applications at Wyeth Pharmaceuticals. He can be reached at mangionej@yahoo.com.

These specifications, especially the scalability and asynchronous features, point to a traditional middleware, queuing-type implementation.

Logging Service Architecture

Figure 1 is a high-level view of the proposed logging service. Again, I am implementing a queuing-based system, which aids in scalability and performance of client applications using this service. At the core is the Central Log Queue, which accepts all log message types from all registered applications. Messages are published to this queue from the public API Log Publisher. This is the only way a message can enter the queue. Other publishers may exist to translate proprietary log messages to the standard message format our service accepts, but they must then call the Log Publisher to post the messages. (This is demonstrated in the Log4NET client implementation.)

The Central Log Queue has a single subscriber type, a message router that consumes each message and interrogates its contents to determine what destination queue(s) to route to. Messages are routed based on their severity and the application’s configuration details. Each application that wants to use this service must register by adding information into the log service metadatabase. For routing purposes, a mapping of each severity to one or more destinations is all that is necessary. Logs that are published with severities that aren’t mapped simply vaporize: they are consumed and ignored so the queue doesn’t jam up.
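The routing rule described above (look up the application, map its severity to zero or more destinations, and drop anything unmapped) can be sketched in a few lines. This is only an illustration of the logic, not the Service Broker implementation: the severity names, the destination flags DEST_A through DEST_C, the app_route structure, and the route_message function are all hypothetical stand-ins for whatever the metadata tables actually hold.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical severity levels; the real set comes from the service. */
enum severity { SEV_DEBUG, SEV_INFO, SEV_WARN, SEV_ERROR, SEV_COUNT };

/* Hypothetical destinations as bit flags, so that one severity can be
 * mapped to more than one destination queue. */
enum { DEST_NONE = 0, DEST_A = 1 << 0, DEST_B = 1 << 1, DEST_C = 1 << 2 };

/* One registered application: a destination mask per severity. */
struct app_route {
    const char *app_name;
    unsigned dest_mask[SEV_COUNT];
};

/* Return the destination mask for a message. DEST_NONE means the
 * message is consumed and ignored: the application is not registered,
 * or the severity is not mapped for it. */
unsigned route_message(const struct app_route *routes, size_t count,
                       const char *app, enum severity sev)
{
    size_t i;

    assert(routes != NULL && app != NULL);
    if ((int)sev < 0 || (int)sev >= SEV_COUNT)
        return DEST_NONE;
    for (i = 0; i < count; i++) {
        if (strcmp(routes[i].app_name, app) == 0)
            return routes[i].dest_mask[sev];
    }
    return DEST_NONE;
}
```

Keeping the mapping in a table rather than in each client is what allows severities and destinations to be changed centrally, as the requirements list asks.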

“SQL Server 2005 includes a reliable messaging framework within its database engine”

Each destination is also a queue. This again facilitates a scalable architecture where the service can distribute the workload of more popular destinations across multiple queues. Each destination queue has its own subscriber type that, upon consuming a message, performs the particular function that places that message at the end point, the real destination. Here, the severity can be ignored for purposes of workflow because these subscribers are only interested in dumping all messages to their relevant destinations. I have defined three destinations:
