
#381 FEBRUARY 2006

Dr. Dobb’s Journal
SOFTWARE TOOLS FOR THE PROFESSIONAL PROGRAMMER

http://www.ddj.com

64-BIT COMPUTING!

Multiplatform Porting to 64 Bits

Mac OS X & 64 Bits

Examining µC++

Native Queries for Persistent Objects

Dynamic Bytecode Instrumentation

$4.95 US, $6.95 CAN

Summer of Code

Range Tracking & Comparison

GIF Images & Mobile Phones

Inside Sudoku

Viewing & Organizing Log Files

Porting Real-Time Operating Systems

CONTENTS

FEATURES

FEBRUARY 2006, VOLUME 31, ISSUE 2

Multiplatform Porting to 64 Bits 20

by Brad Martin, Anita Rettinger, and Jasmit Singh

Porting 300,000 lines of 32-bit code to nearly a dozen 64-bit platforms requires careful planning.

Mac OS X Tiger & 64 Bits 26

by Rodney Mach

Before migrating to 64-bit platforms, the first question to ask is whether you really need to do so.

Ajax: Asynchronous JavaScript and XML 32

by Eric J. Bruno

Ajax, short for “Asynchronous JavaScript and XML,” lets you create dynamic web pages.

Examining µC++ 36

by Peter A. Buhr and Richard C. Bilson

µC++ was designed to provide high-level concurrency for C++.

Native Queries for Persistent Objects 41

by William R. Cook and Carl Rosenberger

Among other benefits, native queries overcome the shortcomings of string-based APIs.

Dynamic Bytecode Instrumentation 45

by Ian Formanek and Gregg Sporar

Dynamic bytecode instrumentation is an innovative technique that makes profiling fast and easy.

Range Tracking & Comparison Algorithms 50

by Kirk J. Krauss

Some information is best viewed as a list of ranges. Kirk presents algorithms for dealing with ranges.

Displaying GIF Images on J2ME Mobile Phones 52

by Tom Thompson

Surprisingly, many Java-based mobile phones couldn’t display GIF image files— until now.

Sudoku & Graph Theory 56

by Eytan Suchard, Raviv Yatom, and Eitan Shapir

Understanding graph theory is central to building your own Sudoku solver.

Google’s Summer of Code: Part III 58

by DDJ Staff and Friends

Google’s Summer of Code resulted in thousands and thousands of lines of code. Here are more students who participated.

Viewing & Organizing Log Files 61

by Phil Grenetz

LogChipper, the tool Phil presents here, lets you view and organize the contents of log files.

EMBEDDED SYSTEMS PROGRAMMING

Porting an RTOS to a New Hardware Platform 65

by Byron Miller

Porting software to new hardware boards doesn’t need to be difficult.

COLUMNS

Programming Paradigms 68

by Michael Swaine

Everything Michael knows he attributes to Roger Penrose’s The Road to Reality: A Complete Guide to the Laws of the Universe.

Chaos Manor 74

by Jerry Pournelle

Beware of Sony’s Digital Rights Management (DRM) scheme, which covertly installs itself.

Embedded Space 71

by Ed Nisley

Ed remembers to tell you that memory really does matter.

Programmer’s Bookshelf 77

by Peter N. Roth

Peter reviews Stephen C. Perry’s Core C# and .NET.

FORUM

EDITORIAL 10

by Jonathan Erickson

LETTERS 12 by you

DR. ECCO’S OMNIHEURIST CORNER 14 by Dennis E. Shasha

NEWS & VIEWS 16 by DDJ Staff

PRAGMATIC EXCEPTIONS 24 by Benjamin Booth

OF INTEREST 79 by DDJ Staff

SWAINE’S FLAMES 80 by Michael Swaine

NEXT MONTH: The smart thing to do in March is to read our issue on Intelligent Systems.

http://www.ddj.com

Dr. Dobb’s Journal, February 2006

5

DR. DOBB’S ONLINE CONTENTS

Online Exclusives

http://www.ddj.com/exclusives/

VB6 to VB.NET Migration

There are millions of Visual Basic 6 developers and an enormous amount of VB6 code. What does the landscape look like for this tremendous pool of legacy code and talent?

The Obsolete Operating System

To some, the modern definition of a computer operating system is obsolete.

The C/C++ Users Journal

http://www.cuj.com/

Flexible C++ #13: Beware Mixed Collection/Enumerator Interfaces

When the semantics of collection and enumerator interfaces are blurred, the result can mean trouble.

Dobbscast Audio

http://www.ddj.com/podcast/

SysML: A Modeling Language for Systems Engineering

Chris Sibbald discusses SysML, a visual modeling language for systems engineering applications.

Computer Theft: A Growing Problem

Biometric and computer security expert Greg Chevalier discusses the growing problem of mobile computer theft, and what you can do to combat it.

AADL: A Design Language for Embedded Systems

Peter Feiler discusses the Architecture Analysis and Design Language, a textual and graphical language that supports model-based engineering of embedded real-time systems.

COM Interop

.NET guru Juval Lowy explores how COM Interop can allow legacy VB6 applications to coexist in a .NET world.

Windows/.NET

http://www.ddj.com/topics/windows/

An Overview of Generics

In the .NET Framework 2.0, C# and Visual Basic .NET support generics.

Dotnetjunkies

http://www.dotnetjunkies.com/

Top 10 Must-Have Features in O/R Mapping Tools

What features would a good O/R mapping tool provide you with and how can it be beneficial to you?

BYTE.com

http://www.byte.com/

Why Can’t Windows Do Windows?

Multimedia apps require lots of desktop real estate, so having two or more displays can be the answer — if you can get them to work.

The News Show

http://thenewsshow.tv/

The Feds and IT Failures

The IRS spent nearly $2 billion on business modernization before it began to process even 1 percent of tax returns.

RESOURCE CENTER

As a service to our readers, source code, related files, and author guidelines are available at http://www.ddj.com/. Letters to the editor, article proposals and submissions, and inquiries should be sent to editors@ddj.com. For subscription questions, call 800-456-1215 (U.S. or Canada). For all other countries, call 902-563-4753 or fax 902-563-4807. E-mail subscription questions to ddj@neodata.com, or write to Dr. Dobb’s Journal, P.O. Box 56188, Boulder, CO 80322-6188.

If you want to change the information you receive from CMP and others about products and services, go to http://www.cmp.com/feedback/permission.html or contact Customer Service at Dr. Dobb’s Journal, P.O. Box 56188, Boulder, CO 80322-6188.

Back issues may be purchased prepaid for $9.00 per copy (which includes shipping and handling). For issue availability, send e-mail to orders@cmp.com, fax to 785-838-7566, or call 800-444-4881 (U.S. and Canada) or 785-838-7500 (all other countries). Please send payment to Dr. Dobb’s Journal, 4601 West 6th Street, Suite B, Lawrence, KS 66049-4189. Digital versions of back issues and individual articles can be purchased electronically at http://www.ddj.com/.

WEBSITE ACCOUNT ACTIVATION

Dr. Dobb’s Journal subscriptions include full access to the CMP Developer Network web sites. To activate your account, register at http://www.ddj.com/registration/ using the web ALL ACCESS subscriber code located on your mailing label.

DR. DOBB’S JOURNAL (ISSN 1044-789X) is published monthly by CMP Media LLC., 600 Harrison Street, San Francisco, CA 94017; 415-947-6000. Periodicals Postage Paid at San Francisco and at additional mailing offices. SUBSCRIPTION: $34.95 for 1 year; $69.90 for 2 years. International orders must be prepaid. Payment may be made via Mastercard, Visa, or American Express; or via U.S. funds drawn on a U.S. bank. Canada and Mexico: $45.00 per year. All other foreign: $70.00 per year. U.K. subscribers contact Jill Sutcliffe at Parkway Gordon 01-49-1875-386. POSTMASTER: Send address changes to Dr. Dobb’s Journal, P.O. Box 56188, Boulder, CO 80328-6188. Registered for GST as CMP Media LLC, GST #13288078, Customer #2116057, Agreement #40011901. INTERNATIONAL NEWSSTAND DISTRIBUTOR: Source Interlink International, 27500 Riverview Center Blvd., Suite 400, Bonita Springs, FL 34134, 239-949-4450. Entire contents © 2006 CMP Media LLC.

Dr. Dobb’s Journal is a registered trademark of CMP Media LLC. All rights reserved.


PUBLISHER

Michael Goodman

EDITOR-IN-CHIEF

Jonathan Erickson

EDITORIAL

MANAGING EDITOR

Deirdre Blake

SENIOR PRODUCTION EDITOR

Monica E. Berg

ASSOCIATE EDITOR

Della Wyser

COPY EDITOR

Amy Stephens

ART DIRECTOR

Margaret A. Anderson

SENIOR CONTRIBUTING EDITOR

Al Stevens

CONTRIBUTING EDITORS

Bruce Schneier, Ray Duncan, Jack Woehr, Jon Bentley,

Tim Kientzle, Gregory V. Wilson, Mark Nelson, Ed Nisley,

Jerry Pournelle, Dennis E. Shasha

EDITOR-AT-LARGE

Michael Swaine

PRODUCTION MANAGER

Stephanie Fung

INTERNET OPERATIONS

DIRECTOR

Michael Calderon

SENIOR WEB DEVELOPER

Steve Goyette

WEBMASTERS

Sean Coady, Joe Lucca

AUDIENCE DEVELOPMENT

AUDIENCE DEVELOPMENT DIRECTOR

Kevin Regan

AUDIENCE DEVELOPMENT MANAGER

Karina Medina

AUDIENCE DEVELOPMENT ASSISTANT MANAGER

Shomari Hines

AUDIENCE DEVELOPMENT ASSISTANT

Andrea Abidor

MARKETING/ADVERTISING

ASSOCIATE PUBLISHER

Will Wise

SENIOR MANAGERS, MEDIA PROGRAMS see page 78

Pauline Beall, Michael Beasley, Cassandra Clark, Ron Cordek, Mike Kelleher, Andrew Mintz

MARKETING DIRECTOR

Jessica Marty

SENIOR ART DIRECTOR OF MARKETING

Carey Perez

DR. DOBB’S JOURNAL

2800 Campus Drive, San Mateo, CA 94403

650-513-4300. http://www.ddj.com/

CMP MEDIA LLC

Steve Weitzner, President and CEO

John Day, Executive Vice President and CFO

Jeff Patterson, Executive Vice President, Corporate Sales and Marketing

Bill Amstutz, Senior Vice President, Audience Marketing and Development

Mike Azzara, Senior Vice President, Internet Business

Joseph Braue, Senior Vice President, CMP Integrated Marketing Solutions

Sandra Grayson, Senior Vice President and General Counsel

Anne Marie Miller, Senior Vice President, Corporate Sales

Marie Myers, Senior Vice President, Manufacturing

Alexandra Raine, Senior Vice President, Communications

Kate Spellman, Senior Vice President, Corporate Marketing

Michael Zane, Vice President, Audience Development

Robert Faletra, President, Channel Group

Tony Keefe, President, CMP Entertainment Media

Vicki Masseria, President, CMP Healthcare Media

Philip Chapnick, Senior Vice President, Group Director, Applied Technologies Group

Paul Miller, Senior Vice President, Group Director, Electronics and Software Groups

Fritz Nelson, Senior Vice President, Group Director, Enterprise Group

Stephen Saunders, Senior Vice President, Group Director, Communications Group

Printed in the USA

American Business Press


EDITORIAL

Bits and Bytes…

If you believe everything you read, “64 bits” is this week’s bee’s knees of computing. Microsoft must think so, as the company recently announced at least some of its upcoming server offerings will run only on x86-compatible 64-bit processors. In fact, the ready availability of 64-bit platforms is an important step forward. Still, that doesn’t necessarily mean it’s time to post your 32-bit system on Craigslist or eBay. There’s a time and place for everything, including 64 bits.

According to Microsoft’s Bob Kelly, the time and place for 64-bit systems is with performance-critical applications such as Microsoft’s Exchange 12 e-mail server and its SQL Server database. Other application areas that benefit from 64-bit processors are complex engineering programs, games, and anything that involves audio/video encoding: anything, in other words, that takes advantage of 64-bit arithmetic or requires addressing datasets beyond the 4-gigabyte constraint of 32-bit processors. In principle, 64-bit addresses can reach 16 exabytes of memory (roughly 18 billion gigabytes), which is more than enough for most compute-intensive applications.
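To make the figures above concrete, here is a small Python sketch (not from the editorial) that computes the size of a flat address space from the pointer width. It separates binary units (GiB, EiB) from decimal gigabytes, which is why “16 exabytes” and “roughly 18 billion gigabytes” describe the same 2^64 bytes. It assumes byte-addressable memory and ignores the fact that real 64-bit processors implement fewer than 64 physical and virtual address bits.

```python
# Units in bytes: binary (IEC) versus decimal (SI)
GiB = 2**30  # gibibyte
EiB = 2**60  # exbibyte
GB = 10**9   # decimal gigabyte


def address_space_bytes(address_bits: int) -> int:
    """Distinct byte addresses reachable with a pointer of the given width."""
    return 2**address_bits


print(address_space_bytes(32) // GiB, "GiB")  # prints: 4 GiB (the 32-bit ceiling)
print(address_space_bytes(64) // EiB, "EiB")  # prints: 16 EiB (the "16 exabytes")
print(address_space_bytes(64) // GiB, "GiB")  # prints: 17179869184 GiB
billions_of_gb = address_space_bytes(64) / GB / 1e9
print(f"about {billions_of_gb:.1f} billion decimal GB")  # prints: about 18.4 billion decimal GB
```

Measured in binary gibibytes the same space is about 17.2 billion GiB; the “18 billion” figure uses decimal gigabytes.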

 

Of course, in the spirit of “there’s no such thing as a free lunch,” the memory used by a 64-bit processor’s larger integers and/or pointers can also lead to more paging and disk I/O, thereby degrading performance. This means that while some applications don’t need 64-bit integers and/or pointers, they end up paying for them anyway.
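The cost of larger pointers shows up directly in data-structure layout. The sketch below, an illustration rather than anything from the editorial, computes the size of a C struct under two common data models: ILP32 (32-bit int, long, and pointers) and LP64 (64-bit long and pointers, used by most 64-bit UNIX-like systems). It assumes natural alignment, where each member is aligned to its own size, as typical x86 and x86-64 ABIs do for these types; the field lists are hypothetical examples.

```python
# Member sizes in bytes under two common C data models (illustrative assumption)
DATA_MODELS = {
    "ILP32": {"int": 4, "long": 4, "pointer": 4},  # typical 32-bit systems
    "LP64": {"int": 4, "long": 8, "pointer": 8},   # most 64-bit UNIX-like systems
}


def round_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def struct_size(fields: list, model: str) -> int:
    """sizeof() for a C struct whose members are naturally aligned (alignment == size)."""
    sizes = DATA_MODELS[model]
    offset = 0
    max_align = 1
    for kind in fields:
        size = sizes[kind]
        offset = round_up(offset, size) + size  # pad before the member, then place it
        max_align = max(max_align, size)
    # Trailing padding keeps every element of an array of these structs aligned
    return round_up(offset, max_align)


list_node = ["int", "pointer"]             # struct node { int value; struct node *next; };
tree_node = ["int", "pointer", "pointer"]  # value plus left/right child pointers

for name, fields in (("list node", list_node), ("tree node", tree_node)):
    print(name, struct_size(fields, "ILP32"), "->", struct_size(fields, "LP64"), "bytes")
```

Under these assumptions a list node doubles from 8 to 16 bytes, four of which are padding, and a tree node grows from 12 to 24 bytes. Note that 64-bit Windows uses a different model, LLP64, in which long stays 32 bits but pointers still grow to 64.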

 

In short, the fundamental difference between 32-bit and 64-bit processors isn’t necessarily the speed of the processor, but the amount of data that can be processed, which at times lends the appearance of greater speed. That said, there are workarounds (some of which involve virtual memory) that let you utilize 64-bit addressing on systems with less than 4 GB of memory, not to mention that you can gain some performance pop by running a 64-bit processor in 32-bit mode. The bottom line is that there’s still a lot to learn when it comes to effectively using next-generation platforms, and the sooner we jump on them, the better prepared we will be for the future.

Speaking of the future, anyone who doesn’t think the wireless world has found a home in academia hasn’t sat in on a college lecture class recently. What with everything from iPods and Instant Messaging to e-mail and FreeCell, there’s a whole lot of something going on, most of which seems to have little to do with learning.

That’s changing, however, with the advent of “Interactive Audience Response Systems,” referred to simply as “clickers”: radio frequency (RF) sender/receiver devices that let students and teachers interact in real time. A typical student/teacher scenario goes something like this: Students buy or rent a clicker (somewhat akin to a TV remote-control device but with fewer keys) at the beginning of the semester and register it with the school. Students can use a single clicker in multiple classes. When instructors want feedback, students answer, and their responses are instantly available and/or recorded for later review. Because many universities now have wired lecture halls, tracking and storing clicker information for professors isn’t a big deal. Alternatively, instructors can plug USB readers into their laptops and store the information locally. With typical systems, up to 1000 student RF keypads can be used per receiver, with up to 82 sessions (channels) running at the same time in close proximity without interference.

There are a number of companies that offer this technology, including Turning Technologies (http://www.turningtechnologies.com/) and eInstruction (http://www.einstruction.com/). eInstruction claims its system is being used in 800 institutions in 50 states and 20 countries, with more than a million devices in the hands of students.

Granted, audience response systems such as these have been around for a while. Early implementations were based on infrared (IR) technology, but RF offers clear advantages in range and in the number of sender units per receiver. Additionally, some vendors offer “virtual clickers”: soft keypads that run on PCs or PDAs, support all the features of standard clickers, and add text messaging, which lets students submit questions to teachers and supports responses to fill-in-the-blank and essay questions.

And on a sad note, John Vlissides, coauthor of the seminal book Design Patterns: Elements of Reusable Object-Oriented Software, recently passed away. Along with his coauthors who made up the “Gang of Four,” John was a recipient of the Dr. Dobb’s Journal Excellence in Programming Award in 1998. He was also the author of several other books, most of which focused on software design and patterns. For much of his career, John was a researcher at IBM’s T.J. Watson Research Center. Prior to joining IBM Research, John was a postdoctoral scholar in the Computer Systems Lab at Stanford University, where he codeveloped InterViews. Memories of John have been put together on Ward Cunningham’s Wiki (http://c2.com/cgi/wiki?JohnVlissides/).

Jonathan Erickson editor-in-chief jerickson@ddj.com


LETTERS

Nuclear versus Wind Energy

Dear DDJ,

Luis de Sousa stated in “Letters” (DDJ, September 2005) that nuclear is not a clean energy due to mining, purifying, and disposing of nuclear wastes. Okay, as a 25-year nuclear health physicist who dealt with nuclear waste issues in about 15 of the 48 contiguous states, I might agree with the waste issue because our hosed-up government can’t find anybody willing to give enough kickbacks to make some Senator or Representative rich enough to make the waste issues work.

However, to compare the first two issues (mining and purifying), I have to ask Luis how he expects the windmills to be made. Will the same God that makes the wind provide the metal for the towers, the blades, the housings, and the generators; the metal for the cabling that will run for how far from the wind towers; the insulation for these same cables? (As an aside, the creation of insulation for cable is one of the most polluting manufacturing processes known to man. And the generators, breakers, and switches of a sea-mounted wind-powered farm, filled with PCBs and other chemicals, are just scary!) How about the environmental impact on the sea bed where his “wind-generators” will be placed? I believe when we start comparing the manufacturing of the materials that are used, well, the scales are pretty much balanced.

When nuclear (not “nukular” as GWB would say) people discuss the cleanliness of nuclear power, they are talking about the actual lack of emissions of any pollutants into the atmosphere: I mean sulfuric acid, sulfur dioxide, carbon monoxide, carbon dioxide, hydro- and hyperchloric acids, and the like, that come from burning fossil fuels. Granted, wind has great potential, but if you have driven through northern New Mexico and observed the miles and miles of wind-powered generators (most of them sitting idle, by the way, where land potential is surrendered to make room for 50+ foot wind-turbine blades by the score), well, I cannot consider wind as a viable option, unless we place a few wind-turbines around the inner-belt of the GW parkway in Washington, D.C. When Congress is in session, I am certain gigawatts of electricity could easily be generated by the hot air produced.

Ronald R. Goodwin goodwir2@nationwide.com

Piracy versus Privacy

Dear DDJ,

It is reported that Mr. Yale spent his entire life attempting to make a lock he himself could not pick. He never succeeded. Reading Dennis Shasha and Michael Rabin’s “Preventing Piracy While Preserving Privacy” (DDJ, October 2005) in the light of this insight leads me to several questions, none of them included in the FAQ:

1. The users of my software operate in remote parts of the globe, where Internet access is unavailable (or prohibitively expensive). Weekly access to your servers is out of the question. Also, I have a mission-critical WinXP PC here on my desk that has never been infected by a virus or adware or spyware trojan. How is this possible, given the notorious fragility of Microsoft software? I never let it on the Internet for any reason. I often transfer files on the local LAN to this Mac, but only through a physical A/B switch that disconnects the Internet when the PC is connected. Who cares about privacy if our mission-critical systems won’t work at all under your system?

2. Speaking of the notorious fragility of Microsoft products and the comparable (adjusted for market penetration) fragility of UNIX-based products, how do you propose to implement a “Supervising Program” that cannot be remotely cracked (to say nothing of local attacks)?

3. What happens if a clever pirate distributes a freeware program (no rights management needed) that runs under your SP and acts as a surrogate SP to run the protected content one step removed from the “Content Identifying” processes of the actual SP? For example, this rogue crypto-SP can process sound files, but instead of sending the sound waves out the speaker port, where the real SP can measure the melodic content, it sends them out to an iPod on the USB bus. Everybody knows the iPod has no direct Internet connection to run your verification protocols. Or else it sends them to a rogue USB-to-speaker device sold on the black market. It is arbitrarily difficult for your SP to know it is sound content going out that port.

4. Speaking of a surrogate SP running under the real SP, given that your protocols must be open, how do you prevent rogue SPs from swamping the servers with bogus TTIDs?

5. Who is qualified to upload a CII signature to your “Superfingerprint” server? What happens if a “vendor” tries to upload a fingerprint that matches an existing fingerprint? In the case of music, I can imagine something keyed to melodic lines matching only if the music is, in fact, the same tune (although much modern “music” is, in fact, tuneless), but I can also imagine a clever programmer designing his software to have a signature that matches the signature of the program he wishes to bore.

These questions arose in just the few minutes it took me to read your article. Crackers have a lot more time to probe for weaknesses. Do you really think your system is any more secure than the existing software-based protection mechanisms?

I think the iPod phenomenon is a much more robust mechanism for reducing the market cost of piracy: The proportion of paid-for music to pirate copies has improved significantly since the iPod came to market. Furthermore, the remaining pirate copies do not represent nearly as great a loss to the content-creation industry as they want you to believe because most of those “librarians and 12-year-old kids” wouldn’t buy it anyway.

I was there when Dan Sokol came to the HomeBrew Computer Club with 10 copies of Altair Basic (which, as he pointed out, contained no copyright notice anywhere and was, therefore, legally in the public domain), and I watched over the years as those pirate copies were multiplied into thousands of local electronics businesses, so that when they needed a legitimate copy of Basic, they bought the version they knew— from Microsoft! My own Basic was too cheap to pirate, so it never reached the same market penetration. The result: Bill Gates is rich and I am not.

Tom Pittman tpittman@ittybittycomputers.com

Dennis and Michael respond: Thanks, Tom.

1. Superfingerprint downloads and callups can occur through intermediaries. So there is no need for a direct connection to the Internet. The fidelity of Superfingerprints is certainly an issue and will require substantial care.

2. The article refers to the Lampson-style boot strategy to assure the integrity of the Supervising Program. Trusted hardware is a part of this solution.

3. Content going out to unprotected devices may not be detected. We agree.

4. There will be a notion of hash-cash to prevent denial-of-service attacks.

5. When Superfingerprints are uploaded, they must be checked against existing ones to ensure that an author’s rights are protected. We will also provide a service to register freeware, so Superfingerprints don’t appear that prevent freeware from running.
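The responders don’t spell out their hash-cash scheme. For readers unfamiliar with the idea, here is a generic hashcash-style proof-of-work sketch in Python. A client must find a nonce whose SHA-256 hash, taken over a resource string, starts with a given number of zero bits before the server accepts its request. Flooding the server then costs a sender roughly 2^difficulty hashes per request, while checking costs the server one hash. The resource format, the choice of SHA-256, and the names mint and verify are assumptions for illustration, not the authors’ protocol.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the front of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()  # zeros above this byte's highest set bit
        break
    return bits


def _digest(resource: str, nonce: int) -> bytes:
    return hashlib.sha256(f"{resource}:{nonce}".encode()).digest()


def mint(resource: str, difficulty: int) -> int:
    """Client side: search for a nonce; expected cost is about 2**difficulty hashes."""
    for nonce in count():
        if leading_zero_bits(_digest(resource, nonce)) >= difficulty:
            return nonce


def verify(resource: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash decides whether to accept the request."""
    return leading_zero_bits(_digest(resource, nonce)) >= difficulty


stamp = mint("TTID:example-request", 12)  # hypothetical resource string
print(stamp, verify("TTID:example-request", stamp, 12))
```

Raising the difficulty by one bit doubles the sender’s expected work without changing the server’s verification cost.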

DDJ


DR. ECCO’S OMNIHEURIST CORNER

Proteins for Fun and Profit

Dennis E. Shasha

Pulling a card out of the inside pocket of his well-tailored, dark suit, the professor presented it to Ecco. It read: Ming Thomas, PhD, protein industrialist.

“I’ve come with a project,” Thomas began after greeting us and taking a seat. “In the early days of molecular biology, people asserted, with the authority that only uncertainty could inspire, that every gene generates one protein.

“Now it seems that there are at least a few genes that produce thousands of proteins. Let me explain how.

“A gene is a sequence of DNA, but, in higher organisms, that DNA alternates between strings that in fact produce portions of proteins (called ‘exons’) and strings that don’t (called ‘introns’). Thus, a gene sequence has the form E1 I1 E2 I2 E3 I3… where the Es represent exons and the Is represent introns.

“Genes can produce many proteins because any (not necessarily consecutive) subsequence of exons can form a protein. For example, E2 E4 E5 can form a protein as can E1 E2 E7, but E6 E4 E5 cannot because the ordering E6 E4 violates the order of the original exon sequence. E3 E3 E5 cannot form a protein either because an exon at a given position cannot be repeated.

“When manufacturing proteins at industrial scale, we can handle up to seven exons. Our expense is directly related to the total length of those exons. We hope you can minimize our expense.

“Our first client wants us to generate 15 hydrophobic proteins that are alanine heavy. They believe these will act like sticky balls floating on top of water allowing translucent water sculpture. Think Los Angeles swimming pools. We want help designing the exons in order to minimize their size. I know you like warmups, so here is one. Suppose we could use only three exons and we wanted to generate the following proteins (where each amino acid is represented by a single letter; for example, Alanine is A):

GA

GAGAS

GAS

RAGA

RAGAS

What would the exons have to be to generate these proteins, trying to minimize the total length of the exons?”

Dennis, a professor of computer science at New York University, is the author of four puzzle books. He can be contacted at DrEcco@ddj.com.

Solution to Warm-Up:

The following three exons could do this, having a total length of seven.

RA

GA

GAS

“Just a minute,” Ecco interrupted, turning to his 17-year-old niece Liane, who had been listening in. “Liane, isn’t the biology here somewhat more complicated?”

“Well, yes, but probably not in an essential way,” Liane responded. “DNA doesn’t literally consist of amino acids, but rather of an alphabet of ‘nucleotides’ whose nonoverlapping consecutive triplets are translated to amino acids. So, when Dr. Thomas speaks of minimizing the length of the exons, he formally means minimizing the number of nucleotides. Provided each exon’s length is a multiple of three, however, the problems are mathematically identical, because minimizing the number of amino acids produced by the exons minimizes the number of nucleotides in the exons themselves.”

“I couldn’t have explained this better myself,” said Thomas, visibly impressed.

“For many reasons, we want each exon to generate full amino acids, so each exon’s length is in fact a multiple of three. Therefore, we can view each exon as consisting of the amino acid string it generates. Now do you understand the warm-up?”

“Sure,” said 11-year-old Tyler.

“The protein RAGAS is generated from the RA and GAS exons, for example. RAGA is generated from the first two exons and GAGAS from the last two. So give us your big challenge.”
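Tyler’s check can be mechanized. The Python sketch below is an illustration and not part of the puzzle. It tests whether a protein can be assembled by concatenating exons taken in their original order, each used at most once, which is the rule Thomas describes. It memoizes on the pair (next exon allowed, characters matched so far), so each state is explored once. The function name can_form is hypothetical.

```python
from functools import lru_cache


def can_form(exons, protein: str) -> bool:
    """True if protein is a concatenation of exons taken in their original
    order, each exon used at most once (a not-necessarily-consecutive subsequence)."""
    exons = tuple(exons)

    @lru_cache(maxsize=None)
    def search(next_exon: int, matched: int) -> bool:
        # next_exon: first exon still allowed; matched: protein characters covered so far
        if matched == len(protein):
            return True
        for j in range(next_exon, len(exons)):
            exon = exons[j]
            if exon and protein.startswith(exon, matched) and search(j + 1, matched + len(exon)):
                return True
        return False

    return search(0, 0)


warm_up_exons = ["RA", "GA", "GAS"]
warm_up_proteins = ["GA", "GAGAS", "GAS", "RAGA", "RAGAS"]

print(all(can_form(warm_up_exons, p) for p in warm_up_proteins))  # prints: True
print(sum(len(e) for e in warm_up_exons))                        # prints: 7
```

The same checker rejects RAGA when the exons are listed as GA, RA (order matters) and rejects GAGA built from a single GA exon (no repeats). It only verifies a candidate set of exons; it does not search for the minimum-length set the puzzles ask for.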

Ming Thomas chuckled. “May I hire your whole family, Dr. Ecco?”

“We’re all confirmed puzzle freaks,” Ecco responded with a smile. “Do tell us which proteins you want.”

“Here they are,” said Thomas. “Remember that you are allowed seven exons and we want to minimize the total length (in amino acids) of those exons:

AGPA

APASAG

APASARAGPA

APASARASA

APASARASAPA

CAAPASAGASAPA

CAAPASARAG

CAAPASARPA

CARAPAPAS

CARAPAPASAGASA

CARAPAPASPA

CARAPASA

RAPAPASAGPA

RAPAPASASAPA

RAPASA

1. Can you find an encoding into exons whose total amino acid length is 20 or less?

Liane and Tyler worked this out.

“Very nice,” said Thomas. “That’s better than the solution we had thought of. Very nice work.

“Here is a follow-up question: One of our biochemists says he can manipulate up to 11 exons provided each produces two amino acids. In that case, what is the smallest total amino acid length of exons to create the following 15 proteins?

BAPAFADAFACA

BAPAGAPADA

RABAPAGADAFACA

RASA

RASAGAPAFAFACA

RASATABAPAGAPAFACA

RASATABAPAGAPAFAFA

RATAGAPAFADAFA

SABAPAFADACA

SAPADA

SAPAPAFADAFACA

SATABAGAPADAFA

SATABAPAGADAFACA

SATAPAGAPAFA

TABACA

Ecco helped his nephew and niece solve the problem this time. When Thomas saw the solution, he nodded and said, “Excellent. We have a long consulting arrangement ahead of us.”

2. Please give it a shot.

Ecco turned to the children after Thomas left: “The longest protein in Dr. Thomas’s last problem had a length of only 18. It is therefore conceivable that nine two-amino-acid exons would have been sufficient. Our solution required 11. Could we have done better?”

3. What do you think?

For the solution to last month’s puzzle, see page 70.

DDJ


Dr. Dobb’s News & Views

SECTION A: MAIN NEWS

DR. DOBB’S JOURNAL, February 1, 2006

IBM Previews Next-Generation DB2 Database

IBM has unveiled details about Viper, its next-generation DB2 database that is designed to help manage and access data across service-oriented architectures (http://www.ibm.com/db2/xml/). Viper will be the first database with both native XML data management and relational data capability. Scheduled for release in 2006, DB2 Viper will supposedly be able to seamlessly manage both conventional relational data and XML data without requiring the XML data to be reformatted or placed into a large object within the database. DB2 Viper also will simultaneously handle range partitioning, multidimensional clustering, and hashing, and provide XQuery support.

Smart Vehicles Show Off

Among the technology demonstrations presented at the 12th World Congress on Intelligent Transport Systems (ITS) (http://www.itsworldcongress.org/) were those involving: Vehicle-Infrastructure Integration (VII) technology, in which “smart” roads with roadside antennas wirelessly communicated information to cars equipped with on-board units (the communication network gives the driver travel times, plus warnings about and locations of work zones or traffic incidents); Integrated Collision Warning Systems, in which conference attendees rode transit buses fitted with a front and side collision warning system designed for use both on highways and in dense urban environments; Automated Bus Rapid Transit Technology, in which buses were fitted with sensors, actuators, and computer-based processors that let them perform automated lane maneuvers and precisely dock at boarding platforms; and Smart Intersections, in which radar, GPS, and sensors were used to track the position of vehicles approaching intersections and activate warning signs. ITS is an organization of international researchers, industry professionals, and government officials developing advanced transportation technologies and deployment activities.

Microsoft Opens File Formats

Microsoft has announced that it will open up and submit its file format technology for its Office products (Word, PowerPoint, and Excel) to the Ecma International standards body. In turn, Ecma will develop and make available documentation of those formats. In addition, Microsoft will make available tools to enable old documents to make use of the open standard format.

Report Says Innovation Is Possible

In a study entitled “Innovation, R&D and Offshoring,” University of California at Berkeley researchers Dwight Jaffee and Ashok Bardhan concluded that technological innovation, even if it takes place in emerging international markets, will not spell economic doom. According to their study (http://repositories.cdlib.org/iber/fcreue/reports/1005/), new jobs and economic growth will result in the U.S., particularly in the Silicon Valley. Jaffee and Bardhan found that many large U.S. firms are increasingly sending R&D activities offshore by setting up affiliated, intrafirm R&D centers abroad. Their research also shows that smaller firms generally conduct their research in the U.S. and tend to produce more innovation. At the same time, the authors found that the U.S. market could benefit from the geographical dispersion of innovation and research to India, China, and other transitioning countries.

Iris Recognition Is an Eye Opener

Researchers at the University of Bath have developed a biometric iris recognition system that uses the colored part of the eye to validate a person’s identity (http://www.bath.ac.uk/elec-eng/pages/sipg/irisweb/). According to Professor Don Monro of the Department of Electronic and Electrical Engineering, the algorithm at the heart of the system achieved 100 percent accuracy in initial trials. Monro and his team are now road testing the technology against a specially built database of thousands of iris images collected from students and colleagues at the university. Iris recognition is regarded as the most accurate biometric recognition technology. It works by “unwrapping” a digital image of a person’s iris and creating a unique encrypted “barcode” that is stored in a database. The images are captured with a special camera and an infrared light source that helps overcome problems caused by shadows and competing light sources. Hundreds of images can be captured in a few minutes, and the team selected 20 images of each eye from each volunteer. Monro hopes to build a database of 16,000 iris images.

Sun Announces Postgres Support, ZFS Filesystem

Sun Microsystems will distribute the Postgres database with its Solaris 10 operating system. The company also announced that Solaris ZFS, a 128-bit filesystem with error detection and correction capabilities, has been integrated into OpenSolaris. In addition, Sun plans to bring Solaris Containers for Linux applications into OpenSolaris. This technology lets companies run unmodified Red Hat binaries in Containers on Solaris 10. The Solaris ZFS filesystem provides three main capabilities:

- self-healing data through advanced error detection and correction;
- task automation that simplifies storage management, in some cases cutting task times from hours to seconds;
- built-in storage virtualization that eliminates the complexity of a volume manager.

Financial Industry Is Always a Target

In a recent study entitled “2005 Attack Trends: Beyond The Numbers” (http://www.counterpane.com/cgi-bin/attack-trends2.cgi), security expert Bruce Schneier reports that criminals motivated by money are generally better funded, less risk-averse, and more tenacious than run-of-the-mill intruders who are in it for thrills. Schneier also points out that the financial industry ranks second highest in attacks but is actually the most vulnerable to criminal activity. Counterpane, the security company Schneier founded, tracks 13 major vertical markets. Approximately 50 percent of all targeted scans it detected across those markets occurred within the financial industry. According to Schneier, damaging attacks such as Trojan viruses and bot networks are expected to increase. All categories of organizations are at risk, but the financial industry is expected to remain the highest-risk vertical in the near term.

Security Threats: Cross-Platform Software

For the first time, the SANS Institute has included cross-platform applications as targets in its annual list of top Internet security threats (http://www.sans.org/top20/). The list includes backup programs, media players, antivirus software, PHP-based applications, and database software, among others.

16

Dr. Dobb’s Journal, February 2006

http://www.ddj.com

Multiplatform Porting to 64 Bits

Up-front planning is worth the effort

BRAD MARTIN, ANITA RETTINGER, AND JASMIT SINGH

We were recently involved in porting a large 32-bit application to a 64-bit environment. The application supported 11 platforms and had more than 300,000 lines of code. Parts of it had been developed several years earlier, so a variety of developers had very likely modified the code. For this and other reasons, we suspected that type mismatches that would break a 64-bit port had been introduced as modules were added or removed over time. We ported the application to 64 bits to take advantage of large file support, large memory support, and 64-bit computation, among other features.

Our overall approach was iterative. We alternated between two levels of work:

- zooming in on detailed issues, such as byte order and compiler flags;
- stepping back to look at global issues, such as ANSI compliance and the future portability of the source-code base.

Our first step was to research the compiler switches, memory models, and coding considerations for each of the 11 operating systems. To define our starting point, we turned on the compiler warnings for one platform, ran a first build, and examined the messages in the build log. These initial builds, together with later use of Parasoft's Insure++ (http://www.parasoft.com/), lint, and native debuggers, gave us a road map of the issues we would encounter. We then took a complete inventory of the source code and examined every build configuration.

After initial code modifications, debug sessions, and passes through build messages, we had enough information to sort out and prioritize realistic milestones and the specific tasks required to get there. We reached a significant milestone when we had a running application with enough basic functionality that it could be debugged by running it through our automated test suite, which consists of backward compatibility tests in addition to new tests built to exercise 64-bit features.

If your conversion project includes several 64-bit platforms, you might be tempted to work on one platform at a time and move to the next only after the application runs properly on the first. However, we found two significant advantages to working on all platforms at the same time:

- Each compiler provides different information in its warnings, so comparing warnings from several compilers helps pinpoint problem areas.
- The same bug can behave differently on different platforms. It might crash the application on one platform and appear to run successfully on another.

The authors are senior software engineers for Visual Numerics. They can be contacted at http://www.vni.com/.


A final consideration was to plan ahead for the time the final release-testing phase would need. Our modified code base is shared across multiple 32-bit and 64-bit platforms. Each 32-bit platform therefore had to be retested as thoroughly as the newly ported platforms, which doubled testing time and resources.

Cross-Platform Issues

Porting a 32-bit application to multiple 64-bit operating systems raises many issues, from compiler warnings to reading and writing binary data. Fortunately, compilers can help identify 64-bit porting issues. Set the warning flags on every platform's compiler to the strictest level. Pay close attention to warnings about data truncation or about assigning 64-bit data to 32-bit variables.

Stricter warning levels have a drawback, though: they can produce an overwhelming number of warnings, many of them about conversions the compiler already handles automatically. The important warnings are buried among the minor ones, with no easy way to tell them apart. To solve this, we enabled warnings on multiple platforms and ran concurrent builds. This helped because different compilers report different warnings at different levels of detail. We then filtered the warnings using information from all the compilers, which showed us which ones needed to be fixed.
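A typical truncation warning comes from narrowing a `size_t` into an `int`. GCC, for example, reports this under `-Wconversion`. The sketch below shows one way to resolve such a warning: replace the silent truncation with an explicit, checked conversion. The helper name `size_to_int` and its error-return convention are hypothetical and are not taken from the authors' code.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: narrow a size_t to int only when the value fits.
 * Code such as "int n = strlen(s);" compiles quietly in a 32-bit build,
 * but in a 64-bit build it truncates any length greater than INT_MAX.
 * Returns 0 on success, or -1 if the value would be truncated. */
int size_to_int(size_t value, int *out)
{
    if (value > (size_t)INT_MAX)
        return -1;              /* would not fit: report instead of truncating */
    *out = (int)value;          /* explicit cast documents the intent */
    return 0;
}
```

A call site might read `if (size_to_int(strlen(name), &len) != 0) { /* handle error */ }`. This also silences the warning, because the cast is now explicit and guarded.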


Some application requirements call for binary data or files to work with both 64-bit and 32-bit applications. In these cases, you have to check your binary format for problems caused by larger longs and pointers. You may need to modify your read/write functions to convert sizes and to handle Little- and Big-endian differences across platforms. Because data items are larger in 64-bit applications, converting to the correct machine endianness means swapping more bytes. For example, a 32-bit long:

Big Endian = (B0, B1, B2, B3)

can be converted to:

Little Endian = (B3, B2, B1, B0)

while a 64-bit long:

Big Endian = (B0, B1, B2, B3, B4, B5, B6, B7)

is converted to:

Little Endian = (B7, B6, B5, B4, B3, B2, B1, B0).
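The two conversions above can be written as portable C. This is a minimal sketch, not the authors' routines. It uses the fixed-width types `uint32_t` and `uint64_t` rather than `long`, so the operand sizes stay the same in 32-bit and 64-bit builds. The function names `swap32` and `swap64` are hypothetical.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value: (B0,B1,B2,B3) -> (B3,B2,B1,B0). */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

/* Reverse the byte order of a 64-bit value: (B0..B7) -> (B7..B0).
 * Swap the bytes within each 32-bit half, then exchange the two halves. */
uint64_t swap64(uint64_t v)
{
    uint64_t lo = swap32((uint32_t)(v & 0xFFFFFFFFu));  /* low half, swapped */
    uint64_t hi = swap32((uint32_t)(v >> 32));          /* high half, swapped */
    return (lo << 32) | hi;
}
```

For example, `swap64(0x1122334455667788)` yields `0x8877665544332211`, and applying either function twice returns the original value.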

Most compilers detect mismatched types and handle them with implicit conversions during the build. This holds for simple assignments and for most parameters passed to other functions. The real problems are integer-long-pointer mismatches of two kinds:

- mismatches the compiler cannot see at compile time, which involve pointer arguments and function pointers;
- mismatches caused by an assumption the compiler itself makes at compile time, which mainly involve function prototypes.

Passing integer and long pointers as arguments to functions causes problems when a pointer is later dereferenced as a different, incompatible type. In 32-bit code this does no harm, because integers and longs are both 4 bytes and therefore interchangeable. In 64-bit code, a long is larger than an integer. The compiler cannot know what type a pointer really refers to, so these mismatches cause runtime errors. Most compilers assume you meant to do what you did and allow it silently unless you enable additional warning messages. The problems surface only at runtime.

In a 32-bit system, the structure would look like: Integer (4 bytes) | Long (4 bytes), with the long on its natural boundary.

In a 64-bit system, the structure would look like: Integer (4 bytes) | Padding (4 bytes) | Long (8 bytes), with the long on its natural boundary.

Figure 1: Structure alignment in 32-bit and 64-bit systems.

Listing One, for example, compiles without warnings on both Solaris and AIX (Forte 7, VAC 6) in both 32-bit and 64-bit modes. However, the 64-bit version prints an incorrect value when run. Such a problem is easy to spot in a short example but can be far harder to track down in a large code base, and most compilers will not flag it.

When built as a 64-bit executable on a Little-endian machine, Listing One appears to work properly. The value of arg fits entirely within the long's four least-significant bytes. Even on Little-endian x86 machines, however, the 64-bit version fails at runtime once the value of arg no longer fits in those four bytes.
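Listing One is not reproduced here. The sketch below illustrates the same class of bug and one way to fix it. The functions `get_count` and `get_count_as_int` are hypothetical. The buggy call appears only in a comment, because executing it would be undefined behavior.

```c
#include <limits.h>

/* Hypothetical API that returns its result through a long pointer. */
void get_count(long *out)
{
    *out = 42L;
}

/* BUG (do not do this): passing an int's address as a long pointer.
 *     int n;
 *     get_count((long *)&n);
 * In a 64-bit build where long is 8 bytes, the callee writes 8 bytes into
 * a 4-byte int. On a Big-endian machine, n receives the high-order half
 * (0 here). On a Little-endian machine, n happens to look right while the
 * value is small. In both cases the extra bytes overwrite whatever lies
 * next to n. */

/* Fix: receive the result in a variable of the matching type, then
 * narrow it explicitly after checking that it fits. */
int get_count_as_int(int *n)
{
    long tmp;
    get_count(&tmp);
    if (tmp > INT_MAX || tmp < INT_MIN)
        return -1;              /* value does not fit in an int */
    *n = (int)tmp;
    return 0;
}
```

The range check costs little and turns a silent memory overwrite into an error the caller can handle.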

With function pointers, the compiler has no information about which function will be called, so it cannot correct or warn you about type mismatches that might exist. The argument and return types of all functions called via a particular function pointer should agree. If that is not possible, you may have to provide separate cases at the point at which the function is called to make the proper typecasts of the arguments and return values.
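One way to keep the argument and return types in agreement is to declare a single function-pointer type and store only functions of exactly that type. The compiler can then check every table entry. A function with a different signature gets a small wrapper that performs the typecasts in one visible place. All names below are hypothetical.

```c
#include <stddef.h>

/* One signature shared by every function called through the table. */
typedef long (*long_op)(long);

long op_double(long x) { return x * 2; }
long op_negate(long x) { return -x; }

/* Existing function with a different signature. */
int int_square(int x) { return x * x; }

/* Wrapper: converts the argument and return value explicitly instead of
 * casting int_square itself to long_op. Callers must keep x within int
 * range, because the (int) cast narrows it. */
long op_square(long x) { return (long)int_square((int)x); }

static const long_op ops[] = { op_double, op_negate, op_square };

/* Dispatch through the table; i must be a valid index into ops. */
long apply_op(size_t i, long x) { return ops[i](x); }
```

Writing `{ op_double, (long_op)int_square }` would also compile. In a 64-bit build, though, the call would pass a long where an int is expected, which is exactly the mismatch the wrapper avoids.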

The second issue concerns implicit function declarations. If you do not provide a prototype for each function that your code calls, the compiler makes assumptions about them. Variations of the compiler warning “Implicit function declaration: assuming extern returning int” are usually inconsequential in 32-bit builds. However, in 64-bit builds, the assumption of an integer return value can cause real problems when the function returns either a long or a pointer (malloc, for example). To eliminate the need for the compiler to make assumptions, make sure that all required system header files are included and provide prototypes for your own external functions.
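The example below makes the advice concrete. It includes the system headers that declare the library functions it calls, and it prototypes a project function before use. In a real project, the prototype would live in a shared header. The function `dup_string` is hypothetical.

```c
/* Without <stdlib.h>, a C89 compiler implicitly declares malloc as
 * returning int. In a 64-bit build, the pointer malloc actually returns
 * is then truncated to 32 bits. */
#include <stdlib.h>
#include <string.h>

/* Prototype for a project function (hypothetical): it tells the compiler
 * that the function returns a pointer, not an int. */
char *dup_string(const char *s);

char *dup_string(const char *s)
{
    size_t len = strlen(s) + 1;     /* include the terminating '\0' */
    char *copy = malloc(len);       /* declared in <stdlib.h>: returns void * */
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}
```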

Hidden Issues

Some issues may not be apparent at the start of the project. In 64-bit applications, for instance, longs and pointers are larger, which also makes any structure containing them larger. How much space a structure needs depends on the layout of its elements. Take a structure with an integer followed by a long. In a 32-bit application it occupies 8 bytes. A 64-bit application adds 4 bytes of padding after the integer so that the long starts on its natural boundary, for a total of 16 bytes; see Figure 1.

To minimize this padding, reorder the data structure elements from largest to smallest. However, if data structure elements are accessed as byte streams, you need to change your code logic to adjust for the new order of elements in the data structure.
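The effect of reordering is easiest to see with more than one small member. The structures below are hypothetical. The sizes in the comments are typical of LP64 systems, not guaranteed by the C standard. On typical 32-bit systems, both structures occupy 12 bytes.

```c
#include <stddef.h>

/* Mixed order: padding follows each int so that the long is aligned and
 * the structure size is a multiple of the long's alignment.
 * Typically 4 + 4 (pad) + 8 + 4 + 4 (pad) = 24 bytes on LP64. */
struct mixed {
    int  a;
    long b;
    int  c;
};

/* Largest to smallest: the two ints fill one 8-byte slot after the long.
 * Typically 8 + 4 + 4 = 16 bytes on LP64. */
struct reordered {
    long b;
    int  a;
    int  c;
};
```

You can check the actual layout on each target by printing `sizeof` and `offsetof` values.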

For cases where reordering the data structures is not practical and the data structure’s elements are accessed as a byte stream, you need to account for padding. Our solution for these cases was to implement a helper function that eliminates the padding from the data structure before writing to the byte stream. A side benefit to this solution was that no changes were required on the reader side; see Listing Two.
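Listing Two is not reproduced here. The sketch below shows the general idea of such a helper: copy each member into the output buffer back to back so that compiler-inserted padding never reaches the byte stream. The struct `record` and the function `pack_record` are hypothetical. The sketch deliberately leaves out byte order and the change in `long`'s size, which a real file format would also need to pin down.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record: on LP64 systems, padding sits between id and value. */
struct record {
    int  id;
    long value;
};

/* Copy the members into buf with no padding between them.
 * Returns the number of bytes written, or 0 if buf is too small. */
size_t pack_record(const struct record *r, unsigned char *buf, size_t size)
{
    size_t need = sizeof r->id + sizeof r->value;
    if (size < need)
        return 0;
    memcpy(buf, &r->id, sizeof r->id);                   /* first member */
    memcpy(buf + sizeof r->id, &r->value, sizeof r->value); /* no gap */
    return need;
}
```

Using `memcpy` instead of casting `buf` to `int *` or `long *` avoids misaligned accesses, which some 64-bit processors do not tolerate.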

Arrays

In a 64-bit application, arrays of type long, including long arrays within structures, hold larger values than their 32-bit equivalents and may also hold more elements. The 4-byte variables that previously defined array bounds and allocation sizes may also need to be converted to longs. (For help deciding whether existing long arrays should be reverted to integer type for better performance in your 64-bit application, see http://developers.sun.com/prodtech/cc/articles/ILP32toLP64Issues.html.)
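The text recommends converting these variables to long. The sketch below uses `size_t` instead, the standard C type for object sizes and array indexes, which is also 64 bits wide on LP64 platforms. The function names are hypothetical.

```c
#include <stdlib.h>

/* Sum an array whose length and index are size_t, so neither is limited
 * to the range of a 4-byte int. */
long sum_longs(const long *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Allocate n longs, refusing requests whose byte count would overflow. */
long *alloc_longs(size_t n)
{
    if (n > (size_t)-1 / sizeof(long))
        return NULL;            /* n * sizeof(long) would wrap around */
    return malloc(n * sizeof(long));
}
```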
