
Rapid development with robust objects

Lightning speed with a multidimensional engine

Easy database administration

Massive scalability on minimal hardware

Multidimensional Database

Enables Rapid Development.

Caché is the first multidimensional database for transaction processing and real-time analytics. Its post-relational technology combines robust objects and robust SQL, thus eliminating object-relational mapping. It delivers massive scalability on minimal hardware, requires little administration, and incorporates a rapid application development environment.

These innovations mean faster time-to-market, lower cost of operations, and higher application performance. We back these claims with this money-back guarantee: Buy Caché for new application development, and for up to one year you can return the license for a full refund if you are unhappy for any reason.* Caché is available for Unix, Linux, Windows, Mac OS X, and OpenVMS – and it's deployed on more than 100,000 systems ranging from two to over 50,000 users. We are InterSystems, a global software company with a track record of innovation for more than 25 years.

Try an innovative database for free: Download a fully functional, non-expiring copy of Caché, or request it on CD, at www.InterSystems.com/Cache12N

*Read about our money-back guarantee at the web page shown above.

©2005 InterSystems Corporation. All rights reserved. InterSystems Caché is a registered trademark of InterSystems Corporation. 10-05 CacheInno12DrDo

DR. ECCO’S OMNIHEURIST CORNER

The Box Chip Game

Dennis E. Shasha

Dr. Ecco seemed to be in a mood to muse on a cool fall day in his McDougal Street apartment.

“Sometimes a problem comes along for which a seemingly insignificant twist completely changes its character,” he said. “This holds particularly in games of chance. You remember our friend Jimmy Casino, don’t you? Well, he invented a new game. Want to hear it?”

“I’d be delighted,” I said, thinking fondly of the baby-faced huckster who had visited us a few months earlier.

Ecco began: “You have some even number n of chips, each having a different color. You put one in each box. Your adversary rearranges the boxes behind a curtain. You then rearrange them further once they’re on your side of the curtain. (At the end of the game, you will be able to verify that there is a chip of each color in the boxes, so you can be sure that the adversary does no more than rearrange the boxes.) The net effect is that you can assume the rearrangement is entirely random.

“For each color, you guess n/2 boxes where that color might be. You make all guesses about all colors before any box is opened. If you are correct about every color, you win. Otherwise, you lose. It might seem that your chances of winning are (1/2)^n, but you can do much better.

Dennis, a professor of computer science at New York University, is the author of four puzzle books: The Puzzling Adventures of Dr. Ecco (Dover, 1998); Codes, Puzzles, and Conspiracy (Freeman, 1992; reprinted by Dover in 2004 as Dr. Ecco: Mathematical Detective); and, more recently, Dr. Ecco’s Cyberpuzzles (W.W. Norton, 2002) and Puzzling Adventures (W.W. Norton, 2005). With Philippe Bonnet, he has written Database Tuning: Principles, Experiments, and Troubleshooting Techniques (Morgan Kaufmann, 2002). With Cathy Lazere, he wrote Out of Their Minds: The Lives and Discoveries of 15 Great Computer Scientists (Copernicus/Springer, 1995). He can be contacted at DrEcco@ddj.com.

“Warm-Up: Suppose there are two chips — blue and red — and two boxes. So each guess is about just one box. Which guesses can you make so your chances of being right in both guesses is 1/2?”

“That one is easy,” I said. “Guess red in the left box and blue in the right box. You’ll either be both correct or both wrong.”

“Right,” said Ecco. “Now let’s say there are four chips (n = 4) whose colors are black, white, red, and green. You are allowed to guess two boxes for each chip. Number the boxes 1, 2, 3, 4. To start with, your guesses have to be a simple list of box numbers for each color. For example:

Black guess: 1, 2

White guess: 2, 4

Red guess: 1, 3

Green guess: 3, 4

“1. For four chips and two guesses per color, can you figure out a way to win at least 1/6 of the time assuming all rearrangements are equally likely? (The example just given might not do this.)

“2. What if there are six chips and three guesses per color?

“3. How does this probability generalize for n chips (n even) and n/2 guesses per color?

“Now here comes the small twist. Under the old rules, for each color, you provide n/2 boxes as guesses. Under the new rules, for each color, you provide an initial guess and a procedure for determining each next guess based on the actual color found by the previous guess.

“Example for black in the case of four colors:

Start with box 1;

case result of first step is:

Black, then win

Red, then try 2

White, then try 3

Green, then try 4

“For convenience, we will abbreviate this as follows: black guess: 1; black -> win, red -> 2, white -> 3, green -> 4

“All initial guesses and procedures are provided before any checking occurs and the procedures may not share information. If every color is found in n/2 or fewer guesses, you win. Make sense?”

“I think I understand,” I said. “We might conceptualize the situation as follows: Each color is represented by an agent who follows a procedure like the one above, possibly different procedures for different agents. A given agent enters the room with the boxes, looks inside the boxes as dictated by his procedure but doesn’t otherwise disturb the boxes, then leaves the scene without being able to communicate with any other agents.”

“Exactly,” said Ecco. “Continuing with your scenario: A judge determines whether that ‘procedural agent’ has won or not. If all agents win, then you (the person betting) win. Otherwise, the house wins. It may seem surprising but you can do a lot better under these new rules.

“4. How well can you do in the case of four chips and two guesses under the procedural agent assumption? (Hint: You can win over 40 percent of the time.)

“5. How well can you do if you have six chips and three guesses?”

For the solution to last month’s puzzle, see page 78.

DDJ


News & Views

DR. DOBB’S JOURNAL

December 1, 2005

New Learning Algorithm

Researchers at Cornell University and Tel Aviv University have devised a technique for scanning text in a variety of languages, then autonomously learning its complex syntax. This information can then be used to generate new and meaningful sentences. The Automatic Distillation of Structure (ADIOS) algorithm, which relies on a statistical method for pattern extraction and on structured generalization, discovers the patterns by repeatedly aligning sentences and looking for overlapping parts (http://adios.tau.ac.il/). The technique has implications for speech recognition and natural-language engineering, genomics, and proteomics. The algorithm was developed by David Horn and Eytan Ruppin, professors at Tel Aviv University, with Ph.D. student Zach Solan.

Infrared Specs Released

The IrSimple Protocol and Profile Specifications, released by the Infrared Data Association (http://www.irda.org/), were designed for fast, wireless communication between mobile devices and digital home appliances. Data rates up to 16 Mbps (VFIR) are currently available, with 100 Mbps (UFIR) under development. For example, the specs will enable mobile devices such as camera phones to instantly connect and wirelessly transmit digital images to similarly enabled TVs, monitors, projectors, and photo kiosks. The organization claims that there are more than 750 million IrDA-enabled devices in the global market today.

Optical Storage Spec Updated

The Optical Storage Technology Association has released its Universal Disc Format (UDF) Rev. 2.60 filesystem specification (http://www.osta.org/specs/). A major addition in Rev. 2.60 is a new Pseudo OverWrite (POW) mechanism that supports sequential recording on new types of write-once discs and drives such as BD-R (Blu-Ray Disc-Write Once). The special logical overwrite function of the POW mechanism, which enables write-once media to behave more like a rewritable disc, was developed in parallel by the Blu-Ray Disc Association (BDA). Rev. 2.60 with POW also increases disc compatibility between consumer video recorders and computer systems, and allows use of the Metadata File to locate metadata in a logically contiguous manner for increased efficiency. UDF is a filesystem first defined by OSTA in 1997 to support transfer of Magneto-Optical discs and files between different computer systems. The specification, based on the ECMA-167/ISO 13346 Standard, is intended for developers planning to implement UDF to enable disc and file interchange among different operating systems.

GPL Update Underway

The Free Software Foundation (FSF) has launched the GPL Version 3 Development and Publicity Project (http://www.fsf.org/news/gplv3/), with the goal of updating the General Public License (GPL). The current version of the license was written in 1991. Among the issues expected to be addressed are distribution and licensing. Free Software Foundation members have been invited to participate in various committees, starting with advisory committees that will specify the objectives and parameters for participation in GPLv3. Guidelines are expected to be released by December 2005, with a draft ready in early 2006.

International Programming Challenge Held

More than 70 countries were represented in this year’s International Olympiad in Informatics, an annual world computing championship for precollegiate students, held this year in Nowy Sacz, Poland. The competition consisted of two five-hour programming sessions, each with three programming challenges. The four USA Computing Olympiad (http://www.usaco.org/) team members — Alex Schwendner, Eric Price, John Pardon, and Richard McCutchen — all won gold medals. Additionally, Eric Price received a special Grand Award for his perfect score of 600. Poland’s Filip Wolski was crowned as the World Champion. The USACO is sponsored by Equinix, USENIX, SANS, the ACM, and IBM.

NASA Honors Two Projects

The National Aeronautics and Space Administration has announced that two of its teams will receive the agency’s Software of the Year Award. A team from NASA’s Goddard Space Flight Center was recognized for its Land Information System Software 4.0, while a team from the Jet Propulsion Laboratory was acknowledged for its Autonomous Sciencecraft Experiment (ASE) software. LIS (http://lis.gsfc.nasa.gov/) is a land surface modeling and data assimilation system that predicts water and energy cycles such as runoff, evaporation from plants and soil, and heat storage in the ground. ASE (http://ase.jpl.nasa.gov/) makes it possible for autonomous spacecraft to increase their science return by two orders of magnitude via onboard decision making. ASE is used for detecting and tracking environmental events on Earth, such as volcanic eruptions, floods, and wildfires.

CA Releases Software Patents

Computer Associates International (http://ca.com/) has released 14 of its patents to individuals and groups working on open-source software. In the process, CA joined IBM in encouraging an industry-wide “patent commons” in which patents are pledged royalty free to further innovation. The patents covered by CA’s move generally include: application development and modeling that automates translation between programming languages; business intelligence and analytics; systems management and storage-management solutions that provide intelligent process controls to maximize system performance and storage utilization; and network management and security tools that enhance visualization of network traffic patterns and congestion, selectively capture and filter network traffic, and provide granular session-control capabilities.

IBM Launches Transition-to-Teaching Program

To help address a nationwide shortage of math and science teachers, IBM has launched a Transition-to-Teaching program that enables as many as 100 U.S. employees to gain teacher certification. IBM will reimburse participants up to $15,000 for tuition and stipends while they student teach, as well as provide online mentoring and other support services in conjunction with partner colleges, universities, and school districts. According to the U.S. Department of Labor, more than 260,000 new math and science teachers will be needed by the 2008–2009 school year. The IBM pilot will be operational in January 2006 in New York, North Carolina, and other locations where IBM has a significant population. Employees will need management approval and must fulfill requirements such as 10 years of service with IBM, a bachelor’s degree in math or science or a higher degree in a related field, and some experience teaching, tutoring, or volunteering in a school or other children’s program.



Table Patterns and Changing Data

Dealing with history in relational databases

TODD SCHRAML

With the financial reporting requirements of the Sarbanes-Oxley Act, businesses have good reason to be concerned about the past. Enhancements to existing applications, and changes that incorporate auditing needs and point-in-time recoverability in new application development, have moved from back-burner wish lists to mission-critical, immediate necessities. Beyond backups and logs at the DBMS level, there are numerous options when considering history-preserving data structure configurations. For instance, you can add start/stop timestamps or currency flags to tables in various fashions. Sometimes you can add dates to unique indexes. Best practice necessitates mapping historical needs to chosen historical data structures so that they work well together.

But problems arise for both applications and users if historical data structures do not align with actual usage of the contained history. Historical data can easily clutter a database, ultimately slowing down performance, or forcing jumps through SQL-hoops to access the correct active subset of rows in a table. There are any number of permutations that address these problems. The approach I present here offers five structural archetypes that involve the fusion of time and data inside database tables. I also examine the nature of temporal data structure patterns, and provide guidelines for establishing a history-management strategy that can be leveraged across an organization.

Regardless of the implemented data structure, a few basic auditsupporting attributes should apply to all tables. These attributes (serving as basic columns) encompass tracking a row’s creation, and timing a row’s last change. I recommend incorporating four standard columns on virtually every table. The four standard column titles would be some variation on “Row-Created Timestamp,” “Row-Creator ID,” “Row Last Updated Timestamp,” and “Row Last Updated ID.” The last two columns are unnecessary in situations warranting only inserts with no row updates. Thus, even on tables that only contain current data, the proper population of these four columns (along with a comprehensive set of database backups) should provide a considerable amount of point-in-time recoverability.
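As a rough sketch of what such a set of audit columns might look like (the table and column names here are illustrative, not taken from the article), a table declaration could read:

-- Illustrative table carrying the four standard audit columns.
-- On insert-only tables, the two "last updated" columns can be omitted.
CREATE TABLE product (
    product_id           INTEGER      NOT NULL PRIMARY KEY,
    product_name         VARCHAR(100) NOT NULL,
    row_created_ts       TIMESTAMP    NOT NULL,   -- "Row-Created Timestamp"
    row_created_by       VARCHAR(30)  NOT NULL,   -- "Row-Creator ID"
    row_last_updated_ts  TIMESTAMP,               -- "Row Last Updated Timestamp"
    row_last_updated_by  VARCHAR(30)              -- "Row Last Updated ID"
);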

The temporal patterns (or archetype structures) I examine here generally echo the standardized approaches for handling time found in existing business-intelligence (BI) multidimensional data-management practices. A variant of these established BI practices applies the same data-management techniques across all kinds of database tables, especially tables used for supporting operational applications.

Todd is a data architect at Innovative Health Strategies. He can be reached at tschraml@ihsiq.com.

Multidimensional practice (like that found in star schemas) simplifies the collection of data elements into two kinds of table structures. Items are grouped together into:

Fact tables that contain all numeric data intended for analysis.

Dimension tables that store the various text data items available for selecting, grouping, or ordering the fact statistics.

“Top priorities should include evolving a strategy that identifies the preferred historical archetypes”

Pointers inside the fact table rows provide the proper link to dimension rows. In many respects, the dimension tables function as indexes into the associated fact tables. Within these dimensional structures, there are three standard update strategies employed for managing value changes over time. The dimension update process is generically called “Slowly Changing Dimension,” and these strategies are “Type 1,” “Type 2,” and “Type 3.”

Type 1 strategy retains only current values because the associated columns in a dimension table do not require the retention of history.

Type 2 strategy inserts a new row every time one of the column values changes, which always allows association of the crucial fact with the original value of dimension columns.

Type 3 strategy actually adds an additional column onto the dimension table, providing retention of both old and new values (Old_Customer_Name and New_Customer_Name).
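As a hedged illustration of these strategies (the dimension and column names are invented, except for the Old_Customer_Name/New_Customer_Name pair mentioned above), a Type 2 customer dimension versions whole rows, while a Type 3 dimension keeps old and new values side by side:

-- Type 2: every tracked change inserts a new dimension row, so existing fact rows
-- keep pointing at the values that were current when the fact was recorded.
CREATE TABLE customer_dim_type2 (
    customer_key    INTEGER      NOT NULL PRIMARY KEY,  -- surrogate key referenced by fact rows
    customer_id     INTEGER      NOT NULL,              -- natural business key
    customer_name   VARCHAR(100) NOT NULL,
    row_effective   DATE         NOT NULL,
    row_expired     DATE                                -- NULL while the row is current
);

-- Type 3: old and new values are retained as an extra column pair on the same row.
CREATE TABLE customer_dim_type3 (
    customer_key       INTEGER      NOT NULL PRIMARY KEY,
    customer_id        INTEGER      NOT NULL,
    old_customer_name  VARCHAR(100),
    new_customer_name  VARCHAR(100)
);

(Type 1 needs no extra columns at all; an update simply overwrites the prior value.)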

For generalized database usage, the value change management archetypes in this discussion include the:

Current-only table.

Functional-history table.

Row-level audit (or shadow) table.

Column-level audit table.

Ledger table.




Each of the structures can serve a unique kind of circumstance (as described in Table 1). Therefore, when building new applications, the database design process should include a table-by-table evaluation, identifying which type of temporal function best suits the overall application needs. Moreover, these approaches are widely used today by developers everywhere. My focus here is on organizing these options into a framework that leads to a cohesive and organized development approach.

Current Only

Clearly, instances occur where you need only the data as it currently stands. For example, an order-taking system may require a table that manages the next available service date. As capacity is scheduled for use, that availability date moves ahead. Knowing what the next available service date was as of last week or last month may not be useful. The current-only table (Figure 1) simply offers an original version of any table that omits start or stop timestamping columns, and provides an important tool to keep in the toolchest. Remember that even under the eyes of the Sarbanes-Oxley compliance reaper, some data may still have no historical value. (However, it is likely that there is much less data categorized in this manner than was considered of little historical value previously.) Furthermore, circumstances may dictate using a current-only table for performance, while using an additional table-type for duplicating the historical aspects of the data. It may be a bit obvious, but the current-only approach is indeed equivalent to the multidimensional Type 1 strategy.
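Continuing the order-taking example, a current-only table might be declared roughly as follows (all names are invented for illustration). Note the absence of any start/stop timestamp columns; an update simply overwrites the prior value:

-- Current-only: holds nothing but the present state, plus the standard audit columns.
CREATE TABLE service_availability (
    service_type_id      INTEGER     NOT NULL PRIMARY KEY,
    next_available_dt    DATE        NOT NULL,
    row_created_ts       TIMESTAMP   NOT NULL,
    row_created_by       VARCHAR(30) NOT NULL,
    row_last_updated_ts  TIMESTAMP,
    row_last_updated_by  VARCHAR(30)
);

-- Scheduling capacity simply moves the date forward in place; no history is kept.
UPDATE service_availability
   SET next_available_dt   = DATE '2005-12-15',
       row_last_updated_ts = CURRENT_TIMESTAMP,
       row_last_updated_by = 'scheduler'
 WHERE service_type_id = 42;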

Functional History

When most people think of “keeping history” in their database, some variation of the functional-history table is often what comes to mind (Figure 1). A functional-history table is built by adding an activity start date and an activity stop date to a basic table structure. The row retains historic values frozen in time. When a value changes, the original row has its activity stop date populated with the current date, and a new row, using the current date as the start date and the new column values, is inserted. Logically, this is the equivalent of the previously mentioned dimension Type 2 updating approach.

Fundamentally, this functional-history table alteration does revise the meaning of the table involved. A table that originally worked as an “Employee” or “Order” table now becomes an “Employee Activity” or “Order History” table. This meaning adjustment occurs because a single item (either a single employee or a single order, per the given examples) exists in multiple rows after making this change. Consequently, effective use of the functional-history table requires making another change or two. Previously, a unique index likely existed on the item identifier (the “Employee ID” or the “Order Number”). Because any single Employee ID value may be duplicated across multiple rows as changes unfold over time, that index needs to include the activity start-date column. With this start-date addition, the index remains unique.

If you manage data models, the functional-history table, in all its time-laden glory, should exist within the conceptual data model because the functional dependencies and joins should expect a time component. Additionally, while many application tasks read this functional-history table, some need only the current values. It becomes tedious rewriting code over and over in search of a maximum start-date value. Likewise, using a test for NULLs in a stop-date column to determine currency can be an issue because NULLs and indexing sometimes conflict with each other. Therefore, a fairly common practice when dealing with functional-history tables involves adding a column that serves as an “Active Indicator.” A “Live or Dead” flag provides a simple test to retrieve only current rows. Because a DBMS-managed view can be defined using the flag, even this simple test remains hidden from the application code.
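Pulling the pieces of this section together, here is a hedged sketch of a functional-history version of the Employee example (all names and types are illustrative): the activity start date joins the unique key, an active indicator flags the live row, and a view hides the currency test from application code.

-- Functional-history ("Employee Activity") table: one row per interval of values.
CREATE TABLE employee_activity (
    employee_id        INTEGER      NOT NULL,
    activity_start_dt  DATE         NOT NULL,
    activity_stop_dt   DATE,                    -- NULL while the row is current
    active_ind         CHAR(1)      NOT NULL,   -- 'Y' on the current row, 'N' on history
    employee_name      VARCHAR(100) NOT NULL,
    department_id      INTEGER      NOT NULL,
    CONSTRAINT pk_employee_activity
        PRIMARY KEY (employee_id, activity_start_dt)   -- start date keeps the key unique
);

-- A value change closes out the current row and inserts its replacement.
UPDATE employee_activity
   SET activity_stop_dt = CURRENT_DATE,
       active_ind       = 'N'
 WHERE employee_id = 1001
   AND active_ind  = 'Y';

INSERT INTO employee_activity
    (employee_id, activity_start_dt, activity_stop_dt, active_ind, employee_name, department_id)
VALUES
    (1001, CURRENT_DATE, NULL, 'Y', 'Pat Smith', 300);

-- The "Live or Dead" flag lets a simple view hide the currency test entirely.
CREATE VIEW employee_current AS
    SELECT employee_id, employee_name, department_id
      FROM employee_activity
     WHERE active_ind = 'Y';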

Row-Level Audit

The similarity of the row-level audit table to the functional-history table results from retaining values of each change event that impacts table content (see Figure 2). A distinct difference from the functional-history table exists because the row-level audit table is not meant to change the base table, but to serve as an add-on structure, so that you now have two tables — the original and the row-level audit version keeping history!

 

 

Requirement: Application only wishes to see current active values for all functional tasks against the data store.
Can fit: Current-only.

Requirement: Normal application tasks will require access to data valid at specific and varying points in time.
Can fit: Functional history.

Requirement: Application only wishes to see current values, but auditors need to be able to see every change.
Can fit: Current-only; Row-level audit.

Requirement: Application only wishes to see current values, but users want to browse through changes made to specific columns.
Can fit: Current-only; Column-level audit.

Requirement: Critical data is numeric, and it must always be possible to know about every change impacting the figures.
Can fit: Ledger.

Requirement: Normal application tasks will require access to data valid at specific and varying points in time for a subset of data. Auditors will need to be able to see every change made.
Can fit: Functional history; Row-level audit.

Requirement: Normal application tasks will require access to data valid at specific and varying points in time for a subset of data. Users want to browse through changes made to specific columns. Auditors will need to be able to see every change made.
Can fit: Functional history; Row-level audit; Column-level audit.

Table 1: Considerations in choosing a structure.




The row-level audit table is a virtual copy of the base table with the addition of two columns. One column contains the timestamp of the change event, while the second column identifies the basic nature of that change event — for example, an INSERT, UPDATE, or DELETE. Any change to the base table also inserts a row into the row-level audit table. This duplication of content and effort suggests the preference to name the row-level audit table a shadow table. In a sense, this shadow table is beneath the surface of the main application functionality, making it a hypo-functional-history table more than a functional-history table. By using a shadow structure, a current-only table can remain as part of an application, and its link to a shadow version retains change history in case the application is audited.
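A hedged sketch of the shadow arrangement (table and column names invented for illustration): the base table stays current-only, and a near-copy with two extra columns records each change event. The audit insert appears here as an explicit statement, though a trigger could populate the shadow table just as well.

-- Base table remains current-only.
CREATE TABLE customer (
    customer_id    INTEGER       NOT NULL PRIMARY KEY,
    customer_name  VARCHAR(100)  NOT NULL,
    credit_limit   DECIMAL(12,2)
);

-- Shadow (row-level audit) table: the same columns plus change timestamp and change type.
CREATE TABLE customer_audit (
    customer_id    INTEGER       NOT NULL,
    customer_name  VARCHAR(100)  NOT NULL,
    credit_limit   DECIMAL(12,2),
    change_ts      TIMESTAMP     NOT NULL,
    change_type    CHAR(1)       NOT NULL,   -- 'I' insert, 'U' update, 'D' delete
    CONSTRAINT pk_customer_audit PRIMARY KEY (customer_id, change_ts)
);

-- Every change to the base table also lands a row in the shadow table.
UPDATE customer SET credit_limit = 7500.00 WHERE customer_id = 17;

INSERT INTO customer_audit (customer_id, customer_name, credit_limit, change_ts, change_type)
SELECT customer_id, customer_name, credit_limit, CURRENT_TIMESTAMP, 'U'
  FROM customer
 WHERE customer_id = 17;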

Figure 1: Current-only and functional-history structure.

Pragmatic Exceptions

Tip #4: Make Only the Actionable Obvious

A general rule for showing obvious exception messages (such as in a dialog) should be: Show the message only if there is an immediate, reasonable course of action users can take. If none can be imagined, then either the exception can be quietly logged in the background, or it’s serious enough to crash the application now, before doing any further damage.

File changes to the currently edited file are exceptional conditions that, internally, could very well generate exceptions. In such a case, showing users what happened and offering some course of action is the best approach. An example of an actionable message would be: “The file you were editing was updated; load the new version?”

But what about showing a dialog (or console message) just before the application crashes that points to the log file? Go for it. This still fits the rule, since they can take action by examining the log.

—Benjamin Booth
http://www.benjaminbooth.com/

Column-Level Audit

Another potentially useful add-on table is the column-level audit table. While ostensibly a simple variation of the previous row-level audit, it differs from the shadow table because everything has gone vertical: the row has been placed up and down rather than side to side (see Figure 2). By up-ending things in this vertical manner, the table now must include descriptive metadata. In fact, this audit table looks nothing like the original table being tracked. Structurally, the column-level audit table contains the key from the original table, a timestamp of the value change event, the name of the column with changing data, the old/original value, and, lastly, the incoming new value. The type of change event may be implied by the data (a nulled old value implying an INSERT, or a nulled new value implying a deletion), or these actions may be made explicit within another column of the table. This structure assumes that the application tracks exactly which columns are changing and only inserts rows for those columns with altered values. Arbitrarily inserting all columns for every change removes any usefulness from this approach, because the table explodes with an overabundance of details that must be sifted to find a true change. By keeping both old and new values in this manner on a single row, the column-level audit table provides the functionality implied by a Type 3 Slowly Changing Dimension.

Applications often make use of column-level audit tables when a detailed application requirement calls for letting users browse specific data changes easily. Under these requirements-driven circumstances, change tracking is usually limited to specific columns and specific change events. The column-level audit table’s existing format allows for fairly meaningful row browsing, with little additional formatting necessary for presentation.
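A hedged sketch of the vertical, column-level structure just described (names are invented for illustration); only the columns whose values actually changed receive rows:

-- Column-level audit table: one row per changed column per change event.
CREATE TABLE customer_column_audit (
    customer_id  INTEGER      NOT NULL,    -- key from the original table
    change_ts    TIMESTAMP    NOT NULL,
    column_name  VARCHAR(60)  NOT NULL,
    old_value    VARCHAR(255),             -- NULL old value implies an INSERT
    new_value    VARCHAR(255),             -- NULL new value implies a DELETE
    CONSTRAINT pk_customer_column_audit
        PRIMARY KEY (customer_id, change_ts, column_name)
);

-- Only the altered column is recorded for a given change event; values are carried
-- as character strings because the audited columns can be of any type.
INSERT INTO customer_column_audit
    (customer_id, change_ts, column_name, old_value, new_value)
VALUES
    (17, CURRENT_TIMESTAMP, 'CREDIT_LIMIT', '5000.00', '7500.00');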

Ledger

The ledger table is a specialty table. This history archetype is unlike any of the “Slowly Changing Dimension” approaches because the ledger table is treated like the fact table in multidimensional designs. In the past, the ledger table structure was useful only in rare and specific

Figure 2: Row-level and column-level audit structures.

Figure 3: Ledger table.
