
Beginning Python (2005)


Python in the Enterprise

Document retention is a subject in the news lately, again because of the new requirements imposed by Sarbanes-Oxley (you’ll look at them in more detail later). A document retention framework is an organized way of ensuring that particular documents are kept around in storage for a given period of time, after which their disposition is further specified (the terms disposition and archival are more good terminology you can use to impress your friends). Documents may be off-loaded to less accessible, but more permanent, storage (such as optical media), or they may be shredded — that is, their content discarded. Sometimes, they may leave a permanent archival footprint — that is, their metadata may be retained permanently in some way.

One of the Python case studies is a relatively simple document retention framework based on the wftk workflow toolkit. You can use this example right out of the box as a full-fledged document management system, and you can use it as a base from which to build more elaborate support systems for your organization’s business processes. You can see that enterprise software is pretty powerful stuff.

People in Directories

The management of users and groups is a separate area of enterprise software infrastructure that is heavily relied upon, but is often a bit more complex than it seems it should be. Most earlier platforms included their own idiosyncratic systems for keeping track of who’s who. Fortunately, in the 90s, with the advent of open-directory standards and well-understood platforms such as X.500 and LDAP, directories started to be understood as a separate area of software infrastructure that needed to be standard.

One of the reasons why directories were a slow starter is that keeping track of users and groups is simply not very complicated. You can easily do good user and group management in a plain old database, so it’s not surprising that the writers of, say, document management systems would simply include users and groups into their database structures and not think twice about using an external system. (The fact that external global systems for user management didn’t exist until recently was another strong motivation.)

Organization-wide directories make sense for a few reasons. The first is simply that they exist and have become a standard, so they can serve as the de facto location of information about people across an entire organization, or across many sub-organizations. The second is that, when they work well, they are structured in a way that suits directory information, so they fill a niche that every organization has. The third is that modern directory servers are designed and written to be blindingly fast and low-overhead for reads. Because organizational data tends to change very slowly in comparison to the data stored in most relational databases, you can think of a directory server as a sort of write-once, read-often database: retrieving user information from a directory, even performing complex searches on directory information, can be cheaper than doing the same against a relational database. This makes a directory suitable for use by even large organizations with many dependent applications.

Directory concepts can be better understood if they’re compared to relational databases. Directories contain entries, which are like database records. Entries have attributes, which are named, searchable values, like the columns in a row of data. Unlike a relational database, a directory system arranges these entries in a hierarchical, or tree, structure, a lot like the files and folders on a regular disk drive. The tree structure works well because it often corresponds to the organizational structure of a company, or to geographical relationships between entities, so that you might have an entry for the whole company, and sub-entries (branches from the tree) for divisions, departments, working groups, and finally individual employees.
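To make the comparison concrete, here is a minimal sketch, written for Python 3, of a directory as a tree of entries with named, searchable attributes. The class name Entry, its methods, and the sample organization are all illustrative assumptions; this is only the data model described above, not the LDAP API.

```python
class Entry:
    """One node in a directory tree: a name, attributes, and child entries."""

    def __init__(self, name, **attributes):
        self.name = name
        self.attributes = attributes
        self.children = []

    def add(self, child):
        """Attach a sub-entry and return it so it can be extended further."""
        self.children.append(child)
        return child

    def search(self, **criteria):
        """Yield every entry in this subtree whose attributes match all criteria."""
        if all(self.attributes.get(k) == v for k, v in criteria.items()):
            yield self
        for child in self.children:
            yield from child.search(**criteria)


# Hypothetical organization: company -> division -> employees.
company = Entry("Example Corp", kind="organization")
sales = company.add(Entry("Sales", kind="division"))
sales.add(Entry("Alice", kind="person", title="clerk"))
sales.add(Entry("Bob", kind="person", title="manager"))

# Search the whole tree from the top, as a directory client might.
clerks = [entry.name for entry in company.search(kind="person", title="clerk")]
```

Searching from a branch such as sales instead of company would restrict the result to that part of the organization, which is the main practical benefit of the tree layout.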


Chapter 20

Beyond simply storing contact information, the directory can also store information about the functional groups to which a particular person belongs. These are usually just called groups, although this can result in some confusion with organizational groups. The functional groups to which a particular user belongs can be used to determine the roles that person can play in particular business processes. You’ll find out more about this, and in more detail, when the discussion turns to workflow systems.

The simplest and possibly the most widespread access API for directory systems, and the one you’ll be looking at in this chapter, is LDAP (Lightweight Directory Access Protocol). The OpenLDAP project provides standalone servers that talk to LDAP clients, and LDAP can also be used to talk to other directories, such as X.500. The API is simple and easy to understand once you get past the unfamiliarity of directory naming, and it’s well-supported in Python, so it’s an easy choice.

In the following sections, you’ll look at some simple ways to use the Python-ldap module from Python to retrieve information about users stored in an OpenLDAP directory, but first you’ll be introduced to the last of the three main categories of enterprise software covered in this chapter.

Taking Action with Workflow

The third category of enterprise software is workflow. Workflow systems model the processes that are used in the organization. From a programming perspective, a process is usually thought of as being a running program, but the processes modeled by workflow may be active for days, months, or even years; these are business processes, not system processes.

Whenever people — that is, actual human beings — must make decisions or take action during the course of a process, that process is an example of a workflow. Workflow processes can also be fully automatic, with any decisions made based on pre-set rules, but in general it’s the involvement of human beings that’s key.

An example of a simple workflow case would be as follows:

1. A user on the Internet submits a form.

2. The results of the form are used to add a record to a database, a situation with which you’re familiar.

3. The results need to be approved, depending on the contents. A workflow system makes it easy to model an approval step:

a. After the form is submitted, it is added to a queue.

b. A clerk checks it to make sure it’s appropriate.

4. The record is added to the database.

Only after every step has been completed is the workflow complete. If any step doesn’t succeed (the form isn’t fully filled out, the clerk can’t validate the data, or some other event interferes), the workflow doesn’t complete, and has to be expanded to accommodate that. For instance, do you get in touch with the user (you probably want this for a business lead), or do you just discard the data (it seems many companies choose to do this with customer feedback anyway)?
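The form-approval steps can be sketched as a tiny state machine. This is a minimal illustration in Python 3 under assumed names (STATES, Submission, advance, reject); it is not how the wftk models processes, only a way to see that the workflow is complete when, and only when, every step has succeeded.

```python
# Ordered states a submission passes through; the names are illustrative.
STATES = ["submitted", "queued", "approved", "recorded"]


class Submission:
    def __init__(self, form):
        self.form = form
        self.state = STATES[0]
        self.reason = None

    def advance(self):
        """Move to the next state on the normal path."""
        if self.state not in STATES:
            raise ValueError("cannot advance from state %r" % self.state)
        index = STATES.index(self.state)
        if index == len(STATES) - 1:
            raise ValueError("workflow already complete")
        self.state = STATES[index + 1]

    def reject(self, reason):
        """Leave the normal path; the workflow does not complete."""
        self.state = "rejected"
        self.reason = reason

    @property
    def complete(self):
        return self.state == STATES[-1]


# A submission that passes every step.
good = Submission({"email": "user@example.com"})
for _ in range(3):
    good.advance()

# A submission the clerk cannot validate.
bad = Submission({"email": ""})
bad.advance()
bad.reject("form is not fully filled out")
```

What happens after reject (contacting the user, discarding the data) is exactly the kind of extension the text says the workflow has to grow to accommodate.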

Between the form submission and the addition of the database record, the situation is active in the workflow system as a (workflow) process. Any time an actor is involved and must perform an action, there’s a task involved. For instance, in step 3b, there is a task active for the clerk. Now consider this: “the clerk” may be a single person, any one of several people (for example, the staff of a department), or even a program that automates the validation.

In the workflow world, you talk about a role, and you can then specify, perhaps by means of an attribute value in a directory, who fills that role in any specific case: a user, another process, or a program.

The workflow system records each event relevant to the process in the process’s enactment. This includes things such as who (the actor) completed a particular task, who is assigned to a given role for the current process, and so forth.
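As a rough sketch of roles and enactment, the following Python 3 code keeps a chronological record of who was assigned to which role and who completed which task. The Process class and its method names are assumptions made for illustration; the wftk has its own representation.

```python
from datetime import datetime, timezone


class Process:
    """A running business process with role assignments and an enactment log."""

    def __init__(self, name):
        self.name = name
        self.roles = {}      # role name -> actor currently filling it
        self.enactment = []  # chronological record of events

    def _record(self, event, **details):
        entry = {"when": datetime.now(timezone.utc), "event": event}
        entry.update(details)
        self.enactment.append(entry)

    def assign(self, role, actor):
        """Put an actor (a person or a program) into a role for this process."""
        self.roles[role] = actor
        self._record("role-assigned", role=role, actor=actor)

    def complete_task(self, task, role):
        """Record that whoever fills the role has completed the task."""
        actor = self.roles[role]  # raises KeyError if nobody fills the role
        self._record("task-completed", task=task, role=role, actor=actor)
        return actor


process = Process("form-approval")
process.assign("clerk", "alice")
process.complete_task("validate form", "clerk")
events = [entry["event"] for entry in process.enactment]
```

Because every event carries the actor and a timestamp, the log can later answer the auditor's question of who did what, and when.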

This functionality makes workflow systems particularly useful for demonstrating compliance with regulatory requirements — a workflow system not only ensures that a documented process is followed; it enables you to demonstrate that this is actually the case.

Auditing, Sarbanes-Oxley, and What You Need to Know

Auditing is a word to strike fear into the hearts of nearly everyone, but that’s only because the natural state of most people is one of disorganization. The point of auditing is to ensure the following:

1. Things are documented.

2. The documentation is correct and complete.

3. The documentation can actually be found.

As you can well imagine, the features of enterprise software and the requirements of auditors coincide to a great deal, as they’re both intended to ensure that the people who own and run companies actually know what those companies are doing. By knowing more about enterprise software, you can be better prepared to help your organization meet auditing requirements.

This is a disclaimer:

While you will gain some understanding of auditing here (and the thinking that goes into it), the practice of auditing is a lot like the practice of law. You can only learn enough in one chapter to be dangerous. It is highly specific to the situation at hand; and in spite of the rules involved, it is as much an art as a science. The company, organization, and country in which an audit is being conducted, and even the people and regulators involved, combine to make the rules that set the conditions for the success or the failure of the audit. No two will be exactly alike!


If you’re interested in working with products that will be used in an audit, or that need to be bulletproof for an audit, you need to speak with the people who will be involved with that audit (lawyers, accountants, business stakeholders, and anyone else relevant) and get their input! Without that, you shouldn’t have a feeling that you’re on firm ground in developing a program or process to meet those needs.

Auditing requirements are prominent in the news lately. The Sarbanes-Oxley Act (often referred to as SOX by the people who love it) was signed into law in 2002 in order to strengthen the legal auditing requirements for publicly traded companies, although its different sections didn’t go into effect until later, some parts as late as the end of 2004. As is the case for most legislation, it is a sprawling 150 pages in length, and only two sections (Section 302 and Section 404) are really of interest to us as programmers, as they involve the “control frameworks” in place in a company, which include IT installations.

It’s important to realize that while auditing started out as a discipline for enforcing financial controls (an “audit” to most people is something the IRS does when they scrutinize your financial records), in fact, auditing now concerns documentation and verification of all sorts. It encompasses disciplines from traditional financial documents to the documentation of compliance with any legal requirements, which might include OSHA or EPA regulations, for instance.

Some industries, such as medical care, have industry-specific regulatory frameworks (HIPAA is the example for the medical industry), and an audit must be able to show that a company is in compliance with those requirements. Another example of auditing guidelines is the ISO quality management series, such as ISO 9000 and ISO 9001. Compliance with these guidelines is typically not required by law, but can be a very potent factor in competitiveness, because many companies will not deal with suppliers who are not ISO compliant.

Auditing and Document Management

The central point of all these auditing requirements is document management. The ISO defines quality auditing, for instance, as a “systematic, independent and documented process for obtaining audit evidence and evaluating it objectively to determine the extent to which audit criteria are fulfilled.” Audit criteria are the requirements imposed by legal requirements, industry guidelines, or company policy; ISO defines them as a “set of policies, procedures, or other requirements against which collected audit evidence is compared.” The real crux of the matter, however, is the audit evidence, which ISO defines as “records, statements of fact or other information, relevant to the audit and which are verified.” Thus, the task of document management is to store audit evidence, and to make it available to auditors in an organized way whenever any of it is required.

Simple storage, however, is not the only requirement you have for document management. Document retention, which you’ve read about, is the flip side of the coin. Document retention policies specify, for different classes of documents, how long each one must be held and be retrievable, and what is to be done with it after that period (burned to optical storage, discarded, and so on). For instance, there are legal requirements in some countries, such as Italy, for ISPs to retain all router logs for a certain time, such as 90 days or three years. Once that period has elapsed, not only is there no reason to keep those logs, they may actually represent a liability because the ISP has to pay for storage space for all of that data. A document retention policy might specify that documents should be discarded after the specified period elapses.
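A retention policy of this kind boils down to a lookup plus a date comparison. The sketch below, in Python 3, assumes a hypothetical policy table (POLICIES) and function name (disposition_due); the 90-day router-log period echoes the example in the text, while the contract entry is invented purely for illustration.

```python
from datetime import date, timedelta

# Hypothetical policy table: document class -> (retention period, disposition).
POLICIES = {
    "router-log": (timedelta(days=90), "discard"),
    "contract": (timedelta(days=365 * 7), "archive"),  # invented example
}


def disposition_due(doc_class, created, today):
    """Return the disposition to apply, or None while the document must be kept."""
    period, disposition = POLICIES[doc_class]
    if today - created >= period:
        return disposition
    return None


# A router log created on 1 January 2005 is still retained on 1 March
# (59 days later) but is due to be discarded on 1 April (90 days later).
still_kept = disposition_due("router-log", date(2005, 1, 1), date(2005, 3, 1))
now_due = disposition_due("router-log", date(2005, 1, 1), date(2005, 4, 1))
```

Real policies are usually harder to decide than this, because the class of a document and the start of its retention period may themselves depend on business rules.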

Another example is the retention of e-mail logs to and from corporate users. There are some requirements for retention of internal communications, but in general, companies don’t want to keep them any longer than absolutely necessary from a legal standpoint.


Document retention is one of those subjects that people talk about in the abstract, around the water cooler with their friends, but it’s really not all that difficult in principle to build a simple document retention system (where it becomes complicated, of course, is when complex logic is required to decide when a document should be affected, which is part of the large job of document management). One of the programming examples that follow is actually the construction of a simple document retention framework in Python.

Working with Actual Enterprise Systems

The actual coding of most enterprise systems is relatively simple when you get right down to it. When you’ve documented the way your organization actually works, and you’ve modeled the system context and the business processes, all you really need to do is write various bits of glue code to move information around, which Python makes really easy.

You’ll be able to work through all three of the categories of software covered earlier (document management, directories, and workflow systems), but you’ll only be using two different packages to do all of that. The wftk open-source workflow toolkit is the package used to demonstrate data and document management and workflow systems; then the Python-ldap module is introduced to cover a few simple operations against LDAP directories.

Introducing the wftk Workflow Toolkit

The wftk is an open-source workflow toolkit written by Michael Roberts of vivtek.com, who has lavished much effort on it since its inception in 2000. It is written in ANSI C and has a well-tested interface into Python, so it’s a strong choice for writing enterprise-oriented code in Python.

The one thing you have to keep in mind when writing to the wftk, however, is that you’re not just sitting down to write code. As explained at the outset of this chapter, enterprise programming is all about modeling business processes, and the wftk reflects that basic approach. Before doing anything else, you have to describe the context of the actions and objects with which you’ll be working. After a while, this gets to be second nature, but until it does, it may feel unnatural.

It makes sense to talk about a repository first when starting to build a system using the wftk. The repository defines the context for everything else in a wftk system. The key part of the repository definition is the lists it defines. A list is simply a data source — in terms of relational databases, a list can be seen as a table, because it contains entries that have corresponding records. However, there are some differences:

An entry always has a unique key that can be used to reference it.

An entry need not consist of a fixed number or assortment of fields (data values).

An entry can also contain arbitrary XML.

An entry can include any number of named attachments, each of which can have a version list.

It’s important to realize that these are simply the capabilities of a general list; real lists stored in real actual places may only offer a portion of these capabilities based on your decisions.
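The capabilities of a general list entry can be pictured as a small data structure. The following Python 3 sketch uses assumed names (ListEntry, Attachment, attach); it mirrors the four properties listed above and says nothing about how the wftk itself stores entries.

```python
from dataclasses import dataclass, field


@dataclass
class Attachment:
    """A named attachment with its version list, oldest version first."""
    name: str
    versions: list = field(default_factory=list)

    def add_version(self, content):
        self.versions.append(content)

    @property
    def latest(self):
        return self.versions[-1]


@dataclass
class ListEntry:
    key: str                                         # unique within its list
    fields: dict = field(default_factory=dict)       # no fixed set of fields
    xml: str = ""                                    # arbitrary XML payload
    attachments: dict = field(default_factory=dict)  # name -> Attachment

    def attach(self, name, content):
        """Add a new version of the named attachment, creating it if needed."""
        attachment = self.attachments.setdefault(name, Attachment(name))
        attachment.add_version(content)


entry = ListEntry(key="first", fields={"field1": "value1"})
entry.fields["extra"] = "fields can be added freely"
entry.attach("report", b"draft")
entry.attach("report", b"final")
```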


The other salient feature of the repository and the repository manager is that everything is managed by adapters. An adapter is a fairly compact set of code that defines an external interface. For instance, there are adapters to store list entries in a local directory as XML files, in a relational database table (MySQL, Oracle, or a generic ODBC database), as lines in a text file, and so on. In addition, it’s simple to write new adapters to treat other resources as suitable locations for the storage of repository lists.
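The adapter idea can be illustrated in a few lines of Python 3. The interface below (ListAdapter with keys, load, and store) is a hypothetical stand-in, not the wftk adapter API, which is written in C; the point is only that code written against the interface does not care where the entries live.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class ListAdapter(ABC):
    """Hypothetical interface a repository manager could talk to."""

    @abstractmethod
    def keys(self): ...

    @abstractmethod
    def load(self, key): ...

    @abstractmethod
    def store(self, key, text): ...


class DirectoryAdapter(ListAdapter):
    """Stores each entry as one file in a local directory."""

    def __init__(self, path):
        self.path = path
        os.makedirs(path, exist_ok=True)

    def keys(self):
        return sorted(os.listdir(self.path))

    def load(self, key):
        with open(os.path.join(self.path, key), encoding="utf-8") as f:
            return f.read()

    def store(self, key, text):
        with open(os.path.join(self.path, key), "w", encoding="utf-8") as f:
            f.write(text)


class MemoryAdapter(ListAdapter):
    """Keeps entries in a dictionary; useful for tests."""

    def __init__(self):
        self.entries = {}

    def keys(self):
        return sorted(self.entries)

    def load(self, key):
        return self.entries[key]

    def store(self, key, text):
        self.entries[key] = text


def copy_all(source, target):
    """Works with any pair of adapters, whatever storage sits behind them."""
    for key in source.keys():
        target.store(key, source.load(key))


with tempfile.TemporaryDirectory() as tmp:
    on_disk = DirectoryAdapter(os.path.join(tmp, "simple"))
    on_disk.store("first", "<rec/>")
    in_memory = MemoryAdapter()
    copy_all(on_disk, in_memory)
```

Supporting a new kind of storage then means writing one more small class, while copy_all and any other client code stay unchanged.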

In other words, when the task is to write some simple code to do something in the context of an already existing set of information resources, the first task is to describe those resources as lists so that the repository manager knows what’s where. The bigger examples will step through a couple of systems in detail, but initially you can look at a few much simpler examples so you can see how it’s done. All of these examples assume that you have already installed the wftk and it is running in your Python installation (see Appendix B for the wftk home page, and this book’s web site at www.wrox.com for more thorough instructions on installing the wftk on your computer).

For your first foray into the wftk, set up an extremely simple repository and write a few scripts to save and retrieve objects. Generally, a wftk repository is a local directory in which the repository definition is saved as an XML file named system.defn. Subdirectories can then be used as default list storage (with each entry in an XML file of its own), and a convenient event log will be written to repository.log in the repository directory. If you haven’t read Chapter 15 on XML and Python yet, you can look there if you are confused. In addition, remember that all of this text will be on the book’s web site.

Try It Out

Very Simple Record Retrieval

1.Create a directory anywhere you want; this directory will be your repository (all the following examples will reside in this directory). Using the text editor of your choice, open the file system.defn in that directory, and define the repository’s structure like this:

<repository loglevel="6">
<list id="simple">
<field id="field1" special="key"/>
<field id="field2"/>
</list>
</repository>

2.Save the file and then create a subdirectory named simple. Just to make things easy, use your text editor to add a file in the new subdirectory named first and add the following contents to it:

<rec>
<field id="field1">value1</field>
<field id="field2">value2</field>
</rec>

3.Save the record and then return to the main directory and add a new Python script named simple.py. This script is going to retrieve lists of keys and then the object you just created. Again using your text editor, enter the following code:

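import wftk

repos = wftk.repository('site.opm')

# With no argument, list() returns the names of the lists in the repository.
l = repos.list()
print l

# With a list name, list() returns the keys of the entries in that list.
l = repos.list('simple')
print l

# Retrieve one entry by list name and key.
e = repos.get('simple', 'first')
print e
print e.string()
print

# Entries support dictionary-style access to their fields.
print e['field1']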

4.Now run the simple retrieval script:

C:\projects\simple>python simple.py
['simple']
['first']
<rec key="first" list="simple">
<field id="field1">value1</field>
<field id="field2">value2</field>
</rec>
<rec key="first" list="simple">
<field id="field1">value1</field>
<field id="field2">value2</field>
</rec>

value1

How It Works

What did you just do? In step 1, you described a simple repository for the wftk, containing one list with two fields. In step 2, you created a record for that list (within the <rec></rec> tags). In step 3, you wrote a small program that reads data from the fields you defined in your new repository. In step 4, you ran that script to get some data.

Specifically, the repos.list call returns a list of the keys that reside in a particular list. If you don’t provide a list name, though, the names of all the lists defined in the system are returned instead. Therefore, you can see that your file first shows up as a record with the key first in the second call to repos.list.

When that’s confirmed, the repos.get invocation is used to retrieve an entry. Its first parameter is the list, “simple”, and the second parameter is the key, “first”. This entry is an xmlobj object, which is a specific wftk object.

The upshot is that it is an XML record that you can use like a Python dictionary to retrieve values, which is done in the final line of the example. (The xmlobj library has some additional extended capabilities, which you can explore on your own if you like.) If you choose to print just the entry (either by printing it directly or by calling its string method), it renders itself back into XML.
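To see how an XML record can behave like a dictionary, here is a minimal sketch in Python 3 built on the standard xml.etree.ElementTree module. The Record class is an illustrative stand-in and is not the wftk xmlobj implementation; it only imitates the two behaviors used in the example, field lookup by id and rendering back to XML.

```python
import xml.etree.ElementTree as ET


class Record:
    """Dictionary-style access to the <field id="..."> children of a <rec>."""

    def __init__(self, xml_text):
        self.root = ET.fromstring(xml_text)

    def __getitem__(self, field_id):
        for element in self.root.findall("field"):
            if element.get("id") == field_id:
                return element.text
        raise KeyError(field_id)

    def string(self):
        """Render the record back into XML text."""
        return ET.tostring(self.root, encoding="unicode")

    def __str__(self):
        # Printing the record directly gives the same XML as string().
        return self.string()


rec = Record(
    '<rec><field id="field1">value1</field>'
    '<field id="field2">value2</field></rec>'
)
first_value = rec["field1"]
```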

The most important point to take away from this first opportunity to test the waters is this: Because the repository definition describes a context, this description can also be used to indicate storage of data in other locations, such as MySQL databases.

That flexibility makes it a lot more useful and exciting: By letting you describe data sources, the wftk makes any data source amenable to being integrated with your workflow. Even better, the adapter concept means that even if you have data in some idiosyncratic format, all you have to do is write a few lines of relatively simple C code, and the wftk can use it.


Note the log level set on the very first line of the repository definition in system.defn. The log level determines how verbose the wftk will be (how much information will be printed) when logging events to the repository.log file (you can look at this now; it’ll be there after you’ve run the example script). A log level of 6 is very high, which means that it will print a lot of information; under normal circumstances, you probably won’t want to use it for anything except development and debugging. Instead, try a log level of 2 or 3 after you’ve seen the messages once; this will help you to get a general sense of what’s going on. If you don’t need the log at all, turn it off with a log level of 0.

Try It Out

Very Simple Record Storage

Record retrieval alone will only take you so far. Let’s create another simple bit of code, this time to store a new record into the same list that you just created, and from which you’ve already retrieved the record “first”:

1.Go to the repository directory you set up in the last example and add another new Python script named simple2.py. This script is going to store a record and then display the newly changed record to show the effect on the list:

import wftk

repos = wftk.repository()

e = wftk.entry(repos, 'simple')
e.parse("""
<rec>
<field id="field2">this is a test value</field>
<field id="extra">here is an extra value!</field>
</rec>
""")

print "BEFORE SAVING:"
print e
e.save()
print "AFTER SAVING:"
print e
print

l = repos.list('simple')
print l

2.Now run the storage script:

C:\projects>python simple2.py
BEFORE SAVING:
<rec>
<field id="field2">this is a test value</field>
<field id="extra">here is an extra value!</field>
</rec>
AFTER SAVING:
<rec list="simple" key="20050312_17272900">
<field id="field1">20050312_17272900</field>
<field id="field2">this is a test value</field>
<field id="extra">here is an extra value!</field>
</rec>

['20050312_17151900', 'first']


How It Works

In step 1, the program code creates a new entry for a given list, parses the XML for the new entry, and then saves it with the e.save method. Here, this is accomplished in two steps so you can see how it can be done in case you ever need to do something between these steps. For a real-world application, you have a simpler option: You can use the alternative repos.add('simple', contents), which would have the same ultimate effect except that you wouldn’t have the record to print before and after storage because it’s all wrapped into one step. Here, the default storage adapter has found the field flagged for special key handling; and because a key requires a unique value in the repository (similar to how a primary key in a relational database works), it has created a unique value for that field.
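The key-filling behavior can be imitated in a few lines of Python 3. This sketch is an assumption-laden illustration: the function names make_key and save are invented, and reading the last two digits of a key such as 20050312_17272900 as a uniqueness counter is a guess based on the output shown, not documented wftk behavior.

```python
from datetime import datetime


def make_key(now, existing):
    """Build a timestamp key, bumping a two-digit suffix until it is unique."""
    base = now.strftime("%Y%m%d_%H%M%S")
    for counter in range(100):
        key = "%s%02d" % (base, counter)
        if key not in existing:
            return key
    raise RuntimeError("too many entries created in the same second")


def save(store, record, key_field, now=None):
    """Fill in the key field if it is empty, then store the record under it."""
    if not record.get(key_field):
        record[key_field] = make_key(now or datetime.now(), store)
    store[record[key_field]] = record
    return record[key_field]


store = {}
moment = datetime(2005, 3, 12, 17, 27, 29)
key1 = save(store, {"field2": "this is a test value"}, "field1", now=moment)
key2 = save(store, {"field2": "another value"}, "field1", now=moment)
```

A record that already carries a value in its key field keeps it, which matches the way the hand-written record with the key first was stored earlier.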

In the following workflow examples, you can see another interesting application of this same behavior. Just as a record is extended by adding a new key, workflow actions can also be defined, which are triggered on particular events, specifically when a record is added. Because the workflow actions can be pretty advanced, you can take advantage of these triggers as a convenient way of running a script to perform maintenance, perform backups, or do other things you need, at periodic intervals.

Another feature demonstrated in the code example is the extra field in the stored object, which isn’t defined in the list definition (see the file system.defn in the first example to confirm this). Unlike a database, the wftk really doesn’t care and will handle this without complaint. This underscores the fact that the repository definition is descriptive: it simply tells the wftk what is out there in its environment. The philosophy of the wftk is to make as few further assumptions as possible and still do something sensible with what it finds.

Try It Out

Data Storage in MySQL

Next in this series of wftk building blocks is to store some data in your MySQL database and then use the same set of code to work with it in the database as you just used to work with files in the local directory.

The point here is to emphasize that the wftk works well with data no matter where it might reside — as long as it can be organized into lists and records. Once you have your data in this form, you’re good.
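The idea that a database table is just another list of keyed records can be tried without the wftk at all. The sketch below, in Python 3, uses the standard sqlite3 module as a stand-in for MySQL so that it runs anywhere; the TableList class and its keys, get, and add methods are hypothetical names chosen for illustration.

```python
import sqlite3


class TableList:
    """Exposes one SQL table through list-style keys/get/add calls."""

    def __init__(self, conn, table, key):
        # Table and column names are assumed to come from trusted
        # configuration, so they are interpolated into the SQL text;
        # data values always go through placeholders.
        self.conn = conn
        self.table = table
        self.key = key

    def keys(self):
        sql = "SELECT %s FROM %s ORDER BY %s" % (self.key, self.table, self.key)
        return [row[0] for row in self.conn.execute(sql)]

    def get(self, key):
        sql = "SELECT * FROM %s WHERE %s = ?" % (self.table, self.key)
        cursor = self.conn.execute(sql, (key,))
        row = cursor.fetchone()
        if row is None:
            raise KeyError(key)
        names = [column[0] for column in cursor.description]
        return dict(zip(names, row))

    def add(self, **fields):
        columns = ", ".join(fields)
        marks = ", ".join("?" for _ in fields)
        sql = "INSERT INTO %s (%s) VALUES (%s)" % (self.table, columns, marks)
        cursor = self.conn.execute(sql, tuple(fields.values()))
        self.conn.commit()
        return cursor.lastrowid  # the database generates the key


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (id INTEGER PRIMARY KEY AUTOINCREMENT, entry TEXT, body TEXT)"
)
mtest = TableList(conn, "test", "id")
new_key = mtest.add(body="this is a test")
record = mtest.get(new_key)
```

As with the file-based list, the caller never supplies the key; here the database's auto-increment column plays the role of the special key field.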

This exercise assumes that you have MySQL installed and available to you, and that you have compiled the wftk with MySQL capabilities (on Unix) or you have the LIST_mysql.dll adapter installed (on Windows). If you don’t have these, see the web site for this book for more detailed instructions on finding and installing them.

1.From the MySQL prompt, define a table and add a record to it like this:

mysql> create table test (
    -> id int(11) not null primary key auto_increment,
    -> entry datetime,
    -> body text);
Query OK, 0 rows affected (0.53 sec)

mysql> insert into test (entry, body) values (now(), 'this is a test');
Query OK, 1 row affected (0.08 sec)

mysql> select * from test;
+----+---------------------+----------------+
| id | entry               | body           |
+----+---------------------+----------------+
|  1 | 2005-03-11 23:56:59 | this is a test |
+----+---------------------+----------------+
1 row in set (0.02 sec)

2. Now go to the repository directory from the first code example and modify system.defn to look like the following. All you’re doing is adding a list definition for your new table, and adding a connection definition for MySQL. Note that if this is anything other than a private test installation of MySQL, you will want to change the userid and password that the wftk will use to log into your test database! Make it something that only you know, and use it wherever the userid “root” and password “root” are shown here in the examples.

<site loglevel="6">
<connection storage="mysql:wftk" host="localhost" database="test" user="root" password="root"/>
<list id="simple">
<field id="field1" special="key"/>
<field id="field2"/>
</list>
<list id="mtest" storage="mysql:wftk" table="test" key="id">
<field id="id" special="key"/>
<field id="entry" special="now"/>
<field id="body"/>
</list>
</site>

3.Now add yet another script file; call it simplem.py, and add the following content:

import wftk

repos = wftk.repository()

l = repos.list()
print l

l = repos.list('mtest')
print l

e = repos.get('mtest', '1')
print e

e = repos.add('mtest', """
<rec>
<field id="body">this is a test value</field>
</rec>
""")

l = repos.list('mtest')
print l
print e
print
print e['body']
