
Beginning Python (2005)


Building a Module

In most cases, you’ll want to place your Python modules in the site-packages directory. Look in the sys.path listing and find a directory name ending in site-packages. This is a directory for packages installed at a site that are not part of the Python standard library of packages.
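As a quick way to do the lookup described above, the following sketch filters sys.path for entries ending in site-packages. The helper name site_packages_dirs is made up for this illustration; it is not part of Python.

```python
import sys

def site_packages_dirs(paths=None):
    """Return the entries of paths (default: sys.path) that end in site-packages."""
    if paths is None:
        paths = sys.path
    # Strip any trailing separator so "…/site-packages/" also matches.
    return [p for p in paths
            if p.rstrip("/\\").endswith("site-packages")]

# Print every matching directory for the running interpreter.
for directory in site_packages_dirs():
    print(directory)
```

Some installations list more than one such directory, and a few list none, so treat the result as a starting point rather than a guarantee.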

In addition to modules, you can create packages of modules, a set of related modules that install into the same directory structure. See the Python documentation at http://docs.python.org for more on this subject.

You can install your modules using one of three mechanisms:

You can do everything by hand and manually create an installation script or program.

You can create an installer specific to your operating system, such as MSI files on Windows, an RPM file on Linux, or a DMG file on Mac OS X.

You can use the handy Python distutils package, short for distribution utilities, to create a Python-based installer.

To use the Python distutils, you need to create a setup script, named setup.py. A minimal setup script can include the following:

from distutils.core import setup

setup(name='NameOfModule',
      version='1.0',
      py_modules=['NameOfModule'],
      )

The name of the module appears twice: once as the name of the distribution (name) and once in the list of modules to package (py_modules). Replace NameOfModule with the name of your module, such as meal in the examples in this chapter.

After you have created the setup.py script, you can create a distribution of your module using the following command:

python setup.py sdist

The argument sdist is short for source distribution. You can try this out with the following example.

Try It Out

Creating an Installable Package

Enter the following script and name the file setup.py:

from distutils.core import setup

setup(name='meal',
      version='1.0',
      py_modules=['meal'],
      )

Chapter 10

Run the following command to create a Python module distribution:

$ python setup.py sdist
running sdist
warning: sdist: missing required meta-data: url
warning: sdist: missing meta-data: either (author and author_email) or (maintainer and maintainer_email) must be supplied
warning: sdist: manifest template 'MANIFEST.in' does not exist (using default file list)
warning: sdist: standard file not found: should have one of README, README.txt
writing manifest file 'MANIFEST'
creating meal-1.0
making hard links in meal-1.0...
hard linking meal.py -> meal-1.0
hard linking setup.py -> meal-1.0
creating dist
tar -cf dist/meal-1.0.tar meal-1.0
gzip -f9 dist/meal-1.0.tar
removing 'meal-1.0' (and everything under it)

How It Works

Notice all the complaints. The setup.py script was clearly not complete. It included enough to create the distribution, but not enough to satisfy the Python conventions. When the setup.py script completes, you should see the following files in the current directory:

$ ls

MANIFEST dist/ meal.py setup.py

The setup.py script created the dist directory and the MANIFEST file. The dist directory contains one file, a compressed version of our module:

$ ls dist

meal-1.0.tar.gz

You now have a one-file distribution of your module, which is kind of silly because the module itself was just one file. The advantage of distutils is that your module will be properly installed.

You can then take the meal-1.0.tar.gz file to another system and install the module. First, uncompress and expand the bundle. On Linux, Unix, and Mac OS X, use the following commands:

$ gunzip meal-1.0.tar.gz
$ tar xvf meal-1.0.tar
meal-1.0/
meal-1.0/meal.py
meal-1.0/PKG-INFO
meal-1.0/setup.py

On Windows, use a compression program such as WinZip, which can handle the .tar.gz files.
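If no archive tool is at hand, Python itself can expand the bundle on any platform. The following is a minimal sketch using the standard tarfile module; the function name unpack_sdist is an assumption for this example, and you should only extract archives you trust.

```python
import tarfile

def unpack_sdist(archive_path, dest_dir="."):
    """Extract a .tar.gz archive into dest_dir and return the member names."""
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        # Newer Python versions support extraction filters; use the
        # conservative "data" filter when the running version has it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        archive.extractall(dest_dir, **kwargs)
    return names

# Example call (assumes meal-1.0.tar.gz is in the current directory):
# unpack_sdist("meal-1.0.tar.gz")
```

The returned names correspond to the listing that tar xvf prints.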

You can install the module after it is expanded with the following command:

python setup.py install

For example:

$ python setup.py install
running install
running build
running build_py
creating build
creating build/lib
copying meal.py -> build/lib
running install_lib
copying build/lib/meal.py -> /System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages
byte-compiling /System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/meal.py to meal.pyc

The neat thing about the distutils is that it works for just about any Python module. The installation command is the same, so you just need to know one command to install Python modules on any system.

Another neat thing is that once the module is installed, the documentation in its docstrings is viewable with the pydoc command. For example, the following shows the first page of documentation on the meal module:

$ pydoc meal
Help on module meal:

NAME
    meal - Module for making meals in Python.

FILE
    /Users/ericfj/writing/python/inst2/meal-1.0/meal.py

DESCRIPTION
    Import this module and then call makeBreakfast(), makeDinner() or makeLunch().

CLASSES
    exceptions.Exception
        SensitiveArtistException
            AngryChefException
    Meal
        Breakfast
        Dinner
        Lunch

    class AngryChefException(SensitiveArtistException)
     |  Exception that indicates the chef is unhappy.
:

See the Python documentation at www.python.org/doc/2.4/dist/dist.html for more on writing distutils setup scripts.

Summary

This chapter pulls together concepts from the earlier chapters to show, by example, how to create modules. If you follow the techniques described in this chapter, your modules will fit in with other modules and follow the important Python conventions.

A module is simply a Python source file that you choose to treat as a module. Simple as that sounds, you need to follow a few conventions when creating a module:

Document the module and all classes, methods, and functions in the module.

Test the module and include at least one test function.

Define which items in the module to export — which classes, functions, and so on.

Create any exception classes you need for the issues that can arise when using the module.

Handle the situation in which the module itself is executed as a Python script.

Inside your modules, you’ll likely define classes, which Python makes exceedingly easy.

While developing your module, you can use the help and reload functions to display documentation on your module (or any other module for that matter) and reload the changed module, respectively.
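A short sketch of that workflow follows. It assumes a Python 3 interpreter, where reload lives in the importlib module (in the Python 2 versions this book targets, it is a built-in function). The json module stands in for your own module, and inspect.getdoc is used here only because help() pages its output interactively.

```python
import importlib
import inspect
import json  # stand-in for the module you are developing

# help(json) would page through the full documentation interactively;
# inspect.getdoc returns just the module docstring as a string.
doc = inspect.getdoc(json)
print(doc.splitlines()[0])

# Re-execute the module's source after editing it on disk.
reloaded = importlib.reload(json)
print(reloaded is json)
```

Objects created before the reload keep referring to the old class definitions, so re-create them after reloading.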

After you have created a module, you can create a distributable bundle of the module using the distutils. To do this, you need to create a setup.py script.

Chapter 11 describes regular expressions, an important concept used for finding relevant information in a sea of data.

Exercises

1. How can you get access to the functionality provided by a module?

2. How can you control which items from your modules are considered public? (Public items are available to other Python scripts.)

3. How can you view documentation on a module?

4. How can you find out what modules are installed on a system?

5. What kind of Python commands can you place in a module?

11

Text Processing

There is a whole range of applications for which scripting languages like Python are perfectly suited; and in fact scripting languages were arguably invented specifically for these applications, which involve the simple search and processing of various files in the directory tree. Taken together, these applications are often called text processing. Python is a great scripting tool for both writing quick text processing scripts and then scaling them up into more generally useful code later, using its clean object-oriented coding style. This chapter will show you the following:

Some of the typical reasons you need text processing scripts

A few simple scripts for quick system administration tasks

How to navigate around in the directory structure in a platform-independent way, so your scripts will work fine on Linux, Windows, or even the Mac

How to create regular expressions to match against the files found by the os and os.path modules

How to use successive refinement to keep enhancing your Python scripts to winnow through the data found.

Text processing scripts are one of the most useful tools in the toolbox of anybody who seriously works with computer systems, and Python is a great way to do text processing. You’re going to like this chapter.

Why Text Processing Is So Useful

In general, the whole idea behind text processing is simply finding things. There are, of course, situations in which data is organized in a structured way; these are called databases and that’s not what this chapter is about. Databases carefully index and store data in such a way that if you know what you’re looking for, you can retrieve it quickly. However, in some data sources, the information is not at all orderly and neat, such as directory structures with hundreds or thousands of files, or logs of events from system processes consisting of thousands or hundreds of thousands of lines, or even e-mail archives with months of exchanges between people.

When data of that nature needs to be searched for something, or processed in some way, then text processing is in its element. Of course, there’s no reason not to combine text processing with other data-access methods; you might often find yourself writing scripts that run through thousands of lines of log output and do occasional lookups in a relational database management system (an RDBMS; you’ll learn about these in Chapter 14) on some of the data they run across. This is a natural way to work.

Ultimately, this kind of script can very often get used for years as part of a back-end data processing system. If the script is written in a language like Perl, it can sometimes be quite opaque when some poor soul is assigned five years later to “fix it.” Fortunately, this is a book about Python programming, and so the scripts written here can easily be turned into reusable object classes — later, you’ll look at an illustrative example.

The two main tools in your text processing belt are directory navigation, and an arcane technology called regular expressions. Directory navigation is one area in which different operating systems can really wreak havoc on simple programs, because the three major operating system families (Unix, Windows, and the Mac) all organize their directories differently; and, most painfully, they use different characters to separate subdirectory names. Python is ready for this, though — a series of cross-platform tools are available for the manipulation of directories and paths that, when used consistently, can eliminate this hassle entirely. You saw these in Chapter 8, and you’ll see more uses of these tools here.

A regular expression is a way of specifying a very simple text parser, which then can be applied relatively inexpensively (which means that it will be fast) to any number of lines of text. Regular expressions crop up in a lot of places, and you’ve likely seen them before. If this is your first exposure to them, however, you’ll be pretty pleased with what they can do. In the scope of this chapter, you’re just going to scratch the surface of full-scale regular expression power, but even this will give your scripts a lot of functionality.
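To make the idea concrete, here is a minimal sketch using Python’s re module. The pattern, the sample lines, and the function name find_dates are invented for this illustration.

```python
import re

# Matches a date such as 2005-03-17 and captures year, month, and day.
DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

lines = [
    "2005-03-17 backup finished",
    "no date on this line",
    "restarted on 2005-04-01 after the upgrade",
]

def find_dates(lines):
    """Return a (year, month, day) tuple for each line that contains a date."""
    found = []
    for line in lines:
        match = DATE.search(line)  # search anywhere in the line
        if match:
            found.append(match.groups())
    return found

print(find_dates(lines))
```

The pattern is compiled once and then applied to every line, which is what keeps this approach cheap on large inputs.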

You’ll first look at some of the reasons you might want to write text processing scripts, and then you’ll do some experimentation with your new knowledge. The most common reasons to use regular expressions include the following:

Searching for files

Extracting useful data from program logs, such as a web server log

Searching through your e-mail

The following sections introduce these uses.

Searching for Files

Searching for files, or doing something with some files, is a mainstay of text processing. For example, suppose that you spent a few months ripping your entire CD collection to MP3 files, without really paying attention to how you were organizing the hundreds of files you were tossing into some arbitrarily made-up set of directories. This wouldn’t be a problem if you didn’t wait a couple of months before thinking about organizing your files into directories according to artist — and only then realized that the directory structure you ended up with was hopelessly confused.

Text processing to the rescue! Write a Python script that scans the hopelessly nonsensical directory structure and divides each filename into parts that might be an artist’s name. Then take that potential name and try to look it up in a music database. The result is that you could rearrange hundreds of files into directories by, if not the name of the artist, certainly some pretty good guesses that will get you close to having a sensible structure. From there, you would be able to explore manually and end up actually having an organized music library.
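A first pass at such a script might look like the following sketch. It assumes filenames of the form "Artist - Title.mp3", which is only one of many possible naming schemes, and it leaves out the music database lookup entirely. The function names are hypothetical.

```python
import os

def guess_artist(filename):
    """Guess the artist from a name like 'Artist - Title.mp3'; None if no guess."""
    stem, _ext = os.path.splitext(filename)
    if " - " not in stem:
        return None
    return stem.split(" - ", 1)[0].strip()

def group_by_artist(root):
    """Walk root and map each guessed artist to the MP3 paths found for it."""
    groups = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(".mp3"):
                continue  # ignore anything that is not an MP3 file
            artist = guess_artist(name) or "Unknown"
            groups.setdefault(artist, []).append(os.path.join(dirpath, name))
    return groups
```

Files that land under "Unknown" are the ones left for manual sorting.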

This is a one-time use of a text processing script, but you can easily imagine other scenarios in which you might use a similarly useful script on a regular basis, as when you are handling data from a client or from a data source that you don’t control. Of course, if you need to do this kind of sorting often, you can easily use Python to come up with some organized tool classes that perform these tasks to avoid having to duplicate your effort each time.

Whenever you face a task like this, a task that requires a lot of manual work manipulating data on your computer, think Python. Writing a script or two could save you hours and hours of tedious work.

A second but similar situation results as a fallout of today’s large hard disks. Many users store files willy-nilly on their hard disk, but never seem to have the time to organize their files. A worse situation occurs when you face a hard disk full of files and you need to extract some information you know is there on your computer, but you’re not sure where exactly. You are not alone. Apple, Google, Microsoft and others are all working on desktop search techniques that help you search through the data in the files you have collected to help you to extract useful information.

Think of Python as a desktop search on steroids, because you can create scripts with a much finer control over the search, as well as perform operations on the files found.

Clipping Logs

Another common text-processing task that comes up in system administration is the need to sift through log files for various information. Scripts that filter logs can be spur-of-the-moment affairs meant to answer specific questions (such as “When did that e-mail get sent?” or “When was the last time my program logged one specific message?”), or they might be permanent parts of a data processing system that evolves over time to manage ongoing tasks. These could be a part of a system administration and performance-monitoring system, for instance. Scripts that regularly filter logs for particular subsets of the information are often said to be clipping logs. The idea is that, just as you clip polygons to fit on the screen, you can also clip logs to fit into whatever view of the system you need.

However you decide to use them, after you gain some basic familiarity with the techniques used, these scripts become almost second nature. This is an application where regular expressions are used a lot, for two reasons: First, it’s very common to use a Unix shell command like grep to do first-level log clipping; second, if you do it in Python, you’ll probably be using regular expressions to split the line into usable fields before doing more work with it. In any one clipping task, you may very well be using both techniques.
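The second technique, splitting a line into fields with a regular expression, can be sketched as follows. The log layout ("date time LEVEL message") and the sample lines are assumptions made for this example; real logs vary widely.

```python
import re

# Assumed line layout: "YYYY-MM-DD HH:MM:SS LEVEL message"
LOG_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) (\w+) (.*)$")

def clip(lines, level):
    """Yield (date, time, message) for each line logged at the given level."""
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match and match.group(3) == level:
            date, time, _level, message = match.groups()
            yield date, time, message

sample = [
    "2005-03-17 09:15:02 INFO mail queued\n",
    "2005-03-17 09:15:07 ERROR relay refused connection\n",
    "garbage line\n",
]

for record in clip(sample, "ERROR"):
    print(record)
```

Because clip is a generator, it can be fed an open file object and will process a large log one line at a time.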

After a short introduction to traversing the file system and creating regular expressions, you’ll look at a couple of scripts for text processing in the following sections.

Sifting through Mail

The final text processing task is one that you’ve probably found useful (or if you haven’t, you’ve badly wanted it): the processing of mailbox files to find something that can’t be found by your normal Inbox search feature. The most common reason you need something more powerful for this is that the mailbox file is either archived, so that you can access the file, but not read it with your mail reader easily, or it has been saved on a server where you’ve got no working mail client installed. Rather than go through the hassle of moving it into your Inbox tree and treating it like an active folder, you might find it simpler just to write a script to scan it for whatever you need.

However, you can also easily imagine a situation in which your search script might want to get data from an outside source, such as a web page or perhaps some other data source, like a database (see Chapter 14 for more about databases), to cross-reference your data, or do some other task during the search that can’t be done with a plain vanilla mail client. In that case, text processing combined with any other technique can be an incredibly useful way to find information that may not be easy to find any other way.
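Scanning an archived mailbox might be sketched like this, assuming the archive is in the common mbox format and using the standard mailbox module. The function name find_messages is hypothetical, and multipart message bodies are skipped to keep the sketch short.

```python
import mailbox

def find_messages(mbox_path, needle):
    """Return (sender, subject) for messages whose subject or body mentions needle."""
    needle = needle.lower()
    hits = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            subject = msg["subject"] or ""
            body = msg.get_payload()
            if not isinstance(body, str):
                body = ""  # multipart message: body scan omitted in this sketch
            if needle in subject.lower() or needle in body.lower():
                hits.append((msg["from"], subject))
    finally:
        box.close()
    return hits
```

The search is a plain case-insensitive substring test; a regular expression could be substituted where finer control is needed.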

Navigating the File System with the os Module

The os module and its submodule os.path are among the most helpful parts of Python for the day-to-day tasks that you have to perform on many different systems. If you often need to write scripts and programs on either Windows or Unix that should still work on the other operating system, you know from Chapter 8 that Python takes care of much of the work of hiding the differences between how things work on Windows and Unix.

In this chapter, we’re going to completely ignore a lot of what the os module can do (ranging from process control to getting system information) and just focus on some of the functions useful for working with files and directories. Some things you’ve been introduced to already, while others are new.

One of the difficult and annoying points about writing cross-platform scripts is the fact that directory names are separated by backslashes (\) under Windows, but forward slashes (/) under Unix. Even breaking a full path down into its components is irritatingly complicated if you want your code to work under both operating systems.

Furthermore, Python, like many other programming languages, makes special use of the backslash character to indicate special text, such as \n for a newline. This complicates your scripts that create file paths on Windows.
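The backslash problem is easy to see in a short sketch. The path below is made up for the example, and ntpath (the Windows flavor of os.path) is used so that the code behaves the same on any platform.

```python
import ntpath  # Windows path rules, importable on every platform

escaped = "C:\temp\new"      # \t and \n are read as a tab and a newline
doubled = "C:\\temp\\new"    # each backslash written twice
raw = r"C:\temp\new"         # raw string: backslashes are taken literally
joined = ntpath.join("C:\\", "temp", "new")  # let the library add separators

print(repr(escaped))
print(doubled == raw == joined)
```

Doubling the backslashes, using a raw string, and joining the components all produce the same path; only the first spelling silently corrupts it.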

With Python’s os.path module, however, you get some handy functions that will split and join path names for you automatically with the right characters, and they’ll work correctly on any OS that Python is running on (including the Mac). You can call a single function to iterate through the directory structure and call another function of your choosing on each file it finds in the hierarchy. You’ll be seeing a lot of that function in the examples that follow, but first let’s look at an overview of some of the useful functions in the os and os.path modules that you’ll be using.

Function Name, as Called
    Description

os.getcwd()
    Returns the current directory. You can think of this function as the basic coordinate of directory functions in whatever language.

os.listdir(directory)
    Returns a list of the names of files and subdirectories stored in the named directory. You can then run os.stat() on the individual entries, for example to determine which are files and which are subdirectories.

os.stat(path)
    Returns a tuple-like object of numbers that tell you nearly everything you could need to know about a file (or directory). These numbers are taken from the structure returned by the POSIX C function of the same name, and they have the following meanings (some are dummy values under Windows, but they’re in the same places!):

    st_mode: file type and permission bits
    st_ino: inode number (Unix)
    st_dev: device number
    st_nlink: number of hard links (Unix)
    st_uid: user ID of owner
    st_gid: group ID of owner
    st_size: size of the file in bytes
    st_atime: time of last access
    st_mtime: time of last modification
    st_ctime: time of last metadata change on Unix; time of creation on Windows

os.path.split(path)
    Splits the path into two parts, everything before the final separator and the final component, using the separator appropriate for the current operating system. Returns a tuple, not a list. This always surprises me.

os.path.join(components)
    Joins name components, passed as separate arguments, into a path appropriate to the current operating system.

os.path.normcase(path)
    Normalizes the case of a path. Under Unix, this has no effect because filenames are case-sensitive; but under Windows, where the OS will silently ignore case when comparing filenames, it’s useful to run normcase on a path before comparing it to another path so that if one has capital letters, but the other doesn’t, Python will be able to compare the two the same way that the operating system would. That is, they’d be the same regardless of capitalization in the path names, as long as that’s the only difference. Under Windows, the function returns a path in all lowercase and converts any forward slashes into backslashes.

os.path.walk(start, function, arg)
    This is a brilliant function that iterates down through a directory tree starting at start. For each directory, it calls the function function like this: function(arg, dir, files), where arg is any arbitrary argument (usually something that is modified, like a dictionary), dir is the name of the current directory, and files is a list containing the names of all the files and subdirectories in that directory. If you modify the files list in place by removing some subdirectories, you can prevent os.path.walk() from iterating into those subdirectories.

There are more functions where those came from, but these are the ones used in the example code that follows. You will likely use these functions far more than any others in these modules. Many other useful functions can be found in the Python module documentation for os and os.path.
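The os.path.walk() function described in the table exists only in Python 2. On Python 3 the same traversal is written with os.walk(), which yields one (directory, subdirectories, files) tuple per directory instead of calling a function. The following sketch assumes Python 3; the function name total_sizes and the default skip list are made up for the example.

```python
import os

def total_sizes(start, skip=(".git", "CVS")):
    """Map each directory under start to the total size in bytes of its own files."""
    sizes = {}
    for dirpath, dirnames, filenames in os.walk(start):
        # Removing names from dirnames in place stops os.walk from
        # descending into those subdirectories.
        dirnames[:] = [d for d in dirnames if d not in skip]
        total = 0
        for name in filenames:
            # st_size is the file size reported by os.stat.
            total += os.stat(os.path.join(dirpath, name)).st_size
        sizes[dirpath] = total
    return sizes
```

The dictionary plays the role of the arg parameter in os.path.walk(): it is the object that accumulates results during the traversal.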

Try It Out

Listing Files and Playing with Paths

The best way to get to know functions in Python is to try them out in the interpreter. Try some of the preceding functions to see what the responses will look like.

1. From the Python interpreter, import the os and os.path modules:

>>> import os, os.path

2. First, see where you are in the file system. This example is being done under Windows, so your mileage will vary:

>>> os.getcwd()
'C:\\Documents and Settings\\michael'

3. If you want to do something with this programmatically, you’ll probably want to break it down into the directory path and the final component, as a tuple (use join to put the pieces back together):

>>> os.path.split(os.getcwd())
('C:\\Documents and Settings', 'michael')
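To reproduce the Windows results on any platform, you can call the Windows and Unix flavors of os.path directly through the standard ntpath and posixpath modules. This is a small sketch; the Unix path is a made-up example.

```python
import ntpath      # os.path as it behaves on Windows
import posixpath   # os.path as it behaves on Unix

win_path = "C:\\Documents and Settings\\michael"
head, tail = ntpath.split(win_path)   # split at the last separator
rejoined = ntpath.join(head, tail)    # join puts the pieces back together

unix_head, unix_tail = posixpath.split("/home/michael/notes.txt")

# normcase lowercases the path and turns forward slashes into backslashes.
normalized = ntpath.normcase("C:/Documents and Settings/Michael")

print(head, "|", tail)
print(rejoined == win_path)
print(unix_head, "|", unix_tail)
print(normalized)
```

In everyday scripts you would simply use os.path, which is bound to whichever of these modules matches the system Python is running on.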
