
Beginning Python (2005)


Extension Programming with C

    static int Encoder_init(pylame3_EncoderObject *self,
                            PyObject *args, PyObject *kw) {
        PyObject *outfp;

        /* "O" returns the object pointer without any conversion */
        if (!PyArg_ParseTuple(args, "O", &outfp)) {
            return -1;
        }
        if (self->outfp || self->gfp) {
            PyErr_SetString(PyExc_Exception, "__init__ already called");
            return -1;
        }
        self->outfp = outfp;
        Py_INCREF(self->outfp);   /* keep the object alive while we hold it */
        self->gfp = lame_init();
        lame_init_params(self->gfp);
        return 0;
    }

You’ve modified the format string for PyArg_ParseTuple to contain “O” instead of “s”. “O” indicates that you want an object pointer. You don’t care what type of object it is; you just don’t want PyArg_ParseTuple to do any kind of conversion from the object to some primitive C data type.

After you’re sure you were passed the correct number of arguments and __init__ hasn’t been called before, you can store the object argument for later use. Here you’re using the Py_INCREF macro to increment the reference count. This will keep the object alive until you decrement the count.

Why did the previous macro, Py_XDECREF, have an “X” in it, while this one did not? There are actually two forms of these macros. The “X” versions check to ensure that the pointer isn’t NULL before adjusting the reference count. The other two don’t do that check. They’re faster, but you have to know what you’re doing in order to use them correctly. The documentation for PyArg_ParseTuple tells us that if it succeeds, the output pointer will be valid, so I felt safe using Py_INCREF here, but I didn’t feel that safe with Encoder_dealloc.

Making sure that you perfectly balance your increments with your decrements is the trickiest part of implementing extension modules, so be careful. If you don’t, you could leak memory, or you might access an object that’s already been deleted, which is never a good thing.

It’s also very important to pay attention to what the documentation for each API function says about references. Some functions increase the reference count of an object before returning it. Others don’t. The documentation for PyArg_ParseTuple states that the reference count is not increased, which is why we have to increment it ourselves if we expect the object to stick around for as long as we need it.
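The effect of holding and releasing a reference can be observed from the Python side. The following is a minimal sketch, assuming CPython and a current Python 3 interpreter; the Holder class is a hypothetical stand-in for the Encoder object and is not part of pylame3.

```python
import sys

class Holder:
    """Hypothetical stand-in for the Encoder object that keeps outfp alive."""
    def __init__(self, outfp):
        # Storing the reference is the Python-level counterpart of Py_INCREF.
        self.outfp = outfp

target = object()
before = sys.getrefcount(target)

holder = Holder(target)
during = sys.getrefcount(target)   # one more reference than before

del holder                          # counterpart of the Py_DECREF in dealloc
after = sys.getrefcount(target)    # back to the starting count
```

The absolute numbers reported by sys.getrefcount are an implementation detail; only the difference between the three readings matters here.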

Now that you have an object (that hopefully has a write method), you need to use it. Instead of calling fwrite in Encoder_encode and Encoder_close, you want to call the write method on your object. The Python API has a function called PyObject_CallMethod that will do exactly what you need it to do. Here’s the snippet of code you would use in both Encoder_encode and Encoder_close to call the write method on your object:

    PyObject* write_result = PyObject_CallMethod(
        self->outfp, "write", "(s#)", mp3_buffer, mp3_bytes);
    if (!write_result) {
        free(mp3_buffer);
        return NULL;
    }
    Py_DECREF(write_result);


Chapter 17

PyObject_CallMethod requires three parameters. The first is the object on which you’re invoking the method. This object will be the first argument into the method, usually called self. The second argument to PyObject_CallMethod is the name of the method. The third argument is a format string describing the arguments. This can be NULL if there are no arguments. When it’s not NULL, it looks very similar to a PyArg_ParseTuple format string except it’s always surrounded with parentheses. PyObject_CallMethod is basically calling Py_BuildValue for you with these parameters, and the tuple that results is being passed in to your method.

PyObject_CallMethod returns a PyObject *. All write method implementations probably return None, but you’re still responsible for decrementing the reference count.
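Seen from Python, the call amounts to looking up an attribute by name and calling it with a tuple of arguments. The sketch below is only an analogy written for a current Python 3 interpreter; call_method is a hypothetical helper, not an API function.

```python
import io

def call_method(obj, name, *args):
    """Rough Python-level analogue of PyObject_CallMethod (hypothetical helper)."""
    method = getattr(obj, name)   # bound method: obj is supplied as self
    return method(*args)          # args stand in for the tuple built from the format string

buf = io.BytesIO()
result = call_method(buf, "write", b"abc")
```

Note that io.BytesIO.write returns the number of bytes written rather than None, which is one reason the C code has to release whatever object comes back instead of assuming it is None.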

Because most of pylame3.c hasn’t changed from pylame2.c, I won’t include the entire file here. It shouldn’t be too difficult to insert the changes described in this section.

Once the new version of the module is compiled, you can use any file-like object you want as a parameter to the Encoder object. Here’s an example that demonstrates this:

    import pylame3

    INBUFSIZE = 4096

    class MyFile(file):
        def __init__(self, path, mode):
            file.__init__(self, path, mode)
            self.n = 0

        def write(self, s):
            file.write(self, s)
            self.n += 1

    output = MyFile('test3.mp3', 'wb')
    encoder = pylame3.Encoder(output)
    input = file('test.raw', 'rb')

    data = input.read(INBUFSIZE)
    while data != '':
        encoder.encode(data)
        data = input.read(INBUFSIZE)

    input.close()
    encoder.close()
    output.close()

    print 'output.write was called %d times' % output.n

This example includes a class derived from the built-in file object to show off some of the stuff you can do. OK, it’s not that impressive, but it at least shows how flexible your new extension module can be. As long as you pass in an object that has a write method, your extension module is happy.
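The object does not even have to derive from a file class. The following sketch, written for a current Python 3 interpreter, shows the same loop with an object that offers nothing but write. PassThroughEncoder is a hypothetical stand-in for pylame3.Encoder so that the example runs without the extension module; it copies data instead of encoding it.

```python
import io

class CountingSink:
    """File-like object that is not derived from any file class."""
    def __init__(self):
        self.buffer = io.BytesIO()
        self.n = 0

    def write(self, data):
        self.buffer.write(data)
        self.n += 1

class PassThroughEncoder:
    """Hypothetical stand-in for pylame3.Encoder: copies input unchanged."""
    def __init__(self, outfp):
        self.outfp = outfp

    def encode(self, data):
        self.outfp.write(data)   # the only thing required of outfp

    def close(self):
        pass

sink = CountingSink()
encoder = PassThroughEncoder(sink)
source = io.BytesIO(b"\x00" * 10000)   # fake raw audio

data = source.read(4096)
while data:
    encoder.encode(data)
    data = source.read(4096)
encoder.close()
```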


Summary

In this chapter, you learned how to expose simple functions implemented in C to Python developers by creating an extension module and defining a method table. Converting Python objects to C values is done using PyArg_ParseTuple. Going the opposite way, turning a C value into a Python object is done using Py_BuildValue.

You also looked at how to define new types in an extension module by defining the object and type structures. You set up the type object so that it could create new instances of your type and later destroy them. Making sure that you correctly increment and decrement the reference counts of objects that you use requires careful consideration.

There’s a lot more to writing extension modules, of course, but not enough room in one chapter to cover it all. Be sure to consult the documentation at http://docs.python.org/ext/ext.html and http://docs.python.org/api/api.html.

Exercises

1. Add a new module-level function to the foo module you created earlier in the chapter. Call the function reverse_tuple and implement it so that it accepts one tuple as an argument and returns a similarly sized tuple with the elements in reverse order. Completing this exercise is going to require research on your part because you need to know how to “unpack” a tuple. You already know one way to create a tuple (using Py_BuildValue), but that’s not going to work for this exercise, because you want your function to work with tuples of arbitrary size. The Python/C API documentation for tuples (at http://docs.python.org/api/tupleObjects.html) lists all of the functions you need to accomplish this. Be careful with your reference counting!

2. List and dictionary objects are an extremely important part of nearly all Python applications, so it would be useful to learn how to manipulate those objects from C. Add another function to the foo module called dict2list that accepts a dictionary as a parameter and returns a list.

The members of the list should alternate between the keys and the values in the dictionary. The order isn’t important as long as each key is followed by its value. You’ll have to look up how to iterate over the items in the dictionary (hint: look up PyDict_Next) and how to create a list and append items to it (hint: look up PyList_New and PyList_Append).
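Before writing the C versions, it can help to pin down the expected behavior in pure Python and compare your extension functions against it. The following is a sketch of one reasonable reading of the two exercises, written for a current Python 3 interpreter; raising TypeError for a non-tuple argument is an assumption, since the exercise does not say what should happen in that case.

```python
def reverse_tuple(items):
    """Reference behavior for exercise 1: same size, reversed order."""
    if not isinstance(items, tuple):
        raise TypeError("expected a tuple")   # assumed error handling
    return items[::-1]

def dict2list(mapping):
    """Reference behavior for exercise 2: each key followed by its value."""
    result = []
    for key, value in mapping.items():
        result.append(key)
        result.append(value)
    return result

pairs = dict2list({"a": 1, "b": 2})
```

These functions do not solve the exercises, which are about the C API, but they give you something to test the C implementations against.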


18

Writing Shareware and Commercial Programs

Python is not just for open-source applications, even though it is a successful open-source project. Two of Python’s greatest advantages are rapid application development, because the language is so easy to develop in, and its ability to integrate with existing code, even when that code is written in another language.

Commercial entities also want to take advantage of being able to develop application features faster than their competition does. This is not an advantage that can be overstated in the commercial software environment. Anything that one programmer can do another programmer can duplicate, but it takes time. The faster and more efficiently a company can develop its products, the more likely that company is to be able to meet its customers’ needs before one of its competitors manages to do the same.

Conservative estimates place Python’s productivity at one order of magnitude greater than that of other cross-platform development environments, such as .NET and Java, for large programs. That’s a lot of saved time and money for a development organization and a significant competitive advantage for the company that chooses to use Python rather than Java or .NET, or, of course, C or C++ (which can’t be considered either fast to develop in or cross-platform by any modern definition).

With these advantages in mind, this chapter should help you decide how to use Python in a commercial context, whether as a smaller company doing shareware or small applications or a large company doing large-scale development.

A Case Study: Background

Think of this section as a case study to which you can refer when designing your own solutions, so you can frame some of the decisions you will likely have to make when starting a new commercial project in Python.

A software security company, Immunity, was formed in 2002 in New York City, shortly after the attacks on the World Trade Center. This company had the traditional specialized software company’s tripod of services: training, consulting, and a proprietary software platform. The proprietary software, Immunity CANVAS, needed to do several complex things:

Replicate many complex network protocols, including SSL, OncRPC, DCE-RPC, FTP, HTTP, and so on

Conduct mathematical operations on data as if that data were composed of C-like primitives (unsigned integers, signed integers, C-style strings, and so on)

Port to at least Windows and Linux

Have a nice, commercial-quality GUI
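The second requirement deserves a concrete picture, because Python integers never overflow and strings carry no terminator. The following is a minimal sketch, written for a current Python 3 interpreter, of how C-like primitives can be emulated with masking and the standard struct module. The helper names are hypothetical and are not taken from CANVAS.

```python
import struct

MASK32 = 0xFFFFFFFF

def u32_add(a, b):
    """Add two values with C-style unsigned 32-bit wraparound."""
    return (a + b) & MASK32

def to_s32(value):
    """Reinterpret the low 32 bits of value as a signed C int."""
    return struct.unpack("<i", struct.pack("<I", value & MASK32))[0]

def pack_cstring(text):
    """Encode text as a NUL-terminated C-style string."""
    return text.encode("ascii") + b"\x00"

wrapped = u32_add(0xFFFFFFFF, 1)               # wraps around to zero
negative = to_s32(0xFFFFFFFF)                  # all bits set reads as -1
wire = struct.pack("<I", 0x12345678)           # little-endian bytes on the wire
```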

Selling software to the software security market has some advantages; chief among them is the fact that most purchasers of CANVAS are likely to be highly technical. In fact, a large majority of them will be used to using Linux and installing software for Linux, making it a viable platform to target.

How Much Python Should You Use?

Commercial programs are not written in a vacuum. Ideally, when a commercial project is launched, it has been created to meet the requirements of clients who have a valid business need, and who have acknowledged that need and are looking for a solution. While an open-source program doesn’t face the threat of disappearing due to a lack of unit sales (as it can only be killed off by a lack of interest from its developers, not a lack of customers), a commercial enterprise represents an ongoing investment, often in the face of other software companies making counter investments with the hopes of occupying the same niche. It’s worth noting that the main competition of Immunity CANVAS, CORE Impact, also uses Python, although the bulk of the product is written in C++. Both CANVAS and Impact are exploitation frameworks, which means that they allow companies to test the security of their systems and networks by simulating attacks on known and suspected vulnerabilities, and both have their exploits written in Python.

Impact uses a closed-source engine written in C/C++ to help manage and run the exploits and to manage the GUI. This set the stage for the first decision the Immunity developers had to make: Given that their competition was also using Python and it provided obvious benefits, how could they maximize their advantage by using Python? Immunity chose to do two things at the design level to maintain a rapid application development advantage over CORE:

1. Move all of the CANVAS codebase to Python, including the GUI and underlying exploitation framework

2. Support a command-line interface, which enables a faster development cycle by isolating the engine from the GUI, and which enables the use of standard debuggers and profiling tools

The trade-off here is that by using a pure Python underlying framework, it becomes extremely difficult to inhibit software “piracy.” Python, as a (mainly) interpreted language, creates program files that are far easier to reverse-engineer than files created in a language like C or C++, where the source code is compiled to a binary format after being heavily mangled in the process by intermediary optimization and translation phases. Therefore, the decision to use Python has consequences that reach into every level of the development plan, and often reflects strongly in the production of the business plan.

The first issue is keeping the customers honest. It’s common knowledge that very few companies, no matter how well-meaning, are 100 percent honest about how many places they’ve installed software. It’s understood that, typically, one paid copy of software will be used by 10 people at the same time. If this is something that you’re going to need to address, see the section “Pure Python Licensing” that follows for some ideas on how you can offset this.

If your competition is not using Python, or if you have a larger developer team than your competition, you can package and embed a Python interpreter in your program and still maintain some of the advantages that Python offers for rapid application development, while using traditional mechanisms to allow flexible software licensing, such as integrating a commercial license manager into your package. However, this isn’t possible if you’re expecting to use the Python interpreter that is already installed on your clients’ systems.

Even if you’ve decided to write all your code in Python, you still have another decision to make: Should you allow your developers the capability to import and use modules that are not from the base modules in the Python distribution? The base modules are guaranteed to be installed with every Python installation, and they have liberal BSD-like licenses. However, you might want to include features that already exist in another library in your software, but do it without the costly investment of having to re-implement these features. Including them directly in your software places the onus on you to determine the licensing requirements and handle any related issues yourself, whereas going the other route and relying on external libraries and modules having to exist on the clients’ systems means having to find a way of ensuring that your users can obtain the required libraries, if you’re not going to distribute them yourself.

When you write software that relies on any other software that you haven’t written, it’s important to get the licensing right. A BSD-style license allows you to use the source code in free or commercial products without requiring you to pay a license fee, and grants this right irrevocably. However, it does not grant you the right to claim any copyright on the product you are using. To understand some of the licenses that come with open-source software, you can read the following document: http://cyber.law.harvard.edu/openlaw/gpl.pdf. For commercial products, you should refer to the license that you received when you purchased it, as there are as many licenses as there are products.

Immunity decided to go with a pure-Python approach for CANVAS, except with regard to their GUI library, for which they chose pyGTK (although this means that the development is still done in Python; see Chapter 13 for information on pyGTK). Key to making your choice should be a survey of projects that are currently using Python to do commercial work, and determining whether the problems that you will be facing have already been solved by others. If they have, then there is a high probability that you will be able to solve your problems with Python as well. For some ideas about what has been done, look at Zope, Immunity CANVAS, WingIDE, and Ximian RedCarpet. These are all good products to look at as you frame your software and business architectures, and each one solves a different set of problems.

Pure Python Licensing

If you have chosen to go with a pure-Python model, you can still take steps to allow for lucrative licensing models. Your goal is as follows: You don’t want to turn off your customers or give up too many of the advantages of using Python in the first place, but you still want to be able to control the distribution of your software and restrict users to using what they’ve paid for.


Whatever you do, don’t forget to market the advantages to your customers of your open development platform as strongly as you can. Immunity has found this to be a key differentiation between themselves and their competition; Immunity’s CANVAS is transparent from front to back, so customers can add to and customize the product, which enables them to do more with the product than they could have ever imagined. No business can truly predict what their customers are going to do with their product, and your business is no exception. That’s why one of the hardest decisions you have to make is how to create your software license, because it can affect how your customers can use your product.

Licenses are all about restrictions. The problem with restrictions regarding usage of your product is that they typically require a binary copy protection module that cannot be removed from your product, as shown in the following example:

    if licensedDate < todaysDate:
        print "You have run out of license time!"
        sys.exit(1)

Although this bit of code may work well in principle within a compiled C program because by its nature the code created from it is hard to discern, it’s trivial for even the most basic Python programmer to remove it from your program because it appears in readable text.

To avoid this obviousness, you could distribute your Python as a .pyc or .pyo if you like, but these are easily reverse-engineered as well. In other words, enforcing something like an annually renewed license scheme by restricting based on the date, as above, is nearly impossible. Likewise, programs that require “activation” are nearly impossible to create without a binary module in which you can hide the workings of this protection.
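How little a compiled file hides can be checked with the standard library alone. The following sketch, written for a current Python 3 interpreter, compiles a small license check and then recovers its message from the code object, from a disassembly, and from the marshalled bytes, which are the payload a .pyc file stores after its header. The check function and the file name are made up for the demonstration.

```python
import dis
import io
import marshal

MESSAGE = "You have run out of license time!"

SOURCE = '''
def check(licensed_date, todays_date):
    if licensed_date < todays_date:
        raise SystemExit("You have run out of license time!")
'''

# Compile as the interpreter would before writing a .pyc file.
module_code = compile(SOURCE, "license_check.py", "exec")

# The function body is a nested code object among the module constants.
func_code = next(c for c in module_code.co_consts if hasattr(c, "co_consts"))
strings = [c for c in func_code.co_consts if isinstance(c, str)]

# A readable listing of the bytecode, including the comparison to patch out.
listing = io.StringIO()
dis.dis(func_code, file=listing)

# The message also sits in plain sight in the serialized form.
visible_in_pyc = MESSAGE.encode("ascii") in marshal.dumps(module_code)
```

Anyone who can read the listing can locate the comparison and remove it, which is the point the text makes about date-based checks.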

WingIDE 2.0 manages to do this form of activation-based license control well, so it might be worth your time to take a look at how they do it if you decide to go down the product activation road. The key to their success is that they distribute their own Python engine, rather than use a provided Python on the host system. However, this style of distribution may not work for you. Keep in mind that even with this system in place, WingIDE is also distributed as a pure-source license on certain platforms.

Not being able to have binary modules important for the functioning of your program leaves you at a loss for restricting the usage of your product based on time or the number of installations at a site.

Web Services Are Your Friend

As you’ll see in Chapter 21, Python support for web services is thorough, complete, and once you’ve learned something about using it, drop-dead simple to use. This is a major advantage over almost every language except C#, and Python’s dynamic typing makes it even better for this sort of use. If you think of your product as a rich client to a number of server services implemented on a web server you control (which you should also write in Python, keeping your Rapid Application Development cycle advantage), you have almost all the advantages of a binary-only model when it comes to licensing because you can control use of the product from the central web site, without the disadvantage of having to use a compiled language.

This, of course, requires that you have some kind of functionality that you can centralize efficiently and that your users are always able to contact your main server. You could also consider selling (at an appropriately high price) a version of the central server that can be installed on-site, in an enterprise client’s network.
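The idea can be sketched without any networking. In the following illustration, written for a current Python 3 interpreter, the valuable work runs only after a central object has granted a seat. All names here (LicenseServer, authorize, run_remote_feature) are hypothetical; a real deployment would put the server behind an authenticated web service and would need a way to release seats.

```python
class LicenseServer:
    """Hypothetical central seat tracker; stands in for a web service."""
    def __init__(self, seats_by_key):
        self.seats_by_key = dict(seats_by_key)   # license key -> paid seats
        self.active = {}                          # license key -> machine ids

    def authorize(self, key, machine_id):
        allowed = self.seats_by_key.get(key, 0)
        machines = self.active.setdefault(key, set())
        if machine_id in machines:
            return True                           # already holds a seat
        if len(machines) < allowed:
            machines.add(machine_id)              # hand out a free seat
            return True
        return False                              # all paid seats are taken

def run_remote_feature(server, key, machine_id, payload):
    """Client side: the centralized work happens only if a seat is granted."""
    if not server.authorize(key, machine_id):
        raise PermissionError("no free seat for this license")
    return payload.upper()                        # placeholder for the real service

server = LicenseServer({"ACME": 2})
first = run_remote_feature(server, "ACME", "host-a", "scan")
```

Because the decision is made on a machine you control, there is nothing in the client for a customer to patch out.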


Exploits

An exploit is a program that takes advantage of a buffer overflow or similar vulnerability to obtain unauthorized access to a remote computer. The most famous exploits, and those that are covered the most in the media, are those used by worms such as Code Red to spread themselves.

One example of this technique is an upcoming product from Immunity based on their flagship product CANVAS. CANVAS is an information security product that automates phases of security penetration tests. A penetration test often involves using a number of known security exploits against machines in your organization to determine whether those exploits will work.

One particular problem when running penetration tests is that the targeted machine will often need to connect back to the tester’s machine, and firewalls between the tester’s machine and the target will have filters preventing outbound connections to the tester, leaving the target still vulnerable but untested. A simple solution is to have the target make fake DNS queries (DNS being a protocol that will usually be passed through firewalls internal to an organization). However, this requires a specialized and customized DNS server that enables you to use DNS as a covert channel and can integrate into the full CANVAS framework. If you put that DNS server on a central server that also covers licensing, it’s a perfect fit for the web services licensing model because it’s a publicly available central server, enabling Immunity to track compliance with per-seat licenses. Of course, for an additional charge, the company can sell the modified DNS server and have customers integrate with it locally.

This is the sort of opportunity you should look for when trying to maximize your programming advantage using Python. Although you could still do product-activation-style licensing using a binary component from a compiled language, it’s better in the long run for your business to evolve past the need for restrictive licensing on the client side, which is always difficult and can be hard to troubleshoot if it misbehaves.

Pricing Strategies

Realistic pricing for software has to take into account the actual size of your market. If you assume your product will be widely pirated, as is true with most programs that don’t have per-computer activation, then your pricing needs to rise accordingly or depend on external factors such as support and training, while being weighed against the value it offers to those clients who will be paying.

Per-company licensing is a common tactic used by software vendors looking to capitalize on an “all you can eat buffet” desire among their customers. Selling an Enterprise license, which allows for unlimited use within an organization, can be an easy way for you to compensate for the lack of control your program inherently has by not placing any technical restrictions on the software, instead relying on the clients to not compromise the valuable product for which they’ve paid.

Likewise, you can compensate for your technology’s inability to protect itself from being copied by using legal means. An NDA with appropriately strict penalties can scare companies straight faster than a shrink-wrap license can. However, to truly enforce an NDA, you need a way to prove that a company is leaking your information. This is where the technology comes to your aid again.


Watermarking

Proving someone is leaking your software is harder than it sounds. Unless you have an inside source, you need to be able to take a leaked piece of software and say “This came from Company X.” It means you have to manage a database of all your users, distribute uniquely different software packages to each one, and have a mechanism for dealing with leaks (some way to enforce the revocation of the right to use the software).

One way to make each distributed package different is through watermarking. Watermarking is a generic term for any process that is not observable to the user but that enables you to track down the origin of any particular piece of software traded on the Internet at large. The simplest form used is to include the name of the user in the executable somewhere. This sort of simplistic measure is easily detected and bypassed by the user, however.

What you want is something provable and hard to detect. In mathematical terms, you would want a reversible function that you can apply to the input text (your program), one that is a one-to-one mapping on your user database, so you can know with 100 percent confidence to whom the software was given and therefore who is responsible for its being leaked in violation of its license.
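To make the idea of a reversible, one-to-one mapping concrete, here is a deliberately simple sketch written for a current Python 3 interpreter. It is not the scheme CANVAS uses. It assumes a numeric customer ID and hides its bits in trailing whitespace, which Python ignores at the end of a line; the function names are hypothetical. The approach would corrupt multi-line string literals and lines ending in a backslash continuation, and it disappears as soon as someone strips trailing whitespace, so treat it purely as an illustration.

```python
def embed_watermark(source, customer_id, bits=16):
    """Hide customer_id in trailing whitespace: space = 0, tab = 1."""
    if not 0 <= customer_id < (1 << bits):
        raise ValueError("customer_id does not fit in the requested bits")
    lines = source.split("\n")
    if len(lines) < bits:
        raise ValueError("source has too few lines to carry the mark")
    marked = []
    for index, line in enumerate(lines):
        line = line.rstrip(" \t")                 # drop any existing trailer
        if index < bits:
            bit = (customer_id >> index) & 1      # one bit per line
            line += "\t" if bit else " "
        marked.append(line)
    return "\n".join(marked)

def extract_watermark(marked, bits=16):
    """Recover the customer ID from a marked copy."""
    value = 0
    for index, line in enumerate(marked.split("\n")[:bits]):
        if line.endswith("\t"):
            value |= 1 << index
    return value

program = "\n".join("x%d = %d" % (i, i) for i in range(20))
marked_copy = embed_watermark(program, 0xBEEF)
recovered = extract_watermark(marked_copy)
```

Every customer ID below 2 to the power of bits produces a different file, and the marked program still runs exactly as the original does.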

Python is a whitespace-sensitive language, which somewhat restricts your ability to change the contents of a file while still having a functional program, something most watermarking programs would normally do on an executable. You can still modify many parts of the Python program you’re distributing, however, and you have the advantage that Python is very good at parsing programs written in Python, so many changes would be harmless. Still, the basic techniques for doing this don’t require advanced and customized parsing of any kind. The following example comes from the Immunity CANVAS distribution code and is normally installed as a CGI. This will do the right thing when it comes to delivering the file to the client via their browser, which is not as easy as it might originally seem. If you’re running under Apache, and you do a print statement, Apache will sometimes add a \n to the end of the file, which corrupts it. Use sys.stdout.write instead of print to avoid this.

Security is an important part of your distribution scheme. Note that the use of md5 hashes (nicely importable in Python) and the general security of the Python system allows for a higher level of confidence than most web frameworks have. You’ll rarely see the source code to a production C CGI being distributed publicly in a book!
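The two practical points above, writing the file without a stray trailing newline and fingerprinting it with an md5 hash, can be sketched as follows for a current Python 3 interpreter. The deliver_package function and the header values are assumptions for illustration and are not taken from the CANVAS code. In a real CGI the stream would be sys.stdout.buffer; an in-memory stream is used here so that the result can be inspected. md5 is kept because the text names it, although it is no longer considered collision-resistant.

```python
import hashlib
import io

def deliver_package(data, out):
    """Write CGI headers and the exact bytes of data to a binary stream."""
    headers = (
        "Content-Type: application/octet-stream\r\n"
        "Content-Length: %d\r\n"
        "\r\n" % len(data)
    )
    out.write(headers.encode("ascii"))
    out.write(data)                      # write(), unlike print, appends nothing
    return hashlib.md5(data).hexdigest() # fingerprint to record for this download

stream = io.BytesIO()                    # stands in for sys.stdout.buffer
digest = deliver_package(b"abc", stream)
```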

You’ll see chunks of code that perform these functions, interspersed with comments about what the CGI is doing, as follows:

    def normalizeDate(date):
        if date.count("/") == 0:
            return date
        dates = date.split("/")
        # add the year if just a 05 -> 2005
        if len(dates[2]) == 2:
            dates[2] = "20" + dates[2]
        newdate = "%4.4d%2.2d%2.2d" % (int(dates[2]), int(dates[0]), int(dates[1]))
        return newdate
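The date helper above can be exercised on its own. The sketch below repeats its logic under the hypothetical name normalize_date so that it runs by itself on a current Python 3 interpreter. It assumes, as the original does, that slash-separated input is in month/day/year order.

```python
def normalize_date(date):
    """Turn M/D/YY or M/D/YYYY into YYYYMMDD; pass other input through."""
    if date.count("/") == 0:
        return date                       # already normalized
    parts = date.split("/")
    if len(parts[2]) == 2:
        parts[2] = "20" + parts[2]        # 05 -> 2005
    return "%4.4d%2.2d%2.2d" % (int(parts[2]), int(parts[0]), int(parts[1]))

short_year = normalize_date("3/7/05")
long_year = normalize_date("12/25/2005")
untouched = normalize_date("20050307")
```

The normalized form sorts correctly as a plain string, which makes comparisons between license dates straightforward.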
