Advanced PHP Programming

228 Chapter 9 External Performance Tunings

Pre-Fork, Event-Based, and Threaded Process Architectures

The three main architectures used for Web servers are pre-fork, event-based, and threaded models.

In a pre-fork model, a pool of processes is maintained to handle new requests. When a new request comes in, it is dispatched to one of the child processes for handling. A child process usually serves more than one request before exiting. Apache 1.3 follows this model.

In an event-based model, a single process serves requests in a single thread, utilizing nonblocking or asynchronous I/O to handle multiple requests very quickly. This architecture works very well for handling static files but not terribly well for handling dynamic requests (because you still need a separate process or thread to handle the dynamic part of each request). thttpd, a small, fast Web server written by Jef Poskanzer, utilizes this model.
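As a rough illustration of the event-based idea (not a description of how thttpd itself is implemented), the following sketch answers several connections from one process and one thread by waiting on all of them at once with stream_select(). The helper name serve_ready_sockets() is hypothetical, and local socket pairs stand in for real client connections so that the example is self-contained; it assumes a Unix-like system.

```php
<?php
// Hypothetical helper: answer every socket that has a request waiting,
// using a single process and a single thread.
function serve_ready_sockets(array $sockets)
{
    $responses = array();
    $pending = $sockets;
    while ($pending) {
        $read = $pending;
        $write = null;
        $except = null;
        // Wait (up to 1 second) until at least one socket is readable.
        $ready = stream_select($read, $write, $except, 1);
        if ($ready === false || $ready === 0) {
            break; // error or timeout: stop rather than block forever
        }
        foreach ($read as $sock) {
            $request = trim((string) fgets($sock));
            $responses[] = "served $request";
            // This connection is done; stop watching it.
            unset($pending[array_search($sock, $pending, true)]);
        }
    }
    return $responses;
}

// Stand-ins for two client connections (assumes a Unix-like system).
$server_ends = array();
$client_ends = array();
foreach (array('/a', '/b') as $path) {
    $pair = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
    fwrite($pair[0], "GET $path\n");
    $client_ends[] = $pair[0];
    $server_ends[] = $pair[1];
}
$responses = serve_ready_sockets($server_ends);
```

The loop never blocks on one particular client, which is why a slow client does not hold up the others. Anything that does block, such as running a dynamic page, would stall every connection, which is the weakness described above.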

In a threaded model, a single process uses a pool of threads to service requests. This is very similar to a pre-fork model, except that because it is threaded, some resources can be shared between threads. The Zeus Web server utilizes this model. Even though PHP itself is thread-safe, it is often difficult or impossible to guarantee that third-party libraries used in extension code are also thread-safe. This means that even in a threaded Web server, it is often necessary to not use a threaded PHP, but to use forked process execution via the FastCGI or CGI implementations.

Apache 2 uses a drop-in process architecture that allows it to be configured as a pre-fork, threaded, or hybrid architecture, depending on your needs.

In contrast to the amount of configuration inside Apache, the PHP setup is very similar to the way it was before. The only change to its configuration is to add the following to its httpd.conf file:

Listen localhost:80

This binds the PHP instance exclusively to the loopback address. Now if you want to access the Web server, you must contact it by going through the proxy server.

Benchmarking the effect of these changes is difficult. Because these changes reduce the overhead mainly associated with handling clients over high-latency links, it is difficult to measure the effects on a local or high-speed network. In a real-world setting, I have seen a reverse-proxy setup cut the number of Apache children necessary to support a site from 100 to 20.

Operating System Tuning for High Performance

There is a strong argument that if you do not want to perform local caching, then using a reverse proxy is overkill. A way to get a similar effect without running a separate server is to allow the operating system itself to buffer all the data. In the discussion of reverse proxies earlier in this chapter, you saw that a major component of the network wait time is the time spent blocking between data packets to the client.

The application is forced to send multiple packets because the operating system has a limit on how much information it can buffer to send over a TCP socket at one time. Fortunately, this is a setting that you can tune.

On FreeBSD, you can adjust the TCP buffers via the following:

# sysctl -w net.inet.tcp.sendspace=131072
# sysctl -w net.inet.tcp.recvspace=8192

On Linux, you do this:

# echo 131072 > /proc/sys/net/core/wmem_max

With the FreeBSD settings, you set the outbound TCP buffer space to 128KB and the inbound buffer space to 8KB (because you receive small inbound requests and make large outbound responses); the Linux command shown raises only the outbound (write) buffer limit to 128KB. This assumes that the maximum page size you will be sending is 128KB. If your page sizes differ from that, you need to change the tunings accordingly. In addition, you might need to tune kern.ipc.nmbclusters to allocate sufficient memory for the new large buffers. (See your friendly neighborhood systems administrator for details.)
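To see why the buffer size matters, it can help to count how many times a response has to wait for the socket buffer to drain. The following is a deliberately simplified model that ignores TCP windowing and packet sizes; the function name buffer_fills_needed() and the 16KB "small" buffer are assumptions chosen only for illustration.

```php
<?php
// Simplified model: how many buffer-sized chunks the application must hand
// to the operating system before the whole response has been accepted.
function buffer_fills_needed($response_bytes, $send_buffer_bytes)
{
    return (int) ceil($response_bytes / $send_buffer_bytes);
}

$page  = 128 * 1024;                              // a 128KB page, as assumed in the text
$small = buffer_fills_needed($page, 16 * 1024);   // hypothetical 16KB buffer
$tuned = buffer_fills_needed($page, 131072);      // the tuned 128KB buffer
```

With the small buffer the page is handed over in several pieces, and the Apache child is tied up between them; with the tuned buffer the whole page fits in one write and the child is free to serve the next request.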

After adjusting the operating system limits, you need to instruct Apache to use the large buffers you have provided. For this you just add the following directive to your httpd.conf file:

SendBufferSize 131072

Finally, you can eliminate the network lag on connection close by installing the lingerd patch to Apache. When a network connection is finished, the sender sends the receiver a FIN packet to signify that the connection is complete. The sender must then wait for the receiver to acknowledge the receipt of this FIN packet before closing the socket to ensure that all data has in fact been transferred successfully. After the FIN packet is sent, Apache does not need to do anything with the socket except wait for the FIN-ACK packet and close the connection. The lingerd process improves the efficiency of this operation by handing the socket off to an exterior daemon (lingerd), which just sits around waiting for FIN-ACKs and closing sockets.

For high-volume Web servers, lingerd can provide significant performance benefits, especially when coupled with increased write buffer sizes. lingerd is incredibly simple to compile. It is a patch to Apache (which allows Apache to hand off file descriptors for closing) and a daemon that performs those closes. lingerd is in use by a number of major sites, including Sourceforge.com, Slashdot.org, and LiveJournal.com.

Proxy Caches

Even better than having a low-latency connection to a content server is not having to make the request at all. HTTP takes this into account.

HTTP caching exists at many levels:

• Caches are built into reverse proxies

• Proxy caches exist at the end user’s ISP

• Caches are built into the user’s Web browser

Figure 9.5 shows a typical reverse proxy cache setup. When a user makes a request to www.example.foo, the DNS lookup actually points the user to the proxy server. If the requested entry exists in the proxy’s cache and is not stale, the cached copy of the page is returned to the user, without the Web server ever being contacted at all; otherwise, the connection is proxied to the Web server as in the reverse proxy situation discussed earlier in this chapter.

[Figure: clients on the Internet send high-latency traffic to the reverse proxy, which asks "Is content cached?" If yes, it returns the cached page; if no, it forwards the request to the PHP Web server over a low-latency connection.]

Figure 9.5 A request through a reverse proxy.

Many of the reverse proxy solutions, including Squid, mod_proxy, and mod_accel, support integrated caching. Using a cache that is integrated into the reverse proxy server is an easy way of extracting extra value from the proxy setup. Having a local cache guarantees that all cacheable content will be aggressively cached, reducing the workload on the back-end PHP servers.
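The decision a caching reverse proxy makes for each request can be sketched in a few lines. This is only an in-memory model of the flow in Figure 9.5, not how Squid or mod_proxy are implemented; the function name proxy_fetch(), the fixed 60-second lifetime, and the $backend callback are all assumptions made for illustration.

```php
<?php
// Hypothetical in-memory model of a caching reverse proxy.
// $cache maps URL => array('body' => ..., 'expires' => unix time).
function proxy_fetch(array &$cache, $url, $now, callable $backend, $ttl = 60)
{
    if (isset($cache[$url]) && $cache[$url]['expires'] > $now) {
        return $cache[$url]['body'];      // fresh copy: backend is never contacted
    }
    $body = $backend($url);               // missing or stale: go to the Web server
    $cache[$url] = array('body' => $body, 'expires' => $now + $ttl);
    return $body;
}

$cache = array();
$backend_calls = 0;
$backend = function ($url) use (&$backend_calls) {
    $backend_calls++;
    return "page for $url";
};

proxy_fetch($cache, '/news', 1000, $backend);   // miss: fetched from the backend
proxy_fetch($cache, '/news', 1030, $backend);   // still fresh: served from cache
proxy_fetch($cache, '/news', 1061, $backend);   // stale: fetched again
```

Three requests reach the proxy here, but the back-end server is contacted for only two of them.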

Cache-Friendly PHP Applications

To take advantage of caches, PHP applications must be made cache friendly. A cache-friendly application understands how the caching policies in browsers and proxies work and how cacheable its own data is. The application can then be set to send appropriate cache-related directives to browsers to achieve the desired results.

There are four HTTP headers that you need to be conscious of in making an application cache friendly:

• Last-Modified

• Expires

• Pragma: no-cache

• Cache-Control

The Last-Modified HTTP header is a keystone of the HTTP 1.0 cache negotiation ability. Last-Modified is the Coordinated Universal Time (UTC; formerly GMT) date of last modification of the page. When a cache attempts a revalidation, it sends the Last-Modified date as the value of its If-Modified-Since header field so that it can let the server know what copy of the content it should be revalidated against.

The Expires header field is the nonrevalidation component of HTTP 1.0 caching. The Expires value consists of a GMT date after which the contents of the requested document should no longer be considered valid.

Many people also view Pragma: no-cache as a header that should be set to avoid objects being cached. Although there is nothing to be lost by setting this header, the HTTP specification does not provide an explicit meaning for it as a response header, so its usefulness rests on its being a de facto standard implemented in many HTTP 1.0 caches.

In the late 1990s, when many clients spoke only HTTP 1.0, the cache negotiation options for applications were rather limited. It used to be standard practice to add the following headers to all dynamic pages:

function http_1_0_nocache_headers()
{
    // Claim the page was modified, and expires, right now.
    $pretty_modtime = gmdate('D, d M Y H:i:s') . ' GMT';
    header("Last-Modified: $pretty_modtime");
    header("Expires: $pretty_modtime");
    header("Pragma: no-cache");
}

This effectively tells all intervening caches that the data is not to be cached and always should be refreshed.

When you look over the possibilities given by these headers, you see that there are some glaring deficiencies:

• Setting expiration time as an absolute timestamp requires that the client and server system clocks be synchronized.

• The cache in a client’s browser is quite different from the cache at the client’s ISP. A browser cache could conceivably cache personalized data on a page, but a proxy cache shared by numerous users cannot.

These deficiencies were addressed in the HTTP 1.1 specification, which added the Cache-Control directive set to tackle these problems. The possible values for a Cache-Control response header are set in RFC 2616 and are defined by the following syntax:

Cache-Control = "Cache-Control" ":" 1#cache-response-directive

cache-response-directive =
      "public" | "private" | "no-cache" | "no-store"
    | "no-transform"
    | "must-revalidate" | "proxy-revalidate"
    | "max-age" "=" delta-seconds | "s-maxage" "=" delta-seconds

The Cache-Control directive specifies the cacheability of the document requested. According to RFC 2616, all caches and proxies must obey these directives, and the headers must be passed along through all proxies to the browser making the request.

To specify whether a response is cacheable, you can use the following directives:

• public: The response can be cached by any cache.

• private: The response may be cached in a nonshared cache. This means that the response is to be cached only by the requestor’s browser and not by any intervening caches.

• no-cache: A cached copy of the response must not be reused by any level of caching without first being revalidated with the origin server. The no-store directive indicates that the information being transmitted is sensitive and must not be stored in nonvolatile storage.

If an object is cacheable, the final directives allow specification of how long an object may be stored in cache.

• must-revalidate: All caches must always revalidate requests for the page. During verification, the browser sends an If-Modified-Since header in the request. If the server validates that the page represents the most current copy of the page, it should return a 304 Not Modified response to the client. Otherwise, it should send back the requested page in full.

• proxy-revalidate: This directive is like must-revalidate, but with proxy-revalidate, only shared caches are required to revalidate their contents.

• max-age: This is the time in seconds that an entry is considered to be cacheable without revalidation.

• s-maxage: This is the maximum time that an entry should be considered valid in a shared cache. Note that according to the HTTP 1.1 specification, if max-age or s-maxage is specified, it overrides any expiration set via an Expires header.
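Because a Cache-Control value is just a comma-separated list of these directives, it can be assembled mechanically. The helper below is a minimal sketch; build_cache_control() is a hypothetical name, and it performs no validation of directive names.

```php
<?php
// Hypothetical helper: turn a directive list into a Cache-Control header line.
// Flag directives are plain strings; valued ones are name => seconds.
function build_cache_control(array $directives)
{
    $parts = array();
    foreach ($directives as $name => $value) {
        if (is_int($name)) {
            $parts[] = $value;                       // e.g. 'public'
        } else {
            $parts[] = $name . '=' . (int) $value;   // e.g. 'max-age' => 60
        }
    }
    return 'Cache-Control: ' . implode(', ', $parts);
}

$shared  = build_cache_control(array('public', 'max-age' => 60));
$browser = build_cache_control(array('private', 'max-age' => 60, 's-maxage' => 0));
```

The resulting string can be passed directly to header().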

The following function handles setting pages that are always to be revalidated for freshness by any cache:

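function validate_cache_headers($my_modtime)
{
    $pretty_modtime = gmdate('D, d M Y H:i:s', $my_modtime) . ' GMT';
    // PHP exposes the If-Modified-Since request header under this key.
    if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])
        && $_SERVER['HTTP_IF_MODIFIED_SINCE'] == $pretty_modtime) {
        // The client's copy is current; send no body.
        header('HTTP/1.1 304 Not Modified');
        exit;
    }
    else {
        header('Cache-Control: must-revalidate');
        header("Last-Modified: $pretty_modtime");
    }
}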

It takes as a parameter the last modification time of a page, and it then compares that time with the If-Modified-Since header sent by the client browser. If the two times are identical, the cached copy is good, so a status code 304 is returned to the client, signifying that the cached copy can be used; otherwise, the Last-Modified header is set, along with a Cache-Control header that mandates revalidation.
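Comparing the two dates as strings is the strictest possible check. A slightly more forgiving variant, offered here only as a sketch, parses the client's date and compares timestamps instead; is_not_modified() is a hypothetical helper name, and treating an unparseable date as "modified" is a design assumption.

```php
<?php
// Hypothetical variant of the freshness check: compare parsed timestamps
// rather than raw header strings.
function is_not_modified($if_modified_since, $modtime)
{
    if ($if_modified_since === null || $if_modified_since === '') {
        return false;                 // no conditional header was sent
    }
    $since = strtotime($if_modified_since);
    if ($since === false) {
        return false;                 // unparseable date: serve the page in full
    }
    // The client's copy is good if the page has not changed since that date.
    return $modtime <= $since;
}

$header = 'Sat, 01 Jan 2000 00:00:00 GMT';
$fresh  = is_not_modified($header, 946684800);   // page last changed at that instant
$stale  = is_not_modified($header, 946684801);   // page changed one second later
```

In a real page the first argument would come from the If-Modified-Since request header, when the client sends one.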

To utilize this function, you need to know the last modification time for a page. For a static page (such as an image or a “plain” nondynamic HTML page), this is simply the modification time on the file. For a dynamically generated page (PHP or otherwise), the last modification time is the last time that any of the data used to generate the page was changed.
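One way to express that rule in code is to take the newest timestamp among everything that feeds the page. The following is a minimal sketch; page_last_modified() is a hypothetical helper, and it assumes the data timestamps are already Unix epoch seconds.

```php
<?php
// Hypothetical helper: the page's last modification time is the newest
// timestamp among the files and data records it is built from.
function page_last_modified(array $files, array $data_timestamps = array())
{
    $times = $data_timestamps;
    foreach ($files as $file) {
        clearstatcache(true, $file);      // avoid a stale cached mtime
        $mtime = filemtime($file);
        if ($mtime !== false) {
            $times[] = $mtime;
        }
    }
    // With nothing to go on, assume the page changed just now.
    return $times ? max($times) : time();
}

// A template file last touched at t=1000 and a data row updated at t=2000.
$template = tempnam(sys_get_temp_dir(), 'tpl');
touch($template, 1000);
$modtime = page_last_modified(array($template), array(2000));
unlink($template);
```

The result can be handed to validate_cache_headers().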

Consider a Web log application that displays on its main page all the recent entries:

$dbh = new DB_MySQL_Prod();
$result = $dbh->execute("SELECT max(timestamp) FROM weblog_entries");
if($result) {
    // $ts is passed to gmdate(), so it must be a Unix timestamp.
    list($ts) = $result->fetch_row();
    validate_cache_headers($ts);
}

The last modification time for this page is the timestamp of the latest entry.

If you know that a page is going to be valid for a period of time and you’re not concerned about it occasionally being stale for a user, you can omit the must-revalidate directive and set an explicit Expires value. The understanding that the data will be somewhat stale is important: When you tell a proxy cache that the content you served it is good for a certain period of time, you have lost the ability to update it for that client in that time window. This is okay for many applications.

Consider, for example, a news site such as CNN’s. Even with breaking news stories, having the splash page be up to one minute stale is not unreasonable. To achieve this, you can set headers in a number of ways.

If you want to allow a page to be cached by shared proxies for one minute, you could call a function like this:

function cache_novalidate($interval = 60)
{
    $now = time();
    $pretty_lmtime = gmdate('D, d M Y H:i:s', $now) . ' GMT';
    $pretty_extime = gmdate('D, d M Y H:i:s', $now + $interval) . ' GMT';
    // Backwards compatibility for HTTP/1.0 clients
    header("Last-Modified: $pretty_lmtime");
    header("Expires: $pretty_extime");
    // HTTP/1.1 support
    header("Cache-Control: public,max-age=$interval");
}

If instead you have a page that has personalization on it (say, for example, the splash page contains local news as well), you can set a copy to be cached only by the browser:

function cache_browser($interval = 60)
{
    $now = time();
    $pretty_lmtime = gmdate('D, d M Y H:i:s', $now) . ' GMT';
    $pretty_extime = gmdate('D, d M Y H:i:s', $now + $interval) . ' GMT';
    // Backwards compatibility for HTTP/1.0 clients
    header("Last-Modified: $pretty_lmtime");
    header("Expires: $pretty_extime");
    // HTTP/1.1 support
    header("Cache-Control: private,max-age=$interval,s-maxage=0");
}

Finally, if you want to try as hard as possible to keep a page from being cached anywhere, the best you can do is this:

function cache_none($interval = 60)
{
    // Backwards compatibility for HTTP/1.0 clients
    header("Expires: 0");
    header("Pragma: no-cache");
    // HTTP/1.1 support
    header("Cache-Control: no-cache,no-store,max-age=0,s-maxage=0,must-revalidate");
}

The PHP session extension actually sets no-cache headers like these when session_start() is called. If you feel you know your session-based application better than the extension authors, you can simply reset the headers you want after the call to session_start().

The following are some caveats to remember in using external caches:

• Pages that are requested via the POST method cannot be cached with this form of caching.

• This form of caching does not mean that you will serve a page only once. It just means that you will serve it only once to a particular proxy during the cacheability time period.

• Not all proxy servers are RFC compliant. When in doubt, you should err on the side of caution and render your content uncacheable.
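Taken together with the three functions shown earlier, these caveats suggest a simple decision rule. The sketch below is one possible way to encode it, not a prescription; choose_cache_policy() is a hypothetical helper that returns the name of the function to call.

```php
<?php
// Hypothetical helper: pick which of the chapter's header functions fits
// a request. Returns the function name rather than calling it.
function choose_cache_policy($method, $personalized)
{
    $method = strtoupper($method);
    if ($method !== 'GET' && $method !== 'HEAD') {
        return 'cache_none';          // POST responses cannot be cached this way
    }
    if ($personalized) {
        return 'cache_browser';       // safe only in the user's own browser
    }
    return 'cache_novalidate';        // shared proxies may keep a copy
}

$for_form_post  = choose_cache_policy('POST', false);
$for_local_news = choose_cache_policy('GET', true);
$for_front_page = choose_cache_policy('GET', false);
```

When in doubt about the proxies in front of you, falling back to cache_none() remains the cautious choice.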

Content Compression

HTTP 1.0 introduced the concept of content encodings, which allow a client to indicate to a server that it is able to handle content passed to it in certain encoded (for example, compressed) forms. Compressing content renders the content smaller. This has two effects:

• Bandwidth usage is decreased because the overall volume of transferred data is lowered. In many companies, bandwidth is the number-one recurring technology cost.

• Network latency can be reduced because the smaller content can be fit into fewer network packets.

These benefits are offset by the CPU time necessary to perform the compression. In a real-world test of content compression (using the mod_gzip solution), I found that not only did I get a 30% reduction in the amount of bandwidth utilized, but I also got an overall performance benefit: approximately 10% more pages/second throughput than without content compression. Even if I had not gotten the overall performance increase, the cost savings of reducing bandwidth usage by 30% was amazing.

When a client browser makes a request, it sends headers that specify what type of browser it is and what features it supports. In these headers for the request, the browser sends notice of the content compression methods it accepts, like this:

Accept-Encoding: gzip,deflate

There are a number of ways in which compression can be achieved. If PHP has been compiled with zlib support (the --with-zlib option at compile time), the easiest way by far is to use the built-in gzip output handler. You can enable this feature by setting the php.ini parameter, like so:

zlib.output_compression = On

When this option is set, the capabilities of the requesting browser are automatically determined through header inspection, and the content is compressed accordingly.
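To make the header inspection concrete, here is roughly what it amounts to when done by hand. This is an illustrative sketch, not PHP's internal implementation: client_accepts_gzip() and maybe_compress() are hypothetical names, only gzip is considered, and the code assumes the zlib extension is available.

```php
<?php
// Hypothetical helper: does an Accept-Encoding value allow gzip?
function client_accepts_gzip($accept_encoding)
{
    foreach (explode(',', strtolower($accept_encoding)) as $item) {
        $parts = explode(';', trim($item));
        if (trim($parts[0]) !== 'gzip') {
            continue;
        }
        // An explicit q=0 means "not acceptable".
        foreach (array_slice($parts, 1) as $param) {
            if (preg_match('/^\s*q\s*=\s*0(\.0*)?\s*$/', $param)) {
                return false;
            }
        }
        return true;
    }
    return false;
}

// Returns array(extra header lines, body to send).
function maybe_compress($body, $accept_encoding)
{
    if (!client_accepts_gzip($accept_encoding)) {
        return array(array(), $body);
    }
    $headers = array('Content-Encoding: gzip', 'Vary: Accept-Encoding');
    return array($headers, gzencode($body));
}

$page = str_repeat('hello ', 100);
list($headers, $payload) = maybe_compress($page, 'gzip,deflate');
```

The Vary header tells intervening caches that compressed and uncompressed copies must be kept apart, which matters when compression is combined with the proxy caches discussed earlier.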

The single drawback to using PHP’s output compression is that it gets applied only to pages generated with PHP. If your server serves only PHP pages, this is not a problem. Otherwise, you should consider using a third-party Apache module (such as mod_deflate or mod_gzip) for content compression.

Further Reading

This chapter introduces a number of new technologies, many of which are too broad to cover in any real depth here. The following sections list resources for further investigation.

RFCs

It’s always nice to get your news from the horse’s mouth. Protocols used on the Internet are defined in Request for Comment (RFC) documents maintained by the Internet Engineering Task Force (IETF). RFC 2616 covers the header additions to HTTP 1.1 and is the authoritative source for the syntax and semantics of the various header directives. You can download RFCs from a number of places on the Web. I prefer the IETF RFC archive: www.ietf.org/rfc.html.

Compiler Caches

You can find more information about how compiler caches work in Chapter 21 and Chapter 24.

Nick Lindridge, author of the ionCube accelerator, has a nice white paper on the ionCube accelerator’s internals. It is available at www.php-accelerator.co.uk/PHPA_Article.pdf.

APC source code is available in PEAR’s PECL repository for PHP extensions. The ionCube Accelerator binaries are available at www.ioncube.com.

The Zend Accelerator is available at www.zend.com.

Proxy Caches

Squid is available from www.squid-cache.org. The site also makes available many excellent resources regarding configuration and usage. A nice white paper on using Squid as an HTTP accelerator is available from ViSolve at http://squid.visolve.com/white_papers/reverseproxy.htm. Some additional resources for improving Squid’s performance as a reverse proxy server are available at http://squid.sourceforge.net/rproxy.

mod_backhand is available from www.backhand.org.

The usage of mod_proxy in this chapter is very basic. You can achieve extremely versatile request handling by exploiting the integration of mod_proxy with mod_rewrite.

See the Apache project Web site (http://www.apache.org) for additional details. A brief example of mod_rewrite/mod_proxy integration is shown in my presentation “Scalable Internet Architectures” from Apachecon 2002. Slides are available at http://www.omniti.com/~george/talks/LV736.ppt.

mod_accel is available at http://sysoev.ru/mod_accel. Unfortunately, most of the documentation is in Russian. An English how-to by Phillip Mak for installing both mod_accel and mod_deflate is available at http://www.aaanime.net/pmak/apache/mod_accel.

Content Compression

mod_deflate is available for Apache version 1.3.x at http://sysoev.ru/mod_deflate. This has nothing to do with the Apache 2.0 mod_deflate. Like the documentation for mod_accel, this project’s documentation is almost entirely in Russian.

mod_gzip was developed by Remote Communications, but it now has a new home, at Sourceforge: http://sourceforge.net/projects/mod-gzip.
