Ajax Patterns And Best Practices (2006)

CHAPTER 4

Cache Controller Pattern

Intent

The Cache Controller pattern provides the caller a mechanism to temporarily store resources in a consistent manner, resulting in an improved application experience for the caller.

Motivation

There are many forms of web applications, and one form is a data-mining application. There are different types of data-mining applications, but they all have one thing in common: they query a repository, and the repository responds with data. This means the application retrieves data by using queries whose structure stays the same from one query to the next.

Figure 4-1 shows a data-mining application that has a series of maps as a database. Looking a bit closer at the MapQuest application, there are a number of links and advertisements. What is of interest in the context of this pattern are the navigational and zooming controls. The navigational controls are used to pan the map left, right, up, and down. The zooming controls are used to zoom in to or out of the map. These controls are necessary, of course, because the user will want to focus in on various parts of the map.

What is more important about the navigational and zooming controls is that they are predefined operations used to retrieve values from the same repository. This is in stark contrast to the links surrounding the controls, which execute a query on an unrelated repository (unrelated, that is, to the map database). The predefined queries can be converted into standard operations such as zoom in, zoom out, pan left, pan right, pan up, and pan down.


Figure 4-1. Example data-mining application

The predefined queries also can be converted into look-ahead queries; for example, to pan left, you want to preload the map left of Denver. Preloading the map by using a background task will make the map application appear fluid. Figure 4-2 is an example application that uses preloading.

Like MapQuest, Maps.google.com is a mapping web application that provides the capability to pan and zoom. What makes Maps.google.com unique is that the map pieces that could be referenced as a result of one of the predefined operations are preloaded. If you experiment with the mapping application, you'll see that it is fluid. The application stops being fluid if you pan or zoom too quickly, while the preloading task is still busy loading other map pieces.

The Maps.google.com application is using a cache to preload map pieces. A cache can also be used to remember old pieces of data so that if they are referenced multiple times, they are not loaded multiple times.

A nontechnical reason for using a cache is legal. When creating web applications, very often you will be integrating other data sources, and those data sources are often very large databases (for example, Amazon.com). The data contained within those databases is not yours, and you cannot store it in your own database for future reference. Most end-user license agreements specifically state that the data that is retrieved does not belong to you. A cache, which holds the data only temporarily, increases the performance of your application without your having to store the data locally in violation of the license.


Figure 4-2. Example data-mining application that preloads map pieces

Applicability

In all cases, the Cache Controller pattern is a request proxy that decides whether information should be retrieved from the cache or whether a request should be made. This pattern is used in the following contexts:

Passive caching: Passive caching occurs when the request proxy manages the resources but does not preload any data. The purpose of a passive cache is to keep track of data that has already been loaded so that it is not unnecessarily reloaded. An example of passive caching is the referencing of configuration information. Configuration information for the most part does not change and is considered read-mostly. Additionally, unlike the map pieces in the mapping examples, configuration information has no related data to preload. There is typically only one piece of configuration information, and it can be loaded as a single block.


Predictive caching: Predictive caching implements passive caching but adds an action: when a request is made, related items are also loaded. An example of predictive caching is the Google Maps mapping application. The client makes a request for a map piece, and the predictive cache uses an algorithm to determine which related map pieces should also be loaded. It is important that the algorithm relates to the possible operations, which in the mapping example are the zooming and panning operations.
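The algorithm that "relates to the possible operations" can be sketched in Java. This is a minimal sketch, assuming (hypothetically) that map pieces are addressed as zoom/x/y tiles and that each zoom level doubles the number of tiles along each axis; the class and method names are illustrative and do not describe Google's actual scheme. For a given tile, it lists the tiles that are one pan or zoom operation away, which a predictive cache would then load in the background.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical predictor for a predictive cache of map pieces.
// Assumption: tiles are addressed as "zoom/x/y" and each zoom level
// doubles the number of tiles along each axis.
class MapTilePredictor {

    // Returns the keys of the tiles reachable by one predefined operation.
    static List<String> related(int zoom, int x, int y) {
        List<String> keys = new ArrayList<>();
        // Pan left, right, up, down: same zoom level, adjacent tile.
        addIfValid(keys, zoom, x - 1, y);
        addIfValid(keys, zoom, x + 1, y);
        addIfValid(keys, zoom, x, y - 1);
        addIfValid(keys, zoom, x, y + 1);
        // Zoom out: the parent tile that contains this one.
        if (zoom > 0) {
            addIfValid(keys, zoom - 1, x / 2, y / 2);
        }
        // Zoom in: the four child tiles that cover this one.
        for (int dy = 0; dy <= 1; dy++) {
            for (int dx = 0; dx <= 1; dx++) {
                addIfValid(keys, zoom + 1, 2 * x + dx, 2 * y + dy);
            }
        }
        return keys;
    }

    // Skips tiles that would fall off the edge of the map.
    private static void addIfValid(List<String> keys, int zoom, int x, int y) {
        int tilesPerAxis = 1 << zoom; // 2^zoom tiles along each axis
        if (x >= 0 && y >= 0 && x < tilesPerAxis && y < tilesPerAxis) {
            keys.add(zoom + "/" + x + "/" + y);
        }
    }
}
```

A predictive cache would call related() after serving each tile and queue background loads for the keys it has not cached yet.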

Associated Patterns

The Cache Controller pattern is used with other patterns. It is not used on its own because the pattern does not do anything by itself. As previously stated, the Cache Controller pattern is a proxy implementation that sits between the caller making the request and the server processing the request.

This does not mean that the Cache Controller pattern can be used with all patterns. The Cache Controller pattern can be used only in those situations where HTTP validation has been implemented on the server side. As you will see in the “Implementation” section of this chapter, validation on the server side is not typically implemented for custom functionality. As is illustrated in this chapter, it is possible to add HTTP validation for all situations, but there are still situations when HTTP validation does not make sense. That is usually when the data is not under the management of the web application or when the REST Based Model-View-Controller pattern is used.

Architecture

The essence of the Cache Controller pattern is the Proxy pattern. The Cache Controller pattern is a proxy to Asynchronous and implements the interface exposed by Asynchronous. The implementation of the Proxy pattern for the Cache Controller pattern is the implementation of a caching strategy. The focus of this section will be the definition and explanation of that caching strategy.
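In Java terms, the proxy relationship might look like the following sketch. The ResourceFetcher interface is a hypothetical stand-in for the interface exposed by Asynchronous, and the in-memory map is the simplest possible cache. Because the proxy implements the same interface as the object it wraps, callers cannot tell whether a response came from the cache or from a real request. Passing a predictor turns the passive cache into a predictive one; for brevity the related resources are loaded synchronously, where a real implementation would use a background task. The sketch illustrates only the proxy structure; the rest of this section argues for letting HTTP validation decide freshness rather than an unbounded in-memory map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical interface standing in for the one exposed by Asynchronous.
interface ResourceFetcher {
    String fetch(String url);
}

// Proxy that implements the same interface and decides whether a request
// is answered from the cache or forwarded to the real fetcher.
class CacheControllerProxy implements ResourceFetcher {
    private final ResourceFetcher target;
    private final Function<String, List<String>> predictor; // related URLs to preload
    private final Map<String, String> cache = new HashMap<>();

    CacheControllerProxy(ResourceFetcher target, Function<String, List<String>> predictor) {
        this.target = target;
        this.predictor = predictor;
    }

    // Passive caching: nothing is preloaded.
    CacheControllerProxy(ResourceFetcher target) {
        this(target, url -> Collections.emptyList());
    }

    @Override
    public String fetch(String url) {
        String content = load(url);
        // Predictive caching: also load the related resources.
        for (String related : predictor.apply(url)) {
            load(related);
        }
        return content;
    }

    // Forwards to the real fetcher only on a cache miss.
    private String load(String url) {
        return cache.computeIfAbsent(url, target::fetch);
    }
}
```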

There are two ways to implement caching: let the Internet infrastructure do as much as possible for you, or write code to help the Internet infrastructure do its work. As much as I find writing a caching algorithm interesting and fun, doing so would be a waste of time. Doing your own caching is hard because so many elements in the HTTP request chain are already caching data that you have a good chance of re-caching already cached data. By caching yet again, you are providing no added value.

HTML and HTTP Cache Directives

Letting the Internet infrastructure manage the caching is called using the HTTP Expiration model. There are two ways to control the caching by using the Internet infrastructure: adding HTML tags or adding HTTP identifiers.

To control the cache by using HTML tags, add meta tags to the head of the page, as in the following HTML:


<html>
<head>
  <title>Hanging Page</title>
  <meta http-equiv="Cache-Control" content="max-age=3600">
  <meta http-equiv="Expires" content="Tue, 01 Jan 1980 01:00:00 GMT">
</head>
<body>
...

The HTML tag meta has two attributes, http-equiv and content, that are used to mimic HTTP identifiers. The problem with using HTML meta tags is that they are intended to be consumed by a web browser. It is not possible to add the meta tag to an XML data stream. Therefore, it is not possible to use HTML-based cache control tags when streaming data other than HTML.

The second way to control caching by using the Internet infrastructure is to generate a set of HTTP identifiers, as illustrated by the following HTTP response:

HTTP/1.1 200 OK
Cache-Control: Public, max-age=3600
Expires: Wed, 10 Aug 2005 10:35:37 GMT
Content-Type: text/html;charset=ISO-8859-1
Content-Length: 39
Date: Wed, 10 Aug 2005 09:35:37 GMT
Server: Apache-Coyote/1.1

<html><body>Hello world</body></html>

The HTTP identifiers Cache-Control and Expires manage how the page is supposed to be cached. The Cache-Control identifier specifies a caching of the content for 3600 seconds, or one hour. The Expires identifier defines when the retrieved content is considered expired. Both identifiers make it possible for proxies or browsers to cache the HTTP-retrieved content by using the HTTP Expiration model.
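The relationship between the two identifiers is simple arithmetic: Expires should equal the Date of the response plus the max-age value. The following Java sketch (the class name is hypothetical) builds both identifiers from a single lifetime so that they cannot disagree. With a Date of Wed, 10 Aug 2005 09:35:37 GMT and a max-age of 3600, it reproduces the Expires value shown in the response above.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;

// Builds HTTP Expiration model identifiers so that Cache-Control max-age,
// Expires, and Date all describe the same lifetime.
class ExpirationHeaders {

    static Map<String, String> forMaxAge(ZonedDateTime now, long maxAgeSeconds) {
        ZonedDateTime utcNow = now.withZoneSameInstant(ZoneOffset.UTC);
        // RFC 1123 format, e.g. "Wed, 10 Aug 2005 10:35:37 GMT"
        DateTimeFormatter httpDate = DateTimeFormatter.RFC_1123_DATE_TIME;
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Cache-Control", "public, max-age=" + maxAgeSeconds);
        headers.put("Expires", httpDate.format(utcNow.plusSeconds(maxAgeSeconds)));
        headers.put("Date", httpDate.format(utcNow));
        return headers;
    }
}
```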

When used in the context of a script, the HTTP identifiers can be programmatically generated by using the following ASP.NET code:

<%@ Page Language="C#" %>
<%@ Import Namespace="System" %>
<%
    // Expires: one hour from now; Cache-Control: public
    Response.Cache.SetExpires(DateTime.Now.AddMinutes(60));
    Response.Cache.SetCacheability(HttpCacheability.Public);
%>
<html>
<head>
  <title>Cached Page</title>
</head>
<body>
  Hello world!
</body>
</html>


Using .NET, the methods SetExpires and SetCacheability will add the Expires and Cache-Control identifiers. To achieve the same effect by using a Java servlet, you would use the following code:

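import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class GenerateHeader extends HttpServlet {

    private static final long MAX_AGE_SECONDS = 3600; // one hour

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // Cache-Control and Expires must describe the same lifetime
        resp.addHeader("Cache-Control", "public, max-age=" + MAX_AGE_SECONDS);
        // setDateHeader formats the millisecond value as an HTTP date
        resp.setDateHeader("Expires", System.currentTimeMillis() + MAX_AGE_SECONDS * 1000);
        resp.setContentType("text/html");
        PrintWriter out = resp.getWriter();
        out.println("<html><body>Hello world</body></html>");
    }
}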

HTTP Expiration Caching Is a Bad Idea (Generally)

It is generally better to avoid the HTTP Expiration model and to use the second way of managing caching instead: writing code that helps the Internet infrastructure do its work. This second way is called the HTTP Validation model.

To understand why the HTTP Expiration model is problematic, consider the following scenario. You are running a website hosting news. To reduce the traffic on the website, you enable HTTP caching and assign an expiry of 30 minutes. (The expiry time is an arbitrary value used for illustrative purposes.) This means that once a browser downloads some content, it will not ask for a newer version for 30 minutes. Imposing a wait period of 30 minutes is a bad idea because in 30 minutes the news can change dramatically. A client who has downloaded some content is then restricted to retrieving news in 30-minute cycles. Of course, the client could ignore or empty the cache and download the latest information. If the client always empties the cache, the client will always get the latest news, but at the cost of downloading content that may not have changed. That cost should not surprise anyone, because always getting the latest content means using no caching whatsoever. Scripts such as Java servlets, JSP pages, or ASP.NET pages very often follow this no-caching strategy, and the administrator managing the website then wonders why there are performance problems.
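The scenario boils down to a single freshness test on the client. In the following Java sketch (simplified: it ignores the Age header and clock skew, and the names are hypothetical), content downloaded at 09:00 with a 30-minute lifetime is served from the cache at 09:10 without contacting the server, so a story published at 09:05 stays invisible until the cached copy goes stale at 09:30.

```java
import java.time.Duration;
import java.time.Instant;

// Client-side freshness check under the HTTP Expiration model.
// While a response is fresh, the cache serves it without asking the server.
class ExpirationFreshness {

    static boolean isFresh(Instant receivedAt, Duration maxAge, Instant now) {
        Duration age = Duration.between(receivedAt, now);
        // Fresh only while the age is strictly below the lifetime.
        return age.compareTo(maxAge) < 0;
    }
}
```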

A Better Approach: Using HTTP Validation

The better approach is to use the HTTP Validation model. This model sends each response with a ticket that uniquely identifies the version of the data. If the client wants to download the content again, the client sends the server the ticket from the last download. The server compares the sent ticket with the ticket that it has; if the tickets are identical, it sends an HTTP 304 to indicate that no changes have occurred. At that point, the client can retrieve the old content from the cache and present it to the user as the latest and greatest. The HTTP Validation model still requires an HTTP request, but does not include the cost of generating and sending the content again.
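On the client side, the ticket exchange can be sketched as follows. The ConditionalGet interface and ValidationResponse class are hypothetical stand-ins for a real HTTP client that adds an If-None-Match header when given a ticket; the sketch shows only the bookkeeping: remember each response's entity tag, send it back on the next request, and reuse the stored body when the server answers 304.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified response (hypothetical stand-in for a real HTTP client response).
class ValidationResponse {
    final int status;   // 200 or 304
    final String etag;  // entity tag sent by the server, may be null
    final String body;  // empty for 304

    ValidationResponse(int status, String etag, String body) {
        this.status = status;
        this.etag = etag;
        this.body = body;
    }
}

// Hypothetical transport: performs a GET, adding If-None-Match when non-null.
interface ConditionalGet {
    ValidationResponse get(String url, String ifNoneMatch);
}

// Client cache for the HTTP Validation model: every call still reaches the
// server, but unchanged content comes back as a body-less 304.
class ValidatingCache {
    private final ConditionalGet transport;
    private final Map<String, ValidationResponse> entries = new HashMap<>();

    ValidatingCache(ConditionalGet transport) {
        this.transport = transport;
    }

    String get(String url) {
        ValidationResponse cached = entries.get(url);
        String ticket = (cached == null) ? null : cached.etag;
        ValidationResponse response = transport.get(url, ticket);
        if (response.status == 304 && cached != null) {
            return cached.body; // the server says our copy is still current
        }
        if (response.status == 200 && response.etag != null) {
            entries.put(url, response); // remember the content and its ticket
        }
        return response.body;
    }
}
```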

In terms of an HTTP conversation, the HTTP Validation model is implemented as follows. This example illustrates a request from a client and the response from the server.

Request 1:

GET /ajax/chap04/cachedpage.html HTTP/1.1
Accept: */*
Accept-Language: en-ca
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50215)
Host: 127.0.0.1:8081
Connection: Keep-Alive

Response 1:

HTTP/1.1 200 OK
ETag: W/"45-1123668584000"
Last-Modified: Wed, 10 Aug 2005 10:09:44 GMT
Content-Type: text/html
Content-Length: 45
Date: Wed, 10 Aug 2005 10:11:54 GMT
Server: Apache-Coyote/1.1

<html>
<body>
Cached content
</body>
</html>

The client makes a request for the document /ajax/chap04/cachedpage.html. The server responds with the content, but there is neither a Cache-Control nor an Expires identifier. This seems to indicate that the returned content is not cached, but that is not true. The server has indicated that it is using the HTTP Validation model, and not the HTTP Expiration model. The page that is returned has become part of a cache identified by the unique ETag identifier. The ETag identifier, called an entity tag, could be compared to a unique hash code for an HTML page. The W/ prefix marks the entity tag as weak: two versions of the page that carry the same weak tag are considered equivalent, but they are not guaranteed to be byte-for-byte identical.
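The entity tag in Response 1 is not an opaque hash. Its two numbers appear to be the content length (45) and the file's last-modification time in milliseconds since 1970 (1123668584000 ms is exactly Wed, 10 Aug 2005 10:09:44 GMT, the Last-Modified value). The following Java sketch rebuilds the tag from those two values; treat the format as an observation about this Apache-Coyote (Tomcat) server, not a rule that other servers follow. It also shows why the tag is weak: two different files with the same length and timestamp would receive the same tag.

```java
import java.time.Instant;

// Rebuilds the weak entity tag seen in Response 1 from the resource's
// length and last-modified time (format observed from Apache-Coyote).
class WeakEtag {

    static String of(long contentLength, Instant lastModified) {
        return "W/\"" + contentLength + "-" + lastModified.toEpochMilli() + "\"";
    }
}
```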

The next step is to refresh the browser and ask for the same page again. The HTTP conversation is illustrated as follows.

Request 2:

GET /ajax/chap04/cachedpage.html HTTP/1.1
Accept: */*
Accept-Language: en-ca
Accept-Encoding: gzip, deflate
If-Modified-Since: Wed, 10 Aug 2005 10:09:44 GMT
If-None-Match: W/"45-1123668584000"
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50215)
Host: 192.168.1.100:8081
Connection: Keep-Alive

Response 2:

HTTP/1.1 304 Not Modified
Date: Wed, 10 Aug 2005 10:11:58 GMT
Server: Apache-Coyote/1.1

When the client makes the second request, the additional identifiers If-Modified-Since and If-None-Match are sent in the request. Notice how the identifier If-None-Match references the identifier of the previously sent ETag value. The server queries the URL and generates an entity tag. If the entity tag is identical to the value being sent, the server returns an HTTP 304 code to indicate that the content has not changed.

When using entity tags, the client can send an If-Match or an If-None-Match identifier. If the client sends an If-Match and the data on the server has changed, the server returns 412 Precondition Failed rather than the new data. If the client sends an If-None-Match and the server data is unchanged, the server sends an HTTP 304 return code; if the data has changed, the new data is sent.
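The server-side decision can be sketched in Java as follows. This is a simplified reading of the HTTP/1.1 conditional-request rules (RFC 2616, later restated in RFC 7232): If-None-Match uses weak comparison, so W/"x" matches "x"; If-Match uses strong comparison, so a weak tag never satisfies it; * matches any existing version; and a failed If-Match yields 412. The comma-splitting parser is an assumption that works for ordinary tags but not for tags that themselves contain commas.

```java
import java.util.Arrays;

// Server-side evaluation of entity-tag preconditions for a GET (simplified).
class Preconditions {

    // Returns the status to send: 412, 304, or 200 (send the full content).
    static int evaluateGet(String currentEtag, String ifMatch, String ifNoneMatch) {
        if (ifMatch != null && !matchesAny(ifMatch, currentEtag, true)) {
            return 412; // Precondition Failed: the client's version is out of date
        }
        if (ifNoneMatch != null && matchesAny(ifNoneMatch, currentEtag, false)) {
            return 304; // Not Modified: the client's cached copy is current
        }
        return 200;
    }

    // Checks a header value such as: W/"a", "b"   or   *
    static boolean matchesAny(String header, String current, boolean strong) {
        if (header.trim().equals("*")) {
            return current != null;
        }
        return Arrays.stream(header.split(","))
                .map(String::trim)
                .anyMatch(tag -> strong ? strongEquals(tag, current) : weakEquals(tag, current));
    }

    // Weak comparison: ignore the W/ prefix on either side.
    private static boolean weakEquals(String a, String b) {
        return b != null && opaque(a).equals(opaque(b));
    }

    // Strong comparison: both tags must be strong and identical.
    private static boolean strongEquals(String a, String b) {
        return b != null && !a.startsWith("W/") && !b.startsWith("W/") && a.equals(b);
    }

    private static String opaque(String tag) {
        return tag.startsWith("W/") ? tag.substring(2) : tag;
    }
}
```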

The advantage of using the HTTP Validation model of caching is that you are always guaranteed to get the latest version at the time of the request. The clients can make the request every couple of seconds, hours, weeks, or whatever period they choose. It is up to the client to decide when to get a fresh copy of the data. Granted, there is still some HTTP traffic due to the requests, but it has been reduced to a minimum.

Having said all that, there are situations when using the HTTP Expiration model does make sense—for example, when the HTML content is static and changes rarely. For the scope of this book and this pattern, it does not make sense to use the HTTP Expiration model because Ajax applications are inherently using data that does change.

Implementing HTTP validation is simple because the most popular web browsers and HTTP servers already implement it. In this chapter, I will discuss the details of implementing HTTP validation because there are some things the web browser and HTTP server do not do. However, building a more sophisticated infrastructure that supposedly enhances HTTP validation is not recommended, because doing so would defeat the facilities of HTTP 1.1.

Using the HTTP 1.1 infrastructure means that the server you are communicating with must have implemented the HTTP 1.1 protocol properly. If you are using Microsoft Internet Information Server, Apache Tomcat, or Jetty, you will have no problems. If you are using anything else, check that the server fully understands the HTTP 1.1 protocol; otherwise, you will have problems with excessive network communications. As an example recommendation, if you plan on using Mono, use mod_mono with Apache, and not just XSP. Although XSP (1.0.9) is a promising web server, it is not quite ready for prime time, at least at the time of this writing.

Some Findings Regarding Server-Side Caching

There is a fly in the ointment of HTTP validation: when implementing the server side of a web application, there is an inconsistency. Any file that the HTTP server manages directly has entity tags. But for content managed by an external application such as a Java servlet, ASP.NET, or a script, there are neither entity tags nor HTTP cache control directives. Consider the following request and response, which make up an HTTP conversation with a JSP page.

Request:

GET /ajax/chap04/index.jsp HTTP/1.1
Host: 127.0.0.1:8081
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.10) Gecko/20050716 Firefox/1.0.6
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

Response:

HTTP/1.1 200 OK
Set-Cookie: JSESSIONID=1B51C170A3F24A376BF2C3B98CF1C2C9; Path=/ajax
Content-Type: text/html;charset=ISO-8859-1
Content-Length: 333
Date: Thu, 11 Aug 2005 12:25:41 GMT
Server: Apache-Coyote/1.1

Additionally, for illustration purposes, consider the following HTTP conversation that retrieves an XML data set from the Amazon.com catalog.

Request:

GET /onca/xml?Service=AWSECommerceService&SubscriptionId=aaaaaaaa&Operation=ItemSearch&Keywords=Stephen+King&SearchIndex=Books HTTP/1.1
User-Agent: Wget/1.9.1
Host: webservices.amazon.com:8100
Accept: */*
Connection: Keep-Alive

Response:

HTTP/1.1 200 OK
Date: Thu, 11 Aug 2005 15:26:55 GMT
Server: Stronghold/2.4.2 Apache/1.3.6 C2NetEU/2412 (Unix) mod_fastcgi/2.2.12
x-amz-id-1: 1VQ2V7MESPAC6FNGFGDR
x-amz-id-2: lpxEwchCrLJfO3qopULlUMYzbcVx1QmX
Connection: close
Content-Type: text/xml; charset=UTF-8

In both responses, there was neither an ETag nor any HTTP cache control directive. This means that if the same HTTP request is made repeatedly, there will be multiple identical HTTP requests with multiple identically generated responses. As web application developers, we are trained to write server-side applications that generate content dynamically, and that has a ramification: content can never, ever, ever be cached. This is ironic because for many of our web applications the supposedly dynamic data is in fact static data, or at least mostly static data, that is converted from one form (for example, a database) into another form (for example, HTML).

It must be questioned whether the HTTP server is taking the right approach by not doing anything. The HTTP server cannot validate the content and therefore cannot know whether the content has changed. For content generated by a server-side application framework (for example, JSP), that reasoning is completely correct. What is incorrect is that the script itself does nothing to implement HTTP validation. When a script generates content, the script understands the underlying data structures and hence can determine whether the data has changed. Therefore, the server-side application can implement HTTP validation.
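Here is a minimal sketch of dynamic HTTP validation in Java, assuming a hypothetical book record with an ISBN, a title, and a version counter that the application increments on every update. The entity tag is derived from the data itself with SHA-256, so the script can compare it with If-None-Match and skip generating the page entirely; in a servlet, a null result would become a 304 response. The record fields, the truncation to 8 bytes, and the XML format are illustrative choices, not part of any framework.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Dynamic HTTP validation: the script derives an entity tag from the data
// it is about to render, so it can answer 304 without rendering anything.
class DynamicValidation {

    static String etagFor(String isbn, String title, long version) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            String key = isbn + "|" + title + "|" + version;
            byte[] hash = digest.digest(key.getBytes(StandardCharsets.UTF_8));
            StringBuilder tag = new StringBuilder("\"");
            for (int i = 0; i < 8; i++) { // first 8 bytes suffice for a tag
                tag.append(String.format("%02x", hash[i]));
            }
            return tag.append('"').toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    // Returns null when the client's copy is current (the caller sends 304);
    // otherwise renders the content (no XML escaping; illustration only).
    static String respond(String ifNoneMatch, String isbn, String title, long version) {
        String etag = etagFor(isbn, title, version);
        if (etag.equals(ifNoneMatch)) {
            return null;
        }
        return "<book isbn=\"" + isbn + "\"><title>" + title + "</title></book>";
    }
}
```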

There are two ways to implement HTTP validation: dynamic and static validation.

Defining Static HTTP Validation

In static HTTP validation, the HTTP server does the difficult work of calculating the entity tag. When an HTTP server encounters a file that is not processed by some framework (for example, .html or .png), it reads the file and calculates a value that identifies that version of the file. Suppose a server-side framework were to save a static copy of the content it generates. If the server-side framework were JSP, a Java filter could convert the generated JSP content into a static HTML file that is managed by the HTTP server and retrieved by the client. This requires that the server-side application know the difference between posting and retrieving data, as there is an updated state and a saved state. In technical implementation terms, it means that a state that previously existed only in a database must also be saved as a file or in another persistent storage medium that the HTTP server manages. When the state is modified, the server application is responsible for modifying the database and the file at the same time.
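The dual write described above might look like the following Java sketch. A HashMap stands in for the database, and the books/[ISBN].xml layout under a document root is a hypothetical choice matching the static URL used in this section's example. Once the file is written, the HTTP server computes its entity tag like that of any other static file. Note that the two writes are sequential, not transactional; a real implementation would have to decide what happens if the file write fails after the database update.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Static HTTP validation: every update writes the database AND regenerates
// the static file that the HTTP server serves and computes entity tags for.
class StaticPublisher {
    private final Map<String, String> database = new HashMap<>(); // stand-in for a real database
    private final Path docRoot; // hypothetical document root of the HTTP server

    StaticPublisher(Path docRoot) {
        this.docRoot = docRoot;
    }

    // Hypothetical update operation, invoked when the client posts new data.
    void updateTitle(String isbn, String title) throws IOException {
        database.put(isbn, title); // 1. update the authoritative state
        // 2. regenerate the static representation (no XML escaping; illustration only)
        String xml = "<book isbn=\"" + isbn + "\"><title>" + title + "</title></book>";
        Path target = docRoot.resolve("books").resolve(isbn + ".xml");
        Files.createDirectories(target.getParent());
        Files.write(target, xml.getBytes(StandardCharsets.UTF_8)); // truncates any old version
    }

    String titleOf(String isbn) {
        return database.get(isbn);
    }
}
```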

When the HTTP server manages the entity tag calculations, each resource has two separate representations. The retrieving representation is static and is managed by the HTTP server. The posting representation is dynamic and is managed by the server application framework. Technically speaking, from a browser perspective HTTP GET results in a file being retrieved, and HTTP POST or PUT results in data being posted to a JSP or ASP.NET file.

From a URL perspective, a static HTTP validation application would be similar to Figure 4-3. An individual book is retrieved by using its ISBN, which is unique for every book.

When retrieving a book, the static URL /ajax/books/[ISBN].xml is used. The URL maps to a file managed by the HTTP server. Because the file is managed by the HTTP server, when the client attempts to retrieve the document, the HTTP server will send an ETag identifier based on the file.

To update the file, the dynamic URL /ajax/servlet/LibrarianServlet is used. It is impossible to update the data by using the static URL, because posting to a static file results in nothing being updated. That a static file does nothing is fairly logical and is the reason why server application frameworks were created in the first place. The defined URL is processed by a Java servlet, but could just as easily have activated an ASP.NET page or some other web application framework. To update the content, the client sends an HTTP POST or an HTTP PUT to that URL; because the example uses an HTML form, which supports only GET and POST, HTTP POST is required.
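From the client's point of view, the split between the two representations reduces to choosing a URL by HTTP method. The following Java sketch uses the two URLs from the text; the isbn and title form-field names are hypothetical, since the text does not show the form, and URLEncoder.encode with a Charset argument requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Chooses between the static (retrieving) and dynamic (posting) representations.
class BookUrls {
    static final String UPDATE_URL = "/ajax/servlet/LibrarianServlet";

    // Static file served, and validated with an ETag, by the HTTP server.
    static String retrieveUrl(String isbn) {
        return "/ajax/books/" + isbn + ".xml";
    }

    static String urlFor(String method, String isbn) {
        switch (method) {
            case "GET":
                return retrieveUrl(isbn);
            case "POST":
            case "PUT":
                return UPDATE_URL;
            default:
                throw new IllegalArgumentException("Unsupported method: " + method);
        }
    }

    // application/x-www-form-urlencoded body for the POST (field names hypothetical).
    static String formBody(String isbn, String title) {
        return "isbn=" + URLEncoder.encode(isbn, StandardCharsets.UTF_8)
                + "&title=" + URLEncoder.encode(title, StandardCharsets.UTF_8);
    }
}
```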