Laboratory For Communications Engineering, Engineering Department
Trumpington Street, Cambridge, CB2 1PZ, UK
Computer Laboratory, William Gates Building, JJ Thomson Avenue, Cambridge, CB3 0FD, UK
Copyright is held by the author/owner(s).
WWW2002, May 7-11, 2002, Honolulu, Hawaii, USA.
Application-level web security refers to vulnerabilities inherent in the code of a web-application itself (irrespective of the technologies in which it is implemented or the security of the web-server/back-end database on which it is built). In the last few months application-level vulnerabilities have been exploited with serious consequences: hackers have tricked e-commerce sites into shipping goods for no charge, usernames and passwords have been harvested, and confidential information (such as addresses and credit-card numbers) has been leaked.
In this paper we investigate new tools and techniques which address the problem of application-level web security. We (i) describe a scalable structuring mechanism facilitating the abstraction of security policies from large web-applications developed in heterogeneous multi-platform environments; (ii) present a tool which assists programmers in developing secure applications which are resilient to a wide range of common attacks; and (iii) report results and experience arising from our implementation of these techniques.
On the 25th January, 2001, an article appeared in a respected British newspaper entitled ``Security Hole Threatens British E-tailers''. The article described how a journalist hacked a number of e-commerce sites, successfully buying goods for less than their intended prices. The attacks resulted in a number of purchases being made for 10 pence each, including an internet domain name (ivehadyou.org.uk), a ``Wales Direct'' calendar and tickets for a Jimmy Nail pop concert. The author of the article rightly observes that the process ``requires no particular technical skill''; the attack merely involves saving the HTML form to disk, modifying the price (stored in a hidden form field) using a text editor and reloading the HTML form back into the browser. A recent article published in ZD-Net suggests that between 30% and 40% of e-commerce sites throughout the world are vulnerable to this simple attack. Internet Security Systems (ISS) identified eleven widely deployed commercial shopping-cart applications which suffer from the vulnerability.
The price-changing attack is a consequence of an application-level security hole. We use the term application-level web security to refer to vulnerabilities inherent in the code of a web-application itself (irrespective of the technology in which it is implemented or the security of the web-server/back-end database on which it is built). Most application-level security holes arise because web applications mistakenly trust data returned from a client. For example, in the price-changing attack, the web application makes the invalid assumption that a user cannot modify the price because it is stored in a hidden field.
Application-level security vulnerabilities are well known and many articles have been published advising developers on how they can be avoided [22,23,28]. Fixing a single occurrence of a vulnerability is usually easy. However, the massive number of interactions between different components of a dynamic website makes application-level security challenging in general. Despite numerous efforts to tighten application-level security through code-review and other software-engineering practices, the fact remains that a large number of professionally designed websites still suffer from serious application-level security holes. This evidence suggests that higher-level tools and techniques are required to address the problem.
In this paper we present a structuring technique which helps designers abstract security policies from large web applications. Our system consists of a specialised Security-Policy Description Language (SPDL) which is used to program an application-level firewall (referred to as a security gateway). Security policies are written in SPDL and compiled for execution on the security gateway. The security gateway dynamically analyses and transforms HTTP requests/responses to enforce the specified policy.
The remainder of the paper is structured as follows: Section 2 surveys a number of application-level attacks and discusses some of the reasons why application-level vulnerabilities are so prevalent in practice. In Section 3 we describe the technical details of our system for abstracting application-level web security. Our methodology is illustrated with an extended example in Section 4 and we discuss how the ideas in this paper may be generalised in Section 5. We have implemented the techniques discussed in this paper. The performance of our implementation is evaluated in Section 6. Related work is discussed in Section 7; finally, Section 8 concludes.
We start by briefly categorising and surveying a number of common application-level attacks. We make no claims regarding the completeness of this survey; the vulnerabilities highlighted here are a selection of those which we feel are particularly important.
The stateless nature of the HTTP protocol leaves designers with the task of managing application state across multiple requests. It is often easier to thread state through a series of request/responses using hidden form fields than it is to store data in a back-end database. Unfortunately, using hidden form fields in this way enables the client to modify internal application state, leading to vulnerabilities such as the price-changing attack described in the Introduction. It is interesting to note that a respected textbook on HTML recommends this dangerous practice without any mention of security issues.
Form modification is often used in conjunction with other attacks. For example, changing MaxLength constraints on the client may expose buffer overruns and SQL errors on the server side. Information gleaned from such failures provides insights into the internal structure of the site possibly highlighting areas where it is particularly vulnerable.
Although writing server-side code to handle form input securely may not be cerebrally taxing, it is a tedious, time-consuming and error-prone task which is rarely undertaken correctly (if at all) in practice.
Web applications commonly use data read from a client to construct SQL queries. Unfortunately constructing the query naïvely leads to a vulnerability where the user can execute arbitrary SQL against the back-end database. The attack is best illustrated with a simple example:
Consider an Employee Directory Website (written in the popular scripting language PHP) which prompts a user to enter the surname of an employee to search for by means of a form-box called searchName. On the server-side this search string (stored in the variable $searchName) is used to build an SQL query. This may involve code such as:
$query = "SELECT forenames,tel,fax,email FROM personal WHERE surname='$searchName'; ";
However, if the user enters the following text into the searchName form box:
'; SELECT password,tel,fax,email FROM personal WHERE surname='Sharp
then the value of the variable $query will become:
SELECT forenames,tel,fax,email FROM personal WHERE surname=''; SELECT password,tel,fax,email FROM personal WHERE surname='Sharp';
When executed on some SQL databases, this will result in Sharp's password being returned instead of his forename. (Even if only a hash of the password is leaked, a forward-search attack against a standard dictionary stands a reasonable chance of recovering the actual password.)
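The contrast between naive query construction and a parameterised query can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module rather than the PHP of the example above; the table contents and function names are hypothetical. Because sqlite3 refuses to execute two statements in one call, the injected text here uses UNION instead of a second statement separated by a semicolon.

```python
import sqlite3

def make_directory():
    """Build a throwaway in-memory table shaped like the example's `personal` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE personal (forenames TEXT, surname TEXT, password TEXT, tel TEXT)"
    )
    conn.executemany(
        "INSERT INTO personal VALUES (?, ?, ?, ?)",
        [("Alice", "Sharp", "s3cret", "555-0100"),
         ("Bob", "Jones", "hunter2", "555-0101")],
    )
    return conn

def search_naive(conn, search_name):
    # Vulnerable: user input is spliced directly into the SQL text.
    query = "SELECT forenames, tel FROM personal WHERE surname='" + search_name + "'"
    return conn.execute(query).fetchall()

def search_parameterised(conn, search_name):
    # Safer: the value is passed separately and is never parsed as SQL.
    return conn.execute(
        "SELECT forenames, tel FROM personal WHERE surname=?", (search_name,)
    ).fetchall()

conn = make_directory()
attack = "' UNION SELECT password, tel FROM personal WHERE surname='Sharp"
leaked = search_naive(conn, attack)        # returns the password column
safe = search_parameterised(conn, attack)  # no surname matches the literal text
```

With the naive version the attacker's text becomes part of the query; with the parameterised version it is only ever compared against the surname column.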
It is well known that Cross-Site Scripting (XSS) vulnerabilities can be fixed by encoding HTML meta-characters explicitly using HTML's &#<n>; syntax, where <n> is the numerical representation of the encoded character. However, the flexibility of HTML makes this a more complicated task than many people realise. Furthermore, for large applications, it is a laborious and error-prone task to ensure that all input from the user has been appropriately HTML encoded.
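As a small illustration of HTML-encoding user input, the sketch below uses html.escape from Python's standard library, plus a hypothetical helper, numeric_encode, that writes every non-alphanumeric character as a numeric character reference. Neither is taken from the system described in this paper, and encoding element content in this way does not by itself cover every context (attributes, scripts, URLs) in which user input may appear.

```python
import html

def render_greeting(user_input: str) -> str:
    # html.escape replaces &, < and > and, by default, both quote characters.
    return "<p>Hello, " + html.escape(user_input) + "</p>"

def numeric_encode(text: str) -> str:
    # Write every non-alphanumeric character as a numeric reference (&#n;).
    return "".join(ch if ch.isalnum() else "&#%d;" % ord(ch) for ch in text)

page = render_greeting("<script>alert(1)</script>")
```

The script tag in the input reaches the browser as inert text rather than as markup.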
In this section we discuss a number of factors which contribute to the prevalence of application-level security vulnerabilities. We believe that each of the problems listed below points to the same solution: the security policy should be applied at a higher-level, removing security-related responsibilities from coders whenever possible.
A major cause of application-level security vulnerabilities is a general lack of language-level support in popular untyped scripting languages. For example, consider the languages PHP and VB-Script. When using these languages it is the job of the programmer to manually verify that all user input is appropriately HTML-encoded. Inadvertently omitting a call to the HTML-encoding function results in a vulnerability being introduced. For large applications written in such languages it is inevitable that a few such vulnerabilities will creep in. (Note that some technologies provide greater language-level support in this respect: when using typed languages, such as Java, the type-system can be employed to statically verify that all user input has been passed through an HTML-encoding function; Perl's taint mode offers similar guarantees but through run-time checks rather than compile-time analysis.)
If web applications were written in a single programming language by a small number of developers then one could separate the security policy from the main body of code by abstracting security-related library functions behind a clean API. However, in reality large web applications often consist of a large number of interacting components written in different programming languages by separate teams of developers. To complicate the situation further, some of these components may be bought in from third-party developers (possibly in binary form). In such an environment it is difficult to abstract common code-blocks into libraries. The inevitable consequence is that security-critical code is scattered throughout the application in an unstructured way. This lack of structure makes fixing vulnerabilities difficult: the same security hole may have to be fixed several times throughout the application.
Another major issue, albeit a non-technical one, is a lack of concern for security in the web-development community. Although we realise that this is a generalisation, evidence suggests that factors such as time-to-market, graphic design and usability are generally considered higher priority than application-level security. We recently talked with some web-developers working for a large telecommunications company; they were surprised to hear of the attacks outlined in Section 2 and had taken no steps to protect against them.
In this paper we present tools and techniques which protect websites from application-level attacks. Whilst we recognise that our proposed methodology is not a panacea, we claim that it does help to protect against a wide range of common vulnerabilities.
Our system consists of a number of components:
Figure 1 shows a diagrammatic view of the components of our system and the interactions between them. Note that the security-gateway does not have to run on a dedicated machine: it could be executed as a separate process on the existing web-server or, to achieve better performance, integrated into the web-server directly.
A designer codes a set of validation constraints and transformation rules in SPDL. Validation constraints place restrictions on data in cookies, URL parameters and forms. For example, typical constraints include ``the value of this cookie must be an integer between 1 and 3'' and ``the value of this (hidden) form field must never be modified''. The transformation rules of an SPDL specification allow a programmer to specify various transformations on user-input. Typical transformations include ``pass data from all fields on form f through an HTML-encoding function'' or ``escape all single and double quotes in text submitted via this URL parameter''. A detailed description of SPDL is given in Section 3.2.
The policy compiler translates SPDL into code which enforces validation rules and applies the specified transformations. The generated code is dynamically loaded into the security gateway where it is executed in order to enforce the specified policy. The security gateway acts as an application-level firewall; its job is to intercept, analyse and transform whole HTTP messages (see Section 3.4). As well as checking HTTP requests, the security gateway also rewrites the HTML in HTTP responses, annotating it with Message Authentication Codes (MACs) to protect state which may have been maliciously modified by clients (see Section 3.4.2).
At the top level an SPDL specification is an XML document. The DTD corresponding to SPDL is shown in Figure 2. A policy element contains a series of URL and cookie elements.
<!ELEMENT policy (URL*, cookie*)>
<!ELEMENT URL (parameter*)>
<!ATTLIST URL prefix CDATA #REQUIRED>
<!ELEMENT parameter (validation*, transformation*)>
<!ATTLIST parameter method (GET | POST | GETandPOST) "GETandPOST">
<!ATTLIST parameter name CDATA #REQUIRED>
<!ATTLIST parameter maxlength CDATA #REQUIRED>
<!ATTLIST parameter minlength CDATA "0">
<!ATTLIST parameter required (Y | N) "N">
<!ATTLIST parameter MAC (Y | N) "Y">
<!ATTLIST parameter type (int | float | bool | string) #REQUIRED>
<!ELEMENT cookie (validation*, transformation*)>
<!ATTLIST cookie name CDATA #REQUIRED>
<!ATTLIST cookie maxlength CDATA #REQUIRED>
<!ATTLIST cookie minlength CDATA "0">
<!ATTLIST cookie MAC (Y | N) "Y">
<!ATTLIST cookie type (int | float | bool | string) #REQUIRED>
<!ELEMENT validation (#PCDATA)>
<!ELEMENT transformation (#PCDATA)>
<!ATTLIST transformation htmlencode (Y | N) "Y">
For each URL element a number of parameters are declared. The attributes of a parameter element with name=p place constraints on data passed via p:
The method attribute determines whether the specified constraints apply to p passed as a GET-parameter (i.e. a URL argument) or a POST-parameter (i.e. returned from a form). Setting method to GETandPOST means that the constraints within the parameter element are applicable to both GET and POST parameters with name=p. (The GETandPOST option is particularly useful if parts of a web-application are written in a language which does not force a distinction between GET and POST parameters with the same name--e.g. PHP.)
For example, consider the following security policy description:
<policy>
  <URL prefix="http://example">
    <parameter name="p1" maxlength="4" type="int" required="Y" MAC="N">
    </parameter>
    <parameter name="p2" method="POST" maxlength="3" type="string">
    </parameter>
  </URL>
</policy>
This example specifies constraints on parameters passed to URLs with prefix ``http://example''.
The first parameter element defines constraints to be applied to a parameter named p1 (either GET or POST); the second parameter element defines constraints to be applied to a POST parameter named p2.
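To make the meaning of the parameter attributes concrete, the following sketch checks a single submitted value against the method, required, minlength, maxlength and type attributes. It is an illustration only, not the code generated by the policy compiler: the ParamSpec class, the check_param function, the error strings and the regular expressions chosen for each type are all assumptions.

```python
import re
from dataclasses import dataclass

@dataclass
class ParamSpec:
    name: str
    type: str                    # "int", "float", "bool" or "string"
    maxlength: int
    minlength: int = 0
    required: bool = False
    method: str = "GETandPOST"   # "GET", "POST" or "GETandPOST"

# Assumed textual form of each type; the real system may accept other spellings.
_PATTERNS = {
    "int": re.compile(r"-?[0-9]+"),
    "float": re.compile(r"-?[0-9]+(\.[0-9]+)?"),
    "bool": re.compile(r"true|false"),
    "string": re.compile(r".*", re.DOTALL),
}

def check_param(spec, method, value):
    """Return a list of violations for one parameter (an empty list means valid)."""
    if value is None:
        return [f"{spec.name}: required parameter missing"] if spec.required else []
    if spec.method != "GETandPOST" and spec.method != method:
        return [f"{spec.name}: not accepted via {method}"]
    errors = []
    if not spec.minlength <= len(value) <= spec.maxlength:
        errors.append(f"{spec.name}: length {len(value)} outside allowed range")
    if not _PATTERNS[spec.type].fullmatch(value):
        errors.append(f"{spec.name}: not a valid {spec.type}")
    return errors

# The two declarations from the policy above.
p1 = ParamSpec("p1", "int", maxlength=4, required=True)
p2 = ParamSpec("p2", "string", maxlength=3, method="POST")
```

For instance, check_param(p1, "GET", "12345") reports a length violation, while check_param(p2, "GET", "abc") reports that p2 is only accepted as a POST parameter.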
We hope that the attributes of parameter elements cover the majority of validation constraints that designers require. However, in some circumstances a greater degree of control is required: this is provided by the validation element. The validation element allows complex constraints to be encoded in a general purpose validation language. The content of the validation element is a validation expression written in a simple, call-by-value, applicative language which is essentially a simply-typed subset of Standard ML. (Note that the precise details of the language are not the main focus of this paper. In principle any language could be used to express validation constraints. For expository purposes, we choose to make the language as simple as possible.)
The abstract syntax of the validation language is shown in Figure 3. A well-formed validation expression has type boolean. If the validation expression of parameter p evaluates to true then this signifies that p contains valid data; conversely evaluating to false highlights a validation failure. Badly typed validation programs are rejected by a compile-time type-checking phase (see Section 3.3). Within validation expressions, the value of the field specified in the enclosing parameter element is referred to as this. Values of other (declared) GET and POST parameters can be referenced as getparam.name and postparam.name respectively. In this way validation rules can be dependent on the values of multiple parameters.
A number of primitive-defined functions and binary operators are provided. Although we do not list them all here, those of particular importance are outlined below:
Transformation rules are much simpler than validation expressions and are delimited by the <transformation> tag. The contents of a transformation element nested within a parameter element, p, specify a pipeline of transformations to be applied to data received via p. For example, if we always wanted to apply transformation t1 followed by t2 to parameter p passed via a given URL then our SPDL specification would contain:
<URL prefix="...">
  <parameter name="p" ...>
    <transformation> t1 | t2 </transformation>
  </parameter>
</URL>
Transformations are selected from a pre-defined library. In our current implementation we have defined the following transformations:
Facility is provided for the user to define other transformations and include them in the library.
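A transformation pipeline of the form t1 | t2 can be read as left-to-right function composition over strings. The sketch below illustrates this reading; the TRANSFORMATIONS dictionary is a hypothetical stand-in for the library, and the behaviour assigned to each name (backslash-escaping quotes, html.escape for HTML-encoding) is an assumption, not a description of the implementation.

```python
import html
from functools import reduce

# Hypothetical library: names follow the SPDL examples, behaviour is assumed.
TRANSFORMATIONS = {
    "EscapeSingleQuotes": lambda s: s.replace("'", "\\'"),
    "EscapeDoubleQuotes": lambda s: s.replace('"', '\\"'),
    "HTMLEncode": html.escape,
}

def compile_pipeline(source: str):
    """Turn the text of a transformation element into a single string function."""
    names = [name.strip() for name in source.split("|")]
    stages = [TRANSFORMATIONS[name] for name in names]  # KeyError on unknown names
    # Apply the stages from left to right, feeding each result to the next stage.
    return lambda value: reduce(lambda acc, stage: stage(acc), stages, value)

pipeline = compile_pipeline("EscapeSingleQuotes | EscapeDoubleQuotes")
escaped = pipeline("O'Neil said \"hi\"")
```

A user-defined transformation would be added by registering another string-to-string function under a new name.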
We consider the HTML-encoding transformation to be of particular importance since inadvertently forgetting to HTML-encode user-input leads to Cross-Site Scripting vulnerabilities (see Section 2). For this reason we adopt the convention that all parameters are HTML-encoded unless explicitly specified otherwise in the security policy. To turn off HTML-encoding one must set the htmlencode attribute of the transformation element to N. For example one may write:
... <transformation htmlencode="N"> PartialHTMLEncode | EscapeSingleQuotes </transformation> ...
Recall from Figure 2 that, at the top-level, an SPDL description consists of a series of URL and cookie elements. We have already discussed URL elements in detail; in a similar fashion, cookie elements allow designers to place validation constraints on cookies returned from clients' machines. In this presentation we make the simplifying assumption that cookies are global across the whole site (i.e. the path attributes of all Set-Cookie headers in HTTP responses are set to ``/''). Under these circumstances the client sends the values of all the application's cookies with each HTTP-request. Since all client-side state is sent to the server in each request we can generate MACs securely without requiring server-side state in the security gateway (see Section 3.4.2).
Compilation is performed in two passes. In the first pass the declared parameters and their types are enumerated; in the second pass the contents of the validation and transformation elements are compiled. Using a two-pass architecture allows the use of forward parameter references. For example, consider a URL element, u, which contains declarations of parameters p1 and p2, where p1 is declared before p2. It is perfectly acceptable for the validation code of parameter p1 to refer to p2 (and vice-versa).
Validation expressions are type-checked at compile-time, helping to eliminate errors from SPDL validation code. In the current incarnation of the system, validation expressions are simply-typed (that is, we do not allow parametric polymorphism). However, should experience show this to be too inflexible, there is no reason why more sophisticated type-systems (e.g. ML style polymorphism) could not be employed in future versions.
Figure 4 shows the algorithm executed by the Security Gateway on receipt of an HTTP request. First, the URL is extracted from the HTTP header. This is used to select the appropriate validation rules and transformations to apply. If the URL does not match any of those specified in the security policy then the request is not propagated to the web-server and an error page is returned to the user. By forbidding all URLs that do not match those explicitly listed in our database we prevent a cracker from using obscure, non-standard URL encoding techniques to circumvent the security gateway (thus avoiding attacks of the kind recently used on Cisco's Intrusion Detection System). Rejecting unspecified URLs also provides an engineering benefit: since each URL requires a corresponding SPDL definition, engineers are forced to keep the security policy in sync with the application.
Having identified a valid URL, the security gateway proceeds to check the names of all parameters and cookies passed in the HTTP request. Errors are generated if (i) any of the parameters present are not declared in the SPDL policy; (ii) any of the required parameters are missing; or (iii) the cookies present do not precisely match those specified in the SPDL specification. Once we are sure that the HTTP message contains a valid combination of cookies and GET/POST parameters, type and length constraints are checked. If any violations occur at this stage then a descriptive error message is returned to the client. The security gateway then checks that the message authentication code is valid. Section 3.4.2 describes this process in detail.
Next the transformations specified in the security policy are applied. Transformations are total functions on strings--well written transformation code should not generate exceptions. However, if a badly written transformation function does generate a run-time exception then the process is aborted and an error message is returned to the client. Finally all validation expressions are evaluated. If all of the validation expressions evaluate to true then the HTTP request is forwarded to the web-server and the page is fetched.
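The order of the checks described above can be summarised in a short sketch. It is an outline under several assumptions, not the gateway's code: the policy is represented as plain dictionaries of callables, the Rejected exception stands for returning an error page, the MAC check is passed in as a function, and validation is assumed to see the transformed values.

```python
class Rejected(Exception):
    """Signals that the gateway should answer with an error page."""

def handle_request(policy, url, params, cookies, mac_is_valid):
    # 1. Default deny: the URL must match a declared prefix.
    rules = next((r for prefix, r in policy["urls"].items()
                  if url.startswith(prefix)), None)
    if rules is None:
        raise Rejected("URL not covered by the policy")
    # 2. Parameter and cookie names must match the declarations.
    if set(params) - set(rules):
        raise Rejected("undeclared parameter")
    if any(r["required"] and name not in params for name, r in rules.items()):
        raise Rejected("required parameter missing")
    if set(cookies) != set(policy["cookies"]):
        raise Rejected("unexpected or missing cookie")
    # 3. Type and length constraints.
    if not all(rules[n]["well_formed"](v) for n, v in params.items()):
        raise Rejected("type or length violation")
    # 4. Message authentication code.
    if not mac_is_valid(params, cookies):
        raise Rejected("MAC check failed")
    # 5. Transformations; any exception aborts the request.
    try:
        transformed = {n: rules[n]["transform"](v) for n, v in params.items()}
    except Exception as exc:
        raise Rejected("transformation failed") from exc
    # 6. Validation expressions.
    if not all(rules[n]["validate"](v) for n, v in transformed.items()):
        raise Rejected("validation failed")
    return transformed  # would now be forwarded to the web-server

# Hypothetical policy: one integer parameter whose value must lie between 1 and 3.
policy = {
    "urls": {"http://example/": {
        "p1": {"required": True,
               "well_formed": lambda v: v.isascii() and v.isdigit() and len(v) <= 4,
               "transform": str.strip,
               "validate": lambda v: 1 <= int(v) <= 3},
    }},
    "cookies": [],
}
forwarded = handle_request(policy, "http://example/a.asp", {"p1": "2"}, {},
                           lambda p, c: True)
```

Each stage either raises Rejected or lets the request continue, so a request reaches the web-server only if every stage succeeds.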
We have already seen that an SPDL specification can declare that certain URL parameters must only contain data accompanied by a Message Authentication Code (MAC) generated by the security gateway. As data is sent to the client, the security gateway annotates it with MACs; as data is returned from clients the MACs are checked. In this way we prevent users from modifying data which should not be changed on the client-side (e.g. security-critical hidden form-fields).
Consider an ordered list of values, l. We write mac(l) to denote the message authentication code corresponding to l. In our current implementation the value of mac(l) is calculated as the MD5-hash of a string containing the values of l concatenated together along with a time-stamp and a secret. The secret is a value which is not known by the client; since clients do not know the secret they cannot construct their own MACs.
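A minimal sketch of mac(l) as described above, using hashlib from Python's standard library. The order in which the values, time-stamp and secret are combined, and the length prefix added to each item so that different lists cannot produce the same string, are assumptions of this sketch; the text only states that the items are concatenated. Python's hmac module offers a standard keyed construction that could be substituted for the plain hash.

```python
import hashlib

def mac(values, timestamp, secret):
    """MD5 over the values of l, a time-stamp and a secret held by the gateway."""
    parts = list(values) + [timestamp, secret]
    # Assumed encoding: prefix each item with its length before concatenating,
    # so that ["ab", "c"] and ["a", "bc"] do not hash to the same value.
    material = "".join(f"{len(p)}:{p}" for p in parts)
    return hashlib.md5(material.encode("utf-8")).hexdigest()

code = mac(["4", "5"], "13eaf49b", "gateway-secret")
```

A client who alters a value or the time-stamp cannot recompute a matching code without knowing the secret.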
Before describing the algorithm used to annotate the generated HTML with MACs we make a few auxiliary definitions. Consider a list of pairs, l = [(k1, v1), ..., (kn, vn)]. We define sort(l) to be l sorted by k-values and vals(l) to be the list [v1, ..., vn]. Function sortVals is the composition of sort and vals (i.e. sortVals(l) = vals(sort(l))). Appending of lists is performed by the binary infix operator `@'. Now consider an HTTP request, req, which triggers response res. The algorithm for annotating the body of res with MACs is as follows:
For example consider the URL:
http://example/a.asp?p1=4&p2=5
In the case where both p1 and p2 require a MAC then this URL will be re-written, taking the form:
http://example/a.asp?p1=4&p2=5&mac=3a53fe1d995a23&time=13eaf49b
where the parameter mac stores the message authentication code corresponding to p1=4, p2=5 with the time-stamp stored in parameter time.
When data is received from a client via GET/POST parameters then the values of those parameters which have their MAC attribute set to ``Y'' are fed back into the MAC generation algorithm described above. Note that the original time-stamp (returned from the client as a GET parameter) is also required to recompute the MAC. We compare the recalculated MAC with the MAC returned from the client in order to determine whether any parameters were tampered with.
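The URL rewriting and the corresponding check can be sketched as below. This is a simplified illustration covering URL parameters only (cookies and form fields are left out); SECRET, annotate_url and verify_url are hypothetical names, the length-prefixed encoding inside mac is an assumption, and sort_vals follows the definition of sortVals given earlier.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

SECRET = "gateway-secret"  # hypothetical; known only to the gateway

def sort_vals(pairs):
    """sortVals: sort (key, value) pairs by key, then keep only the values."""
    return [v for _, v in sorted(pairs, key=lambda kv: kv[0])]

def mac(values, timestamp):
    parts = list(values) + [timestamp, SECRET]
    material = "".join(f"{len(p)}:{p}" for p in parts)
    return hashlib.md5(material.encode("utf-8")).hexdigest()

def annotate_url(url, protected, timestamp=None):
    """Append mac and time parameters covering the parameters named in `protected`."""
    timestamp = timestamp or format(int(time.time()), "x")
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    covered = [(k, v) for k, v in pairs if k in protected]
    extra = [("mac", mac(sort_vals(covered), timestamp)), ("time", timestamp)]
    return urlunsplit(parts._replace(query=urlencode(pairs + extra)))

def verify_url(url, protected):
    """Recompute the MAC from the returned parameters and compare it."""
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    fields = dict(pairs)
    if "mac" not in fields or "time" not in fields:
        return False
    covered = [(k, v) for k, v in pairs if k in protected]
    expected = mac(sort_vals(covered), fields["time"])
    # Constant-time comparison of the recomputed and returned codes.
    return hmac.compare_digest(expected.encode(), fields["mac"].encode())

signed = annotate_url("http://example/a.asp?p1=4&p2=5", {"p1", "p2"},
                      timestamp="13eaf49b")
```

Changing p1 or p2 in the signed URL causes verify_url to return False, because the recomputed code no longer matches the one returned by the client.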
When designing the MAC algorithm one of our major concerns was to avoid replay attacks where clients replay messages already annotated with MACs in unexpected contexts. We take two steps to avoid such attacks:
Despite these preventative measures, the responsibility for ensuring that replay attacks are not damaging ultimately rests with the security policy designer. For example, in the case study of Section 4 a MAC is generated for both the productID and Price fields. Although users can replay such messages this results in multiple purchases of the same product for the correct price. The key is that the MAC prevents the Price and productID being modified independently.
As well as applying the validation and transformation rules of SPDL specifications, our security gateway performs a number of other tasks. Two of these are described in this section.
Select parameters (delimited using the <select> tag in HTML forms) invite users to choose options from a pre-specified list. Although web designers often make the assumption that clients will only select values present in the list, a simple form-modification attack allows clients to submit arbitrary data in select parameters.
The security gateway protects against such an attack, preventing clients from submitting values for select parameters that were not present in the original HTML form. Our implementation involves the use of a control field which encodes dynamically generated lists of valid values for select parameters. The control-field is a hidden form-field generated automatically by the security-gateway and inserted into forms which contain <select>s. When form parameters are returned to the server, the value of the control field is decoded and used to validate values of select parameters. To prevent the control field being maliciously modified by clients, its value is included in the calculation of the form's message authentication code.
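One possible encoding of such a control field is sketched below: the valid option values are serialised as JSON and base64-encoded into a single hidden field, then decoded and compared when the form is returned. The encoding and the function names are assumptions of this sketch. Integrity protection is not shown; as described above, it relies on the control field being covered by the form's MAC.

```python
import base64
import json

def encode_control_field(selects):
    """`selects` maps each <select> name to the option values sent to the client."""
    payload = json.dumps(selects, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(payload).decode("ascii")

def check_selects(control_field, submitted):
    """Return the names of select parameters whose submitted value was not offered."""
    allowed = json.loads(base64.urlsafe_b64decode(control_field.encode("ascii")))
    return [name for name, options in allowed.items()
            if name in submitted and submitted[name] not in options]

control = encode_control_field({"size": ["S", "M", "L"]})
violations = check_selects(control, {"size": "XXL"})
```

Because the list of valid values travels with the form, the gateway needs no server-side record of which options it offered.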
Web applications often consist of a number of files containing embedded code (e.g. PHP or VBscript) which is executed on the server-side in order to generate dynamic responses to client requests. A number of attacks on web-servers (or indeed badly configured web-servers) can result in this embedded code being transmitted to the client in source form. This can be potentially devastating since it gives crackers detailed information about the inner-workings of the application. For example, in some (badly written) applications the code contains plaintext passwords used to authenticate against back-end database servers.
We have demonstrated that our security gateway can protect against such attacks by searching for sequences of characters which delimit embedded code in HTTP responses (e.g. <%, %> for ASP or <?php, ?> for PHP). Detection of such delimiters implies (with reasonable probability) that server-side code is about to be leaked. Hence, if any delimiters are found the security gateway filters the HTTP response and returns a suitable error message to the client informing them that the page they requested is unavailable.
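The delimiter search can be illustrated in a few lines. The delimiter list comes from the examples in the text; the function name and the wording of the error page are hypothetical. As the text notes, detection is only probabilistic: a legitimate response containing, say, an XML declaration ending in ?> would also be filtered by this sketch.

```python
CODE_DELIMITERS = ("<%", "%>", "<?php", "?>")

UNAVAILABLE_PAGE = "<html><body>The page you requested is unavailable.</body></html>"

def filter_response(body: str) -> str:
    """Replace the response with an error page if it appears to contain server-side code."""
    if any(delimiter in body for delimiter in CODE_DELIMITERS):
        return UNAVAILABLE_PAGE
    return body

leaky = "<html><?php $password = 'secret'; ?></html>"
clean = "<html><body>Hello</body></html>"
```

Here filter_response(leaky) yields the error page, while filter_response(clean) returns the body unchanged.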
To illustrate our methodology we consider using our system to secure a simple e-commerce system. Consider the following scenario:
As a final step in a purchasing transaction, users are sent an HTML form requesting their surname, credit-card number and its expiry date. The price and product-ID are stored in hidden form fields on the form. For example, when purchasing a product with productID=144264, the form sent to the client is as follows:
<form method="POST" action="http://www.example/buy.asp">
  <input type="hidden" name="price" value="423.54">
  <input type="hidden" name="productID" value="144264">
  <input type="text" name="surname">
  <input type="text" name="CCnumber">
  <input type="text" name="expires">
</form>
A single cookie, sessionKey, is used to uniquely identify clients' sessions. Once purchases have been made, an order record is entered into the company's back-end database which can be subsequently viewed on their local intranet.
For the purposes of this example let us assume that the system is vulnerable in the following ways:
The SPDL specification corresponding to the form's action URL is presented in Figure 5. Each of the parameters shown in the form above is declared and a number of validation and transformation rules are specified. Most of the SPDL specification is self-explanatory although a few points are worth noting. The validation element for the price field simply states that negative prices are not allowed; the more complicated validation expression for the CCnumber field is an implementation of the Luhn-formula commonly used as a simple validation check for credit-card numbers; the validation expression for the expires field ensures that it is of the form mm/yy and also checks that the month is in the range 1-12.
<policy>
  <URL prefix="http://www.example/buy.asp">
    <parameter name="price" method="POST" maxlength="10"
               minlength="1" required="Y" type="float">
      <validation> this isGreaterThan 0.0 </validation>
    </parameter>
    <parameter name="productID" method="POST" maxlength="10"
               minlength="1" required="Y" type="int" />
    <parameter name="surname" method="POST" maxlength="30"
               minlength="2" required="Y" MAC="N" type="string">
      <transformation>
        EscapeSingleQuotes | EscapeDoubleQuotes
      </transformation>
    </parameter>
    <parameter name="CCnumber" method="POST" maxlength="16"
               minlength="16" MAC="N" required="Y" type="int">
      <validation>
        let fun first(s:string):string = String.mid(s,1,1)
            fun rest(s:string):string =
              String.mid(s,2,String.length(s)-1)
            fun double(s:string,a:bool):string =
              if s="" then ""
              else (if a then first(s)
                    else String.fromInt
                           (Int.fromString(first(s)) * 2))
                   ++ (double(rest(s), not a))
            fun sum(s:string):int =
              if s="" then 0
              else (Int.fromString(first(s))) + (sum(rest(s)))
        in sum(double(this,false)) % 10 = 0
        end
      </validation>
    </parameter>
    <parameter name="expires" method="POST" maxlength="5"
               minlength="5" MAC="N" required="Y" type="string">
      <validation>
        format(this,"\d\d/\d\d") and
        Int.fromString( mid(this,1,2) ) <= 12 and
        Int.fromString( mid(this,1,2) ) >= 1
      </validation>
    </parameter>
  </URL>
  <cookie name="sessionKey" maxlength="15"
          minlength="15" type="int" />
</policy>
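For readers unfamiliar with the validation language, the two non-trivial checks of Figure 5 can be restated in Python. This is an independent sketch using the common right-to-left formulation of the Luhn formula, which gives the same result as the figure's left-to-right version for 16-digit numbers; the function names luhn_ok and expires_ok are hypothetical.

```python
import re

def luhn_ok(number: str) -> bool:
    """Luhn check: double every second digit from the right and sum the digits."""
    if not re.fullmatch(r"[0-9]+", number):
        return False
    total = 0
    for index, ch in enumerate(reversed(number)):
        digit = int(ch)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0

def expires_ok(value: str) -> bool:
    """Accept mm/yy where the month lies in the range 1-12."""
    match = re.fullmatch(r"([0-9]{2})/([0-9]{2})", value)
    return bool(match) and 1 <= int(match.group(1)) <= 12

valid_card = luhn_ok("4111111111111111")   # a widely used test number
valid_date = expires_ok("07/03")
```

Both functions only check the format of the input; neither establishes that a card actually exists or has not expired.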
Using the policy description of Figure 5 we are able to fix all of the system's vulnerabilities (described above) without modifying any of the code:
Whilst we advocate the use of a specialised Security Policy Description Language in the majority of cases, we recognise that there are circumstances where the increased flexibility of a general purpose programming language may be desirable.
In such cases we argue that using a security gateway to abstract security-related code is still a useful technique. Instead of generating code for the security gateway via the SPDL compiler, we observe that the security policy can be encoded in a programming language of choice and compiled directly for execution on the security gateway. Although one loses the specialised features of the SPDL, using a general purpose programming language provides designers with a greater degree of freedom. (Of course with great power comes great responsibility. Programmers must take extra care to structure their security policy enforcement code carefully.)
We experimented with this idea by programming our security gateway directly in OCaml, using a comprehensive HTTP library to process HTTP requests and responses. We found that, even when using a general purpose programming language to express the security policy, using a security gateway to structure an application still provides a number of advantages. In particular:
In this section we discuss performance issues and present experimental results derived from our implementation of the security gateway.
Figure 6 shows the latency of the security gateway and compares it to the latency of other common types of HTTP processing. The results were measured by fetching the homepage of the Laboratory for Communications Engineering (University of Cambridge) augmented with the web-form described in our case-study of Section 4. The leftmost bar shows the latency added by a Squid proxy cache when fetching a statically compiled version of the page; the middle bar shows the added latency of dynamically generating the page using PHP and a MySQL backend; the rightmost bar shows the latency of using the security gateway to enforce the security policy of Figure 5. The final bar is divided into two sections: the (lower) solid black section represents the latency due to buffering the HTTP messages; the (upper) striped section shows the latency due to parsing the HTTP messages and annotating the HTML with MACs.
The latency of our system is large compared with the latencies incurred in proxy caching and dynamic page generation. To some extent this is due to the fact that our naïve implementation is completely unoptimised. However, we recognise that the complexity of the application-level tasks performed by the security gateway will necessarily incur more latency than the lower level manipulation performed by proxies such as Squid. We regard our current implementation as a proof-of-concept. In future work we intend to concentrate on performance. Potential optimisations include (i) using a specialised HTML parser to concentrate only on relevant parts of HTML syntax (we currently use a general HTML parser which performs a great deal of unnecessary work); (ii) reducing latency by streaming the HTTP messages and processing them on-the-fly whenever possible; (iii) writing speed critical parts of the security gateway directly in C.
Figure 7 shows how the total throughput of a single security gateway varies as the number of concurrently connected clients increases. The measurements were taken running the security gateway on a dual P-III 500 MHz machine. The throughput quickly reaches a maximum value as the CPUs become saturated. Again, we are confident that optimising our code for performance and running the filter on a higher-specification machine would yield a significantly higher maximum throughput.
We designed the security gateway in a stateless manner, choosing to annotate URL parameters, form fields and cookies with MACs rather than storing session state in a back-end database. Since the security gateway is stateless, one may increase throughput linearly simply by deploying multiple security gateways and using a load balancing scheme to distribute work between them (see Figure 8). (Note that stateful systems do not scale linearly in this way since, ultimately, the centralised state becomes a bottleneck across the cluster.)
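The stateless annotation scheme can be sketched as follows. This is a minimal Python illustration rather than our implementation: the choice of HMAC-SHA256, the `value|tag` encoding, the `SECRET_KEY` constant and the function names are all assumptions made for the example. The point it shows is that any gateway holding the key can verify a returned value without consulting shared session state.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical key shared by all gateways in the cluster; a real
# deployment would use a randomly generated secret kept private.
SECRET_KEY = b"example-gateway-key"


def mac_for(name: str, value: str) -> str:
    """Compute a MAC binding a value to the parameter it belongs to."""
    # Prefixing the name length keeps the name/value boundary unambiguous.
    message = f"{len(name)}:{name}={value}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def annotate(name: str, value: str) -> str:
    """Outgoing response: append the MAC to the value sent to the client."""
    return f"{value}|{mac_for(name, value)}"


def verify(name: str, annotated: str) -> Optional[str]:
    """Incoming request: return the original value, or None if tampered."""
    value, separator, tag = annotated.rpartition("|")
    if not separator:
        return None  # no MAC attached at all
    expected = mac_for(name, value)
    # Constant-time comparison on bytes, so non-ASCII input cannot raise.
    if hmac.compare_digest(tag.encode("utf-8"), expected.encode("utf-8")):
        return value
    return None
```

For example, a hidden `price` field would be sent to the client as `annotate("price", "19.99")`; if the client edits the price but resubmits the old tag, `verify` returns `None` and the request can be rejected.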
The measurements presented here are worst case in the sense that the HTML used to test the system was long and complicated, containing both URLs and form parameters. In reality we believe that many HTML pages would be simpler for the security gateway to process (i.e. shorter, without form parameters), leading to better average case performance. Note that many of the HTTP messages would contain graphics and hence would not require any processing at all. A performance-optimised security gateway could examine the content-type header of HTTP responses, using streaming instead of buffering if no HTML processing is required.
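The content-type check suggested above might look like the following minimal Python sketch. The function name and the set of media types treated as HTML are assumptions for illustration; treating a response with no content-type header as requiring buffering is likewise our own conservative choice for the example.

```python
from typing import Mapping

# Media types that the gateway would need to parse and annotate
# (an assumed set, chosen for illustration).
HTML_MEDIA_TYPES = ("text/html", "application/xhtml+xml")


def needs_html_processing(headers: Mapping[str, str]) -> bool:
    """Decide whether a response must be buffered and parsed as HTML."""
    for key, value in headers.items():
        # HTTP header names are case-insensitive.
        if key.lower() == "content-type":
            # Drop parameters such as "; charset=utf-8" before comparing.
            media_type = value.split(";", 1)[0].strip().lower()
            return media_type in HTML_MEDIA_TYPES
    # No content-type header: buffer and inspect rather than guess.
    return True
```

Responses for which this function returns `False` (images, for instance) could be streamed straight through to the client.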
Furthermore, note that most of the overhead of the security gateway is due to annotating HTML with MACs. If the SPDL policy does not require the use of MACs then only HTTP-request parameters need to be checked; the security gateway can stream HTTP-responses directly.
We believe that the performance figures presented in this section demonstrate that our techniques are applicable in practice.
The idea of using firewalls to prevent unauthorised activity at the application-protocol level is not new. A large number of companies provide application-level firewalls as commercial products. Typical services provided by such firewalls include virus protection and access control. However, we are not aware of any application-level firewalls which apply user-specified validation and transformation rules.
Damiani et al. describe a method for enforcing rôle-based access control policies for remote method invocations via the SOAP protocol. The policies they describe are very different to ours: they consider access control issues whereas we try to prevent application-level attacks in general. However, the two systems are similar in that both use a firewall to enforce restrictions at the HTTP-level.
Sanctum Inc. provide a product called AppShield which, like our Security Gateway, inspects HTTP messages in an attempt to prevent application-level attacks. However, despite this apparent similarity, there are significant differences between the two systems: we take the programmatic approach of specifying a security policy explicitly; in contrast AppShield has no SPDL or compiler and attempts to infer a security policy dynamically. Whilst this allows AppShield to be installed quickly, it limits the tasks it can perform. In particular, since there is no policy description language for describing validation or transformation rules, AppShield knows very little about what constitutes valid parameter values in HTTP-requests and can only perform simple checks on data returned from clients. AppShield is intended as a plug-and-play tool which provides a limited degree of protection for existing websites with application-level security problems. In contrast, we see our approach as a suite of development tools and methodologies which aid in the design-process of secure applications.
Enforcing a security policy across a large web-application is difficult because:
In this paper we have presented a method for abstracting security-critical code from large web applications which addresses the problems outlined above. A specification language for describing application-level security policies was described and illustrated with a realistic example.
We hope that the tools and techniques described in this paper will be useful in the development process of new web applications. By abstracting the security policy from the outset, programmers have the advantage of a well-defined, centralised set of assertions laid out in the SPDL security specification. As well as reducing the amount of code written by each developer, we hope that the project's SPDL specification will act as a useful document, aiding communication between teams of developers and speeding up code-review processes. Justifying these claims with reference to real-life case studies is a high priority for future work.
Another direction for future work is to augment the security gateway with a library of security-related services (e.g. authentication and generation of secure session IDs). These services could be called from the web-application using protocols such as XML-RPC or SOAP.
On their website Sanctum claim that their ``AppShield software secures your site by blocking any type of application manipulation through the web''. Clearly this is false: if it were possible to solve all application-level security problems with a black-box tool then there would be no need for further security research. In contrast, we do not claim that we have found an automatic fix for all application-level security problems: although our tool helps to secure a web application, it still requires a competent, security-aware engineer to write and check the security policies by hand.
Based on the research reported in this paper, we claim that our methodology provides a stronger foundation for secure web applications than conventional tools and development techniques. In addition, we believe that applying this methodology in practice would make a significant and immediate impact on the many websites which currently suffer from application-level security vulnerabilities.
This work was supported by (UK) EPSRC GR/N64256 and The Schiff Foundation. Both authors are sponsored by AT&T Laboratories, Cambridge. The authors wish to thank Alan Mycroft, Andrei Serjantov and Richard Clayton for their valuable comments and suggestions.