Security Guideline

Our goal is to provide a type system that ensures that a Java programmer follows a given programming guideline or "best practice" to prevent code injection attacks.

Use case

In our particular use case,

  • the Java program to be analyzed is a servlet that generates HTML code
  • the servlet has access to certain untrusted "user input" strings
  • the programming guideline says that the untrusted strings need to be sanitized before they are embedded (passed to an output function)
  • this prevents the injection of malicious code into the generated HTML page

The use of the sanitization function may depend on where the string is embedded. For example, if the string is embedded into HTML, a certain escapeToHtml method is to be used; if it is embedded into JavaScript code occurring on the generated page, it must be sanitized with escapeToJs. We call such a requirement for the programmer the security guideline.

Here is an example program that satisfies the security guideline:

public void doGet(HttpServletRequest request) {
  String input = request.getInputParameter();

  // HTML embedding
  String s = "<body>" + escapeToHtml(input) + "</body>";

  // JavaScript embedding
  if (showAlert) {
    output("<script >");
    output(" alert(’" + escapeToJs(input) + "’);");

Note: We want to ensure that the security guideline is followed. It is not our goal to evaluate the security guideline with respect to actual security gains.


Formalization of the security policy

We classify strings according to their contents and origin: (the names of the classifications have historical reasons)

  • all strings coming from getInputParameter get the class Input
  • escapeToHtml returns C1-classified strings, escapeToJs returns C2-classified strings
  • literals have type Lit, except for "<script>" and "</script>", which have types Script and /Script, respectively

The calls to the output function define the (abstract) output trace of a program. The example program generates two possible output traces, depending on the value of the variable showAlert:

  1. Lit + C1 + Lit + Script + Lit + C2 + Lit + /Script
  2. Lit + C1 + Lit

The programming guideline is formalized by specifiying the set of allowed output traces in form of a finite state machine (automaton):

A concrete machine could for example

  • contain states for "normal mode" and "script mode", such that "normal mode" accepts C1, but not Cstrings, while "script mode" does the reverse
  • switch to "script mode" whenever a Script string is encountered, and back to "normal mode" at a /Script string
  • have a special (inescapable) "fail" state whenever an Input string is encountered.

The described machine would allow the two traces of the example program above, but not e.g. Script + C1 + /Script (which means the wrong sanitization function is used), or Input (which means the the output trace contains a string that is directly coming from getInputParameter, and is hence not sanitized).


The syntactic monoid

To make our analysis easier, we do not handle complete sets of output traces. Instead, we partition the possible output traces into a set of equivalence classes, such that all traces in an equivalence class have the same effect on the automaton. For example, the [Script] class contains the traces Script, or Script + Lit, or Script + C2 + Lit, etc. Furthermore, we can define a concatenation of such equivalence classes [a] + [b] as the equivalence class that contains the pointwise concatenations of the members of [a] and [b].
Both the set of equivalence classes and the concatenation operation form the so-called syntactic monoid of the automaton. The monoid can be directly computed from the automaton.
We assume the designer of the security guideline formalizes her requirements in form of a finite state machine, and then computes its syntactic monoid. The analysis is then parametric with respect to a given syntactic monoid.  For the implementation, this means the monoid can be specified in a file, which is then parsed by the tool and used for the analysis.

The type system

Our analysis is a type system. Whenever a program is typable with respect to a given security automaton, it means the programmer has followed the security guideline that the automaton describes. If the program is not typable, then there is either a violation of the guideline, or the type system is not expressive enough to show that the guideline has been followed.
There are a number of reasons why we use a type system for our analysis:
  • A type system makes the analysis easier to understand and to use for the typical Java programmer, as he/she is already familiar with the concept of types. In fact, our type system is a proper extension of the standard Java class type system.
  • From a theoretical point of view, type systems make it easier to give a formal proof of correctness, because they seperate the definition of valid types from the inference algorithm to find such types.
  • Finally, we can build on our earlier work on type systems for Java-like languages.

Please refer to the Analyzer page for a detailed description of the implementation of the type system in the TJSA tool.