In this example, content is the dynamic data that is written to the HTML page using the Java Expression Language (EL) notation, so ${content} is the interesting data.



Dissecting where content is added, we see that content is encapsulated inside a:

HTML attribute: the value of the attribute is quoted using "

URL: since we're in an href attribute, we know this is supposed to be a URL for the browser

JavaScript: we recognize the javascript: scheme and what comes after will be considered as JavaScript

JavaScript string: content belongs in a string that is passed to the hello() JavaScript function

To understand the HTML contexts, we need to see what the browser will do to eventually get to this content in JavaScript:

HTML parser:

Get the content of the href attribute and unescape the HTML entities.

Parse the resulting URL and recognize that this is a javascript: scheme.

Take the content of the URL (after the javascript:) and URL-decode it.

Pass the content of the URL to JavaScript.

JavaScript parser:

Get the JavaScript string that contains our content.

Process the content of the JavaScript string for string escape sequences: JavaScript string decoding.
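To make the decoding sequence concrete, here is a minimal Java sketch that imitates the HTML-parser steps for a javascript: URL. The class and method names (BrowserDecodingDemo, scriptSeenByJavaScript) are hypothetical, the entity list is deliberately tiny, and the percent-decoder treats every decoded byte as one character, so this is an illustration for ASCII payloads rather than a model of a real browser. It shows why escaping for only one layer is not enough: a payload with no HTML-special characters still breaks out of the JavaScript string once the URL is decoded.

```java
final class BrowserDecodingDemo {

    // Step 1: unescape a handful of HTML entities (a real parser knows many more).
    // "&amp;" is handled last so that "&amp;quot;" becomes "&quot;", not a quote.
    static String htmlUnescape(String s) {
        return s.replace("&quot;", "\"")
                .replace("&#39;", "'")
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&amp;", "&");
    }

    // Step 3: percent-decode the URL content.
    // Simplification: each decoded byte becomes one char (fine for ASCII).
    static String percentDecode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()) {
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    out.append((char) (hi * 16 + lo));
                    i += 3;
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    // Steps 1 to 4: returns the source text that is handed to the JavaScript parser.
    static String scriptSeenByJavaScript(String hrefAttributeValue) {
        String url = htmlUnescape(hrefAttributeValue);
        String scheme = "javascript:";
        if (!url.startsWith(scheme)) {
            throw new IllegalArgumentException("not a javascript: URL");
        }
        return percentDecode(url.substring(scheme.length()));
    }

    public static void main(String[] args) {
        // Attacker-controlled content containing no HTML-special characters.
        String content = "%27);alert(1);//";
        String href = "javascript:hello('" + content + "')";
        // prints: hello('');alert(1);//')
        System.out.println(scriptSeenByJavaScript(href));
    }
}
```

HTML-escaping this payload would leave it unchanged, yet after URL decoding the %27 has become a quote that terminates the string passed to hello().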

These steps show the sequence of decodings the browser performs. To fix cross-site scripting, you need to apply the matching escapers in the reverse order (innermost context first), so that the content is safe for its stack of HTML contexts:

Quoted HTML attribute

URL

JavaScript string
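As a rough illustration of encoding for this stack, the following Java sketch applies three hand-written escapers from the innermost context outward: JavaScript string first, then URL, then quoted HTML attribute. Everything here (the class NestedContextEncoder and its method names) is hypothetical and written for this example; it is not the API of any particular escaping library, and a production application should prefer a vetted one. The escapers are intentionally aggressive, encoding everything except ASCII letters and digits.

```java
import java.nio.charset.StandardCharsets;

final class NestedContextEncoder {

    private static boolean isAsciiAlnum(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    // Innermost context: JavaScript string literal.
    // Every non-alphanumeric char becomes a backslash, 'u', and four hex digits.
    static String jsStringEscape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (isAsciiAlnum(c)) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    // Middle context: URL. Percent-encode every UTF-8 byte that is not alphanumeric.
    static String uriEncode(String s) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            if (v < 0x80 && isAsciiAlnum((char) v)) {
                out.append((char) v);
            } else {
                out.append(String.format("%%%02X", v));
            }
        }
        return out.toString();
    }

    // Outermost context: double-quoted HTML attribute value.
    static String htmlAttrEscape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Reverse of the browser's decoding order: JS string, then URL, then HTML.
    static String encodeForJsStringInUrlInQuotedAttr(String content) {
        return htmlAttrEscape(uriEncode(jsStringEscape(content)));
    }

    public static void main(String[] args) {
        String content = "');alert(1);//";
        // prints: %5Cu0027%5Cu0029%5Cu003Balert%5Cu00281%5Cu0029%5Cu003B%5Cu002F%5Cu002F
        System.out.println(encodeForJsStringInUrlInQuotedAttr(content));
    }
}
```

When the browser later unescapes HTML entities, URL-decodes, and finally processes the JavaScript string escapes, each layer peels off exactly one encoding, and the quote only reappears as data inside the string.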

So if you want to safely output content, you need to do something like this: