Saturday, March 30, 2024

MediaWiki edit summary XSS write-up

 Back in January, I discovered a stored XSS vulnerability in core MediaWiki (T355538). Essentially by setting a specific edit summary when editing a page, you could run javascript (And take over the account of anyone viewing the edit summary, for example on the history page or recentchanges)

MediaWiki core is generally pretty good when it comes to security. There are many sketchy extensions, and sometimes there are issues where an admin might be able to run javascript, but by and large unauthenticated XSS vulns are fairly rare. I think the last one was CVE-2021-44858 from back in 2021. The next one before that was CVE-2017-8815 in 2017, which only applied to wikis configured to have a site language of certain languages (e.g. Serbian and Chinese). At least, those were the ones I found when looking through the list. Hopefully I didn't miss any. In any case, finding XSS triggerable by an unprivleged attacker in MediaWiki core is pretty hard.

So what is the bug? The proof of concept looks like this - Create an edit with the following edit summary:

[[Special:RecentChanges#%1b0000000|link1]] [[PageThatExists#/autofocus/onfocus=alert("xss\n"+document.domain)//|link2]]

This feels a bit random at first glance. How does it work?

The edit summary parser

Whenever you edit a page on MediaWiki, there is a box for your edit summary. This is essentially MediaWiki's version of a commit message.

Very little formatting is allowed in this summary. A major exception is links. You can link to other pages by enclosing the link in [[ and ]].

So this explains a little bit about the proof-of-concept - it involves 2 links. But why 2? It doesn't work with just 1. What is with the weird link targets? They are clearly abnormal, but they also don't look like typical XSS. There are no < or >, there aren't even any unclosed quotes.

Lets take a deeper look at how MediaWiki applies formatting to these edit summaries. The code where all this happens is includes/CommentFormatter/CommentParser.php.

The first thing we might notice is the following line in CommentParser::preprocessInternal: "// \x1b needs to be stripped because it is used for link markers"

In the proof of concept, the first part is [[Special:RecentChanges#%1b0000000|link1]], where %1b appears. This is a good hint that it has something to do with link markers, whatever those are.

Link markers

But what are link markers?

When MediaWiki makes a link, it needs to know whether the page being linked to exists or not, since missing pages use a red colour. The most natural way of doing this is, when encountering a link, to check in the DB whether or not the page exists.

However, there is a problem. When rendering a long page with a lot of links, we have to do a lot of DB lookups. The lookups are simple, but still on a separate (albeit nearby server). Each page to lookup involves a local network request to fetch the page status. While that is happening, MW just sits and waits. This is all very fast, but even still it adds up a little bit if you have say 500 links on a page.

The solution to this problem was to batch the queries. Instead of immediately looking up the page, MW would put a small link marker in the page at that point and carry on. Once it is finished, it would look up all the links all at once, and then do another pass to replace all the link markers.

So this is what a link marker is, just a little marker to tell MW to come back to this spot later after it figured out if all the links exist. The format of this marker is \x1B<number> (So \x1B0000000 for the first one, \x1B0000001 for the second, and so on). \x1B is the ASCII escape character.

Back to the PoC

This explains the first part of the proof of concept: [[Special:RecentChanges#%1b0000000|link1]] - the link target is a link marker. The code has a line:

                                // Fix up urlencoded title texts (copied from Parser::replaceInternalLinks)
                                if ( strpos( $match[1], '%' ) !== false ) {
                                        $match[1] = strtr(
                                                rawurldecode( $match[1] ),
                                                [ '<' => '&lt;', '>' => '&gt;' ]
                                        );
                                }


Which normalizes titles using percent encoding to use the real characters. Thus the %1B gets replaced with an actual 0x1B byte sequence. The code did try and strip 0x1B characters earlier, but at that point, it was still just %1b and did not match the check.

We now have a link with a link marker inside of it. An important note here is that Special:RecentChanges is not a normal page. It is a special page. MediaWiki knows it exists without having to consult the database, so it does not get the link marker treatment. This is important because we cannot use it as a fake link marker if it gets replaced by a real link marker.

At this stage after inserting link markers, the proof of concept becomes the following string:

<a href="/w/index.php/Special:RecentChanges#\x1B000000" title="Special:RecentChanges">link1</a> \x1B0000000

A link with a link marker inside it!

The second link

The \x1B0000000 is a stand in for [[PageThatExists#/autofocus/onfocus=alert("xss\n"+document.domain)//|link2]].

The replacement at the end is a normal replacement, and everything is fine. However there are now two replacements - there is also the replacement inside the link: href="/w/index.php/Special:RecentChanges#\x1B000000"

This is the fake link marker that we contrived to get inserted. Unlike the normal link markers, this is inside an attribute. The replacement text assumes it is being inserted as normal HTML, not as an attribute. Since it is a full link that also has quotes inside it, the two layers of quotes will interfere with each other.

Once the replacements happen we get the following mangled HTML for our proof of concept:

<a href="/w/index.php/Special:RecentChanges#<a href="/w/index.php/Test#/autofocus/onfocus=alert(&quot;xss\n&quot;+document.domain)//" title="Test">link2</a>" title="Special:RecentChanges">link1</a> <a href="/w/index.php/Test#/autofocus/onfocus=alert(&quot;xss\n&quot;+document.domain)//" title="Test">link2</a>

This obviously looks wrong, but its a bit unclear how browsers interpret it. A little known fact about HTML - /'s can separate attributes so long as no equal signs have been encountered yet. After the browser hits the second " mark, it thinks the href attribute is closed and that the remaing is some additional attributes. The browser essentially parses the above html as if it was:

<a href="/w/index.php/Special:RecentChanges#<a href=" w="" index.php="" Test#="" autofocus onfocus="alert(&quot;xss\n&quot;+document.domain)//&quot;" title="Test">link2</a>" title="Special:RecentChanges"&gt;link1</a> <a href="/w/index.php/Test#/autofocus/onfocus=alert(&quot;xss\n&quot;+document.domain)//" title="Test">link2</a>

In other words, an <a> tag, that has an attribute named autofocus and an onfocus event handler. On page load, the link is automatically focused, which triggers the javascript in the onfocus attribute to run, allowing the attacker to do what they want.

Take aways

I think the major take aways is that running Regexes over partially parsed HTML is always scary. We've had similar issues in the past, for example T110143.

The general pattern we've used to fix this and similar issues, is make sure the replacement token has special characters that would be mangled if it appeared in an unexpected context. Concretely, we added " and ' to the token, which would get escaped if placed in an attribute, and thus no longer matching and no longer being replaced.

More generally though, I think this is a good example of why even a minimal CSP policy would be helpful.

CSP is a complex standard, that can do a lot of things and has a lot of pieces. One of the things it can do, is disable "unsafe-inline" javascript. This means javascript from attributes (like onfocus) and javascript URLs. Usually this also includes inline <script> tags without a nonce, but that part is optional. A key point here, is this also generally means you cannot execute javascript via .innerHTML anymore, which is a fairly common vector for XSS via javascript.

Normally disabling unsafe-inline would be part of a broader effort to secure javascript, however its possible to take things a step at a time. This vulnerability would have been stopped just by disabling event attributes. A surprising portion of MediaWiki & extension XSS vulns [Excluding boring - an admin can change something in an unsafe way issues] involve just html attributes (or javascript: urls), which is a web feature that nobody really needs for legit reasons and is generally considered bad practise in normal usage. Even the most minimal CSP policy might really help MediaWiki's overall security posture against XSS vulns.

For more info about the vulnerability, please see the original report at https://phabricator.wikimedia.org/T355538.