Thursday, February 21, 2013

Tech related IEG proposals

The deadline for IEG proposals has now passed. IEGs (Individual Engagement Grants) are a new pilot program by the Wikimedia Foundation to give small amounts of money to people who promise to do cool things with it. Ok, the criteria are a bit more complicated than that, but that is the gist of it.


I thought I'd take some time to look through the technical proposals. To be honest, I was hoping to see more programming proposals, something like Google Summer of Code, but for already experienced devs. By and large that did not seem to happen. This may be due to some mixed messages on technical projects - contrast this mailing list post and this late addition to the rules I just discovered today. Also it may simply be because it's a new program, and developers weren't its primary audience. Perhaps it has to do with the timing, which would interfere with students going to school (as opposed to Google Summer of Code, which coincides with summer break), resulting in fewer student programmers participating. Who knows.

Additionally, of the technical proposals made, only one actually consulted with the developer community at large (by which I mean wikitech-l) :( I should note that some of the proposals I'm listing below as technical are more of the form "develop a vision statement", which doesn't really require consulting with the dev community. However, I still expected more people to be chatting up the devs in relation to IEG proposals.


tl;dr: My favourites are: Elaborate Wikisource strategic vision, and The Wikipedia Adventure. My runner-up favourite is Backlog pages for WikiProjects (that one is a runner-up as it's too vague on what will actually be accomplished).

Anyhow, here's my take on the technical proposals that were submitted. Note, I have mostly just read through the proposals once, so if I misunderstand anything in any of the proposals, I apologize in advance.

Backlog pages for WikiProjects

This is an interesting proposal. Basically the author notes that Wikipedia has categories and organizational pages for its various backlogs. However individual WikiProjects do not have such per-project backlog pages, or if they do, they're very limited.

The actual proposal for what to do is rather vague. It sounds slightly like figuring out what to do is part of the proposal. From what I've gathered the proposal breaks down into two related wants:

  • (Efficient) Category intersection - The ability to get all pages that are in the intersection of a set of categories. There are some tools that do this already - like DPL [Not enabled on Wikipedia, but is on other wikis like meta], and CATSCAN. Neither scales well once things get big.
  • A snazzy interface for showing the backlog - The authors point to WikiHow's CommunityDashboard as an example to potentially emulate. This is the first time I've heard of WikiHow's tool, and while I only gave it a brief glance, it is very cool looking.

Category intersection is an interesting problem, one that has been wanted for quite a long time by many people. I'm currently the maintainer of the DynamicPageList extension (however I mostly ignore it, and simply fix the rare bug that pops up). DynamicPageList does category intersection in the naive way (by which I mean doing a bunch of self-joins on the categorylinks table), which simply does not scale to Wikipedia-size wikis (or even wikis significantly smaller than enwikipedia). Some people have suggested that it may be possible to implement this efficiently using full-text indexes and a program like Lucene. There's even a proof of concept extension written using this approach. Adapting DynamicPageList to use this type of method is certainly something I would personally like to investigate if I ever had a large swath of free time.
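To make "the naive way" concrete, here is a minimal sketch of category intersection done with one self-join per extra category. It is an illustration only, not DynamicPageList's actual code: it uses an in-memory SQLite table as a stand-in for MediaWiki's categorylinks table (keeping just the cl_from page id and cl_to category name columns), and the function names and sample rows are made up for the example.

```python
import sqlite3


def build_demo_db():
    """Create a tiny stand-in for the categorylinks table (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT)")
    conn.executemany(
        "INSERT INTO categorylinks VALUES (?, ?)",
        [
            (1, "Stubs"), (1, "Physics"),
            (2, "Stubs"),
            (3, "Physics"), (3, "Stubs"), (3, "Unreferenced"),
        ],
    )
    return conn


def intersect_categories(conn, categories):
    """Return ids of pages that are in every listed category.

    Naive approach: start from the first category and add one self-join
    on categorylinks for each additional category.
    """
    if not categories:
        return []
    first, rest = categories[0], list(categories[1:])
    joins = " ".join(
        f"JOIN categorylinks AS c{i} "
        f"ON c{i}.cl_from = c0.cl_from AND c{i}.cl_to = ?"
        for i in range(1, len(rest) + 1)
    )
    sql = (
        f"SELECT c0.cl_from FROM categorylinks AS c0 {joins} "
        "WHERE c0.cl_to = ? ORDER BY c0.cl_from"
    )
    # Placeholders in the JOIN clauses come before the one in WHERE.
    rows = conn.execute(sql, rest + [first]).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = build_demo_db()
    print(intersect_categories(db, ["Stubs", "Physics"]))                  # [1, 3]
    print(intersect_categories(db, ["Stubs", "Physics", "Unreferenced"]))  # [3]
```

Every extra category adds another join over the same table, which is fine on a toy database like this one and is exactly the part that gets painful once categories hold hundreds of thousands of pages.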

The authors of this proposal suggest $1000 to hire a developer to implement their feature requests. While it's hard to be certain, as the actual project requirements of this proposal are basically not defined, that seems like way too low a number given the amount of work wanted for the project, particularly if efficient category intersection is a requirement.

Elaborate Wikisource strategic vision

I really like this proposal. Wikisource has always been a bit of a mystery to me. I know it has something to do with digitizing documents, and proofreading the resulting text, but I don't know much beyond that. In particular, I have almost no knowledge about how their main tool, ProofreadPage, actually works.

Having a strategic vision for Wikisource would help more people understand, and thus appreciate the work of Wikisource. In turn, this may result in more people using Wikisource.

One of the things I noticed about the proposal is that they make it very clear they want to concentrate, in the short term, on things that do not need Wikimedia technical staff attention. This is probably a reaction to how Wikisource has been ignored by both the foundation and the larger developer community. The vast majority of work done on Wikisource related extensions has been done by volunteer developers who come from the Wikisource project. Personally I would caution this proposal against ignoring potential WMF tech resources too much. While it's important to consider what is do-able and what is not, it is also good to first decide what is wanted, and then figure out how to do it (where there's a will there's a way). Wikisource may even find that once there is a clear picture of what is needed, many more resources are available to them.

WMF employees aren't the only developers; there are also (non-Wikisource) volunteers. Who knows, perhaps these people would be willing to help if they knew what needed doing. (More generally, if you want some new feature for your wiki, a good first step is always to produce a good design document of precisely what is wanted. Developers aren't mind readers, and would much rather code than try to figure out what the user wants. Having a clear statement about what you need may be half the battle to getting what you need.) Also, just because the WMF isn't willing to devote tech resources to Wikisource doesn't mean that employees won't help. Employees do have 20% time, and occasionally even commit code unrelated to foundation goals in their free time.

I really wish this project luck, and should it be accepted, I look forward to reading the final report.
Edit: I wrote this section before I saw the new part of the rules where nothing involving WMF-tech resources is allowed. With that in mind, the no-wmf-tech parts of this proposal make much more sense.

Mapping History: Revision History Visualizer and Improvement Suggester using Geo-Spatial Technologies

This one gets points for being the only tech proposal to actually talk to the developer community.

Basically what they want to do is create a map from an article's edit history, to highlight which region is editing the article the most. Afterwards they want to do some fancy machine learning stuff to see if any automatic inferences can be made from this geo-spatial data (for example, if only one country edits an article, maybe it's POV).

Unfortunately the proposal has several problems. First of all, there is the privacy policy. The authors want to get the IP addresses of logged-in users, in order to find out roughly where they live, so they can be plotted on a map. That's not going to happen for privacy reasons, end of story. Hence the visualizations will be a lot less complete (if they only use anon locations). The proposal could perhaps parse user pages for location based infoboxes, but not everyone specifies that sort of information.
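To illustrate how much this limits the data: in a public page history, anonymous edits are attributed to an IP address, while logged-in edits only show a username. The sketch below separates the two, so you can see which share of an article's edits could be placed on a map at all. It is a rough illustration with made-up editor names and a hypothetical function name; it does no geolocation itself.

```python
import ipaddress


def anon_editors(editor_names):
    """Return the entries that are IP addresses, i.e. anonymous edits."""
    anon = []
    for name in editor_names:
        try:
            ipaddress.ip_address(name)  # raises ValueError for ordinary usernames
        except ValueError:
            continue
        anon.append(name)
    return anon


if __name__ == "__main__":
    # Hypothetical editor column from one article's history.
    history = ["192.0.2.17", "Alice", "2001:db8::1", "Bob"]
    mappable = anon_editors(history)
    print(mappable)
    print(f"{len(mappable)} of {len(history)} edits could be mapped")
```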

Beyond not sufficiently researching the privacy policy, the authors seem not to understand what sort of access different technical projects (extensions vs gadgets vs third party hosted thingies) have, along with what data the API provides. I would expect that someone making such a proposal would understand the limitations of the technology that they intend to use before making the proposal.

Last of all, the $30000 budget request seems a little high relative to the amount of work I believe would be required, and to the impact the project would make.

MediaWiki and Javanese script

This is an interesting one. I'd be curious to see what someone from WMF's i18n team thought of it.

As far as I understand, the main points are:

  • There are no input methods generally available for the Javanese script except in MediaWiki (which sounds odd to me)
  • People should be able to type in their own script easily
  • Therefore we should distribute MediaWiki-on-a-stick (A Wiki on a usb stick, so you can take the wiki with you).

First of all, it would be kind of cool if wiki on a stick was supported for MediaWiki. The author mentions XAMPP, but there might be simpler options (using PHP's built-in web server, combined with SQLite). However the wiki on a stick part seems to be a means to an end, not the main goal of this project.

For the actual project, I'm not sure what the end goal is - have Javanese speaking people start using MediaWiki as a personal word processor? It seems like making an input method for X11 (or wherever input methods go in general for various operating systems) would be much more effective at accomplishing the author's goals. It's also unclear how this benefits Wikimedia, other than that wiki on a stick support would benefit MediaWiki. Having more Javanese speakers familiar with MediaWiki might make them more likely to contribute to a Javanese project, but that seems like a rather indirect benefit.

Replay Edits

This is an interesting proposal.

From what I understand, what is being proposed is that you could replay the history of an article, with text being added and removed in front of your eyes. Somewhat similar to how edits happen in front of your eyes in Etherpad (?) (but replaying the past, not real time editing). This would allow a cool visualization of how articles change with time.

I'm unclear whether this proposal is meant to operate on the wikitext or on the rendered page. I'm also unclear whether, as it goes forward in time, it highlights the changes or just shows the new page. Some of the comments on the talk page, and this mockup, suggest it would work on rendered pages. Having diffs that highlight what changed, but on the rendered page instead of the wikitext source (so-called visual diff), is a feature that would be awesome in and of itself. (There was once upon a time some experimental support in MW for this, but it was removed due to being incomplete.)

If it is indeed the author's intention to provide visual diffs, then this project becomes quite exciting. It also becomes quite a bit more difficult, and I would be hesitant to support it unless the author stated his implementation plans in much more detail, in order to verify he understands the issues involved. If this is more just a visualization of how articles change in time, it is a much lower impact project, but still an interesting one. I would support it, especially because the proposer is only asking for $200 to do this.
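For a sense of what the simpler, source-text version involves, here is a minimal sketch that "replays" a list of revisions by computing word-level additions and removals between each consecutive pair, using Python's standard difflib module. This is my own illustration, not the proposer's design: the revision texts and function names are made up, and it works on plain source text only, which is exactly why the rendered-page (visual diff) variant is so much harder.

```python
import difflib


def diff_words(old, new):
    """Return word-level changes from old to new as ('-', text) / ('+', text) pairs."""
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            changes.append(("-", " ".join(a[i1:i2])))
        if tag in ("insert", "replace"):
            changes.append(("+", " ".join(b[j1:j2])))
    return changes


def replay(revisions):
    """Yield the list of changes for each step from one revision to the next."""
    for old, new in zip(revisions, revisions[1:]):
        yield diff_words(old, new)


if __name__ == "__main__":
    # Hypothetical revision texts, oldest first.
    revisions = ["The cat sat", "The black cat sat", "The black cat slept"]
    for step, changes in enumerate(replay(revisions), start=1):
        print(step, changes)
    # 1 [('+', 'black')]
    # 2 [('-', 'sat'), ('+', 'slept')]
```

An animated player would then apply each step's removals and additions on a timer, instead of just printing them.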

MediaWiki data browser

This proposal is from Yaron, who is (among other things) a very prominent developer of Semantic MediaWiki. This is by far the most ambitious technical project of any proposed and could potentially have a huge impact.

At the same time it is a little unclear what is actually being proposed. The author says a framework to create drill down interfaces. Perhaps my confusion stems from only having a vague idea of what a drill-down interface is. A picture of an example interface would really be worth a thousand words.

With that said, the idea seems to be creating an interface where the user could filter or select pages by some criteria based on information in an infobox. This all sounds really cool, but it also sounds very hand-wavy. To really evaluate this proposal, I think I would need to better understand what is actually being proposed. A concrete example of what an app designed with the framework would look like, including what sort of scope in terms of data processing a potential app could have, would be helpful.
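For what it's worth, my rough mental model of "filter pages by infobox criteria" looks something like the sketch below: each page is reduced to a set of field/value pairs, the user picks values for some fields, and the interface shows the matching pages plus counts for the remaining choices. This is only my guess at the general idea, not the proposed framework; the page data, field names and function names are all invented for the example.

```python
from collections import Counter


def drill_down(pages, selected):
    """Return the pages whose fields match every selected field/value pair."""
    return [
        page for page in pages
        if all(page.get(field) == value for field, value in selected.items())
    ]


def facet_counts(pages, field):
    """Count how many pages have each value of the given field."""
    return Counter(page[field] for page in pages if field in page)


if __name__ == "__main__":
    # Hypothetical data, as if extracted from infoboxes.
    pages = [
        {"title": "Page A", "country": "France", "type": "Museum"},
        {"title": "Page B", "country": "France", "type": "Bridge"},
        {"title": "Page C", "country": "Italy", "type": "Museum"},
    ]
    in_france = drill_down(pages, {"country": "France"})
    print([p["title"] for p in in_france])
    print(facet_counts(in_france, "type"))
```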

An interesting part of this proposal is that all the processing is done on the client side. The author mentions that (obviously) only a small portion of Wikipedia's data would be downloaded. I would be interested to know more about how much data would be downloaded, what data would be downloaded (is it the wikitext of relevant pages?), and how the framework would find the relevant information it needs to download (this is part of my confusion over what the relevant information the framework would be working on is).

Certainly an interesting proposal, and one with much potential.

TapAMap

Basically, the author has an Apple iPhone app that gives you a map. You click somewhere on the map, and it takes you to the nearest article to where you clicked. This is the opposite of most geo-location efforts, although it is somewhat similar to WikiMiniAtlas type things that provide wikilinks to various places at their location on the map. It appears this one tries to be different by not showing textual links on the map, but instead concentrating on the geographical location only.
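The core lookup ("nearest article to where you clicked") is simple to sketch. The example below is not the app's code, which I haven't seen; it is a brute-force illustration that computes the great-circle (haversine) distance from the tapped point to each geotagged article and picks the smallest. The function names and the short article list are assumptions for the example, and a real app would need a spatial index or a server-side query rather than scanning every article.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points (degrees)."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_article(lat, lon, articles):
    """Return the title of the article closest to the tapped point.

    articles is a list of (title, latitude, longitude) tuples.
    """
    closest = min(articles, key=lambda art: haversine_km(lat, lon, art[1], art[2]))
    return closest[0]


if __name__ == "__main__":
    # Small illustrative list of geotagged articles.
    articles = [
        ("Eiffel Tower", 48.8584, 2.2945),
        ("Big Ben", 51.5007, -0.1246),
        ("Colosseum", 41.8902, 12.4922),
    ]
    print(nearest_article(48.86, 2.35, articles))  # Eiffel Tower
```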

The developer wants grant money to port his App to Android. Apparently the iPhone version is fairly popular.

My main concern with this proposal is that while it is different from other article mapping things, it's similar enough to make it relatively low impact. Additionally it seems the author is reluctant to open source his app, or is possibly only willing to open source the Android port that would be funded by the grant. I feel this would be a show stopper. Anything funded using Wikimedia money should be Free Software, no ifs, ands or buts. I would not support this proposal unless the entire thing (including the existing iPhone app, and the proposed Android port) were GPL'd (or under another OSI approved license).

Wiki Makes Video

I'm only going to briefly mention this, as it's mostly non-technical, but it does include implementing a video capture/upload(?) app for phones. Making videos easier is certainly a useful thing, and something we could do much better at. The proposers would probably want to check that the Mobile and TMH tech teams aren't already doing anything in this direction (I don't think they are, but they should check).

The Wikipedia Adventure

Last but certainly not least (Whew, there was actually a lot more of these than I originally thought) comes The Wikipedia Adventure. This is a proposal to continue some work originally started as a fellowship, to create an educational game to show people how to edit Wikipedia.

This is an interesting approach to help break down barriers. While I am personally a fan of manuals and what not, I understand that most people aren't, and this could serve as a very effective introduction to editing Wikipedia.

I (very) briefly tried the prototype, and I must say it's pretty cool. I would be interested to see where people can go with this, if given the proper opportunities to pursue it. This is definitely a proposal I would support.

And that is the end. There were actually quite a few more tech proposals than I thought, and it took a lot longer to read through them than I thought it would. If you've stuck through reading this blog post for this long, thanks for reading :)