Tuesday, October 1, 2024

Hashtags and implementing extensions in MediaWiki by modifying services

Recently I decided to make a Hashtags extension for MediaWiki.

The idea is that if you add something like #foo to your edit summary, it becomes a link in RecentChanges, history, etc., which you can click to view other edits tagged the same way.

I implemented this using MediaWiki's newish services framework. This was the first time I used this as an extension mechanism, so I thought I'd write a blog post on my thoughts.

Why Hashtags

The main idea is that users can self-group their edits if they are participating in some sort of campaign. There are already users doing this, and external tools to gather statistics about it (for example, the Wikimedia hashtag search), so there seems to be user demand.

MediaWiki also has a feature called "change tags". This allows edits to be tagged in certain ways and searched. Normally this is done automatically by software such as AbuseFilter. It is actually possible for users to tag their own edits, provided that the tag is on an approved list. They just have to add a secret URL parameter (?wpChangeTags) or edit by following a link with the appropriate URL parameter. As you can imagine, this is not easily discoverable.

I've always been excited about the possibility of hashtags. I like the idea that it is a user-determined taxonomy (a so-called folksonomy). You don't have to wait for some powerful gate-keeper to approve your tag. You can just be bold, and organize things in whatever way is useful. You don't have to discover some secret parameter. You can just see that other people write a specific something in their edit summaries and follow suit.

This feels much more in line with the wiki-way. You can be bold. You can learn by copying.

The extension essentially joins both methods together, by automatically adding change tags based on the contents of the edit summary.

Admittedly, the chance of my extension being installed on Wikipedia is pretty low. Realistically I am not even trying for that. But one can dream.

For more information about using the extension see https://www.mediawiki.org/wiki/Extension:Hashtags

How the extension works

There are a couple things the extension needs to do:

  • When an edit (or log entry) is saved, look at the edit summary, add appropriate tags based on that edit summary
  • If an edit summary is revision deleted or oversighted, remove the tags
  • Make sure hashtag related change tags are marked as "active" and "hidden".
  • When viewing a list of edits (history, Special:RecentChanges, etc), ensure that hashtags are linked and that clicking on them shows only edits tagged with that tag

The first three parts work like a traditional extension, using normal hooks.

I used the RevisionFromEditCompleteHook and ManualLogEntryBeforePublishHook to run a callback any time an edit or log entry is saved. Originally I used RecentChange_saveHook, however that didn't cover some cases where an edit was created but not published to RecentChanges, such as during page moves. ManualLogEntryBeforePublishHook covers more cases than it might appear at first glance, because it will also tag the revision associated with the log entry. All this is pretty straightforward, and allowed tagging in most cases. Restricted (confidential) logs still do not get tagged. It seems difficult to do so with the hooks MediaWiki provides, but perhaps it's best not to tag such log entries, lest information be leaked.
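To make the first step concrete, here is a minimal sketch of pulling hashtags out of an edit summary. The function name and the regular expression are my own assumptions for illustration, not the rule the extension actually uses.

```php
<?php
/**
 * Hypothetical helper: return the distinct hashtags found in an edit summary.
 * The pattern below is an assumption for illustration only.
 */
function extractHashtags( string $summary ): array {
    // A tag is "#" followed by word characters. It must not directly follow
    // a word character (e.g. "a#b") or "&" (e.g. the entity "&#39;").
    preg_match_all( '/(?<![\w&])#(\w+)/u', $summary, $matches );
    // Drop repeats, keeping the order of first appearance.
    return array_values( array_unique( $matches[1] ) );
}

echo implode( ', ', extractHashtags( 'Fix typo #foo and #bar (see also #foo)' ) ), "\n";
// prints "foo, bar"
```

A hook callback would then hand each returned name to the change tag system; that part depends on MediaWiki itself and is left out here.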

Similarly, I used the ArticleRevisionVisibilitySetHook to delete/undelete tags after revdel. Unfortunately MediaWiki does not provide a similar hook for log entries. I proposed adding one, but that is still pending code review.

Marking tags as active was also quite straightforward using normal hooks.

All that leaves is ensuring hashtags are linked. For this I leveraged MediaWiki's newish dependency injection system.

Services and dependency injection

For the last few years, MediaWiki has been in the process of being re-architected, with the goal of making it more modular, easier to test, and easier to understand. A core part of that effort has been the move to a dependency injection system.

If you are familiar with MediaWiki's dependency injection system, you might want to skip past this part.

Traditionally in MediaWiki, if some code needed to call some other code from a different class, it might look like this:

class MyClass {
    public function foo( $bar ) {
        // Reaches out to global state to find the object it needs
        $someObject = Whatever::singleton();
        $result = $someObject->doSomething( $bar );
        return $result;
    }
}

In this code, when functionality of a different class is needed, you either get it via global state, call some static method or create a new instance. This often results in classes that are coupled together.

In the new system we are supposed to use, the code would look like this:

class MyClass {
    private SomeObject $someObject;
    public function __construct( SomeObject $someObject ) {
        $this->someObject = $someObject;
    }
    public function foo( $bar ) {
        return $this->someObject->doSomething( $bar );
    }
}

The idea is that classes are not supposed to directly reference one another. They can reference the interfaces of objects that are passed to them (typically in the constructor), but they are not concretely referencing anything else in the system. Your class is always given the things it needs; it never gets them for itself. In the old system, the code was always referencing the Whatever class. In the new system, the code references whatever object was passed to the constructor. In a test you can easily replace that with a different object, if you want to test what happens when the dependency returns something unexpected. Similarly, if you want to reuse the code in a slightly different context, you can just change the constructor arguments so it references a different class, as long as that class implements the required interface.

This can make unit testing a lot easier, as you can isolate just the code you want to test, without worrying about the whole system. It can also make it quite easy to extend code. Imagine you have some front-end class that references a back-end class that deals with storing things in the database. If you suddenly need to use multiple storage backends, you can just substitute the implementation of the backend class, without having to implement anything special.
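As a rough sketch of that front-end/back-end example: every class and interface name below is hypothetical, made up just to show how the same front-end runs against a working backend and against a test double that fails on purpose.

```php
<?php
// All names here are hypothetical, chosen only to illustrate the idea.
interface NoteStore {
    public function save( string $note ): void;
    public function count(): int;
}

// Stand-in for a database-backed implementation; keeps notes in memory.
class InMemoryNoteStore implements NoteStore {
    private array $notes = [];

    public function save( string $note ): void {
        $this->notes[] = $note;
    }

    public function count(): int {
        return count( $this->notes );
    }
}

// A test double that simulates a broken backend.
class FailingNoteStore implements NoteStore {
    public function save( string $note ): void {
        throw new RuntimeException( 'storage unavailable' );
    }

    public function count(): int {
        return 0;
    }
}

// The front-end only knows the interface it was handed.
class NoteTaker {
    private NoteStore $store;

    public function __construct( NoteStore $store ) {
        $this->store = $store;
    }

    // Returns true if the note was stored, false if the backend failed.
    public function take( string $note ): bool {
        try {
            $this->store->save( trim( $note ) );
            return true;
        } catch ( RuntimeException $e ) {
            return false;
        }
    }
}
```

NoteTaker never names a concrete store, so testing the failure path is just a matter of passing FailingNoteStore to the constructor.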

MediaWikiServices

All this is great, but at some point you actually need to do stuff and create some classes that have concrete implementations.

The way this works in "new" MediaWiki is through the MediaWikiServices class. This is responsible for keeping track of "services", which are essentially the top level classes.

Services are classes that:

  • Have a lifetime of the entire request.
  • Normally only have one instance existing at a time.
  • Do not depend on global state. (Config is not considered state, but anything about the request, such as which page is currently being viewed, is state. Generally these services should not depend on RequestContext.)

You can register services in a service wiring file. This also allows you to register what services your service needs as constructor arguments.

Some classes do not fit these requirements. For example, for a class that represents some data, we would expect to have multiple instances with shorter lifetimes, each representing the data in question. Generally the approach for such classes is to create a Factory class that is a service, which makes the individual instances and passes along dependencies as appropriate.

But still the question remains: how do you get these services initially? There is an escape hatch, where you can call MediaWikiServices::getInstance()->getService( $foo ), however that is strongly discouraged. This essentially uses global state, which defeats the point of dependency injection, where the goal is that your class is passed everything it needs, but never reaches out and gets anything itself.
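A minimal sketch of the factory approach, using hypothetical names (Clock, Draft, DraftFactory) rather than anything from MediaWiki core. The factory is the long-lived service; the objects it makes are short-lived and receive their dependencies from it.

```php
<?php
// Hypothetical names throughout; this is not MediaWiki's actual code.

// A long-lived dependency that would itself be a service.
class Clock {
    public function now(): int {
        return time();
    }
}

// A short-lived data object: there may be many of these per request.
class Draft {
    private string $text;
    private int $createdAt;

    public function __construct( Clock $clock, string $text ) {
        $this->text = $text;
        $this->createdAt = $clock->now();
    }

    public function getText(): string {
        return $this->text;
    }

    public function getCreatedAt(): int {
        return $this->createdAt;
    }
}

// The factory is the service. It holds the dependencies and passes them on,
// so callers never need to know what a Draft requires.
class DraftFactory {
    private Clock $clock;

    public function __construct( Clock $clock ) {
        $this->clock = $clock;
    }

    public function newDraft( string $text ): Draft {
        return new Draft( $this->clock, $text );
    }
}
```

Code that needs drafts asks for DraftFactory as a constructor argument and calls newDraft() as often as it likes.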

The preferred solution is that the top level entrypoint classes in your extension are specified in your extension's extension.json file.

Typically extensions work on the levels of hooks. This is where you register a class, which has some methods that are called when certain events in MediaWiki happen. You register these hook handlers in your extension's manifest (extension.json) file.

In old MediaWiki, you would just register the names of some static methods. In new MediaWiki, you register a class (HookHandlers), along with details of what services its constructor needs. MediaWiki then instantiates this class for you with appropriate instances of all its dependencies, thus handling all the bootstrapping for you.

To summarize, you tell MediaWiki you need certain services for your hook class. When the hook class is instantiated, it is constructed with the default version of the services you need (possibly including services you defined yourself). Those services are in turn instantiated with whatever services they need, and so on.
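The chain of instantiation can be sketched with a toy container. This is not MediaWiki's real implementation, and every name in it (TinyContainer, Greeter, Shouter, MyHookHandler) is invented for the example. It only shows the shape of the idea: the handler is built from a list of service names, and each service is built lazily from its own wiring.

```php
<?php
// A toy container, loosely modelled on the idea of a service wiring file.
// It is NOT MediaWiki's own container; every name here is hypothetical.
class TinyContainer {
    /** @var callable[] */
    private array $wiring = [];
    private array $instances = [];

    public function define( string $name, callable $factory ): void {
        $this->wiring[$name] = $factory;
    }

    public function get( string $name ) {
        if ( !isset( $this->instances[$name] ) ) {
            // Created lazily, and only once per container.
            $this->instances[$name] = ( $this->wiring[$name] )( $this );
        }
        return $this->instances[$name];
    }

    // Build a class from a list of service names, roughly what happens
    // for a hook handler registered with its required services.
    public function construct( string $class, array $serviceNames ) {
        $args = array_map( [ $this, 'get' ], $serviceNames );
        return new $class( ...$args );
    }
}

class Greeter {
    public function greet( string $who ): string {
        return "Hello, $who";
    }
}

class Shouter {
    private Greeter $greeter;

    public function __construct( Greeter $greeter ) {
        $this->greeter = $greeter;
    }

    public function shout( string $who ): string {
        return strtoupper( $this->greeter->greet( $who ) ) . '!';
    }
}

class MyHookHandler {
    private Shouter $shouter;

    public function __construct( Shouter $shouter ) {
        $this->shouter = $shouter;
    }

    public function onSomething( string $who ): string {
        return $this->shouter->shout( $who );
    }
}

$c = new TinyContainer();
$c->define( 'Greeter', static function () {
    return new Greeter();
} );
$c->define( 'Shouter', static function ( TinyContainer $c ) {
    return new Shouter( $c->get( 'Greeter' ) );
} );

// The handler asks only for Shouter; Greeter is pulled in transitively.
$handler = $c->construct( MyHookHandler::class, [ 'Shouter' ] );
```

None of the three classes ever looks anything up itself; only the wiring closures touch the container.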

All this means you can write classes that never reach out to other classes, but instead are always provided with the other classes they need.

What does this have to do with Hashtags?

All this is not generally meant as an extension mechanism. The goal is to make it so classes are isolated from each other for easier testing, modification and understanding.

However, the services bootstrap process does have an idea of default services, which it provides when creating objects that have dependencies. The entire point is to be able to easily replace services with different services that implement the same interface. Sure it is primarily meant as a means of better abstraction and testability, but why not use it for extensibility? In this extension, I used this mechanism to allow extending some core functionality with my own implementation.

One of those services is called CommentParserFactory. This is responsible for parsing edit summaries (Technically, it is responsible for creating the CommentParser objects that do so). This is exactly what we need if we want to change how links are displayed in edit summaries.

In the old system of MediaWiki, we would have no hope in making this extension. In the old days, to format an edit summary, you called Linker::formatComment(). This was a static method of Linker. There would be no hope of modifying it, unless someone explicitly made a hook just for that purpose.

Here we can simply tell MediaWiki to replace the default instance of CommentParserFactory with our own. We do this by implementing the MediaWikiServicesHook. This allows us to dynamically change the default instantiation of services.

There are two ways of doing this. You can call $services->redefineService(), which allows you to create a new version of the service from scratch. The alternative is to call $services->addServiceManipulator(). This passes you the existing version of the service, which you can change, call methods on, or replace by returning an entirely different object.

Hashtags uses the latter method. I essentially implemented it by wrapping the existing CommentParser. I wanted to just replace hashtags with links, but continue to use MediaWiki core for all the other parsing.
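Here is a rough, self-contained sketch of the wrapping idea. The container and formatter classes are toy stand-ins I made up so the example runs without MediaWiki; only the method name addServiceManipulator() mirrors the real call mentioned above. The link target (?tagfilter=...) and the regular expression are likewise assumptions.

```php
<?php
// Toy stand-ins: only the name addServiceManipulator() mirrors MediaWiki.
class PlainFormatter {
    public function format( string $comment ): string {
        return htmlspecialchars( $comment );
    }
}

class HashtagFormatter extends PlainFormatter {
    private PlainFormatter $inner;

    public function __construct( PlainFormatter $inner ) {
        $this->inner = $inner;
    }

    public function format( string $comment ): string {
        // Let the wrapped object do all of its normal work first...
        $html = $this->inner->format( $comment );
        // ...then turn each #tag into a link (hypothetical URL scheme).
        return preg_replace(
            '/(?<![\w&])#(\w+)/',
            '<a href="?tagfilter=$1">#$1</a>',
            $html
        );
    }
}

class ToyServices {
    private array $factories = [];
    private array $manipulators = [];

    public function defineService( string $name, callable $factory ): void {
        $this->factories[$name] = $factory;
    }

    public function addServiceManipulator( string $name, callable $manipulator ): void {
        $this->manipulators[$name][] = $manipulator;
    }

    // No caching here, to keep the toy short.
    public function getService( string $name ) {
        $service = ( $this->factories[$name] )( $this );
        foreach ( $this->manipulators[$name] ?? [] as $manipulator ) {
            // In this toy, returning null keeps the current object.
            $service = $manipulator( $service, $this ) ?? $service;
        }
        return $service;
    }
}

$services = new ToyServices();
$services->defineService( 'Formatter', static function () {
    return new PlainFormatter();
} );
// The manipulator receives the existing service and returns a wrapper.
$services->addServiceManipulator( 'Formatter', static function ( PlainFormatter $old ) {
    return new HashtagFormatter( $old );
} );
```

Because the wrapper receives the already-built object, it never has to know how to construct the thing it wraps.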

Using Services as an extension mechanism

How did this go?

On the bright side, I was able to make this extension, which I otherwise would not have been able to. It does indeed work.

Additionally, I feel like replacing a class and implementing its interface is a much cleaner abstraction layer for extensibility than executing a callback at random points in the code that can do anything.

There are some frustrations though with using this approach.

First of all, everything is type hinted with the concrete class. This meant I had to extend the concrete class in order for all the type hints to work, even though I didn't really need or want to inherit any methods. Perhaps it would be nice to have some trait that automatically adds all the forwarding magic with __get().

I also intentionally did not call the parent constructor, which I guess is fine because I never call any of the parent's methods, however it does feel a bit odd. The way I implemented my class was to take, in the constructor, an instance of the "base" class that I forward calls to after wrapping them in my own stuff. Part of the reason for this is that I wanted to use the old CommentParser object: constructor signatures in MediaWiki are highly unstable between versions, so I didn't want to have to manually construct anything from MW core.
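A stripped-down illustration of that pattern, with made-up classes (CoreParser stands in for a core class with an unstable constructor). It is a sketch of the shape, not the extension's code.

```php
<?php
// Hypothetical stand-in for a core class whose constructor signature
// may change between releases.
class CoreParser {
    private string $prefix;

    public function __construct( string $prefix, int $someOption ) {
        $this->prefix = $prefix;
    }

    public function parse( string $text ): string {
        return $this->prefix . $text;
    }
}

// Code elsewhere type-hints the concrete class, not an interface.
function render( CoreParser $parser, string $text ): string {
    return $parser->parse( $text );
}

// Extends CoreParser only to satisfy the type hint above.
class WrappingParser extends CoreParser {
    private CoreParser $base;

    public function __construct( CoreParser $base ) {
        // Deliberately no parent::__construct(): the inherited state is
        // never used, because every public method is overridden and
        // forwarded to $base.
        $this->base = $base;
    }

    public function parse( string $text ): string {
        return '[' . $this->base->parse( $text ) . ']';
    }
}
```

The catch is that any parent method the wrapper forgets to override would run against uninitialized parent state, so this only holds up while every public method is forwarded.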

The part where this really falls down is if you want to have multiple extensions layering things. This really only works if they both do so in the same way.

At one point in my extension, I wanted to access the list of tags parsed. In retrospect, perhaps this would have been better implemented as a separate service alias that is explicitly for my class. Instead, I implemented it by adding an additional method to return this data and just relied on the normal service name. But I ran into a problem: I can't be sure nobody else redefined this service, so how do I know that the service I got is the one I created with this method? I just tested using instanceof, and if the check failed, I tried to wrap the returned service again in my class. This has the benefit that if anyone else modifies the CommentParser service in a layered way, I will hopefully run their code and the list of tags will match the result. On the other hand it's icky, so maybe the better method was the aforementioned separate service alias explicitly for my class. However, that again has the problem of how to bootstrap, since I didn't want to ever have to instantiate the base CommentParser class, with MediaWiki's highly unstable constructors.

This whole part seemed icky, and the various options seemed bad. In practice it maybe doesn't matter, as it's unlikely two extensions will wrap this service, but it seems like there should be a better way. I guess wrapping the service is fine; it's extracting additional information that's the problem.
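The instanceof fallback looks roughly like this. The class and function names are hypothetical, and the tag pattern is again just an assumption for the sketch.

```php
<?php
// Hypothetical stand-in for the service as core defines it.
class BaseParser {
    public function parse( string $text ): string {
        return $text;
    }
}

// My wrapper: forwards parsing, and remembers which tags it saw.
class TagTrackingParser extends BaseParser {
    private BaseParser $inner;
    private array $tags = [];

    public function __construct( BaseParser $inner ) {
        $this->inner = $inner;
    }

    public function parse( string $text ): string {
        preg_match_all( '/(?<!\w)#(\w+)/', $text, $m );
        $this->tags = array_merge( $this->tags, $m[1] );
        return $this->inner->parse( $text );
    }

    // The extra method that the plain service does not have.
    public function getTags(): array {
        return $this->tags;
    }
}

// Hypothetical helper: make sure we hold something with getTags().
function ensureTagTracking( BaseParser $service ): TagTrackingParser {
    if ( $service instanceof TagTrackingParser ) {
        return $service;
    }
    // Someone else replaced or wrapped the service. Wrap it again, so
    // their code still runs and the tag list is still available.
    return new TagTrackingParser( $service );
}
```

This keeps working when another extension layers on top, at the cost of possibly wrapping twice.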

I think documenting best practices on how to "decorate" services would go a long way here to making this better. Right now it is very unclear how to do this in a way that is composable and extensible.

I've also seen examples of people using reflection when trying to "extend" services, but luckily I didn't need to resort to that.

Testability

The last thing I wanted to mention is that I do think this general system has paid dividends. The path to get here was a little rocky, but I do think this style of code is much clearer.

I used to absolutely hate writing tests for MediaWiki code. It was such a frustrating developer experience. Now it is much less painful. For this extension I have 100% test coverage (excluding maintenance scripts). Admittedly it is a relatively small extension, but five years ago, I couldn't imagine bothering to do that for any MediaWiki code.

Conclusion

I had fun writing this extension and am proud of the result. It was interesting to experiment with using MediaWikiServices as an extension mechanism.

Consider trying it out on your wiki, and let me know what you think! The extension should be compatible with MediaWiki 1.39 and higher (using the appropriate REL branch).

More information about it can be found at https://www.mediawiki.org/wiki/Extension:Hashtags