Monday, March 17, 2008

Mediawiki on Postgresql: not ready for prime time?

Last weekend I spent a fair amount of time trying to install Mediawiki (MW) 1.12 with a Postgresql (PG) database backend. In the end I gave up and decided to stay on a MySQL backend, because Mediawiki's support for Postgresql seems not ready for prime time, to say the least. Is anyone out there actually using it in production?

The background to this is that I have an old Mediawiki at work. I installed version 1.3 in 2004 mostly as a lark, because I enjoyed Wikipedia and thought wikis were a neat idea. I installed it on a disused old Dell Celeron computer running Debian GNU/Linux.

Fast forward four years, and the wiki has somehow become a mission-critical enterprise collaboration tool. Yet it's still running on this old minitower tucked away behind a desk in my office. Getting the official IT crew to take over the wiki and run it on a faster system in the server rack has been on their to-do list for approximately forever. There has been some resistance on the rather shaky ground that MySQL was not an 'approved' service, mainly because Postgresql is the open source database of choice where I work, and we already have a large, well-administered PG server for just this kind of application.

I have nothing against MySQL per se, though I definitely think PG is by far the superior system, for a number of reasons that I may expound on in another post. In truth the database backend didn't matter much; there wasn't much ongoing DB-level maintenance to do whichever backend was in use. I already had the backups taken care of: both the SQL dump and the mediawiki directory itself are compressed and copied to a backup server every night.

So anyway, I've been trying to migrate the Mediawiki to Postgresql in preparation for moving it to the server rack.

I was able to create a new blank site running under PG 8.3 using the latest dev checkout from Subversion, in other words the most up-to-date code. That worked fine, I guess, but I started running into problems migrating the data from the current production wiki.

These are the problems I ran into:

  • The included mediawiki_mysql2postgresql.pl script was very out of date. This script is designed to connect to your MySQL database and create an output file suitable for import into Postgresql. I ran into numerous problems, tried fixing a few of them, ran into others, then gave up. I talked to someone on IRC who I believe maintains this script, and he agreed that yes, the script was out of date and that at some point maybe someone would fix it.
  • OK, so the sanctioned migration script doesn't work. I'd heard something about a platform-independent XML dump feature in Mediawiki, so I tried that out. A comment at the top of that Perl script suggests running the XML dump instead of doing a direct conversion (perhaps an admission of the conversion script's broken state?). Slight problem with that: the XML format does NOT include quite a bit of data, such as user accounts. It just has the pages themselves. Clearly my users won't be happy if their user accounts are gone; so much for the XML idea. I did at least take this opportunity to upgrade the production wiki from MW 1.4 to 1.12 in order to gain access to the XML dump feature, and presumably a bunch of other bug fixes and features. That upgrade went pretty smoothly, though I did have to run some maintenance scripts afterwards to rebuild things like watchlists.
  • I briefly toyed with writing my own conversion script in Python. It connected to both the MySQL and PG databases and copied the data over directly, performing a few type conversions and table renames along the way. This actually worked pretty well, except that I ran into a problem with the PG sequence objects when trying to use the resulting wiki. However, once I noticed the next problem, I gave up on trying to fix the sequence issue.
  • The straw that broke the camel's back, as it were, was that when running under PG, many of the command line tools in the 'maintenance' directory either stopped with seemingly random tracebacks, or logged output indicating query problems and yet carried on regardless. This did not inspire confidence, so I elected to give up on this endeavor and keep using the MySQL backend.
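For anyone attempting the same kind of hand-rolled copy, here is a minimal Python sketch of two fix-ups such a script plausibly needs. It assumes that MediaWiki's MySQL tables store timestamps as 14-digit UTC strings (YYYYMMDDHHMMSS) while the PG schema uses TIMESTAMPTZ columns, and that each id column draws its default from an explicitly named sequence. The table and sequence names in SEQUENCES are illustrative guesses, not checked against the MW 1.12 schema. A common cause of sequence trouble after a direct copy is that rows arrive with explicit ids while the sequence still starts at 1, so the first new row collides with an existing primary key; resetting each sequence with setval() avoids that.

```python
from datetime import datetime, timezone

# Hypothetical (table, id column, sequence) triples: verify the real names
# against your own PostgreSQL schema (e.g. with psql's \ds) before use.
SEQUENCES = [
    ("mwuser", "user_id", "user_user_id_seq"),
    ("page", "page_id", "page_page_id_seq"),
    ("revision", "rev_id", "rev_rev_id_seq"),
]


def mw_timestamp_to_pg(ts):
    """Turn a 14-digit UTC timestamp (YYYYMMDDHHMMSS) read from MySQL into
    an ISO 8601 string suitable for a PostgreSQL TIMESTAMPTZ column."""
    if isinstance(ts, bytes):  # binary columns may come back as bytes
        ts = ts.decode("ascii")
    dt = datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return dt.isoformat()


def quote_ident(name):
    """Double-quote a PostgreSQL identifier, escaping embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    """Single-quote a PostgreSQL string literal, escaping embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def setval_statements(sequences):
    """Build one setval() call per sequence so that the next nextval()
    returns MAX(id) + 1, or 1 if the table is empty (is_called = false)."""
    stmts = []
    for table, column, seq in sequences:
        stmts.append(
            f"SELECT setval({quote_literal(seq)}, "
            f"(SELECT COALESCE(MAX({quote_ident(column)}), 0) + 1 "
            f"FROM {quote_ident(table)}), false);"
        )
    return stmts


if __name__ == "__main__":
    print(mw_timestamp_to_pg(b"20080317123456"))
    for stmt in setval_statements(SEQUENCES):
        print(stmt)
```

The timestamp conversion would run on each row during the copy; the setval() statements would be run once, through psql or a DB-API cursor, after all the data has been loaded.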

1 comment:

NSK Nikolaos S. Karastathis said...

Citizendium uses Postgres with MediaWiki: http://en.citizendium.org/wiki/Special:Version