Forums | Mahara Community
How "Bad" is MySQL
27 February 2009, 10:59 AM
I was just reading the thread in the 1.1 upgrade where someones Mahara Borked, and then noticed something in the Wiki about using Postgres if you possibly can.
I am using MySQL as that is what Moodle is running on and it keeps things simpler for me... however, having read the comments about postgres I am wondering if it is worth rebuilding with that database instead.
I would rather not do that... so really I am trying to understand what the issue is with MySQL... the big one seems to be no roll back on the DB during upgrade / migration... but as I always keep copies of the DB anyway I could still get back to the previous state.
Other than that are there many benefits to Postgres?
27 February 2009, 11:31 PMHehe.. cue the flamewar!
Rather than simply saying MySQL is terrible and Postgres is better, I'll try and outline a few good reasons based on the years I've spent (initially using MySQL, but now exclusively on Postgres if at all possible), and based on my "developer's viewpoint" - that is, looking whether the databases behave and perform the way they should, and not so much at whether they're "easy to use/newbie friendly", whatever that means.
And a disclaimer from the start.. firstly this is a long post, and secondly that we are committed to MySQL support in Mahara.
The most important thing you can ever have in a relational database system is data integrity - which means that under no circumstances does the database allow your data to get into an incorrect state. Many databases have lots of safeguards built in to ensure your data remains sane - foreign keys, check constraints, transactions etc... and so when we write Mahara we make want to make sure we can use these features to prevent people having issues like poor William had when he upgraded his site.
The main reason for this is because we're not perfect! Sometimes we make a mistake in the upgrade process, or a bug in the software, and then other people run into our bugs. The last thing we want is for someone to hit a bug that corrupts their data, because without that data, their site is worthless.
So hopefully that gives you a background into what we care most about. We also want to use a solution that both performs well (speed when returning a query or doing any database operation), scales well (is still speedy when many people are accessing it), is well documented (so we can find out how it works easily) and has a good security policy (so our users don't get stuck using something that hackers can take advantage of), among many other things. But data integrity is the most important thing.
Given our need for data integrity, the first thing we see is that we can't use MySQL's "MyISAM" storage engine for any tables. It doesn't support transactions, which clearly will allow data to become corrupt. It doesn't matter that the engine is fast - it doesn't do the most important job a database should do. So MyISAM is out. From Mahara 1.1, if your MySQL setup only allows MyISAM tables, you won't be allowed to install Mahara.
Also, MySQL's "NDB Advanced Clustering Engine" is also out. If you want to try and scale your installation this way, it's a bad idea - it requires you turn off foreign keys.
The only MySQL engine that has most of the major data integrity features is InnoDB. It supports some transactions, and it supports foreign keys. But it is still missing some important features too: CHECK constraints and transactions on database modification operations (CREATE/ALTER TABLE etc.). So we see that we can use InnoDB and it will work "most" of the time, but it's not perfect.
On the other hand, PostgreSQL doesn't have the concept of database engines. It's just got one way to store tables, that uses its MVCC model for data integrity. This model fully supports transactions (on database modification as well as INSERT/UPDATE etc.), supports CHECK constraints, foreign keys, and also supports some other useful features such as indexes on functions of columns, that Mahara sometimes uses. Here are some examples of where Mahara uses these data integrity features:
- Transactions - these are used everywhere! I won't bother to explain what they are, I'm assuming you know what they are. But I will re-iterate the point that PostgreSQL supports transactions for altering the database structure itself, something that MySQL can't do. Not even Oracle can do this! This is explained on the installation instructions page, but I'll point it out again that this means we can do upgrades without fear that if they fail, the database will be in a corrupted state. The benefits of this are obvious!
- Foreign keys - used everywhere again! Everyone knows what these are. But it amazes me that so few open source software projects don't use them. Drupal doesn't, Moodle doesn't, and so many more - yet it's a near criminal offence. Corrupted data is a nightmare to work with, it causes software bugs and "strange behaviour" that can be a mess to deal with.
- Check constraints - again, common everywhere except open source, for some reason. Mahara 1.1 ships with one in particular that will be useful. In the view and artefact tables, only one of the owner, group or institution columns can have a value. In Postgres, we have a check constraint for this, in MySQL we can't have one. So users on MySQL are more likely to suffer if there is a bug that results in more than one of these columns being set.
- Indexes on functions of columns - best demonstrated with an example. In Mahara, usernames are case insensitive, so we want to make sure that nobody can insert the user 'BOB' if there is already a user 'bob'. In postgres, we can simply create a unique index on LOWER(username), and our problem is solved. We can't create this index in MySQL, so again MySQL users miss out.
The other things that people seem to care about most are performance and scalability. As explained before, we can't use MyISAM, so we compare InnoDB against Postgres. The simple answer is that, speed wise, they're comparable - although Postgres provides all the extra integrity features. The truth is though, the database isn't the bottleneck on small sites - the main bottleneck is CPU when processing the PHP pages.
On bigger sites, the scalability of the database becomes important. And sadly, MySQL simply falls over flat. Under high concurrent user access, MySQL's performance drops through the floor. What's worse is that it has bad locking characteristics. For example, when moodle.org does its backups, the site is basically unusable. Whereas Postgres is much smarter about locking, and doesn't need full table locks for almost any operation it performs.
The last thing I'll mention is this: have you ever had to use REPAIR TABLE in MySQL? What a joke. It's sad the command even exists, let alone the fact that it's needed so often. There are clearly some deep-seated bugs in MySQL to do with integrity that they need to sort out.
Anyway, this has been an extremely long post, but I hope it can serve as a reference for everyone when choosing what database to use for their Mahara installation. I'll summarise with my personal opinion, and a bunch of links you can read to get some more background.
Personally, I wish we didn't have to support MySQL in Mahara. It requires us to do more testing, work around more problems and make compromises. If anyone tells me they're a "MySQL expert", I just laugh to myself, because I think that if they were a "database expert" they'd be using Postgres. But most people use MySQL, because it's the "database" with all the mindshare right now, so we support it in Mahara.
MySQL support in Mahara isn't ever going to disappear, don't get me wrong. But I think you're selling yourself short if you use it when you could use Postgres instead.
Here are some links:
- http://www.wikivs.com/wiki/MySQL_vs_PostgreSQL - a reasonable, recent comparison. It has some facts about Postgres incorrect, such as the entire section about Insert Ignore/Replace (postgres can easily insert more than one row at a time, in 8.3 it adds support for the multiple form of INSERT, but it's also supported COPY for ever)
- http://tweakers.net/reviews/657/6 - MySQL vs. Postgres scalability
- http://wiki.postgresql.org/wiki/Why_PostgreSQL_Instead_of_MySQL:_Comparing_Reliability_and_Speed_in_2007 - Yes it's on the postgres wiki, but that at least means the facts about postgres are correct
- http://wiki.postgresql.org/wiki/Transactional_DDL_in_PostgreSQL:_A_Competitive_Analysis - my example to the author of the previous report encouraged this page
28 February 2009, 2:03 PM
Your arguments are convincing, and I'm willing to give Postgres a try, hopefully as I do the upgrade from Mahara 1.0.9 to 1.1.0.
So... how difficult is it to convert a MySQL database (our existing Mahara installation) to Postgres?
28 February 2009, 3:08 PM
Hi Dave -
I've done a few conversion, I did both Moodle and Serendipity - unfortunately I don't know where the Moodle script I wrote is anymore, but I wrote a blog post about how I migrated Serendipity here: http://she.geek.nz/archives/428-migrated-to-postgres.html
01 March 2009, 8:13 AM
I want to add two more things. The initial data we are seeing from people registering their Mahara installs is that, not surprising, most installations are on MySQL. But there's a few out there on Postgres, which is encouraging to see. And if I meet any Mahara admin out there, perhaps at a conference or other future event, who is running Mahara on postgres, I will buy you a beer to celebrate you making the right choice . Just approach me and say "I'm running Mahara on Postgres, buy me a beer" and you've got it!
So there you go. Postgres is the smarter choice, and it'll win you a beer. It can't get much better!
Last thing - if you're having trouble switching for whatever reason, you're welcome to ask postgres related questions in the open discussion forum, I for one will be happy to help
01 March 2009, 11:13 AM
Nice to know I'll have a free beer when we meet! I'll buy you a mojito in return
26 October 2009, 11:18 AMYay! I have several postgres server running. Are you going to buy me one beer per cluster? :-p
26 October 2009, 6:09 PM
I was careful about the wording of my previous post - I am going to interpret it as "if you run a mahara on a postgres, you are entitled to one beer in total"
Though for you I would buy more beers, for adminlang and other such assistance
25 October 2009, 10:20 AM
We should have listened to Nigel earlier
We started with MySQL and found ourself helpless, we have finally shifted to PostgreSQL.
==== Details ====
We started with MySQL simply because we had experiance in it and we had no problems in past.
I think the real problem is not with MySQL but with the way Mahara is written, if for any reason Mahara installation is interepted (browser crash or something) then with MySQL it generates an error and the only way to continue is by dumping all tables. This can be very bad during an upgrade. I am not at all saying that Mahra is bad, it is only it's support with MySQL. We encountered problems with Mahara-MySQL only with installations and upgrades, normal operations and working of Mahara was found to be OK with MySQL.
With PostgreSQL Mahara rocks! We tested mahara many times with PostgreSQL by forcing intereption in installation. All Mahara-PostgreSQL installations continued perfectly.
26 October 2009, 2:49 AM
Glad you're having a good experience with Postgres! I feel like I need to reply to your points about MySQL support though.
The Mahara developers do take MySQL support seriously, but unfortunately there are underlying features of the database that just don't work.
Postgres supports transactional DDL statements. That means that you can start a transaction, send a mixture of schema and data statements, and then commit them all together. If this is interrupted halfway through, the transaction is not committed, and your database is left in the state as it was in when you started the transaction.
The problem with MySQL is not the way that Mahara is written, it is that it doesn't support transactional DDL statements. Anything can go wrong during an upgrade, even at the underlying network level between the webserver and the database server (eg, nothing to do with Mahara at all), and you'll still have the same problem with MySQL.
I hope that clarifies a bit where the problem lies :)