Forums | Mahara Translation
Automatic validation of language packs
01 July 2010, 0:47
Recently while doing a security review of Mahara, we thought it'd be a good idea to try to remove anything potentially dangerous from the language packs, because we currently don't check through them ourselves, and because they can include any old php code in them.
So we now have a script that removes pretty much anything from the language files that isn't just of the form "$string['x'] = 'y';" (and a few variations on this). I've run it over the existing language packs, and then also checked the resulting files for any php syntax errors and also for non-utf8 characters.
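To make the idea concrete, here is a rough Python sketch of that kind of whitelist filter. It is not the actual script: the regular expression, the function names, and the decision to accept only single-quoted keys and values are all assumptions for illustration, and the real script accepts "a few variations" that this one does not.

```python
import re

# Hypothetical pattern: a single-quoted key and a single-quoted value
# (backslash escapes allowed), and nothing else on the line.
STRING_LINE = re.compile(
    r"""^\s*\$string\['[A-Za-z0-9_./-]+'\]\s*=\s*'(?:[^'\\]|\\.)*';\s*$"""
)


def strip_php(source: str) -> tuple[list[str], list[str]]:
    """Split a language file into (kept, removed) lines.

    Blank lines, the PHP open/close tags, and plain string assignments
    are kept; everything else is treated as potentially dangerous.
    """
    kept, removed = [], []
    for line in source.splitlines():
        if line.strip() in ("", "<?php", "?>") or STRING_LINE.match(line):
            kept.append(line)
        else:
            removed.append(line)
    return kept, removed


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the raw file contents decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

The removed lines are what a diff report would show. A string that concatenates a function call onto its value fails the pattern and is dropped as a whole line.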
The results are at http://langpacks.dev.mahara.org/
On that page, 'diff' is a link to show what changes were made by the php code remover, in diff format. The 'errors' link shows any php syntax or utf8 errors.
A script runs on that server once every hour to pull the most recent changes from git and rerun the validation automatically. Eventually (once we get rid of all the errors), it'd be great to change the official links on wiki.mahara.org to point to these cleaned-up language packs instead of the raw versions from git. (Because these new language packs won't be on gitorious.org, this will have the added advantage of avoiding the annoying "Regenerating tarball, try again later" message.)
When the script runs and strips php out, it will still regenerate the tarball on that page. The worst that can really happen in that situation is that some strings go missing. Syntax errors and utf8 errors will stop the tarball from being generated, because these errors could stop Mahara from working properly (see for example https://bugs.launchpad.net/mahara/+bug/513331).
For any of you who are committing directly to gitorious, it'd be great if you could have a look at those files and fix any errors there. For the others, I'll try to clean the php stuff myself in the next couple of weeks. I can't easily fix the utf8 errors without knowing what characters should be used.
01 July 2010, 1:41
Luckily, Japanese language packs are all fine now.
01 July 2010, 5:39
I have just fixed the es.utf8 language pack :-)
01 July 2010, 6:44
Hello, I just saw Iñaki's corrections (we were looking at the repository package at the same time). Gracias, Iñaki. I will change the links in the Language page to point to the validated package. Regards
01 July 2010, 6:54
Just one last remark: the code that the php remover deleted (apart from the non-compliant utf8 characters) was added because in some Mahara installations, forum dates were displayed in English rather than in Spanish. Here is the discussion thread: http://mahara.org/interaction/forum/topic.php?id=723 So, isn't langconfig.php the best file to insert the code? Regards, Mari
01 July 2010, 16:32
I think I've fixed that problem for the master branch at least, by calling setlocale when the site language is first determined and adding an actual locales string into langconfig, see this bug:
I hope all you need to do is change that langconfig stuff to this:
$string['locales'] = 'es_ES.UTF-8,spanish';
I'll also need to apply that fix to 1.1/1.2, so it's probably best to at least wait for the next 1.1/1.2 releases before we link to the new tarballs.
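For anyone wondering how a comma-separated locales string like that can be used, here is a small Python sketch of the "try each name until one works" idea. It is only an illustration, not the Mahara code: the function name and the choice of `LC_TIME` are assumptions, and which locale names actually succeed depends on what is installed on the server.

```python
import locale


def set_first_available_locale(locales: str, category=locale.LC_TIME):
    """Try each comma-separated locale name in order.

    Returns the first name the system accepts, or None if none of them
    is available (in which case the current locale is left unchanged).
    """
    for name in (part.strip() for part in locales.split(",")):
        if not name:
            continue
        try:
            locale.setlocale(category, name)
            return name
        except locale.Error:
            # This name is not installed on this system; try the next one.
            continue
    return None
```

With a value such as 'es_ES.UTF-8,spanish', the second name acts as a fallback for systems that do not recognise the first.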
02 July 2010, 6:46
Thanks, Richard.
08 August 2010, 16:26
nl (Dutch) is fixed too.
10 August 2010, 20:42
Cool, thanks Koen. By the way, those strings that are being removed by the processor (e.g. in http://langpacks.dev.mahara.org/nl-1.2_STABLE-diff.txt) can be fixed by removing the "get_config('wwwroot')" from the string.
A while back we removed all those from the default English langpack. If you check the English version of those strings you should see what you need to do -- in most cases it'll just be a matter of replacing "get_config('wwwroot')" with "%s".
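As a quick illustration of why the "%s" form works, here is a Python sketch of placeholder substitution. The string key, the string text, and the `render_string` helper are made up for the example and are not taken from any language pack or from Mahara itself; the assumption is that the caller passes the site root in as an argument when the string is displayed.

```python
# Hypothetical language string: the site root is a placeholder rather
# than a function call embedded in the language file.
lang_strings = {
    "termslink": 'See the <a href="%sterms.php">terms</a>.',
}


def render_string(key: str, *args: str) -> str:
    """Look up a string and fill its %s placeholders with args."""
    template = lang_strings[key]
    # Only format when arguments are given, so strings that contain a
    # literal percent sign and take no arguments pass through untouched.
    return template % args if args else template


print(render_string("termslink", "https://example.org/"))
```

Because the language file then holds only a plain string, it passes a filter that removes anything other than simple assignments.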
12 August 2010, 6:45
Thanks for the follow up Richard. Should all be fixed now.