Forums | Mahara Translation
Unused Strings
05 November 2010, 14:03
It appears to me that the $string['privacydefaultcontent'] in htdocs/lang/en.utf8/install.php is unused. I'm going to keep the string in en_US for now, just in case it gets used in the future, but it raises the question:
What is the appropriate way to deal with unused strings?
Post a query about them, like this post?
Ignore them?
File a bug in the tracker?
I guess a related question is: Am I mistaken about that string being unused?
06 November 2010, 14:20
That is a very good question, Rich, and I would also like to know how other translators are dealing with this issue.
I recall that I deleted the obsolete strings from v1.1 when upgrading the Spanish language package to v1.2.
07 November 2010, 14:41
We should delete unused strings from the default language pack; otherwise people will waste time translating them. So if you find any, file a bug, I guess.
There are probably quite a few unused strings lying around because it's so easy to forget to remove them when you're deleting the code that uses them.
It'd be quite an effort to find them all, though I guess I'd start with a simple script to grep through and find all the strings that obviously *are* being used and produce a shorter list to be checked by a human.
It can be a bit annoying trying to determine whether strings are used because they're not always referenced by the entire key in the code. For example, I think 'privacydefaultcontent' is used during the initial installation where the placeholder content for 'site pages' gets entered into the database. But the line of code that uses it is something like:
$page->content = get_string($page->name . 'defaultcontent', 'install');
so grepping for 'privacydefaultcontent' doesn't find it.
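To make the pitfall concrete, here is a rough Python sketch of the "grep for the whole key" approach. The language-file and code snippets are made-up samples (only the get_string line is taken from this thread), and the helper names are hypothetical, not anything in Mahara:

```python
import re

# Hypothetical sample of a language file (not the real install.php).
LANG_FILE = """
$string['privacydefaultcontent'] = 'Privacy statement';
$string['aboutdefaultcontent'] = 'About us';
$string['cancel'] = 'Cancel';
"""

# Sample code: the first line is the one quoted above, the second is invented.
CODE = """
$page->content = get_string($page->name . 'defaultcontent', 'install');
echo get_string('cancel');
"""

def defined_keys(lang_source):
    """Return every key assigned as $string['key'] in a language file."""
    return re.findall(r"\$string\[\s*'([^']+)'\s*\]\s*=", lang_source)

def keys_without_literal_use(keys, code):
    """Return keys whose full, quoted name never appears in the code."""
    return [k for k in keys if f"'{k}'" not in code and f'"{k}"' not in code]

candidates = keys_without_literal_use(defined_keys(LANG_FILE), CODE)
print(candidates)  # ['privacydefaultcontent', 'aboutdefaultcontent']
```

Both "defaultcontent" keys get flagged even though the code does use them, which is why the resulting list still needs a human to look it over.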
08 November 2010, 12:31
OK, so we can't just grep for the key, because sometimes it is created through concatenation or it is represented by a variable or whatever.
What are the things I should grep for to get a list of everything (even if it's variable expansion, concatenation, etc.) that is used? Obviously get_string. get_string_from_language too, yeah? Anything else? Seems like nothing outside of mahara.lib uses get_string_location, get_string_local, or get_string_from_file.
Maybe I'll start the tedious process....
Maybe...
Oh, and of course, you're totally right about the $page->name . 'defaultcontent'. It's in upgrade.php, exactly as you remember it. So, I guess privacydefaultcontent gets used after all.
08 November 2010, 14:35
Hi Rich,
That would be great if you could be bothered to clean these up!
The other thing to grep for would be the template function str:
{str tag=foo section=bar}
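As an illustration only, template tags of that shape could be picked apart with something like the Python below. It assumes attributes are written as name=value with optional quotes, which is a guess based on the one example above rather than a full description of the template syntax. The function name is made up.

```python
import re

# Matches "{str ...}" and captures everything up to the closing brace.
STR_TAG = re.compile(r"\{\s*str\s+([^}]*)\}")
# Matches name=value, where value may be double-quoted, single-quoted or bare.
ATTR = re.compile(r"""(\w+)\s*=\s*("[^"]*"|'[^']*'|[^\s}]+)""")

def str_tag_pairs(template):
    """Return (tag, section) for each {str} tag; section is None if omitted."""
    pairs = []
    for body in STR_TAG.findall(template):
        attrs = {name: value.strip("'\"") for name, value in ATTR.findall(body)}
        if "tag" in attrs:
            pairs.append((attrs["tag"], attrs.get("section")))
    return pairs

sample = "<h1>{str tag=foo section=bar}</h1> <button>{str tag='cancel'}</button>"
print(str_tag_pairs(sample))  # [('foo', 'bar'), ('cancel', None)]
```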
The cases like privacydefaultcontent where the key doesn't appear anywhere are hopefully rare, so if you did grep for the keys, I think you'd end up with a list mostly made up of unused strings that wouldn't be too long.
A lot of the strings in that shorter list will have keys that are obviously referenced by concatenation, for example there will be several strings whose keys end in "defaultcontent", and lots of strings beginning with "country.".
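One way to thin out that shorter list automatically, sketched in Python: collect the literal fragments that get concatenated with a variable inside get_string calls, then drop any candidate key that ends or starts with one of them. The 'country.' line in the sample is invented for illustration, and the regexes only cover the simplest "$var . 'text'" and "'text' . $var" shapes.

```python
import re

# Sample code; only the first line is quoted in this thread, the second is hypothetical.
CODE = """
$page->content = get_string($page->name . 'defaultcontent', 'install');
$label = get_string('country.' . $code);
"""

# get_string($something . 'suffix' ...): keys ending in "suffix" are built at run time.
SUFFIX = re.compile(r"get_string\(\s*\$[\w>-]+\s*\.\s*'([^']+)'")
# get_string('prefix' . $something ...): keys starting with "prefix" are built at run time.
PREFIX = re.compile(r"get_string\(\s*'([^']+)'\s*\.\s*\$")

def built_at_runtime(key, code):
    """True if the key matches a suffix or prefix that the code concatenates."""
    suffixes = SUFFIX.findall(code)
    prefixes = PREFIX.findall(code)
    return (any(key.endswith(s) for s in suffixes)
            or any(key.startswith(p) for p in prefixes))

candidates = ["privacydefaultcontent", "country.nz", "oldunusedkey"]
still_suspect = [k for k in candidates if not built_at_runtime(k, CODE)]
print(still_suspect)  # ['oldunusedkey']
```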
09 November 2010, 10:15
To find everything, I ran this on a UNIX-like operating system from the mahara directory:
find htdocs -type f -exec perl -wnle '/\{\s*str\s|get_string[\s\(]|get_string_from_language[\s\(]/ and print "$ARGV: $_"' {} \;
(The above is one big line. Ignore any line breaks your browser is inserting. Just cut and paste.)
When I run that I get 3335 lines of output.
Ouch.
09 November 2010, 14:32
That's a daunting list alright but it just tells us all the places where strings are used. We could get a smaller list if we first turned that output into (key, section) pairs (might require more context for get_string calls spanning >1 line), and then did a pass over all the language pack files looking for occurrences of every key,section and spitting out ones that don't match at all. I hope that'd be shorter!
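Roughly, that two-step idea might look like the Python sketch below. The sample data is invented, the regex only understands calls whose first two arguments are plain string literals, and treating a missing section as 'mahara' is an assumption that should be checked against get_string's actual default.

```python
import re

# Captures the key and, if present, the section of a get_string call
# whose arguments are plain quoted literals.
CALL = re.compile(
    r"""get_string\(\s*['"]([^'"]+)['"]\s*(?:,\s*['"]([^'"]+)['"])?"""
)

def used_pairs(code, default_section="mahara"):
    """Return the set of (key, section) pairs referenced literally in code.

    default_section is an assumption, not confirmed in this thread.
    """
    return {(key, section or default_section)
            for key, section in CALL.findall(code)}

# Hypothetical language pack contents: section -> keys defined there.
defined = {"mahara": ["cancel", "yes", "oldkey"], "error": ["accessdenied"]}
code = "get_string('cancel'); get_string('accessdenied', 'error'); get_string( 'yes' )"

defined_pairs = {(k, s) for s, keys in defined.items() for k in keys}
unused = sorted(defined_pairs - used_pairs(code))
print(unused)  # [('oldkey', 'mahara')]
```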
10 November 2010, 10:57
Dealing with multiple-line entries: Just strip all whitespace out of the file before processing.
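A tiny Python illustration of why that helps (the sample call is made up): the same lazy pattern finds nothing in the original text because "." does not cross newlines, but it matches once the whitespace is gone.

```python
import re

# Hypothetical get_string call spread over several lines.
source = """
$msg = get_string(
    'accessdenied',
    'error'
);
"""

pattern = r"get_string\((.*?)\)"

before = re.findall(pattern, source)  # "." stops at the newline, so no match
flat = re.sub(r"\s+", "", source)     # strip all whitespace
after = re.findall(pattern, flat)

print(before)  # []
print(after)   # ["'accessdenied','error'"]
```

The trade-off is that spaces inside quoted strings disappear too, which should not matter for string keys but would mangle any other literal text.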
At the bottom of this post is a Perl script to find all instances of get_string(). We can add get_string_from_language and the {str} stuff later.
Problems with the output:
1) There are false positives because the code base has a JavaScript function called get_string() in a few files.
* Proposed solution: Ignore files with a .js extension. Sound about right?
2) If there are other method calls inside the parentheses that delineate the parameters for the get_string method, that messes up the regexp.
* Possible workaround: No need to worry about it as the output it provides is nonetheless sufficient. That may be wishful thinking.
* Proposed solution: Brush up on the regexp-fu and fix it. Or maybe regexp is the wrong tool for the job and we need to use a parser instead. Um...ugh.
3) False positives from comments?
* Possible workaround: Maybe there aren't any of those once we get rid of the JavaScript false positives....
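For problems 1 and 2, here is a hedged Python sketch of a middle road between a regexp and a full parser: skip .js files by extension, and walk the text after "get_string(" while counting parentheses and tracking quotes. The function names and the sample line are made up, it only recognises "get_string(" with no space before the parenthesis, and it does nothing about comments (problem 3).

```python
import re

def should_scan(path):
    """Skip JavaScript files, which define their own get_string()."""
    return not path.lower().endswith(".js")

def get_string_arguments(source):
    """Return the raw argument text of each get_string(...) call,
    honouring nested parentheses and quoted strings."""
    results = []
    marker = "get_string("
    start = source.find(marker)
    while start != -1:
        i = begin = start + len(marker)
        depth, quote = 1, None
        while i < len(source) and depth > 0:
            ch = source[i]
            if quote:
                if ch == "\\":
                    i += 1          # skip the escaped character
                elif ch == quote:
                    quote = None    # closing quote
            elif ch in "'\"":
                quote = ch          # opening quote
            elif ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            i += 1
        if depth == 0:
            results.append(source[begin:i - 1])
        start = source.find(marker, i)
    return results

# Hypothetical line with a nested method call inside the arguments.
line = "echo get_string('viewtitle', 'view', hsc($view->get('title'))) . get_string('yes');"

naive = re.findall(r"get_string\((.*?)\)", line)
print(naive[0])                    # cut off at the first ")"
print(get_string_arguments(line))  # full argument lists
```

With the sample line, the lazy regexp stops at the first closing parenthesis, while the scanner returns the whole argument list for each call.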
As is, the script finds 2551 instances of get_string() using 1819 different parameter variations. The most common parameter is ('cancel') followed by ('accessdenied','error'), ('yes'), ('no'), ('save'), and ('submit').
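For anyone wanting to reproduce that kind of tally from a one-match-per-line listing, a short Python sketch follows. The input lines here are invented, not the real output.

```python
from collections import Counter

# Hypothetical sample of one-parameter-list-per-line output.
lines = [
    "'cancel'",
    "'accessdenied','error'",
    "'cancel'",
    "'yes'",
    "'cancel'",
    "'yes'",
]

counts = Counter(lines)
print(len(lines), "instances,", len(counts), "different parameter variations")
print(counts.most_common(2))  # [("'cancel'", 3), ("'yes'", 2)]
```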
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

if (! defined($ARGV[0])) {
    print "Usage: $0 /path/to/mahara/htdocs\n";
    exit;
}

my $pathToMaharaHtdocs = $ARGV[0];

find(\&wanted, $pathToMaharaHtdocs);

sub wanted {
    # Ignore directories
    -d and return;

    my (@matches, $argument, $match);

    open(FILE, $File::Find::name) or die "Can't read file $File::Find::name : $!";

    # Slurp the whole file into one string
    local $/;
    my $document = <FILE>;
    if (! defined($document)) {
        die "Can't read $File::Find::name : $!";
    }
    close (FILE);

    # Strip all whitespace so calls spanning several lines match too
    $document =~ s/\s+//g;

    # Capture whatever sits between get_string( and the next )
    @matches = $document =~ /get_string\((.*?)\)/g;
    foreach $match (@matches) {
        print "$match\n";
    }
}