Last modified: 2014-07-19 01:19:54 UTC
Sometimes a lot of languages need to be processed in one batch, and initializing a Language object for each of them wastes a lot of memory. Some refactoring is needed here: perhaps a separate lightweight class just for retrieving simple language info, or lazier initialization.
Being more specific about the use case than "sometimes" would be very helpful. As a first step, you may want to add profiling calls to get more detailed insight.
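One way to take that first step without any framework support is to measure how much PHP heap a batch of objects keeps alive. The sketch below is a hypothetical helper built on PHP's built-in memory_get_usage(); the helper name and the fake per-language workload are assumptions, not MediaWiki's profiler.

```php
<?php
// Hypothetical helper: returns a callable's result together with the growth
// in PHP heap usage (bytes) measured while that result is still referenced.
// memory_get_usage() is a built-in PHP function.
function measureRetainedMemory( callable $build ) {
	$before = memory_get_usage();
	$result = $build(); // keep the result alive so its memory is counted
	$after = memory_get_usage();
	return [ $result, $after - $before ];
}

// Usage: a stand-in for constructing one heavy object per language.
list( $objects, $bytes ) = measureRetainedMemory( function () {
	$out = [];
	foreach ( [ 'en', 'de', 'fr', 'he', 'ar' ] as $code ) {
		$out[$code] = array_fill( 0, 1000, $code ); // fake "messages"
	}
	return $out;
} );
echo count( $objects ) . " objects retain about $bytes bytes\n";
```

Swapping the fake workload for real Language construction would show how the cost grows with the number of languages.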
(In reply to comment #1)
> Being more specific on the use case than "sometimes" would be very helpful.

In both cases I need to retrieve the dir value for a long list of languages. See the blocked bugs.
(In reply to comment #2)
> (In reply to comment #1)
> > Being more specific on the use case than "sometimes" would be very helpful.
>
> In both cases I need to retrieve the dir value for a long list of languages.

Seems like a good candidate to store in a cache object.
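A minimal sketch of the cache idea, assuming a purely in-process memo; a persistent shared cache would additionally need invalidation when the localisation data changes. LanguageDirCache and its $loader callable are hypothetical; the loader stands in for whatever expensive lookup currently requires a full Language object, and it runs at most once per language code.

```php
<?php
// Hypothetical in-process cache for per-language direction values.
class LanguageDirCache {
	private $loader;
	private $dirs = [];
	public $loads = 0; // number of loader calls, for illustration

	public function __construct( callable $loader ) {
		$this->loader = $loader;
	}

	public function getDir( $code ) {
		if ( !isset( $this->dirs[$code] ) ) {
			// Cache miss: do the expensive lookup once and remember it.
			$this->loads++;
			$this->dirs[$code] = call_user_func( $this->loader, $code );
		}
		return $this->dirs[$code];
	}
}

// Usage with a fake loader; real code would consult the localisation data.
$cache = new LanguageDirCache( function ( $code ) {
	return in_array( $code, [ 'ar', 'he' ], true ) ? 'rtl' : 'ltr';
} );
foreach ( [ 'en', 'he', 'en', 'ar', 'he' ] as $code ) {
	echo $code . ': ' . $cache->getDir( $code ) . "\n";
}
```

Five lookups over three distinct codes trigger only three loader calls; the savings grow with the length of the language list.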
I have been thinking for some time about whether we should add a static function for getting the direction, backed by an array of RTL language codes, instead of putting $rtl = true; in the Messages files. Something like:

static function isRtl( $langcode ) {
	$rtl = array( ... );
	return in_array( $langcode, $rtl );
}

Benefits:
* You don't need a Language object for such a simple boolean. (In many cases, when building <div lang="" dir="">, you have just a language code, and currently you must instantiate a Language object only to get the direction.)
* We can easily add this info for languages that don't have a Messages file yet.

What do you think?
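The proposal can be sketched as follows, assuming a hard-coded lookup table. The class name LanguageDirection and the getDir() helper are hypothetical, and the list is only a small sample of right-to-left codes (Arabic, Persian, Hebrew, Urdu, Yiddish), not a complete set. Keying the array by code lets isset() do a hash lookup instead of the linear scan that in_array() performs.

```php
<?php
// Sketch of a static direction lookup that needs no Language object.
// The table is a partial, illustrative sample of RTL language codes.
class LanguageDirection {
	private static $rtl = [
		'ar' => true, 'fa' => true, 'he' => true, 'ur' => true, 'yi' => true,
	];

	public static function isRtl( $langcode ) {
		// Constant-time key lookup rather than scanning a list.
		return isset( self::$rtl[$langcode] );
	}

	public static function getDir( $langcode ) {
		return self::isRtl( $langcode ) ? 'rtl' : 'ltr';
	}
}

// Usage: build a <div lang="" dir=""> from a bare language code.
$code = 'he';
printf(
	"<div lang=\"%s\" dir=\"%s\">\n",
	htmlspecialchars( $code ),
	LanguageDirection::getDir( $code )
);
```

Languages without a Messages file can be supported by adding one array entry, which is the second benefit listed above.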
I considered this, although I'd like to keep duplication to a minimum. A smarter approach would be to:
1. Investigate what makes the current Language class so heavy and try to make it lazy.
2. Try to merge it with the YAML langdb, which is used in the Universal Language Selector.
(In reply to comment #5)
> I considered this, although I'd love to make as little duplication as possible.

How would that be duplication? (I very much dislike duplication, to be clear.) We can make isRtl() static (with the benefits mentioned above) regardless of whether we fix the heavy initialization of Language.
*** Bug 41596 has been marked as a duplicate of this bug. ***
Caching the direction (LTR/RTL) values for all languages solves only a very limited issue while leaving the larger one unsolved: we sometimes need information about *languages* without needing the messages. For example, for Wikidata we also need each language's custom lowercasing function to normalize search terms. I expect there are more things we will need to know about languages.

So I don't think caching specific values is a solution (although we can do that in addition). We really need to be able to get Language objects without loading all the messages. This could be done by lazy initialization: loading the messages only when they are first used.
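A minimal sketch of the lazy-initialization idea, assuming the message table can be fetched by a loader callable. LazyLanguage, lc(), isLoaded() and the loader are hypothetical names, not MediaWiki's Language API. Cheap properties such as the code and the lowercasing rule never touch the messages; the loader runs only on the first getMessage() call.

```php
<?php
// Hypothetical Language-like object: cheap data is available immediately,
// while the message table is loaded on first use only.
class LazyLanguage {
	private $code;
	private $messageLoader;
	private $messages = null; // null until the first message lookup

	public function __construct( $code, callable $messageLoader ) {
		$this->code = $code;
		$this->messageLoader = $messageLoader;
	}

	public function getCode() {
		return $this->code; // never triggers message loading
	}

	public function lc( $text ) {
		// ASCII-only stand-in; per-language rules (e.g. the Turkish
		// dotted/dotless i) would go here. No messages are needed.
		return strtolower( $text );
	}

	public function isLoaded() {
		return $this->messages !== null;
	}

	public function getMessage( $key ) {
		if ( $this->messages === null ) {
			$this->messages = call_user_func( $this->messageLoader, $this->code );
		}
		return isset( $this->messages[$key] ) ? $this->messages[$key] : null;
	}
}

// Usage: count how often the (fake) loader actually runs.
$loads = 0;
$lang = new LazyLanguage( 'de', function ( $code ) use ( &$loads ) {
	$loads++;
	return [ 'search' => 'Suche' ]; // fake message table
} );
echo $lang->lc( 'ABC' ) . "\n";            // messages still not loaded
echo $lang->getMessage( 'search' ) . "\n"; // first use: loader runs
echo $lang->getMessage( 'search' ) . "\n"; // served from memory
```

For a batch of languages where only names, direction or lowercasing are needed, no message table is ever loaded.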
Amir suggested that I post this here: https://gerrit.wikimedia.org/r/#/c/35383/ (I don't think it's meant to be merged, but please keep it in mind.)
The most used features in Wikidata are language names, direction, lists, truncation, and inserting the end marker. I suspect that other functions could become more important later on.
Is there really that much to make lazy in terms of messages? For other languages in ViewItem, I can see only $preloadedMessages being loaded in LocalisationCache (plus everything needed for wgContLang/wgUserLang). For $preloadedMessages the fallback chain is respected, so every language in a fallback chain gets loaded. RTL, as stated above, is not a big deal either. On the other hand, even $preloadedMessages for, let's say, 25 languages... how much memory does that take?

We can postpone the self::getLocalisationCache() call until the first message is requested, but the effect of such an "optimization" will be diminished, because $rtl, fallback encodings, namespaces, etc. all belong to the Messages file, which is loaded and managed by LocalisationCache (which will load $preloadedMessages anyway). Writing a workaround for LocalisationCache (if the issue is actually in $preloadedMessages) is the worst-case resolution for this issue, IMHO.
(In reply to comment #8)
> ... We really need to be able to get Language objects without
> loading all the messages. This could be done by lazy initialization - only
> loading the messages when they are first used.

Strongly agreed. For the majority of languages other than English, loading the messages also includes loading two fallback language message files, see:
http://www.mediawiki.org/wiki/File:MediaWiki_fallback_chains.svg
100% coverage is rare. Yet most often, the untranslated messages are also the ones that are hardly ever used. So loading messages *is* heavy and should be lazy, as should loading fallback language messages.
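Applying the same laziness to fallback chains could look like the sketch below, assuming each language's messages can be loaded independently. FallbackMessageResolver and $loadFile are hypothetical names, and the chain (de-at, then de, then en) and the message tables are illustrative. A fallback language's file is loaded only when every earlier language in the chain lacks the requested key.

```php
<?php
// Hypothetical resolver: walks a fallback chain and loads each language's
// messages only when an earlier language does not have the key.
class FallbackMessageResolver {
	private $chain;
	private $loadFile;
	private $loaded = [];

	public function __construct( array $chain, callable $loadFile ) {
		$this->chain = $chain;
		$this->loadFile = $loadFile;
	}

	public function get( $key ) {
		foreach ( $this->chain as $code ) {
			if ( !array_key_exists( $code, $this->loaded ) ) {
				$this->loaded[$code] = call_user_func( $this->loadFile, $code );
			}
			if ( isset( $this->loaded[$code][$key] ) ) {
				return $this->loaded[$code][$key];
			}
		}
		return null; // missing in every language of the chain
	}

	public function loadedLanguages() {
		return array_keys( $this->loaded );
	}
}

// Usage with fake message files.
$files = [
	'de-at' => [ 'january' => 'Jänner' ],
	'de' => [ 'january' => 'Januar', 'search' => 'Suche' ],
	'en' => [ 'january' => 'January', 'search' => 'Search', 'rare' => 'Rare' ],
];
$resolver = new FallbackMessageResolver(
	[ 'de-at', 'de', 'en' ],
	function ( $code ) use ( $files ) {
		return $files[$code];
	}
);
echo $resolver->get( 'january' ) . "\n"; // found in de-at; de and en untouched
```

The trade-off: a lookup of a rarely used, untranslated key still pays for loading the whole chain, but such keys are seldom requested.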
[replacing wikidata keyword by adding CC - see bug 56417]
Re-assessed severity based on https://www.mediawiki.org/wiki/Bugzilla/Fields#importance