Last modified: 2014-07-19 01:19:54 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T43103, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 41103 - initialization of the Language object is very heavy
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Internationalization
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Nobody
Keywords: i18n, performance
Duplicates: 41596
Depends on:
Blocks: 38674 41723 38439 40238
Reported: 2012-10-17 10:42 UTC by Amir E. Aharoni
Modified: 2014-07-19 01:19 UTC (History)
CC List: 13 users

Description Amir E. Aharoni 2012-10-17 10:42:14 UTC
Sometimes a lot of languages need to be processed in one batch. Initializing a Language object for each of them in such cases wastes a lot of memory. Some refactoring is needed there: maybe another lightweight class just for getting simple language info, maybe lazier initialization, etc.
Comment 1 Siebrand Mazeland 2012-10-17 12:07:22 UTC
Being more specific on the use case than "sometimes" would be very helpful.

As a first action, you may want to add profiling calls to provide some more detailed insight.
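
A minimal sketch of the kind of measurement suggested here, using only PHP's built-in microtime() and memory_get_usage(). The measure() helper and the stand-in workload are hypothetical; in a real MediaWiki test the closure would construct actual Language objects (for example via Language::factory()) instead of dummy arrays.

```php
<?php
// Hypothetical helper (not part of MediaWiki): run a callable and report
// wall-clock time and the growth in memory usage while its result is held.
function measure( callable $fn ): array {
    $startTime = microtime( true );
    $startMem = memory_get_usage();
    $result = $fn();
    return [
        'result' => $result,
        'seconds' => microtime( true ) - $startTime,
        'bytes' => memory_get_usage() - $startMem,
    ];
}

// Stand-in workload (assumption): one dummy "object" per language code.
// Replace the array with real Language construction to profile MediaWiki.
$report = measure( function () {
    $objects = [];
    foreach ( [ 'en', 'he', 'fa' ] as $code ) {
        $objects[$code] = [ 'code' => $code, 'payload' => str_repeat( 'x', 100000 ) ];
    }
    return $objects;
} );

printf(
    "%d objects, %.6f s, %d bytes\n",
    count( $report['result'] ),
    $report['seconds'],
    $report['bytes']
);
```

Running this over the real list of language codes would give a per-batch memory figure to attach to the bug.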
Comment 2 Amir E. Aharoni 2012-10-17 12:08:51 UTC
(In reply to comment #1)
> Being more specific on the use case than "sometimes" would be very helpful.

In both cases I need to retrieve the dir value for a long list of languages.

See the blocked bugs.
Comment 3 Siebrand Mazeland 2012-10-17 13:11:22 UTC
(In reply to comment #2)
> (In reply to comment #1)
> > Being more specific on the use case than "sometimes" would be very helpful.
> 
> In both cases I need to retrieve the dir value for a long list of languages.

Seems like a good candidate to stuff in cache object.
Comment 4 Robin Pepermans (SPQRobin) 2012-10-18 23:30:04 UTC
I have been thinking for some time about whether we should make a static function for getting the direction, so that you have an array of language codes that are RTL instead of putting $rtl = true; in the Messages files. Something like:

    static function isRtl( $langcode ) {
        $rtl = array( ... );
        return in_array( $langcode, $rtl );
    }

Benefits:
* Then you don't need a Language object for such a simple boolean (in many cases, when making <div lang="" dir=""> you just have a language code and you currently need to call a Language object just for the direction.)
* We can add this info very easily for languages that don't have a Messages file yet.

What do you think?
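
To make the proposal concrete, here is a self-contained sketch of such a static lookup. The class name LanguageDirection, the getDir() helper, and the short list of RTL codes are assumptions for illustration only; the list is deliberately incomplete and is not taken from MediaWiki's Messages files.

```php
<?php
// Hypothetical helper (not an existing MediaWiki class): answers
// "is this language RTL?" from a static list, with no Language object.
final class LanguageDirection {
    // Illustrative and incomplete list of RTL language codes (assumption).
    private const RTL_CODES = [ 'ar', 'dv', 'fa', 'he', 'ps', 'ur', 'yi' ];

    public static function isRtl( string $langCode ): bool {
        // Strict comparison so that only exact string matches count.
        return in_array( strtolower( $langCode ), self::RTL_CODES, true );
    }

    public static function getDir( string $langCode ): string {
        return self::isRtl( $langCode ) ? 'rtl' : 'ltr';
    }
}

// Usage: emit lang/dir attributes from nothing but a language code.
foreach ( [ 'en', 'he', 'fa' ] as $code ) {
    printf( "<div lang=\"%s\" dir=\"%s\"></div>\n", $code, LanguageDirection::getDir( $code ) );
}
```

A lookup like this also works for language codes that have no Messages file yet, which is the second benefit listed above.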
Comment 5 Amir E. Aharoni 2012-10-19 06:36:38 UTC
I considered this, although I'd love to make as little duplication as possible. To be smarter, I would:
1. Investigate what makes the current Language so heavy and try to make it lazy.
2. Try to merge it with the YAML langdb, which is used in the Universal Language Selector.
Comment 6 Robin Pepermans (SPQRobin) 2012-10-29 00:30:15 UTC
(In reply to comment #5)
> I considered this, although I'd love to make as little duplication as possible.

How would that be duplication? (I very much dislike duplication, to be clear)
We can make isRtl() static (with benefits mentioned above) regardless of fixing the heavy initialization of Language.
Comment 7 Andre Klapper 2012-11-02 11:41:09 UTC
*** Bug 41596 has been marked as a duplicate of this bug. ***
Comment 8 Daniel Kinzler 2012-11-02 12:50:18 UTC
Caching the LTR values for all languages is solving only a very limited issue while leaving the larger one unsolved: we sometimes need information about *languages* without needing the messages. For example, for Wikidata we also need the custom to-lowercase function for each language, to normalize search terms. I expect there are more things we will need to know about languages. 

So, I don't think caching specific values is a solution (although we can do that in addition). We really need to be able to get Language objects without loading all the messages. This could be done by lazy initialization - only loading the messages when they are first used.
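
The lazy initialization described here could look roughly like the sketch below. LazyLanguage, its constructor arguments, and the loader callback are all hypothetical; the sketch also assumes that the direction is available without reading message data, which may not hold for MediaWiki's actual Language and LocalisationCache classes.

```php
<?php
// Hypothetical sketch (not MediaWiki's Language class): cheap language
// properties are available immediately, messages load on first use.
final class LazyLanguage {
    private $code;
    private $rtl;
    private $messageLoader;   // callable( string $code ): array
    private $messages = null; // stays null until a message is requested

    public function __construct( string $code, bool $rtl, callable $messageLoader ) {
        $this->code = $code;
        $this->rtl = $rtl;
        $this->messageLoader = $messageLoader;
    }

    public function getCode(): string {
        return $this->code;
    }

    public function isRtl(): bool {
        return $this->rtl; // never touches the messages
    }

    public function messagesLoaded(): bool {
        return $this->messages !== null;
    }

    public function getMessage( string $key ): ?string {
        if ( $this->messages === null ) {
            // First access: run the expensive load exactly once.
            $this->messages = ( $this->messageLoader )( $this->code );
        }
        return $this->messages[$key] ?? null;
    }
}

// Usage with a stand-in loader that counts how often it is called.
$loads = 0;
$loader = function ( string $code ) use ( &$loads ): array {
    $loads++;
    return [ 'hello' => "hello ($code)" ]; // placeholder data
};

$lang = new LazyLanguage( 'he', true, $loader );
$isRtl = $lang->isRtl();               // loader not called yet
$first = $lang->getMessage( 'hello' ); // triggers the single load
$again = $lang->getMessage( 'hello' ); // served from memory
printf( "rtl=%s message=%s loads=%d\n", $isRtl ? 'yes' : 'no', $first, $loads );
```

With this shape, a batch that only asks for direction across many languages never pays for message loading.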
Comment 9 Pavel Selitskas [wizardist] 2012-11-27 14:03:04 UTC
Amir suggested that I put this here:

https://gerrit.wikimedia.org/r/#/c/35383/ (I don't think it's supposed to be merged, but please keep it in mind)
Comment 10 jeblad 2012-11-28 07:46:18 UTC
The most used ones in Wikidata are language names, direction, lists, truncation, and inserting an end marker. I suspect that other functions could become more important later on.
Comment 11 Pavel Selitskas [wizardist] 2012-12-07 00:01:39 UTC
Is there that much to make lazy in terms of messages? I can see only $preloadedMessages being loaded in LocalisationCache for other languages in ViewItem (plus everything needed for wgContLang/wgUserLang; for $preloadedMessages the fallback chain is respected, so every language in a fallback chain is loaded). RTL, as was stated above, is not a big deal either.

On the other hand, even $preloadedMessages for, let's say, 25 languages... how much memory does it take? We can postpone the self::getLocalisationCache() call until the first message is requested, but the effect of such an "optimization" will be diminished, because $rtl, fallback encodings, namespaces, etc. all belong to the Messages file, which is loaded and managed by LocalisationCache (which will load $preloadedMessages anyway).

Writing a work-around for LocalisationCache (if the issue is actually in $preloadedMessages) is the worst-case resolution for the issue, imho.
Comment 12 Purodha Blissenbach 2013-05-27 10:05:14 UTC
(In reply to comment #8)
> ... We really need to be able to get Language objects without
> loading all the messages. This could be done by lazy initialization - only
> loading the messages when they are first used.

Strongly agreed on that.

For the majority of languages other than English, loading the messages includes loading two fallback language message files, too, see:
http://www.mediawiki.org/wiki/File:MediaWiki_fallback_chains.svg
100% coverage is rare. Yet, most often, the untranslated messages are also the ones hardly ever used. So loading messages *is* heavy and should be lazy, as should loading fallback language messages.
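
One way to picture lazy loading of fallback messages is the sketch below: a language's messages are loaded only when every earlier language in the chain lacks the requested key. LazyFallbackMessages, the language codes 'xx-a' and 'xx', and the sample messages are hypothetical and do not reflect how LocalisationCache is actually implemented.

```php
<?php
// Hypothetical sketch: resolve a message through a fallback chain, loading
// each language's messages only when an earlier language lacks the key.
final class LazyFallbackMessages {
    private $chain;       // string[] language codes, most specific first
    private $loader;      // callable( string $code ): array
    private $loaded = []; // code => messages, filled on demand

    public function __construct( array $chain, callable $loader ) {
        $this->chain = $chain;
        $this->loader = $loader;
    }

    public function get( string $key ): ?string {
        foreach ( $this->chain as $code ) {
            if ( !array_key_exists( $code, $this->loaded ) ) {
                $this->loaded[$code] = ( $this->loader )( $code );
            }
            if ( array_key_exists( $key, $this->loaded[$code] ) ) {
                return $this->loaded[$code][$key];
            }
        }
        return null; // not translated anywhere in the chain
    }

    public function loadedCodes(): array {
        return array_keys( $this->loaded );
    }
}

// Made-up message data for a made-up chain xx-a -> xx -> en.
$store = [
    'xx-a' => [ 'save' => 'A-save' ],
    'xx' => [ 'save' => 'X-save', 'edit' => 'X-edit' ],
    'en' => [ 'save' => 'Save', 'edit' => 'Edit', 'delete' => 'Delete' ],
];
$loader = function ( string $code ) use ( $store ): array {
    return $store[$code] ?? [];
};

$messages = new LazyFallbackMessages( [ 'xx-a', 'xx', 'en' ], $loader );
$save = $messages->get( 'save' );     // found in xx-a; xx and en stay unloaded
$afterSave = $messages->loadedCodes();
$delete = $messages->get( 'delete' ); // only now are xx and en loaded
printf( "%s / %s, loaded after first lookup: %s\n", $save, $delete, implode( ',', $afterSave ) );
```

If the untranslated messages are indeed rarely used, most requests would stop at the first language and the fallback files would seldom be read.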
Comment 13 Andre Klapper 2013-10-31 12:17:22 UTC
[replacing wikidata keyword by adding CC - see bug 56417]
Comment 14 Siebrand Mazeland 2014-04-24 20:27:26 UTC
Re-assessed severity based on https://www.mediawiki.org/wiki/Bugzilla/Fields#importance
