Last modified: 2014-11-17 09:21:33 UTC
Tracking bug for wikis that have requests to be renamed/moved to other subdomains.
* Bug 16976 - Wikis ready for creation (tracking)
I believe this requires a script to be written to automatically change a lot of things for it to work.
Adding bug 8217 to here.
Adding 'tracking' keyword.
(In reply to comment #0)
> Tracking bug for wikis that have requests to be renamed/moved to other
I tried to clean this up by adding the ones that were missing but were discussed.
> I believe this requires a script to be written to automatically change "alot"
> of things for it to work.
Can we at least list all the things that would need to be changed here on this bug? That would probably be the easiest way to start things.
Who maintains this block of bugs? There are bugs that have been unresolved (despite having appropriate consensus) for more than three years.
Was asked to chime in with some background detail on this topic; a few items off the top of my head that'd need poking:
* if not changing DB name, may need to trace down various things making assumptions about dbname <-> key/language/subdomain
* if changing DB name, need to make sure anything using the dbname as a key for shared data like CentralAuth is either transitioned or isn't harmed by the change
* need to make sure interwiki keys / entries are kept or forwarded
* need to make sure hostnames forward correctly (from MW config or does this need loving in the web server config?)
* if keeping old DB location, may need to force cache clear of everything with full URLs etc
* if changing DB name, need to either make sure the versions of MySQL in use support renaming databases (and confirm this doesn't have any replication problems) or that the process of moving tables from one DB to another doesn't explode
* if changing dbname, check also for things that key on dbname *outside* of mediawiki itself, such as global config files, assignment of dbname <-> cluster, etc
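Much of the checklist above amounts to finding every place that keys on the dbname before anything is renamed. As a minimal sketch of that first pass (the function name and the config fragment are made up for illustration, not actual Wikimedia config), a grep-style scan for the old dbname in configuration text:

```python
import re

def find_dbname_references(config_text: str, old_dbname: str) -> list[int]:
    """Return the 1-based line numbers in a config file that mention the
    old dbname. Before a rename, every place that keys on the dbname
    (dblists, CentralAuth tables, interwiki maps, cluster assignments)
    has to be found and transitioned."""
    pattern = re.compile(r'\b' + re.escape(old_dbname) + r'\b')
    return [i for i, line in enumerate(config_text.splitlines(), start=1)
            if pattern.search(line)]

# Hypothetical config fragment:
sample = ("wgDBname = 'zh_classicalwiki'\n"
          "wgServer = 'https://zh-classical.wikipedia.org'\n")
print(find_dbname_references(sample, "zh_classicalwiki"))  # -> [1]
```

This only covers plain-text config; assignments stored in databases (interwiki table, cluster maps) would need equivalent queries.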
(In reply to comment #5)
> Was asked to chime in with some background detail on this topic; a few items
> off the top of my head that'd need poking:
> * if not changing DB name, may need to trace down various things making
> assumptions about dbname <-> key/language/subdomain
These assumptions seem to be all over the place, so I haven't even tried to tackle them.
> * if changing DB name, need to make sure anything using the dbname as a key for
> shared data like CentralAuth is either transitioned or isn't harmed by the
> * need to make sure interwiki keys / entries are kept or forwarded
> * need to make sure hostnames forward correctly (from MW config or does this
> need loving in the web server config?)
> * if keeping old DB location, may need to force cache clear of everything with
> full URLs etc
> * if changing DB name, need to either make sure the versions of MySQL in use
> support renaming databases (and confirm this doesn't have any replication
> problems) or that the process of moving tables from one DB to another doesn't
> * if changing dbname, check also for things that key on dbname *outside* of
> mediawiki itself, such as global config files, assignment of dbname <->
> cluster, etc
Sounds like we need to ask Domas for advice here.
Just for the record (to avoid it being forgotten):
"Think about toolserver as well."
That may or may not need separate attention.
JeLuF wrote documentation on our wikitech at https://wikitech.wikimedia.org/view/Rename_a_wiki . It was never tested or fully reviewed, though.
(In reply to comment #8)
> JeLuF wrote a documentation on our wikitech at
> https://wikitech.wikimedia.org/view/Rename_a_wiki . It never got tested nor
> fully reviewed though.
Wouldn't it be better to first copy the database (without locking) to minimize downtime? That way all edits up to the first run would already be copied before the remaining changes are applied to a locked wiki.
This might not be worth the time for smaller wikis, but I imagine renaming larger ones would take a considerable amount of time.
Instead of a database dump, the .ibd files could be copied (I am assuming innodb_file_per_table; without it, it would be quite hard to split clusters).
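The per-table .ibd copy suggested above can be sketched with MySQL's transportable tablespaces (MySQL 5.6+; this also assumes innodb_file_per_table, as the comment does). Database and table names here are hypothetical; the actual file copy happens at the filesystem level between the two source statements:

```python
def copy_table_statements(src_db: str, dst_db: str, table: str) -> dict[str, list[str]]:
    """Statements to move one InnoDB table between databases by copying
    its .ibd file, using transportable tablespaces. The target table must
    already exist with the same definition before DISCARD/IMPORT."""
    return {
        "source": [
            f"FLUSH TABLES `{src_db}`.`{table}` FOR EXPORT;",
            # copy {table}.ibd and {table}.cfg to the target datadir here
            "UNLOCK TABLES;",
        ],
        "target": [
            f"ALTER TABLE `{dst_db}`.`{table}` DISCARD TABLESPACE;",
            f"ALTER TABLE `{dst_db}`.`{table}` IMPORT TABLESPACE;",
        ],
    }

stmts = copy_table_statements("zh_yuewiki", "yuewiki", "text")
print(stmts["target"][1])  # -> ALTER TABLE `yuewiki`.`text` IMPORT TABLESPACE;
```

For a plain rename on the *same* server, `RENAME TABLE old_db.t TO new_db.t` avoids the copy entirely; the copy-based flow only matters when splitting across servers or clusters.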
Is there any clue about what it takes to get the ball rolling on renaming projects? These projects have been waiting for this to happen for multiple years now.
So, we tried this recently in a sprint (Reedy/mutante), and we could fix one closed wiki (pa.us to pa-us), but for the others, renaming the databases turned out to be a bigger issue. Reedy talked to Asher about renaming them and it all seemed possible... until we ran into external storage nodes. We could not rename there because they are read-only, and then Reedy had to roll things back. Ideas to work around it?
Talked to Asher; it is possible but not easy. A DBA would have to do some scripting to fix the external storage issue.
Does ES contain a reference to the source db? I thought the renamed wiki could keep the pointer to the ES.
In fact, $wgDBname doesn't appear in includes/externalstore.
$wgDBname is used as the MySQL database name in external store. We then shard across generically named tables under each wiki db (i.e. blobs_cluster25).
The es1 databases (containing ES shards up to blobs_cluster24) are no longer replicated, so as part of the rename, each wiki db must be renamed on each of those servers individually.
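Since the es1 hosts are unreplicated, the external-storage part of a rename is the same set of per-table renames repeated on every host. A small sketch of that plan (host names, database names, and the blob-table list are hypothetical examples, not the actual Wikimedia inventory):

```python
def es_rename_plan(old_db: str, new_db: str,
                   blob_tables: list[str],
                   es_hosts: list[str]) -> list[tuple[str, str]]:
    """Build (host, SQL) pairs for the external-storage side of a wiki
    rename. Because the es1 hosts are not replicated, each RENAME TABLE
    must be executed on every host individually (and the hosts are
    read-only, so they would need to be made writable first)."""
    return [(host, f"RENAME TABLE `{old_db}`.`{t}` TO `{new_db}`.`{t}`;")
            for host in es_hosts
            for t in blob_tables]

plan = es_rename_plan("zh_min_nanwiki", "nanwiki",
                      ["blobs_cluster24"], ["es1001", "es1002"])
print(plan[0][1])  # -> RENAME TABLE `zh_min_nanwiki`.`blobs_cluster24` TO `nanwiki`.`blobs_cluster24`;
```

A real script would also have to verify that nothing in the text table still points at the old database name after the rename.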
Do we really need to rename a wiki, when all we need is to change the existing DNS entry to become an alias (CNAME)?
This way all URLs continue working; we just get a deprecated synonym, and we have all the time we need to change the various wikis and local templates to use the new interwiki code.
For HTTPS, the certificate may need adjusting so it is valid for both the new domain and the old one (aliased to the new one).
This would apply to:
- be-x-old > be-tarask
- zh-classical > lzh
- zh-yue > yue
- zh-min-nan > nan
We'll then have years to migrate to the new preferred interwiki codes, and all external sites will continue working with their existing URLs.
Smart search engines will detect the aliasing of the old domain, but they may require that the server return some identification confirming the contents are identical before they replace their past references in their caches.
HTTP also offers ways to indicate this to clients: either an HTTP redirect from the old URL to the new one where only the domain name changes, or a header specifying the preferred canonical URL to keep in cache, if we don't want to reduce performance by forcing clients to re-issue their requests.
The short answer is yes.
Changing the DNS is worthwhile in its own right as it solves some immediate visual issues.
There have been all too many discussions about this in the past; changing the DNS does not change the names of the dumps. This point has been made repeatedly.
Another aspect is that several codes have been used illegitimately, and DNS aliasing does not free up those names for legitimate use.
The names of dumps are a very minor issue; they are technical and do not affect existing content in the database itself. As they are just local filenames hosted on Linux servers, some basic symbolic links will solve this minor aspect.
There are also very few users of these dumps who need them via automated tools. These tools are changed easily, and their users have enough technical skill and involvement in the projects to know that they should use another name after some reasonably small delay (they will be alerted of the change by the technical news on Meta, which they should read regularly or have subscribed to via email or their wiki account).
Occasional users of dumps who just select them from a list will not be affected.
For the very few codes that are used illegitimately, like "nrm" for Norman: the language actually coded "nrm" is a minority language for which we have never seen any demand, so we can make the change immediately and keep the old alias for a long time, until that code starts being needed for its proper use.
Our immediate need is to allow fast resolution of BCP 47 violations affecting the existing content, and to stop the propagation of these codes into non-Wikimedia projects that don't like this pollution (when they need these codes for their own projects or localizations), even if we keep aliases on Wikimedia projects for interwikis (and we'll locally use resolvers to replace the codes where appropriate for BCP 47 conformance, for example in HTML/XML/CSS lang attributes, or for better indexing of our content by search engines, to keep them from making wrong guesses about the effective language used).
Wikimedia domains are perfectly valid with their codes. But interwiki codes cause a problem in Wikidata when they are used as if they were legitimate standard language codes. We urgently need this cleanup of interwiki codes in Wikidata, because it affects many other non-Wikimedia projects using Wikidata's content.
As far as I know, "nan" was made a redirect to "zh-min-nan" (bug 8217 comment 10).
It should be the other way round, i.e. "zh-min-nan" -> "nan".
The other names should get DNS redirects in the same way, as the first step of the name transition.
As I understand from the discussion so far, there are four different things that we are planning to rename, listed in ascending order of difficulty:
1. Change the interwiki prefixes (old-style / hard-coded links)
Trivial, because it just involves *adding* interwiki links. In most cases the old links should remain valid for the sake of backward compatibility.
2. Change the Wikidata language code for centralised interwiki links
3. Change the domain names of the wikis
More difficult, though Philippe seems to have thought it out.
4. Rename the database
It seems that the process is stalled because 4 cannot be done safely at the moment, but for be-tarask, zh-min-nan, zh-yue, and zh-classical, 4 is far less important than 1-3. Most editors and users won't encounter the database names very often.
A reasonable approach would be to single out those four wikis with wrong language codes, and perform 1-3 without 4, so the editors are happy. The renaming of the database can be done at a later point in time under the hood.
(In reply to Daniel Zahn from comment #12)
> so, we tried this recently in a sprint. (Reedy/mutante), and we could fix
> one closed wiki (pa.us to pa-us), but for the others, renaming the databases
> turned out to be a bigger issue. Reedy talked to Asher about renaming them
> and it seemed all possible.. until we ran into external storage nodes. We
> could not rename there because they are read-only and then Reedy had to roll
> things back. Ideas to work-around it?
(In reply to Daniel Zahn from comment #13)
> talked to Asher, it is possible but not easy. a DBA would have to do some
> scripting to fix the external storage issue
Was any progress made with this? Is Sean Pringle aware of it?
(In reply to Alex Monk from comment #21)
> Is Sean Pringle aware of it?
(In reply to Shinjiman from comment #19)
> As far as I know "nan" was made as redirect to "zh-min-nan" (bug 8217
> comment 10)
> It should make it the other way round, i.e. "zh-min-nan" -> "nan".
Same case for the redirect in the other direction: "zh-yue" -> "yue".
One problem is that existing templates or modules look for fallbacks to "zh" for missing translations just by checking for the presence of the "zh-" prefix.
The complete BCP47 mechanism for language fallbacks remains to be implemented in core MediaWiki (and then used in Lua, #language, the Translate extension (including Translate admin interfaces), and various Lua modules, and various legacy templates using #if or #switch).
And we need to port the language fallback lists registered in each locale with CLDR data. But here too the CLDR data is largely incomplete. We should then still use our own more complete fallback rules, even if some of them are questionable, notably when they cover minority languages spoken in different countries with very different majority languages!
E.g. the fallback for "gsw" should be "de" before "fr" in Germany and Switzerland, but it should be "fr" before "de" in France; if a country code is appended:
* "gsw-de" will only fall back to "de" before English ("fr" will be ignored, except possibly near Aachen, or in Kehl and in Saarland, very near the French border);
* "gsw-ch" will fall back to "de" before "fr" and then "en";
* "gsw-fr" (Alsatian) will fall back only to "fr" before English (German is not considered acceptable locally).
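The region-dependent chains described above can be sketched as a lookup table with a strip-the-region fallback. This is an illustration of the proposal, not how MediaWiki core resolves fallbacks today, and the table is a hypothetical excerpt:

```python
# Hypothetical fallback table for the gsw example above.
FALLBACKS = {
    "gsw":    ["de", "fr", "en"],
    "gsw-DE": ["de", "en"],
    "gsw-CH": ["de", "fr", "en"],
    "gsw-FR": ["fr", "en"],
}

def fallback_chain(code: str) -> list[str]:
    """Resolve a language code to its ordered fallback list: try the full
    code (with region) first, then the bare primary language subtag,
    and finally default to English."""
    if code in FALLBACKS:
        return FALLBACKS[code]
    base = code.split("-")[0]
    return FALLBACKS.get(base, ["en"])

print(fallback_chain("gsw-FR"))  # -> ['fr', 'en']
```

A full BCP 47 resolver would strip subtags one at a time (script, then region, then variants) rather than jumping straight to the primary language, but the shape of the lookup is the same.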
Existing CLDR libraries also lack support for the metadata in the IANA subtag registry, which suggests replacement codes for deprecated codes (deprecated codes remain conforming and are never deleted, even though their use is strongly discouraged for new data; e.g. "jw" is still used instead of "jv", or "iw" instead of "he", in many Java applications and in Android's version of the Dalvik VM, which has no decent support for BCP 47).
We also need a way to detect legacy BCP 47 codes with unclear replacement, such as "bi" (the only language family encoded with 3 letters in ISO 639-5 that also inherited a 2-letter code from ISO 639-1, in addition to the 3-letter code added in ISO 639-2 long before, when it was incorrectly considered an isolated language; today it is not even treated as a macrolanguage like "qu" Quechua, which is hardly a macrolanguage and should become a family too in ISO 639-5).
The implicit replacement subtags for the many redundant ISO 639-2 and ISO 639-3 codes that have equivalent mappings to ISO 639-1 (as recommended by BCP 47) should be implementable without trouble, and other 3-letter codes from ISO 639-2(B) also have a recommended mapping to ISO 639-2(T) where there is no ISO 639-1 code (this is compatible with the single code used in either ISO 639-3 or ISO 639-5).
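The deprecated-code replacements mentioned above are data-driven: the IANA language subtag registry records a Preferred-Value for each deprecated subtag. A minimal sketch of applying them (the mapping table is a small excerpt of real registry entries, not the full registry):

```python
# Excerpt of deprecated -> Preferred-Value mappings from the IANA
# language subtag registry ("iw" -> "he", "jw" -> "jv", etc.).
PREFERRED = {"iw": "he", "jw": "jv", "in": "id", "ji": "yi", "mo": "ro"}

def canonicalize(tag: str) -> str:
    """Replace a deprecated primary language subtag with its
    Preferred-Value, leaving the rest of the BCP 47 tag intact."""
    subtags = tag.split("-")
    subtags[0] = PREFERRED.get(subtags[0].lower(), subtags[0].lower())
    return "-".join(subtags)

print(canonicalize("iw-IL"))  # -> he-IL
```

A complete implementation would also canonicalize extlang forms and deprecated region subtags, which likewise carry Preferred-Value entries in the registry.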
Most of the last comment is unrelated to this report.
(In reply to Andre Klapper from comment #24)
> Most of the last comment is unrelated to this report.
Does your own comment add anything relevant? I just recall that renaming cannot be done simply on a single code; we need to handle legacy usage by treating the old codes as equivalents until everything is fixed (this can be difficult to check without using trackers to measure remaining usage, spread across multiple wikis including wikis in other languages).
(In reply to Philippe Verdy from comment #25)
> (In reply to Andre Klapper from comment #24)
> > Most of the last comment is unrelated to this report.
> Does your own comment add anything relevant?
It might save people some time reading.
Please file separate aspects as separate issues.