Last modified: 2014-11-17 09:21:33 UTC
Tracking bug for wikis that have requests to be renamed/moved to other subdomains.
* Bug 16976 - Wikis ready for creation (tracking)
I believe this requires a script to be written to automatically change a lot of things for it to work.
Adding bug 8217 to here.
Adding 'tracking' keyword.
(In reply to comment #0)
> Tracking bug for wikis that have requests to be renamed/moved to other
I tried to clean this up by adding the ones that were missing but were discussed.
> I believe this requires a script to be written to automatically change "alot"
> of things for it to work.
Can we at least list all the things that would need to be changed here on this bug? That would probably be the easiest way to start things.
Who maintains this block of bugs? There are bugs that have been unresolved (despite having appropriate consensus) for more than three years.
Was asked to chime in with some background detail on this topic; a few items off the top of my head that'd need poking:
* if not changing DB name, may need to trace down various things making assumptions about dbname <-> key/language/subdomain
* if changing DB name, need to make sure anything using the dbname as a key for shared data like CentralAuth is either transitioned or isn't harmed by the change
* need to make sure interwiki keys / entries are kept or forwarded
* need to make sure hostnames forward correctly (from MW config or does this need loving in the web server config?)
* if keeping old DB location, may need to force cache clear of everything with full URLs etc
* if changing DB name, need to either make sure the versions of MySQL in use support renaming databases (and confirm this doesn't have any replication problems) or that the process of moving tables from one DB to another doesn't explode
* if changing dbname, check also for things that key on dbname *outside* of mediawiki itself, such as global config files, assignment of dbname <-> cluster, etc
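Much of the checklist above amounts to finding every place that keys on the dbname before anything is renamed. As a minimal sketch of that first pass (the function name and the config fragment are made up for illustration, not actual Wikimedia config), a grep-style scan for the old dbname in configuration text:

```python
import re

def find_dbname_references(config_text: str, old_dbname: str) -> list[int]:
    """Return the 1-based line numbers in a config file that mention the
    old dbname. Before a rename, every place that keys on the dbname
    (dblists, CentralAuth tables, interwiki maps, cluster assignments)
    has to be found and transitioned."""
    pattern = re.compile(r'\b' + re.escape(old_dbname) + r'\b')
    return [i for i, line in enumerate(config_text.splitlines(), start=1)
            if pattern.search(line)]

# Hypothetical config fragment:
sample = ("wgDBname = 'zh_classicalwiki'\n"
          "wgServer = 'https://zh-classical.wikipedia.org'\n")
print(find_dbname_references(sample, "zh_classicalwiki"))  # -> [1]
```

This only covers plain-text config; assignments stored in databases (interwiki table, cluster maps) would need equivalent queries.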
(In reply to comment #5)
> Was asked to chime in with some background detail on this topic; a few items
> off the top of my head that'd need poking:
> * if not changing DB name, may need to trace down various things making
> assumptions about dbname <-> key/language/subdomain
These assumptions seem to be all over the place, so I haven't even tried to tackle them.
> * if changing DB name, need to make sure anything using the dbname as a key for
> shared data like CentralAuth is either transitioned or isn't harmed by the
> * need to make sure interwiki keys / entries are kept or forwarded
> * need to make sure hostnames forward correctly (from MW config or does this
> need loving in the web server config?)
> * if keeping old DB location, may need to force cache clear of everything with
> full URLs etc
> * if changing DB name, need to either make sure the versions of MySQL in use
> support renaming databases (and confirm this doesn't have any replication
> problems) or that the process of moving tables from one DB to another doesn't
> * if changing dbname, check also for things that key on dbname *outside* of
> mediawiki itself, such as global config files, assignment of dbname <->
> cluster, etc
Sounds like we need to ask Domas for advice here.
Just for the record (to avoid it being forgotten):
"Think about toolserver as well."
That may or may not need separate attention.
JeLuF wrote documentation on our wikitech at https://wikitech.wikimedia.org/view/Rename_a_wiki . It was never tested or fully reviewed, though.
(In reply to comment #8)
> JeLuF wrote a documentation on our wikitech at
> https://wikitech.wikimedia.org/view/Rename_a_wiki . It never got tested nor
> fully reviewed though.
Wouldn't it be better to first copy the database (without locking) to minimize downtime? That way all edits up to the first run would already be copied before the remaining changes are applied to a locked wiki.
This might not be worth the time for smaller wikis, but I imagine renaming larger ones would take a considerable amount of time.
Instead of a database dump, the .ibd files could be copied (I am assuming innodb_file_per_table; without it, it would be quite hard to split clusters).
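The per-table .ibd copy suggested above can be sketched with MySQL's transportable tablespaces (MySQL 5.6+; this also assumes innodb_file_per_table, as the comment does). Database and table names here are hypothetical; the actual file copy happens at the filesystem level between the two source statements:

```python
def copy_table_statements(src_db: str, dst_db: str, table: str) -> dict[str, list[str]]:
    """Statements to move one InnoDB table between databases by copying
    its .ibd file, using transportable tablespaces. The target table must
    already exist with the same definition before DISCARD/IMPORT."""
    return {
        "source": [
            f"FLUSH TABLES `{src_db}`.`{table}` FOR EXPORT;",
            # copy {table}.ibd and {table}.cfg to the target datadir here
            "UNLOCK TABLES;",
        ],
        "target": [
            f"ALTER TABLE `{dst_db}`.`{table}` DISCARD TABLESPACE;",
            f"ALTER TABLE `{dst_db}`.`{table}` IMPORT TABLESPACE;",
        ],
    }

stmts = copy_table_statements("zh_yuewiki", "yuewiki", "text")
print(stmts["target"][1])  # -> ALTER TABLE `yuewiki`.`text` IMPORT TABLESPACE;
```

For a plain rename on the *same* server, `RENAME TABLE old_db.t TO new_db.t` avoids the copy entirely; the copy-based flow only matters when splitting across servers or clusters.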
Is there any clue about what it takes to get the ball rolling on renaming projects? These projects have been waiting for this to happen for multiple years now.
So, we tried this recently in a sprint (Reedy/mutante), and we could fix one closed wiki (pa.us to pa-us), but for the others, renaming the databases turned out to be a bigger issue. Reedy talked to Asher about renaming them and it all seemed possible... until we ran into external storage nodes. We could not rename there because they are read-only, and then Reedy had to roll things back. Ideas to work around it?
Talked to Asher; it is possible but not easy. A DBA would have to do some scripting to fix the external storage issue.
Does ES contain a reference to the source db? I thought the renamed wiki could keep the pointer to the ES.
In fact, $wgDBname doesn't appear in includes/externalstore.
$wgDBname is used as the MySQL database name in external store. We then shard across generically named tables under each wiki db (i.e. blobs_cluster25).
The es1 databases (containing ES shards up to blobs_cluster24) are no longer replicated, so as part of the rename, each wiki db must be renamed on each of those servers individually.
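Since the es1 hosts are unreplicated, the external-storage part of a rename is the same set of per-table renames repeated on every host. A small sketch of that plan (host names, database names, and the blob-table list are hypothetical examples, not the actual Wikimedia inventory):

```python
def es_rename_plan(old_db: str, new_db: str,
                   blob_tables: list[str],
                   es_hosts: list[str]) -> list[tuple[str, str]]:
    """Build (host, SQL) pairs for the external-storage side of a wiki
    rename. Because the es1 hosts are not replicated, each RENAME TABLE
    must be executed on every host individually (and the hosts are
    read-only, so they would need to be made writable first)."""
    return [(host, f"RENAME TABLE `{old_db}`.`{t}` TO `{new_db}`.`{t}`;")
            for host in es_hosts
            for t in blob_tables]

plan = es_rename_plan("zh_min_nanwiki", "nanwiki",
                      ["blobs_cluster24"], ["es1001", "es1002"])
print(plan[0][1])  # -> RENAME TABLE `zh_min_nanwiki`.`blobs_cluster24` TO `nanwiki`.`blobs_cluster24`;
```

A real script would also have to verify that nothing in the text table still points at the old database name after the rename.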
Do we really need to rename a wiki, when all we need is to change the existing DNS entry to become an alias (CNAME)?
This way all URLs continue working; we just get a deprecated synonym, and we have all the time we need to change the various wikis and local templates to use the new interwiki code.
For HTTPS, the certificate may need adjusting so it is valid for both the new domain and the old one (aliased to the new one).
This would apply to:
- be-x-old > be-tarask
- zh-classical > lzh
- zh-yue > yue
- zh-min-nan > nan
We'll then have years to migrate to the new preferred interwiki codes, and all external sites will continue working with their existing URLs.
Smart search engines will detect the aliasing of the old domain, but they may require that the server return some identification confirming the contents are identical before they replace their past references in their caches.
HTTP also offers ways to indicate this to clients: either an HTTP redirect from the old URL to the new one where only the domain name changes, or a header specifying the preferred canonical URL to keep in cache, if we don't want to reduce performance by forcing clients to re-issue their requests.
The short answer is yes.
Changing the DNS is worthwhile in its own right as it solves some immediate visual issues.
There have been all too many discussions about this in the past; changing the DNS does not change the names of the dumps. This point has been made repeatedly.
Another aspect is that several codes have been used illegitimately, and DNS aliasing does not free up those names for legitimate use.
The names of dumps are a very minor issue; they are technical and do not affect existing content in the database itself. As they are just local filenames hosted on Linux servers, some basic symbolic links will solve this minor aspect.
There are also very few users of these dumps who need them via automated tools. These tools are changed easily, and their users have enough technical skill and involvement in the projects to know that they should use another name after some reasonably small delay (they will be alerted of the change by the technical news on Meta, which they should read regularly or have subscribed to via email or their wiki account).
Occasional users of dumps who just select them from a list will not be affected.
For the very few codes that are used illegitimately, like "nrm" for Norman: the language actually coded "nrm" is a minority language for which we have never seen any demand, so we can make the change immediately and keep the old alias for a long time, until that code starts being needed for its proper use.
Our immediate need is to allow fast resolution of BCP 47 violations affecting the existing content, and to stop the propagation of these codes into non-Wikimedia projects that don't like this pollution (when they need these codes for their own projects or localizations), even if we keep aliases on Wikimedia projects for interwikis (and we'll locally use resolvers to replace the codes where appropriate for BCP 47 conformance, for example in HTML/XML/CSS lang attributes, or for better indexing of our content by search engines, to keep them from making wrong guesses about the effective language used).
Wikimedia domains are perfectly valid with their codes. But interwiki codes cause a problem in Wikidata when they are used as if they were legitimate standard language codes. We urgently need this cleanup of interwiki codes in Wikidata, because it affects many other non-Wikimedia projects using Wikidata's content.
As far as I know, "nan" was made a redirect to "zh-min-nan" (bug 8217 comment 10).
It should be the other way round, i.e. "zh-min-nan" -> "nan".
The other names should get DNS redirects in the same way, as the first step of the name transition.
As I understand from the discussion so far, there are four different things that we are planning to rename, listed in ascending order of difficulty:
1. Change the interwiki prefixes (old-style / hard-coded links)
Trivial, because it just involves *adding* interwiki links. In most cases the old links should remain valid for the sake of backward compatibility.
2. Change the Wikidata language code for centralised interwiki links
3. Change the domain names of the wikis
More difficult, though Philippe seems to have thought it out.
4. Rename the database
It seems that the process is stalled because 4 cannot be done safely at the moment, but for be-tarask, zh-min-nan, zh-yue, and zh-classical, 4 is far less important than 1-3. Most editors and users won't encounter the database names very often.
A reasonable approach would be to single out those four wikis with wrong language codes, and perform 1-3 without 4, so the editors are happy. The renaming of the database can be done at a later point in time under the hood.
(In reply to Daniel Zahn from comment #12)
> so, we tried this recently in a sprint. (Reedy/mutante), and we could fix
> one closed wiki (pa.us to pa-us), but for the others, renaming the databases
> turned out to be a bigger issue. Reedy talked to Asher about renaming them
> and it seemed all possible.. until we ran into external storage nodes. We
> could not rename there because they are read-only and then Reedy had to roll
> things back. Ideas to work-around it?
(In reply to Daniel Zahn from comment #13)
> talked to Asher, it is possible but not easy. a DBA would have to do some
> scripting to fix the external storage issue
Was any progress made with this? Is Sean Pringle aware of it?
(In reply to Alex Monk from comment #21)
> Is Sean Pringle aware of it?
(In reply to Shinjiman from comment #19)
> As far as I know "nan" was made as redirect to "zh-min-nan" (bug 8217
> comment 10)
> It should make it the other way round, i.e. "zh-min-nan" -> "nan".
Same case for the redirect in the other direction: "zh-yue" -> "yue".
One problem is that existing templates or modules look for fallbacks to "zh" for missing translations just by checking for the presence of the "zh-" prefix.
The complete BCP47 mechanism for language fallbacks remains to be implemented in core MediaWiki (and then used in Lua, #language, the Translate extension (including Translate admin interfaces), and various Lua modules, and various legacy templates using #if or #switch).
And we need to port the language fallback lists registered in each locale with CLDR data. But here too the CLDR data is largely incomplete. We should then still use our own more complete fallback rules, even if some of them are questionable, notably when they cover minority languages spoken in different countries with very different majority languages!
E.g. the fallback for "gsw" should be "de" before "fr" in Germany and Switzerland, but it should be "fr" before "de" in France; if a country code is appended:
* "gsw-de" will only fall back to "de" before English ("fr" will be ignored, except possibly near Aachen, or in Kehl and in Saarland, very near the French border);
* "gsw-ch" will fall back to "de" before "fr" and then "en";
* "gsw-fr" (Alsatian) will fall back only to "fr" before English (German is not considered acceptable locally).
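The region-dependent chains described above can be sketched as a lookup table with a strip-the-region fallback. This is an illustration of the proposal, not how MediaWiki core resolves fallbacks today, and the table is a hypothetical excerpt:

```python
# Hypothetical fallback table for the gsw example above.
FALLBACKS = {
    "gsw":    ["de", "fr", "en"],
    "gsw-DE": ["de", "en"],
    "gsw-CH": ["de", "fr", "en"],
    "gsw-FR": ["fr", "en"],
}

def fallback_chain(code: str) -> list[str]:
    """Resolve a language code to its ordered fallback list: try the full
    code (with region) first, then the bare primary language subtag,
    and finally default to English."""
    if code in FALLBACKS:
        return FALLBACKS[code]
    base = code.split("-")[0]
    return FALLBACKS.get(base, ["en"])

print(fallback_chain("gsw-FR"))  # -> ['fr', 'en']
```

A full BCP 47 resolver would strip subtags one at a time (script, then region, then variants) rather than jumping straight to the primary language, but the shape of the lookup is the same.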
Existing CLDR libraries also lack support for the metadata in the IANA subtag registry, which suggests replacement codes for deprecated codes (deprecated codes remain conforming and are never deleted, even though their use is strongly discouraged for new data; e.g. "jw" is still used instead of "jv", or "iw" instead of "he", in many Java applications and in Android's version of the Dalvik VM, which has no decent support for BCP 47).
We also need a way to detect legacy BCP 47 codes with unclear replacement, such as "bi" (the only language family encoded with 3 letters in ISO 639-5 that also inherited a 2-letter code from ISO 639-1, in addition to the 3-letter code added in ISO 639-2 long before, when it was incorrectly considered an isolated language; today it is not even treated as a macrolanguage like "qu" Quechua, which is hardly a macrolanguage and should become a family too in ISO 639-5).
The implicit replacement subtags for the many redundant ISO 639-2 and ISO 639-3 codes that have equivalent mappings to ISO 639-1 (as recommended by BCP 47) should be implementable without trouble, and other 3-letter codes from ISO 639-2(B) also have a recommended mapping to ISO 639-2(T) where there is no ISO 639-1 code (this is compatible with the single code used in either ISO 639-3 or ISO 639-5).
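The deprecated-code replacements mentioned above are data-driven: the IANA language subtag registry records a Preferred-Value for each deprecated subtag. A minimal sketch of applying them (the mapping table is a small excerpt of real registry entries, not the full registry):

```python
# Excerpt of deprecated -> Preferred-Value mappings from the IANA
# language subtag registry ("iw" -> "he", "jw" -> "jv", etc.).
PREFERRED = {"iw": "he", "jw": "jv", "in": "id", "ji": "yi", "mo": "ro"}

def canonicalize(tag: str) -> str:
    """Replace a deprecated primary language subtag with its
    Preferred-Value, leaving the rest of the BCP 47 tag intact."""
    subtags = tag.split("-")
    subtags[0] = PREFERRED.get(subtags[0].lower(), subtags[0].lower())
    return "-".join(subtags)

print(canonicalize("iw-IL"))  # -> he-IL
```

A complete implementation would also canonicalize extlang forms and deprecated region subtags, which likewise carry Preferred-Value entries in the registry.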
Most of the last comment is unrelated to this report.
(In reply to Andre Klapper from comment #24)
> Most of the last comment is unrelated to this report.
Does your own comment add anything relevant? I just recall that renaming cannot be done simply on a single code; we need to handle legacy usage by treating the old codes as equivalents until everything is fixed (this can be difficult to check without using trackers to measure remaining usage, spread across multiple wikis including wikis in other languages).
(In reply to Philippe Verdy from comment #25)
> (In reply to Andre Klapper from comment #24)
> > Most of the last comment is unrelated to this report.
> Does your own comment add anything relevant?
It might save people some time reading.
Please file separate aspects as separate issues.