Last modified: 2014-07-25 12:19:16 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from pages displaying bug reports and their history, links might be broken. See T32723, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 30723 - Import should always use original wiki's namespace names in log entries and trim namespaces it doesn't know in the target title to allow manual choice
Status: PATCH_TO_REVIEW
Product: MediaWiki
Classification: Unclassified
Component: Export/Import
Version: unspecified
Hardware: All All
Importance: Low enhancement with 1 vote
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: https://test.wikipedia.org/w/index.ph...
Duplicates: 5770
Depends on: 40192 41969
Blocks: Wikisource
Reported: 2011-09-03 07:28 UTC by Doug
Modified: 2014-07-25 12:19 UTC (History)
CC List: 9 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Doug 2011-09-03 07:28:20 UTC
See the Latin Wikisource import log (http://la.wikisource.org/wiki/Specialis:Acta/import): there is a problem with transwiki importing. Most imports record the wrong source namespace, using the local name of the namespace instead, so the backlinks don't work. For example, in several imports from the fr.ws Page namespace, the correct namespace name on fr is "Page", but the log recorded "Pagina", the Latin namespace name. The same occurred with page, user, and template imports from English Wikisource. HOWEVER, a transwiki import of the template "{{Page}}" from the fr.ws Template namespace correctly recorded the source as "fr:Modèle:Page" but placed it at the wrong local page, "Formula:Modèle:Page". It should have imported it from "fr:Modèle:Page" to the local page "Formula:Page".
Comment 1 Doug 2012-04-16 07:22:22 UTC
Increasing importance to "high" this is really a rather serious matter as it means that logs do not give correct information on the source of an imported page. For the text of works this is trivial, but for templates, etc. it could have licensing implications. It also means that transwiki importing is essentially broken, as many imports go to the wrong place.
Comment 2 Nemo 2012-08-23 23:11:40 UTC
This is a MediaWiki bug, not a Wikimedia one, and part of the general issue of "metadata" (not) generated or considered by import; see also bug 5770.
Comment 3 Doug 2012-11-08 00:47:52 UTC
This problem also exists at mul.ws; it was only just discovered because transwiki import didn't work there at all until recently. See the log at mul: http://wikisource.org/wiki/Special:Log/import
Comment 4 Nemo 2012-11-08 07:11:07 UTC
The import logs on mul.source and la.source show different problems.
The underlying limitation is that MediaWiki doesn't load the localised namespace names in all languages and can't possibly know the names of extra namespaces and local namespace aliases (which are wiki-specific configuration), nor can it use namespace IDs to translate names, as the same ID can be used for different things (even across different Wikisources).
I've done some tests on test.wiki to (hopefully) show the problem better: https://test.wikipedia.org/w/index.php?title=Special:Log/import&dir=prev&offset=20121009224105&limit=6&type=import&user=
Note that "Hilfe" is defined locally as an extra namespace for "Help" in German (separate from the local Help:).

I think we can't expect Special:Import to be able to resolve all namespace issues, but it should definitely avoid creating pages like [[Template:Vorlage:SperrSchrift]] and let the importer fix things.
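The behaviour this bug asks for can be sketched in Python. This is a hypothetical illustration, not MediaWiki's code: `resolve_import_title`, `source_ns` (the source wiki's namespace names, e.g. as listed in the `<siteinfo>` header of an export file) and `local_ns` are invented names, and the namespace tables in the example are illustrative. The sketch keeps the source wiki's spelling for the log entry, maps the prefix through the namespace number when both wikis know it, and otherwise trims the prefix so the importer can choose the target namespace manually. Mapping by number is itself a simplifying assumption which, as explained above, does not hold across all wikis.

```python
from typing import Dict, Tuple


def resolve_import_title(
    title: str,
    interwiki: str,
    source_ns: Dict[str, int],
    local_ns: Dict[int, str],
) -> Tuple[str, str, bool]:
    """Hypothetical helper: return (log_source, local_target, mapped).

    log_source keeps the source wiki's own namespace name, so the
    interwiki backlink in the import log points at the right page.
    mapped is False when the prefix was trimmed and the importer
    should pick the target namespace manually.
    """
    log_source = f"{interwiki}:{title}"
    prefix, sep, rest = title.partition(":")
    if not sep or prefix not in source_ns:
        # Not a namespace on the source wiki: treat as a main-namespace title.
        return log_source, title, True
    local_name = local_ns.get(source_ns[prefix])
    if local_name is None:
        # Namespace unknown locally: trim the prefix instead of creating a
        # pseudo-namespace page such as "Template:Vorlage:SperrSchrift".
        return log_source, rest, False
    # Re-prefix with the local name of the same namespace number
    # (an empty local name would mean the main namespace).
    return log_source, f"{local_name}:{rest}" if local_name else rest, True


if __name__ == "__main__":
    fr_ns = {"Modèle": 10, "Page": 104}  # illustrative source table
    la_ns = {10: "Formula", 104: "Pagina"}  # illustrative local table
    print(resolve_import_title("Modèle:Page", "fr", fr_ns, la_ns))
    # ('fr:Modèle:Page', 'Formula:Page', True)
```

With these tables the {{Page}} example from the description ends up at "Formula:Page" while the log still credits "fr:Modèle:Page".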
Comment 5 Nemo 2012-11-08 07:13:06 UTC
*** Bug 5770 has been marked as a duplicate of this bug. ***
Comment 6 Purodha Blissenbach 2012-11-08 08:59:08 UTC
How about taking a stepwise approach like:
1) Show a list of all namespace names in the import and in the local wiki, with an automatically generated mapping suggestion.
2) Allow the importer to adjust the mapping.
3) Do the final import.

The downsides:
A) An uploaded file has to be preserved over some time including possibly multiple data submissions by the importer.
B) The import file has to be read twice. It has to be read and analyzed in its entirety during the 1st scan already, since the list of original namespaces at the beginning does not cover possible occurrences of namespace name aliases embedded in page data. Those need to be part of the mapping, however.

The good sides:
- Most flexible.
- Frequently used mappings can be preserved and automagically recalled by the import process.
- Step 1) could also reveal some statistics to the importer, making it possible to avoid importing implausible data.
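The stepwise approach proposed above could look roughly like the following Python sketch. It is a hypothetical illustration under simplifying assumptions: page titles have already been read from the import file, namespace tables are plain dictionaries, and the embedded-alias problem from point B) is ignored. The function names are invented for this example.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def scan_prefixes(titles: Iterable[str]) -> Counter:
    """Step 1: count the namespace-like prefixes found in the import file."""
    counts: Counter = Counter()
    for title in titles:
        prefix, sep, _ = title.partition(":")
        counts[prefix if sep else ""] += 1  # "" means "no prefix"
    return counts


def suggest_mapping(
    prefixes: Iterable[str],
    source_ns: Dict[str, int],
    local_ns: Dict[int, str],
) -> Dict[str, Optional[str]]:
    """Step 1: suggest a local namespace name per prefix (None = no suggestion)."""
    mapping: Dict[str, Optional[str]] = {}
    for prefix in prefixes:
        ns_id = source_ns.get(prefix)
        mapping[prefix] = local_ns.get(ns_id) if ns_id is not None else None
    return mapping


def apply_mapping(title: str, mapping: Dict[str, Optional[str]]) -> str:
    """Step 3: rewrite a title using the (possibly hand-edited) mapping.

    None leaves the title unchanged; "" moves it to the main namespace.
    """
    prefix, sep, rest = title.partition(":")
    target = mapping.get(prefix) if sep else None
    if target is None:
        return title
    return f"{target}:{rest}" if target else rest


if __name__ == "__main__":
    titles = ["Modèle:Page", "Page:A.djvu/1", "Page:A.djvu/2", "Vorlage:X"]
    counts = scan_prefixes(titles)  # statistics shown to the importer
    mapping = suggest_mapping(counts, {"Modèle": 10, "Page": 104}, {10: "Formula", 104: "Pagina"})
    mapping["Vorlage"] = "Formula"  # Step 2: the importer fills a gap by hand
    print(counts.most_common(1), [apply_mapping(t, mapping) for t in titles])
```

Keeping `None` distinct from `""` lets an unmapped prefix stay visible for manual review instead of being silently moved to the main namespace.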
Comment 7 Nemo 2012-11-08 10:28:36 UTC
Purodha, what you're asking is an entirely new import/convert interface, even more obscure than the current one and to be built from scratch. Please open another bug for that.
Comment 8 Purodha Blissenbach 2012-11-10 15:10:49 UTC
Oh, I did not mean to make this much fuss :-)
Reported as bug 41969
Comment 9 Doug 2012-11-23 04:19:11 UTC
I'm not really sure that this is an enhancement. Yes, it technically changes an existing function, but the existing function *does not* work: it leaves faulty log entries that arguably violate our license and, as I noted in comment 1, means that transwiki import essentially doesn't work as designed and frequently puts imports in the wrong namespace or a pseudo-namespace.
Comment 10 Andre Klapper 2013-05-17 14:33:18 UTC
If this technically really depends on bug 41969 (which is low priority), this also needs to be low priority.

> Increasing importance to "high" this is really a rather serious matter as it
> means that logs do not give correct information on the source of an imported
> page.

To me this does not seem to directly affect urgency, but maybe severity.
Comment 11 Siddhartha Ghai 2013-10-22 14:57:13 UTC
Bug 40192 seems similar to this one.
Comment 12 Nemo 2013-10-22 22:40:12 UTC
(In reply to comment #11)
> Bug 40192 seems similar to this one.

Yes, they could be considered duplicates but the proposed solution in this bug is slightly more general.
Comment 13 Gerrit Notification Bot 2014-07-25 12:19:12 UTC
Change 149293 had a related patch set uploaded by TTO:
Proper namespace handling for WikiImporter

https://gerrit.wikimedia.org/r/149293
