Last modified: 2012-09-07 18:57:47 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T32513, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 30513 - Redirect tag is not resolved in XML dump file
Redirect tag is not resolved in XML dump file
Status: RESOLVED FIXED
Product: Datasets
Classification: Unclassified
General/Unknown (Other open bugs)
unspecified
All All
: Normal normal (vote)
: ---
Assigned To: Ariel T. Glenn
: patch, patch-need-review
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-08-22 20:09 UTC by Diederik van Liere
Modified: 2012-09-07 18:57 UTC (History)
8 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
This patch resolves the page_is_redirect field in the xml dump file. It will add an attribute name to the <redirect> tag so it should be backwards compatible. (795 bytes, patch)
2011-08-22 20:40 UTC, Diederik van Liere
Details
Updated patch (1.12 KB, patch)
2011-08-26 00:34 UTC, Diederik van Liere
Details
Patch adds a new namespace tag and determines redirect target. (1.08 KB, patch)
2011-11-09 02:27 UTC, Diederik van Liere
Details
New patch adds export-0.6.xsd and fixes minor != null check (9.03 KB, patch)
2011-11-09 18:44 UTC, Diederik van Liere
Details
Updated history section of export-0.5.xsd, export-0.6.xsd and added explicit string conversion to title object. (9.68 KB, patch)
2011-11-09 19:12 UTC, Diederik van Liere
Details
Fixes issues as suggested by Platonides (1.15 KB, patch)
2011-11-10 18:07 UTC, Diederik van Liere
Details

Description Diederik van Liere 2011-08-22 20:09:28 UTC
Hi,

Currently, the page_is_redirect field from the page table is not resolved. This means that in the XML files there is a tag <redirect> present if page_is_redirect is 1 and such a tag is absent if page_is_redirect is 0. It would be tremendously useful to add an attribute to the <redirect> tag that would include the final destination of the redirection.


Best,

Diederik
Comment 1 Diederik van Liere 2011-08-22 20:40:59 UTC
Created attachment 8962 [details]
This patch resolves the page_is_redirect field in the xml dump file. It will add an attribute name to the <redirect> tag so it should be backwards compatible.

A huuuuuuuge thanks to Trevor who basically wrote this patch so all credits go to him if this works and all complaints should go to me in case this is a bad idea.
Comment 2 Roan Kattouw 2011-08-22 22:02:20 UTC
Patch looks good to me.
Comment 3 Brion Vibber 2011-08-23 18:09:25 UTC
Some issues:

* no matching update to XML schema version & definition
* redirect lookup _may_ not be fully synchronized with the page data snapshot
* what does this do in the null case? No field or empty field? If this is an implicit choice, it should probably be made more explicit.
* relies on implicit string conversion of Title object which... appears to give text formatting by default. Wouldn't hurt to make this explicit.
Comment 4 Mark A. Hershberger 2011-08-25 15:54:23 UTC
Diederik, Trevor,

Could you update the patch with respect to Brion's review in comment #3 so that it can be applied?
Comment 5 Diederik van Liere 2011-08-26 00:34:09 UTC
Created attachment 8973 [details]
Updated patch

Updated patch; fixes issues as mentioned by Brion.
Comment 6 Diederik van Liere 2011-08-26 00:35:03 UTC
New patch fixes issues 1,3 and 4. 
Brion: could you elaborate on issue 2 and possible ways to handle that? I am not sure what to do about it.
Comment 7 Diederik van Liere 2011-08-26 00:41:21 UTC
And this patch also included a separate namespace tag, sorry for not mentioning.
Comment 8 Sumana Harihareswara 2011-09-30 16:00:43 UTC
Added the "patch" and "need-review" keywords; Mark hopes to get someone to review the patch soon.
Comment 9 Sumana Harihareswara 2011-11-02 23:24:49 UTC
Diederik, could you re-upload your updated patch and ensure it's in the right format (and that it applies cleanly to trunk as is)?  Right now, attachment # 8973 [details] is in a format such that Bugzilla won't show the diff automatically...
Comment 10 Diederik van Liere 2011-11-09 02:27:18 UTC
Created attachment 9394 [details]
Patch adds a new namespace tag and determines redirect target.

As requested by Sumana, updated to most recent version of trunk. Patch also includes version increment.
Comment 11 Diederik van Liere 2011-11-09 18:44:09 UTC
Created attachment 9400 [details]
New patch adds export-0.6.xsd and fixes minor != null check

Based on IRC feedback, made changes, including adding a new export-0.6.xsd file.
Comment 12 Platonides 2011-11-09 18:47:28 UTC
Note for the committer: export-0.6.xsd should be added with export-0.5.xsd as history.

Stylistic issue: Add spaces around $redirect !== null
You're still using strval($redirect) (Brion point #4)

As I said on irc, I'm not convinced about adding the new <ns> tag, as that information is already available.
Comment 13 Diederik van Liere 2011-11-09 19:12:04 UTC
Created attachment 9401 [details]
Updated history section of export-0.5.xsd, export-0.6.xsd and added explicit string conversion to title object.
Comment 14 Aaron Schulz 2011-11-09 20:39:29 UTC
The old patches should be marked obsolete.
Comment 15 Aaron Schulz 2011-11-09 20:40:50 UTC
Overall patch seems sane (it will still need and get more code review once committed).
Comment 16 Platonides 2011-11-09 23:07:53 UTC
(In reply to comment #13)
> Created attachment 9401 [details]
> and added explicit string conversion to title object.

No. We didn't mean you should change $title->getPrefixedText() to strval( $title->getPrefixedText() ) ), which is redundant.
But to use $redirect->getPrefixedText() instead of strval( $redirect )

The if condition could be changed to $redirect instanceOf Title && $redirect->isValidRedirectTarget() to make sure it's appropiate (just copied the codnition from Title.php:396).
Comment 17 Aaron Schulz 2011-11-09 23:20:35 UTC
(In reply to comment #16)
> The if condition could be changed to $redirect instanceOf Title &&
> $redirect->isValidRedirectTarget() to make sure it's appropiate (just copied
> the codnition from Title.php:396).

Yep, that would also make using getPrefixedText() OK since getRedirectTarget() can return a string and not a Title.
Comment 18 Diederik van Liere 2011-11-10 18:07:10 UTC
Created attachment 9414 [details]
Fixes issues as suggested by Platonides

The previous patch is still needed and is not obsolete.
Comment 19 Brion Vibber 2011-11-22 19:00:29 UTC
Marking the namespace id number is possibly not quite sufficient without specifying the prefix up in the namespaces section. Probably enough for search purposes, but consider adding aliases up in the namespaces section in <siteinfo> as well.

To resolve the Lucene search indexer's issue with this (bug 32376) which causes the search prefix problem (bug 31697), the Lucene indexer will also need to be updated to *use* the field.
Comment 20 Brion Vibber 2011-11-22 19:21:24 UTC
Per IRC disucssion with Ariel a simpler one-off fix for the search stuff is:
* per (bug 32376) switch to using the canonical names in XML export (so 'Usuario:' not 'Usuaria:'); the MWSearch prefix search implementation will rewrite the titles to gendered form in the actual search results.

Looking over the patch:
+	Version 0.6 adds a separate namespace tag, locates the
+	redirect target and strips of the namespace from the title. 

it doesn't actually appear to be stripping the namespace from the <title>, it's still using getPrefixedText()... and if it did strip it that would be a serious breaking change in the format.

Can you clarify what this is meant for?
Comment 21 Diederik van Liere 2012-01-24 01:22:51 UTC
The current solution (as in trunk) is fine: it leaves the title as is and adds a new namespace tag. This tiny bit of redundancy (having the namespace as an integer in a separate tag and the fully qualified namespace in the title) will make it much more easier to select articles form a specific namespace. In particular, with the added gender specific namespaces it becomes only harder to parse out the namespace that an article belongs to. So the current is that there are no incompatible changes, just for each article an extra namespace tag.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links