Last modified: 2012-09-07 18:57:47 UTC
Hi, Currently, the page_is_redirect field from the page table is not resolved. This means that in the XML files there is a tag <redirect> present if page_is_redirect is 1 and such a tag is absent if page_is_redirect is 0. It would be tremendously useful to add an attribute to the <redirect> tag that would include the final destination of the redirection. Best, Diederik
Created attachment 8962 [details] This patch resolves the page_is_redirect field in the xml dump file. It will add an attribute name to the <redirect> tag so it should be backwards compatible. A huuuuuuuge thanks to Trevor who basically wrote this patch so all credits go to him if this works and all complaints should go to me in case this is a bad idea.
Patch looks good to me.
Some issues: * no matching update to XML schema version & definition * redirect lookup _may_ not be fully synchronized with the page data snapshot * what does this do in the null case? No field or empty field? If this is an implicit choice, it should probably be made more explicit. * relies on implicit string conversion of Title object which... appears to give text formatting by default. Wouldn't hurt to make this explicit.
Diederik, Trevor, Could you update the patch with respect to Brion's review in comment #3 so that it can be applied?
Created attachment 8973 [details] Updated patch Updated patch; fixes issues as mentioned by Brion.
New patch fixes issues 1,3 and 4. Brion: could you elaborate on issue 2 and possible ways to handle that? I am not sure what to do about it.
And this patch also included a separate namespace tag, sorry for not mentioning.
Added the "patch" and "need-review" keywords; Mark hopes to get someone to review the patch soon.
Diederik, could you re-upload your updated patch and ensure it's in the right format (and that it applies cleanly to trunk as is)? Right now, attachment # 8973 [details] is in a format such that Bugzilla won't show the diff automatically...
Created attachment 9394 [details] Patch adds a new namespace tag and determines redirect target. As requested by Sumana, updated to most recent version of trunk. Patch also includes version increment.
Created attachment 9400 [details] New patch adds export-0.6.xsd and fixes minor != null check Based on IRC feedback, made changes, including adding a new export-0.6.xsd file.
Note for the committer: export-0.6.xsd should be added with export-0.5.xsd as history. Stylistic issue: Add spaces around $redirect !== null You're still using strval($redirect) (Brion point #4) As I said on irc, I'm not convinced about adding the new <ns> tag, as that information is already available.
Created attachment 9401 [details] Updated history section of export-0.5.xsd, export-0.6.xsd and added explicit string conversion to title object.
The old patches should be marked obsolete.
Overall patch seems sane (it will still need and get more code review once committed).
(In reply to comment #13) > Created attachment 9401 [details] > and added explicit string conversion to title object. No. We didn't mean you should change $title->getPrefixedText() to strval( $title->getPrefixedText() ) ), which is redundant. But to use $redirect->getPrefixedText() instead of strval( $redirect ) The if condition could be changed to $redirect instanceOf Title && $redirect->isValidRedirectTarget() to make sure it's appropiate (just copied the codnition from Title.php:396).
(In reply to comment #16) > The if condition could be changed to $redirect instanceOf Title && > $redirect->isValidRedirectTarget() to make sure it's appropiate (just copied > the codnition from Title.php:396). Yep, that would also make using getPrefixedText() OK since getRedirectTarget() can return a string and not a Title.
Created attachment 9414 [details] Fixes issues as suggested by Platonides The previous patch is still needed and is not obsolete.
Marking the namespace id number is possibly not quite sufficient without specifying the prefix up in the namespaces section. Probably enough for search purposes, but consider adding aliases up in the namespaces section in <siteinfo> as well. To resolve the Lucene search indexer's issue with this (bug 32376) which causes the search prefix problem (bug 31697), the Lucene indexer will also need to be updated to *use* the field.
Per IRC disucssion with Ariel a simpler one-off fix for the search stuff is: * per (bug 32376) switch to using the canonical names in XML export (so 'Usuario:' not 'Usuaria:'); the MWSearch prefix search implementation will rewrite the titles to gendered form in the actual search results. Looking over the patch: + Version 0.6 adds a separate namespace tag, locates the + redirect target and strips of the namespace from the title. it doesn't actually appear to be stripping the namespace from the <title>, it's still using getPrefixedText()... and if it did strip it that would be a serious breaking change in the format. Can you clarify what this is meant for?
The current solution (as in trunk) is fine: it leaves the title as is and adds a new namespace tag. This tiny bit of redundancy (having the namespace as an integer in a separate tag and the fully qualified namespace in the title) will make it much more easier to select articles form a specific namespace. In particular, with the added gender specific namespaces it becomes only harder to parse out the namespace that an article belongs to. So the current is that there are no incompatible changes, just for each article an extra namespace tag.