Last modified: 2011-11-20 17:30:32 UTC

Bug 13166 - New Hook TitleUserCase for manipulation of names in MW.
Status: REOPENED
Product: MediaWiki
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Low enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks: 13639
Reported: 2008-02-26 14:46 UTC by joshua bacher
Modified: 2011-11-20 17:30 UTC (History)
CC List: 3 users



Attachments
Patches the Title class, adds a new hook to manipulate Title dbkeys (522 bytes, patch)
2008-02-26 14:46 UTC, joshua bacher
A second patch (862 bytes, text/plain)
2008-03-04 02:39 UTC, joshua bacher
The last patch was broken. (857 bytes, patch)
2008-03-04 13:59 UTC, joshua bacher

Description joshua bacher 2008-02-26 14:46:02 UTC
Created attachment 4677 [details]
Patches the Title class, adds a new hook to manipulate Title dbkeys

For our project (http://bowiki.net) we need the ability to use our own naming conventions for MediaWiki (MW) pages.

We need to match pages by their names against names that are additionally stored in another piece of software, which is accessed via a special protocol, and we need to be sure that, independent of case, objects with the same pronunciation can be identified.

We therefore need a standard method to deal with them on both sides (the external program AND MW). We need to apply strtolower and then ucfirst to each page name.
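
A minimal sketch of the normalization described above, assuming plain ASCII page names. The function name normalizePageName is hypothetical and not part of MediaWiki; strtolower and ucfirst are the standard PHP functions, which operate on bytes, so names containing non-ASCII characters would probably need a multibyte-aware variant instead.

```php
<?php
/**
 * Hypothetical helper (not a MediaWiki function): fold a page name to
 * lower case, then capitalize its first character.
 * Assumption: the name is plain ASCII.
 */
function normalizePageName( $name ) {
    // Lower-case everything first so that case variants collapse,
    // then upper-case only the first character.
    return ucfirst( strtolower( $name ) );
}

// Case variants of the same name collapse to one form.
echo normalizePageName( 'SomeIdentifier' ), "\n";
echo normalizePageName( 'someIDENTIFIER' ), "\n";
```

Applying the same function on both sides (the wiki and the external program) is what would make the names comparable.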

MW currently has no such functionality. That's what I learned from discussing this issue on the MW IRC channel (I also found nothing appropriate in the MW developer documentation or on the MW homepage itself).

We solved this by changing the $dbkey variable in the Title class of MW, right at the place where ucfirst would be applied (see patch). The provided patch adds a hook to handle this as well.

We would really be happy to see the additional functionality in a future MW version.

Thanks in advance

Isnogud@#mediawiki aka Joshua Bacher
Comment 1 Roan Kattouw 2008-03-03 16:23:01 UTC
Modified version of patch applied in r31505
Comment 2 Brion Vibber 2008-03-03 19:55:11 UTC
Reverted in r31519
Comment 3 joshua bacher 2008-03-03 20:24:18 UTC
Well. In response to your svn comment: 

It would be sufficient for us to access only the $dbkey variable; the instance itself is completely uninteresting.

I think it's a really nice feature that people might like to use, to have their own control over the names in the wiki.

As we are using an ontology and a case-sensitive reasoner, we absolutely need some kind of control over how names are treated, to be sure that the names match.

We use a syntax similar to that of Semantic MediaWiki to define relations between pages. But we want to be sure that if a page is used in a link with different capitalization, the wiki automatically knows about the object mentioned in the link:

If we have a page called Someidentifier and a user uses something like SomeIdentifier in a relation

[[relation::SomeIdentifier]] 

we want SomeIdentifier to show up as a known page. In combination with a redirect, a user following the link will then be sent to the Someidentifier page.

I think there is no other way to handle this situation.

So, what do you think?

thanks isnogud
Comment 4 joshua bacher 2008-03-04 02:39:05 UTC
Created attachment 4692 [details]
A second patch
Comment 5 joshua bacher 2008-03-04 02:46:13 UTC
Hello again,

I was rethinking the problem, as it wouldn't let me sleep. I stared at the code for a bit, and here is what I found:


1. I think that adding one big hook near the beginning of the secureAndSplit function is a good idea. Here is why:

The code in secureAndSplit is quite important; it performs a number of checks. If we place the manipulation a little earlier, the developer's dbkey would also benefit from all the checks that happen there.

A good place to set the hook is, in my opinion, right after NS and dbkey are split. There we are able to manipulate both. I think this is at line 1840: the namespace is now set, and so is the dbkey, ready to be manipulated, so let's do it.

As far as I can see, secureAndSplit only operates on NS and title, so a hook placed there should use $this, $dbkey, and $ns at the suggested line. But that causes a slight problem, because there is no recheck of the namespace if a new NS is passed back after this line, so we just have to set it again.


2. I think that functionality which manipulates the dbkey conflicts with wgCapitals, since by using the hook we choose manual manipulation.

Anyone who opts for manipulating the title should automatically be excluded from the internal ucfirst, but is free to use it in their own function.


3. A suggestion for a better name for the hook!
Whether we change the namespace OR directly check the dbkey, in both cases we manipulate the title's DB key, so maybe a good name would be AlternativeTitleDBkey: directly access and alter the dbkey and the namespace of the Title object.


Well, I created a patch according to my suggestions here. It adds the following hook, along with a check for the wgCapitals manipulation.
                                                               
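if ( !array_key_exists( 'AlternativeTitleDBkey', $wgHooks ) ) {
          if ( wfRunHooks( 'AlternativeTitleDBkey', array( $this, &$dbkey, &$ns ) ) == true ) {
                // There is no namespace recheck after this point, so store
                // the (possibly changed) namespace directly.
                $this->mNamespace = $ns;
          } else {
                return false;
          }
}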

just check the attached patch. Tpatch.p

Thanks in advance
Comment 6 Daniel Friesen 2008-03-04 10:49:28 UTC
I would like to note that a rewrite of the Title class may be done.

It was partially discussed in wikitech-l, mostly between me and Simetrical.

The idea was primarily to introduce a ''real title'' in addition to the current page title which is basically a db key.
This would mean that you could create or move a page as [[_main_Page]], and while it would still be the same as [[Main Page]], and going to the two different titles would lead to the same page, the title would actually show itself as [[_main_Page]] and it would stay that way. In other words, _'s and various other characters that are normally strictly normalized would actually become valid for use without normalization or the need for {{DISPLAYTITLE:...}}.

The part relevant to this bug is that, in addition to that, the idea was to create a normalization function which would be extensible. In other words, rather than this ugly hack (and yes, this is an ugly hack, worse than DISPLAYTITLE), an extension could easily extend or hook into this normalization function to provide exactly what you are trying to do, but in a robust and clean way.

Additionally, looking over the comments inside the "[Wikitech-l] [MediaWiki-CVS] SVN:  [31519] trunk/phase3" discussion on secureAndSplit I have a feeling that I am likely going to section off secureAndSplit into more intuitive parts of what it actually does (Splitting interwiki, disallowing illegal characters, normalizing case, rip out dangerous or other unicode characters which shouldn't be there, split namespace, pull out dangerous ../ and ./ sequences and disallow tilde(~) sequences, limit title length, etc...) And rather than sticking it all inside of a single function, I'm likely going to do it in a sort of sequence or list of actions to apply to the title with data on how and where to apply it stored in what will probably be an array or list of some sort. That way you can insert extra things to do at any point in the process, and also remove or replace parts of the sequence (Such as the case normalization).
This of course, will completely void out that hook and break anything using it because there will be no sane place for it to exist.
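
The "list of actions" idea above could look roughly like the sketch below. Everything in it is an assumption for illustration: the function names defaultTitleSteps and applyTitleSteps, the step names, and the three steps themselves are hypothetical and do not correspond to existing MediaWiki code. It only shows how named steps in an ordered array would let an extension replace a single step, such as the case normalization.

```php
<?php
/**
 * Hypothetical default sequence of normalization steps, keyed by name.
 * PHP arrays keep insertion order, so the steps run top to bottom.
 */
function defaultTitleSteps() {
    return array(
        // Strip surrounding spaces and underscores.
        'trim'   => function ( $t ) { return trim( $t, ' _' ); },
        // Turn remaining spaces into underscores, as in a dbkey.
        'spaces' => function ( $t ) { return str_replace( ' ', '_', $t ); },
        // Default case handling: capitalize the first character only.
        'case'   => function ( $t ) { return ucfirst( $t ); },
    );
}

/** Apply every step in order and return the resulting title text. */
function applyTitleSteps( array $steps, $title ) {
    foreach ( $steps as $step ) {
        $title = $step( $title );
    }
    return $title;
}

// An extension could swap out just the case step; assigning to an
// existing key keeps that step's position in the sequence.
$steps = defaultTitleSteps();
$steps['case'] = function ( $t ) { return ucfirst( strtolower( $t ) ); };

echo applyTitleSteps( defaultTitleSteps(), ' some Identifier ' ), "\n";
echo applyTitleSteps( $steps, ' some Identifier ' ), "\n";
```

Keying the steps by name is one possible design choice; it makes replacing or removing a step a one-line operation, at the cost of also making it easy to remove a check that matters.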
Comment 7 joshua bacher 2008-03-04 13:59:25 UTC
Created attachment 4693 [details]
The last patch was broken.

The former patch was broken:

'if (if (' <- looks nasty
Comment 8 joshua bacher 2008-03-04 14:43:30 UTC
> Additionally, looking over the comments inside the "[Wikitech-l]
> [MediaWiki-CVS] SVN:  [31519] trunk/phase3" discussion on secureAndSplit I have
> a feeling that I am likely going to section off secureAndSplit into more
> intuitive parts of what it actually does (Splitting interwiki, disallowing
> illegal characters, normalizing case, rip out dangerous or other unicode
> characters which shouldn't be there, split namespace, pull out dangerous ../
> and ./ sequences and disallow tilde(~) sequences, limit title length, etc...)
> And rather than sticking it all inside of a single function, I'm likely going
> to do it in a sort of sequence or list of actions to apply to the title with
> data on how and where to apply it stored in what will probably be an array or
> list of some sort. That way you can insert extra things to do at any point in
> the process, and also remove or replace parts of the sequence (Such as the case
> normalization).

That's a really good idea. As far as I could read from the code, secureAndSplit is indeed a function that could be split up into separate functional parts, and the idea of a list of steps is a good one. What came to my mind was that you introduce something like a global array and store the different checks or transformations there. But it will not be sufficient to apply that only to the title; you need it for the namespaces too, right? Maybe a global array is not such a good idea, since one could go there and delete checks that are pretty important.

The tricky part here, is how to deal with the priority of jobs. Sounds like a good job for a priority queue http://en.wikipedia.org/wiki/Priority_queue.
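
A rough sketch of how a priority queue could order such steps, using PHP's built-in SplPriorityQueue (available from PHP 5.3). The function names buildNormalizationQueue and normalizeWithQueue, the chosen steps, and the priority numbers are all hypothetical; this is not existing MediaWiki code. Higher priorities are extracted first, and the order of steps with equal priority is not defined, so each step gets a distinct priority here.

```php
<?php
/** Build a queue of normalization steps; higher priority runs earlier. */
function buildNormalizationQueue() {
    $queue = new SplPriorityQueue();
    // Steps are plain PHP function names here; priorities are arbitrary.
    $queue->insert( 'ucfirst', 10 );    // runs last
    $queue->insert( 'trim', 30 );       // runs first
    $queue->insert( 'strtolower', 20 ); // runs second
    return $queue;
}

/**
 * Run every queued step on the title, highest priority first.
 * Extracting empties the queue, so a fresh queue is needed per title.
 */
function normalizeWithQueue( SplPriorityQueue $queue, $title ) {
    while ( !$queue->isEmpty() ) {
        $step = $queue->extract();
        $title = call_user_func( $step, $title );
    }
    return $title;
}

echo normalizeWithQueue( buildNormalizationQueue(), '  SomeIdentifier ' ), "\n";
```

Because extraction consumes the queue, a real implementation would probably keep the registered steps in a separate list and build the queue on demand.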

> This of course, will completely void out that hook and break anything using it
> because there will be no sane place for it to exist.

That would be a really cool thing. I would be happy to lend a helping hand with that.

Maybe I'll start by implementing a priority queue. What do you developers think?

You may also contact me on the #mediawiki IRC channel (isnogud).
