Last modified: 2013-08-22 14:54:12 UTC

Wikimedia Bugzilla is closed!

Wikimedia has migrated from Bugzilla to Phabricator. Bug reports should be created and updated in Wikimedia Phabricator instead. Please create an account in Phabricator and add your Bugzilla email address to it.
Wikimedia Bugzilla is read-only. If you try to edit or create any bug report in Bugzilla you will be shown an intentional error message.
In order to access the Phabricator task corresponding to a Bugzilla report, just remove "static-" from its URL.
You could still run searches in Bugzilla or access your list of votes but bug reports will obviously not be up-to-date in Bugzilla.
Bug 38622 - Merge the Wikidata ContentHandler branch
Merge the Wikidata ContentHandler branch
Status: VERIFIED FIXED
Product: MediaWiki
Classification: Unclassified
General/Unknown (Other open bugs)
1.20.x
All All
: High normal (vote)
: ---
Assigned To: Tim Starling
:
Depends on: 39509 40179 40180 40639 40653 40654 40766 40767
Blocks: 17503 36450 40137
  Show dependency treegraph
 
Reported: 2012-07-23 23:18 UTC by Rob Lanphier
Modified: 2013-08-22 14:54 UTC (History)
11 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Rob Lanphier 2012-07-23 23:18:06 UTC
A new ContentHandler class as well as other modifications currently reside in the Wikidata branch here:

https://gerrit.wikimedia.org/r/#/q/status:merged+project:mediawiki/core+branch:Wikidata,n,z

This work needs to go through one or more review/iteration cycles, and then merged into master for deployment.  This bug will be used to track the conversation and ownership.  (Step 1: Tim will outline the steps that he believes need to happen prior to merge, which previously may have only been captured in IRC.)
Comment 1 Rob Lanphier (RobLa) 2012-07-29 00:43:43 UTC
From Daniel Kinzler: 
> Things got a bit stuck for a bit over the last 4 weeks, because I was busy
> traveling to Wikimania and having a look at New York. I have picked up work on
> the branch last week, merged in the latest master and did some refactoring that
> Tim requested (moved several functions from ContentHandlers to Content).

> It would be really helpful to get this review done soon, so we can code the
> extension against the actual current master. Also, several other projects, like
> Geodata, could benefit a lot from the ContentHandler stuff.

Full message:
http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/62552
Comment 2 Tim Starling 2012-08-01 13:45:23 UTC
In June, I suggested a couple of changes, which don't seem to have been implemented yet:

* Make $wgContentHandlers purely configuration, store cache separately
* Use strings for the model and format instead of integers. Store as strings in the database except where they are equal to the page default, in which case null should be stored.
Comment 3 Daniel Kinzler 2012-08-01 14:03:57 UTC
(In reply to comment #2)
> In June, I suggested a couple of changes, which don't seem to have been
> implemented yet:

Both have been implemented in June:

commit 906a1ba51f149d659e270597e4b964cc8c550357
Author: daniel <daniel.kinzler@wikimedia.de>
Date:   Mon Jun 25 23:30:51 2012 +0200

    [bug 37746] string ids for content model and format.
    
    The content model is stored as a varbinary(32), the format
    as varbinary(64).
    
    If the standard model resp. format is used, null is written
    to the database instead of the actual id, saving space.
    
    Change-Id: I32659b49a9ad3cb8ecae9019562cff7de42b65f9

commit 38828a93a17e63abb85442c90f04f6cac20034ec
Author: daniel <daniel.kinzler@wikimedia.de>
Date:   Thu Jun 14 12:33:16 2012 +0200

    Use class-static var for storing ContentHandler singletons.
    
    $wgContentHandlers will be used for configuration only, and
    is no longer changed during operation.
    
    It's no longer possible to supply instances directly in $wgContentHandlers,
    it only takes class names now.
    
    Change-Id: Ifafb08d8638eff0d6965cd92fadbd9071c74de37
 

I also moved some functionality from ContentHandler to COntent, as you requested:


commit 8d280dde9b405880d9d4a3a724c6b0872ef03dc1
Author: daniel <daniel.kinzler@wikimedia.de>
Date:   Mon Jul 23 23:52:34 2012 +0200

    Moved getParserOutput to Content interface.
    
    On Tim's request, this change moved getParserOutput() and getSecondaryDataUpdates()
    from the ContentHandler to the Content interface.
    
    Change-Id: Ia654aa8710a242ba5fe7a4eb528e6a6449035f59

commit c8e633f1e38d8fee671fc5453536adccaee616be
Author: daniel <daniel.kinzler@wikimedia.de>
Date:   Mon Jul 23 22:54:25 2012 +0200

    moved getDeletionUpdates to Content interface
    
    Change-Id: I1b46b1d663f8efd609aeb1b63cb07ee1a0a00c33
Comment 4 Tim Starling 2012-08-01 23:43:12 UTC
I can't find any of those revisions in the Gerrit web UI. I can see 38828a9 in the history of the Wikidata branch, but the other two aren't there. What command did you use to push them? Maybe you pushed them directly into refs/heads/Wikidata and then dereferenced them with a rebase.
Comment 5 Daniel Kinzler 2012-08-02 08:22:24 UTC
Bah, I hate gerrit. I did not disconnect or rebase anything. When I do a fresh clone of core and checkout the Wikidata branch, these commit are there. I have no idea why Gerrit does not find them. I can't see them on gitweb either, which really confuses me.

I did push to the branch directly, yes. I was working on it by myself, and when I started, it was an SVN branch. So I just kept working like this. Perhaps it would be better to do all changes via gerrit from now on. 

If you think that would be the way to go, just disable direct push. However, I may end up self-approving edits, because there's really nobody to review this stuff (well, except you I guess).
Comment 6 Antoine "hashar" Musso (WMF) 2012-08-02 08:52:55 UTC
so bugzilla craft an URL to search the commit in Gerrit.  If the commit was pushed directly to the repo (bypassing Gerrit), it does not know anything about the sha1.
Comment 7 Daniel Kinzler 2012-08-02 08:54:33 UTC
(In reply to comment #6)
> so bugzilla craft an URL to search the commit in Gerrit.  If the commit was
> pushed directly to the repo (bypassing Gerrit), it does not know anything about
> the sha1.

Yea, but why doesn't gitweb find it? search doesn't turn it up, and I can't find it in the branche's log either. Confusing.
Comment 8 jeremyb 2012-08-05 05:19:58 UTC
(In reply to comment #3)
> (In reply to comment #2)
> > In June, I suggested a couple of changes, which don't seem to have been
> > implemented yet:
> 
> Both have been implemented in June:
> 
> commit 906a1ba51f149d659e270597e4b964cc8c550357
> Author: daniel <daniel.kinzler@wikimedia.de>
> Date:   Mon Jun 25 23:30:51 2012 +0200
> 
>     [bug 37746] string ids for content model and format.
> 
>     The content model is stored as a varbinary(32), the format
>     as varbinary(64).
> 
>     If the standard model resp. format is used, null is written
>     to the database instead of the actual id, saving space.
> 
>     Change-Id: I32659b49a9ad3cb8ecae9019562cff7de42b65f9

https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/core.git;a=commitdiff;h=906a1ba51f149d659e270597e4b964cc8c550357

> commit 38828a93a17e63abb85442c90f04f6cac20034ec
> Author: daniel <daniel.kinzler@wikimedia.de>
> Date:   Thu Jun 14 12:33:16 2012 +0200
> 
>     Use class-static var for storing ContentHandler singletons.
> 
>     $wgContentHandlers will be used for configuration only, and
>     is no longer changed during operation.
> 
>     It's no longer possible to supply instances directly in $wgContentHandlers,
>     it only takes class names now.
> 
>     Change-Id: Ifafb08d8638eff0d6965cd92fadbd9071c74de37

https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/core.git;a=commitdiff;h=38828a93a17e63abb85442c90f04f6cac20034ec

> I also moved some functionality from ContentHandler to COntent, as you
> requested:
> 
> commit 8d280dde9b405880d9d4a3a724c6b0872ef03dc1
> Author: daniel <daniel.kinzler@wikimedia.de>
> Date:   Mon Jul 23 23:52:34 2012 +0200
> 
>     Moved getParserOutput to Content interface.
> 
>     On Tim's request, this change moved getParserOutput() and
> getSecondaryDataUpdates()
>     from the ContentHandler to the Content interface.
> 
>     Change-Id: Ia654aa8710a242ba5fe7a4eb528e6a6449035f59

https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/core.git;a=commitdiff;h=8d280dde9b405880d9d4a3a724c6b0872ef03dc1


> commit c8e633f1e38d8fee671fc5453536adccaee616be
> Author: daniel <daniel.kinzler@wikimedia.de>
> Date:   Mon Jul 23 22:54:25 2012 +0200
> 
>     moved getDeletionUpdates to Content interface
> 
>     Change-Id: I1b46b1d663f8efd609aeb1b63cb07ee1a0a00c33

https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/core.git;a=commitdiff;h=c8e633f1e38d8fee671fc5453536adccaee616be
Comment 9 jeremyb 2012-08-05 05:24:02 UTC
damnit, bugzilla broke my URLs. either take them out of the email notification or append the commit id onto <https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/core.git;a=commitdiff;h=>

One last BZ linkification test: <https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/core.git;a=commitdiff;h=906a1ba51f149d659e270597e4b964cc8c550357>
Comment 10 Rob Lanphier 2012-08-09 16:58:39 UTC
I'm having the same problem as Tim.  I can see:

> commit 906a1ba51f149d659e270597e4b964cc8c550357
> Author: daniel <daniel.kinzler@wikimedia.de>
> Date:   Mon Jun 25 23:30:51 2012 +0200

...and

> commit 38828a93a17e63abb85442c90f04f6cac20034ec
> Author: daniel <daniel.kinzler@wikimedia.de>
> Date:   Thu Jun 14 12:33:16 2012 +0200

...but I can't see:
> commit 8d280dde9b405880d9d4a3a724c6b0872ef03dc1
> Author: daniel <daniel.kinzler@wikimedia.de>
> Date:   Mon Jul 23 23:52:34 2012 +0200

...or:
> commit c8e633f1e38d8fee671fc5453536adccaee616be
> Author: daniel <daniel.kinzler@wikimedia.de>
> Date:   Mon Jul 23 22:54:25 2012 +0200
Comment 11 jeremyb 2012-08-09 17:14:28 UTC
(In reply to comment #10)
> I'm having the same problem as Tim.  I can see:

Did you try any of my links? (from the BZ email notif not the web UI)

I just tried them all and they all worked.
Comment 12 Daniel Kinzler 2012-08-09 19:32:23 UTC
(In reply to comment #10)
> I'm having the same problem as Tim.  I can see:

see *where*? Bugzilla generates links to Gerrit. Gerrit doesn't know these commits because they did not go in via Gerrit. They are in Git though, and in GitWeb (if you can find them - use jeremy's links).
Comment 13 Daniel Kinzler 2012-08-09 19:33:28 UTC
To reiterate: to review this branch, forget about gerrit. just clone core and change check out the Wikidata branch. Then check the log for the changes i mentioned. If they are not there, let me know.
Comment 14 Rob Lanphier 2012-08-09 20:09:38 UTC
Alright, my apologies, now I see them.  For whatever reason, my local copy didn't have the latter two revs, even though I had fetched (or so I thought).  I nuked that repo, checked out a fresh copy, and now I see everything.  I'll pester Tim to look at this.
Comment 15 Tim Starling 2012-08-10 07:54:33 UTC
I could see those revisions once I deleted the local branch and recreated it. Here's a few review comments for you.

There are still a lot of long lines. Pretty much every time there is a comment on the end of a line, it makes the line too long, so you should probably not do that.

What are the changes to DairikiDiff.php for? The code changes don't match the commit message.

In DifferenceEngine.php:

} elseif( Hooks::isRegistered( 'ArticleViewCustom' )
		&& !wfRunHooks( 'ArticleViewCustom', array( ContentHandler::getContentText( $this->mNewContent ), $this->mNewPage, $out ) ) ) { #NOTE: deperecated hook, B/C only

This is a common pattern. I suggest you provide a wrapper function for running legacy hooks so that code like this will be shorter and more legible. Something like:

ContentHandler::runLegacyHook( 'ArticleViewCustom', array( $this->mNewContent, 
$this->mNewPage, $out ) )

with runLegacyHook() something along the lines of

function runLegacyHook( $hookName, $args ) {
    foreach ( $args as $key => $value ) {
        if ( $value instanceof Content ) {
            $args[$key] = ContentHandler::getContentText( $value );
        }
    }
    return wfRunHooks( $hookName, $args );
}

#XXX: generate a warning if $old or $new are not instances of TextContent?
#XXX: fail if $old and $new don't have the same content model? or what?

These are good questions, and there are many more fixme comments like them throughout the Wikidata code. I can't approve it with these kinds of flaws remaining. If nothing will ever call this function with anything other than TextContent, then throwing an exception is the right thing to do. Diffing the serialized representation does not seem right to me, for non-text content.

#XXX: text should be "already segmented". what does that mean?

It means that $wgContLang->segmentForDiff() should have been run on it. The comment is an error introduced by me in r13029, you certainly should *not* segment the input of that function.

In Article.php:

$contentHandler = ContentHandler::getForTitle( $this->getTitle() );
$de = $contentHandler->createDifferenceEngine( $this->getContext(), $oldid, $diff, $rcid, $purge, $unhide );

Surely the content type used to get the diff engine should depend on the content of the revisions involved, not just the title. Same issue throughout commit 88f8ab8e90a77f1d51785ba63f2eac10599c3063.
Comment 16 Daniel Kinzler 2012-08-22 17:02:56 UTC
Thanks for your comments Tim.

I have addressed some of the issues you mentioned in my recent commits to gerrit. I have self-approved some cosmetic changes, others are pending review - could you have a look?

https://gerrit.wikimedia.org/r/#/q/project:mediawiki/core+branch:Wikidata+is:open,n,z

In particluar, I have done the following:

* cleaned up long lines
* addressed, removed or reworded many FIXME and XXX comments
* implemented ContentHandler::runLegacyHooks, and made use of it in several places,  cleaning up crufty code.


The changes in DairikiDiff where made in order to make this class more flexible and reusable by by other kinds of content. I still think that would be nice, but it's not needed for Wikibase/Wikidata, since we ended up implementing our own Diff infrastructure. 

So, this is what I'll do:
* revert the changes to DairikiDiff
* make sure the DifferenceEngine is created via the revision's ContentHandler, not the Title's.
Comment 17 Tim Starling 2012-09-27 00:00:19 UTC
The two "ALTER TABLE revision" patches should be merged into a single query, to improve the time taken to update. Same with the archive table. All of the schema changes can be in one patch file, there's no requirement for each field to be in its own file.
Comment 18 denny vrandecic 2012-09-30 10:29:41 UTC
Following Rob's suggestion:

>>> Someone could create a faux commit (with "DO NOT MERGE" on it) which
>>> is not a merge commit, but a fully squashed, rebased single commit.
>>> That would give a target for inline review comments.  I'm not
>>> volunteering to do that myself, but anyone could do that, and drop a
>>> comment on bug 38622.

That faux commit is here:

https://gerrit.wikimedia.org/r/#/c/25736/
Comment 19 Rob Lanphier 2012-10-03 01:08:31 UTC
Hi Denny, thanks for the faux commit!  Is there a merge commit ready to go for this yet, or is there more blocker fixing happening first?
Comment 20 Daniel Kinzler 2012-10-03 16:32:27 UTC
(In reply to comment #19)
> Hi Denny, thanks for the faux commit!  Is there a merge commit ready to go for
> this yet, or is there more blocker fixing happening first?

I think that's a question for Tim, really. I think we are basically good to go - I have not yet resolved everything he mentioned, but I would not consider any of these issues blockers in the hard sense. Anyway, I expect to be able to address most of the stuff by Monday.
Comment 21 Rob Lanphier 2012-10-03 18:06:24 UTC
Hi Daniel, let me ask the question another way:  when are you planning on submitting a merge request?  If *you* believe you've addressed all of the blockers, you should submit the merge request.  If Tim disagrees, he'll put a -2, but don't wait for his (or anyone's) preapproval before submitting the merge request.
Comment 22 Daniel Kinzler 2012-10-03 18:52:10 UTC
Hi Rob. Ok, we'll do that. I was sticking to the original plan of Tim merging the branch bypassing gerrit, but a merge request is probably helpful to get input from others as well. 

Anyway, I expect that we'll do that early next week.
Comment 23 Sam Reed (reedy) 2012-10-10 00:02:29 UTC
Was done earlier today
Comment 24 denny vrandecic 2013-08-22 14:54:12 UTC
Closed older resolved bugs as verified.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links