Last modified: 2014-08-13 17:21:33 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T71310, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 69310 - Splitting should be done after alignment at Special:PageMigration?
Status: PATCH_TO_REVIEW
Product: MediaWiki extensions
Classification: Unclassified
Component: Translate
Version: unspecified
Hardware: All All
Importance: High major
Target Milestone: ---
Assigned To: Pratik Lahoti
Depends on:
Blocks: 65740
Reported: 2014-08-08 20:09 UTC by Pratik Lahoti
Modified: 2014-08-13 17:21 UTC (History)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---



Description Pratik Lahoti 2014-08-08 20:09:12 UTC
Right now, at Special:PageMigration, the translation units are first split on (any) headers and then aligned on h2 level. Because aligning 'collapses' the sections above a matched h2 header into one unit, this defeats the purpose of splitting. So, alignment should be done on h2 level first, and then the splitHeaders() function can be called on the resulting translation units.
Comment 1 Nemo 2014-08-08 23:38:08 UTC
This bug report lacks steps to reproduce and a user story like "As a translation administrator, I want ... so that ...".

Suggested example: https://www.mediawiki.org/w/index.php?title=Manual:LocalSettings.php/ru&oldid=611990 vs. https://www.mediawiki.org/w/index.php?title=Manual:LocalSettings.php&oldid=1095587
Comment 2 Pratik Lahoti 2014-08-09 12:39:56 UTC
To reproduce:

1. Go to Special:PageMigration
2. Enter Manual:LocalSettings.php in the title field and ru in the language code field.
3. Press the Import button

I. Observed:

1. The translation units are imported
2. Due to h2 alignment, many units (containing h3+ headers) are collapsed into one

II. Expected:

1. The units should be split on headers after the alignment on h2 is done.

This avoids many sections getting collapsed into one if there was no h2 header in them, like for the example above.
Comment 3 Gerrit Notification Bot 2014-08-09 12:46:33 UTC
Change 153064 had a related patch set uploaded by BPositive:
Splitting of units on headers done after h2 alignment at Special:PageMigration

https://gerrit.wikimedia.org/r/153064
Comment 4 Nemo 2014-08-11 17:36:58 UTC
Thanks for comment 2 and the patch, but I'm not going to spend time looking into it until I'm sure we have a shared understanding of the problem.

----

19.22 < Nemo_bis> 19.19 < Nemo_bis> So what would you look for if you needed a second one [page] to test with?
19.22 < Nemo_bis> The steps to reproduce ideally are able to be followed on any wiki from scratch
19.23 < Nemo_bis> A good portion, if not the majority, of the work needed to fix a bug is describing it well
19.23 < Nemo_bis> Isolate the steps to reproduce
19.23 < Nemo_bis> Produce a minimal test case
19.23 < Nemo_bis> Ensure the test case covers the actual issues that originated the report
19.24 < Nemo_bis> That page is huge but to describe the problem we actually only need to look at the TOC

[Only then]

19.23 < Nemo_bis> Devise a solution
19.23 < Nemo_bis> Implement it
19.23 < Nemo_bis> Test it against the test case
19.23 < Nemo_bis> (Build a unit test)

----

So your next task for today is figuring out this bug in a more general way: again, steps to reproduce (and minimal test case), user story.

There is also https://www.mediawiki.org/wiki/Extension:Translate/Mass_migration_tools/Design , which was never really polished. You may need to update this page in order to make it clear to yourself and others what the expected behaviour is and why it failed here.
Comment 5 Pratik Lahoti 2014-08-11 19:03:24 UTC
As a user of Special:PageMigration, a minimal requirement from my side would be that the imported translation units are well separated from each other. And even if there is no 100% correct alignment, it should be easy for me to align them using the add, swap and delete features.

As per https://bugzilla.wikimedia.org/show_bug.cgi?id=66162, the translation units are adjusted by adding empty units or collapsing multiple units into one so that the headers are aligned on h2 level.

For examples like https://www.mediawiki.org/wiki/Manual:LocalSettings.php vs https://www.mediawiki.org/w/index.php?title=Manual:LocalSettings.php/ru&oldid=611990 , in which the number of h2 headers and the h3 headers under them don't match (due to newer sections getting added or change in layout of source page), just aligning on h2 level would collapse all the previous sections into one huge chunk. For the example quoted above, ==Security== would get aligned with ==Настройки БД== as per the h2 alignment, collapsing all the h3 headers (1.1 to 1.13) in a single unit.

As a user, it would be tedious to do the splitting manually and create the corresponding units. The Special page would indeed be useless for this particular case. So, we could make use of the automation provided by https://gerrit.wikimedia.org/r/#/c/136334/ so that headers and their text are split and new units are available from the start. Given that this Special page goes hand in hand with Special:PagePreparation, and PagePreparation takes care of having headers as separate translation units, this should require less adjustment, as it would only be a matter of moving all the units together.

As per the current code, this splitting is done before alignment. But aligning results in collapsing of units for cases like the one above, which defeats the purpose of splitting itself. For cases in which there are no headers in the collapsed text, splitting before or after has the same effect. But by splitting later, we also cover cases in which h3+ headers got collapsed into one unit, and separate them out again.
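
The ordering argument above can be illustrated with a minimal Python sketch. It is not the Translate extension's code: `split_headers` and `collapse_to_h2` are hypothetical stand-ins, and the alignment model is an assumption that only captures the collapsing side effect (everything up to the next h2 header ends up in one unit) and ignores the padding with empty units.

```python
import re

# A wikitext header line such as "== Title ==" or "=== Title ===".
HEADER = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")


def split_headers(unit):
    """Hypothetical splitter: every header line becomes its own unit."""
    parts, buf = [], []
    for line in unit.split("\n"):
        if HEADER.match(line):
            if buf:
                parts.append("\n".join(buf).strip())
                buf = []
            parts.append(line)
        else:
            buf.append(line)
    if buf:
        parts.append("\n".join(buf).strip())
    return [p for p in parts if p]


def collapse_to_h2(units):
    """Assumed worst case of h2 alignment: merge units up to the next h2."""
    chunks = []
    for unit in units:
        m = HEADER.match(unit.split("\n", 1)[0])
        starts_h2 = bool(m) and len(m.group(1)) == 2
        if starts_h2 or not chunks:
            chunks.append(unit)
        else:
            chunks[-1] += "\n\n" + unit
    return chunks


units = [
    "== Setup ==\nIntro text.",
    "=== Database ===\nDB text.",
    "=== Cache ===\nCache text.",
    "== Security ==\nSecurity text.",
]

# Current order: split first, then align; the split is undone by collapsing.
split_then_align = collapse_to_h2([p for u in units for p in split_headers(u)])

# Proposed order: align first, then split the collapsed units again.
align_then_split = [p for u in collapse_to_h2(units) for p in split_headers(u)]

print(len(split_then_align), len(align_then_split))
```

In this toy input the current order leaves one large unit per h2 section, while the proposed order leaves each header and each block of text as a separate unit.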

So, to summarize and reproduce this bug:

1. Go to Special:PageMigration on your wiki
2. Fill in the input fields with a page that has h3+ headers and an unequal number of h2 headers compared to its translation, which should result in collapsing of units. You can easily find such a page by comparing the TOC of both pages.
3. Try adjusting the units and see how painful it would be if there were many h3+ headers that got collapsed into one.

Expected (or rather desired):

1. The units containing headers should be separated so that adjusting takes less effort from the user's perspective.
Comment 6 Nemo 2014-08-12 17:53:27 UTC
What would happen then if the old version of the translation page had no double newlines between the h2? Does alignment still work?
Comment 7 Pratik Lahoti 2014-08-12 18:44:39 UTC
We had discussed this on IRC (wish I had the logs). You had clearly told me that there would be double newlines at least between two h2's and that such situations are rare. We had discussed this while working on the h2 patch, when I had misinterpreted it.

However, if we still wish to cover this rare case as well, the splitting function would need to be applied before as well as after the headers are aligned.
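
A small Python sketch of why this rare case would need a split before alignment as well. The names `split_on_h2` and `visible_h2` are hypothetical, and the sketch assumes that units are initially cut on double newlines and that alignment only sees an h2 header when it starts a unit.

```python
import re

# Matches h2 lines only ("== Title =="), not h3 or deeper.
H2 = re.compile(r"^==[^=](.*[^=])?==\s*$")


def split_on_h2(unit):
    """Hypothetical helper: start a new unit at every h2 line inside `unit`."""
    units = []
    for line in unit.split("\n"):
        if H2.match(line) or not units:
            units.append(line)
        else:
            units[-1] += "\n" + line
    return units


def visible_h2(units):
    """Count the units whose first line is an h2 header."""
    return sum(1 for u in units if H2.match(u.split("\n", 1)[0]))


# Old translation page without double newlines between the h2 sections.
old_translation = "== A ==\nText A.\n== B ==\nText B."
units = old_translation.split("\n\n")  # a single unit

before = visible_h2(units)
pre_split = [p for u in units for p in split_on_h2(u)]
after = visible_h2(pre_split)

print(before, after)
```

Without the earlier split, the second h2 is hidden inside the first unit, so an alignment that looks only at unit starts cannot match it.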
Comment 8 Nemo 2014-08-13 17:21:33 UTC
You should wish for up to date docs at https://www.mediawiki.org/wiki/Extension:Translate/Mass_migration_tools/Design , not for chat logs. :-)

It's fine if you've considered this drawback; you had just forgotten to say so.
