Last modified: 2013-10-10 18:09:32 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the display of bug reports and their history, links might be broken. See T27203, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 25203 - byteoffset of action=parse is broken when manually specifying headers using <h1> syntax
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: Parser
Version: unspecified
Hardware: All (OS: All)
Importance: Low minor
Assigned To: Nobody - You can work on this!
URL: http://de.wikipedia.org/w/api.php?act...
Duplicates: 43584
Reported: 2010-09-17 17:11 UTC by Umherirrender
Modified: 2013-10-10 18:09 UTC
CC: 8 users

Description Umherirrender 2010-09-17 17:11:06 UTC
The byteoffsets of the page [[de:Wikipedia:Testseite]] are all identical and point to the end of the page (see URL).

That makes them useless. Is there a way to get the correct byteoffsets?

Thanks.
Comment 1 Bawolff (Brian Wolff) 2010-09-17 23:54:05 UTC
It appears to stop giving correct byte offsets after encountering the first header made using <h1> (or h2, h3, etc) syntax instead of the normal ==header== syntax.
Comment 2 Roan Kattouw 2010-09-18 10:06:56 UTC
Yes, it seems to have trouble with the fact that the page uses an <h2> followed by a =header=.
Comment 3 Sam Reed (reedy) 2011-02-21 23:42:47 UTC
Changing component to Page Rendering/Parsing

The API isn't at fault here; it's only displaying what the parser output says is there.
Comment 4 Brad Jorsch 2013-01-02 16:31:08 UTC
It's sort of an API issue, in that this feature in the parser seems to have been added solely to support returning this information in the API.

Oddly, "byteoffset" is actually the offset in Unicode code points, not in bytes.
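The difference between a code-point offset and a true byte offset only shows up once non-ASCII text precedes the position, which is common on a German page. A small illustration, assuming UTF-8 as the byte encoding; the sample wikitext and function names are made up for this sketch:

```python
# Illustration only: a code-point offset vs. a UTF-8 byte offset for the same
# position. The sample text is hypothetical.


def codepoint_offset(text: str, index: int) -> int:
    """Offset in Unicode code points (Python str indices count code points)."""
    return len(text[:index])


def utf8_byte_offset(text: str, index: int) -> int:
    """Offset in bytes of the UTF-8 encoding of everything before index."""
    return len(text[:index].encode("utf-8"))


# "Überblick\n== Größe ==\n", written with escapes so the code points are exact.
wikitext = "\u00dcberblick\n== Gr\u00f6\u00dfe ==\n"
pos = wikitext.index("==")
print(codepoint_offset(wikitext, pos))  # 10
print(utf8_byte_offset(wikitext, pos))  # 11 (the "Ü" takes two bytes)
```

A client that treats the reported value as a byte index into the UTF-8 wikitext would therefore land slightly off after every non-ASCII character.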

The problem is that the code pulls out all the <h#> tags from the parsed HTML, but uses the parsed-to-DOM representation of the original wikitext to try to calculate the byteoffset. This parsed-to-DOM representation, however, doesn't include DOM structure for any raw <h#> tags from the original wikitext, so when it tries to find the DOM node for one of those, it searches to the end of the wikitext without finding it. That also throws off every subsequent header.
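The failure mode described in this comment can be reproduced with a heavily simplified Python model. This is not MediaWiki's PHP code: the regex, the function names, and matching headings by a section index (with None standing in for a raw <h#> that has no index) are assumptions made for illustration.

```python
import re

# Hypothetical, simplified model of the bug; NOT MediaWiki's actual code.
# Only "== x ==" style headings exist in the "DOM" of the wikitext, while the
# rendered HTML also contains headings written as raw <h#> tags.
HEADING = re.compile(r"^(=+)\s*(.*?)\s*\1\s*$", re.MULTILINE)


def dom_headings(wikitext):
    """(offset, section index) for each ==x== heading, in document order."""
    return [(m.start(), i) for i, m in enumerate(HEADING.finditer(wikitext), start=1)]


def buggy_offsets(html_sections, dom_nodes, text_length):
    """html_sections: section index of each heading in the HTML, or None for a
    raw <h#> tag. Offsets are Python str indices, i.e. code points."""
    pos = 0  # next DOM node to examine; never moves backwards
    offsets = []
    for sec in html_sections:
        # Scan forward for the DOM node with the same section index. A raw <h#>
        # (sec is None) never matches, so the scan runs off the end.
        while pos < len(dom_nodes) and dom_nodes[pos][1] != sec:
            pos += 1
        if pos < len(dom_nodes):
            offsets.append(dom_nodes[pos][0])
            pos += 1
        else:
            # Nothing found: report the end of the text. Since pos is now
            # exhausted, every later heading ends up here too.
            offsets.append(text_length)
    return offsets


wikitext = "== A ==\ntext\n<h2>Raw</h2>\n= B =\nmore\n== C ==\nend\n"
nodes = dom_headings(wikitext)
print(nodes)  # [(0, 1), (26, 2), (37, 3)]
print(buggy_offsets([1, None, 2, 3], nodes, len(wikitext)))  # [0, 49, 49, 49]
```

The first heading is located correctly; the raw <h2> exhausts the scan, and "= B =" and "== C ==" are then reported at the end of the text, matching the symptom in the original report.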

Roan, it looks like you added this back in 2009; any ideas here? Otherwise I'll just put together a patch that skips trying to calculate byteoffset when $sectionIndex === false.
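The workaround proposed here (skip the offset calculation when there is no section index) can be sketched in the same simplified Python terms. This is a hypothetical model, not the code from the Gerrit change: None stands in for $sectionIndex === false, and returning None as the offset for such headings is an assumed choice.

```python
# Hypothetical sketch of the proposed workaround; not the merged patch.
# Headings without a section index (raw <h#> tags, modelled as None) get no
# offset, and the scan position is left untouched so later headings still match.


def fixed_offsets(html_sections, dom_nodes):
    """html_sections: section index per HTML heading, or None for raw <h#>.
    dom_nodes: (offset, section index) for each ==x== heading, in order."""
    pos = 0
    offsets = []
    for sec in html_sections:
        if sec is None:
            offsets.append(None)  # skip: there is no wikitext heading to find
            continue
        while pos < len(dom_nodes) and dom_nodes[pos][1] != sec:
            pos += 1
        if pos < len(dom_nodes):
            offsets.append(dom_nodes[pos][0])
            pos += 1
        else:
            offsets.append(None)  # not found: report no offset rather than the end
    return offsets


wikitext = "== A ==\ntext\n<h2>Raw</h2>\n= B =\nmore\n== C ==\nend\n"
# Offsets of the three ==x== headings, numbered as sections 1, 2, 3.
nodes = [(wikitext.index(h), i) for i, h in enumerate(["== A ==", "= B =", "== C =="], start=1)]
print(fixed_offsets([1, None, 2, 3], nodes))  # [0, None, 26, 37]
```

Skipping before the scan, rather than letting it run, is what keeps the raw heading from consuming the remaining DOM nodes.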
Comment 5 Brad Jorsch 2013-01-02 16:31:26 UTC
*** Bug 43584 has been marked as a duplicate of this bug. ***
Comment 6 Gerrit Notification Bot 2013-10-09 15:09:33 UTC
Change 88750 had a related patch set uploaded by Anomie:
Handle raw <h#> when calculating $rawtoc

https://gerrit.wikimedia.org/r/88750
Comment 7 Gerrit Notification Bot 2013-10-10 18:08:27 UTC
Change 88750 merged by jenkins-bot:
Handle raw <h#> when calculating $rawtoc

https://gerrit.wikimedia.org/r/88750
