Last modified: 2013-10-10 18:09:32 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T27203, the corresponding Phabricator task, for complete and up-to-date bug report information.

Bug 25203 - byteoffset of action=parse is broken when manually specifying headers using <h1> syntax


Summary:	byteoffset of action=parse is broken when manually specifying headers using <...

Status:	RESOLVED FIXED

Product:	MediaWiki
Classification:	Unclassified
Component:	Parser
Version:	unspecified
Hardware:	All All

Importance:	Low minor
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:	http://de.wikipedia.org/w/api.php?act...
Whiteboard:
Keywords:

Duplicates:	43584
Depends on:
Blocks:

Reported:	2010-09-17 17:11 UTC by Umherirrender
Modified:	2013-10-10 18:09 UTC
CC List:	8 users

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---

Description Umherirrender 2010-09-17 17:11:06 UTC

The byteoffsets of the page [[de:Wikipedia:Testseite]] are all the same and at the end of the wikipage (see url).

That is useless. Is there a way to get the right byteoffsets?

Thanks.

Comment 1 Bawolff (Brian Wolff) 2010-09-17 23:54:05 UTC

It appears to stop giving correct byte offsets after encountering the first header made using <h1> (or h2, h3, etc) syntax instead of the normal ==header== syntax.

Comment 2 Roan Kattouw 2010-09-18 10:06:56 UTC

Yes, it seems to have trouble with the fact that the page uses an <h2> followed by a =header=.

Comment 3 Sam Reed (reedy) 2011-02-21 23:42:47 UTC

Changing component to Page Rendering/Parsing

The API isn't at fault here; it's only displaying what the parser output says is there.

Comment 4 Brad Jorsch 2013-01-02 16:31:08 UTC

It's sort of API, in that this feature in the parser seems to have been added solely to support returning this information in the API.

Odd that "byteoffset" is actually the offset in Unicode codepoints.
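
To illustrate the difference, here is a minimal Python sketch (not MediaWiki code; the sample wikitext and heading are made up) comparing an offset counted in Unicode codepoints with the same position counted in UTF-8 bytes. The two diverge as soon as a non-ASCII character precedes the heading.

```python
# Illustrative only: the sample page content and heading are hypothetical.
wikitext = "Größe\n== Überschrift ==\n"
marker = "== Überschrift =="

# Python strings are indexed by codepoint, so this is a codepoint offset.
codepoint_offset = wikitext.index(marker)

# Encoding the prefix to UTF-8 gives the true byte offset of the same position.
byte_offset = len(wikitext[:codepoint_offset].encode("utf-8"))

print(codepoint_offset, byte_offset)  # prints "6 8": ö and ß take two bytes each
```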

The problem is that the code pulls out all the <h#> tags from the parsed HTML, but uses the parsed-to-DOM representation of the original wikitext to try to calculate the byteoffset. This parsed-to-DOM representation, however, doesn't include DOM structure for any raw <h#> tags from the original wikitext, so when it tries to find the DOM node for one of those it searches to the end of the wikitext without finding it. Since the search position is then left at the end, the offsets of all subsequent headers come out wrong too.

Roan, it looks like you added this back in 2009, any ideas here? Otherwise I'll just put together a patch that skips trying to calculate byteoffset when $sectionIndex === false.
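
The failure mode described above, and the proposed skip, can be modelled with a rough Python sketch. This is a toy model, not the parser's actual code: `dom_nodes`, `rendered_headings`, `offsets`, and the `skip_raw` flag are hypothetical names, and regular expressions stand in for the real DOM and HTML. It also only models the skip suggested in this comment, not whatever change was eventually merged.

```python
import re

# Toy stand-ins (assumptions): one regex per heading syntax.
WIKI_HEADING = re.compile(r"^=+ *(.+?) *=+ *$", re.MULTILINE)
RAW_HEADING = re.compile(r"^<h[1-6]>(.+?)</h[1-6]> *$", re.MULTILINE)

def dom_nodes(wikitext):
    """(offset, title) pairs for ==heading== syntax only, like the toy 'DOM'."""
    return [(m.start(), m.group(1)) for m in WIKI_HEADING.finditer(wikitext)]

def rendered_headings(wikitext):
    """(title, is_raw) for every heading in page order, like the parsed HTML."""
    found = [(m.start(), m.group(1), False) for m in WIKI_HEADING.finditer(wikitext)]
    found += [(m.start(), m.group(1), True) for m in RAW_HEADING.finditer(wikitext)]
    return [(title, is_raw) for _, title, is_raw in sorted(found)]

def offsets(wikitext, skip_raw):
    nodes = dom_nodes(wikitext)
    cursor = 0  # index of the next DOM node to examine; never moves backwards
    result = []
    for title, is_raw in rendered_headings(wikitext):
        if skip_raw and is_raw:
            result.append((title, None))  # report no offset rather than a wrong one
            continue
        # Walk forward looking for the matching node; a miss leaves the cursor at the end.
        while cursor < len(nodes) and nodes[cursor][1] != title:
            cursor += 1
        if cursor < len(nodes):
            result.append((title, nodes[cursor][0]))
            cursor += 1
        else:
            result.append((title, len(wikitext)))
    return result

page = "== A ==\ntext\n<h2>B</h2>\ntext\n== C ==\ntext\n"
print(offsets(page, skip_raw=False))  # B and C both land at the end of the page
print(offsets(page, skip_raw=True))   # B has no offset, C is found correctly
```

In this model the raw `<h2>` has no matching node, so the unguarded search consumes the rest of the node list and every later heading is reported at the end of the text, which matches the symptom in the original report.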

Comment 5 Brad Jorsch 2013-01-02 16:31:26 UTC

*** Bug 43584 has been marked as a duplicate of this bug. ***

Comment 6 Gerrit Notification Bot 2013-10-09 15:09:33 UTC

Change 88750 had a related patch set uploaded by Anomie:
Handle raw <h#> when calculating $rawtoc

https://gerrit.wikimedia.org/r/88750

Comment 7 Gerrit Notification Bot 2013-10-10 18:08:27 UTC

Change 88750 merged by jenkins-bot:
Handle raw <h#> when calculating $rawtoc

https://gerrit.wikimedia.org/r/88750
